The commit "13bcd440f2ff nvmem: core: verify cell's raw_len" caused an
extension of the "mac-address" cell from 6 to 8 bytes due to word_size
of 4 bytes. This led to a required byte swap of the full buffer length,
which caused truncation of the mac-address when read.
Previously, the mac-address was incorrectly truncated from
70:B3:D5:14:E9:0E to 00:00:70:B3:D5:14.
Fix the issue by swapping only the first 6 bytes to correctly pass the
mac-address to the upper layers.
Fixes: 13bcd440f2ff ("nvmem: core: verify cell's raw_len")
Cc: stable(a)vger.kernel.org
Signed-off-by: Steffen Bätz <steffen(a)innosonix.de>
Tested-by: Alexander Stein <alexander.stein(a)ew.tq-group.com>
---
v4:
- Adopted the commit message wording recommended by
Frank.li(a)nxp.com to improve clarity.
- Simplified byte count determination by using min() instead of an
explicit conditional statement.
- Kept the Tested-by tag from the previous patch as the change
remains trivial but tested.
v3:
- replace magic number 6 with ETH_ALEN
- Fix misleading indentation and properly group 'mac-address' statements
v2:
- Add Cc: stable(a)vger.kernel.org as requested by Greg KH's patch bot
drivers/nvmem/imx-ocotp-ele.c | 5 ++++-
drivers/nvmem/imx-ocotp.c | 5 ++++-
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/nvmem/imx-ocotp-ele.c b/drivers/nvmem/imx-ocotp-ele.c
index ca6dd71d8a2e..7807ec0e2d18 100644
--- a/drivers/nvmem/imx-ocotp-ele.c
+++ b/drivers/nvmem/imx-ocotp-ele.c
@@ -12,6 +12,7 @@
#include <linux/of.h>
#include <linux/platform_device.h>
#include <linux/slab.h>
+#include <linux/if_ether.h> /* ETH_ALEN */
enum fuse_type {
FUSE_FSB = BIT(0),
@@ -118,9 +119,11 @@ static int imx_ocotp_cell_pp(void *context, const char *id, int index,
int i;
/* Deal with some post processing of nvmem cell data */
- if (id && !strcmp(id, "mac-address"))
+ if (id && !strcmp(id, "mac-address")) {
+ bytes = min(bytes, ETH_ALEN);
for (i = 0; i < bytes / 2; i++)
swap(buf[i], buf[bytes - i - 1]);
+ }
return 0;
}
diff --git a/drivers/nvmem/imx-ocotp.c b/drivers/nvmem/imx-ocotp.c
index 79dd4fda0329..7bf7656d4f96 100644
--- a/drivers/nvmem/imx-ocotp.c
+++ b/drivers/nvmem/imx-ocotp.c
@@ -23,6 +23,7 @@
#include <linux/platform_device.h>
#include <linux/slab.h>
#include <linux/delay.h>
+#include <linux/if_ether.h> /* ETH_ALEN */
#define IMX_OCOTP_OFFSET_B0W0 0x400 /* Offset from base address of the
* OTP Bank0 Word0
@@ -227,9 +228,11 @@ static int imx_ocotp_cell_pp(void *context, const char *id, int index,
int i;
/* Deal with some post processing of nvmem cell data */
- if (id && !strcmp(id, "mac-address"))
+ if (id && !strcmp(id, "mac-address")) {
+ bytes = min(bytes, ETH_ALEN);
for (i = 0; i < bytes / 2; i++)
swap(buf[i], buf[bytes - i - 1]);
+ }
return 0;
}
--
2.43.0
--
*innosonix GmbH*
Hauptstr. 35
96482 Ahorn
central: +49 9561 7459980
www.innosonix.de <http://www.innosonix.de>
innosonix GmbH
Geschäftsführer:
Markus Bätz, Steffen Bätz
USt.-IdNr / VAT-Nr.: DE266020313
EORI-Nr.:
DE240121536680271
HRB 5192 Coburg
WEEE-Reg.-Nr. DE88021242
When netfslib is issuing subrequests, the subrequests start processing
immediately and may complete before we reach the end of the issuing
function. At the end of the issuing function we set NETFS_RREQ_ALL_QUEUED
to indicate to the collector that we aren't going to issue any more subreqs
and that it can do the final notifications and cleanup.
Now, this isn't a problem if the request is synchronous
(NETFS_RREQ_OFFLOAD_COLLECTION is unset) as the result collection will be
done in-thread and we're guaranteed an opportunity to run the collector.
However, if the request is asynchronous, collection is primarily triggered
by the termination of subrequests queuing it on a workqueue. Now, a race
can occur here if the app thread sets ALL_QUEUED after the last subrequest
terminates.
This can happen most easily with the copy2cache code (as used by Ceph)
where, in the collection routine of a read request, an asynchronous write
request is spawned to copy data to the cache. Folios are added to the
write request as they're unlocked, but there may be a delay before
ALL_QUEUED is set as the write subrequests may complete before we get
there.
If all the write subreqs have finished by the ALL_QUEUED point, no further
events happen and the collection never happens, leaving the request
hanging.
Fix this by queuing the collector after setting ALL_QUEUED. This is a bit
heavy-handed and it may be sufficient to do it only if there are no extant
subreqs.
Also add a tracepoint to cross-reference both requests in a copy-to-request
operation and add a trace to the netfs_rreq tracepoint to indicate the
setting of ALL_QUEUED.
Fixes: e2d46f2ec332 ("netfs: Change the read result collector to only use one work item")
Reported-by: Max Kellermann <max.kellermann(a)ionos.com>
Link: https://lore.kernel.org/r/CAKPOu+8z_ijTLHdiCYGU_Uk7yYD=shxyGLwfe-L7AV3DhebS…
Signed-off-by: David Howells <dhowells(a)redhat.com>
cc: Paulo Alcantara <pc(a)manguebit.org>
cc: Viacheslav Dubeyko <slava(a)dubeyko.com>
cc: Alex Markuze <amarkuze(a)redhat.com>
cc: Ilya Dryomov <idryomov(a)gmail.com>
cc: netfs(a)lists.linux.dev
cc: ceph-devel(a)vger.kernel.org
cc: linux-fsdevel(a)vger.kernel.org
cc: stable(a)vger.kernel.org
---
fs/netfs/read_pgpriv2.c | 4 ++++
include/trace/events/netfs.h | 30 ++++++++++++++++++++++++++++++
2 files changed, 34 insertions(+)
diff --git a/fs/netfs/read_pgpriv2.c b/fs/netfs/read_pgpriv2.c
index 080d2a6a51d9..8097bc069c1d 100644
--- a/fs/netfs/read_pgpriv2.c
+++ b/fs/netfs/read_pgpriv2.c
@@ -111,6 +111,7 @@ static struct netfs_io_request *netfs_pgpriv2_begin_copy_to_cache(
goto cancel_put;
__set_bit(NETFS_RREQ_OFFLOAD_COLLECTION, &creq->flags);
+ trace_netfs_copy2cache(rreq, creq);
trace_netfs_write(creq, netfs_write_trace_copy_to_cache);
netfs_stat(&netfs_n_wh_copy_to_cache);
rreq->copy_to_cache = creq;
@@ -155,6 +156,9 @@ void netfs_pgpriv2_end_copy_to_cache(struct netfs_io_request *rreq)
netfs_issue_write(creq, &creq->io_streams[1]);
smp_wmb(); /* Write lists before ALL_QUEUED. */
set_bit(NETFS_RREQ_ALL_QUEUED, &creq->flags);
+ trace_netfs_rreq(rreq, netfs_rreq_trace_end_copy_to_cache);
+ if (list_empty_careful(&creq->io_streams[1].subrequests))
+ netfs_wake_collector(creq);
netfs_put_request(creq, netfs_rreq_trace_put_return);
creq->copy_to_cache = NULL;
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index 73e96ccbe830..64a382fbc31a 100644
--- a/include/trace/events/netfs.h
+++ b/include/trace/events/netfs.h
@@ -55,6 +55,7 @@
EM(netfs_rreq_trace_copy, "COPY ") \
EM(netfs_rreq_trace_dirty, "DIRTY ") \
EM(netfs_rreq_trace_done, "DONE ") \
+ EM(netfs_rreq_trace_end_copy_to_cache, "END-C2C") \
EM(netfs_rreq_trace_free, "FREE ") \
EM(netfs_rreq_trace_ki_complete, "KI-CMPL") \
EM(netfs_rreq_trace_recollect, "RECLLCT") \
@@ -559,6 +560,35 @@ TRACE_EVENT(netfs_write,
__entry->start, __entry->start + __entry->len - 1)
);
+TRACE_EVENT(netfs_copy2cache,
+ TP_PROTO(const struct netfs_io_request *rreq,
+ const struct netfs_io_request *creq),
+
+ TP_ARGS(rreq, creq),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, rreq)
+ __field(unsigned int, creq)
+ __field(unsigned int, cookie)
+ __field(unsigned int, ino)
+ ),
+
+ TP_fast_assign(
+ struct netfs_inode *__ctx = netfs_inode(rreq->inode);
+ struct fscache_cookie *__cookie = netfs_i_cookie(__ctx);
+ __entry->rreq = rreq->debug_id;
+ __entry->creq = creq->debug_id;
+ __entry->cookie = __cookie ? __cookie->debug_id : 0;
+ __entry->ino = rreq->inode->i_ino;
+ ),
+
+ TP_printk("R=%08x CR=%08x c=%08x i=%x ",
+ __entry->rreq,
+ __entry->creq,
+ __entry->cookie,
+ __entry->ino)
+ );
+
TRACE_EVENT(netfs_collect,
TP_PROTO(const struct netfs_io_request *wreq),
From: Ge Yang <yangge1116(a)126.com>
Since commit d228814b1913 ("efi/libstub: Add get_event_log() support
for CC platforms") reuses TPM2 support code for the CC platforms, when
launching a TDX virtual machine with coco measurement enabled, the
following error log is generated:
[Firmware Bug]: Failed to parse event in TPM Final Events Log
Call Trace:
efi_config_parse_tables()
efi_tpm_eventlog_init()
tpm2_calc_event_log_size()
__calc_tpm2_event_size()
The pcr_idx value in the Intel TDX log header is 1, causing the function
__calc_tpm2_event_size() to fail to recognize the log header, ultimately
leading to the "Failed to parse event in TPM Final Events Log" error.
According to UEFI Specification 2.10, Section 38.4.1: For TDX, TPM PCR
0 maps to MRTD, so the log header uses TPM PCR 1 instead. To successfully
parse the TDX event log header, the check for a pcr_idx value of 0
must be skipped.
According to Table 6 in Section 10.2.1 of the TCG PC Client
Specification, the index field does not require the PCR index to be
fixed at zero. Therefore, skipping the check for a pcr_idx value of
0 for CC platforms is safe.
Link: https://uefi.org/specs/UEFI/2.10/38_Confidential_Computing.html#intel-trust…
Link: https://trustedcomputinggroup.org/wp-content/uploads/TCG_PCClient_PFP_r1p05…
Fixes: d228814b1913 ("efi/libstub: Add get_event_log() support for CC platforms")
Signed-off-by: Ge Yang <yangge1116(a)126.com>
Cc: stable(a)vger.kernel.org
---
V3:
- fix build error
V2:
- limit the fix for CC only suggested by Jarkko and Sathyanarayanan
drivers/char/tpm/eventlog/tpm2.c | 4 +++-
drivers/firmware/efi/libstub/tpm.c | 13 +++++++++----
drivers/firmware/efi/tpm.c | 4 +++-
include/linux/tpm_eventlog.h | 14 +++++++++++---
4 files changed, 26 insertions(+), 9 deletions(-)
diff --git a/drivers/char/tpm/eventlog/tpm2.c b/drivers/char/tpm/eventlog/tpm2.c
index 37a0580..30ef47c 100644
--- a/drivers/char/tpm/eventlog/tpm2.c
+++ b/drivers/char/tpm/eventlog/tpm2.c
@@ -18,6 +18,7 @@
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/tpm_eventlog.h>
+#include <linux/cc_platform.h>
#include "../tpm.h"
#include "common.h"
@@ -36,7 +37,8 @@
static size_t calc_tpm2_event_size(struct tcg_pcr_event2_head *event,
struct tcg_pcr_event *event_header)
{
- return __calc_tpm2_event_size(event, event_header, false);
+ return __calc_tpm2_event_size(event, event_header, false,
+ cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT));
}
static void *tpm2_bios_measurements_start(struct seq_file *m, loff_t *pos)
diff --git a/drivers/firmware/efi/libstub/tpm.c b/drivers/firmware/efi/libstub/tpm.c
index a5c6c4f..9728060 100644
--- a/drivers/firmware/efi/libstub/tpm.c
+++ b/drivers/firmware/efi/libstub/tpm.c
@@ -50,7 +50,8 @@ void efi_enable_reset_attack_mitigation(void)
static void efi_retrieve_tcg2_eventlog(int version, efi_physical_addr_t log_location,
efi_physical_addr_t log_last_entry,
efi_bool_t truncated,
- struct efi_tcg2_final_events_table *final_events_table)
+ struct efi_tcg2_final_events_table *final_events_table,
+ bool is_cc_event)
{
efi_guid_t linux_eventlog_guid = LINUX_EFI_TPM_EVENT_LOG_GUID;
efi_status_t status;
@@ -87,7 +88,8 @@ static void efi_retrieve_tcg2_eventlog(int version, efi_physical_addr_t log_loca
last_entry_size =
__calc_tpm2_event_size((void *)last_entry_addr,
(void *)(long)log_location,
- false);
+ false,
+ is_cc_event);
} else {
last_entry_size = sizeof(struct tcpa_event) +
((struct tcpa_event *) last_entry_addr)->event_size;
@@ -123,7 +125,8 @@ static void efi_retrieve_tcg2_eventlog(int version, efi_physical_addr_t log_loca
header = data + offset + final_events_size;
event_size = __calc_tpm2_event_size(header,
(void *)(long)log_location,
- false);
+ false,
+ is_cc_event);
/* If calc fails this is a malformed log */
if (!event_size)
break;
@@ -157,6 +160,7 @@ void efi_retrieve_eventlog(void)
efi_tcg2_protocol_t *tpm2 = NULL;
efi_bool_t truncated;
efi_status_t status;
+ bool is_cc_event = false;
status = efi_bs_call(locate_protocol, &tpm2_guid, NULL, (void **)&tpm2);
if (status == EFI_SUCCESS) {
@@ -186,11 +190,12 @@ void efi_retrieve_eventlog(void)
final_events_table =
get_efi_config_table(EFI_CC_FINAL_EVENTS_TABLE_GUID);
+ is_cc_event = true;
}
if (status != EFI_SUCCESS || !log_location)
return;
efi_retrieve_tcg2_eventlog(version, log_location, log_last_entry,
- truncated, final_events_table);
+ truncated, final_events_table, is_cc_event);
}
diff --git a/drivers/firmware/efi/tpm.c b/drivers/firmware/efi/tpm.c
index cdd4310..ca8535d 100644
--- a/drivers/firmware/efi/tpm.c
+++ b/drivers/firmware/efi/tpm.c
@@ -12,6 +12,7 @@
#include <linux/init.h>
#include <linux/memblock.h>
#include <linux/tpm_eventlog.h>
+#include <linux/cc_platform.h>
int efi_tpm_final_log_size;
EXPORT_SYMBOL(efi_tpm_final_log_size);
@@ -23,7 +24,8 @@ static int __init tpm2_calc_event_log_size(void *data, int count, void *size_inf
while (count > 0) {
header = data + size;
- event_size = __calc_tpm2_event_size(header, size_info, true);
+ event_size = __calc_tpm2_event_size(header, size_info, true,
+ cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT));
if (event_size == 0)
return -1;
size += event_size;
diff --git a/include/linux/tpm_eventlog.h b/include/linux/tpm_eventlog.h
index 891368e..b3380c9 100644
--- a/include/linux/tpm_eventlog.h
+++ b/include/linux/tpm_eventlog.h
@@ -143,6 +143,7 @@ struct tcg_algorithm_info {
* @event: Pointer to the event whose size should be calculated
* @event_header: Pointer to the initial event containing the digest lengths
* @do_mapping: Whether or not the event needs to be mapped
+ * @is_cc_event: Whether or not the event is from a CC platform
*
* The TPM2 event log format can contain multiple digests corresponding to
* separate PCR banks, and also contains a variable length of the data that
@@ -159,7 +160,8 @@ struct tcg_algorithm_info {
static __always_inline u32 __calc_tpm2_event_size(struct tcg_pcr_event2_head *event,
struct tcg_pcr_event *event_header,
- bool do_mapping)
+ bool do_mapping,
+ bool is_cc_event)
{
struct tcg_efi_specid_event_head *efispecid;
struct tcg_event_field *event_field;
@@ -201,8 +203,14 @@ static __always_inline u32 __calc_tpm2_event_size(struct tcg_pcr_event2_head *ev
count = event->count;
event_type = event->event_type;
- /* Verify that it's the log header */
- if (event_header->pcr_idx != 0 ||
+ /*
+ * Verify that it's the log header. According to the TCG PC Client
+ * Specification, when identifying a log header, the check for a
+ * pcr_idx value of 0 is not required. For CC platforms, skipping
+ * this check during log header is necessary; otherwise, the CC
+ * platform's log header may fail to be recognized.
+ */
+ if ((!is_cc_event && event_header->pcr_idx != 0) ||
event_header->event_type != NO_ACTION ||
memcmp(event_header->digest, zero_digest, sizeof(zero_digest))) {
size = 0;
--
2.7.4
Since commit 55d4980ce55b ("nvmem: core: support specifying both: cell
raw data & post read lengths"), the aligned length (e.g. '8' instead of
'6') is passed to the read_post_process callback. This causes that the 2
bytes following the MAC address in the ocotp are swapped to the
beginning of the address. As a result, an invalid MAC address is
returned and to make it even worse, this address can be equal on boards
with the same OUI vendor prefix.
Fixes: 55d4980ce55b ("nvmem: core: support specifying both: cell raw data & post read lengths")
Signed-off-by: Christian Eggers <ceggers(a)arri.de>
Cc: stable(a)vger.kernel.org
---
Tested on i.MX6ULL, but I assume that this is also required for.
imx-ocotp-ele.c (i.MX93).
drivers/nvmem/imx-ocotp-ele.c | 5 +++--
drivers/nvmem/imx-ocotp.c | 5 +++--
2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/nvmem/imx-ocotp-ele.c b/drivers/nvmem/imx-ocotp-ele.c
index 83617665c8d7..07830785ebf1 100644
--- a/drivers/nvmem/imx-ocotp-ele.c
+++ b/drivers/nvmem/imx-ocotp-ele.c
@@ -6,6 +6,7 @@
*/
#include <linux/device.h>
+#include <linux/if_ether.h>
#include <linux/io.h>
#include <linux/module.h>
#include <linux/nvmem-provider.h>
@@ -119,8 +120,8 @@ static int imx_ocotp_cell_pp(void *context, const char *id, int index,
/* Deal with some post processing of nvmem cell data */
if (id && !strcmp(id, "mac-address"))
- for (i = 0; i < bytes / 2; i++)
- swap(buf[i], buf[bytes - i - 1]);
+ for (i = 0; i < ETH_ALEN / 2; i++)
+ swap(buf[i], buf[ETH_ALEN - i - 1]);
return 0;
}
diff --git a/drivers/nvmem/imx-ocotp.c b/drivers/nvmem/imx-ocotp.c
index 22cc77908018..4dd3b0f94de2 100644
--- a/drivers/nvmem/imx-ocotp.c
+++ b/drivers/nvmem/imx-ocotp.c
@@ -16,6 +16,7 @@
#include <linux/clk.h>
#include <linux/device.h>
+#include <linux/if_ether.h>
#include <linux/io.h>
#include <linux/module.h>
#include <linux/nvmem-provider.h>
@@ -228,8 +229,8 @@ static int imx_ocotp_cell_pp(void *context, const char *id, int index,
/* Deal with some post processing of nvmem cell data */
if (id && !strcmp(id, "mac-address"))
- for (i = 0; i < bytes / 2; i++)
- swap(buf[i], buf[bytes - i - 1]);
+ for (i = 0; i < ETH_ALEN / 2; i++)
+ swap(buf[i], buf[ETH_ALEN - i - 1]);
return 0;
}
--
2.43.0
Verify that the inode mode is sane when loading it from the disk to
avoid complaints from VFS about setting up invalid inodes.
Reported-by: syzbot+895c23f6917da440ed0d(a)syzkaller.appspotmail.com
CC: stable(a)vger.kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
fs/isofs/inode.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
I plan to merge this fix through my tree.
diff --git a/fs/isofs/inode.c b/fs/isofs/inode.c
index d5da9817df9b..33e6a620c103 100644
--- a/fs/isofs/inode.c
+++ b/fs/isofs/inode.c
@@ -1440,9 +1440,16 @@ static int isofs_read_inode(struct inode *inode, int relocated)
inode->i_op = &page_symlink_inode_operations;
inode_nohighmem(inode);
inode->i_data.a_ops = &isofs_symlink_aops;
- } else
+ } else if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode) ||
+ S_ISFIFO(inode->i_mode) || S_ISSOCK(inode->i_mode)) {
/* XXX - parse_rock_ridge_inode() had already set i_rdev. */
init_special_inode(inode, inode->i_mode, inode->i_rdev);
+ } else {
+ printk(KERN_DEBUG "ISOFS: Invalid file type 0%04o for inode %lu.\n",
+ inode->i_mode, inode->i_ino);
+ ret = -EIO;
+ goto fail;
+ }
ret = 0;
out:
--
2.43.0
The _CRS resources in many cases want to have ResourceSource field
to be a type of ACPI String. This means that to compile properly
we need to enclosure the name path into double quotes. This will
in practice defer the interpretation to a run-time stage, However,
this may be interpreted differently on different OSes and ACPI
interpreter implementations. In particular ACPICA might not correctly
recognize the leading '^' (caret) character and will not resolve
the relative name path properly. On top of that, this piece may be
used in SSDTs which are loaded after the DSDT and on itself may also
not resolve relative name paths outside of their own scopes.
With this all said, fix documentation to use fully-qualified name
paths always to avoid any misinterpretations, which is proven to
work.
Fixes: 8eb5c87a92c0 ("i2c: add ACPI support for I2C mux ports")
Reported-by: Yevhen Kondrashyn <e.kondrashyn(a)gmail.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
---
Rafael, I prefer, if no objections, to push this as v6.16-rc6 material since
the reported issue was detected on old (v5.10.y) and still LTS kernel. Would be
nice for people to not trap to it in older kernels.
Documentation/firmware-guide/acpi/i2c-muxes.rst | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/Documentation/firmware-guide/acpi/i2c-muxes.rst b/Documentation/firmware-guide/acpi/i2c-muxes.rst
index 3a8997ccd7c4..f366539acd79 100644
--- a/Documentation/firmware-guide/acpi/i2c-muxes.rst
+++ b/Documentation/firmware-guide/acpi/i2c-muxes.rst
@@ -14,7 +14,7 @@ Consider this topology::
| | | 0x70 |--CH01--> i2c client B (0x50)
+------+ +------+
-which corresponds to the following ASL::
+which corresponds to the following ASL (in the scope of \_SB)::
Device (SMB1)
{
@@ -24,7 +24,7 @@ which corresponds to the following ASL::
Name (_HID, ...)
Name (_CRS, ResourceTemplate () {
I2cSerialBus (0x70, ControllerInitiated, I2C_SPEED,
- AddressingMode7Bit, "^SMB1", 0x00,
+ AddressingMode7Bit, "\\_SB.SMB1", 0x00,
ResourceConsumer,,)
}
@@ -37,7 +37,7 @@ which corresponds to the following ASL::
Name (_HID, ...)
Name (_CRS, ResourceTemplate () {
I2cSerialBus (0x50, ControllerInitiated, I2C_SPEED,
- AddressingMode7Bit, "^CH00", 0x00,
+ AddressingMode7Bit, "\\_SB.SMB1.CH00", 0x00,
ResourceConsumer,,)
}
}
@@ -52,7 +52,7 @@ which corresponds to the following ASL::
Name (_HID, ...)
Name (_CRS, ResourceTemplate () {
I2cSerialBus (0x50, ControllerInitiated, I2C_SPEED,
- AddressingMode7Bit, "^CH01", 0x00,
+ AddressingMode7Bit, "\\_SB.SMB1.CH01", 0x00,
ResourceConsumer,,)
}
}
--
2.47.2
The IMX8M reference manuals indicate in the USDHC Clock generator section
that the clock rate for DDR is 1/2 the input clock therefore HS400 rates
clocked at 200Mhz require a 400Mhz SDHC clock.
This showed about a 1.5x improvement in read performance for the eMMC's
used on the various imx8m{m,n,p}-venice boards.
Fixes: 6f30b27c5ef5 ("arm64: dts: imx8mm: Add Gateworks i.MX 8M Mini Development Kits")
Signed-off-by: Tim Harvey <tharvey(a)gateworks.com>
---
arch/arm64/boot/dts/freescale/imx8mm-venice-gw700x.dtsi | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/freescale/imx8mm-venice-gw700x.dtsi b/arch/arm64/boot/dts/freescale/imx8mm-venice-gw700x.dtsi
index 5a3b1142ddf4..37db4f0dd505 100644
--- a/arch/arm64/boot/dts/freescale/imx8mm-venice-gw700x.dtsi
+++ b/arch/arm64/boot/dts/freescale/imx8mm-venice-gw700x.dtsi
@@ -418,6 +418,8 @@ &usdhc3 {
pinctrl-0 = <&pinctrl_usdhc3>;
pinctrl-1 = <&pinctrl_usdhc3_100mhz>;
pinctrl-2 = <&pinctrl_usdhc3_200mhz>;
+ assigned-clocks = <&clk IMX8MM_CLK_USDHC3>;
+ assigned-clock-rates = <400000000>;
bus-width = <8>;
non-removable;
status = "okay";
--
2.25.1
inline data handling has a race between writing and writing to a memory
map.
When ext4_page_mkwrite is called, it calls ext4_convert_inline_data, which
destroys the inline data, but if block allocation fails, restores the
inline data. In that process, we could have:
CPU1 CPU2
destroy_inline_data
write_begin (does not see inline data)
restory_inline_data
write_end (sees inline data)
This leads to bugs like the one below, as write_begin did not prepare for
the case of inline data, which is expected by the write_end side of it.
------------[ cut here ]------------
kernel BUG at fs/ext4/inline.c:235!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 1 UID: 0 PID: 5838 Comm: syz-executor110 Not tainted 6.13.0-rc3-syzkaller-00209-g499551201b5f #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
RIP: 0010:ext4_write_inline_data fs/ext4/inline.c:235 [inline]
RIP: 0010:ext4_write_inline_data_end+0xdc7/0xdd0 fs/ext4/inline.c:774
Code: 47 1d 8c e8 4b 3a 91 ff 90 0f 0b e8 63 7a 47 ff 48 8b 7c 24 10 48 c7 c6 e0 47 1d 8c e8 32 3a 91 ff 90 0f 0b e8 4a 7a 47 ff 90 <0f> 0b 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 0018:ffffc900031c7320 EFLAGS: 00010293
RAX: ffffffff8257f9a6 RBX: 000000000000005a RCX: ffff888012968000
RDX: 0000000000000000 RSI: 000000000000005a RDI: 000000000000005b
RBP: ffffc900031c7448 R08: ffffffff8257ef87 R09: 1ffff11006806070
R10: dffffc0000000000 R11: ffffed1006806071 R12: 000000000000005a
R13: dffffc0000000000 R14: ffff888076b65bd8 R15: 000000000000005b
FS: 00007f5c6bacf6c0(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000a00 CR3: 0000000073fb6000 CR4: 0000000000350ef0
Call Trace:
<TASK>
generic_perform_write+0x6f8/0x990 mm/filemap.c:4070
ext4_buffered_write_iter+0xc5/0x350 fs/ext4/file.c:299
ext4_file_write_iter+0x892/0x1c50
iter_file_splice_write+0xbfc/0x1510 fs/splice.c:743
do_splice_from fs/splice.c:941 [inline]
direct_splice_actor+0x11d/0x220 fs/splice.c:1164
splice_direct_to_actor+0x588/0xc80 fs/splice.c:1108
do_splice_direct_actor fs/splice.c:1207 [inline]
do_splice_direct+0x289/0x3e0 fs/splice.c:1233
do_sendfile+0x564/0x8a0 fs/read_write.c:1363
__do_sys_sendfile64 fs/read_write.c:1424 [inline]
__se_sys_sendfile64+0x17c/0x1e0 fs/read_write.c:1410
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f5c6bb18d09
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 b1 18 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f5c6bacf218 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
RAX: ffffffffffffffda RBX: 00007f5c6bba0708 RCX: 00007f5c6bb18d09
RDX: 0000000000000000 RSI: 0000000000000005 RDI: 0000000000000004
RBP: 00007f5c6bba0700 R08: 0000000000000000 R09: 0000000000000000
R10: 000080001d00c0d0 R11: 0000000000000246 R12: 00007f5c6bb6d620
R13: 00007f5c6bb6d0c0 R14: 0031656c69662f2e R15: 8088e3ad122bc192
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
This happens because ext4_page_mkwrite is not protected by the inode_lock.
The xattr semaphore is not sufficient to protect inline data handling in a
sane way, so we need to rely on the inode_lock. Adding the inode_lock to
ext4_page_mkwrite is not an option, otherwise lock-ordering problems with
mmap_lock may arise.
The conversion inside ext4_page_mkwrite was introduced at commit
7b4cc9787fe3 ("ext4: evict inline data when writing to memory map"). This
fixes a documented bug in the commit message, which suggests some
alternative fixes.
Convert inline data when mmap is called, instead of doing it only when the
mmapped page is written to. Using the inode_lock there does not lead to
lock-ordering issues.
The drawback is that inline conversion will happen when the file is
mmapped, even though the page will not be written to.
Fixes: 7b4cc9787fe3 ("ext4: evict inline data when writing to memory map")
Reported-by: syzbot+0c89d865531d053abb2d(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=0c89d865531d053abb2d
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo(a)igalia.com>
Cc: stable(a)vger.kernel.org
---
Changes in v2:
- Convert inline data at mmap time, avoiding data loss.
- Link to v1: https://lore.kernel.org/r/20250519-ext4_inline_page_mkwrite-v1-1-865d9a62b5…
---
fs/ext4/file.c | 6 ++++++
fs/ext4/inode.c | 4 ----
2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index beb078ee4811d6092e362e37307e7d87e5276cbc..f2380471df5d99500e49fdc639fa3e56143c328f 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -819,6 +819,12 @@ static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
if (!daxdev_mapping_supported(vma, dax_dev))
return -EOPNOTSUPP;
+ inode_lock(inode);
+ ret = ext4_convert_inline_data(inode);
+ inode_unlock(inode);
+ if (ret)
+ return ret;
+
file_accessed(file);
if (IS_DAX(file_inode(file))) {
vma->vm_ops = &ext4_dax_vm_ops;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 94c7d2d828a64e42ded09c82497ed7617071aa19..895ecda786194b29d32c9c49785d56a1a84e2096 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -6222,10 +6222,6 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
filemap_invalidate_lock_shared(mapping);
- err = ext4_convert_inline_data(inode);
- if (err)
- goto out_ret;
-
/*
* On data journalling we skip straight to the transaction handle:
* there's no delalloc; page truncated will be checked later; the
---
base-commit: 4a95bc121ccdaee04c4d72f84dbfa6b880a514b6
change-id: 20250519-ext4_inline_page_mkwrite-c42ca1f02295
Best regards,
--
Thadeu Lima de Souza Cascardo <cascardo(a)igalia.com>
During our internal testing, we started observing intermittent boot
failures when the machine uses 4-level paging and has a large amount
of persistent memory:
BUG: unable to handle page fault for address: ffffe70000000034
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] SMP NOPTI
RIP: 0010:__init_single_page+0x9/0x6d
Call Trace:
<TASK>
__init_zone_device_page+0x17/0x5d
memmap_init_zone_device+0x154/0x1bb
pagemap_range+0x2e0/0x40f
memremap_pages+0x10b/0x2f0
devm_memremap_pages+0x1e/0x60
dev_dax_probe+0xce/0x2ec [device_dax]
dax_bus_probe+0x6d/0xc9
[... snip ...]
</TASK>
It turns out that the kernel panics while initializing vmemmap
(struct page array) when the vmemmap region spans two PGD entries,
because the new PGD entry is only installed in init_mm.pgd,
but not in the page tables of other tasks.
And looking at __populate_section_memmap():
if (vmemmap_can_optimize(altmap, pgmap))
// does not sync top level page tables
r = vmemmap_populate_compound_pages(pfn, start, end, nid, pgmap);
else
// sync top level page tables in x86
r = vmemmap_populate(start, end, nid, altmap);
In the normal path, vmemmap_populate() in arch/x86/mm/init_64.c
synchronizes the top level page table (See commit 9b861528a801
("x86-64, mem: Update all PGDs for direct mapping and vmemmap mapping
changes")) so that all tasks in the system can see the new vmemmap area.
However, when vmemmap_can_optimize() returns true, the optimized path
skips synchronization of top-level page tables. This is because
vmemmap_populate_compound_pages() is implemented in core MM code, which
does not handle synchronization of the top-level page tables. Instead,
the core MM has historically relied on each architecture to perform this
synchronization manually.
It turns out that current approach of relying on each arch to handle
the page table sync manually is fragile because 1) it's easy to forget
to sync the top level page table, and 2) it's also easy to overlook that
the kernel should not access vmemmap / direct mapping area before the sync.
As suggested by Dave Hansen, define x86_64 versions of
{pgd,p4d}_populate_kernel() and arch_sync_kernel_pagetables(), and
explicitly perform top-level page table synchronization in
{pgd,p4d}_populate_kernel(). Top level page tables are synchronized in
pgd_pouplate_kernel() for 5-level paging and in p4d_populate_kernel()
for 4-level paging.
arch_sync_kernel_pagetables(addr) synchronizes the top level page table
entry for address. It calls sync_kernel_pagetables_{l4,l5} depending on
the page table levels and installs the page entry in all page tables
in the system to make it visible to all tasks.
Note that sync_kernel_pagetables_{l4,l5} are simply versions of
sync_global_pgds_{l4,l5} that synchronizes only a single page table entry
for specified address, instead of for all page table entries corresponding
to a range. No functional difference intended between sync_global_pgds_*
and sync_kernel_pagetables_* other than that.
This also fixes a crash in vmemmap_set_pmd() caused by accessing vmemmap
before sync_global_pgds() [1]:
BUG: unable to handle page fault for address: ffffeb3ff1200000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
Tainted: [W]=WARN
RIP: 0010:vmemmap_set_pmd+0xff/0x230
<TASK>
vmemmap_populate_hugepages+0x176/0x180
vmemmap_populate+0x34/0x80
__populate_section_memmap+0x41/0x90
sparse_add_section+0x121/0x3e0
__add_pages+0xba/0x150
add_pages+0x1d/0x70
memremap_pages+0x3dc/0x810
devm_memremap_pages+0x1c/0x60
xe_devm_add+0x8b/0x100 [xe]
xe_tile_init_noalloc+0x6a/0x70 [xe]
xe_device_probe+0x48c/0x740 [xe]
[... snip ...]
Cc: <stable(a)vger.kernel.org>
Fixes: 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory savings for compound devmaps")
Fixes: faf1c0008a33 ("x86/vmemmap: optimize for consecutive sections in partial populated PMDs")
Closes: https://lore.kernel.org/linux-mm/20250311114420.240341-1-gwan-gyeong.mun@in… [1]
Suggested-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Harry Yoo <harry.yoo(a)oracle.com>
---
arch/x86/include/asm/pgalloc.h | 22 ++++++++++
arch/x86/mm/init_64.c | 80 ++++++++++++++++++++++++++++++++++
2 files changed, 102 insertions(+)
diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
index c88691b15f3c..d66f2db54b16 100644
--- a/arch/x86/include/asm/pgalloc.h
+++ b/arch/x86/include/asm/pgalloc.h
@@ -10,6 +10,7 @@
#define __HAVE_ARCH_PTE_ALLOC_ONE
#define __HAVE_ARCH_PGD_FREE
+#define __HAVE_ARCH_SYNC_KERNEL_PGTABLE
#include <asm-generic/pgalloc.h>
static inline int __paravirt_pgd_alloc(struct mm_struct *mm) { return 0; }
@@ -114,6 +115,17 @@ static inline void p4d_populate(struct mm_struct *mm, p4d_t *p4d, pud_t *pud)
set_p4d(p4d, __p4d(_PAGE_TABLE | __pa(pud)));
}
+void arch_sync_kernel_pagetables(unsigned long addr);
+
+static inline void p4d_populate_kernel(unsigned long addr,
+ p4d_t *p4d, pud_t *pud)
+{
+ paravirt_alloc_pud(&init_mm, __pa(pud) >> PAGE_SHIFT);
+ set_p4d(p4d, __p4d(_PAGE_TABLE | __pa(pud)));
+ if (!pgtable_l5_enabled())
+ arch_sync_kernel_pagetables(addr);
+}
+
static inline void p4d_populate_safe(struct mm_struct *mm, p4d_t *p4d, pud_t *pud)
{
paravirt_alloc_pud(mm, __pa(pud) >> PAGE_SHIFT);
@@ -137,6 +149,16 @@ static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, p4d_t *p4d)
set_pgd(pgd, __pgd(_PAGE_TABLE | __pa(p4d)));
}
+static inline void pgd_populate_kernel(unsigned long addr,
+ pgd_t *pgd, p4d_t *p4d)
+{
+ if (!pgtable_l5_enabled())
+ return;
+ paravirt_alloc_p4d(&init_mm, __pa(p4d) >> PAGE_SHIFT);
+ set_pgd(pgd, __pgd(_PAGE_TABLE | __pa(p4d)));
+ arch_sync_kernel_pagetables(addr);
+}
+
static inline void pgd_populate_safe(struct mm_struct *mm, pgd_t *pgd, p4d_t *p4d)
{
if (!pgtable_l5_enabled())
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index fdb6cab524f0..cbddbef434d5 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -223,6 +223,86 @@ static void sync_global_pgds(unsigned long start, unsigned long end)
sync_global_pgds_l4(start, end);
}
+static void sync_kernel_pagetables_l4(unsigned long addr)
+{
+ pgd_t *pgd_ref = pgd_offset_k(addr);
+ const p4d_t *p4d_ref;
+ struct page *page;
+
+ VM_WARN_ON_ONCE(pgtable_l5_enabled());
+ /*
+ * With folded p4d, pgd_none() is always false, we need to
+ * handle synchronization on p4d level.
+ */
+ MAYBE_BUILD_BUG_ON(pgd_none(*pgd_ref));
+ p4d_ref = p4d_offset(pgd_ref, addr);
+
+ if (p4d_none(*p4d_ref))
+ return;
+
+ spin_lock(&pgd_lock);
+ list_for_each_entry(page, &pgd_list, lru) {
+ pgd_t *pgd;
+ p4d_t *p4d;
+ spinlock_t *pgt_lock;
+
+ pgd = (pgd_t *)page_address(page) + pgd_index(addr);
+ p4d = p4d_offset(pgd, addr);
+ /* the pgt_lock only for Xen */
+ pgt_lock = &pgd_page_get_mm(page)->page_table_lock;
+ spin_lock(pgt_lock);
+
+ if (!p4d_none(*p4d_ref) && !p4d_none(*p4d))
+ BUG_ON(p4d_pgtable(*p4d)
+ != p4d_pgtable(*p4d_ref));
+
+ if (p4d_none(*p4d))
+ set_p4d(p4d, *p4d_ref);
+
+ spin_unlock(pgt_lock);
+ }
+ spin_unlock(&pgd_lock);
+}
+
+static void sync_kernel_pagetables_l5(unsigned long addr)
+{
+ const pgd_t *pgd_ref = pgd_offset_k(addr);
+ struct page *page;
+
+ VM_WARN_ON_ONCE(!pgtable_l5_enabled());
+
+ if (pgd_none(*pgd_ref))
+ return;
+
+ spin_lock(&pgd_lock);
+ list_for_each_entry(page, &pgd_list, lru) {
+ pgd_t *pgd;
+ spinlock_t *pgt_lock;
+
+ pgd = (pgd_t *)page_address(page) + pgd_index(addr);
+ /* the pgt_lock only for Xen */
+ pgt_lock = &pgd_page_get_mm(page)->page_table_lock;
+ spin_lock(pgt_lock);
+
+ if (!pgd_none(*pgd_ref) && !pgd_none(*pgd))
+ BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref));
+
+ if (pgd_none(*pgd))
+ set_pgd(pgd, *pgd_ref);
+
+ spin_unlock(pgt_lock);
+ }
+ spin_unlock(&pgd_lock);
+}
+
+void arch_sync_kernel_pagetables(unsigned long addr)
+{
+ if (pgtable_l5_enabled())
+ sync_kernel_pagetables_l5(addr);
+ else
+ sync_kernel_pagetables_l4(addr);
+}
+
/*
* NOTE: This function is marked __ref because it calls __init function
* (alloc_bootmem_pages). It's safe to do it ONLY when after_bootmem == 0.
--
2.43.0
The patch titled
Subject: mm/hmm: move pmd_to_hmm_pfn_flags() to the respective #ifdeffery
has been added to the -mm mm-new branch. Its filename is
mm-hmm-move-pmd_to_hmm_pfn_flags-to-the-respective-ifdeffery.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-new branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others take
notice and to finish up reviews. Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fixup patches in mm-new.
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Subject: mm/hmm: move pmd_to_hmm_pfn_flags() to the respective #ifdeffery
Date: Thu, 10 Jul 2025 11:23:53 +0300
When pmd_to_hmm_pfn_flags() is unused, it prevents kernel builds with
clang, `make W=1` and CONFIG_TRANSPARENT_HUGEPAGE=n:
mm/hmm.c:186:29: warning: unused function 'pmd_to_hmm_pfn_flags' [-Wunused-function]
Fix this by moving the function to the respective existing ifdeffery
for its the only user.
See also:
6863f5643dd7 ("kbuild: allow Clang to find unused static inline functions for W=1 build")
Link: https://lkml.kernel.org/r/20250710082403.664093-1-andriy.shevchenko@linux.i…
Fixes: 9d3973d60f0a ("mm/hmm: cleanup the hmm_vma_handle_pmd stub")
Signed-off-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Reviewed-by: Leon Romanovsky <leonro(a)nvidia.com>
Cc: Andriy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Cc: Bill Wendling <morbo(a)google.com>
Cc: Jerome Glisse <jglisse(a)redhat.com>
Cc: Justin Stitt <justinstitt(a)google.com>
Cc: Nathan Chancellor <nathan(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hmm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/hmm.c~mm-hmm-move-pmd_to_hmm_pfn_flags-to-the-respective-ifdeffery
+++ a/mm/hmm.c
@@ -183,6 +183,7 @@ static inline unsigned long hmm_pfn_flag
return order << HMM_PFN_ORDER_SHIFT;
}
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
static inline unsigned long pmd_to_hmm_pfn_flags(struct hmm_range *range,
pmd_t pmd)
{
@@ -193,7 +194,6 @@ static inline unsigned long pmd_to_hmm_p
hmm_pfn_flags_order(PMD_SHIFT - PAGE_SHIFT);
}
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
static int hmm_vma_handle_pmd(struct mm_walk *walk, unsigned long addr,
unsigned long end, unsigned long hmm_pfns[],
pmd_t pmd)
_
Patches currently in -mm which might be from andriy.shevchenko(a)linux.intel.com are
mm-hmm-move-pmd_to_hmm_pfn_flags-to-the-respective-ifdeffery.patch
panic-add-panic_sys_info-sysctl-to-take-human-readable-string-parameter-fix.patch
The patch titled
Subject: nilfs2: reject invalid file types when reading inodes
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
nilfs2-reject-invalid-file-types-when-reading-inodes.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: reject invalid file types when reading inodes
Date: Thu, 10 Jul 2025 22:49:08 +0900
To prevent inodes with invalid file types from tripping through the vfs
and causing malfunctions or assertion failures, add a missing sanity check
when reading an inode from a block device. If the file type is not valid,
treat it as a filesystem error.
Link: https://lkml.kernel.org/r/20250710134952.29862-1-konishi.ryusuke@gmail.com
Fixes: 05fe58fdc10d ("nilfs2: inode operations")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+895c23f6917da440ed0d(a)syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=895c23f6917da440ed0d
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/inode.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
--- a/fs/nilfs2/inode.c~nilfs2-reject-invalid-file-types-when-reading-inodes
+++ a/fs/nilfs2/inode.c
@@ -472,11 +472,18 @@ static int __nilfs_read_inode(struct sup
inode->i_op = &nilfs_symlink_inode_operations;
inode_nohighmem(inode);
inode->i_mapping->a_ops = &nilfs_aops;
- } else {
+ } else if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode) ||
+ S_ISFIFO(inode->i_mode) || S_ISSOCK(inode->i_mode)) {
inode->i_op = &nilfs_special_inode_operations;
init_special_inode(
inode, inode->i_mode,
huge_decode_dev(le64_to_cpu(raw_inode->i_device_code)));
+ } else {
+ nilfs_error(sb,
+ "invalid file type bits in mode 0%o for inode %lu",
+ inode->i_mode, ino);
+ err = -EIO;
+ goto failed_unmap;
}
nilfs_ifile_unmap_inode(raw_inode);
brelse(bh);
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-reject-invalid-file-types-when-reading-inodes.patch
David Howells <dhowells(a)redhat.com> wrote:
> Here are some miscellaneous fixes and changes for netfslib and cifs, if you
> could consider pulling them. All the bugs fixed were observed in cifs, so
> they should probably go through the cifs tree unless Christian would much
> prefer for them to go through the VFS tree.
Hi David,
your commit 2b1424cd131c ("netfs: Fix wait/wake to be consistent about
the waitqueue used") has given me serious headaches; it has caused
outages in our web hosting clusters (yet again - all Linux versions
since 6.9 had serious netfs regressions). Your patch was backported to
6.15 as commit 329ba1cb402a in 6.15.3 (why oh why??), and therefore
the bugs it has caused will be "available" to all Linux stable users.
The problem we had is that writing to certain files never finishes. It
looks like it has to do with the cachefiles subrequest never reporting
completion. (We use Ceph with cachefiles)
I have tried applying the fixes in this pull request, which sounded
promising, but the problem is still there. The only thing that helps
is reverting 2b1424cd131c completely - everything is fine with 6.15.5
plus the revert.
What do you need from me in order to analyze the bug?
Max
Set the dirty bit when the memory resource is not cleared
during BO release.
v2(Christian):
- Drop the cleared flag set to false.
- Improve the amdgpu_vram_mgr_set_clear_state() function.
v3:
- Add back the resource clear flag set function call after
being wiped during eviction (Christian).
- Modified the patch subject name.
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam(a)amd.com>
Suggested-by: Christian König <christian.koenig(a)amd.com>
Cc: stable(a)vger.kernel.org
Fixes: a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality")
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h
index b256cbc2bc27..2c88d5fd87da 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h
@@ -66,7 +66,10 @@ to_amdgpu_vram_mgr_resource(struct ttm_resource *res)
static inline void amdgpu_vram_mgr_set_cleared(struct ttm_resource *res)
{
- to_amdgpu_vram_mgr_resource(res)->flags |= DRM_BUDDY_CLEARED;
+ struct amdgpu_vram_mgr_resource *ares = to_amdgpu_vram_mgr_resource(res);
+
+ WARN_ON(ares->flags & DRM_BUDDY_CLEARED);
+ ares->flags |= DRM_BUDDY_CLEARED;
}
#endif
--
2.43.0
From: Edip Hazuri <edip(a)medip.dev>
The mute led on this laptop is using ALC245 but requires a quirk to work
This patch enables the existing quirk for the device.
Tested on Victus 16-r0xxx Laptop. The LED behaviour works
as intended.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Edip Hazuri <edip(a)medip.dev>
---
sound/pci/hda/patch_realtek.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index 060db37ea..132cef8fa 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -10814,6 +10814,7 @@ static const struct hda_quirk alc269_fixup_tbl[] = {
SND_PCI_QUIRK(0x103c, 0x8b97, "HP", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF),
SND_PCI_QUIRK(0x103c, 0x8bb3, "HP Slim OMEN", ALC287_FIXUP_CS35L41_I2C_2),
SND_PCI_QUIRK(0x103c, 0x8bb4, "HP Slim OMEN", ALC287_FIXUP_CS35L41_I2C_2),
+ SND_PCI_QUIRK(0x103c, 0x8bbe, "HP Victus 16-r0xxx (MB 8BBE)", ALC245_FIXUP_HP_MUTE_LED_COEFBIT),
SND_PCI_QUIRK(0x103c, 0x8bc8, "HP Victus 15-fa1xxx", ALC245_FIXUP_HP_MUTE_LED_COEFBIT),
SND_PCI_QUIRK(0x103c, 0x8bcd, "HP Omen 16-xd0xxx", ALC245_FIXUP_HP_MUTE_LED_V1_COEFBIT),
SND_PCI_QUIRK(0x103c, 0x8bdd, "HP Envy 17", ALC287_FIXUP_CS35L41_I2C_2),
--
2.50.1
Currently, __mkroute_output overrules the MTU value configured for
broadcast routes.
This buggy behaviour can be reproduced with:
ip link set dev eth1 mtu 9000
ip route del broadcast 192.168.0.255 dev eth1 proto kernel scope link src 192.168.0.2
ip route add broadcast 192.168.0.255 dev eth1 proto kernel scope link src 192.168.0.2 mtu 1500
The maximum packet size should be 1500, but it is actually 8000:
ping -b 192.168.0.255 -s 8000
Fix __mkroute_output to allow MTU values to be configured for
for broadcast routes (to support a mixed-MTU local-area-network).
Signed-off-by: Oscar Maes <oscmaes92(a)gmail.com>
---
net/ipv4/route.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index fccb05fb3..a2a3b6482 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2585,7 +2585,6 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
do_cache = true;
if (type == RTN_BROADCAST) {
flags |= RTCF_BROADCAST | RTCF_LOCAL;
- fi = NULL;
} else if (type == RTN_MULTICAST) {
flags |= RTCF_MULTICAST | RTCF_LOCAL;
if (!ip_check_mc_rcu(in_dev, fl4->daddr, fl4->saddr,
--
2.39.5
When the GPU scheduler was ported to using a struct for its
initialization parameters, it was overlooked that panfrost creates a
distinct workqueue for timeout handling.
The pointer to this new workqueue is not initialized to the struct,
resulting in NULL being passed to the scheduler, which then uses the
system_wq for timeout handling.
Set the correct workqueue to the init args struct.
Cc: stable(a)vger.kernel.org # 6.15+
Fixes: 796a9f55a8d1 ("drm/sched: Use struct for drm_sched_init() params")
Reported-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Closes: https://lore.kernel.org/dri-devel/b5d0921c-7cbf-4d55-aa47-c35cd7861c02@igal…
Signed-off-by: Philipp Stanner <phasta(a)kernel.org>
---
drivers/gpu/drm/panfrost/panfrost_job.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 5657106c2f7d..15e2d505550f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -841,7 +841,6 @@ int panfrost_job_init(struct panfrost_device *pfdev)
.num_rqs = DRM_SCHED_PRIORITY_COUNT,
.credit_limit = 2,
.timeout = msecs_to_jiffies(JOB_TIMEOUT_MS),
- .timeout_wq = pfdev->reset.wq,
.name = "pan_js",
.dev = pfdev->dev,
};
@@ -879,6 +878,7 @@ int panfrost_job_init(struct panfrost_device *pfdev)
pfdev->reset.wq = alloc_ordered_workqueue("panfrost-reset", 0);
if (!pfdev->reset.wq)
return -ENOMEM;
+ args.timeout_wq = pfdev->reset.wq;
for (j = 0; j < NUM_JOB_SLOTS; j++) {
js->queue[j].fence_context = dma_fence_context_alloc(1);
--
2.49.0
The SD spec says: "In UHS-I mode, after selecting one of SDR50, SDR104,
or DDR50 mode by Function Group 1, host needs to change the Power Limit
to enable the card to operate in higher performance".
The driver previously determined SD card current limits incorrectly by
checking capability bits before bus speed was established, and by using
support bits in function group 4 (bytes 6 & 7) rather than the actual
current requirement (bytes 0 & 1). This is wrong because the card
responds for a given bus speed.
This patch queries the card's current requirement after setting the bus
speed, and uses the reported value to select the appropriate current
limit.
while at it, remove some unused constants and the misleading comment in
the code.
Fixes: d9812780a020 ("mmc: sd: limit SD card power limit according to cards capabilities")
Signed-off-by: Avri Altman <avri.altman(a)sandisk.com>
Cc: stable(a)vger.kernel.org
---
drivers/mmc/core/sd.c | 36 +++++++++++++-----------------------
include/linux/mmc/card.h | 6 ------
2 files changed, 13 insertions(+), 29 deletions(-)
diff --git a/drivers/mmc/core/sd.c b/drivers/mmc/core/sd.c
index cf92c5b2059a..357edfb910df 100644
--- a/drivers/mmc/core/sd.c
+++ b/drivers/mmc/core/sd.c
@@ -365,7 +365,6 @@ static int mmc_read_switch(struct mmc_card *card)
card->sw_caps.sd3_bus_mode = status[13];
/* Driver Strengths supported by the card */
card->sw_caps.sd3_drv_type = status[9];
- card->sw_caps.sd3_curr_limit = status[7] | status[6] << 8;
}
out:
@@ -556,7 +555,7 @@ static int sd_set_current_limit(struct mmc_card *card, u8 *status)
{
int current_limit = SD_SET_CURRENT_LIMIT_200;
int err;
- u32 max_current;
+ u32 max_current, card_needs;
/*
* Current limit switch is only defined for SDR50, SDR104, and DDR50
@@ -575,33 +574,24 @@ static int sd_set_current_limit(struct mmc_card *card, u8 *status)
max_current = sd_get_host_max_current(card->host);
/*
- * We only check host's capability here, if we set a limit that is
- * higher than the card's maximum current, the card will be using its
- * maximum current, e.g. if the card's maximum current is 300ma, and
- * when we set current limit to 200ma, the card will draw 200ma, and
- * when we set current limit to 400/600/800ma, the card will draw its
- * maximum 300ma from the host.
- *
- * The above is incorrect: if we try to set a current limit that is
- * not supported by the card, the card can rightfully error out the
- * attempt, and remain at the default current limit. This results
- * in a 300mA card being limited to 200mA even though the host
- * supports 800mA. Failures seen with SanDisk 8GB UHS cards with
- * an iMX6 host. --rmk
+ * query the card of its maximun current/power consumption given the
+ * bus speed mode
*/
- if (max_current >= 800 &&
- card->sw_caps.sd3_curr_limit & SD_MAX_CURRENT_800)
+ err = mmc_sd_switch(card, 0, 0, card->sd_bus_speed, status);
+ if (err)
+ return err;
+
+ card_needs = status[1] | status[0] << 8;
+
+ if (max_current >= 800 && card_needs > 600)
current_limit = SD_SET_CURRENT_LIMIT_800;
- else if (max_current >= 600 &&
- card->sw_caps.sd3_curr_limit & SD_MAX_CURRENT_600)
+ else if (max_current >= 600 && card_needs > 400)
current_limit = SD_SET_CURRENT_LIMIT_600;
- else if (max_current >= 400 &&
- card->sw_caps.sd3_curr_limit & SD_MAX_CURRENT_400)
+ else if (max_current >= 400 && card_needs > 200)
current_limit = SD_SET_CURRENT_LIMIT_400;
if (current_limit != SD_SET_CURRENT_LIMIT_200) {
- err = mmc_sd_switch(card, SD_SWITCH_SET, 3,
- current_limit, status);
+ err = mmc_sd_switch(card, SD_SWITCH_SET, 3, current_limit, status);
if (err)
return err;
diff --git a/include/linux/mmc/card.h b/include/linux/mmc/card.h
index e9e964c20e53..67c1386ca574 100644
--- a/include/linux/mmc/card.h
+++ b/include/linux/mmc/card.h
@@ -177,17 +177,11 @@ struct sd_switch_caps {
#define SD_DRIVER_TYPE_A 0x02
#define SD_DRIVER_TYPE_C 0x04
#define SD_DRIVER_TYPE_D 0x08
- unsigned int sd3_curr_limit;
#define SD_SET_CURRENT_LIMIT_200 0
#define SD_SET_CURRENT_LIMIT_400 1
#define SD_SET_CURRENT_LIMIT_600 2
#define SD_SET_CURRENT_LIMIT_800 3
-#define SD_MAX_CURRENT_200 (1 << SD_SET_CURRENT_LIMIT_200)
-#define SD_MAX_CURRENT_400 (1 << SD_SET_CURRENT_LIMIT_400)
-#define SD_MAX_CURRENT_600 (1 << SD_SET_CURRENT_LIMIT_600)
-#define SD_MAX_CURRENT_800 (1 << SD_SET_CURRENT_LIMIT_800)
-
#define SD4_SET_POWER_LIMIT_0_72W 0
#define SD4_SET_POWER_LIMIT_1_44W 1
#define SD4_SET_POWER_LIMIT_2_16W 2
--
2.25.1
The quilt patch titled
Subject: ocfs2: reset folio to NULL when get folio fails
has been removed from the -mm tree. Its filename was
ocfs2-reset-folio-to-null-when-get-folio-fails.patch
This patch was dropped because it was merged into the mm-nonmm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Lizhi Xu <lizhi.xu(a)windriver.com>
Subject: ocfs2: reset folio to NULL when get folio fails
Date: Mon, 16 Jun 2025 09:31:40 +0800
The reproducer uses FAULT_INJECTION to make memory allocation fail, which
causes __filemap_get_folio() to fail, when initializing w_folios[i] in
ocfs2_grab_folios_for_write(), it only returns an error code and the value
of w_folios[i] is the error code, which causes
ocfs2_unlock_and_free_folios() to recycle the invalid w_folios[i] when
releasing folios.
Link: https://lkml.kernel.org/r/20250616013140.3602219-1-lizhi.xu@windriver.com
Reported-by: syzbot+c2ea94ae47cd7e3881ec(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c2ea94ae47cd7e3881ec
Signed-off-by: Lizhi Xu <lizhi.xu(a)windriver.com>
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/aops.c | 1 +
1 file changed, 1 insertion(+)
--- a/fs/ocfs2/aops.c~ocfs2-reset-folio-to-null-when-get-folio-fails
+++ a/fs/ocfs2/aops.c
@@ -1071,6 +1071,7 @@ static int ocfs2_grab_folios_for_write(s
if (IS_ERR(wc->w_folios[i])) {
ret = PTR_ERR(wc->w_folios[i]);
mlog_errno(ret);
+ wc->w_folios[i] = NULL;
goto out;
}
}
_
Patches currently in -mm which might be from lizhi.xu(a)windriver.com are
The quilt patch titled
Subject: mm/ptdump: take the memory hotplug lock inside ptdump_walk_pgd()
has been removed from the -mm tree. Its filename was
mm-ptdump-take-the-memory-hotplug-lock-inside-ptdump_walk_pgd.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Anshuman Khandual <anshuman.khandual(a)arm.com>
Subject: mm/ptdump: take the memory hotplug lock inside ptdump_walk_pgd()
Date: Fri, 20 Jun 2025 10:54:27 +0530
Memory hot remove unmaps and tears down various kernel page table regions
as required. The ptdump code can race with concurrent modifications of
the kernel page tables. When leaf entries are modified concurrently, the
dump code may log stale or inconsistent information for a VA range, but
this is otherwise not harmful.
But when intermediate levels of kernel page table are freed, the dump code
will continue to use memory that has been freed and potentially
reallocated for another purpose. In such cases, the ptdump code may
dereference bogus addresses, leading to a number of potential problems.
To avoid the above mentioned race condition, platforms such as arm64,
riscv and s390 take memory hotplug lock, while dumping kernel page table
via the sysfs interface /sys/kernel/debug/kernel_page_tables.
Similar race condition exists while checking for pages that might have
been marked W+X via /sys/kernel/debug/kernel_page_tables/check_wx_pages
which in turn calls ptdump_check_wx(). Instead of solving this race
condition again, let's just move the memory hotplug lock inside generic
ptdump_check_wx() which will benefit both the scenarios.
Drop get_online_mems() and put_online_mems() combination from all existing
platform ptdump code paths.
Link: https://lkml.kernel.org/r/20250620052427.2092093-1-anshuman.khandual@arm.com
Fixes: bbd6ec605c0f ("arm64/mm: Enable memory hot remove")
Signed-off-by: Anshuman Khandual <anshuman.khandual(a)arm.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Dev Jain <dev.jain(a)arm.com>
Acked-by: Alexander Gordeev <agordeev(a)linux.ibm.com> [s390]
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Paul Walmsley <paul.walmsley(a)sifive.com>
Cc: Palmer Dabbelt <palmer(a)dabbelt.com>
Cc: Alexander Gordeev <agordeev(a)linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Cc: Vasily Gorbik <gor(a)linux.ibm.com>
Cc: Christian Borntraeger <borntraeger(a)linux.ibm.com>
Cc: Sven Schnelle <svens(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/arm64/mm/ptdump_debugfs.c | 3 ---
arch/riscv/mm/ptdump.c | 3 ---
arch/s390/mm/dump_pagetables.c | 2 --
mm/ptdump.c | 2 ++
4 files changed, 2 insertions(+), 8 deletions(-)
--- a/arch/arm64/mm/ptdump_debugfs.c~mm-ptdump-take-the-memory-hotplug-lock-inside-ptdump_walk_pgd
+++ a/arch/arm64/mm/ptdump_debugfs.c
@@ -1,6 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/debugfs.h>
-#include <linux/memory_hotplug.h>
#include <linux/seq_file.h>
#include <asm/ptdump.h>
@@ -9,9 +8,7 @@ static int ptdump_show(struct seq_file *
{
struct ptdump_info *info = m->private;
- get_online_mems();
ptdump_walk(m, info);
- put_online_mems();
return 0;
}
DEFINE_SHOW_ATTRIBUTE(ptdump);
--- a/arch/riscv/mm/ptdump.c~mm-ptdump-take-the-memory-hotplug-lock-inside-ptdump_walk_pgd
+++ a/arch/riscv/mm/ptdump.c
@@ -6,7 +6,6 @@
#include <linux/efi.h>
#include <linux/init.h>
#include <linux/debugfs.h>
-#include <linux/memory_hotplug.h>
#include <linux/seq_file.h>
#include <linux/ptdump.h>
@@ -413,9 +412,7 @@ bool ptdump_check_wx(void)
static int ptdump_show(struct seq_file *m, void *v)
{
- get_online_mems();
ptdump_walk(m, m->private);
- put_online_mems();
return 0;
}
--- a/arch/s390/mm/dump_pagetables.c~mm-ptdump-take-the-memory-hotplug-lock-inside-ptdump_walk_pgd
+++ a/arch/s390/mm/dump_pagetables.c
@@ -247,11 +247,9 @@ static int ptdump_show(struct seq_file *
.marker = markers,
};
- get_online_mems();
mutex_lock(&cpa_mutex);
ptdump_walk_pgd(&st.ptdump, &init_mm, NULL);
mutex_unlock(&cpa_mutex);
- put_online_mems();
return 0;
}
DEFINE_SHOW_ATTRIBUTE(ptdump);
--- a/mm/ptdump.c~mm-ptdump-take-the-memory-hotplug-lock-inside-ptdump_walk_pgd
+++ a/mm/ptdump.c
@@ -176,6 +176,7 @@ void ptdump_walk_pgd(struct ptdump_state
{
const struct ptdump_range *range = st->range;
+ get_online_mems();
mmap_write_lock(mm);
while (range->start != range->end) {
walk_page_range_debug(mm, range->start, range->end,
@@ -183,6 +184,7 @@ void ptdump_walk_pgd(struct ptdump_state
range++;
}
mmap_write_unlock(mm);
+ put_online_mems();
/* Flush out the last page */
st->note_page_flush(st);
_
Patches currently in -mm which might be from anshuman.khandual(a)arm.com are
The quilt patch titled
Subject: mm/huge_memory: don't ignore queried cachemode in vmf_insert_pfn_pud()
has been removed from the -mm tree. Its filename was
mm-huge_memory-dont-ignore-queried-cachemode-in-vmf_insert_pfn_pud.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm/huge_memory: don't ignore queried cachemode in vmf_insert_pfn_pud()
Date: Fri, 13 Jun 2025 11:27:00 +0200
Patch series "mm/huge_memory: vmf_insert_folio_*() and
vmf_insert_pfn_pud() fixes", v3.
While working on improving vm_normal_page() and friends, I stumbled over
this issues: refcounted "normal" folios must not be marked using
pmd_special() / pud_special(). Otherwise, we're effectively telling the
system that these folios are no "normal", violating the rules we
documented for vm_normal_page().
Fortunately, there are not many pmd_special()/pud_special() users yet. So
far there doesn't seem to be serious damage.
Tested using the ndctl tests ("ndctl:dax" suite).
This patch (of 3):
We set up the cache mode but ... don't forward the updated pgprot to
insert_pfn_pud().
Only a problem on x86-64 PAT when mapping PFNs using PUDs that require a
special cachemode.
Fix it by using the proper pgprot where the cachemode was setup.
It is unclear in which configurations we would get the cachemode wrong:
through vfio seems possible. Getting cachemodes wrong is usually ...
bad. As the fix is easy, let's backport it to stable.
Identified by code inspection.
Link: https://lkml.kernel.org/r/20250613092702.1943533-1-david@redhat.com
Link: https://lkml.kernel.org/r/20250613092702.1943533-2-david@redhat.com
Fixes: 7b806d229ef1 ("mm: remove vmf_insert_pfn_xxx_prot() for huge page-table entries")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Dan Williams <dan.j.williams(a)intel.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Reviewed-by: Jason Gunthorpe <jgg(a)nvidia.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Tested-by: Dan Williams <dan.j.williams(a)intel.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Dev Jain <dev.jain(a)arm.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Mariano Pache <npache(a)redhat.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
--- a/mm/huge_memory.c~mm-huge_memory-dont-ignore-queried-cachemode-in-vmf_insert_pfn_pud
+++ a/mm/huge_memory.c
@@ -1516,10 +1516,9 @@ static pud_t maybe_pud_mkwrite(pud_t pud
}
static void insert_pfn_pud(struct vm_area_struct *vma, unsigned long addr,
- pud_t *pud, pfn_t pfn, bool write)
+ pud_t *pud, pfn_t pfn, pgprot_t prot, bool write)
{
struct mm_struct *mm = vma->vm_mm;
- pgprot_t prot = vma->vm_page_prot;
pud_t entry;
if (!pud_none(*pud)) {
@@ -1581,7 +1580,7 @@ vm_fault_t vmf_insert_pfn_pud(struct vm_
pfnmap_setup_cachemode_pfn(pfn_t_to_pfn(pfn), &pgprot);
ptl = pud_lock(vma->vm_mm, vmf->pud);
- insert_pfn_pud(vma, addr, vmf->pud, pfn, write);
+ insert_pfn_pud(vma, addr, vmf->pud, pfn, pgprot, write);
spin_unlock(ptl);
return VM_FAULT_NOPAGE;
@@ -1625,7 +1624,7 @@ vm_fault_t vmf_insert_folio_pud(struct v
add_mm_counter(mm, mm_counter_file(folio), HPAGE_PUD_NR);
}
insert_pfn_pud(vma, addr, vmf->pud, pfn_to_pfn_t(folio_pfn(folio)),
- write);
+ vma->vm_page_prot, write);
spin_unlock(ptl);
return VM_FAULT_NOPAGE;
_
Patches currently in -mm which might be from david(a)redhat.com are
mm-balloon_compaction-we-cannot-have-isolated-pages-in-the-balloon-list.patch
mm-balloon_compaction-convert-balloon_page_delete-to-balloon_page_finalize.patch
mm-zsmalloc-drop-pageisolated-related-vm_bug_ons.patch
mm-page_alloc-let-page-freeing-clear-any-set-page-type.patch
mm-balloon_compaction-make-pageoffline-sticky-until-the-page-is-freed.patch
mm-zsmalloc-make-pagezsmalloc-sticky-until-the-page-is-freed.patch
mm-migrate-rename-isolate_movable_page-to-isolate_movable_ops_page.patch
mm-migrate-rename-putback_movable_folio-to-putback_movable_ops_page.patch
mm-migrate-factor-out-movable_ops-page-handling-into-migrate_movable_ops_page.patch
mm-migrate-remove-folio_test_movable-and-folio_movable_ops.patch
mm-migrate-move-movable_ops-page-handling-out-of-move_to_new_folio.patch
mm-zsmalloc-stop-using-__clearpagemovable.patch
mm-balloon_compaction-stop-using-__clearpagemovable.patch
mm-migrate-remove-__clearpagemovable.patch
mm-migration-remove-pagemovable.patch
mm-rename-__pagemovable-to-page_has_movable_ops.patch
mm-page_isolation-drop-__folio_test_movable-check-for-large-folios.patch
mm-remove-__folio_test_movable.patch
mm-stop-storing-migration_ops-in-page-mapping.patch
mm-convert-movable-flag-in-page-mapping-to-a-page-flag.patch
mm-rename-pg_isolated-to-pg_movable_ops_isolated.patch
mm-page-flags-rename-page_mapping_movable-to-page_mapping_anon_ksm.patch
mm-page-alloc-remove-pagemappingflags.patch
mm-page-flags-remove-folio_mapping_flags.patch
mm-simplify-folio_expected_ref_count.patch
mm-rename-page_mapping_-to-folio_mapping_.patch
docs-mm-convert-from-non-lru-page-migration-to-movable_ops-page-migration.patch
mm-balloon_compaction-movable_ops-doc-updates.patch
mm-balloon_compaction-provide-single-balloon_page_insert-and-balloon_mapping_gfp_mask.patch
mm-convert-fpb_ignore_-into-fpb_respect_.patch
mm-smaller-folio_pte_batch-improvements.patch
mm-split-folio_pte_batch-into-folio_pte_batch-and-folio_pte_batch_flags.patch
mm-remove-boolean-output-parameters-from-folio_pte_batch_ext.patch
mm-memory-introduce-is_huge_zero_pfn-and-use-it-in-vm_normal_page_pmd.patch
The quilt patch titled
Subject: mm/memory-tier: fix abstract distance calculation overflow
has been removed from the -mm tree. Its filename was
mm-memory-tier-fix-abstract-distance-calculation-overflow.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Li Zhijian <lizhijian(a)fujitsu.com>
Subject: mm/memory-tier: fix abstract distance calculation overflow
Date: Tue, 10 Jun 2025 14:27:51 +0800
In mt_perf_to_adistance(), the calculation of abstract distance (adist)
involves multiplying several int values including
MEMTIER_ADISTANCE_DRAM.
*adist = MEMTIER_ADISTANCE_DRAM *
(perf->read_latency + perf->write_latency) /
(default_dram_perf.read_latency + default_dram_perf.write_latency) *
(default_dram_perf.read_bandwidth + default_dram_perf.write_bandwidth) /
(perf->read_bandwidth + perf->write_bandwidth);
Since these values can be large, the multiplication may exceed the
maximum value of an int (INT_MAX) and overflow (Our platform did),
leading to an incorrect adist.
User-visible impact:
The memory tiering subsystem will misinterpret slow memory (like CXL)
as faster than DRAM, causing inappropriate demotion of pages from
CXL (slow memory) to DRAM (fast memory).
For example, we will see the following demotion chains from the dmesg, where
Node0,1 are DRAM, and Node2,3 are CXL node:
Demotion targets for Node 0: null
Demotion targets for Node 1: null
Demotion targets for Node 2: preferred: 0-1, fallback: 0-1
Demotion targets for Node 3: preferred: 0-1, fallback: 0-1
Change MEMTIER_ADISTANCE_DRAM to be a long constant by writing it with
the 'L' suffix. This prevents the overflow because the multiplication
will then be done in the long type which has a larger range.
Link: https://lkml.kernel.org/r/20250611023439.2845785-1-lizhijian@fujitsu.com
Link: https://lkml.kernel.org/r/20250610062751.2365436-1-lizhijian@fujitsu.com
Fixes: 3718c02dbd4c ("acpi, hmat: calculate abstract distance with HMAT")
Signed-off-by: Li Zhijian <lizhijian(a)fujitsu.com>
Reviewed-by: Huang Ying <ying.huang(a)linux.alibaba.com>
Acked-by: Balbir Singh <balbirs(a)nvidia.com>
Reviewed-by: Donet Tom <donettom(a)linux.ibm.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/memory-tiers.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/linux/memory-tiers.h~mm-memory-tier-fix-abstract-distance-calculation-overflow
+++ a/include/linux/memory-tiers.h
@@ -18,7 +18,7 @@
* adistance value (slightly faster) than default DRAM adistance to be part of
* the same memory tier.
*/
-#define MEMTIER_ADISTANCE_DRAM ((4 * MEMTIER_CHUNK_SIZE) + (MEMTIER_CHUNK_SIZE >> 1))
+#define MEMTIER_ADISTANCE_DRAM ((4L * MEMTIER_CHUNK_SIZE) + (MEMTIER_CHUNK_SIZE >> 1))
struct memory_tier;
struct memory_dev_type {
_
Patches currently in -mm which might be from lizhijian(a)fujitsu.com are
The quilt patch titled
Subject: readahead: fix return value of page_cache_next_miss() when no hole is found
has been removed from the -mm tree. Its filename was
readahead-fix-return-value-of-page_cache_next_miss-when-no-hole-is-found.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Chi Zhiling <chizhiling(a)kylinos.cn>
Subject: readahead: fix return value of page_cache_next_miss() when no hole is found
Date: Thu, 5 Jun 2025 13:49:35 +0800
max_scan in page_cache_next_miss always decreases to zero when no hole is
found, causing the return value to be index + 0.
Fix this by preserving the max_scan value throughout the loop.
Jan said "From what I know and have seen in the past, wrong responses
from page_cache_next_miss() can lead to readahead window reduction and
thus reduced read speeds."
Link: https://lkml.kernel.org/r/20250605054935.2323451-1-chizhiling@163.com
Fixes: 901a269ff3d5 ("filemap: fix page_cache_next_miss() when no hole found")
Signed-off-by: Chi Zhiling <chizhiling(a)kylinos.cn>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Cc: Josef Bacik <josef(a)toxicpanda.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/filemap.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/filemap.c~readahead-fix-return-value-of-page_cache_next_miss-when-no-hole-is-found
+++ a/mm/filemap.c
@@ -1778,8 +1778,9 @@ pgoff_t page_cache_next_miss(struct addr
pgoff_t index, unsigned long max_scan)
{
XA_STATE(xas, &mapping->i_pages, index);
+ unsigned long nr = max_scan;
- while (max_scan--) {
+ while (nr--) {
void *entry = xas_next(&xas);
if (!entry || xa_is_value(entry))
return xas.xa_index;
_
Patches currently in -mm which might be from chizhiling(a)kylinos.cn are
Good morning
I'm hoping for a resolution to an issue I'm currently having with the latest kernel release and Workstation Pro (Free Edition). Every time I try to open Workstation, it's prompting me to reinstall modules that ultimately fail (see screenshots). Also see the attached log file. After doing some research, I see the issue is that several header files are missing. I've tried to manually compile and install original modules, but with no success. I'm still faced with an incompatibility issue and the latest kernel. This problem does not occur when I switch back to kernel 6.14.
I'm currently running the latest kernel (6.15.4) on Fedora 42. My hardware specs:
CPU: AMD Ryzen 7840U
Memory: Crucial 96 GB 5600 DDR5
[Screenshot From 2025-07-08 16-53-52.png][Screenshot From 2025-07-08 16-54-06.png],
The quilt patch titled
Subject: mm: fix the inaccurate memory statistics issue for users
has been removed from the -mm tree. Its filename was
mm-fix-the-inaccurate-memory-statistics-issue-for-users.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Subject: mm: fix the inaccurate memory statistics issue for users
Date: Thu, 5 Jun 2025 20:58:29 +0800
On some large machines with a high number of CPUs running a 64K pagesize
kernel, we found that the 'RES' field is always 0 displayed by the top
command for some processes, which will cause a lot of confusion for users.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
875525 root 20 0 12480 0 0 R 0.3 0.0 0:00.08 top
1 root 20 0 172800 0 0 S 0.0 0.0 0:04.52 systemd
The main reason is that the batch size of the percpu counter is quite
large on these machines, caching a significant percpu value, since
converting mm's rss stats into percpu_counter by commit f1a7941243c1 ("mm:
convert mm's rss stats into percpu_counter"). Intuitively, the batch
number should be optimized, but on some paths, performance may take
precedence over statistical accuracy. Therefore, introducing a new
interface to add the percpu statistical count and display it to users,
which can remove the confusion. In addition, this change is not expected
to be on a performance-critical path, so the modification should be
acceptable.
In addition, the 'mm->rss_stat' is updated by using add_mm_counter() and
dec/inc_mm_counter(), which are all wrappers around
percpu_counter_add_batch(). In percpu_counter_add_batch(), there is
percpu batch caching to avoid 'fbc->lock' contention. This patch changes
task_mem() and task_statm() to get the accurate mm counters under the
'fbc->lock', but this should not exacerbate kernel 'mm->rss_stat' lock
contention due to the percpu batch caching of the mm counters. The
following test also confirm the theoretical analysis.
I run the stress-ng that stresses anon page faults in 32 threads on my 32
cores machine, while simultaneously running a script that starts 32
threads to busy-loop pread each stress-ng thread's /proc/pid/status
interface. From the following data, I did not observe any obvious impact
of this patch on the stress-ng tests.
w/o patch:
stress-ng: info: [6848] 4,399,219,085,152 CPU Cycles 67.327 B/sec
stress-ng: info: [6848] 1,616,524,844,832 Instructions 24.740 B/sec (0.367 instr. per cycle)
stress-ng: info: [6848] 39,529,792 Page Faults Total 0.605 M/sec
stress-ng: info: [6848] 39,529,792 Page Faults Minor 0.605 M/sec
w/patch:
stress-ng: info: [2485] 4,462,440,381,856 CPU Cycles 68.382 B/sec
stress-ng: info: [2485] 1,615,101,503,296 Instructions 24.750 B/sec (0.362 instr. per cycle)
stress-ng: info: [2485] 39,439,232 Page Faults Total 0.604 M/sec
stress-ng: info: [2485] 39,439,232 Page Faults Minor 0.604 M/sec
On comparing a very simple app which just allocates & touches some
memory against v6.1 (which doesn't have f1a7941243c1) and latest Linus
tree (4c06e63b9203) I can see that on latest Linus tree the values for
VmRSS, RssAnon and RssFile from /proc/self/status are all zeroes while
they do report values on v6.1 and a Linus tree with this patch.
Link: https://lkml.kernel.org/r/f4586b17f66f97c174f7fd1f8647374fdb53de1c.17491190…
Fixes: f1a7941243c1 ("mm: convert mm's rss stats into percpu_counter")
Signed-off-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Reviewed-by: Aboorva Devarajan <aboorvad(a)linux.ibm.com>
Tested-by: Aboorva Devarajan <aboorvad(a)linux.ibm.com>
Tested-by Donet Tom <donettom(a)linux.ibm.com>
Acked-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Acked-by: SeongJae Park <sj(a)kernel.org>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/task_mmu.c | 14 +++++++-------
include/linux/mm.h | 5 +++++
2 files changed, 12 insertions(+), 7 deletions(-)
--- a/fs/proc/task_mmu.c~mm-fix-the-inaccurate-memory-statistics-issue-for-users
+++ a/fs/proc/task_mmu.c
@@ -36,9 +36,9 @@ void task_mem(struct seq_file *m, struct
unsigned long text, lib, swap, anon, file, shmem;
unsigned long hiwater_vm, total_vm, hiwater_rss, total_rss;
- anon = get_mm_counter(mm, MM_ANONPAGES);
- file = get_mm_counter(mm, MM_FILEPAGES);
- shmem = get_mm_counter(mm, MM_SHMEMPAGES);
+ anon = get_mm_counter_sum(mm, MM_ANONPAGES);
+ file = get_mm_counter_sum(mm, MM_FILEPAGES);
+ shmem = get_mm_counter_sum(mm, MM_SHMEMPAGES);
/*
* Note: to minimize their overhead, mm maintains hiwater_vm and
@@ -59,7 +59,7 @@ void task_mem(struct seq_file *m, struct
text = min(text, mm->exec_vm << PAGE_SHIFT);
lib = (mm->exec_vm << PAGE_SHIFT) - text;
- swap = get_mm_counter(mm, MM_SWAPENTS);
+ swap = get_mm_counter_sum(mm, MM_SWAPENTS);
SEQ_PUT_DEC("VmPeak:\t", hiwater_vm);
SEQ_PUT_DEC(" kB\nVmSize:\t", total_vm);
SEQ_PUT_DEC(" kB\nVmLck:\t", mm->locked_vm);
@@ -92,12 +92,12 @@ unsigned long task_statm(struct mm_struc
unsigned long *shared, unsigned long *text,
unsigned long *data, unsigned long *resident)
{
- *shared = get_mm_counter(mm, MM_FILEPAGES) +
- get_mm_counter(mm, MM_SHMEMPAGES);
+ *shared = get_mm_counter_sum(mm, MM_FILEPAGES) +
+ get_mm_counter_sum(mm, MM_SHMEMPAGES);
*text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK))
>> PAGE_SHIFT;
*data = mm->data_vm + mm->stack_vm;
- *resident = *shared + get_mm_counter(mm, MM_ANONPAGES);
+ *resident = *shared + get_mm_counter_sum(mm, MM_ANONPAGES);
return mm->total_vm;
}
--- a/include/linux/mm.h~mm-fix-the-inaccurate-memory-statistics-issue-for-users
+++ a/include/linux/mm.h
@@ -2568,6 +2568,11 @@ static inline unsigned long get_mm_count
return percpu_counter_read_positive(&mm->rss_stat[member]);
}
+static inline unsigned long get_mm_counter_sum(struct mm_struct *mm, int member)
+{
+ return percpu_counter_sum_positive(&mm->rss_stat[member]);
+}
+
void mm_trace_rss_stat(struct mm_struct *mm, int member);
static inline void add_mm_counter(struct mm_struct *mm, int member, long value)
_
Patches currently in -mm which might be from baolin.wang(a)linux.alibaba.com are
selftests-khugepaged-fix-the-shmem-collapse-failure.patch
selftests-mm-add-shmem-collapse-as-a-default-test-item.patch
mm-huge_memory-fix-the-check-for-allowed-huge-orders-in-shmem.patch
khugepaged-allow-khugepaged-to-check-all-anonymous-mthp-orders.patch
khugepaged-kick-khugepaged-for-enabling-none-pmd-sized-mthps.patch
mm-fault-in-complete-folios-instead-of-individual-pages-for-tmpfs.patch
The quilt patch titled
Subject: mm/damon: fix divide by zero in damon_get_intervals_score()
has been removed from the -mm tree. Its filename was
mm-damon-fix-divide-by-zero-in-damon_get_intervals_score.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Honggyu Kim <honggyu.kim(a)sk.com>
Subject: mm/damon: fix divide by zero in damon_get_intervals_score()
Date: Wed, 2 Jul 2025 09:02:04 +0900
The current implementation allows having zero size regions with no special
reasons, but damon_get_intervals_score() gets crashed by divide by zero
when the region size is zero.
[ 29.403950] Oops: divide error: 0000 [#1] SMP NOPTI
This patch fixes the bug, but does not disallow zero size regions to keep
the backward compatibility since disallowing zero size regions might be a
breaking change for some users.
In addition, the same crash can happen when intervals_goal.access_bp is
zero so this should be fixed in stable trees as well.
Link: https://lkml.kernel.org/r/20250702000205.1921-5-honggyu.kim@sk.com
Fixes: f04b0fedbe71 ("mm/damon/core: implement intervals auto-tuning")
Signed-off-by: Honggyu Kim <honggyu.kim(a)sk.com>
Reviewed-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/core.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/damon/core.c~mm-damon-fix-divide-by-zero-in-damon_get_intervals_score
+++ a/mm/damon/core.c
@@ -1449,6 +1449,7 @@ static unsigned long damon_get_intervals
}
}
target_access_events = max_access_events * goal_bp / 10000;
+ target_access_events = target_access_events ? : 1;
return access_events * 10000 / target_access_events;
}
_
Patches currently in -mm which might be from honggyu.kim(a)sk.com are
samples-damon-change-enable-parameters-to-enabled.patch
The quilt patch titled
Subject: samples/damon: fix damon sample mtier for start failure
has been removed from the -mm tree. Its filename was
samples-damon-fix-damon-sample-mtier-for-start-failure.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Honggyu Kim <honggyu.kim(a)sk.com>
Subject: samples/damon: fix damon sample mtier for start failure
Date: Wed, 2 Jul 2025 09:02:03 +0900
The damon_sample_mtier_start() can fail so we must reset the "enable"
parameter to "false" again for proper rollback.
In such cases, setting Y to "enable" then N triggers the similar crash
with mtier because damon sample start failed but the "enable" stays as Y.
Link: https://lkml.kernel.org/r/20250702000205.1921-4-honggyu.kim@sk.com
Fixes: 82a08bde3cf7 ("samples/damon: implement a DAMON module for memory tiering")
Signed-off-by: Honggyu Kim <honggyu.kim(a)sk.com>
Reviewed-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
samples/damon/mtier.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
--- a/samples/damon/mtier.c~samples-damon-fix-damon-sample-mtier-for-start-failure
+++ a/samples/damon/mtier.c
@@ -164,8 +164,12 @@ static int damon_sample_mtier_enable_sto
if (enable == enabled)
return 0;
- if (enable)
- return damon_sample_mtier_start();
+ if (enable) {
+ err = damon_sample_mtier_start();
+ if (err)
+ enable = false;
+ return err;
+ }
damon_sample_mtier_stop();
return 0;
}
_
Patches currently in -mm which might be from honggyu.kim(a)sk.com are
samples-damon-change-enable-parameters-to-enabled.patch
The quilt patch titled
Subject: samples/damon: fix damon sample wsse for start failure
has been removed from the -mm tree. Its filename was
samples-damon-fix-damon-sample-wsse-for-start-failure.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Honggyu Kim <honggyu.kim(a)sk.com>
Subject: samples/damon: fix damon sample wsse for start failure
Date: Wed, 2 Jul 2025 09:02:02 +0900
The damon_sample_wsse_start() can fail so we must reset the "enable"
parameter to "false" again for proper rollback.
In such cases, setting Y to "enable" then N triggers the similar crash
with wsse because damon sample start failed but the "enable" stays as Y.
Link: https://lkml.kernel.org/r/20250702000205.1921-3-honggyu.kim@sk.com
Fixes: b757c6cfc696 ("samples/damon/wsse: start and stop DAMON as the user requests")
Signed-off-by: Honggyu Kim <honggyu.kim(a)sk.com>
Reviewed-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
samples/damon/wsse.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
--- a/samples/damon/wsse.c~samples-damon-fix-damon-sample-wsse-for-start-failure
+++ a/samples/damon/wsse.c
@@ -102,8 +102,12 @@ static int damon_sample_wsse_enable_stor
if (enable == enabled)
return 0;
- if (enable)
- return damon_sample_wsse_start();
+ if (enable) {
+ err = damon_sample_wsse_start();
+ if (err)
+ enable = false;
+ return err;
+ }
damon_sample_wsse_stop();
return 0;
}
_
Patches currently in -mm which might be from honggyu.kim(a)sk.com are
samples-damon-change-enable-parameters-to-enabled.patch
The quilt patch titled
Subject: samples/damon: fix damon sample prcl for start failure
has been removed from the -mm tree. Its filename was
samples-damon-fix-damon-sample-prcl-for-start-failure.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Honggyu Kim <honggyu.kim(a)sk.com>
Subject: samples/damon: fix damon sample prcl for start failure
Date: Wed, 2 Jul 2025 09:02:01 +0900
Patch series "mm/damon: fix divide by zero and its samples", v3.
This series includes fixes against damon and its samples to make it safer
when damon sample starting fails.
It includes the following changes.
- fix unexpected divide by zero crash for zero size regions
- fix bugs for damon samples in case of start failures
This patch (of 4):
The damon_sample_prcl_start() can fail so we must reset the "enable"
parameter to "false" again for proper rollback.
In such cases, setting Y to "enable" then N triggers the following crash
because damon sample start failed but the "enable" stays as Y.
[ 2441.419649] damon_sample_prcl: start
[ 2454.146817] damon_sample_prcl: stop
[ 2454.146862] ------------[ cut here ]------------
[ 2454.146865] kernel BUG at mm/slub.c:546!
[ 2454.148183] Oops: invalid opcode: 0000 [#1] SMP NOPTI
...
[ 2454.167555] Call Trace:
[ 2454.167822] <TASK>
[ 2454.168061] damon_destroy_ctx+0x78/0x140
[ 2454.168454] damon_sample_prcl_enable_store+0x8d/0xd0
[ 2454.168932] param_attr_store+0xa1/0x120
[ 2454.169315] module_attr_store+0x20/0x50
[ 2454.169695] sysfs_kf_write+0x72/0x90
[ 2454.170065] kernfs_fop_write_iter+0x150/0x1e0
[ 2454.170491] vfs_write+0x315/0x440
[ 2454.170833] ksys_write+0x69/0xf0
[ 2454.171162] __x64_sys_write+0x19/0x30
[ 2454.171525] x64_sys_call+0x18b2/0x2700
[ 2454.171900] do_syscall_64+0x7f/0x680
[ 2454.172258] ? exit_to_user_mode_loop+0xf6/0x180
[ 2454.172694] ? clear_bhb_loop+0x30/0x80
[ 2454.173067] ? clear_bhb_loop+0x30/0x80
[ 2454.173439] entry_SYSCALL_64_after_hwframe+0x76/0x7e
Link: https://lkml.kernel.org/r/20250702000205.1921-1-honggyu.kim@sk.com
Link: https://lkml.kernel.org/r/20250702000205.1921-2-honggyu.kim@sk.com
Fixes: 2aca254620a8 ("samples/damon: introduce a skeleton of a smaple DAMON module for proactive reclamation")
Signed-off-by: Honggyu Kim <honggyu.kim(a)sk.com>
Reviewed-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
samples/damon/prcl.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
--- a/samples/damon/prcl.c~samples-damon-fix-damon-sample-prcl-for-start-failure
+++ a/samples/damon/prcl.c
@@ -122,8 +122,12 @@ static int damon_sample_prcl_enable_stor
if (enable == enabled)
return 0;
- if (enable)
- return damon_sample_prcl_start();
+ if (enable) {
+ err = damon_sample_prcl_start();
+ if (err)
+ enable = false;
+ return err;
+ }
damon_sample_prcl_stop();
return 0;
}
_
Patches currently in -mm which might be from honggyu.kim(a)sk.com are
samples-damon-change-enable-parameters-to-enabled.patch
The quilt patch titled
Subject: kasan: remove kasan_find_vm_area() to prevent possible deadlock
has been removed from the -mm tree. Its filename was
kasan-remove-kasan_find_vm_area-to-prevent-possible-deadlock.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Yeoreum Yun <yeoreum.yun(a)arm.com>
Subject: kasan: remove kasan_find_vm_area() to prevent possible deadlock
Date: Thu, 3 Jul 2025 19:10:18 +0100
find_vm_area() couldn't be called in atomic_context. If find_vm_area() is
called to reports vm area information, kasan can trigger deadlock like:
CPU0 CPU1
vmalloc();
alloc_vmap_area();
spin_lock(&vn->busy.lock)
spin_lock_bh(&some_lock);
<interrupt occurs>
<in softirq>
spin_lock(&some_lock);
<access invalid address>
kasan_report();
print_report();
print_address_description();
kasan_find_vm_area();
find_vm_area();
spin_lock(&vn->busy.lock) // deadlock!
To prevent possible deadlock while kasan reports, remove kasan_find_vm_area().
Link: https://lkml.kernel.org/r/20250703181018.580833-1-yeoreum.yun@arm.com
Fixes: c056a364e954 ("kasan: print virtual mapping info in reports")
Signed-off-by: Yeoreum Yun <yeoreum.yun(a)arm.com>
Reported-by: Yunseong Kim <ysk(a)kzalloc.com>
Reviewed-by: Andrey Ryabinin <ryabinin.a.a(a)gmail.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Andrey Konovalov <andreyknvl(a)gmail.com>
Cc: Byungchul Park <byungchul(a)sk.com>
Cc: Dmitriy Vyukov <dvyukov(a)google.com>
Cc: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kasan/report.c | 45 +-------------------------------------------
1 file changed, 2 insertions(+), 43 deletions(-)
--- a/mm/kasan/report.c~kasan-remove-kasan_find_vm_area-to-prevent-possible-deadlock
+++ a/mm/kasan/report.c
@@ -370,36 +370,6 @@ static inline bool init_task_stack_addr(
sizeof(init_thread_union.stack));
}
-/*
- * This function is invoked with report_lock (a raw_spinlock) held. A
- * PREEMPT_RT kernel cannot call find_vm_area() as it will acquire a sleeping
- * rt_spinlock.
- *
- * For !RT kernel, the PROVE_RAW_LOCK_NESTING config option will print a
- * lockdep warning for this raw_spinlock -> spinlock dependency. This config
- * option is enabled by default to ensure better test coverage to expose this
- * kind of RT kernel problem. This lockdep splat, however, can be suppressed
- * by using DEFINE_WAIT_OVERRIDE_MAP() if it serves a useful purpose and the
- * invalid PREEMPT_RT case has been taken care of.
- */
-static inline struct vm_struct *kasan_find_vm_area(void *addr)
-{
- static DEFINE_WAIT_OVERRIDE_MAP(vmalloc_map, LD_WAIT_SLEEP);
- struct vm_struct *va;
-
- if (IS_ENABLED(CONFIG_PREEMPT_RT))
- return NULL;
-
- /*
- * Suppress lockdep warning and fetch vmalloc area of the
- * offending address.
- */
- lock_map_acquire_try(&vmalloc_map);
- va = find_vm_area(addr);
- lock_map_release(&vmalloc_map);
- return va;
-}
-
static void print_address_description(void *addr, u8 tag,
struct kasan_report_info *info)
{
@@ -429,19 +399,8 @@ static void print_address_description(vo
}
if (is_vmalloc_addr(addr)) {
- struct vm_struct *va = kasan_find_vm_area(addr);
-
- if (va) {
- pr_err("The buggy address belongs to the virtual mapping at\n"
- " [%px, %px) created by:\n"
- " %pS\n",
- va->addr, va->addr + va->size, va->caller);
- pr_err("\n");
-
- page = vmalloc_to_page(addr);
- } else {
- pr_err("The buggy address %px belongs to a vmalloc virtual mapping\n", addr);
- }
+ pr_err("The buggy address %px belongs to a vmalloc virtual mapping\n", addr);
+ page = vmalloc_to_page(addr);
}
if (page) {
_
Patches currently in -mm which might be from yeoreum.yun(a)arm.com are
The quilt patch titled
Subject: scripts: gdb: vfs: support external dentry names
has been removed from the -mm tree. Its filename was
scripts-gdb-vfs-support-external-dentry-names.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Illia Ostapyshyn <illia(a)yshyn.com>
Subject: scripts: gdb: vfs: support external dentry names
Date: Sun, 29 Jun 2025 02:38:11 +0200
d_shortname of struct dentry only reserves D_NAME_INLINE_LEN characters
and contains garbage for longer names. Use d_name instead, which always
references the valid name.
Link: https://lore.kernel.org/all/20250525213709.878287-2-illia@yshyn.com/
Link: https://lkml.kernel.org/r/20250629003811.2420418-1-illia@yshyn.com
Fixes: 79300ac805b6 ("scripts/gdb: fix dentry_name() lookup")
Signed-off-by: Illia Ostapyshyn <illia(a)yshyn.com>
Tested-by: Florian Fainelli <florian.fainelli(a)broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli(a)broadcom.com>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Jan Kiszka <jan.kiszka(a)siemens.com>
Cc: Kieran Bingham <kbingham(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
scripts/gdb/linux/vfs.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/scripts/gdb/linux/vfs.py~scripts-gdb-vfs-support-external-dentry-names
+++ a/scripts/gdb/linux/vfs.py
@@ -22,7 +22,7 @@ def dentry_name(d):
if parent == d or parent == 0:
return ""
p = dentry_name(d['d_parent']) + "/"
- return p + d['d_shortname']['string'].string()
+ return p + d['d_name']['name'].string()
class DentryName(gdb.Function):
"""Return string of the full path of a dentry.
_
Patches currently in -mm which might be from illia(a)yshyn.com are
The quilt patch titled
Subject: mm/damon/core: handle damon_call_control as normal under kdmond deactivation
has been removed from the -mm tree. Its filename was
mm-damon-core-handle-damon_call_control-as-normal-under-kdmond-deactivation.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: mm/damon/core: handle damon_call_control as normal under kdmond deactivation
Date: Sun, 29 Jun 2025 13:49:14 -0700
DAMON sysfs interface internally uses damon_call() to update DAMON
parameters as users requested, online. However, DAMON core cancels any
damon_call() requests when it is deactivated by DAMOS watermarks.
As a result, users cannot change DAMON parameters online while DAMON is
deactivated. Note that users can turn DAMON off and on with different
watermarks to work around. Since deactivated DAMON is nearly same to
stopped DAMON, the work around should have no big problem. Anyway, a bug
is a bug.
There is no real good reason to cancel the damon_call() request under
DAMOS deactivation. Fix it by simply handling the request as normal,
rather than cancelling under the situation.
Link: https://lkml.kernel.org/r/20250629204914.54114-1-sj@kernel.org
Fixes: 42b7491af14c ("mm/damon/core: introduce damon_call()")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [6.14+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/core.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
--- a/mm/damon/core.c~mm-damon-core-handle-damon_call_control-as-normal-under-kdmond-deactivation
+++ a/mm/damon/core.c
@@ -2355,9 +2355,8 @@ static void kdamond_usleep(unsigned long
*
* If there is a &struct damon_call_control request that registered via
* &damon_call() on @ctx, do or cancel the invocation of the function depending
- * on @cancel. @cancel is set when the kdamond is deactivated by DAMOS
- * watermarks, or the kdamond is already out of the main loop and therefore
- * will be terminated.
+ * on @cancel. @cancel is set when the kdamond is already out of the main loop
+ * and therefore will be terminated.
*/
static void kdamond_call(struct damon_ctx *ctx, bool cancel)
{
@@ -2405,7 +2404,7 @@ static int kdamond_wait_activation(struc
if (ctx->callback.after_wmarks_check &&
ctx->callback.after_wmarks_check(ctx))
break;
- kdamond_call(ctx, true);
+ kdamond_call(ctx, false);
damos_walk_cancel(ctx);
}
return -EBUSY;
_
Patches currently in -mm which might be from sj(a)kernel.org are
mm-damon-introduce-damon_stat-module.patch
mm-damon-introduce-damon_stat-module-fix.patch
mm-damon-introduce-damon_stat-module-fix-2.patch
mm-damon-stat-calculate-and-expose-estimated-memory-bandwidth.patch
mm-damon-stat-calculate-and-expose-idle-time-percentiles.patch
docs-admin-guide-mm-damon-add-damon_stat-usage-document.patch
mm-damon-paddr-use-alloc_migartion_target-with-no-migration-fallback-nodemask.patch
revert-mm-rename-alloc_demote_folio-to-alloc_migrate_folio.patch
revert-mm-make-alloc_demote_folio-externally-invokable-for-migration.patch
selftets-damon-add-a-test-for-memcg_path-leak.patch
mm-damon-sysfs-schemes-decouple-from-damos_quota_goal_metric.patch
mm-damon-sysfs-schemes-decouple-from-damos_action.patch
mm-damon-sysfs-schemes-decouple-from-damos_wmark_metric.patch
mm-damon-sysfs-schemes-decouple-from-damos_filter_type.patch
mm-damon-sysfs-decouple-from-damon_ops_id.patch
selftests-damon-add-drgn-script-for-extracting-damon-status.patch
selftests-damon-_damon_sysfs-set-kdamondpid-in-start.patch
selftests-damon-add-python-and-drgn-based-damon-sysfs-test.patch
selftests-damon-sysfspy-test-monitoring-attribute-parameters.patch
selftests-damon-sysfspy-test-adaptive-targets-parameter.patch
selftests-damon-sysfspy-test-damos-schemes-parameters-setup.patch
mm-damon-add-trace-event-for-auto-tuned-monitoring-intervals.patch
mm-damon-add-trace-event-for-effective-size-quota.patch
mm-damon-add-trace-event-for-effective-size-quota-fix.patch
mm-damon-add-trace-event-for-effective-size-quota-fix-2.patch
samples-damon-wsse-fix-boot-time-enable-handling.patch
samples-damon-prcl-fix-boot-time-enable-crash.patch
samples-damon-mtier-support-boot-time-enable-setup.patch
mm-damon-reclaim-reset-enabled-when-damon-start-failed.patch
mm-damon-lru_sort-reset-enabled-when-damon-start-failed.patch
mm-damon-reclaim-use-parameter-context-correctly.patch
samples-damon-wsse-rename-to-have-damon_sample_-prefix.patch
samples-damon-prcl-rename-to-have-damon_sample_-prefix.patch
samples-damon-mtier-rename-to-have-damon_sample_-prefix.patch
mm-damon-sysfs-use-damon-core-api-damon_is_running.patch
mm-damon-sysfs-dont-hold-kdamond_lock-in-before_terminate.patch
docs-mm-damon-maintainer-profile-update-for-mm-new-tree.patch
mm-damon-add-struct-damos_migrate_dests.patch
mm-damon-core-add-damos-migrate_dests-field.patch
mm-damon-sysfs-schemes-implement-damos-action-destinations-directory.patch
mm-damon-sysfs-schemes-set-damos-migrate_dests.patch
docs-abi-damon-document-schemes-dests-directory.patch
docs-admin-guide-mm-damon-usage-document-dests-directory.patch
The quilt patch titled
Subject: mm/rmap: fix potential out-of-bounds page table access during batched unmap
has been removed from the -mm tree. Its filename was
mm-rmap-fix-potential-out-of-bounds-page-table-access-during-batched-unmap.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Lance Yang <lance.yang(a)linux.dev>
Subject: mm/rmap: fix potential out-of-bounds page table access during batched unmap
Date: Fri, 27 Jun 2025 14:23:19 +0800
As pointed out by David[1], the batched unmap logic in
try_to_unmap_one() may read past the end of a PTE table when a large
folio's PTE mappings are not fully contained within a single page
table.
While this scenario might be rare, an issue triggerable from userspace
must be fixed regardless of its likelihood. This patch fixes the
out-of-bounds access by refactoring the logic into a new helper,
folio_unmap_pte_batch().
The new helper correctly calculates the safe batch size by capping the
scan at both the VMA and PMD boundaries. To simplify the code, it also
supports partial batching (i.e., any number of pages from 1 up to the
calculated safe maximum), as there is no strong reason to special-case
for fully mapped folios.
Link: https://lkml.kernel.org/r/20250701143100.6970-1-lance.yang@linux.dev
Link: https://lkml.kernel.org/r/20250630011305.23754-1-lance.yang@linux.dev
Link: https://lkml.kernel.org/r/20250627062319.84936-1-lance.yang@linux.dev
Link: https://lore.kernel.org/linux-mm/a694398c-9f03-4737-81b9-7e49c857fcbe@redha… [1]
Fixes: 354dffd29575 ("mm: support batched unmap for lazyfree large folios during reclamation")
Signed-off-by: Lance Yang <lance.yang(a)linux.dev>
Suggested-by: David Hildenbrand <david(a)redhat.com>
Reported-by: David Hildenbrand <david(a)redhat.com>
Closes: https://lore.kernel.org/linux-mm/a694398c-9f03-4737-81b9-7e49c857fcbe@redha…
Suggested-by: Barry Song <baohua(a)kernel.org>
Acked-by: Barry Song <baohua(a)kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Harry Yoo <harry.yoo(a)oracle.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Chris Li <chrisl(a)kernel.org>
Cc: "Huang, Ying" <huang.ying.caritas(a)gmail.com>
Cc: Kairui Song <kasong(a)tencent.com>
Cc: Lance Yang <lance.yang(a)linux.dev>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Mingzhe Yang <mingzhe.yang(a)ly.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Tangquan Zheng <zhengtangquan(a)oppo.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/rmap.c | 46 ++++++++++++++++++++++++++++------------------
1 file changed, 28 insertions(+), 18 deletions(-)
--- a/mm/rmap.c~mm-rmap-fix-potential-out-of-bounds-page-table-access-during-batched-unmap
+++ a/mm/rmap.c
@@ -1845,23 +1845,32 @@ void folio_remove_rmap_pud(struct folio
#endif
}
-/* We support batch unmapping of PTEs for lazyfree large folios */
-static inline bool can_batch_unmap_folio_ptes(unsigned long addr,
- struct folio *folio, pte_t *ptep)
+static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
+ struct page_vma_mapped_walk *pvmw,
+ enum ttu_flags flags, pte_t pte)
{
const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
- int max_nr = folio_nr_pages(folio);
- pte_t pte = ptep_get(ptep);
+ unsigned long end_addr, addr = pvmw->address;
+ struct vm_area_struct *vma = pvmw->vma;
+ unsigned int max_nr;
+
+ if (flags & TTU_HWPOISON)
+ return 1;
+ if (!folio_test_large(folio))
+ return 1;
+
+ /* We may only batch within a single VMA and a single page table. */
+ end_addr = pmd_addr_end(addr, vma->vm_end);
+ max_nr = (end_addr - addr) >> PAGE_SHIFT;
+ /* We only support lazyfree batching for now ... */
if (!folio_test_anon(folio) || folio_test_swapbacked(folio))
- return false;
+ return 1;
if (pte_unused(pte))
- return false;
- if (pte_pfn(pte) != folio_pfn(folio))
- return false;
+ return 1;
- return folio_pte_batch(folio, addr, ptep, pte, max_nr, fpb_flags, NULL,
- NULL, NULL) == max_nr;
+ return folio_pte_batch(folio, addr, pvmw->pte, pte, max_nr, fpb_flags,
+ NULL, NULL, NULL);
}
/*
@@ -2024,9 +2033,7 @@ static bool try_to_unmap_one(struct foli
if (pte_dirty(pteval))
folio_mark_dirty(folio);
} else if (likely(pte_present(pteval))) {
- if (folio_test_large(folio) && !(flags & TTU_HWPOISON) &&
- can_batch_unmap_folio_ptes(address, folio, pvmw.pte))
- nr_pages = folio_nr_pages(folio);
+ nr_pages = folio_unmap_pte_batch(folio, &pvmw, flags, pteval);
end_addr = address + nr_pages * PAGE_SIZE;
flush_cache_range(vma, address, end_addr);
@@ -2206,13 +2213,16 @@ discard:
hugetlb_remove_rmap(folio);
} else {
folio_remove_rmap_ptes(folio, subpage, nr_pages, vma);
- folio_ref_sub(folio, nr_pages - 1);
}
if (vma->vm_flags & VM_LOCKED)
mlock_drain_local();
- folio_put(folio);
- /* We have already batched the entire folio */
- if (nr_pages > 1)
+ folio_put_refs(folio, nr_pages);
+
+ /*
+ * If we are sure that we batched the entire folio and cleared
+ * all PTEs, we can just optimize and stop right here.
+ */
+ if (nr_pages == folio_nr_pages(folio))
goto walk_done;
continue;
walk_abort:
_
Patches currently in -mm which might be from lance.yang(a)linux.dev are
locking-rwsem-make-owner-helpers-globally-available.patch
hung_task-extend-hung-task-blocker-tracking-to-rwsems.patch
The quilt patch titled
Subject: scripts/gdb: de-reference per-CPU MCE interrupts
has been removed from the -mm tree. Its filename was
scripts-gdb-de-reference-per-cpu-mce-interrupts.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Florian Fainelli <florian.fainelli(a)broadcom.com>
Subject: scripts/gdb: de-reference per-CPU MCE interrupts
Date: Mon, 23 Jun 2025 20:00:19 -0700
The per-CPU MCE interrupts are looked up by reference and need to be
de-referenced before printing, otherwise we print the addresses of the
variables instead of their contents:
MCE: 18379471554386948492 Machine check exceptions
MCP: 18379471554386948488 Machine check polls
The corrected output looks like this instead now:
MCE: 0 Machine check exceptions
MCP: 1 Machine check polls
Link: https://lkml.kernel.org/r/20250625021109.1057046-1-florian.fainelli@broadco…
Link: https://lkml.kernel.org/r/20250624030020.882472-1-florian.fainelli@broadcom…
Fixes: b0969d7687a7 ("scripts/gdb: print interrupts")
Signed-off-by: Florian Fainelli <florian.fainelli(a)broadcom.com>
Cc: Jan Kiszka <jan.kiszka(a)siemens.com>
Cc: Kieran Bingham <kbingham(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
scripts/gdb/linux/interrupts.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/scripts/gdb/linux/interrupts.py~scripts-gdb-de-reference-per-cpu-mce-interrupts
+++ a/scripts/gdb/linux/interrupts.py
@@ -110,7 +110,7 @@ def x86_show_mce(prec, var, pfx, desc):
pvar = gdb.parse_and_eval(var)
text = "%*s: " % (prec, pfx)
for cpu in cpus.each_online_cpu():
- text += "%10u " % (cpus.per_cpu(pvar, cpu))
+ text += "%10u " % (cpus.per_cpu(pvar, cpu).dereference())
text += " %s\n" % (desc)
return text
_
Patches currently in -mm which might be from florian.fainelli(a)broadcom.com are
The quilt patch titled
Subject: maple_tree: fix mt_destroy_walk() on root leaf node
has been removed from the -mm tree. Its filename was
maple_tree-fix-mt_destroy_walk-on-root-leaf-node.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Wei Yang <richard.weiyang(a)gmail.com>
Subject: maple_tree: fix mt_destroy_walk() on root leaf node
Date: Tue, 24 Jun 2025 15:18:40 -0400
On destroy, we should set each node dead. But current code miss this when
the maple tree has only the root node.
The reason is mt_destroy_walk() leverage mte_destroy_descend() to set node
dead, but this is skipped since the only root node is a leaf.
Fixes this by setting the node dead if it is a leaf.
Link: https://lore.kernel.org/all/20250407231354.11771-1-richard.weiyang@gmail.co…
Link: https://lkml.kernel.org/r/20250624191841.64682-1-Liam.Howlett@oracle.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Wei Yang <richard.weiyang(a)gmail.com>
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reviewed-by: Dev Jain <dev.jain(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/maple_tree.c | 1 +
1 file changed, 1 insertion(+)
--- a/lib/maple_tree.c~maple_tree-fix-mt_destroy_walk-on-root-leaf-node
+++ a/lib/maple_tree.c
@@ -5319,6 +5319,7 @@ static void mt_destroy_walk(struct maple
struct maple_enode *start;
if (mte_is_leaf(enode)) {
+ mte_set_node_dead(enode);
node->type = mte_node_type(enode);
goto free_leaf;
}
_
Patches currently in -mm which might be from richard.weiyang(a)gmail.com are
mm-migrate-remove-the-eexist-conversion-for-move_pages.patch
The quilt patch titled
Subject: mm/vmalloc: leave lazy MMU mode on PTE mapping error
has been removed from the -mm tree. Its filename was
mm-vmalloc-leave-lazy-mmu-mode-on-pte-mapping-error.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Alexander Gordeev <agordeev(a)linux.ibm.com>
Subject: mm/vmalloc: leave lazy MMU mode on PTE mapping error
Date: Mon, 23 Jun 2025 09:57:21 +0200
vmap_pages_pte_range() enters the lazy MMU mode, but fails to leave it in
case an error is encountered.
Link: https://lkml.kernel.org/r/20250623075721.2817094-1-agordeev@linux.ibm.com
Fixes: 2ba3e6947aed ("mm/vmalloc: track which page-table levels were modified")
Signed-off-by: Alexander Gordeev <agordeev(a)linux.ibm.com>
Reported-by: kernel test robot <lkp(a)intel.com>
Reported-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Closes: https://lore.kernel.org/r/202506132017.T1l1l6ME-lkp@intel.com/
Reviewed-by: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmalloc.c | 22 +++++++++++++++-------
1 file changed, 15 insertions(+), 7 deletions(-)
--- a/mm/vmalloc.c~mm-vmalloc-leave-lazy-mmu-mode-on-pte-mapping-error
+++ a/mm/vmalloc.c
@@ -514,6 +514,7 @@ static int vmap_pages_pte_range(pmd_t *p
unsigned long end, pgprot_t prot, struct page **pages, int *nr,
pgtbl_mod_mask *mask)
{
+ int err = 0;
pte_t *pte;
/*
@@ -530,12 +531,18 @@ static int vmap_pages_pte_range(pmd_t *p
do {
struct page *page = pages[*nr];
- if (WARN_ON(!pte_none(ptep_get(pte))))
- return -EBUSY;
- if (WARN_ON(!page))
- return -ENOMEM;
- if (WARN_ON(!pfn_valid(page_to_pfn(page))))
- return -EINVAL;
+ if (WARN_ON(!pte_none(ptep_get(pte)))) {
+ err = -EBUSY;
+ break;
+ }
+ if (WARN_ON(!page)) {
+ err = -ENOMEM;
+ break;
+ }
+ if (WARN_ON(!pfn_valid(page_to_pfn(page)))) {
+ err = -EINVAL;
+ break;
+ }
set_pte_at(&init_mm, addr, pte, mk_pte(page, prot));
(*nr)++;
@@ -543,7 +550,8 @@ static int vmap_pages_pte_range(pmd_t *p
arch_leave_lazy_mmu_mode();
*mask |= PGTBL_PTE_MODIFIED;
- return 0;
+
+ return err;
}
static int vmap_pages_pmd_range(pud_t *pud, unsigned long addr,
_
Patches currently in -mm which might be from agordeev(a)linux.ibm.com are
The quilt patch titled
Subject: scripts/gdb: fix interrupts display after MCP on x86
has been removed from the -mm tree. Its filename was
scripts-gdb-fix-interrupts-display-after-mcp-on-x86.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Florian Fainelli <florian.fainelli(a)broadcom.com>
Subject: scripts/gdb: fix interrupts display after MCP on x86
Date: Mon, 23 Jun 2025 09:41:52 -0700
The text line would not be appended to as it should have, it should have
been a '+=' but ended up being a '==', fix that.
Link: https://lkml.kernel.org/r/20250623164153.746359-1-florian.fainelli@broadcom…
Fixes: b0969d7687a7 ("scripts/gdb: print interrupts")
Signed-off-by: Florian Fainelli <florian.fainelli(a)broadcom.com>
Cc: Jan Kiszka <jan.kiszka(a)siemens.com>
Cc: Kieran Bingham <kbingham(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
scripts/gdb/linux/interrupts.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/scripts/gdb/linux/interrupts.py~scripts-gdb-fix-interrupts-display-after-mcp-on-x86
+++ a/scripts/gdb/linux/interrupts.py
@@ -142,7 +142,7 @@ def x86_show_interupts(prec):
if constants.LX_CONFIG_X86_MCE:
text += x86_show_mce(prec, "&mce_exception_count", "MCE", "Machine check exceptions")
- text == x86_show_mce(prec, "&mce_poll_count", "MCP", "Machine check polls")
+ text += x86_show_mce(prec, "&mce_poll_count", "MCP", "Machine check polls")
text += show_irq_err_count(prec)
_
Patches currently in -mm which might be from florian.fainelli(a)broadcom.com are
The quilt patch titled
Subject: lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()
has been removed from the -mm tree. Its filename was
lib-alloc_tag-do-not-acquire-non-existent-lock-in-alloc_tag_top_users.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Harry Yoo <harry.yoo(a)oracle.com>
Subject: lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()
Date: Sat, 21 Jun 2025 04:53:05 +0900
alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock even
when the alloc_tag_cttype is not allocated because:
1) alloc tagging is disabled because mem profiling is disabled
(!alloc_tag_cttype)
2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
3) alloc tagging is enabled, but failed initialization
(!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))
In all cases, alloc_tag_cttype is not allocated, and therefore
alloc_tag_top_users() should not attempt to acquire the semaphore.
This leads to a crash on memory allocation failure by attempting to
acquire a non-existent semaphore:
Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
Tainted: [D]=DIE
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:down_read_trylock+0xaa/0x3b0
Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
Call Trace:
<TASK>
codetag_trylock_module_list+0xd/0x20
alloc_tag_top_users+0x369/0x4b0
__show_mem+0x1cd/0x6e0
warn_alloc+0x2b1/0x390
__alloc_frozen_pages_noprof+0x12b9/0x21a0
alloc_pages_mpol+0x135/0x3e0
alloc_slab_page+0x82/0xe0
new_slab+0x212/0x240
___slab_alloc+0x82a/0xe00
</TASK>
As David Wang points out, this issue became easier to trigger after commit
780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
Before the commit, the issue occurred only when it failed to allocate and
initialize alloc_tag_cttype or if a memory allocation fails before
alloc_tag_init() is called. After the commit, it can be easily triggered
when memory profiling is compiled but disabled at boot.
To properly determine whether alloc_tag_init() has been called and its
data structures initialized, verify that alloc_tag_cttype is a valid
pointer before acquiring the semaphore. If the variable is NULL or an
error value, it has not been properly initialized. In such a case, just
skip and do not attempt to acquire the semaphore.
[harry.yoo(a)oracle.com: v3]
Link: https://lkml.kernel.org/r/20250624072513.84219-1-harry.yoo@oracle.com
Link: https://lkml.kernel.org/r/20250620195305.1115151-1-harry.yoo@oracle.com
Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
Signed-off-by: Harry Yoo <harry.yoo(a)oracle.com>
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com
Acked-by: Suren Baghdasaryan <surenb(a)google.com>
Tested-by: Raghavendra K T <raghavendra.kt(a)amd.com>
Cc: Casey Chen <cachen(a)purestorage.com>
Cc: David Wang <00107082(a)163.com>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: Yuanyuan Zhong <yzhong(a)purestorage.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/alloc_tag.c | 3 +++
1 file changed, 3 insertions(+)
--- a/lib/alloc_tag.c~lib-alloc_tag-do-not-acquire-non-existent-lock-in-alloc_tag_top_users
+++ a/lib/alloc_tag.c
@@ -135,6 +135,9 @@ size_t alloc_tag_top_users(struct codeta
struct codetag_bytes n;
unsigned int i, nr = 0;
+ if (IS_ERR_OR_NULL(alloc_tag_cttype))
+ return 0;
+
if (can_sleep)
codetag_lock_module_list(alloc_tag_cttype, true);
else if (!codetag_trylock_module_list(alloc_tag_cttype))
_
Patches currently in -mm which might be from harry.yoo(a)oracle.com are
mm-zsmalloc-do-not-pass-__gfp_movable-if-config_compaction=n.patch
mm-check-if-folio-has-valid-mapcount-before-folio_test_anonksm-when-necessary.patch
The quilt patch titled
Subject: kallsyms: fix build without execinfo
has been removed from the -mm tree. Its filename was
kallsyms-fix-build-without-execinfo.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Achill Gilgenast <fossdd(a)pwned.life>
Subject: kallsyms: fix build without execinfo
Date: Sun, 22 Jun 2025 03:45:49 +0200
Some libc's like musl libc don't provide execinfo.h since it's not part of
POSIX. In order to fix compilation on musl, only include execinfo.h if
available (HAVE_BACKTRACE_SUPPORT)
This was discovered with c104c16073b7 ("Kunit to check the longest symbol
length") which starts to include linux/kallsyms.h with Alpine Linux'
configs.
Link: https://lkml.kernel.org/r/20250622014608.448718-1-fossdd@pwned.life
Fixes: c104c16073b7 ("Kunit to check the longest symbol length")
Signed-off-by: Achill Gilgenast <fossdd(a)pwned.life>
Cc: Luis Henriques <luis(a)igalia.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/include/linux/kallsyms.h | 4 ++++
1 file changed, 4 insertions(+)
--- a/tools/include/linux/kallsyms.h~kallsyms-fix-build-without-execinfo
+++ a/tools/include/linux/kallsyms.h
@@ -18,6 +18,7 @@ static inline const char *kallsyms_looku
return NULL;
}
+#ifdef HAVE_BACKTRACE_SUPPORT
#include <execinfo.h>
#include <stdlib.h>
static inline void print_ip_sym(const char *loglvl, unsigned long ip)
@@ -30,5 +31,8 @@ static inline void print_ip_sym(const ch
free(name);
}
+#else
+static inline void print_ip_sym(const char *loglvl, unsigned long ip) {}
+#endif
#endif
_
Patches currently in -mm which might be from fossdd(a)pwned.life are
The quilt patch titled
Subject: lib-alloc_tag-do-not-acquire-non-existent-lock-in-alloc_tag_top_users-v3
has been removed from the -mm tree. Its filename was
lib-alloc_tag-do-not-acquire-non-existent-lock-in-alloc_tag_top_users-v3.patch
This patch was dropped because it was folded into lib-alloc_tag-do-not-acquire-non-existent-lock-in-alloc_tag_top_users.patch
------------------------------------------------------
From: Harry Yoo <harry.yoo(a)oracle.com>
Subject: lib-alloc_tag-do-not-acquire-non-existent-lock-in-alloc_tag_top_users-v3
Date: Tue, 24 Jun 2025 16:25:13 +0900
Link: https://lkml.kernel.org/r/20250624072513.84219-1-harry.yoo@oracle.com
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com
Closes: https://lore.kernel.org/oe-lkp/202506131711.5b41931c-lkp@intel.com
Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
Signed-off-by: Harry Yoo <harry.yoo(a)oracle.com>
Cc: Casey Chen <cachen(a)purestorage.com>
Cc: David Wang <00107082(a)163.com>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Yuanyuan Zhong <yzhong(a)purestorage.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/alloc_tag.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/lib/alloc_tag.c~lib-alloc_tag-do-not-acquire-non-existent-lock-in-alloc_tag_top_users-v3
+++ a/lib/alloc_tag.c
@@ -137,7 +137,8 @@ size_t alloc_tag_top_users(struct codeta
if (IS_ERR_OR_NULL(alloc_tag_cttype))
return 0;
- else if (can_sleep)
+
+ if (can_sleep)
codetag_lock_module_list(alloc_tag_cttype, true);
else if (!codetag_trylock_module_list(alloc_tag_cttype))
return 0;
_
Patches currently in -mm which might be from harry.yoo(a)oracle.com are
lib-alloc_tag-do-not-acquire-non-existent-lock-in-alloc_tag_top_users.patch
mm-zsmalloc-do-not-pass-__gfp_movable-if-config_compaction=n.patch
mm-check-if-folio-has-valid-mapcount-before-folio_test_anonksm-when-necessary.patch
The patch titled
Subject: mm/shmem, swap: improve cached mTHP handling and fix potential hung
has been added to the -mm mm-new branch. Its filename is
mm-shmem-swap-improve-cached-mthp-handling-and-fix-potential-hung.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-new branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others take
notice and to finish up reviews. Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fixup patches in mm-new.
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Kairui Song <kasong(a)tencent.com>
Subject: mm/shmem, swap: improve cached mTHP handling and fix potential hung
Date: Thu, 10 Jul 2025 11:36:59 +0800
Patch series "mm/shmem, swap: bugfix and improvement of mTHP swap in", v5.
The current mTHP swapin path have several problems. It may potentially
hang, may cause redundant faults due to false positive swap cache lookup,
and it will involve at least 4 Xarray tree walks (get order, get order
again, confirm swap, insert folio). And for !CONFIG_TRANSPARENT_HUGEPAGE
builds, it will performs some mTHP related checks.
This series fixes all of the mentioned issues, and the code should be more
robust and prepared for the swap table series. Now tree walks is reduced
to twice (get order & confirm, insert folio), !CONFIG_TRANSPARENT_HUGEPAGE
build overhead is also minimized, and comes with a sanity check now.
The performance is slightly better after this series, sequential swap in
of 24G data from ZRAM, using transparent_hugepage_tmpfs=always (24 samples
each):
Before: Avg: 10.67s, stddev: 0.04
After patch 1: Avg: 10.49s, stddev: 0.04
After patch 2: Avg: 10.42s, stddev: 0.05
After patch 3: Avg: 10.45s, stddev: 0.05
After patch 4: Avg: 10.49s, stddev: 0.04
After patch 5: Avg: 9.67s, stddev: 0.03
After patch 6: Avg: 9.67s, stddev: 0.04
After patch 7: Avg: 9.68s, stddev: 0.05
After patch 8: Avg: 9.66s, stddev: 0.04
Several patches improve the performance by a little, which is about ~10%
faster in total.
Build kernel test showed very slightly improvement, testing with make -j48
with defconfig in a 768M memcg also using ZRAM as swap, and
transparent_hugepage_tmpfs=always (6 test runs):
Before: avg: 3353.66s, stddev: 33.73
After patch 1: avg: 3354.19s, stddev: 42.54
After patch 2: avg: 3364.16s, stddev: 52.74
After patch 3: avg: 3355.73s, stddev: 36.17
After patch 4: avg: 3352.78s, stddev: 39.80
After patch 5: avg: 3355.19s, stddev: 50.78
After patch 6: avg: 3333.63s, stddev: 32.50
After patch 7: avg: 3297.70s, stddev: 38.93
After patch 8: avg: 3302.35s, stddev: 50.61
This patch (of 8):
The current swap-in code assumes that, when a swap entry in shmem mapping
is order 0, its cached folios (if present) must be order 0 too, which
turns out not always correct.
The problem is shmem_split_large_entry is called before verifying the
folio will eventually be swapped in, one possible race is:
CPU1 CPU2
shmem_swapin_folio
/* swap in of order > 0 swap entry S1 */
folio = swap_cache_get_folio
/* folio = NULL */
order = xa_get_order
/* order > 0 */
folio = shmem_swap_alloc_folio
/* mTHP alloc failure, folio = NULL */
<... Interrupted ...>
shmem_swapin_folio
/* S1 is swapped in */
shmem_writeout
/* S1 is swapped out, folio cached */
shmem_split_large_entry(..., S1)
/* S1 is split, but the folio covering it has order > 0 now */
Now any following swapin of S1 will hang: `xa_get_order` returns 0, and
folio lookup will return a folio with order > 0. The
`xa_get_order(&mapping->i_pages, index) != folio_order(folio)` will always
return false causing swap-in to return -EEXIST.
And this looks fragile. So fix this up by allowing seeing a larger folio
in swap cache, and check the whole shmem mapping range covered by the
swapin have the right swap value upon inserting the folio. And drop the
redundant tree walks before the insertion.
This will actually improve performance, as it avoids two redundant Xarray
tree walks in the hot path, and the only side effect is that in the
failure path, shmem may redundantly reallocate a few folios causing
temporary slight memory pressure.
And worth noting, it may seems the order and value check before inserting
might help reducing the lock contention, which is not true. The swap
cache layer ensures raced swapin will either see a swap cache folio or
failed to do a swapin (we have SWAP_HAS_CACHE bit even if swap cache is
bypassed), so holding the folio lock and checking the folio flag is
already good enough for avoiding the lock contention. The chance that a
folio passes the swap entry value check but the shmem mapping slot has
changed should be very low.
Link: https://lkml.kernel.org/r/20250710033706.71042-1-ryncsn@gmail.com
Link: https://lkml.kernel.org/r/20250710033706.71042-2-ryncsn@gmail.com
Fixes: 809bc86517cc ("mm: shmem: support large folio swap out")
Signed-off-by: Kairui Song <kasong(a)tencent.com>
Reviewed-by: Kemeng Shi <shikemeng(a)huaweicloud.com>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: <stable(a)vger.kernel.org>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Chris Li <chrisl(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Kairui Song <kasong(a)tencent.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Nhat Pham <nphamcs(a)gmail.com>
Cc: Dev Jain <dev.jain(a)arm.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/shmem.c | 30 +++++++++++++++++++++---------
1 file changed, 21 insertions(+), 9 deletions(-)
--- a/mm/shmem.c~mm-shmem-swap-improve-cached-mthp-handling-and-fix-potential-hung
+++ a/mm/shmem.c
@@ -884,7 +884,9 @@ static int shmem_add_to_page_cache(struc
pgoff_t index, void *expected, gfp_t gfp)
{
XA_STATE_ORDER(xas, &mapping->i_pages, index, folio_order(folio));
- long nr = folio_nr_pages(folio);
+ unsigned long nr = folio_nr_pages(folio);
+ swp_entry_t iter, swap;
+ void *entry;
VM_BUG_ON_FOLIO(index != round_down(index, nr), folio);
VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
@@ -896,14 +898,24 @@ static int shmem_add_to_page_cache(struc
gfp &= GFP_RECLAIM_MASK;
folio_throttle_swaprate(folio, gfp);
+ swap = iter = radix_to_swp_entry(expected);
do {
xas_lock_irq(&xas);
- if (expected != xas_find_conflict(&xas)) {
- xas_set_err(&xas, -EEXIST);
- goto unlock;
+ xas_for_each_conflict(&xas, entry) {
+ /*
+ * The range must either be empty, or filled with
+ * expected swap entries. Shmem swap entries are never
+ * partially freed without split of both entry and
+ * folio, so there shouldn't be any holes.
+ */
+ if (!expected || entry != swp_to_radix_entry(iter)) {
+ xas_set_err(&xas, -EEXIST);
+ goto unlock;
+ }
+ iter.val += 1 << xas_get_order(&xas);
}
- if (expected && xas_find_conflict(&xas)) {
+ if (expected && iter.val - nr != swap.val) {
xas_set_err(&xas, -EEXIST);
goto unlock;
}
@@ -2323,7 +2335,7 @@ static int shmem_swapin_folio(struct ino
error = -ENOMEM;
goto failed;
}
- } else if (order != folio_order(folio)) {
+ } else if (order > folio_order(folio)) {
/*
* Swap readahead may swap in order 0 folios into swapcache
* asynchronously, while the shmem mapping can still stores
@@ -2348,15 +2360,15 @@ static int shmem_swapin_folio(struct ino
swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
}
+ } else if (order < folio_order(folio)) {
+ swap.val = round_down(swap.val, 1 << folio_order(folio));
}
alloced:
/* We have to do this with folio locked to prevent races */
folio_lock(folio);
if ((!skip_swapcache && !folio_test_swapcache(folio)) ||
- folio->swap.val != swap.val ||
- !shmem_confirm_swap(mapping, index, swap) ||
- xa_get_order(&mapping->i_pages, index) != folio_order(folio)) {
+ folio->swap.val != swap.val) {
error = -EEXIST;
goto unlock;
}
_
Patches currently in -mm which might be from kasong(a)tencent.com are
mm-list_lru-refactor-the-locking-code.patch
mm-shmem-swap-improve-cached-mthp-handling-and-fix-potential-hung.patch
mm-shmem-swap-avoid-redundant-xarray-lookup-during-swapin.patch
mm-shmem-swap-tidy-up-thp-swapin-checks.patch
mm-shmem-swap-tidy-up-swap-entry-splitting.patch
mm-shmem-swap-never-use-swap-cache-and-readahead-for-swp_synchronous_io.patch
mm-shmem-swap-simplify-swapin-path-and-result-handling.patch
mm-shmem-swap-rework-swap-entry-and-index-calculation-for-large-swapin.patch
mm-shmem-swap-fix-major-fault-counting.patch
Use common wrappers operating directly on the struct sg_table objects to
fix incorrect use of statterlists related calls. dma_unmap_sg() function
has to be called with the number of elements originally passed to the
dma_map_sg() function, not the one returned in sgtable's nents.
CC: stable(a)vger.kernel.org
Fixes: 425902f5c8e3 ("fpga zynq: Use the scatterlist interface")
Signed-off-by: Marek Szyprowski <m.szyprowski(a)samsung.com>
---
v2:
- fixed build break (missing flags parameter)
---
drivers/fpga/zynq-fpga.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git drivers/fpga/zynq-fpga.c drivers/fpga/zynq-fpga.c
index f7e08f7ea9ef..0be0d569589d 100644
--- drivers/fpga/zynq-fpga.c
+++ drivers/fpga/zynq-fpga.c
@@ -406,7 +406,7 @@ static int zynq_fpga_ops_write(struct fpga_manager *mgr, struct sg_table *sgt)
}
priv->dma_nelms =
- dma_map_sg(mgr->dev.parent, sgt->sgl, sgt->nents, DMA_TO_DEVICE);
+ dma_map_sgtable(mgr->dev.parent, sgt, DMA_TO_DEVICE, 0);
if (priv->dma_nelms == 0) {
dev_err(&mgr->dev, "Unable to DMA map (TO_DEVICE)\n");
return -ENOMEM;
@@ -478,7 +478,7 @@ static int zynq_fpga_ops_write(struct fpga_manager *mgr, struct sg_table *sgt)
clk_disable(priv->clk);
out_free:
- dma_unmap_sg(mgr->dev.parent, sgt->sgl, sgt->nents, DMA_TO_DEVICE);
+ dma_unmap_sgtable(mgr->dev.parent, sgt, DMA_TO_DEVICE, 0);
return err;
}
--
2.34.1
From: Michael Kelley <mhklinux(a)outlook.com>
Commit 96959283a58d ("Drivers: hv: Always select CONFIG_SYSFB
for Hyper-V guests") selects CONFIG_SYSFB for Hyper-V guests
so that screen_info is available to the VMBus driver to get
the location of the framebuffer in Generation 2 VMs. However,
if CONFIG_HYPERV is enabled but CONFIG_EFI is not, a kernel
link error results in ARM64 builds because screen_info is
provided by the EFI firmware interface. While configuring
an ARM64 Hyper-V guest without EFI isn't useful since EFI is
required to boot, the configuration is still possible and
the link error should be prevented.
Fix this by making the selection of CONFIG_SYSFB conditional
on CONFIG_EFI being defined. For Generation 1 VMs on x86/x64,
which don't use EFI, the additional condition is OK because
such VMs get the framebuffer information via a mechanism
that doesn't use screen_info.
Fixes: 96959283a58d ("Drivers: hv: Always select CONFIG_SYSFB for Hyper-V guests")
Reported-by: Arnd Bergmann <arnd(a)arndb.de>
Closes: https://lore.kernel.org/linux-hyperv/20250610091810.2638058-1-arnd@kernel.o…
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506080820.1wmkQufc-lkp@intel.com/
Signed-off-by: Michael Kelley <mhklinux(a)outlook.com>
---
drivers/hv/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
index 1cd188b73b74..57623ca7f350 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -9,7 +9,7 @@ config HYPERV
select PARAVIRT
select X86_HV_CALLBACK_VECTOR if X86
select OF_EARLY_FLATTREE if OF
- select SYSFB if !HYPERV_VTL_MODE
+ select SYSFB if EFI && !HYPERV_VTL_MODE
help
Select this option to run Linux as a Hyper-V client operating
system.
--
2.25.1
From: "Rafael J. Wysocki" <rafael.j.wysocki(a)intel.com>
[ Upstream commit de1675de39aa945bad5937d1fde4df3682670639 ]
Revert commit 234f71555019 ("ACPI: battery: negate current when
discharging") breaks not one but several userspace implementations
of battery monitoring: Steam and MangoHud. Perhaps it breaks more,
but those are the two that have been tested.
Reported-by: Matthew Schwartz <matthew.schwartz(a)linux.dev>
Closes: https://lore.kernel.org/linux-acpi/87C1B2AF-D430-4568-B620-14B941A8ABA4@lin…
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
**YES**, this commit should be backported to stable kernel trees.
## Extensive Explanation:
### 1. **Critical Userspace Regression Fix**
This revert fixes a critical regression that breaks userspace
applications. The commit message explicitly states that the original
change "breaks not one but several userspace implementations of battery
monitoring: Steam and MangoHud." This is a clear violation of the
kernel's "no userspace regressions" rule, which is one of the most
fundamental principles in kernel development.
### 2. **Code Analysis Shows ABI Change**
Looking at the code changes:
```c
// The problematic code that was added (and is now being removed):
if (psp == POWER_SUPPLY_PROP_CURRENT_NOW &&
(battery->state & ACPI_BATTERY_STATE_DISCHARGING) &&
acpi_battery_handle_discharging(battery)
== POWER_SUPPLY_STATUS_DISCHARGING)
val->intval = -val->intval;
```
This code was changing the sign of the current value when discharging,
which altered the userspace-visible ABI. Even if the original behavior
didn't match documentation, changing it broke existing applications that
depended on the long-standing behavior.
### 3. **Simple and Safe Revert**
The commit is a clean revert that:
- Removes the problematic sign negation logic
- Restores the original behavior that userspace applications expect
- Is self-contained to the `acpi_battery_get_property()` function
- Has minimal risk of introducing new issues
### 4. **Meets Stable Kernel Criteria**
According to stable kernel rules, this commit qualifies because it:
- **Fixes a real bug** that users are hitting (broken battery monitoring
in Steam and MangoHud)
- **Is obviously correct** (it's a revert of a change that broke
userspace)
- **Is small and contained** (only affects the ACPI battery property
reporting)
- **Has been tested** (the reporter confirmed the issue)
- **Fixes a regression** (restores previous working behavior)
### 5. **Comparison to Similar Commits**
Looking at the historical commits marked with "Backport Status: YES",
they share similar characteristics:
- Fix actual problems affecting users
- Are relatively small and contained changes
- Don't introduce new features or architectural changes
- Focus on fixing specific issues
This revert fits perfectly into this pattern as it's fixing a specific
userspace compatibility issue without introducing any new functionality.
### 6. **Timing Considerations**
The original problematic commit (234f71555019) was from May 8, 2025, and
this revert is from July 3, 2025. If the original commit made it into
any stable releases, those stable trees would need this revert to
prevent userspace breakage.
### Conclusion:
This is a textbook example of a commit that should be backported to
stable kernels. It fixes a userspace regression, is minimal in scope,
carries very low risk, and restores previously working behavior that
applications depend on. The fact that it's a revert of a problematic
change makes it even more suitable for stable backporting.
drivers/acpi/battery.c | 19 +++----------------
1 file changed, 3 insertions(+), 16 deletions(-)
diff --git a/drivers/acpi/battery.c b/drivers/acpi/battery.c
index 93bb1f7d90986..6760330a8af55 100644
--- a/drivers/acpi/battery.c
+++ b/drivers/acpi/battery.c
@@ -243,23 +243,10 @@ static int acpi_battery_get_property(struct power_supply *psy,
break;
case POWER_SUPPLY_PROP_CURRENT_NOW:
case POWER_SUPPLY_PROP_POWER_NOW:
- if (battery->rate_now == ACPI_BATTERY_VALUE_UNKNOWN) {
+ if (battery->rate_now == ACPI_BATTERY_VALUE_UNKNOWN)
ret = -ENODEV;
- break;
- }
-
- val->intval = battery->rate_now * 1000;
- /*
- * When discharging, the current should be reported as a
- * negative number as per the power supply class interface
- * definition.
- */
- if (psp == POWER_SUPPLY_PROP_CURRENT_NOW &&
- (battery->state & ACPI_BATTERY_STATE_DISCHARGING) &&
- acpi_battery_handle_discharging(battery)
- == POWER_SUPPLY_STATUS_DISCHARGING)
- val->intval = -val->intval;
-
+ else
+ val->intval = battery->rate_now * 1000;
break;
case POWER_SUPPLY_PROP_CHARGE_FULL_DESIGN:
case POWER_SUPPLY_PROP_ENERGY_FULL_DESIGN:
--
2.39.5
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 76303ee8d54bff6d9a6d55997acd88a6c2ba63cf
Gitweb: https://git.kernel.org/tip/76303ee8d54bff6d9a6d55997acd88a6c2ba63cf
Author: Jann Horn <jannh(a)google.com>
AuthorDate: Wed, 02 Jul 2025 10:32:04 +02:00
Committer: Dave Hansen <dave.hansen(a)linux.intel.com>
CommitterDate: Wed, 09 Jul 2025 07:46:36 -07:00
x86/mm: Disable hugetlb page table sharing on 32-bit
Only select ARCH_WANT_HUGE_PMD_SHARE on 64-bit x86.
Page table sharing requires at least three levels because it involves
shared references to PMD tables; 32-bit x86 has either two-level paging
(without PAE) or three-level paging (with PAE), but even with
three-level paging, having a dedicated PGD entry for hugetlb is only
barely possible (because the PGD only has four entries), and it seems
unlikely anyone's actually using PMD sharing on 32-bit.
Having ARCH_WANT_HUGE_PMD_SHARE enabled on non-PAE 32-bit X86 (which
has 2-level paging) became particularly problematic after commit
59d9094df3d7 ("mm: hugetlb: independent PMD page table shared count"),
since that changes `struct ptdesc` such that the `pt_mm` (for PGDs) and
the `pt_share_count` (for PMDs) share the same union storage - and with
2-level paging, PMDs are PGDs.
(For comparison, arm64 also gates ARCH_WANT_HUGE_PMD_SHARE on the
configuration of page tables such that it is never enabled with 2-level
paging.)
Closes: https://lore.kernel.org/r/srhpjxlqfna67blvma5frmy3aa@altlinux.org
Fixes: cfe28c5d63d8 ("x86: mm: Remove x86 version of huge_pmd_share.")
Reported-by: Vitaly Chikunov <vt(a)altlinux.org>
Suggested-by: Dave Hansen <dave.hansen(a)intel.com>
Signed-off-by: Jann Horn <jannh(a)google.com>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Acked-by: Oscar Salvador <osalvador(a)suse.de>
Acked-by: David Hildenbrand <david(a)redhat.com>
Tested-by: Vitaly Chikunov <vt(a)altlinux.org>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250702-x86-2level-hugetlb-v2-1-1a98096edf92%4…
---
arch/x86/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 71019b3..4e0fe68 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -147,7 +147,7 @@ config X86
select ARCH_WANTS_DYNAMIC_TASK_STRUCT
select ARCH_WANTS_NO_INSTR
select ARCH_WANT_GENERAL_HUGETLB
- select ARCH_WANT_HUGE_PMD_SHARE
+ select ARCH_WANT_HUGE_PMD_SHARE if X86_64
select ARCH_WANT_LD_ORPHAN_WARN
select ARCH_WANT_OPTIMIZE_DAX_VMEMMAP if X86_64
select ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP if X86_64
[Ideally I'd like to have tests in selftests, but I couldn't extract
them out of the syzbot reproducer, so sending that first series to check
against syzbot]
Followup of https://lore.kernel.org/linux-input/c75433e0-9b47-4072-bbe8-b1d14ea97b13@ro…
This initial series attempt at fixing the various bugs discovered by
Alan regarding __hid_request().
Syzbot managed to create a report descriptor which presents a feature
request of size 0 (still trying to extract it) and this exposed the fact
that __hid_request() was incorrectly handling the case when the report
ID is not used.
Send a first batch of fixes now so we get the feedback from syzbot ASAP.
Note: in the series, I also mentioned that the report of size 0 should
be stripped out of the HID device, but without a local reproducer this
is going to be difficult to implement.
Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
---
Benjamin Tissoires (3):
HID: core: ensure the allocated report buffer can contain the reserved report ID
HID: core: ensure __hid_request reserves the report ID as the first byte
HID: core: do not bypass hid_hw_raw_request
drivers/hid/hid-core.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
---
base-commit: 1f988d0788f50d8464f957e793fab356e2937369
change-id: 20250709-report-size-null-37619ea20288
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
Object creation is a careful dance where we must guarantee that the
object is fully constructed before it is visible to other threads, and
GEM buffer objects are no difference.
Final publishing happens by calling drm_gem_handle_create(). After
that the only allowed thing to do is call drm_gem_object_put() because
a concurrent call to the GEM_CLOSE ioctl with a correctly guessed id
(which is trivial since we have a linear allocator) can already tear
down the object again.
Luckily most drivers get this right, the very few exceptions I've
pinged the relevant maintainers for. Unfortunately we also need
drm_gem_handle_create() when creating additional handles for an
already existing object (e.g. GETFB ioctl or the various bo import
ioctl), and hence we cannot have a drm_gem_handle_create_and_put() as
the only exported function to stop these issues from happening.
Now unfortunately the implementation of drm_gem_handle_create() isn't
living up to standards: It does correctly finishe object
initialization at the global level, and hence is safe against a
concurrent tear down. But it also sets up the file-private aspects of
the handle, and that part goes wrong: We fully register the object in
the drm_file.object_idr before calling drm_vma_node_allow() or
obj->funcs->open, which opens up races against concurrent removal of
that handle in drm_gem_handle_delete().
Fix this with the usual two-stage approach of first reserving the
handle id, and then only registering the object after we've completed
the file-private setup.
Jacek reported this with a testcase of concurrently calling GEM_CLOSE
on a freshly-created object (which also destroys the object), but it
should be possible to hit this with just additional handles created
through import or GETFB without completed destroying the underlying
object with the concurrent GEM_CLOSE ioctl calls.
Note that the close-side of this race was fixed in f6cd7daecff5 ("drm:
Release driver references to handle before making it available
again"), which means a cool 9 years have passed until someone noticed
that we need to make this symmetry or there's still gaps left :-/
Without the 2-stage close approach we'd still have a race, therefore
that's an integral part of this bugfix.
More importantly, this means we can have NULL pointers behind
allocated id in our drm_file.object_idr. We need to check for that
now:
- drm_gem_handle_delete() checks for ERR_OR_NULL already
- drm_gem.c:object_lookup() also chekcs for NULL
- drm_gem_release() should never be called if there's another thread
still existing that could call into an IOCTL that creates a new
handle, so cannot race. For paranoia I added a NULL check to
drm_gem_object_release_handle() though.
- most drivers (etnaviv, i915, msm) are find because they use
idr_find(), which maps both ENOENT and NULL to NULL.
- drivers using idr_for_each_entry() should also be fine, because
idr_get_next does filter out NULL entries and continues the
iteration.
- The same holds for drm_show_memory_stats().
v2: Use drm_WARN_ON (Thomas)
Reported-by: Jacek Lawrynowicz <jacek.lawrynowicz(a)linux.intel.com>
Tested-by: Jacek Lawrynowicz <jacek.lawrynowicz(a)linux.intel.com>
Reviewed-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: stable(a)vger.kernel.org
Cc: Jacek Lawrynowicz <jacek.lawrynowicz(a)linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Maxime Ripard <mripard(a)kernel.org>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: David Airlie <airlied(a)gmail.com>
Cc: Simona Vetter <simona(a)ffwll.ch>
Signed-off-by: Simona Vetter <simona.vetter(a)intel.com>
Signed-off-by: Simona Vetter <simona.vetter(a)ffwll.ch>
---
drivers/gpu/drm/drm_gem.c | 10 +++++++++-
include/drm/drm_file.h | 3 +++
2 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index bc505d938b3e..1aa9192c4cc6 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -316,6 +316,9 @@ drm_gem_object_release_handle(int id, void *ptr, void *data)
struct drm_file *file_priv = data;
struct drm_gem_object *obj = ptr;
+ if (drm_WARN_ON(obj->dev, !data))
+ return 0;
+
if (obj->funcs->close)
obj->funcs->close(obj, file_priv);
@@ -436,7 +439,7 @@ drm_gem_handle_create_tail(struct drm_file *file_priv,
idr_preload(GFP_KERNEL);
spin_lock(&file_priv->table_lock);
- ret = idr_alloc(&file_priv->object_idr, obj, 1, 0, GFP_NOWAIT);
+ ret = idr_alloc(&file_priv->object_idr, NULL, 1, 0, GFP_NOWAIT);
spin_unlock(&file_priv->table_lock);
idr_preload_end();
@@ -457,6 +460,11 @@ drm_gem_handle_create_tail(struct drm_file *file_priv,
goto err_revoke;
}
+ /* mirrors drm_gem_handle_delete to avoid races */
+ spin_lock(&file_priv->table_lock);
+ obj = idr_replace(&file_priv->object_idr, obj, handle);
+ WARN_ON(obj != NULL);
+ spin_unlock(&file_priv->table_lock);
*handlep = handle;
return 0;
diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
index eab7546aad79..115763799625 100644
--- a/include/drm/drm_file.h
+++ b/include/drm/drm_file.h
@@ -300,6 +300,9 @@ struct drm_file {
*
* Mapping of mm object handles to object pointers. Used by the GEM
* subsystem. Protected by @table_lock.
+ *
+ * Note that allocated entries might be NULL as a transient state when
+ * creating or deleting a handle.
*/
struct idr object_idr;
--
2.49.0
Hello stable team,
could you please backport commit 440cf77625e3 ("perf: build: Setup
PKG_CONFIG_LIBDIR for cross compilation") to linux-6.6.y ?
Its absence prevents some people from building the perf tool in cross-compile
environment with this kernel. The patch applies cleanly on linux-6.6.y
Thanks,
Alexis
--
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
Approximately 20% of devices are experiencing intermittent boot failures
with this kernel version. The issue appears to be related to auto login
failures, where an incorrect password is being detected on the serial
console during the login process.
This intermittent regression is noticed on stable-rc 6.15.5-rc2 and
Linux mainline master v6.16-rc5. This regressions is only specific
to the devices not on the qemu's.
Test environments:
- dragonboard-410c
- dragonboard-845c
- e850-96
- juno-r2
- rk3399-rock-pi-4b
- x86
Regression Analysis:
- New regression? Yes
- Reproducibility? 20% only
Test regression: 6.15.5-rc2 v6.16-rc5 auto login failed Login incorrect
Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
## log in problem
runner-ns46nmmj-project-40964107-concurrent-0 login: #
Password:
Login incorrect
runner-ns46nmmj-project-40964107-concurrent-0 login:
## Investigation
The following three patches were reverted and the system was re-tested.
The previously reported issues are no longer observed after applying the
reverts.
serial: imx: Restore original RXTL for console to fix data loss
commit f23c52aafb1675ab1d1f46914556d8e29cbbf7b3 upstream.
serial: core: restore of_node information in sysfs
commit d36f0e9a0002f04f4d6dd9be908d58fe5bd3a279 upstream.
tty: serial: uartlite: register uart driver in init
[ Upstream commit 6bd697b5fc39fd24e2aa418c7b7d14469f550a93 ]
Reference bug report lore link,
- https://lore.kernel.org/stable/CA+G9fYvidpyHTQ179dAJ4TSdhthC-Mtjuks5iQjMf+o…
## Test links
* log 1: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.15.y/build/v6.15…
* log 2: https://qa-reports.linaro.org/api/testruns/29021720/log_file/
* log 3: https://qa-reports.linaro.org/lkft/linux-mainline-master/build/v6.16-rc5/te…
* Boot test: https://regressions.linaro.org/lkft/linux-stable-rc-linux-6.15.y/v6.15.4-26…
* LAVA jobs 1: https://lkft.validation.linaro.org/scheduler/job/8344153#L1186
* LAVA jobs 2: https://lkft.validation.linaro.org/scheduler/job/8343870#L1266
## Build
* kernel: 6.15.5-rc2
* git: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
* git commit: f6977c36decb0875e78bdb8599749bce1e84c753
* git describe: v6.15.4-264-gf6977c36decb
* test details:
https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.15.y/build/v6.15…
## Test Regressions (compared to v6.15.3-590-gd93bc5feded1)
* dragonboard-410c, boot
- clang-20-lkftconfig
- clang-nightly-lkftconfig-kselftest
- gcc-13-lkftconfig-debug
* dragonboard-845c, boot
- clang-20-lkftconfig
- korg-clang-20-lkftconfig-lto-thing
* dragonboard-845c-compat, boot
- gcc-13-lkftconfig-compat
* e850-96, boot
- gcc-13-lkftconfig-no-kselftest-frag
* juno-r2, boot
- clang-20-lkftconfig
- gcc-13-lkftconfig-debug
- gcc-13-lkftconfig-kselftest
* rk3399-rock-pi-4b, boot
- clang-20-lkftconfig
* x86, boot
- clang-20-lkftconfig
- clang-20-lkftconfig-no-kselftest-frag
- clang-nightly-lkftconfig-kselftest
- clang-nightly-lkftconfig-lto-thing
- gcc-13-defconfig-preempt_rt
- gcc-13-lkftconfig-no-kselftest-frag
--
Linaro LKFT
https://lkft.linaro.org
According to the "GPIO Expander Map / Table" section of the J722S EVM
Schematic within the Evaluation Module Design Files package [0], the
GPIO Pin P05 located on the GPIO Expander 1 (I2C0/0x23) has to be pulled
down to select the Type-C interface. Since commit under Fixes claims to
enable the Type-C interface, update the property within "p05-hog" from
"output-high" to "output-low", thereby switching from the Type-A
interface to the Type-C interface.
[0]: https://www.ti.com/lit/zip/sprr495
Cc: <stable(a)vger.kernel.org>
Fixes: 485705df5d5f ("arm64: dts: ti: k3-j722s: Enable PCIe and USB support on J722S-EVM")
Signed-off-by: Siddharth Vadapalli <s-vadapalli(a)ti.com>
---
Hello,
This patch is based on commit
86731a2a651e Linux 6.16-rc3
of Mainline Linux.
Regards,
Siddharth.
arch/arm64/boot/dts/ti/k3-j722s-evm.dts | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/ti/k3-j722s-evm.dts b/arch/arm64/boot/dts/ti/k3-j722s-evm.dts
index a47852fdca70..d0533723412a 100644
--- a/arch/arm64/boot/dts/ti/k3-j722s-evm.dts
+++ b/arch/arm64/boot/dts/ti/k3-j722s-evm.dts
@@ -634,7 +634,7 @@ p05-hog {
/* P05 - USB2.0_MUX_SEL */
gpio-hog;
gpios = <5 GPIO_ACTIVE_LOW>;
- output-high;
+ output-low;
};
p01_hog: p01-hog {
--
2.34.1
When the GPU scheduler was ported to using a struct for its
initialization parameters, it was overlooked that panfrost creates a
distinct workqueue for timeout handling.
The pointer to this new workqueue is not initialized to the struct,
resulting in NULL being passed to the scheduler, which then uses the
system_wq for timeout handling.
Set the correct workqueue to the init args struct.
Cc: stable(a)vger.kernel.org # 6.15+
Fixes: 796a9f55a8d1 ("drm/sched: Use struct for drm_sched_init() params")
Reported-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Closes: https://lore.kernel.org/dri-devel/b5d0921c-7cbf-4d55-aa47-c35cd7861c02@igal…
Signed-off-by: Philipp Stanner <phasta(a)kernel.org>
---
drivers/gpu/drm/panfrost/panfrost_job.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 5657106c2f7d..15e2d505550f 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -841,7 +841,6 @@ int panfrost_job_init(struct panfrost_device *pfdev)
.num_rqs = DRM_SCHED_PRIORITY_COUNT,
.credit_limit = 2,
.timeout = msecs_to_jiffies(JOB_TIMEOUT_MS),
- .timeout_wq = pfdev->reset.wq,
.name = "pan_js",
.dev = pfdev->dev,
};
@@ -879,6 +878,7 @@ int panfrost_job_init(struct panfrost_device *pfdev)
pfdev->reset.wq = alloc_ordered_workqueue("panfrost-reset", 0);
if (!pfdev->reset.wq)
return -ENOMEM;
+ args.timeout_wq = pfdev->reset.wq;
for (j = 0; j < NUM_JOB_SLOTS; j++) {
js->queue[j].fence_context = dma_fence_context_alloc(1);
--
2.49.0
The SD current limit logic is updated to avoid explicitly setting the
current limit when the maximum power is 200mA (0.72W) or less, as this
is already the default value. The code now only issues a current limit
switch if a higher limit is required, and the unused
SD_SET_CURRENT_NO_CHANGE constant is removed. This reduces unnecessary
commands and simplifies the logic.
Fixes: 0aa6770000ba ("mmc: sdhci: only set 200mA support for 1.8v if 200mA is available")
Signed-off-by: Avri Altman <avri.altman(a)sandisk.com>
Cc: stable(a)vger.kernel.org
---
drivers/mmc/core/sd.c | 7 ++-----
include/linux/mmc/card.h | 1 -
2 files changed, 2 insertions(+), 6 deletions(-)
diff --git a/drivers/mmc/core/sd.c b/drivers/mmc/core/sd.c
index ec02067f03c5..cf92c5b2059a 100644
--- a/drivers/mmc/core/sd.c
+++ b/drivers/mmc/core/sd.c
@@ -554,7 +554,7 @@ static u32 sd_get_host_max_current(struct mmc_host *host)
static int sd_set_current_limit(struct mmc_card *card, u8 *status)
{
- int current_limit = SD_SET_CURRENT_NO_CHANGE;
+ int current_limit = SD_SET_CURRENT_LIMIT_200;
int err;
u32 max_current;
@@ -598,11 +598,8 @@ static int sd_set_current_limit(struct mmc_card *card, u8 *status)
else if (max_current >= 400 &&
card->sw_caps.sd3_curr_limit & SD_MAX_CURRENT_400)
current_limit = SD_SET_CURRENT_LIMIT_400;
- else if (max_current >= 200 &&
- card->sw_caps.sd3_curr_limit & SD_MAX_CURRENT_200)
- current_limit = SD_SET_CURRENT_LIMIT_200;
- if (current_limit != SD_SET_CURRENT_NO_CHANGE) {
+ if (current_limit != SD_SET_CURRENT_LIMIT_200) {
err = mmc_sd_switch(card, SD_SWITCH_SET, 3,
current_limit, status);
if (err)
diff --git a/include/linux/mmc/card.h b/include/linux/mmc/card.h
index ddcdf23d731c..e9e964c20e53 100644
--- a/include/linux/mmc/card.h
+++ b/include/linux/mmc/card.h
@@ -182,7 +182,6 @@ struct sd_switch_caps {
#define SD_SET_CURRENT_LIMIT_400 1
#define SD_SET_CURRENT_LIMIT_600 2
#define SD_SET_CURRENT_LIMIT_800 3
-#define SD_SET_CURRENT_NO_CHANGE (-1)
#define SD_MAX_CURRENT_200 (1 << SD_SET_CURRENT_LIMIT_200)
#define SD_MAX_CURRENT_400 (1 << SD_SET_CURRENT_LIMIT_400)
--
2.25.1
According to PCI/endpoint/pci-endpoint-cfs.rst, the endpoint (EP) should
only link up after `echo 1 > start` is executed.
To match the documented behavior, do not start the link automatically
when adding the EP controller.
Fixes: 75c2f26da03f ("PCI: imx6: Add i.MX PCIe EP mode support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Richard Zhu <hongxing.zhu(a)nxp.com>
Reviewed-by: Frank Li <Frank.Li(a)nxp.com>
---
drivers/pci/controller/dwc/pci-imx6.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/pci/controller/dwc/pci-imx6.c b/drivers/pci/controller/dwc/pci-imx6.c
index f5f2ac638f4b..fda03512944d 100644
--- a/drivers/pci/controller/dwc/pci-imx6.c
+++ b/drivers/pci/controller/dwc/pci-imx6.c
@@ -1468,9 +1468,6 @@ static int imx_add_pcie_ep(struct imx_pcie *imx_pcie,
pci_epc_init_notify(ep->epc);
- /* Start LTSSM. */
- imx_pcie_ltssm_enable(dev);
-
return 0;
}
--
2.37.1
apps_reset is LTSSM_EN on i.MX7, i.MX8MQ, i.MX8MM and i.MX8MP platforms.
Since the assertion/de-assertion of apps_reset(LTSSM_EN bit) had been
wrappered in imx_pcie_ltssm_enable() and imx_pcie_ltssm_disable();
Remove apps_reset toggle in imx_pcie_assert_core_reset() and
imx_pcie_deassert_core_reset() functions. Use imx_pcie_ltssm_enable()
and imx_pcie_ltssm_disable() to configure apps_reset directly.
Fix fail to enumerate reliably PI7C9X2G608GP (hotplug) at i.MX8MM, which
reported By Tim.
Only i.MX7D, i.MX8MQ, i.MX8MM, and i.MX8MP have the apps_reset. With
this change, the assertion/deassertion of LTSSM_EN bit are unified into
imx_pcie_ltssm_enable() and imx_pcie_ltssm_disable() functions, and
aligned with other i.MX platforms.
Reported-by: Tim Harvey <tharvey(a)gateworks.com>
Closes: https://lore.kernel.org/all/CAJ+vNU3ohR2YKTwC4xoYrc1z-neDoH2TTZcMHDy+poj9=j…
Fixes: ef61c7d8d032 ("PCI: imx6: Deassert apps_reset in imx_pcie_deassert_core_reset()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Richard Zhu <hongxing.zhu(a)nxp.com>
Reviewed-by: Frank Li <Frank.Li(a)nxp.com>
Tested-by: Tim Harvey <tharvey(a)gateworks.com> # imx8mp-venice-gw74xx (i.MX8MP + hotplug capable switch)
---
drivers/pci/controller/dwc/pci-imx6.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/controller/dwc/pci-imx6.c b/drivers/pci/controller/dwc/pci-imx6.c
index 9754cc6e09b9..f5f2ac638f4b 100644
--- a/drivers/pci/controller/dwc/pci-imx6.c
+++ b/drivers/pci/controller/dwc/pci-imx6.c
@@ -860,7 +860,6 @@ static int imx95_pcie_core_reset(struct imx_pcie *imx_pcie, bool assert)
static void imx_pcie_assert_core_reset(struct imx_pcie *imx_pcie)
{
reset_control_assert(imx_pcie->pciephy_reset);
- reset_control_assert(imx_pcie->apps_reset);
if (imx_pcie->drvdata->core_reset)
imx_pcie->drvdata->core_reset(imx_pcie, true);
@@ -872,7 +871,6 @@ static void imx_pcie_assert_core_reset(struct imx_pcie *imx_pcie)
static int imx_pcie_deassert_core_reset(struct imx_pcie *imx_pcie)
{
reset_control_deassert(imx_pcie->pciephy_reset);
- reset_control_deassert(imx_pcie->apps_reset);
if (imx_pcie->drvdata->core_reset)
imx_pcie->drvdata->core_reset(imx_pcie, false);
@@ -1247,6 +1245,9 @@ static int imx_pcie_host_init(struct dw_pcie_rp *pp)
}
}
+ /* Make sure that PCIe LTSSM is cleared */
+ imx_pcie_ltssm_disable(dev);
+
ret = imx_pcie_deassert_core_reset(imx_pcie);
if (ret < 0) {
dev_err(dev, "pcie deassert core reset failed: %d\n", ret);
--
2.37.1
The patch titled
Subject: mm/damon/core: commit damos->target_nid
has been added to the -mm mm-new branch. Its filename is
mm-damon-core-commit-damos-target_nid.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-new branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others take
notice and to finish up reviews. Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fixup patches in mm-new.
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Bijan Tabatabai <bijantabatab(a)micron.com>
Subject: mm/damon/core: commit damos->target_nid
Date: Tue, 8 Jul 2025 19:47:29 -0500
When committing new scheme parameters from the sysfs, the target_nid field
of the damos struct would not be copied. This would result in the
target_nid field to retain its original value, despite being updated in
the sysfs interface.
This patch fixes this issue by copying target_nid in damos_commit().
Link: https://lkml.kernel.org/r/20250709004729.17252-1-bijan311@gmail.com
Fixes: 83dc7bbaecae ("mm/damon/sysfs: use damon_commit_ctx()")
Signed-off-by: Bijan Tabatabai <bijantabatab(a)micron.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Ravi Shankar Jonnalagadda <ravis.opensrc(a)micron.com>
Cc: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/core.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/damon/core.c~mm-damon-core-commit-damos-target_nid
+++ a/mm/damon/core.c
@@ -978,6 +978,7 @@ static int damos_commit(struct damos *ds
return err;
dst->wmarks = src->wmarks;
+ dst->target_nid = src->target_nid;
err = damos_commit_filters(dst, src);
return err;
_
Patches currently in -mm which might be from bijantabatab(a)micron.com are
mm-damon-core-commit-damos-target_nid.patch
mm-damon-core-commit-damos-migrate_dests.patch
mm-damon-move-migration-helpers-from-paddr-to-ops-common.patch
mm-damon-vaddr-add-vaddr-versions-of-migrate_hotcold.patch
docs-mm-damon-design-document-vaddr-support-for-migrate_hotcold.patch
mm-damon-vaddr-use-damos-migrate_dests-in-migrate_hotcold.patch
mm-damon-move-folio-filtering-from-paddr-to-ops-common.patch
mm-damon-vaddr-apply-filters-in-migrate_hot-cold.patch
From: Bijan Tabatabai <bijantabatab(a)micron.com>
When committing new scheme parameters from the sysfs, the target_nid
field of the damos struct would not be copied. This would result in the
target_nid field to retain its original value, despite being updated in
the sysfs interface.
This patch fixes this issue by copying target_nid in damos_commit().
Cc: stable(a)vger.kernel.org
Fixes: 83dc7bbaecae ("mm/damon/sysfs: use damon_commit_ctx()")
Signed-off-by: Bijan Tabatabai <bijantabatab(a)micron.com>
---
mm/damon/core.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/damon/core.c b/mm/damon/core.c
index 5357a18066b0..ab28ab5d08f6 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -978,6 +978,7 @@ static int damos_commit(struct damos *dst, struct damos *src)
return err;
dst->wmarks = src->wmarks;
+ dst->target_nid = src->target_nid;
err = damos_commit_filters(dst, src);
return err;
--
2.43.0
The patch titled
Subject: selftests/mm: fix split_huge_page_test for folio_split() tests.
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
selftests-mm-fix-split_huge_page_test-for-folio_split-tests.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Zi Yan <ziy(a)nvidia.com>
Subject: selftests/mm: fix split_huge_page_test for folio_split() tests.
Date: Tue, 8 Jul 2025 21:27:59 -0400
PID_FMT does not have an offset field, so folio_split() tests are not
performed. Add PID_FMT_OFFSET with an offset field and use it to perform
folio_split() tests.
Link: https://lkml.kernel.org/r/20250709012800.3225727-1-ziy@nvidia.com
Fixes: 80a5c494c89f ("selftests/mm: add tests for folio_split(), buddy allocator like split")
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Barry Song <baohua(a)kernel.org>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Dev Jain <dev.jain(a)arm.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Mariano Pache <npache(a)redhat.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/split_huge_page_test.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/tools/testing/selftests/mm/split_huge_page_test.c~selftests-mm-fix-split_huge_page_test-for-folio_split-tests
+++ a/tools/testing/selftests/mm/split_huge_page_test.c
@@ -31,6 +31,7 @@ uint64_t pmd_pagesize;
#define INPUT_MAX 80
#define PID_FMT "%d,0x%lx,0x%lx,%d"
+#define PID_FMT_OFFSET "%d,0x%lx,0x%lx,%d,%d"
#define PATH_FMT "%s,0x%lx,0x%lx,%d"
#define PFN_MASK ((1UL<<55)-1)
@@ -483,7 +484,7 @@ void split_thp_in_pagecache_to_order_at(
write_debugfs(PID_FMT, getpid(), (uint64_t)addr,
(uint64_t)addr + fd_size, order);
else
- write_debugfs(PID_FMT, getpid(), (uint64_t)addr,
+ write_debugfs(PID_FMT_OFFSET, getpid(), (uint64_t)addr,
(uint64_t)addr + fd_size, order, offset);
for (i = 0; i < fd_size; i++)
_
Patches currently in -mm which might be from ziy(a)nvidia.com are
selftests-mm-fix-split_huge_page_test-for-folio_split-tests.patch
mm-rename-config_page_block_order-to-config_page_block_max_order.patch
mm-page_alloc-pageblock-flags-functions-clean-up.patch
mm-page_isolation-make-page-isolation-a-standalone-bit.patch
mm-page_alloc-add-support-for-initializing-pageblock-as-isolated.patch
mm-page_isolation-remove-migratetype-from-move_freepages_block_isolate.patch
mm-page_isolation-remove-migratetype-from-undo_isolate_page_range.patch
mm-page_isolation-remove-migratetype-parameter-from-more-functions.patch
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 0ea148a799198518d8ebab63ddd0bb6114a103bc
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025063052-strive-fabulous-239b@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0ea148a799198518d8ebab63ddd0bb6114a103bc Mon Sep 17 00:00:00 2001
From: Kairui Song <kasong(a)tencent.com>
Date: Wed, 4 Jun 2025 23:10:38 +0800
Subject: [PATCH] mm: userfaultfd: fix race of userfaultfd_move and swap cache
This commit fixes two kinds of races, they may have different results:
Barry reported a BUG_ON in commit c50f8e6053b0, we may see the same
BUG_ON if the filemap lookup returned NULL and folio is added to swap
cache after that.
If another kind of race is triggered (folio changed after lookup) we
may see RSS counter is corrupted:
[ 406.893936] BUG: Bad rss-counter state mm:ffff0000c5a9ddc0
type:MM_ANONPAGES val:-1
[ 406.894071] BUG: Bad rss-counter state mm:ffff0000c5a9ddc0
type:MM_SHMEMPAGES val:1
Because the folio is being accounted to the wrong VMA.
I'm not sure if there will be any data corruption though, seems no.
The issues above are critical already.
On seeing a swap entry PTE, userfaultfd_move does a lockless swap cache
lookup, and tries to move the found folio to the faulting vma. Currently,
it relies on checking the PTE value to ensure that the moved folio still
belongs to the src swap entry and that no new folio has been added to the
swap cache, which turns out to be unreliable.
While working and reviewing the swap table series with Barry, following
existing races are observed and reproduced [1]:
In the example below, move_pages_pte is moving src_pte to dst_pte, where
src_pte is a swap entry PTE holding swap entry S1, and S1 is not in the
swap cache:
CPU1 CPU2
userfaultfd_move
move_pages_pte()
entry = pte_to_swp_entry(orig_src_pte);
// Here it got entry = S1
... < interrupted> ...
<swapin src_pte, alloc and use folio A>
// folio A is a new allocated folio
// and get installed into src_pte
<frees swap entry S1>
// src_pte now points to folio A, S1
// has swap count == 0, it can be freed
// by folio_swap_swap or swap
// allocator's reclaim.
<try to swap out another folio B>
// folio B is a folio in another VMA.
<put folio B to swap cache using S1 >
// S1 is freed, folio B can use it
// for swap out with no problem.
...
folio = filemap_get_folio(S1)
// Got folio B here !!!
... < interrupted again> ...
<swapin folio B and free S1>
// Now S1 is free to be used again.
<swapout src_pte & folio A using S1>
// Now src_pte is a swap entry PTE
// holding S1 again.
folio_trylock(folio)
move_swap_pte
double_pt_lock
is_pte_pages_stable
// Check passed because src_pte == S1
folio_move_anon_rmap(...)
// Moved invalid folio B here !!!
The race window is very short and requires multiple collisions of multiple
rare events, so it's very unlikely to happen, but with a deliberately
constructed reproducer and increased time window, it can be reproduced
easily.
This can be fixed by checking if the folio returned by filemap is the
valid swap cache folio after acquiring the folio lock.
Another similar race is possible: filemap_get_folio may return NULL, but
folio (A) could be swapped in and then swapped out again using the same
swap entry after the lookup. In such a case, folio (A) may remain in the
swap cache, so it must be moved too:
CPU1 CPU2
userfaultfd_move
move_pages_pte()
entry = pte_to_swp_entry(orig_src_pte);
// Here it got entry = S1, and S1 is not in swap cache
folio = filemap_get_folio(S1)
// Got NULL
... < interrupted again> ...
<swapin folio A and free S1>
<swapout folio A re-using S1>
move_swap_pte
double_pt_lock
is_pte_pages_stable
// Check passed because src_pte == S1
folio_move_anon_rmap(...)
// folio A is ignored !!!
Fix this by checking the swap cache again after acquiring the src_pte
lock. And to avoid the filemap overhead, we check swap_map directly [2].
The SWP_SYNCHRONOUS_IO path does make the problem more complex, but so far
we don't need to worry about that, since folios can only be exposed to the
swap cache in the swap out path, and this is covered in this patch by
checking the swap cache again after acquiring the src_pte lock.
Testing with a simple C program that allocates and moves several GB of
memory did not show any observable performance change.
Link: https://lkml.kernel.org/r/20250604151038.21968-1-ryncsn@gmail.com
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Kairui Song <kasong(a)tencent.com>
Closes: https://lore.kernel.org/linux-mm/CAMgjq7B1K=6OOrK2OUZ0-tqCzi+EJt+2_K97TPGoS… [1]
Link: https://lore.kernel.org/all/CAGsJ_4yJhJBo16XhiC-nUzSheyX-V3-nFE+tAi=8Y560K8… [2]
Reviewed-by: Lokesh Gidra <lokeshgidra(a)google.com>
Acked-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Suren Baghdasaryan <surenb(a)google.com>
Reviewed-by: Barry Song <baohua(a)kernel.org>
Reviewed-by: Chris Li <chrisl(a)kernel.org>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Kairui Song <kasong(a)tencent.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index bc473ad21202..8253978ee0fb 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1084,8 +1084,18 @@ static int move_swap_pte(struct mm_struct *mm, struct vm_area_struct *dst_vma,
pte_t orig_dst_pte, pte_t orig_src_pte,
pmd_t *dst_pmd, pmd_t dst_pmdval,
spinlock_t *dst_ptl, spinlock_t *src_ptl,
- struct folio *src_folio)
+ struct folio *src_folio,
+ struct swap_info_struct *si, swp_entry_t entry)
{
+ /*
+ * Check if the folio still belongs to the target swap entry after
+ * acquiring the lock. Folio can be freed in the swap cache while
+ * not locked.
+ */
+ if (src_folio && unlikely(!folio_test_swapcache(src_folio) ||
+ entry.val != src_folio->swap.val))
+ return -EAGAIN;
+
double_pt_lock(dst_ptl, src_ptl);
if (!is_pte_pages_stable(dst_pte, src_pte, orig_dst_pte, orig_src_pte,
@@ -1102,6 +1112,25 @@ static int move_swap_pte(struct mm_struct *mm, struct vm_area_struct *dst_vma,
if (src_folio) {
folio_move_anon_rmap(src_folio, dst_vma);
src_folio->index = linear_page_index(dst_vma, dst_addr);
+ } else {
+ /*
+ * Check if the swap entry is cached after acquiring the src_pte
+ * lock. Otherwise, we might miss a newly loaded swap cache folio.
+ *
+ * Check swap_map directly to minimize overhead, READ_ONCE is sufficient.
+ * We are trying to catch newly added swap cache, the only possible case is
+ * when a folio is swapped in and out again staying in swap cache, using the
+ * same entry before the PTE check above. The PTL is acquired and released
+ * twice, each time after updating the swap_map's flag. So holding
+ * the PTL here ensures we see the updated value. False positive is possible,
+ * e.g. SWP_SYNCHRONOUS_IO swapin may set the flag without touching the
+ * cache, or during the tiny synchronization window between swap cache and
+ * swap_map, but it will be gone very quickly, worst result is retry jitters.
+ */
+ if (READ_ONCE(si->swap_map[swp_offset(entry)]) & SWAP_HAS_CACHE) {
+ double_pt_unlock(dst_ptl, src_ptl);
+ return -EAGAIN;
+ }
}
orig_src_pte = ptep_get_and_clear(mm, src_addr, src_pte);
@@ -1412,7 +1441,7 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
}
err = move_swap_pte(mm, dst_vma, dst_addr, src_addr, dst_pte, src_pte,
orig_dst_pte, orig_src_pte, dst_pmd, dst_pmdval,
- dst_ptl, src_ptl, src_folio);
+ dst_ptl, src_ptl, src_folio, si, entry);
}
out: