Commit 60e3318e3e900 ("cifs: use fs_context for automounts") was
released in v6.1.54 and broke failover when one of the servers inside a
DFS namespace becomes unavailable. We reproduced the problem on EC2
instances of different types. Reverting the aforementioned commit on top
of the latest stable version v6.1.94 resolves the problem.
The earliest working version is v6.2-rc1. There were two big merges of CIFS
fixes there: [1] and [2]. We would like to ask for help investigating this
problem and determining whether some of those patches need to be backported.
Also, is it safe to simply revert the problematic commit until proper
fixes/backports are available?
We will help with testing and confirm whether a fix works, but let me also
list the steps we used to reproduce the problem, in case they help identify
it:
1. Create an Active Directory domain, e.g. 'corp.fsxtest.local', in AWS
Directory Service with:
- three AWS FSx file systems, filesystem1..filesystem3
- three Windows servers with DFS installed as per
https://learn.microsoft.com/en-us/windows-server/storage/dfs-namespaces/dfs…:
- dfs-srv1: EC2AMAZ-2EGTM59
- dfs-srv2: EC2AMAZ-1N36PRD
- dfs-srv3: EC2AMAZ-0PAUH2U
2. Create a DFS namespace, e.g. 'dfs-namespace', in Windows Server 2008 mode
and three folder targets in it:
- referral-a mapped to filesystem1.corp.local
- referral-b mapped to filesystem2.corp.local
- referral-c mapped to filesystem3.corp.local
- local folders dfs-srv1..dfs-srv3 in C:\DFSRoots\dfs-namespace on every
Windows server. This helps to quickly identify the underlying server when
the DFS namespace is mounted.
3. Enable cifs debug logs:
```
echo 'module cifs +p' > /sys/kernel/debug/dynamic_debug/control
echo 'file fs/cifs/* +p' > /sys/kernel/debug/dynamic_debug/control
echo 7 > /proc/fs/cifs/cifsFYI
```
4. Mount the DFS namespace on an Amazon Linux 2023 instance running any
vanilla kernel v6.1.54+:
```
dmesg -c &>/dev/null
cd /mnt
mount -t cifs -o cred=/mnt/creds,echo_interval=5 \
//corp.fsxtest.local/dfs-namespace \
./dfs-namespace
```
5. List the DFS root; running ls via 'sh -c' is also required to avoid the
recursive mounts that happen during a regular 'ls' run:
```
sh -c 'ls dfs-namespace'
dfs-srv2 referral-a referral-b
```
The active DFS server is EC2AMAZ-1N36PRD; it is also listed in the mount
output:
```
[root@ip-172-31-2-82 mnt]# mount | grep dfs
//corp.fsxtest.local/dfs-namespace on /mnt/dfs-namespace type cifs (rw,relatime,vers=3.1.1,cache=strict,username=Admin,domain=corp.fsxtest.local,uid=0,noforceuid,gid=0,noforcegid,addr=172.31.11.26,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=5,actimeo=1,closetimeo=1)
//EC2AMAZ-1N36PRD.corp.fsxtest.local/dfs-namespace/referral-a on /mnt/dfs-namespace/referral-a type cifs (rw,relatime,vers=3.1.1,cache=strict,username=Admin,domain=corp.fsxtest.local,uid=0,noforceuid,gid=0,noforcegid,addr=172.31.12.80,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=5,actimeo=1,closetimeo=1)
```
List files in the first folder:
```
sh -c 'ls dfs-namespace/referral-a'
filea.txt.txt
```
6. Shut down DFS server 2.
List the DFS root again; the server changed from dfs-srv2 to dfs-srv1
(EC2AMAZ-2EGTM59):
```
sh -c 'ls dfs-namespace'
dfs-srv1 referral-a referral-b
```
7. Try to list files in another folder; this causes ls to fail with an
error:
```
sh -c 'ls dfs-namespace/referral-b'
ls: cannot access 'dfs-namespace/referral-b': No route to host
```
Sometimes the error is 'Operation now in progress' instead.
mount shows the same output as before:
```
//corp.fsxtest.local/dfs-namespace on /mnt/dfs-namespace type cifs (rw,relatime,vers=3.1.1,cache=strict,username=Admin,domain=corp.fsxtest.local,uid=0,noforceuid,gid=0,noforcegid,addr=172.31.11.26,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=5,actimeo=1,closetimeo=1)
//EC2AMAZ-1N36PRD.corp.fsxtest.local/dfs-namespace/referral-a on /mnt/dfs-namespace/referral-a type cifs (rw,relatime,vers=3.1.1,cache=strict,username=Admin,domain=corp.fsxtest.local,uid=0,noforceuid,gid=0,noforcegid,addr=172.31.12.80,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=5,actimeo=1,closetimeo=1)
```
I also attached kernel debug logs from this test.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
Reported-by: Andrei Paniakin <apanyaki(a)amazon.com>
Bisected-by: Simba Bonga <simbarb(a)amazon.com>
---
#regzbot introduced: v6.1.54..v6.2-rc1
The PMIC GLINK driver is currently generating DisplayPort hotplug
notifications whenever something is connected to (or disconnected from)
a port regardless of the type of notification sent by the firmware.
These notifications are forwarded to user space by the DRM subsystem as
connector "change" uevents:
KERNEL[1556.223776] change /devices/platform/soc(a)0/ae00000.display-subsystem/ae01000.display-controller/drm/card0 (drm)
ACTION=change
DEVPATH=/devices/platform/soc(a)0/ae00000.display-subsystem/ae01000.display-controller/drm/card0
SUBSYSTEM=drm
HOTPLUG=1
CONNECTOR=36
DEVNAME=/dev/dri/card0
DEVTYPE=drm_minor
SEQNUM=4176
MAJOR=226
MINOR=0
On the Lenovo ThinkPad X13s and T14s, the PMIC GLINK firmware sends two
identical notifications with orientation information when connecting a
charger, each generating a bogus DRM hotplug event. On the X13s, two
such notifications are also sent every 90 seconds while a charger remains
connected, and these again are forwarded to user space:
port = 1, svid = ff00, mode = 255, hpd_state = 0
payload = 01 00 00 00 00 00 00 ff 00 00 00 00 00 00 00 00
Note that the firmware only sends one of these notifications when
connecting an ethernet adapter.
Fix the spurious hotplug events by only forwarding hotplug notifications
for the Type-C DisplayPort service id. This also reduces the number of
uevents from four to two when an actual DisplayPort altmode device is
connected:
port = 0, svid = ff01, mode = 2, hpd_state = 0
payload = 00 01 02 00 f2 0c 01 ff 03 00 00 00 00 00 00 00
port = 0, svid = ff01, mode = 2, hpd_state = 1
payload = 00 01 02 00 f2 0c 01 ff 43 00 00 00 00 00 00 00
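For clarity, the gist of the change, condensed from the diff below: the DRM
hotplug state is only forwarded on the Type-C DisplayPort SVID path, while
other notifications (such as the charger ones above) only configure USB:
```
if (alt_port->svid == USB_TYPEC_DP_SID) {
	/* enter safe mode or enable DP, depending on alt_port->mode ... */

	drm_aux_hpd_bridge_notify(&alt_port->bridge->dev,
				  alt_port->hpd_state ?
					connector_status_connected :
					connector_status_disconnected);
} else {
	/* non-DP notification: configure USB, no DRM uevent */
	pmic_glink_altmode_enable_usb(altmode, alt_port);
}
```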
Fixes: 080b4e24852b ("soc: qcom: pmic_glink: Introduce altmode support")
Cc: stable(a)vger.kernel.org # 6.3
Cc: Bjorn Andersson <andersson(a)kernel.org>
Reported-by: Clayton Craft <clayton(a)craftyguy.net>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
Clayton reported seeing display flickering with recent RC kernels, which
may be related to these spurious events being generated with even greater
frequency.
That remains to be fully understood, but the spurious events, which on the
X13s are generated every 90 seconds, should be fixed either way.
Johan
drivers/soc/qcom/pmic_glink_altmode.c | 30 +++++++++++++++++----------
1 file changed, 19 insertions(+), 11 deletions(-)
diff --git a/drivers/soc/qcom/pmic_glink_altmode.c b/drivers/soc/qcom/pmic_glink_altmode.c
index bd06ce161804..7f11acd33323 100644
--- a/drivers/soc/qcom/pmic_glink_altmode.c
+++ b/drivers/soc/qcom/pmic_glink_altmode.c
@@ -218,21 +218,29 @@ static void pmic_glink_altmode_worker(struct work_struct *work)
{
struct pmic_glink_altmode_port *alt_port = work_to_altmode_port(work);
struct pmic_glink_altmode *altmode = alt_port->altmode;
+ enum drm_connector_status conn_status;
typec_switch_set(alt_port->typec_switch, alt_port->orientation);
- if (alt_port->svid == USB_TYPEC_DP_SID && alt_port->mode == 0xff)
- pmic_glink_altmode_safe(altmode, alt_port);
- else if (alt_port->svid == USB_TYPEC_DP_SID)
- pmic_glink_altmode_enable_dp(altmode, alt_port, alt_port->mode,
- alt_port->hpd_state, alt_port->hpd_irq);
- else
- pmic_glink_altmode_enable_usb(altmode, alt_port);
+ if (alt_port->svid == USB_TYPEC_DP_SID) {
+ if (alt_port->mode == 0xff) {
+ pmic_glink_altmode_safe(altmode, alt_port);
+ } else {
+ pmic_glink_altmode_enable_dp(altmode, alt_port,
+ alt_port->mode,
+ alt_port->hpd_state,
+ alt_port->hpd_irq);
+ }
- drm_aux_hpd_bridge_notify(&alt_port->bridge->dev,
- alt_port->hpd_state ?
- connector_status_connected :
- connector_status_disconnected);
+ if (alt_port->hpd_state)
+ conn_status = connector_status_connected;
+ else
+ conn_status = connector_status_disconnected;
+
+ drm_aux_hpd_bridge_notify(&alt_port->bridge->dev, conn_status);
+ } else {
+ pmic_glink_altmode_enable_usb(altmode, alt_port);
+ }
pmic_glink_altmode_request(altmode, ALTMODE_PAN_ACK, alt_port->index);
}
--
2.48.1
Some of our devices crash in tb_cfg_request_dequeue():
general protection fault, probably for non-canonical address 0xdead000000000122
CPU: 6 PID: 91007 Comm: kworker/6:2 Tainted: G U W 6.6.65
RIP: 0010:tb_cfg_request_dequeue+0x2d/0xa0
Call Trace:
<TASK>
? tb_cfg_request_dequeue+0x2d/0xa0
tb_cfg_request_work+0x33/0x80
worker_thread+0x386/0x8f0
kthread+0xed/0x110
ret_from_fork+0x38/0x50
ret_from_fork_asm+0x1b/0x30
The circumstances are unclear; however, the theory is that
tb_cfg_request_work() can be scheduled twice for a request:
the first time via frame.callback from ring_work() and the
second time from tb_cfg_request(). Both times kworkers will execute
tb_cfg_request_dequeue(), which results in a double list_del()
from the ctl->request_queue (the list poison dereference hints
at it: 0xdead000000000122).
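For reference, the faulting address matches the kernel's list poisoning; a
minimal sketch of why the second list_del() faults at exactly
0xdead000000000122, assuming the default CONFIG_ILLEGAL_POINTER_VALUE of
0xdead000000000000 on x86_64:
```
/* Simplified from include/linux/poison.h and include/linux/list.h */
#define LIST_POISON1  ((void *) 0xdead000000000100)
#define LIST_POISON2  ((void *) 0xdead000000000122)

static inline void list_del(struct list_head *entry)
{
	/* unlinks entry, writing entry->prev->next and entry->next->prev */
	__list_del(entry->prev, entry->next);
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
}

/*
 * A second list_del() on the same entry passes the poisoned pointers to
 * __list_del(), which dereferences entry->prev == LIST_POISON2 -- the
 * non-canonical address seen in the GPF above.
 */
```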
Do not dequeue requests that don't have TB_CFG_REQUEST_ACTIVE
bit set.
Signed-off-by: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Cc: stable(a)vger.kernel.org
---
v3: tweaked commit message
drivers/thunderbolt/ctl.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/thunderbolt/ctl.c b/drivers/thunderbolt/ctl.c
index cd15e84c47f4..1db2e951b53f 100644
--- a/drivers/thunderbolt/ctl.c
+++ b/drivers/thunderbolt/ctl.c
@@ -151,6 +151,11 @@ static void tb_cfg_request_dequeue(struct tb_cfg_request *req)
struct tb_ctl *ctl = req->ctl;
mutex_lock(&ctl->request_queue_lock);
+ if (!test_bit(TB_CFG_REQUEST_ACTIVE, &req->flags)) {
+ mutex_unlock(&ctl->request_queue_lock);
+ return;
+ }
+
list_del(&req->list);
clear_bit(TB_CFG_REQUEST_ACTIVE, &req->flags);
if (test_bit(TB_CFG_REQUEST_CANCELED, &req->flags))
--
2.49.0.395.g12beb8f557-goog
With UBSAN enabled, we're getting the following trace:
UBSAN: array-index-out-of-bounds in .../drivers/clk/clk-s2mps11.c:186:3
index 0 is out of range for type 'struct clk_hw *[] __counted_by(num)' (aka 'struct clk_hw *[]')
This is because commit f316cdff8d67 ("clk: Annotate struct
clk_hw_onecell_data with __counted_by") annotated the hws member of
that struct with __counted_by, which informs the bounds sanitizer about
the number of elements in hws, so that it can warn when hws is accessed
out of bounds.
As noted in that change, the __counted_by member must be initialised
with the number of elements before the first array access happens,
otherwise there will be a warning from each access prior to the
initialisation because the number of elements is zero. This occurs in
s2mps11_clk_probe() due to ::num being assigned after ::hws access.
Move the assignment to satisfy the requirement of assign-before-access.
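For illustration, a condensed sketch of the assign-before-access rule as it
applies here (simplified from the driver; error paths and the surrounding
probe logic are elided):
```
struct clk_hw_onecell_data {
	unsigned int num;
	struct clk_hw *hws[] __counted_by(num);
};

clk_data = devm_kzalloc(&pdev->dev,
			struct_size(clk_data, hws, S2MPS11_CLKS_NUM),
			GFP_KERNEL);
if (!clk_data)
	return -ENOMEM;

/* Must be set before the first clk_data->hws[i] access, otherwise the
 * bounds sanitizer believes the array has zero elements. */
clk_data->num = S2MPS11_CLKS_NUM;

for (i = 0; i < S2MPS11_CLKS_NUM; i++)
	clk_data->hws[i] = &s2mps11_clks[i].hw;
```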
Cc: stable(a)vger.kernel.org
Fixes: f316cdff8d67 ("clk: Annotate struct clk_hw_onecell_data with __counted_by")
Signed-off-by: André Draszik <andre.draszik(a)linaro.org>
---
drivers/clk/clk-s2mps11.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/clk/clk-s2mps11.c b/drivers/clk/clk-s2mps11.c
index 014db6386624071e173b5b940466301d2596400a..8ddf3a9a53dfd5bb52a05a3e02788a357ea77ad3 100644
--- a/drivers/clk/clk-s2mps11.c
+++ b/drivers/clk/clk-s2mps11.c
@@ -137,6 +137,8 @@ static int s2mps11_clk_probe(struct platform_device *pdev)
if (!clk_data)
return -ENOMEM;
+ clk_data->num = S2MPS11_CLKS_NUM;
+
switch (hwid) {
case S2MPS11X:
s2mps11_reg = S2MPS11_REG_RTC_CTRL;
@@ -186,7 +188,6 @@ static int s2mps11_clk_probe(struct platform_device *pdev)
clk_data->hws[i] = &s2mps11_clks[i].hw;
}
- clk_data->num = S2MPS11_CLKS_NUM;
of_clk_add_hw_provider(s2mps11_clks->clk_np, of_clk_hw_onecell_get,
clk_data);
---
base-commit: 9388ec571cb1adba59d1cded2300eeb11827679c
change-id: 20250326-s2mps11-ubsan-c90978e7bc04
Best regards,
--
André Draszik <andre.draszik(a)linaro.org>
From: Fabio Estevam <festevam(a)denx.de>
Since commit 2718f15403fb ("iio: sanity check available_scan_masks array"),
verbose and misleading warnings are printed for devices like the MAX11601:
max1363 1-0064: available_scan_mask 8 subset of 0. Never used
max1363 1-0064: available_scan_mask 9 subset of 0. Never used
max1363 1-0064: available_scan_mask 10 subset of 0. Never used
max1363 1-0064: available_scan_mask 11 subset of 0. Never used
max1363 1-0064: available_scan_mask 12 subset of 0. Never used
max1363 1-0064: available_scan_mask 13 subset of 0. Never used
...
[warnings continue]
These warnings incorrectly report that later scan masks are subsets of
the first one, even when they are not. The issue lies in the logic that
checks for subset relationships between scan masks.
Fix the subset detection to correctly compare each mask only against
previous masks, and only warn when a true subset is found.
With this fix, the warning output becomes both correct and more
informative:
max1363 1-0064: Mask 7 (0xc) is a subset of mask 6 (0xf) and will be ignored
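As a quick check of the new helper on the values from that warning
(single-word masks assumed):
```
/* a is a subset of b iff a has no bits set outside b */
static int is_subset(unsigned long a, unsigned long b)
{
	return (a & ~b) == 0;
}

/* 0xc & ~0xf == 0x0 -> mask 7 (0xc) is covered by mask 6 (0xf): warn */
/* 0xf & ~0xc == 0x3 -> mask 6 (0xf) is not a subset of mask 7: silent */
```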
Cc: stable(a)vger.kernel.org
Fixes: 2718f15403fb ("iio: sanity check available_scan_masks array")
Signed-off-by: Fabio Estevam <festevam(a)denx.de>
---
drivers/iio/industrialio-core.c | 23 ++++++++++-------------
1 file changed, 10 insertions(+), 13 deletions(-)
diff --git a/drivers/iio/industrialio-core.c b/drivers/iio/industrialio-core.c
index 6a6568d4a2cb..855d5fd3e6b2 100644
--- a/drivers/iio/industrialio-core.c
+++ b/drivers/iio/industrialio-core.c
@@ -1904,6 +1904,11 @@ static int iio_check_extended_name(const struct iio_dev *indio_dev)
static const struct iio_buffer_setup_ops noop_ring_setup_ops;
+static int is_subset(unsigned long a, unsigned long b)
+{
+ return (a & ~b) == 0;
+}
+
static void iio_sanity_check_avail_scan_masks(struct iio_dev *indio_dev)
{
unsigned int num_masks, masklength, longs_per_mask;
@@ -1947,21 +1952,13 @@ static void iio_sanity_check_avail_scan_masks(struct iio_dev *indio_dev)
* available masks in the order of preference (presumably the least
* costy to access masks first).
*/
- for (i = 0; i < num_masks - 1; i++) {
- const unsigned long *mask1;
- int j;
- mask1 = av_masks + i * longs_per_mask;
- for (j = i + 1; j < num_masks; j++) {
- const unsigned long *mask2;
-
- mask2 = av_masks + j * longs_per_mask;
- if (bitmap_subset(mask2, mask1, masklength))
+ for (i = 1; i < num_masks; ++i)
+ for (int j = 0; j < i; ++j)
+ if (is_subset(av_masks[i], av_masks[j]))
dev_warn(indio_dev->dev.parent,
- "available_scan_mask %d subset of %d. Never used\n",
- j, i);
- }
- }
+ "Mask %d (0x%lx) is a subset of mask %d (0x%lx) and will be ignored\n",
+ i, av_masks[i], j, av_masks[j]);
}
/**
--
2.34.1
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 90abee6d7895d5eef18c91d870d8168be4e76e9d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025042150-hardiness-hunting-0780@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 90abee6d7895d5eef18c91d870d8168be4e76e9d Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes(a)cmpxchg.org>
Date: Mon, 7 Apr 2025 14:01:53 -0400
Subject: [PATCH] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The test robot identified c2f6ea38fc1b ("mm: page_alloc: don't steal
single pages from biggest buddy") as the root cause of a 56.4% regression
in vm-scalability::lru-file-mmap-read.
Carlos reports an earlier patch, c0cd6f557b90 ("mm: page_alloc: fix
freelist movement during block conversion"), as the root cause for a
regression in worst-case zone->lock+irqoff hold times.
Both of these patches modify the page allocator's fallback path to be less
greedy in an effort to stave off fragmentation. The flip side of this is
that fallbacks are also less productive each time around, which means the
fallback search can run much more frequently.
Carlos' traces point to rmqueue_bulk() specifically, which tries to refill
the percpu cache by allocating a large batch of pages in a loop. It
highlights how once the native freelists are exhausted, the fallback code
first scans orders top-down for whole blocks to claim, then falls back to
a bottom-up search for the smallest buddy to steal. For the next batch
page, it goes through the same thing again.
This can be made more efficient. Since rmqueue_bulk() holds the
zone->lock over the entire batch, the freelists are not subject to outside
changes; when the search for a block to claim has already failed, there is
no point in trying again for the next page.
Modify __rmqueue() to remember the last successful fallback mode, and
restart directly from there on the next rmqueue_bulk() iteration.
Oliver confirms that this improves beyond the regression that the test
robot reported against c2f6ea38fc1b:
commit:
f3b92176f4 ("tools/selftests: add guard region test for /proc/$pid/pagemap")
c2f6ea38fc ("mm: page_alloc: don't steal single pages from biggest buddy")
acc4d5ff0b ("Merge tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
2c847f27c3 ("mm: page_alloc: speed up fallbacks in rmqueue_bulk()") <--- your patch
f3b92176f4f7100f c2f6ea38fc1b640aa7a2e155cc1 acc4d5ff0b61eb1715c498b6536 2c847f27c37da65a93d23c237c5
---------------- --------------------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev %change %stddev
\ | \ | \ | \
25525364 ± 3% -56.4% 11135467 -57.8% 10779336 +31.6% 33581409 vm-scalability.throughput
Carlos confirms that worst-case times are almost fully recovered
compared to before the earlier culprit patch:
2dd482ba627d (before freelist hygiene): 1ms
c0cd6f557b90 (after freelist hygiene): 90ms
next-20250319 (steal smallest buddy): 280ms
this patch : 8ms
[jackmanb(a)google.com: comment updates]
Link: https://lkml.kernel.org/r/D92AC0P9594X.3BML64MUKTF8Z@google.com
[hannes(a)cmpxchg.org: reset rmqueue_mode in rmqueue_buddy() error loop, per Yunsheng Lin]
Link: https://lkml.kernel.org/r/20250409140023.GA2313@cmpxchg.org
Link: https://lkml.kernel.org/r/20250407180154.63348-1-hannes@cmpxchg.org
Fixes: c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion")
Fixes: c2f6ea38fc1b ("mm: page_alloc: don't steal single pages from biggest buddy")
Signed-off-by: Johannes Weiner <hannes(a)cmpxchg.org>
Signed-off-by: Brendan Jackman <jackmanb(a)google.com>
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Reported-by: Carlos Song <carlos.song(a)nxp.com>
Tested-by: Carlos Song <carlos.song(a)nxp.com>
Tested-by: kernel test robot <oliver.sang(a)intel.com>
Closes: https://lore.kernel.org/oe-lkp/202503271547.fc08b188-lkp@intel.com
Reviewed-by: Brendan Jackman <jackmanb(a)google.com>
Tested-by: Shivank Garg <shivankg(a)amd.com>
Acked-by: Zi Yan <ziy(a)nvidia.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org> [6.10+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9a219fe8e130..1715e34b91af 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2183,23 +2183,15 @@ try_to_claim_block(struct zone *zone, struct page *page,
}
/*
- * Try finding a free buddy page on the fallback list.
- *
- * This will attempt to claim a whole pageblock for the requested type
- * to ensure grouping of such requests in the future.
- *
- * If a whole block cannot be claimed, steal an individual page, regressing to
- * __rmqueue_smallest() logic to at least break up as little contiguity as
- * possible.
+ * Try to allocate from some fallback migratetype by claiming the entire block,
+ * i.e. converting it to the allocation's start migratetype.
*
* The use of signed ints for order and current_order is a deliberate
* deviation from the rest of this file, to make the for loop
* condition simpler.
- *
- * Return the stolen page, or NULL if none can be found.
*/
static __always_inline struct page *
-__rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
+__rmqueue_claim(struct zone *zone, int order, int start_migratetype,
unsigned int alloc_flags)
{
struct free_area *area;
@@ -2237,14 +2229,29 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
page = try_to_claim_block(zone, page, current_order, order,
start_migratetype, fallback_mt,
alloc_flags);
- if (page)
- goto got_one;
+ if (page) {
+ trace_mm_page_alloc_extfrag(page, order, current_order,
+ start_migratetype, fallback_mt);
+ return page;
+ }
}
- if (alloc_flags & ALLOC_NOFRAGMENT)
- return NULL;
+ return NULL;
+}
+
+/*
+ * Try to steal a single page from some fallback migratetype. Leave the rest of
+ * the block as its current migratetype, potentially causing fragmentation.
+ */
+static __always_inline struct page *
+__rmqueue_steal(struct zone *zone, int order, int start_migratetype)
+{
+ struct free_area *area;
+ int current_order;
+ struct page *page;
+ int fallback_mt;
+ bool claim_block;
- /* No luck claiming pageblock. Find the smallest fallback page */
for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) {
area = &(zone->free_area[current_order]);
fallback_mt = find_suitable_fallback(area, current_order,
@@ -2254,25 +2261,28 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
page = get_page_from_free_area(area, fallback_mt);
page_del_and_expand(zone, page, order, current_order, fallback_mt);
- goto got_one;
+ trace_mm_page_alloc_extfrag(page, order, current_order,
+ start_migratetype, fallback_mt);
+ return page;
}
return NULL;
-
-got_one:
- trace_mm_page_alloc_extfrag(page, order, current_order,
- start_migratetype, fallback_mt);
-
- return page;
}
+enum rmqueue_mode {
+ RMQUEUE_NORMAL,
+ RMQUEUE_CMA,
+ RMQUEUE_CLAIM,
+ RMQUEUE_STEAL,
+};
+
/*
* Do the hard work of removing an element from the buddy allocator.
* Call me with the zone->lock already held.
*/
static __always_inline struct page *
__rmqueue(struct zone *zone, unsigned int order, int migratetype,
- unsigned int alloc_flags)
+ unsigned int alloc_flags, enum rmqueue_mode *mode)
{
struct page *page;
@@ -2291,16 +2301,48 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
}
}
- page = __rmqueue_smallest(zone, order, migratetype);
- if (unlikely(!page)) {
- if (alloc_flags & ALLOC_CMA)
+ /*
+ * First try the freelists of the requested migratetype, then try
+ * fallbacks modes with increasing levels of fragmentation risk.
+ *
+ * The fallback logic is expensive and rmqueue_bulk() calls in
+ * a loop with the zone->lock held, meaning the freelists are
+ * not subject to any outside changes. Remember in *mode where
+ * we found pay dirt, to save us the search on the next call.
+ */
+ switch (*mode) {
+ case RMQUEUE_NORMAL:
+ page = __rmqueue_smallest(zone, order, migratetype);
+ if (page)
+ return page;
+ fallthrough;
+ case RMQUEUE_CMA:
+ if (alloc_flags & ALLOC_CMA) {
page = __rmqueue_cma_fallback(zone, order);
-
- if (!page)
- page = __rmqueue_fallback(zone, order, migratetype,
- alloc_flags);
+ if (page) {
+ *mode = RMQUEUE_CMA;
+ return page;
+ }
+ }
+ fallthrough;
+ case RMQUEUE_CLAIM:
+ page = __rmqueue_claim(zone, order, migratetype, alloc_flags);
+ if (page) {
+ /* Replenished preferred freelist, back to normal mode. */
+ *mode = RMQUEUE_NORMAL;
+ return page;
+ }
+ fallthrough;
+ case RMQUEUE_STEAL:
+ if (!(alloc_flags & ALLOC_NOFRAGMENT)) {
+ page = __rmqueue_steal(zone, order, migratetype);
+ if (page) {
+ *mode = RMQUEUE_STEAL;
+ return page;
+ }
+ }
}
- return page;
+ return NULL;
}
/*
@@ -2312,6 +2354,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
unsigned long count, struct list_head *list,
int migratetype, unsigned int alloc_flags)
{
+ enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
unsigned long flags;
int i;
@@ -2323,7 +2366,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
}
for (i = 0; i < count; ++i) {
struct page *page = __rmqueue(zone, order, migratetype,
- alloc_flags);
+ alloc_flags, &rmqm);
if (unlikely(page == NULL))
break;
@@ -2948,7 +2991,9 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
if (alloc_flags & ALLOC_HIGHATOMIC)
page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
if (!page) {
- page = __rmqueue(zone, order, migratetype, alloc_flags);
+ enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
+
+ page = __rmqueue(zone, order, migratetype, alloc_flags, &rmqm);
/*
* If the allocation fails, allow OOM handling and
The patch below does not apply to the 6.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.14.y
git checkout FETCH_HEAD
git cherry-pick -x e775278cd75f24a2758c28558c4e41b36c935740
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025042247-mounting-playlist-f479@gregkh' --subject-prefix 'PATCH 6.14.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e775278cd75f24a2758c28558c4e41b36c935740 Mon Sep 17 00:00:00 2001
From: Kenneth Graunke <kenneth(a)whitecape.org>
Date: Sun, 30 Mar 2025 12:59:23 -0400
Subject: [PATCH] drm/xe: Invalidate L3 read-only cachelines for geometry
streams too
Historically, the Vertex Fetcher unit has not been an L3 client. That
meant that, when a buffer containing vertex data was written to, it was
necessary to issue a PIPE_CONTROL::VF Cache Invalidate to invalidate any
VF L2 cachelines associated with that buffer, so the new value would be
properly read from memory.
Since Tigerlake and later, VERTEX_BUFFER_STATE and 3DSTATE_INDEX_BUFFER
have included an "L3 Bypass Enable" bit which userspace drivers can set
to request that the vertex fetcher unit snoop L3. However, unlike most
true L3 clients, the "VF Cache Invalidate" bit continues to only
invalidate the VF L2 cache - and not any associated L3 lines.
To handle that, PIPE_CONTROL has a new "L3 Read Only Cache Invalidation
Bit", which according to the docs, "controls the invalidation of the
Geometry streams cached in L3 cache at the top of the pipe." In other
words, the vertex and index buffer data that gets cached in L3 when
"L3 Bypass Disable" is set.
Mesa always sets L3 Bypass Disable so that the VF unit snoops L3, and
whenever it issues a VF Cache Invalidate, it also issues a L3 Read Only
Cache Invalidate so that both L2 and L3 vertex data is invalidated.
xe is issuing VF cache invalidates too (which handles cases like CPU
writes to a buffer between GPU batches). Because userspace may enable
L3 snooping, it needs to issue an L3 Read Only Cache Invalidate as well.
Fixes significant flickering in Firefox on Meteorlake, which was writing
to vertex buffers via the CPU between batches; the missing L3 Read Only
invalidates were causing the vertex fetcher to read stale data from L3.
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4460
Fixes: 6ef3bb60557d ("drm/xe: enable lite restore")
Cc: stable(a)vger.kernel.org # v6.13+
Signed-off-by: Kenneth Graunke <kenneth(a)whitecape.org>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Link: https://lore.kernel.org/r/20250330165923.56410-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
(cherry picked from commit 61672806b579dd5a150a042ec9383be2bbc2ae7e)
Signed-off-by: Lucas De Marchi <lucas.demarchi(a)intel.com>
diff --git a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
index a255946b6f77..8cfcd3360896 100644
--- a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
+++ b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
@@ -41,6 +41,7 @@
#define GFX_OP_PIPE_CONTROL(len) ((0x3<<29)|(0x3<<27)|(0x2<<24)|((len)-2))
+#define PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE BIT(10) /* gen12 */
#define PIPE_CONTROL0_HDC_PIPELINE_FLUSH BIT(9) /* gen12 */
#define PIPE_CONTROL_COMMAND_CACHE_INVALIDATE (1<<29)
diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
index 917fc16de866..a7582b097ae6 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops.c
+++ b/drivers/gpu/drm/xe/xe_ring_ops.c
@@ -137,7 +137,8 @@ emit_pipe_control(u32 *dw, int i, u32 bit_group_0, u32 bit_group_1, u32 offset,
static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw,
int i)
{
- u32 flags = PIPE_CONTROL_CS_STALL |
+ u32 flags0 = 0;
+ u32 flags1 = PIPE_CONTROL_CS_STALL |
PIPE_CONTROL_COMMAND_CACHE_INVALIDATE |
PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE |
PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
@@ -148,11 +149,15 @@ static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw,
PIPE_CONTROL_STORE_DATA_INDEX;
if (invalidate_tlb)
- flags |= PIPE_CONTROL_TLB_INVALIDATE;
+ flags1 |= PIPE_CONTROL_TLB_INVALIDATE;
- flags &= ~mask_flags;
+ flags1 &= ~mask_flags;
- return emit_pipe_control(dw, i, 0, flags, LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
+ if (flags1 & PIPE_CONTROL_VF_CACHE_INVALIDATE)
+ flags0 |= PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE;
+
+ return emit_pipe_control(dw, i, flags0, flags1,
+ LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
}
static int emit_store_imm_ppgtt_posted(u64 addr, u64 value,
The patch below does not apply to the 6.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.14.y
git checkout FETCH_HEAD
git cherry-pick -x 90abee6d7895d5eef18c91d870d8168be4e76e9d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025042149-busily-amaretto-e684@gregkh' --subject-prefix 'PATCH 6.14.y' HEAD^..
Possible dependencies:
thanks,
greg k-h