- Linux-stable-mirror - lists.linaro.org

[merged] sched-numa-fix-the-potential-null-pointer-dereference-in-task_numa_work.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: sched/numa: fix the potential null pointer dereference in task_numa_work() has been removed from the -mm tree. Its filename was sched-numa-fix-the-potential-null-pointer-dereference-in-task_numa_work.patch This patch was dropped because it was merged into mainline or a subsystem tree ------------------------------------------------------ From: Shawn Wang <shawnwang(a)linux.alibaba.com> Subject: sched/numa: fix the potential null pointer dereference in task_numa_work() Date: Fri, 25 Oct 2024 10:22:08 +0800 When running stress-ng-vm-segv test, we found a null pointer dereference error in task_numa_work(). Here is the backtrace: [323676.066985] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020 ...... [323676.067108] CPU: 35 PID: 2694524 Comm: stress-ng-vm-se ...... [323676.067113] pstate: 23401009 (nzCv daif +PAN -UAO +TCO +DIT +SSBS BTYPE=--) [323676.067115] pc : vma_migratable+0x1c/0xd0 [323676.067122] lr : task_numa_work+0x1ec/0x4e0 [323676.067127] sp : ffff8000ada73d20 [323676.067128] x29: ffff8000ada73d20 x28: 0000000000000000 x27: 000000003e89f010 [323676.067130] x26: 0000000000080000 x25: ffff800081b5c0d8 x24: ffff800081b27000 [323676.067133] x23: 0000000000010000 x22: 0000000104d18cc0 x21: ffff0009f7158000 [323676.067135] x20: 0000000000000000 x19: 0000000000000000 x18: ffff8000ada73db8 [323676.067138] x17: 0001400000000000 x16: ffff800080df40b0 x15: 0000000000000035 [323676.067140] x14: ffff8000ada73cc8 x13: 1fffe0017cc72001 x12: ffff8000ada73cc8 [323676.067142] x11: ffff80008001160c x10: ffff000be639000c x9 : ffff8000800f4ba4 [323676.067145] x8 : ffff000810375000 x7 : ffff8000ada73974 x6 : 0000000000000001 [323676.067147] x5 : 0068000b33e26707 x4 : 0000000000000001 x3 : ffff0009f7158000 [323676.067149] x2 : 0000000000000041 x1 : 0000000000004400 x0 : 0000000000000000 [323676.067152] Call trace: [323676.067153] vma_migratable+0x1c/0xd0 [323676.067155] task_numa_work+0x1ec/0x4e0 [323676.067157] task_work_run+0x78/0xd8 [323676.067161] do_notify_resume+0x1ec/0x290 [323676.067163] el0_svc+0x150/0x160 [323676.067167] el0t_64_sync_handler+0xf8/0x128 [323676.067170] el0t_64_sync+0x17c/0x180 [323676.067173] Code: d2888001 910003fd f9000bf3 aa0003f3 (f9401000) [323676.067177] SMP: stopping secondary CPUs [323676.070184] Starting crashdump kernel... stress-ng-vm-segv in stress-ng is used to stress test the SIGSEGV error handling function of the system, which tries to cause a SIGSEGV error on return from unmapping the whole address space of the child process. Normally this program will not cause kernel crashes. But before the munmap system call returns to user mode, a potential task_numa_work() for numa balancing could be added and executed. In this scenario, since the child process has no vma after munmap, the vma_next() in task_numa_work() will return a null pointer even if the vma iterator restarts from 0. Recheck the vma pointer before dereferencing it in task_numa_work(). Link: https://lkml.kernel.org/r/20241025022208.125527-1-shawnwang@linux.alibaba.c… Fixes: 214dbc428137 ("sched: convert to vma iterator") Signed-off-by: Shawn Wang <shawnwang(a)linux.alibaba.com> Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com> Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com> Cc: Ben Segall <bsegall(a)google.com> Cc: Dietmar Eggemann <dietmar.eggemann(a)arm.com> Cc: Ingo Molnar <mingo(a)redhat.com> Cc: Juri Lelli <juri.lelli(a)redhat.com> Cc: Mel Gorman <mgorman(a)suse.de> Cc: Peter Zijlstra <peterz(a)infradead.org> Cc: Steven Rostedt (Google) <rostedt(a)goodmis.org> Cc: Valentin Schneider <vschneid(a)redhat.com> Cc: Vincent Guittot <vincent.guittot(a)linaro.org> Cc: <stable(a)vger.kernel.org> [6.2+] Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- kernel/sched/fair.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/kernel/sched/fair.c~sched-numa-fix-the-potential-null-pointer-dereference-in-task_numa_work +++ a/kernel/sched/fair.c @@ -3369,7 +3369,7 @@ retry_pids: vma = vma_next(&vmi); } - do { + for (; vma; vma = vma_next(&vmi)) { if (!vma_migratable(vma) || !vma_policy_mof(vma) || is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_MIXEDMAP)) { trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_UNSUITABLE); @@ -3491,7 +3491,7 @@ retry_pids: */ if (vma_pids_forced) break; - } for_each_vma(vmi, vma); + } /* * If no VMAs are remaining and VMAs were skipped due to the PID _ Patches currently in -mm which might be from shawnwang(a)linux.alibaba.com are

10 months, 4 weeks

1
0
0 0

[PATCH net v3] mctp i2c: handle NULL header address

by Matt Johnston

daddr can be NULL if there is no neighbour table entry present, in that case the tx packet should be dropped. saddr will usually be set by MCTP core, but check for NULL in case a packet is transmitted by a different protocol. Fixes: f5b8abf9fc3d ("mctp i2c: MCTP I2C binding driver") Cc: stable(a)vger.kernel.org Reported-by: Dung Cao <dung(a)os.amperecomputing.com> Signed-off-by: Matt Johnston <matt(a)codeconstruct.com.au> --- Changes in v3: - Revert to simpler saddr check of v1, mention in commit message - Revert whitespace change from v2 - Link to v2: https://lore.kernel.org/r/20241021-mctp-i2c-null-dest-v2-1-4503e478517c@cod… Changes in v2: - Set saddr to device address if NULL, mention in commit message - Fix patch prefix formatting - Link to v1: https://lore.kernel.org/r/20241018-mctp-i2c-null-dest-v1-1-ba1ab52966e9@cod… --- drivers/net/mctp/mctp-i2c.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/mctp/mctp-i2c.c b/drivers/net/mctp/mctp-i2c.c index 4dc057c121f5d0fb9c9c48bf16b6933ae2f7b2ac..e70fb66879941f3937b7ffc5bc1e20a8a435a441 100644 --- a/drivers/net/mctp/mctp-i2c.c +++ b/drivers/net/mctp/mctp-i2c.c @@ -588,6 +588,9 @@ static int mctp_i2c_header_create(struct sk_buff *skb, struct net_device *dev, if (len > MCTP_I2C_MAXMTU) return -EMSGSIZE; + if (!daddr || !saddr) + return -EINVAL; + lldst = *((u8 *)daddr); llsrc = *((u8 *)saddr); --- base-commit: cb560795c8c2ceca1d36a95f0d1b2eafc4074e37 change-id: 20241018-mctp-i2c-null-dest-a0ba271e0c48 Best regards, -- Matt Johnston <matt(a)codeconstruct.com.au>

10 months, 4 weeks

3
2
0 0

[PATCH net] net: phy: dp83869: fix status reporting for 1000base-x autonegotiation

by Romain Gantois

The DP83869 PHY transceiver supports converting from RGMII to 1000base-x. In this operation mode, autonegotiation can be performed, as described in IEEE802.3. The DP83869 has a set of fiber-specific registers located at offset 0xc00. When the transceiver is configured in RGMII-to-1000base-x mode, these registers are mapped onto offset 0, which should, in theory, make reading the autonegotiation status transparent. However, the fiber registers at offset 0xc04 and 0xc05 do not follow the bit layout of their standard counterparts. Thus, genphy_read_status() doesn't properly read the capabilities advertised by the link partner, resulting in incorrect link parameters. Similarly, genphy_config_aneg() doesn't properly write advertised capabilities. Fix the 1000base-x autonegotiation procedure by replacing genphy_read_status() and genphy_config_aneg() with driver-specific functions which take into account the nonstandard bit layout of the DP83869 registers in 1000base-x mode. Fixes: a29de52ba2a1 ("net: dp83869: Add ability to advertise Fiber connection") Cc: stable(a)vger.kernel.org Signed-off-by: Romain Gantois <romain.gantois(a)bootlin.com> --- drivers/net/phy/dp83869.c | 130 ++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 127 insertions(+), 3 deletions(-) diff --git a/drivers/net/phy/dp83869.c b/drivers/net/phy/dp83869.c index 5f056d7db83eed23f1cab42365fdc566a0d8e47f..7f89a4f963cab50d6954e8b8996d7bbe2c72a9ca 100644 --- a/drivers/net/phy/dp83869.c +++ b/drivers/net/phy/dp83869.c @@ -41,6 +41,8 @@ #define DP83869_IO_MUX_CFG 0x0170 #define DP83869_OP_MODE 0x01df #define DP83869_FX_CTRL 0x0c00 +#define DP83869_FX_ANADV 0x0c04 +#define DP83869_FX_LPABL 0x0c05 #define DP83869_SW_RESET BIT(15) #define DP83869_SW_RESTART BIT(14) @@ -135,6 +137,17 @@ #define DP83869_DOWNSHIFT_4_COUNT 4 #define DP83869_DOWNSHIFT_8_COUNT 8 +/* FX_ANADV bits */ +#define DP83869_BP_FULL_DUPLEX BIT(5) +#define DP83869_BP_PAUSE BIT(7) +#define DP83869_BP_ASYMMETRIC_PAUSE BIT(8) + +/* FX_LPABL bits */ +#define DP83869_LPA_1000FULL BIT(5) +#define DP83869_LPA_PAUSE_CAP BIT(7) +#define DP83869_LPA_PAUSE_ASYM BIT(8) +#define DP83869_LPA_LPACK BIT(14) + enum { DP83869_PORT_MIRRORING_KEEP, DP83869_PORT_MIRRORING_EN, @@ -153,19 +166,129 @@ struct dp83869_private { int mode; }; +static int dp83869_config_aneg(struct phy_device *phydev) +{ + struct dp83869_private *dp83869 = phydev->priv; + unsigned long *advertising; + int err, changed = false; + u32 adv; + + if (dp83869->mode != DP83869_RGMII_1000_BASE) + return genphy_config_aneg(phydev); + + /* Forcing speed or duplex isn't supported in 1000base-x mode */ + if (phydev->autoneg != AUTONEG_ENABLE) + return 0; + + /* In fiber modes, register locations 0xc0... get mapped to offset 0. + * Unfortunately, the fiber-specific autonegotiation advertisement + * register at address 0xc04 does not have the same bit layout as the + * corresponding standard MII_ADVERTISE register. Thus, functions such + * as genphy_config_advert() will write the advertisement register + * incorrectly. + */ + advertising = phydev->advertising; + + /* Only allow advertising what this PHY supports */ + linkmode_and(advertising, advertising, + phydev->supported); + + if (linkmode_test_bit(ETHTOOL_LINK_MODE_1000baseX_Full_BIT, advertising)) + adv |= DP83869_BP_FULL_DUPLEX; + if (linkmode_test_bit(ETHTOOL_LINK_MODE_Pause_BIT, advertising)) + adv |= DP83869_BP_PAUSE; + if (linkmode_test_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT, advertising)) + adv |= DP83869_BP_ASYMMETRIC_PAUSE; + + err = phy_modify_changed(phydev, DP83869_FX_ANADV, + DP83869_BP_FULL_DUPLEX | DP83869_BP_PAUSE | + DP83869_BP_ASYMMETRIC_PAUSE, + adv); + + if (err < 0) + return err; + else if (err) + changed = true; + + return genphy_check_and_restart_aneg(phydev, changed); +} + +static int dp83869_read_status_fiber(struct phy_device *phydev) +{ + int err, lpa, old_link = phydev->link; + unsigned long *lp_advertising; + + err = genphy_update_link(phydev); + if (err) + return err; + + if (phydev->autoneg == AUTONEG_ENABLE && old_link && phydev->link) + return 0; + + phydev->speed = SPEED_UNKNOWN; + phydev->duplex = DUPLEX_UNKNOWN; + phydev->pause = 0; + phydev->asym_pause = 0; + + lp_advertising = phydev->lp_advertising; + + if (phydev->autoneg != AUTONEG_ENABLE) { + linkmode_zero(lp_advertising); + + phydev->duplex = DUPLEX_FULL; + phydev->speed = SPEED_1000; + + return 0; + } + + if (!phydev->autoneg_complete) { + linkmode_zero(lp_advertising); + return 0; + } + + /* In fiber modes, register locations 0xc0... get mapped to offset 0. + * Unfortunately, the fiber-specific link partner capabilities register + * at address 0xc05 does not have the same bit layout as the + * corresponding standard MII_LPA register. Thus, functions such as + * genphy_read_lpa() will read autonegotiation results incorrectly. + */ + + lpa = phy_read(phydev, DP83869_FX_LPABL); + if (lpa < 0) + return lpa; + + linkmode_mod_bit(ETHTOOL_LINK_MODE_1000baseX_Full_BIT, + lp_advertising, lpa & DP83869_LPA_1000FULL); + + linkmode_mod_bit(ETHTOOL_LINK_MODE_Pause_BIT, lp_advertising, + lpa & DP83869_LPA_PAUSE_CAP); + + linkmode_mod_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT, lp_advertising, + lpa & DP83869_LPA_PAUSE_ASYM); + + linkmode_mod_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, + lp_advertising, lpa & DP83869_LPA_LPACK); + + phy_resolve_aneg_linkmode(phydev); + + return 0; +} + static int dp83869_read_status(struct phy_device *phydev) { struct dp83869_private *dp83869 = phydev->priv; int ret; + if (dp83869->mode == DP83869_RGMII_1000_BASE) + return dp83869_read_status_fiber(phydev); + ret = genphy_read_status(phydev); if (ret) return ret; - if (linkmode_test_bit(ETHTOOL_LINK_MODE_FIBRE_BIT, phydev->supported)) { + if (dp83869->mode == DP83869_RGMII_100_BASE) { if (phydev->link) { - if (dp83869->mode == DP83869_RGMII_100_BASE) - phydev->speed = SPEED_100; + phydev->speed = SPEED_100; } else { phydev->speed = SPEED_UNKNOWN; phydev->duplex = DUPLEX_UNKNOWN; @@ -898,6 +1021,7 @@ static int dp83869_phy_reset(struct phy_device *phydev) .soft_reset = dp83869_phy_reset, \ .config_intr = dp83869_config_intr, \ .handle_interrupt = dp83869_handle_interrupt, \ + .config_aneg = dp83869_config_aneg, \ .read_status = dp83869_read_status, \ .get_tunable = dp83869_get_tunable, \ .set_tunable = dp83869_set_tunable, \ --- base-commit: 94c11e852955b2eef5c4f0b36cfeae7dcf11a759 change-id: 20241025-dp83869-1000base-x-0f0a61725784 Best regards, -- Romain Gantois <romain.gantois(a)bootlin.com>

10 months, 4 weeks

2
1
0 0

[PATCH v2 0/6] phy: core: Fix bugs for several APIs and simplify an API

by Zijun Hu

This patch series is to fix bugs for below APIs: devm_phy_put() devm_of_phy_provider_unregister() devm_phy_destroy() phy_get() of_phy_get() devm_phy_get() devm_of_phy_get() devm_of_phy_get_by_index() And simplify below API: of_phy_simple_xlate(). Signed-off-by: Zijun Hu <quic_zijuhu(a)quicinc.com> --- Changes in v2: - Correct title, commit message, and inline comments. - Link to v1: https://lore.kernel.org/r/20241020-phy_core_fix-v1-0-078062f7da71@quicinc.c… --- Zijun Hu (6): phy: core: Fix that API devm_phy_put() fails to release the phy phy: core: Fix that API devm_of_phy_provider_unregister() fails to unregister the phy provider phy: core: Fix that API devm_phy_destroy() fails to destroy the phy phy: core: Fix an OF node refcount leakage in _of_phy_get() phy: core: Fix an OF node refcount leakage in of_phy_provider_lookup() phy: core: Simplify API of_phy_simple_xlate() implementation drivers/phy/phy-core.c | 39 +++++++++++++++++++-------------------- 1 file changed, 19 insertions(+), 20 deletions(-) --- base-commit: e70d2677ef4088d59158739d72b67ac36d1b132b change-id: 20241020-phy_core_fix-e3ad65db98f7 Best regards, -- Zijun Hu <quic_zijuhu(a)quicinc.com>

10 months, 4 weeks

3
17
0 0

[PATCH v1] arm64: dts: ti: k3-am62-verdin: Fix SD regulator startup delay

by Francesco Dolcini

From: Francesco Dolcini <francesco.dolcini(a)toradex.com> The power switch used to power the SD card interface might have more than 2ms turn-on time, increase the startup delay to 20ms to prevent failures. Fixes: 316b80246b16 ("arm64: dts: ti: add verdin am62") Cc: <stable(a)vger.kernel.org> Signed-off-by: Francesco Dolcini <francesco.dolcini(a)toradex.com> --- arch/arm64/boot/dts/ti/k3-am62-verdin.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/boot/dts/ti/k3-am62-verdin.dtsi b/arch/arm64/boot/dts/ti/k3-am62-verdin.dtsi index 5bef31b8577b..f0eac05f7483 100644 --- a/arch/arm64/boot/dts/ti/k3-am62-verdin.dtsi +++ b/arch/arm64/boot/dts/ti/k3-am62-verdin.dtsi @@ -160,7 +160,7 @@ reg_sdhc1_vmmc: regulator-sdhci1 { regulator-max-microvolt = <3300000>; regulator-min-microvolt = <3300000>; regulator-name = "+V3.3_SD"; - startup-delay-us = <2000>; + startup-delay-us = <20000>; }; reg_sdhc1_vqmmc: regulator-sdhci1-vqmmc { -- 2.39.5

10 months, 4 weeks

2
1
0 0

[PATCH] usb: musb: sunxi: Fix accessing an released usb phy

by Zijun Hu

From: Zijun Hu <quic_zijuhu(a)quicinc.com> Commit 6ed05c68cbca ("usb: musb: sunxi: Explicitly release USB PHY on exit") will cause that usb phy @glue->xceiv is accessed after released. 1) register platform driver @sunxi_musb_driver // get the usb phy @glue->xceiv sunxi_musb_probe() -> devm_usb_get_phy(). 2) register and unregister platform driver @musb_driver musb_probe() -> sunxi_musb_init() use the phy here //the phy is released here musb_remove() -> sunxi_musb_exit() -> devm_usb_put_phy() 3) register @musb_driver again musb_probe() -> sunxi_musb_init() use the phy here but the phy has been released at 2). ... Fixed by reverting the commit, namely, removing devm_usb_put_phy() from sunxi_musb_exit(). Fixes: 6ed05c68cbca ("usb: musb: sunxi: Explicitly release USB PHY on exit") Cc: stable(a)vger.kernel.org Signed-off-by: Zijun Hu <quic_zijuhu(a)quicinc.com> --- drivers/usb/musb/sunxi.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/usb/musb/sunxi.c b/drivers/usb/musb/sunxi.c index d54283fd026b..05b6e7e52e02 100644 --- a/drivers/usb/musb/sunxi.c +++ b/drivers/usb/musb/sunxi.c @@ -293,8 +293,6 @@ static int sunxi_musb_exit(struct musb *musb) if (test_bit(SUNXI_MUSB_FL_HAS_SRAM, &glue->flags)) sunxi_sram_release(musb->controller->parent); - devm_usb_put_phy(glue->dev, glue->xceiv); - return 0; } --- base-commit: afb92ad8733ef0a2843cc229e4d96aead80bc429 change-id: 20241029-sunxi_fix-07fe18228733 Best regards, -- Zijun Hu <quic_zijuhu(a)quicinc.com>

10 months, 4 weeks

1
0
0 0

[PATCH] vmscan,migrate: fix double-decrement on node stats when demoting pages

by Gregory Price

When numa balancing is enabled with demotion, vmscan will call migrate_pages when shrinking LRUs. Successful demotions will cause node vmstat numbers to double-decrement, leading to an imbalanced page count. The result is dmesg output like such: $ cat /proc/sys/vm/stat_refresh [77383.088417] vmstat_refresh: nr_isolated_anon -103212 [77383.088417] vmstat_refresh: nr_isolated_file -899642 This negative value may impact compaction and reclaim throttling. The double-decrement occurs in the migrate_pages path: caller to shrink_folio_list decrements the count shrink_folio_list demote_folio_list migrate_pages migrate_pages_batch migrate_folio_move migrate_folio_done mod_node_page_state(-ve) <- second decrement This path happens for SUCCESSFUL migrations, not failures. Typically callers to migrate_pages are required to handle putback/accounting for failures, but this is already handled in the shrink code. When accounting for migrations, instead do not decrement the count when the migration reason is MR_DEMOTION. As of v6.11, this demotion logic is the only source of MR_DEMOTION. Signed-off-by: Gregory Price <gourry(a)gourry.net> Fixes: 26aa2d199d6f2 ("mm/migrate: demote pages during reclaim") Cc: stable(a)vger.kernel.org --- mm/migrate.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/migrate.c b/mm/migrate.c index 923ea80ba744..e3aac274cf16 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1099,7 +1099,7 @@ static void migrate_folio_done(struct folio *src, * not accounted to NR_ISOLATED_*. They can be recognized * as __folio_test_movable */ - if (likely(!__folio_test_movable(src))) + if (likely(!__folio_test_movable(src)) && reason != MR_DEMOTION) mod_node_page_state(folio_pgdat(src), NR_ISOLATED_ANON + folio_is_file_lru(src), -folio_nr_pages(src)); -- 2.43.0

10 months, 4 weeks

6
10
0 0

[PATCH 5.10/5.15/6.1 0/1] wifi: ath10k: Check return value of ath10k_get_arvif() in ath10k_wmi_event_tdls_peer()

by Dmitry Kandybka

SVACE reports a potential NULL pointer dereference in 5.10, 5.15 and 6.1 stable releases since the commit 4c9f8d114660 ("ath10k: enable TDLS peer inactivity detection") that caused this report was appeared. The problem has been fixed by the following upstream patch that was adapted to 5.10, 5.15 and 6.1. All of the changes made to the patch in order to adapt it are described at the end of commit message. Found by Linux Verification Center (linuxtesting.org) with SVACE. Peter Kosyh (1): wifi: ath10k: Check return value of ath10k_get_arvif() in ath10k_wmi_event_tdls_peer() drivers/net/wireless/ath/ath10k/wmi-tlv.c | 7 +++++++ 1 file changed, 7 insertions(+) -- 2.43.5

11 months

1
1
0 0

[PATCH v4 1/3] drm/xe: Move LNL scheduling WA to xe_device.h

by Nirmoy Das

Move LNL scheduling WA to xe_device.h so this can be used in other places without needing keep the same comment about removal of this WA in the future. The WA, which flushes work or workqueues, is now wrapped in macros and can be reused wherever needed. Cc: Badal Nilawar <badal.nilawar(a)intel.com> Cc: Matthew Auld <matthew.auld(a)intel.com> Cc: Matthew Brost <matthew.brost(a)intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray(a)intel.com> Cc: Lucas De Marchi <lucas.demarchi(a)intel.com> cc: <stable(a)vger.kernel.org> # v6.11+ Suggested-by: John Harrison <John.C.Harrison(a)Intel.com> Signed-off-by: Nirmoy Das <nirmoy.das(a)intel.com> --- drivers/gpu/drm/xe/xe_device.h | 14 ++++++++++++++ drivers/gpu/drm/xe/xe_guc_ct.c | 11 +---------- 2 files changed, 15 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h index 4c3f0ebe78a9..f1fbfe916867 100644 --- a/drivers/gpu/drm/xe/xe_device.h +++ b/drivers/gpu/drm/xe/xe_device.h @@ -191,4 +191,18 @@ void xe_device_declare_wedged(struct xe_device *xe); struct xe_file *xe_file_get(struct xe_file *xef); void xe_file_put(struct xe_file *xef); +/* + * Occasionally it is seen that the G2H worker starts running after a delay of more than + * a second even after being queued and activated by the Linux workqueue subsystem. This + * leads to G2H timeout error. The root cause of issue lies with scheduling latency of + * Lunarlake Hybrid CPU. Issue disappears if we disable Lunarlake atom cores from BIOS + * and this is beyond xe kmd. + * + * TODO: Drop this change once workqueue scheduling delay issue is fixed on LNL Hybrid CPU. + */ +#define LNL_FLUSH_WORKQUEUE(wq__) \ + flush_workqueue(wq__) +#define LNL_FLUSH_WORK(wrk__) \ + flush_work(wrk__) + #endif diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c index 1b5d8fb1033a..703b44b257a7 100644 --- a/drivers/gpu/drm/xe/xe_guc_ct.c +++ b/drivers/gpu/drm/xe/xe_guc_ct.c @@ -1018,17 +1018,8 @@ static int guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len, ret = wait_event_timeout(ct->g2h_fence_wq, g2h_fence.done, HZ); - /* - * Occasionally it is seen that the G2H worker starts running after a delay of more than - * a second even after being queued and activated by the Linux workqueue subsystem. This - * leads to G2H timeout error. The root cause of issue lies with scheduling latency of - * Lunarlake Hybrid CPU. Issue dissappears if we disable Lunarlake atom cores from BIOS - * and this is beyond xe kmd. - * - * TODO: Drop this change once workqueue scheduling delay issue is fixed on LNL Hybrid CPU. - */ if (!ret) { - flush_work(&ct->g2h_worker); + LNL_FLUSH_WORK(&ct->g2h_worker); if (g2h_fence.done) { xe_gt_warn(gt, "G2H fence %u, action %04x, done\n", g2h_fence.seqno, action[0]); -- 2.46.0

11 months

2
4
0 0

[PATCH] firewire: core: fix invalid port index for parent device

by Takashi Sakamoto

In a commit 24b7f8e5cd65 ("firewire: core: use helper functions for self ID sequence"), the enumeration over self ID sequence was refactored with some helper functions with KUnit tests. These helper functions are guaranteed to work expectedly by the KUnit tests, however their application includes a mistake to assign invalid value to the index of port connected to parent device. This bug affects the case that any extra node devices which has three or more ports are connected to 1394 OHCI controller. In the case, the path to update the tree cache could hits WARN_ON(), and gets general protection fault due to the access to invalid address computed by the invalid value. This commit fixes the bug to assign correct port index. Cc: stable(a)vger.kernel.org Reported-by: Edmund Raile <edmund.raile(a)proton.me> Closes: https://lore.kernel.org/lkml/8a9902a4ece9329af1e1e42f5fea76861f0bf0e8.camel… Fixes: 24b7f8e5cd65 ("firewire: core: use helper functions for self ID sequence") Signed-off-by: Takashi Sakamoto <o-takashi(a)sakamocchi.jp> --- drivers/firewire/core-topology.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/firewire/core-topology.c b/drivers/firewire/core-topology.c index 6adadb11962e..892b94cfd626 100644 --- a/drivers/firewire/core-topology.c +++ b/drivers/firewire/core-topology.c @@ -204,7 +204,7 @@ static struct fw_node *build_tree(struct fw_card *card, const u32 *sid, int self // the node->ports array where the parent node should be. Later, // when we handle the parent node, we fix up the reference. ++parent_count; - node->color = i; + node->color = port_index; break; case PHY_PACKET_SELF_ID_PORT_STATUS_CHILD: -- 2.45.2

11 months

2
2
0 0

[RFC PATCH 6.1.y 0/1] cpufreq: amd-pstate: Enable CPU boost in passive and guided modes

by Nabil S. Alramli

Greetings, This is a RFC for a maintenance patch to an issue in the amd_pstate driver where CPU frequency cannot be boosted in passive or guided modes. Without this patch, AMD machines using stable kernels are unable to get their CPU frequency boosted, which is a significant performance issue. For example, on a host that has AMD EPYC 7662 64-Core processor without this patch running at full CPU load: $ for i in $(cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq); \ do ni=$(echo "scale=1; $i/1000000" | bc -l); echo "$ni GHz"; done | \ sort | uniq -c 128 2.0 GHz And with this patch: $ for i in $(cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq); \ do ni=$(echo "scale=1; $i/1000000" | bc -l); echo "$ni GHz"; done | \ sort | uniq -c 128 3.3 GHz I am not sure what the correct process is for submitting patches which affect only stable trees but not the current code base, and do not apply to the current tree. As such, I am submitting this directly to stable@, but please let me know if I should be submitting this elsewhere. The issue was introduced in v6.1 via commit bedadcfb011f ("cpufreq: amd-pstate: Fix initial highest_perf value"), and exists in stable kernels up until v6.6.51. In v6.6.51, a large change, commit 1ec40a175a48 ("cpufreq: amd-pstate: Enable amd-pstate preferred core support"), was introduced which significantly refactored the code. This commit cannot be ported back on its own, and would require reviewing and cherry picking at least a few dozen of commits in cpufreq, amd-pstate, ACPI, CPPC. This means kernels v6.1 up until v6.6.51 are affected by this significant performance issue, and cannot be easily remediated. Thank you for your attention and I look forward to your response in regards to what the best way to proceed is for getting this important performance fix merged. Best Regards, Nabil S. Alramli (1): cpufreq: amd-pstate: Enable CPU boost in passive and guided modes drivers/cpufreq/amd-pstate.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) -- 2.35.1

11 months

3
6
0 0

[PATCH] mtd: spi-nor: winbond: fix w25q128 regression

by Linus Walleij

From: Michael Walle <mwalle(a)kernel.org> Commit 83e824a4a595 ("mtd: spi-nor: Correct flags for Winbond w25q128") removed the flags for non-SFDP devices. It was assumed that it wasn't in use anymore. This wasn't true. Add the no_sfdp_flags as well as the size again. We add the additional flags for dual and quad read because they have been reported to work properly by Hartmut using both older and newer versions of this flash, the similar flashes with 64Mbit and 256Mbit already have these flags and because it will (luckily) trigger our legacy SFDP parsing, so newer versions with SFDP support will still get the parameters from the SFDP tables. This was applied to mainline as commit e49b2731c396 ("mtd: spi-nor: winbond: fix w25q128 regression") however the code has changed a lot after v6.6 so the patch did not apply to v6.6 or v6.1 which still has the problem. This patch fixes the issue in the way of the old API and has been tested on hardware. Please apply it for v6.1 and v6.6. Reported-by: Hartmut Birr <e9hack(a)gmail.com> Closes: https://lore.kernel.org/r/CALxbwRo_-9CaJmt7r7ELgu+vOcgk=xZcGHobnKf=oT2=u4d4… Fixes: 83e824a4a595 ("mtd: spi-nor: Correct flags for Winbond w25q128") Reviewed-by: Linus Walleij <linus.walleij(a)linaro.org> Signed-off-by: Michael Walle <mwalle(a)kernel.org> Acked-by: Tudor Ambarus <tudor.ambarus(a)linaro.org> Reviewed-by: Esben Haabendal <esben(a)geanix.com> Reviewed-by: Pratyush Yadav <pratyush(a)kernel.org> Signed-off-by: Pratyush Yadav <pratyush(a)kernel.org> Link: https://lore.kernel.org/r/20240621120929.2670185-1-mwalle@kernel.org [Backported to v6.6 - vastly different due to upstream changes] Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org> --- This fix backported to stable v6.6 and v6.1 after reports from OpenWrt users: https://github.com/openwrt/openwrt/issues/16796 --- drivers/mtd/spi-nor/winbond.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/mtd/spi-nor/winbond.c b/drivers/mtd/spi-nor/winbond.c index cd99c9a1c568..95dd28b9bf14 100644 --- a/drivers/mtd/spi-nor/winbond.c +++ b/drivers/mtd/spi-nor/winbond.c @@ -120,9 +120,10 @@ static const struct flash_info winbond_nor_parts[] = { NO_SFDP_FLAGS(SECT_4K) }, { "w25q80bl", INFO(0xef4014, 0, 64 * 1024, 16) NO_SFDP_FLAGS(SECT_4K) }, - { "w25q128", INFO(0xef4018, 0, 0, 0) - PARSE_SFDP - FLAGS(SPI_NOR_HAS_LOCK | SPI_NOR_HAS_TB) }, + { "w25q128", INFO(0xef4018, 0, 64 * 1024, 256) + FLAGS(SPI_NOR_HAS_LOCK | SPI_NOR_HAS_TB) + NO_SFDP_FLAGS(SECT_4K | SPI_NOR_DUAL_READ | + SPI_NOR_QUAD_READ) }, { "w25q256", INFO(0xef4019, 0, 64 * 1024, 512) NO_SFDP_FLAGS(SECT_4K | SPI_NOR_DUAL_READ | SPI_NOR_QUAD_READ) .fixups = &w25q256_fixups }, --- base-commit: ffc253263a1375a65fa6c9f62a893e9767fbebfa change-id: 20241027-v6-6-7ed05eaccb22 Best regards, -- Linus Walleij <linus.walleij(a)linaro.org>

11 months

2
1
0 0

+ mm-thp-fix-deferred-split-unqueue-naming-and-locking.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled Subject: mm/thp: fix deferred split unqueue naming and locking has been added to the -mm mm-hotfixes-unstable branch. Its filename is mm-thp-fix-deferred-split-unqueue-naming-and-locking.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Hugh Dickins <hughd(a)google.com> Subject: mm/thp: fix deferred split unqueue naming and locking Date: Sun, 27 Oct 2024 13:02:13 -0700 (PDT) Recent changes are putting more pressure on THP deferred split queues: under load revealing long-standing races, causing list_del corruptions, "Bad page state"s and worse (I keep BUGs in both of those, so usually don't get to see how badly they end up without). The relevant recent changes being 6.8's mTHP, 6.10's mTHP swapout, and 6.12's mTHP swapin, improved swap allocation, and underused THP splitting. Before fixing locking: rename misleading folio_undo_large_rmappable(), which does not undo large_rmappable, to folio_unqueue_deferred_split(), which is what it does. But that and its out-of-line __callee are mm internals of very limited usability: add comment and WARN_ON_ONCEs to check usage; and return a bool to say if a deferred split was unqueued, which can then be used in WARN_ON_ONCEs around safety checks (sparing callers the arcane conditionals in __folio_unqueue_deferred_split()). Just omit the folio_unqueue_deferred_split() from free_unref_folios(), all of whose callers now call it beforehand (and if any forget then bad_page() will tell) - except for its caller put_pages_list(), which itself no longer has any callers (and will be deleted separately). Swapout: mem_cgroup_swapout() has been resetting folio->memcg_data 0 without checking and unqueueing a THP folio from deferred split list; which is unfortunate, since the split_queue_lock depends on the memcg (when memcg is enabled); so swapout has been unqueueing such THPs later, when freeing the folio, using the pgdat's lock instead: potentially corrupting the memcg's list. __remove_mapping() has frozen refcount to 0 here, so no problem with calling folio_unqueue_deferred_split() before resetting memcg_data. That goes back to 5.4 commit 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware"): which included a check on swapcache before adding to deferred queue, but no check on deferred queue before adding THP to swapcache. That worked fine with the usual sequence of events in reclaim (though there were a couple of rare ways in which a THP on deferred queue could have been swapped out), but 6.12 commit dafff3f4c850 ("mm: split underused THPs") avoids splitting underused THPs in reclaim, which makes swapcache THPs on deferred queue commonplace. Keep the check on swapcache before adding to deferred queue? Yes: it is no longer essential, but preserves the existing behaviour, and is likely to be a worthwhile optimization (vmstat showed much more traffic on the queue under swapping load if the check was removed); update its comment. Memcg-v1 move (deprecated): mem_cgroup_move_account() has been changing folio->memcg_data without checking and unqueueing a THP folio from the deferred list, sometimes corrupting "from" memcg's list, like swapout. Refcount is non-zero here, so folio_unqueue_deferred_split() can only be used in a WARN_ON_ONCE to validate the fix, which must be done earlier: mem_cgroup_move_charge_pte_range() first try to split the THP (splitting of course unqueues), or skip it if that fails. Not ideal, but moving charge has been requested, and khugepaged should repair the THP later: nobody wants new custom unqueueing code just for this deprecated case. The 87eaceb3faa5 commit did have the code to move from one deferred list to another (but was not conscious of its unsafety while refcount non-0); but that was removed by 5.6 commit fac0516b5534 ("mm: thp: don't need care deferred split queue in memcg charge move path"), which argued that the existence of a PMD mapping guarantees that the THP cannot be on a deferred list. As above, false in rare cases, and now commonly false. Backport to 6.11 should be straightforward. Earlier backports must take care that other _deferred_list fixes and dependencies are included. There is not a strong case for backports, but they can fix cornercases. Link: https://lkml.kernel.org/r/8dc111ae-f6db-2da7-b25c-7a20b1effe3b@google.com Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware") Fixes: dafff3f4c850 ("mm: split underused THPs") Signed-off-by: Hugh Dickins <hughd(a)google.com> Acked-by: David Hildenbrand <david(a)redhat.com> Reviewed-by: Yang Shi <shy828301(a)gmail.com> Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com> Cc: Barry Song <baohua(a)kernel.org> Cc: Chris Li <chrisl(a)kernel.org> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com> Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Nhat Pham <nphamcs(a)gmail.com> Cc: Ryan Roberts <ryan.roberts(a)arm.com> Cc: Shakeel Butt <shakeel.butt(a)linux.dev> Cc: Usama Arif <usamaarif642(a)gmail.com> Cc: Wei Yang <richard.weiyang(a)gmail.com> Cc: Zi Yan <ziy(a)nvidia.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/huge_memory.c | 35 ++++++++++++++++++++++++++--------- mm/internal.h | 10 +++++----- mm/memcontrol-v1.c | 25 +++++++++++++++++++++++++ mm/memcontrol.c | 8 +++++--- mm/migrate.c | 4 ++-- mm/page_alloc.c | 1 - mm/swap.c | 4 ++-- mm/vmscan.c | 4 ++-- 8 files changed, 67 insertions(+), 24 deletions(-) --- a/mm/huge_memory.c~mm-thp-fix-deferred-split-unqueue-naming-and-locking +++ a/mm/huge_memory.c @@ -3588,10 +3588,27 @@ int split_folio_to_list(struct folio *fo return split_huge_page_to_list_to_order(&folio->page, list, ret); } -void __folio_undo_large_rmappable(struct folio *folio) +/* + * __folio_unqueue_deferred_split() is not to be called directly: + * the folio_unqueue_deferred_split() inline wrapper in mm/internal.h + * limits its calls to those folios which may have a _deferred_list for + * queueing THP splits, and that list is (racily observed to be) non-empty. + * + * It is unsafe to call folio_unqueue_deferred_split() until folio refcount is + * zero: because even when split_queue_lock is held, a non-empty _deferred_list + * might be in use on deferred_split_scan()'s unlocked on-stack list. + * + * If memory cgroups are enabled, split_queue_lock is in the mem_cgroup: it is + * therefore important to unqueue deferred split before changing folio memcg. + */ +bool __folio_unqueue_deferred_split(struct folio *folio) { struct deferred_split *ds_queue; unsigned long flags; + bool unqueued = false; + + WARN_ON_ONCE(folio_ref_count(folio)); + WARN_ON_ONCE(!mem_cgroup_disabled() && !folio_memcg(folio)); ds_queue = get_deferred_split_queue(folio); spin_lock_irqsave(&ds_queue->split_queue_lock, flags); @@ -3603,8 +3620,11 @@ void __folio_undo_large_rmappable(struct MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1); } list_del_init(&folio->_deferred_list); + unqueued = true; } spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); + + return unqueued; /* useful for debug warnings */ } /* partially_mapped=false won't clear PG_partially_mapped folio flag */ @@ -3627,14 +3647,11 @@ void deferred_split_folio(struct folio * return; /* - * The try_to_unmap() in page reclaim path might reach here too, - * this may cause a race condition to corrupt deferred split queue. - * And, if page reclaim is already handling the same folio, it is - * unnecessary to handle it again in shrinker. - * - * Check the swapcache flag to determine if the folio is being - * handled by page reclaim since THP swap would add the folio into - * swap cache before calling try_to_unmap(). + * Exclude swapcache: originally to avoid a corrupt deferred split + * queue. Nowadays that is fully prevented by mem_cgroup_swapout(); + * but if page reclaim is already handling the same folio, it is + * unnecessary to handle it again in the shrinker, so excluding + * swapcache here may still be a useful optimization. */ if (folio_test_swapcache(folio)) return; --- a/mm/internal.h~mm-thp-fix-deferred-split-unqueue-naming-and-locking +++ a/mm/internal.h @@ -684,11 +684,11 @@ static inline void folio_set_order(struc #endif } -void __folio_undo_large_rmappable(struct folio *folio); -static inline void folio_undo_large_rmappable(struct folio *folio) +bool __folio_unqueue_deferred_split(struct folio *folio); +static inline bool folio_unqueue_deferred_split(struct folio *folio) { if (folio_order(folio) <= 1 || !folio_test_large_rmappable(folio)) - return; + return false; /* * At this point, there is no one trying to add the folio to @@ -696,9 +696,9 @@ static inline void folio_undo_large_rmap * to check without acquiring the split_queue_lock. */ if (data_race(list_empty(&folio->_deferred_list))) - return; + return false; - __folio_undo_large_rmappable(folio); + return __folio_unqueue_deferred_split(folio); } static inline struct folio *page_rmappable_folio(struct page *page) --- a/mm/memcontrol.c~mm-thp-fix-deferred-split-unqueue-naming-and-locking +++ a/mm/memcontrol.c @@ -4629,9 +4629,6 @@ static void uncharge_folio(struct folio struct obj_cgroup *objcg; VM_BUG_ON_FOLIO(folio_test_lru(folio), folio); - VM_BUG_ON_FOLIO(folio_order(folio) > 1 && - !folio_test_hugetlb(folio) && - !list_empty(&folio->_deferred_list), folio); /* * Nobody should be changing or seriously looking at @@ -4678,6 +4675,7 @@ static void uncharge_folio(struct folio ug->nr_memory += nr_pages; ug->pgpgout++; + WARN_ON_ONCE(folio_unqueue_deferred_split(folio)); folio->memcg_data = 0; } @@ -4789,6 +4787,9 @@ void mem_cgroup_migrate(struct folio *ol /* Transfer the charge and the css ref */ commit_charge(new, memcg); + + /* Warning should never happen, so don't worry about refcount non-0 */ + WARN_ON_ONCE(folio_unqueue_deferred_split(old)); old->memcg_data = 0; } @@ -4975,6 +4976,7 @@ void mem_cgroup_swapout(struct folio *fo VM_BUG_ON_FOLIO(oldid, folio); mod_memcg_state(swap_memcg, MEMCG_SWAP, nr_entries); + folio_unqueue_deferred_split(folio); folio->memcg_data = 0; if (!mem_cgroup_is_root(memcg)) --- a/mm/memcontrol-v1.c~mm-thp-fix-deferred-split-unqueue-naming-and-locking +++ a/mm/memcontrol-v1.c @@ -848,6 +848,8 @@ static int mem_cgroup_move_account(struc css_get(&to->css); css_put(&from->css); + /* Warning should never happen, so don't worry about refcount non-0 */ + WARN_ON_ONCE(folio_unqueue_deferred_split(folio)); folio->memcg_data = (unsigned long)to; __folio_memcg_unlock(from); @@ -1217,7 +1219,9 @@ static int mem_cgroup_move_charge_pte_ra enum mc_target_type target_type; union mc_target target; struct folio *folio; + bool tried_split_before = false; +retry_pmd: ptl = pmd_trans_huge_lock(pmd, vma); if (ptl) { if (mc.precharge < HPAGE_PMD_NR) { @@ -1227,6 +1231,27 @@ static int mem_cgroup_move_charge_pte_ra target_type = get_mctgt_type_thp(vma, addr, *pmd, &target); if (target_type == MC_TARGET_PAGE) { folio = target.folio; + /* + * Deferred split queue locking depends on memcg, + * and unqueue is unsafe unless folio refcount is 0: + * split or skip if on the queue? first try to split. + */ + if (!list_empty(&folio->_deferred_list)) { + spin_unlock(ptl); + if (!tried_split_before) + split_folio(folio); + folio_unlock(folio); + folio_put(folio); + if (tried_split_before) + return 0; + tried_split_before = true; + goto retry_pmd; + } + /* + * So long as that pmd lock is held, the folio cannot + * be racily added to the _deferred_list, because + * __folio_remove_rmap() will find !partially_mapped. + */ if (folio_isolate_lru(folio)) { if (!mem_cgroup_move_account(folio, true, mc.from, mc.to)) { --- a/mm/migrate.c~mm-thp-fix-deferred-split-unqueue-naming-and-locking +++ a/mm/migrate.c @@ -490,7 +490,7 @@ static int __folio_migrate_mapping(struc folio_test_large_rmappable(folio)) { if (!folio_ref_freeze(folio, expected_count)) return -EAGAIN; - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); folio_ref_unfreeze(folio, expected_count); } @@ -515,7 +515,7 @@ static int __folio_migrate_mapping(struc } /* Take off deferred split queue while frozen and memcg set */ - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); /* * Now we know that no one else is looking at the folio: --- a/mm/page_alloc.c~mm-thp-fix-deferred-split-unqueue-naming-and-locking +++ a/mm/page_alloc.c @@ -2686,7 +2686,6 @@ void free_unref_folios(struct folio_batc unsigned long pfn = folio_pfn(folio); unsigned int order = folio_order(folio); - folio_undo_large_rmappable(folio); if (!free_pages_prepare(&folio->page, order)) continue; /* --- a/mm/swap.c~mm-thp-fix-deferred-split-unqueue-naming-and-locking +++ a/mm/swap.c @@ -121,7 +121,7 @@ void __folio_put(struct folio *folio) } page_cache_release(folio); - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); mem_cgroup_uncharge(folio); free_unref_page(&folio->page, folio_order(folio)); } @@ -988,7 +988,7 @@ void folios_put_refs(struct folio_batch free_huge_folio(folio); continue; } - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); __page_cache_release(folio, &lruvec, &flags); if (j != i) --- a/mm/vmscan.c~mm-thp-fix-deferred-split-unqueue-naming-and-locking +++ a/mm/vmscan.c @@ -1476,7 +1476,7 @@ free_it: */ nr_reclaimed += nr_pages; - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); if (folio_batch_add(&free_folios, folio) == 0) { mem_cgroup_uncharge_folios(&free_folios); try_to_unmap_flush(); @@ -1864,7 +1864,7 @@ static unsigned int move_folios_to_lru(s if (unlikely(folio_put_testzero(folio))) { __folio_clear_lru_flags(folio); - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); if (folio_batch_add(&free_folios, folio) == 0) { spin_unlock_irq(&lruvec->lru_lock); mem_cgroup_uncharge_folios(&free_folios); _ Patches currently in -mm which might be from hughd(a)google.com are mm-thp-fix-deferred-split-queue-not-partially_mapped.patch mm-thp-fix-deferred-split-unqueue-naming-and-locking.patch mm-delete-the-unused-put_pages_list.patch

11 months

1
0
0 0

[PATCH v3] xhci: Fix Link TRB DMA in command ring stopped completion event

by Faisal Hassan

During the aborting of a command, the software receives a command completion event for the command ring stopped, with the TRB pointing to the next TRB after the aborted command. If the command we abort is located just before the Link TRB in the command ring, then during the 'command ring stopped' completion event, the xHC gives the Link TRB in the event's cmd DMA, which causes a mismatch in handling command completion event. To address this situation, move the 'command ring stopped' completion event check slightly earlier, since the specific command it stopped on isn't of significant concern. Fixes: 7f84eef0dafb ("USB: xhci: No-op command queueing and irq handler.") Cc: stable(a)vger.kernel.org Signed-off-by: Faisal Hassan <quic_faisalh(a)quicinc.com> --- Changes in v3: - Skip dma check for the cmd ring stopped event - v2 link: https://lore.kernel.org/all/20241021131904.20678-1-quic_faisalh@quicinc.com Changes in v2: - Added Fixes tag - Removed traversing of TRBs with in_range() API. - Simplified the if condition check. - v1 link: https://lore.kernel.org/all/20241018195953.12315-1-quic_faisalh@quicinc.com drivers/usb/host/xhci-ring.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index b2950c35c740..1ffc69c48eac 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -1718,6 +1718,14 @@ static void handle_cmd_completion(struct xhci_hcd *xhci, trace_xhci_handle_command(xhci->cmd_ring, &cmd_trb->generic); + cmd_comp_code = GET_COMP_CODE(le32_to_cpu(event->status)); + + /* If CMD ring stopped we own the trbs between enqueue and dequeue */ + if (cmd_comp_code == COMP_COMMAND_RING_STOPPED) { + complete_all(&xhci->cmd_ring_stop_completion); + return; + } + cmd_dequeue_dma = xhci_trb_virt_to_dma(xhci->cmd_ring->deq_seg, cmd_trb); /* @@ -1734,14 +1742,6 @@ static void handle_cmd_completion(struct xhci_hcd *xhci, cancel_delayed_work(&xhci->cmd_timer); - cmd_comp_code = GET_COMP_CODE(le32_to_cpu(event->status)); - - /* If CMD ring stopped we own the trbs between enqueue and dequeue */ - if (cmd_comp_code == COMP_COMMAND_RING_STOPPED) { - complete_all(&xhci->cmd_ring_stop_completion); - return; - } - if (cmd->command_trb != xhci->cmd_ring->dequeue) { xhci_err(xhci, "Command completion event does not match command\n"); -- 2.17.1

11 months

3
2
0 0

patch "docs: iio: ad7380: fix supply for ad7380-4" added to char-misc-linus

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled docs: iio: ad7380: fix supply for ad7380-4 to my char-misc git tree which can be found at git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git in the char-misc-linus branch. The patch will show up in the next release of the linux-next tree (usually sometime within the next 24 hours during the week.) The patch will hopefully also be merged in Linus's tree for the next -rc kernel release. If you have any questions about this process, please let me know. From 795114e849ddfd48150eb0135d04748a8c81cec5 Mon Sep 17 00:00:00 2001 From: Julien Stephan <jstephan(a)baylibre.com> Date: Tue, 22 Oct 2024 15:22:40 +0200 Subject: docs: iio: ad7380: fix supply for ad7380-4 ad7380-4 is the only device from ad738x family that doesn't have an internal reference. Moreover it's external reference is called REFIN in the datasheet while all other use REFIO as an optional external reference. Update documentation to highlight this. Fixes: 3e82dfc82f38 ("docs: iio: new docs for ad7380 driver") Reviewed-by: David Lechner <dlechner(a)baylibre.com> Signed-off-by: Julien Stephan <jstephan(a)baylibre.com> Link: https://patch.msgid.link/20241022-ad7380-fix-supplies-v3-5-f0cefe1b7fa6@bay… Cc: <Stable(a)vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com> --- Documentation/iio/ad7380.rst | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/Documentation/iio/ad7380.rst b/Documentation/iio/ad7380.rst index 9c784c1e652e..6f70b49b9ef2 100644 --- a/Documentation/iio/ad7380.rst +++ b/Documentation/iio/ad7380.rst @@ -41,13 +41,22 @@ supports only 1 SDO line. Reference voltage ----------------- -2 possible reference voltage sources are supported: +ad7380-4 +~~~~~~~~ + +ad7380-4 supports only an external reference voltage (2.5V to 3.3V). It must be +declared in the device tree as ``refin-supply``. + +All other devices from ad738x family +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +All other devices from ad738x support 2 possible reference voltage sources: - Internal reference (2.5V) - External reference (2.5V to 3.3V) The source is determined by the device tree. If ``refio-supply`` is present, -then the external reference is used, else the internal reference is used. +then it is used as external reference, else the internal reference is used. Oversampling and resolution boost --------------------------------- -- 2.47.0

11 months

1
0
0 0

patch "iio: adc: ad7380: fix supplies for ad7380-4" added to char-misc-linus

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled iio: adc: ad7380: fix supplies for ad7380-4 to my char-misc git tree which can be found at git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git in the char-misc-linus branch. The patch will show up in the next release of the linux-next tree (usually sometime within the next 24 hours during the week.) The patch will hopefully also be merged in Linus's tree for the next -rc kernel release. If you have any questions about this process, please let me know. From 05f9c67179c9a8d66dee175fb4b17f380908a26f Mon Sep 17 00:00:00 2001 From: Julien Stephan <jstephan(a)baylibre.com> Date: Tue, 22 Oct 2024 15:22:39 +0200 Subject: iio: adc: ad7380: fix supplies for ad7380-4 ad7380-4 is the only device in the family that does not have an internal reference. It uses "refin" as a required external reference. All other devices in the family use "refio"" as an optional external reference. Fixes: 737413da8704 ("iio: adc: ad7380: add support for ad738x-4 4 channels variants") Reviewed-by: Nuno Sa <nuno.sa(a)analog.com> Reviewed-by: David Lechner <dlechner(a)baylibre.com> Signed-off-by: Julien Stephan <jstephan(a)baylibre.com> Link: https://patch.msgid.link/20241022-ad7380-fix-supplies-v3-4-f0cefe1b7fa6@bay… Cc: <Stable(a)vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com> --- drivers/iio/adc/ad7380.c | 36 ++++++++++++++++++++++++++---------- 1 file changed, 26 insertions(+), 10 deletions(-) diff --git a/drivers/iio/adc/ad7380.c b/drivers/iio/adc/ad7380.c index b107d8e97ab3..fb728570debe 100644 --- a/drivers/iio/adc/ad7380.c +++ b/drivers/iio/adc/ad7380.c @@ -89,6 +89,7 @@ struct ad7380_chip_info { bool has_mux; const char * const *supplies; unsigned int num_supplies; + bool external_ref_only; const char * const *vcm_supplies; unsigned int num_vcm_supplies; const unsigned long *available_scan_masks; @@ -431,6 +432,7 @@ static const struct ad7380_chip_info ad7380_4_chip_info = { .num_simult_channels = 4, .supplies = ad7380_supplies, .num_supplies = ARRAY_SIZE(ad7380_supplies), + .external_ref_only = true, .available_scan_masks = ad7380_4_channel_scan_masks, .timing_specs = &ad7380_4_timing, }; @@ -1047,17 +1049,31 @@ static int ad7380_probe(struct spi_device *spi) "Failed to enable power supplies\n"); fsleep(T_POWERUP_US); - /* - * If there is no REFIO supply, then it means that we are using - * the internal 2.5V reference, otherwise REFIO is reference voltage. - */ - ret = devm_regulator_get_enable_read_voltage(&spi->dev, "refio"); - if (ret < 0 && ret != -ENODEV) - return dev_err_probe(&spi->dev, ret, - "Failed to get refio regulator\n"); + if (st->chip_info->external_ref_only) { + ret = devm_regulator_get_enable_read_voltage(&spi->dev, + "refin"); + if (ret < 0) + return dev_err_probe(&spi->dev, ret, + "Failed to get refin regulator\n"); - external_ref_en = ret != -ENODEV; - st->vref_mv = external_ref_en ? ret / 1000 : AD7380_INTERNAL_REF_MV; + st->vref_mv = ret / 1000; + + /* these chips don't have a register bit for this */ + external_ref_en = false; + } else { + /* + * If there is no REFIO supply, then it means that we are using + * the internal reference, otherwise REFIO is reference voltage. + */ + ret = devm_regulator_get_enable_read_voltage(&spi->dev, + "refio"); + if (ret < 0 && ret != -ENODEV) + return dev_err_probe(&spi->dev, ret, + "Failed to get refio regulator\n"); + + external_ref_en = ret != -ENODEV; + st->vref_mv = external_ref_en ? ret / 1000 : AD7380_INTERNAL_REF_MV; + } if (st->chip_info->num_vcm_supplies > ARRAY_SIZE(st->vcm_mv)) return dev_err_probe(&spi->dev, -EINVAL, -- 2.47.0

11 months

1
0
0 0

patch "dt-bindings: iio: adc: ad7380: fix ad7380-4 reference supply" added to char-misc-linus

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled dt-bindings: iio: adc: ad7380: fix ad7380-4 reference supply to my char-misc git tree which can be found at git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git in the char-misc-linus branch. The patch will show up in the next release of the linux-next tree (usually sometime within the next 24 hours during the week.) The patch will hopefully also be merged in Linus's tree for the next -rc kernel release. If you have any questions about this process, please let me know. From fbe5956e8809f04e9121923db0b6d1b94f2b93ba Mon Sep 17 00:00:00 2001 From: Julien Stephan <jstephan(a)baylibre.com> Date: Tue, 22 Oct 2024 15:22:36 +0200 Subject: dt-bindings: iio: adc: ad7380: fix ad7380-4 reference supply ad7380-4 is the only device from ad738x family that doesn't have an internal reference. Moreover its external reference is called REFIN in the datasheet while all other use REFIO as an optional external reference. If refio-supply is omitted the internal reference is used. Fix the binding by adding refin-supply and makes it required for ad7380-4 only. Fixes: 1a291cc8ee17 ("dt-bindings: iio: adc: ad7380: add support for ad738x-4 4 channels variants") Acked-by: Conor Dooley <conor.dooley(a)microchip.com> Reviewed-by: David Lechner <dlechner(a)baylibre.com> Signed-off-by: Julien Stephan <jstephan(a)baylibre.com> Link: https://patch.msgid.link/20241022-ad7380-fix-supplies-v3-1-f0cefe1b7fa6@bay… Cc: <Stable(a)vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com> --- .../bindings/iio/adc/adi,ad7380.yaml | 21 +++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/Documentation/devicetree/bindings/iio/adc/adi,ad7380.yaml b/Documentation/devicetree/bindings/iio/adc/adi,ad7380.yaml index bd19abb867d9..0065d6508824 100644 --- a/Documentation/devicetree/bindings/iio/adc/adi,ad7380.yaml +++ b/Documentation/devicetree/bindings/iio/adc/adi,ad7380.yaml @@ -67,6 +67,10 @@ properties: A 2.5V to 3.3V supply for the external reference voltage. When omitted, the internal 2.5V reference is used. + refin-supply: + description: + A 2.5V to 3.3V supply for external reference voltage, for ad7380-4 only. + aina-supply: description: The common mode voltage supply for the AINA- pin on pseudo-differential @@ -135,6 +139,23 @@ allOf: ainc-supply: false aind-supply: false + # ad7380-4 uses refin-supply as external reference. + # All other chips from ad738x family use refio as optional external reference. + # When refio-supply is omitted, internal reference is used. + - if: + properties: + compatible: + enum: + - adi,ad7380-4 + then: + properties: + refio-supply: false + required: + - refin-supply + else: + properties: + refin-supply: false + examples: - | #include <dt-bindings/interrupt-controller/irq.h> -- 2.47.0

11 months

1
0
0 0

Re: [PATCH 6.11 000/261] 6.11.6-rc1 review

by Ronald Warsow

Hi Greg no regressions here on x86_64 (RKL, Intel 11th Gen. CPU) Thanks Tested-by: Ronald Warsow <rwarsow(a)gmx.de>

11 months

1
0
0 0

[PATCH v4] Revert "drm/mipi-dsi: Set the fwnode for mipi_dsi_device"

by Jason-JH.Lin via B4 Relay

From: "Jason-JH.Lin" <jason-jh.lin(a)mediatek.com> This reverts commit ac88a1f41f93499df6f50fd18ea835e6ff4f3200. Reason for revert: 1. The commit [1] does not land on linux-5.15, so this patch does not fix anything. 2. Since the fw_devlink improvements series [2] does not land on linux-5.15, using device_set_fwnode() causes the panel to flash during bootup. Incorrect link management may lead to incorrect device initialization, affecting firmware node links and consumer relationships. The fwnode setting of panel to the DSI device would cause a DSI initialization error without series[2], so this patch was reverted to avoid using the incomplete fw_devlink functionality. [1] commit 3fb16866b51d ("driver core: fw_devlink: Make cycle detection more robust") [2] Link: https://lore.kernel.org/all/20230207014207.1678715-1-saravanak@google.com Cc: stable(a)vger.kernel.org # 5.15.169 Cc: stable(a)vger.kernel.org # 5.10.228 Cc: stable(a)vger.kernel.org # 5.4.284 Signed-off-by: Jason-JH.Lin <jason-jh.lin(a)mediatek.com> --- drivers/gpu/drm/drm_mipi_dsi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_mipi_dsi.c b/drivers/gpu/drm/drm_mipi_dsi.c index 24606b632009..468a3a7cb6a5 100644 --- a/drivers/gpu/drm/drm_mipi_dsi.c +++ b/drivers/gpu/drm/drm_mipi_dsi.c @@ -221,7 +221,7 @@ mipi_dsi_device_register_full(struct mipi_dsi_host *host, return dsi; } - device_set_node(&dsi->dev, of_fwnode_handle(info->node)); + dsi->dev.of_node = info->node; dsi->channel = info->channel; strlcpy(dsi->name, info->type, sizeof(dsi->name)); --- base-commit: 74cdd62cb4706515b454ce5bacb73b566c1d1bcf change-id: 20241024-fixup-5-15-5fdd68dae707 Best regards, -- Jason-JH.Lin <jason-jh.lin(a)mediatek.com>

11 months

1
0
0 0

[PATCH hotfix 6.12 v2] mm/mlock: set the correct prev on failure

by Wei Yang

After commit 94d7d9233951 ("mm: abstract the vma_merge()/split_vma() pattern for mprotect() et al."), if vma_modify_flags() return error, the vma is set to an error code. This will lead to an invalid prev be returned. Generally this shouldn't matter as the caller should treat an error as indicating state is now invalidated, however unfortunately apply_mlockall_flags() does not check for errors and assumes that mlock_fixup() correctly maintains prev even if an error were to occur. This patch fixes that assumption. [lorenzo: provide a better fix and rephrase the log] Fixes: 94d7d9233951 ("mm: abstract the vma_merge()/split_vma() pattern for mprotect() et al.") Signed-off-by: Wei Yang <richard.weiyang(a)gmail.com> CC: Liam R. Howlett <Liam.Howlett(a)Oracle.com> CC: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> CC: Vlastimil Babka <vbabka(a)suse.cz> CC: Jann Horn <jannh(a)google.com> Cc: <stable(a)vger.kernel.org> --- v2: rearrange the fix and change log per Lorenzo's suggestion add fix tag and cc stable --- mm/mlock.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/mm/mlock.c b/mm/mlock.c index e3e3dc2b2956..cde076fa7d5e 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -725,14 +725,17 @@ static int apply_mlockall_flags(int flags) } for_each_vma(vmi, vma) { + int error; vm_flags_t newflags; newflags = vma->vm_flags & ~VM_LOCKED_MASK; newflags |= to_add; - /* Ignore errors */ - mlock_fixup(&vmi, vma, &prev, vma->vm_start, vma->vm_end, - newflags); + error = mlock_fixup(&vmi, vma, &prev, vma->vm_start, vma->vm_end, + newflags); + /* Ignore errors, but prev needs fixing up. */ + if (error) + prev = vma; cond_resched(); } out: -- 2.34.1

11 months

5
6
0 0

[PATCH] Revert "drm/mipi-dsi: Set the fwnode for mipi_dsi_device"

by Jason-JH.Lin via B4 Relay

From: "Jason-JH.Lin" <jason-jh.lin(a)mediatek.com> This reverts commit ac88a1f41f93499df6f50fd18ea835e6ff4f3200. Reason for revert: 1. The commit [1] does not land on linux-5.15, so this patch does not fix anything. 2. Since the fw_device improvements series [2] does not land on linux-5.15, using device_set_fwnode() causes the panel to flash during bootup. Incorrect link management may lead to incorrect device initialization, affecting firmware node links and consumer relationships. The fwnode setting of panel to the DSI device would cause a DSI initialization error without series[2], so this patch was reverted to avoid using the incomplete fw_devlink functionality. [1] commit 3fb16866b51d ("driver core: fw_devlink: Make cycle detection more robust") [2] Link: https://lore.kernel.org/all/20230207014207.1678715-1-saravanak@google.com Cc: stable(a)vger.kernel.org # 5.15.169 Signed-off-by: Jason-JH.Lin <jason-jh.lin(a)mediatek.com> --- drivers/gpu/drm/drm_mipi_dsi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_mipi_dsi.c b/drivers/gpu/drm/drm_mipi_dsi.c index 24606b632009..468a3a7cb6a5 100644 --- a/drivers/gpu/drm/drm_mipi_dsi.c +++ b/drivers/gpu/drm/drm_mipi_dsi.c @@ -221,7 +221,7 @@ mipi_dsi_device_register_full(struct mipi_dsi_host *host, return dsi; } - device_set_node(&dsi->dev, of_fwnode_handle(info->node)); + dsi->dev.of_node = info->node; dsi->channel = info->channel; strlcpy(dsi->name, info->type, sizeof(dsi->name)); --- base-commit: 74cdd62cb4706515b454ce5bacb73b566c1d1bcf change-id: 20241024-fixup-5-15-5fdd68dae707 Best regards, -- Jason-JH.Lin <jason-jh.lin(a)mediatek.com>

11 months

4
5
0 0

[PATCH v3] Revert "drm/mipi-dsi: Set the fwnode for mipi_dsi_device"

by Jason-JH.Lin via B4 Relay

From: "Jason-JH.Lin" <jason-jh.lin(a)mediatek.com> This reverts commit ac88a1f41f93499df6f50fd18ea835e6ff4f3200. Reason for revert: 1. The commit [1] does not land on linux-5.15, so this patch does not fix anything. 2. Since the fw_device improvements series [2] does not land on linux-5.15, using device_set_fwnode() causes the panel to flash during bootup. Incorrect link management may lead to incorrect device initialization, affecting firmware node links and consumer relationships. The fwnode setting of panel to the DSI device would cause a DSI initialization error without series[2], so this patch was reverted to avoid using the incomplete fw_devlink functionality. [1] commit 3fb16866b51d ("driver core: fw_devlink: Make cycle detection more robust") [2] Link: https://lore.kernel.org/all/20230207014207.1678715-1-saravanak@google.com Cc: stable(a)vger.kernel.org # 5.15.169 Cc: stable(a)vger.kernel.org # 5.10.228 Cc: stable(a)vger.kernel.org # 5.4.284 Signed-off-by: Jason-JH.Lin <jason-jh.lin(a)mediatek.com> --- drivers/gpu/drm/drm_mipi_dsi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_mipi_dsi.c b/drivers/gpu/drm/drm_mipi_dsi.c index 24606b632009..468a3a7cb6a5 100644 --- a/drivers/gpu/drm/drm_mipi_dsi.c +++ b/drivers/gpu/drm/drm_mipi_dsi.c @@ -221,7 +221,7 @@ mipi_dsi_device_register_full(struct mipi_dsi_host *host, return dsi; } - device_set_node(&dsi->dev, of_fwnode_handle(info->node)); + dsi->dev.of_node = info->node; dsi->channel = info->channel; strlcpy(dsi->name, info->type, sizeof(dsi->name)); --- base-commit: 74cdd62cb4706515b454ce5bacb73b566c1d1bcf change-id: 20241024-fixup-5-15-5fdd68dae707 Best regards, -- Jason-JH.Lin <jason-jh.lin(a)mediatek.com>

11 months

3
2
0 0

Remove "block: fix sanity checks in blk_rq_map_user_bvec" from all stable queues

by Uday Shankar

Please remove the following patch from all stable queues: 2ff949441802 ("block: fix sanity checks in blk_rq_map_user_bvec") The above patch should not go into any stable tree unless accompanied by its (currently inflight) fix: https://lore.kernel.org/linux-block/20241028090840.446180-1-hch@lst.de/

11 months

2
1
0 0

+ vmscanmigrate-fix-double-decrement-on-node-stats-when-demoting-pages.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled Subject: vmscan,migrate: fix page count imbalance on node stats when demoting pages has been added to the -mm mm-hotfixes-unstable branch. Its filename is vmscanmigrate-fix-double-decrement-on-node-stats-when-demoting-pages.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Gregory Price <gourry(a)gourry.net> Subject: vmscan,migrate: fix page count imbalance on node stats when demoting pages Date: Fri, 25 Oct 2024 10:17:24 -0400 When numa balancing is enabled with demotion, vmscan will call migrate_pages when shrinking LRUs. migrate_pages will decrement the the node's isolated page count, leading to an imbalanced count when invoked from (MG)LRU code. The result is dmesg output like such: $ cat /proc/sys/vm/stat_refresh [77383.088417] vmstat_refresh: nr_isolated_anon -103212 [77383.088417] vmstat_refresh: nr_isolated_file -899642 This negative value may impact compaction and reclaim throttling. The following path produces the decrement: shrink_folio_list demote_folio_list migrate_pages migrate_pages_batch migrate_folio_move migrate_folio_done mod_node_page_state(-ve) <- decrement This path happens for SUCCESSFUL migrations, not failures. Typically callers to migrate_pages are required to handle putback/accounting for failures, but this is already handled in the shrink code. When accounting for migrations, instead do not decrement the count when the migration reason is MR_DEMOTION. As of v6.11, this demotion logic is the only source of MR_DEMOTION. Link: https://lkml.kernel.org/r/20241025141724.17927-1-gourry@gourry.net Fixes: 26aa2d199d6f ("mm/migrate: demote pages during reclaim") Signed-off-by: Gregory Price <gourry(a)gourry.net> Reviewed-by: Yang Shi <shy828301(a)gmail.com> Cc: Dave Hansen <dave.hansen(a)linux.intel.com> Cc: Huang Ying <ying.huang(a)intel.com> Cc: Oscar Salvador <osalvador(a)suse.de> Cc: Wei Xu <weixugc(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/migrate.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/mm/migrate.c~vmscanmigrate-fix-double-decrement-on-node-stats-when-demoting-pages +++ a/mm/migrate.c @@ -1178,7 +1178,7 @@ static void migrate_folio_done(struct fo * not accounted to NR_ISOLATED_*. They can be recognized * as __folio_test_movable */ - if (likely(!__folio_test_movable(src))) + if (likely(!__folio_test_movable(src)) && reason != MR_DEMOTION) mod_node_page_state(folio_pgdat(src), NR_ISOLATED_ANON + folio_is_file_lru(src), -folio_nr_pages(src)); _ Patches currently in -mm which might be from gourry(a)gourry.net are vmscanmigrate-fix-double-decrement-on-node-stats-when-demoting-pages.patch

11 months

1
0
0 0

[PATCH net 0/3] mptcp: sched: fix some lock issues

by Matthieu Baerts (NGI0)

Two small fixes related to the MPTCP packets scheduler: - Patch 1: add missing rcu_read_(un)lock(). A fix for >= 6.6. - Patch 2: remove unneeded lock when listing packets schedulers. A fix for >= 6.10. And some modifications in the MPTCP selftests: - Patch 3: a small addition to the MPTCP selftests to cover more code. Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org> --- Matthieu Baerts (NGI0) (3): mptcp: init: protect sched with rcu_read_lock mptcp: remove unneeded lock when listing scheds selftests: mptcp: list sysctl data net/mptcp/protocol.c | 2 ++ net/mptcp/sched.c | 2 -- tools/testing/selftests/net/mptcp/mptcp_connect.sh | 9 +++++++++ 3 files changed, 11 insertions(+), 2 deletions(-) --- base-commit: 3b05b9c36ddd01338e1352588f2ec1ea23f97d43 change-id: 20241021-net-mptcp-sched-lock-10dfc75d1e00 Best regards, -- Matthieu Baerts (NGI0) <matttbe(a)kernel.org>

11 months

5
8
0 0

[PATCH 0/2] Expand comment

by Linus Walleij

Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org> --- Linus Walleij (2): ARM: entry: Do a dummy read from VMAP shadow ARM: entry: expand comment in __switch_to arch/arm/kernel/entry-armv.S | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-) --- base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc change-id: 20241028-comments-in-switch-to-0e24480e8495 Best regards, -- Linus Walleij <linus.walleij(a)linaro.org>

11 months

1
2
0 0

[PATCH 0/2] phy: tegra: xusb: fix device(_node) release in tegra210_xusb_padctl_probe

by Javier Carrasco

This series fixes two similar issues in tegra_210_xusb_padctl_probe(). Two resources (device_node *np and the device within struct platform_device *pdev) are acquired, but never released after they are no longer needed. To avoid leaking such resources, calls to of_node_put() and put_device() must be added. In this case, the resources are not assigned anywhere in the probe function, and they must be released in the error and success paths. If I overlooked any assignment, please report it as an error. I have tried to affect the existing code and execution paths as less as possible by releasing the device and device_node as soon as possible, but if goto jumps to labels for the cleanup are desired, I can go for that approach instead. This series has been compiled successfully, but not tested on real hardware as I don't have access to it. Any validation with the affected hardware is always welcome. Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com> --- Javier Carrasco (2): phy: tegra: xusb: fix device release in tegra210_xusb_padctl_probe phy: tegra: xusb: fix device node release in tegra210_xusb_padctl_probe drivers/phy/tegra/xusb-tegra210.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) --- base-commit: dec9255a128e19c5fcc3bdb18175d78094cc624d change-id: 20241028-phy-tegra-xusb-tegra210-put_device-ff7ae76403b4 Best regards, -- Javier Carrasco <javier.carrasco.cruz(a)gmail.com>

11 months

1
2
0 0

[PATCH v8 3/3] tpm: Lazily flush the auth session

by Jarkko Sakkinen

Move the allocation of chip->auth to tpm2_start_auth_session() so that this field can be used as flag to tell whether auth session is active or not. Instead of flushing and reloading the auth session for every transaction separately, keep the session open unless /dev/tpm0 is used. Reported-by: Pengyu Ma <mapengyu(a)gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219229 Cc: stable(a)vger.kernel.org # v6.10+ Fixes: 7ca110f2679b ("tpm: Address !chip->auth in tpm_buf_append_hmac_session*()") Tested-by: Pengyu Ma <mapengyu(a)gmail.com> Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org> --- v8: - Since auth session and null key are flushed at a same time, only either needs to be checked. Addresses and a remark from James Bottomley few revisions ago. - kfree_sensitive() - Effectively squash top three patches given the simplifications. v7: - No changes. v6: - No changes. v5: - No changes. v4: - Changed as bug. v3: - Refined the commit message. - Removed the conditional for applying TPM2_SA_CONTINUE_SESSION only when /dev/tpm0 is open. It is not required as the auth session is flushed, not saved. v2: - A new patch. --- drivers/char/tpm/tpm-chip.c | 10 +++++++ drivers/char/tpm/tpm-dev-common.c | 3 +++ drivers/char/tpm/tpm-interface.c | 6 +++-- drivers/char/tpm/tpm2-sessions.c | 45 ++++++++++++++++++------------- 4 files changed, 44 insertions(+), 20 deletions(-) diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c index 854546000c92..1ff99a7091bb 100644 --- a/drivers/char/tpm/tpm-chip.c +++ b/drivers/char/tpm/tpm-chip.c @@ -674,6 +674,16 @@ EXPORT_SYMBOL_GPL(tpm_chip_register); */ void tpm_chip_unregister(struct tpm_chip *chip) { +#ifdef CONFIG_TCG_TPM2_HMAC + int rc; + + rc = tpm_try_get_ops(chip); + if (!rc) { + tpm2_end_auth_session(chip); + tpm_put_ops(chip); + } +#endif + tpm_del_legacy_sysfs(chip); if (tpm_is_hwrng_enabled(chip)) hwrng_unregister(&chip->hwrng); diff --git a/drivers/char/tpm/tpm-dev-common.c b/drivers/char/tpm/tpm-dev-common.c index 30b4c288c1bb..c7a88fa7b0fc 100644 --- a/drivers/char/tpm/tpm-dev-common.c +++ b/drivers/char/tpm/tpm-dev-common.c @@ -27,6 +27,9 @@ static ssize_t tpm_dev_transmit(struct tpm_chip *chip, struct tpm_space *space, struct tpm_header *header = (void *)buf; ssize_t ret, len; + if (chip->flags & TPM_CHIP_FLAG_TPM2) + tpm2_end_auth_session(chip); + ret = tpm2_prepare_space(chip, space, buf, bufsiz); /* If the command is not implemented by the TPM, synthesize a * response with a TPM2_RC_COMMAND_CODE return for user-space. diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c index 5da134f12c9a..8134f002b121 100644 --- a/drivers/char/tpm/tpm-interface.c +++ b/drivers/char/tpm/tpm-interface.c @@ -379,10 +379,12 @@ int tpm_pm_suspend(struct device *dev) rc = tpm_try_get_ops(chip); if (!rc) { - if (chip->flags & TPM_CHIP_FLAG_TPM2) + if (chip->flags & TPM_CHIP_FLAG_TPM2) { + tpm2_end_auth_session(chip); tpm2_shutdown(chip, TPM2_SU_STATE); - else + } else { rc = tpm1_pm_suspend(chip, tpm_suspend_pcr); + } tpm_put_ops(chip); } diff --git a/drivers/char/tpm/tpm2-sessions.c b/drivers/char/tpm/tpm2-sessions.c index 950a3e48293b..03145a465b5d 100644 --- a/drivers/char/tpm/tpm2-sessions.c +++ b/drivers/char/tpm/tpm2-sessions.c @@ -333,6 +333,9 @@ void tpm_buf_append_hmac_session(struct tpm_chip *chip, struct tpm_buf *buf, } #ifdef CONFIG_TCG_TPM2_HMAC + /* The first write to /dev/tpm{rm0} will flush the session. */ + attributes |= TPM2_SA_CONTINUE_SESSION; + /* * The Architecture Guide requires us to strip trailing zeros * before computing the HMAC @@ -484,7 +487,8 @@ static void tpm2_KDFe(u8 z[EC_PT_SZ], const char *str, u8 *pt_u, u8 *pt_v, sha256_final(&sctx, out); } -static void tpm_buf_append_salt(struct tpm_buf *buf, struct tpm_chip *chip) +static void tpm_buf_append_salt(struct tpm_buf *buf, struct tpm_chip *chip, + struct tpm2_auth *auth) { struct crypto_kpp *kpp; struct kpp_request *req; @@ -543,7 +547,7 @@ static void tpm_buf_append_salt(struct tpm_buf *buf, struct tpm_chip *chip) sg_set_buf(&s[0], chip->null_ec_key_x, EC_PT_SZ); sg_set_buf(&s[1], chip->null_ec_key_y, EC_PT_SZ); kpp_request_set_input(req, s, EC_PT_SZ*2); - sg_init_one(d, chip->auth->salt, EC_PT_SZ); + sg_init_one(d, auth->salt, EC_PT_SZ); kpp_request_set_output(req, d, EC_PT_SZ); crypto_kpp_compute_shared_secret(req); kpp_request_free(req); @@ -554,8 +558,7 @@ static void tpm_buf_append_salt(struct tpm_buf *buf, struct tpm_chip *chip) * This works because KDFe fully consumes the secret before it * writes the salt */ - tpm2_KDFe(chip->auth->salt, "SECRET", x, chip->null_ec_key_x, - chip->auth->salt); + tpm2_KDFe(auth->salt, "SECRET", x, chip->null_ec_key_x, auth->salt); out: crypto_free_kpp(kpp); @@ -853,7 +856,9 @@ int tpm_buf_check_hmac_response(struct tpm_chip *chip, struct tpm_buf *buf, if (rc) /* manually close the session if it wasn't consumed */ tpm2_flush_context(chip, auth->handle); - memzero_explicit(auth, sizeof(*auth)); + + kfree_sensitive(auth); + chip->auth = NULL; } else { /* reset for next use */ auth->session = TPM_HEADER_SIZE; @@ -881,7 +886,8 @@ void tpm2_end_auth_session(struct tpm_chip *chip) return; tpm2_flush_context(chip, auth->handle); - memzero_explicit(auth, sizeof(*auth)); + kfree_sensitive(auth); + chip->auth = NULL; } EXPORT_SYMBOL(tpm2_end_auth_session); @@ -962,16 +968,20 @@ static int tpm2_load_null(struct tpm_chip *chip, u32 *null_key) */ int tpm2_start_auth_session(struct tpm_chip *chip) { + struct tpm2_auth *auth; struct tpm_buf buf; - struct tpm2_auth *auth = chip->auth; - int rc; u32 null_key; + int rc; - if (!auth) { - dev_warn_once(&chip->dev, "auth session is not active\n"); + if (chip->auth) { + dev_warn_once(&chip->dev, "auth session is active\n"); return 0; } + auth = kzalloc(sizeof(*auth), GFP_KERNEL); + if (!auth) + return -ENOMEM; + rc = tpm2_load_null(chip, &null_key); if (rc) goto out; @@ -992,7 +1002,7 @@ int tpm2_start_auth_session(struct tpm_chip *chip) tpm_buf_append(&buf, auth->our_nonce, sizeof(auth->our_nonce)); /* append encrypted salt and squirrel away unencrypted in auth */ - tpm_buf_append_salt(&buf, chip); + tpm_buf_append_salt(&buf, chip, auth); /* session type (HMAC, audit or policy) */ tpm_buf_append_u8(&buf, TPM2_SE_HMAC); @@ -1014,10 +1024,13 @@ int tpm2_start_auth_session(struct tpm_chip *chip) tpm_buf_destroy(&buf); - if (rc) - goto out; + if (rc == TPM2_RC_SUCCESS) { + chip->auth = auth; + return 0; + } - out: +out: + kfree_sensitive(auth); return rc; } EXPORT_SYMBOL(tpm2_start_auth_session); @@ -1367,10 +1380,6 @@ int tpm2_sessions_init(struct tpm_chip *chip) return rc; } - chip->auth = kmalloc(sizeof(*chip->auth), GFP_KERNEL); - if (!chip->auth) - return -ENOMEM; - return rc; } #endif /* CONFIG_TCG_TPM2_HMAC */ -- 2.47.0

11 months

2
2
0 0

[PATCH] dmaengine: ti: dma-crossbar: Add missing put_device in ti_am335x_xbar_route_allocate

by Javier Carrasco

The refcount of the device obtained with of_find_device_by_node() must be decremented when the device is no longer required. Add the missing calls to put_device(&pdev->dev) in the error paths of ti_am335x_xbar_route_allocate(). Cc: stable(a)vger.kernel.org Fixes: 42dbdcc6bf96 ("dmaengine: ti-dma-crossbar: Add support for crossbar on AM33xx/AM43xx") Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com> --- Similar to what commit 615a4bfc426e ("dmaengine: ti: Add missing put_device in ti_dra7_xbar_route_allocate") did for dra7, where the calls to put_device were also missing in the error paths. --- drivers/dma/ti/dma-crossbar.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/dma/ti/dma-crossbar.c b/drivers/dma/ti/dma-crossbar.c index 7f17ee87a6dc..ae596b3fc636 100644 --- a/drivers/dma/ti/dma-crossbar.c +++ b/drivers/dma/ti/dma-crossbar.c @@ -81,18 +81,22 @@ static void *ti_am335x_xbar_route_allocate(struct of_phandle_args *dma_spec, struct ti_am335x_xbar_data *xbar = platform_get_drvdata(pdev); struct ti_am335x_xbar_map *map; - if (dma_spec->args_count != 3) + if (dma_spec->args_count != 3) { + put_device(&pdev->dev); return ERR_PTR(-EINVAL); + } if (dma_spec->args[2] >= xbar->xbar_events) { dev_err(&pdev->dev, "Invalid XBAR event number: %d\n", dma_spec->args[2]); + put_device(&pdev->dev); return ERR_PTR(-EINVAL); } if (dma_spec->args[0] >= xbar->dma_requests) { dev_err(&pdev->dev, "Invalid DMA request line number: %d\n", dma_spec->args[0]); + put_device(&pdev->dev); return ERR_PTR(-EINVAL); } @@ -100,12 +104,14 @@ static void *ti_am335x_xbar_route_allocate(struct of_phandle_args *dma_spec, dma_spec->np = of_parse_phandle(ofdma->of_node, "dma-masters", 0); if (!dma_spec->np) { dev_err(&pdev->dev, "Can't get DMA master\n"); + put_device(&pdev->dev); return ERR_PTR(-EINVAL); } map = kzalloc(sizeof(*map), GFP_KERNEL); if (!map) { of_node_put(dma_spec->np); + put_device(&pdev->dev); return ERR_PTR(-ENOMEM); } --- base-commit: dec9255a128e19c5fcc3bdb18175d78094cc624d change-id: 20241028-ti-dma-crossbar-put_device-39ebab498011 Best regards, -- Javier Carrasco <javier.carrasco.cruz(a)gmail.com>

11 months

1
0
0 0

[PATCH v3] drm/xe/ufence: Flush xe ordered_wq in case of ufence timeout

by Nirmoy Das

Flush xe ordered_wq in case of ufence timeout which is observed on LNL and that points to recent scheduling issue with E-cores. This is similar to the recent fix: commit e51527233804 ("drm/xe/guc/ct: Flush g2h worker in case of g2h response timeout") and should be removed once there is a E-core scheduling fix for LNL. v2: Add platform check(Himal) s/__flush_workqueue/flush_workqueue(Jani) v3: Remove gfx platform check as the issue related to cpu platform(John) Cc: Badal Nilawar <badal.nilawar(a)intel.com> Cc: Matthew Auld <matthew.auld(a)intel.com> Cc: John Harrison <John.C.Harrison(a)Intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray(a)intel.com> Cc: Lucas De Marchi <lucas.demarchi(a)intel.com> Cc: <stable(a)vger.kernel.org> # v6.11+ Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2754 Suggested-by: Matthew Brost <matthew.brost(a)intel.com> Signed-off-by: Nirmoy Das <nirmoy.das(a)intel.com> Reviewed-by: Matthew Brost <matthew.brost(a)intel.com> --- drivers/gpu/drm/xe/xe_wait_user_fence.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_wait_user_fence.c b/drivers/gpu/drm/xe/xe_wait_user_fence.c index f5deb81eba01..886c9862d89c 100644 --- a/drivers/gpu/drm/xe/xe_wait_user_fence.c +++ b/drivers/gpu/drm/xe/xe_wait_user_fence.c @@ -155,6 +155,17 @@ int xe_wait_user_fence_ioctl(struct drm_device *dev, void *data, } if (!timeout) { + /* + * This is analogous to e51527233804 ("drm/xe/guc/ct: Flush g2h worker + * in case of g2h response timeout") + * + * TODO: Drop this change once workqueue scheduling delay issue is + * fixed on LNL Hybrid CPU. + */ + flush_workqueue(xe->ordered_wq); + err = do_compare(addr, args->value, args->mask, args->op); + if (err <= 0) + break; err = -ETIME; break; } -- 2.46.0

11 months

2
2
0 0

[PATCH] f2fs: fix to do sanity check on node blkaddr in truncate_node()

by Chao Yu

syzbot reports a f2fs bug as below: ------------[ cut here ]------------ kernel BUG at fs/f2fs/segment.c:2534! RIP: 0010:f2fs_invalidate_blocks+0x35f/0x370 fs/f2fs/segment.c:2534 Call Trace: truncate_node+0x1ae/0x8c0 fs/f2fs/node.c:909 f2fs_remove_inode_page+0x5c2/0x870 fs/f2fs/node.c:1288 f2fs_evict_inode+0x879/0x15c0 fs/f2fs/inode.c:856 evict+0x4e8/0x9b0 fs/inode.c:723 f2fs_handle_failed_inode+0x271/0x2e0 fs/f2fs/inode.c:986 f2fs_create+0x357/0x530 fs/f2fs/namei.c:394 lookup_open fs/namei.c:3595 [inline] open_last_lookups fs/namei.c:3694 [inline] path_openat+0x1c03/0x3590 fs/namei.c:3930 do_filp_open+0x235/0x490 fs/namei.c:3960 do_sys_openat2+0x13e/0x1d0 fs/open.c:1415 do_sys_open fs/open.c:1430 [inline] __do_sys_openat fs/open.c:1446 [inline] __se_sys_openat fs/open.c:1441 [inline] __x64_sys_openat+0x247/0x2a0 fs/open.c:1441 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0010:f2fs_invalidate_blocks+0x35f/0x370 fs/f2fs/segment.c:2534 The root cause is: on a fuzzed image, blkaddr in nat entry may be corrupted, then it will cause system panic when using it in f2fs_invalidate_blocks(), to avoid this, let's add sanity check on nat blkaddr in truncate_node(). Reported-by: syzbot+33379ce4ac76acf7d0c7(a)syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/0000000000009a6cd706224ca720@googl… Cc: stable(a)vger.kernel.org Signed-off-by: Chao Yu <chao(a)kernel.org> --- fs/f2fs/node.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index 59b13ff243fa..af36c6d6542b 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -905,6 +905,16 @@ static int truncate_node(struct dnode_of_data *dn) if (err) return err; + if (ni.blk_addr != NEW_ADDR && + !f2fs_is_valid_blkaddr(sbi, ni.blk_addr, DATA_GENERIC_ENHANCE)) { + f2fs_err_ratelimited(sbi, + "nat entry is corrupted, run fsck to fix it, ino:%u, " + "nid:%u, blkaddr:%u", ni.ino, ni.nid, ni.blk_addr); + set_sbi_flag(sbi, SBI_NEED_FSCK); + f2fs_handle_error(sbi, ERROR_INCONSISTENT_NAT); + return -EFSCORRUPTED; + } + /* Deallocate node address */ f2fs_invalidate_blocks(sbi, ni.blk_addr); dec_valid_node_count(sbi, dn->inode, dn->nid == dn->inode->i_ino); -- 2.40.1

11 months

2
1
0 0

[PATCH] KVM: arm64: Mark the VM as dead for failed initializations

by Raghavendra Rao Ananta

Syzbot hit the following WARN_ON() in kvm_timer_update_irq(): WARNING: CPU: 0 PID: 3281 at arch/arm64/kvm/arch_timer.c:459 kvm_timer_update_irq+0x21c/0x394 Call trace: kvm_timer_update_irq+0x21c/0x394 arch/arm64/kvm/arch_timer.c:459 kvm_timer_vcpu_reset+0x158/0x684 arch/arm64/kvm/arch_timer.c:968 kvm_reset_vcpu+0x3b4/0x560 arch/arm64/kvm/reset.c:264 kvm_vcpu_set_target arch/arm64/kvm/arm.c:1553 [inline] kvm_arch_vcpu_ioctl_vcpu_init arch/arm64/kvm/arm.c:1573 [inline] kvm_arch_vcpu_ioctl+0x112c/0x1b3c arch/arm64/kvm/arm.c:1695 kvm_vcpu_ioctl+0x4ec/0xf74 virt/kvm/kvm_main.c:4658 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl fs/ioctl.c:893 [inline] __arm64_sys_ioctl+0x108/0x184 fs/ioctl.c:893 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline] invoke_syscall+0x78/0x1b8 arch/arm64/kernel/syscall.c:49 el0_svc_common+0xe8/0x1b0 arch/arm64/kernel/syscall.c:132 do_el0_svc+0x40/0x50 arch/arm64/kernel/syscall.c:151 el0_svc+0x54/0x14c arch/arm64/kernel/entry-common.c:712 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598 The sequence that led to the report is when KVM_ARM_VCPU_INIT ioctl is invoked after a failed first KVM_RUN. In a general sense though, since kvm_arch_vcpu_run_pid_change() doesn't tear down any of the past initiatializations, it's possible that the VM's state could be left half-baked. Any upcoming ioctls could behave erroneously because of this. Since these late vCPU initializations is past the point of attributing the failures to any ioctl, instead of tearing down each of the previous setups, simply mark the VM as dead, gving an opportunity for the userspace to close and try again. Cc: <stable(a)vger.kernel.org> Reported-by: syzbot <syzkaller(a)googlegroups.com> Suggested-by: Oliver Upton <oliver.upton(a)linux.dev> Signed-off-by: Raghavendra Rao Ananta <rananta(a)google.com> --- arch/arm64/kvm/arm.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index a0d01c46e4084..ae3551bc98aeb 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -821,12 +821,12 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu) */ ret = kvm_vgic_map_resources(kvm); if (ret) - return ret; + goto out_err; } ret = kvm_finalize_sys_regs(vcpu); if (ret) - return ret; + goto out_err; /* * This needs to happen after any restriction has been applied @@ -836,16 +836,16 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu) ret = kvm_timer_enable(vcpu); if (ret) - return ret; + goto out_err; ret = kvm_arm_pmu_v3_enable(vcpu); if (ret) - return ret; + goto out_err; if (is_protected_kvm_enabled()) { ret = pkvm_create_hyp_vm(kvm); if (ret) - return ret; + goto out_err; } if (!irqchip_in_kernel(kvm)) { @@ -869,6 +869,10 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu) mutex_unlock(&kvm->arch.config_lock); return ret; + +out_err: + kvm_vm_dead(kvm); + return ret; } bool kvm_arch_intc_initialized(struct kvm *kvm) -- 2.47.0.163.g1226f6d8fa-goog

11 months

3
5
0 0

[PATCH v2 0/2] clocksource/drivers/timer-ti-dm: fix child node refcount handling

by Javier Carrasco

This series adds the missing calls to of_node_put(arm_timer) to release the resource, and then switches to the more robust approach that makes use of the automatic cleanup facility (not available for all stable kernels). Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com> --- Changes in v2: - Add second patch for automatic cleanup. - Link to v1: https://lore.kernel.org/r/20241013-timer-ti-dm-systimer-of_node_put-v1-1-0c… --- Javier Carrasco (2): clocksource/drivers/timer-ti-dm: fix child node refcount handling clocksource/drivers/timer-ti-dm: automate device_node cleanup in dmtimer_percpu_quirk_init() drivers/clocksource/timer-ti-dm-systimer.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- base-commit: d61a00525464bfc5fe92c6ad713350988e492b88 change-id: 20241013-timer-ti-dm-systimer-of_node_put-d42735687698 Best regards, -- Javier Carrasco <javier.carrasco.cruz(a)gmail.com>

11 months

1
1
0 0

[PATCH 0/2] virt: fsl: fix missing of_node_put() on early exit from for_each_compatible_node()

by Javier Carrasco

This short series fixes an old bug that only happens if a memory allocation fails, which might be the reason why it was never found. When at it an unnecessary jump to a label has been removed to simplify the error path and avoid jumps out of the loop, which might have hidden the bug while reading the code. Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com> --- Javier Carrasco (2): virt: fsl: fix missing of_node_put() on early exit from for_each_compatible_node() virt: fsl: refactor out_of_memory label drivers/virt/fsl_hypervisor.c | 23 ++++++++++------------- 1 file changed, 10 insertions(+), 13 deletions(-) --- base-commit: dec9255a128e19c5fcc3bdb18175d78094cc624d change-id: 20241028-fsl_hypervisor-of_node_put-5051e664f038 Best regards, -- Javier Carrasco <javier.carrasco.cruz(a)gmail.com>

11 months

1
1
0 0

[PATCH v8 1/3] tpm: Return tpm2_sessions_init() when null key creation fails

by Jarkko Sakkinen

Do not continue tpm2_sessions_init() further if the null key pair creation fails. Cc: stable(a)vger.kernel.org # v6.10+ Fixes: d2add27cf2b8 ("tpm: Add NULL primary creation") Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org> --- v8: - Refine commit message. v7: - Add the error message back but fix it up a bit: 1. Remove 'TPM:' given dev_err(). 2. s/NULL/null/ as this has nothing to do with the macro in libc. 3. Fix the reasoning: null key creation failed v6: - Address: https://lore.kernel.org/linux-integrity/69c893e7-6b87-4daa-80db-44d1120e80f… as TPM RC is taken care of at the call site. Add also the missing documentation for the return values. v5: - Do not print klog messages on error, as tpm2_save_context() already takes care of this. v4: - Fixed up stable version. v3: - Handle TPM and POSIX error separately and return -ENODEV always back to the caller. v2: - Refined the commit message. --- drivers/char/tpm/tpm2-sessions.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/char/tpm/tpm2-sessions.c b/drivers/char/tpm/tpm2-sessions.c index d3521aadd43e..a0306126e86c 100644 --- a/drivers/char/tpm/tpm2-sessions.c +++ b/drivers/char/tpm/tpm2-sessions.c @@ -1347,14 +1347,21 @@ static int tpm2_create_null_primary(struct tpm_chip *chip) * * Derive and context save the null primary and allocate memory in the * struct tpm_chip for the authorizations. + * + * Return: + * * 0 - OK + * * -errno - A system error + * * TPM_RC - A TPM error */ int tpm2_sessions_init(struct tpm_chip *chip) { int rc; rc = tpm2_create_null_primary(chip); - if (rc) - dev_err(&chip->dev, "TPM: security failed (NULL seed derivation): %d\n", rc); + if (rc) { + dev_err(&chip->dev, "null key creation failed with %d\n", rc); + return rc; + } chip->auth = kmalloc(sizeof(*chip->auth), GFP_KERNEL); if (!chip->auth) -- 2.47.0

11 months

2
2
0 0

[PATCH] clocksource/drivers/timer-ti-dm: fix child node refcount handling

by Javier Carrasco

of_find_compatible_node() increments the node's refcount, and it must be decremented again with a call to of_node_put() when the pointer is no longer required to avoid leaking memory. Add the missing calls to of_node_put() in dmtimer_percpu_quirck_init() for the 'arm_timer' device node. Cc: stable(a)vger.kernel.org Fixes: 25de4ce5ed02 ("clocksource/drivers/timer-ti-dm: Handle dra7 timer wrap errata i940") Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com> --- drivers/clocksource/timer-ti-dm-systimer.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/clocksource/timer-ti-dm-systimer.c b/drivers/clocksource/timer-ti-dm-systimer.c index c2dcd8d68e45..23be1d21ce21 100644 --- a/drivers/clocksource/timer-ti-dm-systimer.c +++ b/drivers/clocksource/timer-ti-dm-systimer.c @@ -691,8 +691,10 @@ static int __init dmtimer_percpu_quirk_init(struct device_node *np, u32 pa) arm_timer = of_find_compatible_node(NULL, NULL, "arm,armv7-timer"); if (of_device_is_available(arm_timer)) { pr_warn_once("ARM architected timer wrap issue i940 detected\n"); + of_node_put(arm_timer); return 0; } + of_node_put(arm_timer); if (pa == 0x4882c000) /* dra7 dmtimer15 */ return dmtimer_percpu_timer_init(np, 0); --- base-commit: d61a00525464bfc5fe92c6ad713350988e492b88 change-id: 20241013-timer-ti-dm-systimer-of_node_put-d42735687698 Best regards, -- Javier Carrasco <javier.carrasco.cruz(a)gmail.com>

11 months

2
3
0 0

[PATCH hotfix 6.12] mm, mmap: limit THP aligment of anonymous mappings to PMD-aligned sizes

by Vlastimil Babka

Since commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries") a mmap() of anonymous memory without a specific address hint and of at least PMD_SIZE will be aligned to PMD so that it can benefit from a THP backing page. However this change has been shown to regress some workloads significantly. [1] reports regressions in various spec benchmarks, with up to 600% slowdown of the cactusBSSN benchmark on some platforms. The benchmark seems to create many mappings of 4632kB, which would have merged to a large THP-backed area before commit efa7df3e3bb5 and now they are fragmented to multiple areas each aligned to PMD boundary with gaps between. The regression then seems to be caused mainly due to the benchmark's memory access pattern suffering from TLB or cache aliasing due to the aligned boundaries of the individual areas. Another known regression bisected to commit efa7df3e3bb5 is darktable [2] [3] and early testing suggests this patch fixes the regression there as well. To fix the regression but still try to benefit from THP-friendly anonymous mapping alignment, add a condition that the size of the mapping must be a multiple of PMD size instead of at least PMD size. In case of many odd-sized mapping like the cactusBSSN creates, those will stop being aligned and with gaps between, and instead naturally merge again. Reported-by: Michael Matz <matz(a)suse.de> Debugged-by: Gabriel Krisman Bertazi <gabriel(a)krisman.be> Closes: https://bugzilla.suse.com/show_bug.cgi?id=1229012 [1] Reported-by: Matthias Bodenbinder <matthias(a)bodenbinder.de> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219366 [2] Closes: https://lore.kernel.org/all/2050f0d4-57b0-481d-bab8-05e8d48fed0c@leemhuis.i… [3] Fixes: efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries") Cc: <stable(a)vger.kernel.org> Cc: Rik van Riel <riel(a)surriel.com> Cc: Yang Shi <yang(a)os.amperecomputing.com> Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz> --- mm/mmap.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/mm/mmap.c b/mm/mmap.c index 9c0fb43064b5..a5297cfb1dfc 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -900,7 +900,8 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len, if (get_area) { addr = get_area(file, addr, len, pgoff, flags); - } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) { + } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) + && IS_ALIGNED(len, PMD_SIZE)) { /* Ensures that larger anonymous mappings are THP aligned. */ addr = thp_get_unmapped_area_vmflags(file, addr, len, pgoff, flags, vm_flags); -- 2.47.0

11 months

5
8
0 0

[PATCH v2] drm: xlnx: zynqmp_dpsub: fix hotplug detection

by Steffen Dirkwinkel

From: Steffen Dirkwinkel <s.dirkwinkel(a)beckhoff.com> drm_kms_helper_poll_init needs to be called after zynqmp_dpsub_kms_init. zynqmp_dpsub_kms_init creates the connector and without it we don't enable hotplug detection. Fixes: eb2d64bfcc17 ("drm: xlnx: zynqmp_dpsub: Report HPD through the bridge") Signed-off-by: Steffen Dirkwinkel <s.dirkwinkel(a)beckhoff.com> --- drivers/gpu/drm/xlnx/zynqmp_kms.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/xlnx/zynqmp_kms.c b/drivers/gpu/drm/xlnx/zynqmp_kms.c index bd1368df7870..311397cee5ca 100644 --- a/drivers/gpu/drm/xlnx/zynqmp_kms.c +++ b/drivers/gpu/drm/xlnx/zynqmp_kms.c @@ -509,12 +509,12 @@ int zynqmp_dpsub_drm_init(struct zynqmp_dpsub *dpsub) if (ret) return ret; - drm_kms_helper_poll_init(drm); - ret = zynqmp_dpsub_kms_init(dpsub); if (ret < 0) goto err_poll_fini; + drm_kms_helper_poll_init(drm); + /* Reset all components and register the DRM device. */ drm_mode_config_reset(drm); -- 2.47.0

11 months

2
1
0 0

[tip: sched/urgent] sched/numa: Fix the potential null pointer dereference in task_numa_work()

by tip-bot2 for Shawn Wang

The following commit has been merged into the sched/urgent branch of tip: Commit-ID: 9c70b2a33cd2aa6a5a59c5523ef053bd42265209 Gitweb: https://git.kernel.org/tip/9c70b2a33cd2aa6a5a59c5523ef053bd42265209 Author: Shawn Wang <shawnwang(a)linux.alibaba.com> AuthorDate: Fri, 25 Oct 2024 10:22:08 +08:00 Committer: Peter Zijlstra <peterz(a)infradead.org> CommitterDate: Sat, 26 Oct 2024 09:28:37 +02:00 sched/numa: Fix the potential null pointer dereference in task_numa_work() When running stress-ng-vm-segv test, we found a null pointer dereference error in task_numa_work(). Here is the backtrace: [323676.066985] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020 ...... [323676.067108] CPU: 35 PID: 2694524 Comm: stress-ng-vm-se ...... [323676.067113] pstate: 23401009 (nzCv daif +PAN -UAO +TCO +DIT +SSBS BTYPE=--) [323676.067115] pc : vma_migratable+0x1c/0xd0 [323676.067122] lr : task_numa_work+0x1ec/0x4e0 [323676.067127] sp : ffff8000ada73d20 [323676.067128] x29: ffff8000ada73d20 x28: 0000000000000000 x27: 000000003e89f010 [323676.067130] x26: 0000000000080000 x25: ffff800081b5c0d8 x24: ffff800081b27000 [323676.067133] x23: 0000000000010000 x22: 0000000104d18cc0 x21: ffff0009f7158000 [323676.067135] x20: 0000000000000000 x19: 0000000000000000 x18: ffff8000ada73db8 [323676.067138] x17: 0001400000000000 x16: ffff800080df40b0 x15: 0000000000000035 [323676.067140] x14: ffff8000ada73cc8 x13: 1fffe0017cc72001 x12: ffff8000ada73cc8 [323676.067142] x11: ffff80008001160c x10: ffff000be639000c x9 : ffff8000800f4ba4 [323676.067145] x8 : ffff000810375000 x7 : ffff8000ada73974 x6 : 0000000000000001 [323676.067147] x5 : 0068000b33e26707 x4 : 0000000000000001 x3 : ffff0009f7158000 [323676.067149] x2 : 0000000000000041 x1 : 0000000000004400 x0 : 0000000000000000 [323676.067152] Call trace: [323676.067153] vma_migratable+0x1c/0xd0 [323676.067155] task_numa_work+0x1ec/0x4e0 [323676.067157] task_work_run+0x78/0xd8 [323676.067161] do_notify_resume+0x1ec/0x290 [323676.067163] el0_svc+0x150/0x160 [323676.067167] el0t_64_sync_handler+0xf8/0x128 [323676.067170] el0t_64_sync+0x17c/0x180 [323676.067173] Code: d2888001 910003fd f9000bf3 aa0003f3 (f9401000) [323676.067177] SMP: stopping secondary CPUs [323676.070184] Starting crashdump kernel... stress-ng-vm-segv in stress-ng is used to stress test the SIGSEGV error handling function of the system, which tries to cause a SIGSEGV error on return from unmapping the whole address space of the child process. Normally this program will not cause kernel crashes. But before the munmap system call returns to user mode, a potential task_numa_work() for numa balancing could be added and executed. In this scenario, since the child process has no vma after munmap, the vma_next() in task_numa_work() will return a null pointer even if the vma iterator restarts from 0. Recheck the vma pointer before dereferencing it in task_numa_work(). Fixes: 214dbc428137 ("sched: convert to vma iterator") Signed-off-by: Shawn Wang <shawnwang(a)linux.alibaba.com> Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org> Cc: stable(a)vger.kernel.org # v6.2+ Link: https://lkml.kernel.org/r/20241025022208.125527-1-shawnwang@linux.alibaba.c… --- kernel/sched/fair.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 8796146..2d16c85 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3369,7 +3369,7 @@ retry_pids: vma = vma_next(&vmi); } - do { + for (; vma; vma = vma_next(&vmi)) { if (!vma_migratable(vma) || !vma_policy_mof(vma) || is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_MIXEDMAP)) { trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_UNSUITABLE); @@ -3491,7 +3491,7 @@ retry_pids: */ if (vma_pids_forced) break; - } for_each_vma(vmi, vma); + } /* * If no VMAs are remaining and VMAs were skipped due to the PID

11 months

1
0
0 0

[PATCH 1/3] gpiolib: fix debugfs newline separators

by Johan Hovold

The gpiolib debugfs interface exports a list of all gpio chips in a system and the state of their pins. The gpio chip sections are supposed to be separated by a newline character, but a long-standing bug prevents the separator from being included when output is generated in multiple sessions, making the output inconsistent and hard to read. Make sure to only suppress the newline separator at the beginning of the file as intended. Fixes: f9c4a31f6150 ("gpiolib: Use seq_file's iterator interface") Cc: stable(a)vger.kernel.org # 3.7 Cc: Thierry Reding <treding(a)nvidia.com> Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org> --- drivers/gpio/gpiolib.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index d5952ab7752c..e27488a90bc9 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -4926,6 +4926,8 @@ static void *gpiolib_seq_start(struct seq_file *s, loff_t *pos) return NULL; s->private = priv; + if (*pos > 0) + priv->newline = true; priv->idx = srcu_read_lock(&gpio_devices_srcu); list_for_each_entry_srcu(gdev, &gpio_devices, list, -- 2.45.2

11 months

1
0
0 0

[PATCH AUTOSEL 6.11 01/32] xfrm: extract dst lookup parameters into a struct

by Sasha Levin

From: Eyal Birger <eyal.birger(a)gmail.com> [ Upstream commit e509996b16728e37d5a909a5c63c1bd64f23b306 ] Preparation for adding more fields to dst lookup functions without changing their signatures. Signed-off-by: Eyal Birger <eyal.birger(a)gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert(a)secunet.com> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- include/net/xfrm.h | 26 +++++++++++++------------- net/ipv4/xfrm4_policy.c | 38 ++++++++++++++++---------------------- net/ipv6/xfrm6_policy.c | 28 +++++++++++++--------------- net/xfrm/xfrm_device.c | 11 ++++++++--- net/xfrm/xfrm_policy.c | 35 +++++++++++++++++++++++------------ 5 files changed, 73 insertions(+), 65 deletions(-) diff --git a/include/net/xfrm.h b/include/net/xfrm.h index 54cef89f6c1ec..0f49f70dfd141 100644 --- a/include/net/xfrm.h +++ b/include/net/xfrm.h @@ -349,20 +349,23 @@ struct xfrm_if_cb { void xfrm_if_register_cb(const struct xfrm_if_cb *ifcb); void xfrm_if_unregister_cb(void); +struct xfrm_dst_lookup_params { + struct net *net; + int tos; + int oif; + xfrm_address_t *saddr; + xfrm_address_t *daddr; + u32 mark; +}; + struct net_device; struct xfrm_type; struct xfrm_dst; struct xfrm_policy_afinfo { struct dst_ops *dst_ops; - struct dst_entry *(*dst_lookup)(struct net *net, - int tos, int oif, - const xfrm_address_t *saddr, - const xfrm_address_t *daddr, - u32 mark); - int (*get_saddr)(struct net *net, int oif, - xfrm_address_t *saddr, - xfrm_address_t *daddr, - u32 mark); + struct dst_entry *(*dst_lookup)(const struct xfrm_dst_lookup_params *params); + int (*get_saddr)(xfrm_address_t *saddr, + const struct xfrm_dst_lookup_params *params); int (*fill_dst)(struct xfrm_dst *xdst, struct net_device *dev, const struct flowi *fl); @@ -1735,10 +1738,7 @@ static inline int xfrm_user_policy(struct sock *sk, int optname, } #endif -struct dst_entry *__xfrm_dst_lookup(struct net *net, int tos, int oif, - const xfrm_address_t *saddr, - const xfrm_address_t *daddr, - int family, u32 mark); +struct dst_entry *__xfrm_dst_lookup(int family, const struct xfrm_dst_lookup_params *params); struct xfrm_policy *xfrm_policy_alloc(struct net *net, gfp_t gfp); diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c index 0294fef577fab..ac1a28ef0c560 100644 --- a/net/ipv4/xfrm4_policy.c +++ b/net/ipv4/xfrm4_policy.c @@ -17,47 +17,41 @@ #include <net/ip.h> #include <net/l3mdev.h> -static struct dst_entry *__xfrm4_dst_lookup(struct net *net, struct flowi4 *fl4, - int tos, int oif, - const xfrm_address_t *saddr, - const xfrm_address_t *daddr, - u32 mark) +static struct dst_entry *__xfrm4_dst_lookup(struct flowi4 *fl4, + const struct xfrm_dst_lookup_params *params) { struct rtable *rt; memset(fl4, 0, sizeof(*fl4)); - fl4->daddr = daddr->a4; - fl4->flowi4_tos = tos; - fl4->flowi4_l3mdev = l3mdev_master_ifindex_by_index(net, oif); - fl4->flowi4_mark = mark; - if (saddr) - fl4->saddr = saddr->a4; - - rt = __ip_route_output_key(net, fl4); + fl4->daddr = params->daddr->a4; + fl4->flowi4_tos = params->tos; + fl4->flowi4_l3mdev = l3mdev_master_ifindex_by_index(params->net, + params->oif); + fl4->flowi4_mark = params->mark; + if (params->saddr) + fl4->saddr = params->saddr->a4; + + rt = __ip_route_output_key(params->net, fl4); if (!IS_ERR(rt)) return &rt->dst; return ERR_CAST(rt); } -static struct dst_entry *xfrm4_dst_lookup(struct net *net, int tos, int oif, - const xfrm_address_t *saddr, - const xfrm_address_t *daddr, - u32 mark) +static struct dst_entry *xfrm4_dst_lookup(const struct xfrm_dst_lookup_params *params) { struct flowi4 fl4; - return __xfrm4_dst_lookup(net, &fl4, tos, oif, saddr, daddr, mark); + return __xfrm4_dst_lookup(&fl4, params); } -static int xfrm4_get_saddr(struct net *net, int oif, - xfrm_address_t *saddr, xfrm_address_t *daddr, - u32 mark) +static int xfrm4_get_saddr(xfrm_address_t *saddr, + const struct xfrm_dst_lookup_params *params) { struct dst_entry *dst; struct flowi4 fl4; - dst = __xfrm4_dst_lookup(net, &fl4, 0, oif, NULL, daddr, mark); + dst = __xfrm4_dst_lookup(&fl4, params); if (IS_ERR(dst)) return -EHOSTUNREACH; diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c index b1d81c4270ab3..fc3f5eec68985 100644 --- a/net/ipv6/xfrm6_policy.c +++ b/net/ipv6/xfrm6_policy.c @@ -23,23 +23,21 @@ #include <net/ip6_route.h> #include <net/l3mdev.h> -static struct dst_entry *xfrm6_dst_lookup(struct net *net, int tos, int oif, - const xfrm_address_t *saddr, - const xfrm_address_t *daddr, - u32 mark) +static struct dst_entry *xfrm6_dst_lookup(const struct xfrm_dst_lookup_params *params) { struct flowi6 fl6; struct dst_entry *dst; int err; memset(&fl6, 0, sizeof(fl6)); - fl6.flowi6_l3mdev = l3mdev_master_ifindex_by_index(net, oif); - fl6.flowi6_mark = mark; - memcpy(&fl6.daddr, daddr, sizeof(fl6.daddr)); - if (saddr) - memcpy(&fl6.saddr, saddr, sizeof(fl6.saddr)); + fl6.flowi6_l3mdev = l3mdev_master_ifindex_by_index(params->net, + params->oif); + fl6.flowi6_mark = params->mark; + memcpy(&fl6.daddr, params->daddr, sizeof(fl6.daddr)); + if (params->saddr) + memcpy(&fl6.saddr, params->saddr, sizeof(fl6.saddr)); - dst = ip6_route_output(net, NULL, &fl6); + dst = ip6_route_output(params->net, NULL, &fl6); err = dst->error; if (dst->error) { @@ -50,15 +48,14 @@ static struct dst_entry *xfrm6_dst_lookup(struct net *net, int tos, int oif, return dst; } -static int xfrm6_get_saddr(struct net *net, int oif, - xfrm_address_t *saddr, xfrm_address_t *daddr, - u32 mark) +static int xfrm6_get_saddr(xfrm_address_t *saddr, + const struct xfrm_dst_lookup_params *params) { struct dst_entry *dst; struct net_device *dev; struct inet6_dev *idev; - dst = xfrm6_dst_lookup(net, 0, oif, NULL, daddr, mark); + dst = xfrm6_dst_lookup(params); if (IS_ERR(dst)) return -EHOSTUNREACH; @@ -68,7 +65,8 @@ static int xfrm6_get_saddr(struct net *net, int oif, return -EHOSTUNREACH; } dev = idev->dev; - ipv6_dev_get_saddr(dev_net(dev), dev, &daddr->in6, 0, &saddr->in6); + ipv6_dev_get_saddr(dev_net(dev), dev, &params->daddr->in6, 0, + &saddr->in6); dst_release(dst); return 0; } diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c index 9a44d363ba620..fcd67fdfe79bd 100644 --- a/net/xfrm/xfrm_device.c +++ b/net/xfrm/xfrm_device.c @@ -269,6 +269,8 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x, dev = dev_get_by_index(net, xuo->ifindex); if (!dev) { + struct xfrm_dst_lookup_params params; + if (!(xuo->flags & XFRM_OFFLOAD_INBOUND)) { saddr = &x->props.saddr; daddr = &x->id.daddr; @@ -277,9 +279,12 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x, daddr = &x->props.saddr; } - dst = __xfrm_dst_lookup(net, 0, 0, saddr, daddr, - x->props.family, - xfrm_smark_get(0, x)); + memset(&params, 0, sizeof(params)); + params.net = net; + params.saddr = saddr; + params.daddr = daddr; + params.mark = xfrm_smark_get(0, x); + dst = __xfrm_dst_lookup(x->props.family, &params); if (IS_ERR(dst)) return (is_packet_offload) ? -EINVAL : 0; diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index c56c61b0c12ef..1025b5b3a1dd6 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -267,10 +267,8 @@ static const struct xfrm_if_cb *xfrm_if_get_cb(void) return rcu_dereference(xfrm_if_cb); } -struct dst_entry *__xfrm_dst_lookup(struct net *net, int tos, int oif, - const xfrm_address_t *saddr, - const xfrm_address_t *daddr, - int family, u32 mark) +struct dst_entry *__xfrm_dst_lookup(int family, + const struct xfrm_dst_lookup_params *params) { const struct xfrm_policy_afinfo *afinfo; struct dst_entry *dst; @@ -279,7 +277,7 @@ struct dst_entry *__xfrm_dst_lookup(struct net *net, int tos, int oif, if (unlikely(afinfo == NULL)) return ERR_PTR(-EAFNOSUPPORT); - dst = afinfo->dst_lookup(net, tos, oif, saddr, daddr, mark); + dst = afinfo->dst_lookup(params); rcu_read_unlock(); @@ -293,6 +291,7 @@ static inline struct dst_entry *xfrm_dst_lookup(struct xfrm_state *x, xfrm_address_t *prev_daddr, int family, u32 mark) { + struct xfrm_dst_lookup_params params; struct net *net = xs_net(x); xfrm_address_t *saddr = &x->props.saddr; xfrm_address_t *daddr = &x->id.daddr; @@ -307,7 +306,14 @@ static inline struct dst_entry *xfrm_dst_lookup(struct xfrm_state *x, daddr = x->coaddr; } - dst = __xfrm_dst_lookup(net, tos, oif, saddr, daddr, family, mark); + params.net = net; + params.saddr = saddr; + params.daddr = daddr; + params.tos = tos; + params.oif = oif; + params.mark = mark; + + dst = __xfrm_dst_lookup(family, &params); if (!IS_ERR(dst)) { if (prev_saddr != saddr) @@ -2440,15 +2446,15 @@ int __xfrm_sk_clone_policy(struct sock *sk, const struct sock *osk) } static int -xfrm_get_saddr(struct net *net, int oif, xfrm_address_t *local, - xfrm_address_t *remote, unsigned short family, u32 mark) +xfrm_get_saddr(unsigned short family, xfrm_address_t *saddr, + const struct xfrm_dst_lookup_params *params) { int err; const struct xfrm_policy_afinfo *afinfo = xfrm_policy_get_afinfo(family); if (unlikely(afinfo == NULL)) return -EINVAL; - err = afinfo->get_saddr(net, oif, local, remote, mark); + err = afinfo->get_saddr(saddr, params); rcu_read_unlock(); return err; } @@ -2477,9 +2483,14 @@ xfrm_tmpl_resolve_one(struct xfrm_policy *policy, const struct flowi *fl, remote = &tmpl->id.daddr; local = &tmpl->saddr; if (xfrm_addr_any(local, tmpl->encap_family)) { - error = xfrm_get_saddr(net, fl->flowi_oif, - &tmp, remote, - tmpl->encap_family, 0); + struct xfrm_dst_lookup_params params; + + memset(&params, 0, sizeof(params)); + params.net = net; + params.oif = fl->flowi_oif; + params.daddr = remote; + error = xfrm_get_saddr(tmpl->encap_family, &tmp, + &params); if (error) goto fail; local = &tmp; -- 2.43.0

11 months

2
32
0 0

[REGRESSION] mold linker depends on ETXTBSY, but open(2) no longer returns it

by Rui Ueyama

I'm the creator and the maintainer of the mold linker (https://github.com/rui314/mold). Recently, we discovered that mold started causing process crashes in certain situations due to a change in the Linux kernel. Here are the details: - In general, overwriting an existing file is much faster than creating an empty file and writing to it on Linux, so mold attempts to reuse an existing executable file if it exists. - If a program is running, opening the executable file for writing previously failed with ETXTBSY. If that happens, mold falls back to creating a new file. - However, the Linux kernel recently changed the behavior so that writing to an executable file is now always permitted (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…). That caused mold to write to an executable file even if there's a process running that file. Since changes to mmap'ed files are immediately visible to other processes, any processes running that file would almost certainly crash in a very mysterious way. Identifying the cause of these random crashes took us a few days. Rejecting writes to an executable file that is currently running is a well-known behavior, and Linux had operated that way for a very long time. So, I don’t believe relying on this behavior was our mistake; rather, I see this as a regression in the Linux kernel. Here is a bug report to the mold linker: https://github.com/rui314/mold/issues/1361 #regzbot introduced: 2a010c41285345da60cece35575b4e0af7e7bf44

11 months

1
0
0 0

Re: Patch "xfrm: Add Direction to the SA in or out" has been added to the 6.6-stable tree

by Antony Antony

On Sat, Oct 26, 2024 at 03:40:02 -0400, Sasha Levin wrote: > This is a note to let you know that I've just added the patch titled > > xfrm: Add Direction to the SA in or out This patch is a part of a new feature SA direction and it appears the auto patch selector picked one patch out of patch set? I think this patch alone should not be applied to older stable kernel. -antony > > to the 6.6-stable tree which can be found at: > http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… > > The filename of the patch is: > xfrm-add-direction-to-the-sa-in-or-out.patch > and it can be found in the queue-6.6 subdirectory. > > If you, or anyone else, feels it should not be added to the stable tree, > please let <stable(a)vger.kernel.org> know about it. > > > > commit 3f97c69c749f417158bc3d69478204562fc8c98d > Author: Antony Antony <antony.antony(a)secunet.com> > Date: Tue Apr 30 09:08:52 2024 +0200 > > xfrm: Add Direction to the SA in or out > > [ Upstream commit a4a87fa4e96c7746e009de06a567688fd9af6013 ] > > This patch introduces the 'dir' attribute, 'in' or 'out', to the > xfrm_state, SA, enhancing usability by delineating the scope of values > based on direction. An input SA will restrict values pertinent to input, > effectively segregating them from output-related values. > And an output SA will restrict attributes for output. This change aims > to streamline the configuration process and improve the overall > consistency of SA attributes during configuration. > > This feature sets the groundwork for future patches, including > the upcoming IP-TFS patch. > > Signed-off-by: Antony Antony <antony.antony(a)secunet.com> > Reviewed-by: Sabrina Dubroca <sd(a)queasysnail.net> > Signed-off-by: Steffen Klassert <steffen.klassert(a)secunet.com> > Stable-dep-of: 3f0ab59e6537 ("xfrm: validate new SA's prefixlen using SA family when sel.family is unset") > Signed-off-by: Sasha Levin <sashal(a)kernel.org> > > diff --git a/include/net/xfrm.h b/include/net/xfrm.h > index 93a9866ee481f..c5cf062afd4a2 100644 > --- a/include/net/xfrm.h > +++ b/include/net/xfrm.h > @@ -292,6 +292,7 @@ struct xfrm_state { > /* Private data of this transformer, format is opaque, > * interpreted by xfrm_type methods. */ > void *data; > + u8 dir; > }; > > static inline struct net *xs_net(struct xfrm_state *x) > diff --git a/include/uapi/linux/xfrm.h b/include/uapi/linux/xfrm.h > index 23543c33fee82..7cd491caef354 100644 > --- a/include/uapi/linux/xfrm.h > +++ b/include/uapi/linux/xfrm.h > @@ -140,6 +140,11 @@ enum { > XFRM_POLICY_MAX = 3 > }; > > +enum xfrm_sa_dir { > + XFRM_SA_DIR_IN = 1, > + XFRM_SA_DIR_OUT = 2 > +}; > + > enum { > XFRM_SHARE_ANY, /* No limitations */ > XFRM_SHARE_SESSION, /* For this session only */ > @@ -314,6 +319,7 @@ enum xfrm_attr_type_t { > XFRMA_SET_MARK_MASK, /* __u32 */ > XFRMA_IF_ID, /* __u32 */ > XFRMA_MTIMER_THRESH, /* __u32 in seconds for input SA */ > + XFRMA_SA_DIR, /* __u8 */ > __XFRMA_MAX > > #define XFRMA_OUTPUT_MARK XFRMA_SET_MARK /* Compatibility */ > diff --git a/net/xfrm/xfrm_compat.c b/net/xfrm/xfrm_compat.c > index 655fe4ff86212..703d4172c7d73 100644 > --- a/net/xfrm/xfrm_compat.c > +++ b/net/xfrm/xfrm_compat.c > @@ -98,6 +98,7 @@ static const int compat_msg_min[XFRM_NR_MSGTYPES] = { > }; > > static const struct nla_policy compat_policy[XFRMA_MAX+1] = { > + [XFRMA_UNSPEC] = { .strict_start_type = XFRMA_SA_DIR }, > [XFRMA_SA] = { .len = XMSGSIZE(compat_xfrm_usersa_info)}, > [XFRMA_POLICY] = { .len = XMSGSIZE(compat_xfrm_userpolicy_info)}, > [XFRMA_LASTUSED] = { .type = NLA_U64}, > @@ -129,6 +130,7 @@ static const struct nla_policy compat_policy[XFRMA_MAX+1] = { > [XFRMA_SET_MARK_MASK] = { .type = NLA_U32 }, > [XFRMA_IF_ID] = { .type = NLA_U32 }, > [XFRMA_MTIMER_THRESH] = { .type = NLA_U32 }, > + [XFRMA_SA_DIR] = NLA_POLICY_RANGE(NLA_U8, XFRM_SA_DIR_IN, XFRM_SA_DIR_OUT), > }; > > static struct nlmsghdr *xfrm_nlmsg_put_compat(struct sk_buff *skb, > @@ -277,9 +279,10 @@ static int xfrm_xlate64_attr(struct sk_buff *dst, const struct nlattr *src) > case XFRMA_SET_MARK_MASK: > case XFRMA_IF_ID: > case XFRMA_MTIMER_THRESH: > + case XFRMA_SA_DIR: > return xfrm_nla_cpy(dst, src, nla_len(src)); > default: > - BUILD_BUG_ON(XFRMA_MAX != XFRMA_MTIMER_THRESH); > + BUILD_BUG_ON(XFRMA_MAX != XFRMA_SA_DIR); > pr_warn_once("unsupported nla_type %d\n", src->nla_type); > return -EOPNOTSUPP; > } > @@ -434,7 +437,7 @@ static int xfrm_xlate32_attr(void *dst, const struct nlattr *nla, > int err; > > if (type > XFRMA_MAX) { > - BUILD_BUG_ON(XFRMA_MAX != XFRMA_MTIMER_THRESH); > + BUILD_BUG_ON(XFRMA_MAX != XFRMA_SA_DIR); > NL_SET_ERR_MSG(extack, "Bad attribute"); > return -EOPNOTSUPP; > } > diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c > index 04dc0c8a83707..fc18b9b4f22f3 100644 > --- a/net/xfrm/xfrm_device.c > +++ b/net/xfrm/xfrm_device.c > @@ -253,6 +253,12 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x, > return -EINVAL; > } > > + if ((xuo->flags & XFRM_OFFLOAD_INBOUND && x->dir == XFRM_SA_DIR_OUT) || > + (!(xuo->flags & XFRM_OFFLOAD_INBOUND) && x->dir == XFRM_SA_DIR_IN)) { > + NL_SET_ERR_MSG(extack, "Mismatched SA and offload direction"); > + return -EINVAL; > + } > + > is_packet_offload = xuo->flags & XFRM_OFFLOAD_PACKET; > > /* We don't yet support UDP encapsulation and TFC padding. */ > diff --git a/net/xfrm/xfrm_replay.c b/net/xfrm/xfrm_replay.c > index ce56d659c55a6..bc56c63057252 100644 > --- a/net/xfrm/xfrm_replay.c > +++ b/net/xfrm/xfrm_replay.c > @@ -778,7 +778,8 @@ int xfrm_init_replay(struct xfrm_state *x, struct netlink_ext_ack *extack) > } > > if (x->props.flags & XFRM_STATE_ESN) { > - if (replay_esn->replay_window == 0) { > + if (replay_esn->replay_window == 0 && > + (!x->dir || x->dir == XFRM_SA_DIR_IN)) { > NL_SET_ERR_MSG(extack, "ESN replay window must be > 0"); > return -EINVAL; > } > diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c > index 8a6e8656d014f..93c19f64746fa 100644 > --- a/net/xfrm/xfrm_state.c > +++ b/net/xfrm/xfrm_state.c > @@ -1349,6 +1349,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, > if (km_query(x, tmpl, pol) == 0) { > spin_lock_bh(&net->xfrm.xfrm_state_lock); > x->km.state = XFRM_STATE_ACQ; > + x->dir = XFRM_SA_DIR_OUT; > list_add(&x->km.all, &net->xfrm.state_all); > XFRM_STATE_INSERT(bydst, &x->bydst, > net->xfrm.state_bydst + h, > @@ -1801,6 +1802,7 @@ static struct xfrm_state *xfrm_state_clone(struct xfrm_state *orig, > x->lastused = orig->lastused; > x->new_mapping = 0; > x->new_mapping_sport = 0; > + x->dir = orig->dir; > > return x; > > @@ -1921,8 +1923,14 @@ int xfrm_state_update(struct xfrm_state *x) > } > > if (x1->km.state == XFRM_STATE_ACQ) { > + if (x->dir && x1->dir != x->dir) > + goto out; > + > __xfrm_state_insert(x); > x = NULL; > + } else { > + if (x1->dir != x->dir) > + goto out; > } > err = 0; > > diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c > index 979f23cded401..4328e81ea6a31 100644 > --- a/net/xfrm/xfrm_user.c > +++ b/net/xfrm/xfrm_user.c > @@ -130,7 +130,7 @@ static inline int verify_sec_ctx_len(struct nlattr **attrs, struct netlink_ext_a > } > > static inline int verify_replay(struct xfrm_usersa_info *p, > - struct nlattr **attrs, > + struct nlattr **attrs, u8 sa_dir, > struct netlink_ext_ack *extack) > { > struct nlattr *rt = attrs[XFRMA_REPLAY_ESN_VAL]; > @@ -168,6 +168,30 @@ static inline int verify_replay(struct xfrm_usersa_info *p, > return -EINVAL; > } > > + if (sa_dir == XFRM_SA_DIR_OUT) { > + if (rs->replay_window) { > + NL_SET_ERR_MSG(extack, "Replay window should be 0 for output SA"); > + return -EINVAL; > + } > + if (rs->seq || rs->seq_hi) { > + NL_SET_ERR_MSG(extack, > + "Replay seq and seq_hi should be 0 for output SA"); > + return -EINVAL; > + } > + if (rs->bmp_len) { > + NL_SET_ERR_MSG(extack, "Replay bmp_len should 0 for output SA"); > + return -EINVAL; > + } > + } > + > + if (sa_dir == XFRM_SA_DIR_IN) { > + if (rs->oseq || rs->oseq_hi) { > + NL_SET_ERR_MSG(extack, > + "Replay oseq and oseq_hi should be 0 for input SA"); > + return -EINVAL; > + } > + } > + > return 0; > } > > @@ -176,6 +200,7 @@ static int verify_newsa_info(struct xfrm_usersa_info *p, > struct netlink_ext_ack *extack) > { > int err; > + u8 sa_dir = attrs[XFRMA_SA_DIR] ? nla_get_u8(attrs[XFRMA_SA_DIR]) : 0; > > err = -EINVAL; > switch (p->family) { > @@ -334,7 +359,7 @@ static int verify_newsa_info(struct xfrm_usersa_info *p, > goto out; > if ((err = verify_sec_ctx_len(attrs, extack))) > goto out; > - if ((err = verify_replay(p, attrs, extack))) > + if ((err = verify_replay(p, attrs, sa_dir, extack))) > goto out; > > err = -EINVAL; > @@ -358,6 +383,77 @@ static int verify_newsa_info(struct xfrm_usersa_info *p, > err = -EINVAL; > goto out; > } > + > + if (sa_dir == XFRM_SA_DIR_OUT) { > + NL_SET_ERR_MSG(extack, > + "MTIMER_THRESH attribute should not be set on output SA"); > + err = -EINVAL; > + goto out; > + } > + } > + > + if (sa_dir == XFRM_SA_DIR_OUT) { > + if (p->flags & XFRM_STATE_DECAP_DSCP) { > + NL_SET_ERR_MSG(extack, "Flag DECAP_DSCP should not be set for output SA"); > + err = -EINVAL; > + goto out; > + } > + > + if (p->flags & XFRM_STATE_ICMP) { > + NL_SET_ERR_MSG(extack, "Flag ICMP should not be set for output SA"); > + err = -EINVAL; > + goto out; > + } > + > + if (p->flags & XFRM_STATE_WILDRECV) { > + NL_SET_ERR_MSG(extack, "Flag WILDRECV should not be set for output SA"); > + err = -EINVAL; > + goto out; > + } > + > + if (p->replay_window) { > + NL_SET_ERR_MSG(extack, "Replay window should be 0 for output SA"); > + err = -EINVAL; > + goto out; > + } > + > + if (attrs[XFRMA_REPLAY_VAL]) { > + struct xfrm_replay_state *replay; > + > + replay = nla_data(attrs[XFRMA_REPLAY_VAL]); > + > + if (replay->seq || replay->bitmap) { > + NL_SET_ERR_MSG(extack, > + "Replay seq and bitmap should be 0 for output SA"); > + err = -EINVAL; > + goto out; > + } > + } > + } > + > + if (sa_dir == XFRM_SA_DIR_IN) { > + if (p->flags & XFRM_STATE_NOPMTUDISC) { > + NL_SET_ERR_MSG(extack, "Flag NOPMTUDISC should not be set for input SA"); > + err = -EINVAL; > + goto out; > + } > + > + if (attrs[XFRMA_SA_EXTRA_FLAGS]) { > + u32 xflags = nla_get_u32(attrs[XFRMA_SA_EXTRA_FLAGS]); > + > + if (xflags & XFRM_SA_XFLAG_DONT_ENCAP_DSCP) { > + NL_SET_ERR_MSG(extack, "Flag DONT_ENCAP_DSCP should not be set for input SA"); > + err = -EINVAL; > + goto out; > + } > + > + if (xflags & XFRM_SA_XFLAG_OSEQ_MAY_WRAP) { > + NL_SET_ERR_MSG(extack, "Flag OSEQ_MAY_WRAP should not be set for input SA"); > + err = -EINVAL; > + goto out; > + } > + > + } > } > > out: > @@ -734,6 +830,9 @@ static struct xfrm_state *xfrm_state_construct(struct net *net, > if (attrs[XFRMA_IF_ID]) > x->if_id = nla_get_u32(attrs[XFRMA_IF_ID]); > > + if (attrs[XFRMA_SA_DIR]) > + x->dir = nla_get_u8(attrs[XFRMA_SA_DIR]); > + > err = __xfrm_init_state(x, false, attrs[XFRMA_OFFLOAD_DEV], extack); > if (err) > goto error; > @@ -1182,8 +1281,13 @@ static int copy_to_user_state_extra(struct xfrm_state *x, > if (ret) > goto out; > } > - if (x->mapping_maxage) > + if (x->mapping_maxage) { > ret = nla_put_u32(skb, XFRMA_MTIMER_THRESH, x->mapping_maxage); > + if (ret) > + goto out; > + } > + if (x->dir) > + ret = nla_put_u8(skb, XFRMA_SA_DIR, x->dir); > out: > return ret; > } > @@ -1618,6 +1722,9 @@ static int xfrm_alloc_userspi(struct sk_buff *skb, struct nlmsghdr *nlh, > if (err) > goto out; > > + if (attrs[XFRMA_SA_DIR]) > + x->dir = nla_get_u8(attrs[XFRMA_SA_DIR]); > + > resp_skb = xfrm_state_netlink(skb, x, nlh->nlmsg_seq); > if (IS_ERR(resp_skb)) { > err = PTR_ERR(resp_skb); > @@ -2401,7 +2508,8 @@ static inline unsigned int xfrm_aevent_msgsize(struct xfrm_state *x) > + nla_total_size_64bit(sizeof(struct xfrm_lifetime_cur)) > + nla_total_size(sizeof(struct xfrm_mark)) > + nla_total_size(4) /* XFRM_AE_RTHR */ > - + nla_total_size(4); /* XFRM_AE_ETHR */ > + + nla_total_size(4) /* XFRM_AE_ETHR */ > + + nla_total_size(sizeof(x->dir)); /* XFRMA_SA_DIR */ > } > > static int build_aevent(struct sk_buff *skb, struct xfrm_state *x, const struct km_event *c) > @@ -2458,6 +2566,12 @@ static int build_aevent(struct sk_buff *skb, struct xfrm_state *x, const struct > if (err) > goto out_cancel; > > + if (x->dir) { > + err = nla_put_u8(skb, XFRMA_SA_DIR, x->dir); > + if (err) > + goto out_cancel; > + } > + > nlmsg_end(skb, nlh); > return 0; > > @@ -3017,6 +3131,7 @@ EXPORT_SYMBOL_GPL(xfrm_msg_min); > #undef XMSGSIZE > > const struct nla_policy xfrma_policy[XFRMA_MAX+1] = { > + [XFRMA_UNSPEC] = { .strict_start_type = XFRMA_SA_DIR }, > [XFRMA_SA] = { .len = sizeof(struct xfrm_usersa_info)}, > [XFRMA_POLICY] = { .len = sizeof(struct xfrm_userpolicy_info)}, > [XFRMA_LASTUSED] = { .type = NLA_U64}, > @@ -3048,6 +3163,7 @@ const struct nla_policy xfrma_policy[XFRMA_MAX+1] = { > [XFRMA_SET_MARK_MASK] = { .type = NLA_U32 }, > [XFRMA_IF_ID] = { .type = NLA_U32 }, > [XFRMA_MTIMER_THRESH] = { .type = NLA_U32 }, > + [XFRMA_SA_DIR] = NLA_POLICY_RANGE(NLA_U8, XFRM_SA_DIR_IN, XFRM_SA_DIR_OUT), > }; > EXPORT_SYMBOL_GPL(xfrma_policy); > > @@ -3188,8 +3304,9 @@ static void xfrm_netlink_rcv(struct sk_buff *skb) > > static inline unsigned int xfrm_expire_msgsize(void) > { > - return NLMSG_ALIGN(sizeof(struct xfrm_user_expire)) > - + nla_total_size(sizeof(struct xfrm_mark)); > + return NLMSG_ALIGN(sizeof(struct xfrm_user_expire)) + > + nla_total_size(sizeof(struct xfrm_mark)) + > + nla_total_size(sizeof_field(struct xfrm_state, dir)); > } > > static int build_expire(struct sk_buff *skb, struct xfrm_state *x, const struct km_event *c) > @@ -3216,6 +3333,12 @@ static int build_expire(struct sk_buff *skb, struct xfrm_state *x, const struct > if (err) > return err; > > + if (x->dir) { > + err = nla_put_u8(skb, XFRMA_SA_DIR, x->dir); > + if (err) > + return err; > + } > + > nlmsg_end(skb, nlh); > return 0; > } > @@ -3323,6 +3446,9 @@ static inline unsigned int xfrm_sa_len(struct xfrm_state *x) > if (x->mapping_maxage) > l += nla_total_size(sizeof(x->mapping_maxage)); > > + if (x->dir) > + l += nla_total_size(sizeof(x->dir)); > + > return l; > } >

11 months

1
0
0 0

[PATCH AUTOSEL 4.19 1/3] ASoC: fsl_esai: change dev_warn to dev_dbg in irq handler

by Sasha Levin

From: Shengjiu Wang <shengjiu.wang(a)nxp.com> [ Upstream commit 54c805c1eb264c839fa3027d0073bb7f323b0722 ] Irq handler need to be executed as fast as possible, so the log in irq handler is better to use dev_dbg which needs to be enabled when debugging. Signed-off-by: Shengjiu Wang <shengjiu.wang(a)nxp.com> Reviewed-by: Iuliana Prodan <iuliana.prodan(a)nxp.com> Link: https://patch.msgid.link/1728622433-2873-1-git-send-email-shengjiu.wang@nxp… Signed-off-by: Mark Brown <broonie(a)kernel.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- sound/soc/fsl/fsl_esai.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sound/soc/fsl/fsl_esai.c b/sound/soc/fsl/fsl_esai.c index baa76337c33f3..e3c324467a9b4 100644 --- a/sound/soc/fsl/fsl_esai.c +++ b/sound/soc/fsl/fsl_esai.c @@ -77,10 +77,10 @@ static irqreturn_t esai_isr(int irq, void *devid) dev_dbg(&pdev->dev, "isr: Transmission Initialized\n"); if (esr & ESAI_ESR_RFF_MASK) - dev_warn(&pdev->dev, "isr: Receiving overrun\n"); + dev_dbg(&pdev->dev, "isr: Receiving overrun\n"); if (esr & ESAI_ESR_TFE_MASK) - dev_warn(&pdev->dev, "isr: Transmission underrun\n"); + dev_dbg(&pdev->dev, "isr: Transmission underrun\n"); if (esr & ESAI_ESR_TLS_MASK) dev_dbg(&pdev->dev, "isr: Just transmitted the last slot\n"); -- 2.43.0

11 months

1
2
0 0

[PATCH AUTOSEL 5.4 1/3] ASoC: fsl_esai: change dev_warn to dev_dbg in irq handler

by Sasha Levin

From: Shengjiu Wang <shengjiu.wang(a)nxp.com> [ Upstream commit 54c805c1eb264c839fa3027d0073bb7f323b0722 ] Irq handler need to be executed as fast as possible, so the log in irq handler is better to use dev_dbg which needs to be enabled when debugging. Signed-off-by: Shengjiu Wang <shengjiu.wang(a)nxp.com> Reviewed-by: Iuliana Prodan <iuliana.prodan(a)nxp.com> Link: https://patch.msgid.link/1728622433-2873-1-git-send-email-shengjiu.wang@nxp… Signed-off-by: Mark Brown <broonie(a)kernel.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- sound/soc/fsl/fsl_esai.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sound/soc/fsl/fsl_esai.c b/sound/soc/fsl/fsl_esai.c index 33ade79fa032e..4904f48de612d 100644 --- a/sound/soc/fsl/fsl_esai.c +++ b/sound/soc/fsl/fsl_esai.c @@ -98,10 +98,10 @@ static irqreturn_t esai_isr(int irq, void *devid) dev_dbg(&pdev->dev, "isr: Transmission Initialized\n"); if (esr & ESAI_ESR_RFF_MASK) - dev_warn(&pdev->dev, "isr: Receiving overrun\n"); + dev_dbg(&pdev->dev, "isr: Receiving overrun\n"); if (esr & ESAI_ESR_TFE_MASK) - dev_warn(&pdev->dev, "isr: Transmission underrun\n"); + dev_dbg(&pdev->dev, "isr: Transmission underrun\n"); if (esr & ESAI_ESR_TLS_MASK) dev_dbg(&pdev->dev, "isr: Just transmitted the last slot\n"); -- 2.43.0

11 months

1
2
0 0

[PATCH AUTOSEL 5.10 1/4] xfrm: extract dst lookup parameters into a struct

by Sasha Levin

From: Eyal Birger <eyal.birger(a)gmail.com> [ Upstream commit e509996b16728e37d5a909a5c63c1bd64f23b306 ] Preparation for adding more fields to dst lookup functions without changing their signatures. Signed-off-by: Eyal Birger <eyal.birger(a)gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert(a)secunet.com> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- include/net/xfrm.h | 26 +++++++++++++------------- net/ipv4/xfrm4_policy.c | 38 ++++++++++++++++---------------------- net/ipv6/xfrm6_policy.c | 28 +++++++++++++--------------- net/xfrm/xfrm_device.c | 11 ++++++++--- net/xfrm/xfrm_policy.c | 35 +++++++++++++++++++++++------------ 5 files changed, 73 insertions(+), 65 deletions(-) diff --git a/include/net/xfrm.h b/include/net/xfrm.h index 6fbaf304648f6..142967e456b18 100644 --- a/include/net/xfrm.h +++ b/include/net/xfrm.h @@ -321,20 +321,23 @@ struct xfrm_if_cb { void xfrm_if_register_cb(const struct xfrm_if_cb *ifcb); void xfrm_if_unregister_cb(void); +struct xfrm_dst_lookup_params { + struct net *net; + int tos; + int oif; + xfrm_address_t *saddr; + xfrm_address_t *daddr; + u32 mark; +}; + struct net_device; struct xfrm_type; struct xfrm_dst; struct xfrm_policy_afinfo { struct dst_ops *dst_ops; - struct dst_entry *(*dst_lookup)(struct net *net, - int tos, int oif, - const xfrm_address_t *saddr, - const xfrm_address_t *daddr, - u32 mark); - int (*get_saddr)(struct net *net, int oif, - xfrm_address_t *saddr, - xfrm_address_t *daddr, - u32 mark); + struct dst_entry *(*dst_lookup)(const struct xfrm_dst_lookup_params *params); + int (*get_saddr)(xfrm_address_t *saddr, + const struct xfrm_dst_lookup_params *params); int (*fill_dst)(struct xfrm_dst *xdst, struct net_device *dev, const struct flowi *fl); @@ -1658,10 +1661,7 @@ static inline int xfrm_user_policy(struct sock *sk, int optname, } #endif -struct dst_entry *__xfrm_dst_lookup(struct net *net, int tos, int oif, - const xfrm_address_t *saddr, - const xfrm_address_t *daddr, - int family, u32 mark); +struct dst_entry *__xfrm_dst_lookup(int family, const struct xfrm_dst_lookup_params *params); struct xfrm_policy *xfrm_policy_alloc(struct net *net, gfp_t gfp); diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c index 4548a91acdc89..d1c2619e03740 100644 --- a/net/ipv4/xfrm4_policy.c +++ b/net/ipv4/xfrm4_policy.c @@ -17,47 +17,41 @@ #include <net/ip.h> #include <net/l3mdev.h> -static struct dst_entry *__xfrm4_dst_lookup(struct net *net, struct flowi4 *fl4, - int tos, int oif, - const xfrm_address_t *saddr, - const xfrm_address_t *daddr, - u32 mark) +static struct dst_entry *__xfrm4_dst_lookup(struct flowi4 *fl4, + const struct xfrm_dst_lookup_params *params) { struct rtable *rt; memset(fl4, 0, sizeof(*fl4)); - fl4->daddr = daddr->a4; - fl4->flowi4_tos = tos; - fl4->flowi4_l3mdev = l3mdev_master_ifindex_by_index(net, oif); - fl4->flowi4_mark = mark; - if (saddr) - fl4->saddr = saddr->a4; - - rt = __ip_route_output_key(net, fl4); + fl4->daddr = params->daddr->a4; + fl4->flowi4_tos = params->tos; + fl4->flowi4_l3mdev = l3mdev_master_ifindex_by_index(params->net, + params->oif); + fl4->flowi4_mark = params->mark; + if (params->saddr) + fl4->saddr = params->saddr->a4; + + rt = __ip_route_output_key(params->net, fl4); if (!IS_ERR(rt)) return &rt->dst; return ERR_CAST(rt); } -static struct dst_entry *xfrm4_dst_lookup(struct net *net, int tos, int oif, - const xfrm_address_t *saddr, - const xfrm_address_t *daddr, - u32 mark) +static struct dst_entry *xfrm4_dst_lookup(const struct xfrm_dst_lookup_params *params) { struct flowi4 fl4; - return __xfrm4_dst_lookup(net, &fl4, tos, oif, saddr, daddr, mark); + return __xfrm4_dst_lookup(&fl4, params); } -static int xfrm4_get_saddr(struct net *net, int oif, - xfrm_address_t *saddr, xfrm_address_t *daddr, - u32 mark) +static int xfrm4_get_saddr(xfrm_address_t *saddr, + const struct xfrm_dst_lookup_params *params) { struct dst_entry *dst; struct flowi4 fl4; - dst = __xfrm4_dst_lookup(net, &fl4, 0, oif, NULL, daddr, mark); + dst = __xfrm4_dst_lookup(&fl4, params); if (IS_ERR(dst)) return -EHOSTUNREACH; diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c index 492b9692c0dc0..40183fdf7da0e 100644 --- a/net/ipv6/xfrm6_policy.c +++ b/net/ipv6/xfrm6_policy.c @@ -23,23 +23,21 @@ #include <net/ip6_route.h> #include <net/l3mdev.h> -static struct dst_entry *xfrm6_dst_lookup(struct net *net, int tos, int oif, - const xfrm_address_t *saddr, - const xfrm_address_t *daddr, - u32 mark) +static struct dst_entry *xfrm6_dst_lookup(const struct xfrm_dst_lookup_params *params) { struct flowi6 fl6; struct dst_entry *dst; int err; memset(&fl6, 0, sizeof(fl6)); - fl6.flowi6_l3mdev = l3mdev_master_ifindex_by_index(net, oif); - fl6.flowi6_mark = mark; - memcpy(&fl6.daddr, daddr, sizeof(fl6.daddr)); - if (saddr) - memcpy(&fl6.saddr, saddr, sizeof(fl6.saddr)); + fl6.flowi6_l3mdev = l3mdev_master_ifindex_by_index(params->net, + params->oif); + fl6.flowi6_mark = params->mark; + memcpy(&fl6.daddr, params->daddr, sizeof(fl6.daddr)); + if (params->saddr) + memcpy(&fl6.saddr, params->saddr, sizeof(fl6.saddr)); - dst = ip6_route_output(net, NULL, &fl6); + dst = ip6_route_output(params->net, NULL, &fl6); err = dst->error; if (dst->error) { @@ -50,15 +48,14 @@ static struct dst_entry *xfrm6_dst_lookup(struct net *net, int tos, int oif, return dst; } -static int xfrm6_get_saddr(struct net *net, int oif, - xfrm_address_t *saddr, xfrm_address_t *daddr, - u32 mark) +static int xfrm6_get_saddr(xfrm_address_t *saddr, + const struct xfrm_dst_lookup_params *params) { struct dst_entry *dst; struct net_device *dev; struct inet6_dev *idev; - dst = xfrm6_dst_lookup(net, 0, oif, NULL, daddr, mark); + dst = xfrm6_dst_lookup(params); if (IS_ERR(dst)) return -EHOSTUNREACH; @@ -68,7 +65,8 @@ static int xfrm6_get_saddr(struct net *net, int oif, return -EHOSTUNREACH; } dev = idev->dev; - ipv6_dev_get_saddr(dev_net(dev), dev, &daddr->in6, 0, &saddr->in6); + ipv6_dev_get_saddr(dev_net(dev), dev, &params->daddr->in6, 0, + &saddr->in6); dst_release(dst); return 0; } diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c index 8b8e957a69c36..4d13f7a372ab6 100644 --- a/net/xfrm/xfrm_device.c +++ b/net/xfrm/xfrm_device.c @@ -241,6 +241,8 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x, dev = dev_get_by_index(net, xuo->ifindex); if (!dev) { + struct xfrm_dst_lookup_params params; + if (!(xuo->flags & XFRM_OFFLOAD_INBOUND)) { saddr = &x->props.saddr; daddr = &x->id.daddr; @@ -249,9 +251,12 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x, daddr = &x->props.saddr; } - dst = __xfrm_dst_lookup(net, 0, 0, saddr, daddr, - x->props.family, - xfrm_smark_get(0, x)); + memset(&params, 0, sizeof(params)); + params.net = net; + params.saddr = saddr; + params.daddr = daddr; + params.mark = xfrm_smark_get(0, x); + dst = __xfrm_dst_lookup(x->props.family, &params); if (IS_ERR(dst)) return 0; diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index 39910d4eff62b..a7f8da5241ae5 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -251,10 +251,8 @@ static const struct xfrm_if_cb *xfrm_if_get_cb(void) return rcu_dereference(xfrm_if_cb); } -struct dst_entry *__xfrm_dst_lookup(struct net *net, int tos, int oif, - const xfrm_address_t *saddr, - const xfrm_address_t *daddr, - int family, u32 mark) +struct dst_entry *__xfrm_dst_lookup(int family, + const struct xfrm_dst_lookup_params *params) { const struct xfrm_policy_afinfo *afinfo; struct dst_entry *dst; @@ -263,7 +261,7 @@ struct dst_entry *__xfrm_dst_lookup(struct net *net, int tos, int oif, if (unlikely(afinfo == NULL)) return ERR_PTR(-EAFNOSUPPORT); - dst = afinfo->dst_lookup(net, tos, oif, saddr, daddr, mark); + dst = afinfo->dst_lookup(params); rcu_read_unlock(); @@ -277,6 +275,7 @@ static inline struct dst_entry *xfrm_dst_lookup(struct xfrm_state *x, xfrm_address_t *prev_daddr, int family, u32 mark) { + struct xfrm_dst_lookup_params params; struct net *net = xs_net(x); xfrm_address_t *saddr = &x->props.saddr; xfrm_address_t *daddr = &x->id.daddr; @@ -291,7 +290,14 @@ static inline struct dst_entry *xfrm_dst_lookup(struct xfrm_state *x, daddr = x->coaddr; } - dst = __xfrm_dst_lookup(net, tos, oif, saddr, daddr, family, mark); + params.net = net; + params.saddr = saddr; + params.daddr = daddr; + params.tos = tos; + params.oif = oif; + params.mark = mark; + + dst = __xfrm_dst_lookup(family, &params); if (!IS_ERR(dst)) { if (prev_saddr != saddr) @@ -2344,15 +2350,15 @@ int __xfrm_sk_clone_policy(struct sock *sk, const struct sock *osk) } static int -xfrm_get_saddr(struct net *net, int oif, xfrm_address_t *local, - xfrm_address_t *remote, unsigned short family, u32 mark) +xfrm_get_saddr(unsigned short family, xfrm_address_t *saddr, + const struct xfrm_dst_lookup_params *params) { int err; const struct xfrm_policy_afinfo *afinfo = xfrm_policy_get_afinfo(family); if (unlikely(afinfo == NULL)) return -EINVAL; - err = afinfo->get_saddr(net, oif, local, remote, mark); + err = afinfo->get_saddr(saddr, params); rcu_read_unlock(); return err; } @@ -2381,9 +2387,14 @@ xfrm_tmpl_resolve_one(struct xfrm_policy *policy, const struct flowi *fl, remote = &tmpl->id.daddr; local = &tmpl->saddr; if (xfrm_addr_any(local, tmpl->encap_family)) { - error = xfrm_get_saddr(net, fl->flowi_oif, - &tmp, remote, - tmpl->encap_family, 0); + struct xfrm_dst_lookup_params params; + + memset(&params, 0, sizeof(params)); + params.net = net; + params.oif = fl->flowi_oif; + params.daddr = remote; + error = xfrm_get_saddr(tmpl->encap_family, &tmp, + &params); if (error) goto fail; local = &tmp; -- 2.43.0

11 months

1
3
0 0

[PATCH AUTOSEL 5.15 1/7] xfrm: extract dst lookup parameters into a struct

by Sasha Levin

From: Eyal Birger <eyal.birger(a)gmail.com> [ Upstream commit e509996b16728e37d5a909a5c63c1bd64f23b306 ] Preparation for adding more fields to dst lookup functions without changing their signatures. Signed-off-by: Eyal Birger <eyal.birger(a)gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert(a)secunet.com> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- include/net/xfrm.h | 26 +++++++++++++------------- net/ipv4/xfrm4_policy.c | 38 ++++++++++++++++---------------------- net/ipv6/xfrm6_policy.c | 28 +++++++++++++--------------- net/xfrm/xfrm_device.c | 11 ++++++++--- net/xfrm/xfrm_policy.c | 35 +++++++++++++++++++++++------------ 5 files changed, 73 insertions(+), 65 deletions(-) diff --git a/include/net/xfrm.h b/include/net/xfrm.h index 2e2e30d31a763..642e0b60130d8 100644 --- a/include/net/xfrm.h +++ b/include/net/xfrm.h @@ -315,20 +315,23 @@ struct xfrm_if_cb { void xfrm_if_register_cb(const struct xfrm_if_cb *ifcb); void xfrm_if_unregister_cb(void); +struct xfrm_dst_lookup_params { + struct net *net; + int tos; + int oif; + xfrm_address_t *saddr; + xfrm_address_t *daddr; + u32 mark; +}; + struct net_device; struct xfrm_type; struct xfrm_dst; struct xfrm_policy_afinfo { struct dst_ops *dst_ops; - struct dst_entry *(*dst_lookup)(struct net *net, - int tos, int oif, - const xfrm_address_t *saddr, - const xfrm_address_t *daddr, - u32 mark); - int (*get_saddr)(struct net *net, int oif, - xfrm_address_t *saddr, - xfrm_address_t *daddr, - u32 mark); + struct dst_entry *(*dst_lookup)(const struct xfrm_dst_lookup_params *params); + int (*get_saddr)(xfrm_address_t *saddr, + const struct xfrm_dst_lookup_params *params); int (*fill_dst)(struct xfrm_dst *xdst, struct net_device *dev, const struct flowi *fl); @@ -1645,10 +1648,7 @@ static inline int xfrm_user_policy(struct sock *sk, int optname, } #endif -struct dst_entry *__xfrm_dst_lookup(struct net *net, int tos, int oif, - const xfrm_address_t *saddr, - const xfrm_address_t *daddr, - int family, u32 mark); +struct dst_entry *__xfrm_dst_lookup(int family, const struct xfrm_dst_lookup_params *params); struct xfrm_policy *xfrm_policy_alloc(struct net *net, gfp_t gfp); diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c index 4548a91acdc89..d1c2619e03740 100644 --- a/net/ipv4/xfrm4_policy.c +++ b/net/ipv4/xfrm4_policy.c @@ -17,47 +17,41 @@ #include <net/ip.h> #include <net/l3mdev.h> -static struct dst_entry *__xfrm4_dst_lookup(struct net *net, struct flowi4 *fl4, - int tos, int oif, - const xfrm_address_t *saddr, - const xfrm_address_t *daddr, - u32 mark) +static struct dst_entry *__xfrm4_dst_lookup(struct flowi4 *fl4, + const struct xfrm_dst_lookup_params *params) { struct rtable *rt; memset(fl4, 0, sizeof(*fl4)); - fl4->daddr = daddr->a4; - fl4->flowi4_tos = tos; - fl4->flowi4_l3mdev = l3mdev_master_ifindex_by_index(net, oif); - fl4->flowi4_mark = mark; - if (saddr) - fl4->saddr = saddr->a4; - - rt = __ip_route_output_key(net, fl4); + fl4->daddr = params->daddr->a4; + fl4->flowi4_tos = params->tos; + fl4->flowi4_l3mdev = l3mdev_master_ifindex_by_index(params->net, + params->oif); + fl4->flowi4_mark = params->mark; + if (params->saddr) + fl4->saddr = params->saddr->a4; + + rt = __ip_route_output_key(params->net, fl4); if (!IS_ERR(rt)) return &rt->dst; return ERR_CAST(rt); } -static struct dst_entry *xfrm4_dst_lookup(struct net *net, int tos, int oif, - const xfrm_address_t *saddr, - const xfrm_address_t *daddr, - u32 mark) +static struct dst_entry *xfrm4_dst_lookup(const struct xfrm_dst_lookup_params *params) { struct flowi4 fl4; - return __xfrm4_dst_lookup(net, &fl4, tos, oif, saddr, daddr, mark); + return __xfrm4_dst_lookup(&fl4, params); } -static int xfrm4_get_saddr(struct net *net, int oif, - xfrm_address_t *saddr, xfrm_address_t *daddr, - u32 mark) +static int xfrm4_get_saddr(xfrm_address_t *saddr, + const struct xfrm_dst_lookup_params *params) { struct dst_entry *dst; struct flowi4 fl4; - dst = __xfrm4_dst_lookup(net, &fl4, 0, oif, NULL, daddr, mark); + dst = __xfrm4_dst_lookup(&fl4, params); if (IS_ERR(dst)) return -EHOSTUNREACH; diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c index 492b9692c0dc0..40183fdf7da0e 100644 --- a/net/ipv6/xfrm6_policy.c +++ b/net/ipv6/xfrm6_policy.c @@ -23,23 +23,21 @@ #include <net/ip6_route.h> #include <net/l3mdev.h> -static struct dst_entry *xfrm6_dst_lookup(struct net *net, int tos, int oif, - const xfrm_address_t *saddr, - const xfrm_address_t *daddr, - u32 mark) +static struct dst_entry *xfrm6_dst_lookup(const struct xfrm_dst_lookup_params *params) { struct flowi6 fl6; struct dst_entry *dst; int err; memset(&fl6, 0, sizeof(fl6)); - fl6.flowi6_l3mdev = l3mdev_master_ifindex_by_index(net, oif); - fl6.flowi6_mark = mark; - memcpy(&fl6.daddr, daddr, sizeof(fl6.daddr)); - if (saddr) - memcpy(&fl6.saddr, saddr, sizeof(fl6.saddr)); + fl6.flowi6_l3mdev = l3mdev_master_ifindex_by_index(params->net, + params->oif); + fl6.flowi6_mark = params->mark; + memcpy(&fl6.daddr, params->daddr, sizeof(fl6.daddr)); + if (params->saddr) + memcpy(&fl6.saddr, params->saddr, sizeof(fl6.saddr)); - dst = ip6_route_output(net, NULL, &fl6); + dst = ip6_route_output(params->net, NULL, &fl6); err = dst->error; if (dst->error) { @@ -50,15 +48,14 @@ static struct dst_entry *xfrm6_dst_lookup(struct net *net, int tos, int oif, return dst; } -static int xfrm6_get_saddr(struct net *net, int oif, - xfrm_address_t *saddr, xfrm_address_t *daddr, - u32 mark) +static int xfrm6_get_saddr(xfrm_address_t *saddr, + const struct xfrm_dst_lookup_params *params) { struct dst_entry *dst; struct net_device *dev; struct inet6_dev *idev; - dst = xfrm6_dst_lookup(net, 0, oif, NULL, daddr, mark); + dst = xfrm6_dst_lookup(params); if (IS_ERR(dst)) return -EHOSTUNREACH; @@ -68,7 +65,8 @@ static int xfrm6_get_saddr(struct net *net, int oif, return -EHOSTUNREACH; } dev = idev->dev; - ipv6_dev_get_saddr(dev_net(dev), dev, &daddr->in6, 0, &saddr->in6); + ipv6_dev_get_saddr(dev_net(dev), dev, &params->daddr->in6, 0, + &saddr->in6); dst_release(dst); return 0; } diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c index 8b8e957a69c36..4d13f7a372ab6 100644 --- a/net/xfrm/xfrm_device.c +++ b/net/xfrm/xfrm_device.c @@ -241,6 +241,8 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x, dev = dev_get_by_index(net, xuo->ifindex); if (!dev) { + struct xfrm_dst_lookup_params params; + if (!(xuo->flags & XFRM_OFFLOAD_INBOUND)) { saddr = &x->props.saddr; daddr = &x->id.daddr; @@ -249,9 +251,12 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x, daddr = &x->props.saddr; } - dst = __xfrm_dst_lookup(net, 0, 0, saddr, daddr, - x->props.family, - xfrm_smark_get(0, x)); + memset(&params, 0, sizeof(params)); + params.net = net; + params.saddr = saddr; + params.daddr = daddr; + params.mark = xfrm_smark_get(0, x); + dst = __xfrm_dst_lookup(x->props.family, &params); if (IS_ERR(dst)) return 0; diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index bc867d1905f52..ab6f5955aa9cf 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -251,10 +251,8 @@ static const struct xfrm_if_cb *xfrm_if_get_cb(void) return rcu_dereference(xfrm_if_cb); } -struct dst_entry *__xfrm_dst_lookup(struct net *net, int tos, int oif, - const xfrm_address_t *saddr, - const xfrm_address_t *daddr, - int family, u32 mark) +struct dst_entry *__xfrm_dst_lookup(int family, + const struct xfrm_dst_lookup_params *params) { const struct xfrm_policy_afinfo *afinfo; struct dst_entry *dst; @@ -263,7 +261,7 @@ struct dst_entry *__xfrm_dst_lookup(struct net *net, int tos, int oif, if (unlikely(afinfo == NULL)) return ERR_PTR(-EAFNOSUPPORT); - dst = afinfo->dst_lookup(net, tos, oif, saddr, daddr, mark); + dst = afinfo->dst_lookup(params); rcu_read_unlock(); @@ -277,6 +275,7 @@ static inline struct dst_entry *xfrm_dst_lookup(struct xfrm_state *x, xfrm_address_t *prev_daddr, int family, u32 mark) { + struct xfrm_dst_lookup_params params; struct net *net = xs_net(x); xfrm_address_t *saddr = &x->props.saddr; xfrm_address_t *daddr = &x->id.daddr; @@ -291,7 +290,14 @@ static inline struct dst_entry *xfrm_dst_lookup(struct xfrm_state *x, daddr = x->coaddr; } - dst = __xfrm_dst_lookup(net, tos, oif, saddr, daddr, family, mark); + params.net = net; + params.saddr = saddr; + params.daddr = daddr; + params.tos = tos; + params.oif = oif; + params.mark = mark; + + dst = __xfrm_dst_lookup(family, &params); if (!IS_ERR(dst)) { if (prev_saddr != saddr) @@ -2342,15 +2348,15 @@ int __xfrm_sk_clone_policy(struct sock *sk, const struct sock *osk) } static int -xfrm_get_saddr(struct net *net, int oif, xfrm_address_t *local, - xfrm_address_t *remote, unsigned short family, u32 mark) +xfrm_get_saddr(unsigned short family, xfrm_address_t *saddr, + const struct xfrm_dst_lookup_params *params) { int err; const struct xfrm_policy_afinfo *afinfo = xfrm_policy_get_afinfo(family); if (unlikely(afinfo == NULL)) return -EINVAL; - err = afinfo->get_saddr(net, oif, local, remote, mark); + err = afinfo->get_saddr(saddr, params); rcu_read_unlock(); return err; } @@ -2379,9 +2385,14 @@ xfrm_tmpl_resolve_one(struct xfrm_policy *policy, const struct flowi *fl, remote = &tmpl->id.daddr; local = &tmpl->saddr; if (xfrm_addr_any(local, tmpl->encap_family)) { - error = xfrm_get_saddr(net, fl->flowi_oif, - &tmp, remote, - tmpl->encap_family, 0); + struct xfrm_dst_lookup_params params; + + memset(&params, 0, sizeof(params)); + params.net = net; + params.oif = fl->flowi_oif; + params.daddr = remote; + error = xfrm_get_saddr(tmpl->encap_family, &tmp, + &params); if (error) goto fail; local = &tmp; -- 2.43.0

11 months

1
6
0 0

[PATCH AUTOSEL 6.1 1/8] xfrm: extract dst lookup parameters into a struct

by Sasha Levin

From: Eyal Birger <eyal.birger(a)gmail.com> [ Upstream commit e509996b16728e37d5a909a5c63c1bd64f23b306 ] Preparation for adding more fields to dst lookup functions without changing their signatures. Signed-off-by: Eyal Birger <eyal.birger(a)gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert(a)secunet.com> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- include/net/xfrm.h | 26 +++++++++++++------------- net/ipv4/xfrm4_policy.c | 38 ++++++++++++++++---------------------- net/ipv6/xfrm6_policy.c | 28 +++++++++++++--------------- net/xfrm/xfrm_device.c | 11 ++++++++--- net/xfrm/xfrm_policy.c | 35 +++++++++++++++++++++++------------ 5 files changed, 73 insertions(+), 65 deletions(-) diff --git a/include/net/xfrm.h b/include/net/xfrm.h index 5b9c2c535702c..55ea15ccd5327 100644 --- a/include/net/xfrm.h +++ b/include/net/xfrm.h @@ -326,20 +326,23 @@ struct xfrm_if_cb { void xfrm_if_register_cb(const struct xfrm_if_cb *ifcb); void xfrm_if_unregister_cb(void); +struct xfrm_dst_lookup_params { + struct net *net; + int tos; + int oif; + xfrm_address_t *saddr; + xfrm_address_t *daddr; + u32 mark; +}; + struct net_device; struct xfrm_type; struct xfrm_dst; struct xfrm_policy_afinfo { struct dst_ops *dst_ops; - struct dst_entry *(*dst_lookup)(struct net *net, - int tos, int oif, - const xfrm_address_t *saddr, - const xfrm_address_t *daddr, - u32 mark); - int (*get_saddr)(struct net *net, int oif, - xfrm_address_t *saddr, - xfrm_address_t *daddr, - u32 mark); + struct dst_entry *(*dst_lookup)(const struct xfrm_dst_lookup_params *params); + int (*get_saddr)(xfrm_address_t *saddr, + const struct xfrm_dst_lookup_params *params); int (*fill_dst)(struct xfrm_dst *xdst, struct net_device *dev, const struct flowi *fl); @@ -1659,10 +1662,7 @@ static inline int xfrm_user_policy(struct sock *sk, int optname, } #endif -struct dst_entry *__xfrm_dst_lookup(struct net *net, int tos, int oif, - const xfrm_address_t *saddr, - const xfrm_address_t *daddr, - int family, u32 mark); +struct dst_entry *__xfrm_dst_lookup(int family, const struct xfrm_dst_lookup_params *params); struct xfrm_policy *xfrm_policy_alloc(struct net *net, gfp_t gfp); diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c index 3d0dfa6cf9f96..9ac9ed9738068 100644 --- a/net/ipv4/xfrm4_policy.c +++ b/net/ipv4/xfrm4_policy.c @@ -17,47 +17,41 @@ #include <net/ip.h> #include <net/l3mdev.h> -static struct dst_entry *__xfrm4_dst_lookup(struct net *net, struct flowi4 *fl4, - int tos, int oif, - const xfrm_address_t *saddr, - const xfrm_address_t *daddr, - u32 mark) +static struct dst_entry *__xfrm4_dst_lookup(struct flowi4 *fl4, + const struct xfrm_dst_lookup_params *params) { struct rtable *rt; memset(fl4, 0, sizeof(*fl4)); - fl4->daddr = daddr->a4; - fl4->flowi4_tos = tos; - fl4->flowi4_l3mdev = l3mdev_master_ifindex_by_index(net, oif); - fl4->flowi4_mark = mark; - if (saddr) - fl4->saddr = saddr->a4; - - rt = __ip_route_output_key(net, fl4); + fl4->daddr = params->daddr->a4; + fl4->flowi4_tos = params->tos; + fl4->flowi4_l3mdev = l3mdev_master_ifindex_by_index(params->net, + params->oif); + fl4->flowi4_mark = params->mark; + if (params->saddr) + fl4->saddr = params->saddr->a4; + + rt = __ip_route_output_key(params->net, fl4); if (!IS_ERR(rt)) return &rt->dst; return ERR_CAST(rt); } -static struct dst_entry *xfrm4_dst_lookup(struct net *net, int tos, int oif, - const xfrm_address_t *saddr, - const xfrm_address_t *daddr, - u32 mark) +static struct dst_entry *xfrm4_dst_lookup(const struct xfrm_dst_lookup_params *params) { struct flowi4 fl4; - return __xfrm4_dst_lookup(net, &fl4, tos, oif, saddr, daddr, mark); + return __xfrm4_dst_lookup(&fl4, params); } -static int xfrm4_get_saddr(struct net *net, int oif, - xfrm_address_t *saddr, xfrm_address_t *daddr, - u32 mark) +static int xfrm4_get_saddr(xfrm_address_t *saddr, + const struct xfrm_dst_lookup_params *params) { struct dst_entry *dst; struct flowi4 fl4; - dst = __xfrm4_dst_lookup(net, &fl4, 0, oif, NULL, daddr, mark); + dst = __xfrm4_dst_lookup(&fl4, params); if (IS_ERR(dst)) return -EHOSTUNREACH; diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c index b7b5dbf5d037b..6e3e0f1bd81c9 100644 --- a/net/ipv6/xfrm6_policy.c +++ b/net/ipv6/xfrm6_policy.c @@ -23,23 +23,21 @@ #include <net/ip6_route.h> #include <net/l3mdev.h> -static struct dst_entry *xfrm6_dst_lookup(struct net *net, int tos, int oif, - const xfrm_address_t *saddr, - const xfrm_address_t *daddr, - u32 mark) +static struct dst_entry *xfrm6_dst_lookup(const struct xfrm_dst_lookup_params *params) { struct flowi6 fl6; struct dst_entry *dst; int err; memset(&fl6, 0, sizeof(fl6)); - fl6.flowi6_l3mdev = l3mdev_master_ifindex_by_index(net, oif); - fl6.flowi6_mark = mark; - memcpy(&fl6.daddr, daddr, sizeof(fl6.daddr)); - if (saddr) - memcpy(&fl6.saddr, saddr, sizeof(fl6.saddr)); + fl6.flowi6_l3mdev = l3mdev_master_ifindex_by_index(params->net, + params->oif); + fl6.flowi6_mark = params->mark; + memcpy(&fl6.daddr, params->daddr, sizeof(fl6.daddr)); + if (params->saddr) + memcpy(&fl6.saddr, params->saddr, sizeof(fl6.saddr)); - dst = ip6_route_output(net, NULL, &fl6); + dst = ip6_route_output(params->net, NULL, &fl6); err = dst->error; if (dst->error) { @@ -50,15 +48,14 @@ static struct dst_entry *xfrm6_dst_lookup(struct net *net, int tos, int oif, return dst; } -static int xfrm6_get_saddr(struct net *net, int oif, - xfrm_address_t *saddr, xfrm_address_t *daddr, - u32 mark) +static int xfrm6_get_saddr(xfrm_address_t *saddr, + const struct xfrm_dst_lookup_params *params) { struct dst_entry *dst; struct net_device *dev; struct inet6_dev *idev; - dst = xfrm6_dst_lookup(net, 0, oif, NULL, daddr, mark); + dst = xfrm6_dst_lookup(params); if (IS_ERR(dst)) return -EHOSTUNREACH; @@ -68,7 +65,8 @@ static int xfrm6_get_saddr(struct net *net, int oif, return -EHOSTUNREACH; } dev = idev->dev; - ipv6_dev_get_saddr(dev_net(dev), dev, &daddr->in6, 0, &saddr->in6); + ipv6_dev_get_saddr(dev_net(dev), dev, &params->daddr->in6, 0, + &saddr->in6); dst_release(dst); return 0; } diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c index 21269e8f2db4b..2535ee034a5c8 100644 --- a/net/xfrm/xfrm_device.c +++ b/net/xfrm/xfrm_device.c @@ -248,6 +248,8 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x, dev = dev_get_by_index(net, xuo->ifindex); if (!dev) { + struct xfrm_dst_lookup_params params; + if (!(xuo->flags & XFRM_OFFLOAD_INBOUND)) { saddr = &x->props.saddr; daddr = &x->id.daddr; @@ -256,9 +258,12 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x, daddr = &x->props.saddr; } - dst = __xfrm_dst_lookup(net, 0, 0, saddr, daddr, - x->props.family, - xfrm_smark_get(0, x)); + memset(&params, 0, sizeof(params)); + params.net = net; + params.saddr = saddr; + params.daddr = daddr; + params.mark = xfrm_smark_get(0, x); + dst = __xfrm_dst_lookup(x->props.family, &params); if (IS_ERR(dst)) return 0; diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index 5fddde2d5bc48..adb12f428be30 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -251,10 +251,8 @@ static const struct xfrm_if_cb *xfrm_if_get_cb(void) return rcu_dereference(xfrm_if_cb); } -struct dst_entry *__xfrm_dst_lookup(struct net *net, int tos, int oif, - const xfrm_address_t *saddr, - const xfrm_address_t *daddr, - int family, u32 mark) +struct dst_entry *__xfrm_dst_lookup(int family, + const struct xfrm_dst_lookup_params *params) { const struct xfrm_policy_afinfo *afinfo; struct dst_entry *dst; @@ -263,7 +261,7 @@ struct dst_entry *__xfrm_dst_lookup(struct net *net, int tos, int oif, if (unlikely(afinfo == NULL)) return ERR_PTR(-EAFNOSUPPORT); - dst = afinfo->dst_lookup(net, tos, oif, saddr, daddr, mark); + dst = afinfo->dst_lookup(params); rcu_read_unlock(); @@ -277,6 +275,7 @@ static inline struct dst_entry *xfrm_dst_lookup(struct xfrm_state *x, xfrm_address_t *prev_daddr, int family, u32 mark) { + struct xfrm_dst_lookup_params params; struct net *net = xs_net(x); xfrm_address_t *saddr = &x->props.saddr; xfrm_address_t *daddr = &x->id.daddr; @@ -291,7 +290,14 @@ static inline struct dst_entry *xfrm_dst_lookup(struct xfrm_state *x, daddr = x->coaddr; } - dst = __xfrm_dst_lookup(net, tos, oif, saddr, daddr, family, mark); + params.net = net; + params.saddr = saddr; + params.daddr = daddr; + params.tos = tos; + params.oif = oif; + params.mark = mark; + + dst = __xfrm_dst_lookup(family, &params); if (!IS_ERR(dst)) { if (prev_saddr != saddr) @@ -2346,15 +2352,15 @@ int __xfrm_sk_clone_policy(struct sock *sk, const struct sock *osk) } static int -xfrm_get_saddr(struct net *net, int oif, xfrm_address_t *local, - xfrm_address_t *remote, unsigned short family, u32 mark) +xfrm_get_saddr(unsigned short family, xfrm_address_t *saddr, + const struct xfrm_dst_lookup_params *params) { int err; const struct xfrm_policy_afinfo *afinfo = xfrm_policy_get_afinfo(family); if (unlikely(afinfo == NULL)) return -EINVAL; - err = afinfo->get_saddr(net, oif, local, remote, mark); + err = afinfo->get_saddr(saddr, params); rcu_read_unlock(); return err; } @@ -2383,9 +2389,14 @@ xfrm_tmpl_resolve_one(struct xfrm_policy *policy, const struct flowi *fl, remote = &tmpl->id.daddr; local = &tmpl->saddr; if (xfrm_addr_any(local, tmpl->encap_family)) { - error = xfrm_get_saddr(net, fl->flowi_oif, - &tmp, remote, - tmpl->encap_family, 0); + struct xfrm_dst_lookup_params params; + + memset(&params, 0, sizeof(params)); + params.net = net; + params.oif = fl->flowi_oif; + params.daddr = remote; + error = xfrm_get_saddr(tmpl->encap_family, &tmp, + &params); if (error) goto fail; local = &tmp; -- 2.43.0

11 months

1
7
0 0

[PATCH v2] drm/xe/ufence: Flush xe ordered_wq in case of ufence timeout

by Nirmoy Das

Flush xe ordered_wq in case of ufence timeout which is observed on LNL and that points to the recent scheduling issue with E-cores. This is similar to the recent fix: commit e51527233804 ("drm/xe/guc/ct: Flush g2h worker in case of g2h response timeout") and should be removed once there is E core scheduling fix. v2: Add platform check(Himal) s/__flush_workqueue/flush_workqueue(Jani) Cc: Badal Nilawar <badal.nilawar(a)intel.com> Cc: Jani Nikula <jani.nikula(a)intel.com> Cc: Matthew Auld <matthew.auld(a)intel.com> Cc: John Harrison <John.C.Harrison(a)Intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray(a)intel.com> Cc: Lucas De Marchi <lucas.demarchi(a)intel.com> Cc: <stable(a)vger.kernel.org> # v6.11+ Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2754 Suggested-by: Matthew Brost <matthew.brost(a)intel.com> Signed-off-by: Nirmoy Das <nirmoy.das(a)intel.com> Reviewed-by: Matthew Brost <matthew.brost(a)intel.com> --- drivers/gpu/drm/xe/xe_wait_user_fence.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_wait_user_fence.c b/drivers/gpu/drm/xe/xe_wait_user_fence.c index f5deb81eba01..78a0ad3c78fe 100644 --- a/drivers/gpu/drm/xe/xe_wait_user_fence.c +++ b/drivers/gpu/drm/xe/xe_wait_user_fence.c @@ -13,6 +13,7 @@ #include "xe_device.h" #include "xe_gt.h" #include "xe_macros.h" +#include "compat-i915-headers/i915_drv.h" #include "xe_exec_queue.h" static int do_compare(u64 addr, u64 value, u64 mask, u16 op) @@ -155,6 +156,19 @@ int xe_wait_user_fence_ioctl(struct drm_device *dev, void *data, } if (!timeout) { + if (IS_LUNARLAKE(xe)) { + /* + * This is analogous to e51527233804 ("drm/xe/guc/ct: Flush g2h + * worker in case of g2h response timeout") + * + * TODO: Drop this change once workqueue scheduling delay issue is + * fixed on LNL Hybrid CPU. + */ + flush_workqueue(xe->ordered_wq); + err = do_compare(addr, args->value, args->mask, args->op); + if (err <= 0) + break; + } err = -ETIME; break; } -- 2.46.0

11 months

6
10
0 0

[PATCH v2] vdpa: solidrun: Fix UB bug with devres

by Philipp Stanner

In psnet_open_pf_bar() and snet_open_vf_bar() a string later passed to pcim_iomap_regions() is placed on the stack. Neither pcim_iomap_regions() nor the functions it calls copy that string. Should the string later ever be used, this, consequently, causes undefined behavior since the stack frame will by then have disappeared. Fix the bug by allocating the strings on the heap through devm_kasprintf(). Cc: stable(a)vger.kernel.org # v6.3 Fixes: 51a8f9d7f587 ("virtio: vdpa: new SolidNET DPU driver.") Reported-by: Christophe JAILLET <christophe.jaillet(a)wanadoo.fr> Closes: https://lore.kernel.org/all/74e9109a-ac59-49e2-9b1d-d825c9c9f891@wanadoo.fr/ Suggested-by: Andy Shevchenko <andy(a)kernel.org> Signed-off-by: Philipp Stanner <pstanner(a)redhat.com> Reviewed-by: Stefano Garzarella <sgarzare(a)redhat.com> --- Changes in v2: - Add Stefano's RB --- drivers/vdpa/solidrun/snet_main.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/drivers/vdpa/solidrun/snet_main.c b/drivers/vdpa/solidrun/snet_main.c index 99428a04068d..c8b74980dbd1 100644 --- a/drivers/vdpa/solidrun/snet_main.c +++ b/drivers/vdpa/solidrun/snet_main.c @@ -555,7 +555,7 @@ static const struct vdpa_config_ops snet_config_ops = { static int psnet_open_pf_bar(struct pci_dev *pdev, struct psnet *psnet) { - char name[50]; + char *name; int ret, i, mask = 0; /* We don't know which BAR will be used to communicate.. * We will map every bar with len > 0. @@ -573,7 +573,10 @@ static int psnet_open_pf_bar(struct pci_dev *pdev, struct psnet *psnet) return -ENODEV; } - snprintf(name, sizeof(name), "psnet[%s]-bars", pci_name(pdev)); + name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "psnet[%s]-bars", pci_name(pdev)); + if (!name) + return -ENOMEM; + ret = pcim_iomap_regions(pdev, mask, name); if (ret) { SNET_ERR(pdev, "Failed to request and map PCI BARs\n"); @@ -590,10 +593,13 @@ static int psnet_open_pf_bar(struct pci_dev *pdev, struct psnet *psnet) static int snet_open_vf_bar(struct pci_dev *pdev, struct snet *snet) { - char name[50]; + char *name; int ret; - snprintf(name, sizeof(name), "snet[%s]-bar", pci_name(pdev)); + name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "snet[%s]-bars", pci_name(pdev)); + if (!name) + return -ENOMEM; + /* Request and map BAR */ ret = pcim_iomap_regions(pdev, BIT(snet->psnet->cfg.vf_bar), name); if (ret) { -- 2.47.0

11 months

1
0
0 0

[PATCH for 4.19.y] bonding (gcc13): synchronize bond_{a,t}lb_xmit() types

by Nobuhiro Iwamatsu

From: "Jiri Slaby (SUSE)" <jirislaby(a)kernel.org> commit 777fa87c7682228e155cf0892ba61cb2ab1fe3ae upstream. Both bond_alb_xmit() and bond_tlb_xmit() produce a valid warning with gcc-13: drivers/net/bonding/bond_alb.c:1409:13: error: conflicting types for 'bond_tlb_xmit' due to enum/integer mismatch; have 'netdev_tx_t(struct sk_buff *, struct net_device *)' ... include/net/bond_alb.h:160:5: note: previous declaration of 'bond_tlb_xmit' with type 'int(struct sk_buff *, struct net_device *)' drivers/net/bonding/bond_alb.c:1523:13: error: conflicting types for 'bond_alb_xmit' due to enum/integer mismatch; have 'netdev_tx_t(struct sk_buff *, struct net_device *)' ... include/net/bond_alb.h:159:5: note: previous declaration of 'bond_alb_xmit' with type 'int(struct sk_buff *, struct net_device *)' I.e. the return type of the declaration is int, while the definitions spell netdev_tx_t. Synchronize both of them to the latter. Cc: Martin Liska <mliska(a)suse.cz> Cc: Jay Vosburgh <j.vosburgh(a)gmail.com> Cc: Veaceslav Falico <vfalico(a)gmail.com> Cc: Andy Gospodarek <andy(a)greyhouse.net> Signed-off-by: Jiri Slaby (SUSE) <jirislaby(a)kernel.org> Link: https://lore.kernel.org/r/20221031114409.10417-1-jirislaby@kernel.org Signed-off-by: Jakub Kicinski <kuba(a)kernel.org> [iwamatsu: adjust context] Signed-off-by: Nobuhiro Iwamatsu (CIP) <nobuhiro1.iwamatsu(a)toshiba.co.jp> --- include/net/bond_alb.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/net/bond_alb.h b/include/net/bond_alb.h index 3a6c932b6dcaff..52467d0b4677c9 100644 --- a/include/net/bond_alb.h +++ b/include/net/bond_alb.h @@ -172,8 +172,8 @@ int bond_alb_init_slave(struct bonding *bond, struct slave *slave); void bond_alb_deinit_slave(struct bonding *bond, struct slave *slave); void bond_alb_handle_link_change(struct bonding *bond, struct slave *slave, char link); void bond_alb_handle_active_change(struct bonding *bond, struct slave *new_slave); -int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev); -int bond_tlb_xmit(struct sk_buff *skb, struct net_device *bond_dev); +netdev_tx_t bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev); +netdev_tx_t bond_tlb_xmit(struct sk_buff *skb, struct net_device *bond_dev); void bond_alb_monitor(struct work_struct *); int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr); void bond_alb_clear_vlan(struct bonding *bond, unsigned short vlan_id); -- 2.45.2

11 months

1
0
0 0

[PATCH] Revert "drm/mipi-dsi: Set the fwnode for mipi_dsi_device"

by Jason-JH.Lin via B4 Relay

From: "Jason-JH.Lin" <jason-jh.lin(a)mediatek.com> This reverts commit ac88a1f41f93499df6f50fd18ea835e6ff4f3200. Reason for revert: 1. The commit [1] does not land on linux-5.15, so this patch does not fix anything. 2. Since the fw_device improvements series [2] does not land on linux-5.15, using device_set_fwnode() causes the panel to flash during bootup. Incorrect link management may lead to incorrect device initialization, affecting firmware node links and consumer relationships. The fwnode setting of panel to the DSI device would cause a DSI initialization error without series[2], so this patch was reverted to avoid using the incomplete fw_devlink functionality. [1] commit 3fb16866b51d ("driver core: fw_devlink: Make cycle detection more robust") [2] Link: https://lore.kernel.org/all/20230207014207.1678715-1-saravanak@google.com Signed-off-by: Jason-JH.Lin <jason-jh.lin(a)mediatek.com> --- drivers/gpu/drm/drm_mipi_dsi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_mipi_dsi.c b/drivers/gpu/drm/drm_mipi_dsi.c index 24606b632009..468a3a7cb6a5 100644 --- a/drivers/gpu/drm/drm_mipi_dsi.c +++ b/drivers/gpu/drm/drm_mipi_dsi.c @@ -221,7 +221,7 @@ mipi_dsi_device_register_full(struct mipi_dsi_host *host, return dsi; } - device_set_node(&dsi->dev, of_fwnode_handle(info->node)); + dsi->dev.of_node = info->node; dsi->channel = info->channel; strlcpy(dsi->name, info->type, sizeof(dsi->name)); --- base-commit: 74cdd62cb4706515b454ce5bacb73b566c1d1bcf change-id: 20241024-fixup-5-15-5fdd68dae707 Best regards, -- Jason-JH.Lin <jason-jh.lin(a)mediatek.com>

11 months

4
6
0 0

Re: Patch "xhci: dbgtty: use kfifo from tty_port struct" has been added to the 6.11-stable tree

by Jiri Slaby

On 22. 10. 24, 19:45, Sasha Levin wrote: > This is a note to let you know that I've just added the patch titled > > xhci: dbgtty: use kfifo from tty_port struct This is a cleanup, not needed in stable. -- js suse labs

11 months

2
1
0 0

Request to backport a fix to 6.10 and 6.11 stable kernels

by Dhananjay Ugwekar

Hello, The patch "[PATCH v3] perf/x86/rapl: Fix the energy-pkg event for AMD CPUs" fixes the RAPL energy-pkg event on AMD CPUs. It was broken by "commit 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf")", which got merged in v6.10-rc1. I missed the "Fixes" tag while posting the patch on LKML. Please backport the fix to 6.10 (I see this is EOL so probably this wont be possible) and 6.11 stable kernels. Mainline commit ID for the fix is "8d72eba1cf8c". Thanks, Dhananjay

11 months

2
2
0 0

Add 2 commits to kernel 6.11.y

by Gong, Richard

Hi, The commits below are required to enable amd_atl driver on AMD processors. 0f70fdd42559 x86/amd_nb: Add new PCI IDs for AMD family 1Ah model 60h-70 f8bc84b6096f x86/amd_nb: Add new PCI ID for AMD family 1Ah model 20h Please add those 2 commits to stable kernel 6.11.y. Thanks! Regards, Richard

11 months

2
1
0 0

Re: Patch "btrfs: also add stripe entries for NOCOW writes" has been added to the 6.11-stable tree

by Johannes Thumshirn

On 24.10.24 13:17, Sasha Levin wrote: > This is a note to let you know that I've just added the patch titled > > btrfs: also add stripe entries for NOCOW writes > > to the 6.11-stable tree which can be found at: > http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… > > The filename of the patch is: > btrfs-also-add-stripe-entries-for-nocow-writes.patch > and it can be found in the queue-6.11 subdirectory. > > If you, or anyone else, feels it should not be added to the stable tree, > please let <stable(a)vger.kernel.org> know about it. Hey Sasha, this patch is for the RAID stripe-tree feature marked as experimental, so I don't think it's needed to be backported, as noone should use it (apart from testing). > > > > commit ec508a5930024b40064d70cb3dbdf856760cdf5d > Author: Johannes Thumshirn <johannes.thumshirn(a)wdc.com> > Date: Thu Sep 19 12:16:38 2024 +0200 > > btrfs: also add stripe entries for NOCOW writes > > [ Upstream commit 97f9782276fc9cb0de37a5eecb82204e48a5a612 ] > > NOCOW writes do not generate stripe_extent entries in the RAID stripe > tree, as the RAID stripe-tree feature initially was designed with a > zoned filesystem in mind and on a zoned filesystem, we do not allow NOCOW > writes. But the RAID stripe-tree feature is independent from the zoned > feature, so we must also do NOCOW writes for RAID stripe-tree filesystems. > > Reviewed-by: Naohiro Aota <naohiro.aota(a)wdc.com> > Signed-off-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com> > Signed-off-by: David Sterba <dsterba(a)suse.com> > Signed-off-by: Sasha Levin <sashal(a)kernel.org> > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index b1b6564ab68f0..48149c2e68954 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -3087,6 +3087,11 @@ int btrfs_finish_one_ordered(struct btrfs_ordered_extent *ordered_extent) > ret = btrfs_update_inode_fallback(trans, inode); > if (ret) /* -ENOMEM or corruption */ > btrfs_abort_transaction(trans, ret); > + > + ret = btrfs_insert_raid_extent(trans, ordered_extent); > + if (ret) > + btrfs_abort_transaction(trans, ret); > + > goto out; > } > >

11 months

2
1
0 0

[PATCH v2] Revert "drm/mipi-dsi: Set the fwnode for mipi_dsi_device"

by Jason-JH.Lin via B4 Relay

From: "Jason-JH.Lin" <jason-jh.lin(a)mediatek.com> This reverts commit ac88a1f41f93499df6f50fd18ea835e6ff4f3200. Reason for revert: 1. The commit [1] does not land on linux-5.15, so this patch does not fix anything. 2. Since the fw_device improvements series [2] does not land on linux-5.15, using device_set_fwnode() causes the panel to flash during bootup. Incorrect link management may lead to incorrect device initialization, affecting firmware node links and consumer relationships. The fwnode setting of panel to the DSI device would cause a DSI initialization error without series[2], so this patch was reverted to avoid using the incomplete fw_devlink functionality. [1] commit 3fb16866b51d ("driver core: fw_devlink: Make cycle detection more robust") [2] Link: https://lore.kernel.org/all/20230207014207.1678715-1-saravanak@google.com Cc: stable(a)vger.kernel.org # 5.15.169 Signed-off-by: Jason-JH.Lin <jason-jh.lin(a)mediatek.com> --- drivers/gpu/drm/drm_mipi_dsi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_mipi_dsi.c b/drivers/gpu/drm/drm_mipi_dsi.c index 24606b632009..468a3a7cb6a5 100644 --- a/drivers/gpu/drm/drm_mipi_dsi.c +++ b/drivers/gpu/drm/drm_mipi_dsi.c @@ -221,7 +221,7 @@ mipi_dsi_device_register_full(struct mipi_dsi_host *host, return dsi; } - device_set_node(&dsi->dev, of_fwnode_handle(info->node)); + dsi->dev.of_node = info->node; dsi->channel = info->channel; strlcpy(dsi->name, info->type, sizeof(dsi->name)); --- base-commit: 74cdd62cb4706515b454ce5bacb73b566c1d1bcf change-id: 20241024-fixup-5-15-5fdd68dae707 Best regards, -- Jason-JH.Lin <jason-jh.lin(a)mediatek.com>

11 months

1
0
0 0

[PATCH v2 4/5] xhci: Don't perform Soft Retry for Etron xHCI host

by Kuangyi Chiang

Since commit f8f80be501aa ("xhci: Use soft retry to recover faster from transaction errors"), unplugging USB device while enumeration results in errors like this: [ 364.855321] xhci_hcd 0000:0b:00.0: ERROR Transfer event for disabled endpoint slot 5 ep 2 [ 364.864622] xhci_hcd 0000:0b:00.0: @0000002167656d70 67f03000 00000021 0c000000 05038001 [ 374.934793] xhci_hcd 0000:0b:00.0: Abort failed to stop command ring: -110 [ 374.958793] xhci_hcd 0000:0b:00.0: xHCI host controller not responding, assume dead [ 374.967590] xhci_hcd 0000:0b:00.0: HC died; cleaning up [ 374.973984] xhci_hcd 0000:0b:00.0: Timeout while waiting for configure endpoint command Seems that Etorn xHCI host can not perform Soft Retry correctly, apply XHCI_NO_SOFT_RETRY quirk to disable Soft Retry and then issue is gone. This patch depends on commit a4a251f8c235 ("usb: xhci: do not perform Soft Retry for some xHCI hosts"). Fixes: f8f80be501aa ("xhci: Use soft retry to recover faster from transaction errors") Cc: <stable(a)vger.kernel.org> Signed-off-by: Kuangyi Chiang <ki.chiang65(a)gmail.com> --- drivers/usb/host/xhci-pci.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c index ddc9a82cceec..f2ca0b912977 100644 --- a/drivers/usb/host/xhci-pci.c +++ b/drivers/usb/host/xhci-pci.c @@ -400,6 +400,7 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci) xhci->quirks |= XHCI_ETRON_HOST; xhci->quirks |= XHCI_RESET_ON_RESUME; xhci->quirks |= XHCI_BROKEN_STREAMS; + xhci->quirks |= XHCI_NO_SOFT_RETRY; } if (pdev->vendor == PCI_VENDOR_ID_RENESAS && -- 2.25.1

11 months

1
0
0 0

[PATCH v2 3/5] xhci: Fix control transfer error on Etron xHCI host

by Kuangyi Chiang

Performing a stability stress test on a USB3.0 2.5G ethernet adapter results in errors like this: [ 91.441469] r8152 2-3:1.0 eth3: get_registers -71 [ 91.458659] r8152 2-3:1.0 eth3: get_registers -71 [ 91.475911] r8152 2-3:1.0 eth3: get_registers -71 [ 91.493203] r8152 2-3:1.0 eth3: get_registers -71 [ 91.510421] r8152 2-3:1.0 eth3: get_registers -71 The r8152 driver will periodically issue lots of control-IN requests to access the status of ethernet adapter hardware registers during the test. This happens when the xHCI driver enqueue a control TD (which cross over the Link TRB between two ring segments, as shown) in the endpoint zero's transfer ring. Seems the Etron xHCI host can not perform this TD correctly, causing the USB transfer error occurred, maybe the upper driver retry that control-IN request can solve problem, but not all drivers do this. | | ------- | TRB | Setup Stage ------- | TRB | Link ------- ------- | TRB | Data Stage ------- | TRB | Status Stage ------- | | To work around this, the xHCI driver should enqueue a No Op TRB if next available TRB is the Link TRB in the ring segment, this can prevent the Setup and Data Stage TRB to be breaked by the Link TRB. Check if the XHCI_ETRON_HOST quirk flag is set before invoking the workaround in xhci_queue_ctrl_tx(). Fixes: d0e96f5a71a0 ("USB: xhci: Control transfer support.") Cc: <stable(a)vger.kernel.org> Signed-off-by: Kuangyi Chiang <ki.chiang65(a)gmail.com> --- drivers/usb/host/xhci-ring.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index b6eb928e260f..9e132b08bfde 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -3727,6 +3727,20 @@ int xhci_queue_ctrl_tx(struct xhci_hcd *xhci, gfp_t mem_flags, if (!urb->setup_packet) return -EINVAL; + if ((xhci->quirks & XHCI_ETRON_HOST) && + urb->dev->speed >= USB_SPEED_SUPER) { + /* + * If next available TRB is the Link TRB in the ring segment then + * enqueue a No Op TRB, this can prevent the Setup and Data Stage + * TRB to be breaked by the Link TRB. + */ + if (trb_is_link(ep_ring->enqueue + 1)) { + field = TRB_TYPE(TRB_TR_NOOP) | ep_ring->cycle_state; + queue_trb(xhci, ep_ring, false, 0, 0, + TRB_INTR_TARGET(0), field); + } + } + /* 1 TRB for setup, 1 for status */ num_trbs = 2; /* -- 2.25.1

11 months

1
0
0 0

Free Piano

by Kelly Hall

Hello Dear, I am giving away my late husband's Yamaha Piano to any instrument lover. Kindly let me know if you are interested or have someone who will be interested in the instrument. Thank you, K.Hall

11 months

1
0
0 0

FAILED: patch "[PATCH] drm/amd/display: temp w/a for dGPU to enter idle" failed to apply to 6.11-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.11-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y git checkout FETCH_HEAD git cherry-pick -x 23d16ede33a4db4973468bf6652a09da5efd1468 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024102838-opal-rule-c671@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 23d16ede33a4db4973468bf6652a09da5efd1468 Mon Sep 17 00:00:00 2001 From: Aurabindo Pillai <aurabindo.pillai(a)amd.com> Date: Tue, 1 Oct 2024 18:03:02 -0400 Subject: [PATCH] drm/amd/display: temp w/a for dGPU to enter idle optimizations [Why&How] vblank immediate disable currently does not work for all asics. On DCN401, the vblank interrupts never stop coming, and hence we never get a chance to trigger idle optimizations. Add a workaround to enable immediate disable only on APUs for now. This adds a 2-frame delay for triggering idle optimization, which is a negligible overhead. Fixes: 58a261bfc967 ("drm/amd/display: use a more lax vblank enable policy for older ASICs") Fixes: e45b6716de4b ("drm/amd/display: use a more lax vblank enable policy for DCN35+") Cc: Mario Limonciello <mario.limonciello(a)amd.com> Cc: Alex Deucher <alexander.deucher(a)amd.com> Reviewed-by: Harry Wentland <harry.wentland(a)amd.com> Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira(a)amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai(a)amd.com> Signed-off-by: Wayne Lin <wayne.lin(a)amd.com> Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com> Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com> (cherry picked from commit 9b47278cec98e9894adf39229e91aaf4ab9140c5) Cc: stable(a)vger.kernel.org diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 6b5e2206e687..13421a58210d 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -8374,7 +8374,8 @@ static void manage_dm_interrupts(struct amdgpu_device *adev, if (amdgpu_ip_version(adev, DCE_HWIP, 0) < IP_VERSION(3, 5, 0) || acrtc_state->stream->link->psr_settings.psr_version < - DC_PSR_VERSION_UNSUPPORTED) { + DC_PSR_VERSION_UNSUPPORTED || + !(adev->flags & AMD_IS_APU)) { timing = &acrtc_state->stream->timing; /* at least 2 frames */

11 months

1
0
0 0

FAILED: patch "[PATCH] nfsd: fix race between laundromat and free_stateid" failed to apply to 4.19-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y git checkout FETCH_HEAD git cherry-pick -x 8dd91e8d31febf4d9cca3ae1bb4771d33ae7ee5a # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024102844-gibberish-surplus-b6d4@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 8dd91e8d31febf4d9cca3ae1bb4771d33ae7ee5a Mon Sep 17 00:00:00 2001 From: Olga Kornievskaia <okorniev(a)redhat.com> Date: Fri, 18 Oct 2024 15:24:58 -0400 Subject: [PATCH] nfsd: fix race between laundromat and free_stateid There is a race between laundromat handling of revoked delegations and a client sending free_stateid operation. Laundromat thread finds that delegation has expired and needs to be revoked so it marks the delegation stid revoked and it puts it on a reaper list but then it unlock the state lock and the actual delegation revocation happens without the lock. Once the stid is marked revoked a racing free_stateid processing thread does the following (1) it calls list_del_init() which removes it from the reaper list and (2) frees the delegation stid structure. The laundromat thread ends up not calling the revoke_delegation() function for this particular delegation but that means it will no release the lock lease that exists on the file. Now, a new open for this file comes in and ends up finding that lease list isn't empty and calls nfsd_breaker_owns_lease() which ends up trying to derefence a freed delegation stateid. Leading to the followint use-after-free KASAN warning: kernel: ================================================================== kernel: BUG: KASAN: slab-use-after-free in nfsd_breaker_owns_lease+0x140/0x160 [nfsd] kernel: Read of size 8 at addr ffff0000e73cd0c8 by task nfsd/6205 kernel: kernel: CPU: 2 UID: 0 PID: 6205 Comm: nfsd Kdump: loaded Not tainted 6.11.0-rc7+ #9 kernel: Hardware name: Apple Inc. Apple Virtualization Generic Platform, BIOS 2069.0.0.0.0 08/03/2024 kernel: Call trace: kernel: dump_backtrace+0x98/0x120 kernel: show_stack+0x1c/0x30 kernel: dump_stack_lvl+0x80/0xe8 kernel: print_address_description.constprop.0+0x84/0x390 kernel: print_report+0xa4/0x268 kernel: kasan_report+0xb4/0xf8 kernel: __asan_report_load8_noabort+0x1c/0x28 kernel: nfsd_breaker_owns_lease+0x140/0x160 [nfsd] kernel: nfsd_file_do_acquire+0xb3c/0x11d0 [nfsd] kernel: nfsd_file_acquire_opened+0x84/0x110 [nfsd] kernel: nfs4_get_vfs_file+0x634/0x958 [nfsd] kernel: nfsd4_process_open2+0xa40/0x1a40 [nfsd] kernel: nfsd4_open+0xa08/0xe80 [nfsd] kernel: nfsd4_proc_compound+0xb8c/0x2130 [nfsd] kernel: nfsd_dispatch+0x22c/0x718 [nfsd] kernel: svc_process_common+0x8e8/0x1960 [sunrpc] kernel: svc_process+0x3d4/0x7e0 [sunrpc] kernel: svc_handle_xprt+0x828/0xe10 [sunrpc] kernel: svc_recv+0x2cc/0x6a8 [sunrpc] kernel: nfsd+0x270/0x400 [nfsd] kernel: kthread+0x288/0x310 kernel: ret_from_fork+0x10/0x20 This patch proposes a fixed that's based on adding 2 new additional stid's sc_status values that help coordinate between the laundromat and other operations (nfsd4_free_stateid() and nfsd4_delegreturn()). First to make sure, that once the stid is marked revoked, it is not removed by the nfsd4_free_stateid(), the laundromat take a reference on the stateid. Then, coordinating whether the stid has been put on the cl_revoked list or we are processing FREE_STATEID and need to make sure to remove it from the list, each check that state and act accordingly. If laundromat has added to the cl_revoke list before the arrival of FREE_STATEID, then nfsd4_free_stateid() knows to remove it from the list. If nfsd4_free_stateid() finds that operations arrived before laundromat has placed it on cl_revoke list, it marks the state freed and then laundromat will no longer add it to the list. Also, for nfsd4_delegreturn() when looking for the specified stid, we need to access stid that are marked removed or freeable, it means the laundromat has started processing it but hasn't finished and this delegreturn needs to return nfserr_deleg_revoked and not nfserr_bad_stateid. The latter will not trigger a FREE_STATEID and the lack of it will leave this stid on the cl_revoked list indefinitely. Fixes: 2d4a532d385f ("nfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock") CC: stable(a)vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev(a)redhat.com> Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 56b261608af4..d1a2c677be7e 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -1359,21 +1359,47 @@ static void destroy_delegation(struct nfs4_delegation *dp) destroy_unhashed_deleg(dp); } +/** + * revoke_delegation - perform nfs4 delegation structure cleanup + * @dp: pointer to the delegation + * + * This function assumes that it's called either from the administrative + * interface (nfsd4_revoke_states()) that's revoking a specific delegation + * stateid or it's called from a laundromat thread (nfsd4_landromat()) that + * determined that this specific state has expired and needs to be revoked + * (both mark state with the appropriate stid sc_status mode). It is also + * assumed that a reference was taken on the @dp state. + * + * If this function finds that the @dp state is SC_STATUS_FREED it means + * that a FREE_STATEID operation for this stateid has been processed and + * we can proceed to removing it from recalled list. However, if @dp state + * isn't marked SC_STATUS_FREED, it means we need place it on the cl_revoked + * list and wait for the FREE_STATEID to arrive from the client. At the same + * time, we need to mark it as SC_STATUS_FREEABLE to indicate to the + * nfsd4_free_stateid() function that this stateid has already been added + * to the cl_revoked list and that nfsd4_free_stateid() is now responsible + * for removing it from the list. Inspection of where the delegation state + * in the revocation process is protected by the clp->cl_lock. + */ static void revoke_delegation(struct nfs4_delegation *dp) { struct nfs4_client *clp = dp->dl_stid.sc_client; WARN_ON(!list_empty(&dp->dl_recall_lru)); + WARN_ON_ONCE(!(dp->dl_stid.sc_status & + (SC_STATUS_REVOKED | SC_STATUS_ADMIN_REVOKED))); trace_nfsd_stid_revoke(&dp->dl_stid); - if (dp->dl_stid.sc_status & - (SC_STATUS_REVOKED | SC_STATUS_ADMIN_REVOKED)) { - spin_lock(&clp->cl_lock); - refcount_inc(&dp->dl_stid.sc_count); - list_add(&dp->dl_recall_lru, &clp->cl_revoked); - spin_unlock(&clp->cl_lock); + spin_lock(&clp->cl_lock); + if (dp->dl_stid.sc_status & SC_STATUS_FREED) { + list_del_init(&dp->dl_recall_lru); + goto out; } + list_add(&dp->dl_recall_lru, &clp->cl_revoked); + dp->dl_stid.sc_status |= SC_STATUS_FREEABLE; +out: + spin_unlock(&clp->cl_lock); destroy_unhashed_deleg(dp); } @@ -1780,6 +1806,7 @@ void nfsd4_revoke_states(struct net *net, struct super_block *sb) mutex_unlock(&stp->st_mutex); break; case SC_TYPE_DELEG: + refcount_inc(&stid->sc_count); dp = delegstateid(stid); spin_lock(&state_lock); if (!unhash_delegation_locked( @@ -6545,6 +6572,7 @@ nfs4_laundromat(struct nfsd_net *nn) dp = list_entry (pos, struct nfs4_delegation, dl_recall_lru); if (!state_expired(&lt, dp->dl_time)) break; + refcount_inc(&dp->dl_stid.sc_count); unhash_delegation_locked(dp, SC_STATUS_REVOKED); list_add(&dp->dl_recall_lru, &reaplist); } @@ -7157,7 +7185,9 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, s->sc_status |= SC_STATUS_CLOSED; spin_unlock(&s->sc_lock); dp = delegstateid(s); - list_del_init(&dp->dl_recall_lru); + if (s->sc_status & SC_STATUS_FREEABLE) + list_del_init(&dp->dl_recall_lru); + s->sc_status |= SC_STATUS_FREED; spin_unlock(&cl->cl_lock); nfs4_put_stid(s); ret = nfs_ok; @@ -7487,7 +7517,9 @@ nfsd4_delegreturn(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if ((status = fh_verify(rqstp, &cstate->current_fh, S_IFREG, 0))) return status; - status = nfsd4_lookup_stateid(cstate, stateid, SC_TYPE_DELEG, 0, &s, nn); + status = nfsd4_lookup_stateid(cstate, stateid, SC_TYPE_DELEG, + SC_STATUS_REVOKED | SC_STATUS_FREEABLE, + &s, nn); if (status) goto out; dp = delegstateid(s); diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index 79c743c01a47..35b3564c065f 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -114,6 +114,8 @@ struct nfs4_stid { /* For a deleg stateid kept around only to process free_stateid's: */ #define SC_STATUS_REVOKED BIT(1) #define SC_STATUS_ADMIN_REVOKED BIT(2) +#define SC_STATUS_FREEABLE BIT(3) +#define SC_STATUS_FREED BIT(4) unsigned short sc_status; struct list_head sc_cp_list;

11 months

1
0
0 0

FAILED: patch "[PATCH] nfsd: fix race between laundromat and free_stateid" failed to apply to 5.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y git checkout FETCH_HEAD git cherry-pick -x 8dd91e8d31febf4d9cca3ae1bb4771d33ae7ee5a # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024102842-isolation-backspace-e3c1@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 8dd91e8d31febf4d9cca3ae1bb4771d33ae7ee5a Mon Sep 17 00:00:00 2001 From: Olga Kornievskaia <okorniev(a)redhat.com> Date: Fri, 18 Oct 2024 15:24:58 -0400 Subject: [PATCH] nfsd: fix race between laundromat and free_stateid There is a race between laundromat handling of revoked delegations and a client sending free_stateid operation. Laundromat thread finds that delegation has expired and needs to be revoked so it marks the delegation stid revoked and it puts it on a reaper list but then it unlock the state lock and the actual delegation revocation happens without the lock. Once the stid is marked revoked a racing free_stateid processing thread does the following (1) it calls list_del_init() which removes it from the reaper list and (2) frees the delegation stid structure. The laundromat thread ends up not calling the revoke_delegation() function for this particular delegation but that means it will no release the lock lease that exists on the file. Now, a new open for this file comes in and ends up finding that lease list isn't empty and calls nfsd_breaker_owns_lease() which ends up trying to derefence a freed delegation stateid. Leading to the followint use-after-free KASAN warning: kernel: ================================================================== kernel: BUG: KASAN: slab-use-after-free in nfsd_breaker_owns_lease+0x140/0x160 [nfsd] kernel: Read of size 8 at addr ffff0000e73cd0c8 by task nfsd/6205 kernel: kernel: CPU: 2 UID: 0 PID: 6205 Comm: nfsd Kdump: loaded Not tainted 6.11.0-rc7+ #9 kernel: Hardware name: Apple Inc. Apple Virtualization Generic Platform, BIOS 2069.0.0.0.0 08/03/2024 kernel: Call trace: kernel: dump_backtrace+0x98/0x120 kernel: show_stack+0x1c/0x30 kernel: dump_stack_lvl+0x80/0xe8 kernel: print_address_description.constprop.0+0x84/0x390 kernel: print_report+0xa4/0x268 kernel: kasan_report+0xb4/0xf8 kernel: __asan_report_load8_noabort+0x1c/0x28 kernel: nfsd_breaker_owns_lease+0x140/0x160 [nfsd] kernel: nfsd_file_do_acquire+0xb3c/0x11d0 [nfsd] kernel: nfsd_file_acquire_opened+0x84/0x110 [nfsd] kernel: nfs4_get_vfs_file+0x634/0x958 [nfsd] kernel: nfsd4_process_open2+0xa40/0x1a40 [nfsd] kernel: nfsd4_open+0xa08/0xe80 [nfsd] kernel: nfsd4_proc_compound+0xb8c/0x2130 [nfsd] kernel: nfsd_dispatch+0x22c/0x718 [nfsd] kernel: svc_process_common+0x8e8/0x1960 [sunrpc] kernel: svc_process+0x3d4/0x7e0 [sunrpc] kernel: svc_handle_xprt+0x828/0xe10 [sunrpc] kernel: svc_recv+0x2cc/0x6a8 [sunrpc] kernel: nfsd+0x270/0x400 [nfsd] kernel: kthread+0x288/0x310 kernel: ret_from_fork+0x10/0x20 This patch proposes a fixed that's based on adding 2 new additional stid's sc_status values that help coordinate between the laundromat and other operations (nfsd4_free_stateid() and nfsd4_delegreturn()). First to make sure, that once the stid is marked revoked, it is not removed by the nfsd4_free_stateid(), the laundromat take a reference on the stateid. Then, coordinating whether the stid has been put on the cl_revoked list or we are processing FREE_STATEID and need to make sure to remove it from the list, each check that state and act accordingly. If laundromat has added to the cl_revoke list before the arrival of FREE_STATEID, then nfsd4_free_stateid() knows to remove it from the list. If nfsd4_free_stateid() finds that operations arrived before laundromat has placed it on cl_revoke list, it marks the state freed and then laundromat will no longer add it to the list. Also, for nfsd4_delegreturn() when looking for the specified stid, we need to access stid that are marked removed or freeable, it means the laundromat has started processing it but hasn't finished and this delegreturn needs to return nfserr_deleg_revoked and not nfserr_bad_stateid. The latter will not trigger a FREE_STATEID and the lack of it will leave this stid on the cl_revoked list indefinitely. Fixes: 2d4a532d385f ("nfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock") CC: stable(a)vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev(a)redhat.com> Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 56b261608af4..d1a2c677be7e 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -1359,21 +1359,47 @@ static void destroy_delegation(struct nfs4_delegation *dp) destroy_unhashed_deleg(dp); } +/** + * revoke_delegation - perform nfs4 delegation structure cleanup + * @dp: pointer to the delegation + * + * This function assumes that it's called either from the administrative + * interface (nfsd4_revoke_states()) that's revoking a specific delegation + * stateid or it's called from a laundromat thread (nfsd4_landromat()) that + * determined that this specific state has expired and needs to be revoked + * (both mark state with the appropriate stid sc_status mode). It is also + * assumed that a reference was taken on the @dp state. + * + * If this function finds that the @dp state is SC_STATUS_FREED it means + * that a FREE_STATEID operation for this stateid has been processed and + * we can proceed to removing it from recalled list. However, if @dp state + * isn't marked SC_STATUS_FREED, it means we need place it on the cl_revoked + * list and wait for the FREE_STATEID to arrive from the client. At the same + * time, we need to mark it as SC_STATUS_FREEABLE to indicate to the + * nfsd4_free_stateid() function that this stateid has already been added + * to the cl_revoked list and that nfsd4_free_stateid() is now responsible + * for removing it from the list. Inspection of where the delegation state + * in the revocation process is protected by the clp->cl_lock. + */ static void revoke_delegation(struct nfs4_delegation *dp) { struct nfs4_client *clp = dp->dl_stid.sc_client; WARN_ON(!list_empty(&dp->dl_recall_lru)); + WARN_ON_ONCE(!(dp->dl_stid.sc_status & + (SC_STATUS_REVOKED | SC_STATUS_ADMIN_REVOKED))); trace_nfsd_stid_revoke(&dp->dl_stid); - if (dp->dl_stid.sc_status & - (SC_STATUS_REVOKED | SC_STATUS_ADMIN_REVOKED)) { - spin_lock(&clp->cl_lock); - refcount_inc(&dp->dl_stid.sc_count); - list_add(&dp->dl_recall_lru, &clp->cl_revoked); - spin_unlock(&clp->cl_lock); + spin_lock(&clp->cl_lock); + if (dp->dl_stid.sc_status & SC_STATUS_FREED) { + list_del_init(&dp->dl_recall_lru); + goto out; } + list_add(&dp->dl_recall_lru, &clp->cl_revoked); + dp->dl_stid.sc_status |= SC_STATUS_FREEABLE; +out: + spin_unlock(&clp->cl_lock); destroy_unhashed_deleg(dp); } @@ -1780,6 +1806,7 @@ void nfsd4_revoke_states(struct net *net, struct super_block *sb) mutex_unlock(&stp->st_mutex); break; case SC_TYPE_DELEG: + refcount_inc(&stid->sc_count); dp = delegstateid(stid); spin_lock(&state_lock); if (!unhash_delegation_locked( @@ -6545,6 +6572,7 @@ nfs4_laundromat(struct nfsd_net *nn) dp = list_entry (pos, struct nfs4_delegation, dl_recall_lru); if (!state_expired(&lt, dp->dl_time)) break; + refcount_inc(&dp->dl_stid.sc_count); unhash_delegation_locked(dp, SC_STATUS_REVOKED); list_add(&dp->dl_recall_lru, &reaplist); } @@ -7157,7 +7185,9 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, s->sc_status |= SC_STATUS_CLOSED; spin_unlock(&s->sc_lock); dp = delegstateid(s); - list_del_init(&dp->dl_recall_lru); + if (s->sc_status & SC_STATUS_FREEABLE) + list_del_init(&dp->dl_recall_lru); + s->sc_status |= SC_STATUS_FREED; spin_unlock(&cl->cl_lock); nfs4_put_stid(s); ret = nfs_ok; @@ -7487,7 +7517,9 @@ nfsd4_delegreturn(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if ((status = fh_verify(rqstp, &cstate->current_fh, S_IFREG, 0))) return status; - status = nfsd4_lookup_stateid(cstate, stateid, SC_TYPE_DELEG, 0, &s, nn); + status = nfsd4_lookup_stateid(cstate, stateid, SC_TYPE_DELEG, + SC_STATUS_REVOKED | SC_STATUS_FREEABLE, + &s, nn); if (status) goto out; dp = delegstateid(s); diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index 79c743c01a47..35b3564c065f 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -114,6 +114,8 @@ struct nfs4_stid { /* For a deleg stateid kept around only to process free_stateid's: */ #define SC_STATUS_REVOKED BIT(1) #define SC_STATUS_ADMIN_REVOKED BIT(2) +#define SC_STATUS_FREEABLE BIT(3) +#define SC_STATUS_FREED BIT(4) unsigned short sc_status; struct list_head sc_cp_list;

11 months

1
0
0 0

FAILED: patch "[PATCH] nfsd: fix race between laundromat and free_stateid" failed to apply to 5.10-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y git checkout FETCH_HEAD git cherry-pick -x 8dd91e8d31febf4d9cca3ae1bb4771d33ae7ee5a # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024102839-bullhorn-canister-fae3@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 8dd91e8d31febf4d9cca3ae1bb4771d33ae7ee5a Mon Sep 17 00:00:00 2001 From: Olga Kornievskaia <okorniev(a)redhat.com> Date: Fri, 18 Oct 2024 15:24:58 -0400 Subject: [PATCH] nfsd: fix race between laundromat and free_stateid There is a race between laundromat handling of revoked delegations and a client sending free_stateid operation. Laundromat thread finds that delegation has expired and needs to be revoked so it marks the delegation stid revoked and it puts it on a reaper list but then it unlock the state lock and the actual delegation revocation happens without the lock. Once the stid is marked revoked a racing free_stateid processing thread does the following (1) it calls list_del_init() which removes it from the reaper list and (2) frees the delegation stid structure. The laundromat thread ends up not calling the revoke_delegation() function for this particular delegation but that means it will no release the lock lease that exists on the file. Now, a new open for this file comes in and ends up finding that lease list isn't empty and calls nfsd_breaker_owns_lease() which ends up trying to derefence a freed delegation stateid. Leading to the followint use-after-free KASAN warning: kernel: ================================================================== kernel: BUG: KASAN: slab-use-after-free in nfsd_breaker_owns_lease+0x140/0x160 [nfsd] kernel: Read of size 8 at addr ffff0000e73cd0c8 by task nfsd/6205 kernel: kernel: CPU: 2 UID: 0 PID: 6205 Comm: nfsd Kdump: loaded Not tainted 6.11.0-rc7+ #9 kernel: Hardware name: Apple Inc. Apple Virtualization Generic Platform, BIOS 2069.0.0.0.0 08/03/2024 kernel: Call trace: kernel: dump_backtrace+0x98/0x120 kernel: show_stack+0x1c/0x30 kernel: dump_stack_lvl+0x80/0xe8 kernel: print_address_description.constprop.0+0x84/0x390 kernel: print_report+0xa4/0x268 kernel: kasan_report+0xb4/0xf8 kernel: __asan_report_load8_noabort+0x1c/0x28 kernel: nfsd_breaker_owns_lease+0x140/0x160 [nfsd] kernel: nfsd_file_do_acquire+0xb3c/0x11d0 [nfsd] kernel: nfsd_file_acquire_opened+0x84/0x110 [nfsd] kernel: nfs4_get_vfs_file+0x634/0x958 [nfsd] kernel: nfsd4_process_open2+0xa40/0x1a40 [nfsd] kernel: nfsd4_open+0xa08/0xe80 [nfsd] kernel: nfsd4_proc_compound+0xb8c/0x2130 [nfsd] kernel: nfsd_dispatch+0x22c/0x718 [nfsd] kernel: svc_process_common+0x8e8/0x1960 [sunrpc] kernel: svc_process+0x3d4/0x7e0 [sunrpc] kernel: svc_handle_xprt+0x828/0xe10 [sunrpc] kernel: svc_recv+0x2cc/0x6a8 [sunrpc] kernel: nfsd+0x270/0x400 [nfsd] kernel: kthread+0x288/0x310 kernel: ret_from_fork+0x10/0x20 This patch proposes a fixed that's based on adding 2 new additional stid's sc_status values that help coordinate between the laundromat and other operations (nfsd4_free_stateid() and nfsd4_delegreturn()). First to make sure, that once the stid is marked revoked, it is not removed by the nfsd4_free_stateid(), the laundromat take a reference on the stateid. Then, coordinating whether the stid has been put on the cl_revoked list or we are processing FREE_STATEID and need to make sure to remove it from the list, each check that state and act accordingly. If laundromat has added to the cl_revoke list before the arrival of FREE_STATEID, then nfsd4_free_stateid() knows to remove it from the list. If nfsd4_free_stateid() finds that operations arrived before laundromat has placed it on cl_revoke list, it marks the state freed and then laundromat will no longer add it to the list. Also, for nfsd4_delegreturn() when looking for the specified stid, we need to access stid that are marked removed or freeable, it means the laundromat has started processing it but hasn't finished and this delegreturn needs to return nfserr_deleg_revoked and not nfserr_bad_stateid. The latter will not trigger a FREE_STATEID and the lack of it will leave this stid on the cl_revoked list indefinitely. Fixes: 2d4a532d385f ("nfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock") CC: stable(a)vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev(a)redhat.com> Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 56b261608af4..d1a2c677be7e 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -1359,21 +1359,47 @@ static void destroy_delegation(struct nfs4_delegation *dp) destroy_unhashed_deleg(dp); } +/** + * revoke_delegation - perform nfs4 delegation structure cleanup + * @dp: pointer to the delegation + * + * This function assumes that it's called either from the administrative + * interface (nfsd4_revoke_states()) that's revoking a specific delegation + * stateid or it's called from a laundromat thread (nfsd4_landromat()) that + * determined that this specific state has expired and needs to be revoked + * (both mark state with the appropriate stid sc_status mode). It is also + * assumed that a reference was taken on the @dp state. + * + * If this function finds that the @dp state is SC_STATUS_FREED it means + * that a FREE_STATEID operation for this stateid has been processed and + * we can proceed to removing it from recalled list. However, if @dp state + * isn't marked SC_STATUS_FREED, it means we need place it on the cl_revoked + * list and wait for the FREE_STATEID to arrive from the client. At the same + * time, we need to mark it as SC_STATUS_FREEABLE to indicate to the + * nfsd4_free_stateid() function that this stateid has already been added + * to the cl_revoked list and that nfsd4_free_stateid() is now responsible + * for removing it from the list. Inspection of where the delegation state + * in the revocation process is protected by the clp->cl_lock. + */ static void revoke_delegation(struct nfs4_delegation *dp) { struct nfs4_client *clp = dp->dl_stid.sc_client; WARN_ON(!list_empty(&dp->dl_recall_lru)); + WARN_ON_ONCE(!(dp->dl_stid.sc_status & + (SC_STATUS_REVOKED | SC_STATUS_ADMIN_REVOKED))); trace_nfsd_stid_revoke(&dp->dl_stid); - if (dp->dl_stid.sc_status & - (SC_STATUS_REVOKED | SC_STATUS_ADMIN_REVOKED)) { - spin_lock(&clp->cl_lock); - refcount_inc(&dp->dl_stid.sc_count); - list_add(&dp->dl_recall_lru, &clp->cl_revoked); - spin_unlock(&clp->cl_lock); + spin_lock(&clp->cl_lock); + if (dp->dl_stid.sc_status & SC_STATUS_FREED) { + list_del_init(&dp->dl_recall_lru); + goto out; } + list_add(&dp->dl_recall_lru, &clp->cl_revoked); + dp->dl_stid.sc_status |= SC_STATUS_FREEABLE; +out: + spin_unlock(&clp->cl_lock); destroy_unhashed_deleg(dp); } @@ -1780,6 +1806,7 @@ void nfsd4_revoke_states(struct net *net, struct super_block *sb) mutex_unlock(&stp->st_mutex); break; case SC_TYPE_DELEG: + refcount_inc(&stid->sc_count); dp = delegstateid(stid); spin_lock(&state_lock); if (!unhash_delegation_locked( @@ -6545,6 +6572,7 @@ nfs4_laundromat(struct nfsd_net *nn) dp = list_entry (pos, struct nfs4_delegation, dl_recall_lru); if (!state_expired(&lt, dp->dl_time)) break; + refcount_inc(&dp->dl_stid.sc_count); unhash_delegation_locked(dp, SC_STATUS_REVOKED); list_add(&dp->dl_recall_lru, &reaplist); } @@ -7157,7 +7185,9 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, s->sc_status |= SC_STATUS_CLOSED; spin_unlock(&s->sc_lock); dp = delegstateid(s); - list_del_init(&dp->dl_recall_lru); + if (s->sc_status & SC_STATUS_FREEABLE) + list_del_init(&dp->dl_recall_lru); + s->sc_status |= SC_STATUS_FREED; spin_unlock(&cl->cl_lock); nfs4_put_stid(s); ret = nfs_ok; @@ -7487,7 +7517,9 @@ nfsd4_delegreturn(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if ((status = fh_verify(rqstp, &cstate->current_fh, S_IFREG, 0))) return status; - status = nfsd4_lookup_stateid(cstate, stateid, SC_TYPE_DELEG, 0, &s, nn); + status = nfsd4_lookup_stateid(cstate, stateid, SC_TYPE_DELEG, + SC_STATUS_REVOKED | SC_STATUS_FREEABLE, + &s, nn); if (status) goto out; dp = delegstateid(s); diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index 79c743c01a47..35b3564c065f 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -114,6 +114,8 @@ struct nfs4_stid { /* For a deleg stateid kept around only to process free_stateid's: */ #define SC_STATUS_REVOKED BIT(1) #define SC_STATUS_ADMIN_REVOKED BIT(2) +#define SC_STATUS_FREEABLE BIT(3) +#define SC_STATUS_FREED BIT(4) unsigned short sc_status; struct list_head sc_cp_list;

11 months

1
0
0 0

FAILED: patch "[PATCH] nfsd: fix race between laundromat and free_stateid" failed to apply to 5.15-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y git checkout FETCH_HEAD git cherry-pick -x 8dd91e8d31febf4d9cca3ae1bb4771d33ae7ee5a # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024102837-elusive-service-68d6@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 8dd91e8d31febf4d9cca3ae1bb4771d33ae7ee5a Mon Sep 17 00:00:00 2001 From: Olga Kornievskaia <okorniev(a)redhat.com> Date: Fri, 18 Oct 2024 15:24:58 -0400 Subject: [PATCH] nfsd: fix race between laundromat and free_stateid There is a race between laundromat handling of revoked delegations and a client sending free_stateid operation. Laundromat thread finds that delegation has expired and needs to be revoked so it marks the delegation stid revoked and it puts it on a reaper list but then it unlock the state lock and the actual delegation revocation happens without the lock. Once the stid is marked revoked a racing free_stateid processing thread does the following (1) it calls list_del_init() which removes it from the reaper list and (2) frees the delegation stid structure. The laundromat thread ends up not calling the revoke_delegation() function for this particular delegation but that means it will no release the lock lease that exists on the file. Now, a new open for this file comes in and ends up finding that lease list isn't empty and calls nfsd_breaker_owns_lease() which ends up trying to derefence a freed delegation stateid. Leading to the followint use-after-free KASAN warning: kernel: ================================================================== kernel: BUG: KASAN: slab-use-after-free in nfsd_breaker_owns_lease+0x140/0x160 [nfsd] kernel: Read of size 8 at addr ffff0000e73cd0c8 by task nfsd/6205 kernel: kernel: CPU: 2 UID: 0 PID: 6205 Comm: nfsd Kdump: loaded Not tainted 6.11.0-rc7+ #9 kernel: Hardware name: Apple Inc. Apple Virtualization Generic Platform, BIOS 2069.0.0.0.0 08/03/2024 kernel: Call trace: kernel: dump_backtrace+0x98/0x120 kernel: show_stack+0x1c/0x30 kernel: dump_stack_lvl+0x80/0xe8 kernel: print_address_description.constprop.0+0x84/0x390 kernel: print_report+0xa4/0x268 kernel: kasan_report+0xb4/0xf8 kernel: __asan_report_load8_noabort+0x1c/0x28 kernel: nfsd_breaker_owns_lease+0x140/0x160 [nfsd] kernel: nfsd_file_do_acquire+0xb3c/0x11d0 [nfsd] kernel: nfsd_file_acquire_opened+0x84/0x110 [nfsd] kernel: nfs4_get_vfs_file+0x634/0x958 [nfsd] kernel: nfsd4_process_open2+0xa40/0x1a40 [nfsd] kernel: nfsd4_open+0xa08/0xe80 [nfsd] kernel: nfsd4_proc_compound+0xb8c/0x2130 [nfsd] kernel: nfsd_dispatch+0x22c/0x718 [nfsd] kernel: svc_process_common+0x8e8/0x1960 [sunrpc] kernel: svc_process+0x3d4/0x7e0 [sunrpc] kernel: svc_handle_xprt+0x828/0xe10 [sunrpc] kernel: svc_recv+0x2cc/0x6a8 [sunrpc] kernel: nfsd+0x270/0x400 [nfsd] kernel: kthread+0x288/0x310 kernel: ret_from_fork+0x10/0x20 This patch proposes a fixed that's based on adding 2 new additional stid's sc_status values that help coordinate between the laundromat and other operations (nfsd4_free_stateid() and nfsd4_delegreturn()). First to make sure, that once the stid is marked revoked, it is not removed by the nfsd4_free_stateid(), the laundromat take a reference on the stateid. Then, coordinating whether the stid has been put on the cl_revoked list or we are processing FREE_STATEID and need to make sure to remove it from the list, each check that state and act accordingly. If laundromat has added to the cl_revoke list before the arrival of FREE_STATEID, then nfsd4_free_stateid() knows to remove it from the list. If nfsd4_free_stateid() finds that operations arrived before laundromat has placed it on cl_revoke list, it marks the state freed and then laundromat will no longer add it to the list. Also, for nfsd4_delegreturn() when looking for the specified stid, we need to access stid that are marked removed or freeable, it means the laundromat has started processing it but hasn't finished and this delegreturn needs to return nfserr_deleg_revoked and not nfserr_bad_stateid. The latter will not trigger a FREE_STATEID and the lack of it will leave this stid on the cl_revoked list indefinitely. Fixes: 2d4a532d385f ("nfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock") CC: stable(a)vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev(a)redhat.com> Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 56b261608af4..d1a2c677be7e 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -1359,21 +1359,47 @@ static void destroy_delegation(struct nfs4_delegation *dp) destroy_unhashed_deleg(dp); } +/** + * revoke_delegation - perform nfs4 delegation structure cleanup + * @dp: pointer to the delegation + * + * This function assumes that it's called either from the administrative + * interface (nfsd4_revoke_states()) that's revoking a specific delegation + * stateid or it's called from a laundromat thread (nfsd4_landromat()) that + * determined that this specific state has expired and needs to be revoked + * (both mark state with the appropriate stid sc_status mode). It is also + * assumed that a reference was taken on the @dp state. + * + * If this function finds that the @dp state is SC_STATUS_FREED it means + * that a FREE_STATEID operation for this stateid has been processed and + * we can proceed to removing it from recalled list. However, if @dp state + * isn't marked SC_STATUS_FREED, it means we need place it on the cl_revoked + * list and wait for the FREE_STATEID to arrive from the client. At the same + * time, we need to mark it as SC_STATUS_FREEABLE to indicate to the + * nfsd4_free_stateid() function that this stateid has already been added + * to the cl_revoked list and that nfsd4_free_stateid() is now responsible + * for removing it from the list. Inspection of where the delegation state + * in the revocation process is protected by the clp->cl_lock. + */ static void revoke_delegation(struct nfs4_delegation *dp) { struct nfs4_client *clp = dp->dl_stid.sc_client; WARN_ON(!list_empty(&dp->dl_recall_lru)); + WARN_ON_ONCE(!(dp->dl_stid.sc_status & + (SC_STATUS_REVOKED | SC_STATUS_ADMIN_REVOKED))); trace_nfsd_stid_revoke(&dp->dl_stid); - if (dp->dl_stid.sc_status & - (SC_STATUS_REVOKED | SC_STATUS_ADMIN_REVOKED)) { - spin_lock(&clp->cl_lock); - refcount_inc(&dp->dl_stid.sc_count); - list_add(&dp->dl_recall_lru, &clp->cl_revoked); - spin_unlock(&clp->cl_lock); + spin_lock(&clp->cl_lock); + if (dp->dl_stid.sc_status & SC_STATUS_FREED) { + list_del_init(&dp->dl_recall_lru); + goto out; } + list_add(&dp->dl_recall_lru, &clp->cl_revoked); + dp->dl_stid.sc_status |= SC_STATUS_FREEABLE; +out: + spin_unlock(&clp->cl_lock); destroy_unhashed_deleg(dp); } @@ -1780,6 +1806,7 @@ void nfsd4_revoke_states(struct net *net, struct super_block *sb) mutex_unlock(&stp->st_mutex); break; case SC_TYPE_DELEG: + refcount_inc(&stid->sc_count); dp = delegstateid(stid); spin_lock(&state_lock); if (!unhash_delegation_locked( @@ -6545,6 +6572,7 @@ nfs4_laundromat(struct nfsd_net *nn) dp = list_entry (pos, struct nfs4_delegation, dl_recall_lru); if (!state_expired(&lt, dp->dl_time)) break; + refcount_inc(&dp->dl_stid.sc_count); unhash_delegation_locked(dp, SC_STATUS_REVOKED); list_add(&dp->dl_recall_lru, &reaplist); } @@ -7157,7 +7185,9 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, s->sc_status |= SC_STATUS_CLOSED; spin_unlock(&s->sc_lock); dp = delegstateid(s); - list_del_init(&dp->dl_recall_lru); + if (s->sc_status & SC_STATUS_FREEABLE) + list_del_init(&dp->dl_recall_lru); + s->sc_status |= SC_STATUS_FREED; spin_unlock(&cl->cl_lock); nfs4_put_stid(s); ret = nfs_ok; @@ -7487,7 +7517,9 @@ nfsd4_delegreturn(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if ((status = fh_verify(rqstp, &cstate->current_fh, S_IFREG, 0))) return status; - status = nfsd4_lookup_stateid(cstate, stateid, SC_TYPE_DELEG, 0, &s, nn); + status = nfsd4_lookup_stateid(cstate, stateid, SC_TYPE_DELEG, + SC_STATUS_REVOKED | SC_STATUS_FREEABLE, + &s, nn); if (status) goto out; dp = delegstateid(s); diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index 79c743c01a47..35b3564c065f 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -114,6 +114,8 @@ struct nfs4_stid { /* For a deleg stateid kept around only to process free_stateid's: */ #define SC_STATUS_REVOKED BIT(1) #define SC_STATUS_ADMIN_REVOKED BIT(2) +#define SC_STATUS_FREEABLE BIT(3) +#define SC_STATUS_FREED BIT(4) unsigned short sc_status; struct list_head sc_cp_list;

11 months

1
0
0 0

FAILED: patch "[PATCH] nfsd: fix race between laundromat and free_stateid" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x 8dd91e8d31febf4d9cca3ae1bb4771d33ae7ee5a # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024102835-pungent-sadly-717e@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 8dd91e8d31febf4d9cca3ae1bb4771d33ae7ee5a Mon Sep 17 00:00:00 2001 From: Olga Kornievskaia <okorniev(a)redhat.com> Date: Fri, 18 Oct 2024 15:24:58 -0400 Subject: [PATCH] nfsd: fix race between laundromat and free_stateid There is a race between laundromat handling of revoked delegations and a client sending free_stateid operation. Laundromat thread finds that delegation has expired and needs to be revoked so it marks the delegation stid revoked and it puts it on a reaper list but then it unlock the state lock and the actual delegation revocation happens without the lock. Once the stid is marked revoked a racing free_stateid processing thread does the following (1) it calls list_del_init() which removes it from the reaper list and (2) frees the delegation stid structure. The laundromat thread ends up not calling the revoke_delegation() function for this particular delegation but that means it will no release the lock lease that exists on the file. Now, a new open for this file comes in and ends up finding that lease list isn't empty and calls nfsd_breaker_owns_lease() which ends up trying to derefence a freed delegation stateid. Leading to the followint use-after-free KASAN warning: kernel: ================================================================== kernel: BUG: KASAN: slab-use-after-free in nfsd_breaker_owns_lease+0x140/0x160 [nfsd] kernel: Read of size 8 at addr ffff0000e73cd0c8 by task nfsd/6205 kernel: kernel: CPU: 2 UID: 0 PID: 6205 Comm: nfsd Kdump: loaded Not tainted 6.11.0-rc7+ #9 kernel: Hardware name: Apple Inc. Apple Virtualization Generic Platform, BIOS 2069.0.0.0.0 08/03/2024 kernel: Call trace: kernel: dump_backtrace+0x98/0x120 kernel: show_stack+0x1c/0x30 kernel: dump_stack_lvl+0x80/0xe8 kernel: print_address_description.constprop.0+0x84/0x390 kernel: print_report+0xa4/0x268 kernel: kasan_report+0xb4/0xf8 kernel: __asan_report_load8_noabort+0x1c/0x28 kernel: nfsd_breaker_owns_lease+0x140/0x160 [nfsd] kernel: nfsd_file_do_acquire+0xb3c/0x11d0 [nfsd] kernel: nfsd_file_acquire_opened+0x84/0x110 [nfsd] kernel: nfs4_get_vfs_file+0x634/0x958 [nfsd] kernel: nfsd4_process_open2+0xa40/0x1a40 [nfsd] kernel: nfsd4_open+0xa08/0xe80 [nfsd] kernel: nfsd4_proc_compound+0xb8c/0x2130 [nfsd] kernel: nfsd_dispatch+0x22c/0x718 [nfsd] kernel: svc_process_common+0x8e8/0x1960 [sunrpc] kernel: svc_process+0x3d4/0x7e0 [sunrpc] kernel: svc_handle_xprt+0x828/0xe10 [sunrpc] kernel: svc_recv+0x2cc/0x6a8 [sunrpc] kernel: nfsd+0x270/0x400 [nfsd] kernel: kthread+0x288/0x310 kernel: ret_from_fork+0x10/0x20 This patch proposes a fixed that's based on adding 2 new additional stid's sc_status values that help coordinate between the laundromat and other operations (nfsd4_free_stateid() and nfsd4_delegreturn()). First to make sure, that once the stid is marked revoked, it is not removed by the nfsd4_free_stateid(), the laundromat take a reference on the stateid. Then, coordinating whether the stid has been put on the cl_revoked list or we are processing FREE_STATEID and need to make sure to remove it from the list, each check that state and act accordingly. If laundromat has added to the cl_revoke list before the arrival of FREE_STATEID, then nfsd4_free_stateid() knows to remove it from the list. If nfsd4_free_stateid() finds that operations arrived before laundromat has placed it on cl_revoke list, it marks the state freed and then laundromat will no longer add it to the list. Also, for nfsd4_delegreturn() when looking for the specified stid, we need to access stid that are marked removed or freeable, it means the laundromat has started processing it but hasn't finished and this delegreturn needs to return nfserr_deleg_revoked and not nfserr_bad_stateid. The latter will not trigger a FREE_STATEID and the lack of it will leave this stid on the cl_revoked list indefinitely. Fixes: 2d4a532d385f ("nfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock") CC: stable(a)vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev(a)redhat.com> Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 56b261608af4..d1a2c677be7e 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -1359,21 +1359,47 @@ static void destroy_delegation(struct nfs4_delegation *dp) destroy_unhashed_deleg(dp); } +/** + * revoke_delegation - perform nfs4 delegation structure cleanup + * @dp: pointer to the delegation + * + * This function assumes that it's called either from the administrative + * interface (nfsd4_revoke_states()) that's revoking a specific delegation + * stateid or it's called from a laundromat thread (nfsd4_landromat()) that + * determined that this specific state has expired and needs to be revoked + * (both mark state with the appropriate stid sc_status mode). It is also + * assumed that a reference was taken on the @dp state. + * + * If this function finds that the @dp state is SC_STATUS_FREED it means + * that a FREE_STATEID operation for this stateid has been processed and + * we can proceed to removing it from recalled list. However, if @dp state + * isn't marked SC_STATUS_FREED, it means we need place it on the cl_revoked + * list and wait for the FREE_STATEID to arrive from the client. At the same + * time, we need to mark it as SC_STATUS_FREEABLE to indicate to the + * nfsd4_free_stateid() function that this stateid has already been added + * to the cl_revoked list and that nfsd4_free_stateid() is now responsible + * for removing it from the list. Inspection of where the delegation state + * in the revocation process is protected by the clp->cl_lock. + */ static void revoke_delegation(struct nfs4_delegation *dp) { struct nfs4_client *clp = dp->dl_stid.sc_client; WARN_ON(!list_empty(&dp->dl_recall_lru)); + WARN_ON_ONCE(!(dp->dl_stid.sc_status & + (SC_STATUS_REVOKED | SC_STATUS_ADMIN_REVOKED))); trace_nfsd_stid_revoke(&dp->dl_stid); - if (dp->dl_stid.sc_status & - (SC_STATUS_REVOKED | SC_STATUS_ADMIN_REVOKED)) { - spin_lock(&clp->cl_lock); - refcount_inc(&dp->dl_stid.sc_count); - list_add(&dp->dl_recall_lru, &clp->cl_revoked); - spin_unlock(&clp->cl_lock); + spin_lock(&clp->cl_lock); + if (dp->dl_stid.sc_status & SC_STATUS_FREED) { + list_del_init(&dp->dl_recall_lru); + goto out; } + list_add(&dp->dl_recall_lru, &clp->cl_revoked); + dp->dl_stid.sc_status |= SC_STATUS_FREEABLE; +out: + spin_unlock(&clp->cl_lock); destroy_unhashed_deleg(dp); } @@ -1780,6 +1806,7 @@ void nfsd4_revoke_states(struct net *net, struct super_block *sb) mutex_unlock(&stp->st_mutex); break; case SC_TYPE_DELEG: + refcount_inc(&stid->sc_count); dp = delegstateid(stid); spin_lock(&state_lock); if (!unhash_delegation_locked( @@ -6545,6 +6572,7 @@ nfs4_laundromat(struct nfsd_net *nn) dp = list_entry (pos, struct nfs4_delegation, dl_recall_lru); if (!state_expired(&lt, dp->dl_time)) break; + refcount_inc(&dp->dl_stid.sc_count); unhash_delegation_locked(dp, SC_STATUS_REVOKED); list_add(&dp->dl_recall_lru, &reaplist); } @@ -7157,7 +7185,9 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, s->sc_status |= SC_STATUS_CLOSED; spin_unlock(&s->sc_lock); dp = delegstateid(s); - list_del_init(&dp->dl_recall_lru); + if (s->sc_status & SC_STATUS_FREEABLE) + list_del_init(&dp->dl_recall_lru); + s->sc_status |= SC_STATUS_FREED; spin_unlock(&cl->cl_lock); nfs4_put_stid(s); ret = nfs_ok; @@ -7487,7 +7517,9 @@ nfsd4_delegreturn(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if ((status = fh_verify(rqstp, &cstate->current_fh, S_IFREG, 0))) return status; - status = nfsd4_lookup_stateid(cstate, stateid, SC_TYPE_DELEG, 0, &s, nn); + status = nfsd4_lookup_stateid(cstate, stateid, SC_TYPE_DELEG, + SC_STATUS_REVOKED | SC_STATUS_FREEABLE, + &s, nn); if (status) goto out; dp = delegstateid(s); diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index 79c743c01a47..35b3564c065f 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -114,6 +114,8 @@ struct nfs4_stid { /* For a deleg stateid kept around only to process free_stateid's: */ #define SC_STATUS_REVOKED BIT(1) #define SC_STATUS_ADMIN_REVOKED BIT(2) +#define SC_STATUS_FREEABLE BIT(3) +#define SC_STATUS_FREED BIT(4) unsigned short sc_status; struct list_head sc_cp_list;

11 months

1
0
0 0

FAILED: patch "[PATCH] nfsd: fix race between laundromat and free_stateid" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x 8dd91e8d31febf4d9cca3ae1bb4771d33ae7ee5a # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024102832-murky-pasty-feca@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 8dd91e8d31febf4d9cca3ae1bb4771d33ae7ee5a Mon Sep 17 00:00:00 2001 From: Olga Kornievskaia <okorniev(a)redhat.com> Date: Fri, 18 Oct 2024 15:24:58 -0400 Subject: [PATCH] nfsd: fix race between laundromat and free_stateid There is a race between laundromat handling of revoked delegations and a client sending free_stateid operation. Laundromat thread finds that delegation has expired and needs to be revoked so it marks the delegation stid revoked and it puts it on a reaper list but then it unlock the state lock and the actual delegation revocation happens without the lock. Once the stid is marked revoked a racing free_stateid processing thread does the following (1) it calls list_del_init() which removes it from the reaper list and (2) frees the delegation stid structure. The laundromat thread ends up not calling the revoke_delegation() function for this particular delegation but that means it will no release the lock lease that exists on the file. Now, a new open for this file comes in and ends up finding that lease list isn't empty and calls nfsd_breaker_owns_lease() which ends up trying to derefence a freed delegation stateid. Leading to the followint use-after-free KASAN warning: kernel: ================================================================== kernel: BUG: KASAN: slab-use-after-free in nfsd_breaker_owns_lease+0x140/0x160 [nfsd] kernel: Read of size 8 at addr ffff0000e73cd0c8 by task nfsd/6205 kernel: kernel: CPU: 2 UID: 0 PID: 6205 Comm: nfsd Kdump: loaded Not tainted 6.11.0-rc7+ #9 kernel: Hardware name: Apple Inc. Apple Virtualization Generic Platform, BIOS 2069.0.0.0.0 08/03/2024 kernel: Call trace: kernel: dump_backtrace+0x98/0x120 kernel: show_stack+0x1c/0x30 kernel: dump_stack_lvl+0x80/0xe8 kernel: print_address_description.constprop.0+0x84/0x390 kernel: print_report+0xa4/0x268 kernel: kasan_report+0xb4/0xf8 kernel: __asan_report_load8_noabort+0x1c/0x28 kernel: nfsd_breaker_owns_lease+0x140/0x160 [nfsd] kernel: nfsd_file_do_acquire+0xb3c/0x11d0 [nfsd] kernel: nfsd_file_acquire_opened+0x84/0x110 [nfsd] kernel: nfs4_get_vfs_file+0x634/0x958 [nfsd] kernel: nfsd4_process_open2+0xa40/0x1a40 [nfsd] kernel: nfsd4_open+0xa08/0xe80 [nfsd] kernel: nfsd4_proc_compound+0xb8c/0x2130 [nfsd] kernel: nfsd_dispatch+0x22c/0x718 [nfsd] kernel: svc_process_common+0x8e8/0x1960 [sunrpc] kernel: svc_process+0x3d4/0x7e0 [sunrpc] kernel: svc_handle_xprt+0x828/0xe10 [sunrpc] kernel: svc_recv+0x2cc/0x6a8 [sunrpc] kernel: nfsd+0x270/0x400 [nfsd] kernel: kthread+0x288/0x310 kernel: ret_from_fork+0x10/0x20 This patch proposes a fixed that's based on adding 2 new additional stid's sc_status values that help coordinate between the laundromat and other operations (nfsd4_free_stateid() and nfsd4_delegreturn()). First to make sure, that once the stid is marked revoked, it is not removed by the nfsd4_free_stateid(), the laundromat take a reference on the stateid. Then, coordinating whether the stid has been put on the cl_revoked list or we are processing FREE_STATEID and need to make sure to remove it from the list, each check that state and act accordingly. If laundromat has added to the cl_revoke list before the arrival of FREE_STATEID, then nfsd4_free_stateid() knows to remove it from the list. If nfsd4_free_stateid() finds that operations arrived before laundromat has placed it on cl_revoke list, it marks the state freed and then laundromat will no longer add it to the list. Also, for nfsd4_delegreturn() when looking for the specified stid, we need to access stid that are marked removed or freeable, it means the laundromat has started processing it but hasn't finished and this delegreturn needs to return nfserr_deleg_revoked and not nfserr_bad_stateid. The latter will not trigger a FREE_STATEID and the lack of it will leave this stid on the cl_revoked list indefinitely. Fixes: 2d4a532d385f ("nfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock") CC: stable(a)vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev(a)redhat.com> Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 56b261608af4..d1a2c677be7e 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -1359,21 +1359,47 @@ static void destroy_delegation(struct nfs4_delegation *dp) destroy_unhashed_deleg(dp); } +/** + * revoke_delegation - perform nfs4 delegation structure cleanup + * @dp: pointer to the delegation + * + * This function assumes that it's called either from the administrative + * interface (nfsd4_revoke_states()) that's revoking a specific delegation + * stateid or it's called from a laundromat thread (nfsd4_landromat()) that + * determined that this specific state has expired and needs to be revoked + * (both mark state with the appropriate stid sc_status mode). It is also + * assumed that a reference was taken on the @dp state. + * + * If this function finds that the @dp state is SC_STATUS_FREED it means + * that a FREE_STATEID operation for this stateid has been processed and + * we can proceed to removing it from recalled list. However, if @dp state + * isn't marked SC_STATUS_FREED, it means we need place it on the cl_revoked + * list and wait for the FREE_STATEID to arrive from the client. At the same + * time, we need to mark it as SC_STATUS_FREEABLE to indicate to the + * nfsd4_free_stateid() function that this stateid has already been added + * to the cl_revoked list and that nfsd4_free_stateid() is now responsible + * for removing it from the list. Inspection of where the delegation state + * in the revocation process is protected by the clp->cl_lock. + */ static void revoke_delegation(struct nfs4_delegation *dp) { struct nfs4_client *clp = dp->dl_stid.sc_client; WARN_ON(!list_empty(&dp->dl_recall_lru)); + WARN_ON_ONCE(!(dp->dl_stid.sc_status & + (SC_STATUS_REVOKED | SC_STATUS_ADMIN_REVOKED))); trace_nfsd_stid_revoke(&dp->dl_stid); - if (dp->dl_stid.sc_status & - (SC_STATUS_REVOKED | SC_STATUS_ADMIN_REVOKED)) { - spin_lock(&clp->cl_lock); - refcount_inc(&dp->dl_stid.sc_count); - list_add(&dp->dl_recall_lru, &clp->cl_revoked); - spin_unlock(&clp->cl_lock); + spin_lock(&clp->cl_lock); + if (dp->dl_stid.sc_status & SC_STATUS_FREED) { + list_del_init(&dp->dl_recall_lru); + goto out; } + list_add(&dp->dl_recall_lru, &clp->cl_revoked); + dp->dl_stid.sc_status |= SC_STATUS_FREEABLE; +out: + spin_unlock(&clp->cl_lock); destroy_unhashed_deleg(dp); } @@ -1780,6 +1806,7 @@ void nfsd4_revoke_states(struct net *net, struct super_block *sb) mutex_unlock(&stp->st_mutex); break; case SC_TYPE_DELEG: + refcount_inc(&stid->sc_count); dp = delegstateid(stid); spin_lock(&state_lock); if (!unhash_delegation_locked( @@ -6545,6 +6572,7 @@ nfs4_laundromat(struct nfsd_net *nn) dp = list_entry (pos, struct nfs4_delegation, dl_recall_lru); if (!state_expired(&lt, dp->dl_time)) break; + refcount_inc(&dp->dl_stid.sc_count); unhash_delegation_locked(dp, SC_STATUS_REVOKED); list_add(&dp->dl_recall_lru, &reaplist); } @@ -7157,7 +7185,9 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, s->sc_status |= SC_STATUS_CLOSED; spin_unlock(&s->sc_lock); dp = delegstateid(s); - list_del_init(&dp->dl_recall_lru); + if (s->sc_status & SC_STATUS_FREEABLE) + list_del_init(&dp->dl_recall_lru); + s->sc_status |= SC_STATUS_FREED; spin_unlock(&cl->cl_lock); nfs4_put_stid(s); ret = nfs_ok; @@ -7487,7 +7517,9 @@ nfsd4_delegreturn(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if ((status = fh_verify(rqstp, &cstate->current_fh, S_IFREG, 0))) return status; - status = nfsd4_lookup_stateid(cstate, stateid, SC_TYPE_DELEG, 0, &s, nn); + status = nfsd4_lookup_stateid(cstate, stateid, SC_TYPE_DELEG, + SC_STATUS_REVOKED | SC_STATUS_FREEABLE, + &s, nn); if (status) goto out; dp = delegstateid(s); diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index 79c743c01a47..35b3564c065f 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -114,6 +114,8 @@ struct nfs4_stid { /* For a deleg stateid kept around only to process free_stateid's: */ #define SC_STATUS_REVOKED BIT(1) #define SC_STATUS_ADMIN_REVOKED BIT(2) +#define SC_STATUS_FREEABLE BIT(3) +#define SC_STATUS_FREED BIT(4) unsigned short sc_status; struct list_head sc_cp_list;

11 months

1
0
0 0

FAILED: patch "[PATCH] drm/amdgpu: fix random data corruption for sdma 7" failed to apply to 6.11-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.11-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y git checkout FETCH_HEAD git cherry-pick -x 108bc59fe817686a59d2008f217bad38a5cf4427 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024102849-flannels-ashy-e1bf@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 108bc59fe817686a59d2008f217bad38a5cf4427 Mon Sep 17 00:00:00 2001 From: Frank Min <Frank.Min(a)amd.com> Date: Thu, 10 Oct 2024 16:41:32 +0800 Subject: [PATCH] drm/amdgpu: fix random data corruption for sdma 7 There is random data corruption caused by const fill, this is caused by write compression mode not correctly configured. So correct compression mode for const fill. Signed-off-by: Frank Min <Frank.Min(a)amd.com> Reviewed-by: Alex Deucher <alexander.deucher(a)amd.com> Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com> (cherry picked from commit 75400f8d6e36afc88d59db8a1f3e4b7d90d836ad) Cc: stable(a)vger.kernel.org # 6.11.x diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c index a8763496aed3..9288f37a3cc5 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c @@ -51,6 +51,12 @@ MODULE_FIRMWARE("amdgpu/sdma_7_0_1.bin"); #define SDMA0_HYP_DEC_REG_END 0x589a #define SDMA1_HYP_DEC_REG_OFFSET 0x20 +/*define for compression field for sdma7*/ +#define SDMA_PKT_CONSTANT_FILL_HEADER_compress_offset 0 +#define SDMA_PKT_CONSTANT_FILL_HEADER_compress_mask 0x00000001 +#define SDMA_PKT_CONSTANT_FILL_HEADER_compress_shift 16 +#define SDMA_PKT_CONSTANT_FILL_HEADER_COMPRESS(x) (((x) & SDMA_PKT_CONSTANT_FILL_HEADER_compress_mask) << SDMA_PKT_CONSTANT_FILL_HEADER_compress_shift) + static const struct amdgpu_hwip_reg_entry sdma_reg_list_7_0[] = { SOC15_REG_ENTRY_STR(GC, 0, regSDMA0_STATUS_REG), SOC15_REG_ENTRY_STR(GC, 0, regSDMA0_STATUS1_REG), @@ -1724,7 +1730,8 @@ static void sdma_v7_0_emit_fill_buffer(struct amdgpu_ib *ib, uint64_t dst_offset, uint32_t byte_count) { - ib->ptr[ib->length_dw++] = SDMA_PKT_COPY_LINEAR_HEADER_OP(SDMA_OP_CONST_FILL); + ib->ptr[ib->length_dw++] = SDMA_PKT_CONSTANT_FILL_HEADER_OP(SDMA_OP_CONST_FILL) | + SDMA_PKT_CONSTANT_FILL_HEADER_COMPRESS(1); ib->ptr[ib->length_dw++] = lower_32_bits(dst_offset); ib->ptr[ib->length_dw++] = upper_32_bits(dst_offset); ib->ptr[ib->length_dw++] = src_data;

11 months

1
0
0 0

FAILED: patch "[PATCH] KVM: nSVM: Ignore nCR3[4:0] when loading PDPTEs from memory" failed to apply to 4.19-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y git checkout FETCH_HEAD git cherry-pick -x f559b2e9c5c5308850544ab59396b7d53cfc67bd # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024102849-giggle-five-0241@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^.. Possible dependencies: f559b2e9c5c5 ("KVM: nSVM: Ignore nCR3[4:0] when loading PDPTEs from memory") 2732be902353 ("KVM: nSVM: Don't strip host's C-bit from guest's CR3 when reading PDPTRs") 883b0a91f41a ("KVM: SVM: Move Nested SVM Implementation to nested.c") 46a010dd6896 ("kVM SVM: Move SVM related files to own sub-directory") d55c9d4009c7 ("KVM: nSVM: check for EFER.SVME=1 before entering guest") 0b66465344a7 ("KVM: nSVM: Remove an obsolete comment.") 78f2145c4d93 ("KVM: nSVM: avoid loss of pending IRQ/NMI before entering L2") b518ba9fa691 ("KVM: nSVM: implement check_nested_events for interrupts") 64b5bd270426 ("KVM: nSVM: ignore L1 interrupt window while running L2 with V_INTR_MASKING=1") b5ec2e020b70 ("KVM: nSVM: do not change host intercepts while nested VM is running") 689f3bf21628 ("KVM: x86: unify callbacks to load paging root") 257038745cae ("KVM: x86: Move nSVM CPUID 0x8000000A handling into common x86 code") a50718cc3f43 ("KVM: nSVM: Expose SVM features to L1 iff nested is enabled") 703c335d0693 ("KVM: x86/mmu: Configure max page level during hardware setup") bde772355958 ("KVM: x86/mmu: Merge kvm_{enable,disable}_tdp() into a common function") 213e0e1f500b ("KVM: SVM: Refactor logging of NPT enabled/disabled") a1bead2abaa1 ("KVM: VMX: Directly query Intel PT mode when refreshing PMUs") 139085101f85 ("KVM: x86: Use KVM cpu caps to detect MSR_TSC_AUX virt support") 93c380e7b528 ("KVM: x86: Set emulated/transmuted feature bits via kvm_cpu_caps") bd7919999047 ("KVM: x86: Override host CPUID results with kvm_cpu_caps") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From f559b2e9c5c5308850544ab59396b7d53cfc67bd Mon Sep 17 00:00:00 2001 From: Sean Christopherson <seanjc(a)google.com> Date: Wed, 9 Oct 2024 07:08:38 -0700 Subject: [PATCH] KVM: nSVM: Ignore nCR3[4:0] when loading PDPTEs from memory Ignore nCR3[4:0] when loading PDPTEs from memory for nested SVM, as bits 4:0 of CR3 are ignored when PAE paging is used, and thus VMRUN doesn't enforce 32-byte alignment of nCR3. In the absolute worst case scenario, failure to ignore bits 4:0 can result in an out-of-bounds read, e.g. if the target page is at the end of a memslot, and the VMM isn't using guard pages. Per the APM: The CR3 register points to the base address of the page-directory-pointer table. The page-directory-pointer table is aligned on a 32-byte boundary, with the low 5 address bits 4:0 assumed to be 0. And the SDM's much more explicit: 4:0 Ignored Note, KVM gets this right when loading PDPTRs, it's only the nSVM flow that is broken. Fixes: e4e517b4be01 ("KVM: MMU: Do not unconditionally read PDPTE from guest memory") Reported-by: Kirk Swidowski <swidowski(a)google.com> Cc: Andy Nguyen <theflow(a)google.com> Cc: 3pvd <3pvd(a)google.com> Cc: stable(a)vger.kernel.org Signed-off-by: Sean Christopherson <seanjc(a)google.com> Message-ID: <20241009140838.1036226-1-seanjc(a)google.com> Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c index d5314cb7dff4..cf84103ce38b 100644 --- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -63,8 +63,12 @@ static u64 nested_svm_get_tdp_pdptr(struct kvm_vcpu *vcpu, int index) u64 pdpte; int ret; + /* + * Note, nCR3 is "assumed" to be 32-byte aligned, i.e. the CPU ignores + * nCR3[4:0] when loading PDPTEs from memory. + */ ret = kvm_vcpu_read_guest_page(vcpu, gpa_to_gfn(cr3), &pdpte, - offset_in_page(cr3) + index * 8, 8); + (cr3 & GENMASK(11, 5)) + index * 8, 8); if (ret) return 0; return pdpte;

11 months

1
0
0 0

FAILED: patch "[PATCH] KVM: nSVM: Ignore nCR3[4:0] when loading PDPTEs from memory" failed to apply to 5.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y git checkout FETCH_HEAD git cherry-pick -x f559b2e9c5c5308850544ab59396b7d53cfc67bd # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024102847-level-stoic-fc77@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^.. Possible dependencies: f559b2e9c5c5 ("KVM: nSVM: Ignore nCR3[4:0] when loading PDPTEs from memory") 2732be902353 ("KVM: nSVM: Don't strip host's C-bit from guest's CR3 when reading PDPTRs") 883b0a91f41a ("KVM: SVM: Move Nested SVM Implementation to nested.c") 46a010dd6896 ("kVM SVM: Move SVM related files to own sub-directory") d55c9d4009c7 ("KVM: nSVM: check for EFER.SVME=1 before entering guest") 0b66465344a7 ("KVM: nSVM: Remove an obsolete comment.") 78f2145c4d93 ("KVM: nSVM: avoid loss of pending IRQ/NMI before entering L2") b518ba9fa691 ("KVM: nSVM: implement check_nested_events for interrupts") 64b5bd270426 ("KVM: nSVM: ignore L1 interrupt window while running L2 with V_INTR_MASKING=1") b5ec2e020b70 ("KVM: nSVM: do not change host intercepts while nested VM is running") 689f3bf21628 ("KVM: x86: unify callbacks to load paging root") 257038745cae ("KVM: x86: Move nSVM CPUID 0x8000000A handling into common x86 code") a50718cc3f43 ("KVM: nSVM: Expose SVM features to L1 iff nested is enabled") 703c335d0693 ("KVM: x86/mmu: Configure max page level during hardware setup") bde772355958 ("KVM: x86/mmu: Merge kvm_{enable,disable}_tdp() into a common function") 213e0e1f500b ("KVM: SVM: Refactor logging of NPT enabled/disabled") a1bead2abaa1 ("KVM: VMX: Directly query Intel PT mode when refreshing PMUs") 139085101f85 ("KVM: x86: Use KVM cpu caps to detect MSR_TSC_AUX virt support") 93c380e7b528 ("KVM: x86: Set emulated/transmuted feature bits via kvm_cpu_caps") bd7919999047 ("KVM: x86: Override host CPUID results with kvm_cpu_caps") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From f559b2e9c5c5308850544ab59396b7d53cfc67bd Mon Sep 17 00:00:00 2001 From: Sean Christopherson <seanjc(a)google.com> Date: Wed, 9 Oct 2024 07:08:38 -0700 Subject: [PATCH] KVM: nSVM: Ignore nCR3[4:0] when loading PDPTEs from memory Ignore nCR3[4:0] when loading PDPTEs from memory for nested SVM, as bits 4:0 of CR3 are ignored when PAE paging is used, and thus VMRUN doesn't enforce 32-byte alignment of nCR3. In the absolute worst case scenario, failure to ignore bits 4:0 can result in an out-of-bounds read, e.g. if the target page is at the end of a memslot, and the VMM isn't using guard pages. Per the APM: The CR3 register points to the base address of the page-directory-pointer table. The page-directory-pointer table is aligned on a 32-byte boundary, with the low 5 address bits 4:0 assumed to be 0. And the SDM's much more explicit: 4:0 Ignored Note, KVM gets this right when loading PDPTRs, it's only the nSVM flow that is broken. Fixes: e4e517b4be01 ("KVM: MMU: Do not unconditionally read PDPTE from guest memory") Reported-by: Kirk Swidowski <swidowski(a)google.com> Cc: Andy Nguyen <theflow(a)google.com> Cc: 3pvd <3pvd(a)google.com> Cc: stable(a)vger.kernel.org Signed-off-by: Sean Christopherson <seanjc(a)google.com> Message-ID: <20241009140838.1036226-1-seanjc(a)google.com> Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c index d5314cb7dff4..cf84103ce38b 100644 --- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -63,8 +63,12 @@ static u64 nested_svm_get_tdp_pdptr(struct kvm_vcpu *vcpu, int index) u64 pdpte; int ret; + /* + * Note, nCR3 is "assumed" to be 32-byte aligned, i.e. the CPU ignores + * nCR3[4:0] when loading PDPTEs from memory. + */ ret = kvm_vcpu_read_guest_page(vcpu, gpa_to_gfn(cr3), &pdpte, - offset_in_page(cr3) + index * 8, 8); + (cr3 & GENMASK(11, 5)) + index * 8, 8); if (ret) return 0; return pdpte;

11 months

1
0
0 0

FAILED: patch "[PATCH] ACPI: PRM: Find EFI_MEMORY_RUNTIME block for PRM handler and" failed to apply to 5.15-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y git checkout FETCH_HEAD git cherry-pick -x 088984c8d54c0053fc4ae606981291d741c5924b # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024102850-postal-pogo-94ed@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 088984c8d54c0053fc4ae606981291d741c5924b Mon Sep 17 00:00:00 2001 From: Koba Ko <kobak(a)nvidia.com> Date: Sun, 13 Oct 2024 04:50:10 +0800 Subject: [PATCH] ACPI: PRM: Find EFI_MEMORY_RUNTIME block for PRM handler and context PRMT needs to find the correct type of block to translate the PA-VA mapping for EFI runtime services. The issue arises because the PRMT is finding a block of type EFI_CONVENTIONAL_MEMORY, which is not appropriate for runtime services as described in Section 2.2.2 (Runtime Services) of the UEFI Specification [1]. Since the PRM handler is a type of runtime service, this causes an exception when the PRM handler is called. [Firmware Bug]: Unable to handle paging request in EFI runtime service WARNING: CPU: 22 PID: 4330 at drivers/firmware/efi/runtime-wrappers.c:341 __efi_queue_work+0x11c/0x170 Call trace: Let PRMT find a block with EFI_MEMORY_RUNTIME for PRM handler and PRM context. If no suitable block is found, a warning message will be printed, but the procedure continues to manage the next PRM handler. However, if the PRM handler is actually called without proper allocation, it would result in a failure during error handling. By using the correct memory types for runtime services, ensure that the PRM handler and the context are properly mapped in the virtual address space during runtime, preventing the paging request error. The issue is really that only memory that has been remapped for runtime by the firmware can be used by the PRM handler, and so the region needs to have the EFI_MEMORY_RUNTIME attribute. Link: https://uefi.org/sites/default/files/resources/UEFI_Spec_2_10_Aug29.pdf # [1] Fixes: cefc7ca46235 ("ACPI: PRM: implement OperationRegion handler for the PlatformRtMechanism subtype") Cc: All applicable <stable(a)vger.kernel.org> Signed-off-by: Koba Ko <kobak(a)nvidia.com> Reviewed-by: Matthew R. Ochs <mochs(a)nvidia.com> Reviewed-by: Zhang Rui <rui.zhang(a)intel.com> Reviewed-by: Ard Biesheuvel <ardb(a)kernel.org> Link: https://patch.msgid.link/20241012205010.4165798-1-kobak@nvidia.com [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> diff --git a/drivers/acpi/prmt.c b/drivers/acpi/prmt.c index 1cfaa5957ac4..d59307a76ca3 100644 --- a/drivers/acpi/prmt.c +++ b/drivers/acpi/prmt.c @@ -72,17 +72,21 @@ struct prm_module_info { struct prm_handler_info handlers[] __counted_by(handler_count); }; -static u64 efi_pa_va_lookup(u64 pa) +static u64 efi_pa_va_lookup(efi_guid_t *guid, u64 pa) { efi_memory_desc_t *md; u64 pa_offset = pa & ~PAGE_MASK; u64 page = pa & PAGE_MASK; for_each_efi_memory_desc(md) { - if (md->phys_addr < pa && pa < md->phys_addr + PAGE_SIZE * md->num_pages) + if ((md->attribute & EFI_MEMORY_RUNTIME) && + (md->phys_addr < pa && pa < md->phys_addr + PAGE_SIZE * md->num_pages)) { return pa_offset + md->virt_addr + page - md->phys_addr; + } } + pr_warn("Failed to find VA for GUID: %pUL, PA: 0x%llx", guid, pa); + return 0; } @@ -148,9 +152,15 @@ acpi_parse_prmt(union acpi_subtable_headers *header, const unsigned long end) th = &tm->handlers[cur_handler]; guid_copy(&th->guid, (guid_t *)handler_info->handler_guid); - th->handler_addr = (void *)efi_pa_va_lookup(handler_info->handler_address); - th->static_data_buffer_addr = efi_pa_va_lookup(handler_info->static_data_buffer_address); - th->acpi_param_buffer_addr = efi_pa_va_lookup(handler_info->acpi_param_buffer_address); + th->handler_addr = + (void *)efi_pa_va_lookup(&th->guid, handler_info->handler_address); + + th->static_data_buffer_addr = + efi_pa_va_lookup(&th->guid, handler_info->static_data_buffer_address); + + th->acpi_param_buffer_addr = + efi_pa_va_lookup(&th->guid, handler_info->acpi_param_buffer_address); + } while (++cur_handler < tm->handler_count && (handler_info = get_next_handler(handler_info))); return 0; @@ -277,6 +287,13 @@ static acpi_status acpi_platformrt_space_handler(u32 function, if (!handler || !module) goto invalid_guid; + if (!handler->handler_addr || + !handler->static_data_buffer_addr || + !handler->acpi_param_buffer_addr) { + buffer->prm_status = PRM_HANDLER_ERROR; + return AE_OK; + } + ACPI_COPY_NAMESEG(context.signature, "PRMC"); context.revision = 0x0; context.reserved = 0x0;

11 months

1
0
0 0

FAILED: patch "[PATCH] btrfs: fix the delalloc range locking if sector size < page" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x f10f59f91a6278e9637327d1206140d28e2d5004 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024102859-around-strangely-5c99@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From f10f59f91a6278e9637327d1206140d28e2d5004 Mon Sep 17 00:00:00 2001 From: Qu Wenruo <wqu(a)suse.com> Date: Wed, 9 Oct 2024 09:37:03 +1030 Subject: [PATCH] btrfs: fix the delalloc range locking if sector size < page size Inside lock_delalloc_folios(), there are several problems related to sector size < page size handling: - Set the writer locks without checking if the folio is still valid We call btrfs_folio_start_writer_lock() just like it's folio_lock(). But since the folio may not even be the folio of the current mapping, we can easily screw up the folio->private. - The range is not clamped inside the page This means we can over write other bitmaps if the start/len is not properly handled, and trigger the btrfs_subpage_assert(). - @processed_end is always rounded up to page end If the delalloc range is not page aligned, and we need to retry (returning -EAGAIN), then we will unlock to the page end. Thankfully this is not a huge problem, as now btrfs_folio_end_writer_lock() can handle range larger than the locked range, and only unlock what is already locked. Fix all these problems by: - Lock and check the folio first, then call btrfs_folio_set_writer_lock() So that if we got a folio not belonging to the inode, we won't touch folio->private. - Properly truncate the range inside the page - Update @processed_end to the locked range end Fixes: 1e1de38792e0 ("btrfs: make process_one_page() to handle subpage locking") CC: stable(a)vger.kernel.org # 6.1+ Signed-off-by: Qu Wenruo <wqu(a)suse.com> Reviewed-by: David Sterba <dsterba(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 309a8ae48434..872cca54cc6c 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -262,22 +262,23 @@ static noinline int lock_delalloc_folios(struct inode *inode, for (i = 0; i < found_folios; i++) { struct folio *folio = fbatch.folios[i]; - u32 len = end + 1 - start; + u64 range_start; + u32 range_len; if (folio == locked_folio) continue; - if (btrfs_folio_start_writer_lock(fs_info, folio, start, - len)) - goto out; - + folio_lock(folio); if (!folio_test_dirty(folio) || folio->mapping != mapping) { - btrfs_folio_end_writer_lock(fs_info, folio, start, - len); + folio_unlock(folio); goto out; } + range_start = max_t(u64, folio_pos(folio), start); + range_len = min_t(u64, folio_pos(folio) + folio_size(folio), + end + 1) - range_start; + btrfs_folio_set_writer_lock(fs_info, folio, range_start, range_len); - processed_end = folio_pos(folio) + folio_size(folio) - 1; + processed_end = range_start + range_len - 1; } folio_batch_release(&fbatch); cond_resched();

11 months

1
0
0 0

FAILED: patch "[PATCH] btrfs: fix the delalloc range locking if sector size < page" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x f10f59f91a6278e9637327d1206140d28e2d5004 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024102857-unvaried-grip-2f5d@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From f10f59f91a6278e9637327d1206140d28e2d5004 Mon Sep 17 00:00:00 2001 From: Qu Wenruo <wqu(a)suse.com> Date: Wed, 9 Oct 2024 09:37:03 +1030 Subject: [PATCH] btrfs: fix the delalloc range locking if sector size < page size Inside lock_delalloc_folios(), there are several problems related to sector size < page size handling: - Set the writer locks without checking if the folio is still valid We call btrfs_folio_start_writer_lock() just like it's folio_lock(). But since the folio may not even be the folio of the current mapping, we can easily screw up the folio->private. - The range is not clamped inside the page This means we can over write other bitmaps if the start/len is not properly handled, and trigger the btrfs_subpage_assert(). - @processed_end is always rounded up to page end If the delalloc range is not page aligned, and we need to retry (returning -EAGAIN), then we will unlock to the page end. Thankfully this is not a huge problem, as now btrfs_folio_end_writer_lock() can handle range larger than the locked range, and only unlock what is already locked. Fix all these problems by: - Lock and check the folio first, then call btrfs_folio_set_writer_lock() So that if we got a folio not belonging to the inode, we won't touch folio->private. - Properly truncate the range inside the page - Update @processed_end to the locked range end Fixes: 1e1de38792e0 ("btrfs: make process_one_page() to handle subpage locking") CC: stable(a)vger.kernel.org # 6.1+ Signed-off-by: Qu Wenruo <wqu(a)suse.com> Reviewed-by: David Sterba <dsterba(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 309a8ae48434..872cca54cc6c 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -262,22 +262,23 @@ static noinline int lock_delalloc_folios(struct inode *inode, for (i = 0; i < found_folios; i++) { struct folio *folio = fbatch.folios[i]; - u32 len = end + 1 - start; + u64 range_start; + u32 range_len; if (folio == locked_folio) continue; - if (btrfs_folio_start_writer_lock(fs_info, folio, start, - len)) - goto out; - + folio_lock(folio); if (!folio_test_dirty(folio) || folio->mapping != mapping) { - btrfs_folio_end_writer_lock(fs_info, folio, start, - len); + folio_unlock(folio); goto out; } + range_start = max_t(u64, folio_pos(folio), start); + range_len = min_t(u64, folio_pos(folio) + folio_size(folio), + end + 1) - range_start; + btrfs_folio_set_writer_lock(fs_info, folio, range_start, range_len); - processed_end = folio_pos(folio) + folio_size(folio) - 1; + processed_end = range_start + range_len - 1; } folio_batch_release(&fbatch); cond_resched();

11 months

1
0
0 0

FAILED: patch "[PATCH] btrfs: fix the delalloc range locking if sector size < page" failed to apply to 6.11-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.11-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y git checkout FETCH_HEAD git cherry-pick -x f10f59f91a6278e9637327d1206140d28e2d5004 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024102855-swab-anytime-3da1@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From f10f59f91a6278e9637327d1206140d28e2d5004 Mon Sep 17 00:00:00 2001 From: Qu Wenruo <wqu(a)suse.com> Date: Wed, 9 Oct 2024 09:37:03 +1030 Subject: [PATCH] btrfs: fix the delalloc range locking if sector size < page size Inside lock_delalloc_folios(), there are several problems related to sector size < page size handling: - Set the writer locks without checking if the folio is still valid We call btrfs_folio_start_writer_lock() just like it's folio_lock(). But since the folio may not even be the folio of the current mapping, we can easily screw up the folio->private. - The range is not clamped inside the page This means we can over write other bitmaps if the start/len is not properly handled, and trigger the btrfs_subpage_assert(). - @processed_end is always rounded up to page end If the delalloc range is not page aligned, and we need to retry (returning -EAGAIN), then we will unlock to the page end. Thankfully this is not a huge problem, as now btrfs_folio_end_writer_lock() can handle range larger than the locked range, and only unlock what is already locked. Fix all these problems by: - Lock and check the folio first, then call btrfs_folio_set_writer_lock() So that if we got a folio not belonging to the inode, we won't touch folio->private. - Properly truncate the range inside the page - Update @processed_end to the locked range end Fixes: 1e1de38792e0 ("btrfs: make process_one_page() to handle subpage locking") CC: stable(a)vger.kernel.org # 6.1+ Signed-off-by: Qu Wenruo <wqu(a)suse.com> Reviewed-by: David Sterba <dsterba(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 309a8ae48434..872cca54cc6c 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -262,22 +262,23 @@ static noinline int lock_delalloc_folios(struct inode *inode, for (i = 0; i < found_folios; i++) { struct folio *folio = fbatch.folios[i]; - u32 len = end + 1 - start; + u64 range_start; + u32 range_len; if (folio == locked_folio) continue; - if (btrfs_folio_start_writer_lock(fs_info, folio, start, - len)) - goto out; - + folio_lock(folio); if (!folio_test_dirty(folio) || folio->mapping != mapping) { - btrfs_folio_end_writer_lock(fs_info, folio, start, - len); + folio_unlock(folio); goto out; } + range_start = max_t(u64, folio_pos(folio), start); + range_len = min_t(u64, folio_pos(folio) + folio_size(folio), + end + 1) - range_start; + btrfs_folio_set_writer_lock(fs_info, folio, range_start, range_len); - processed_end = folio_pos(folio) + folio_size(folio) - 1; + processed_end = range_start + range_len - 1; } folio_batch_release(&fbatch); cond_resched();

11 months

1
0
0 0

FAILED: patch "[PATCH] btrfs: qgroup: set a more sane default value for subtree drop" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x 5f9062a48db260fd6b53d86ecfb4d5dc59266316 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024102858-yield-gestate-e6bc@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 5f9062a48db260fd6b53d86ecfb4d5dc59266316 Mon Sep 17 00:00:00 2001 From: Qu Wenruo <wqu(a)suse.com> Date: Tue, 10 Sep 2024 15:21:04 +0930 Subject: [PATCH] btrfs: qgroup: set a more sane default value for subtree drop threshold Since commit 011b46c30476 ("btrfs: skip subtree scan if it's too high to avoid low stall in btrfs_commit_transaction()"), btrfs qgroup can automatically skip large subtree scan at the cost of marking qgroup inconsistent. It's designed to address the final performance problem of snapshot drop with qgroup enabled, but to be safe the default value is BTRFS_MAX_LEVEL, requiring a user space daemon to set a different value to make it work. I'd say it's not a good idea to rely on user space tool to set this default value, especially when some operations (snapshot dropping) can be triggered immediately after mount, leaving a very small window to that that sysfs interface. So instead of disabling this new feature by default, enable it with a low threshold (3), so that large subvolume tree drop at mount time won't cause huge qgroup workload. CC: stable(a)vger.kernel.org # 6.1 Signed-off-by: Qu Wenruo <wqu(a)suse.com> Reviewed-by: David Sterba <dsterba(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 1238a38c59b2..5afb68c0304b 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1959,7 +1959,7 @@ static void btrfs_init_qgroup(struct btrfs_fs_info *fs_info) fs_info->qgroup_seq = 1; fs_info->qgroup_ulist = NULL; fs_info->qgroup_rescan_running = false; - fs_info->qgroup_drop_subtree_thres = BTRFS_MAX_LEVEL; + fs_info->qgroup_drop_subtree_thres = BTRFS_QGROUP_DROP_SUBTREE_THRES_DEFAULT; mutex_init(&fs_info->qgroup_rescan_lock); } diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 1332ec59c539..a0e8deca87a7 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -1407,7 +1407,7 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info) fs_info->quota_root = NULL; fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_ON; fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_SIMPLE_MODE; - fs_info->qgroup_drop_subtree_thres = BTRFS_MAX_LEVEL; + fs_info->qgroup_drop_subtree_thres = BTRFS_QGROUP_DROP_SUBTREE_THRES_DEFAULT; spin_unlock(&fs_info->qgroup_lock); btrfs_free_qgroup_config(fs_info); diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h index 98adf4ec7b01..c229256d6fd5 100644 --- a/fs/btrfs/qgroup.h +++ b/fs/btrfs/qgroup.h @@ -121,6 +121,8 @@ struct btrfs_inode; #define BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN (1ULL << 63) #define BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING (1ULL << 62) +#define BTRFS_QGROUP_DROP_SUBTREE_THRES_DEFAULT (3) + /* * Record a dirty extent, and info qgroup to update quota on it */

11 months

1
0
0 0

FAILED: patch "[PATCH] btrfs: qgroup: set a more sane default value for subtree drop" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x 5f9062a48db260fd6b53d86ecfb4d5dc59266316 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024102856-chance-seventh-bedd@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 5f9062a48db260fd6b53d86ecfb4d5dc59266316 Mon Sep 17 00:00:00 2001 From: Qu Wenruo <wqu(a)suse.com> Date: Tue, 10 Sep 2024 15:21:04 +0930 Subject: [PATCH] btrfs: qgroup: set a more sane default value for subtree drop threshold Since commit 011b46c30476 ("btrfs: skip subtree scan if it's too high to avoid low stall in btrfs_commit_transaction()"), btrfs qgroup can automatically skip large subtree scan at the cost of marking qgroup inconsistent. It's designed to address the final performance problem of snapshot drop with qgroup enabled, but to be safe the default value is BTRFS_MAX_LEVEL, requiring a user space daemon to set a different value to make it work. I'd say it's not a good idea to rely on user space tool to set this default value, especially when some operations (snapshot dropping) can be triggered immediately after mount, leaving a very small window to that that sysfs interface. So instead of disabling this new feature by default, enable it with a low threshold (3), so that large subvolume tree drop at mount time won't cause huge qgroup workload. CC: stable(a)vger.kernel.org # 6.1 Signed-off-by: Qu Wenruo <wqu(a)suse.com> Reviewed-by: David Sterba <dsterba(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 1238a38c59b2..5afb68c0304b 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1959,7 +1959,7 @@ static void btrfs_init_qgroup(struct btrfs_fs_info *fs_info) fs_info->qgroup_seq = 1; fs_info->qgroup_ulist = NULL; fs_info->qgroup_rescan_running = false; - fs_info->qgroup_drop_subtree_thres = BTRFS_MAX_LEVEL; + fs_info->qgroup_drop_subtree_thres = BTRFS_QGROUP_DROP_SUBTREE_THRES_DEFAULT; mutex_init(&fs_info->qgroup_rescan_lock); } diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 1332ec59c539..a0e8deca87a7 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -1407,7 +1407,7 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info) fs_info->quota_root = NULL; fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_ON; fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_SIMPLE_MODE; - fs_info->qgroup_drop_subtree_thres = BTRFS_MAX_LEVEL; + fs_info->qgroup_drop_subtree_thres = BTRFS_QGROUP_DROP_SUBTREE_THRES_DEFAULT; spin_unlock(&fs_info->qgroup_lock); btrfs_free_qgroup_config(fs_info); diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h index 98adf4ec7b01..c229256d6fd5 100644 --- a/fs/btrfs/qgroup.h +++ b/fs/btrfs/qgroup.h @@ -121,6 +121,8 @@ struct btrfs_inode; #define BTRFS_QGROUP_RUNTIME_FLAG_CANCEL_RESCAN (1ULL << 63) #define BTRFS_QGROUP_RUNTIME_FLAG_NO_ACCOUNTING (1ULL << 62) +#define BTRFS_QGROUP_DROP_SUBTREE_THRES_DEFAULT (3) + /* * Record a dirty extent, and info qgroup to update quota on it */

11 months

1
0
0 0

[PATCH] mmc: dw_mmc: take SWIOTLB memory size limitation into account

by Aurelien Jarno

The Synopsys DesignWare mmc controller on the JH7110 SoC (dw_mmc-starfive.c driver) is using a 32-bit IDMAC address bus width, and thus requires the use of SWIOTLB. The commit 8396c793ffdf ("mmc: dw_mmc: Fix IDMAC operation with pages bigger than 4K") increased the max_seq_size, even for 4K pages, causing "swiotlb buffer is full" to happen because swiotlb can only handle a memory size up to 256kB only. Fix the issue, by making sure the dw_mmc driver doesn't use segments bigger than what SWIOTLB can handle. Reported-by: Ron Economos <re(a)w6rz.net> Reported-by: Jing Luo <jing(a)jing.rocks> Fixes: 8396c793ffdf ("mmc: dw_mmc: Fix IDMAC operation with pages bigger than 4K") Cc: stable(a)vger.kernel.org Signed-off-by: Aurelien Jarno <aurelien(a)aurel32.net> --- drivers/mmc/host/dw_mmc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c index 41e451235f637..dc0d6201f7b73 100644 --- a/drivers/mmc/host/dw_mmc.c +++ b/drivers/mmc/host/dw_mmc.c @@ -2958,7 +2958,8 @@ static int dw_mci_init_slot(struct dw_mci *host) mmc->max_segs = host->ring_size; mmc->max_blk_size = 65535; mmc->max_req_size = DW_MCI_DESC_DATA_LENGTH * host->ring_size; - mmc->max_seg_size = mmc->max_req_size; + mmc->max_seg_size = + min_t(size_t, mmc->max_req_size, dma_max_mapping_size(host->dev)); mmc->max_blk_count = mmc->max_req_size / 512; } else if (host->use_dma == TRANS_MODE_EDMAC) { mmc->max_segs = 64; -- 2.45.2

11 months

5
4
0 0

[PATCH v2 01/13] iio: chemical: bme680: Fix missing header

by Vasileios Amoiridis

Add the linux/regmap.h header since the struct regmap_config is used in this file. Cc: <Stable(a)vger.kernel.org> Fixes: 1b3bd8592780 ("iio: chemical: Add support for Bosch BME680 sensor") Signed-off-by: Vasileios Amoiridis <vassilisamir(a)gmail.com> --- drivers/iio/chemical/bme680.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/iio/chemical/bme680.h b/drivers/iio/chemical/bme680.h index b2c547ac8d34..dc9ff477da34 100644 --- a/drivers/iio/chemical/bme680.h +++ b/drivers/iio/chemical/bme680.h @@ -2,6 +2,8 @@ #ifndef BME680_H_ #define BME680_H_ +#include <linux/regmap.h> + #define BME680_REG_CHIP_ID 0xD0 #define BME680_CHIP_ID_VAL 0x61 #define BME680_REG_SOFT_RESET 0xE0 -- 2.43.0

11 months

3
2
0 0

[tip: timers/urgent] posix-cpu-timers: Clear TICK_DEP_BIT_POSIX_TIMER on clone

by tip-bot2 for Benjamin Segall

The following commit has been merged into the timers/urgent branch of tip: Commit-ID: b5413156bad91dc2995a5c4eab1b05e56914638a Gitweb: https://git.kernel.org/tip/b5413156bad91dc2995a5c4eab1b05e56914638a Author: Benjamin Segall <bsegall(a)google.com> AuthorDate: Fri, 25 Oct 2024 18:35:35 -07:00 Committer: Thomas Gleixner <tglx(a)linutronix.de> CommitterDate: Sun, 27 Oct 2024 10:36:04 +01:00 posix-cpu-timers: Clear TICK_DEP_BIT_POSIX_TIMER on clone When cloning a new thread, its posix_cputimers are not inherited, and are cleared by posix_cputimers_init(). However, this does not clear the tick dependency it creates in tsk->tick_dep_mask, and the handler does not reach the code to clear the dependency if there were no timers to begin with. Thus if a thread has a cputimer running before clone/fork, all descendants will prevent nohz_full unless they create a cputimer of their own. Fix this by entirely clearing the tick_dep_mask in copy_process(). (There is currently no inherited state that needs a tick dependency) Process-wide timers do not have this problem because fork does not copy signal_struct as a baseline, it creates one from scratch. Fixes: b78783000d5c ("posix-cpu-timers: Migrate to use new tick dependency mask model") Signed-off-by: Ben Segall <bsegall(a)google.com> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de> Reviewed-by: Frederic Weisbecker <frederic(a)kernel.org> Cc: stable(a)vger.kernel.org Link: https://lore.kernel.org/all/xm26o737bq8o.fsf@google.com --- include/linux/tick.h | 8 ++++++++ kernel/fork.c | 2 ++ 2 files changed, 10 insertions(+) diff --git a/include/linux/tick.h b/include/linux/tick.h index 7274463..99c9c5a 100644 --- a/include/linux/tick.h +++ b/include/linux/tick.h @@ -251,12 +251,19 @@ static inline void tick_dep_set_task(struct task_struct *tsk, if (tick_nohz_full_enabled()) tick_nohz_dep_set_task(tsk, bit); } + static inline void tick_dep_clear_task(struct task_struct *tsk, enum tick_dep_bits bit) { if (tick_nohz_full_enabled()) tick_nohz_dep_clear_task(tsk, bit); } + +static inline void tick_dep_init_task(struct task_struct *tsk) +{ + atomic_set(&tsk->tick_dep_mask, 0); +} + static inline void tick_dep_set_signal(struct task_struct *tsk, enum tick_dep_bits bit) { @@ -290,6 +297,7 @@ static inline void tick_dep_set_task(struct task_struct *tsk, enum tick_dep_bits bit) { } static inline void tick_dep_clear_task(struct task_struct *tsk, enum tick_dep_bits bit) { } +static inline void tick_dep_init_task(struct task_struct *tsk) { } static inline void tick_dep_set_signal(struct task_struct *tsk, enum tick_dep_bits bit) { } static inline void tick_dep_clear_signal(struct signal_struct *signal, diff --git a/kernel/fork.c b/kernel/fork.c index 89ceb4a..6fa9fe6 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -105,6 +105,7 @@ #include <linux/rseq.h> #include <uapi/linux/pidfd.h> #include <linux/pidfs.h> +#include <linux/tick.h> #include <asm/pgalloc.h> #include <linux/uaccess.h> @@ -2292,6 +2293,7 @@ __latent_entropy struct task_struct *copy_process( acct_clear_integrals(p); posix_cputimers_init(&p->posix_cputimers); + tick_dep_init_task(p); p->io_context = NULL; audit_set_context(p, NULL);

11 months

1
0
0 0

[PATCH v2] iio: invensense: fix multiple odr switch when FIFO is off

by Jean-Baptiste Maneyrol via B4 Relay

From: Jean-Baptiste Maneyrol <jean-baptiste.maneyrol(a)tdk.com> When multiple ODR switch happens during FIFO off, the change could not be taken into account if you get back to previous FIFO on value. For example, if you run sensor buffer at 50Hz, stop, change to 200Hz, then back to 50Hz and restart buffer, data will be timestamped at 200Hz. This due to testing against mult and not new_mult. To prevent this, let's just run apply_odr automatically when FIFO is off. It will also simplify driver code. Update inv_mpu6050 and inv_icm42600 to delete now useless apply_odr. Fixes: 95444b9eeb8c ("iio: invensense: fix odr switching to same value") Cc: stable(a)vger.kernel.org Signed-off-by: Jean-Baptiste Maneyrol <jean-baptiste.maneyrol(a)tdk.com> --- Changes in v2: - Delete unused anymore local variables. - Link to v1: https://lore.kernel.org/r/20241017-invn-inv-sensors-timestamp-fix-switch-fi… --- drivers/iio/common/inv_sensors/inv_sensors_timestamp.c | 4 ++++ drivers/iio/imu/inv_icm42600/inv_icm42600_accel.c | 2 -- drivers/iio/imu/inv_icm42600/inv_icm42600_gyro.c | 3 --- drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c | 1 - 4 files changed, 4 insertions(+), 6 deletions(-) diff --git a/drivers/iio/common/inv_sensors/inv_sensors_timestamp.c b/drivers/iio/common/inv_sensors/inv_sensors_timestamp.c index f44458c380d92823ce2e7e5f78ca877ea4c06118..37d0bdaa8d824f79dcd2f341be7501d249926951 100644 --- a/drivers/iio/common/inv_sensors/inv_sensors_timestamp.c +++ b/drivers/iio/common/inv_sensors/inv_sensors_timestamp.c @@ -70,6 +70,10 @@ int inv_sensors_timestamp_update_odr(struct inv_sensors_timestamp *ts, if (mult != ts->mult) ts->new_mult = mult; + /* When FIFO is off, directly apply the new ODR */ + if (!fifo) + inv_sensors_timestamp_apply_odr(ts, 0, 0, 0); + return 0; } EXPORT_SYMBOL_NS_GPL(inv_sensors_timestamp_update_odr, IIO_INV_SENSORS_TIMESTAMP); diff --git a/drivers/iio/imu/inv_icm42600/inv_icm42600_accel.c b/drivers/iio/imu/inv_icm42600/inv_icm42600_accel.c index 56ac198142500a2e1fc40b62cdd465cc736d8bf0..7968aa27f9fd798f206e72891f1c9b483811dea2 100644 --- a/drivers/iio/imu/inv_icm42600/inv_icm42600_accel.c +++ b/drivers/iio/imu/inv_icm42600/inv_icm42600_accel.c @@ -200,7 +200,6 @@ static int inv_icm42600_accel_update_scan_mode(struct iio_dev *indio_dev, { struct inv_icm42600_state *st = iio_device_get_drvdata(indio_dev); struct inv_icm42600_sensor_state *accel_st = iio_priv(indio_dev); - struct inv_sensors_timestamp *ts = &accel_st->ts; struct inv_icm42600_sensor_conf conf = INV_ICM42600_SENSOR_CONF_INIT; unsigned int fifo_en = 0; unsigned int sleep_temp = 0; @@ -229,7 +228,6 @@ static int inv_icm42600_accel_update_scan_mode(struct iio_dev *indio_dev, } /* update data FIFO write */ - inv_sensors_timestamp_apply_odr(ts, 0, 0, 0); ret = inv_icm42600_buffer_set_fifo_en(st, fifo_en | st->fifo.en); out_unlock: diff --git a/drivers/iio/imu/inv_icm42600/inv_icm42600_gyro.c b/drivers/iio/imu/inv_icm42600/inv_icm42600_gyro.c index 938af5b640b00f58d2b8185f752c4755edfb0d25..c6bb68bf5e1449d4b961ac962311cbc5aa3c0a97 100644 --- a/drivers/iio/imu/inv_icm42600/inv_icm42600_gyro.c +++ b/drivers/iio/imu/inv_icm42600/inv_icm42600_gyro.c @@ -99,8 +99,6 @@ static int inv_icm42600_gyro_update_scan_mode(struct iio_dev *indio_dev, const unsigned long *scan_mask) { struct inv_icm42600_state *st = iio_device_get_drvdata(indio_dev); - struct inv_icm42600_sensor_state *gyro_st = iio_priv(indio_dev); - struct inv_sensors_timestamp *ts = &gyro_st->ts; struct inv_icm42600_sensor_conf conf = INV_ICM42600_SENSOR_CONF_INIT; unsigned int fifo_en = 0; unsigned int sleep_gyro = 0; @@ -128,7 +126,6 @@ static int inv_icm42600_gyro_update_scan_mode(struct iio_dev *indio_dev, } /* update data FIFO write */ - inv_sensors_timestamp_apply_odr(ts, 0, 0, 0); ret = inv_icm42600_buffer_set_fifo_en(st, fifo_en | st->fifo.en); out_unlock: diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c b/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c index 3bfeabab0ec4f6fa28fbbcd47afe92af5b8a58e2..5b1088cc3704f1ad1288a0d65b2f957b91455d7f 100644 --- a/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c +++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c @@ -112,7 +112,6 @@ int inv_mpu6050_prepare_fifo(struct inv_mpu6050_state *st, bool enable) if (enable) { /* reset timestamping */ inv_sensors_timestamp_reset(&st->timestamp); - inv_sensors_timestamp_apply_odr(&st->timestamp, 0, 0, 0); /* reset FIFO */ d = st->chip_config.user_ctrl | INV_MPU6050_BIT_FIFO_RST; ret = regmap_write(st->map, st->reg->user_ctrl, d); --- base-commit: c3e9df514041ec6c46be83801b1891392f4522f7 change-id: 20241017-invn-inv-sensors-timestamp-fix-switch-fifo-off-3f29110e95d0 Best regards, -- Jean-Baptiste Maneyrol <jean-baptiste.maneyrol(a)tdk.com>

11 months

2
1
0 0

Concerns over transparency of informal kernel groups

by Jiaxun Yang

Dear Linux Community Members, Over the years, various informal groups have formed within our community, serving purposes such as maintaining connections with companies and external bodies, handling sensitive information, making challenging decisions, and, at times, representing the community as a whole. These groups contribute significantly to our community's development and deserve our recognition and appreciation. I'll name a few below that I identified from `Documentation/`: - Code of Conduct Committee <conduct(a)kernel.org> - Linux kernel security team <security(a)kernel.org> - Linux kernel hardware security team <hardware-security(a)kernel.org> - Kernel CVE assignment team <cve(a)kernel.org> - Stable Team for unpublished vulnerabilities <stable(a)kernel.org> (I suspect it's just an alias to regular stable team, but I found no evidence). Over recent events, I've taken a closer look at how our community's governance operates, only to find that there's remarkably little public information available about those informal groups. With the exception of the Linux kernel hardware security team, it seems none of these groups maintain a public list of members that I can easily find. Upon digging into the details, I’d like to raise a few concerns and offer some thoughts for further discussion: - Absence of a Membership Register Our community is built on mutual trust. Without knowing who comprises these groups, it's understandably difficult for people to have full confidence in their work. A publicly available membership list would not only foster trust but also allow us to address our recognition and appreciation. - Lack of Guidelines for Actions Many of these groups appear to operate without documented guidelines. While I trust each respectful individual's integrity, documented guidelines would enable the wider community to better understand and appreciate the roles and responsibilities involved. - Insufficient Transparency in Decision-Making I fully respect the need for confidentiality in handling security matters, yet some degree of openness around decision-making processes is essential in my opinion. Releasing communications post-embargo, for instance, could promote understanding and prevent potential abuse of confidential procedures. - No Conflict of Interest Policy Particularly in the case of the Code of Conduct Committee, there may arise situations where individuals face challenging decisions involving personal connections. A conflict of interest policy would provide valuable guidance in such circumstances. Thank you for reading. I know none of us enjoy being pulled away by these non-technical concerns, we love coding after all. However, I feel these concerns are vital for the community's continued health. It might be a candidate of Linux TAB discussion. I'm looking forward to everyone's input. Thanks - Jiaxun

11 months

5
9
0 0

[PATCH can] can: mcp251xfd: mcp251xfd_ring_alloc(): fix coalescing configuration when switching CAN modes

by Marc Kleine-Budde

Since commit 50ea5449c563 ("can: mcp251xfd: fix ring configuration when switching from CAN-CC to CAN-FD mode"), the current ring and coalescing configuration is passed to can_ram_get_layout(). That fixed the issue when switching between CAN-CC and CAN-FD mode with configured ring (rx, tx) and/or coalescing parameters (rx-frames-irq, tx-frames-irq). However 50ea5449c563 ("can: mcp251xfd: fix ring configuration when switching from CAN-CC to CAN-FD mode"), introduced a regression when switching CAN modes with disabled coalescing configuration: Even if the previous CAN mode has no coalescing configured, the new mode is configured with active coalescing. This leads to delayed receiving of CAN-FD frames. This comes from the fact, that ethtool uses usecs = 0 and max_frames = 1 to disable coalescing, however the driver uses internally priv->{rx,tx}_obj_num_coalesce_irq = 0 to indicate disabled coalescing. Fix the regression by assigning struct ethtool_coalesce ec->{rx,tx}_max_coalesced_frames_irq = 1 if coalescing is disabled in the driver as can_ram_get_layout() expects this. Reported-by: https://github.com/vdh-robothania Closes: https://github.com/raspberrypi/linux/issues/6407 Fixes: 50ea5449c563 ("can: mcp251xfd: fix ring configuration when switching from CAN-CC to CAN-FD mode") Cc: stable(a)vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de> --- drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c index e684991fa3917d4f6b6ebda8329f72971237574e..7209a831f0f2089e409c6be635f0e5dc7b2271da 100644 --- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c +++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c @@ -2,7 +2,7 @@ // // mcp251xfd - Microchip MCP251xFD Family CAN controller driver // -// Copyright (c) 2019, 2020, 2021 Pengutronix, +// Copyright (c) 2019, 2020, 2021, 2024 Pengutronix, // Marc Kleine-Budde <kernel(a)pengutronix.de> // // Based on: @@ -483,9 +483,11 @@ int mcp251xfd_ring_alloc(struct mcp251xfd_priv *priv) }; const struct ethtool_coalesce ec = { .rx_coalesce_usecs_irq = priv->rx_coalesce_usecs_irq, - .rx_max_coalesced_frames_irq = priv->rx_obj_num_coalesce_irq, + .rx_max_coalesced_frames_irq = priv->rx_obj_num_coalesce_irq == 0 ? + 1 : priv->rx_obj_num_coalesce_irq, .tx_coalesce_usecs_irq = priv->tx_coalesce_usecs_irq, - .tx_max_coalesced_frames_irq = priv->tx_obj_num_coalesce_irq, + .tx_max_coalesced_frames_irq = priv->tx_obj_num_coalesce_irq == 0 ? + 1 : priv->tx_obj_num_coalesce_irq, }; struct can_ram_layout layout; --- base-commit: 9efc44fb2dba6138b0575826319200049078679a change-id: 20241010-mcp251xfd-fix-coalesing-f373066dd42e Best regards, -- Marc Kleine-Budde <mkl(a)pengutronix.de>

11 months

2
1
0 0

[PATCHSET v5.1 3/9] xfs: metadata inode directory trees

by Darrick J. Wong

Hi all, This series delivers a new feature -- metadata inode directories. This is a separate directory tree (rooted in the superblock) that contains only inodes that contain filesystem metadata. Different metadata objects can be looked up with regular paths. Start by creating xfs_imeta{dir,file}* functions to mediate access to the metadata directory tree. By the end of this mega series, all existing metadata inodes (rt+quota) will use this directory tree instead of the superblock. Next, define the metadir on-disk format, which consists of marking inodes with a new iflag that says they're metadata. This prevents bulkstat and friends from ever getting their hands on fs metadata files. If you're going to start using this code, I strongly recommend pulling from my git trees, which are linked below. This has been running on the djcloud for months with no problems. Enjoy! Comments and questions are, as always, welcome. --D kernel git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=me… xfsprogs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h… --- Commits in this patchset: * xfs: constify the xfs_sb predicates * xfs: constify the xfs_inode predicates * xfs: rename metadata inode predicates * xfs: standardize EXPERIMENTAL warning generation * xfs: define the on-disk format for the metadir feature * xfs: iget for metadata inodes * xfs: load metadata directory root at mount time * xfs: enforce metadata inode flag * xfs: read and write metadata inode directory tree * xfs: disable the agi rotor for metadata inodes * xfs: hide metadata inodes from everyone because they are special * xfs: advertise metadata directory feature * xfs: allow bulkstat to return metadata directories * xfs: don't count metadata directory files to quota * xfs: mark quota inodes as metadata files * xfs: adjust xfs_bmap_add_attrfork for metadir * xfs: record health problems with the metadata directory * xfs: refactor directory tree root predicates * xfs: do not count metadata directory files when doing online quotacheck * xfs: don't fail repairs on metadata files with no attr fork * xfs: metadata files can have xattrs if metadir is enabled * xfs: adjust parent pointer scrubber for sb-rooted metadata files * xfs: fix di_metatype field of inodes that won't load * xfs: scrub metadata directories * xfs: check the metadata directory inumber in superblocks * xfs: move repair temporary files to the metadata directory tree * xfs: check metadata directory file path connectivity * xfs: confirm dotdot target before replacing it during a repair * xfs: repair metadata directory file path connectivity --- fs/xfs/Makefile | 5 fs/xfs/libxfs/xfs_attr.c | 5 fs/xfs/libxfs/xfs_bmap.c | 5 fs/xfs/libxfs/xfs_format.h | 121 +++++++-- fs/xfs/libxfs/xfs_fs.h | 25 ++ fs/xfs/libxfs/xfs_health.h | 6 fs/xfs/libxfs/xfs_ialloc.c | 58 +++- fs/xfs/libxfs/xfs_inode_buf.c | 90 ++++++- fs/xfs/libxfs/xfs_inode_buf.h | 3 fs/xfs/libxfs/xfs_inode_util.c | 2 fs/xfs/libxfs/xfs_log_format.h | 2 fs/xfs/libxfs/xfs_metadir.c | 481 ++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_metadir.h | 47 ++++ fs/xfs/libxfs/xfs_metafile.c | 52 ++++ fs/xfs/libxfs/xfs_metafile.h | 31 ++ fs/xfs/libxfs/xfs_ondisk.h | 2 fs/xfs/libxfs/xfs_sb.c | 12 + fs/xfs/libxfs/xfs_types.c | 4 fs/xfs/libxfs/xfs_types.h | 2 fs/xfs/scrub/agheader.c | 5 fs/xfs/scrub/common.c | 65 ++++- fs/xfs/scrub/common.h | 5 fs/xfs/scrub/dir.c | 10 + fs/xfs/scrub/dir_repair.c | 20 + fs/xfs/scrub/dirtree.c | 32 ++ fs/xfs/scrub/dirtree.h | 12 - fs/xfs/scrub/findparent.c | 28 ++ fs/xfs/scrub/health.c | 1 fs/xfs/scrub/inode.c | 35 ++- fs/xfs/scrub/inode_repair.c | 34 ++- fs/xfs/scrub/metapath.c | 521 +++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/nlinks.c | 4 fs/xfs/scrub/nlinks_repair.c | 4 fs/xfs/scrub/orphanage.c | 4 fs/xfs/scrub/parent.c | 39 ++- fs/xfs/scrub/parent_repair.c | 37 ++- fs/xfs/scrub/quotacheck.c | 7 - fs/xfs/scrub/refcount_repair.c | 2 fs/xfs/scrub/repair.c | 22 +- fs/xfs/scrub/repair.h | 3 fs/xfs/scrub/scrub.c | 12 + fs/xfs/scrub/scrub.h | 2 fs/xfs/scrub/stats.c | 1 fs/xfs/scrub/tempfile.c | 105 ++++++++ fs/xfs/scrub/tempfile.h | 3 fs/xfs/scrub/trace.c | 1 fs/xfs/scrub/trace.h | 42 +++ fs/xfs/xfs_dquot.c | 1 fs/xfs/xfs_fsops.c | 4 fs/xfs/xfs_health.c | 2 fs/xfs/xfs_icache.c | 74 ++++++ fs/xfs/xfs_inode.c | 19 + fs/xfs/xfs_inode.h | 36 ++- fs/xfs/xfs_inode_item.c | 7 - fs/xfs/xfs_inode_item_recover.c | 2 fs/xfs/xfs_ioctl.c | 7 + fs/xfs/xfs_iops.c | 15 + fs/xfs/xfs_itable.c | 33 ++ fs/xfs/xfs_itable.h | 3 fs/xfs/xfs_message.c | 47 ++++ fs/xfs/xfs_message.h | 19 + fs/xfs/xfs_mount.c | 31 ++ fs/xfs/xfs_mount.h | 11 + fs/xfs/xfs_qm.c | 36 +++ fs/xfs/xfs_quota.h | 5 fs/xfs/xfs_rtalloc.c | 38 ++- fs/xfs/xfs_super.c | 13 - fs/xfs/xfs_trace.c | 2 fs/xfs/xfs_trace.h | 102 ++++++++ fs/xfs/xfs_trans_dquot.c | 6 fs/xfs/xfs_xattr.c | 3 71 files changed, 2324 insertions(+), 201 deletions(-) create mode 100644 fs/xfs/libxfs/xfs_metadir.c create mode 100644 fs/xfs/libxfs/xfs_metadir.h create mode 100644 fs/xfs/libxfs/xfs_metafile.c create mode 100644 fs/xfs/libxfs/xfs_metafile.h create mode 100644 fs/xfs/scrub/metapath.c

11 months

3
5
0 0

[merged mm-hotfixes-stable] mm-avoid-unconditional-one-tick-sleep-when-swapcache_prepare-fails.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm: avoid unconditional one-tick sleep when swapcache_prepare fails has been removed from the -mm tree. Its filename was mm-avoid-unconditional-one-tick-sleep-when-swapcache_prepare-fails.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Barry Song <v-songbaohua(a)oppo.com> Subject: mm: avoid unconditional one-tick sleep when swapcache_prepare fails Date: Fri, 27 Sep 2024 09:19:36 +1200 Commit 13ddaf26be32 ("mm/swap: fix race when skipping swapcache") introduced an unconditional one-tick sleep when `swapcache_prepare()` fails, which has led to reports of UI stuttering on latency-sensitive Android devices. To address this, we can use a waitqueue to wake up tasks that fail `swapcache_prepare()` sooner, instead of always sleeping for a full tick. While tasks may occasionally be woken by an unrelated `do_swap_page()`, this method is preferable to two scenarios: rapid re-entry into page faults, which can cause livelocks, and multiple millisecond sleeps, which visibly degrade user experience. Oven's testing shows that a single waitqueue resolves the UI stuttering issue. If a 'thundering herd' problem becomes apparent later, a waitqueue hash similar to `folio_wait_table[PAGE_WAIT_TABLE_SIZE]` for page bit locks can be introduced. [v-songbaohua(a)oppo.com: wake_up only when swapcache_wq waitqueue is active] Link: https://lkml.kernel.org/r/20241008130807.40833-1-21cnbao@gmail.com Link: https://lkml.kernel.org/r/20240926211936.75373-1-21cnbao@gmail.com Fixes: 13ddaf26be32 ("mm/swap: fix race when skipping swapcache") Signed-off-by: Barry Song <v-songbaohua(a)oppo.com> Reported-by: Oven Liyang <liyangouwen1(a)oppo.com> Tested-by: Oven Liyang <liyangouwen1(a)oppo.com> Cc: Kairui Song <kasong(a)tencent.com> Cc: "Huang, Ying" <ying.huang(a)intel.com> Cc: Yu Zhao <yuzhao(a)google.com> Cc: David Hildenbrand <david(a)redhat.com> Cc: Chris Li <chrisl(a)kernel.org> Cc: Hugh Dickins <hughd(a)google.com> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Michal Hocko <mhocko(a)suse.com> Cc: Minchan Kim <minchan(a)kernel.org> Cc: Yosry Ahmed <yosryahmed(a)google.com> Cc: SeongJae Park <sj(a)kernel.org> Cc: Kalesh Singh <kaleshsingh(a)google.com> Cc: Suren Baghdasaryan <surenb(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/memory.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) --- a/mm/memory.c~mm-avoid-unconditional-one-tick-sleep-when-swapcache_prepare-fails +++ a/mm/memory.c @@ -4187,6 +4187,8 @@ static struct folio *alloc_swap_folio(st } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +static DECLARE_WAIT_QUEUE_HEAD(swapcache_wq); + /* * We enter with non-exclusive mmap_lock (to exclude vma changes, * but allow concurrent faults), and pte mapped but not yet locked. @@ -4199,6 +4201,7 @@ vm_fault_t do_swap_page(struct vm_fault { struct vm_area_struct *vma = vmf->vma; struct folio *swapcache, *folio = NULL; + DECLARE_WAITQUEUE(wait, current); struct page *page; struct swap_info_struct *si = NULL; rmap_t rmap_flags = RMAP_NONE; @@ -4297,7 +4300,9 @@ vm_fault_t do_swap_page(struct vm_fault * Relax a bit to prevent rapid * repeated page faults. */ + add_wait_queue(&swapcache_wq, &wait); schedule_timeout_uninterruptible(1); + remove_wait_queue(&swapcache_wq, &wait); goto out_page; } need_clear_cache = true; @@ -4604,8 +4609,11 @@ unlock: pte_unmap_unlock(vmf->pte, vmf->ptl); out: /* Clear the swap cache pin for direct swapin after PTL unlock */ - if (need_clear_cache) + if (need_clear_cache) { swapcache_clear(si, entry, nr_pages); + if (waitqueue_active(&swapcache_wq)) + wake_up(&swapcache_wq); + } if (si) put_swap_device(si); return ret; @@ -4620,8 +4628,11 @@ out_release: folio_unlock(swapcache); folio_put(swapcache); } - if (need_clear_cache) + if (need_clear_cache) { swapcache_clear(si, entry, nr_pages); + if (waitqueue_active(&swapcache_wq)) + wake_up(&swapcache_wq); + } if (si) put_swap_device(si); return ret; _ Patches currently in -mm which might be from v-songbaohua(a)oppo.com are mm-fix-pswpin-counter-for-large-folios-swap-in.patch

11 months

1
0
0 0

+ mm-mmap-limit-thp-aligment-of-anonymous-mappings-to-pmd-aligned-sizes.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled Subject: mm, mmap: limit THP aligment of anonymous mappings to PMD-aligned sizes has been added to the -mm mm-hotfixes-unstable branch. Its filename is mm-mmap-limit-thp-aligment-of-anonymous-mappings-to-pmd-aligned-sizes.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Vlastimil Babka <vbabka(a)suse.cz> Subject: mm, mmap: limit THP aligment of anonymous mappings to PMD-aligned sizes Date: Thu, 24 Oct 2024 17:12:29 +0200 Since commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries") a mmap() of anonymous memory without a specific address hint and of at least PMD_SIZE will be aligned to PMD so that it can benefit from a THP backing page. However this change has been shown to regress some workloads significantly. [1] reports regressions in various spec benchmarks, with up to 600% slowdown of the cactusBSSN benchmark on some platforms. The benchmark seems to create many mappings of 4632kB, which would have merged to a large THP-backed area before commit efa7df3e3bb5 and now they are fragmented to multiple areas each aligned to PMD boundary with gaps between. The regression then seems to be caused mainly due to the benchmark's memory access pattern suffering from TLB or cache aliasing due to the aligned boundaries of the individual areas. Another known regression bisected to commit efa7df3e3bb5 is darktable [2] [3] and early testing suggests this patch fixes the regression there as well. To fix the regression but still try to benefit from THP-friendly anonymous mapping alignment, add a condition that the size of the mapping must be a multiple of PMD size instead of at least PMD size. In case of many odd-sized mapping like the cactusBSSN creates, those will stop being aligned and with gaps between, and instead naturally merge again. Link: https://lkml.kernel.org/r/20241024151228.101841-2-vbabka@suse.cz Fixes: efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries") Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz> Reported-by: Michael Matz <matz(a)suse.de> Debugged-by: Gabriel Krisman Bertazi <gabriel(a)krisman.be> Closes: https://bugzilla.suse.com/show_bug.cgi?id=1229012 [1] Reported-by: Matthias Bodenbinder <matthias(a)bodenbinder.de> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219366 [2] Closes: https://lore.kernel.org/all/2050f0d4-57b0-481d-bab8-05e8d48fed0c@leemhuis.i… [3] Cc: Rik van Riel <riel(a)surriel.com> Cc: Yang Shi <yang(a)os.amperecomputing.com> Cc: Jann Horn <jannh(a)google.com> Cc: Liam R. Howlett <Liam.Howlett(a)Oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Cc: Petr Tesarik <ptesarik(a)suse.com> Cc: Thorsten Leemhuis <regressions(a)leemhuis.info> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/mmap.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) --- a/mm/mmap.c~mm-mmap-limit-thp-aligment-of-anonymous-mappings-to-pmd-aligned-sizes +++ a/mm/mmap.c @@ -900,7 +900,8 @@ __get_unmapped_area(struct file *file, u if (get_area) { addr = get_area(file, addr, len, pgoff, flags); - } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) { + } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) + && IS_ALIGNED(len, PMD_SIZE)) { /* Ensures that larger anonymous mappings are THP aligned. */ addr = thp_get_unmapped_area_vmflags(file, addr, len, pgoff, flags, vm_flags); _ Patches currently in -mm which might be from vbabka(a)suse.cz are mm-mmap-limit-thp-aligment-of-anonymous-mappings-to-pmd-aligned-sizes.patch

11 months

1
0
0 0

+ mm-shrinker-avoid-memleak-in-alloc_shrinker_info.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled Subject: mm: shrinker: avoid memleak in alloc_shrinker_info has been added to the -mm mm-hotfixes-unstable branch. Its filename is mm-shrinker-avoid-memleak-in-alloc_shrinker_info.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Chen Ridong <chenridong(a)huawei.com> Subject: mm: shrinker: avoid memleak in alloc_shrinker_info Date: Fri, 25 Oct 2024 06:09:42 +0000 A memleak was found as below: unreferenced object 0xffff8881010d2a80 (size 32): comm "mkdir", pid 1559, jiffies 4294932666 hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 @............... backtrace (crc 2e7ef6fa): [<ffffffff81372754>] __kmalloc_node_noprof+0x394/0x470 [<ffffffff813024ab>] alloc_shrinker_info+0x7b/0x1a0 [<ffffffff813b526a>] mem_cgroup_css_online+0x11a/0x3b0 [<ffffffff81198dd9>] online_css+0x29/0xa0 [<ffffffff811a243d>] cgroup_apply_control_enable+0x20d/0x360 [<ffffffff811a5728>] cgroup_mkdir+0x168/0x5f0 [<ffffffff8148543e>] kernfs_iop_mkdir+0x5e/0x90 [<ffffffff813dbb24>] vfs_mkdir+0x144/0x220 [<ffffffff813e1c97>] do_mkdirat+0x87/0x130 [<ffffffff813e1de9>] __x64_sys_mkdir+0x49/0x70 [<ffffffff81f8c928>] do_syscall_64+0x68/0x140 [<ffffffff8200012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e alloc_shrinker_info(), when shrinker_unit_alloc() returns an errer, the info won't be freed. Just fix it. Link: https://lkml.kernel.org/r/20241025060942.1049263-1-chenridong@huaweicloud.c… Fixes: 307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}") Signed-off-by: Chen Ridong <chenridong(a)huawei.com> Acked-by: Qi Zheng <zhengqi.arch(a)bytedance.com> Acked-by: Roman Gushchin <roman.gushchin(a)linux.dev> Acked-by: Vlastimil Babka <vbabka(a)suse.cz> Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual(a)arm.com> Cc: Dave Chinner <david(a)fromorbit.com> Cc: Muchun Song <muchun.song(a)linux.dev> Cc: Wang Weiyang <wangweiyang2(a)huawei.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/shrinker.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) --- a/mm/shrinker.c~mm-shrinker-avoid-memleak-in-alloc_shrinker_info +++ a/mm/shrinker.c @@ -76,19 +76,21 @@ void free_shrinker_info(struct mem_cgrou int alloc_shrinker_info(struct mem_cgroup *memcg) { - struct shrinker_info *info; int nid, ret = 0; int array_size = 0; mutex_lock(&shrinker_mutex); array_size = shrinker_unit_size(shrinker_nr_max); for_each_node(nid) { - info = kvzalloc_node(sizeof(*info) + array_size, GFP_KERNEL, nid); + struct shrinker_info *info = kvzalloc_node(sizeof(*info) + array_size, + GFP_KERNEL, nid); if (!info) goto err; info->map_nr_max = shrinker_nr_max; - if (shrinker_unit_alloc(info, NULL, nid)) + if (shrinker_unit_alloc(info, NULL, nid)) { + kvfree(info); goto err; + } rcu_assign_pointer(memcg->nodeinfo[nid]->shrinker_info, info); } mutex_unlock(&shrinker_mutex); _ Patches currently in -mm which might be from chenridong(a)huawei.com are mm-shrinker-avoid-memleak-in-alloc_shrinker_info.patch

11 months

1
0
0 0

[PATCH 0/2] usb: dwc3: Disable susphy during initialization

by Thinh Nguyen

We notice some platforms set "snps,dis_u3_susphy_quirk" and "snps,dis_u2_susphy_quirk" when they should not need to. Just make sure that the GUSB3PIPECTL.SUSPENDENABLE and GUSB2PHYCFG.SUSPHY are clear during initialization. The host initialization involved xhci. So the dwc3 needs to implement the xhci_plat_priv->plat_start() for xhci to re-enable the suspend bits. Since there's a prerequisite patch to drivers/usb/host/xhci-plat.h that's not a fix patch, this series should go on Greg's usb-testing branch instead of usb-linus. Thinh Nguyen (2): usb: xhci-plat: Don't include xhci.h usb: dwc3: core: Prevent phy suspend during init drivers/usb/dwc3/core.c | 90 +++++++++++++++--------------------- drivers/usb/dwc3/core.h | 1 + drivers/usb/dwc3/gadget.c | 2 + drivers/usb/dwc3/host.c | 27 +++++++++++ drivers/usb/host/xhci-plat.h | 4 +- 5 files changed, 71 insertions(+), 53 deletions(-) base-commit: 3d122e6d27e417a9fa91181922743df26b2cd679 -- 2.28.0

11 months

5
12
0 0

+ vmscanmigrate-fix-double-decrement-on-node-stats-when-demoting-pages.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled Subject: vmscan,migrate: fix double-decrement on node stats when demoting pages has been added to the -mm mm-hotfixes-unstable branch. Its filename is vmscanmigrate-fix-double-decrement-on-node-stats-when-demoting-pages.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Gregory Price <gourry(a)gourry.net> Subject: vmscan,migrate: fix double-decrement on node stats when demoting pages Date: Fri, 25 Oct 2024 10:17:24 -0400 When numa balancing is enabled with demotion, vmscan will call migrate_pages when shrinking LRUs. Successful demotions will cause node vmstat numbers to double-decrement, leading to an imbalanced page count. The result is dmesg output like such: $ cat /proc/sys/vm/stat_refresh [77383.088417] vmstat_refresh: nr_isolated_anon -103212 [77383.088417] vmstat_refresh: nr_isolated_file -899642 This negative value may impact compaction and reclaim throttling. The double-decrement occurs in the migrate_pages path: caller to shrink_folio_list decrements the count shrink_folio_list demote_folio_list migrate_pages migrate_pages_batch migrate_folio_move migrate_folio_done mod_node_page_state(-ve) <- second decrement This path happens for SUCCESSFUL migrations, not failures. Typically callers to migrate_pages are required to handle putback/accounting for failures, but this is already handled in the shrink code. When accounting for migrations, instead do not decrement the count when the migration reason is MR_DEMOTION. As of v6.11, this demotion logic is the only source of MR_DEMOTION. Link: https://lkml.kernel.org/r/20241025141724.17927-1-gourry@gourry.net Fixes: 26aa2d199d6f2 ("mm/migrate: demote pages during reclaim") Signed-off-by: Gregory Price <gourry(a)gourry.net> Cc: Dave Hansen <dave.hansen(a)linux.intel.com> Cc: Huang Ying <ying.huang(a)intel.com> Cc: Oscar Salvador <osalvador(a)suse.de> Cc: Wei Xu <weixugc(a)google.com> Cc: Yang Shi <shy828301(a)gmail.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/migrate.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/mm/migrate.c~vmscanmigrate-fix-double-decrement-on-node-stats-when-demoting-pages +++ a/mm/migrate.c @@ -1178,7 +1178,7 @@ static void migrate_folio_done(struct fo * not accounted to NR_ISOLATED_*. They can be recognized * as __folio_test_movable */ - if (likely(!__folio_test_movable(src))) + if (likely(!__folio_test_movable(src)) && reason != MR_DEMOTION) mod_node_page_state(folio_pgdat(src), NR_ISOLATED_ANON + folio_is_file_lru(src), -folio_nr_pages(src)); _ Patches currently in -mm which might be from gourry(a)gourry.net are vmscanmigrate-fix-double-decrement-on-node-stats-when-demoting-pages.patch

11 months

1
0
0 0

[PATCH v2 1/4] KVM: arm64: Don't retire aborted MMIO instruction

by Oliver Upton

Returning an abort to the guest for an unsupported MMIO access is a documented feature of the KVM UAPI. Nevertheless, it's clear that this plumbing has seen limited testing, since userspace can trivially cause a WARN in the MMIO return: WARNING: CPU: 0 PID: 30558 at arch/arm64/include/asm/kvm_emulate.h:536 kvm_handle_mmio_return+0x46c/0x5c4 arch/arm64/include/asm/kvm_emulate.h:536 Call trace: kvm_handle_mmio_return+0x46c/0x5c4 arch/arm64/include/asm/kvm_emulate.h:536 kvm_arch_vcpu_ioctl_run+0x98/0x15b4 arch/arm64/kvm/arm.c:1133 kvm_vcpu_ioctl+0x75c/0xa78 virt/kvm/kvm_main.c:4487 __do_sys_ioctl fs/ioctl.c:51 [inline] __se_sys_ioctl fs/ioctl.c:893 [inline] __arm64_sys_ioctl+0x14c/0x1c8 fs/ioctl.c:893 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline] invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49 el0_svc_common+0x1e0/0x23c arch/arm64/kernel/syscall.c:132 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151 el0_svc+0x38/0x68 arch/arm64/kernel/entry-common.c:712 el0t_64_sync_handler+0x90/0xfc arch/arm64/kernel/entry-common.c:730 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598 The splat is complaining that KVM is advancing PC while an exception is pending, i.e. that KVM is retiring the MMIO instruction despite a pending synchronous external abort. Womp womp. Fix the glaring UAPI bug by skipping over all the MMIO emulation in case there is a pending synchronous exception. Note that while userspace is capable of pending an asynchronous exception (SError, IRQ, or FIQ), it is still safe to retire the MMIO instruction in this case as (1) they are by definition asynchronous, and (2) KVM relies on hardware support for pending/delivering these exceptions instead of the software state machine for advancing PC. Cc: stable(a)vger.kernel.org Fixes: da345174ceca ("KVM: arm/arm64: Allow user injection of external data aborts") Reported-by: Alexander Potapenko <glider(a)google.com> Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev> --- arch/arm64/kvm/mmio.c | 32 ++++++++++++++++++++++++++++++-- 1 file changed, 30 insertions(+), 2 deletions(-) diff --git a/arch/arm64/kvm/mmio.c b/arch/arm64/kvm/mmio.c index cd6b7b83e2c3..ab365e839874 100644 --- a/arch/arm64/kvm/mmio.c +++ b/arch/arm64/kvm/mmio.c @@ -72,6 +72,31 @@ unsigned long kvm_mmio_read_buf(const void *buf, unsigned int len) return data; } +static bool kvm_pending_sync_exception(struct kvm_vcpu *vcpu) +{ + if (!vcpu_get_flag(vcpu, PENDING_EXCEPTION)) + return false; + + if (vcpu_el1_is_32bit(vcpu)) { + switch (vcpu_get_flag(vcpu, EXCEPT_MASK)) { + case unpack_vcpu_flag(EXCEPT_AA32_UND): + case unpack_vcpu_flag(EXCEPT_AA32_IABT): + case unpack_vcpu_flag(EXCEPT_AA32_DABT): + return true; + default: + return false; + } + } else { + switch (vcpu_get_flag(vcpu, EXCEPT_MASK)) { + case unpack_vcpu_flag(EXCEPT_AA64_EL1_SYNC): + case unpack_vcpu_flag(EXCEPT_AA64_EL2_SYNC): + return true; + default: + return false; + } + } +} + /** * kvm_handle_mmio_return -- Handle MMIO loads after user space emulation * or in-kernel IO emulation @@ -84,8 +109,11 @@ int kvm_handle_mmio_return(struct kvm_vcpu *vcpu) unsigned int len; int mask; - /* Detect an already handled MMIO return */ - if (unlikely(!vcpu->mmio_needed)) + /* + * Detect if the MMIO return was already handled or if userspace aborted + * the MMIO access. + */ + if (unlikely(!vcpu->mmio_needed || kvm_pending_sync_exception(vcpu))) return 1; vcpu->mmio_needed = 0; -- 2.47.0.163.g1226f6d8fa-goog

11 months

1
0
0 0

+ sched-numa-fix-the-potential-null-pointer-dereference-in-task_numa_work.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled Subject: sched/numa: fix the potential null pointer dereference in task_numa_work() has been added to the -mm mm-hotfixes-unstable branch. Its filename is sched-numa-fix-the-potential-null-pointer-dereference-in-task_numa_work.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Shawn Wang <shawnwang(a)linux.alibaba.com> Subject: sched/numa: fix the potential null pointer dereference in task_numa_work() Date: Fri, 25 Oct 2024 10:22:08 +0800 When running stress-ng-vm-segv test, we found a null pointer dereference error in task_numa_work(). Here is the backtrace: [323676.066985] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020 ...... [323676.067108] CPU: 35 PID: 2694524 Comm: stress-ng-vm-se ...... [323676.067113] pstate: 23401009 (nzCv daif +PAN -UAO +TCO +DIT +SSBS BTYPE=--) [323676.067115] pc : vma_migratable+0x1c/0xd0 [323676.067122] lr : task_numa_work+0x1ec/0x4e0 [323676.067127] sp : ffff8000ada73d20 [323676.067128] x29: ffff8000ada73d20 x28: 0000000000000000 x27: 000000003e89f010 [323676.067130] x26: 0000000000080000 x25: ffff800081b5c0d8 x24: ffff800081b27000 [323676.067133] x23: 0000000000010000 x22: 0000000104d18cc0 x21: ffff0009f7158000 [323676.067135] x20: 0000000000000000 x19: 0000000000000000 x18: ffff8000ada73db8 [323676.067138] x17: 0001400000000000 x16: ffff800080df40b0 x15: 0000000000000035 [323676.067140] x14: ffff8000ada73cc8 x13: 1fffe0017cc72001 x12: ffff8000ada73cc8 [323676.067142] x11: ffff80008001160c x10: ffff000be639000c x9 : ffff8000800f4ba4 [323676.067145] x8 : ffff000810375000 x7 : ffff8000ada73974 x6 : 0000000000000001 [323676.067147] x5 : 0068000b33e26707 x4 : 0000000000000001 x3 : ffff0009f7158000 [323676.067149] x2 : 0000000000000041 x1 : 0000000000004400 x0 : 0000000000000000 [323676.067152] Call trace: [323676.067153] vma_migratable+0x1c/0xd0 [323676.067155] task_numa_work+0x1ec/0x4e0 [323676.067157] task_work_run+0x78/0xd8 [323676.067161] do_notify_resume+0x1ec/0x290 [323676.067163] el0_svc+0x150/0x160 [323676.067167] el0t_64_sync_handler+0xf8/0x128 [323676.067170] el0t_64_sync+0x17c/0x180 [323676.067173] Code: d2888001 910003fd f9000bf3 aa0003f3 (f9401000) [323676.067177] SMP: stopping secondary CPUs [323676.070184] Starting crashdump kernel... stress-ng-vm-segv in stress-ng is used to stress test the SIGSEGV error handling function of the system, which tries to cause a SIGSEGV error on return from unmapping the whole address space of the child process. Normally this program will not cause kernel crashes. But before the munmap system call returns to user mode, a potential task_numa_work() for numa balancing could be added and executed. In this scenario, since the child process has no vma after munmap, the vma_next() in task_numa_work() will return a null pointer even if the vma iterator restarts from 0. Recheck the vma pointer before dereferencing it in task_numa_work(). Link: https://lkml.kernel.org/r/20241025022208.125527-1-shawnwang@linux.alibaba.c… Fixes: 214dbc428137 ("sched: convert to vma iterator") Signed-off-by: Shawn Wang <shawnwang(a)linux.alibaba.com> Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com> Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com> Cc: Ben Segall <bsegall(a)google.com> Cc: Dietmar Eggemann <dietmar.eggemann(a)arm.com> Cc: Ingo Molnar <mingo(a)redhat.com> Cc: Juri Lelli <juri.lelli(a)redhat.com> Cc: Mel Gorman <mgorman(a)suse.de> Cc: Peter Zijlstra <peterz(a)infradead.org> Cc: Steven Rostedt (Google) <rostedt(a)goodmis.org> Cc: Valentin Schneider <vschneid(a)redhat.com> Cc: Vincent Guittot <vincent.guittot(a)linaro.org> Cc: <stable(a)vger.kernel.org> [6.2+] Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- kernel/sched/fair.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/kernel/sched/fair.c~sched-numa-fix-the-potential-null-pointer-dereference-in-task_numa_work +++ a/kernel/sched/fair.c @@ -3369,7 +3369,7 @@ retry_pids: vma = vma_next(&vmi); } - do { + for (; vma; vma = vma_next(&vmi)) { if (!vma_migratable(vma) || !vma_policy_mof(vma) || is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_MIXEDMAP)) { trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_UNSUITABLE); @@ -3491,7 +3491,7 @@ retry_pids: */ if (vma_pids_forced) break; - } for_each_vma(vmi, vma); + } /* * If no VMAs are remaining and VMAs were skipped due to the PID _ Patches currently in -mm which might be from shawnwang(a)linux.alibaba.com are sched-numa-fix-the-potential-null-pointer-dereference-in-task_numa_work.patch

11 months

1
0
0 0

[PATCH] pinctrl: qcom: spmi: fix debugfs drive strength

by Johan Hovold

Commit 723e8462a4fe ("pinctrl: qcom: spmi-gpio: Fix the GPIO strength mapping") fixed a long-standing issue in the Qualcomm SPMI PMIC gpio driver which had the 'low' and 'high' drive strength settings switched but failed to update the debugfs interface which still gets this wrong. Fix the debugfs code so that the exported values match the hardware settings. Note that this probably means that most devicetrees that try to describe the firmware settings got this wrong if the settings were derived from debugfs. Before the above mentioned commit the settings would have actually matched the firmware settings even if they were described incorrectly, but now they are inverted. Fixes: 723e8462a4fe ("pinctrl: qcom: spmi-gpio: Fix the GPIO strength mapping") Fixes: eadff3024472 ("pinctrl: Qualcomm SPMI PMIC GPIO pin controller driver") Cc: Anjelique Melendez <quic_amelende(a)quicinc.com> Cc: stable(a)vger.kernel.org # 3.19 Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org> --- drivers/pinctrl/qcom/pinctrl-spmi-gpio.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/pinctrl/qcom/pinctrl-spmi-gpio.c b/drivers/pinctrl/qcom/pinctrl-spmi-gpio.c index 3d03293f6320..3a12304e2b7d 100644 --- a/drivers/pinctrl/qcom/pinctrl-spmi-gpio.c +++ b/drivers/pinctrl/qcom/pinctrl-spmi-gpio.c @@ -667,7 +667,7 @@ static void pmic_gpio_config_dbg_show(struct pinctrl_dev *pctldev, "push-pull", "open-drain", "open-source" }; static const char *const strengths[] = { - "no", "high", "medium", "low" + "no", "low", "medium", "high" }; pad = pctldev->desc->pins[pin].drv_data; -- 2.45.2

11 months

3
2
0 0

[to-be-updated] mm-page_alloc-fix-numa-stats-update-for-cpu-less-nodes.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm/page_alloc: fix NUMA stats update for cpu-less nodes has been removed from the -mm tree. Its filename was mm-page_alloc-fix-numa-stats-update-for-cpu-less-nodes.patch This patch was dropped because an updated version will be issued ------------------------------------------------------ From: Dongjoo Seo <dongjoo.linux.dev(a)gmail.com> Subject: mm/page_alloc: fix NUMA stats update for cpu-less nodes Date: Wed, 23 Oct 2024 10:50:37 -0700 In the case of memoryless node, when a process prefers a node with no memory(e.g., because it is running on a CPU local to that node), the kernel treats a nearby node with memory as the preferred node. As a result, such allocations do not increment the numa_foreign counter on the memoryless node, leading to skewed NUMA_HIT, NUMA_MISS, and NUMA_FOREIGN stats for the nearest node. This patch corrects this issue by: 1. Checking if the zone or preferred zone is CPU-less before updating the NUMA stats. 2. Ensuring NUMA_HIT is only updated if the zone is not CPU-less. 3. Ensuring NUMA_FOREIGN is only updated if the preferred zone is not CPU-less. Example Before and After Patch: - Before Patch: node0 node1 node2 numa_hit 86333181 114338269 5108 numa_miss 5199455 0 56844591 numa_foreign 32281033 29763013 0 interleave_hit 91 91 0 local_node 86326417 114288458 0 other_node 5206219 49768 56849702 - After Patch: node0 node1 node2 numa_hit 2523058 9225528 0 numa_miss 150213 10226 21495942 numa_foreign 17144215 4501270 0 interleave_hit 91 94 0 local_node 2493918 9208226 0 other_node 179351 27528 21495942 Similarly, in the context of cpuless nodes, this patch ensures that NUMA statistics are accurately updated by adding checks to prevent the miscounting of memory allocations when the involved nodes have no CPUs. This ensures more precise tracking of memory access patterns across all nodes, regardless of whether they have CPUs or not, improving the overall reliability of NUMA stat. The reason is that page allocation from dev_dax, cpuset, memcg .. comes with preferred allocating zone in cpuless node and it's hard to track the zone info for miss information. Link: https://lkml.kernel.org/r/20241023175037.9125-1-dongjoo.linux.dev@gmail.com Signed-off-by: Dongjoo Seo <dongjoo.linux.dev(a)gmail.com> Cc: Davidlohr Bueso <dave(a)stgolabs.net> Cc: Fan Ni <nifan(a)outlook.com> Cc: Dan Williams <dan.j.williams(a)intel.com> Cc: Adam Manzanares <a.manzanares(a)samsung.com> Cc: Michal Hocko <mhocko(a)suse.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/page_alloc.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) --- a/mm/page_alloc.c~mm-page_alloc-fix-numa-stats-update-for-cpu-less-nodes +++ a/mm/page_alloc.c @@ -2858,19 +2858,21 @@ static inline void zone_statistics(struc { #ifdef CONFIG_NUMA enum numa_stat_item local_stat = NUMA_LOCAL; + bool z_is_cpuless = !node_state(zone_to_nid(z), N_CPU); + bool pref_is_cpuless = !node_state(zone_to_nid(preferred_zone), N_CPU); - /* skip numa counters update if numa stats is disabled */ if (!static_branch_likely(&vm_numa_stat_key)) return; - if (zone_to_nid(z) != numa_node_id()) + if (zone_to_nid(z) != numa_node_id() || z_is_cpuless) local_stat = NUMA_OTHER; - if (zone_to_nid(z) == zone_to_nid(preferred_zone)) + if (zone_to_nid(z) == zone_to_nid(preferred_zone) && !z_is_cpuless) __count_numa_events(z, NUMA_HIT, nr_account); else { __count_numa_events(z, NUMA_MISS, nr_account); - __count_numa_events(preferred_zone, NUMA_FOREIGN, nr_account); + if (!pref_is_cpuless) + __count_numa_events(preferred_zone, NUMA_FOREIGN, nr_account); } __count_numa_events(z, local_stat, nr_account); #endif _ Patches currently in -mm which might be from dongjoo.linux.dev(a)gmail.com are

11 months

1
0
0 0

[PATCH v2 0/6] cxl: Initialization and shutdown fixes

by Dan Williams

Changes since v1 [1]: - Fix some misspellings missed by checkpatch in changelogs (Jonathan) - Add comments explaining the order of objects in drivers/cxl/Makefile (Jonathan) - Rename attach_device => cxl_rescan_attach (Jonathan) - Fixup Zijun's email (Zijun) [1]: http://lore.kernel.org/172862483180.2150669.5564474284074502692.stgit@dwill… --- Original cover: Gregory's modest proposal to fix CXL cxl_mem_probe() failures due to delayed arrival of the CXL "root" infrastructure [1] prompted questions of how the existing mechanism for retrying cxl_mem_probe() could be failing. The critical missing piece in the debug was that Gregory's setup had almost all CXL modules built-in to the kernel. On the way to that discovery several other bugs and init-order corner cases were discovered. The main fix is to make sure the drivers/cxl/Makefile object order supports root CXL ports being fully initialized upon cxl_acpi_probe() exit. The modular case has some similar potential holes that are fixed with MODULE_SOFTDEP() and other fix ups. Finally, an attempt to update cxl_test to reproduce the original report resulted in the discovery of a separate long standing use after free bug in cxl_region_detach(). [2]: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net --- Dan Williams (6): cxl/port: Fix CXL port initialization order when the subsystem is built-in cxl/port: Fix cxl_bus_rescan() vs bus_rescan_devices() cxl/acpi: Ensure ports ready at cxl_acpi_probe() return cxl/port: Fix use-after-free, permit out-of-order decoder shutdown cxl/port: Prevent out-of-order decoder allocation cxl/test: Improve init-order fidelity relative to real-world systems drivers/base/core.c | 35 +++++++ drivers/cxl/Kconfig | 1 drivers/cxl/Makefile | 20 +++- drivers/cxl/acpi.c | 7 + drivers/cxl/core/hdm.c | 50 +++++++++-- drivers/cxl/core/port.c | 13 ++- drivers/cxl/core/region.c | 91 ++++++++++--------- drivers/cxl/cxl.h | 3 - include/linux/device.h | 3 + tools/testing/cxl/test/cxl.c | 200 +++++++++++++++++++++++------------------- tools/testing/cxl/test/mem.c | 1 11 files changed, 269 insertions(+), 155 deletions(-) base-commit: 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b

11 months

6
16
0 0

[PATCH] drm/amd/pm: Vangogh: Fix kernel memory out of bounds write

by Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com> KASAN reports that the GPU metrics table allocated in vangogh_tables_init() is not large enough for the memset done in smu_cmn_init_soft_gpu_metrics(). Condensed report follows: [ 33.861314] BUG: KASAN: slab-out-of-bounds in smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu] [ 33.861799] Write of size 168 at addr ffff888129f59500 by task mangoapp/1067 ... [ 33.861808] CPU: 6 UID: 1000 PID: 1067 Comm: mangoapp Tainted: G W 6.12.0-rc4 #356 1a56f59a8b5182eeaf67eb7cb8b13594dd23b544 [ 33.861816] Tainted: [W]=WARN [ 33.861818] Hardware name: Valve Galileo/Galileo, BIOS F7G0107 12/01/2023 [ 33.861822] Call Trace: [ 33.861826] <TASK> [ 33.861829] dump_stack_lvl+0x66/0x90 [ 33.861838] print_report+0xce/0x620 [ 33.861853] kasan_report+0xda/0x110 [ 33.862794] kasan_check_range+0xfd/0x1a0 [ 33.862799] __asan_memset+0x23/0x40 [ 33.862803] smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.863306] vangogh_get_gpu_metrics_v2_4+0x123/0xad0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.864257] vangogh_common_get_gpu_metrics+0xb0c/0xbc0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.865682] amdgpu_dpm_get_gpu_metrics+0xcc/0x110 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.866160] amdgpu_get_gpu_metrics+0x154/0x2d0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.867135] dev_attr_show+0x43/0xc0 [ 33.867147] sysfs_kf_seq_show+0x1f1/0x3b0 [ 33.867155] seq_read_iter+0x3f8/0x1140 [ 33.867173] vfs_read+0x76c/0xc50 [ 33.867198] ksys_read+0xfb/0x1d0 [ 33.867214] do_syscall_64+0x90/0x160 ... [ 33.867353] Allocated by task 378 on cpu 7 at 22.794876s: [ 33.867358] kasan_save_stack+0x33/0x50 [ 33.867364] kasan_save_track+0x17/0x60 [ 33.867367] __kasan_kmalloc+0x87/0x90 [ 33.867371] vangogh_init_smc_tables+0x3f9/0x840 [amdgpu] [ 33.867835] smu_sw_init+0xa32/0x1850 [amdgpu] [ 33.868299] amdgpu_device_init+0x467b/0x8d90 [amdgpu] [ 33.868733] amdgpu_driver_load_kms+0x19/0xf0 [amdgpu] [ 33.869167] amdgpu_pci_probe+0x2d6/0xcd0 [amdgpu] [ 33.869608] local_pci_probe+0xda/0x180 [ 33.869614] pci_device_probe+0x43f/0x6b0 Empirically we can confirm that the former allocates 152 bytes for the table, while the latter memsets the 168 large block. This is somewhat alleviated by the fact that allocation goes into a 192 SLAB bucket, but then for v3_0 metrics the table grows to 264 bytes which would definitely be a problem. Root cause appears that when GPU metrics tables for v2_4 parts were added it was not considered to enlarge the table to fit. The fix in this patch is rather "brute force" and perhaps later should be done in a smarter way, by extracting and consolidating the part version to size logic to a common helper, instead of brute forcing the largest possible allocation. Nevertheless, for now this works and fixes the out of bounds write. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com> Fixes: 41cec40bc9ba ("drm/amd/pm: Vangogh: Add new gpu_metrics_v2_4 to acquire gpu_metrics") Cc: Mario Limonciello <mario.limonciello(a)amd.com> Cc: Evan Quan <evan.quan(a)amd.com> Cc: Wenyou Yang <WenYou.Yang(a)amd.com> Cc: Alex Deucher <alexander.deucher(a)amd.com> Cc: <stable(a)vger.kernel.org> # v6.6+ --- drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c index 22737b11b1bf..36f4a4651918 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c @@ -242,7 +242,10 @@ static int vangogh_tables_init(struct smu_context *smu) goto err0_out; smu_table->metrics_time = 0; - smu_table->gpu_metrics_table_size = max(sizeof(struct gpu_metrics_v2_3), sizeof(struct gpu_metrics_v2_2)); + smu_table->gpu_metrics_table_size = sizeof(struct gpu_metrics_v2_2); + smu_table->gpu_metrics_table_size = max(smu_table->gpu_metrics_table_size, sizeof(struct gpu_metrics_v2_3)); + smu_table->gpu_metrics_table_size = max(smu_table->gpu_metrics_table_size, sizeof(struct gpu_metrics_v2_4)); + smu_table->gpu_metrics_table_size = max(smu_table->gpu_metrics_table_size, sizeof(struct gpu_metrics_v3_0)); smu_table->gpu_metrics_table = kzalloc(smu_table->gpu_metrics_table_size, GFP_KERNEL); if (!smu_table->gpu_metrics_table) goto err1_out; -- 2.46.0

11 months

3
3
0 0

[PATCH] coresight: etm4x: Fix PID tracing when perf is run in an init PID namespace

by Julien Meunier

The previous implementation limited the tracing capabilities when perf was run in the init PID namespace, making it impossible to trace applications in non-init PID namespaces. This update improves the tracing process by verifying the event owner. This allows us to determine whether the user has the necessary permissions to trace the application. Cc: stable(a)vger.kernel.org Fixes: aab473867fed ("coresight: etm4x: Don't trace PID for non-root PID namespace") Signed-off-by: Julien Meunier <julien.meunier(a)nokia.com> --- drivers/hwtracing/coresight/coresight-etm4x-core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c index bf01f01964cf..8365307b1aec 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c @@ -695,7 +695,7 @@ static int etm4_parse_event_config(struct coresight_device *csdev, /* Only trace contextID when runs in root PID namespace */ if ((attr->config & BIT(ETM_OPT_CTXTID)) && - task_is_in_init_pid_ns(current)) + task_is_in_init_pid_ns(event->owner)) /* bit[6], Context ID tracing bit */ config->cfg |= TRCCONFIGR_CID; @@ -710,7 +710,7 @@ static int etm4_parse_event_config(struct coresight_device *csdev, goto out; } /* Only trace virtual contextID when runs in root PID namespace */ - if (task_is_in_init_pid_ns(current)) + if (task_is_in_init_pid_ns(event->owner)) config->cfg |= TRCCONFIGR_VMID | TRCCONFIGR_VMIDOPT; } -- 2.34.1

11 months

4
9
0 0

Re: [PATCH v2] drm/xe/ufence: Flush xe ordered_wq in case of ufence timeout

by Matthew Brost

On Fri, Oct 25, 2024 at 06:06:47PM +0200, Nirmoy Das wrote: > On 10/24/2024 7:22 PM, Matthew Brost wrote: > > On Thu, Oct 24, 2024 at 10:14:21AM -0700, John Harrison wrote: > > On 10/24/2024 08:18, Nirmoy Das wrote: > > Flush xe ordered_wq in case of ufence timeout which is observed > on LNL and that points to the recent scheduling issue with E-cores. > > This is similar to the recent fix: > commit e51527233804 ("drm/xe/guc/ct: Flush g2h worker in case of g2h > response timeout") and should be removed once there is E core > scheduling fix. > > v2: Add platform check(Himal) > s/__flush_workqueue/flush_workqueue(Jani) > > Cc: Badal Nilawar [1]<badal.nilawar(a)intel.com> > Cc: Jani Nikula [2]<jani.nikula(a)intel.com> > Cc: Matthew Auld [3]<matthew.auld(a)intel.com> > Cc: John Harrison [4]<John.C.Harrison(a)Intel.com> > Cc: Himal Prasad Ghimiray [5]<himal.prasad.ghimiray(a)intel.com> > Cc: Lucas De Marchi [6]<lucas.demarchi(a)intel.com> > Cc: [7]<stable(a)vger.kernel.org> # v6.11+ > Link: [8]https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2754 > Suggested-by: Matthew Brost [9]<matthew.brost(a)intel.com> > Signed-off-by: Nirmoy Das [10]<nirmoy.das(a)intel.com> > Reviewed-by: Matthew Brost [11]<matthew.brost(a)intel.com> > --- > drivers/gpu/drm/xe/xe_wait_user_fence.c | 14 ++++++++++++++ > 1 file changed, 14 insertions(+) > > diff --git a/drivers/gpu/drm/xe/xe_wait_user_fence.c b/drivers/gpu/drm/xe/xe_wai > t_user_fence.c > index f5deb81eba01..78a0ad3c78fe 100644 > --- a/drivers/gpu/drm/xe/xe_wait_user_fence.c > +++ b/drivers/gpu/drm/xe/xe_wait_user_fence.c > @@ -13,6 +13,7 @@ > #include "xe_device.h" > #include "xe_gt.h" > #include "xe_macros.h" > +#include "compat-i915-headers/i915_drv.h" > #include "xe_exec_queue.h" > static int do_compare(u64 addr, u64 value, u64 mask, u16 op) > @@ -155,6 +156,19 @@ int xe_wait_user_fence_ioctl(struct drm_device *dev, void * > data, > } > if (!timeout) { > + if (IS_LUNARLAKE(xe)) { > + /* > + * This is analogous to e51527233804 ("drm/xe/gu > c/ct: Flush g2h > + * worker in case of g2h response timeout") > + * > + * TODO: Drop this change once workqueue schedul > ing delay issue is > + * fixed on LNL Hybrid CPU. > + */ > + flush_workqueue(xe->ordered_wq); > > If we are having multiple instances of this workaround, can we wrap them up > in as 'LNL_FLUSH_WORKQUEUE(q)' or some such? Put the IS_LNL check inside the > macro and make it pretty obvious exactly where all the instances are by > having a single macro name to search for. > > > +1, I think Lucas is suggesting something similar to this on the chat to > make sure we don't lose track of removing these W/A when this gets > fixed. > > Matt > > Sounds good. I will add LNL_FLUSH_WORKQUEUE() and use that for all the > places we need this WA. > You will need 2 macros... - LNL_FLUSH_WORKQUEUE() which accepts xe_device, workqueue_struct - LNL_FLUSH_WORK() which accepts xe_device, work_struct Matt > Regards, > > Nirmoy > > > > John. > > > + err = do_compare(addr, args->value, args->mask, > args->op); > + if (err <= 0) > + break; > + } > err = -ETIME; > break; > } > > References > > 1. mailto:badal.nilawar@intel.com > 2. mailto:jani.nikula@intel.com > 3. mailto:matthew.auld@intel.com > 4. mailto:John.C.Harrison@Intel.com > 5. mailto:himal.prasad.ghimiray@intel.com > 6. mailto:lucas.demarchi@intel.com > 7. mailto:stable@vger.kernel.org > 8. https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2754 > 9. mailto:matthew.brost@intel.com > 10. mailto:nirmoy.das@intel.com > 11. mailto:matthew.brost@intel.com

11 months

1
0
0 0

[PATCH v2] drm/amd/pm: Vangogh: Fix kernel memory out of bounds write

by Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com> KASAN reports that the GPU metrics table allocated in vangogh_tables_init() is not large enough for the memset done in smu_cmn_init_soft_gpu_metrics(). Condensed report follows: [ 33.861314] BUG: KASAN: slab-out-of-bounds in smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu] [ 33.861799] Write of size 168 at addr ffff888129f59500 by task mangoapp/1067 ... [ 33.861808] CPU: 6 UID: 1000 PID: 1067 Comm: mangoapp Tainted: G W 6.12.0-rc4 #356 1a56f59a8b5182eeaf67eb7cb8b13594dd23b544 [ 33.861816] Tainted: [W]=WARN [ 33.861818] Hardware name: Valve Galileo/Galileo, BIOS F7G0107 12/01/2023 [ 33.861822] Call Trace: [ 33.861826] <TASK> [ 33.861829] dump_stack_lvl+0x66/0x90 [ 33.861838] print_report+0xce/0x620 [ 33.861853] kasan_report+0xda/0x110 [ 33.862794] kasan_check_range+0xfd/0x1a0 [ 33.862799] __asan_memset+0x23/0x40 [ 33.862803] smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.863306] vangogh_get_gpu_metrics_v2_4+0x123/0xad0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.864257] vangogh_common_get_gpu_metrics+0xb0c/0xbc0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.865682] amdgpu_dpm_get_gpu_metrics+0xcc/0x110 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.866160] amdgpu_get_gpu_metrics+0x154/0x2d0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.867135] dev_attr_show+0x43/0xc0 [ 33.867147] sysfs_kf_seq_show+0x1f1/0x3b0 [ 33.867155] seq_read_iter+0x3f8/0x1140 [ 33.867173] vfs_read+0x76c/0xc50 [ 33.867198] ksys_read+0xfb/0x1d0 [ 33.867214] do_syscall_64+0x90/0x160 ... [ 33.867353] Allocated by task 378 on cpu 7 at 22.794876s: [ 33.867358] kasan_save_stack+0x33/0x50 [ 33.867364] kasan_save_track+0x17/0x60 [ 33.867367] __kasan_kmalloc+0x87/0x90 [ 33.867371] vangogh_init_smc_tables+0x3f9/0x840 [amdgpu] [ 33.867835] smu_sw_init+0xa32/0x1850 [amdgpu] [ 33.868299] amdgpu_device_init+0x467b/0x8d90 [amdgpu] [ 33.868733] amdgpu_driver_load_kms+0x19/0xf0 [amdgpu] [ 33.869167] amdgpu_pci_probe+0x2d6/0xcd0 [amdgpu] [ 33.869608] local_pci_probe+0xda/0x180 [ 33.869614] pci_device_probe+0x43f/0x6b0 Empirically we can confirm that the former allocates 152 bytes for the table, while the latter memsets the 168 large block. Root cause appears that when GPU metrics tables for v2_4 parts were added it was not considered to enlarge the table to fit. The fix in this patch is rather "brute force" and perhaps later should be done in a smarter way, by extracting and consolidating the part version to size logic to a common helper, instead of brute forcing the largest possible allocation. Nevertheless, for now this works and fixes the out of bounds write. v2: * Drop impossible v3_0 case. (Mario) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com> Fixes: 41cec40bc9ba ("drm/amd/pm: Vangogh: Add new gpu_metrics_v2_4 to acquire gpu_metrics") Cc: Mario Limonciello <mario.limonciello(a)amd.com> Cc: Evan Quan <evan.quan(a)amd.com> Cc: Wenyou Yang <WenYou.Yang(a)amd.com> Cc: Alex Deucher <alexander.deucher(a)amd.com> Cc: <stable(a)vger.kernel.org> # v6.6+ --- drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c index 22737b11b1bf..1fe020f1f4db 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c @@ -242,7 +242,9 @@ static int vangogh_tables_init(struct smu_context *smu) goto err0_out; smu_table->metrics_time = 0; - smu_table->gpu_metrics_table_size = max(sizeof(struct gpu_metrics_v2_3), sizeof(struct gpu_metrics_v2_2)); + smu_table->gpu_metrics_table_size = sizeof(struct gpu_metrics_v2_2); + smu_table->gpu_metrics_table_size = max(smu_table->gpu_metrics_table_size, sizeof(struct gpu_metrics_v2_3)); + smu_table->gpu_metrics_table_size = max(smu_table->gpu_metrics_table_size, sizeof(struct gpu_metrics_v2_4)); smu_table->gpu_metrics_table = kzalloc(smu_table->gpu_metrics_table_size, GFP_KERNEL); if (!smu_table->gpu_metrics_table) goto err1_out; -- 2.46.0

11 months

2
1
0 0

[PATCH iwl-net 2/2] idpf: fix idpf_vc_core_init error path

by Pavan Kumar Linga

In an event where the platform running the device control plane is rebooted, reset is detected on the driver. It releases all the resources and waits for the reset to complete. Once the reset is done, it tries to build the resources back. At this time if the device control plane is not yet started, then the driver timeouts on the virtchnl message and retries to establish the mailbox again. In the retry flow, mailbox is deinitialized but the mailbox workqueue is still alive and polling for the mailbox message. This results in accessing the released control queue leading to null-ptr-deref. Fix it by unrolling the work queue cancellation and mailbox deinitialization in the order which they got initialized. Also remove the redundant scheduling of the mailbox task in idpf_vc_core_init. Fixes: 4930fbf419a7 ("idpf: add core init and interrupt request") Fixes: 34c21fa894a1 ("idpf: implement virtchnl transaction manager") Cc: stable(a)vger.kernel.org # 6.9+ Reviewed-by: Tarun K Singh <tarun.k.singh(a)intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga(a)intel.com> --- drivers/net/ethernet/intel/idpf/idpf_lib.c | 1 + drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 7 ------- 2 files changed, 1 insertion(+), 7 deletions(-) diff --git a/drivers/net/ethernet/intel/idpf/idpf_lib.c b/drivers/net/ethernet/intel/idpf/idpf_lib.c index c3848e10e7db..b4fbb99bfad2 100644 --- a/drivers/net/ethernet/intel/idpf/idpf_lib.c +++ b/drivers/net/ethernet/intel/idpf/idpf_lib.c @@ -1786,6 +1786,7 @@ static int idpf_init_hard_reset(struct idpf_adapter *adapter) */ err = idpf_vc_core_init(adapter); if (err) { + cancel_delayed_work_sync(&adapter->mbx_task); idpf_deinit_dflt_mbx(adapter); goto unlock_mutex; } diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c index 3be883726b87..d77d6c3805e2 100644 --- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c +++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c @@ -3017,11 +3017,6 @@ int idpf_vc_core_init(struct idpf_adapter *adapter) goto err_netdev_alloc; } - /* Start the mailbox task before requesting vectors. This will ensure - * vector information response from mailbox is handled - */ - queue_delayed_work(adapter->mbx_wq, &adapter->mbx_task, 0); - queue_delayed_work(adapter->serv_wq, &adapter->serv_task, msecs_to_jiffies(5 * (adapter->pdev->devfn & 0x07))); @@ -3046,7 +3041,6 @@ int idpf_vc_core_init(struct idpf_adapter *adapter) err_intr_req: cancel_delayed_work_sync(&adapter->serv_task); - cancel_delayed_work_sync(&adapter->mbx_task); idpf_vport_params_buf_rel(adapter); err_netdev_alloc: kfree(adapter->vports); @@ -3070,7 +3064,6 @@ int idpf_vc_core_init(struct idpf_adapter *adapter) adapter->state = __IDPF_VER_CHECK; if (adapter->vcxn_mngr) idpf_vc_xn_shutdown(adapter->vcxn_mngr); - idpf_deinit_dflt_mbx(adapter); set_bit(IDPF_HR_DRV_LOAD, adapter->flags); queue_delayed_work(adapter->vc_event_wq, &adapter->vc_event_task, msecs_to_jiffies(task_delay)); -- 2.43.0

11 months

3
2
0 0

[PATCH v1] usb: typec: tcpm: restrict SNK_WAIT_CAPABILITIES_TIMEOUT transitions to non self-powered devices

by Amit Sunil Dhamne

PD3.1 spec ("8.3.3.3.3 PE_SNK_Wait_for_Capabilities State") mandates that the policy engine perform a hard reset when SinkWaitCapTimer expires. Instead the code explicitly does a GET_SOURCE_CAP when the timer expires as part of SNK_WAIT_CAPABILITIES_TIMEOUT. Due to this the following compliance test failures are reported by the compliance tester (added excerpts from the PD Test Spec): * COMMON.PROC.PD.2#1: The Tester receives a Get_Source_Cap Message from the UUT. This message is valid except the following conditions: [COMMON.PROC.PD.2#1] a. The check fails if the UUT sends this message before the Tester has established an Explicit Contract ... * TEST.PD.PROT.SNK.4: ... 4. The check fails if the UUT does not send a Hard Reset between tTypeCSinkWaitCap min and max. [TEST.PD.PROT.SNK.4#1] The delay is between the VBUS present vSafe5V min and the time of the first bit of Preamble of the Hard Reset sent by the UUT. For the purpose of interoperability, restrict the quirk introduced in https://lore.kernel.org/all/20240523171806.223727-1-sebastian.reichel@colla… to only non self-powered devices as battery powered devices will not have the issue mentioned in that commit. Cc: stable(a)vger.kernel.org Fixes: 122968f8dda8 ("usb: typec: tcpm: avoid resets for missing source capability messages") Reported-by: Badhri Jagan Sridharan <badhri(a)google.com> Closes: https://lore.kernel.org/all/CAPTae5LAwsVugb0dxuKLHFqncjeZeJ785nkY4Jfd+M-tCj… Signed-off-by: Amit Sunil Dhamne <amitsd(a)google.com> Reviewed-by: Badhri Jagan Sridharan <badhri(a)google.com> --- drivers/usb/typec/tcpm/tcpm.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c index d6f2412381cf..c8f467d24fbb 100644 --- a/drivers/usb/typec/tcpm/tcpm.c +++ b/drivers/usb/typec/tcpm/tcpm.c @@ -4515,7 +4515,8 @@ static inline enum tcpm_state hard_reset_state(struct tcpm_port *port) return ERROR_RECOVERY; if (port->pwr_role == TYPEC_SOURCE) return SRC_UNATTACHED; - if (port->state == SNK_WAIT_CAPABILITIES_TIMEOUT) + if (port->state == SNK_WAIT_CAPABILITIES || + port->state == SNK_WAIT_CAPABILITIES_TIMEOUT) return SNK_READY; return SNK_UNATTACHED; } @@ -5043,8 +5044,11 @@ static void run_state_machine(struct tcpm_port *port) tcpm_set_state(port, SNK_SOFT_RESET, PD_T_SINK_WAIT_CAP); } else { - tcpm_set_state(port, SNK_WAIT_CAPABILITIES_TIMEOUT, - PD_T_SINK_WAIT_CAP); + if (!port->self_powered) + upcoming_state = SNK_WAIT_CAPABILITIES_TIMEOUT; + else + upcoming_state = hard_reset_state(port); + tcpm_set_state(port, upcoming_state, PD_T_SINK_WAIT_CAP); } break; case SNK_WAIT_CAPABILITIES_TIMEOUT: base-commit: c6d9e43954bfa7415a1e9efdb2806ec1d8a8afc8 -- 2.47.0.105.g07ac214952-goog

11 months

4
3
0 0

[PATCH] nvmet-auth: assign dh_key to NULL after kfree_sensitive

by Vitaliy Shevtsov

ctrl->dh_key might be used across multiple calls to nvmet_setup_dhgroup() for the same controller. So it's better to nullify it after release on error path in order to avoid double free later in nvmet_destroy_auth(). Found by Linux Verification Center (linuxtesting.org) with Svace. Fixes: 7a277c37d352 ("nvmet-auth: Diffie-Hellman key exchange support") Cc: stable(a)vger.kernel.org Signed-off-by: Vitaliy Shevtsov <v.shevtsov(a)maxima.ru> --- drivers/nvme/target/auth.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/nvme/target/auth.c b/drivers/nvme/target/auth.c index e900525b7866..7bca64de4a2f 100644 --- a/drivers/nvme/target/auth.c +++ b/drivers/nvme/target/auth.c @@ -101,6 +101,7 @@ int nvmet_setup_dhgroup(struct nvmet_ctrl *ctrl, u8 dhgroup_id) pr_debug("%s: ctrl %d failed to generate private key, err %d\n", __func__, ctrl->cntlid, ret); kfree_sensitive(ctrl->dh_key); + ctrl->dh_key = NULL; return ret; } ctrl->dh_keysize = crypto_kpp_maxsize(ctrl->dh_tfm); -- 2.46.1

11 months

5
4
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror