March 2024 - Linux-stable-mirror

[PATCH v2] btrfs: do not skip re-registration for the mounted device

by Anand Jain

There are reports that since version 6.7, update-grub fails to find the device of the root filesystem on systems without an initrd and with the root on a single device.

This looks like the device name changed in the output of /proc/self/mountinfo:

6.5-rc5 working:

  18 1 0:16 / / rw,noatime - btrfs /dev/sda8 ...

6.7 not working:

  17 1 0:15 / / rw,noatime - btrfs /dev/root ...

and "update-grub" shows this error:

  /usr/sbin/grub-probe: error: cannot find a device for / (is /dev mounted?)

This looks like it's related to the device name, but grub-probe recognizes the "/dev/root" path and tries to find the underlying device. However, there's a special case for some filesystems, for btrfs in particular: the generic root device detection heuristic is not done, and everything relies on reading the device info via a btrfs-specific ioctl. This ioctl returns the device name as it was saved at the time of the device scan (in this case it's /dev/root).

The change in 6.7 for temp_fsid, which allows several single-device filesystems to exist with the same fsid (transparently generating a new UUID at mount time), was to skip caching/registering such devices. This also skipped the mounted device. One step of scanning is to check whether the device name has changed and, if so, update the cached value. This broke grub-probe, as it always read the device as /dev/root and couldn't find it in the system. A temporary workaround is to create a symlink, but this does not survive a reboot.

The right fix is to allow updating the device path of a mounted filesystem even if it is a single-device one. In the fix, check whether the device's major:minor number matches that of the cached device. If they do, allow the scan to happen so that device_list_add() can take care of updating the device path. The file descriptor remains unchanged.

This does not affect the temp_fsid feature: the UUID of the mounted filesystem remains the same, and the matching is based on the device major:minor, which is unique per mounted filesystem.

This covers the code path where the name of a device (which exists for all mounted devices) changes, updating /dev/root to /dev/sdx. Any other single-device filesystem that is not mounted is still skipped.

Note that if a system is booted and the initial mount is done on the /dev/root device, this will be the cached name of the device. It changes only after "btrfs device scan" is run, as that triggers the rename.

The fix was verified by users whose systems were affected.

CC: stable(a)vger.kernel.org # 6.7+
Fixes: bc27d6f0aa0e ("btrfs: scan but don't register device on single device filesystem")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=218353
Link: https://lore.kernel.org/lkml/CAKLYgeJ1tUuqLcsquwuFqjDXPSJpEiokrWK2gisPKDZLs…
Signed-off-by: Anand Jain <anand.jain(a)oracle.com>
Tested-by: Alex Romosan <aromosan(a)gmail.com>
Tested-by: CHECK_1234543212345(a)protonmail.com
---
v2:
Updated git commit log from [PATCH] with permission. Thx.

[PATCH] btrfs: always scan a single device when mounted
Add Tested-by.
 fs/btrfs/volumes.c | 44 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 34 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 474ab7ed65ea..192c540a650c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1299,6 +1299,31 @@ int btrfs_forget_devices(dev_t devt)
 	return ret;
 }
 
+static bool btrfs_skip_registration(struct btrfs_super_block *disk_super,
+				    dev_t devt, bool mount_arg_dev)
+{
+	struct btrfs_fs_devices *fs_devices;
+
+	list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
+		struct btrfs_device *device;
+
+		mutex_lock(&fs_devices->device_list_mutex);
+		list_for_each_entry(device, &fs_devices->devices, dev_list) {
+			if (device->devt == devt) {
+				mutex_unlock(&fs_devices->device_list_mutex);
+				return false;
+			}
+		}
+		mutex_unlock(&fs_devices->device_list_mutex);
+	}
+
+	if (!mount_arg_dev && btrfs_super_num_devices(disk_super) == 1 &&
+	    !(btrfs_super_flags(disk_super) & BTRFS_SUPER_FLAG_SEEDING))
+		return true;
+
+	return false;
+}
+
 /*
  * Look for a btrfs signature on a device. This may be called out of the mount path
  * and we are not allowed to call set_blocksize during the scan. The superblock
@@ -1316,6 +1341,7 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
 	struct btrfs_device *device = NULL;
 	struct bdev_handle *bdev_handle;
 	u64 bytenr, bytenr_orig;
+	dev_t devt = 0;
 	int ret;
 
 	lockdep_assert_held(&uuid_mutex);
@@ -1355,18 +1381,16 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
 		goto error_bdev_put;
 	}
 
-	if (!mount_arg_dev && btrfs_super_num_devices(disk_super) == 1 &&
-	    !(btrfs_super_flags(disk_super) & BTRFS_SUPER_FLAG_SEEDING)) {
-		dev_t devt;
+	ret = lookup_bdev(path, &devt);
+	if (ret)
+		btrfs_warn(NULL, "lookup bdev failed for path %s: %d",
+			   path, ret);
 
-		ret = lookup_bdev(path, &devt);
-		if (ret)
-			btrfs_warn(NULL, "lookup bdev failed for path %s: %d",
-				   path, ret);
-		else
+	if (btrfs_skip_registration(disk_super, devt, mount_arg_dev)) {
+		pr_debug("BTRFS: skip registering single non-seed device %s\n",
+			 path);
+		if (devt)
 			btrfs_free_stale_devices(devt, NULL);
-
-		pr_debug("BTRFS: skip registering single non-seed device %s\n", path);
 		device = NULL;
 		goto free_disk_super;
 	}
-- 
2.39.3
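The decision that btrfs_skip_registration() makes can be modeled outside the kernel. The sketch below is a userspace approximation, not kernel code: the cached devices are a plain array of dev_t values standing in for the fs_uuids/device-list walk, and `struct sb_info` and `skip_registration()` are hypothetical stand-ins for the superblock fields and helper the patch uses. It assumes glibc's makedev() from <sys/sysmacros.h>, and that /dev/sda8 is major 8, minor 8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/sysmacros.h> /* makedev(), used to build dev_t values (Linux) */
#include <sys/types.h>     /* dev_t */

/* Hypothetical stand-in for the two superblock fields the patch consults. */
struct sb_info {
	unsigned long long num_devices; /* btrfs_super_num_devices() */
	bool seeding;                   /* BTRFS_SUPER_FLAG_SEEDING is set */
};

/*
 * Userspace model of btrfs_skip_registration() from the v2 patch.
 * `cached` plays the role of every btrfs_device->devt reachable from
 * fs_uuids; locking is omitted because nothing here is shared.
 */
static bool skip_registration(const dev_t *cached, size_t ncached,
			      const struct sb_info *sb, dev_t devt,
			      bool mount_arg_dev)
{
	/* A device already known by major:minor (e.g. the mounted root) is
	 * never skipped, so its cached path can be refreshed by the scan. */
	for (size_t i = 0; i < ncached; i++) {
		if (cached[i] == devt)
			return false;
	}

	/* Otherwise keep the temp_fsid behaviour: skip a single, non-seed
	 * device that was not passed as the mount argument. */
	return !mount_arg_dev && sb->num_devices == 1 && !sb->seeding;
}
```

With the root device cached as 8:8, a rescan of /dev/sda8 is no longer skipped, which is what lets device_list_add() replace the stale /dev/root name; an unmounted single-device filesystem on another major:minor is still skipped, as before.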

[PATCH 2/2] usb: gadget: tegra-xudc: Fix USB3 PHY retrieval logic

by Wayne Chang

This commit resolves an issue in the tegra-xudc USB gadget driver, which fetched USB3 PHY instances incorrectly. The problem stemmed from the assumption of a one-to-one correspondence between USB2 and USB3 PHY names and their association with physical USB ports in the device tree.

Previously, the driver associated USB3 PHY names directly with the USB3 instance number, leading to mismatches when mapping the physical USB ports. For instance, when using the USB3-1 PHY, the driver expected the corresponding PHY name to be 'usb3-1'. However, the physical USB ports in the device tree were designated as USB2-0 and USB3-0, as we only have one device controller, causing a misalignment.

This commit rectifies the issue by adjusting the PHY naming logic. Now the driver correctly correlates the USB2 and USB3 PHY instances, allowing the USB2-0 and USB3-1 PHYs to form a physical USB port pair while accurately reflecting their configuration in the device tree by naming them USB2-0 and USB3-0, respectively.

The change ensures that the PHYs and PHY names align appropriately, resolving the mismatch between physical USB ports and their associated names in the device tree.
Fixes: b4e19931c98a ("usb: gadget: tegra-xudc: Support multiple device modes")
Cc: stable(a)vger.kernel.org
Signed-off-by: Wayne Chang <waynec(a)nvidia.com>
---
 drivers/usb/gadget/udc/tegra-xudc.c | 39 ++++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 14 deletions(-)

diff --git a/drivers/usb/gadget/udc/tegra-xudc.c b/drivers/usb/gadget/udc/tegra-xudc.c
index cb85168fd00c..7aa46d426f31 100644
--- a/drivers/usb/gadget/udc/tegra-xudc.c
+++ b/drivers/usb/gadget/udc/tegra-xudc.c
@@ -3491,8 +3491,8 @@ static void tegra_xudc_device_params_init(struct tegra_xudc *xudc)
 
 static int tegra_xudc_phy_get(struct tegra_xudc *xudc)
 {
-	int err = 0, usb3;
-	unsigned int i;
+	int err = 0, usb3_companion_port;
+	unsigned int i, j;
 
 	xudc->utmi_phy = devm_kcalloc(xudc->dev, xudc->soc->num_phys,
 					   sizeof(*xudc->utmi_phy), GFP_KERNEL);
@@ -3520,7 +3520,7 @@ static int tegra_xudc_phy_get(struct tegra_xudc *xudc)
 		if (IS_ERR(xudc->utmi_phy[i])) {
 			err = PTR_ERR(xudc->utmi_phy[i]);
 			dev_err_probe(xudc->dev, err,
-				      "failed to get usb2-%d PHY\n", i);
+				"failed to get PHY for phy-name usb2-%d\n", i);
 			goto clean_up;
 		} else if (xudc->utmi_phy[i]) {
 			/* Get usb-phy, if utmi phy is available */
@@ -3539,19 +3539,30 @@ static int tegra_xudc_phy_get(struct tegra_xudc *xudc)
 		}
 
 		/* Get USB3 phy */
-		usb3 = tegra_xusb_padctl_get_usb3_companion(xudc->padctl, i);
-		if (usb3 < 0)
+		usb3_companion_port = tegra_xusb_padctl_get_usb3_companion(xudc->padctl, i);
+		if (usb3_companion_port < 0)
 			continue;
 
-		snprintf(phy_name, sizeof(phy_name), "usb3-%d", usb3);
-		xudc->usb3_phy[i] = devm_phy_optional_get(xudc->dev, phy_name);
-		if (IS_ERR(xudc->usb3_phy[i])) {
-			err = PTR_ERR(xudc->usb3_phy[i]);
-			dev_err_probe(xudc->dev, err,
-				      "failed to get usb3-%d PHY\n", usb3);
-			goto clean_up;
-		} else if (xudc->usb3_phy[i])
-			dev_dbg(xudc->dev, "usb3-%d PHY registered", usb3);
+		for (j = 0; j < xudc->soc->num_phys; j++) {
+			snprintf(phy_name, sizeof(phy_name), "usb3-%d", j);
+			xudc->usb3_phy[i] = devm_phy_optional_get(xudc->dev, phy_name);
+			if (IS_ERR(xudc->usb3_phy[i])) {
+				err = PTR_ERR(xudc->usb3_phy[i]);
+				dev_err_probe(xudc->dev, err,
+					"failed to get PHY for phy-name usb3-%d\n", j);
+				goto clean_up;
+			} else if (xudc->usb3_phy[i]) {
+				int usb2_port =
+					tegra_xusb_padctl_get_port_number(xudc->utmi_phy[i]);
+				int usb3_port =
+					tegra_xusb_padctl_get_port_number(xudc->usb3_phy[i]);
+				if (usb3_port == usb3_companion_port) {
+					dev_dbg(xudc->dev, "USB2 port %d is paired with USB3 port %d for device mode port %d\n",
+					 usb2_port, usb3_port, i);
+					break;
+				}
+			}
+		}
 	}
 
 	return err;
-- 
2.25.1
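The new lookup in tegra_xudc_phy_get() amounts to a search over phy-names. The sketch below is a userspace model under stated assumptions: `struct usb3_phy_slot` and `find_usb3_phy_name()` are hypothetical, `present` stands for devm_phy_optional_get() returning a PHY, and `port` for the value tegra_xusb_padctl_get_port_number() would report for it. It returns the index j of the matching "usb3-j" name, or -1 when nothing matches; it does not try to mirror what the driver leaves in usb3_phy[i] in that case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one "usb3-N" phy-name entry in the device tree. */
struct usb3_phy_slot {
	int present; /* devm_phy_optional_get() would return a PHY */
	int port;    /* port number the padctl would report for that PHY */
};

/*
 * Return j such that phy-name "usb3-j" sits on the USB3 port the padctl
 * reports as the companion, or -1 if none does. Before the fix, the driver
 * used the companion port number itself as j.
 */
static int find_usb3_phy_name(const struct usb3_phy_slot *slots,
			      size_t nslots, int companion_port)
{
	for (size_t j = 0; j < nslots; j++) {
		if (slots[j].present && slots[j].port == companion_port)
			return (int)j;
	}
	return -1;
}
```

For the layout described in the commit message, a single phy-name "usb3-0" that refers to the PHY on USB3 port 1, a companion port of 1 resolves to j = 0, whereas the old code would have asked for "usb3-1", which this layout does not provide.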

[PATCH v2] libceph: init the cursor when preparing the sparse read

by xiubli＠redhat.com

From: Xiubo Li <xiubli(a)redhat.com>

The osd code has removed the cursor initialization code, and this will put the sparse-read state machine into an infinite loop. We should initialize the cursor just before each sparse read in messenger v2.

Cc: stable(a)vger.kernel.org
URL: https://tracker.ceph.com/issues/64607
Fixes: 8e46a2d068c9 ("libceph: just wait for more data to be available on the socket")
Reported-by: Luis Henriques <lhenriques(a)suse.de>
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
---

V2:
- Just removed the unnecessary 'sparse_read_total' check.

 net/ceph/messenger_v2.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
index a0ca5414b333..ab3ab130a911 100644
--- a/net/ceph/messenger_v2.c
+++ b/net/ceph/messenger_v2.c
@@ -2034,6 +2034,9 @@ static int prepare_sparse_read_data(struct ceph_connection *con)
 	if (!con_secure(con))
 		con->in_data_crc = -1;
 
+	ceph_msg_data_cursor_init(&con->v2.in_cursor, con->in_msg,
+				  con->in_msg->sparse_read_total);
+
 	reset_in_kvecs(con);
 	con->v2.in_state = IN_S_PREPARE_SPARSE_DATA_CONT;
 	con->v2.data_len_remain = data_len(msg);
-- 
2.43.0
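Why a missing cursor re-initialization can stall a receive path is easiest to see in a toy model. The code below is not libceph: `struct data_cursor`, `cursor_init()`, `cursor_advance()` and `receive()` are hypothetical, and the model assumes that a cursor with no residual bytes simply accepts nothing. `max_steps` caps the loop so that the stale case terminates in the example instead of spinning forever. In the actual fix, the length passed to ceph_msg_data_cursor_init() is con->in_msg->sparse_read_total.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a message data cursor. */
struct data_cursor {
	size_t total; /* bytes this read is expected to deliver */
	size_t resid; /* bytes the cursor can still accept */
};

static void cursor_init(struct data_cursor *c, size_t len)
{
	c->total = len;
	c->resid = len;
}

/* Accept up to n bytes; a cursor with nothing left accepts nothing. */
static size_t cursor_advance(struct data_cursor *c, size_t n)
{
	if (n > c->resid)
		n = c->resid;
	c->resid -= n;
	return n;
}

/*
 * Receive len bytes in chunks of at most chunk bytes. With reinit set, the
 * cursor is re-initialized first, as the fix does before each sparse read.
 * max_steps bounds the loop so a stale cursor shows up as "no progress"
 * rather than a real hang. Returns the number of bytes consumed.
 */
static size_t receive(struct data_cursor *c, size_t len, size_t chunk,
		      int reinit, int max_steps)
{
	size_t got = 0;

	if (reinit)
		cursor_init(c, len);
	for (int step = 0; step < max_steps && got < len; step++)
		got += cursor_advance(c, chunk);
	return got;
}
```

After one complete read the cursor is exhausted; a second read that skips re-initialization makes no progress at all, while re-initializing before each read, as prepare_sparse_read_data() now does, restores forward progress.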

[PATCH 5.15 00/83] 5.15.151-rc2 review

by Greg Kroah-Hartman

This is the start of the stable review cycle for the 5.15.151 release. There are 83 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.

Responses should be made by Thu, 07 Mar 2024 11:31:11 +0000. Anything received after that time might be too late.

The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.151-r…
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
    Linux 5.15.151-rc2

Davide Caratti <dcaratti(a)redhat.com>
    mptcp: fix double-free on socket dismantle

Gal Pressman <gal(a)nvidia.com>
    Revert "tls: rx: move counting TlsDecryptErrors for sync"

Jakub Kicinski <kuba(a)kernel.org>
    net: tls: fix async vs NIC crypto offload

Martynas Pumputis <m(a)lambda.lt>
    bpf: Derive source IP addr via bpf_*_fib_lookup()

Louis DeLosSantos <louis.delos.devel(a)gmail.com>
    bpf: Add table ID to bpf_fib_lookup BPF helper

Martin KaFai Lau <martin.lau(a)kernel.org>
    bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup

Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
    Revert "interconnect: Teach lockdep about icc_bw_lock order"

Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
    Revert "interconnect: Fix locking for runpm vs reclaim"

Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org>
    gpio: fix resource unwinding order in error path

Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
    gpiolib: Fix the error path order in gpiochip_add_data_with_key()

Arturas Moskvinas <arturas.moskvinas(a)gmail.com>
    gpio: 74x164: Enable output pins after registers are reset

Kuniyuki Iwashima <kuniyu(a)amazon.com>
    af_unix: Drop oob_skb ref before purging queue in GC.

Max Krummenacher <max.krummenacher(a)toradex.com>
    Revert "drm/bridge: lt8912b: Register and attach our DSI device at probe"

Oscar Salvador <osalvador(a)suse.de>
    fs,hugetlb: fix NULL pointer dereference in hugetlbs_fill_super

Baokun Li <libaokun1(a)huawei.com>
    cachefiles: fix memory leak in cachefiles_add_cache()

Paolo Abeni <pabeni(a)redhat.com>
    mptcp: fix possible deadlock in subflow diag

Paolo Abeni <pabeni(a)redhat.com>
    mptcp: push at DSS boundaries

Geliang Tang <tanggeliang(a)kylinos.cn>
    mptcp: add needs_id for netlink appending addr

Jean Sacren <sakiwit(a)gmail.com>
    mptcp: clean up harmless false expressions

Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
    selftests: mptcp: add missing kconfig for NF Filter in v6

Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
    selftests: mptcp: add missing kconfig for NF Filter

Paolo Abeni <pabeni(a)redhat.com>
    mptcp: rename timer related helper to less confusing names

Paolo Abeni <pabeni(a)redhat.com>
    mptcp: process pending subflow error on close

Paolo Abeni <pabeni(a)redhat.com>
    mptcp: move __mptcp_error_report in protocol.c

Paolo Bonzini <pbonzini(a)redhat.com>
    x86/cpu/intel: Detect TME keyid bits before setting MTRR mask registers

Bjorn Andersson <quic_bjorande(a)quicinc.com>
    pmdomain: qcom: rpmhpd: Fix enabled_corner aggregation

Elad Nachman <enachman(a)marvell.com>
    mmc: sdhci-xenon: fix PHY init clock stability

Elad Nachman <enachman(a)marvell.com>
    mmc: sdhci-xenon: add timeout for PHY init complete

Ivan Semenov <ivan(a)semenov.dev>
    mmc: core: Fix eMMC initialization with 1-bit bus connection

Curtis Klein <curtis.klein(a)hpe.com>
    dmaengine: fsl-qdma: init irq after reg initialization

Tadeusz Struk <tstruk(a)gigaio.com>
    dmaengine: ptdma: use consistent DMA masks

Peng Ma <peng.ma(a)nxp.com>
    dmaengine: fsl-qdma: fix SoC may hang on 16 byte unaligned read

David Sterba <dsterba(a)suse.com>
    btrfs: dev-replace: properly validate device names

Johannes Berg <johannes.berg(a)intel.com>
    wifi: nl80211: reject iftype change with mesh ID change

Alexander Ofitserov <oficerovas(a)altlinux.org>
    gtp: fix use-after-free and null-ptr-deref in gtp_newlink()

Takashi Sakamoto <o-takashi(a)sakamocchi.jp>
    ALSA: firewire-lib: fix to check cycle continuity

Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
    tomoyo: fix UAF write bug in tomoyo_write_control()

Dimitris Vlachos <dvlachos(a)ics.forth.gr>
    riscv: Sparse-Memory/vmemmap out-of-bounds fix

David Howells <dhowells(a)redhat.com>
    afs: Fix endless loop in directory parsing

Jiri Slaby (SUSE) <jirislaby(a)kernel.org>
    fbcon: always restore the old font data in fbcon_do_set_font()

Takashi Iwai <tiwai(a)suse.de>
    ALSA: Drop leftover snd-rtctimer stuff from Makefile

Hans de Goede <hdegoede(a)redhat.com>
    power: supply: bq27xxx-i2c: Do not free non existing IRQ

Arnd Bergmann <arnd(a)arndb.de>
    efi/capsule-loader: fix incorrect allocation size

Sabrina Dubroca <sd(a)queasysnail.net>
    tls: decrement decrypt_pending if no async completion will be called

Jakub Kicinski <kuba(a)kernel.org>
    tls: rx: use async as an in-out argument

Jakub Kicinski <kuba(a)kernel.org>
    tls: rx: assume crypto always calls our callback

Jakub Kicinski <kuba(a)kernel.org>
    tls: rx: move counting TlsDecryptErrors for sync

Jakub Kicinski <kuba(a)kernel.org>
    tls: rx: don't track the async count

Jakub Kicinski <kuba(a)kernel.org>
    tls: rx: factor out writing ContentType to cmsg

Jakub Kicinski <kuba(a)kernel.org>
    tls: rx: wrap decryption arguments in a structure

Jakub Kicinski <kuba(a)kernel.org>
    tls: rx: don't report text length from the bowels of decrypt

Jakub Kicinski <kuba(a)kernel.org>
    tls: rx: drop unnecessary arguments from tls_setup_from_iter()

Jakub Kicinski <kuba(a)kernel.org>
    tls: hw: rx: use return value of tls_device_decrypted() to carry status

Jakub Kicinski <kuba(a)kernel.org>
    tls: rx: refactor decrypt_skb_update()

Jakub Kicinski <kuba(a)kernel.org>
    tls: rx: don't issue wake ups when data is decrypted

Jakub Kicinski <kuba(a)kernel.org>
    tls: rx: don't store the decryption status in socket context

Jakub Kicinski <kuba(a)kernel.org>
    tls: rx: don't store the record type in socket context

Oleksij Rempel <linux(a)rempel-privat.de>
    igb: extend PTP timestamp adjustments to i211

Lin Ma <linma(a)zju.edu.cn>
    rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back

Florian Westphal <fw(a)strlen.de>
    netfilter: bridge: confirm multicast packets before passing them up the stack

Florian Westphal <fw(a)strlen.de>
    netfilter: let reset rules clean out conntrack entries

Florian Westphal <fw(a)strlen.de>
    netfilter: make function op structures const

Florian Westphal <fw(a)strlen.de>
    netfilter: core: move ip_ct_attach indirection to struct nf_ct_hook

Florian Westphal <fw(a)strlen.de>
    netfilter: nfnetlink_queue: silence bogus compiler warning

Ignat Korchagin <ignat(a)cloudflare.com>
    netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate()

Kai-Heng Feng <kai.heng.feng(a)canonical.com>
    Bluetooth: Enforce validation on max value of connection interval

Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
    Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST

Zijun Hu <quic_zijuhu(a)quicinc.com>
    Bluetooth: hci_event: Fix wrongly recorded wakeup BD_ADDR

Ying Hsu <yinghsu(a)chromium.org>
    Bluetooth: Avoid potential use-after-free in hci_error_reset

Jakub Raczynski <j.raczynski(a)samsung.com>
    stmmac: Clear variable when destroying workqueue

Justin Iurman <justin.iurman(a)uliege.be>
    uapi: in6: replace temporary label with rfc9486

Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
    net: usb: dm9601: fix wrong return value in dm9601_mdio_read

Jakub Kicinski <kuba(a)kernel.org>
    veth: try harder when allocating queue memory

Vasily Averin <vvs(a)openvz.org>
    net: enable memcg accounting for veth queues

Oleksij Rempel <linux(a)rempel-privat.de>
    lan78xx: enable auto speed configuration for LAN7850 if no EEPROM is detected

Eric Dumazet <edumazet(a)google.com>
    ipv6: fix potential "struct net" leak in inet6_rtm_getaddr()

Jakub Kicinski <kuba(a)kernel.org>
    net: veth: clear GRO when clearing XDP even when down

Doug Smythies <dsmythies(a)telus.net>
    cpufreq: intel_pstate: fix pstate limits enforcement for adjust_perf call back

Yunjian Wang <wangyunjian(a)huawei.com>
    tun: Fix xdp_rxq_info's queue_index when detaching

Florian Westphal <fw(a)strlen.de>
    net: ip_tunnel: prevent perpetual headroom growth

Ryosuke Yasuoka <ryasuoka(a)redhat.com>
    netlink: Fix kernel-infoleak-after-free in __skb_datagram_iter

Han Xu <han.xu(a)nxp.com>
    mtd: spinand: gigadevice: Fix the get ecc status issue

Pablo Neira Ayuso <pablo(a)netfilter.org>
    netfilter: nf_tables: disallow timeout for anonymous sets

-------------

Diffstat:

 Makefile | 4 +-
 arch/riscv/include/asm/pgtable.h | 2 +-
 arch/x86/kernel/cpu/intel.c | 178 ++++++------
 drivers/cpufreq/intel_pstate.c | 3 +
 drivers/dma/fsl-qdma.c | 25 +-
 drivers/dma/ptdma/ptdma-dmaengine.c | 2 -
 drivers/firmware/efi/capsule-loader.c | 2 +-
 drivers/gpio/gpio-74x164.c | 4 +-
 drivers/gpio/gpiolib.c | 12 +-
 drivers/gpu/drm/bridge/lontium-lt8912b.c | 11 +-
 drivers/interconnect/core.c | 18 +-
 drivers/mmc/core/mmc.c | 2 +
 drivers/mmc/host/sdhci-xenon-phy.c | 48 +++-
 drivers/mtd/nand/spi/gigadevice.c | 6 +-
 drivers/net/ethernet/intel/igb/igb_ptp.c | 5 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 4 +-
 drivers/net/gtp.c | 12 +-
 drivers/net/tun.c | 1 +
 drivers/net/usb/dm9601.c | 2 +-
 drivers/net/usb/lan78xx.c | 3 +-
 drivers/net/veth.c | 40 +--
 drivers/power/supply/bq27xxx_battery_i2c.c | 4 +-
 drivers/soc/qcom/rpmhpd.c | 7 +-
 drivers/video/fbdev/core/fbcon.c | 8 +-
 fs/afs/dir.c | 4 +-
 fs/btrfs/dev-replace.c | 24 +-
 fs/cachefiles/bind.c | 3 +
 fs/hugetlbfs/inode.c | 6 +-
 include/linux/netfilter.h | 14 +-
 include/net/ipv6_stubs.h | 5 +
 include/net/netfilter/nf_conntrack.h | 8 +
 include/net/strparser.h | 4 +
 include/net/tls.h | 11 +-
 include/uapi/linux/bpf.h | 37 ++-
 include/uapi/linux/in6.h | 2 +-
 net/bluetooth/hci_core.c | 7 +-
 net/bluetooth/hci_event.c | 13 +-
 net/bluetooth/l2cap_core.c | 8 +-
 net/bridge/br_netfilter_hooks.c | 96 +++++++
 net/bridge/netfilter/nf_conntrack_bridge.c | 30 ++
 net/core/filter.c | 67 ++++-
 net/core/rtnetlink.c | 11 +-
 net/ipv4/ip_tunnel.c | 28 +-
 net/ipv4/netfilter/nf_reject_ipv4.c | 1 +
 net/ipv6/addrconf.c | 7 +-
 net/ipv6/af_inet6.c | 1 +
 net/ipv6/netfilter/nf_reject_ipv6.c | 1 +
 net/mptcp/diag.c | 3 +
 net/mptcp/pm_netlink.c | 30 +-
 net/mptcp/protocol.c | 123 +++++++--
 net/mptcp/subflow.c | 36 ---
 net/netfilter/core.c | 45 +-
 net/netfilter/nf_conntrack_core.c | 21 +-
 net/netfilter/nf_conntrack_netlink.c | 4 +-
 net/netfilter/nf_conntrack_proto_tcp.c | 35 +++
 net/netfilter/nf_nat_core.c | 2 +-
 net/netfilter/nf_tables_api.c | 7 +
 net/netfilter/nfnetlink_queue.c | 10 +-
 net/netfilter/nft_compat.c | 20 ++
 net/netlink/af_netlink.c | 2 +-
 net/tls/tls_device.c | 6 +-
 net/tls/tls_sw.c | 316 ++++++++++------------
 net/unix/garbage.c | 22 +-
 net/wireless/nl80211.c | 2 +
 security/tomoyo/common.c | 3 +-
 sound/core/Makefile | 1 -
 sound/firewire/amdtp-stream.c | 2 +-
 tools/include/uapi/linux/bpf.h | 37 ++-
 tools/testing/selftests/net/mptcp/config | 2 +
 69 files changed, 991 insertions(+), 529 deletions(-)

[PATCH] mmc: part_switch: fixes switch on gp3 partition

by Dominique Martinet

From: Dominique Martinet <dominique.martinet(a)atmark-techno.com>

Commit e7794c14fd73 ("mmc: rpmb: fixes pause retune on all RPMB partitions.") added a mask check for 'part_type', but the mask used was wrong, leading to the code intended for rpmb also being executed for GP3.

On some MMCs (but not all) this would make the gp3 partition inaccessible:

armadillo:~# head -c 1 < /dev/mmcblk2gp3
head: standard input: I/O error
armadillo:~# dmesg -c
[ 422.976583] mmc2: running CQE recovery
[ 423.058182] mmc2: running CQE recovery
[ 423.137607] mmc2: running CQE recovery
[ 423.137802] blk_update_request: I/O error, dev mmcblk2gp3, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 0
[ 423.237125] mmc2: running CQE recovery
[ 423.318206] mmc2: running CQE recovery
[ 423.397680] mmc2: running CQE recovery
[ 423.397837] blk_update_request: I/O error, dev mmcblk2gp3, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 423.408287] Buffer I/O error on dev mmcblk2gp3, logical block 0, async page read

The part_type values of interest here are defined as follows:

main   0
boot0  1
boot1  2
rpmb   3
gp0    4
gp1    5
gp2    6
gp3    7

So mask with EXT_CSD_PART_CONFIG_ACC_MASK (7) to correctly identify rpmb.

Fixes: e7794c14fd73 ("mmc: rpmb: fixes pause retune on all RPMB partitions.")
Cc: stable(a)vger.kernel.org
Cc: Jorge Ramirez-Ortiz <jorge(a)foundries.io>
Signed-off-by: Dominique Martinet <dominique.martinet(a)atmark-techno.com>
---
A couple of notes:
- this doesn't fail on all eMMCs: I can still access gp3 on some models, but it seems to fail reliably with Micron's "G1M15L"
- I've encountered this on the 5.10 backport (in 5.10.208), so this will need to be backported everywhere the fix was taken...

Thanks!
---
 drivers/mmc/core/block.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 32d49100dff5..86efa6084696 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -874,10 +874,11 @@ static const struct block_device_operations mmc_bdops = {
 static int mmc_blk_part_switch_pre(struct mmc_card *card,
 				   unsigned int part_type)
 {
-	const unsigned int mask = EXT_CSD_PART_CONFIG_ACC_RPMB;
+	const unsigned int mask = EXT_CSD_PART_CONFIG_ACC_MASK;
+	const unsigned int rpmb = EXT_CSD_PART_CONFIG_ACC_RPMB;
 	int ret = 0;
 
-	if ((part_type & mask) == mask) {
+	if ((part_type & mask) == rpmb) {
 		if (card->ext_csd.cmdq_en) {
 			ret = mmc_cmdq_disable(card);
 			if (ret)
@@ -892,10 +893,11 @@ static int mmc_blk_part_switch_pre(struct mmc_card *card,
 static int mmc_blk_part_switch_post(struct mmc_card *card,
 				    unsigned int part_type)
 {
-	const unsigned int mask = EXT_CSD_PART_CONFIG_ACC_RPMB;
+	const unsigned int mask = EXT_CSD_PART_CONFIG_ACC_MASK;
+	const unsigned int rpmb = EXT_CSD_PART_CONFIG_ACC_RPMB;
 	int ret = 0;
 
-	if ((part_type & mask) == mask) {
+	if ((part_type & mask) == rpmb) {
 		mmc_retune_unpause(card->host);
 		if (card->reenable_cmdq && !card->ext_csd.cmdq_en)
 			ret = mmc_cmdq_enable(card);

---
base-commit: 5847c9777c303a792202c609bd761dceb60f4eed
change-id: 20240306-mmc-partswitch-c3a50b5084ae

Best regards,
-- 
Dominique Martinet | Asmadeus
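The effect of the mask change can be checked exhaustively over the eight part_type values listed in the commit message. This sketch redefines the two constants locally with the values given there (7 and 3) rather than including the kernel header; `is_rpmb_old()` and `is_rpmb_new()` are hypothetical names for the two versions of the test in mmc_blk_part_switch_pre() and mmc_blk_part_switch_post().

```c
#include <assert.h>
#include <stdbool.h>

/* Values as stated in the commit message; names mirror the kernel macros. */
#define EXT_CSD_PART_CONFIG_ACC_MASK 0x7
#define EXT_CSD_PART_CONFIG_ACC_RPMB 0x3

/* Test before the fix: true whenever both RPMB bits (0 and 1) are set. */
static bool is_rpmb_old(unsigned int part_type)
{
	const unsigned int mask = EXT_CSD_PART_CONFIG_ACC_RPMB;

	return (part_type & mask) == mask;
}

/* Test after the fix: isolate the 3-bit access field, then compare. */
static bool is_rpmb_new(unsigned int part_type)
{
	const unsigned int mask = EXT_CSD_PART_CONFIG_ACC_MASK;
	const unsigned int rpmb = EXT_CSD_PART_CONFIG_ACC_RPMB;

	return (part_type & mask) == rpmb;
}
```

Because the old test only required bits 0 and 1 to be set, it matched both rpmb (3) and gp3 (7), which is why gp3 went through the RPMB-only CQE and retune handling; masking with the full access field first leaves rpmb as the only match.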

[PATCH v4 2/2] of: dynamic: Synchronize of_changeset_destroy() with the devlink removals

by Herve Codina

In the following sequence:
  1) of_platform_depopulate()
  2) of_overlay_remove()

During step 1, devices are destroyed and devlinks are removed. During step 2, OF nodes are destroyed, but __of_changeset_entry_destroy() can raise warnings related to a missing of_node_put():

  ERROR: memory leak, expected refcount 1 instead of 2 ...

Indeed, during the devlink removals performed at step 1, the removal itself, which releases the device (and the attached of_node), is done by a job queued in a workqueue, so it runs asynchronously with respect to the caller. When the warning is present, of_node_put() will be called, but too late, from the workqueue job.

In order to be sure that any ongoing devlink removals are done before the of_node destruction, synchronize of_changeset_destroy() with the devlink removals.

Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal")
Cc: stable(a)vger.kernel.org
Signed-off-by: Herve Codina <herve.codina(a)bootlin.com>
---
 drivers/of/dynamic.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
index 3bf27052832f..169e2a9ae22f 100644
--- a/drivers/of/dynamic.c
+++ b/drivers/of/dynamic.c
@@ -9,6 +9,7 @@
 
 #define pr_fmt(fmt)	"OF: " fmt
 
+#include <linux/device.h>
 #include <linux/of.h>
 #include <linux/spinlock.h>
 #include <linux/slab.h>
@@ -667,6 +668,12 @@ void of_changeset_destroy(struct of_changeset *ocs)
 {
 	struct of_changeset_entry *ce, *cen;
 
+	/*
+	 * Wait for any ongoing device link removals before destroying some of
+	 * nodes.
+	 */
+	device_link_wait_removal();
+
 	list_for_each_entry_safe_reverse(ce, cen, &ocs->entries, node)
 		__of_changeset_entry_destroy(ce);
 }
-- 
2.43.0

[PATCH v4 1/2] driver core: Introduce device_link_wait_removal()

by Herve Codina

The commit 80dd33cf72d1 ("drivers: base: Fix device link removal") introduced a workqueue to release the consumer and supplier devices used in the devlink. In the queued job, the devices are released and, in turn, when all the references to these devices are dropped, the release function of the device itself is called.

Nothing provides synchronisation with this workqueue to ensure that all ongoing release operations are done, so that other operations can then be started safely.

For instance, in the following sequence:
  1) of_platform_depopulate()
  2) of_overlay_remove()

During step 1, devices are released and the related devlinks are removed (jobs are pushed to the workqueue). During step 2, OF nodes are destroyed but, without any synchronisation with the devlink removal jobs, of_overlay_remove() can raise warnings related to a missing of_node_put():

  ERROR: memory leak, expected refcount 1 instead of 2

Indeed, the missing of_node_put() call is going to be done, too late, from the workqueue job execution.

Introduce device_link_wait_removal() to offer a way to synchronize operations by waiting for the end of devlink removals (i.e. the end of the workqueue jobs).

Also, as a flush operation is done on the workqueue, the workqueue used is moved from a system-wide workqueue to a local one.
Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal")
Cc: stable(a)vger.kernel.org
Signed-off-by: Herve Codina <herve.codina(a)bootlin.com>
---
 drivers/base/core.c | 26 +++++++++++++++++++++++---
 include/linux/device.h | 1 +
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index d5f4e4aac09b..48b28c59c592 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -44,6 +44,7 @@ static bool fw_devlink_is_permissive(void);
 static void __fw_devlink_link_to_consumers(struct device *dev);
 static bool fw_devlink_drv_reg_done;
 static bool fw_devlink_best_effort;
+static struct workqueue_struct *device_link_wq;
 
 /**
  * __fwnode_link_add - Create a link between two fwnode_handles.
@@ -532,12 +533,26 @@ static void devlink_dev_release(struct device *dev)
 	/*
 	 * It may take a while to complete this work because of the SRCU
 	 * synchronization in device_link_release_fn() and if the consumer or
-	 * supplier devices get deleted when it runs, so put it into the "long"
-	 * workqueue.
+	 * supplier devices get deleted when it runs, so put it into the
+	 * dedicated workqueue.
 	 */
-	queue_work(system_long_wq, &link->rm_work);
+	queue_work(device_link_wq, &link->rm_work);
 }
 
+/**
+ * device_link_wait_removal - Wait for ongoing devlink removal jobs to terminate
+ */
+void device_link_wait_removal(void)
+{
+	/*
+	 * devlink removal jobs are queued in the dedicated work queue.
+	 * To be sure that all removal jobs are terminated, ensure that any
+	 * scheduled work has run to completion.
+	 */
+	flush_workqueue(device_link_wq);
+}
+EXPORT_SYMBOL_GPL(device_link_wait_removal);
+
 static struct class devlink_class = {
 	.name = "devlink",
 	.dev_groups = devlink_groups,
@@ -4099,9 +4114,14 @@ int __init devices_init(void)
 	sysfs_dev_char_kobj = kobject_create_and_add("char", dev_kobj);
 	if (!sysfs_dev_char_kobj)
 		goto char_kobj_err;
+	device_link_wq = alloc_workqueue("device_link_wq", 0, 0);
+	if (!device_link_wq)
+		goto wq_err;
 
 	return 0;
 
+ wq_err:
+	kobject_put(sysfs_dev_char_kobj);
  char_kobj_err:
 	kobject_put(sysfs_dev_block_kobj);
  block_kobj_err:
diff --git a/include/linux/device.h b/include/linux/device.h
index 1795121dee9a..d7d8305a72e8 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -1249,6 +1249,7 @@ void device_link_del(struct device_link *link);
 void device_link_remove(void *consumer, struct device *supplier);
 void device_links_supplier_sync_state_pause(void);
 void device_links_supplier_sync_state_resume(void);
+void device_link_wait_removal(void);
 
 /* Create alias, so I can be autoloaded. */
 #define MODULE_ALIAS_CHARDEV(major,minor) \
-- 
2.43.0
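The ordering problem that device_link_wait_removal() addresses can be reproduced with a toy, single-threaded workqueue. Everything below is hypothetical and greatly simplified: jobs run only when `toy_flush()` is called, whereas the kernel's flush_workqueue() waits for worker threads to finish them; the guarantee on return is the same in both cases, namely that every job queued before the call has completed. The refcount is a plain int standing in for the of_node reference that the devlink removal job drops.

```c
#include <assert.h>
#include <stddef.h>

/* Toy, single-threaded stand-in for a workqueue with a fixed capacity. */
#define TOY_WQ_MAX 16

struct toy_work {
	void (*fn)(void *arg);
	void *arg;
};

struct toy_wq {
	struct toy_work jobs[TOY_WQ_MAX];
	size_t head, tail;
};

/* Analogue of queue_work(); returns -1 when the queue is full. */
static int toy_queue_work(struct toy_wq *wq, void (*fn)(void *arg), void *arg)
{
	if (wq->tail == TOY_WQ_MAX)
		return -1;
	wq->jobs[wq->tail].fn = fn;
	wq->jobs[wq->tail].arg = arg;
	wq->tail++;
	return 0;
}

/* Analogue of flush_workqueue(): on return, every queued job has run. */
static void toy_flush(struct toy_wq *wq)
{
	while (wq->head < wq->tail) {
		struct toy_work *w = &wq->jobs[wq->head++];

		w->fn(w->arg);
	}
	wq->head = wq->tail = 0;
}

/* Hypothetical devlink removal job: drops the reference it holds on a node. */
static void put_node_job(void *arg)
{
	int *node_refcount = arg;

	(*node_refcount)--;
}
```

Checking the node right after queueing the job still sees a refcount of 2, which is the situation the "expected refcount 1 instead of 2" warning reports; after the flush it drops to 1, which is what of_changeset_destroy() relies on by calling device_link_wait_removal() first.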

[PATCH wpan] mac802154: fix llsec key resources release in mac802154_llsec_key_del

by Fedor Pchelkin

mac802154_llsec_key_del() can free the resources of a key directly, without following the RCU rule of waiting for the end of a grace period. This may lead to a use-after-free if llsec_lookup_key() is traversing the list of keys in parallel with a key deletion:

refcount_t: addition on 0; use-after-free.
WARNING: CPU: 4 PID: 16000 at lib/refcount.c:25 refcount_warn_saturate+0x162/0x2a0
Modules linked in:
CPU: 4 PID: 16000 Comm: wpan-ping Not tainted 6.7.0 #19
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:refcount_warn_saturate+0x162/0x2a0
Call Trace:
 <TASK>
 llsec_lookup_key.isra.0+0x890/0x9e0
 mac802154_llsec_encrypt+0x30c/0x9c0
 ieee802154_subif_start_xmit+0x24/0x1e0
 dev_hard_start_xmit+0x13e/0x690
 sch_direct_xmit+0x2ae/0xbc0
 __dev_queue_xmit+0x11dd/0x3c20
 dgram_sendmsg+0x90b/0xd60
 __sys_sendto+0x466/0x4c0
 __x64_sys_sendto+0xe0/0x1c0
 do_syscall_64+0x45/0xf0
 entry_SYSCALL_64_after_hwframe+0x6e/0x76

Also, ieee802154_llsec_key_entry structures are not freed by mac802154_llsec_key_del():

unreferenced object 0xffff8880613b6980 (size 64):
  comm "iwpan", pid 2176, jiffies 4294761134 (age 60.475s)
  hex dump (first 32 bytes):
    78 0d 8f 18 80 88 ff ff 22 01 00 00 00 00 ad de  x.......".......
    00 00 00 00 00 00 00 00 03 00 cd ab 00 00 00 00  ................
  backtrace:
    [<ffffffff81dcfa62>] __kmem_cache_alloc_node+0x1e2/0x2d0
    [<ffffffff81c43865>] kmalloc_trace+0x25/0xc0
    [<ffffffff88968b09>] mac802154_llsec_key_add+0xac9/0xcf0
    [<ffffffff8896e41a>] ieee802154_add_llsec_key+0x5a/0x80
    [<ffffffff8892adc6>] nl802154_add_llsec_key+0x426/0x5b0
    [<ffffffff86ff293e>] genl_family_rcv_msg_doit+0x1fe/0x2f0
    [<ffffffff86ff46d1>] genl_rcv_msg+0x531/0x7d0
    [<ffffffff86fee7a9>] netlink_rcv_skb+0x169/0x440
    [<ffffffff86ff1d88>] genl_rcv+0x28/0x40
    [<ffffffff86fec15c>] netlink_unicast+0x53c/0x820
    [<ffffffff86fecd8b>] netlink_sendmsg+0x93b/0xe60
    [<ffffffff86b91b35>] ____sys_sendmsg+0xac5/0xca0
    [<ffffffff86b9c3dd>] ___sys_sendmsg+0x11d/0x1c0
    [<ffffffff86b9c65a>] __sys_sendmsg+0xfa/0x1d0
    [<ffffffff88eadbf5>] do_syscall_64+0x45/0xf0
    [<ffffffff890000ea>] entry_SYSCALL_64_after_hwframe+0x6e/0x76

Handle the proper resource release in the RCU callback function mac802154_llsec_key_del_rcu().

Note that if llsec_lookup_key() finds a key, it gets a refcount via llsec_key_get() and locally copies the key id from key_entry (which is a list element). So it's safe to call llsec_key_put() and free the list entry after the RCU grace period elapses.

Found by Linux Verification Center (linuxtesting.org).

Fixes: 5d637d5aabd8 ("mac802154: add llsec structures and mutators")
Cc: stable(a)vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru>
---
Should the patch be targeted to the "net" tree directly?
include/net/cfg802154.h | 1 + net/mac802154/llsec.c | 18 +++++++++++++----- 2 files changed, 14 insertions(+), 5 deletions(-) diff --git a/include/net/cfg802154.h b/include/net/cfg802154.h index cd95711b12b8..76d2cd2e2b30 100644 --- a/include/net/cfg802154.h +++ b/include/net/cfg802154.h @@ -401,6 +401,7 @@ struct ieee802154_llsec_key { struct ieee802154_llsec_key_entry { struct list_head list; + struct rcu_head rcu; struct ieee802154_llsec_key_id id; struct ieee802154_llsec_key *key; diff --git a/net/mac802154/llsec.c b/net/mac802154/llsec.c index 8d2eabc71bbe..f13b07ebfb98 100644 --- a/net/mac802154/llsec.c +++ b/net/mac802154/llsec.c @@ -265,19 +265,27 @@ int mac802154_llsec_key_add(struct mac802154_llsec *sec, return -ENOMEM; } +static void mac802154_llsec_key_del_rcu(struct rcu_head *rcu) +{ + struct ieee802154_llsec_key_entry *pos; + struct mac802154_llsec_key *mkey; + + pos = container_of(rcu, struct ieee802154_llsec_key_entry, rcu); + mkey = container_of(pos->key, struct mac802154_llsec_key, key); + + llsec_key_put(mkey); + kfree_sensitive(pos); +} + int mac802154_llsec_key_del(struct mac802154_llsec *sec, const struct ieee802154_llsec_key_id *key) { struct ieee802154_llsec_key_entry *pos; list_for_each_entry(pos, &sec->table.keys, list) { - struct mac802154_llsec_key *mkey; - - mkey = container_of(pos->key, struct mac802154_llsec_key, key); - if (llsec_key_id_equal(&pos->id, key)) { list_del_rcu(&pos->list); - llsec_key_put(mkey); + call_rcu(&pos->rcu, mac802154_llsec_key_del_rcu); return 0; } } -- 2.43.2
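The fix follows the usual RCU removal order. list_del_rcu() unlinks the entry so new lookups cannot find it. Dropping the key reference and freeing the entry are then deferred to a call_rcu() callback, which runs only after readers that may still hold the entry have finished.

Below is a single-threaded userspace model of that ordering. It assumes a toy grace-period tracker (a reader counter plus a callback queue) in place of real RCU. All toy_* names are hypothetical. Real code would use the kernel RCU API or a userspace RCU library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical toy RCU: a reader counter plus a queue of deferred callbacks. */
struct toy_rcu_head {
	struct toy_rcu_head *next;
	void (*func)(struct toy_rcu_head *head);
};

static int toy_readers;
static struct toy_rcu_head *toy_pending;

static void toy_rcu_read_lock(void) { toy_readers++; }
static void toy_rcu_read_unlock(void) { toy_readers--; }

static void toy_call_rcu(struct toy_rcu_head *head,
			 void (*func)(struct toy_rcu_head *))
{
	head->func = func;
	head->next = toy_pending;
	toy_pending = head;
}

/*
 * Conservative simplification: the grace period ends only when no reader
 * at all is active; then every queued callback runs.
 */
static bool toy_rcu_try_end_grace_period(void)
{
	if (toy_readers > 0)
		return false;
	while (toy_pending) {
		struct toy_rcu_head *head = toy_pending;

		toy_pending = head->next;
		head->func(head);
	}
	return true;
}

struct toy_key {
	int refcount;
	bool released;	/* set instead of freeing so callers can observe it */
};

struct toy_key_entry {
	struct toy_key_entry *next;	/* singly linked key table */
	struct toy_rcu_head rcu;
	int id;
	struct toy_key *key;
};

static int entries_freed;

static void toy_key_put(struct toy_key *key)
{
	if (--key->refcount == 0)
		key->released = true;
}

/* Deferred release, in the role of mac802154_llsec_key_del_rcu(). */
static void toy_key_del_rcu(struct toy_rcu_head *rcu)
{
	struct toy_key_entry *pos = container_of(rcu, struct toy_key_entry, rcu);

	toy_key_put(pos->key);
	free(pos);
	entries_freed++;
}

/* Unlink now, release after the grace period. Returns -1 if id is absent. */
static int toy_key_del(struct toy_key_entry **table, int id)
{
	for (struct toy_key_entry **pp = table; *pp; pp = &(*pp)->next) {
		struct toy_key_entry *pos = *pp;

		if (pos->id == id) {
			*pp = pos->next;	/* new lookups can no longer find pos */
			toy_call_rcu(&pos->rcu, toy_key_del_rcu);
			return 0;
		}
	}
	return -1;
}
```

A reader that found the entry before toy_key_del() can keep dereferencing it. The entry and its key reference are released only once toy_rcu_try_end_grace_period() sees no active readers. The pre-fix code instead dropped the key reference immediately and never freed the list entry.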

[merged mm-stable] mm-swap-fix-race-between-free_swap_and_cache-and-swapoff.patch removed from -mm tree

by Andrew Morton

The quilt patch titled
     Subject: mm: swap: fix race between free_swap_and_cache() and swapoff()
has been removed from the -mm tree. Its filename was
     mm-swap-fix-race-between-free_swap_and_cache-and-swapoff.patch

This patch was dropped because it was merged into the mm-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

------------------------------------------------------
From: Ryan Roberts <ryan.roberts(a)arm.com>
Subject: mm: swap: fix race between free_swap_and_cache() and swapoff()
Date: Wed, 6 Mar 2024 14:03:56 +0000

There was previously a theoretical window where swapoff() could run and tear down a swap_info_struct while a call to free_swap_and_cache() was running in another thread. This could cause, amongst other bad possibilities, swap_page_trans_huge_swapped() (called by free_swap_and_cache()) to access the freed memory for swap_map.

This is a theoretical problem and I haven't been able to provoke it from a test case. But there has been agreement based on code review that this is possible (see link below).

Fix it by using get_swap_device()/put_swap_device(), which will stall swapoff(). There was an extra check in _swap_info_get() to confirm that the swap entry was not free. This isn't present in get_swap_device() because it doesn't make sense in general due to the race between getting the reference and swapoff. So I've added an equivalent check directly in free_swap_and_cache().

Details of how to provoke one possible issue (thanks to David Hildenbrand for deriving this):

--8<-----

__swap_entry_free() might be the last user and result in "count == SWAP_HAS_CACHE".

swapoff->try_to_unuse() will stop as soon as si->inuse_pages==0.

So the question is: could someone reclaim the folio and turn si->inuse_pages==0, before we completed swap_page_trans_huge_swapped().

Imagine the following: 2 MiB folio in the swapcache. Only 2 subpages are still referenced by swap entries.

Process 1 still references subpage 0 via swap entry.
Process 2 still references subpage 1 via swap entry.

Process 1 quits. Calls free_swap_and_cache().
-> count == SWAP_HAS_CACHE
[then, preempted in the hypervisor etc.]

Process 2 quits. Calls free_swap_and_cache().
-> count == SWAP_HAS_CACHE

Process 2 goes ahead, passes swap_page_trans_huge_swapped(), and calls __try_to_reclaim_swap().

__try_to_reclaim_swap()->folio_free_swap()->delete_from_swap_cache()->
put_swap_folio()->free_swap_slot()->swapcache_free_entries()->
swap_entry_free()->swap_range_free()->
...
WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);

What stops swapoff from succeeding after process 2 reclaimed the swap cache but before process 1 finished its call to swap_page_trans_huge_swapped()?

--8<-----

Link: https://lkml.kernel.org/r/20240306140356.3974886-1-ryan.roberts@arm.com
Fixes: 7c00bafee87c ("mm/swap: free swap slots in batch")
Closes: https://lore.kernel.org/linux-mm/65a66eb9-41f8-4790-8db2-0c70ea15979f@redha…
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 mm/swapfile.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

--- a/mm/swapfile.c~mm-swap-fix-race-between-free_swap_and_cache-and-swapoff
+++ a/mm/swapfile.c
@@ -1232,6 +1232,11 @@ static unsigned char __swap_entry_free_l
  * with get_swap_device() and put_swap_device(), unless the swap
  * functions call get/put_swap_device() by themselves.
  *
+ * Note that when only holding the PTL, swapoff might succeed immediately
+ * after freeing a swap entry. Therefore, immediately after
+ * __swap_entry_free(), the swap info might become stale and should not
+ * be touched without a prior get_swap_device().
+ *
  * Check whether swap entry is valid in the swap device. If so,
  * return pointer to swap_info_struct, and keep the swap entry valid
  * via preventing the swap device from being swapoff, until
@@ -1609,13 +1614,19 @@ int free_swap_and_cache(swp_entry_t entr
 	if (non_swap_entry(entry))
 		return 1;
 
-	p = _swap_info_get(entry);
+	p = get_swap_device(entry);
 	if (p) {
+		if (WARN_ON(data_race(!p->swap_map[swp_offset(entry)]))) {
+			put_swap_device(p);
+			return 0;
+		}
+
 		count = __swap_entry_free(p, entry);
 		if (count == SWAP_HAS_CACHE &&
 		    !swap_page_trans_huge_swapped(p, entry))
 			__try_to_reclaim_swap(p, swp_offset(entry),
 					      TTRS_UNMAPPED | TTRS_FULL);
+		put_swap_device(p);
 	}
 	return p != NULL;
 }
_

Patches currently in -mm which might be from ryan.roberts(a)arm.com are
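The fix pins the swap device for the whole body of free_swap_and_cache(). get_swap_device() takes a reference that swapoff() must wait for. The new swap_map re-check replaces the "entry is not free" test that _swap_info_get() used to perform.

Below is a single-threaded userspace sketch of that get/check/work/put shape. The toy_* names are hypothetical. The sketch replaces the kernel's per-CPU reference and the blocking wait in swapoff() with a plain counter and a function that only reports whether teardown could finish.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model of a swap device and its users. */
struct toy_swap_info {
	bool alive;		/* cleared once swapoff starts */
	int users;		/* pins held via toy_get_swap_device() */
	unsigned char map[8];	/* per-slot use counts, in the role of swap_map */
};

/* Take a pin, or fail if swapoff has already started. */
static struct toy_swap_info *toy_get_swap_device(struct toy_swap_info *si)
{
	if (!si->alive)
		return NULL;
	si->users++;
	return si;
}

static void toy_put_swap_device(struct toy_swap_info *si)
{
	si->users--;
}

/*
 * Swapoff may tear the structure down only after every pin is dropped.
 * The kernel blocks at this point; the model just reports whether it could finish.
 */
static bool toy_swapoff_can_finish(struct toy_swap_info *si)
{
	si->alive = false;
	return si->users == 0;
}

/* Same get/check/work/put shape as the patched free_swap_and_cache(). */
static int toy_free_entry(struct toy_swap_info *si, size_t offset)
{
	struct toy_swap_info *p = toy_get_swap_device(si);

	if (!p)
		return 0;
	if (p->map[offset] == 0) {	/* already free: the patch WARNs here */
		toy_put_swap_device(p);
		return 0;
	}
	p->map[offset]--;
	/* Any further use of p is safe here: swapoff cannot complete. */
	toy_put_swap_device(p);
	return 1;
}
```

While any caller holds a pin, toy_swapoff_can_finish() reports that teardown must wait, and no new pins can be taken once swapoff has started. This is the guarantee process 1 in the scenario above was missing between __swap_entry_free() and swap_page_trans_huge_swapped().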

+ mm-swap-fix-race-between-free_swap_and_cache-and-swapoff.patch added to mm-unstable branch

by Andrew Morton

The patch titled
     Subject: mm: swap: fix race between free_swap_and_cache() and swapoff()
has been added to the -mm mm-unstable branch. Its filename is
     mm-swap-fix-race-between-free_swap_and_cache-and-swapoff.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…

This patch will later appear in the mm-unstable branch at
     git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days

------------------------------------------------------
From: Ryan Roberts <ryan.roberts(a)arm.com>
Subject: mm: swap: fix race between free_swap_and_cache() and swapoff()
Date: Wed, 6 Mar 2024 14:03:56 +0000

There was previously a theoretical window where swapoff() could run and tear down a swap_info_struct while a call to free_swap_and_cache() was running in another thread. This could cause, amongst other bad possibilities, swap_page_trans_huge_swapped() (called by free_swap_and_cache()) to access the freed memory for swap_map.

This is a theoretical problem and I haven't been able to provoke it from a test case. But there has been agreement based on code review that this is possible (see link below).

Fix it by using get_swap_device()/put_swap_device(), which will stall swapoff(). There was an extra check in _swap_info_get() to confirm that the swap entry was not free. This isn't present in get_swap_device() because it doesn't make sense in general due to the race between getting the reference and swapoff. So I've added an equivalent check directly in free_swap_and_cache().

Details of how to provoke one possible issue (thanks to David Hildenbrand for deriving this):

--8<-----

__swap_entry_free() might be the last user and result in "count == SWAP_HAS_CACHE".

swapoff->try_to_unuse() will stop as soon as si->inuse_pages==0.

So the question is: could someone reclaim the folio and turn si->inuse_pages==0, before we completed swap_page_trans_huge_swapped().

Imagine the following: 2 MiB folio in the swapcache. Only 2 subpages are still referenced by swap entries.

Process 1 still references subpage 0 via swap entry.
Process 2 still references subpage 1 via swap entry.

Process 1 quits. Calls free_swap_and_cache().
-> count == SWAP_HAS_CACHE
[then, preempted in the hypervisor etc.]

Process 2 quits. Calls free_swap_and_cache().
-> count == SWAP_HAS_CACHE

Process 2 goes ahead, passes swap_page_trans_huge_swapped(), and calls __try_to_reclaim_swap().

__try_to_reclaim_swap()->folio_free_swap()->delete_from_swap_cache()->
put_swap_folio()->free_swap_slot()->swapcache_free_entries()->
swap_entry_free()->swap_range_free()->
...
WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);

What stops swapoff from succeeding after process 2 reclaimed the swap cache but before process 1 finished its call to swap_page_trans_huge_swapped()?

--8<-----

Link: https://lkml.kernel.org/r/20240306140356.3974886-1-ryan.roberts@arm.com
Fixes: 7c00bafee87c ("mm/swap: free swap slots in batch")
Closes: https://lore.kernel.org/linux-mm/65a66eb9-41f8-4790-8db2-0c70ea15979f@redha…
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 mm/swapfile.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

--- a/mm/swapfile.c~mm-swap-fix-race-between-free_swap_and_cache-and-swapoff
+++ a/mm/swapfile.c
@@ -1232,6 +1232,11 @@ static unsigned char __swap_entry_free_l
  * with get_swap_device() and put_swap_device(), unless the swap
  * functions call get/put_swap_device() by themselves.
  *
+ * Note that when only holding the PTL, swapoff might succeed immediately
+ * after freeing a swap entry. Therefore, immediately after
+ * __swap_entry_free(), the swap info might become stale and should not
+ * be touched without a prior get_swap_device().
+ *
  * Check whether swap entry is valid in the swap device. If so,
  * return pointer to swap_info_struct, and keep the swap entry valid
  * via preventing the swap device from being swapoff, until
@@ -1609,13 +1614,19 @@ int free_swap_and_cache(swp_entry_t entr
 	if (non_swap_entry(entry))
 		return 1;
 
-	p = _swap_info_get(entry);
+	p = get_swap_device(entry);
 	if (p) {
+		if (WARN_ON(data_race(!p->swap_map[swp_offset(entry)]))) {
+			put_swap_device(p);
+			return 0;
+		}
+
 		count = __swap_entry_free(p, entry);
 		if (count == SWAP_HAS_CACHE &&
 		    !swap_page_trans_huge_swapped(p, entry))
 			__try_to_reclaim_swap(p, swp_offset(entry),
 					      TTRS_UNMAPPED | TTRS_FULL);
+		put_swap_device(p);
 	}
 	return p != NULL;
 }
_

Patches currently in -mm which might be from ryan.roberts(a)arm.com are

mm-swap-fix-race-between-free_swap_and_cache-and-swapoff.patch
