This is the start of the stable review cycle for the 6.1.112 release. There are 73 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 29 Sep 2024 12:17:00 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.112-rc1...
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.1.112-rc1
Edward Adam Davis eadavis@qq.com USB: usbtmc: prevent kernel-usb-infoleak
Junhao Xie bigfoot@classfun.cn USB: serial: pl2303: add device id for Macrosilicon MS3020
Tony Luck tony.luck@intel.com x86/mm: Switch to new Intel CPU model defines
Sumeet Pawnikar sumeet.r.pawnikar@intel.com powercap: RAPL: fix invalid initialization for pl4_supported field
Filipe Manana fdmanana@suse.com btrfs: calculate the right space for delayed refs when updating global reserve
Matthieu Baerts (NGI0) matttbe@kernel.org selftests: mptcp: join: restrict fullmesh endp on 1st sf
Marc Kleine-Budde mkl@pengutronix.de can: mcp251xfd: move mcp251xfd_timestamp_start()/stop() into mcp251xfd_chip_start/stop()
Marc Kleine-Budde mkl@pengutronix.de can: mcp251xfd: properly indent labels
Hagar Hemdan hagarhem@amazon.com gpio: prevent potential speculation leaks in gpio_device_get_desc()
Kent Gibson warthog618@gmail.com gpiolib: cdev: Ignore reconfiguration without direction
Ping-Ke Shih pkshih@realtek.com Revert "wifi: cfg80211: check wiphy mutex is held for wdev mutex"
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: missing iterator type in lookup walk
Pablo Neira Ayuso pablo@netfilter.org netfilter: nft_set_pipapo: walk over current view on netlink dump
Dan Carpenter dan.carpenter@linaro.org netfilter: nft_socket: Fix a NULL vs IS_ERR() bug in nft_socket_cgroup_subtree_level()
Florian Westphal fw@strlen.de netfilter: nft_socket: make cgroupsv2 matching work with namespaces
Dave Chinner dchinner@redhat.com xfs: journal geometry is not properly bounds checked
Darrick J. Wong djwong@kernel.org xfs: set bnobt/cntbt numrecs correctly when formatting new AGs
Darrick J. Wong djwong@kernel.org xfs: fix reloading entire unlinked bucket lists
Darrick J. Wong djwong@kernel.org xfs: make inode unlinked bucket recovery work with quotacheck
Darrick J. Wong djwong@kernel.org xfs: reload entire unlinked bucket lists
Darrick J. Wong djwong@kernel.org xfs: use i_prev_unlinked to distinguish inodes that are not on the unlinked list
Shiyang Ruan ruansy.fnst@fujitsu.com xfs: correct calculation for agend and blockcount
Dave Chinner dchinner@redhat.com xfs: fix unlink vs cluster buffer instantiation race
Darrick J. Wong djwong@kernel.org xfs: fix negative array access in xfs_getbmap
Darrick J. Wong djwong@kernel.org xfs: load uncached unlinked inodes into memory on demand
Shiyang Ruan ruansy.fnst@fujitsu.com xfs: fix the calculation for "end" and "length"
Dave Chinner dchinner@redhat.com xfs: remove WARN when dquot cache insertion fails
Long Li leo.lilong@huaweicloud.com xfs: fix ag count overflow during growfs
Dave Chinner dchinner@redhat.com xfs: collect errors from inodegc for unlinked inode recovery
Dave Chinner dchinner@redhat.com xfs: fix AGF vs inode cluster buffer deadlock
Dave Chinner dchinner@redhat.com xfs: defered work could create precommits
Dave Chinner dchinner@redhat.com xfs: buffer pins need to hold a buffer reference
Ye Bin yebin10@huawei.com xfs: fix BUG_ON in xfs_getbmap()
Dave Chinner dchinner@redhat.com xfs: quotacheck failure can race with background inode inactivation
Darrick J. Wong djwong@kernel.org xfs: fix uninitialized variable access
Dave Chinner dchinner@redhat.com xfs: block reservation too large for minleft allocation
Dave Chinner dchinner@redhat.com xfs: prefer free inodes at ENOSPC over chunk allocation
Dave Chinner dchinner@redhat.com xfs: fix low space alloc deadlock
Dave Chinner dchinner@redhat.com xfs: don't use BMBT btree split workers for IO completion
Wengang Wang wen.gang.wang@oracle.com xfs: fix extent busy updating
Wu Guanghao wuguanghao3@huawei.com xfs: Fix deadlock on xfs_inodegc_worker
Dave Chinner dchinner@redhat.com xfs: dquot shrinker doesn't check for XFS_DQFLAG_FREEING
Ferry Meng mengferry@linux.alibaba.com ocfs2: strict bound check before memcmp in ocfs2_xattr_find_entry()
Ferry Meng mengferry@linux.alibaba.com ocfs2: add bounds checking to ocfs2_xattr_find_entry()
Geert Uytterhoeven geert+renesas@glider.be spi: spidev: Add missing spi_device_id for jg10309-01
Hongyu Jin hongyu.jin@unisoc.com block: Fix where bio IO priority gets set
zhang jiao zhangjiao2@cmss.chinamobile.com tools: hv: rm .*.cmd when make clean
Michael Kelley mhklinux@outlook.com x86/hyperv: Set X86_FEATURE_TSC_KNOWN_FREQ when Hyper-V provides frequency
Paulo Alcantara pc@manguebit.com smb: client: fix hang in wait_for_response() for negproto
Liao Chen liaochen4@huawei.com spi: bcm63xx: Enable module autoloading
hongchi.peng hongchi.peng@siengine.com drm: komeda: Fix an issue related to normalized zpos
Fabio Estevam festevam@gmail.com spi: spidev: Add an entry for elgin,jg10309-01
Liao Chen liaochen4@huawei.com ASoC: tda7419: fix module autoloading
Liao Chen liaochen4@huawei.com ASoC: intel: fix module autoloading
Hans de Goede hdegoede@redhat.com ASoC: Intel: soc-acpi-cht: Make Lenovo Yoga Tab 3 X90F DMI match less strict
Marc Kleine-Budde mkl@pengutronix.de can: mcp251xfd: mcp251xfd_ring_init(): check TX-coalescing configuration
Emmanuel Grumbach emmanuel.grumbach@intel.com wifi: iwlwifi: clear trans->state earlier upon error
Dmitry Antipov dmantipov@yandex.ru wifi: mac80211: free skb on error path in ieee80211_beacon_get_ap()
Emmanuel Grumbach emmanuel.grumbach@intel.com wifi: iwlwifi: mvm: don't wait for tx queues if firmware is dead
Emmanuel Grumbach emmanuel.grumbach@intel.com wifi: iwlwifi: mvm: pause TCM when the firmware is stopped
Daniel Gabay daniel.gabay@intel.com wifi: iwlwifi: mvm: fix iwl_mvm_scan_fits() calculation
Benjamin Berg benjamin.berg@intel.com wifi: iwlwifi: lower message level for FW buffer destination
Huacai Chen chenhuacai@kernel.org LoongArch: Define ARCH_IRQ_INIT_FLAGS as IRQ_NOPROBE
Jacky Chou jacky_chou@aspeedtech.com net: ftgmac100: Ensure tx descriptor updates are visible
Mike Rapoport rppt@kernel.org microblaze: don't treat zero reserved memory regions as error
Ross Brown true.robot.ross@gmail.com hwmon: (asus-ec-sensors) remove VRM temp X570-E GAMING
Thomas Blocher thomas.blocher@ek-dev.de pinctrl: at91: make it work with current gpiolib
Sherry Yang sherry.yang@oracle.com scsi: lpfc: Fix overflow build issue
Kailang Yang kailang@realtek.com ALSA: hda/realtek - FIxed ALC285 headphone no sound
Kailang Yang kailang@realtek.com ALSA: hda/realtek - Fixed ALC256 headphone no sound
Hongbo Li lihongbo22@huawei.com ASoC: allow module autoloading for table board_ids
Hongbo Li lihongbo22@huawei.com ASoC: allow module autoloading for table db1200_pids
Albert Jakieła jakiela@google.com ASoC: SOF: mediatek: Add missing board compatible
-------------
Diffstat:
 Makefile | 4 +-
 arch/loongarch/include/asm/hw_irq.h | 2 +
 arch/loongarch/kernel/irq.c | 3 -
 arch/microblaze/mm/init.c | 5 -
 arch/x86/kernel/cpu/mshyperv.c | 1 +
 arch/x86/mm/init.c | 16 +-
 block/blk-core.c | 10 +
 block/blk-mq.c | 10 -
 drivers/gpio/gpiolib-cdev.c | 12 +-
 drivers/gpio/gpiolib.c | 3 +-
 drivers/gpu/drm/arm/display/komeda/komeda_kms.c | 10 +-
 drivers/hwmon/asus-ec-sensors.c | 2 +-
 drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c | 42 ++--
 drivers/net/can/spi/mcp251xfd/mcp251xfd-dump.c | 2 +-
 drivers/net/can/spi/mcp251xfd/mcp251xfd-regmap.c | 2 +-
 drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c | 14 +-
 drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c | 2 +-
 .../net/can/spi/mcp251xfd/mcp251xfd-timestamp.c | 7 +-
 drivers/net/can/spi/mcp251xfd/mcp251xfd.h | 1 +
 drivers/net/ethernet/faraday/ftgmac100.c | 26 ++-
 drivers/net/wireless/intel/iwlwifi/fw/dbg.c | 2 +-
 drivers/net/wireless/intel/iwlwifi/iwl-trans.h | 2 +-
 drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c | 9 +-
 drivers/net/wireless/intel/iwlwifi/mvm/ops.c | 2 +
 drivers/net/wireless/intel/iwlwifi/mvm/scan.c | 23 +-
 .../wireless/intel/iwlwifi/pcie/ctxt-info-gen3.c | 3 +-
 drivers/pinctrl/pinctrl-at91.c | 5 +-
 drivers/powercap/intel_rapl_msr.c | 12 +-
 drivers/scsi/lpfc/lpfc_bsg.c | 2 +-
 drivers/spi/spi-bcm63xx.c | 1 +
 drivers/spi/spidev.c | 2 +
 drivers/usb/class/usbtmc.c | 2 +-
 drivers/usb/serial/pl2303.c | 1 +
 drivers/usb/serial/pl2303.h | 4 +
 fs/btrfs/block-rsv.c | 14 +-
 fs/btrfs/block-rsv.h | 12 +
 fs/btrfs/delayed-ref.h | 21 ++
 fs/ocfs2/xattr.c | 27 ++-
 fs/smb/client/connect.c | 14 +-
 fs/xfs/libxfs/xfs_ag.c | 19 +-
 fs/xfs/libxfs/xfs_alloc.c | 69 +++++-
 fs/xfs/libxfs/xfs_bmap.c | 16 +-
 fs/xfs/libxfs/xfs_bmap.h | 2 +
 fs/xfs/libxfs/xfs_bmap_btree.c | 19 +-
 fs/xfs/libxfs/xfs_btree.c | 18 +-
 fs/xfs/libxfs/xfs_fs.h | 2 +
 fs/xfs/libxfs/xfs_ialloc.c | 17 ++
 fs/xfs/libxfs/xfs_log_format.h | 9 +-
 fs/xfs/libxfs/xfs_sb.c | 56 ++++-
 fs/xfs/libxfs/xfs_trans_inode.c | 113 +--------
 fs/xfs/xfs_attr_inactive.c | 1 -
 fs/xfs/xfs_bmap_util.c | 18 +-
 fs/xfs/xfs_buf_item.c | 88 +++++--
 fs/xfs/xfs_dquot.c | 1 -
 fs/xfs/xfs_export.c | 14 ++
 fs/xfs/xfs_extent_busy.c | 1 +
 fs/xfs/xfs_fsmap.c | 1 +
 fs/xfs/xfs_fsops.c | 13 +-
 fs/xfs/xfs_icache.c | 58 ++++-
 fs/xfs/xfs_icache.h | 4 +-
 fs/xfs/xfs_inode.c | 260 ++++++++++++++++++---
 fs/xfs/xfs_inode.h | 36 ++-
 fs/xfs/xfs_inode_item.c | 149 ++++++++++++
 fs/xfs/xfs_inode_item.h | 1 +
 fs/xfs/xfs_itable.c | 11 +
 fs/xfs/xfs_log.c | 47 ++--
 fs/xfs/xfs_log_recover.c | 19 +-
 fs/xfs/xfs_mount.h | 11 +-
 fs/xfs/xfs_notify_failure.c | 15 +-
 fs/xfs/xfs_qm.c | 72 ++++--
 fs/xfs/xfs_super.c | 1 +
 fs/xfs/xfs_trace.h | 46 ++++
 fs/xfs/xfs_trans.c | 9 +-
 include/net/netfilter/nf_tables.h | 13 ++
 net/mac80211/tx.c | 4 +-
 net/netfilter/nf_tables_api.c | 5 +
 net/netfilter/nft_lookup.c | 1 +
 net/netfilter/nft_set_pipapo.c | 6 +-
 net/netfilter/nft_socket.c | 41 +++-
 net/wireless/core.h | 8 +-
 sound/pci/hda/patch_realtek.c | 76 ++++--
 sound/soc/amd/acp/acp-sof-mach.c | 2 +
 sound/soc/au1x/db1200.c | 1 +
 sound/soc/codecs/tda7419.c | 1 +
 sound/soc/intel/common/soc-acpi-intel-cht-match.c | 1 -
 sound/soc/intel/keembay/kmb_platform.c | 1 +
 sound/soc/sof/mediatek/mt8195/mt8195.c | 3 +
 tools/hv/Makefile | 2 +-
 tools/testing/selftests/net/mptcp/mptcp_join.sh | 4 +-
 89 files changed, 1255 insertions(+), 462 deletions(-)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Albert Jakieła jakiela@google.com
[ Upstream commit c0196faaa927321a63e680427e075734ee656e42 ]
Add Google Dojo compatible.
Signed-off-by: Albert Jakieła jakiela@google.com
Reviewed-by: Chen-Yu Tsai wenst@chromium.org
Link: https://patch.msgid.link/20240809135627.544429-1-jakiela@google.com
Signed-off-by: Mark Brown broonie@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 sound/soc/sof/mediatek/mt8195/mt8195.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/sound/soc/sof/mediatek/mt8195/mt8195.c b/sound/soc/sof/mediatek/mt8195/mt8195.c
index 53cadbe8a05cc..ac96ea07e591b 100644
--- a/sound/soc/sof/mediatek/mt8195/mt8195.c
+++ b/sound/soc/sof/mediatek/mt8195/mt8195.c
@@ -663,6 +663,9 @@ static struct snd_sof_of_mach sof_mt8195_machs[] = {
 	{
 		.compatible = "google,tomato",
 		.sof_tplg_filename = "sof-mt8195-mt6359-rt1019-rt5682.tplg"
+	}, {
+		.compatible = "google,dojo",
+		.sof_tplg_filename = "sof-mt8195-mt6359-max98390-rt5682.tplg"
 	}, {
 		.compatible = "mediatek,mt8195",
 		.sof_tplg_filename = "sof-mt8195.tplg"
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hongbo Li lihongbo22@huawei.com
[ Upstream commit 0e9fdab1e8df490354562187cdbb8dec643eae2c ]
Add MODULE_DEVICE_TABLE(), so modules could be properly autoloaded based on the alias from platform_device_id table.
Signed-off-by: Hongbo Li lihongbo22@huawei.com
Link: https://patch.msgid.link/20240821061955.2273782-2-lihongbo22@huawei.com
Signed-off-by: Mark Brown broonie@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 sound/soc/au1x/db1200.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sound/soc/au1x/db1200.c b/sound/soc/au1x/db1200.c
index 400eaf9f8b140..f185711180cb4 100644
--- a/sound/soc/au1x/db1200.c
+++ b/sound/soc/au1x/db1200.c
@@ -44,6 +44,7 @@ static const struct platform_device_id db1200_pids[] = {
 	},
 	{},
 };
+MODULE_DEVICE_TABLE(platform, db1200_pids);

 /*------------------------- AC97 PART ---------------------------*/
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hongbo Li lihongbo22@huawei.com
[ Upstream commit 5f7c98b7519a3a847d9182bd99d57ea250032ca1 ]
Add MODULE_DEVICE_TABLE(), so modules could be properly autoloaded based on the alias from platform_device_id table.
Signed-off-by: Hongbo Li lihongbo22@huawei.com
Link: https://patch.msgid.link/20240821061955.2273782-3-lihongbo22@huawei.com
Signed-off-by: Mark Brown broonie@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 sound/soc/amd/acp/acp-sof-mach.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sound/soc/amd/acp/acp-sof-mach.c b/sound/soc/amd/acp/acp-sof-mach.c
index 972600d271586..c594af432b3ee 100644
--- a/sound/soc/amd/acp/acp-sof-mach.c
+++ b/sound/soc/amd/acp/acp-sof-mach.c
@@ -152,6 +152,8 @@ static const struct platform_device_id board_ids[] = {
 	},
 	{ }
 };
+MODULE_DEVICE_TABLE(platform, board_ids);
+
 static struct platform_driver acp_asoc_audio = {
 	.driver = {
 		.name = "sof_mach",
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kailang Yang kailang@realtek.com
[ Upstream commit 9b82ff1362f50914c8292902e07be98a9f59d33d ]
On Dell platforms, plugging in a headphone or headset could occasionally result in no sound from the headphone. Replacing the depop procedure solves this issue.
Signed-off-by: Kailang Yang kailang@realtek.com
Link: https://lore.kernel.org/bb8e2de30d294dc287944efa0667685a@realtek.com
Signed-off-by: Takashi Iwai tiwai@suse.de
Signed-off-by: Sasha Levin sashal@kernel.org
---
 sound/pci/hda/patch_realtek.c | 50 ++++++++++++++++++++++++++---------
 1 file changed, 37 insertions(+), 13 deletions(-)

diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index d869d6ba96f3d..784dfdf0cd6f4 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -4928,6 +4928,30 @@ static void alc269_fixup_hp_line1_mic1_led(struct hda_codec *codec,
 	}
 }

+static void alc_hp_mute_disable(struct hda_codec *codec, unsigned int delay)
+{
+	if (delay <= 0)
+		delay = 75;
+	snd_hda_codec_write(codec, 0x21, 0,
+		    AC_VERB_SET_AMP_GAIN_MUTE, AMP_OUT_MUTE);
+	msleep(delay);
+	snd_hda_codec_write(codec, 0x21, 0,
+		    AC_VERB_SET_PIN_WIDGET_CONTROL, 0x0);
+	msleep(delay);
+}
+
+static void alc_hp_enable_unmute(struct hda_codec *codec, unsigned int delay)
+{
+	if (delay <= 0)
+		delay = 75;
+	snd_hda_codec_write(codec, 0x21, 0,
+		    AC_VERB_SET_PIN_WIDGET_CONTROL, PIN_OUT);
+	msleep(delay);
+	snd_hda_codec_write(codec, 0x21, 0,
+		    AC_VERB_SET_AMP_GAIN_MUTE, AMP_OUT_UNMUTE);
+	msleep(delay);
+}
+
 static const struct coef_fw alc225_pre_hsmode[] = {
 	UPDATE_COEF(0x4a, 1<<8, 0),
 	UPDATE_COEFEX(0x57, 0x05, 1<<14, 0),
@@ -5029,6 +5053,7 @@ static void alc_headset_mode_unplugged(struct hda_codec *codec)
 	case 0x10ec0236:
 	case 0x10ec0256:
 	case 0x19e58326:
+		alc_hp_mute_disable(codec, 75);
 		alc_process_coef_fw(codec, coef0256);
 		break;
 	case 0x10ec0234:
@@ -5300,6 +5325,7 @@ static void alc_headset_mode_default(struct hda_codec *codec)
 		alc_write_coef_idx(codec, 0x45, 0xc089);
 		msleep(50);
 		alc_process_coef_fw(codec, coef0256);
+		alc_hp_enable_unmute(codec, 75);
 		break;
 	case 0x10ec0234:
 	case 0x10ec0274:
@@ -5397,6 +5423,7 @@ static void alc_headset_mode_ctia(struct hda_codec *codec)
 	case 0x10ec0256:
 	case 0x19e58326:
 		alc_process_coef_fw(codec, coef0256);
+		alc_hp_enable_unmute(codec, 75);
 		break;
 	case 0x10ec0234:
 	case 0x10ec0274:
@@ -5512,6 +5539,7 @@ static void alc_headset_mode_omtp(struct hda_codec *codec)
 	case 0x10ec0256:
 	case 0x19e58326:
 		alc_process_coef_fw(codec, coef0256);
+		alc_hp_enable_unmute(codec, 75);
 		break;
 	case 0x10ec0234:
 	case 0x10ec0274:
@@ -5617,25 +5645,21 @@ static void alc_determine_headset_type(struct hda_codec *codec)
 		alc_write_coef_idx(codec, 0x06, 0x6104);
 		alc_write_coefex_idx(codec, 0x57, 0x3, 0x09a3);

-		snd_hda_codec_write(codec, 0x21, 0,
-			    AC_VERB_SET_AMP_GAIN_MUTE, AMP_OUT_MUTE);
-		msleep(80);
-		snd_hda_codec_write(codec, 0x21, 0,
-			    AC_VERB_SET_PIN_WIDGET_CONTROL, 0x0);
-
 		alc_process_coef_fw(codec, coef0255);
 		msleep(300);
 		val = alc_read_coef_idx(codec, 0x46);
 		is_ctia = (val & 0x0070) == 0x0070;
-
+		if (!is_ctia) {
+			alc_write_coef_idx(codec, 0x45, 0xe089);
+			msleep(100);
+			val = alc_read_coef_idx(codec, 0x46);
+			if ((val & 0x0070) == 0x0070)
+				is_ctia = false;
+			else
+				is_ctia = true;
+		}
 		alc_write_coefex_idx(codec, 0x57, 0x3, 0x0da3);
 		alc_update_coefex_idx(codec, 0x57, 0x5, 1<<14, 0);
-
-		snd_hda_codec_write(codec, 0x21, 0,
-			    AC_VERB_SET_PIN_WIDGET_CONTROL, PIN_OUT);
-		msleep(80);
-		snd_hda_codec_write(codec, 0x21, 0,
-			    AC_VERB_SET_AMP_GAIN_MUTE, AMP_OUT_UNMUTE);
 		break;
 	case 0x10ec0234:
 	case 0x10ec0274:
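The headset-type probe in the last hunk of this patch reads as a two-step decision: if the first read of coefficient 0x46 has all bits of mask 0x0070 set, the jack is treated as CTIA; otherwise a second read is taken after reprogramming, and the result is inverted. A minimal userspace sketch of just that decision, assuming the two register reads are passed in as plain integers (headset_is_ctia() is a hypothetical name, not a function in patch_realtek.c):

```c
#include <stdbool.h>

#define CTIA_MASK 0x0070u	/* mask used by the hunk in this patch */

/* Hypothetical helper: first and second are the two values read from
 * coefficient 0x46, before and after the reprogramming step. */
static bool headset_is_ctia(unsigned int first, unsigned int second)
{
	if ((first & CTIA_MASK) == CTIA_MASK)
		return true;	/* first probe already says CTIA */

	/* Second probe: the patch sets is_ctia to false when the mask
	 * matches and to true otherwise. */
	return (second & CTIA_MASK) != CTIA_MASK;
}
```

The sketch deliberately leaves out the register writes and the msleep() calls; it only mirrors the boolean logic visible in the diff.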
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kailang Yang kailang@realtek.com
[ Upstream commit 1fa7b099d60ad64f559bd3b8e3f0d94b2e015514 ]
On Dell platforms with ALC215, ALC285, ALC289, ALC225, ALC295 or ALC299, plugging in a headphone or headset could occasionally result in no sound from the headphone. Replacing the depop procedure solves this issue.
Signed-off-by: Kailang Yang kailang@realtek.com
Link: https://lore.kernel.org/d0de1b03fd174520945dde216d765223@realtek.com
Signed-off-by: Takashi Iwai tiwai@suse.de
Signed-off-by: Sasha Levin sashal@kernel.org
---
 sound/pci/hda/patch_realtek.c | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index 784dfdf0cd6f4..277303cbe96de 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -5088,6 +5088,7 @@ static void alc_headset_mode_unplugged(struct hda_codec *codec)
 	case 0x10ec0295:
 	case 0x10ec0289:
 	case 0x10ec0299:
+		alc_hp_mute_disable(codec, 75);
 		alc_process_coef_fw(codec, alc225_pre_hsmode);
 		alc_process_coef_fw(codec, coef0225);
 		break;
@@ -5313,6 +5314,7 @@ static void alc_headset_mode_default(struct hda_codec *codec)
 	case 0x10ec0299:
 		alc_process_coef_fw(codec, alc225_pre_hsmode);
 		alc_process_coef_fw(codec, coef0225);
+		alc_hp_enable_unmute(codec, 75);
 		break;
 	case 0x10ec0255:
 		alc_process_coef_fw(codec, coef0255);
@@ -5472,6 +5474,7 @@ static void alc_headset_mode_ctia(struct hda_codec *codec)
 			alc_process_coef_fw(codec, coef0225_2);
 		else
 			alc_process_coef_fw(codec, coef0225_1);
+		alc_hp_enable_unmute(codec, 75);
 		break;
 	case 0x10ec0867:
 		alc_update_coefex_idx(codec, 0x57, 0x5, 1<<14, 0);
@@ -5577,6 +5580,7 @@ static void alc_headset_mode_omtp(struct hda_codec *codec)
 	case 0x10ec0289:
 	case 0x10ec0299:
 		alc_process_coef_fw(codec, coef0225);
+		alc_hp_enable_unmute(codec, 75);
 		break;
 	}
 	codec_dbg(codec, "Headset jack set to Nokia-style headset mode.\n");
@@ -5736,12 +5740,6 @@ static void alc_determine_headset_type(struct hda_codec *codec)
 	case 0x10ec0295:
 	case 0x10ec0289:
 	case 0x10ec0299:
-		snd_hda_codec_write(codec, 0x21, 0,
-			    AC_VERB_SET_AMP_GAIN_MUTE, AMP_OUT_MUTE);
-		msleep(80);
-		snd_hda_codec_write(codec, 0x21, 0,
-			    AC_VERB_SET_PIN_WIDGET_CONTROL, 0x0);
-
 		alc_process_coef_fw(codec, alc225_pre_hsmode);
 		alc_update_coef_idx(codec, 0x67, 0xf000, 0x1000);
 		val = alc_read_coef_idx(codec, 0x45);
@@ -5758,15 +5756,19 @@ static void alc_determine_headset_type(struct hda_codec *codec)
 			val = alc_read_coef_idx(codec, 0x46);
 			is_ctia = (val & 0x00f0) == 0x00f0;
 		}
+		if (!is_ctia) {
+			alc_update_coef_idx(codec, 0x45, 0x3f<<10, 0x38<<10);
+			alc_update_coef_idx(codec, 0x49, 3<<8, 1<<8);
+			msleep(100);
+			val = alc_read_coef_idx(codec, 0x46);
+			if ((val & 0x00f0) == 0x00f0)
+				is_ctia = false;
+			else
+				is_ctia = true;
+		}
 		alc_update_coef_idx(codec, 0x4a, 7<<6, 7<<6);
 		alc_update_coef_idx(codec, 0x4a, 3<<4, 3<<4);
 		alc_update_coef_idx(codec, 0x67, 0xf000, 0x3000);
-
-		snd_hda_codec_write(codec, 0x21, 0,
-			    AC_VERB_SET_PIN_WIDGET_CONTROL, PIN_OUT);
-		msleep(80);
-		snd_hda_codec_write(codec, 0x21, 0,
-			    AC_VERB_SET_AMP_GAIN_MUTE, AMP_OUT_UNMUTE);
 		break;
 	case 0x10ec0867:
 		is_ctia = true;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sherry Yang sherry.yang@oracle.com
[ Upstream commit 3417c9574e368f0330637505f00d3814ca8854d2 ]
Build failed while enabling "CONFIG_GCOV_KERNEL=y" and "CONFIG_GCOV_PROFILE_ALL=y" with following error:
BUILDSTDERR: drivers/scsi/lpfc/lpfc_bsg.c: In function 'lpfc_get_cgnbuf_info':
BUILDSTDERR: ./include/linux/fortify-string.h:114:33: error: '__builtin_memcpy' accessing 18446744073709551615 bytes at offsets 0 and 0 overlaps 9223372036854775807 bytes at offset -9223372036854775808 [-Werror=restrict]
BUILDSTDERR:   114 | #define __underlying_memcpy     __builtin_memcpy
BUILDSTDERR:       |                                 ^
BUILDSTDERR: ./include/linux/fortify-string.h:637:9: note: in expansion of macro '__underlying_memcpy'
BUILDSTDERR:   637 |         __underlying_##op(p, q, __fortify_size);                        \
BUILDSTDERR:       |         ^~~~~~~~~~~~~
BUILDSTDERR: ./include/linux/fortify-string.h:682:26: note: in expansion of macro '__fortify_memcpy_chk'
BUILDSTDERR:   682 | #define memcpy(p, q, s)  __fortify_memcpy_chk(p, q, s,                  \
BUILDSTDERR:       |                          ^~~~~~~~~~~~~~~~~~~~
BUILDSTDERR: drivers/scsi/lpfc/lpfc_bsg.c:5468:9: note: in expansion of macro 'memcpy'
BUILDSTDERR:  5468 |         memcpy(cgn_buff, cp, cinfosz);
BUILDSTDERR:       |         ^~~~~~
This has happened since commit 06bb7fc0feee ("kbuild: turn on -Wrestrict by default"). Address this issue by using the size_t type.
Signed-off-by: Sherry Yang sherry.yang@oracle.com
Link: https://lore.kernel.org/r/20240821065131.1180791-1-sherry.yang@oracle.com
Reviewed-by: Justin Tee justin.tee@broadcom.com
Signed-off-by: Martin K. Petersen martin.petersen@oracle.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/scsi/lpfc/lpfc_bsg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_bsg.c b/drivers/scsi/lpfc/lpfc_bsg.c
index 2373dad016033..fc300febe9140 100644
--- a/drivers/scsi/lpfc/lpfc_bsg.c
+++ b/drivers/scsi/lpfc/lpfc_bsg.c
@@ -5409,7 +5409,7 @@ lpfc_get_cgnbuf_info(struct bsg_job *job)
 	struct get_cgnbuf_info_req *cgnbuf_req;
 	struct lpfc_cgn_info *cp;
 	uint8_t *cgn_buff;
-	int size, cinfosz;
+	size_t size, cinfosz;
 	int rc = 0;

 	if (job->request_len < sizeof(struct fc_bsg_request) +
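The byte count in the diagnostic quoted in this patch, 18446744073709551615, equals SIZE_MAX on a 64-bit target, which is also what a negative int turns into when it is converted to the size_t length parameter of memcpy(). One plausible reading, then, is that the compiler could not rule out a negative cinfosz while it was declared int. A minimal userspace sketch of that conversion, assuming a 64-bit size_t (as_copy_len() is a hypothetical helper, not part of lpfc):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: models handing an int length to a parameter of
 * type size_t, which is what happens in memcpy(dst, src, len). */
static size_t as_copy_len(int len)
{
	/* The conversion is modulo SIZE_MAX + 1, so a negative value
	 * wraps around to a very large unsigned one. */
	return (size_t)len;
}
```

Declaring the variable as size_t from the start, as the patch does, removes the signed range from the picture altogether.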
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Blocher thomas.blocher@ek-dev.de
[ Upstream commit 752f387faaae0ae2e84d3f496922524785e77d60 ]
pinctrl-at91 currently does not support the gpio-groups devicetree property and has no pin-range. Because of this, at91 GPIOs stopped working with commit 2ab73c6d8323fa1e ("gpio: Support GPIO controllers without pin-ranges"). This was discussed in the patches:
commit fc328a7d1fcce263 ("gpio: Revert regression in sysfs-gpio (gpiolib.c)")
commit 56e337f2cf132632 ("Revert "gpio: Revert regression in sysfs-gpio (gpiolib.c)"")

As a workaround, manually set the pin-range via gpiochip_add_pin_range() until either
a) pinctrl-at91 is reworked to support devicetree gpio-groups, or
b) another solution, as mentioned in commit 56e337f2cf132632 ("Revert "gpio: Revert regression in sysfs-gpio (gpiolib.c)""), is found.
Signed-off-by: Thomas Blocher thomas.blocher@ek-dev.de
Link: https://lore.kernel.org/5b992862-355d-f0de-cd3d-ff99e67a4ff1@ek-dev.de
Signed-off-by: Linus Walleij linus.walleij@linaro.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/pinctrl/pinctrl-at91.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/pinctrl/pinctrl-at91.c b/drivers/pinctrl/pinctrl-at91.c
index ff3b6a8a0b170..333f9d70c7f48 100644
--- a/drivers/pinctrl/pinctrl-at91.c
+++ b/drivers/pinctrl/pinctrl-at91.c
@@ -1420,8 +1420,11 @@ static int at91_pinctrl_probe(struct platform_device *pdev)

 	/* We will handle a range of GPIO pins */
 	for (i = 0; i < gpio_banks; i++)
-		if (gpio_chips[i])
+		if (gpio_chips[i]) {
 			pinctrl_add_gpio_range(info->pctl, &gpio_chips[i]->range);
+			gpiochip_add_pin_range(&gpio_chips[i]->chip, dev_name(info->pctl->dev), 0,
+				gpio_chips[i]->range.pin_base, gpio_chips[i]->range.npins);
+		}

 	dev_info(&pdev->dev, "initialized AT91 pinctrl driver\n");
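The pin-range registered by this patch can be thought of as a linear mapping from a GPIO offset within the chip to a pin number in the pin controller. A minimal userspace sketch of such a mapping, assuming a single contiguous range; the struct and function names are hypothetical and are not gpiolib types:

```c
/* Hypothetical model of one pin range: GPIO offsets
 * [gpio_offset, gpio_offset + npins) map onto pins starting at pin_base. */
struct pin_range_sketch {
	unsigned int gpio_offset;
	unsigned int pin_base;
	unsigned int npins;
};

/* Returns the pin number for a GPIO offset, or -1 if the offset is
 * outside the range. */
static int gpio_to_pin(const struct pin_range_sketch *r, unsigned int gpio)
{
	if (gpio < r->gpio_offset || gpio - r->gpio_offset >= r->npins)
		return -1;
	return (int)(r->pin_base + (gpio - r->gpio_offset));
}
```

In the patch, the GPIO offset argument is 0 and the pin base and count come from the bank's existing range, so each bank maps its own GPIOs onto its own block of pins.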
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ross Brown true.robot.ross@gmail.com
[ Upstream commit 9efaebc0072b8e95505544bf385c20ee8a29d799 ]
X570-E GAMING does not have a VRM temperature sensor.
Signed-off-by: Ross Brown true.robot.ross@gmail.com
Signed-off-by: Eugene Shalygin eugene.shalygin@gmail.com
Link: https://lore.kernel.org/r/20240730062320.5188-2-eugene.shalygin@gmail.com
Signed-off-by: Guenter Roeck linux@roeck-us.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/hwmon/asus-ec-sensors.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/hwmon/asus-ec-sensors.c b/drivers/hwmon/asus-ec-sensors.c
index b4d65916b3c00..d893cfd1cb829 100644
--- a/drivers/hwmon/asus-ec-sensors.c
+++ b/drivers/hwmon/asus-ec-sensors.c
@@ -369,7 +369,7 @@ static const struct ec_board_info board_info_strix_b550_i_gaming = {

 static const struct ec_board_info board_info_strix_x570_e_gaming = {
 	.sensors = SENSOR_SET_TEMP_CHIPSET_CPU_MB |
-		SENSOR_TEMP_T_SENSOR | SENSOR_TEMP_VRM |
+		SENSOR_TEMP_T_SENSOR |
 		SENSOR_FAN_CHIPSET | SENSOR_CURR_CPU |
 		SENSOR_IN_CPU_CORE,
 	.mutex_path = ASUS_HW_ACCESS_MUTEX_ASMX,
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mike Rapoport rppt@kernel.org
[ Upstream commit 0075df288dd8a7abfe03b3766176c393063591dd ]
Before commit 721f4a6526da ("mm/memblock: remove empty dummy entry") the check for non-zero of memblock.reserved.cnt in mmu_init() would always be true either because memblock.reserved.cnt is initialized to 1 or because there were memory reservations earlier.
The removal of dummy empty entry in memblock caused this check to fail because now memblock.reserved.cnt is initialized to 0.
Remove the check for non-zero of memblock.reserved.cnt because it's perfectly fine to have an empty memblock.reserved array that early in boot.
Reported-by: Guenter Roeck linux@roeck-us.net
Signed-off-by: Mike Rapoport rppt@kernel.org
Reviewed-by: Wei Yang richard.weiyang@gmail.com
Tested-by: Guenter Roeck linux@roeck-us.net
Link: https://lore.kernel.org/r/20240729053327.4091459-1-rppt@kernel.org
Signed-off-by: Guenter Roeck linux@roeck-us.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 arch/microblaze/mm/init.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/arch/microblaze/mm/init.c b/arch/microblaze/mm/init.c
index 353fabdfcbc54..2a3248194d505 100644
--- a/arch/microblaze/mm/init.c
+++ b/arch/microblaze/mm/init.c
@@ -193,11 +193,6 @@ asmlinkage void __init mmu_init(void)
 {
 	unsigned int kstart, ksize;

-	if (!memblock.reserved.cnt) {
-		pr_emerg("Error memory count\n");
-		machine_restart(NULL);
-	}
-
 	if ((u32) memblock.memory.regions[0].size < 0x400000) {
 		pr_emerg("Memory must be greater than 4MB\n");
 		machine_restart(NULL);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jacky Chou jacky_chou@aspeedtech.com
[ Upstream commit 4186c8d9e6af57bab0687b299df10ebd47534a0a ]
The driver must ensure TX descriptor updates are visible before updating TX pointer and TX clear pointer.
This resolves TX hangs observed on AST2600 when running iperf3.
Signed-off-by: Jacky Chou jacky_chou@aspeedtech.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/net/ethernet/faraday/ftgmac100.c | 26 ++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index a03879a27b041..7adc46aa75e66 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -566,7 +566,7 @@ static bool ftgmac100_rx_packet(struct ftgmac100 *priv, int *processed)
 	(*processed)++;
 	return true;

- drop:
+drop:
 	/* Clean rxdes0 (which resets own bit) */
 	rxdes->rxdes0 = cpu_to_le32(status & priv->rxdes0_edorr_mask);
 	priv->rx_pointer = ftgmac100_next_rx_pointer(priv, pointer);
@@ -650,6 +650,11 @@ static bool ftgmac100_tx_complete_packet(struct ftgmac100 *priv)
 	ftgmac100_free_tx_packet(priv, pointer, skb, txdes, ctl_stat);
 	txdes->txdes0 = cpu_to_le32(ctl_stat & priv->txdes0_edotr_mask);

+	/* Ensure the descriptor config is visible before setting the tx
+	 * pointer.
+	 */
+	smp_wmb();
+
 	priv->tx_clean_pointer = ftgmac100_next_tx_pointer(priv, pointer);

 	return true;
@@ -803,6 +808,11 @@ static netdev_tx_t ftgmac100_hard_start_xmit(struct sk_buff *skb,
 	dma_wmb();
 	first->txdes0 = cpu_to_le32(f_ctl_stat);

+	/* Ensure the descriptor config is visible before setting the tx
+	 * pointer.
+	 */
+	smp_wmb();
+
 	/* Update next TX pointer */
 	priv->tx_pointer = pointer;

@@ -823,7 +833,7 @@ static netdev_tx_t ftgmac100_hard_start_xmit(struct sk_buff *skb,

 	return NETDEV_TX_OK;

- dma_err:
+dma_err:
 	if (net_ratelimit())
 		netdev_err(netdev, "map tx fragment failed\n");

@@ -845,7 +855,7 @@ static netdev_tx_t ftgmac100_hard_start_xmit(struct sk_buff *skb,
 	 * last fragment, so we know ftgmac100_free_tx_packet()
 	 * hasn't freed the skb yet.
 	 */
- drop:
+drop:
 	/* Drop the packet */
 	dev_kfree_skb_any(skb);
 	netdev->stats.tx_dropped++;
@@ -1338,7 +1348,7 @@ static void ftgmac100_reset(struct ftgmac100 *priv)
 	ftgmac100_init_all(priv, true);

 	netdev_dbg(netdev, "Reset done !\n");
- bail:
+bail:
 	if (priv->mii_bus)
 		mutex_unlock(&priv->mii_bus->mdio_lock);
 	if (netdev->phydev)
@@ -1537,15 +1547,15 @@ static int ftgmac100_open(struct net_device *netdev)

 	return 0;

- err_ncsi:
+err_ncsi:
 	napi_disable(&priv->napi);
 	netif_stop_queue(netdev);
- err_alloc:
+err_alloc:
 	ftgmac100_free_buffers(priv);
 	free_irq(netdev->irq, netdev);
- err_irq:
+err_irq:
 	netif_napi_del(&priv->napi);
- err_hw:
+err_hw:
 	iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
 	ftgmac100_free_rings(priv);
 	return err;
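The ordering this patch enforces (write the descriptor first, then publish the ring pointer) is the usual "fill, barrier, publish" pattern. A rough userspace analogy using C11 atomics, where a release fence stands in for smp_wmb(); this is only a sketch with hypothetical names, not the driver's code, and C11 fences are not a drop-in equivalent of the kernel barrier:

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 4u	/* hypothetical ring size for the sketch */

struct ring_sketch {
	uint32_t desc[RING_SIZE];	/* stands in for the TX descriptors */
	atomic_uint tx_pointer;		/* index published to the other side */
};

/* Fill the current descriptor, then publish the advanced pointer. The
 * release fence keeps the descriptor store from being reordered after
 * the pointer store. */
static void ring_publish(struct ring_sketch *r, uint32_t value)
{
	unsigned int p = atomic_load_explicit(&r->tx_pointer,
					      memory_order_relaxed);

	r->desc[p] = value;
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&r->tx_pointer, (p + 1u) % RING_SIZE,
			      memory_order_relaxed);
}
```

A reader that loads tx_pointer and then issues an acquire fence before touching desc[] would be the matching half of the pairing.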
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Huacai Chen chenhuacai@loongson.cn
[ Upstream commit 274ea3563e5ab9f468c15bfb9d2492803a66d9be ]
Currently we call irq_set_noprobe() in a loop for all IRQs, but indeed it only works for IRQs below NR_IRQS_LEGACY because at init_IRQ() only legacy interrupts have been allocated.
Instead, we can define ARCH_IRQ_INIT_FLAGS as IRQ_NOPROBE in asm/hw_irq.h and the core will automatically set the flag for all interrupts.
Reviewed-by: Thomas Gleixner tglx@linutronix.de
Signed-off-by: Huacai Chen chenhuacai@loongson.cn
Signed-off-by: Tianyang Zhang zhangtianyang@loongson.cn
Signed-off-by: Sasha Levin sashal@kernel.org
---
 arch/loongarch/include/asm/hw_irq.h | 2 ++
 arch/loongarch/kernel/irq.c | 3 ---
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/loongarch/include/asm/hw_irq.h b/arch/loongarch/include/asm/hw_irq.h
index af4f4e8fbd858..8156ffb674159 100644
--- a/arch/loongarch/include/asm/hw_irq.h
+++ b/arch/loongarch/include/asm/hw_irq.h
@@ -9,6 +9,8 @@

 extern atomic_t irq_err_count;

+#define ARCH_IRQ_INIT_FLAGS	IRQ_NOPROBE
+
 /*
  * interrupt-retrigger: NOP for now. This may not be appropriate for all
  * machines, we'll see ...
diff --git a/arch/loongarch/kernel/irq.c b/arch/loongarch/kernel/irq.c
index 0524bf1169b74..4496649c9e68b 100644
--- a/arch/loongarch/kernel/irq.c
+++ b/arch/loongarch/kernel/irq.c
@@ -122,9 +122,6 @@ void __init init_IRQ(void)
 		panic("IPI IRQ request failed\n");
 #endif

-	for (i = 0; i < NR_IRQS; i++)
-		irq_set_noprobe(i);
-
 	for_each_possible_cpu(i) {
 		page = alloc_pages_node(cpu_to_node(i), GFP_KERNEL, order);
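The approach in this patch relies on an architecture-provided default that the core applies whenever a descriptor is set up, instead of a one-off loop that only reaches descriptors existing at that moment. A minimal userspace sketch of that "arch may override, core falls back to 0" pattern; every SKETCH_ or sketch_ name below is hypothetical and only illustrates the idea, it is not the genirq implementation:

```c
/* Hypothetical stand-in for a "do not probe" flag bit. */
#define SKETCH_IRQ_NOPROBE		(1u << 10)

/* Architecture side: define the default before the core code sees it. */
#define SKETCH_ARCH_IRQ_INIT_FLAGS	SKETCH_IRQ_NOPROBE

/* Core side: fall back to 0 only if the architecture defined nothing. */
#ifndef SKETCH_ARCH_IRQ_INIT_FLAGS
#define SKETCH_ARCH_IRQ_INIT_FLAGS	0u
#endif

struct sketch_irq_desc {
	unsigned int flags;
};

/* Every descriptor, whenever it is initialised, starts with the
 * architecture default, so interrupts set up later are covered too. */
static void sketch_desc_init(struct sketch_irq_desc *d)
{
	d->flags = SKETCH_ARCH_IRQ_INIT_FLAGS;
}
```

With the default applied at initialisation time, no separate pass over the interrupt numbers is needed.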
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Benjamin Berg benjamin.berg@intel.com
[ Upstream commit f8a129c1e10256c785164ed5efa5d17d45fbd81b ]
An invalid buffer destination is not a problem for the driver and it does not make sense to report it with the KERN_ERR message level. As such, change the message to use IWL_DEBUG_FW.
Reported-by: Len Brown lenb@kernel.org Closes: https://lore.kernel.org/r/CAJvTdKkcxJss=DM2sxgv_MR5BeZ4_OC-3ad6tA40TYH2yqHCW... Signed-off-by: Benjamin Berg benjamin.berg@intel.com Signed-off-by: Miri Korenblit miriam.rachel.korenblit@intel.com Link: https://patch.msgid.link/20240825191257.20abf78f05bc.Ifbcecc2ae9fb40b9698302... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/intel/iwlwifi/pcie/ctxt-info-gen3.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/ctxt-info-gen3.c b/drivers/net/wireless/intel/iwlwifi/pcie/ctxt-info-gen3.c index 75fd386b048e9..35c60faf8e8fb 100644 --- a/drivers/net/wireless/intel/iwlwifi/pcie/ctxt-info-gen3.c +++ b/drivers/net/wireless/intel/iwlwifi/pcie/ctxt-info-gen3.c @@ -68,7 +68,8 @@ iwl_pcie_ctxt_info_dbg_enable(struct iwl_trans *trans, } break; default: - IWL_ERR(trans, "WRT: Invalid buffer destination\n"); + IWL_DEBUG_FW(trans, "WRT: Invalid buffer destination (%d)\n", + le32_to_cpu(fw_mon_cfg->buf_location)); } out: if (dbg_flags)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Gabay daniel.gabay@intel.com
[ Upstream commit d44162280899c3fc2c6700e21e491e71c3c96e3d ]
The calculation should also take the length of the 6GHz IEs into account; fix that. In addition, iwl_mvm_sched_scan_start() calls the scan_fits helper only when non_psc_included is true, but it should be called regardless; fix that as well.
Signed-off-by: Daniel Gabay daniel.gabay@intel.com Reviewed-by: Ilan Peer ilan.peer@intel.com Signed-off-by: Miri Korenblit miriam.rachel.korenblit@intel.com Link: https://patch.msgid.link/20240825191257.7db825442fd2.I99f4d6587709de02072fd5... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/intel/iwlwifi/mvm/scan.c | 23 ++++++++++--------- 1 file changed, 12 insertions(+), 11 deletions(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c index b58441c2af730..20c5cc72e4269 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c @@ -824,8 +824,8 @@ static inline bool iwl_mvm_scan_fits(struct iwl_mvm *mvm, int n_ssids, return ((n_ssids <= PROBE_OPTION_MAX) && (n_channels <= mvm->fw->ucode_capa.n_scan_channels) & (ies->common_ie_len + - ies->len[NL80211_BAND_2GHZ] + - ies->len[NL80211_BAND_5GHZ] <= + ies->len[NL80211_BAND_2GHZ] + ies->len[NL80211_BAND_5GHZ] + + ies->len[NL80211_BAND_6GHZ] <= iwl_mvm_max_scan_ie_fw_cmd_room(mvm))); }
@@ -2935,18 +2935,16 @@ int iwl_mvm_sched_scan_start(struct iwl_mvm *mvm, params.n_channels = j; }
- if (non_psc_included && - !iwl_mvm_scan_fits(mvm, req->n_ssids, ies, params.n_channels)) { - kfree(params.channels); - return -ENOBUFS; + if (!iwl_mvm_scan_fits(mvm, req->n_ssids, ies, params.n_channels)) { + ret = -ENOBUFS; + goto out; }
uid = iwl_mvm_build_scan_cmd(mvm, vif, &hcmd, &params, type); - - if (non_psc_included) - kfree(params.channels); - if (uid < 0) - return uid; + if (uid < 0) { + ret = uid; + goto out; + }
ret = iwl_mvm_send_cmd(mvm, &hcmd); if (!ret) { @@ -2963,6 +2961,9 @@ int iwl_mvm_sched_scan_start(struct iwl_mvm *mvm, mvm->sched_scan_pass_all = SCHED_SCAN_PASS_ALL_DISABLED; }
+out: + if (non_psc_included) + kfree(params.channels); return ret; }
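The size check fixed by the scan patch above can be sketched in plain userspace C. This is a simplified model under stated assumptions: the sketch_* types and the limit parameters are hypothetical and stand in for the driver's structures and firmware limits. Summing over every band keeps the 6GHz IE length from being left out of the total.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical band indices; the driver uses the nl80211 band enum. */
enum sketch_band {
	SKETCH_BAND_2GHZ,
	SKETCH_BAND_5GHZ,
	SKETCH_BAND_6GHZ,
	SKETCH_NUM_BANDS
};

struct sketch_scan_ies {
	size_t common_ie_len;
	size_t len[SKETCH_NUM_BANDS];
};

/* Return true if the request fits within the given (assumed) limits. */
static bool sketch_scan_fits(int n_ssids, int n_channels,
			     const struct sketch_scan_ies *ies,
			     int max_ssids, int max_channels, size_t ie_room)
{
	size_t total;
	int band;

	assert(ies != NULL);

	/* Add up the common IEs plus the per-band IEs of every band. */
	total = ies->common_ie_len;
	for (band = 0; band < SKETCH_NUM_BANDS; band++)
		total += ies->len[band];

	return n_ssids <= max_ssids &&
	       n_channels <= max_channels &&
	       total <= ie_room;
}
```

With 100 bytes of common IEs and 50, 50 and 60 bytes for the three bands, the total is 260 bytes. A check that ignored the 6GHz entry would see only 200 bytes and could accept a request that does not actually fit.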
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Emmanuel Grumbach emmanuel.grumbach@intel.com
[ Upstream commit 0668ebc8c2282ca1e7eb96092a347baefffb5fe7 ]
Not doing so will make us send a host command to the transport while the firmware is not alive, which will trigger a WARNING.
bad state = 0
WARNING: CPU: 2 PID: 17434 at drivers/net/wireless/intel/iwlwifi/iwl-trans.c:115 iwl_trans_send_cmd+0x1cb/0x1e0 [iwlwifi]
RIP: 0010:iwl_trans_send_cmd+0x1cb/0x1e0 [iwlwifi]
Call Trace:
 <TASK>
 iwl_mvm_send_cmd+0x40/0xc0 [iwlmvm]
 iwl_mvm_config_scan+0x198/0x260 [iwlmvm]
 iwl_mvm_recalc_tcm+0x730/0x11d0 [iwlmvm]
 iwl_mvm_tcm_work+0x1d/0x30 [iwlmvm]
 process_one_work+0x29e/0x640
 worker_thread+0x2df/0x690
 ? rescuer_thread+0x540/0x540
 kthread+0x192/0x1e0
 ? set_kthread_struct+0x90/0x90
 ret_from_fork+0x22/0x30
Signed-off-by: Emmanuel Grumbach emmanuel.grumbach@intel.com Signed-off-by: Miri Korenblit miriam.rachel.korenblit@intel.com Link: https://patch.msgid.link/20240825191257.5abe71ca1b6b.I97a968cb8be1f24f94652d... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/intel/iwlwifi/mvm/ops.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/ops.c b/drivers/net/wireless/intel/iwlwifi/mvm/ops.c index 88b6d4e566c40..0a11ee347bf32 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/ops.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/ops.c @@ -1366,6 +1366,8 @@ void iwl_mvm_stop_device(struct iwl_mvm *mvm)
clear_bit(IWL_MVM_STATUS_FIRMWARE_RUNNING, &mvm->status);
+ iwl_mvm_pause_tcm(mvm, false); + iwl_fw_dbg_stop_sync(&mvm->fwrt); iwl_trans_stop_device(mvm->trans); iwl_free_fw_paging(&mvm->fwrt);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Emmanuel Grumbach emmanuel.grumbach@intel.com
[ Upstream commit 3a84454f5204718ca5b4ad2c1f0bf2031e2403d1 ]
There is a WARNING in iwl_trans_wait_tx_queues_empty() (that was recently converted from just a message), that can be hit if we wait for TX queues to become empty after firmware died. Clearly, we can't expect anything from the firmware after it's declared dead.
Don't call iwl_trans_wait_tx_queues_empty() in this case. While it could be a good idea to stop the flow earlier, the flush functions do some maintenance work that is not related to the firmware, so keep that part of the code running even when the firmware is not running.
Signed-off-by: Emmanuel Grumbach emmanuel.grumbach@intel.com Signed-off-by: Miri Korenblit miriam.rachel.korenblit@intel.com Link: https://patch.msgid.link/20240825191257.a7cbd794cee9.I44a739fbd4ffcc46b83844... [edit commit message] Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c index 4e8bdd3d701bf..bd4301857ba87 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c @@ -4800,6 +4800,10 @@ static void iwl_mvm_flush_no_vif(struct iwl_mvm *mvm, u32 queues, bool drop) int i;
if (!iwl_mvm_has_new_tx_api(mvm)) { + /* we can't ask the firmware anything if it is dead */ + if (test_bit(IWL_MVM_STATUS_HW_RESTART_REQUESTED, + &mvm->status)) + return; if (drop) { mutex_lock(&mvm->mutex); iwl_mvm_flush_tx_path(mvm, @@ -4881,8 +4885,11 @@ static void iwl_mvm_mac_flush(struct ieee80211_hw *hw,
/* this can take a while, and we may need/want other operations * to succeed while doing this, so do it without the mutex held + * If the firmware is dead, this can't work... */ - if (!drop && !iwl_mvm_has_new_tx_api(mvm)) + if (!drop && !iwl_mvm_has_new_tx_api(mvm) && + !test_bit(IWL_MVM_STATUS_HW_RESTART_REQUESTED, + &mvm->status)) iwl_trans_wait_tx_queues_empty(mvm->trans, msk); }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Antipov dmantipov@yandex.ru
[ Upstream commit 786c5be9ac29a39b6f37f1fdd2ea59d0fe35d525 ]
In 'ieee80211_beacon_get_ap()', free allocated skb in case of error returned by 'ieee80211_beacon_protect()'. Compile tested only.
Signed-off-by: Dmitry Antipov dmantipov@yandex.ru Link: https://patch.msgid.link/20240805142035.227847-1-dmantipov@yandex.ru Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/mac80211/tx.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c index 419baf8efddea..0685ae2ea64eb 100644 --- a/net/mac80211/tx.c +++ b/net/mac80211/tx.c @@ -5196,8 +5196,10 @@ ieee80211_beacon_get_ap(struct ieee80211_hw *hw, if (beacon->tail) skb_put_data(skb, beacon->tail, beacon->tail_len);
- if (ieee80211_beacon_protect(skb, local, sdata, link) < 0) + if (ieee80211_beacon_protect(skb, local, sdata, link) < 0) { + dev_kfree_skb(skb); return NULL; + }
ieee80211_beacon_get_finish(hw, vif, link, offs, beacon, skb, chanctx_conf, csa_off_base);
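The leak fixed by the mac80211 patch above follows a common pattern: a buffer is allocated, a later step fails, and the error path must release the buffer before returning. A minimal userspace sketch of that pattern follows. All sketch_* names are hypothetical; the live-buffer counter exists only to make the leak observable and has no counterpart in the kernel code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Number of buffers currently allocated; illustration only. */
static int sketch_live_buffers;

struct sketch_buf {
	size_t len;
	unsigned char data[64];
};

static struct sketch_buf *sketch_buf_alloc(void)
{
	struct sketch_buf *buf = calloc(1, sizeof(*buf));

	if (buf)
		sketch_live_buffers++;
	return buf;
}

static void sketch_buf_free(struct sketch_buf *buf)
{
	if (buf) {
		sketch_live_buffers--;
		free(buf);
	}
}

/* Hypothetical protection step that fails on request. */
static int sketch_protect(struct sketch_buf *buf, int fail)
{
	(void)buf;
	return fail ? -1 : 0;
}

static struct sketch_buf *sketch_beacon_get(const void *tail, size_t tail_len,
					    int fail_protect)
{
	struct sketch_buf *buf;

	assert(tail != NULL && tail_len <= sizeof(buf->data));

	buf = sketch_buf_alloc();
	if (!buf)
		return NULL;

	memcpy(buf->data, tail, tail_len);
	buf->len = tail_len;

	if (sketch_protect(buf, fail_protect) < 0) {
		/* Release the buffer on the error path, as the patch does. */
		sketch_buf_free(buf);
		return NULL;
	}

	return buf;
}
```

Without the sketch_buf_free() call in the error branch, the caller would receive NULL and have no pointer left with which to free the buffer.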
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Emmanuel Grumbach emmanuel.grumbach@intel.com
[ Upstream commit 094513f8a2fbddee51b055d8035f995551f98fce ]
When the firmware crashes, we first tell the op_mode and only then change the transport's state. This is a problem if the op_mode's nic_error() handler needs to send a host command: it'll see that the transport's state still reflects that the firmware is alive.
Today, this has no consequences since we set the STATUS_FW_ERROR bit and that will prevent sending host commands. iwl_fw_dbg_stop_restart_recording looks at this bit to know not to send a host command for example.
To fix hibernation, we need to reset the firmware without an error having occurred, so checking STATUS_FW_ERROR to determine whether the firmware is alive will no longer be sufficient; this change is therefore necessary as well.
Change the flow a bit. Change trans->state before calling the op_mode's nic_error() method and check trans->state instead of STATUS_FW_ERROR. This will keep the current behavior of iwl_fw_dbg_stop_restart_recording upon firmware error, and it'll allow us to call iwl_fw_dbg_stop_restart_recording safely even if STATUS_FW_ERROR is clear but the firmware is not alive.
Signed-off-by: Emmanuel Grumbach emmanuel.grumbach@intel.com Signed-off-by: Miri Korenblit miriam.rachel.korenblit@intel.com Link: https://patch.msgid.link/20240825191257.9d7427fbdfd7.Ia056ca57029a382c921d6f... [I missed this was a dependency for the hibernation fix, changed the commit message a bit accordingly] Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/intel/iwlwifi/fw/dbg.c | 2 +- drivers/net/wireless/intel/iwlwifi/iwl-trans.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/fw/dbg.c b/drivers/net/wireless/intel/iwlwifi/fw/dbg.c index 3b0ed1cdfa11e..7fadaec777cea 100644 --- a/drivers/net/wireless/intel/iwlwifi/fw/dbg.c +++ b/drivers/net/wireless/intel/iwlwifi/fw/dbg.c @@ -3131,7 +3131,7 @@ void iwl_fw_dbg_stop_restart_recording(struct iwl_fw_runtime *fwrt, { int ret __maybe_unused = 0;
- if (test_bit(STATUS_FW_ERROR, &fwrt->trans->status)) + if (!iwl_trans_fw_running(fwrt->trans)) return;
if (fw_has_capa(&fwrt->fw->ucode_capa, diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-trans.h b/drivers/net/wireless/intel/iwlwifi/iwl-trans.h index 70022cadee35b..ad29663a356be 100644 --- a/drivers/net/wireless/intel/iwlwifi/iwl-trans.h +++ b/drivers/net/wireless/intel/iwlwifi/iwl-trans.h @@ -1472,8 +1472,8 @@ static inline void iwl_trans_fw_error(struct iwl_trans *trans, bool sync)
/* prevent double restarts due to the same erroneous FW */ if (!test_and_set_bit(STATUS_FW_ERROR, &trans->status)) { - iwl_op_mode_nic_error(trans->op_mode, sync); trans->state = IWL_TRANS_NO_FW; + iwl_op_mode_nic_error(trans->op_mode, sync); } }
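The ordering change in the patch above can be modelled in a few lines of userspace C. This is a sketch only: the sketch_* types and the handler are hypothetical and merely mirror the shape of the transport and op_mode interaction described in the commit message. The point is that the state is updated before the handler runs, so a handler that checks the state will not try to talk to dead firmware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_state { SKETCH_NO_FW, SKETCH_FW_ALIVE };

struct sketch_trans {
	enum sketch_state state;
	bool fw_error;		/* models the STATUS_FW_ERROR bit */
	int cmds_sent;		/* host commands sent to the firmware */
	void (*nic_error)(struct sketch_trans *trans);
};

static bool sketch_fw_running(const struct sketch_trans *trans)
{
	return trans->state == SKETCH_FW_ALIVE;
}

/* Error handler that would send a host command if the firmware were alive. */
static void sketch_stop_recording(struct sketch_trans *trans)
{
	if (!sketch_fw_running(trans))
		return;
	trans->cmds_sent++;
}

static void sketch_fw_error(struct sketch_trans *trans)
{
	assert(trans != NULL && trans->nic_error != NULL);

	/* Prevent double handling of the same error. */
	if (trans->fw_error)
		return;
	trans->fw_error = true;

	/* Update the state first, then notify the handler. */
	trans->state = SKETCH_NO_FW;
	trans->nic_error(trans);
}
```

If the two final statements in sketch_fw_error() were swapped, the handler would still observe SKETCH_FW_ALIVE and increment cmds_sent, which corresponds to sending a host command to firmware that is no longer running.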
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Kleine-Budde mkl@pengutronix.de
[ Upstream commit ac2b81eb8b2d104033560daea886ee84531e3d0a ]
When changing the interface from CAN-CC to CAN-FD mode the old coalescing parameters are re-used. This might cause problems, as the configured parameters are too big for CAN-FD mode.
During testing an invalid TX coalescing configuration has been seen. The problem should have been fixed in the previous patch, but add a safeguard here to ensure that the number of TEF coalescing buffers (if configured) is exactly half of all TEF buffers.
Link: https://lore.kernel.org/all/20240805-mcp251xfd-fix-ringconfig-v1-2-72086f0ca... Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c index 0fde8154a649b..a894cb1fb9bfe 100644 --- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c +++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c @@ -280,7 +280,7 @@ int mcp251xfd_ring_init(struct mcp251xfd_priv *priv) const struct mcp251xfd_rx_ring *rx_ring; u16 base = 0, ram_used; u8 fifo_nr = 1; - int i; + int err = 0, i;
netdev_reset_queue(priv->ndev);
@@ -376,10 +376,18 @@ int mcp251xfd_ring_init(struct mcp251xfd_priv *priv) netdev_err(priv->ndev, "Error during ring configuration, using more RAM (%u bytes) than available (%u bytes).\n", ram_used, MCP251XFD_RAM_SIZE); - return -ENOMEM; + err = -ENOMEM; }
- return 0; + if (priv->tx_obj_num_coalesce_irq && + priv->tx_obj_num_coalesce_irq * 2 != priv->tx->obj_num) { + netdev_err(priv->ndev, + "Error during ring configuration, number of TEF coalescing buffers (%u) must be half of TEF buffers (%u).\n", + priv->tx_obj_num_coalesce_irq, priv->tx->obj_num); + err = -EINVAL; + } + + return err; }
void mcp251xfd_ring_free(struct mcp251xfd_priv *priv)
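The safeguard added by the mcp251xfd patch above reduces to a small validation step, sketched here in userspace C. The function name and parameters are hypothetical; the logic mirrors the diff: an error is recorded if more RAM is used than available, and a non-zero coalescing count must be exactly half the number of TX objects.

```c
#include <assert.h>
#include <errno.h>

/*
 * Validate a ring configuration. Returns 0 on success or a negative
 * errno value. As in the patch, a later failing check overrides an
 * earlier error code.
 */
static int sketch_ring_check(unsigned int ram_used, unsigned int ram_size,
			     unsigned int tx_obj_num,
			     unsigned int tx_obj_num_coalesce_irq)
{
	int err = 0;

	assert(ram_size > 0);

	if (ram_used > ram_size)
		err = -ENOMEM;

	/* Coalescing is optional; if configured it must be exactly half. */
	if (tx_obj_num_coalesce_irq &&
	    tx_obj_num_coalesce_irq * 2 != tx_obj_num)
		err = -EINVAL;

	return err;
}
```

For example, 8 TX objects with 4 coalescing buffers pass the check, 8 with 3 are rejected with -EINVAL, and 8 with 0 (coalescing disabled) pass.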
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans de Goede hdegoede@redhat.com
[ Upstream commit 839a4ec06f75cec8fec2cc5fc14e921d0c3f7369 ]
There are 2G and 4G RAM versions of the Lenovo Yoga Tab 3 X90F and it turns out that the 2G version has a DMI product name of "CHERRYVIEW D1 PLATFORM" whereas the 4G version has "CHERRYVIEW C0 PLATFORM". The sys-vendor + product-version checks are unique enough that the product-name check is not necessary.
Drop the product-name check so that the existing DMI match for the 4G RAM version also matches the 2G RAM version.
Signed-off-by: Hans de Goede hdegoede@redhat.com Reviewed-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Link: https://patch.msgid.link/20240823074305.16873-1-hdegoede@redhat.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/intel/common/soc-acpi-intel-cht-match.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/sound/soc/intel/common/soc-acpi-intel-cht-match.c b/sound/soc/intel/common/soc-acpi-intel-cht-match.c index 5e2ec60e2954b..e4c3492a0c282 100644 --- a/sound/soc/intel/common/soc-acpi-intel-cht-match.c +++ b/sound/soc/intel/common/soc-acpi-intel-cht-match.c @@ -84,7 +84,6 @@ static const struct dmi_system_id lenovo_yoga_tab3_x90[] = { /* Lenovo Yoga Tab 3 Pro YT3-X90, codec missing from DSDT */ .matches = { DMI_MATCH(DMI_SYS_VENDOR, "Intel Corporation"), - DMI_MATCH(DMI_PRODUCT_NAME, "CHERRYVIEW D1 PLATFORM"), DMI_MATCH(DMI_PRODUCT_VERSION, "Blade3-10A-001"), }, },
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Liao Chen liaochen4@huawei.com
[ Upstream commit ae61a3391088d29aa8605c9f2db84295ab993a49 ]
Add MODULE_DEVICE_TABLE(), so that the module can be properly autoloaded based on the alias from the of_device_id table.
Signed-off-by: Liao Chen liaochen4@huawei.com Link: https://patch.msgid.link/20240826084924.368387-2-liaochen4@huawei.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/intel/keembay/kmb_platform.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/sound/soc/intel/keembay/kmb_platform.c b/sound/soc/intel/keembay/kmb_platform.c index b4893365d01d5..d5c48bed7a250 100644 --- a/sound/soc/intel/keembay/kmb_platform.c +++ b/sound/soc/intel/keembay/kmb_platform.c @@ -817,6 +817,7 @@ static const struct of_device_id kmb_plat_of_match[] = { { .compatible = "intel,keembay-tdm", .data = &intel_kmb_tdm_dai}, {} }; +MODULE_DEVICE_TABLE(of, kmb_plat_of_match);
static int kmb_plat_dai_probe(struct platform_device *pdev) {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Liao Chen liaochen4@huawei.com
[ Upstream commit 934b44589da9aa300201a00fe139c5c54f421563 ]
Add MODULE_DEVICE_TABLE(), so that the module can be properly autoloaded based on the alias from the of_device_id table.
Signed-off-by: Liao Chen liaochen4@huawei.com Link: https://patch.msgid.link/20240826084924.368387-4-liaochen4@huawei.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/tda7419.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/sound/soc/codecs/tda7419.c b/sound/soc/codecs/tda7419.c index d964e5207569c..6010df2994c7b 100644 --- a/sound/soc/codecs/tda7419.c +++ b/sound/soc/codecs/tda7419.c @@ -623,6 +623,7 @@ static const struct of_device_id tda7419_of_match[] = { { .compatible = "st,tda7419" }, { }, }; +MODULE_DEVICE_TABLE(of, tda7419_of_match);
static struct i2c_driver tda7419_driver = { .driver = {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fabio Estevam festevam@gmail.com
[ Upstream commit 5f3eee1eef5d0edd23d8ac0974f56283649a1512 ]
The rv1108-elgin-r1 board has an LCD controlled via SPI in userspace. The marking on the LCD is JG10309-01.
Add the "elgin,jg10309-01" compatible string.
Signed-off-by: Fabio Estevam festevam@gmail.com Reviewed-by: Heiko Stuebner heiko@sntech.de Link: https://patch.msgid.link/20240828180057.3167190-2-festevam@gmail.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spidev.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/spi/spidev.c b/drivers/spi/spidev.c index 477c3578e7d9e..7ae032f8de63c 100644 --- a/drivers/spi/spidev.c +++ b/drivers/spi/spidev.c @@ -722,6 +722,7 @@ static int spidev_of_check(struct device *dev) static const struct of_device_id spidev_dt_ids[] = { { .compatible = "cisco,spi-petra", .data = &spidev_of_check }, { .compatible = "dh,dhcom-board", .data = &spidev_of_check }, + { .compatible = "elgin,jg10309-01", .data = &spidev_of_check }, { .compatible = "lineartechnology,ltc2488", .data = &spidev_of_check }, { .compatible = "lwn,bk4", .data = &spidev_of_check }, { .compatible = "menlo,m53cpld", .data = &spidev_of_check },
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: hongchi.peng hongchi.peng@siengine.com
[ Upstream commit 258905cb9a6414be5c9ca4aa20ef855f8dc894d4 ]
We use komeda_crtc_normalize_zpos() to normalize the zpos of the affected planes to their blending zorder in the CU. If there is only one slave plane among the affected planes and its layer_split property is enabled, order is incremented for its split layer, so that the split layer of the slave plane is included when calculating the normalized_zpos of the master planes. However, max_slave_zorder does not include the split layer and stays at zero because there is only one slave plane among the affected planes, although we actually use two slave layers in this commit.
In most cases, this bug does not result in a commit failure, but assume the following situation:
    slave_layer 0: zpos = 0, layer split enabled, normalized_zpos = 0; (use slave_layer 2 as its split layer)
    master_layer 0: zpos = 2, layer_split enabled, normalized_zpos = 2; (use master_layer 2 as its split layer)
    master_layer 1: zpos = 4, normalized_zpos = 4;
    master_layer 3: zpos = 5, normalized_zpos = 5;
    kcrtc_st->max_slave_zorder = 0;
When we use master_layer 3 as an input of the CU in komeda_compiz_set_input() and check it with komeda_component_check_input(), the parameter idx is equal to normalized_zpos minus max_slave_zorder. The value of idx is therefore 5, which is equal to the CU's max_active_inputs, so komeda_component_check_input() returns -EINVAL.
To fix the bug described above, when calculating the max_slave_zorder with the layer_split enabled, count the split layer in this calculation directly.
Signed-off-by: hongchi.peng hongchi.peng@siengine.com Acked-by: Liviu Dudau liviu.dudau@arm.com Signed-off-by: Liviu Dudau liviu.dudau@arm.com Link: https://patchwork.freedesktop.org/patch/msgid/20240826024517.3739-1-hongchi.... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/arm/display/komeda/komeda_kms.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/arm/display/komeda/komeda_kms.c b/drivers/gpu/drm/arm/display/komeda/komeda_kms.c index 451746ebbe713..89f3d6aa72b08 100644 --- a/drivers/gpu/drm/arm/display/komeda/komeda_kms.c +++ b/drivers/gpu/drm/arm/display/komeda/komeda_kms.c @@ -163,6 +163,7 @@ static int komeda_crtc_normalize_zpos(struct drm_crtc *crtc, struct drm_plane *plane; struct list_head zorder_list; int order = 0, err; + u32 slave_zpos = 0;
DRM_DEBUG_ATOMIC("[CRTC:%d:%s] calculating normalized zpos values\n", crtc->base.id, crtc->name); @@ -202,10 +203,13 @@ static int komeda_crtc_normalize_zpos(struct drm_crtc *crtc, plane_st->zpos, plane_st->normalized_zpos);
/* calculate max slave zorder */ - if (has_bit(drm_plane_index(plane), kcrtc->slave_planes)) + if (has_bit(drm_plane_index(plane), kcrtc->slave_planes)) { + slave_zpos = plane_st->normalized_zpos; + if (to_kplane_st(plane_st)->layer_split) + slave_zpos++; kcrtc_st->max_slave_zorder = - max(plane_st->normalized_zpos, - kcrtc_st->max_slave_zorder); + max(slave_zpos, kcrtc_st->max_slave_zorder); + } }
crtc_st->zpos_changed = true;
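The corrected max_slave_zorder calculation from the komeda patch above can be checked against the situation described in its commit message with a small userspace sketch. The sketch_plane structure and function name are hypothetical simplifications of the DRM plane state; only the arithmetic is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical view of a plane's state. */
struct sketch_plane {
	bool is_slave;
	bool layer_split;
	unsigned int normalized_zpos;
};

/*
 * Compute the maximum slave zorder. A slave plane with layer_split
 * enabled occupies one extra zorder slot for its split layer, so that
 * slot is counted as well.
 */
static unsigned int sketch_max_slave_zorder(const struct sketch_plane *planes,
					    size_t n)
{
	unsigned int max = 0;
	size_t i;

	assert(planes != NULL || n == 0);

	for (i = 0; i < n; i++) {
		unsigned int slave_zpos;

		if (!planes[i].is_slave)
			continue;

		slave_zpos = planes[i].normalized_zpos;
		if (planes[i].layer_split)
			slave_zpos++;
		if (slave_zpos > max)
			max = slave_zpos;
	}

	return max;
}
```

For the four planes listed in the commit message, this yields a max_slave_zorder of 1 instead of 0. Using the relation stated there (idx is normalized_zpos minus max_slave_zorder), master_layer 3 then gets idx 4 rather than 5, which is below the max_active_inputs value of 5 from that example.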
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Liao Chen liaochen4@huawei.com
[ Upstream commit 709df70a20e990d262c473ad9899314039e8ec82 ]
Add MODULE_DEVICE_TABLE(), so that the module can be properly autoloaded based on the alias from the of_device_id table.
Signed-off-by: Liao Chen liaochen4@huawei.com Link: https://patch.msgid.link/20240831094231.795024-1-liaochen4@huawei.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-bcm63xx.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/spi/spi-bcm63xx.c b/drivers/spi/spi-bcm63xx.c index 147199002df1e..a9921dcd6b797 100644 --- a/drivers/spi/spi-bcm63xx.c +++ b/drivers/spi/spi-bcm63xx.c @@ -482,6 +482,7 @@ static const struct of_device_id bcm63xx_spi_of_match[] = { { .compatible = "brcm,bcm6358-spi", .data = &bcm6358_spi_reg_offsets }, { }, }; +MODULE_DEVICE_TABLE(of, bcm63xx_spi_of_match);
static int bcm63xx_spi_probe(struct platform_device *pdev) {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paulo Alcantara pc@manguebit.com
[ Upstream commit 7ccc1465465d78e6411b7bd730d06e7435802b5c ]
Call cifs_reconnect() to wake up processes waiting on negotiate protocol to handle the case where server abruptly shut down and had no chance to properly close the socket.
Simple reproducer:
    ssh 192.168.2.100 pkill -STOP smbd
    mount.cifs //192.168.2.100/test /mnt -o ...
    [never returns]
Cc: Rickard Andersson rickaran@axis.com Signed-off-by: Paulo Alcantara (Red Hat) pc@manguebit.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/client/connect.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c index 21b344762d0f8..87ce71b39b771 100644 --- a/fs/smb/client/connect.c +++ b/fs/smb/client/connect.c @@ -673,6 +673,19 @@ allocate_buffers(struct TCP_Server_Info *server) static bool server_unresponsive(struct TCP_Server_Info *server) { + /* + * If we're in the process of mounting a share or reconnecting a session + * and the server abruptly shut down (e.g. socket wasn't closed, packet + * had been ACK'ed but no SMB response), don't wait longer than 20s to + * negotiate protocol. + */ + spin_lock(&server->srv_lock); + if (server->tcpStatus == CifsInNegotiate && + time_after(jiffies, server->lstrp + 20 * HZ)) { + spin_unlock(&server->srv_lock); + cifs_reconnect(server, false); + return true; + } /* * We need to wait 3 echo intervals to make sure we handle such * situations right: @@ -684,7 +697,6 @@ server_unresponsive(struct TCP_Server_Info *server) * 65s kernel_recvmsg times out, and we see that we haven't gotten * a response in >60s. */ - spin_lock(&server->srv_lock); if ((server->tcpStatus == CifsGood || server->tcpStatus == CifsNeedNegotiate) && (!server->ops->can_echo || server->ops->can_echo(server)) &&
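The timeout test added to server_unresponsive() in the patch above relies on a wraparound-safe comparison of tick counters. The following userspace sketch shows the idea. The sketch_* names and the tick rate are assumptions made for illustration; the comparison uses the same signed-difference trick as the kernel's time_after() macro.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SKETCH_HZ 250UL	/* assumed tick rate, illustration only */

enum sketch_tcp_status { SKETCH_GOOD, SKETCH_IN_NEGOTIATE };

/*
 * Return true if tick count a is after tick count b, even when the
 * counter has wrapped around in between. Assumes the usual
 * two's-complement conversion from unsigned long to long.
 */
static bool sketch_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* True if negotiation has seen no response for more than 20 seconds. */
static bool sketch_negotiate_timed_out(enum sketch_tcp_status status,
				       unsigned long now,
				       unsigned long last_response)
{
	/* The comparison is only valid for spans below half the range. */
	assert(20 * SKETCH_HZ < ULONG_MAX / 2);

	return status == SKETCH_IN_NEGOTIATE &&
	       sketch_time_after(now, last_response + 20 * SKETCH_HZ);
}
```

A plain `now > last_response + timeout` comparison would give the wrong answer once the counter wraps, which is why the difference is taken first and then interpreted as a signed value.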
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Kelley mhklinux@outlook.com
[ Upstream commit 8fcc514809de41153b43ccbe1a0cdf7f72b78e7e ]
A Linux guest on Hyper-V gets the TSC frequency from a synthetic MSR, if available. In this case, set X86_FEATURE_TSC_KNOWN_FREQ so that Linux doesn't unnecessarily do refined TSC calibration when setting up the TSC clocksource.
With this change, a message such as this is no longer output during boot when the TSC is used as the clocksource:
[ 1.115141] tsc: Refined TSC clocksource calibration: 2918.408 MHz
Furthermore, the guest and host will have exactly the same view of the TSC frequency, which is important for features such as the TSC deadline timer that are emulated by the Hyper-V host.
Signed-off-by: Michael Kelley mhklinux@outlook.com Reviewed-by: Roman Kisel romank@linux.microsoft.com Link: https://lore.kernel.org/r/20240606025559.1631-1-mhklinux@outlook.com Signed-off-by: Wei Liu wei.liu@kernel.org Message-ID: 20240606025559.1631-1-mhklinux@outlook.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kernel/cpu/mshyperv.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c index 9b039e9635e40..542b818c0d20d 100644 --- a/arch/x86/kernel/cpu/mshyperv.c +++ b/arch/x86/kernel/cpu/mshyperv.c @@ -324,6 +324,7 @@ static void __init ms_hyperv_init_platform(void) ms_hyperv.misc_features & HV_FEATURE_FREQUENCY_MSRS_AVAILABLE) { x86_platform.calibrate_tsc = hv_get_tsc_khz; x86_platform.calibrate_cpu = hv_get_tsc_khz; + setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ); }
if (ms_hyperv.priv_high & HV_ISOLATION) {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: zhang jiao zhangjiao2@cmss.chinamobile.com
[ Upstream commit 5e5cc1eb65256e6017e3deec04f9806f2f317853 ]
Remove .*.cmd files when running 'make clean'.
Signed-off-by: zhang jiao zhangjiao2@cmss.chinamobile.com Reviewed-by: Saurabh Sengar ssengar@linux.microsoft.com Link: https://lore.kernel.org/r/20240902042103.5867-1-zhangjiao2@cmss.chinamobile.... Signed-off-by: Wei Liu wei.liu@kernel.org Message-ID: 20240902042103.5867-1-zhangjiao2@cmss.chinamobile.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/hv/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/hv/Makefile b/tools/hv/Makefile index fe770e679ae8f..5643058e2d377 100644 --- a/tools/hv/Makefile +++ b/tools/hv/Makefile @@ -47,7 +47,7 @@ $(OUTPUT)hv_fcopy_daemon: $(HV_FCOPY_DAEMON_IN)
clean: rm -f $(ALL_PROGRAMS) - find $(or $(OUTPUT),.) -name '*.o' -delete -o -name '.*.d' -delete + find $(or $(OUTPUT),.) -name '*.o' -delete -o -name '.*.d' -delete -o -name '.*.cmd' -delete
install: $(ALL_PROGRAMS) install -d -m 755 $(DESTDIR)$(sbindir); \
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hongyu Jin hongyu.jin@unisoc.com
[ Upstream commit f3c89983cb4fc00be64eb0d5cbcfcdf2cacb965e ]
Commit 82b74cac2849 ("blk-ioprio: Convert from rqos policy to direct call") pushed setting bio I/O priority down into blk_mq_submit_bio() -- which is too low within block core's submit_bio() because it skips setting I/O priority for block drivers that implement fops->submit_bio() (e.g. DM, MD, etc).
Fix this by moving bio_set_ioprio() up from blk-mq.c to blk-core.c and call it from submit_bio(). This ensures all block drivers call bio_set_ioprio() during initial bio submission.
Fixes: a78418e6a04c ("block: Always initialize bio IO priority on submit") Co-developed-by: Yibin Ding yibin.ding@unisoc.com Signed-off-by: Yibin Ding yibin.ding@unisoc.com Signed-off-by: Hongyu Jin hongyu.jin@unisoc.com Reviewed-by: Eric Biggers ebiggers@google.com Reviewed-by: Mikulas Patocka mpatocka@redhat.com [snitzer: revised commit header] Signed-off-by: Mike Snitzer snitzer@kernel.org Reviewed-by: Ming Lei ming.lei@redhat.com Link: https://lore.kernel.org/r/20240130202638.62600-2-snitzer@kernel.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- block/blk-core.c | 10 ++++++++++ block/blk-mq.c | 10 ---------- 2 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c index a4155f123ab38..94941e3ce2194 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -49,6 +49,7 @@ #include "blk-pm.h" #include "blk-cgroup.h" #include "blk-throttle.h" +#include "blk-ioprio.h"
struct dentry *blk_debugfs_root;
@@ -799,6 +800,14 @@ void submit_bio_noacct(struct bio *bio) } EXPORT_SYMBOL(submit_bio_noacct);
+static void bio_set_ioprio(struct bio *bio) +{ + /* Nobody set ioprio so far? Initialize it based on task's nice value */ + if (IOPRIO_PRIO_CLASS(bio->bi_ioprio) == IOPRIO_CLASS_NONE) + bio->bi_ioprio = get_current_ioprio(); + blkcg_set_ioprio(bio); +} + /** * submit_bio - submit a bio to the block device layer for I/O * @bio: The &struct bio which describes the I/O @@ -824,6 +833,7 @@ void submit_bio(struct bio *bio) count_vm_events(PGPGOUT, bio_sectors(bio)); }
+ bio_set_ioprio(bio); submit_bio_noacct(bio); } EXPORT_SYMBOL(submit_bio); diff --git a/block/blk-mq.c b/block/blk-mq.c index daf0e4f3444e7..542b28a2e6b0f 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -42,7 +42,6 @@ #include "blk-stat.h" #include "blk-mq-sched.h" #include "blk-rq-qos.h" -#include "blk-ioprio.h"
static DEFINE_PER_CPU(struct llist_head, blk_cpu_done);
@@ -2949,14 +2948,6 @@ static bool blk_mq_can_use_cached_rq(struct request *rq, struct blk_plug *plug, return true; }
-static void bio_set_ioprio(struct bio *bio) -{ - /* Nobody set ioprio so far? Initialize it based on task's nice value */ - if (IOPRIO_PRIO_CLASS(bio->bi_ioprio) == IOPRIO_CLASS_NONE) - bio->bi_ioprio = get_current_ioprio(); - blkcg_set_ioprio(bio); -} - /** * blk_mq_submit_bio - Create and send a request to block device. * @bio: Bio pointer. @@ -2980,7 +2971,6 @@ void blk_mq_submit_bio(struct bio *bio) blk_status_t ret;
bio = blk_queue_bounce(bio, q); - bio_set_ioprio(bio);
if (plug) { rq = rq_list_peek(&plug->cached_rq);
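The effect of moving bio_set_ioprio() in the block patch above can be modelled in userspace C. This is a rough sketch under stated assumptions: the sketch_* names are hypothetical, the priority encoding is a local stand-in for the kernel's ioprio macros, and the cgroup step (blkcg_set_ioprio()) is left out. It shows that setting the priority in the top-level submit path also covers drivers that handle bios themselves and never reach the blk-mq path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the ioprio encoding; assumed for illustration. */
#define SKETCH_CLASS_SHIFT 13
#define SKETCH_PRIO_CLASS(prio) ((prio) >> SKETCH_CLASS_SHIFT)
#define SKETCH_PRIO_VALUE(class, data) (((class) << SKETCH_CLASS_SHIFT) | (data))

enum {
	SKETCH_CLASS_NONE = 0,
	SKETCH_CLASS_RT,
	SKETCH_CLASS_BE,
	SKETCH_CLASS_IDLE
};

struct sketch_bio {
	unsigned short ioprio;
	bool bio_based_driver;	/* driver handles bios itself (DM/MD-like) */
	bool reached_mq;	/* bio went through the blk-mq path */
};

/* Initialize the priority from the task's value if nobody set it yet. */
static void sketch_set_ioprio(struct sketch_bio *bio, unsigned short task_prio)
{
	if (SKETCH_PRIO_CLASS(bio->ioprio) == SKETCH_CLASS_NONE)
		bio->ioprio = task_prio;
}

static void sketch_submit_bio(struct sketch_bio *bio, unsigned short task_prio)
{
	assert(bio != NULL);

	/* Set the priority here so that every kind of driver sees it. */
	sketch_set_ioprio(bio, task_prio);

	if (bio->bio_based_driver)
		return;		/* the driver's own hook takes over */

	bio->reached_mq = true;
}
```

If sketch_set_ioprio() were called only on the branch that sets reached_mq, a bio submitted to a bio-based driver would keep SKETCH_CLASS_NONE, which is the situation the commit message describes for DM and MD.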
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geert Uytterhoeven geert+renesas@glider.be
[ Upstream commit 5478a4f7b94414def7b56d2f18bc2ed9b0f3f1f2 ]
When the of_device_id entry for "elgin,jg10309-01" was added, the corresponding spi_device_id was forgotten, causing a warning message during boot-up:
SPI driver spidev has no spi_device_id for elgin,jg10309-01
Fix module autoloading and shut up the warning by adding the missing entry.
Fixes: 5f3eee1eef5d0edd ("spi: spidev: Add an entry for elgin,jg10309-01") Signed-off-by: Geert Uytterhoeven geert+renesas@glider.be Link: https://patch.msgid.link/54bbb9d8a8db7e52d13e266f2d4a9bcd8b42a98a.1725366625... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spidev.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/spi/spidev.c b/drivers/spi/spidev.c index 7ae032f8de63c..81a3cf9253452 100644 --- a/drivers/spi/spidev.c +++ b/drivers/spi/spidev.c @@ -694,6 +694,7 @@ static struct class *spidev_class; static const struct spi_device_id spidev_spi_ids[] = { { .name = "bh2228fv" }, { .name = "dh2228fv" }, + { .name = "jg10309-01" }, { .name = "ltc2488" }, { .name = "sx1301" }, { .name = "bk4" },
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ferry Meng mengferry@linux.alibaba.com
[ Upstream commit 9e3041fecdc8f78a5900c3aa51d3d756e73264d6 ]
Add a paranoia check to make sure the scan doesn't stray beyond the valid memory region containing ocfs2 xattr entries when looking for a match. This prevents out-of-bounds access in case of crafted images.
Link: https://lkml.kernel.org/r/20240520024024.1976129-1-joseph.qi@linux.alibaba.c... Signed-off-by: Ferry Meng mengferry@linux.alibaba.com Signed-off-by: Joseph Qi joseph.qi@linux.alibaba.com Reported-by: lei lu llfamsec@gmail.com Reviewed-by: Joseph Qi joseph.qi@linux.alibaba.com Cc: Mark Fasheh mark@fasheh.com Cc: Joel Becker jlbec@evilplan.org Cc: Junxiao Bi junxiao.bi@oracle.com Cc: Changwei Ge gechangwei@live.cn Cc: Gang He ghe@suse.com Cc: Jun Piao piaojun@huawei.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Stable-dep-of: af77c4fc1871 ("ocfs2: strict bound check before memcmp in ocfs2_xattr_find_entry()") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ocfs2/xattr.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c index 55699c5735413..61213b7e4dfbe 100644 --- a/fs/ocfs2/xattr.c +++ b/fs/ocfs2/xattr.c @@ -1066,7 +1066,7 @@ ssize_t ocfs2_listxattr(struct dentry *dentry, return i_ret + b_ret; }
-static int ocfs2_xattr_find_entry(int name_index, +static int ocfs2_xattr_find_entry(struct inode *inode, int name_index, const char *name, struct ocfs2_xattr_search *xs) { @@ -1080,6 +1080,10 @@ static int ocfs2_xattr_find_entry(int name_index, name_len = strlen(name); entry = xs->here; for (i = 0; i < le16_to_cpu(xs->header->xh_count); i++) { + if ((void *)entry >= xs->end) { + ocfs2_error(inode->i_sb, "corrupted xattr entries"); + return -EFSCORRUPTED; + } cmp = name_index - ocfs2_xattr_get_type(entry); if (!cmp) cmp = name_len - entry->xe_name_len; @@ -1170,7 +1174,7 @@ static int ocfs2_xattr_ibody_get(struct inode *inode, xs->base = (void *)xs->header; xs->here = xs->header->xh_entries;
- ret = ocfs2_xattr_find_entry(name_index, name, xs); + ret = ocfs2_xattr_find_entry(inode, name_index, name, xs); if (ret) return ret; size = le64_to_cpu(xs->here->xe_value_size); @@ -2702,7 +2706,7 @@ static int ocfs2_xattr_ibody_find(struct inode *inode,
/* Find the named attribute. */ if (oi->ip_dyn_features & OCFS2_INLINE_XATTR_FL) { - ret = ocfs2_xattr_find_entry(name_index, name, xs); + ret = ocfs2_xattr_find_entry(inode, name_index, name, xs); if (ret && ret != -ENODATA) return ret; xs->not_found = ret; @@ -2837,7 +2841,7 @@ static int ocfs2_xattr_block_find(struct inode *inode, xs->end = (void *)(blk_bh->b_data) + blk_bh->b_size; xs->here = xs->header->xh_entries;
- ret = ocfs2_xattr_find_entry(name_index, name, xs); + ret = ocfs2_xattr_find_entry(inode, name_index, name, xs); } else ret = ocfs2_xattr_index_block_find(inode, blk_bh, name_index,
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ferry Meng mengferry@linux.alibaba.com
[ Upstream commit af77c4fc1871847b528d58b7fdafb4aa1f6a9262 ]
An xattr in ocfs2 may be 'non-indexed', in which case it is saved with additional space requested. It's better to check whether the memory is out of bounds before the memcmp, although this possibility mainly comes from crafted poisonous images.
Link: https://lkml.kernel.org/r/20240520024024.1976129-2-joseph.qi@linux.alibaba.c... Signed-off-by: Ferry Meng mengferry@linux.alibaba.com Signed-off-by: Joseph Qi joseph.qi@linux.alibaba.com Reported-by: lei lu llfamsec@gmail.com Reviewed-by: Joseph Qi joseph.qi@linux.alibaba.com Cc: Changwei Ge gechangwei@live.cn Cc: Gang He ghe@suse.com Cc: Joel Becker jlbec@evilplan.org Cc: Jun Piao piaojun@huawei.com Cc: Junxiao Bi junxiao.bi@oracle.com Cc: Mark Fasheh mark@fasheh.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ocfs2/xattr.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c index 61213b7e4dfbe..3ba40f16ef056 100644 --- a/fs/ocfs2/xattr.c +++ b/fs/ocfs2/xattr.c @@ -1072,7 +1072,7 @@ static int ocfs2_xattr_find_entry(struct inode *inode, int name_index, { struct ocfs2_xattr_entry *entry; size_t name_len; - int i, cmp = 1; + int i, name_offset, cmp = 1;
if (name == NULL) return -EINVAL; @@ -1087,10 +1087,15 @@ static int ocfs2_xattr_find_entry(struct inode *inode, int name_index, cmp = name_index - ocfs2_xattr_get_type(entry); if (!cmp) cmp = name_len - entry->xe_name_len; - if (!cmp) - cmp = memcmp(name, (xs->base + - le16_to_cpu(entry->xe_name_offset)), - name_len); + if (!cmp) { + name_offset = le16_to_cpu(entry->xe_name_offset); + if ((xs->base + name_offset + name_len) > xs->end) { + ocfs2_error(inode->i_sb, + "corrupted xattr entries"); + return -EFSCORRUPTED; + } + cmp = memcmp(name, (xs->base + name_offset), name_len); + } if (cmp == 0) break; entry += 1;
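The two ocfs2 checks (the entry itself must lie inside the xattr region, and the name bytes must lie inside it before memcmp) can be illustrated with a small userspace sketch. This is not the ocfs2 code: struct demo_entry, demo_find_entry() and the DEMO_* return codes are hypothetical names made up for illustration, and the sketch works with byte offsets into a flat buffer rather than with the xs->base / xs->end pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an on-disk xattr entry. */
struct demo_entry {
	uint16_t name_offset;	/* offset of the name from the region start */
	uint16_t name_len;	/* length of the name in bytes */
};

enum { DEMO_FOUND = 0, DEMO_ENODATA = -1, DEMO_ECORRUPT = -2 };

/*
 * Scan `count` entries stored at `entries_off` inside buf[0..buf_len).
 * `count` is untrusted (it comes from the image), so every access is
 * validated against the region before it is made.
 */
static int demo_find_entry(const unsigned char *buf, size_t buf_len,
			   size_t entries_off, unsigned int count,
			   const char *name, unsigned int *index_out)
{
	size_t name_len = strlen(name);
	unsigned int i;

	for (i = 0; i < count; i++) {
		size_t off = entries_off + (size_t)i * sizeof(struct demo_entry);
		struct demo_entry e;

		/* Check 1: the entry itself must lie inside the region. */
		if (off > buf_len || buf_len - off < sizeof(e))
			return DEMO_ECORRUPT;
		memcpy(&e, buf + off, sizeof(e));

		if (e.name_len != name_len)
			continue;

		/* Check 2: the name bytes must lie inside the region too. */
		if (e.name_offset > buf_len ||
		    buf_len - e.name_offset < name_len)
			return DEMO_ECORRUPT;

		if (memcmp(name, buf + e.name_offset, name_len) == 0) {
			*index_out = i;
			return DEMO_FOUND;
		}
	}
	return DEMO_ENODATA;
}
```

The subtraction form (buf_len - off < size) is used so that a huge offset cannot wrap the comparison; that is a choice of this sketch, not something taken from the patches.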
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Chinner dchinner@redhat.com
[ Upstream commit 52f31ed228212ba572c44e15e818a3a5c74122c0 ]
The dquot shrinker does not check for XFS_DQFLAG_FREEING, resulting in a UAF if the shrinker races with some other dquot freeing mechanism that sets XFS_DQFLAG_FREEING before the dquot is removed from the LRU. This can occur if a dquot purge races with drop_caches.
Reported-by: syzbot+912776840162c13db1a3@syzkaller.appspotmail.com Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_qm.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-)
--- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -423,6 +423,14 @@ xfs_qm_dquot_isolate( goto out_miss_busy;
/* + * If something else is freeing this dquot and hasn't yet removed it + * from the LRU, leave it for the freeing task to complete the freeing + * process rather than risk it being free from under us here. + */ + if (dqp->q_flags & XFS_DQFLAG_FREEING) + goto out_miss_unlock; + + /* * This dquot has acquired a reference in the meantime remove it from * the freelist and try again. */ @@ -441,10 +449,8 @@ xfs_qm_dquot_isolate( * skip it so there is time for the IO to complete before we try to * reclaim it again on the next LRU pass. */ - if (!xfs_dqflock_nowait(dqp)) { - xfs_dqunlock(dqp); - goto out_miss_busy; - } + if (!xfs_dqflock_nowait(dqp)) + goto out_miss_unlock;
if (XFS_DQ_IS_DIRTY(dqp)) { struct xfs_buf *bp = NULL; @@ -478,6 +484,8 @@ xfs_qm_dquot_isolate( XFS_STATS_INC(dqp->q_mount, xs_qm_dqreclaims); return LRU_REMOVED;
+out_miss_unlock: + xfs_dqunlock(dqp); out_miss_busy: trace_xfs_dqreclaim_busy(dqp); XFS_STATS_INC(dqp->q_mount, xs_qm_dqreclaim_misses);
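The control flow this patch gives xfs_qm_dquot_isolate() (one shared unlock label that falls through to the busy label) can be sketched in userspace as follows. Everything here is a hypothetical model: struct demo_dquot, DEMO_DQFLAG_FREEING and demo_isolate() are invented names, the locks are plain booleans, and the dirty-buffer handling of the real function is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_DQFLAG_FREEING 0x1u	/* stand-in for XFS_DQFLAG_FREEING */

enum demo_lru_status { DEMO_LRU_SKIP, DEMO_LRU_REMOVED };

/* Hypothetical model of a dquot sitting on the LRU. */
struct demo_dquot {
	bool locked;		/* models the dquot lock */
	bool flush_locked;	/* models a flush already in progress */
	unsigned int flags;
	unsigned int misses;	/* models the reclaim-miss statistic */
};

static enum demo_lru_status demo_isolate(struct demo_dquot *dqp)
{
	/* Trylock failed: nothing to undo, just count the miss. */
	if (dqp->locked)
		goto out_miss_busy;
	dqp->locked = true;

	/* Someone else is already freeing it: leave it to them. */
	if (dqp->flags & DEMO_DQFLAG_FREEING)
		goto out_miss_unlock;

	/* Flush in progress: skip now, retry on a later LRU pass. */
	if (dqp->flush_locked)
		goto out_miss_unlock;

	/* We own the teardown from here on. */
	dqp->flags |= DEMO_DQFLAG_FREEING;
	dqp->locked = false;
	return DEMO_LRU_REMOVED;

out_miss_unlock:
	dqp->locked = false;
out_miss_busy:
	dqp->misses++;
	return DEMO_LRU_SKIP;
}
```

Stacking the labels keeps every "locked but not reclaimable" exit on one unlock path, which is also how the patch folds the old open-coded xfs_dqunlock() call into the new label.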
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wu Guanghao wuguanghao3@huawei.com
[ Upstream commit 4da112513c01d7d0acf1025b8764349d46e177d6 ]
We are doing a test about deleting a large number of files when memory is low. A deadlock problem was found.
[ 1240.279183] -> #1 (fs_reclaim){+.+.}-{0:0}:
[ 1240.280450]        lock_acquire+0x197/0x460
[ 1240.281548]        fs_reclaim_acquire.part.0+0x20/0x30
[ 1240.282625]        kmem_cache_alloc+0x2b/0x940
[ 1240.283816]        xfs_trans_alloc+0x8a/0x8b0
[ 1240.284757]        xfs_inactive_ifree+0xe4/0x4e0
[ 1240.285935]        xfs_inactive+0x4e9/0x8a0
[ 1240.286836]        xfs_inodegc_worker+0x160/0x5e0
[ 1240.287969]        process_one_work+0xa19/0x16b0
[ 1240.289030]        worker_thread+0x9e/0x1050
[ 1240.290131]        kthread+0x34f/0x460
[ 1240.290999]        ret_from_fork+0x22/0x30
[ 1240.291905]
[ 1240.291905] -> #0 ((work_completion)(&gc->work)){+.+.}-{0:0}:
[ 1240.293569]        check_prev_add+0x160/0x2490
[ 1240.294473]        __lock_acquire+0x2c4d/0x5160
[ 1240.295544]        lock_acquire+0x197/0x460
[ 1240.296403]        __flush_work+0x6bc/0xa20
[ 1240.297522]        xfs_inode_mark_reclaimable+0x6f0/0xdc0
[ 1240.298649]        destroy_inode+0xc6/0x1b0
[ 1240.299677]        dispose_list+0xe1/0x1d0
[ 1240.300567]        prune_icache_sb+0xec/0x150
[ 1240.301794]        super_cache_scan+0x2c9/0x480
[ 1240.302776]        do_shrink_slab+0x3f0/0xaa0
[ 1240.303671]        shrink_slab+0x170/0x660
[ 1240.304601]        shrink_node+0x7f7/0x1df0
[ 1240.305515]        balance_pgdat+0x766/0xf50
[ 1240.306657]        kswapd+0x5bd/0xd20
[ 1240.307551]        kthread+0x34f/0x460
[ 1240.308346]        ret_from_fork+0x22/0x30
[ 1240.309247]
[ 1240.309247] other info that might help us debug this:
[ 1240.309247]
[ 1240.310944]  Possible unsafe locking scenario:
[ 1240.310944]
[ 1240.312379]        CPU0                    CPU1
[ 1240.313363]        ----                    ----
[ 1240.314433]   lock(fs_reclaim);
[ 1240.315107]                                lock((work_completion)(&gc->work));
[ 1240.316828]                                lock(fs_reclaim);
[ 1240.318088]   lock((work_completion)(&gc->work));
[ 1240.319203]
[ 1240.319203]  *** DEADLOCK ***
...
[ 2438.431081] Workqueue: xfs-inodegc/sda xfs_inodegc_worker
[ 2438.432089] Call Trace:
[ 2438.432562]  __schedule+0xa94/0x1d20
[ 2438.435787]  schedule+0xbf/0x270
[ 2438.436397]  schedule_timeout+0x6f8/0x8b0
[ 2438.445126]  wait_for_completion+0x163/0x260
[ 2438.448610]  __flush_work+0x4c4/0xa40
[ 2438.455011]  xfs_inode_mark_reclaimable+0x6ef/0xda0
[ 2438.456695]  destroy_inode+0xc6/0x1b0
[ 2438.457375]  dispose_list+0xe1/0x1d0
[ 2438.458834]  prune_icache_sb+0xe8/0x150
[ 2438.461181]  super_cache_scan+0x2b3/0x470
[ 2438.461950]  do_shrink_slab+0x3cf/0xa50
[ 2438.462687]  shrink_slab+0x17d/0x660
[ 2438.466392]  shrink_node+0x87e/0x1d40
[ 2438.467894]  do_try_to_free_pages+0x364/0x1300
[ 2438.471188]  try_to_free_pages+0x26c/0x5b0
[ 2438.473567]  __alloc_pages_slowpath.constprop.136+0x7aa/0x2100
[ 2438.482577]  __alloc_pages+0x5db/0x710
[ 2438.485231]  alloc_pages+0x100/0x200
[ 2438.485923]  allocate_slab+0x2c0/0x380
[ 2438.486623]  ___slab_alloc+0x41f/0x690
[ 2438.490254]  __slab_alloc+0x54/0x70
[ 2438.491692]  kmem_cache_alloc+0x23e/0x270
[ 2438.492437]  xfs_trans_alloc+0x88/0x880
[ 2438.493168]  xfs_inactive_ifree+0xe2/0x4e0
[ 2438.496419]  xfs_inactive+0x4eb/0x8b0
[ 2438.497123]  xfs_inodegc_worker+0x16b/0x5e0
[ 2438.497918]  process_one_work+0xbf7/0x1a20
[ 2438.500316]  worker_thread+0x8c/0x1060
[ 2438.504938]  ret_from_fork+0x22/0x30
When memory is insufficient, an allocation in xfs_inodegc_worker can trigger memory reclamation, which may then call flush_work() to wait for the inodegc work to complete. This causes a deadlock.
So use memalloc_nofs_save() to avoid triggering memory reclamation in xfs_inodegc_worker.
Signed-off-by: Wu Guanghao wuguanghao3@huawei.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_icache.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -1858,6 +1858,7 @@ xfs_inodegc_worker( struct xfs_inodegc, work); struct llist_node *node = llist_del_all(&gc->list); struct xfs_inode *ip, *n; + unsigned int nofs_flag;
ASSERT(gc->cpu == smp_processor_id());
@@ -1866,6 +1867,13 @@ xfs_inodegc_worker( if (!node) return;
+ /* + * We can allocate memory here while doing writeback on behalf of + * memory reclaim. To avoid memory allocation deadlocks set the + * task-wide nofs context for the following operations. + */ + nofs_flag = memalloc_nofs_save(); + ip = llist_entry(node, struct xfs_inode, i_gclist); trace_xfs_inodegc_worker(ip->i_mount, READ_ONCE(gc->shrinker_hits));
@@ -1874,6 +1882,8 @@ xfs_inodegc_worker( xfs_iflags_set(ip, XFS_INACTIVATING); xfs_inodegc_inactivate(ip); } + + memalloc_nofs_restore(nofs_flag); }
/*
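The save/restore pairing used in xfs_inodegc_worker() can be modelled in userspace to show why the saved value is passed back to the restore call: it lets scopes nest without the inner one clearing the outer one's setting. The sketch below is only a model. demo_task_flags, DEMO_NOFS and the demo_* helpers are hypothetical names and none of this is the kernel implementation of memalloc_nofs_save().

```c
#include <assert.h>

#define DEMO_NOFS 0x1u			/* hypothetical "no FS reclaim" bit */

static unsigned int demo_task_flags;	/* models per-task flags */

/* Set the bit and hand back its previous state. */
static unsigned int demo_nofs_save(void)
{
	unsigned int old = demo_task_flags & DEMO_NOFS;

	demo_task_flags |= DEMO_NOFS;
	return old;
}

/* Put the bit back to whatever it was before the matching save. */
static void demo_nofs_restore(unsigned int old)
{
	demo_task_flags = (demo_task_flags & ~DEMO_NOFS) | old;
}

/* An allocation made now may recurse into FS reclaim only if unset. */
static int demo_alloc_may_enter_fs_reclaim(void)
{
	return !(demo_task_flags & DEMO_NOFS);
}

/* Models the worker body: everything between save and restore is NOFS. */
static int demo_worker_step(void)
{
	unsigned int nofs_flag = demo_nofs_save();
	int may_reclaim = demo_alloc_may_enter_fs_reclaim();

	demo_nofs_restore(nofs_flag);
	return may_reclaim;
}
```

With the scope in place, an allocation made inside the worker cannot re-enter filesystem reclaim, so it cannot end up waiting on the worker's own work item.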
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wengang Wang wen.gang.wang@oracle.com
[ Upstream commit 601a27ea09a317d0fe2895df7d875381fb393041 ]
In xfs_extent_busy_update_extent() cases 6 and 7, whenever bno is modified on a busy extent, the relevant length has to be modified accordingly.
Signed-off-by: Wengang Wang wen.gang.wang@oracle.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_extent_busy.c | 1 + 1 file changed, 1 insertion(+)
--- a/fs/xfs/xfs_extent_busy.c +++ b/fs/xfs/xfs_extent_busy.c @@ -236,6 +236,7 @@ xfs_extent_busy_update_extent( * */ busyp->bno = fend; + busyp->length = bend - fend; } else if (bbno < fbno) { /* * Case 8:
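A small worked example of the arithmetic, assuming the usual half-open ranges (a busy extent [bbno, bend) whose front is overlapped by a reused range ending at fend). struct demo_busy and demo_trim_front() are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a busy extent: blocks [bno, bno + length). */
struct demo_busy {
	uint32_t bno;
	uint32_t length;
};

/*
 * The front of the busy extent is being reused up to (not including)
 * `fend`. Moving the start forward without shrinking the length would
 * push the end of the extent past its original end.
 */
static void demo_trim_front(struct demo_busy *busy, uint32_t fend)
{
	uint32_t bend = busy->bno + busy->length;

	assert(fend > busy->bno && fend < bend);
	busy->bno = fend;
	busy->length = bend - fend;	/* the line the patch adds */
}
```

For a busy extent covering blocks 100..149 (bno = 100, length = 50) and fend = 120, the result is bno = 120, length = 30, so the extent still ends at block 150. Without the length update it would claim blocks up to 170.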
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Chinner dchinner@redhat.com
[ Upstream commit c85007e2e3942da1f9361e4b5a9388ea3a8dcc5b ]
When we split a BMBT due to record insertion, we offload it to a worker thread because we can be deep in the stack when we try to allocate a new block for the BMBT. Allocation can use several kilobytes of stack (full memory reclaim, swap and/or IO path can end up on the stack during allocation) and we can already be several kilobytes deep in the stack when we need to split the BMBT.
A recent workload demonstrated a deadlock in this BMBT split offload. It requires several things to happen at once:
1. two inodes need a BMBT split at the same time, one must be unwritten extent conversion from IO completion, the other must be from extent allocation.
2. there must be no xfs_alloc_wq worker threads available in the worker pool.
3. There must be sustained severe memory shortages such that new kworker threads cannot be allocated to the xfs_alloc_wq pool for both threads that need split work to be run.
4. The split work from the unwritten extent conversion must run first.
5. when the BMBT block allocation runs from the split work, it must loop over all AGs and not be able to either trylock an AGF successfully, or each AGF it is able to lock has no space available for a single block allocation.
6. The BMBT allocation must then attempt to lock the AGF that the second task queued to the rescuer thread already has locked before it finds an AGF it can allocate from.
At this point, we have an ABBA deadlock between tasks queued on the xfs_alloc_wq rescuer thread and a locked AGF. i.e. The queued task holding the AGF lock can't be run by the rescuer thread until the task the rescuer thread is running gets the AGF lock....
This is a highly improbable series of events, but there it is.
There are a couple of ways to fix this, but the easiest way is to ensure that we only punt tasks with a locked AGF that holds enough space for the BMBT block allocations to the worker thread.
This works for unwritten extent conversion in IO completion (which doesn't have a locked AGF and space reservations) because we have tight control over the IO completion stack. It is typically only 6 functions deep when xfs_btree_split() is called because we've already offloaded the IO completion work to a worker thread and hence we don't need to worry about stack overruns here.
The other place we can be called for a BMBT split without a preceding allocation is __xfs_bunmapi() when punching out the center of an existing extent. We don't remove extents in the IO path, so these operations don't tend to be called with a lot of stack consumed. Hence we don't really need to ship the split off to a worker thread in these cases, either.
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/libxfs/xfs_btree.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-)
--- a/fs/xfs/libxfs/xfs_btree.c +++ b/fs/xfs/libxfs/xfs_btree.c @@ -2913,9 +2913,22 @@ xfs_btree_split_worker( }
/* - * BMBT split requests often come in with little stack to work on. Push + * BMBT split requests often come in with little stack to work on so we push * them off to a worker thread so there is lots of stack to use. For the other * btree types, just call directly to avoid the context switch overhead here. + * + * Care must be taken here - the work queue rescuer thread introduces potential + * AGF <> worker queue deadlocks if the BMBT block allocation has to lock new + * AGFs to allocate blocks. A task being run by the rescuer could attempt to + * lock an AGF that is already locked by a task queued to run by the rescuer, + * resulting in an ABBA deadlock as the rescuer cannot run the lock holder to + * release it until the current thread it is running gains the lock. + * + * To avoid this issue, we only ever queue BMBT splits that don't have an AGF + * already locked to allocate from. The only place that doesn't hold an AGF + * locked is unwritten extent conversion at IO completion, but that has already + * been offloaded to a worker thread and hence has no stack consumption issues + * we have to worry about. */ STATIC int /* error */ xfs_btree_split( @@ -2929,7 +2942,8 @@ xfs_btree_split( struct xfs_btree_split_args args; DECLARE_COMPLETION_ONSTACK(done);
- if (cur->bc_btnum != XFS_BTNUM_BMAP) + if (cur->bc_btnum != XFS_BTNUM_BMAP || + cur->bc_tp->t_firstblock == NULLFSBLOCK) return __xfs_btree_split(cur, level, ptrp, key, curp, stat);
args.cur = cur;
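The decision the patch adds to xfs_btree_split() reduces to a two-input predicate, sketched here under the assumption that "t_firstblock != NULLFSBLOCK" can be read as "this transaction already holds a locked AGF". The enum and function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_btree_type { DEMO_BT_BMAP, DEMO_BT_OTHER };

/*
 * Return true if the split should be punted to the worker thread,
 * false if it should be run directly on the current stack.
 */
static bool demo_should_offload_split(enum demo_btree_type type,
				      bool agf_already_locked)
{
	/* Other btree types always split inline. */
	if (type != DEMO_BT_BMAP)
		return false;

	/*
	 * A BMBT split with no AGF locked yet would have to lock a new
	 * AGF from the worker, which is the deadlock-prone case, so it
	 * runs inline instead.
	 */
	return agf_already_locked;
}
```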
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Chinner dchinner@redhat.com
[ Upstream commit 1dd0510f6d4b85616a36aabb9be38389467122d9 ]
I've recently encountered an ABBA deadlock with g/476. The upcoming changes seem to make this much easier to hit, but the underlying problem is a pre-existing one.
Essentially, if we select an AG for allocation, then lock the AGF and then fail to allocate for some reason (e.g. minimum length requirements cannot be satisfied), then we drop out of the allocation with the AGF still locked.
The caller then modifies the allocation constraints - usually loosening them up - and tries again. This can result in trying to access AGFs that are lower than the AGF we already have locked from the failed attempt. e.g. the failed attempt skipped several AGs before failing, so we have locked an AG higher than the start AG. Retrying the allocation from the start AG then causes us to violate AGF lock ordering and this can lead to deadlocks.
The deadlock exists even if allocation succeeds - we can do followup allocations in the same transaction for BMBT blocks that aren't guaranteed to be in the same AG as the original, and can move into higher AGs. Hence we really need to move the tp->t_firstblock tracking down into xfs_alloc_vextent() where it can be set when we exit with a locked AG.
xfs_alloc_vextent() can also check there if the requested allocation falls within the allowed range of AGs set by tp->t_firstblock. If we can't allocate within the range set, we have to fail the allocation. If we are allowed to do non-blocking AGF locking, we can ignore the AG locking order limitations as we can use try-locks for the first iteration over the requested AG range.
This invalidates a set of post allocation asserts that check that the allocation is always above tp->t_firstblock if it is set. Because we can use try-locks to avoid the deadlock in some circumstances, having a pre-existing locked AGF doesn't always prevent allocation from lower order AGFs. Hence those ASSERTs need to be removed.
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Allison Henderson allison.henderson@oracle.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/libxfs/xfs_alloc.c | 69 ++++++++++++++++++++++++++++++++++++++-------- fs/xfs/libxfs/xfs_bmap.c | 14 --------- fs/xfs/xfs_trace.h | 1 3 files changed, 58 insertions(+), 26 deletions(-)
--- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -3164,10 +3164,13 @@ xfs_alloc_vextent( xfs_alloctype_t type; /* input allocation type */ int bump_rotor = 0; xfs_agnumber_t rotorstep = xfs_rotorstep; /* inode32 agf stepper */ + xfs_agnumber_t minimum_agno = 0;
mp = args->mp; type = args->otype = args->type; args->agbno = NULLAGBLOCK; + if (args->tp->t_firstblock != NULLFSBLOCK) + minimum_agno = XFS_FSB_TO_AGNO(mp, args->tp->t_firstblock); /* * Just fix this up, for the case where the last a.g. is shorter * (or there's only one a.g.) and the caller couldn't easily figure @@ -3201,6 +3204,13 @@ xfs_alloc_vextent( */ args->agno = XFS_FSB_TO_AGNO(mp, args->fsbno); args->pag = xfs_perag_get(mp, args->agno); + + if (minimum_agno > args->agno) { + trace_xfs_alloc_vextent_skip_deadlock(args); + error = 0; + break; + } + error = xfs_alloc_fix_freelist(args, 0); if (error) { trace_xfs_alloc_vextent_nofix(args); @@ -3232,6 +3242,8 @@ xfs_alloc_vextent( case XFS_ALLOCTYPE_FIRST_AG: /* * Rotate through the allocation groups looking for a winner. + * If we are blocking, we must obey minimum_agno contraints for + * avoiding ABBA deadlocks on AGF locking. */ if (type == XFS_ALLOCTYPE_FIRST_AG) { /* @@ -3239,7 +3251,7 @@ xfs_alloc_vextent( */ args->agno = XFS_FSB_TO_AGNO(mp, args->fsbno); args->type = XFS_ALLOCTYPE_THIS_AG; - sagno = 0; + sagno = minimum_agno; flags = 0; } else { /* @@ -3248,6 +3260,7 @@ xfs_alloc_vextent( args->agno = sagno = XFS_FSB_TO_AGNO(mp, args->fsbno); flags = XFS_ALLOC_FLAG_TRYLOCK; } + /* * Loop over allocation groups twice; first time with * trylock set, second time without. @@ -3276,19 +3289,21 @@ xfs_alloc_vextent( if (args->agno == sagno && type == XFS_ALLOCTYPE_START_BNO) args->type = XFS_ALLOCTYPE_THIS_AG; + /* - * For the first allocation, we can try any AG to get - * space. However, if we already have allocated a - * block, we don't want to try AGs whose number is below - * sagno. Otherwise, we may end up with out-of-order - * locking of AGF, which might cause deadlock. - */ + * If we are try-locking, we can't deadlock on AGF + * locks, so we can wrap all the way back to the first + * AG. 
Otherwise, wrap back to the start AG so we can't + * deadlock, and let the end of scan handler decide what + * to do next. + */ if (++(args->agno) == mp->m_sb.sb_agcount) { - if (args->tp->t_firstblock != NULLFSBLOCK) - args->agno = sagno; - else + if (flags & XFS_ALLOC_FLAG_TRYLOCK) args->agno = 0; + else + args->agno = sagno; } + /* * Reached the starting a.g., must either be done * or switch to non-trylock mode. @@ -3300,7 +3315,14 @@ xfs_alloc_vextent( break; }
+ /* + * Blocking pass next, so we must obey minimum + * agno constraints to avoid ABBA AGF deadlocks. + */ flags = 0; + if (minimum_agno > sagno) + sagno = minimum_agno; + if (type == XFS_ALLOCTYPE_START_BNO) { args->agbno = XFS_FSB_TO_AGBNO(mp, args->fsbno); @@ -3322,9 +3344,9 @@ xfs_alloc_vextent( ASSERT(0); /* NOTREACHED */ } - if (args->agbno == NULLAGBLOCK) + if (args->agbno == NULLAGBLOCK) { args->fsbno = NULLFSBLOCK; - else { + } else { args->fsbno = XFS_AGB_TO_FSB(mp, args->agno, args->agbno); #ifdef DEBUG ASSERT(args->len >= args->minlen); @@ -3335,6 +3357,29 @@ xfs_alloc_vextent( #endif
} + + /* + * We end up here with a locked AGF. If we failed, the caller is likely + * going to try to allocate again with different parameters, and that + * can widen the AGs that are searched for free space. If we have to do + * BMBT block allocation, we have to do a new allocation. + * + * Hence leaving this function with the AGF locked opens up potential + * ABBA AGF deadlocks because a future allocation attempt in this + * transaction may attempt to lock a lower number AGF. + * + * We can't release the AGF until the transaction is commited, so at + * this point we must update the "firstblock" tracker to point at this + * AG if the tracker is empty or points to a lower AG. This allows the + * next allocation attempt to be modified appropriately to avoid + * deadlocks. + */ + if (args->agbp && + (args->tp->t_firstblock == NULLFSBLOCK || + args->pag->pag_agno > minimum_agno)) { + args->tp->t_firstblock = XFS_AGB_TO_FSB(mp, + args->pag->pag_agno, 0); + } xfs_perag_put(args->pag); return 0; error0: --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -3413,21 +3413,7 @@ xfs_bmap_process_allocated_extent( xfs_fileoff_t orig_offset, xfs_extlen_t orig_length) { - int nullfb; - - nullfb = ap->tp->t_firstblock == NULLFSBLOCK; - - /* - * check the allocation happened at the same or higher AG than - * the first block that was allocated. 
- */ - ASSERT(nullfb || - XFS_FSB_TO_AGNO(args->mp, ap->tp->t_firstblock) <= - XFS_FSB_TO_AGNO(args->mp, args->fsbno)); - ap->blkno = args->fsbno; - if (nullfb) - ap->tp->t_firstblock = args->fsbno; ap->length = args->len; /* * If the extent size hint is active, we tried to round the --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -1877,6 +1877,7 @@ DEFINE_ALLOC_EVENT(xfs_alloc_small_noten DEFINE_ALLOC_EVENT(xfs_alloc_small_done); DEFINE_ALLOC_EVENT(xfs_alloc_small_error); DEFINE_ALLOC_EVENT(xfs_alloc_vextent_badargs); +DEFINE_ALLOC_EVENT(xfs_alloc_vextent_skip_deadlock); DEFINE_ALLOC_EVENT(xfs_alloc_vextent_nofix); DEFINE_ALLOC_EVENT(xfs_alloc_vextent_noagbp); DEFINE_ALLOC_EVENT(xfs_alloc_vextent_loopfailed);
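The ordering rule described in this commit message can be reduced to two small helpers: one that decides whether an AGF may be locked given what the transaction already holds, and one that advances the tracker after a lock is taken. This is a deliberately simplified sketch of the rule only and does not reproduce the scan loop in xfs_alloc_vextent(). struct demo_trans, DEMO_NO_AG and both functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NO_AG (-1)	/* models tp->t_firstblock == NULLFSBLOCK */

/* Hypothetical model of the per-transaction "firstblock" tracker. */
struct demo_trans {
	int lowest_allowed_ag;
};

/* May this transaction lock the AGF of `agno`? */
static bool demo_may_lock_ag(const struct demo_trans *tp, int agno,
			     bool trylock)
{
	/* Nothing locked yet: any AG is fine. */
	if (tp->lowest_allowed_ag == DEMO_NO_AG)
		return true;
	/* A try-lock never blocks, so it cannot take part in an ABBA cycle. */
	if (trylock)
		return true;
	/* Blocking locks must be taken in ascending AG order. */
	return agno >= tp->lowest_allowed_ag;
}

/* Record that the transaction now holds the AGF of `agno`. */
static void demo_note_locked_ag(struct demo_trans *tp, int agno)
{
	if (tp->lowest_allowed_ag == DEMO_NO_AG ||
	    agno > tp->lowest_allowed_ag)
		tp->lowest_allowed_ag = agno;
}
```

The tracker only ever moves upward, which mirrors the condition the patch uses when it updates tp->t_firstblock on exit with a locked AGF.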
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Chinner dchinner@redhat.com
[ Upstream commit f08f984c63e9980614ae3a0a574b31eaaef284b2 ]
When an XFS filesystem has free inodes in chunks already allocated on disk, it will still allocate new inode chunks if the target AG has no free inodes in it. Normally, this is a good idea as it preserves locality of all the inodes in a given directory.
However, at ENOSPC this can lead to using the last few remaining free filesystem blocks to allocate a new chunk when there are many, many free inodes that could be allocated without consuming free space. This results in speeding up the consumption of the last few blocks and inode create operations then returning ENOSPC when there are free inodes available because we don't have enough blocks left in the filesystem for directory creation reservations to proceed.
Hence when we are near ENOSPC, we should be attempting to preserve the remaining blocks for directory block allocation rather than using them for unnecessary inode chunk creation.
This particular behaviour is exposed by xfs/294, when it drives to ENOSPC on empty file creation whilst there are still thousands of free inodes available for allocation in other AGs in the filesystem.
Hence, when we are within 1% of ENOSPC, change the inode allocation behaviour to prefer to use existing free inodes over allocating new inode chunks, even though it results in poorer locality of the data set. It is more important for the allocations to be space efficient near ENOSPC than to have optimal locality for performance, so let's modify the inode AG selection code to reflect that fact.
This allows generic/294 to not only pass with this allocator rework patchset, but to increase the number of post-ENOSPC empty inode allocations from ~600 to ~9080 before we hit ENOSPC on the directory create transaction reservation.
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Allison Henderson allison.henderson@oracle.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/libxfs/xfs_ialloc.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)
--- a/fs/xfs/libxfs/xfs_ialloc.c +++ b/fs/xfs/libxfs/xfs_ialloc.c @@ -1737,6 +1737,7 @@ xfs_dialloc( struct xfs_perag *pag; struct xfs_ino_geometry *igeo = M_IGEO(mp); bool ok_alloc = true; + bool low_space = false; int flags; xfs_ino_t ino;
@@ -1768,6 +1769,20 @@ xfs_dialloc( }
/* + * If we are near to ENOSPC, we want to prefer allocation from AGs that + * have free inodes in them rather than use up free space allocating new + * inode chunks. Hence we turn off allocation for the first non-blocking + * pass through the AGs if we are near ENOSPC to consume free inodes + * that we can immediately allocate, but then we allow allocation on the + * second pass if we fail to find an AG with free inodes in it. + */ + if (percpu_counter_read_positive(&mp->m_fdblocks) < + mp->m_low_space[XFS_LOWSP_1_PCNT]) { + ok_alloc = false; + low_space = true; + } + + /* * Loop until we find an allocation group that either has free inodes * or in which we can allocate some inodes. Iterate through the * allocation groups upward, wrapping at the end. @@ -1795,6 +1810,8 @@ xfs_dialloc( break; } flags = 0; + if (low_space) + ok_alloc = true; } xfs_perag_put(pag); }
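The two-pass selection this patch introduces can be sketched as a standalone function: near ENOSPC the first pass only accepts AGs that already have free inodes, and new chunk allocation is allowed again on the second pass. This is a model under stated assumptions, not xfs_dialloc(). struct demo_ag, DEMO_CHUNK_BLOCKS and demo_pick_ag() are hypothetical, and the real code's locking and try-lock flags are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-AG summary. */
struct demo_ag {
	unsigned int free_inodes;
	unsigned int free_blocks;
};

#define DEMO_CHUNK_BLOCKS 8u	/* assumed cost of a new inode chunk */

/*
 * Pick an AG to allocate an inode from, scanning upward from `start`
 * and wrapping. Returns the AG index, or -1 if nothing is available.
 */
static int demo_pick_ag(const struct demo_ag *ags, size_t nags,
			size_t start, bool low_space)
{
	bool ok_alloc = !low_space;	/* may we create new chunks? */
	int pass;

	assert(nags > 0 && start < nags);

	for (pass = 0; pass < 2; pass++) {
		size_t n;

		for (n = 0; n < nags; n++) {
			size_t i = (start + n) % nags;

			if (ags[i].free_inodes > 0)
				return (int)i;
			if (ok_alloc &&
			    ags[i].free_blocks >= DEMO_CHUNK_BLOCKS)
				return (int)i;
		}
		if (ok_alloc)
			break;		/* nothing left to relax */
		ok_alloc = true;	/* second pass: allow new chunks */
	}
	return -1;
}
```

With AGs { {0, 100}, {0, 100}, {5, 0} } and start = 0, the normal mode picks AG 0 (new chunk, best locality) while the low-space mode picks AG 2 (existing free inodes, no blocks consumed).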
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Chinner dchinner@redhat.com
[ Upstream commit d5753847b216db0e553e8065aa825cfe497ad143 ]
When we enter xfs_bmbt_alloc_block() without having first allocated a data extent (i.e. tp->t_firstblock == NULLFSBLOCK) because we are doing something like unwritten extent conversion, the transaction block reservation is used as the minleft value.
This works for operations like unwritten extent conversion, but it assumes that the block reservation is only for a BMBT split. This is not always true, and sometimes results in larger than necessary minleft values being set. We only actually need enough space for a btree split, something we already handle correctly in xfs_bmapi_write() via the xfs_bmapi_minleft() calculation.
We should use xfs_bmapi_minleft() in xfs_bmbt_alloc_block() to calculate the number of blocks a BMBT split on this inode is going to require, not use the transaction block reservation that contains the maximum number of blocks this transaction may consume in it...
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Allison Henderson allison.henderson@oracle.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/libxfs/xfs_bmap.c | 2 +- fs/xfs/libxfs/xfs_bmap.h | 2 ++ fs/xfs/libxfs/xfs_bmap_btree.c | 19 +++++++++---------- 3 files changed, 12 insertions(+), 11 deletions(-)
--- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -4242,7 +4242,7 @@ xfs_bmapi_convert_unwritten( return 0; }
-static inline xfs_extlen_t +xfs_extlen_t xfs_bmapi_minleft( struct xfs_trans *tp, struct xfs_inode *ip, --- a/fs/xfs/libxfs/xfs_bmap.h +++ b/fs/xfs/libxfs/xfs_bmap.h @@ -220,6 +220,8 @@ int xfs_bmap_add_extent_unwritten_real(s struct xfs_inode *ip, int whichfork, struct xfs_iext_cursor *icur, struct xfs_btree_cur **curp, struct xfs_bmbt_irec *new, int *logflagsp); +xfs_extlen_t xfs_bmapi_minleft(struct xfs_trans *tp, struct xfs_inode *ip, + int fork);
enum xfs_bmap_intent_type { XFS_BMAP_MAP = 1, --- a/fs/xfs/libxfs/xfs_bmap_btree.c +++ b/fs/xfs/libxfs/xfs_bmap_btree.c @@ -213,18 +213,16 @@ xfs_bmbt_alloc_block( if (args.fsbno == NULLFSBLOCK) { args.fsbno = be64_to_cpu(start->l); args.type = XFS_ALLOCTYPE_START_BNO; + /* - * Make sure there is sufficient room left in the AG to - * complete a full tree split for an extent insert. If - * we are converting the middle part of an extent then - * we may need space for two tree splits. - * - * We are relying on the caller to make the correct block - * reservation for this operation to succeed. If the - * reservation amount is insufficient then we may fail a - * block allocation here and corrupt the filesystem. + * If we are coming here from something like unwritten extent + * conversion, there has been no data extent allocation already + * done, so we have to ensure that we attempt to locate the + * entire set of bmbt allocations in the same AG, as + * xfs_bmapi_write() would have reserved. */ - args.minleft = args.tp->t_blk_res; + args.minleft = xfs_bmapi_minleft(cur->bc_tp, cur->bc_ino.ip, + cur->bc_ino.whichfork); } else if (cur->bc_tp->t_flags & XFS_TRANS_LOWMODE) { args.type = XFS_ALLOCTYPE_START_BNO; } else { @@ -248,6 +246,7 @@ xfs_bmbt_alloc_block( * successful activate the lowspace algorithm. */ args.fsbno = 0; + args.minleft = 0; args.type = XFS_ALLOCTYPE_FIRST_AG; error = xfs_alloc_vextent(&args); if (error)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit 60b730a40c43fbcc034970d3e77eb0f25b8cc1cf ]
If the end position of a GETFSMAP query overlaps an allocated space and we're using the free space info to generate fsmap info, the akeys information gets fed into the fsmap formatter with bad results. Zero-init the space.
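The effect of the zero-init can be shown outside the kernel. The following is only a userspace sketch under stated assumptions: struct demo_rec, demo_fill_low_key() and the other demo_* names are hypothetical and are not XFS code. It models a two-element key array where only the first element is populated, and a consumer that also reads the second one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one entry of a two-element key array. */
struct demo_rec {
	uint32_t	startblock;
	uint32_t	blockcount;
};

/* Models a query setup path that only ever populates the low key. */
static void demo_fill_low_key(struct demo_rec keys[2], uint32_t start)
{
	assert(keys != NULL);
	keys[0].startblock = start;
	keys[0].blockcount = 1;
}

/* Models a consumer that also inspects the high key. */
static bool demo_high_key_is_clean(const struct demo_rec keys[2])
{
	return keys[1].startblock == 0 && keys[1].blockcount == 0;
}

/* Clear the whole on-stack array before use, then run the query. */
static bool demo_query_with_zeroed_keys(uint32_t start)
{
	struct demo_rec	keys[2];

	memset(keys, 0, sizeof(keys));
	demo_fill_low_key(keys, start);
	return demo_high_key_is_clean(keys);
}
```

Without the memset() the contents of keys[1] would be indeterminate, which is the class of problem the patch closes.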
Reported-by: syzbot+090ae72d552e6bd93cfe@syzkaller.appspotmail.com
Signed-off-by: Darrick J. Wong djwong@kernel.org
Signed-off-by: Leah Rumancik leah.rumancik@gmail.com
Acked-by: Chandan Babu R chandanbabu@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/xfs/xfs_fsmap.c |    1 +
 1 file changed, 1 insertion(+)

--- a/fs/xfs/xfs_fsmap.c
+++ b/fs/xfs/xfs_fsmap.c
@@ -761,6 +761,7 @@ xfs_getfsmap_datadev_bnobt(
 {
 	struct xfs_alloc_rec_incore	akeys[2];
 
+	memset(akeys, 0, sizeof(akeys));
 	info->missing_owner = XFS_FMR_OWN_UNKNOWN;
 	return __xfs_getfsmap_datadev(tp, keys, info,
 			xfs_getfsmap_datadev_bnobt_query, &akeys[0]);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Chinner dchinner@redhat.com
[ Upstream commit 0c7273e494dd5121e20e160cb2f047a593ee14a8 ]
Background inode inactivation can attach dquots to inodes, but this can race with a foreground quotacheck failure that leads to disabling quotas and freeing the mp->m_quotainfo structure. The background inode inactivation then tries to allocate a dquot, dereferences mp->m_quotainfo, and crashes like so:
XFS (loop1): Quotacheck: Unsuccessful (Error -5): Disabling quotas.
xfs filesystem being mounted at /root/syzkaller.qCVHXV/0/file0 supports timestamps until 2038 (0x7fffffff)
BUG: kernel NULL pointer dereference, address: 00000000000002a8
....
CPU: 0 PID: 161 Comm: kworker/0:4 Not tainted 6.2.0-c9c3395d5e3d #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: xfs-inodegc/loop1 xfs_inodegc_worker
RIP: 0010:xfs_dquot_alloc+0x95/0x1e0
....
Call Trace:
 <TASK>
 xfs_qm_dqread+0x46/0x440
 xfs_qm_dqget_inode+0x154/0x500
 xfs_qm_dqattach_one+0x142/0x3c0
 xfs_qm_dqattach_locked+0x14a/0x170
 xfs_qm_dqattach+0x52/0x80
 xfs_inactive+0x186/0x340
 xfs_inodegc_worker+0xd3/0x430
 process_one_work+0x3b1/0x960
 worker_thread+0x52/0x660
 kthread+0x161/0x1a0
 ret_from_fork+0x29/0x50
 </TASK>
....
Prevent this race by flushing all pending background inode inactivations before purging the cached dquots when quotacheck fails.
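The ordering requirement can be modelled in a few lines. This is a single-threaded userspace sketch, not the kernel implementation: struct demo_mount, demo_inodegc_flush() and the other demo_* names are hypothetical, and the "race" is simulated by letting queued work run after teardown when no flush was done first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the quota state that is freed on failure. */
struct demo_quotainfo {
	int	dquots_attached;
};

struct demo_mount {
	struct demo_quotainfo	*qi;		/* NULL once quotas are disabled */
	int			pending;	/* queued background inactivations */
	int			bad_derefs;	/* work that ran after qi was freed */
};

/* One unit of background work: it attaches a dquot, so it needs qi. */
static void demo_run_one(struct demo_mount *mp)
{
	if (!mp->qi) {
		mp->bad_derefs++;	/* would be a NULL dereference in reality */
		return;
	}
	mp->qi->dquots_attached++;
}

/* Run everything that is queued, leaving background work idle. */
static void demo_inodegc_flush(struct demo_mount *mp)
{
	while (mp->pending > 0) {
		mp->pending--;
		demo_run_one(mp);
	}
}

/* Purge the cached state and free the quota info. */
static void demo_purge_and_destroy(struct demo_mount *mp)
{
	assert(mp->qi != NULL);
	free(mp->qi);
	mp->qi = NULL;
}

/* Returns how many queued operations touched freed quota state. */
static int demo_quotacheck_fail(int pending, int flush_first)
{
	struct demo_mount	mp = { .pending = pending };

	mp.qi = calloc(1, sizeof(*mp.qi));
	if (!mp.qi)
		return -1;

	if (flush_first)
		demo_inodegc_flush(&mp);
	demo_purge_and_destroy(&mp);

	/* Anything still queued now runs after teardown. */
	demo_inodegc_flush(&mp);
	return mp.bad_derefs;
}
```

With flush_first set, no queued operation can observe the freed structure; without it, every queued operation does.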
Reported-by: Pengfei Xu pengfei.xu@intel.com Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_qm.c | 40 ++++++++++++++++++++++++++-------------- 1 file changed, 26 insertions(+), 14 deletions(-)
--- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -1321,15 +1321,14 @@ xfs_qm_quotacheck(
error = xfs_iwalk_threaded(mp, 0, 0, xfs_qm_dqusage_adjust, 0, true, NULL); - if (error) { - /* - * The inode walk may have partially populated the dquot - * caches. We must purge them before disabling quota and - * tearing down the quotainfo, or else the dquots will leak. - */ - xfs_qm_dqpurge_all(mp); - goto error_return; - } + + /* + * On error, the inode walk may have partially populated the dquot + * caches. We must purge them before disabling quota and tearing down + * the quotainfo, or else the dquots will leak. + */ + if (error) + goto error_purge;
/* * We've made all the changes that we need to make incore. Flush them @@ -1363,10 +1362,8 @@ xfs_qm_quotacheck( * and turn quotaoff. The dquots won't be attached to any of the inodes * at this point (because we intentionally didn't in dqget_noattach). */ - if (error) { - xfs_qm_dqpurge_all(mp); - goto error_return; - } + if (error) + goto error_purge;
/* * If one type of quotas is off, then it will lose its @@ -1376,7 +1373,7 @@ xfs_qm_quotacheck( mp->m_qflags &= ~XFS_ALL_QUOTA_CHKD; mp->m_qflags |= flags;
- error_return: +error_return: xfs_buf_delwri_cancel(&buffer_list);
if (error) { @@ -1395,6 +1392,21 @@ xfs_qm_quotacheck( } else xfs_notice(mp, "Quotacheck: Done."); return error; + +error_purge: + /* + * On error, we may have inodes queued for inactivation. This may try + * to attach dquots to the inode before running cleanup operations on + * the inode and this can race with the xfs_qm_destroy_quotainfo() call + * below that frees mp->m_quotainfo. To avoid this race, flush all the + * pending inodegc operations before we purge the dquots from memory, + * ensuring that background inactivation is idle whilst we turn off + * quotas. + */ + xfs_inodegc_flush(mp); + xfs_qm_dqpurge_all(mp); + goto error_return; + }
/*
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ye Bin yebin10@huawei.com
[ Upstream commit 8ee81ed581ff35882b006a5205100db0b57bf070 ]
The following issue was observed:

XFS: Assertion failed: (bmv->bmv_iflags & BMV_IF_DELALLOC) != 0, file: fs/xfs/xfs_bmap_util.c, line: 329
------------[ cut here ]------------
kernel BUG at fs/xfs/xfs_message.c:102!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 14612 Comm: xfs_io Not tainted 6.3.0-rc2-next-20230315-00006-g2729d23ddb3b-dirty #422
RIP: 0010:assfail+0x96/0xa0
RSP: 0018:ffffc9000fa178c0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff888179a18000
RDX: 0000000000000000 RSI: ffff888179a18000 RDI: 0000000000000002
RBP: 0000000000000000 R08: ffffffff8321aab6 R09: 0000000000000000
R10: 0000000000000001 R11: ffffed1105f85139 R12: ffffffff8aacc4c0
R13: 0000000000000149 R14: ffff888269f58000 R15: 000000000000000c
FS: 00007f42f27a4740(0000) GS:ffff88882fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000b92388 CR3: 000000024f006000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 xfs_getbmap+0x1a5b/0x1e40
 xfs_ioc_getbmap+0x1fd/0x5b0
 xfs_file_ioctl+0x2cb/0x1d50
 __x64_sys_ioctl+0x197/0x210
 do_syscall_64+0x39/0xb0
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
Above issue may happen as follows:

         ThreadA                       ThreadB
do_shared_fault
 __do_fault
  xfs_filemap_fault
   __xfs_filemap_fault
    filemap_fault
                               xfs_ioc_getbmap -> Without BMV_IF_DELALLOC flag
                                xfs_getbmap
                                 xfs_ilock(ip, XFS_IOLOCK_SHARED);
                                 filemap_write_and_wait
 do_page_mkwrite
  xfs_filemap_page_mkwrite
   __xfs_filemap_fault
    xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
    iomap_page_mkwrite
     ...
     xfs_buffered_write_iomap_begin
      xfs_bmapi_reserve_delalloc -> Allocate delay extent
                                 xfs_ilock_data_map_shared(ip)
                                 xfs_getbmap_report_one
                                  ASSERT((bmv->bmv_iflags & BMV_IF_DELALLOC) != 0)
                                   -> trigger BUG_ON
Because xfs_filemap_page_mkwrite() only holds XFS_MMAPLOCK_SHARED, there is a small window in which mkwrite can create a delalloc extent after the file has been written back in xfs_getbmap(). To solve this, just skip delalloc extents when BMV_IF_DELALLOC was not requested.
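The "skip rather than assert" behaviour can be sketched as a plain filter. This is a userspace illustration only; struct demo_extent, DEMO_IF_DELALLOC and demo_report() are hypothetical names standing in for the getbmap reporting path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the BMV_IF_DELALLOC request flag. */
#define DEMO_IF_DELALLOC	0x1u

struct demo_extent {
	uint64_t	startoff;
	int		is_delalloc;
};

/*
 * Copy reportable extents from in[] to out[] and return how many were
 * reported. Delalloc extents that appeared after the flush are silently
 * skipped unless the caller asked for them.
 */
static size_t demo_report(const struct demo_extent *in, size_t n,
			  unsigned int iflags, struct demo_extent *out)
{
	size_t	i, nr = 0;

	assert(in != NULL && out != NULL);
	for (i = 0; i < n; i++) {
		if (in[i].is_delalloc && !(iflags & DEMO_IF_DELALLOC))
			continue;
		out[nr++] = in[i];
	}
	return nr;
}
```

A racing delalloc extent is then treated as having been created after the point-in-time snapshot instead of tripping an assertion.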
Signed-off-by: Ye Bin yebin10@huawei.com Reviewed-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Dave Chinner david@fromorbit.com Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_bmap_util.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-)
--- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -314,15 +314,13 @@ xfs_getbmap_report_one( if (isnullstartblock(got->br_startblock) || got->br_startblock == DELAYSTARTBLOCK) { /* - * Delalloc extents that start beyond EOF can occur due to - * speculative EOF allocation when the delalloc extent is larger - * than the largest freespace extent at conversion time. These - * extents cannot be converted by data writeback, so can exist - * here even if we are not supposed to be finding delalloc - * extents. + * Take the flush completion as being a point-in-time snapshot + * where there are no delalloc extents, and if any new ones + * have been created racily, just skip them as being 'after' + * the flush and so don't get reported. */ - if (got->br_startoff < XFS_B_TO_FSB(ip->i_mount, XFS_ISIZE(ip))) - ASSERT((bmv->bmv_iflags & BMV_IF_DELALLOC) != 0); + if (!(bmv->bmv_iflags & BMV_IF_DELALLOC)) + return 0;
p->bmv_oflags |= BMV_OF_DELALLOC; p->bmv_block = -2;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Chinner dchinner@redhat.com
[ Upstream commit 89a4bf0dc3857569a77061d3d5ea2ac85f7e13c6 ]
When a buffer is unpinned by xfs_buf_item_unpin(), we need to access the buffer after we've dropped the buffer log item reference count. This opens a window where we can have two racing unpins for the buffer item (e.g. shutdown checkpoint context callback processing racing with journal IO iclog completion processing) and both attempt to access the buffer after dropping the BLI reference count. If we are unlucky, the "BLI freed" context wins the race and frees the buffer before the "BLI still active" case checks the buffer pin count.
This results in a use after free that can only be triggered in active filesystem shutdown situations.
To fix this, we need to ensure that buffer existence extends beyond the BLI reference count checks and until the unpin processing is complete. This implies that a buffer pin operation must also take a buffer reference to ensure that the buffer cannot be freed until the buffer unpin processing is complete.
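The reference counting rule can be modelled with plain counters. The sketch below is a simplified userspace model, not the kernel code: all demo_* names are hypothetical, the unpin is split into two steps so that two racing unpins can be interleaved deterministically, and the assumption that the log item itself owns one buffer reference is part of the model.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_buf {
	int	hold;		/* buffer reference count */
	int	pin_count;
	bool	freed;
};

struct demo_bli {
	struct demo_buf	*buf;
	int		refcount;
};

static void demo_buf_rele(struct demo_buf *bp)
{
	assert(!bp->freed && bp->hold > 0);
	if (--bp->hold == 0)
		bp->freed = true;
}

/* Pinning takes a buffer reference as well as a log item reference. */
static void demo_pin(struct demo_bli *bip)
{
	bip->buf->hold++;
	bip->refcount++;
	bip->buf->pin_count++;
}

/* Step 1 of unpin: drop the log item reference and the pin count. */
static bool demo_unpin_begin(struct demo_bli *bip)
{
	bool	last = (--bip->refcount == 0);

	bip->buf->pin_count--;
	return last;
}

/* Step 2 of unpin: look at the buffer, then drop the pin reference. */
static bool demo_unpin_end(struct demo_bli *bip, bool last)
{
	struct demo_buf	*bp = bip->buf;
	bool		alive = !bp->freed;

	if (last)
		demo_buf_rele(bp);	/* reference owned by the log item */
	demo_buf_rele(bp);		/* reference taken at pin time */
	return alive;
}

/* Interleave two unpins so the "last" one finishes first. */
static bool demo_race_is_safe(void)
{
	struct demo_buf	buf = { .hold = 1 };	/* owned by the log item */
	struct demo_bli	bli = { .buf = &buf };
	bool		a_last, b_last, a_alive, b_alive;

	demo_pin(&bli);
	demo_pin(&bli);
	a_last = demo_unpin_begin(&bli);
	b_last = demo_unpin_begin(&bli);
	b_alive = demo_unpin_end(&bli, b_last);
	a_alive = demo_unpin_end(&bli, a_last);
	return !a_last && b_last && a_alive && b_alive && buf.freed;
}
```

In this model the buffer is only freed when the slower unpin drops its own pin reference, so it never touches a freed buffer.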
Reported-by: yangerkun yangerkun@huawei.com Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Dave Chinner david@fromorbit.com Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_buf_item.c | 88 ++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 65 insertions(+), 23 deletions(-)
--- a/fs/xfs/xfs_buf_item.c +++ b/fs/xfs/xfs_buf_item.c @@ -452,10 +452,18 @@ xfs_buf_item_format( * This is called to pin the buffer associated with the buf log item in memory * so it cannot be written out. * - * We also always take a reference to the buffer log item here so that the bli - * is held while the item is pinned in memory. This means that we can - * unconditionally drop the reference count a transaction holds when the - * transaction is completed. + * We take a reference to the buffer log item here so that the BLI life cycle + * extends at least until the buffer is unpinned via xfs_buf_item_unpin() and + * inserted into the AIL. + * + * We also need to take a reference to the buffer itself as the BLI unpin + * processing requires accessing the buffer after the BLI has dropped the final + * BLI reference. See xfs_buf_item_unpin() for an explanation. + * If unpins race to drop the final BLI reference and only the + * BLI owns a reference to the buffer, then the loser of the race can have the + * buffer fgreed from under it (e.g. on shutdown). Taking a buffer reference per + * pin count ensures the life cycle of the buffer extends for as + * long as we hold the buffer pin reference in xfs_buf_item_unpin(). */ STATIC void xfs_buf_item_pin( @@ -470,13 +478,30 @@ xfs_buf_item_pin(
trace_xfs_buf_item_pin(bip);
+ xfs_buf_hold(bip->bli_buf); atomic_inc(&bip->bli_refcount); atomic_inc(&bip->bli_buf->b_pin_count); }
/* - * This is called to unpin the buffer associated with the buf log item which - * was previously pinned with a call to xfs_buf_item_pin(). + * This is called to unpin the buffer associated with the buf log item which was + * previously pinned with a call to xfs_buf_item_pin(). We enter this function + * with a buffer pin count, a buffer reference and a BLI reference. + * + * We must drop the BLI reference before we unpin the buffer because the AIL + * doesn't acquire a BLI reference whenever it accesses it. Therefore if the + * refcount drops to zero, the bli could still be AIL resident and the buffer + * submitted for I/O at any point before we return. This can result in IO + * completion freeing the buffer while we are still trying to access it here. + * This race condition can also occur in shutdown situations where we abort and + * unpin buffers from contexts other that journal IO completion. + * + * Hence we have to hold a buffer reference per pin count to ensure that the + * buffer cannot be freed until we have finished processing the unpin operation. + * The reference is taken in xfs_buf_item_pin(), and we must hold it until we + * are done processing the buffer state. In the case of an abort (remove = + * true) then we re-use the current pin reference as the IO reference we hand + * off to IO failure handling. */ STATIC void xfs_buf_item_unpin( @@ -493,24 +518,18 @@ xfs_buf_item_unpin(
trace_xfs_buf_item_unpin(bip);
- /* - * Drop the bli ref associated with the pin and grab the hold required - * for the I/O simulation failure in the abort case. We have to do this - * before the pin count drops because the AIL doesn't acquire a bli - * reference. Therefore if the refcount drops to zero, the bli could - * still be AIL resident and the buffer submitted for I/O (and freed on - * completion) at any point before we return. This can be removed once - * the AIL properly holds a reference on the bli. - */ freed = atomic_dec_and_test(&bip->bli_refcount); - if (freed && !stale && remove) - xfs_buf_hold(bp); if (atomic_dec_and_test(&bp->b_pin_count)) wake_up_all(&bp->b_waiters);
- /* nothing to do but drop the pin count if the bli is active */ - if (!freed) + /* + * Nothing to do but drop the buffer pin reference if the BLI is + * still active. + */ + if (!freed) { + xfs_buf_rele(bp); return; + }
if (stale) { ASSERT(bip->bli_flags & XFS_BLI_STALE); @@ -523,6 +542,15 @@ xfs_buf_item_unpin( trace_xfs_buf_item_unpin_stale(bip);
/* + * The buffer has been locked and referenced since it was marked + * stale so we own both lock and reference exclusively here. We + * do not need the pin reference any more, so drop it now so + * that we only have one reference to drop once item completion + * processing is complete. + */ + xfs_buf_rele(bp); + + /* * If we get called here because of an IO error, we may or may * not have the item on the AIL. xfs_trans_ail_delete() will * take care of that situation. xfs_trans_ail_delete() drops @@ -538,16 +566,30 @@ xfs_buf_item_unpin( ASSERT(bp->b_log_item == NULL); } xfs_buf_relse(bp); - } else if (remove) { + return; + } + + if (remove) { /* - * The buffer must be locked and held by the caller to simulate - * an async I/O failure. We acquired the hold for this case - * before the buffer was unpinned. + * We need to simulate an async IO failures here to ensure that + * the correct error completion is run on this buffer. This + * requires a reference to the buffer and for the buffer to be + * locked. We can safely pass ownership of the pin reference to + * the IO to ensure that nothing can free the buffer while we + * wait for the lock and then run the IO failure completion. */ xfs_buf_lock(bp); bp->b_flags |= XBF_ASYNC; xfs_buf_ioend_fail(bp); + return; } + + /* + * BLI has no more active references - it will be moved to the AIL to + * manage the remaining BLI/buffer life cycle. There is nothing left for + * us to do here so drop the pin reference to the buffer. + */ + xfs_buf_rele(bp); }
STATIC uint
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Chinner dchinner@redhat.com
[ Upstream commit cb042117488dbf0b3b38b05771639890fada9a52 ]
To fix an AGI-AGF-inode cluster buffer deadlock, we need to move inode cluster buffer operations to the ->iop_precommit() method. However, this means that deferred operations can require precommits to be run on the final transaction that the deferred ops pass back to xfs_trans_commit() context. This will be exposed by attribute handling, in that the last changes to the inode in the attr set state machine "disappear" because the precommit operation is not run.
Signed-off-by: Dave Chinner dchinner@redhat.com
Reviewed-by: Darrick J. Wong djwong@kernel.org
Reviewed-by: Christoph Hellwig hch@lst.de
Signed-off-by: Dave Chinner david@fromorbit.com
Signed-off-by: Leah Rumancik leah.rumancik@gmail.com
Acked-by: Chandan Babu R chandanbabu@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/xfs/xfs_trans.c |    5 +++++
 1 file changed, 5 insertions(+)

--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -970,6 +970,11 @@ __xfs_trans_commit(
 		error = xfs_defer_finish_noroll(&tp);
 		if (error)
 			goto out_unreserve;
+
+		/* Run precommits from final tx in defer chain. */
+		error = xfs_trans_run_precommits(tp);
+		if (error)
+			goto out_unreserve;
 	}
 
 	/*
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Chinner dchinner@redhat.com
[ Upstream commit 82842fee6e5979ca7e2bf4d839ef890c22ffb7aa ]
Lock order in XFS is AGI -> AGF, hence for operations involving inode unlinked list operations we always lock the AGI first. Inode unlinked list operations operate on the inode cluster buffer, so the lock order there is AGI -> inode cluster buffer.
For O_TMPFILE operations, this now means the lock order set down in xfs_rename and xfs_link is AGI -> inode cluster buffer -> AGF as the unlinked ops are done before the directory modifications that may allocate space and lock the AGF.
Unfortunately, we also now lock the inode cluster buffer when logging an inode so that we can attach the inode to the cluster buffer and pin it in memory. This creates a lock order of AGF -> inode cluster buffer in directory operations as we have to log the inode after we've allocated new space for it.
This creates a lock inversion between the AGF and the inode cluster buffer. Because the inode cluster buffer is shared across multiple inodes, the inversion is not specific to individual inodes but can occur when inodes in the same cluster buffer are accessed in different orders.
To fix this we need to move all the inode log item cluster buffer interactions to the end of the current transaction. Unfortunately, xfs_trans_log_inode() calls are littered throughout the transactions with no thought to ordering against other items or locking. This makes it difficult to do anything that involves changing the call sites of xfs_trans_log_inode() to change locking orders.
However, we do now have a mechanism that allows us to postpone dirty item processing to just before we commit the transaction: the ->iop_precommit method. This will be called after all the modifications are done and high level objects like AGI and AGF buffers have been locked and modified, thereby providing a mechanism that guarantees we don't lock the inode cluster buffer before those high level objects are locked.
This change is largely moving the guts of xfs_trans_log_inode() to xfs_inode_item_precommit() and providing an extra flag context in the inode log item to track the dirty state of the inode in the current transaction. This also means we do a lot less repeated work in xfs_trans_log_inode() by only doing it once per transaction when all the work is done.
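The "record now, process once at commit" pattern described above can be sketched as follows. This is a userspace model with hypothetical demo_* names and flag values, not the XFS implementation; the counter stands in for the expensive cluster buffer work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for the illustration. */
#define DEMO_LOG_CORE		0x1u
#define DEMO_LOG_TIMESTAMP	0x2u

struct demo_item {
	unsigned int	dirty_flags;	/* accumulated in the current tx */
	unsigned int	fields;		/* what will actually be logged */
	int		expensive_calls;/* times the deferred work ran */
};

/* Cheap: only record what was dirtied, any number of times per tx. */
static void demo_log_inode(struct demo_item *it, unsigned int flags)
{
	assert(it != NULL);
	it->dirty_flags |= flags;
}

/*
 * Run once per transaction, after all modifications are done. This is
 * where the model does the work that must be ordered last.
 */
static int demo_precommit(struct demo_item *it)
{
	assert(it != NULL);
	it->expensive_calls++;
	it->fields |= it->dirty_flags;

	/* Clear the per-tx state so it cannot leak into the next tx. */
	it->dirty_flags = 0;
	return 0;
}
```

However many times the item is dirtied within one transaction, the deferred work runs exactly once, at a point where its locking can be ordered after everything else.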
Fixes: 298f7bec503f ("xfs: pin inode backing buffer to the inode log item") Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Dave Chinner david@fromorbit.com Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/libxfs/xfs_log_format.h | 9 ++ fs/xfs/libxfs/xfs_trans_inode.c | 113 ++---------------------------- fs/xfs/xfs_inode_item.c | 149 ++++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_inode_item.h | 1 4 files changed, 166 insertions(+), 106 deletions(-)
--- a/fs/xfs/libxfs/xfs_log_format.h +++ b/fs/xfs/libxfs/xfs_log_format.h @@ -324,7 +324,6 @@ struct xfs_inode_log_format_32 { #define XFS_ILOG_DOWNER 0x200 /* change the data fork owner on replay */ #define XFS_ILOG_AOWNER 0x400 /* change the attr fork owner on replay */
- /* * The timestamps are dirty, but not necessarily anything else in the inode * core. Unlike the other fields above this one must never make it to disk @@ -333,6 +332,14 @@ struct xfs_inode_log_format_32 { */ #define XFS_ILOG_TIMESTAMP 0x4000
+/* + * The version field has been changed, but not necessarily anything else of + * interest. This must never make it to disk - it is used purely to ensure that + * the inode item ->precommit operation can update the fsync flag triggers + * in the inode item correctly. + */ +#define XFS_ILOG_IVERSION 0x8000 + #define XFS_ILOG_NONCORE (XFS_ILOG_DDATA | XFS_ILOG_DEXT | \ XFS_ILOG_DBROOT | XFS_ILOG_DEV | \ XFS_ILOG_ADATA | XFS_ILOG_AEXT | \ --- a/fs/xfs/libxfs/xfs_trans_inode.c +++ b/fs/xfs/libxfs/xfs_trans_inode.c @@ -40,9 +40,8 @@ xfs_trans_ijoin( iip->ili_lock_flags = lock_flags; ASSERT(!xfs_iflags_test(ip, XFS_ISTALE));
- /* - * Get a log_item_desc to point at the new item. - */ + /* Reset the per-tx dirty context and add the item to the tx. */ + iip->ili_dirty_flags = 0; xfs_trans_add_item(tp, &iip->ili_item); }
@@ -76,17 +75,10 @@ xfs_trans_ichgtime( /* * This is called to mark the fields indicated in fieldmask as needing to be * logged when the transaction is committed. The inode must already be - * associated with the given transaction. - * - * The values for fieldmask are defined in xfs_inode_item.h. We always log all - * of the core inode if any of it has changed, and we always log all of the - * inline data/extents/b-tree root if any of them has changed. - * - * Grab and pin the cluster buffer associated with this inode to avoid RMW - * cycles at inode writeback time. Avoid the need to add error handling to every - * xfs_trans_log_inode() call by shutting down on read error. This will cause - * transactions to fail and everything to error out, just like if we return a - * read error in a dirty transaction and cancel it. + * associated with the given transaction. All we do here is record where the + * inode was dirtied and mark the transaction and inode log item dirty; + * everything else is done in the ->precommit log item operation after the + * changes in the transaction have been completed. */ void xfs_trans_log_inode( @@ -96,7 +88,6 @@ xfs_trans_log_inode( { struct xfs_inode_log_item *iip = ip->i_itemp; struct inode *inode = VFS_I(ip); - uint iversion_flags = 0;
ASSERT(iip); ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL)); @@ -105,18 +96,6 @@ xfs_trans_log_inode( tp->t_flags |= XFS_TRANS_DIRTY;
/* - * Don't bother with i_lock for the I_DIRTY_TIME check here, as races - * don't matter - we either will need an extra transaction in 24 hours - * to log the timestamps, or will clear already cleared fields in the - * worst case. - */ - if (inode->i_state & I_DIRTY_TIME) { - spin_lock(&inode->i_lock); - inode->i_state &= ~I_DIRTY_TIME; - spin_unlock(&inode->i_lock); - } - - /* * First time we log the inode in a transaction, bump the inode change * counter if it is configured for this to occur. While we have the * inode locked exclusively for metadata modification, we can usually @@ -128,86 +107,10 @@ xfs_trans_log_inode( if (!test_and_set_bit(XFS_LI_DIRTY, &iip->ili_item.li_flags)) { if (IS_I_VERSION(inode) && inode_maybe_inc_iversion(inode, flags & XFS_ILOG_CORE)) - iversion_flags = XFS_ILOG_CORE; + flags |= XFS_ILOG_IVERSION; }
- /* - * If we're updating the inode core or the timestamps and it's possible - * to upgrade this inode to bigtime format, do so now. - */ - if ((flags & (XFS_ILOG_CORE | XFS_ILOG_TIMESTAMP)) && - xfs_has_bigtime(ip->i_mount) && - !xfs_inode_has_bigtime(ip)) { - ip->i_diflags2 |= XFS_DIFLAG2_BIGTIME; - flags |= XFS_ILOG_CORE; - } - - /* - * Inode verifiers do not check that the extent size hint is an integer - * multiple of the rt extent size on a directory with both rtinherit - * and extszinherit flags set. If we're logging a directory that is - * misconfigured in this way, clear the hint. - */ - if ((ip->i_diflags & XFS_DIFLAG_RTINHERIT) && - (ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) && - (ip->i_extsize % ip->i_mount->m_sb.sb_rextsize) > 0) { - ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE | - XFS_DIFLAG_EXTSZINHERIT); - ip->i_extsize = 0; - flags |= XFS_ILOG_CORE; - } - - /* - * Record the specific change for fdatasync optimisation. This allows - * fdatasync to skip log forces for inodes that are only timestamp - * dirty. - */ - spin_lock(&iip->ili_lock); - iip->ili_fsync_fields |= flags; - - if (!iip->ili_item.li_buf) { - struct xfs_buf *bp; - int error; - - /* - * We hold the ILOCK here, so this inode is not going to be - * flushed while we are here. Further, because there is no - * buffer attached to the item, we know that there is no IO in - * progress, so nothing will clear the ili_fields while we read - * in the buffer. Hence we can safely drop the spin lock and - * read the buffer knowing that the state will not change from - * here. - */ - spin_unlock(&iip->ili_lock); - error = xfs_imap_to_bp(ip->i_mount, tp, &ip->i_imap, &bp); - if (error) { - xfs_force_shutdown(ip->i_mount, SHUTDOWN_META_IO_ERROR); - return; - } - - /* - * We need an explicit buffer reference for the log item but - * don't want the buffer to remain attached to the transaction. 
- * Hold the buffer but release the transaction reference once - * we've attached the inode log item to the buffer log item - * list. - */ - xfs_buf_hold(bp); - spin_lock(&iip->ili_lock); - iip->ili_item.li_buf = bp; - bp->b_flags |= _XBF_INODES; - list_add_tail(&iip->ili_item.li_bio_list, &bp->b_li_list); - xfs_trans_brelse(tp, bp); - } - - /* - * Always OR in the bits from the ili_last_fields field. This is to - * coordinate with the xfs_iflush() and xfs_buf_inode_iodone() routines - * in the eventual clearing of the ili_fields bits. See the big comment - * in xfs_iflush() for an explanation of this coordination mechanism. - */ - iip->ili_fields |= (flags | iip->ili_last_fields | iversion_flags); - spin_unlock(&iip->ili_lock); + iip->ili_dirty_flags |= flags; }
int --- a/fs/xfs/xfs_inode_item.c +++ b/fs/xfs/xfs_inode_item.c @@ -29,6 +29,153 @@ static inline struct xfs_inode_log_item return container_of(lip, struct xfs_inode_log_item, ili_item); }
+static uint64_t +xfs_inode_item_sort( + struct xfs_log_item *lip) +{ + return INODE_ITEM(lip)->ili_inode->i_ino; +} + +/* + * Prior to finally logging the inode, we have to ensure that all the + * per-modification inode state changes are applied. This includes VFS inode + * state updates, format conversions, verifier state synchronisation and + * ensuring the inode buffer remains in memory whilst the inode is dirty. + * + * We have to be careful when we grab the inode cluster buffer due to lock + * ordering constraints. The unlinked inode modifications (xfs_iunlink_item) + * require AGI -> inode cluster buffer lock order. The inode cluster buffer is + * not locked until ->precommit, so it happens after everything else has been + * modified. + * + * Further, we have AGI -> AGF lock ordering, and with O_TMPFILE handling we + * have AGI -> AGF -> iunlink item -> inode cluster buffer lock order. Hence we + * cannot safely lock the inode cluster buffer in xfs_trans_log_inode() because + * it can be called on a inode (e.g. via bumplink/droplink) before we take the + * AGF lock modifying directory blocks. + * + * Rather than force a complete rework of all the transactions to call + * xfs_trans_log_inode() once and once only at the end of every transaction, we + * move the pinning of the inode cluster buffer to a ->precommit operation. This + * matches how the xfs_iunlink_item locks the inode cluster buffer, and it + * ensures that the inode cluster buffer locking is always done last in a + * transaction. i.e. we ensure the lock order is always AGI -> AGF -> inode + * cluster buffer. + * + * If we return the inode number as the precommit sort key then we'll also + * guarantee that the order all inode cluster buffer locking is the same all the + * inodes and unlink items in the transaction. 
+ */ +static int +xfs_inode_item_precommit( + struct xfs_trans *tp, + struct xfs_log_item *lip) +{ + struct xfs_inode_log_item *iip = INODE_ITEM(lip); + struct xfs_inode *ip = iip->ili_inode; + struct inode *inode = VFS_I(ip); + unsigned int flags = iip->ili_dirty_flags; + + /* + * Don't bother with i_lock for the I_DIRTY_TIME check here, as races + * don't matter - we either will need an extra transaction in 24 hours + * to log the timestamps, or will clear already cleared fields in the + * worst case. + */ + if (inode->i_state & I_DIRTY_TIME) { + spin_lock(&inode->i_lock); + inode->i_state &= ~I_DIRTY_TIME; + spin_unlock(&inode->i_lock); + } + + /* + * If we're updating the inode core or the timestamps and it's possible + * to upgrade this inode to bigtime format, do so now. + */ + if ((flags & (XFS_ILOG_CORE | XFS_ILOG_TIMESTAMP)) && + xfs_has_bigtime(ip->i_mount) && + !xfs_inode_has_bigtime(ip)) { + ip->i_diflags2 |= XFS_DIFLAG2_BIGTIME; + flags |= XFS_ILOG_CORE; + } + + /* + * Inode verifiers do not check that the extent size hint is an integer + * multiple of the rt extent size on a directory with both rtinherit + * and extszinherit flags set. If we're logging a directory that is + * misconfigured in this way, clear the hint. + */ + if ((ip->i_diflags & XFS_DIFLAG_RTINHERIT) && + (ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) && + (ip->i_extsize % ip->i_mount->m_sb.sb_rextsize) > 0) { + ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE | + XFS_DIFLAG_EXTSZINHERIT); + ip->i_extsize = 0; + flags |= XFS_ILOG_CORE; + } + + /* + * Record the specific change for fdatasync optimisation. This allows + * fdatasync to skip log forces for inodes that are only timestamp + * dirty. Once we've processed the XFS_ILOG_IVERSION flag, convert it + * to XFS_ILOG_CORE so that the actual on-disk dirty tracking + * (ili_fields) correctly tracks that the version has changed. 
+ */ + spin_lock(&iip->ili_lock); + iip->ili_fsync_fields |= (flags & ~XFS_ILOG_IVERSION); + if (flags & XFS_ILOG_IVERSION) + flags = ((flags & ~XFS_ILOG_IVERSION) | XFS_ILOG_CORE); + + if (!iip->ili_item.li_buf) { + struct xfs_buf *bp; + int error; + + /* + * We hold the ILOCK here, so this inode is not going to be + * flushed while we are here. Further, because there is no + * buffer attached to the item, we know that there is no IO in + * progress, so nothing will clear the ili_fields while we read + * in the buffer. Hence we can safely drop the spin lock and + * read the buffer knowing that the state will not change from + * here. + */ + spin_unlock(&iip->ili_lock); + error = xfs_imap_to_bp(ip->i_mount, tp, &ip->i_imap, &bp); + if (error) + return error; + + /* + * We need an explicit buffer reference for the log item but + * don't want the buffer to remain attached to the transaction. + * Hold the buffer but release the transaction reference once + * we've attached the inode log item to the buffer log item + * list. + */ + xfs_buf_hold(bp); + spin_lock(&iip->ili_lock); + iip->ili_item.li_buf = bp; + bp->b_flags |= _XBF_INODES; + list_add_tail(&iip->ili_item.li_bio_list, &bp->b_li_list); + xfs_trans_brelse(tp, bp); + } + + /* + * Always OR in the bits from the ili_last_fields field. This is to + * coordinate with the xfs_iflush() and xfs_buf_inode_iodone() routines + * in the eventual clearing of the ili_fields bits. See the big comment + * in xfs_iflush() for an explanation of this coordination mechanism. + */ + iip->ili_fields |= (flags | iip->ili_last_fields); + spin_unlock(&iip->ili_lock); + + /* + * We are done with the log item transaction dirty state, so clear it so + * that it doesn't pollute future transactions. + */ + iip->ili_dirty_flags = 0; + return 0; +} + /* * The logged size of an inode fork is always the current size of the inode * fork. 
This means that when an inode fork is relogged, the size of the logged @@ -662,6 +809,8 @@ xfs_inode_item_committing( }
static const struct xfs_item_ops xfs_inode_item_ops = { + .iop_sort = xfs_inode_item_sort, + .iop_precommit = xfs_inode_item_precommit, .iop_size = xfs_inode_item_size, .iop_format = xfs_inode_item_format, .iop_pin = xfs_inode_item_pin, --- a/fs/xfs/xfs_inode_item.h +++ b/fs/xfs/xfs_inode_item.h @@ -17,6 +17,7 @@ struct xfs_inode_log_item { struct xfs_log_item ili_item; /* common portion */ struct xfs_inode *ili_inode; /* inode ptr */ unsigned short ili_lock_flags; /* inode lock flags */ + unsigned int ili_dirty_flags; /* dirty in current tx */ /* * The ili_lock protects the interactions between the dirty state and * the flush state of the inode log item. This allows us to do atomic
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Chinner dchinner@redhat.com
[ Upstream commit d4d12c02bf5f768f1b423c7ae2909c5afdfe0d5f ]
Unlinked list recovery requires that errors from removing the inode from the unlinked list get fed back to the main recovery loop. Now that we offload the unlinking to the inodegc work, we don't get errors being fed back when we trip over a corruption that prevents the inode from being removed from the unlinked list.
This means we never clear the corrupt unlinked list bucket, resulting in runtime operations eventually tripping over it and shutting down.
Fix this by collecting inodegc worker errors and feeding them back to the flush caller. This is largely best effort - the only context that really cares is log recovery, and it only flushes a single inode at a time so we don't need complex synchronised handling. Essentially the inodegc workers will capture the first error that occurs and the next flush will gather them and clear them. The flush itself will only report the first gathered error.
In the cases where callers can return errors, propagate the collected inodegc flush error up the error handling chain.
In the case of inode unlinked list recovery, there are several superfluous calls to flush queued unlinked inodes - xlog_recover_iunlink_bucket() guarantees that it has flushed the inodegc and collected errors before it returns. Hence nothing in the calling path needs to run a flush, even when an error is returned.
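The "capture the first error, gather and clear on flush" scheme described above can be sketched in userspace C. This is only an illustration under stated assumptions: struct gc_slot, gc_slot_record() and gc_slots_collect() are hypothetical names standing in for the per-CPU struct xfs_inodegc error field and the worker/flush paths, and the sketch ignores the workqueue flush that the kernel performs before gathering.

```c
#include <stddef.h>

/* Hypothetical stand-in for the error field added to struct xfs_inodegc. */
struct gc_slot {
	int error;	/* first error seen by this worker, 0 if none */
};

/* Worker side: remember only the first failure, drop later ones. */
static void gc_slot_record(struct gc_slot *gc, int error)
{
	if (error && !gc->error)
		gc->error = error;
}

/*
 * Flush side: report the first error found across all slots and clear
 * every slot so the next flush starts from a clean state.
 */
static int gc_slots_collect(struct gc_slot *slots, size_t nr)
{
	int error = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (slots[i].error && !error)
			error = slots[i].error;
		slots[i].error = 0;
	}
	return error;
}
```

A second collect right after the first returns 0, which matches the "best effort" description: errors are reported once, to whichever flush gathers them first.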
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Dave Chinner david@fromorbit.com Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_icache.c | 46 +++++++++++++++++++++++++++++++++++++--------- fs/xfs/xfs_icache.h | 4 ++-- fs/xfs/xfs_inode.c | 20 ++++++-------------- fs/xfs/xfs_inode.h | 2 +- fs/xfs/xfs_log_recover.c | 19 +++++++++---------- fs/xfs/xfs_mount.h | 1 + fs/xfs/xfs_super.c | 1 + fs/xfs/xfs_trans.c | 4 +++- 8 files changed, 60 insertions(+), 37 deletions(-)
--- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -454,6 +454,27 @@ xfs_inodegc_queue_all( return ret; }
+/* Wait for all queued work and collect errors */ +static int +xfs_inodegc_wait_all( + struct xfs_mount *mp) +{ + int cpu; + int error = 0; + + flush_workqueue(mp->m_inodegc_wq); + for_each_online_cpu(cpu) { + struct xfs_inodegc *gc; + + gc = per_cpu_ptr(mp->m_inodegc, cpu); + if (gc->error && !error) + error = gc->error; + gc->error = 0; + } + + return error; +} + /* * Check the validity of the inode we just found it the cache */ @@ -1490,15 +1511,14 @@ xfs_blockgc_free_space( if (error) return error;
- xfs_inodegc_flush(mp); - return 0; + return xfs_inodegc_flush(mp); }
/* * Reclaim all the free space that we can by scheduling the background blockgc * and inodegc workers immediately and waiting for them all to clear. */ -void +int xfs_blockgc_flush_all( struct xfs_mount *mp) { @@ -1519,7 +1539,7 @@ xfs_blockgc_flush_all( for_each_perag_tag(mp, agno, pag, XFS_ICI_BLOCKGC_TAG) flush_delayed_work(&pag->pag_blockgc_work);
- xfs_inodegc_flush(mp); + return xfs_inodegc_flush(mp); }
/* @@ -1841,13 +1861,17 @@ xfs_inodegc_set_reclaimable( * This is the last chance to make changes to an otherwise unreferenced file * before incore reclamation happens. */ -static void +static int xfs_inodegc_inactivate( struct xfs_inode *ip) { + int error; + trace_xfs_inode_inactivating(ip); - xfs_inactive(ip); + error = xfs_inactive(ip); xfs_inodegc_set_reclaimable(ip); + return error; + }
void @@ -1879,8 +1903,12 @@ xfs_inodegc_worker(
WRITE_ONCE(gc->shrinker_hits, 0); llist_for_each_entry_safe(ip, n, node, i_gclist) { + int error; + xfs_iflags_set(ip, XFS_INACTIVATING); - xfs_inodegc_inactivate(ip); + error = xfs_inodegc_inactivate(ip); + if (error && !gc->error) + gc->error = error; }
memalloc_nofs_restore(nofs_flag); @@ -1904,13 +1932,13 @@ xfs_inodegc_push( * Force all currently queued inode inactivation work to run immediately and * wait for the work to finish. */ -void +int xfs_inodegc_flush( struct xfs_mount *mp) { xfs_inodegc_push(mp); trace_xfs_inodegc_flush(mp, __return_address); - flush_workqueue(mp->m_inodegc_wq); + return xfs_inodegc_wait_all(mp); }
/* --- a/fs/xfs/xfs_icache.h +++ b/fs/xfs/xfs_icache.h @@ -59,7 +59,7 @@ int xfs_blockgc_free_dquots(struct xfs_m unsigned int iwalk_flags); int xfs_blockgc_free_quota(struct xfs_inode *ip, unsigned int iwalk_flags); int xfs_blockgc_free_space(struct xfs_mount *mp, struct xfs_icwalk *icm); -void xfs_blockgc_flush_all(struct xfs_mount *mp); +int xfs_blockgc_flush_all(struct xfs_mount *mp);
void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip); void xfs_inode_clear_eofblocks_tag(struct xfs_inode *ip); @@ -77,7 +77,7 @@ void xfs_blockgc_start(struct xfs_mount
void xfs_inodegc_worker(struct work_struct *work); void xfs_inodegc_push(struct xfs_mount *mp); -void xfs_inodegc_flush(struct xfs_mount *mp); +int xfs_inodegc_flush(struct xfs_mount *mp); void xfs_inodegc_stop(struct xfs_mount *mp); void xfs_inodegc_start(struct xfs_mount *mp); void xfs_inodegc_cpu_dead(struct xfs_mount *mp, unsigned int cpu); --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -1620,16 +1620,7 @@ xfs_inactive_ifree( */ xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_ICOUNT, -1);
- /* - * Just ignore errors at this point. There is nothing we can do except - * to try to keep going. Make sure it's not a silent error. - */ - error = xfs_trans_commit(tp); - if (error) - xfs_notice(mp, "%s: xfs_trans_commit returned error %d", - __func__, error); - - return 0; + return xfs_trans_commit(tp); }
/* @@ -1696,12 +1687,12 @@ xfs_inode_needs_inactive( * now be truncated. Also, we clear all of the read-ahead state * kept for the inode here since the file is now closed. */ -void +int xfs_inactive( xfs_inode_t *ip) { struct xfs_mount *mp; - int error; + int error = 0; int truncate = 0;
/* @@ -1742,7 +1733,7 @@ xfs_inactive( * reference to the inode at this point anyways. */ if (xfs_can_free_eofblocks(ip, true)) - xfs_free_eofblocks(ip); + error = xfs_free_eofblocks(ip);
goto out; } @@ -1779,7 +1770,7 @@ xfs_inactive( /* * Free the inode. */ - xfs_inactive_ifree(ip); + error = xfs_inactive_ifree(ip);
out: /* @@ -1787,6 +1778,7 @@ out: * the attached dquots. */ xfs_qm_dqdetach(ip); + return error; }
/* --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -470,7 +470,7 @@ enum layout_break_reason { (xfs_has_grpid((pip)->i_mount) || (VFS_I(pip)->i_mode & S_ISGID))
int xfs_release(struct xfs_inode *ip); -void xfs_inactive(struct xfs_inode *ip); +int xfs_inactive(struct xfs_inode *ip); int xfs_lookup(struct xfs_inode *dp, const struct xfs_name *name, struct xfs_inode **ipp, struct xfs_name *ci_name); int xfs_create(struct user_namespace *mnt_userns, --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -2711,7 +2711,9 @@ xlog_recover_iunlink_bucket( * just to flush the inodegc queue and wait for it to * complete. */ - xfs_inodegc_flush(mp); + error = xfs_inodegc_flush(mp); + if (error) + break; }
prev_agino = agino; @@ -2719,10 +2721,15 @@ xlog_recover_iunlink_bucket( }
if (prev_ip) { + int error2; + ip->i_prev_unlinked = prev_agino; xfs_irele(prev_ip); + + error2 = xfs_inodegc_flush(mp); + if (error2 && !error) + return error2; } - xfs_inodegc_flush(mp); return error; }
@@ -2789,7 +2796,6 @@ xlog_recover_iunlink_ag( * bucket and remaining inodes on it unreferenced and * unfreeable. */ - xfs_inodegc_flush(pag->pag_mount); xlog_recover_clear_agi_bucket(pag, bucket); } } @@ -2806,13 +2812,6 @@ xlog_recover_process_iunlinks(
for_each_perag(log->l_mp, agno, pag) xlog_recover_iunlink_ag(pag); - - /* - * Flush the pending unlinked inodes to ensure that the inactivations - * are fully completed on disk and the incore inodes can be reclaimed - * before we signal that recovery is complete. - */ - xfs_inodegc_flush(log->l_mp); }
STATIC void --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -62,6 +62,7 @@ struct xfs_error_cfg { struct xfs_inodegc { struct llist_head list; struct delayed_work work; + int error;
/* approximate count of inodes in the list */ unsigned int items; --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1089,6 +1089,7 @@ xfs_inodegc_init_percpu( #endif init_llist_head(&gc->list); gc->items = 0; + gc->error = 0; INIT_DELAYED_WORK(&gc->work, xfs_inodegc_worker); } return 0; --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -290,7 +290,9 @@ retry: * Do not perform a synchronous scan because callers can hold * other locks. */ - xfs_blockgc_flush_all(mp); + error = xfs_blockgc_flush_all(mp); + if (error) + return error; want_retry = false; goto retry; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Long Li leo.lilong@huaweicloud.com
[ Upstream commit c3b880acadc95d6e019eae5d669e072afda24f1b ]
I found a corruption during growfs:
XFS (loop0): Internal error agbno >= mp->m_sb.sb_agblocks at line 3661 of file fs/xfs/libxfs/xfs_alloc.c. Caller __xfs_free_extent+0x28e/0x3c0 CPU: 0 PID: 573 Comm: xfs_growfs Not tainted 6.3.0-rc7-next-20230420-00001-gda8c95746257 Call Trace: <TASK> dump_stack_lvl+0x50/0x70 xfs_corruption_error+0x134/0x150 __xfs_free_extent+0x2c1/0x3c0 xfs_ag_extend_space+0x291/0x3e0 xfs_growfs_data+0xd72/0xe90 xfs_file_ioctl+0x5f9/0x14a0 __x64_sys_ioctl+0x13e/0x1c0 do_syscall_64+0x39/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd XFS (loop0): Corruption detected. Unmount and run xfs_repair XFS (loop0): Internal error xfs_trans_cancel at line 1097 of file fs/xfs/xfs_trans.c. Caller xfs_growfs_data+0x691/0xe90 CPU: 0 PID: 573 Comm: xfs_growfs Not tainted 6.3.0-rc7-next-20230420-00001-gda8c95746257 Call Trace: <TASK> dump_stack_lvl+0x50/0x70 xfs_error_report+0x93/0xc0 xfs_trans_cancel+0x2c0/0x350 xfs_growfs_data+0x691/0xe90 xfs_file_ioctl+0x5f9/0x14a0 __x64_sys_ioctl+0x13e/0x1c0 do_syscall_64+0x39/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f2d86706577
The bug can be reproduced with the following sequence:
# truncate -s 1073741824 xfs_test.img # mkfs.xfs -f -b size=1024 -d agcount=4 xfs_test.img # truncate -s 2305843009213693952 xfs_test.img # mount -o loop xfs_test.img /mnt/test # xfs_growfs -D 1125899907891200 /mnt/test
The root cause is that during growfs, user space passed a large value of newblocks to xfs_growfs_data_private(). Because the current sb_agblocks is too small, the new AG count exceeds UINT_MAX. The AG number type is unsigned int, so the count overflows, which leaves nagcount much smaller than the actual value. When extending the AG space, the delta blocks in xfs_resizefs_init_new_ags() are then much larger than the actual value because of the incorrect nagcount, and can even exceed UINT_MAX. This causes corruption that is detected in __xfs_free_extent. Fix it by growing the filesystem up to the maximum allowed number of AGs rather than returning EINVAL when the new AG count overflows.
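The overflow can be worked through with the numbers from the reproducer. The sketch below is a userspace approximation, not the kernel code: the SKETCH_* macros and both helper functions are hypothetical, and it assumes that a 1 GiB image with 1024-byte blocks and agcount=4 ends up with sb_agblocks = 262144.

```c
#include <stdint.h>

#define SKETCH_NULLAGNUMBER	((uint32_t)-1)
#define SKETCH_MAX_AGNUMBER	((uint32_t)(SKETCH_NULLAGNUMBER - 1))

/* Old logic: the 64-bit AG count is silently truncated to 32 bits. */
static uint32_t agcount_truncated(uint64_t nb, uint32_t agblocks,
				  uint32_t min_ag_blocks)
{
	uint64_t nb_div = nb / agblocks;
	uint64_t nb_mod = nb % agblocks;
	uint32_t nagcount = (uint32_t)(nb_div + (nb_mod != 0));

	if (nb_mod && nb_mod < min_ag_blocks)
		nagcount--;
	return nagcount;
}

/* Patched logic: keep the count in 64 bits, then clamp to the AG limit. */
static uint32_t agcount_clamped(uint64_t nb, uint32_t agblocks,
				uint32_t min_ag_blocks, uint64_t *nb_out)
{
	uint64_t nb_div = nb / agblocks;
	uint64_t nb_mod = nb % agblocks;

	if (nb_mod && nb_mod >= min_ag_blocks)
		nb_div++;			/* partial AG is big enough to keep */
	else if (nb_mod)
		nb = nb_div * agblocks;		/* drop the too-small tail */

	if (nb_div > (uint64_t)SKETCH_MAX_AGNUMBER + 1) {
		nb_div = (uint64_t)SKETCH_MAX_AGNUMBER + 1;
		nb = nb_div * agblocks;
	}
	*nb_out = nb;
	return (uint32_t)nb_div;
}
```

Under that assumption, 1125899907891200 / 262144 is 2^32 + 4, so the truncated count wraps to 4 (the existing AG count), while the clamped version yields 4294967295 AGs and a correspondingly reduced block count.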
Signed-off-by: Long Li leo.lilong@huawei.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/libxfs/xfs_fs.h | 2 ++ fs/xfs/xfs_fsops.c | 13 +++++++++---- 2 files changed, 11 insertions(+), 4 deletions(-)
--- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -257,6 +257,8 @@ typedef struct xfs_fsop_resblks { #define XFS_MAX_AG_BLOCKS (XFS_MAX_AG_BYTES / XFS_MIN_BLOCKSIZE) #define XFS_MAX_CRC_AG_BLOCKS (XFS_MAX_AG_BYTES / XFS_MIN_CRC_BLOCKSIZE)
+#define XFS_MAX_AGNUMBER ((xfs_agnumber_t)(NULLAGNUMBER - 1)) + /* keep the maximum size under 2^31 by a small amount */ #define XFS_MAX_LOG_BYTES \ ((2 * 1024 * 1024 * 1024ULL) - XFS_MIN_LOG_BYTES) --- a/fs/xfs/xfs_fsops.c +++ b/fs/xfs/xfs_fsops.c @@ -115,11 +115,16 @@ xfs_growfs_data_private(
nb_div = nb; nb_mod = do_div(nb_div, mp->m_sb.sb_agblocks); - nagcount = nb_div + (nb_mod != 0); - if (nb_mod && nb_mod < XFS_MIN_AG_BLOCKS) { - nagcount--; - nb = (xfs_rfsblock_t)nagcount * mp->m_sb.sb_agblocks; + if (nb_mod && nb_mod >= XFS_MIN_AG_BLOCKS) + nb_div++; + else if (nb_mod) + nb = nb_div * mp->m_sb.sb_agblocks; + + if (nb_div > XFS_MAX_AGNUMBER + 1) { + nb_div = XFS_MAX_AGNUMBER + 1; + nb = nb_div * mp->m_sb.sb_agblocks; } + nagcount = nb_div; delta = nb - mp->m_sb.sb_dblocks; /* * Reject filesystems with a single AG because they are not
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Chinner dchinner@redhat.com
[ Upstream commit 4b827b3f305d1fcf837265f1e12acc22ee84327c ]
It just creates unnecessary bot noise these days.
Reported-by: syzbot+6ae213503fb12e87934f@syzkaller.appspotmail.com Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Dave Chinner david@fromorbit.com Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_dquot.c | 1 - 1 file changed, 1 deletion(-)
--- a/fs/xfs/xfs_dquot.c +++ b/fs/xfs/xfs_dquot.c @@ -798,7 +798,6 @@ xfs_qm_dqget_cache_insert( error = radix_tree_insert(tree, id, dqp); if (unlikely(error)) { /* Duplicate found! Caller must try again. */ - WARN_ON(error != -EEXIST); mutex_unlock(&qi->qi_tree_lock); trace_xfs_dqget_dup(dqp); return error;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shiyang Ruan ruansy.fnst@fujitsu.com
[ Upstream commit 5cf32f63b0f4c520460c1a5dd915dc4f09085f29 ]
The value of "end" should be "start + length - 1".
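As an illustration of the inclusive-end convention, here is a minimal userspace sketch of clamping a failed byte range to a device range. It is not the kernel function: struct byte_range, clamp_to_device() and range_len() are hypothetical names, both ranges are in absolute bytes, and overflow of offset + len is not handled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inclusive byte range [start, end]. */
struct byte_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Intersect [offset, offset + len - 1] with the device range.
 * Returns false if the two ranges do not overlap at all.
 */
static bool clamp_to_device(uint64_t offset, uint64_t len,
			    struct byte_range dev, struct byte_range *out)
{
	uint64_t end;

	if (len == 0)
		return false;
	end = offset + len - 1;		/* last byte, not one past it */

	if (end < dev.start || offset > dev.end)
		return false;

	out->start = offset < dev.start ? dev.start : offset;
	out->end = end > dev.end ? dev.end : end;
	return true;
}

/* Length of an inclusive range is end - start + 1. */
static uint64_t range_len(struct byte_range r)
{
	return r.end - r.start + 1;
}
```

With an exclusive "offset + len" compared against an inclusive device start, a range ending exactly one byte before the device would be treated as overlapping; using the last byte avoids that off-by-one.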
Signed-off-by: Shiyang Ruan ruansy.fnst@fujitsu.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_notify_failure.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
--- a/fs/xfs/xfs_notify_failure.c +++ b/fs/xfs/xfs_notify_failure.c @@ -114,7 +114,8 @@ xfs_dax_notify_ddev_failure( int error = 0; xfs_fsblock_t fsbno = XFS_DADDR_TO_FSB(mp, daddr); xfs_agnumber_t agno = XFS_FSB_TO_AGNO(mp, fsbno); - xfs_fsblock_t end_fsbno = XFS_DADDR_TO_FSB(mp, daddr + bblen); + xfs_fsblock_t end_fsbno = XFS_DADDR_TO_FSB(mp, + daddr + bblen - 1); xfs_agnumber_t end_agno = XFS_FSB_TO_AGNO(mp, end_fsbno);
error = xfs_trans_alloc_empty(mp, &tp); @@ -210,7 +211,7 @@ xfs_dax_notify_failure( ddev_end = ddev_start + bdev_nr_bytes(mp->m_ddev_targp->bt_bdev) - 1;
/* Ignore the range out of filesystem area */ - if (offset + len < ddev_start) + if (offset + len - 1 < ddev_start) return -ENXIO; if (offset > ddev_end) return -ENXIO; @@ -222,8 +223,8 @@ xfs_dax_notify_failure( len -= ddev_start - offset; offset = 0; } - if (offset + len > ddev_end) - len -= ddev_end - offset; + if (offset + len - 1 > ddev_end) + len = ddev_end - offset + 1;
return xfs_dax_notify_ddev_failure(mp, BTOBB(offset), BTOBB(len), mf_flags);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit 68b957f64fca1930164bfc6d6d379acdccd547d7 ]
shrikanth hegde reports that filesystems fail shortly after mount with the following failure:
WARNING: CPU: 56 PID: 12450 at fs/xfs/xfs_inode.c:1839 xfs_iunlink_lookup+0x58/0x80 [xfs]
This of course is the WARN_ON_ONCE in xfs_iunlink_lookup:
ip = radix_tree_lookup(&pag->pag_ici_root, agino); if (WARN_ON_ONCE(!ip || !ip->i_ino)) { ... }
From diagnostic data collected by the bug reporters, it would appear that we cleanly mounted a filesystem that contained unlinked inodes. Unlinked inodes are only processed as a final step of log recovery, which means that clean mounts do not process the unlinked list at all.
Prior to the introduction of the incore unlinked lists, this wasn't a problem because the unlink code would (very expensively) traverse the entire ondisk metadata iunlink chain to keep things up to date. However, the incore unlinked list code complains when it realizes that it is out of sync with the ondisk metadata and shuts down the fs, which is bad.
Ritesh proposed to solve this problem by unconditionally parsing the unlinked lists at mount time, but this imposes a mount time cost for every filesystem to catch something that should be very infrequent. Instead, let's target the places where we can encounter a next_unlinked pointer that refers to an inode that is not in cache, and load it into cache.
Note: This patch does not address the problem of iget loading an inode from the middle of the iunlink list and needing to set i_prev_unlinked correctly.
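The "reload on cache miss" approach can be sketched with a toy in-memory cache. Everything here is hypothetical (the toy_inode array, the function names, the slot count); the real code looks inodes up in the per-AG radix tree and loads them with xfs_iget(). The only point shown is the control flow: a miss is reported as -ENOLINK and handled by loading the inode and setting its back pointer.

```c
#include <errno.h>
#include <stddef.h>

#define NR_SLOTS 8

/* Toy stand-in for an incore inode on an unlinked list. */
struct toy_inode {
	int cached;			/* nonzero if "in cache" */
	unsigned int prev_unlinked;	/* back pointer on the list */
};

static struct toy_inode cache[NR_SLOTS];

/* Update the back pointer; -ENOLINK means the inode is not cached. */
static int update_backref(unsigned int prev, unsigned int next)
{
	if (next >= NR_SLOTS || !cache[next].cached)
		return -ENOLINK;
	cache[next].prev_unlinked = prev;
	return 0;
}

/* Bring the inode into the toy cache and set its back pointer. */
static int reload_next(unsigned int prev, unsigned int next)
{
	if (next >= NR_SLOTS)
		return -EINVAL;
	cache[next].cached = 1;
	cache[next].prev_unlinked = prev;
	return 0;
}

/* Caller pattern: only a cache miss triggers the reload. */
static int link_backref(unsigned int prev, unsigned int next)
{
	int error = update_backref(prev, next);

	if (error == -ENOLINK)
		error = reload_next(prev, next);
	return error;
}
```

Using a distinct error code for "not in cache" keeps genuine corruption errors separate from the recoverable miss, so the cost is paid only when an unrecovered unlinked inode is actually encountered.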
Reported-by: shrikanth hegde sshegde@linux.vnet.ibm.com Triaged-by: Ritesh Harjani ritesh.list@gmail.com Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_inode.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++++---- fs/xfs/xfs_trace.h | 25 ++++++++++++++++ 2 files changed, 100 insertions(+), 5 deletions(-)
--- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -1829,12 +1829,17 @@ xfs_iunlink_lookup(
rcu_read_lock(); ip = radix_tree_lookup(&pag->pag_ici_root, agino); + if (!ip) { + /* Caller can handle inode not being in memory. */ + rcu_read_unlock(); + return NULL; + }
/* - * Inode not in memory or in RCU freeing limbo should not happen. - * Warn about this and let the caller handle the failure. + * Inode in RCU freeing limbo should not happen. Warn about this and + * let the caller handle the failure. */ - if (WARN_ON_ONCE(!ip || !ip->i_ino)) { + if (WARN_ON_ONCE(!ip->i_ino)) { rcu_read_unlock(); return NULL; } @@ -1843,7 +1848,10 @@ xfs_iunlink_lookup( return ip; }
-/* Update the prev pointer of the next agino. */ +/* + * Update the prev pointer of the next agino. Returns -ENOLINK if the inode + * is not in cache. + */ static int xfs_iunlink_update_backref( struct xfs_perag *pag, @@ -1858,7 +1866,8 @@ xfs_iunlink_update_backref(
ip = xfs_iunlink_lookup(pag, next_agino); if (!ip) - return -EFSCORRUPTED; + return -ENOLINK; + ip->i_prev_unlinked = prev_agino; return 0; } @@ -1902,6 +1911,62 @@ xfs_iunlink_update_bucket( return 0; }
+/* + * Load the inode @next_agino into the cache and set its prev_unlinked pointer + * to @prev_agino. Caller must hold the AGI to synchronize with other changes + * to the unlinked list. + */ +STATIC int +xfs_iunlink_reload_next( + struct xfs_trans *tp, + struct xfs_buf *agibp, + xfs_agino_t prev_agino, + xfs_agino_t next_agino) +{ + struct xfs_perag *pag = agibp->b_pag; + struct xfs_mount *mp = pag->pag_mount; + struct xfs_inode *next_ip = NULL; + xfs_ino_t ino; + int error; + + ASSERT(next_agino != NULLAGINO); + +#ifdef DEBUG + rcu_read_lock(); + next_ip = radix_tree_lookup(&pag->pag_ici_root, next_agino); + ASSERT(next_ip == NULL); + rcu_read_unlock(); +#endif + + xfs_info_ratelimited(mp, + "Found unrecovered unlinked inode 0x%x in AG 0x%x. Initiating recovery.", + next_agino, pag->pag_agno); + + /* + * Use an untrusted lookup just to be cautious in case the AGI has been + * corrupted and now points at a free inode. That shouldn't happen, + * but we'd rather shut down now since we're already running in a weird + * situation. + */ + ino = XFS_AGINO_TO_INO(mp, pag->pag_agno, next_agino); + error = xfs_iget(mp, tp, ino, XFS_IGET_UNTRUSTED, 0, &next_ip); + if (error) + return error; + + /* If this is not an unlinked inode, something is very wrong. */ + if (VFS_I(next_ip)->i_nlink != 0) { + error = -EFSCORRUPTED; + goto rele; + } + + next_ip->i_prev_unlinked = prev_agino; + trace_xfs_iunlink_reload_next(next_ip); +rele: + ASSERT(!(VFS_I(next_ip)->i_state & I_DONTCACHE)); + xfs_irele(next_ip); + return error; +} + static int xfs_iunlink_insert_inode( struct xfs_trans *tp, @@ -1933,6 +1998,8 @@ xfs_iunlink_insert_inode( * inode. */ error = xfs_iunlink_update_backref(pag, agino, next_agino); + if (error == -ENOLINK) + error = xfs_iunlink_reload_next(tp, agibp, agino, next_agino); if (error) return error;
@@ -2027,6 +2094,9 @@ xfs_iunlink_remove_inode( */ error = xfs_iunlink_update_backref(pag, ip->i_prev_unlinked, ip->i_next_unlinked); + if (error == -ENOLINK) + error = xfs_iunlink_reload_next(tp, agibp, ip->i_prev_unlinked, + ip->i_next_unlinked); if (error) return error;
--- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -3679,6 +3679,31 @@ TRACE_EVENT(xfs_iunlink_update_dinode, __entry->new_ptr) );
+TRACE_EVENT(xfs_iunlink_reload_next, + TP_PROTO(struct xfs_inode *ip), + TP_ARGS(ip), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(xfs_agnumber_t, agno) + __field(xfs_agino_t, agino) + __field(xfs_agino_t, prev_agino) + __field(xfs_agino_t, next_agino) + ), + TP_fast_assign( + __entry->dev = ip->i_mount->m_super->s_dev; + __entry->agno = XFS_INO_TO_AGNO(ip->i_mount, ip->i_ino); + __entry->agino = XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino); + __entry->prev_agino = ip->i_prev_unlinked; + __entry->next_agino = ip->i_next_unlinked; + ), + TP_printk("dev %d:%d agno 0x%x agino 0x%x prev_unlinked 0x%x next_unlinked 0x%x", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->agno, + __entry->agino, + __entry->prev_agino, + __entry->next_agino) +); + DECLARE_EVENT_CLASS(xfs_ag_inode_class, TP_PROTO(struct xfs_inode *ip), TP_ARGS(ip),
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit 1bba82fe1afac69c85c1f5ea137c8e73de3c8032 ]
In commit 8ee81ed581ff, Ye Bin complained about an ASSERT in the bmapx code that trips if we encounter a delalloc extent after flushing the pagecache to disk. The ioctl code does not hold MMAPLOCK so it's entirely possible that a racing write page fault can create a delalloc extent after the file has been flushed. The proposed solution was to replace the assertion with an early return that avoids filling out the bmap recordset with a delalloc entry if the caller didn't ask for it.
At the time, I recall thinking that the forward logic sounded ok, but felt hesitant because I suspected that changing this code would cause something /else/ to burst loose due to some other subtlety.
syzbot of course found that subtlety. If all the extent mappings found after the flush are delalloc mappings, we'll reach the end of the data fork without ever incrementing bmv->bmv_entries. This is new, since before we'd have emitted the delalloc mappings even though the caller didn't ask for them. Once we reach the end, we'll try to set BMV_OF_LAST on the -1st entry (because bmv_entries is zero) and go corrupt something else in memory. Yay.
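The failure mode is an index of "count - 1" when count is zero. A minimal sketch of the guard, using hypothetical names (struct sketch_rec, SKETCH_OF_LAST, mark_last()) rather than the real getbmap structures:

```c
#include <stddef.h>

#define SKETCH_OF_LAST	0x4	/* hypothetical "last record" flag value */

struct sketch_rec {
	unsigned int oflags;
};

/*
 * Flag the final emitted record as the last one. With zero records
 * there is nothing to flag, and out[nr_entries - 1] would index
 * before the start of the array, so bail out early.
 * Returns 1 if a record was flagged, 0 otherwise.
 */
static int mark_last(struct sketch_rec *out, size_t nr_entries)
{
	if (nr_entries == 0)
		return 0;
	out[nr_entries - 1].oflags |= SKETCH_OF_LAST;
	return 1;
}
```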
I really dislike all these stupid patches that fiddle around with debug code and break things that otherwise worked well enough. Nobody was complaining that calling XFS_IOC_BMAPX without BMV_IF_DELALLOC would return BMV_OF_DELALLOC records, and now we've gone from "weird behavior that nobody cared about" to "bad behavior that must be addressed immediately".
Maybe I'll just ignore anything from Huawei from now on for my own sake.
Reported-by: syzbot+c103d3808a0de5faaf80@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-xfs/20230412024907.GP360889@frogsfrogsfrogs/ Fixes: 8ee81ed581ff ("xfs: fix BUG_ON in xfs_getbmap()") Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Dave Chinner david@fromorbit.com Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_bmap_util.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -558,7 +558,9 @@ xfs_getbmap( if (!xfs_iext_next_extent(ifp, &icur, &got)) { xfs_fileoff_t end = XFS_B_TO_FSB(mp, XFS_ISIZE(ip));
- out[bmv->bmv_entries - 1].bmv_oflags |= BMV_OF_LAST; + if (bmv->bmv_entries > 0) + out[bmv->bmv_entries - 1].bmv_oflags |= + BMV_OF_LAST;
if (whichfork != XFS_ATTR_FORK && bno < end && !xfs_getbmap_full(bmv)) {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Chinner dchinner@redhat.com
[ Upstream commit 348a1983cf4cf5099fc398438a968443af4c9f65 ]
Luis has been reporting an assert failure when freeing an inode cluster during inode inactivation for a while. The assert looks like:
XFS: Assertion failed: bp->b_flags & XBF_DONE, file: fs/xfs/xfs_trans_buf.c, line: 241 ------------[ cut here ]------------ kernel BUG at fs/xfs/xfs_message.c:102! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI CPU: 4 PID: 73 Comm: kworker/4:1 Not tainted 6.10.0-rc1 #4 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 Workqueue: xfs-inodegc/loop5 xfs_inodegc_worker [xfs] RIP: 0010:assfail (fs/xfs/xfs_message.c:102) xfs RSP: 0018:ffff88810188f7f0 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff88816e748250 RCX: 1ffffffff844b0e7 RDX: 0000000000000004 RSI: ffff88810188f558 RDI: ffffffffc2431fa0 RBP: 1ffff11020311f01 R08: 0000000042431f9f R09: ffffed1020311e9b R10: ffff88810188f4df R11: ffffffffac725d70 R12: ffff88817a3f4000 R13: ffff88812182f000 R14: ffff88810188f998 R15: ffffffffc2423f80 FS: 0000000000000000(0000) GS:ffff8881c8400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055fe9d0f109c CR3: 000000014426c002 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> xfs_trans_read_buf_map (fs/xfs/xfs_trans_buf.c:241 (discriminator 1)) xfs xfs_imap_to_bp (fs/xfs/xfs_trans.h:210 fs/xfs/libxfs/xfs_inode_buf.c:138) xfs xfs_inode_item_precommit (fs/xfs/xfs_inode_item.c:145) xfs xfs_trans_run_precommits (fs/xfs/xfs_trans.c:931) xfs __xfs_trans_commit (fs/xfs/xfs_trans.c:966) xfs xfs_inactive_ifree (fs/xfs/xfs_inode.c:1811) xfs xfs_inactive (fs/xfs/xfs_inode.c:2013) xfs xfs_inodegc_worker (fs/xfs/xfs_icache.c:1841 fs/xfs/xfs_icache.c:1886) xfs process_one_work (kernel/workqueue.c:3231) worker_thread (kernel/workqueue.c:3306 (discriminator 2) kernel/workqueue.c:3393 (discriminator 2)) kthread (kernel/kthread.c:389) ret_from_fork (arch/x86/kernel/process.c:147) ret_from_fork_asm (arch/x86/entry/entry_64.S:257) </TASK>
It occurs when the inode precommit handler attempts to look up the inode cluster buffer to attach the inode for writeback.
The trail of logic that I can reconstruct is as follows.
1. the inode is clean when inodegc runs, so it is not attached to a cluster buffer when precommit runs.
2. #1 implies the inode cluster buffer may be clean and not pinned by dirty inodes when inodegc runs.
3. #2 implies that the inode cluster buffer can be reclaimed by memory pressure at any time.
4. The assert failure implies that the cluster buffer was attached to the transaction, but not marked done. It had been accessed earlier in the transaction, but not marked done.
5. #4 implies the cluster buffer has been invalidated (i.e. marked stale).
6. #5 implies that the inode cluster buffer was instantiated uninitialised in the transaction in xfs_ifree_cluster(), which only instantiates the buffers to invalidate them and never marks them as done.
Given factors 1-3, this issue is highly dependent on timing and environmental factors. Hence the issue can be very difficult to reproduce in some situations, but highly reliable in others. Luis has an environment where it can be reproduced easily by g/531 but, OTOH, I've reproduced it only once in ~2000 cycles of g/531.
I think the fix is to have xfs_ifree_cluster() set the XBF_DONE flag on the cluster buffers, even though they may not be initialised. The reasons why I think this is safe are:
1. A buffer cache lookup hit on a XBF_STALE buffer will clear the XBF_DONE flag. Hence all future users of the buffer know they have to re-initialise the contents before use and mark it done themselves.
2. xfs_trans_binval() sets the XFS_BLI_STALE flag, which means the buffer remains locked until the journal commit completes and the buffer is unpinned. Hence once marked XBF_STALE/XFS_BLI_STALE by xfs_ifree_cluster(), the only context that can access the freed buffer is the currently running transaction.
3. #2 implies that future buffer lookups in the currently running transaction will hit the transaction match code and not the buffer cache. Hence XBF_STALE and XFS_BLI_STALE will not be cleared unless the transaction initialises and logs the buffer with valid contents again. At which point, the buffer will be marked XBF_DONE again, so having XBF_DONE already set on the stale buffer is a moot point.
4. #2 also implies that any concurrent access to that cluster buffer will block waiting on the buffer lock until the inode cluster has been fully freed and is no longer an active inode cluster buffer.
5. #4 + #1 means that any future user of the disk range of that buffer will always see the range of disk blocks covered by the cluster buffer as not done, and hence must initialise the contents themselves.
6. Setting XBF_DONE in xfs_ifree_cluster() then means the unlinked inode precommit code will see a XBF_DONE buffer from the transaction match as it expects. It can then attach the stale but newly dirtied inode to the stale but newly dirtied cluster buffer without unexpected failures. The stale buffer will then sail through the journal and do the right thing with the attached stale inode during unpin.
Hence the fix is just one line of extra code. The explanation of why we have to set XBF_DONE in xfs_ifree_cluster, OTOH, is long and complex....
Fixes: 82842fee6e59 ("xfs: fix AGF vs inode cluster buffer deadlock") Signed-off-by: Dave Chinner dchinner@redhat.com Tested-by: Luis Chamberlain mcgrof@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_inode.c | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-)
--- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -2297,11 +2297,26 @@ xfs_ifree_cluster( * This buffer may not have been correctly initialised as we * didn't read it from disk. That's not important because we are * only using to mark the buffer as stale in the log, and to - * attach stale cached inodes on it. That means it will never be - * dispatched for IO. If it is, we want to know about it, and we - * want it to fail. We can acheive this by adding a write - * verifier to the buffer. + * attach stale cached inodes on it. + * + * For the inode that triggered the cluster freeing, this + * attachment may occur in xfs_inode_item_precommit() after we + * have marked this buffer stale. If this buffer was not in + * memory before xfs_ifree_cluster() started, it will not be + * marked XBF_DONE and this will cause problems later in + * xfs_inode_item_precommit() when we trip over a (stale, !done) + * buffer to attached to the transaction. + * + * Hence we have to mark the buffer as XFS_DONE here. This is + * safe because we are also marking the buffer as XBF_STALE and + * XFS_BLI_STALE. That means it will never be dispatched for + * IO and it won't be unlocked until the cluster freeing has + * been committed to the journal and the buffer unpinned. If it + * is written, we want to know about it, and we want it to + * fail. We can acheive this by adding a write verifier to the + * buffer. */ + bp->b_flags |= XBF_DONE; bp->b_ops = &xfs_inode_buf_ops;
/*
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shiyang Ruan ruansy.fnst@fujitsu.com
[ Upstream commit 3c90c01e49342b166e5c90ec2c85b220be15a20e ]
The agend should be "start + length - 1", and blockcount should then be "end + 1 - start". Correct these two calculation mistakes.
Also, rename "agend" to "range_agend" because it's not the end of the AG per se; it's the end of the dead region within an AG's agblock space.
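As a worked illustration of the inclusive-range arithmetic described above, here is a minimal standalone sketch. It is not the kernel code: the function names (range_end, range_count, clamp_range_end) and the sample numbers are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Last block of a range that starts at 'start' and spans 'length' blocks. */
uint32_t range_end(uint32_t start, uint32_t length)
{
	assert(length > 0);	/* an empty range has no last block */
	return start + length - 1;
}

/* Number of blocks in the inclusive range [start, end]. */
uint32_t range_count(uint32_t start, uint32_t end)
{
	assert(end >= start);
	return end + 1 - start;
}

/*
 * Clamp the inclusive end of a failed range to the last block of an AG
 * that is 'ag_length' blocks long, in the spirit of the min() in the patch.
 */
uint32_t clamp_range_end(uint32_t ag_length, uint32_t high_start)
{
	uint32_t last_agbno;

	assert(ag_length > 0);
	last_agbno = ag_length - 1;
	return high_start < last_agbno ? high_start : last_agbno;
}
```

For example, the inclusive range 30..40 covers range_count(30, 40) == 11 blocks; a plain "end - start" would report only 10.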
Fixes: 5cf32f63b0f4 ("xfs: fix the calculation for "end" and "length"") Signed-off-by: Shiyang Ruan ruansy.fnst@fujitsu.com Reviewed-by: "Darrick J. Wong" djwong@kernel.org Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_notify_failure.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/fs/xfs/xfs_notify_failure.c +++ b/fs/xfs/xfs_notify_failure.c @@ -126,8 +126,8 @@ xfs_dax_notify_ddev_failure( struct xfs_rmap_irec ri_low = { }; struct xfs_rmap_irec ri_high; struct xfs_agf *agf; - xfs_agblock_t agend; struct xfs_perag *pag; + xfs_agblock_t range_agend;
pag = xfs_perag_get(mp, agno); error = xfs_alloc_read_agf(pag, tp, 0, &agf_bp); @@ -148,10 +148,10 @@ xfs_dax_notify_ddev_failure( ri_high.rm_startblock = XFS_FSB_TO_AGBNO(mp, end_fsbno);
agf = agf_bp->b_addr; - agend = min(be32_to_cpu(agf->agf_length), + range_agend = min(be32_to_cpu(agf->agf_length) - 1, ri_high.rm_startblock); notify.startblock = ri_low.rm_startblock; - notify.blockcount = agend - ri_low.rm_startblock; + notify.blockcount = range_agend + 1 - ri_low.rm_startblock;
error = xfs_rmap_query_range(cur, &ri_low, &ri_high, xfs_dax_failure_fn, &notify);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit f12b96683d6976a3a07fdf3323277c79dbe8f6ab ]
Alter the definition of i_prev_unlinked slightly to make it more obvious when an inode with 0 link count is not part of the iunlink bucket lists rooted in the AGI. This distinction is necessary because it is not sufficient to check inode.i_nlink to decide if an inode is on the unlinked list. Updates to i_nlink can happen while holding only ILOCK_EXCL, but updates to an inode's position in the AGI unlinked list (which happen after the nlink update) require both ILOCK_EXCL and the AGI buffer lock.
The next few patches will make it possible to reload an entire unlinked bucket list when we're walking the inode table or performing handle operations and need more than the ability to iget the last inode in the chain.
The upcoming directory repair code also needs to be able to make this distinction to decide if a zero link count directory should be moved to the orphanage or allowed to inactivate. An upcoming enhancement to the online AGI fsck code will need this distinction to check and rebuild the AGI unlinked buckets.
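The three states that i_prev_unlinked can take after this change can be modelled in a few lines. This is a minimal userspace sketch, not the kernel implementation: struct toy_inode, the toy_* helpers and NULL_AGINO are hypothetical stand-ins for struct xfs_inode, the iunlink insert/remove paths and NULLAGINO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t agino_t;		/* stand-in for xfs_agino_t */
#define NULL_AGINO ((agino_t)-1)	/* stand-in for NULLAGINO */

struct toy_inode {
	agino_t next_unlinked;
	agino_t prev_unlinked;	/* 0: not on a list, NULL_AGINO: list head */
};

/* A freshly allocated inode is on no unlinked list. */
void toy_inode_init(struct toy_inode *ip)
{
	ip->next_unlinked = NULL_AGINO;
	ip->prev_unlinked = 0;
}

/* Same test as the helper added by the patch: nonzero means "on a list". */
bool toy_on_unlinked_list(const struct toy_inode *ip)
{
	return ip->prev_unlinked != 0;
}

bool toy_is_list_head(const struct toy_inode *ip)
{
	return ip->prev_unlinked == NULL_AGINO;
}

/* Insert at the head of a bucket whose previous head was 'old_head'. */
void toy_insert_at_head(struct toy_inode *ip, agino_t old_head)
{
	assert(!toy_on_unlinked_list(ip));	/* double insertion is a bug */
	ip->next_unlinked = old_head;
	ip->prev_unlinked = NULL_AGINO;
}

/* Removal returns the inode to the "not on a list" state. */
void toy_remove(struct toy_inode *ip)
{
	ip->next_unlinked = NULL_AGINO;
	ip->prev_unlinked = 0;
}
```

The point of the sketch is that "is this inode on an unlinked list?" becomes a single field comparison that does not depend on the link count.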
Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_icache.c | 2 +- fs/xfs/xfs_inode.c | 3 ++- fs/xfs/xfs_inode.h | 20 +++++++++++++++++++- 3 files changed, 22 insertions(+), 3 deletions(-)
--- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -113,7 +113,7 @@ xfs_inode_alloc( INIT_LIST_HEAD(&ip->i_ioend_list); spin_lock_init(&ip->i_ioend_lock); ip->i_next_unlinked = NULLAGINO; - ip->i_prev_unlinked = NULLAGINO; + ip->i_prev_unlinked = 0;
return ip; } --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -2015,6 +2015,7 @@ xfs_iunlink_insert_inode( }
/* Point the head of the list to point to this inode. */ + ip->i_prev_unlinked = NULLAGINO; return xfs_iunlink_update_bucket(tp, pag, agibp, bucket_index, agino); }
@@ -2117,7 +2118,7 @@ xfs_iunlink_remove_inode( }
ip->i_next_unlinked = NULLAGINO; - ip->i_prev_unlinked = NULLAGINO; + ip->i_prev_unlinked = 0; return error; }
--- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -68,8 +68,21 @@ typedef struct xfs_inode { uint64_t i_diflags2; /* XFS_DIFLAG2_... */ struct timespec64 i_crtime; /* time created */
- /* unlinked list pointers */ + /* + * Unlinked list pointers. These point to the next and previous inodes + * in the AGI unlinked bucket list, respectively. These fields can + * only be updated with the AGI locked. + * + * i_next_unlinked caches di_next_unlinked. + */ xfs_agino_t i_next_unlinked; + + /* + * If the inode is not on an unlinked list, this field is zero. If the + * inode is the first element in an unlinked list, this field is + * NULLAGINO. Otherwise, i_prev_unlinked points to the previous inode + * in the unlinked list. + */ xfs_agino_t i_prev_unlinked;
/* VFS inode */ @@ -81,6 +94,11 @@ typedef struct xfs_inode { struct list_head i_ioend_list; } xfs_inode_t;
+static inline bool xfs_inode_on_unlinked_list(const struct xfs_inode *ip) +{ + return ip->i_prev_unlinked != 0; +} + static inline bool xfs_inode_has_attr_fork(struct xfs_inode *ip) { return ip->i_forkoff > 0;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit 83771c50e42b92de6740a63e152c96c052d37736 ]
The previous patch to reload unrecovered unlinked inodes when adding a newly created inode to the unlinked list is missing a key piece of functionality. It doesn't handle the case that someone calls xfs_iget on an inode that is not the last item in the incore list. For example, if at mount time the ondisk iunlink bucket looks like this:
AGI -> 7 -> 22 -> 3 -> NULL
None of these three inodes are cached in memory. Now let's say that someone tries to open inode 3 by handle. We need to walk the list to make sure that inodes 7 and 22 get loaded cold, and that the i_prev_unlinked of inode 3 gets set to 22.
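The walk described above can be sketched with a toy model of the AGI -> 7 -> 22 -> 3 -> NULL bucket. This is a simplified illustration under stated assumptions, not the kernel code: struct toy_fs, toy_reload_bucket() and TOY_NULL are hypothetical, inode "loading" is reduced to setting a flag, and the list is assumed to be free of cycles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NULL	UINT32_MAX	/* stand-in for NULLAGINO */
#define TOY_MAX_INO	32

struct toy_fs {
	uint32_t bucket_head;			/* first agino in the bucket */
	uint32_t ondisk_next[TOY_MAX_INO];	/* on-disk next pointers */
	bool	 incore[TOY_MAX_INO];		/* is the inode cached? */
	uint32_t prev[TOY_MAX_INO];		/* incore back pointers, 0 = unset */
};

/*
 * Walk the bucket from its head. Every inode met on the way is marked
 * incore and has its back pointer set. Returns 0 if 'target' was found
 * on the list and -1 if the list ended without reaching it.
 */
int toy_reload_bucket(struct toy_fs *fs, uint32_t target)
{
	uint32_t prev = TOY_NULL;
	uint32_t cur = fs->bucket_head;
	bool found = false;

	while (cur != TOY_NULL) {
		assert(cur < TOY_MAX_INO);
		fs->incore[cur] = true;		/* "load" the inode */
		fs->prev[cur] = prev;		/* set its backlink */
		if (cur == target)
			found = true;
		prev = cur;
		cur = fs->ondisk_next[cur];
	}
	return found ? 0 : -1;
}
```

With the bucket from the example, a call with target 3 leaves inodes 7 and 22 loaded and sets the back pointer of 3 to 22. A target that never shows up on the list is reported as an error, which corresponds to the -EFSCORRUPTED case in the patch.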
Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_export.c | 6 +++ fs/xfs/xfs_inode.c | 100 ++++++++++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_inode.h | 9 ++++ fs/xfs/xfs_itable.c | 9 ++++ fs/xfs/xfs_trace.h | 20 ++++++++++ 5 files changed, 144 insertions(+)
--- a/fs/xfs/xfs_export.c +++ b/fs/xfs/xfs_export.c @@ -146,6 +146,12 @@ xfs_nfs_get_inode( return ERR_PTR(error); }
+ error = xfs_inode_reload_unlinked(ip); + if (error) { + xfs_irele(ip); + return ERR_PTR(error); + } + if (VFS_I(ip)->i_generation != generation) { xfs_irele(ip); return ERR_PTR(-ESTALE); --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -3622,3 +3622,103 @@ xfs_iunlock2_io_mmap( if (ip1 != ip2) inode_unlock(VFS_I(ip1)); } + +/* + * Reload the incore inode list for this inode. Caller should ensure that + * the link count cannot change, either by taking ILOCK_SHARED or otherwise + * preventing other threads from executing. + */ +int +xfs_inode_reload_unlinked_bucket( + struct xfs_trans *tp, + struct xfs_inode *ip) +{ + struct xfs_mount *mp = tp->t_mountp; + struct xfs_buf *agibp; + struct xfs_agi *agi; + struct xfs_perag *pag; + xfs_agnumber_t agno = XFS_INO_TO_AGNO(mp, ip->i_ino); + xfs_agino_t agino = XFS_INO_TO_AGINO(mp, ip->i_ino); + xfs_agino_t prev_agino, next_agino; + unsigned int bucket; + bool foundit = false; + int error; + + /* Grab the first inode in the list */ + pag = xfs_perag_get(mp, agno); + error = xfs_ialloc_read_agi(pag, tp, &agibp); + xfs_perag_put(pag); + if (error) + return error; + + bucket = agino % XFS_AGI_UNLINKED_BUCKETS; + agi = agibp->b_addr; + + trace_xfs_inode_reload_unlinked_bucket(ip); + + xfs_info_ratelimited(mp, + "Found unrecovered unlinked inode 0x%x in AG 0x%x. Initiating list recovery.", + agino, agno); + + prev_agino = NULLAGINO; + next_agino = be32_to_cpu(agi->agi_unlinked[bucket]); + while (next_agino != NULLAGINO) { + struct xfs_inode *next_ip = NULL; + + if (next_agino == agino) { + /* Found this inode, set its backlink. */ + next_ip = ip; + next_ip->i_prev_unlinked = prev_agino; + foundit = true; + } + if (!next_ip) { + /* Inode already in memory. */ + next_ip = xfs_iunlink_lookup(pag, next_agino); + } + if (!next_ip) { + /* Inode not in memory, reload. 
*/ + error = xfs_iunlink_reload_next(tp, agibp, prev_agino, + next_agino); + if (error) + break; + + next_ip = xfs_iunlink_lookup(pag, next_agino); + } + if (!next_ip) { + /* No incore inode at all? We reloaded it... */ + ASSERT(next_ip != NULL); + error = -EFSCORRUPTED; + break; + } + + prev_agino = next_agino; + next_agino = next_ip->i_next_unlinked; + } + + xfs_trans_brelse(tp, agibp); + /* Should have found this inode somewhere in the iunlinked bucket. */ + if (!error && !foundit) + error = -EFSCORRUPTED; + return error; +} + +/* Decide if this inode is missing its unlinked list and reload it. */ +int +xfs_inode_reload_unlinked( + struct xfs_inode *ip) +{ + struct xfs_trans *tp; + int error; + + error = xfs_trans_alloc_empty(ip->i_mount, &tp); + if (error) + return error; + + xfs_ilock(ip, XFS_ILOCK_SHARED); + if (xfs_inode_unlinked_incomplete(ip)) + error = xfs_inode_reload_unlinked_bucket(tp, ip); + xfs_iunlock(ip, XFS_ILOCK_SHARED); + xfs_trans_cancel(tp); + + return error; +} --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -593,4 +593,13 @@ void xfs_end_io(struct work_struct *work int xfs_ilock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2); void xfs_iunlock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2);
+static inline bool +xfs_inode_unlinked_incomplete( + struct xfs_inode *ip) +{ + return VFS_I(ip)->i_nlink == 0 && !xfs_inode_on_unlinked_list(ip); +} +int xfs_inode_reload_unlinked_bucket(struct xfs_trans *tp, struct xfs_inode *ip); +int xfs_inode_reload_unlinked(struct xfs_inode *ip); + #endif /* __XFS_INODE_H__ */ --- a/fs/xfs/xfs_itable.c +++ b/fs/xfs/xfs_itable.c @@ -80,6 +80,15 @@ xfs_bulkstat_one_int( if (error) goto out;
+ if (xfs_inode_unlinked_incomplete(ip)) { + error = xfs_inode_reload_unlinked_bucket(tp, ip); + if (error) { + xfs_iunlock(ip, XFS_ILOCK_SHARED); + xfs_irele(ip); + return error; + } + } + ASSERT(ip != NULL); ASSERT(ip->i_imap.im_blkno != 0); inode = VFS_I(ip); --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -3704,6 +3704,26 @@ TRACE_EVENT(xfs_iunlink_reload_next, __entry->next_agino) );
+TRACE_EVENT(xfs_inode_reload_unlinked_bucket, + TP_PROTO(struct xfs_inode *ip), + TP_ARGS(ip), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(xfs_agnumber_t, agno) + __field(xfs_agino_t, agino) + ), + TP_fast_assign( + __entry->dev = ip->i_mount->m_super->s_dev; + __entry->agno = XFS_INO_TO_AGNO(ip->i_mount, ip->i_ino); + __entry->agino = XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino); + ), + TP_printk("dev %d:%d agno 0x%x agino 0x%x bucket %u", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->agno, + __entry->agino, + __entry->agino % XFS_AGI_UNLINKED_BUCKETS) +); + DECLARE_EVENT_CLASS(xfs_ag_inode_class, TP_PROTO(struct xfs_inode *ip), TP_ARGS(ip),
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit 49813a21ed57895b73ec4ed3b99d4beec931496f ]
Teach quotacheck to reload the unlinked inode lists when walking the inode table. This requires extra state handling, since it's possible that a reloaded inode will get inactivated before quotacheck tries to scan it; in this case, we need to ensure that the reloaded inode does not have dquots attached when it is freed.
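The extra state handling can be pictured with a small model. This is a rough sketch under simplifying assumptions, not the kernel logic: the quota counters are reduced to a single block count, and struct toy_inode, struct toy_quota and the toy_* functions are hypothetical names. The quota_unchecked field plays the role of the XFS_IQUOTAUNCHECKED flag.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_inode {
	bool quota_unchecked;	/* models XFS_IQUOTAUNCHECKED */
	bool dquots_attached;
	long nblocks;
};

struct toy_quota {
	long counted_blocks;	/* what quotacheck has accounted so far */
};

/* Reloading an inode while quotacheck runs marks it as not yet counted. */
void toy_reload(struct toy_inode *ip, bool quotacheck_running)
{
	if (quotacheck_running)
		ip->quota_unchecked = true;
}

/* Quotacheck visits the inode: count its blocks and clear the mark. */
void toy_quotacheck_visit(struct toy_quota *q, struct toy_inode *ip)
{
	assert(ip->nblocks >= 0);
	ip->quota_unchecked = false;
	q->counted_blocks += ip->nblocks;
}

/* Inactivation frees the blocks of an unlinked inode. */
void toy_inactivate(struct toy_quota *q, struct toy_inode *ip)
{
	if (ip->quota_unchecked) {
		/* Never counted, so there is nothing to subtract. */
		ip->dquots_attached = false;
	} else {
		ip->dquots_attached = true;
		q->counted_blocks -= ip->nblocks;
	}
	ip->nblocks = 0;
}
```

In this model, an inode that is inactivated before quotacheck reaches it leaves the counters untouched, while an inode that was already scanned has its blocks subtracted again, so the totals stay consistent in both orders.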
Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_attr_inactive.c | 1 - fs/xfs/xfs_inode.c | 12 +++++++++--- fs/xfs/xfs_inode.h | 5 ++++- fs/xfs/xfs_mount.h | 10 +++++++++- fs/xfs/xfs_qm.c | 7 +++++++ 5 files changed, 29 insertions(+), 6 deletions(-)
--- a/fs/xfs/xfs_attr_inactive.c +++ b/fs/xfs/xfs_attr_inactive.c @@ -333,7 +333,6 @@ xfs_attr_inactive( int error = 0;
mp = dp->i_mount; - ASSERT(! XFS_NOT_DQATTACHED(mp, dp));
xfs_ilock(dp, lock_mode); if (!xfs_inode_has_attr_fork(dp)) --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -1743,9 +1743,13 @@ xfs_inactive( ip->i_df.if_nextents > 0 || ip->i_delayed_blks > 0)) truncate = 1;
- error = xfs_qm_dqattach(ip); - if (error) - goto out; + if (xfs_iflags_test(ip, XFS_IQUOTAUNCHECKED)) { + xfs_qm_dqdetach(ip); + } else { + error = xfs_qm_dqattach(ip); + if (error) + goto out; + }
if (S_ISLNK(VFS_I(ip)->i_mode)) error = xfs_inactive_symlink(ip); @@ -1963,6 +1967,8 @@ xfs_iunlink_reload_next( trace_xfs_iunlink_reload_next(next_ip); rele: ASSERT(!(VFS_I(next_ip)->i_state & I_DONTCACHE)); + if (xfs_is_quotacheck_running(mp) && next_ip) + xfs_iflags_set(next_ip, XFS_IQUOTAUNCHECKED); xfs_irele(next_ip); return error; } --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -344,6 +344,9 @@ static inline bool xfs_inode_has_large_e */ #define XFS_INACTIVATING (1 << 13)
+/* Quotacheck is running but inode has not been added to quota counts. */ +#define XFS_IQUOTAUNCHECKED (1 << 14) + /* All inode state flags related to inode reclaim. */ #define XFS_ALL_IRECLAIM_FLAGS (XFS_IRECLAIMABLE | \ XFS_IRECLAIM | \ @@ -358,7 +361,7 @@ static inline bool xfs_inode_has_large_e #define XFS_IRECLAIM_RESET_FLAGS \ (XFS_IRECLAIMABLE | XFS_IRECLAIM | \ XFS_IDIRTY_RELEASE | XFS_ITRUNCATED | XFS_NEED_INACTIVE | \ - XFS_INACTIVATING) + XFS_INACTIVATING | XFS_IQUOTAUNCHECKED)
/* * Flags for inode locking. --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -401,6 +401,8 @@ __XFS_HAS_FEAT(nouuid, NOUUID) #define XFS_OPSTATE_WARNED_SHRINK 8 /* Kernel has logged a warning about logged xattr updates being used. */ #define XFS_OPSTATE_WARNED_LARP 9 +/* Mount time quotacheck is running */ +#define XFS_OPSTATE_QUOTACHECK_RUNNING 10
#define __XFS_IS_OPSTATE(name, NAME) \ static inline bool xfs_is_ ## name (struct xfs_mount *mp) \ @@ -423,6 +425,11 @@ __XFS_IS_OPSTATE(inode32, INODE32) __XFS_IS_OPSTATE(readonly, READONLY) __XFS_IS_OPSTATE(inodegc_enabled, INODEGC_ENABLED) __XFS_IS_OPSTATE(blockgc_enabled, BLOCKGC_ENABLED) +#ifdef CONFIG_XFS_QUOTA +__XFS_IS_OPSTATE(quotacheck_running, QUOTACHECK_RUNNING) +#else +# define xfs_is_quotacheck_running(mp) (false) +#endif
static inline bool xfs_should_warn(struct xfs_mount *mp, long nr) @@ -440,7 +447,8 @@ xfs_should_warn(struct xfs_mount *mp, lo { (1UL << XFS_OPSTATE_BLOCKGC_ENABLED), "blockgc" }, \ { (1UL << XFS_OPSTATE_WARNED_SCRUB), "wscrub" }, \ { (1UL << XFS_OPSTATE_WARNED_SHRINK), "wshrink" }, \ - { (1UL << XFS_OPSTATE_WARNED_LARP), "wlarp" } + { (1UL << XFS_OPSTATE_WARNED_LARP), "wlarp" }, \ + { (1UL << XFS_OPSTATE_QUOTACHECK_RUNNING), "quotacheck" }
/* * Max and min values for mount-option defined I/O --- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -1160,6 +1160,10 @@ xfs_qm_dqusage_adjust( if (error) return error;
+ error = xfs_inode_reload_unlinked(ip); + if (error) + goto error0; + ASSERT(ip->i_delayed_blks == 0);
if (XFS_IS_REALTIME_INODE(ip)) { @@ -1173,6 +1177,7 @@ xfs_qm_dqusage_adjust( }
nblks = (xfs_qcnt_t)ip->i_nblocks - rtblks; + xfs_iflags_clear(ip, XFS_IQUOTAUNCHECKED);
/* * Add the (disk blocks and inode) resources occupied by this @@ -1319,8 +1324,10 @@ xfs_qm_quotacheck( flags |= XFS_PQUOTA_CHKD; }
+ xfs_set_quotacheck_running(mp); error = xfs_iwalk_threaded(mp, 0, 0, xfs_qm_dqusage_adjust, 0, true, NULL); + xfs_clear_quotacheck_running(mp);
/* * On error, the inode walk may have partially populated the dquot
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit 537c013b140d373d1ffe6290b841dc00e67effaa ]
During review of the patchset that provided reloading of the incore iunlink list, Dave made a few suggestions, and I updated the copy in my dev tree. Unfortunately, I then got distracted by ... who even knows what ... and forgot to backport those changes from my dev tree to my release candidate branch. I then sent multiple pull requests with stale patches, and that's what was merged into -rc3.
So.
This patch re-adds the use of an unlocked iunlink list check to determine if we want to allocate the resources to recreate the incore list. Since lost iunlinked inodes are supposed to be rare, this change helps us avoid paying the transaction and AGF locking costs every time we open any inode.
This also re-adds the shutdowns on failure, and re-applies the restructuring of the inner loop in xfs_inode_reload_unlinked_bucket, and re-adds a requested comment about the quotachecking code.
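The "cheap unlocked check first, recheck under the lock" pattern described above can be sketched as follows. This is a single-threaded toy model, not the kernel code: the lock is just a flag plus a counter so that its cost can be observed, and struct toy_inode and the toy_* helpers are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_inode {
	bool incomplete;	/* models xfs_inode_unlinked_incomplete() */
	bool locked;		/* toy stand-in for ILOCK plus the AGI buffer lock */
	int  lock_count;	/* how often the expensive path was entered */
	int  reload_count;	/* how often the list was actually reloaded */
};

void toy_lock(struct toy_inode *ip)
{
	assert(!ip->locked);
	ip->locked = true;
	ip->lock_count++;
}

void toy_unlock(struct toy_inode *ip)
{
	assert(ip->locked);
	ip->locked = false;
}

void toy_open_inode(struct toy_inode *ip)
{
	/* Unlocked peek: the common case returns without taking any lock. */
	if (!ip->incomplete)
		return;

	toy_lock(ip);
	/* Recheck under the lock in case someone else already reloaded. */
	if (ip->incomplete) {
		ip->reload_count++;
		ip->incomplete = false;
	}
	toy_unlock(ip);
}
```

A healthy inode never touches the lock, and an inode with a lost unlinked list pays the locking cost once.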
Retain the original RVB tag from Dave since there's no code change from the last submission.
Fixes: 68b957f64fca1 ("xfs: load uncached unlinked inodes into memory on demand") Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_export.c | 16 ++++++++++++---- fs/xfs/xfs_inode.c | 48 +++++++++++++++++++++++++++++++++++------------- fs/xfs/xfs_itable.c | 2 ++ fs/xfs/xfs_qm.c | 15 ++++++++++++--- 4 files changed, 61 insertions(+), 20 deletions(-)
--- a/fs/xfs/xfs_export.c +++ b/fs/xfs/xfs_export.c @@ -146,10 +146,18 @@ xfs_nfs_get_inode( return ERR_PTR(error); }
- error = xfs_inode_reload_unlinked(ip); - if (error) { - xfs_irele(ip); - return ERR_PTR(error); + /* + * Reload the incore unlinked list to avoid failure in inodegc. + * Use an unlocked check here because unrecovered unlinked inodes + * should be somewhat rare. + */ + if (xfs_inode_unlinked_incomplete(ip)) { + error = xfs_inode_reload_unlinked(ip); + if (error) { + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); + xfs_irele(ip); + return ERR_PTR(error); + } }
if (VFS_I(ip)->i_generation != generation) { --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -1744,6 +1744,14 @@ xfs_inactive( truncate = 1;
if (xfs_iflags_test(ip, XFS_IQUOTAUNCHECKED)) { + /* + * If this inode is being inactivated during a quotacheck and + * has not yet been scanned by quotacheck, we /must/ remove + * the dquots from the inode before inactivation changes the + * block and inode counts. Most probably this is a result of + * reloading the incore iunlinked list to purge unrecovered + * unlinked inodes. + */ xfs_qm_dqdetach(ip); } else { error = xfs_qm_dqattach(ip); @@ -3657,6 +3665,16 @@ xfs_inode_reload_unlinked_bucket( if (error) return error;
+ /* + * We've taken ILOCK_SHARED and the AGI buffer lock to stabilize the + * incore unlinked list pointers for this inode. Check once more to + * see if we raced with anyone else to reload the unlinked list. + */ + if (!xfs_inode_unlinked_incomplete(ip)) { + foundit = true; + goto out_agibp; + } + bucket = agino % XFS_AGI_UNLINKED_BUCKETS; agi = agibp->b_addr;
@@ -3671,25 +3689,27 @@ xfs_inode_reload_unlinked_bucket( while (next_agino != NULLAGINO) { struct xfs_inode *next_ip = NULL;
+ /* Found this caller's inode, set its backlink. */ if (next_agino == agino) { - /* Found this inode, set its backlink. */ next_ip = ip; next_ip->i_prev_unlinked = prev_agino; foundit = true; + goto next_inode; } - if (!next_ip) { - /* Inode already in memory. */ - next_ip = xfs_iunlink_lookup(pag, next_agino); - } - if (!next_ip) { - /* Inode not in memory, reload. */ - error = xfs_iunlink_reload_next(tp, agibp, prev_agino, - next_agino); - if (error) - break;
- next_ip = xfs_iunlink_lookup(pag, next_agino); - } + /* Try in-memory lookup first. */ + next_ip = xfs_iunlink_lookup(pag, next_agino); + if (next_ip) + goto next_inode; + + /* Inode not in memory, try reloading it. */ + error = xfs_iunlink_reload_next(tp, agibp, prev_agino, + next_agino); + if (error) + break; + + /* Grab the reloaded inode. */ + next_ip = xfs_iunlink_lookup(pag, next_agino); if (!next_ip) { /* No incore inode at all? We reloaded it... */ ASSERT(next_ip != NULL); @@ -3697,10 +3717,12 @@ xfs_inode_reload_unlinked_bucket( break; }
+next_inode: prev_agino = next_agino; next_agino = next_ip->i_next_unlinked; }
+out_agibp: xfs_trans_brelse(tp, agibp); /* Should have found this inode somewhere in the iunlinked bucket. */ if (!error && !foundit) --- a/fs/xfs/xfs_itable.c +++ b/fs/xfs/xfs_itable.c @@ -80,10 +80,12 @@ xfs_bulkstat_one_int( if (error) goto out;
+ /* Reload the incore unlinked list to avoid failure in inodegc. */ if (xfs_inode_unlinked_incomplete(ip)) { error = xfs_inode_reload_unlinked_bucket(tp, ip); if (error) { xfs_iunlock(ip, XFS_ILOCK_SHARED); + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); xfs_irele(ip); return error; } --- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -1160,9 +1160,18 @@ xfs_qm_dqusage_adjust( if (error) return error;
- error = xfs_inode_reload_unlinked(ip); - if (error) - goto error0; + /* + * Reload the incore unlinked list to avoid failure in inodegc. + * Use an unlocked check here because unrecovered unlinked inodes + * should be somewhat rare. + */ + if (xfs_inode_unlinked_incomplete(ip)) { + error = xfs_inode_reload_unlinked(ip); + if (error) { + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); + goto error0; + } + }
ASSERT(ip->i_delayed_blks == 0);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit 8e698ee72c4ecbbf18264568eb310875839fd601 ]
Through generic/300, I discovered that mkfs.xfs creates corrupt filesystems when given these parameters:
Filesystems formatted with --unsupported are not supported!! meta-data=/dev/sda isize=512 agcount=8, agsize=16352 blks = sectsz=512 attr=2, projid32bit=1 = crc=1 finobt=1, sparse=1, rmapbt=1 = reflink=1 bigtime=1 inobtcount=1 nrext64=1 data = bsize=4096 blocks=130816, imaxpct=25 = sunit=32 swidth=128 blks naming =version 2 bsize=4096 ascii-ci=0, ftype=1 log =internal log bsize=4096 blocks=8192, version=2 = sectsz=512 sunit=32 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 = rgcount=0 rgsize=0 blks Discarding blocks...Done. Phase 1 - find and verify superblock... - reporting progress in intervals of 15 minutes Phase 2 - using internal log - zero log... - 16:30:50: zeroing log - 16320 of 16320 blocks done - scan filesystem freespace and inode maps... agf_freeblks 25, counted 0 in ag 4 sb_fdblocks 8823, counted 8798
The root cause of this problem is the numrecs handling in xfs_freesp_init_recs, which is used to initialize a new AG. Prior to calling the function, we set up the new bnobt block with numrecs == 1 and rely on _freesp_init_recs to format that new record. If the last record created has a blockcount of zero, then it sets numrecs = 0.
That last bit isn't correct if the AG contains the log, the start of the log is not immediately after the initial blocks due to stripe alignment, and the end of the log is perfectly aligned with the end of the AG. For this case, we actually formatted a single bnobt record to handle the free space before the start of the (stripe aligned) log, and incremented arec to try to format a second record. That second record turned out to be unnecessary, so what we really want is to leave numrecs at 1.
The numrecs handling itself is overly complicated because a different function sets numrecs == 1. Change the bnobt creation code to start with numrecs set to zero and only increment it after successfully formatting a free space extent into the btree block.
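The "start at zero and only count records that were really formatted" approach can be illustrated with a small standalone model. This is a sketch under simplifying assumptions, not the kernel function: struct toy_rec and toy_init_free_recs() are hypothetical, endianness conversion is ignored, and an AG without an internal log is expressed as log_len == 0.

```c
#include <assert.h>
#include <stdint.h>

struct toy_rec {
	uint32_t start;
	uint32_t count;
};

/*
 * Format the free space records of a new AG of 'agsize' blocks whose
 * first 'prealloc' blocks are reserved, optionally with an internal log
 * at [log_start, log_start + log_len). Returns the number of records
 * written to recs[] (0, 1 or 2).
 */
int toy_init_free_recs(uint32_t prealloc, uint32_t agsize,
		       uint32_t log_start, uint32_t log_len,
		       struct toy_rec recs[2])
{
	int numrecs = 0;
	uint32_t start = prealloc;

	assert(prealloc <= agsize);
	if (log_len > 0) {
		assert(log_start >= prealloc);
		assert(log_start + log_len <= agsize);
		if (log_start != prealloc) {
			/* Free space between the reserved blocks and the log. */
			recs[numrecs].start = prealloc;
			recs[numrecs].count = log_start - prealloc;
			numrecs++;
		}
		start = log_start + log_len;
	}
	if (agsize - start > 0) {
		/* Free space from 'start' to the end of the AG. */
		recs[numrecs].start = start;
		recs[numrecs].count = agsize - start;
		numrecs++;
	}
	return numrecs;
}
```

The case from the commit message corresponds to a log that starts after some padding and ends exactly at the end of the AG: one record is formatted for the padding, the trailing record would be empty, and the count stays at 1 instead of being reset to 0.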
Fixes: f327a00745ff ("xfs: account for log space when formatting new AGs") Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Dave Chinner david@fromorbit.com Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/libxfs/xfs_ag.c | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-)
--- a/fs/xfs/libxfs/xfs_ag.c +++ b/fs/xfs/libxfs/xfs_ag.c @@ -415,10 +415,12 @@ xfs_freesp_init_recs( ASSERT(start >= mp->m_ag_prealloc_blocks); if (start != mp->m_ag_prealloc_blocks) { /* - * Modify first record to pad stripe align of log + * Modify first record to pad stripe align of log and + * bump the record count. */ arec->ar_blockcount = cpu_to_be32(start - mp->m_ag_prealloc_blocks); + be16_add_cpu(&block->bb_numrecs, 1); nrec = arec + 1;
/* @@ -429,7 +431,6 @@ xfs_freesp_init_recs( be32_to_cpu(arec->ar_startblock) + be32_to_cpu(arec->ar_blockcount)); arec = nrec; - be16_add_cpu(&block->bb_numrecs, 1); } /* * Change record start to after the internal log @@ -438,15 +439,13 @@ xfs_freesp_init_recs( }
/* - * Calculate the record block count and check for the case where - * the log might have consumed all available space in the AG. If - * so, reset the record count to 0 to avoid exposure of an invalid - * record start block. + * Calculate the block count of this record; if it is nonzero, + * increment the record count. */ arec->ar_blockcount = cpu_to_be32(id->agsize - be32_to_cpu(arec->ar_startblock)); - if (!arec->ar_blockcount) - block->bb_numrecs = 0; + if (arec->ar_blockcount) + be16_add_cpu(&block->bb_numrecs, 1); }
/* @@ -458,7 +457,7 @@ xfs_bnoroot_init( struct xfs_buf *bp, struct aghdr_init_data *id) { - xfs_btree_init_block(mp, bp, XFS_BTNUM_BNO, 0, 1, id->agno); + xfs_btree_init_block(mp, bp, XFS_BTNUM_BNO, 0, 0, id->agno); xfs_freesp_init_recs(mp, bp, id); }
@@ -468,7 +467,7 @@ xfs_cntroot_init( struct xfs_buf *bp, struct aghdr_init_data *id) { - xfs_btree_init_block(mp, bp, XFS_BTNUM_CNT, 0, 1, id->agno); + xfs_btree_init_block(mp, bp, XFS_BTNUM_CNT, 0, 0, id->agno); xfs_freesp_init_recs(mp, bp, id); }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Chinner dchinner@redhat.com
[ Upstream commit f1e1765aad7de7a8b8102044fc6a44684bc36180 ]
If the journal geometry results in a sector or log stripe unit validation problem, it indicates that we cannot set the log up to safely write to the journal. In these cases, we must abort the mount because the corruption needs external intervention to resolve. Similarly, a journal that is too large cannot be written to safely, so we shouldn't allow those geometries to mount, either.
If the log is too small, we risk having transaction reservations overrunning the available log space and the system hanging waiting for space it can never provide. This is purely a runtime hang issue, not a corruption issue as per the first cases listed above. We abort mounts if the log is too small for V5 filesystems, but we must allow v4 filesystems to mount because, historically, there was no log size validity checking and so some systems may still be out there with undersized logs.
The problem is that on V4 filesystems, when we discover a log geometry problem, we skip all the remaining checks and then allow the log to continue mounting. This means that if one of the log size checks fails, we skip the log stripe unit check. i.e. we allow the mount because a "non-fatal" geometry is violated, and then fail to check the hard fail geometries that should fail the mount.
Move all these fatal checks to the superblock verifier, and add a new check for the two log sector size geometry variables having the same values. This will prevent any attempt to mount a log that has invalid or inconsistent geometries long before we attempt to mount the log.
However, for the minimum log size checks, we can only do that once we've set up the log and calculated all the iclog sizes and roundoffs. Hence this needs to remain in the log mount code after the log has been initialised. It is also the only case where we should allow a v4 filesystem to continue running, so leave that handling in place, too.
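The fatal log sector and stripe unit checks that move into the superblock verifier can be summarised by the following standalone sketch. It is only an illustration: struct toy_sb and toy_log_geometry_ok() are hypothetical names, the maximum log size checks are left out, and the TOY_MAX_RECORD_BSIZE value is an assumption made for this example rather than a value taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed limit for illustration, standing in for XLOG_MAX_RECORD_BSIZE. */
#define TOY_MAX_RECORD_BSIZE	(256 * 1024)

struct toy_sb {
	bool	 has_sector_bit;	/* models XFS_SB_VERSION_SECTORBIT */
	uint16_t logsectsize;		/* log sector size in bytes */
	uint8_t	 logsectlog;		/* log2 of the log sector size */
	uint32_t logsunit;		/* log stripe unit in bytes */
	uint32_t blocksize;		/* filesystem block size in bytes */
};

/* Returns true if the log sector and stripe unit geometry is consistent. */
bool toy_log_geometry_ok(const struct toy_sb *sb)
{
	assert(sb->blocksize > 0);

	if (sb->has_sector_bit) {
		/* A 16-bit size can never match a shift of 16 or more. */
		if (sb->logsectlog > 15)
			return false;
		if (sb->logsectsize != (1U << sb->logsectlog))
			return false;
	} else if (sb->logsectsize || sb->logsectlog) {
		/* Without the sector feature both fields must be zero. */
		return false;
	}

	if (sb->logsunit > 1) {
		if (sb->logsunit % sb->blocksize)
			return false;
		if (sb->logsunit > TOY_MAX_RECORD_BSIZE)
			return false;
	}
	return true;
}
```

Because every failing case returns immediately, no later check can be skipped on account of an earlier "non-fatal" result, which is the ordering problem described above.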
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org ---
Notes: A new fix for the latest 6.1.y backport series just came in. Ran some tests on it as well and all looks good. Please include with the original set.
Thanks, Leah
fs/xfs/libxfs/xfs_sb.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++- fs/xfs/xfs_log.c | 47 +++++++++++++---------------------------- 2 files changed, 70 insertions(+), 33 deletions(-)
--- a/fs/xfs/libxfs/xfs_sb.c +++ b/fs/xfs/libxfs/xfs_sb.c @@ -413,7 +413,6 @@ xfs_validate_sb_common( sbp->sb_inodelog < XFS_DINODE_MIN_LOG || sbp->sb_inodelog > XFS_DINODE_MAX_LOG || sbp->sb_inodesize != (1 << sbp->sb_inodelog) || - sbp->sb_logsunit > XLOG_MAX_RECORD_BSIZE || sbp->sb_inopblock != howmany(sbp->sb_blocksize,sbp->sb_inodesize) || XFS_FSB_TO_B(mp, sbp->sb_agblocks) < XFS_MIN_AG_BYTES || XFS_FSB_TO_B(mp, sbp->sb_agblocks) > XFS_MAX_AG_BYTES || @@ -431,6 +430,61 @@ xfs_validate_sb_common( return -EFSCORRUPTED; }
+ /* + * Logs that are too large are not supported at all. Reject them + * outright. Logs that are too small are tolerated on v4 filesystems, + * but we can only check that when mounting the log. Hence we skip + * those checks here. + */ + if (sbp->sb_logblocks > XFS_MAX_LOG_BLOCKS) { + xfs_notice(mp, + "Log size 0x%x blocks too large, maximum size is 0x%llx blocks", + sbp->sb_logblocks, XFS_MAX_LOG_BLOCKS); + return -EFSCORRUPTED; + } + + if (XFS_FSB_TO_B(mp, sbp->sb_logblocks) > XFS_MAX_LOG_BYTES) { + xfs_warn(mp, + "log size 0x%llx bytes too large, maximum size is 0x%llx bytes", + XFS_FSB_TO_B(mp, sbp->sb_logblocks), + XFS_MAX_LOG_BYTES); + return -EFSCORRUPTED; + } + + /* + * Do not allow filesystems with corrupted log sector or stripe units to + * be mounted. We cannot safely size the iclogs or write to the log if + * the log stripe unit is not valid. + */ + if (sbp->sb_versionnum & XFS_SB_VERSION_SECTORBIT) { + if (sbp->sb_logsectsize != (1U << sbp->sb_logsectlog)) { + xfs_notice(mp, + "log sector size in bytes/log2 (0x%x/0x%x) must match", + sbp->sb_logsectsize, 1U << sbp->sb_logsectlog); + return -EFSCORRUPTED; + } + } else if (sbp->sb_logsectsize || sbp->sb_logsectlog) { + xfs_notice(mp, + "log sector size in bytes/log2 (0x%x/0x%x) are not zero", + sbp->sb_logsectsize, sbp->sb_logsectlog); + return -EFSCORRUPTED; + } + + if (sbp->sb_logsunit > 1) { + if (sbp->sb_logsunit % sbp->sb_blocksize) { + xfs_notice(mp, + "log stripe unit 0x%x bytes must be a multiple of block size", + sbp->sb_logsunit); + return -EFSCORRUPTED; + } + if (sbp->sb_logsunit > XLOG_MAX_RECORD_BSIZE) { + xfs_notice(mp, + "log stripe unit 0x%x bytes over maximum size (0x%x bytes)", + sbp->sb_logsunit, XLOG_MAX_RECORD_BSIZE); + return -EFSCORRUPTED; + } + } + /* Validate the realtime geometry; stolen from xfs_repair */ if (sbp->sb_rextsize * sbp->sb_blocksize > XFS_MAX_RTEXTSIZE || sbp->sb_rextsize * sbp->sb_blocksize < XFS_MIN_RTEXTSIZE) { --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ 
-639,7 +639,6 @@ xfs_log_mount( int num_bblks) { struct xlog *log; - bool fatal = xfs_has_crc(mp); int error = 0; int min_logfsbs;
@@ -661,53 +660,37 @@ xfs_log_mount( mp->m_log = log;
/* - * Validate the given log space and drop a critical message via syslog - * if the log size is too small that would lead to some unexpected - * situations in transaction log space reservation stage. + * Now that we have set up the log and it's internal geometry + * parameters, we can validate the given log space and drop a critical + * message via syslog if the log size is too small. A log that is too + * small can lead to unexpected situations in transaction log space + * reservation stage. The superblock verifier has already validated all + * the other log geometry constraints, so we don't have to check those + * here. * - * Note: we can't just reject the mount if the validation fails. This - * would mean that people would have to downgrade their kernel just to - * remedy the situation as there is no way to grow the log (short of - * black magic surgery with xfs_db). + * Note: For v4 filesystems, we can't just reject the mount if the + * validation fails. This would mean that people would have to + * downgrade their kernel just to remedy the situation as there is no + * way to grow the log (short of black magic surgery with xfs_db). * - * We can, however, reject mounts for CRC format filesystems, as the + * We can, however, reject mounts for V5 format filesystems, as the * mkfs binary being used to make the filesystem should never create a * filesystem with a log that is too small. 
*/ min_logfsbs = xfs_log_calc_minimum_size(mp); - if (mp->m_sb.sb_logblocks < min_logfsbs) { xfs_warn(mp, "Log size %d blocks too small, minimum size is %d blocks", mp->m_sb.sb_logblocks, min_logfsbs); - error = -EINVAL; - } else if (mp->m_sb.sb_logblocks > XFS_MAX_LOG_BLOCKS) { - xfs_warn(mp, - "Log size %d blocks too large, maximum size is %lld blocks", - mp->m_sb.sb_logblocks, XFS_MAX_LOG_BLOCKS); - error = -EINVAL; - } else if (XFS_FSB_TO_B(mp, mp->m_sb.sb_logblocks) > XFS_MAX_LOG_BYTES) { - xfs_warn(mp, - "log size %lld bytes too large, maximum size is %lld bytes", - XFS_FSB_TO_B(mp, mp->m_sb.sb_logblocks), - XFS_MAX_LOG_BYTES); - error = -EINVAL; - } else if (mp->m_sb.sb_logsunit > 1 && - mp->m_sb.sb_logsunit % mp->m_sb.sb_blocksize) { - xfs_warn(mp, - "log stripe unit %u bytes must be a multiple of block size", - mp->m_sb.sb_logsunit); - error = -EINVAL; - fatal = true; - } - if (error) { + /* * Log check errors are always fatal on v5; or whenever bad * metadata leads to a crash. */ - if (fatal) { + if (xfs_has_crc(mp)) { xfs_crit(mp, "AAIEEE! Log failed size checks. Abort!"); ASSERT(0); + error = -EINVAL; goto out_free_log; } xfs_crit(mp, "Log size out of supported range.");
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
commit 7f3287db654395f9c5ddd246325ff7889f550286 upstream.
When running in a container environment, /sys/fs/cgroup/ might not be the real root node of the sk-attached cgroup.
Example:
In container:
% stat /sys//fs/cgroup/
Device: 0,21 Inode: 2214 ..
% stat /sys/fs/cgroup/foo
Device: 0,21 Inode: 2264 ..
The expectation would be for:
nft add rule .. socket cgroupv2 level 1 "foo" counter
to match traffic from a process that got added to "foo" via "echo $pid > /sys/fs/cgroup/foo/cgroup.procs".
However, 'level 3' is needed to make this work.
Seen from initial namespace, the complete hierarchy is:
% stat /sys/fs/cgroup/system.slice/docker-.../foo
Device: 0,21 Inode: 2264 ..
i.e. hierarchy is

0    1               2              3
/ -> system.slice -> docker-1... -> foo
... but the container doesn't know that its "/" is the "docker-1.." cgroup. Current code will retrieve the 'system.slice' cgroup node and store its kn->id in the destination register, so a comparison with 2264 (the "foo" cgroup id) will not match.
Fetch "/" cgroup from ->init() and add its level to the level we try to extract. cgroup root-level is 0 for the init-namespace or the level of the ancestor that is exposed as the cgroup root inside the container.
In the above case, cgrp->level of "/" resolved in the container is 2 (docker-1...scope/) and request for 'level 1' will get adjusted to fetch the actual level (3).
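The adjustment described above can be sketched in plain userspace C. This is only an illustration of the arithmetic, not the kernel code: the function name effective_cgroup_level() is hypothetical, and the error handling is simplified compared to the patch below (which also distinguishes -ERANGE for an oversized root level).

```c
#include <errno.h>

/*
 * Hypothetical helper: combine the level requested by userspace with the
 * level of the cgroup that the container sees as "/". Mirrors the idea of
 * the patch (level_user + root level, capped at 255), nothing more.
 */
int effective_cgroup_level(unsigned int level_user, int root_level)
{
	unsigned int level;

	if (root_level < 0)
		return -EINVAL;
	if (level_user > 255)
		return -EOPNOTSUPP;

	level = level_user + (unsigned int)root_level;
	/* The result must still fit into the u8 level field. */
	if (level > 255)
		return -EOPNOTSUPP;

	return (int)level;
}
```

With the example from the commit message, a root level of 2 and a requested 'level 1' yield level 3, while in the initial namespace (root level 0) the requested level is used unchanged.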
v2: use CONFIG_SOCK_CGROUP_DATA, eval function depends on it. (kernel test robot)
Cc: cgroups@vger.kernel.org Fixes: e0bb96db96f8 ("netfilter: nft_socket: add support for cgroupsv2") Reported-by: Nadia Pinaeva n.m.pinaeva@gmail.com Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nft_socket.c | 41 ++++++++++++++++++++++++++++++++++++++--- 1 file changed, 38 insertions(+), 3 deletions(-)
--- a/net/netfilter/nft_socket.c +++ b/net/netfilter/nft_socket.c @@ -9,7 +9,8 @@
struct nft_socket { enum nft_socket_keys key:8; - u8 level; + u8 level; /* cgroupv2 level to extract */ + u8 level_user; /* cgroupv2 level provided by userspace */ u8 len; union { u8 dreg; @@ -53,6 +54,28 @@ nft_sock_get_eval_cgroupv2(u32 *dest, st memcpy(dest, &cgid, sizeof(u64)); return true; } + +/* process context only, uses current->nsproxy. */ +static noinline int nft_socket_cgroup_subtree_level(void) +{ + struct cgroup *cgrp = cgroup_get_from_path("/"); + int level; + + if (!cgrp) + return -ENOENT; + + level = cgrp->level; + + cgroup_put(cgrp); + + if (WARN_ON_ONCE(level > 255)) + return -ERANGE; + + if (WARN_ON_ONCE(level < 0)) + return -EINVAL; + + return level; +} #endif
static struct sock *nft_socket_do_lookup(const struct nft_pktinfo *pkt) @@ -174,9 +197,10 @@ static int nft_socket_init(const struct case NFT_SOCKET_MARK: len = sizeof(u32); break; -#ifdef CONFIG_CGROUPS +#ifdef CONFIG_SOCK_CGROUP_DATA case NFT_SOCKET_CGROUPV2: { unsigned int level; + int err;
if (!tb[NFTA_SOCKET_LEVEL]) return -EINVAL; @@ -185,6 +209,17 @@ static int nft_socket_init(const struct if (level > 255) return -EOPNOTSUPP;
+ err = nft_socket_cgroup_subtree_level(); + if (err < 0) + return err; + + priv->level_user = level; + + level += err; + /* Implies a giant cgroup tree */ + if (WARN_ON_ONCE(level > 255)) + return -EOPNOTSUPP; + priv->level = level; len = sizeof(u64); break; @@ -209,7 +244,7 @@ static int nft_socket_dump(struct sk_buf if (nft_dump_register(skb, NFTA_SOCKET_DREG, priv->dreg)) return -1; if (priv->key == NFT_SOCKET_CGROUPV2 && - nla_put_be32(skb, NFTA_SOCKET_LEVEL, htonl(priv->level))) + nla_put_be32(skb, NFTA_SOCKET_LEVEL, htonl(priv->level_user))) return -1; return 0; }
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
commit 7052622fccb1efb850c6b55de477f65d03525a30 upstream.
The cgroup_get_from_path() function never returns NULL, it returns error pointers. Update the error handling to match.
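Why a NULL check misses this kind of failure can be shown with a small userspace imitation of the error-pointer convention. Everything here is a stand-in written for illustration: the ERR_PTR/IS_ERR/PTR_ERR helpers are simplified re-implementations, and cgroup_stub, stub_get_from_path() and stub_subtree_level() are hypothetical names, not kernel APIs.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace imitation of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error codes live in the last MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for a cgroup object. */
struct cgroup_stub {
	int level;
};

static struct cgroup_stub root_stub = { .level = 2 };

/* Hypothetical lookup: never returns NULL, only a valid pointer or ERR_PTR. */
struct cgroup_stub *stub_get_from_path(const char *path)
{
	if (path[0] == '/' && path[1] == '\0')
		return &root_stub;
	return ERR_PTR(-ENOENT);
}

int stub_subtree_level(const char *path)
{
	struct cgroup_stub *cgrp = stub_get_from_path(path);

	/* A "!cgrp" test would pass here and the next line would fault. */
	if (IS_ERR(cgrp))
		return (int)PTR_ERR(cgrp);

	return cgrp->level;
}
```

The failed lookup returns a non-NULL pointer that encodes -ENOENT, so only the IS_ERR() test prevents the dereference.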
Fixes: 7f3287db6543 ("netfilter: nft_socket: make cgroupsv2 matching work with namespaces") Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Acked-by: Florian Westphal fw@strlen.de Acked-by: Pablo Neira Ayuso pablo@netfilter.org Link: https://patch.msgid.link/bbc0c4e0-05cc-4f44-8797-2f4b3920a820@stanley.mounta... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nft_socket.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/netfilter/nft_socket.c +++ b/net/netfilter/nft_socket.c @@ -61,8 +61,8 @@ static noinline int nft_socket_cgroup_su struct cgroup *cgrp = cgroup_get_from_path("/"); int level;
- if (!cgrp) - return -ENOENT; + if (IS_ERR(cgrp)) + return PTR_ERR(cgrp);
level = cgrp->level;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
commit 29b359cf6d95fd60730533f7f10464e95bd17c73 upstream.
The generation mask can be updated while netlink dump is in progress. The pipapo set backend walk iterator cannot rely on it to infer what view of the datastructure is to be used. Add notation to specify if user wants to read/update the set.
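The idea of selecting the view from an explicit iterator type, rather than inferring it from the generation mask, might look like the following userspace sketch. All names (iter_type, set_stub, pick_view, demo_set) are hypothetical stand-ins; unlike the patch below, which warns and falls back to the clone, this sketch returns NULL for an unspecified type to keep the example small.

```c
#include <stddef.h>

/* Hypothetical mirror of the iterator types introduced by the patch. */
enum iter_type {
	ITER_UNSPEC,
	ITER_READ,	/* read-only walk, e.g. a netlink dump */
	ITER_UPDATE,	/* walk under mutex that updates element state */
};

/* Stand-in for a set backend that keeps a committed and a pending copy. */
struct set_stub {
	const int *match;	/* committed view */
	const int *clone;	/* pending view being prepared */
};

static const int committed_view = 1;
static const int pending_view = 2;
static const struct set_stub demo_set = { &committed_view, &pending_view };

/* The caller states its intent; the backend no longer guesses from genmask. */
const int *pick_view(const struct set_stub *set, enum iter_type type)
{
	switch (type) {
	case ITER_READ:
		return set->match;
	case ITER_UPDATE:
		return set->clone;
	default:
		return NULL;	/* simplification: the patch uses WARN_ON_ONCE */
	}
}
```

Because the choice depends only on the type supplied by the caller, a generation bump in the middle of a dump cannot switch the walk to the other copy.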
Based on patch from Florian Westphal.
Fixes: 2b84e215f874 ("netfilter: nft_set_pipapo: .walk does not deal with generations") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/netfilter/nf_tables.h | 13 +++++++++++++ net/netfilter/nf_tables_api.c | 5 +++++ net/netfilter/nft_set_pipapo.c | 5 +++-- 3 files changed, 21 insertions(+), 2 deletions(-)
--- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -296,9 +296,22 @@ struct nft_set_elem { void *priv; };
+/** + * enum nft_iter_type - nftables set iterator type + * + * @NFT_ITER_READ: read-only iteration over set elements + * @NFT_ITER_UPDATE: iteration under mutex to update set element state + */ +enum nft_iter_type { + NFT_ITER_UNSPEC, + NFT_ITER_READ, + NFT_ITER_UPDATE, +}; + struct nft_set; struct nft_set_iter { u8 genmask; + enum nft_iter_type type:8; unsigned int count; unsigned int skip; int err; --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -628,6 +628,7 @@ static void nft_map_deactivate(const str { struct nft_set_iter iter = { .genmask = nft_genmask_next(ctx->net), + .type = NFT_ITER_UPDATE, .fn = nft_mapelem_deactivate, };
@@ -5143,6 +5144,7 @@ int nf_tables_bind_set(const struct nft_ }
iter.genmask = nft_genmask_next(ctx->net); + iter.type = NFT_ITER_UPDATE; iter.skip = 0; iter.count = 0; iter.err = 0; @@ -5218,6 +5220,7 @@ static void nft_map_activate(const struc { struct nft_set_iter iter = { .genmask = nft_genmask_next(ctx->net), + .type = NFT_ITER_UPDATE, .fn = nft_mapelem_activate, };
@@ -5574,6 +5577,7 @@ static int nf_tables_dump_set(struct sk_ args.cb = cb; args.skb = skb; args.iter.genmask = nft_genmask_cur(net); + args.iter.type = NFT_ITER_READ; args.iter.skip = cb->args[0]; args.iter.count = 0; args.iter.err = 0; @@ -6957,6 +6961,7 @@ static int nft_set_flush(struct nft_ctx { struct nft_set_iter iter = { .genmask = genmask, + .type = NFT_ITER_UPDATE, .fn = nft_setelem_flush, };
--- a/net/netfilter/nft_set_pipapo.c +++ b/net/netfilter/nft_set_pipapo.c @@ -2042,13 +2042,14 @@ static void nft_pipapo_walk(const struct struct nft_set_iter *iter) { struct nft_pipapo *priv = nft_set_priv(set); - struct net *net = read_pnet(&set->net); const struct nft_pipapo_match *m; const struct nft_pipapo_field *f; int i, r;
+ WARN_ON_ONCE(iter->type == NFT_ITER_UNSPEC); + rcu_read_lock(); - if (iter->genmask == nft_genmask_cur(net)) + if (iter->type == NFT_ITER_READ) m = rcu_dereference(priv->match); else m = priv->clone;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
commit efefd4f00c967d00ad7abe092554ffbb70c1a793 upstream.
Add missing decorator type to lookup expression and tighten WARN_ON_ONCE check in pipapo to spot earlier that this is unset.
Fixes: 29b359cf6d95 ("netfilter: nft_set_pipapo: walk over current view on netlink dump") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nft_lookup.c | 1 + net/netfilter/nft_set_pipapo.c | 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-)
--- a/net/netfilter/nft_lookup.c +++ b/net/netfilter/nft_lookup.c @@ -211,6 +211,7 @@ static int nft_lookup_validate(const str return 0;
iter.genmask = nft_genmask_next(ctx->net); + iter.type = NFT_ITER_UPDATE; iter.skip = 0; iter.count = 0; iter.err = 0; --- a/net/netfilter/nft_set_pipapo.c +++ b/net/netfilter/nft_set_pipapo.c @@ -2046,7 +2046,8 @@ static void nft_pipapo_walk(const struct const struct nft_pipapo_field *f; int i, r;
- WARN_ON_ONCE(iter->type == NFT_ITER_UNSPEC); + WARN_ON_ONCE(iter->type != NFT_ITER_READ && + iter->type != NFT_ITER_UPDATE);
rcu_read_lock(); if (iter->type == NFT_ITER_READ)
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ping-Ke Shih pkshih@realtek.com
This reverts commit 19d13ec00a8b1d60c5cc06bd0006b91d5bd8d46f which is commit 1474bc87fe57deac726cc10203f73daa6c3212f7 upstream.
The reverted commit is based on an implementation of wiphy locking that is not planned to be redone on a stable kernel, so revert it to avoid this warning:
WARNING: CPU: 0 PID: 9 at net/wireless/core.h:231 disconnect_work+0xb8/0x144 [cfg80211]
CPU: 0 PID: 9 Comm: kworker/0:1 Not tainted 6.6.51-00141-ga1649b6f8ed6 #7
Hardware name: Freescale i.MX6 SoloX (Device Tree)
Workqueue: events disconnect_work [cfg80211]
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x58/0x70
dump_stack_lvl from __warn+0x70/0x1c0
__warn from warn_slowpath_fmt+0x16c/0x294
warn_slowpath_fmt from disconnect_work+0xb8/0x144 [cfg80211]
disconnect_work [cfg80211] from process_one_work+0x204/0x620
process_one_work from worker_thread+0x1b0/0x474
worker_thread from kthread+0x10c/0x12c
kthread from ret_from_fork+0x14/0x24
Reported-by: petter@technux.se Closes: https://lore.kernel.org/linux-wireless/9e98937d781c990615ef27ee0c858ff9@tech... Cc: Johannes Berg johannes@sipsolutions.net Signed-off-by: Ping-Ke Shih pkshih@realtek.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/wireless/core.h | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-)
--- a/net/wireless/core.h +++ b/net/wireless/core.h @@ -228,7 +228,6 @@ void cfg80211_register_wdev(struct cfg80 static inline void wdev_lock(struct wireless_dev *wdev) __acquires(wdev) { - lockdep_assert_held(&wdev->wiphy->mtx); mutex_lock(&wdev->mtx); __acquire(wdev->mtx); } @@ -236,16 +235,11 @@ static inline void wdev_lock(struct wire static inline void wdev_unlock(struct wireless_dev *wdev) __releases(wdev) { - lockdep_assert_held(&wdev->wiphy->mtx); __release(wdev->mtx); mutex_unlock(&wdev->mtx); }
-static inline void ASSERT_WDEV_LOCK(struct wireless_dev *wdev) -{ - lockdep_assert_held(&wdev->wiphy->mtx); - lockdep_assert_held(&wdev->mtx); -} +#define ASSERT_WDEV_LOCK(wdev) lockdep_assert_held(&(wdev)->mtx)
static inline bool cfg80211_has_monitors_only(struct cfg80211_registered_device *rdev) {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kent Gibson warthog618@gmail.com
commit b440396387418fe2feaacd41ca16080e7a8bc9ad upstream.
linereq_set_config() behaves badly when direction is not set. The configuration validation is borrowed from linereq_create(), where, to verify the intent of the user, the direction must be set in order to effect a change to the electrical configuration of a line. But, when applied to reconfiguration, that validation does not allow for the unset direction case, making it possible to clear flags set previously without specifying the line direction.
Adding to the inconsistency, those changes are not immediately applied by linereq_set_config(), but will take effect the next time the line value is read or set.
For example, by requesting a configuration with no flags set, an output line with GPIO_V2_LINE_FLAG_ACTIVE_LOW and GPIO_V2_LINE_FLAG_OPEN_DRAIN set could have those flags cleared, inverting the sense of the line and changing the line drive to push-pull on the next line value set.
Skip the reconfiguration of lines for which the direction is not set, and only reconfigure the lines for which direction is set.
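The rule "no direction, no reconfiguration" can be sketched as a pure function in userspace C. The flag names and bit positions below are hypothetical placeholders chosen for the example and are not claimed to match the GPIO uAPI; reconfigure_line() is likewise an invented helper that only models the decision, not the driver calls.

```c
#include <stdint.h>

/* Hypothetical flag bits for illustration only. */
#define STUB_FLAG_ACTIVE_LOW	(1u << 1)
#define STUB_FLAG_INPUT		(1u << 2)
#define STUB_FLAG_OUTPUT	(1u << 3)
#define STUB_FLAG_OPEN_DRAIN	(1u << 6)
#define STUB_DIRECTION_FLAGS	(STUB_FLAG_INPUT | STUB_FLAG_OUTPUT)

/*
 * Return the flags a line ends up with after a reconfiguration request.
 * A request without any direction flag leaves the line untouched.
 */
uint64_t reconfigure_line(uint64_t current, uint64_t requested)
{
	if (!(requested & STUB_DIRECTION_FLAGS))
		return current;

	return requested;
}
```

In this model, an empty request against an active-low, open-drain output keeps both flags, which is the case the commit message calls out.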
Fixes: a54756cb24ea ("gpiolib: cdev: support GPIO_V2_LINE_SET_CONFIG_IOCTL") Signed-off-by: Kent Gibson warthog618@gmail.com Link: https://lore.kernel.org/r/20240626052925.174272-3-warthog618@gmail.com Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpio/gpiolib-cdev.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)
--- a/drivers/gpio/gpiolib-cdev.c +++ b/drivers/gpio/gpiolib-cdev.c @@ -1523,12 +1523,14 @@ static long linereq_set_config_unlocked( line = &lr->lines[i]; desc = lr->lines[i].desc; flags = gpio_v2_line_config_flags(lc, i); - gpio_v2_line_config_flags_to_desc_flags(flags, &desc->flags); - edflags = flags & GPIO_V2_LINE_EDGE_DETECTOR_FLAGS; /* - * Lines have to be requested explicitly for input - * or output, else the line will be treated "as is". + * Lines not explicitly reconfigured as input or output + * are left unchanged. */ + if (!(flags & GPIO_V2_LINE_DIRECTION_FLAGS)) + continue; + gpio_v2_line_config_flags_to_desc_flags(flags, &desc->flags); + edflags = flags & GPIO_V2_LINE_EDGE_DETECTOR_FLAGS; if (flags & GPIO_V2_LINE_FLAG_OUTPUT) { int val = gpio_v2_line_config_output_value(lc, i);
@@ -1536,7 +1538,7 @@ static long linereq_set_config_unlocked( ret = gpiod_direction_output(desc, val); if (ret) return ret; - } else if (flags & GPIO_V2_LINE_FLAG_INPUT) { + } else { ret = gpiod_direction_input(desc); if (ret) return ret;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hagar Hemdan hagarhem@amazon.com
commit d795848ecce24a75dfd46481aee066ae6fe39775 upstream.
Userspace may trigger a speculative read of an address outside the gpio descriptor array. Users can do that by calling gpio_ioctl() with an offset out of range. Offset is copied from user and then used as an array index to get the gpio descriptor without sanitization in gpio_device_get_desc().
This change ensures that the offset is sanitized by using array_index_nospec() to mitigate any possibility of speculative information leaks.
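For readers unfamiliar with array_index_nospec(), the following userspace sketch shows the kind of branch-free clamping it is built on. It is modelled on the generic mask arithmetic and is an assumption-laden illustration: the function names are invented, it relies on arithmetic right shift of negative values (as GCC and Clang provide), it assumes size does not exceed LONG_MAX, and it gives no actual speculation guarantee outside the kernel's own implementation.

```c
#include <limits.h>

/*
 * All ones when index < size, zero otherwise, computed without a
 * conditional branch. Assumes size <= LONG_MAX and arithmetic right
 * shift for negative signed values.
 */
static inline unsigned long index_mask_sketch(unsigned long index,
					      unsigned long size)
{
	return ~(long)(index | (size - 1UL - index)) >>
		(sizeof(long) * CHAR_BIT - 1);
}

/* In-range indexes pass through; out-of-range indexes collapse to 0. */
unsigned long clamp_index_sketch(unsigned long index, unsigned long size)
{
	return index & index_mask_sketch(index, size);
}
```

An out-of-range index is forced to 0, so even a mispredicted bounds check can only ever touch the first array element.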
This bug was discovered and resolved using Coverity Static Analysis Security Testing (SAST) by Synopsys, Inc.
Signed-off-by: Hagar Hemdan hagarhem@amazon.com Link: https://lore.kernel.org/r/20240523085332.1801-1-hagarhem@amazon.com Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Hugo SIMELIERE hsimeliere.opensource@witekio.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpio/gpiolib.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -5,6 +5,7 @@ #include <linux/module.h> #include <linux/interrupt.h> #include <linux/irq.h> +#include <linux/nospec.h> #include <linux/spinlock.h> #include <linux/list.h> #include <linux/device.h> @@ -146,7 +147,7 @@ struct gpio_desc *gpiochip_get_desc(stru if (hwnum >= gdev->ngpio) return ERR_PTR(-EINVAL);
- return &gdev->descs[hwnum]; + return &gdev->descs[array_index_nospec(hwnum, gdev->ngpio)]; } EXPORT_SYMBOL_GPL(gpiochip_get_desc);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Kleine-Budde mkl@pengutronix.de
commit 51b2a721612236335ddec4f3fb5f59e72a204f3a upstream.
To fix the coding style, remove the whitespace in front of labels.
Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c | 34 +++++++++++------------ drivers/net/can/spi/mcp251xfd/mcp251xfd-dump.c | 2 - drivers/net/can/spi/mcp251xfd/mcp251xfd-regmap.c | 2 - drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c | 2 - 4 files changed, 20 insertions(+), 20 deletions(-)
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c +++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c @@ -791,7 +791,7 @@ static int mcp251xfd_chip_start(struct m
return 0;
- out_chip_stop: +out_chip_stop: mcp251xfd_dump(priv); mcp251xfd_chip_stop(priv, CAN_STATE_STOPPED);
@@ -1576,7 +1576,7 @@ static irqreturn_t mcp251xfd_irq(int irq handled = IRQ_HANDLED; } while (1);
- out_fail: +out_fail: can_rx_offload_threaded_irq_finish(&priv->offload);
netdev_err(priv->ndev, "IRQ handler returned %d (intf=0x%08x).\n", @@ -1641,22 +1641,22 @@ static int mcp251xfd_open(struct net_dev
return 0;
- out_free_irq: +out_free_irq: free_irq(spi->irq, priv); - out_destroy_workqueue: +out_destroy_workqueue: destroy_workqueue(priv->wq); - out_can_rx_offload_disable: +out_can_rx_offload_disable: can_rx_offload_disable(&priv->offload); set_bit(MCP251XFD_FLAGS_DOWN, priv->flags); mcp251xfd_timestamp_stop(priv); - out_transceiver_disable: +out_transceiver_disable: mcp251xfd_transceiver_disable(priv); - out_mcp251xfd_ring_free: +out_mcp251xfd_ring_free: mcp251xfd_ring_free(priv); - out_pm_runtime_put: +out_pm_runtime_put: mcp251xfd_chip_stop(priv, CAN_STATE_STOPPED); pm_runtime_put(ndev->dev.parent); - out_close_candev: +out_close_candev: close_candev(ndev);
return err; @@ -1820,9 +1820,9 @@ mcp251xfd_register_get_dev_id(const stru *effective_speed_hz_slow = xfer[0].effective_speed_hz; *effective_speed_hz_fast = xfer[1].effective_speed_hz;
- out_kfree_buf_tx: +out_kfree_buf_tx: kfree(buf_tx); - out_kfree_buf_rx: +out_kfree_buf_rx: kfree(buf_rx);
return err; @@ -1936,13 +1936,13 @@ static int mcp251xfd_register(struct mcp
return 0;
- out_unregister_candev: +out_unregister_candev: unregister_candev(ndev); - out_chip_sleep: +out_chip_sleep: mcp251xfd_chip_sleep(priv); - out_runtime_disable: +out_runtime_disable: pm_runtime_disable(ndev->dev.parent); - out_runtime_put_noidle: +out_runtime_put_noidle: pm_runtime_put_noidle(ndev->dev.parent); mcp251xfd_clks_and_vdd_disable(priv);
@@ -2162,9 +2162,9 @@ static int mcp251xfd_probe(struct spi_de
return 0;
- out_can_rx_offload_del: +out_can_rx_offload_del: can_rx_offload_del(&priv->offload); - out_free_candev: +out_free_candev: spi->max_speed_hz = priv->spi_max_speed_hz_orig;
free_candev(ndev); --- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-dump.c +++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-dump.c @@ -94,7 +94,7 @@ static void mcp251xfd_dump_registers(con kfree(buf); }
- out: +out: mcp251xfd_dump_header(iter, MCP251XFD_DUMP_OBJECT_TYPE_REG, reg); }
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-regmap.c +++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-regmap.c @@ -397,7 +397,7 @@ mcp251xfd_regmap_crc_read(void *context,
return err; } - out: +out: memcpy(val_buf, buf_rx->data, val_len);
return 0; --- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c +++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c @@ -219,7 +219,7 @@ int mcp251xfd_handle_tefif(struct mcp251 total_frame_len += frame_len; }
- out_netif_wake_queue: +out_netif_wake_queue: len = i; /* number of handled goods TEFs */ if (len) { struct mcp251xfd_tef_ring *ring = priv->tef;
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Kleine-Budde mkl@pengutronix.de
commit a7801540f325d104de5065850a003f1d9bdc6ad3 upstream.
The mcp251xfd wakes up from Low Power or Sleep Mode when SPI activity is detected. To avoid this, make sure that the timestamp worker is stopped before shutting down the chip.
Split the starting of the timestamp worker out of mcp251xfd_timestamp_init() into the separate function mcp251xfd_timestamp_start().
Call mcp251xfd_timestamp_init() before mcp251xfd_chip_start(), move mcp251xfd_timestamp_start() to mcp251xfd_chip_start(). In this way, mcp251xfd_timestamp_stop() can be called unconditionally by mcp251xfd_chip_stop().
Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c | 8 +++++--- drivers/net/can/spi/mcp251xfd/mcp251xfd-timestamp.c | 7 +++++-- drivers/net/can/spi/mcp251xfd/mcp251xfd.h | 1 + 3 files changed, 11 insertions(+), 5 deletions(-)
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c +++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c @@ -744,6 +744,7 @@ static void mcp251xfd_chip_stop(struct m
mcp251xfd_chip_interrupts_disable(priv); mcp251xfd_chip_rx_int_disable(priv); + mcp251xfd_timestamp_stop(priv); mcp251xfd_chip_sleep(priv); }
@@ -763,6 +764,8 @@ static int mcp251xfd_chip_start(struct m if (err) goto out_chip_stop;
+ mcp251xfd_timestamp_start(priv); + err = mcp251xfd_set_bittiming(priv); if (err) goto out_chip_stop; @@ -1610,11 +1613,12 @@ static int mcp251xfd_open(struct net_dev if (err) goto out_mcp251xfd_ring_free;
+ mcp251xfd_timestamp_init(priv); + err = mcp251xfd_chip_start(priv); if (err) goto out_transceiver_disable;
- mcp251xfd_timestamp_init(priv); clear_bit(MCP251XFD_FLAGS_DOWN, priv->flags); can_rx_offload_enable(&priv->offload);
@@ -1648,7 +1652,6 @@ out_destroy_workqueue: out_can_rx_offload_disable: can_rx_offload_disable(&priv->offload); set_bit(MCP251XFD_FLAGS_DOWN, priv->flags); - mcp251xfd_timestamp_stop(priv); out_transceiver_disable: mcp251xfd_transceiver_disable(priv); out_mcp251xfd_ring_free: @@ -1674,7 +1677,6 @@ static int mcp251xfd_stop(struct net_dev free_irq(ndev->irq, priv); destroy_workqueue(priv->wq); can_rx_offload_disable(&priv->offload); - mcp251xfd_timestamp_stop(priv); mcp251xfd_chip_stop(priv, CAN_STATE_STOPPED); mcp251xfd_transceiver_disable(priv); mcp251xfd_ring_free(priv); --- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-timestamp.c +++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-timestamp.c @@ -48,9 +48,12 @@ void mcp251xfd_timestamp_init(struct mcp cc->shift = 1; cc->mult = clocksource_hz2mult(priv->can.clock.freq, cc->shift);
- timecounter_init(&priv->tc, &priv->cc, ktime_get_real_ns()); - INIT_DELAYED_WORK(&priv->timestamp, mcp251xfd_timestamp_work); +} + +void mcp251xfd_timestamp_start(struct mcp251xfd_priv *priv) +{ + timecounter_init(&priv->tc, &priv->cc, ktime_get_real_ns()); schedule_delayed_work(&priv->timestamp, MCP251XFD_TIMESTAMP_WORK_DELAY_SEC * HZ); } --- a/drivers/net/can/spi/mcp251xfd/mcp251xfd.h +++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd.h @@ -939,6 +939,7 @@ int mcp251xfd_ring_alloc(struct mcp251xf int mcp251xfd_handle_rxif(struct mcp251xfd_priv *priv); int mcp251xfd_handle_tefif(struct mcp251xfd_priv *priv); void mcp251xfd_timestamp_init(struct mcp251xfd_priv *priv); +void mcp251xfd_timestamp_start(struct mcp251xfd_priv *priv); void mcp251xfd_timestamp_stop(struct mcp251xfd_priv *priv);
void mcp251xfd_tx_obj_write_sync(struct work_struct *work);
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
commit 49ac6f05ace5bb0070c68a0193aa05d3c25d4c83 upstream.
A new endpoint using the IP of the initial subflow has been recently added to increase the code coverage. But it breaks the test when using old kernels not having commit 86e39e04482b ("mptcp: keep track of local endpoint still available for each msk"), e.g. on v5.15.
Similar to commit d4c81bbb8600 ("selftests: mptcp: join: support local endpoint being tracked or not"), it is possible to add the new endpoint conditionally, by checking if "mptcp_pm_subflow_check_next" is present in kallsyms: this is not directly linked to the commit introducing this symbol but for the parent one which is linked anyway. So we can know in advance what will be the expected behaviour, and add the new endpoint only when it makes sense to do so.
Fixes: 4878f9f8421f ("selftests: mptcp: join: validate fullmesh endp on 1st sf") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20240910-net-selftests-mptcp-fix-install-v1-1-8f124... Signed-off-by: Jakub Kicinski kuba@kernel.org [ Conflicts in mptcp_join.sh, because the 'run_tests' helper has been modified in multiple commits that are not in this version, e.g. commit e571fb09c893 ("selftests: mptcp: add speed env var"). The conflict was in the context, the new lines can still be added at the same place. ] Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -3048,7 +3048,9 @@ fullmesh_tests() pm_nl_set_limits $ns1 1 3 pm_nl_set_limits $ns2 1 3 pm_nl_add_endpoint $ns1 10.0.2.1 flags signal - pm_nl_add_endpoint $ns2 10.0.1.2 flags subflow,fullmesh + if mptcp_lib_kallsyms_has "mptcp_pm_subflow_check_next$"; then + pm_nl_add_endpoint $ns2 10.0.1.2 flags subflow,fullmesh + fi run_tests $ns1 $ns2 10.0.1.1 0 0 fullmesh_1 slow chk_join_nr 3 3 3 chk_add_nr 1 1
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana fdmanana@suse.com
commit f8f210dc84709804c9f952297f2bfafa6ea6b4bd upstream.
When updating the global block reserve, we account for the 6 items needed by an unlink operation and the 6 delayed references for each one of those items. However the calculation for the delayed references is not correct in case we have the free space tree enabled, as in that case we need to touch the free space tree as well and therefore need twice the number of bytes. So use the btrfs_calc_delayed_ref_bytes() helper to calculate the number of bytes needed for the delayed references at btrfs_update_global_block_rsv().
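The effect on the reservation can be worked through with a small userspace sketch. It covers only the unlink portion of the reserve, and it rests on an assumption about the per-item cost (nodesize times a maximum tree level of 8, times 2), which is not spelled out in this patch; all function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define STUB_UNLINK_METADATA_UNITS	6	/* from the patch */
#define STUB_MAX_LEVEL			8	/* assumption, see lead-in */

/* Assumed per-item cost model: nodesize * max level * 2 per item. */
uint64_t stub_insert_metadata_size(uint32_t nodesize, unsigned int num_items)
{
	return (uint64_t)nodesize * STUB_MAX_LEVEL * 2 * num_items;
}

/* Delayed refs cost twice as much when the free space tree is in use. */
uint64_t stub_delayed_ref_bytes(uint32_t nodesize, unsigned int num_refs,
				bool free_space_tree)
{
	uint64_t num_bytes = stub_insert_metadata_size(nodesize, num_refs);

	if (free_space_tree)
		num_bytes *= 2;

	return num_bytes;
}

/* Unlink items plus the delayed refs generated for each of them. */
uint64_t stub_unlink_reserve_bytes(uint32_t nodesize, bool free_space_tree)
{
	return stub_insert_metadata_size(nodesize, STUB_UNLINK_METADATA_UNITS) +
	       stub_delayed_ref_bytes(nodesize, STUB_UNLINK_METADATA_UNITS,
				      free_space_tree);
}
```

Under these assumptions the unlink part of the reserve is twice the item cost without the free space tree and three times the item cost with it, whereas the old "min_items += 10" approach charged the same amount in both cases.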
Reviewed-by: Josef Bacik josef@toxicpanda.com Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com [Diogo: this patch has been cherry-picked from the original commit; conflicts included lack of a define (picked from commit 5630e2bcfe223) and lack of btrfs_calc_delayed_ref_bytes (picked from commit 0e55a54502b97) - changed const struct -> struct for compatibility.] Signed-off-by: Diogo Jahchan Koike djahchankoike@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/block-rsv.c | 14 ++++++++------ fs/btrfs/block-rsv.h | 12 ++++++++++++ fs/btrfs/delayed-ref.h | 21 +++++++++++++++++++++ 3 files changed, 41 insertions(+), 6 deletions(-)
--- a/fs/btrfs/block-rsv.c +++ b/fs/btrfs/block-rsv.c @@ -384,17 +384,19 @@ void btrfs_update_global_block_rsv(struc
/* * But we also want to reserve enough space so we can do the fallback - * global reserve for an unlink, which is an additional 5 items (see the - * comment in __unlink_start_trans for what we're modifying.) + * global reserve for an unlink, which is an additional + * BTRFS_UNLINK_METADATA_UNITS items. * * But we also need space for the delayed ref updates from the unlink, - * so its 10, 5 for the actual operation, and 5 for the delayed ref - * updates. + * so add BTRFS_UNLINK_METADATA_UNITS units for delayed refs, one for + * each unlink metadata item. */ - min_items += 10; + min_items += BTRFS_UNLINK_METADATA_UNITS;
num_bytes = max_t(u64, num_bytes, - btrfs_calc_insert_metadata_size(fs_info, min_items)); + btrfs_calc_insert_metadata_size(fs_info, min_items) + + btrfs_calc_delayed_ref_bytes(fs_info, + BTRFS_UNLINK_METADATA_UNITS));
spin_lock(&sinfo->lock); spin_lock(&block_rsv->lock); --- a/fs/btrfs/block-rsv.h +++ b/fs/btrfs/block-rsv.h @@ -50,6 +50,18 @@ struct btrfs_block_rsv { u64 qgroup_rsv_reserved; };
+/* + * Number of metadata items necessary for an unlink operation: + * + * 1 for the possible orphan item + * 1 for the dir item + * 1 for the dir index + * 1 for the inode ref + * 1 for the inode + * 1 for the parent inode + */ +#define BTRFS_UNLINK_METADATA_UNITS 6 + void btrfs_init_block_rsv(struct btrfs_block_rsv *rsv, enum btrfs_rsv_type type); void btrfs_init_root_block_rsv(struct btrfs_root *root); struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct btrfs_fs_info *fs_info, --- a/fs/btrfs/delayed-ref.h +++ b/fs/btrfs/delayed-ref.h @@ -253,6 +253,27 @@ extern struct kmem_cache *btrfs_delayed_ int __init btrfs_delayed_ref_init(void); void __cold btrfs_delayed_ref_exit(void);
+static inline u64 btrfs_calc_delayed_ref_bytes(struct btrfs_fs_info *fs_info, + int num_delayed_refs) +{ + u64 num_bytes; + + num_bytes = btrfs_calc_insert_metadata_size(fs_info, num_delayed_refs); + + /* + * We have to check the mount option here because we could be enabling + * the free space tree for the first time and don't have the compat_ro + * option set yet. + * + * We need extra reservations if we have the free space tree because + * we'll have to modify that tree as well. + */ + if (btrfs_test_opt(fs_info, FREE_SPACE_TREE)) + num_bytes *= 2; + + return num_bytes; +} + static inline void btrfs_init_generic_ref(struct btrfs_ref *generic_ref, int action, u64 bytenr, u64 len, u64 parent) {
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sumeet Pawnikar sumeet.r.pawnikar@intel.com
commit d05b5e0baf424c8c4b4709ac11f66ab726c8deaf upstream.
The current initialization of the struct x86_cpu_id entries in pl4_support_ids[] is partial and wrong. It initializes the "stepping" field with "X86_FEATURE_ANY" instead of the "feature" field.
Use the X86_MATCH_INTEL_FAM6_MODEL macro instead of initializing each field of the struct x86_cpu_id for the pl4_supported list of CPUs. The X86_MATCH_INTEL_FAM6_MODEL macro internally uses another macro, X86_MATCH_VENDOR_FAM_MODEL_FEATURE, which initializes the fields for x86 CPU matching with appropriate values.
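The class of bug described here, a positional initializer landing in the wrong member, can be reproduced with a stand-in struct. The struct layout, the marker value and the model number below are hypothetical and chosen only to make the misplacement visible; they are not the kernel's definitions.

```c
/* Stand-in with the same field order issue: steppings precedes feature. */
struct cpu_id_stub {
	unsigned short vendor;
	unsigned short family;
	unsigned short model;
	unsigned short steppings;
	unsigned short feature;
};

#define STUB_FEATURE_MARK	0x7	/* hypothetical, non-zero on purpose */
#define STUB_MODEL		0x8C	/* illustrative model number */

/* Positional: the fourth value silently lands in .steppings. */
static const struct cpu_id_stub positional_entry = {
	0, 6, STUB_MODEL, STUB_FEATURE_MARK
};

/* Designated: each value is bound to the member it was meant for. */
static const struct cpu_id_stub designated_entry = {
	.vendor = 0,
	.family = 6,
	.model = STUB_MODEL,
	.feature = STUB_FEATURE_MARK,
};
```

A match macro that expands to designated initializers avoids this by construction, which is the approach the patch takes.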
Reported-by: Dave Hansen dave.hansen@intel.com
Link: https://lore.kernel.org/lkml/28ead36b-2d9e-1a36-6f4e-04684e420260@intel.com
Fixes: eb52bc2ae5b8 ("powercap: RAPL: Add Power Limit4 support for Meteor Lake SoC")
Fixes: b08b95cf30f5 ("powercap: RAPL: Add Power Limit4 support for Alder Lake-N and Raptor Lake-P")
Fixes: 515755906921 ("powercap: RAPL: Add Power Limit4 support for RaptorLake")
Fixes: 1cc5b9a411e4 ("powercap: Add Power Limit4 support for Alder Lake SoC")
Fixes: 8365a898fe53 ("powercap: Add Power Limit4 support")
Signed-off-by: Sumeet Pawnikar sumeet.r.pawnikar@intel.com
Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com
Reviewed-by: Pawan Gupta pawan.kumar.gupta@linux.intel.com
[ Ricardo: I removed METEORLAKE and METEORLAKE_L from pl4_support_ids as
  they are not included in v6.1. ]
Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/powercap/intel_rapl_msr.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

--- a/drivers/powercap/intel_rapl_msr.c
+++ b/drivers/powercap/intel_rapl_msr.c
@@ -136,12 +136,12 @@ static int rapl_msr_write_raw(int cpu, s

 /* List of verified CPUs. */
 static const struct x86_cpu_id pl4_support_ids[] = {
-	{ X86_VENDOR_INTEL, 6, INTEL_FAM6_TIGERLAKE_L, X86_FEATURE_ANY },
-	{ X86_VENDOR_INTEL, 6, INTEL_FAM6_ALDERLAKE, X86_FEATURE_ANY },
-	{ X86_VENDOR_INTEL, 6, INTEL_FAM6_ALDERLAKE_L, X86_FEATURE_ANY },
-	{ X86_VENDOR_INTEL, 6, INTEL_FAM6_ALDERLAKE_N, X86_FEATURE_ANY },
-	{ X86_VENDOR_INTEL, 6, INTEL_FAM6_RAPTORLAKE, X86_FEATURE_ANY },
-	{ X86_VENDOR_INTEL, 6, INTEL_FAM6_RAPTORLAKE_P, X86_FEATURE_ANY },
+	X86_MATCH_INTEL_FAM6_MODEL(TIGERLAKE_L, NULL),
+	X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE, NULL),
+	X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_L, NULL),
+	X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_N, NULL),
+	X86_MATCH_INTEL_FAM6_MODEL(RAPTORLAKE, NULL),
+	X86_MATCH_INTEL_FAM6_MODEL(RAPTORLAKE_P, NULL),
 	{}
 };
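The bug described in this patch is a positional-initializer slip: the fourth value in a brace list fills the fourth struct member, whatever the author intended. A small userspace sketch of that effect follows. The struct is a simplified stand-in for struct x86_cpu_id whose field order is assumed from the commit message, and the demo_* names, the model number and the non-zero marker value are all hypothetical, chosen only so the misplaced value is visible.

```c
#include <stdint.h>

/*
 * Simplified stand-in for struct x86_cpu_id. The field order is an
 * assumption ("stepping" comes before "feature", per the commit log).
 */
struct pl4_demo_id {
	uint16_t vendor;
	uint16_t family;
	uint16_t model;
	uint16_t steppings;
	uint16_t feature;
};

/* Hypothetical non-zero marker so the misplaced value can be seen. */
#define DEMO_MARKER	0x7
/* Arbitrary model number used only for this illustration. */
#define DEMO_MODEL	0x8c

/* Positional form, like the removed lines: value 4 fills member 4. */
static const struct pl4_demo_id demo_positional = {
	0, 6, DEMO_MODEL, DEMO_MARKER
};

/* Designated form, the kind of expansion a match macro can provide. */
#define DEMO_MATCH(_model, _feature) {		\
	.vendor		= 0,			\
	.family		= 6,			\
	.model		= (_model),		\
	.steppings	= 0,			\
	.feature	= (_feature),		\
}

static const struct pl4_demo_id demo_designated =
	DEMO_MATCH(DEMO_MODEL, DEMO_MARKER);

/* Accessors so the two layouts can be compared from a test. */
uint16_t demo_positional_steppings(void) { return demo_positional.steppings; }
uint16_t demo_positional_feature(void) { return demo_positional.feature; }
uint16_t demo_designated_steppings(void) { return demo_designated.steppings; }
uint16_t demo_designated_feature(void) { return demo_designated.feature; }
```

In the positional table the marker ends up in steppings and feature is left zero-initialized; the designated form puts each value in the member that is named.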
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tony Luck tony.luck@intel.com
commit 2eda374e883ad297bd9fe575a16c1dc850346075 upstream.
New CPU #defines encode vendor and family as well as model.
[ dhansen: vertically align 0's in invlpg_miss_ids[] ]
Signed-off-by: Tony Luck tony.luck@intel.com
Signed-off-by: Dave Hansen dave.hansen@linux.intel.com
Signed-off-by: Borislav Petkov (AMD) bp@alien8.de
Link: https://lore.kernel.org/all/20240424181518.41946-1-tony.luck%40intel.com
[ Ricardo: I used the old match macro X86_MATCH_INTEL_FAM6_MODEL()
  instead of X86_MATCH_VFM() as in the upstream commit.
  I also kept the ALDERLAKE_N name instead of ATOM_GRACEMONT. Both refer
  to the same CPU model. ]
Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com
Reviewed-by: Pawan Gupta pawan.kumar.gupta@linux.intel.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/mm/init.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -262,21 +262,17 @@ static void __init probe_page_size_mask(
 	}
 }

-#define INTEL_MATCH(_model) { .vendor = X86_VENDOR_INTEL, \
-			      .family = 6, \
-			      .model = _model, \
-			    }
 /*
  * INVLPG may not properly flush Global entries
  * on these CPUs when PCIDs are enabled.
  */
 static const struct x86_cpu_id invlpg_miss_ids[] = {
-	INTEL_MATCH(INTEL_FAM6_ALDERLAKE ),
-	INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L ),
-	INTEL_MATCH(INTEL_FAM6_ALDERLAKE_N ),
-	INTEL_MATCH(INTEL_FAM6_RAPTORLAKE ),
-	INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P),
-	INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S),
+	X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE, 0),
+	X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_L, 0),
+	X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_N, 0),
+	X86_MATCH_INTEL_FAM6_MODEL(RAPTORLAKE, 0),
+	X86_MATCH_INTEL_FAM6_MODEL(RAPTORLAKE_P, 0),
+	X86_MATCH_INTEL_FAM6_MODEL(RAPTORLAKE_S, 0),
 	{}
 };
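Tables such as invlpg_miss_ids[] are typically consumed by walking the entries until an all-zero terminator is reached. The sketch below shows that pattern in plain userspace C. It is an illustration only: the struct, the demo_* names, the macro and the model numbers are hypothetical, and it is not a copy of the kernel's x86_match_cpu().

```c
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-in for a CPU match entry (hypothetical layout). */
struct match_demo_id {
	uint16_t vendor;
	uint16_t family;
	uint16_t model;
};

/* Hypothetical model numbers, used only for this illustration. */
#define DEMO_MODEL_A	0x10
#define DEMO_MODEL_B	0x20

/* One macro fills vendor and family so each entry only names a model. */
#define DEMO_MATCH_FAM6(_model) \
	{ .vendor = 0, .family = 6, .model = (_model) }

static const struct match_demo_id demo_ids[] = {
	DEMO_MATCH_FAM6(DEMO_MODEL_A),
	DEMO_MATCH_FAM6(DEMO_MODEL_B),
	{ 0 }	/* all-zero terminator */
};

/* Walk the table until the all-zero terminator; NULL means no match. */
const struct match_demo_id *demo_match(const struct match_demo_id *table,
				       uint16_t vendor, uint16_t family,
				       uint16_t model)
{
	const struct match_demo_id *m;

	for (m = table; m->vendor | m->family | m->model; m++) {
		if (m->vendor == vendor && m->family == family &&
		    m->model == model)
			return m;
	}

	return NULL;
}

/* Returns 1 if the family-6 model is listed in demo_ids, else 0. */
int demo_is_listed(uint16_t model)
{
	return demo_match(demo_ids, 0, 6, model) != NULL;
}
```

A caller would test something like demo_is_listed(DEMO_MODEL_A) and apply the workaround only when it returns 1.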
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Junhao Xie bigfoot@classfun.cn
commit 7d47d22444bb7dc1b6d768904a22070ef35e1fc0 upstream.
Add the device id for the Macrosilicon MS3020 which is a PL2303HXN based device.
Signed-off-by: Junhao Xie bigfoot@classfun.cn
Cc: stable@vger.kernel.org
Signed-off-by: Johan Hovold johan@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/usb/serial/pl2303.c | 1 +
 drivers/usb/serial/pl2303.h | 4 ++++
 2 files changed, 5 insertions(+)

--- a/drivers/usb/serial/pl2303.c
+++ b/drivers/usb/serial/pl2303.c
@@ -118,6 +118,7 @@ static const struct usb_device_id id_tab
 	{ USB_DEVICE(SMART_VENDOR_ID, SMART_PRODUCT_ID) },
 	{ USB_DEVICE(AT_VENDOR_ID, AT_VTKIT3_PRODUCT_ID) },
 	{ USB_DEVICE(IBM_VENDOR_ID, IBM_PRODUCT_ID) },
+	{ USB_DEVICE(MACROSILICON_VENDOR_ID, MACROSILICON_MS3020_PRODUCT_ID) },
 	{ } /* Terminating entry */
 };

--- a/drivers/usb/serial/pl2303.h
+++ b/drivers/usb/serial/pl2303.h
@@ -171,3 +171,7 @@
 /* Allied Telesis VT-Kit3 */
 #define AT_VENDOR_ID 0x0caa
 #define AT_VTKIT3_PRODUCT_ID 0x3001
+
+/* Macrosilicon MS3020 */
+#define MACROSILICON_VENDOR_ID 0x345f
+#define MACROSILICON_MS3020_PRODUCT_ID 0x3020
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Edward Adam Davis eadavis@qq.com
commit 625fa77151f00c1bd00d34d60d6f2e710b3f9aad upstream.
syzbot reported a kernel-usb-infoleak in usbtmc_write. Clear the structure before filling in its fields.
Fixes: 4ddc645f40e9 ("usb: usbtmc: Add ioctl for vendor specific write")
Reported-and-tested-by: syzbot+9d34f80f841e948c3fdb@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=9d34f80f841e948c3fdb
Signed-off-by: Edward Adam Davis eadavis@qq.com
Cc: stable stable@kernel.org
Link: https://lore.kernel.org/r/tencent_9649AA6EC56EDECCA8A7D106C792D1C66B06@qq.co...
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/usb/class/usbtmc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/usb/class/usbtmc.c
+++ b/drivers/usb/class/usbtmc.c
@@ -754,7 +754,7 @@ static struct urb *usbtmc_create_urb(voi
 	if (!urb)
 		return NULL;

-	dmabuf = kmalloc(bufsize, GFP_KERNEL);
+	dmabuf = kzalloc(bufsize, GFP_KERNEL);
 	if (!dmabuf) {
 		usb_free_urb(urb);
 		return NULL;
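Why zeroing matters here can be shown with a userspace simulation: when only some bytes of a buffer are written before the whole buffer is sent out, the untouched bytes carry whatever was in memory before. The sketch below is hypothetical throughout (the demo_* names, the 8-byte size and the 0xAA "stale" pattern are made up), and it fakes recycled memory with memset() rather than reading truly uninitialized memory, which would be undefined behaviour.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_BUFSIZE	8
#define DEMO_STALE	0xAA	/* stand-in for leftover heap contents */

/* Writes only two bytes, like a header whose reserved bytes stay unset. */
static void demo_fill_header(unsigned char *buf)
{
	buf[0] = 0x01;
	buf[1] = 0x02;
}

/* Counts bytes that still hold the stale pattern. */
static size_t demo_count_stale(const unsigned char *buf, size_t len)
{
	size_t i, n = 0;

	for (i = 0; i < len; i++)
		if (buf[i] == DEMO_STALE)
			n++;

	return n;
}

/* kmalloc()-like case: the buffer keeps its previous contents. */
size_t demo_leak_uncleared(void)
{
	unsigned char buf[DEMO_BUFSIZE];

	memset(buf, DEMO_STALE, sizeof(buf));	/* simulate recycled memory */
	demo_fill_header(buf);

	return demo_count_stale(buf, sizeof(buf));
}

/* kzalloc()-like case: the buffer is cleared before it is filled. */
size_t demo_leak_zeroed(void)
{
	unsigned char buf[DEMO_BUFSIZE];

	memset(buf, DEMO_STALE, sizeof(buf));	/* simulate recycled memory */
	memset(buf, 0, sizeof(buf));		/* what zeroing allocation adds */
	demo_fill_header(buf);

	return demo_count_stale(buf, sizeof(buf));
}
```

In this simulation the uncleared buffer would expose six stale bytes if sent in full, while the zeroed one exposes none.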
On 27.09.2024 at 14:23, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.112 release. There are 73 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Builds, boots and works on my 2-socket Ivy Bridge Xeon E5-2697 v2 server. No dmesg oddities or regressions found.
Tested-by: Peter Schneider pschneider1968@googlemail.com
Best regards,
Peter Schneider
This is the start of the stable review cycle for the 6.1.112 release. There are 73 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Compiled and booted on my x86_64 and ARM64 test systems. No errors or regressions.
Tested-by: Allen Pais apais@linux.microsoft.com
Thanks.
On Fri, 27 Sep 2024 14:23:11 +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.112 release. There are 73 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
All tests passing for Tegra ...
Test results for stable-v6.1:
    10 builds: 10 pass, 0 fail
    26 boots: 26 pass, 0 fail
    115 tests: 115 pass, 0 fail

Linux version: 6.1.112-rc1-g4f910bc2b928
Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000,
               tegra194-p2972-0000, tegra194-p3509-0000+p3668-0000,
               tegra20-ventana, tegra210-p2371-2180,
               tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
On 9/27/24 05:23, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.112 release. There are 73 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on BMIPS_GENERIC:
Tested-by: Florian Fainelli florian.fainelli@broadcom.com
On Fri, 27 Sept 2024 at 18:00, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.1.112 release. There are 73 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build
* kernel: 6.1.112-rc1
* git: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
* git commit: 4f910bc2b928f935a8a8203ccfa7be8456ac8f29
* git describe: v6.1.110-137-g4f910bc2b928
* test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.1.y/build/v6.1.11...
## Test Regressions (compared to v6.1.110-64-gdc7da8d6f263)
## Metric Regressions (compared to v6.1.110-64-gdc7da8d6f263)
## Test Fixes (compared to v6.1.110-64-gdc7da8d6f263)
## Metric Fixes (compared to v6.1.110-64-gdc7da8d6f263)
## Test result summary
total: 146051, pass: 125727, fail: 2019, skip: 18116, xfail: 189

## Build Summary
* arc: 5 total, 5 passed, 0 failed
* arm: 135 total, 135 passed, 0 failed
* arm64: 41 total, 41 passed, 0 failed
* i386: 28 total, 26 passed, 2 failed
* mips: 26 total, 25 passed, 1 failed
* parisc: 4 total, 4 passed, 0 failed
* powerpc: 36 total, 35 passed, 1 failed
* riscv: 7 total, 7 passed, 0 failed
* s390: 14 total, 14 passed, 0 failed
* sh: 10 total, 10 passed, 0 failed
* sparc: 7 total, 7 passed, 0 failed
* x86_64: 33 total, 33 passed, 0 failed

## Test suites summary
* boot
* commands
* kselftest-arm64
* kselftest-breakpoints
* kselftest-capabilities
* kselftest-cgroup
* kselftest-clone3
* kselftest-core
* kselftest-cpu-hotplug
* kselftest-cpufreq
* kselftest-efivarfs
* kselftest-exec
* kselftest-filesystems
* kselftest-filesystems-binderfs
* kselftest-filesystems-epoll
* kselftest-firmware
* kselftest-fpu
* kselftest-ftrace
* kselftest-futex
* kselftest-gpio
* kselftest-intel_pstate
* kselftest-ipc
* kselftest-kcmp
* kselftest-kvm
* kselftest-membarrier
* kselftest-memfd
* kselftest-mincore
* kselftest-mqueue
* kselftest-net
* kselftest-net-mptcp
* kselftest-openat2
* kselftest-ptrace
* kselftest-rseq
* kselftest-rtc
* kselftest-seccomp
* kselftest-sigaltstack
* kselftest-size
* kselftest-tc-testing
* kselftest-timers
* kselftest-tmpfs
* kselftest-tpm2
* kselftest-user_events
* kselftest-vDSO
* kselftest-watchdog
* kselftest-x86
* kunit
* kvm-unit-tests
* libgpiod
* libhugetlbfs
* log-parser-boot
* log-parser-test
* ltp-commands
* ltp-containers
* ltp-controllers
* ltp-cpuhotplug
* ltp-crypto
* ltp-cve
* ltp-dio
* ltp-fcntl-locktests
* ltp-fs
* ltp-fs_bind
* ltp-fs_perms_simple
* ltp-hugetlb
* ltp-ipc
* ltp-math
* ltp-mm
* ltp-nptl
* ltp-pty
* ltp-sched
* ltp-smoke
* ltp-syscalls
* ltp-tracing
* perf
* rcutorture

--
Linaro LKFT
https://lkft.linaro.org
On 9/27/24 06:23, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.112 release. There are 73 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks,
-- Shuah
On 9/27/24 5:23 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.1.112 release. There are 73 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos re@w6rz.net
---- On Fri, 27 Sep 2024 17:23:11 +0500 Greg Kroah-Hartman wrote ---
This is the start of the stable review cycle for the 6.1.112 release. There are 73 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Hi,
Please find the KernelCI report below:
OVERVIEW
Builds: 26 passed, 0 failed
Boot tests: 500 passed, 0 failed
CI systems: maestro
REVISION
Commit
  name: v6.1.110-137-g4f910bc2b928
  hash: 4f910bc2b928f935a8a8203ccfa7be8456ac8f29
Checked out from
  https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
BUILDS
No new build failures found
BOOT TESTS
No new boot failures found
See complete and up-to-date report at: https://kcidb.kernelci.org/d/revision/revision?orgId=1&var-datasource=pr...
Tested-by: kernelci.org bot bot@kernelci.org
Thanks,
KernelCI team
Hi!
This is the start of the stable review cycle for the 6.1.112 release. There are 73 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
CIP testing did not find any problems here:
https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/tree/linux-6...
Tested-by: Pavel Machek (CIP) pavel@denx.de
Best regards, Pavel