This is the start of the stable review cycle for the 6.3.10 release. There are 199 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 28 Jun 2023 18:07:23 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.3.10-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.3.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.3.10-rc1
Namjae Jeon linkinjeon@kernel.org ksmbd: call putname after using the last component
Namjae Jeon linkinjeon@kernel.org ksmbd: fix uninitialized pointer read in smb2_create_link()
Namjae Jeon linkinjeon@kernel.org ksmbd: fix uninitialized pointer read in ksmbd_vfs_rename()
Marc Zyngier maz@kernel.org KVM: arm64: Restore GICv2-on-GICv3 functionality
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: drop module reference after updating chain
Clark Wang xiaoning.wang@nxp.com i2c: imx-lpi2c: fix type char overflow issue when calculating the clock cycle
Dheeraj Kumar Srivastava dheerajkumar.srivastava@amd.com x86/apic: Fix kernel panic when booting with intremap=off and x2apic_phys
Omar Sandoval osandov@fb.com x86/unwind/orc: Add ELF section with ORC version identifier
Andrey Smetanin asmetanin@yandex-team.ru vhost_net: revert upend_idx only on retriable error
Shannon Nelson shannon.nelson@amd.com vhost_vdpa: tell vqs about the negotiated
Rong Tao rongtao@cestc.cn tools/virtio: Fix arm64 ringtest compilation error
Min Li lm0963hack@gmail.com drm/radeon: fix race condition UAF in radeon_gem_set_domain_ioctl
Min Li lm0963hack@gmail.com drm/exynos: fix race condition UAF in exynos_g2d_exec_ioctl
Inki Dae inki.dae@samsung.com drm/exynos: vidi: fix a wrong error return
Nitesh Shetty nj.shetty@samsung.com null_blk: Fix: memory release when memory_backed=1
Linus Walleij linus.walleij@linaro.org ARM: dts: Fix erroneous ADS touchscreen polarities
David Zheng david.zheng@intel.com i2c: designware: fix idx_write_cnt in read loop
Simon Horman horms@kernel.org i2c: mchp-pci1xxxx: Avoid cast to incompatible function type
Sayed, Karimuddin karimuddin.sayed@intel.com ALSA: hda/realtek: Add "Intel Reference board" and "NUC 13" SSID in the ALC256
Min-Hua Chen minhuadotchen@gmail.com net: sched: wrap tc_skip_wrapper with CONFIG_RETPOLINE
Chancel Liu chancel.liu@nxp.com ASoC: fsl_sai: Enable BCI bit if SAI works on synchronous mode with BYP asserted
Alexander Gordeev agordeev@linux.ibm.com s390/purgatory: disable branch profiling
Andreas Gruenbacher agruenba@redhat.com gfs2: Don't get stuck writing page onto itself under direct I/O
Sicong Jiang kevin.jiangsc@gmail.com ASoC: amd: yc: Add Thinkpad Neo14 to quirks list for acp6x
Edson Juliano Drosdeck edson.drosdeck@gmail.com ASoC: nau8824: Add quirk to active-high jack-detect
Hao Yao hao.yao@intel.com platform/x86: int3472: Avoid crash in unregistering regulator gpio
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org soundwire: qcom: add proper error paths in qcom_swrm_startup()
Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com soundwire: dmi-quirks: add new mapping for HP Spectre x360
Herve Codina herve.codina@bootlin.com ASoC: simple-card: Add missing of_node_put() in case of error
Srinivas Kandagatla srinivas.kandagatla@linaro.org ASoC: codecs: wcd938x-sdw: do not set can_multi_write flag
Clark Wang xiaoning.wang@nxp.com spi: lpspi: disable lpspi module irq in DMA mode
Vineeth Vijayan vneethv@linux.ibm.com s390/cio: unregister device when the only path is gone
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org arm64: dts: qcom: sc7280-qcard: drop incorrect dai-cells from WCD938x SDW
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org arm64: dts: qcom: sc7280-idp: drop incorrect dai-cells from WCD938x SDW
Hans de Goede hdegoede@redhat.com Input: soc_button_array - add invalid acpi_index DMI quirk handling
Uday Shankar ushankar@purestorage.com nvme: improve handling of long keep alives
Uday Shankar ushankar@purestorage.com nvme: check IO start time when deciding to defer KA
Uday Shankar ushankar@purestorage.com nvme: double KA polling frequency to avoid KATO with TBKAS on
min15.li min15.li@samsung.com nvme: fix miss command type check
Dan Carpenter dan.carpenter@linaro.org usb: gadget: udc: fix NULL dereference in remove()
Shida Zhang zhangshida@kylinos.cn btrfs: fix an uninitialized variable warning in btrfs_log_inode
Osama Muhammad osmtendev@gmail.com nfcsim.c: Fix error checking for debugfs_create_dir
Hans Verkuil hverkuil-cisco@xs4all.nl media: cec: core: don't set last_initiator if tx in progress
Hans Verkuil hverkuil-cisco@xs4all.nl media: cec: core: disable adapter in cec_devnode_unregister
Steve French stfrench@microsoft.com smb3: missing null check in SMB2_change_notify
Marc Zyngier maz@kernel.org arm64: Add missing Set/Way CMO encodings
Denis Arefev arefev@swemel.ru HID: wacom: Add error check to wacom_parse_and_register()
Maurizio Lombardi mlombard@redhat.com scsi: target: iscsi: Prevent login threads from racing between each other
Maurizio Lombardi mlombard@redhat.com scsi: target: iscsi: Remove unused transport_timer
Maurizio Lombardi mlombard@redhat.com scsi: target: iscsi: Fix hang in the iSCSI login code
Michael Walle mwalle@kernel.org gpiolib: Fix irq_domain resource tracking for gpiochip_irqchip_add_domain()
Su Hui suhui@nfschina.com iommu/amd: Fix possible memory leak of 'domain'
Jiasheng Jiang jiasheng@iscas.ac.cn gpio: sifive: add missing check for platform_get_irq
Jiawen Wu jiawenwu@trustnetic.com gpiolib: Fix GPIO chip IRQ initialization restriction
Nicolas Frattaroli frattaroli.nicolas@gmail.com arm64: dts: rockchip: fix nEXTRST on SOQuartz
Maciej Żenczykowski maze@google.com revert "net: align SO_RCVMARK required privileges with SO_MARK"
Eric Dumazet edumazet@google.com sch_netem: acquire qdisc lock in netem_change()
Shyam Sundar S K Shyam-sundar.S-k@amd.com platform/x86/amd/pmf: Register notify handler only if SPS is enabled
Danielle Ratson danieller@nvidia.com selftests: forwarding: Fix race condition in mirror installation
Jens Axboe axboe@kernel.dk io_uring/net: use the correct msghdr union member in io_sendmsg_copy_hdr
Jiri Olsa jolsa@kernel.org bpf: Force kprobe multi expected_attach_type for kprobe_multi link
Florent Revest revest@chromium.org bpf/btf: Accept function names that contain dots
Francesco Dolcini francesco.dolcini@toradex.com Revert "net: phy: dp83867: perform soft reset and retain established link"
Pablo Neira Ayuso pablo@netfilter.org netfilter: nfnetlink_osf: fix module autoload
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: disallow updates of anonymous sets
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: reject unbound chain set before commit phase
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: reject unbound anonymous set before commit phase
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: disallow element updates of bound anonymous sets
Pablo Neira Ayuso pablo@netfilter.org netfilter: nft_set_pipapo: .walk does not deal with generations
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: drop map element references from preparation phase
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: fix chain binding transaction logic
Ross Lagerwall ross.lagerwall@citrix.com be2net: Extend xmit workaround to BE3 chip
Vladimir Oltean olteanv@gmail.com net: dsa: introduce preferred_default_local_cpu_port and use on MT7530
Arınç ÜNAL arinc.unal@arinc9.com net: dsa: mt7530: fix handling of LLDP frames
Arınç ÜNAL arinc.unal@arinc9.com net: dsa: mt7530: fix handling of BPDUs on MT7530 switch
Arınç ÜNAL arinc.unal@arinc9.com net: dsa: mt7530: fix trapping frames on non-MT7621 SoC MT7530 switch
Terin Stock terin@cloudflare.com ipvs: align inner_mac_header for encapsulation
Sergey Shtylyov s.shtylyov@omp.ru mmc: usdhi60rol0: fix deferred probing
Sergey Shtylyov s.shtylyov@omp.ru mmc: sh_mmcif: fix deferred probing
Sergey Shtylyov s.shtylyov@omp.ru mmc: sdhci-acpi: fix deferred probing
Sergey Shtylyov s.shtylyov@omp.ru mmc: owl: fix deferred probing
Sergey Shtylyov s.shtylyov@omp.ru mmc: omap_hsmmc: fix deferred probing
Sergey Shtylyov s.shtylyov@omp.ru mmc: omap: fix deferred probing
Sergey Shtylyov s.shtylyov@omp.ru mmc: mvsdio: fix deferred probing
Sergey Shtylyov s.shtylyov@omp.ru mmc: mtk-sd: fix deferred probing
Stefan Wahren stefan.wahren@i2se.com net: qca_spi: Avoid high load if QCA7000 is not available
Íñigo Huguet ihuguet@redhat.com sfc: use budget for TX completions
Yevgeny Kliteynik kliteyn@nvidia.com net/mlx5: DR, Fix wrong action data allocation in decap action
Sebastian Andrzej Siewior bigeasy@linutronix.de xfrm: Linearize the skb after offloading if needed.
Magali Lemes magali.lemes@canonical.com selftests: net: fcnal-test: check if FIPS mode is enabled
Magali Lemes magali.lemes@canonical.com selftests: net: vrf-xfrm-tests: change authentication and encryption algos
Magali Lemes magali.lemes@canonical.com selftests: net: tls: check if FIPS mode is enabled
Yonghong Song yhs@fb.com bpf: Fix a bpf_jit_dump issue for x86_64 with sysctl bpf_jit_enable.
Maciej Żenczykowski maze@google.com xfrm: fix inbound ipv4/udp/esp packets to UDPv6 dualstack sockets
Maxim Mikityanskiy maxim@isovalent.com bpf: Fix verifier id tracking of scalars on spill
Leon Romanovsky leon@kernel.org xfrm: add missed call to delete offloaded policies
Reiji Watanabe reijiw@google.com KVM: arm64: PMU: Restore the host's PMUSERENR_EL0
Benedict Wong benedictwong@google.com xfrm: Ensure policies always checked on XFRM-I input path
Benedict Wong benedictwong@google.com xfrm: Treat already-verified secpath entries as optional
Chen Aotian chenaotian2@163.com ieee802154: hwsim: Fix possible memory leaks
Lee Jones lee@kernel.org x86/mm: Avoid using set_pgd() outside of real PGD pages
Jens Axboe axboe@kernel.dk io_uring/poll: serialize poll linked timer start with poll removal
Ming Lei ming.lei@redhat.com block: make sure local irq is disabled when calling __blkcg_rstat_flush
Andrew Powers-Holmes aholmes@omnom.net arm64: dts: rockchip: Fix rk356x PCIe register and range mappings
Namjae Jeon linkinjeon@kernel.org ksmbd: add mnt_want_write to ksmbd vfs functions
Namjae Jeon linkinjeon@kernel.org ksmbd: fix racy issue from using ->d_parent and ->d_name
Al Viro viro@zeniv.linux.org.uk fs: introduce lock_rename_child() helper
Namjae Jeon linkinjeon@kernel.org ksmbd: remove internal.h include
Russ Weight russell.h.weight@intel.com regmap: spi-avmm: Fix regmap_bus max_raw_write
Teresa Remmet t.remmet@phytec.de regulator: pca9450: Fix LDO3OUT and LDO4OUT MASK
Neil Armstrong neil.armstrong@linaro.org spi: spi-geni-qcom: correctly handle -EPROBE_DEFER from dma_request_chan()
Mukesh Sisodiya mukesh.sisodiya@intel.com wifi: iwlwifi: pcie: Handle SO-F device for PCI id 0x7AF0
Krister Johansen kjlx@templeofstupid.com bpf: ensure main program has an extable
Sergey Shtylyov s.shtylyov@omp.ru mmc: meson-gx: fix deferred probing
Sergey Shtylyov s.shtylyov@omp.ru mmc: sunxi: fix deferred probing
Sergey Shtylyov s.shtylyov@omp.ru mmc: bcm2835: fix deferred probing
Sergey Shtylyov s.shtylyov@omp.ru mmc: sdhci-spear: fix deferred probing
Christophe Kerello christophe.kerello@foss.st.com mmc: mmci: stm32: fix max busy timeout calculation
Martin Hundebøll martin@geanix.com mmc: meson-gx: remove redundant mmc_request_done() call from irq context
Stephan Gerhold stephan@gerhold.net mmc: sdhci-msm: Disable broken 64-bit DMA on MSM8916
Jisheng Zhang jszhang@kernel.org mmc: litex_mmc: set PROBE_PREFER_ASYNCHRONOUS
Jiawen Wu jiawenwu@trustnetic.com net: mdio: fix the wrong parameters
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp cgroup,freezer: hold cpu_hotplug_lock before freezer_mutex in freezer_css_{online,offline}()
Xiu Jianfeng xiujianfeng@huawei.com cgroup: Do not corrupt task iteration when rebinding subsystem
Paolo Abeni pabeni@redhat.com mptcp: ensure listener is unhashed before updating the sk status
Paolo Abeni pabeni@redhat.com mptcp: consolidate fallback and non fallback state machine
Paolo Abeni pabeni@redhat.com mptcp: fix possible list corruption on passive MPJ
Paolo Abeni pabeni@redhat.com mptcp: fix possible divide by zero in recvmsg()
Paolo Abeni pabeni@redhat.com mptcp: handle correctly disconnect() failures
Jens Axboe axboe@kernel.dk io_uring/net: disable partial retries for recvmsg with cmsg
Jens Axboe axboe@kernel.dk io_uring/net: clear msg_controllen on partial sendmsg retry
Dexuan Cui decui@microsoft.com PCI: hv: Add a per-bus mutex state_lock
Dexuan Cui decui@microsoft.com PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic
Dexuan Cui decui@microsoft.com PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev
Dexuan Cui decui@microsoft.com Revert "PCI: hv: Fix a timing issue which causes kdump to fail occasionally"
Dexuan Cui decui@microsoft.com PCI: hv: Fix a race condition bug in hv_pci_query_relations()
Michael Kelley mikelley@microsoft.com Drivers: hv: vmbus: Fix vmbus_wait_for_unload() to scan present CPUs
Dexuan Cui decui@microsoft.com Drivers: hv: vmbus: Call hv_synic_free() if hv_synic_alloc() fails
Liam R. Howlett Liam.Howlett@oracle.com mm/mprotect: fix do_mprotect_pkey() limit check
Lorenzo Stoakes lstoakes@gmail.com mm/vmalloc: do not output a spurious warning when huge vmalloc() fails
Gavin Shan gshan@redhat.com KVM: Avoid illegal stage2 mapping on invalid memory slot
Hans de Goede hdegoede@redhat.com thermal/intel/intel_soc_dts_iosf: Fix reporting wrong temperatures
Rafael J. Wysocki rafael.j.wysocki@intel.com ACPI: sleep: Avoid breaking S3 wakeup due to might_sleep()
Ryusuke Konishi konishi.ryusuke@gmail.com nilfs2: prevent general protection fault in nilfs_clear_dirty_page()
Ryusuke Konishi konishi.ryusuke@gmail.com nilfs2: fix buffer corruption due to concurrent device reads
Prathu Baronia prathubaronia2011@gmail.com scripts: fix the gfp flags header path in gfp-translate
Rafael Aquini aquini@redhat.com writeback: fix dereferencing NULL mapping->host on writeback_page_template
Roberto Sassu roberto.sassu@huawei.com memfd: check for non-NULL file_seals in memfd_create() syscall
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: join: skip mixed tests if not supported
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: join: uniform listener tests
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: join: skip PM listener tests if not supported
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: join: skip MPC backups tests if not supported
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: join: skip fail tests if not supported
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: join: skip userspace PM tests if not supported
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: join: skip fullmesh flag tests if not supported
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: join: skip backup if set flag on ID not supported
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: join: skip implicit tests if not supported
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: join: support RM_ADDR for used endpoints or not
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: join: skip Fastclose tests if not supported
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: join: support local endpoint being tracked or not
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: join: skip test if iptables/tc cmds fail
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: join: skip check if MIB counter not supported
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: join: helpers to skip tests
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: join: use 'iptables-legacy' if available
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: lib: skip if not below kernel version
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: userspace pm: skip PM listener events tests if unavailable
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: userspace pm: skip if not supported
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: userspace pm: skip if 'ip' tool is unavailable
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: sockopt: skip TCP_INQ checks if not supported
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: sockopt: skip getsockopt checks if not supported
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: sockopt: relax expected returned size
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: pm nl: skip fullmesh flag checks if not supported
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: pm nl: remove hardcoded default limits
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: diag: skip inuse tests if not supported
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: diag: skip listen tests if not supported
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: connect: skip TFO tests if not supported
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: connect: skip disconnect tests if not supported
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: connect: skip transp tests if not supported
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: lib: skip if missing symbol
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: join: fix ShellCheck warnings
Matthieu Baerts matthieu.baerts@tessares.net selftests: mptcp: remove duplicated entries in usage
Nathan Chancellor nathan@kernel.org riscv: Link with '-z norelro'
Michael S. Tsirkin mst@redhat.com Revert "virtio-blk: support completion batching for the IRQ path"
Thomas Gleixner tglx@linutronix.de tick/common: Align tick period during sched_timer setup
Vishal Moola (Oracle) vishal.moola@gmail.com afs: Fix waiting for writeback then skipping folio
Vishal Moola (Oracle) vishal.moola@gmail.com afs: Fix dangling folio ref counts in writeback
Linus Torvalds torvalds@linux-foundation.org Revert "efi: random: refresh non-volatile random seed when RNG is initialized"
Mike Kravetz mike.kravetz@oracle.com udmabuf: revert 'Add support for mapping hugepages (v4)'
Namjae Jeon linkinjeon@kernel.org ksmbd: validate session id and tree id in the compound request
Namjae Jeon linkinjeon@kernel.org ksmbd: fix out-of-bound read in smb2_write
Namjae Jeon linkinjeon@kernel.org ksmbd: validate command payload size
Lino Sanfilippo l.sanfilippo@kunbus.com tpm, tpm_tis: Claim locality in interrupt handler
Alexei Starovoitov ast@kernel.org mm: Fix copy_from_user_nofault().
Damien Le Moal dlemoal@kernel.org ata: libata-scsi: Avoid deadlock on rescan after device resume
Tom Chung chiahsuan.chung@amd.com drm/amd/display: fix the system hang while disable PSR
Rodrigo Siqueira Rodrigo.Siqueira@amd.com drm/amd/display: Add wrapper to call planes and stream update
Rodrigo Siqueira Rodrigo.Siqueira@amd.com drm/amd/display: Use dc_update_planes_and_stream
Shyam Prasad N sprasad@microsoft.com cifs: fix status checks in cifs_tree_connect
-------------
Diffstat:
Makefile | 4 +- arch/arm/boot/dts/am57xx-cl-som-am57x.dts | 2 +- arch/arm/boot/dts/at91sam9261ek.dts | 2 +- arch/arm/boot/dts/imx7d-pico-hobbit.dts | 2 +- arch/arm/boot/dts/imx7d-sdb.dts | 2 +- arch/arm/boot/dts/omap3-cm-t3x.dtsi | 2 +- arch/arm/boot/dts/omap3-devkit8000-lcd-common.dtsi | 2 +- arch/arm/boot/dts/omap3-lilly-a83x.dtsi | 2 +- arch/arm/boot/dts/omap3-overo-common-lcd35.dtsi | 2 +- arch/arm/boot/dts/omap3-overo-common-lcd43.dtsi | 2 +- arch/arm/boot/dts/omap3-pandora-common.dtsi | 2 +- arch/arm/boot/dts/omap5-cm-t54.dts | 2 +- arch/arm64/boot/dts/qcom/sc7280-idp.dtsi | 2 - arch/arm64/boot/dts/qcom/sc7280-qcard.dtsi | 2 - .../boot/dts/rockchip/rk3566-soquartz-cm4.dts | 18 +- arch/arm64/boot/dts/rockchip/rk3566-soquartz.dtsi | 29 +- arch/arm64/boot/dts/rockchip/rk3568.dtsi | 14 +- arch/arm64/boot/dts/rockchip/rk356x.dtsi | 7 +- arch/arm64/include/asm/sysreg.h | 6 + arch/arm64/kvm/hyp/include/hyp/switch.h | 13 +- arch/arm64/kvm/vgic/vgic-init.c | 11 +- arch/riscv/Makefile | 2 +- arch/s390/purgatory/Makefile | 1 + arch/x86/Makefile | 12 + arch/x86/include/asm/Kbuild | 1 + arch/x86/include/asm/orc_header.h | 19 + arch/x86/kernel/apic/x2apic_phys.c | 5 +- arch/x86/kernel/unwind_orc.c | 3 + arch/x86/mm/kaslr.c | 8 +- arch/x86/net/bpf_jit_comp.c | 2 +- block/blk-cgroup.c | 5 +- drivers/acpi/acpica/achware.h | 2 - drivers/acpi/sleep.c | 16 +- drivers/ata/libata-core.c | 3 +- drivers/ata/libata-eh.c | 2 +- drivers/ata/libata-scsi.c | 22 +- drivers/base/regmap/regmap-spi-avmm.c | 2 +- drivers/block/null_blk/main.c | 1 + drivers/block/virtio_blk.c | 82 ++- drivers/char/tpm/tpm_tis_core.c | 2 + drivers/dma-buf/udmabuf.c | 47 +- drivers/firmware/efi/efi.c | 21 - drivers/gpio/gpio-sifive.c | 8 +- drivers/gpio/gpiolib.c | 11 +- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 73 ++- drivers/gpu/drm/exynos/exynos_drm_g2d.c | 2 +- drivers/gpu/drm/exynos/exynos_drm_vidi.c | 2 - drivers/gpu/drm/radeon/radeon_gem.c | 4 +- drivers/hid/wacom_sys.c | 7 +- drivers/hv/channel_mgmt.c | 18 +- drivers/hv/vmbus_drv.c | 5 +- drivers/i2c/busses/i2c-designware-core.h | 1 + drivers/i2c/busses/i2c-designware-slave.c | 4 + drivers/i2c/busses/i2c-imx-lpi2c.c | 4 +- drivers/i2c/busses/i2c-mchp-pci1xxxx.c | 6 +- drivers/input/misc/soc_button_array.c | 30 ++ drivers/iommu/amd/iommu.c | 8 +- drivers/media/cec/core/cec-adap.c | 8 +- drivers/media/cec/core/cec-core.c | 2 + drivers/media/cec/core/cec-priv.h | 1 + drivers/mmc/host/bcm2835.c | 4 +- drivers/mmc/host/litex_mmc.c | 1 + drivers/mmc/host/meson-gx-mmc.c | 14 +- drivers/mmc/host/mmci.c | 3 +- drivers/mmc/host/mtk-sd.c | 2 +- drivers/mmc/host/mvsdio.c | 2 +- drivers/mmc/host/omap.c | 2 +- drivers/mmc/host/omap_hsmmc.c | 6 +- drivers/mmc/host/owl-mmc.c | 2 +- drivers/mmc/host/sdhci-acpi.c | 2 +- drivers/mmc/host/sdhci-msm.c | 3 + drivers/mmc/host/sdhci-spear.c | 4 +- drivers/mmc/host/sh_mmcif.c | 2 +- drivers/mmc/host/sunxi-mmc.c | 4 +- drivers/mmc/host/usdhi6rol0.c | 6 +- drivers/net/dsa/mt7530.c | 35 +- drivers/net/dsa/mt7530.h | 5 + drivers/net/ethernet/emulex/benet/be_main.c | 4 +- .../mellanox/mlx5/core/steering/dr_action.c | 9 +- drivers/net/ethernet/qualcomm/qca_spi.c | 3 +- drivers/net/ethernet/sfc/ef10.c | 25 +- drivers/net/ethernet/sfc/ef100_nic.c | 7 +- drivers/net/ethernet/sfc/ef100_tx.c | 4 +- drivers/net/ethernet/sfc/ef100_tx.h | 2 +- drivers/net/ethernet/sfc/tx_common.c | 4 +- drivers/net/ethernet/sfc/tx_common.h | 2 +- drivers/net/ieee802154/mac802154_hwsim.c | 6 +- drivers/net/phy/dp83867.c | 2 +- drivers/net/phy/mdio_bus.c | 2 +- drivers/net/wireless/intel/iwlwifi/pcie/drv.c | 2 + drivers/nfc/nfcsim.c | 4 - drivers/nvme/host/core.c | 52 +- drivers/nvme/host/ioctl.c | 2 +- drivers/nvme/host/nvme.h | 3 +- drivers/nvme/target/passthru.c | 2 +- drivers/pci/controller/pci-hyperv.c | 139 +++-- drivers/platform/x86/amd/pmf/core.c | 10 +- .../platform/x86/intel/int3472/clk_and_regulator.c | 13 +- drivers/s390/cio/device.c | 5 +- drivers/soundwire/dmi-quirks.c | 7 + drivers/soundwire/qcom.c | 17 +- drivers/spi/spi-fsl-lpspi.c | 7 +- drivers/spi/spi-geni-qcom.c | 2 + drivers/target/iscsi/iscsi_target.c | 2 - drivers/target/iscsi/iscsi_target_login.c | 63 +-- drivers/target/iscsi/iscsi_target_nego.c | 74 +-- drivers/target/iscsi/iscsi_target_util.c | 51 ++ drivers/target/iscsi/iscsi_target_util.h | 4 + drivers/thermal/intel/intel_soc_dts_iosf.c | 2 +- drivers/usb/gadget/udc/amd5536udc_pci.c | 3 + drivers/vhost/net.c | 11 +- drivers/vhost/vdpa.c | 13 + fs/afs/write.c | 7 +- fs/btrfs/tree-log.c | 2 +- fs/cifs/connect.c | 9 +- fs/cifs/dfs.c | 9 +- fs/cifs/smb2pdu.c | 2 +- fs/gfs2/file.c | 17 +- fs/internal.h | 2 - fs/ksmbd/server.c | 33 +- fs/ksmbd/smb2misc.c | 33 +- fs/ksmbd/smb2pdu.c | 223 +++----- fs/ksmbd/smbacl.c | 10 +- fs/ksmbd/vfs.c | 587 +++++++++++---------- fs/ksmbd/vfs.h | 36 +- fs/ksmbd/vfs_cache.c | 7 +- fs/namei.c | 125 ++++- fs/nilfs2/page.c | 10 +- fs/nilfs2/segbuf.c | 6 + fs/nilfs2/segment.c | 7 + fs/nilfs2/super.c | 25 +- include/acpi/acpixf.h | 1 + include/asm-generic/vmlinux.lds.h | 3 + include/linux/gpio/driver.h | 8 + include/linux/libata.h | 2 +- include/linux/namei.h | 9 + include/linux/regulator/pca9450.h | 4 +- include/net/dsa.h | 8 + include/net/netfilter/nf_tables.h | 31 +- include/net/xfrm.h | 1 + include/target/iscsi/iscsi_target_core.h | 7 +- include/trace/events/writeback.h | 2 +- io_uring/net.c | 17 +- io_uring/poll.c | 9 +- kernel/bpf/btf.c | 20 +- kernel/bpf/syscall.c | 5 + kernel/bpf/verifier.c | 10 +- kernel/cgroup/cgroup.c | 20 +- kernel/cgroup/legacy_freezer.c | 8 +- kernel/time/tick-common.c | 13 +- kernel/time/tick-sched.c | 13 +- mm/maccess.c | 16 +- mm/memfd.c | 9 +- mm/mprotect.c | 2 +- mm/usercopy.c | 2 +- mm/vmalloc.c | 17 +- net/core/sock.c | 6 - net/dsa/dsa.c | 24 +- net/ipv4/esp4_offload.c | 3 + net/ipv4/xfrm4_input.c | 1 + net/ipv6/esp6_offload.c | 3 + net/ipv6/xfrm6_input.c | 3 + net/mptcp/pm_netlink.c | 1 + net/mptcp/protocol.c | 111 ++-- net/mptcp/subflow.c | 17 +- net/netfilter/ipvs/ip_vs_xmit.c | 2 + net/netfilter/nf_tables_api.c | 330 ++++++++++-- net/netfilter/nfnetlink_osf.c | 1 + net/netfilter/nft_immediate.c | 90 +++- net/netfilter/nft_set_bitmap.c | 5 +- net/netfilter/nft_set_hash.c | 23 +- net/netfilter/nft_set_pipapo.c | 20 +- net/netfilter/nft_set_rbtree.c | 5 +- net/netfilter/xt_osf.c | 1 - net/sched/sch_api.c | 2 + net/sched/sch_netem.c | 8 +- net/xfrm/xfrm_input.c | 1 + net/xfrm/xfrm_interface_core.c | 54 +- net/xfrm/xfrm_policy.c | 14 + scripts/gfp-translate | 6 +- scripts/mod/modpost.c | 5 + scripts/orc_hash.sh | 16 + sound/pci/hda/patch_realtek.c | 2 + sound/soc/amd/yc/acp6x-mach.c | 7 + sound/soc/codecs/nau8824.c | 24 + sound/soc/codecs/wcd938x-sdw.c | 1 - sound/soc/fsl/fsl_sai.c | 11 +- sound/soc/fsl/fsl_sai.h | 1 + sound/soc/generic/simple-card.c | 1 + tools/testing/selftests/net/fcnal-test.sh | 27 +- .../net/forwarding/mirror_gre_bridge_1d.sh | 4 + .../net/forwarding/mirror_gre_bridge_1q.sh | 4 + tools/testing/selftests/net/mptcp/config | 1 + tools/testing/selftests/net/mptcp/diag.sh | 42 +- tools/testing/selftests/net/mptcp/mptcp_connect.c | 8 +- tools/testing/selftests/net/mptcp/mptcp_connect.sh | 20 + tools/testing/selftests/net/mptcp/mptcp_join.sh | 523 +++++++++++------- tools/testing/selftests/net/mptcp/mptcp_lib.sh | 64 +++ tools/testing/selftests/net/mptcp/mptcp_sockopt.c | 18 +- tools/testing/selftests/net/mptcp/mptcp_sockopt.sh | 20 +- tools/testing/selftests/net/mptcp/pm_netlink.sh | 27 +- tools/testing/selftests/net/mptcp/userspace_pm.sh | 13 +- tools/testing/selftests/net/tls.c | 24 +- tools/testing/selftests/net/vrf-xfrm-tests.sh | 32 +- tools/virtio/ringtest/main.h | 11 + virt/kvm/kvm_main.c | 20 +- 206 files changed, 2818 insertions(+), 1461 deletions(-)
From: Shyam Prasad N sprasad@microsoft.com
[ Upstream commit 91f4480c41f56f7c723323cf7f581f1d95d9ffbc ]
The ordering of status checks at the beginning of cifs_tree_connect is wrong. As a result, a tcon which is good may stay marked as needing reconnect infinitely.
Fixes: 2f0e4f034220 ("cifs: check only tcon status on tcon related functions") Cc: stable@vger.kernel.org # 6.3 Signed-off-by: Shyam Prasad N sprasad@microsoft.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/cifs/connect.c | 9 +++++---- fs/cifs/dfs.c | 9 +++++---- 2 files changed, 10 insertions(+), 8 deletions(-)
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c index 8e9a672320ab7..1250d156619b7 100644 --- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -4086,16 +4086,17 @@ int cifs_tree_connect(const unsigned int xid, struct cifs_tcon *tcon, const stru
/* only send once per connect */ spin_lock(&tcon->tc_lock); + if (tcon->status == TID_GOOD) { + spin_unlock(&tcon->tc_lock); + return 0; + } + if (tcon->status != TID_NEW && tcon->status != TID_NEED_TCON) { spin_unlock(&tcon->tc_lock); return -EHOSTDOWN; }
- if (tcon->status == TID_GOOD) { - spin_unlock(&tcon->tc_lock); - return 0; - } tcon->status = TID_IN_TCON; spin_unlock(&tcon->tc_lock);
diff --git a/fs/cifs/dfs.c b/fs/cifs/dfs.c index 2f93bf8c3325a..2390b2fedd6a3 100644 --- a/fs/cifs/dfs.c +++ b/fs/cifs/dfs.c @@ -575,16 +575,17 @@ int cifs_tree_connect(const unsigned int xid, struct cifs_tcon *tcon, const stru
/* only send once per connect */ spin_lock(&tcon->tc_lock); + if (tcon->status == TID_GOOD) { + spin_unlock(&tcon->tc_lock); + return 0; + } + if (tcon->status != TID_NEW && tcon->status != TID_NEED_TCON) { spin_unlock(&tcon->tc_lock); return -EHOSTDOWN; }
- if (tcon->status == TID_GOOD) { - spin_unlock(&tcon->tc_lock); - return 0; - } tcon->status = TID_IN_TCON; spin_unlock(&tcon->tc_lock);
From: Rodrigo Siqueira Rodrigo.Siqueira@amd.com
[ Upstream commit f7511289821ffccc07579406d6ab520aa11049f5 ]
[Why & How] The old dc_commit_updates_for_stream lacks manipulation for many corner cases where the DC feature requires special attention; as a result, it starts to show its limitation (e.g., the SubVP feature is not supported by it, among other cases). To modernize and unify our internal API, this commit replaces the old dc_commit_updates_for_stream with dc_update_planes_and_stream, which has more features.
Reviewed-by: Harry Wentland Harry.Wentland@amd.com Acked-by: Qingqing Zhuo qingqing.zhuo@amd.com Signed-off-by: Rodrigo Siqueira Rodrigo.Siqueira@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Stable-dep-of: ea2062dd1f03 ("drm/amd/display: fix the system hang while disable PSR") Signed-off-by: Sasha Levin sashal@kernel.org --- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 30 +++++++++---------- 1 file changed, 15 insertions(+), 15 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 289c261ffe876..ae58f24be511c 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -2683,10 +2683,12 @@ static void dm_gpureset_commit_state(struct dc_state *dc_state, bundle->surface_updates[m].surface->force_full_update = true; } - dc_commit_updates_for_stream( - dm->dc, bundle->surface_updates, + + dc_update_planes_and_stream(dm->dc, + bundle->surface_updates, dc_state->stream_status->plane_count, - dc_state->streams[k], &bundle->stream_update, dc_state); + dc_state->streams[k], + &bundle->stream_update); }
cleanup: @@ -8198,12 +8200,11 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state, acrtc_state->stream->link->psr_settings.psr_allow_active) amdgpu_dm_psr_disable(acrtc_state->stream);
- dc_commit_updates_for_stream(dm->dc, - bundle->surface_updates, - planes_count, - acrtc_state->stream, - &bundle->stream_update, - dc_state); + dc_update_planes_and_stream(dm->dc, + bundle->surface_updates, + planes_count, + acrtc_state->stream, + &bundle->stream_update);
/** * Enable or disable the interrupts on the backend. @@ -8741,12 +8742,11 @@ static void amdgpu_dm_atomic_commit_tail(struct drm_atomic_state *state)
mutex_lock(&dm->dc_lock); - dc_commit_updates_for_stream(dm->dc, - dummy_updates, - status->plane_count, - dm_new_crtc_state->stream, - &stream_update, - dc_state); + dc_update_planes_and_stream(dm->dc, + dummy_updates, + status->plane_count, + dm_new_crtc_state->stream, + &stream_update); mutex_unlock(&dm->dc_lock); }
From: Rodrigo Siqueira Rodrigo.Siqueira@amd.com
[ Upstream commit 81f743a08f3b214638aa389e252ae5e6c3592e7c ]
[Why & How] This commit is part of a sequence of changes that replaces the commit sequence used in the DC with a new one. As a result of this transition, we moved some specific parts from the commit sequence and brought them to amdgpu_dm. This commit adds a wrapper inside DM that enable our drivers to do any necessary preparation or change before we offload the plane/stream update to DC.
Reviewed-by: Harry Wentland Harry.Wentland@amd.com Acked-by: Qingqing Zhuo qingqing.zhuo@amd.com Signed-off-by: Rodrigo Siqueira Rodrigo.Siqueira@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Stable-dep-of: ea2062dd1f03 ("drm/amd/display: fix the system hang while disable PSR") Signed-off-by: Sasha Levin sashal@kernel.org --- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 51 +++++++++++++++---- 1 file changed, 41 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index ae58f24be511c..51a3130f607e0 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -350,6 +350,35 @@ static inline bool is_dc_timing_adjust_needed(struct dm_crtc_state *old_state, return false; }
+/** + * update_planes_and_stream_adapter() - Send planes to be updated in DC + * + * DC has a generic way to update planes and stream via + * dc_update_planes_and_stream function; however, DM might need some + * adjustments and preparation before calling it. This function is a wrapper + * for the dc_update_planes_and_stream that does any required configuration + * before passing control to DC. + */ +static inline bool update_planes_and_stream_adapter(struct dc *dc, + int update_type, + int planes_count, + struct dc_stream_state *stream, + struct dc_stream_update *stream_update, + struct dc_surface_update *array_of_surface_update) +{ + /* + * Previous frame finished and HW is ready for optimization. + */ + if (update_type == UPDATE_TYPE_FAST) + dc_post_update_surfaces_to_stream(dc); + + return dc_update_planes_and_stream(dc, + array_of_surface_update, + planes_count, + stream, + stream_update); +} + /** * dm_pflip_high_irq() - Handle pageflip interrupt * @interrupt_params: ignored @@ -2684,11 +2713,12 @@ static void dm_gpureset_commit_state(struct dc_state *dc_state, true; }
- dc_update_planes_and_stream(dm->dc, - bundle->surface_updates, - dc_state->stream_status->plane_count, - dc_state->streams[k], - &bundle->stream_update); + update_planes_and_stream_adapter(dm->dc, + UPDATE_TYPE_FULL, + dc_state->stream_status->plane_count, + dc_state->streams[k], + &bundle->stream_update, + bundle->surface_updates); }
cleanup: @@ -8200,11 +8230,12 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state, acrtc_state->stream->link->psr_settings.psr_allow_active) amdgpu_dm_psr_disable(acrtc_state->stream);
- dc_update_planes_and_stream(dm->dc, - bundle->surface_updates, - planes_count, - acrtc_state->stream, - &bundle->stream_update); + update_planes_and_stream_adapter(dm->dc, + acrtc_state->update_type, + planes_count, + acrtc_state->stream, + &bundle->stream_update, + bundle->surface_updates);
/** * Enable or disable the interrupts on the backend.
From: Tom Chung chiahsuan.chung@amd.com
[ Upstream commit ea2062dd1f0384ae1b136d333ee4ced15bedae38 ]
[Why] When the PSR enabled. If you try to adjust the timing parameters, it may cause system hang. Because the timing mismatch with the DMCUB settings.
[How] Disable the PSR before adjusting timing parameters.
Cc: Mario Limonciello mario.limonciello@amd.com Cc: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Acked-by: Stylon Wang stylon.wang@amd.com Signed-off-by: Tom Chung chiahsuan.chung@amd.com Reviewed-by: Wayne Lin Wayne.Lin@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 51a3130f607e0..2cbd6949804f5 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -8213,6 +8213,12 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state, if (acrtc_state->abm_level != dm_old_crtc_state->abm_level) bundle->stream_update.abm_level = &acrtc_state->abm_level;
+ mutex_lock(&dm->dc_lock); + if ((acrtc_state->update_type > UPDATE_TYPE_FAST) && + acrtc_state->stream->link->psr_settings.psr_allow_active) + amdgpu_dm_psr_disable(acrtc_state->stream); + mutex_unlock(&dm->dc_lock); + /* * If FreeSync state on the stream has changed then we need to * re-adjust the min/max bounds now that DC doesn't handle this @@ -8226,10 +8232,6 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state, spin_unlock_irqrestore(&pcrtc->dev->event_lock, flags); } mutex_lock(&dm->dc_lock); - if ((acrtc_state->update_type > UPDATE_TYPE_FAST) && - acrtc_state->stream->link->psr_settings.psr_allow_active) - amdgpu_dm_psr_disable(acrtc_state->stream); - update_planes_and_stream_adapter(dm->dc, acrtc_state->update_type, planes_count,
From: Damien Le Moal dlemoal@kernel.org
[ Upstream commit 6aa0365a3c8512587fffd42fe438768709ddef8e ]
When an ATA port is resumed from sleep, the port is reset and a power management request issued to libata EH to reset the port and rescanning the device(s) attached to the port. Device rescanning is done by scheduling an ata_scsi_dev_rescan() work, which will execute scsi_rescan_device().
However, scsi_rescan_device() takes the generic device lock, which is also taken by dpm_resume() when the SCSI device is resumed as well. If a device rescan execution starts before the completion of the SCSI device resume, the rcu locking used to refresh the cached VPD pages of the device, combined with the generic device locking from scsi_rescan_device() and from dpm_resume() can cause a deadlock.
Avoid this situation by changing struct ata_port scsi_rescan_task to be a delayed work instead of a simple work_struct. ata_scsi_dev_rescan() is modified to check if the SCSI device associated with the ATA device that must be rescanned is not suspended. If the SCSI device is still suspended, ata_scsi_dev_rescan() returns early and reschedule itself for execution after an arbitrary delay of 5ms.
Reported-by: Kai-Heng Feng kai.heng.feng@canonical.com Reported-by: Joe Breuer linux-kernel@jmbreuer.net Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217530 Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management") Signed-off-by: Damien Le Moal dlemoal@kernel.org Reviewed-by: Hannes Reinecke hare@suse.de Tested-by: Kai-Heng Feng kai.heng.feng@canonical.com Tested-by: Joe Breuer linux-kernel@jmbreuer.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/ata/libata-core.c | 3 ++- drivers/ata/libata-eh.c | 2 +- drivers/ata/libata-scsi.c | 22 +++++++++++++++++++++- include/linux/libata.h | 2 +- 4 files changed, 25 insertions(+), 4 deletions(-)
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c index 14c17c3bda4e6..d37df499c9fa6 100644 --- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -5348,7 +5348,7 @@ struct ata_port *ata_port_alloc(struct ata_host *host)
mutex_init(&ap->scsi_scan_mutex); INIT_DELAYED_WORK(&ap->hotplug_task, ata_scsi_hotplug); - INIT_WORK(&ap->scsi_rescan_task, ata_scsi_dev_rescan); + INIT_DELAYED_WORK(&ap->scsi_rescan_task, ata_scsi_dev_rescan); INIT_LIST_HEAD(&ap->eh_done_q); init_waitqueue_head(&ap->eh_wait_q); init_completion(&ap->park_req_pending); @@ -5954,6 +5954,7 @@ static void ata_port_detach(struct ata_port *ap) WARN_ON(!(ap->pflags & ATA_PFLAG_UNLOADED));
cancel_delayed_work_sync(&ap->hotplug_task); + cancel_delayed_work_sync(&ap->scsi_rescan_task);
skip_eh: /* clean up zpodd on port removal */ diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c index a6c9018118027..6f8d141915939 100644 --- a/drivers/ata/libata-eh.c +++ b/drivers/ata/libata-eh.c @@ -2984,7 +2984,7 @@ static int ata_eh_revalidate_and_attach(struct ata_link *link, ehc->i.flags |= ATA_EHI_SETMODE;
/* schedule the scsi_rescan_device() here */ - schedule_work(&(ap->scsi_rescan_task)); + schedule_delayed_work(&ap->scsi_rescan_task, 0); } else if (dev->class == ATA_DEV_UNKNOWN && ehc->tries[dev->devno] && ata_class_enabled(ehc->classes[dev->devno])) { diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c index a9e768fdae4e3..51673d95e5b22 100644 --- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c @@ -4597,10 +4597,11 @@ int ata_scsi_user_scan(struct Scsi_Host *shost, unsigned int channel, void ata_scsi_dev_rescan(struct work_struct *work) { struct ata_port *ap = - container_of(work, struct ata_port, scsi_rescan_task); + container_of(work, struct ata_port, scsi_rescan_task.work); struct ata_link *link; struct ata_device *dev; unsigned long flags; + bool delay_rescan = false;
mutex_lock(&ap->scsi_scan_mutex); spin_lock_irqsave(ap->lock, flags); @@ -4614,6 +4615,21 @@ void ata_scsi_dev_rescan(struct work_struct *work) if (scsi_device_get(sdev)) continue;
+ /* + * If the rescan work was scheduled because of a resume + * event, the port is already fully resumed, but the + * SCSI device may not yet be fully resumed. In such + * case, executing scsi_rescan_device() may cause a + * deadlock with the PM code on device_lock(). Prevent + * this by giving up and retrying rescan after a short + * delay. + */ + delay_rescan = sdev->sdev_gendev.power.is_suspended; + if (delay_rescan) { + scsi_device_put(sdev); + break; + } + spin_unlock_irqrestore(ap->lock, flags); scsi_rescan_device(&(sdev->sdev_gendev)); scsi_device_put(sdev); @@ -4623,4 +4639,8 @@ void ata_scsi_dev_rescan(struct work_struct *work)
spin_unlock_irqrestore(ap->lock, flags); mutex_unlock(&ap->scsi_scan_mutex); + + if (delay_rescan) + schedule_delayed_work(&ap->scsi_rescan_task, + msecs_to_jiffies(5)); } diff --git a/include/linux/libata.h b/include/linux/libata.h index a759dfbdcc91a..bb5dda3ec2db5 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -836,7 +836,7 @@ struct ata_port {
struct mutex scsi_scan_mutex; struct delayed_work hotplug_task; - struct work_struct scsi_rescan_task; + struct delayed_work scsi_rescan_task;
unsigned int hsm_task_state;
From: Alexei Starovoitov ast@kernel.org
commit d319f344561de23e810515d109c7278919bff7b0 upstream.
There are several issues with copy_from_user_nofault():
- access_ok() is designed for user context only and for that reason it has WARN_ON_IN_IRQ() which triggers when bpf, kprobe, eprobe and perf on ppc are calling it from irq.
- it's missing nmi_uaccess_okay() which is a nop on all architectures except x86 where it's required. The comment in arch/x86/mm/tlb.c explains the details why it's necessary. Calling copy_from_user_nofault() from bpf, [ke]probe without this check is not safe.
- __copy_from_user_inatomic() under CONFIG_HARDENED_USERCOPY is calling check_object_size()->__check_object_size()->check_heap_object()->find_vmap_area()->spin_lock() which is not safe to do from bpf, [ke]probe and perf due to potential deadlock.
Fix all three issues. At the end the copy_from_user_nofault() becomes equivalent to copy_from_user_nmi() from safety point of view with a difference in the return value.
Reported-by: Hsin-Wei Hung hsinweih@uci.edu Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Florian Lehner dev@der-flo.net Tested-by: Hsin-Wei Hung hsinweih@uci.edu Tested-by: Florian Lehner dev@der-flo.net Link: https://lore.kernel.org/r/20230410174345.4376-2-dev@der-flo.net Signed-off-by: Alexei Starovoitov ast@kernel.org Cc: Javier Honduvilla Coto javierhonduco@gmail.com Cc: Daniel Borkmann daniel@iogearbox.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/maccess.c | 16 +++++++++++----- mm/usercopy.c | 2 +- 2 files changed, 12 insertions(+), 6 deletions(-)
--- a/mm/maccess.c +++ b/mm/maccess.c @@ -5,6 +5,7 @@ #include <linux/export.h> #include <linux/mm.h> #include <linux/uaccess.h> +#include <asm/tlb.h>
bool __weak copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size) @@ -113,11 +114,16 @@ Efault: long copy_from_user_nofault(void *dst, const void __user *src, size_t size) { long ret = -EFAULT; - if (access_ok(src, size)) { - pagefault_disable(); - ret = __copy_from_user_inatomic(dst, src, size); - pagefault_enable(); - } + + if (!__access_ok(src, size)) + return ret; + + if (!nmi_uaccess_okay()) + return ret; + + pagefault_disable(); + ret = __copy_from_user_inatomic(dst, src, size); + pagefault_enable();
if (ret) return -EFAULT; --- a/mm/usercopy.c +++ b/mm/usercopy.c @@ -173,7 +173,7 @@ static inline void check_heap_object(con return; }
- if (is_vmalloc_addr(ptr)) { + if (is_vmalloc_addr(ptr) && !pagefault_disabled()) { struct vmap_area *area = find_vmap_area(addr);
if (!area)
From: Lino Sanfilippo l.sanfilippo@kunbus.com
commit 0e069265bce5a40c4eee52e2364bbbd4dabee94a upstream.
Writing the TPM_INT_STATUS register in the interrupt handler to clear the interrupts only has effect if a locality is held. Since this is not guaranteed at the time the interrupt is fired, claim the locality explicitly in the handler.
Signed-off-by: Lino Sanfilippo l.sanfilippo@kunbus.com Tested-by: Michael Niewöhner linux@mniewoehner.de Tested-by: Jarkko Sakkinen jarkko@kernel.org Reviewed-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/char/tpm/tpm_tis_core.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/char/tpm/tpm_tis_core.c +++ b/drivers/char/tpm/tpm_tis_core.c @@ -772,7 +772,9 @@ static irqreturn_t tis_int_handler(int d wake_up_interruptible(&priv->int_queue);
/* Clear interrupts handled with TPM_EOI */ + tpm_tis_request_locality(chip, 0); rc = tpm_tis_write32(priv, TPM_INT_STATUS(priv->locality), interrupt); + tpm_tis_relinquish_locality(chip, 0); if (rc < 0) return IRQ_NONE;
From: Namjae Jeon linkinjeon@kernel.org
commit 2b9b8f3b68edb3d67d79962f02e26dbb5ae3808d upstream.
->StructureSize2 indicates command payload size. ksmbd should validate this size with rfc1002 length before accessing it. This patch remove unneeded check and add the validation for this.
[ 8.912583] BUG: KASAN: slab-out-of-bounds in ksmbd_smb2_check_message+0x12a/0xc50 [ 8.913051] Read of size 2 at addr ffff88800ac7d92c by task kworker/0:0/7 ... [ 8.914967] Call Trace: [ 8.915126] <TASK> [ 8.915267] dump_stack_lvl+0x33/0x50 [ 8.915506] print_report+0xcc/0x620 [ 8.916558] kasan_report+0xae/0xe0 [ 8.917080] kasan_check_range+0x35/0x1b0 [ 8.917334] ksmbd_smb2_check_message+0x12a/0xc50 [ 8.917935] ksmbd_verify_smb_message+0xae/0xd0 [ 8.918223] handle_ksmbd_work+0x192/0x820 [ 8.918478] process_one_work+0x419/0x760 [ 8.918727] worker_thread+0x2a2/0x6f0 [ 8.919222] kthread+0x187/0x1d0 [ 8.919723] ret_from_fork+0x1f/0x30 [ 8.919954] </TASK>
Cc: stable@vger.kernel.org Reported-by: Chih-Yen Chang cc85nod@gmail.com Signed-off-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ksmbd/smb2misc.c | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-)
--- a/fs/ksmbd/smb2misc.c +++ b/fs/ksmbd/smb2misc.c @@ -351,6 +351,7 @@ int ksmbd_smb2_check_message(struct ksmb int command; __u32 clc_len; /* calculated length */ __u32 len = get_rfc1002_len(work->request_buf); + __u32 req_struct_size;
if (le32_to_cpu(hdr->NextCommand) > 0) len = le32_to_cpu(hdr->NextCommand); @@ -373,17 +374,9 @@ int ksmbd_smb2_check_message(struct ksmb }
if (smb2_req_struct_sizes[command] != pdu->StructureSize2) { - if (command != SMB2_OPLOCK_BREAK_HE && - (hdr->Status == 0 || pdu->StructureSize2 != SMB2_ERROR_STRUCTURE_SIZE2_LE)) { - /* error packets have 9 byte structure size */ - ksmbd_debug(SMB, - "Illegal request size %u for command %d\n", - le16_to_cpu(pdu->StructureSize2), command); - return 1; - } else if (command == SMB2_OPLOCK_BREAK_HE && - hdr->Status == 0 && - le16_to_cpu(pdu->StructureSize2) != OP_BREAK_STRUCT_SIZE_20 && - le16_to_cpu(pdu->StructureSize2) != OP_BREAK_STRUCT_SIZE_21) { + if (command == SMB2_OPLOCK_BREAK_HE && + le16_to_cpu(pdu->StructureSize2) != OP_BREAK_STRUCT_SIZE_20 && + le16_to_cpu(pdu->StructureSize2) != OP_BREAK_STRUCT_SIZE_21) { /* special case for SMB2.1 lease break message */ ksmbd_debug(SMB, "Illegal request size %d for oplock break\n", @@ -392,6 +385,14 @@ int ksmbd_smb2_check_message(struct ksmb } }
+ req_struct_size = le16_to_cpu(pdu->StructureSize2) + + __SMB2_HEADER_STRUCTURE_SIZE; + if (command == SMB2_LOCK_HE) + req_struct_size -= sizeof(struct smb2_lock_element); + + if (req_struct_size > len + 1) + return 1; + if (smb2_calc_size(hdr, &clc_len)) return 1;
From: Namjae Jeon linkinjeon@kernel.org
commit 5fe7f7b78290638806211046a99f031ff26164e1 upstream.
ksmbd_smb2_check_message doesn't validate hdr->NextCommand. If ->NextCommand is bigger than Offset + Length of smb2 write, It will allow oversized smb2 write length. It will cause OOB read in smb2_write.
Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-21164 Signed-off-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ksmbd/smb2misc.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)
--- a/fs/ksmbd/smb2misc.c +++ b/fs/ksmbd/smb2misc.c @@ -351,10 +351,16 @@ int ksmbd_smb2_check_message(struct ksmb int command; __u32 clc_len; /* calculated length */ __u32 len = get_rfc1002_len(work->request_buf); - __u32 req_struct_size; + __u32 req_struct_size, next_cmd = le32_to_cpu(hdr->NextCommand);
- if (le32_to_cpu(hdr->NextCommand) > 0) - len = le32_to_cpu(hdr->NextCommand); + if ((u64)work->next_smb2_rcv_hdr_off + next_cmd > len) { + pr_err("next command(%u) offset exceeds smb msg size\n", + next_cmd); + return 1; + } + + if (next_cmd > 0) + len = next_cmd; else if (work->next_smb2_rcv_hdr_off) len -= work->next_smb2_rcv_hdr_off;
From: Namjae Jeon linkinjeon@kernel.org
commit 5005bcb4219156f1bf7587b185080ec1da08518e upstream.
This patch validate session id and tree id in compound request. If first operation in the compound is SMB2 ECHO request, ksmbd bypass session and tree validation. So work->sess and work->tcon could be NULL. If secound request in the compound access work->sess or tcon, It cause NULL pointer dereferecing error.
Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-21165 Signed-off-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ksmbd/server.c | 33 ++++++++++++++++++++------------- fs/ksmbd/smb2pdu.c | 44 +++++++++++++++++++++++++++++++++++++++----- 2 files changed, 59 insertions(+), 18 deletions(-)
--- a/fs/ksmbd/server.c +++ b/fs/ksmbd/server.c @@ -185,24 +185,31 @@ static void __handle_ksmbd_work(struct k goto send; }
- if (conn->ops->check_user_session) { - rc = conn->ops->check_user_session(work); - if (rc < 0) { - command = conn->ops->get_cmd_val(work); - conn->ops->set_rsp_status(work, - STATUS_USER_SESSION_DELETED); - goto send; - } else if (rc > 0) { - rc = conn->ops->get_ksmbd_tcon(work); + do { + if (conn->ops->check_user_session) { + rc = conn->ops->check_user_session(work); if (rc < 0) { - conn->ops->set_rsp_status(work, - STATUS_NETWORK_NAME_DELETED); + if (rc == -EINVAL) + conn->ops->set_rsp_status(work, + STATUS_INVALID_PARAMETER); + else + conn->ops->set_rsp_status(work, + STATUS_USER_SESSION_DELETED); goto send; + } else if (rc > 0) { + rc = conn->ops->get_ksmbd_tcon(work); + if (rc < 0) { + if (rc == -EINVAL) + conn->ops->set_rsp_status(work, + STATUS_INVALID_PARAMETER); + else + conn->ops->set_rsp_status(work, + STATUS_NETWORK_NAME_DELETED); + goto send; + } } } - }
- do { rc = __process_request(work, conn, &command); if (rc == SERVER_HANDLER_ABORT) break; --- a/fs/ksmbd/smb2pdu.c +++ b/fs/ksmbd/smb2pdu.c @@ -91,7 +91,6 @@ int smb2_get_ksmbd_tcon(struct ksmbd_wor unsigned int cmd = le16_to_cpu(req_hdr->Command); int tree_id;
- work->tcon = NULL; if (cmd == SMB2_TREE_CONNECT_HE || cmd == SMB2_CANCEL_HE || cmd == SMB2_LOGOFF_HE) { @@ -105,10 +104,28 @@ int smb2_get_ksmbd_tcon(struct ksmbd_wor }
tree_id = le32_to_cpu(req_hdr->Id.SyncId.TreeId); + + /* + * If request is not the first in Compound request, + * Just validate tree id in header with work->tcon->id. + */ + if (work->next_smb2_rcv_hdr_off) { + if (!work->tcon) { + pr_err("The first operation in the compound does not have tcon\n"); + return -EINVAL; + } + if (work->tcon->id != tree_id) { + pr_err("tree id(%u) is different with id(%u) in first operation\n", + tree_id, work->tcon->id); + return -EINVAL; + } + return 1; + } + work->tcon = ksmbd_tree_conn_lookup(work->sess, tree_id); if (!work->tcon) { pr_err("Invalid tid %d\n", tree_id); - return -EINVAL; + return -ENOENT; }
return 1; @@ -547,7 +564,6 @@ int smb2_check_user_session(struct ksmbd unsigned int cmd = conn->ops->get_cmd_val(work); unsigned long long sess_id;
- work->sess = NULL; /* * SMB2_ECHO, SMB2_NEGOTIATE, SMB2_SESSION_SETUP command do not * require a session id, so no need to validate user session's for @@ -558,15 +574,33 @@ int smb2_check_user_session(struct ksmbd return 0;
if (!ksmbd_conn_good(conn)) - return -EINVAL; + return -EIO;
sess_id = le64_to_cpu(req_hdr->SessionId); + + /* + * If request is not the first in Compound request, + * Just validate session id in header with work->sess->id. + */ + if (work->next_smb2_rcv_hdr_off) { + if (!work->sess) { + pr_err("The first operation in the compound does not have sess\n"); + return -EINVAL; + } + if (work->sess->id != sess_id) { + pr_err("session id(%llu) is different with the first operation(%lld)\n", + sess_id, work->sess->id); + return -EINVAL; + } + return 1; + } + /* Check for validity of user session */ work->sess = ksmbd_session_lookup_all(conn, sess_id); if (work->sess) return 1; ksmbd_debug(SMB, "Invalid user session, Uid %llu\n", sess_id); - return -EINVAL; + return -ENOENT; }
static void destroy_previous_session(struct ksmbd_conn *conn,
From: Mike Kravetz mike.kravetz@oracle.com
commit b7cb3821905b79b6ed474fd5ba34d1e187649139 upstream.
This effectively reverts commit 16c243e99d33 ("udmabuf: Add support for mapping hugepages (v4)"). Recently, Junxiao Chang found a BUG with page map counting as described here [1]. This issue pointed out that the udmabuf driver was making direct use of subpages of hugetlb pages. This is not a good idea, and no other mm code attempts such use. In addition to the mapcount issue, this also causes issues with hugetlb vmemmap optimization and page poisoning.
For now, remove hugetlb support.
If udmabuf wants to be used on hugetlb mappings, it should be changed to only use complete hugetlb pages. This will require different alignment and size requirements on the UDMABUF_CREATE API.
[1] https://lore.kernel.org/linux-mm/20230512072036.1027784-1-junxiao.chang@inte...
Link: https://lkml.kernel.org/r/20230608204927.88711-1-mike.kravetz@oracle.com Fixes: 16c243e99d33 ("udmabuf: Add support for mapping hugepages (v4)") Signed-off-by: Mike Kravetz mike.kravetz@oracle.com Acked-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Acked-by: Vivek Kasireddy vivek.kasireddy@intel.com Acked-by: Gerd Hoffmann kraxel@redhat.com Cc: David Hildenbrand david@redhat.com Cc: Dongwon Kim dongwon.kim@intel.com Cc: James Houghton jthoughton@google.com Cc: Jerome Marchand jmarchan@redhat.com Cc: Junxiao Chang junxiao.chang@intel.com Cc: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Michal Hocko mhocko@suse.com Cc: Muchun Song muchun.song@linux.dev Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma-buf/udmabuf.c | 47 +++++----------------------------------------- 1 file changed, 6 insertions(+), 41 deletions(-)
--- a/drivers/dma-buf/udmabuf.c +++ b/drivers/dma-buf/udmabuf.c @@ -12,7 +12,6 @@ #include <linux/shmem_fs.h> #include <linux/slab.h> #include <linux/udmabuf.h> -#include <linux/hugetlb.h> #include <linux/vmalloc.h> #include <linux/iosys-map.h>
@@ -207,9 +206,7 @@ static long udmabuf_create(struct miscde struct udmabuf *ubuf; struct dma_buf *buf; pgoff_t pgoff, pgcnt, pgidx, pgbuf = 0, pglimit; - struct page *page, *hpage = NULL; - pgoff_t subpgoff, maxsubpgs; - struct hstate *hpstate; + struct page *page; int seals, ret = -EINVAL; u32 i, flags;
@@ -245,7 +242,7 @@ static long udmabuf_create(struct miscde if (!memfd) goto err; mapping = memfd->f_mapping; - if (!shmem_mapping(mapping) && !is_file_hugepages(memfd)) + if (!shmem_mapping(mapping)) goto err; seals = memfd_fcntl(memfd, F_GET_SEALS, 0); if (seals == -EINVAL) @@ -256,48 +253,16 @@ static long udmabuf_create(struct miscde goto err; pgoff = list[i].offset >> PAGE_SHIFT; pgcnt = list[i].size >> PAGE_SHIFT; - if (is_file_hugepages(memfd)) { - hpstate = hstate_file(memfd); - pgoff = list[i].offset >> huge_page_shift(hpstate); - subpgoff = (list[i].offset & - ~huge_page_mask(hpstate)) >> PAGE_SHIFT; - maxsubpgs = huge_page_size(hpstate) >> PAGE_SHIFT; - } for (pgidx = 0; pgidx < pgcnt; pgidx++) { - if (is_file_hugepages(memfd)) { - if (!hpage) { - hpage = find_get_page_flags(mapping, pgoff, - FGP_ACCESSED); - if (!hpage) { - ret = -EINVAL; - goto err; - } - } - page = hpage + subpgoff; - get_page(page); - subpgoff++; - if (subpgoff == maxsubpgs) { - put_page(hpage); - hpage = NULL; - subpgoff = 0; - pgoff++; - } - } else { - page = shmem_read_mapping_page(mapping, - pgoff + pgidx); - if (IS_ERR(page)) { - ret = PTR_ERR(page); - goto err; - } + page = shmem_read_mapping_page(mapping, pgoff + pgidx); + if (IS_ERR(page)) { + ret = PTR_ERR(page); + goto err; } ubuf->pages[pgbuf++] = page; } fput(memfd); memfd = NULL; - if (hpage) { - put_page(hpage); - hpage = NULL; - } }
exp_info.ops = &udmabuf_ops;
From: Linus Torvalds torvalds@linux-foundation.org
commit 69cbeb61ff9093a9155cb19a36d633033f71093a upstream.
This reverts commit e7b813b32a42a3a6281a4fd9ae7700a0257c1d50 (and the subsequent fix for it: 41a15855c1ee "efi: random: fix NULL-deref when refreshing seed").
It turns otu to cause non-deterministic boot stalls on at least a HP 6730b laptop.
Reported-and-bisected-by: Sami Korkalainen sami.korkalainen@proton.me Link: https://lore.kernel.org/all/GQUnKz2al3yke5mB2i1kp3SzNHjK8vi6KJEh7rnLrOQ24Orl... Cc: Jason A. Donenfeld Jason@zx2c4.com Cc: Bagas Sanjaya bagasdotme@gmail.com Cc: stable@kernel.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/firmware/efi/efi.c | 21 --------------------- 1 file changed, 21 deletions(-)
--- a/drivers/firmware/efi/efi.c +++ b/drivers/firmware/efi/efi.c @@ -361,24 +361,6 @@ static void __init efi_debugfs_init(void static inline void efi_debugfs_init(void) {} #endif
-static void refresh_nv_rng_seed(struct work_struct *work) -{ - u8 seed[EFI_RANDOM_SEED_SIZE]; - - get_random_bytes(seed, sizeof(seed)); - efi.set_variable(L"RandomSeed", &LINUX_EFI_RANDOM_SEED_TABLE_GUID, - EFI_VARIABLE_NON_VOLATILE | EFI_VARIABLE_BOOTSERVICE_ACCESS | - EFI_VARIABLE_RUNTIME_ACCESS, sizeof(seed), seed); - memzero_explicit(seed, sizeof(seed)); -} -static int refresh_nv_rng_seed_notification(struct notifier_block *nb, unsigned long action, void *data) -{ - static DECLARE_WORK(work, refresh_nv_rng_seed); - schedule_work(&work); - return NOTIFY_DONE; -} -static struct notifier_block refresh_nv_rng_seed_nb = { .notifier_call = refresh_nv_rng_seed_notification }; - /* * We register the efi subsystem with the firmware subsystem and the * efivars subsystem with the efi subsystem, if the system was booted with @@ -451,9 +433,6 @@ static int __init efisubsys_init(void) platform_device_register_simple("efi_secret", 0, NULL, 0); #endif
- if (efi_rt_services_supported(EFI_RT_SUPPORTED_SET_VARIABLE)) - execute_with_initialized_rng(&refresh_nv_rng_seed_nb); - return 0;
err_remove_group:
From: Vishal Moola (Oracle) vishal.moola@gmail.com
commit a2b6f2ab3e144f8e23666aafeba0e4d9ea4b7975 upstream.
Commit acc8d8588cb7 converted afs_writepages_region() to write back a folio batch. If writeback needs rescheduling, the function exits without dropping the references to the folios in fbatch. This patch fixes that.
[DH: Moved the added line before the _leave()]
Fixes: acc8d8588cb7 ("afs: convert afs_writepages_region() to use filemap_get_folios_tag()") Signed-off-by: Vishal Moola (Oracle) vishal.moola@gmail.com Signed-off-by: David Howells dhowells@redhat.com cc: Marc Dionne marc.dionne@auristor.com cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/20230607204120.89416-1-vishal.moola@gmail.com/ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/afs/write.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/afs/write.c b/fs/afs/write.c index c822d6006033..fd433024070e 100644 --- a/fs/afs/write.c +++ b/fs/afs/write.c @@ -763,6 +763,7 @@ static int afs_writepages_region(struct address_space *mapping, if (wbc->sync_mode == WB_SYNC_NONE) { if (skips >= 5 || need_resched()) { *_next = start; + folio_batch_release(&fbatch); _leave(" = 0 [%llx]", *_next); return 0; }
From: Vishal Moola (Oracle) vishal.moola@gmail.com
commit 819da022dd007398d0c42ebcd8dbb1b681acea53 upstream.
Commit acc8d8588cb7 converted afs_writepages_region() to write back a folio batch. The function waits for writeback to a folio, but then proceeds to the rest of the batch without trying to write that folio again. This patch fixes has it attempt to write the folio again.
[DH: Also remove an 'else' that adding a goto makes redundant]
Fixes: acc8d8588cb7 ("afs: convert afs_writepages_region() to use filemap_get_folios_tag()") Signed-off-by: Vishal Moola (Oracle) vishal.moola@gmail.com Signed-off-by: David Howells dhowells@redhat.com cc: Marc Dionne marc.dionne@auristor.com cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/20230607204120.89416-2-vishal.moola@gmail.com/ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/afs/write.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/afs/write.c b/fs/afs/write.c index fd433024070e..8750b99c3f56 100644 --- a/fs/afs/write.c +++ b/fs/afs/write.c @@ -731,6 +731,7 @@ static int afs_writepages_region(struct address_space *mapping, * (changing page->mapping to NULL), or even swizzled * back from swapper_space to tmpfs file mapping */ +try_again: if (wbc->sync_mode != WB_SYNC_NONE) { ret = folio_lock_killable(folio); if (ret < 0) { @@ -757,9 +758,10 @@ static int afs_writepages_region(struct address_space *mapping, #ifdef CONFIG_AFS_FSCACHE folio_wait_fscache(folio); #endif - } else { - start += folio_size(folio); + goto try_again; } + + start += folio_size(folio); if (wbc->sync_mode == WB_SYNC_NONE) { if (skips >= 5 || need_resched()) { *_next = start;
From: Thomas Gleixner tglx@linutronix.de
commit 13bb06f8dd42071cb9a49f6e21099eea05d4b856 upstream.
The tick period is aligned very early while the first clock_event_device is registered. At that point the system runs in periodic mode and switches later to one-shot mode if possible.
The next wake-up event is programmed based on the aligned value (tick_next_period) but the delta value, that is used to program the clock_event_device, is computed based on ktime_get().
With the subtracted offset, the device fires earlier than the exact time frame. With a large enough offset the system programs the timer for the next wake-up and the remaining time left is too small to make any boot progress. The system hangs.
Move the alignment later to the setup of tick_sched timer. At this point the system switches to oneshot mode and a high resolution clocksource is available. At this point it is safe to align tick_next_period because ktime_get() will now return accurate (not jiffies based) time.
[bigeasy: Patch description + testing].
Fixes: e9523a0d81899 ("tick/common: Align tick period with the HZ tick.") Reported-by: Mathias Krause minipli@grsecurity.net Reported-by: "Bhatnagar, Rishabh" risbhat@amazon.com Suggested-by: Mathias Krause minipli@grsecurity.net Signed-off-by: Thomas Gleixner tglx@linutronix.de Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Richard W.M. Jones rjones@redhat.com Tested-by: Mathias Krause minipli@grsecurity.net Acked-by: SeongJae Park sj@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/5a56290d-806e-b9a5-f37c-f21958b5a8c0@grsecurity.net Link: https://lore.kernel.org/12c6f9a3-d087-b824-0d05-0d18c9bc1bf3@amazon.com Link: https://lore.kernel.org/r/20230615091830.RxMV2xf_@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/time/tick-common.c | 13 +------------ kernel/time/tick-sched.c | 13 ++++++++++++- 2 files changed, 13 insertions(+), 13 deletions(-)
--- a/kernel/time/tick-common.c +++ b/kernel/time/tick-common.c @@ -218,19 +218,8 @@ static void tick_setup_device(struct tic * this cpu: */ if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) { - ktime_t next_p; - u32 rem; - tick_do_timer_cpu = cpu; - - next_p = ktime_get(); - div_u64_rem(next_p, TICK_NSEC, &rem); - if (rem) { - next_p -= rem; - next_p += TICK_NSEC; - } - - tick_next_period = next_p; + tick_next_period = ktime_get(); #ifdef CONFIG_NO_HZ_FULL /* * The boot CPU may be nohz_full, in which case set --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -161,8 +161,19 @@ static ktime_t tick_init_jiffy_update(vo raw_spin_lock(&jiffies_lock); write_seqcount_begin(&jiffies_seq); /* Did we start the jiffies update yet ? */ - if (last_jiffies_update == 0) + if (last_jiffies_update == 0) { + u32 rem; + + /* + * Ensure that the tick is aligned to a multiple of + * TICK_NSEC. + */ + div_u64_rem(tick_next_period, TICK_NSEC, &rem); + if (rem) + tick_next_period += TICK_NSEC - rem; + last_jiffies_update = tick_next_period; + } period = last_jiffies_update; write_seqcount_end(&jiffies_seq); raw_spin_unlock(&jiffies_lock);
From: Michael S. Tsirkin mst@redhat.com
commit afd384f0dbea2229fd11159efb86a5b41051c4a9 upstream.
This reverts commit 07b679f70d73483930e8d3c293942416d9cd5c13.
This change appears to have broken things... We now see applications hanging during disk accesses. e.g. multi-port virtio-blk device running in h/w (FPGA) Host running a simple 'fio' test. [global] thread=1 direct=1 ioengine=libaio norandommap=1 group_reporting=1 bs=4K rw=read iodepth=128 runtime=1 numjobs=4 time_based [job0] filename=/dev/vda [job1] filename=/dev/vdb [job2] filename=/dev/vdc ... [job15] filename=/dev/vdp
i.e. 16 disks; 4 queues per disk; simple burst of 4KB reads This is repeatedly run in a loop.
After a few, normally <10 seconds, fio hangs. With 64 queues (16 disks), failure occurs within a few seconds; with 8 queues (2 disks) it may take ~hour before hanging. Last message: fio-3.19 Starting 8 threads Jobs: 1 (f=1): [_(7),R(1)][68.3%][eta 03h:11m:06s] I think this means at the end of the run 1 queue was left incomplete.
'diskstats' (run while fio is hung) shows no outstanding transactions. e.g. $ cat /proc/diskstats ... 252 0 vda 1843140071 0 14745120568 712568645 0 0 0 0 0 3117947 712568645 0 0 0 0 0 0 252 16 vdb 1816291511 0 14530332088 704905623 0 0 0 0 0 3117711 704905623 0 0 0 0 0 0 ...
Other stats (in the h/w, and added to the virtio-blk driver ([a]virtio_queue_rq(), [b]virtblk_handle_req(), [c]virtblk_request_done()) all agree, and show every request had a completion, and that virtblk_request_done() never gets called. e.g. PF= 0 vq=0 1 2 3 [a]request_count - 839416590 813148916 105586179 84988123 [b]completion1_count - 839416590 813148916 105586179 84988123 [c]completion2_count - 0 0 0 0
PF= 1 vq=0 1 2 3 [a]request_count - 823335887 812516140 104582672 75856549 [b]completion1_count - 823335887 812516140 104582672 75856549 [c]completion2_count - 0 0 0 0
i.e. the issue is after the virtio-blk driver.
This change was introduced in kernel 6.3.0. I am seeing this using 6.3.3. If I run with an earlier kernel (5.15), it does not occur. If I make a simple patch to the 6.3.3 virtio-blk driver, to skip the blk_mq_add_to_batch()call, it does not fail. e.g. kernel 5.15 - this is OK virtio_blk.c,virtblk_done() [irq handler] if (likely(!blk_should_fake_timeout(req->q))) { blk_mq_complete_request(req); }
kernel 6.3.3 - this fails virtio_blk.c,virtblk_handle_req() [irq handler] if (likely(!blk_should_fake_timeout(req->q))) { if (!blk_mq_complete_request_remote(req)) { if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) { virtblk_request_done(req); //this never gets called... so blk_mq_add_to_batch() must always succeed } } }
If I do, kernel 6.3.3 - this is OK virtio_blk.c,virtblk_handle_req() [irq handler] if (likely(!blk_should_fake_timeout(req->q))) { if (!blk_mq_complete_request_remote(req)) { virtblk_request_done(req); //force this here... if (!blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), virtblk_complete_batch)) { virtblk_request_done(req); //this never gets called... so blk_mq_add_to_batch() must always succeed } } }
Perhaps you might like to fix/test/revert this change... Martin
Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202306090826.C1fZmdMe-lkp@intel.com/ Cc: Suwan Kim suwan.kim027@gmail.com Tested-by: edliaw@google.com Reported-by: "Roberts, Martin" martin.roberts@intel.com Message-Id: 336455b4f630f329380a8f53ee8cad3868764d5c.1686295549.git.mst@redhat.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/block/virtio_blk.c | 82 ++++++++++++++++++++------------------------- 1 file changed, 37 insertions(+), 45 deletions(-)
--- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -348,63 +348,33 @@ static inline void virtblk_request_done( blk_mq_end_request(req, status); }
-static void virtblk_complete_batch(struct io_comp_batch *iob) -{ - struct request *req; - - rq_list_for_each(&iob->req_list, req) { - virtblk_unmap_data(req, blk_mq_rq_to_pdu(req)); - virtblk_cleanup_cmd(req); - } - blk_mq_end_request_batch(iob); -} - -static int virtblk_handle_req(struct virtio_blk_vq *vq, - struct io_comp_batch *iob) -{ - struct virtblk_req *vbr; - int req_done = 0; - unsigned int len; - - while ((vbr = virtqueue_get_buf(vq->vq, &len)) != NULL) { - struct request *req = blk_mq_rq_from_pdu(vbr); - - if (likely(!blk_should_fake_timeout(req->q)) && - !blk_mq_complete_request_remote(req) && - !blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), - virtblk_complete_batch)) - virtblk_request_done(req); - req_done++; - } - - return req_done; -} - static void virtblk_done(struct virtqueue *vq) { struct virtio_blk *vblk = vq->vdev->priv; - struct virtio_blk_vq *vblk_vq = &vblk->vqs[vq->index]; - int req_done = 0; + bool req_done = false; + int qid = vq->index; + struct virtblk_req *vbr; unsigned long flags; - DEFINE_IO_COMP_BATCH(iob); + unsigned int len;
- spin_lock_irqsave(&vblk_vq->lock, flags); + spin_lock_irqsave(&vblk->vqs[qid].lock, flags); do { virtqueue_disable_cb(vq); - req_done += virtblk_handle_req(vblk_vq, &iob); + while ((vbr = virtqueue_get_buf(vblk->vqs[qid].vq, &len)) != NULL) { + struct request *req = blk_mq_rq_from_pdu(vbr);
+ if (likely(!blk_should_fake_timeout(req->q))) + blk_mq_complete_request(req); + req_done = true; + } if (unlikely(virtqueue_is_broken(vq))) break; } while (!virtqueue_enable_cb(vq));
- if (req_done) { - if (!rq_list_empty(iob.req_list)) - iob.complete(&iob); - - /* In case queue is stopped waiting for more buffers. */ + /* In case queue is stopped waiting for more buffers. */ + if (req_done) blk_mq_start_stopped_hw_queues(vblk->disk->queue, true); - } - spin_unlock_irqrestore(&vblk_vq->lock, flags); + spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags); }
static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx) @@ -1283,15 +1253,37 @@ static void virtblk_map_queues(struct bl } }
+static void virtblk_complete_batch(struct io_comp_batch *iob) +{ + struct request *req; + + rq_list_for_each(&iob->req_list, req) { + virtblk_unmap_data(req, blk_mq_rq_to_pdu(req)); + virtblk_cleanup_cmd(req); + } + blk_mq_end_request_batch(iob); +} + static int virtblk_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob) { struct virtio_blk *vblk = hctx->queue->queuedata; struct virtio_blk_vq *vq = get_virtio_blk_vq(hctx); + struct virtblk_req *vbr; unsigned long flags; + unsigned int len; int found = 0;
spin_lock_irqsave(&vq->lock, flags); - found = virtblk_handle_req(vq, iob); + + while ((vbr = virtqueue_get_buf(vq->vq, &len)) != NULL) { + struct request *req = blk_mq_rq_from_pdu(vbr); + + found++; + if (!blk_mq_complete_request_remote(req) && + !blk_mq_add_to_batch(req, iob, virtblk_vbr_status(vbr), + virtblk_complete_batch)) + virtblk_request_done(req); + }
if (found) blk_mq_start_stopped_hw_queues(vblk->disk->queue, true);
From: Nathan Chancellor nathan@kernel.org
This patch fixes a stable only patch, so it has no direct upstream equivalent.
After a stable only patch to explicitly handle the '.got' section to handle an orphan section warning from the linker, certain configurations error when linking with ld.lld, which enables relro by default:
ld.lld: error: section: .got is not contiguous with other relro sections
This has come up with other architectures before, such as arm and arm64 in commit 0cda9bc15dfc ("ARM: 9038/1: Link with '-z norelro'") and commit 3b92fa7485eb ("arm64: link with -z norelro regardless of CONFIG_RELOCATABLE"). Additionally, '-z norelro' is used unconditionally for RISC-V upstream after commit 26e7aacb83df ("riscv: Allow to downgrade paging mode from the command line"), which alluded to this issue for the same reason. Bring 6.3 in line with mainline and link with '-z norelro', which resolves the above link failure.
Fixes: e6d1562dd4e9 ("riscv: vmlinux.lds.S: Explicitly handle '.got' section") Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202306192231.DJmWr6BX-lkp@intel.com/ Signed-off-by: Nathan Chancellor nathan@kernel.org Acked-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/riscv/Makefile +++ b/arch/riscv/Makefile @@ -7,7 +7,7 @@ #
OBJCOPYFLAGS := -O binary -LDFLAGS_vmlinux := +LDFLAGS_vmlinux := -z norelro ifeq ($(CONFIG_DYNAMIC_FTRACE),y) LDFLAGS_vmlinux := --no-relax KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
From: Matthieu Baerts matthieu.baerts@tessares.net
commit 0a85264e48b642d360720589fdb837a3643fb9b0 upstream.
mptcp_connect tool was printing some duplicated entries when showing how to use it: -j -l -r
While at it, I also:
- moved the very few entries that were not sorted,
- added -R that was missing since commit 8a4b910d005d ("mptcp: selftests: add rcvbuf set option"),
- removed the -u parameter that has been removed in commit f730b65c9d85 ("selftests: mptcp: try to set mptcp ulp mode in different sk states").
No need to backport this, it is just an internal tool used by our selftests. The help menu is mainly useful for MPTCP kernel devs.
Acked-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_connect.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.c +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c @@ -106,8 +106,8 @@ static struct cfg_sockopt_types cfg_sock static void die_usage(void) { fprintf(stderr, "Usage: mptcp_connect [-6] [-c cmsg] [-f offset] [-i file] [-I num] [-j] [-l] " - "[-m mode] [-M mark] [-o option] [-p port] [-P mode] [-j] [-l] [-r num] " - "[-s MPTCP|TCP] [-S num] [-r num] [-t num] [-T num] [-u] [-w sec] connect_address\n"); + "[-m mode] [-M mark] [-o option] [-p port] [-P mode] [-r num] [-R num] " + "[-s MPTCP|TCP] [-S num] [-t num] [-T num] [-w sec] connect_address\n"); fprintf(stderr, "\t-6 use ipv6\n"); fprintf(stderr, "\t-c cmsg -- test cmsg type <cmsg>\n"); fprintf(stderr, "\t-f offset -- stop the I/O after receiving and sending the specified amount " @@ -126,13 +126,13 @@ static void die_usage(void) fprintf(stderr, "\t-p num -- use port num\n"); fprintf(stderr, "\t-P [saveWithPeek|saveAfterPeek] -- save data with/after MSG_PEEK form tcp socket\n"); - fprintf(stderr, "\t-t num -- set poll timeout to num\n"); - fprintf(stderr, "\t-T num -- set expected runtime to num ms\n"); fprintf(stderr, "\t-r num -- enable slow mode, limiting each write to num bytes " "-- for remove addr tests\n"); fprintf(stderr, "\t-R num -- set SO_RCVBUF to num\n"); fprintf(stderr, "\t-s [MPTCP|TCP] -- use mptcp(default) or tcp sockets\n"); fprintf(stderr, "\t-S num -- set SO_SNDBUF to num\n"); + fprintf(stderr, "\t-t num -- set poll timeout to num\n"); + fprintf(stderr, "\t-T num -- set expected runtime to num ms\n"); fprintf(stderr, "\t-w num -- wait num sec before closing the socket\n"); exit(1); }
From: Matthieu Baerts matthieu.baerts@tessares.net
commit 0fcd72df8847d3a62eb34a084862157ce0564a94 upstream.
Most of the code had an issue according to ShellCheck.
That's mainly due to the fact it incorrectly believes most of the code was unreachable because it's invoked by variable name, see how the "tests" array is used.
Once SC2317 has been ignored, three small warnings were still visible:
- SC2155: Declare and assign separately to avoid masking return values.
- SC2046: Quote this to prevent word splitting: can be ignored because "ip netns pids" can display more than one pid.
- SC2166: Prefer [ p ] || [ q ] as [ p -o q ] is not well defined.
This probably didn't fix any actual issues but it might help spotting new interesting warnings reported by ShellCheck as just before, ShellCheck was reporting issues for most lines making it a bit useless.
Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -8,6 +8,10 @@
. "$(dirname "${0}")/mptcp_lib.sh"
+# ShellCheck incorrectly believes that most of the code here is unreachable +# because it's invoked by variable name, see how the "tests" array is used +#shellcheck disable=SC2317 + ret=0 sin="" sinfail="" @@ -377,8 +381,9 @@ check_transfer()
local line if [ -n "$bytes" ]; then + local out_size # when truncating we must check the size explicitly - local out_size=$(wc -c $out | awk '{print $1}') + out_size=$(wc -c $out | awk '{print $1}') if [ $out_size -ne $bytes ]; then echo "[ FAIL ] $what output file has wrong size ($out_size, $bytes)" fail_test @@ -513,6 +518,7 @@ kill_events_pids()
kill_tests_wait() { + #shellcheck disable=SC2046 kill -SIGUSR1 $(ip netns pids $ns2) $(ip netns pids $ns1) wait } @@ -1725,7 +1731,7 @@ chk_subflow_nr()
cnt1=$(ss -N $ns1 -tOni | grep -c token) cnt2=$(ss -N $ns2 -tOni | grep -c token) - if [ "$cnt1" != "$subflow_nr" -o "$cnt2" != "$subflow_nr" ]; then + if [ "$cnt1" != "$subflow_nr" ] || [ "$cnt2" != "$subflow_nr" ]; then echo "[fail] got $cnt1:$cnt2 subflows expected $subflow_nr" fail_test dump_stats=1
From: Matthieu Baerts matthieu.baerts@tessares.net
commit 673004821ab98c6645bd21af56a290854e88f533 upstream.
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features.
New functions are now available to easily detect if a certain feature is missing by looking at kallsyms.
These new helpers are going to be used in the following commits. In order to ease the backport of such future patches, it would be good if this patch is backported up to the introduction of MPTCP selftests, hence the Fixes tag below: this type of check was supposed to be done from the beginning.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 048d19d444be ("mptcp: add basic kselftest for mptcp") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/config | 1 tools/testing/selftests/net/mptcp/mptcp_lib.sh | 38 +++++++++++++++++++++++++ 2 files changed, 39 insertions(+)
--- a/tools/testing/selftests/net/mptcp/config +++ b/tools/testing/selftests/net/mptcp/config @@ -1,3 +1,4 @@ +CONFIG_KALLSYMS=y CONFIG_MPTCP=y CONFIG_IPV6=y CONFIG_MPTCP_IPV6=y --- a/tools/testing/selftests/net/mptcp/mptcp_lib.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_lib.sh @@ -38,3 +38,41 @@ mptcp_lib_check_mptcp() { exit ${KSFT_SKIP} fi } + +mptcp_lib_check_kallsyms() { + if ! mptcp_lib_has_file "/proc/kallsyms"; then + echo "SKIP: CONFIG_KALLSYMS is missing" + exit ${KSFT_SKIP} + fi +} + +# Internal: use mptcp_lib_kallsyms_has() instead +__mptcp_lib_kallsyms_has() { + local sym="${1}" + + mptcp_lib_check_kallsyms + + grep -q " ${sym}" /proc/kallsyms +} + +# $1: part of a symbol to look at, add '$' at the end for full name +mptcp_lib_kallsyms_has() { + local sym="${1}" + + if __mptcp_lib_kallsyms_has "${sym}"; then + return 0 + fi + + mptcp_lib_fail_if_expected_feature "${sym} symbol not found" +} + +# $1: part of a symbol to look at, add '$' at the end for full name +mptcp_lib_kallsyms_doesnt_have() { + local sym="${1}" + + if ! __mptcp_lib_kallsyms_has "${sym}"; then + return 0 + fi + + mptcp_lib_fail_if_expected_feature "${sym} symbol has been found" +}
From: Matthieu Baerts matthieu.baerts@tessares.net
commit 07bf49401909264a38fa3427c3cce43e8304436a upstream.
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features.
One of them is the support of IP(V6)_TRANSPARENT socket option with MPTCP connections introduced by commit c9406a23c116 ("mptcp: sockopt: add SOL_IP freebind & transparent options").
It is possible to look for "__ip_sock_set_tos" in kallsyms because IP(V6)_TRANSPARENT socket option support has been added after TOS support which came with the required infrastructure in MPTCP sockopt code. To support TOS, the following function has been exported (T). Not great but better than checking for a specific kernel version.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 5fb62e9cd3ad ("selftests: mptcp: add tproxy test case") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_connect.sh | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh @@ -144,6 +144,7 @@ cleanup() }
mptcp_lib_check_mptcp +mptcp_lib_check_kallsyms
ip -Version > /dev/null 2>&1 if [ $? -ne 0 ];then @@ -695,6 +696,15 @@ run_test_transparent() return 0 fi
+ # IP(V6)_TRANSPARENT has been added after TOS support which came with + # the required infrastructure in MPTCP sockopt code. To support TOS, the + # following function has been exported (T). Not great but better than + # checking for a specific kernel version. + if ! mptcp_lib_kallsyms_has "T __ip_sock_set_tos$"; then + echo "INFO: ${msg} not supported by the kernel: SKIP" + return + fi + ip netns exec "$listener_ns" nft -f /dev/stdin <<"EOF" flush ruleset table inet mangle {
From: Matthieu Baerts matthieu.baerts@tessares.net
commit 4ad39a42da2e9770c8e4c37fe632ed8898419129 upstream.
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features.
One of them is the full support of disconnections from the userspace introduced by commit b29fcfb54cd7 ("mptcp: full disconnect implementation").
It is possible to look for "mptcp_pm_data_reset" in kallsyms because a preparation patch added it to ease the introduction of the mentioned feature.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 05be5e273c84 ("selftests: mptcp: add disconnect tests") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_connect.sh | 5 +++++ 1 file changed, 5 insertions(+)
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh @@ -797,6 +797,11 @@ run_tests_disconnect() local old_cin=$cin local old_sin=$sin
+ if ! mptcp_lib_kallsyms_has "mptcp_pm_data_reset$"; then + echo "INFO: Full disconnect not supported: SKIP" + return + fi + cat $cin $cin $cin > "$cin".disconnect
# force do_transfer to cope with the multiple tranmissions
From: Matthieu Baerts matthieu.baerts@tessares.net
commit 06b03083158e90d57866fa220de92c8dd8b9598b upstream.
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features.
One of them is the support of TCP_FASTOPEN socket option with MPTCP connections introduced by commit 4ffb0a02346c ("mptcp: add TCP_FASTOPEN sock option").
It is possible to look for "mptcp_fastopen_" in kallsyms to know if the feature is supported or not.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: ca7ae8916043 ("selftests: mptcp: mptfo Initiator/Listener") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_connect.sh | 5 +++++ 1 file changed, 5 insertions(+)
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh @@ -777,6 +777,11 @@ run_tests_peekmode()
run_tests_mptfo() { + if ! mptcp_lib_kallsyms_has "mptcp_fastopen_"; then + echo "INFO: TFO not supported by the kernel: SKIP" + return + fi + echo "INFO: with MPTFO start" ip netns exec "$ns1" sysctl -q net.ipv4.tcp_fastopen=2 ip netns exec "$ns2" sysctl -q net.ipv4.tcp_fastopen=1
From: Matthieu Baerts matthieu.baerts@tessares.net
commit dc97251bf0b70549c76ba261516c01b8096771c5 upstream.
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features.
One of them is the listen diag dump support introduced by commit 4fa39b701ce9 ("mptcp: listen diag dump support").
It looks like there is no good pre-check to do here, i.e. dedicated function available in kallsyms. Instead, we try to get info if nothing is returned, the test is marked as skipped.
That's not ideal because something could be wrong with the feature and instead of reporting an error, the test could be marked as skipped. If we know in advanced that the feature is supposed to be supported, the tester can set SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var to 1: in this case the test will report an error instead of marking the test as skipped if nothing is returned.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: f2ae0fa68e28 ("selftests/mptcp: add diag listen tests") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/diag.sh | 42 ++++++++++++------------------ 1 file changed, 17 insertions(+), 25 deletions(-)
--- a/tools/testing/selftests/net/mptcp/diag.sh +++ b/tools/testing/selftests/net/mptcp/diag.sh @@ -55,16 +55,20 @@ __chk_nr() { local command="$1" local expected=$2 - local msg nr + local msg="$3" + local skip="${4:-SKIP}" + local nr
- shift 2 - msg=$* nr=$(eval $command)
printf "%-50s" "$msg" if [ $nr != $expected ]; then - echo "[ fail ] expected $expected found $nr" - ret=$test_cnt + if [ $nr = "$skip" ] && ! mptcp_lib_expect_all_features; then + echo "[ skip ] Feature probably not supported" + else + echo "[ fail ] expected $expected found $nr" + ret=$test_cnt + fi else echo "[ ok ]" fi @@ -76,12 +80,12 @@ __chk_msk_nr() local condition=$1 shift 1
- __chk_nr "ss -inmHMN $ns | $condition" $* + __chk_nr "ss -inmHMN $ns | $condition" "$@" }
chk_msk_nr() { - __chk_msk_nr "grep -c token:" $* + __chk_msk_nr "grep -c token:" "$@" }
wait_msk_nr() @@ -119,37 +123,26 @@ wait_msk_nr()
chk_msk_fallback_nr() { - __chk_msk_nr "grep -c fallback" $* + __chk_msk_nr "grep -c fallback" "$@" }
chk_msk_remote_key_nr() { - __chk_msk_nr "grep -c remote_key" $* + __chk_msk_nr "grep -c remote_key" "$@" }
__chk_listen() { local filter="$1" local expected=$2 + local msg="$3"
- shift 2 - msg=$* - - nr=$(ss -N $ns -Ml "$filter" | grep -c LISTEN) - printf "%-50s" "$msg" - - if [ $nr != $expected ]; then - echo "[ fail ] expected $expected found $nr" - ret=$test_cnt - else - echo "[ ok ]" - fi + __chk_nr "ss -N $ns -Ml '$filter' | grep -c LISTEN" "$expected" "$msg" 0 }
chk_msk_listen() { lport=$1 - local msg="check for listen socket"
# destination port search should always return empty list __chk_listen "dport $lport" 0 "listen match for dport $lport" @@ -167,10 +160,9 @@ chk_msk_listen() chk_msk_inuse() { local expected=$1 + local msg="$2" local listen_nr
- shift 1 - listen_nr=$(ss -N "${ns}" -Ml | grep -c LISTEN) expected=$((expected + listen_nr))
@@ -181,7 +173,7 @@ chk_msk_inuse() sleep 0.1 done
- __chk_nr get_msk_inuse $expected $* + __chk_nr get_msk_inuse $expected "$msg" }
# $1: ns, $2: port
From: Matthieu Baerts matthieu.baerts@tessares.net
commit dc93086aff040349b5b2a4608c71ea01286dc2cc upstream.
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features.
One of them is the reporting of the MPTCP sockets being used, introduced by commit c558246ee73e ("mptcp: add statistics for mptcp socket in use").
Similar to the parent commit, it looks like there is no good pre-check to do here, i.e. dedicated function available in kallsyms. Instead, we try to get info and if nothing is returned, the test is marked as skipped.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: e04a30f78809 ("selftest: mptcp: add test for mptcp socket in use") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/diag.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/mptcp/diag.sh b/tools/testing/selftests/net/mptcp/diag.sh index 4a6165389b74..fa9e09ad97d9 100755 --- a/tools/testing/selftests/net/mptcp/diag.sh +++ b/tools/testing/selftests/net/mptcp/diag.sh @@ -173,7 +173,7 @@ chk_msk_inuse() sleep 0.1 done
- __chk_nr get_msk_inuse $expected "$msg" + __chk_nr get_msk_inuse $expected "$msg" 0 }
# $1: ns, $2: port
From: Matthieu Baerts matthieu.baerts@tessares.net
commit 2177d0b08e421971e035672b70f3228d9485c650 upstream.
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features.
One of them is the checks of the default limits returned by the MPTCP in-kernel path-manager. The default values have been modified by commit 72bcbc46a5c3 ("mptcp: increase default max additional subflows to 2"). Instead of comparing with hardcoded values, we can get the default one and compare with them.
Note that if we expect to have the latest version, we continue to check the hardcoded values to avoid unexpected behaviour changes.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: eedbc685321b ("selftests: add PM netlink functional tests") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/pm_netlink.sh | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)
--- a/tools/testing/selftests/net/mptcp/pm_netlink.sh +++ b/tools/testing/selftests/net/mptcp/pm_netlink.sh @@ -73,8 +73,12 @@ check() }
check "ip netns exec $ns1 ./pm_nl_ctl dump" "" "defaults addr list" -check "ip netns exec $ns1 ./pm_nl_ctl limits" "accept 0 + +default_limits="$(ip netns exec $ns1 ./pm_nl_ctl limits)" +if mptcp_lib_expect_all_features; then + check "ip netns exec $ns1 ./pm_nl_ctl limits" "accept 0 subflows 2" "defaults limits" +fi
ip netns exec $ns1 ./pm_nl_ctl add 10.0.1.1 ip netns exec $ns1 ./pm_nl_ctl add 10.0.1.2 flags subflow dev lo @@ -121,12 +125,10 @@ ip netns exec $ns1 ./pm_nl_ctl flush check "ip netns exec $ns1 ./pm_nl_ctl dump" "" "flush addrs"
ip netns exec $ns1 ./pm_nl_ctl limits 9 1 -check "ip netns exec $ns1 ./pm_nl_ctl limits" "accept 0 -subflows 2" "rcv addrs above hard limit" +check "ip netns exec $ns1 ./pm_nl_ctl limits" "$default_limits" "rcv addrs above hard limit"
ip netns exec $ns1 ./pm_nl_ctl limits 1 9 -check "ip netns exec $ns1 ./pm_nl_ctl limits" "accept 0 -subflows 2" "subflows above hard limit" +check "ip netns exec $ns1 ./pm_nl_ctl limits" "$default_limits" "subflows above hard limit"
ip netns exec $ns1 ./pm_nl_ctl limits 8 8 check "ip netns exec $ns1 ./pm_nl_ctl limits" "accept 8
From: Matthieu Baerts matthieu.baerts@tessares.net
commit f3761b50b8e4cb4807b5d41e02144c8c8a0f2512 upstream.
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features.
One of them is the fullmesh flag that can be given to the MPTCP in-kernel path-manager and introduced in commit 2843ff6f36db ("mptcp: remote addresses fullmesh").
If the flag is not visible in the dump after having set it, we don't check the content. Note that if we expect to have this feature and SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var is set to 1, we always check the content to avoid regressions.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 6da1dfdd037e ("selftests: mptcp: add set_flags tests in pm_netlink.sh") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/pm_netlink.sh | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-)
--- a/tools/testing/selftests/net/mptcp/pm_netlink.sh +++ b/tools/testing/selftests/net/mptcp/pm_netlink.sh @@ -178,14 +178,19 @@ subflow,backup 10.0.1.1" "set flags (bac ip netns exec $ns1 ./pm_nl_ctl set 10.0.1.1 flags nobackup check "ip netns exec $ns1 ./pm_nl_ctl dump" "id 1 flags \ subflow 10.0.1.1" " (nobackup)" + +# fullmesh support has been added later ip netns exec $ns1 ./pm_nl_ctl set id 1 flags fullmesh -check "ip netns exec $ns1 ./pm_nl_ctl dump" "id 1 flags \ +if ip netns exec $ns1 ./pm_nl_ctl dump | grep -q "fullmesh" || + mptcp_lib_expect_all_features; then + check "ip netns exec $ns1 ./pm_nl_ctl dump" "id 1 flags \ subflow,fullmesh 10.0.1.1" " (fullmesh)" -ip netns exec $ns1 ./pm_nl_ctl set id 1 flags nofullmesh -check "ip netns exec $ns1 ./pm_nl_ctl dump" "id 1 flags \ + ip netns exec $ns1 ./pm_nl_ctl set id 1 flags nofullmesh + check "ip netns exec $ns1 ./pm_nl_ctl dump" "id 1 flags \ subflow 10.0.1.1" " (nofullmesh)" -ip netns exec $ns1 ./pm_nl_ctl set id 1 flags backup,fullmesh -check "ip netns exec $ns1 ./pm_nl_ctl dump" "id 1 flags \ + ip netns exec $ns1 ./pm_nl_ctl set id 1 flags backup,fullmesh + check "ip netns exec $ns1 ./pm_nl_ctl dump" "id 1 flags \ subflow,backup,fullmesh 10.0.1.1" " (backup,fullmesh)" +fi
exit $ret
From: Matthieu Baerts matthieu.baerts@tessares.net
commit 8dee6ca2ac1e5630a7bb6a98bc0b686916fc2000 upstream.
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features.
One of them is the getsockopt(SOL_MPTCP) to get info about the MPTCP connections introduced by commit 55c42fa7fa33 ("mptcp: add MPTCP_INFO getsockopt") and the following ones.
We cannot guess in advance which sizes the kernel will returned: older kernel can returned smaller sizes, e.g. recently the tcp_info structure has been modified in commit 71fc704768f6 ("tcp: add rcv_wnd and plb_rehash to TCP_INFO") where a new field has been added.
The userspace can also expect a smaller size if it is compiled with old uAPI kernel headers.
So for these sizes, we can only check if they are above a certain threshold, 0 for the moment. We can also only compared sizes with the ones set by the kernel.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: ce9979129a0b ("selftests: mptcp: add mptcp getsockopt test cases") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_sockopt.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_sockopt.c +++ b/tools/testing/selftests/net/mptcp/mptcp_sockopt.c @@ -87,6 +87,10 @@ struct so_state { uint64_t tcpi_rcv_delta; };
+#ifndef MIN +#define MIN(a, b) ((a) < (b) ? (a) : (b)) +#endif + static void die_perror(const char *msg) { perror(msg); @@ -349,13 +353,14 @@ static void do_getsockopt_tcp_info(struc xerror("getsockopt MPTCP_TCPINFO (tries %d, %m)");
assert(olen <= sizeof(ti)); - assert(ti.d.size_user == ti.d.size_kernel); - assert(ti.d.size_user == sizeof(struct tcp_info)); + assert(ti.d.size_kernel > 0); + assert(ti.d.size_user == + MIN(ti.d.size_kernel, sizeof(struct tcp_info))); assert(ti.d.num_subflows == 1);
assert(olen > (socklen_t)sizeof(struct mptcp_subflow_data)); olen -= sizeof(struct mptcp_subflow_data); - assert(olen == sizeof(struct tcp_info)); + assert(olen == ti.d.size_user);
if (ti.ti[0].tcpi_bytes_sent == w && ti.ti[0].tcpi_bytes_received == r) @@ -401,13 +406,14 @@ static void do_getsockopt_subflow_addrs( die_perror("getsockopt MPTCP_SUBFLOW_ADDRS");
assert(olen <= sizeof(addrs)); - assert(addrs.d.size_user == addrs.d.size_kernel); - assert(addrs.d.size_user == sizeof(struct mptcp_subflow_addrs)); + assert(addrs.d.size_kernel > 0); + assert(addrs.d.size_user == + MIN(addrs.d.size_kernel, sizeof(struct mptcp_subflow_addrs))); assert(addrs.d.num_subflows == 1);
assert(olen > (socklen_t)sizeof(struct mptcp_subflow_data)); olen -= sizeof(struct mptcp_subflow_data); - assert(olen == sizeof(struct mptcp_subflow_addrs)); + assert(olen == addrs.d.size_user);
llen = sizeof(local); ret = getsockname(fd, (struct sockaddr *)&local, &llen);
From: Matthieu Baerts matthieu.baerts@tessares.net
commit c6f7eccc519837ebde1d099d9610c4f1d5bd975e upstream.
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features.
One of them is the getsockopt(SOL_MPTCP) to get info about the MPTCP connections introduced by commit 55c42fa7fa33 ("mptcp: add MPTCP_INFO getsockopt") and the following ones.
It is possible to look for "mptcp_diag_fill_info" in kallsyms because it is introduced by the mentioned feature. So we can know in advance if the feature is supported and skip the sub-test if not.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: ce9979129a0b ("selftests: mptcp: add mptcp getsockopt test cases") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_sockopt.sh | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh @@ -87,6 +87,7 @@ cleanup() }
mptcp_lib_check_mptcp +mptcp_lib_check_kallsyms
ip -Version > /dev/null 2>&1 if [ $? -ne 0 ];then @@ -253,6 +254,11 @@ do_mptcp_sockopt_tests() { local lret=0
+ if ! mptcp_lib_kallsyms_has "mptcp_diag_fill_info$"; then + echo "INFO: MPTCP sockopt not supported: SKIP" + return + fi + ip netns exec "$ns_sbox" ./mptcp_sockopt lret=$?
From: Matthieu Baerts matthieu.baerts@tessares.net
commit b631e3a4e94c77c9007d60b577a069c203ce9594 upstream.
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features.
One of them is TCP_INQ cmsg support introduced in commit 2c9e77659a0c ("mptcp: add TCP_INQ cmsg support").
It is possible to look for "mptcp_ioctl" in kallsyms because it was needed to introduce the mentioned feature. We can skip these tests and not set TCPINQ option if the feature is not supported.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 5cbd886ce2a9 ("selftests: mptcp: add TCP_INQ support") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_sockopt.sh | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh @@ -187,9 +187,14 @@ do_transfer() local_addr="0.0.0.0" fi
+ cmsg="TIMESTAMPNS" + if mptcp_lib_kallsyms_has "mptcp_ioctl$"; then + cmsg+=",TCPINQ" + fi + timeout ${timeout_test} \ ip netns exec ${listener_ns} \ - $mptcp_connect -t ${timeout_poll} -l -M 1 -p $port -s ${srv_proto} -c TIMESTAMPNS,TCPINQ \ + $mptcp_connect -t ${timeout_poll} -l -M 1 -p $port -s ${srv_proto} -c "${cmsg}" \ ${local_addr} < "$sin" > "$sout" & local spid=$!
@@ -197,7 +202,7 @@ do_transfer()
timeout ${timeout_test} \ ip netns exec ${connector_ns} \ - $mptcp_connect -t ${timeout_poll} -M 2 -p $port -s ${cl_proto} -c TIMESTAMPNS,TCPINQ \ + $mptcp_connect -t ${timeout_poll} -M 2 -p $port -s ${cl_proto} -c "${cmsg}" \ $connect_addr < "$cin" > "$cout" &
local cpid=$! @@ -313,6 +318,11 @@ do_tcpinq_tests() { local lret=0
+ if ! mptcp_lib_kallsyms_has "mptcp_ioctl$"; then + echo "INFO: TCP_INQ not supported: SKIP" + return + fi + local args for args in "-t tcp" "-r tcp"; do do_tcpinq_test $args
From: Matthieu Baerts matthieu.baerts@tessares.net
commit 723d6b9b12338c1caf06bf6fe269962ef04e2c71 upstream.
When a required tool is missing, the return code 4 (SKIP) should be returned instead of 1 (FAIL).
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 259a834fadda ("selftests: mptcp: functional tests for the userspace PM type") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/userspace_pm.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/testing/selftests/net/mptcp/userspace_pm.sh +++ b/tools/testing/selftests/net/mptcp/userspace_pm.sh @@ -8,7 +8,7 @@ mptcp_lib_check_mptcp ip -Version > /dev/null 2>&1 if [ $? -ne 0 ];then echo "SKIP: Cannot not run test without ip tool" - exit 1 + exit ${KSFT_SKIP} fi
ANNOUNCED=6 # MPTCP_EVENT_ANNOUNCED
From: Matthieu Baerts matthieu.baerts@tessares.net
commit f90adb033891d418c5dafef34a9aa49f3c860991 upstream.
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features.
One of them is the MPTCP Userspace PM introduced by commit 4638de5aefe5 ("mptcp: handle local addrs announced by userspace PMs").
We can skip all these tests if the feature is not supported simply by looking for the MPTCP pm_type's sysctl knob.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 259a834fadda ("selftests: mptcp: functional tests for the userspace PM type") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/userspace_pm.sh | 5 +++++ 1 file changed, 5 insertions(+)
--- a/tools/testing/selftests/net/mptcp/userspace_pm.sh +++ b/tools/testing/selftests/net/mptcp/userspace_pm.sh @@ -5,6 +5,11 @@
mptcp_lib_check_mptcp
+if ! mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then + echo "userspace pm tests are not supported by the kernel: SKIP" + exit ${KSFT_SKIP} +fi + ip -Version > /dev/null 2>&1 if [ $? -ne 0 ];then echo "SKIP: Cannot not run test without ip tool"
From: Matthieu Baerts matthieu.baerts@tessares.net
commit 626cb7a5f6b892e48f27a76d11af040c538e03dc upstream.
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features.
One of them is the new listener events linked to the path-manager introduced by commit f8c9dfbd875b ("mptcp: add pm listener events").
It is possible to look for "mptcp_event_pm_listener" in kallsyms to know in advance if the kernel supports this feature and skip these sub-tests if the feature is not supported.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 6c73008aa301 ("selftests: mptcp: listener test for userspace PM") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/userspace_pm.sh | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/tools/testing/selftests/net/mptcp/userspace_pm.sh b/tools/testing/selftests/net/mptcp/userspace_pm.sh index 38a1d34f7b4d..98d9e4d2d3fc 100755 --- a/tools/testing/selftests/net/mptcp/userspace_pm.sh +++ b/tools/testing/selftests/net/mptcp/userspace_pm.sh @@ -4,6 +4,7 @@ . "$(dirname "${0}")/mptcp_lib.sh"
mptcp_lib_check_mptcp +mptcp_lib_check_kallsyms
if ! mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then echo "userspace pm tests are not supported by the kernel: SKIP" @@ -914,6 +915,11 @@ test_listener() { print_title "Listener tests"
+ if ! mptcp_lib_kallsyms_has "mptcp_event_pm_listener$"; then + stdbuf -o0 -e0 printf "LISTENER events \t[SKIP] Not supported\n" + return + fi + # Capture events on the network namespace running the client :>$client_evts
From: Matthieu Baerts matthieu.baerts@tessares.net
commit b1a6a38ab8a633546cefae890da842f19e006c74 upstream.
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features.
A new function is now available to easily detect if a feature is missing by looking at the kernel version. That's clearly not ideal and this kind of check should be avoided as soon as possible. But sometimes, there are no external sign that a "feature" is available or not: internal behaviours can change without modifying the uAPI and these selftests are verifying the internal behaviours. Sometimes, the only (easy) way to verify if the feature is present is to run the test but then the validation cannot determine if there is a failure with the feature or if the feature is missing. Then it looks better to check the kernel version instead of having tests that can never fail. In any case, we need a solution not to have a whole selftest being marked as failed just because one sub-test has failed.
Note that this env var car be set to 1 not to do such check and run the linked sub-test: SELFTESTS_MPTCP_LIB_NO_KVERSION_CHECK.
This new helper is going to be used in the following commits. In order to ease the backport of such future patches, it would be good if this patch is backported up to the introduction of MPTCP selftests, hence the Fixes tag below: this type of check was supposed to be done from the beginning.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 048d19d444be ("mptcp: add basic kselftest for mptcp") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_lib.sh | 26 +++++++++++++++++++++++++ 1 file changed, 26 insertions(+)
--- a/tools/testing/selftests/net/mptcp/mptcp_lib.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_lib.sh @@ -76,3 +76,29 @@ mptcp_lib_kallsyms_doesnt_have() {
mptcp_lib_fail_if_expected_feature "${sym} symbol has been found" } + +# !!!AVOID USING THIS!!! +# Features might not land in the expected version and features can be backported +# +# $1: kernel version, e.g. 6.3 +mptcp_lib_kversion_ge() { + local exp_maj="${1%.*}" + local exp_min="${1#*.}" + local v maj min + + # If the kernel has backported features, set this env var to 1: + if [ "${SELFTESTS_MPTCP_LIB_NO_KVERSION_CHECK:-}" = "1" ]; then + return 0 + fi + + v=$(uname -r | cut -d'.' -f1,2) + maj=${v%.*} + min=${v#*.} + + if [ "${maj}" -gt "${exp_maj}" ] || + { [ "${maj}" -eq "${exp_maj}" ] && [ "${min}" -ge "${exp_min}" ]; }; then + return 0 + fi + + mptcp_lib_fail_if_expected_feature "kernel version ${1} lower than ${v}" +}
From: Matthieu Baerts matthieu.baerts@tessares.net
commit 0c4cd3f86a40028845ad6f8af5b37165666404cd upstream.
IPTables commands using 'iptables-nft' fail on old kernels, at least 5.15 because it doesn't see the default IPTables chains:
$ iptables -L iptables/1.8.2 Failed to initialize nft: Protocol not supported
As a first step before switching to NFTables, we can use iptables-legacy if available.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 8d014eaa9254 ("selftests: mptcp: add ADD_ADDR timeout test case") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -25,6 +25,8 @@ capout="" ns1="" ns2="" ksft_skip=4 +iptables="iptables" +ip6tables="ip6tables" timeout_poll=30 timeout_test=$((timeout_poll * 2 + 1)) capture=0 @@ -146,7 +148,11 @@ check_tools() exit $ksft_skip fi
- if ! iptables -V &> /dev/null; then + # Use the legacy version if available to support old kernel versions + if iptables-legacy -V &> /dev/null; then + iptables="iptables-legacy" + ip6tables="ip6tables-legacy" + elif ! iptables -V &> /dev/null; then echo "SKIP: Could not run all tests without iptables tool" exit $ksft_skip fi @@ -247,9 +253,9 @@ reset_with_add_addr_timeout()
reset "${1}" || return 1
- tables="iptables" + tables="${iptables}" if [ $ip -eq 6 ]; then - tables="ip6tables" + tables="${ip6tables}" fi
ip netns exec $ns1 sysctl -q net.mptcp.add_addr_timeout=1 @@ -314,9 +320,9 @@ reset_with_fail() local ip="${3:-4}" local tables
- tables="iptables" + tables="${iptables}" if [ $ip -eq 6 ]; then - tables="ip6tables" + tables="${ip6tables}" fi
ip netns exec $ns2 $tables \ @@ -704,7 +710,7 @@ filter_tcp_from() local src="${2}" local target="${3}"
- ip netns exec "${ns}" iptables -A INPUT -s "${src}" -p tcp -j "${target}" + ip netns exec "${ns}" ${iptables} -A INPUT -s "${src}" -p tcp -j "${target}" }
do_transfer()
From: Matthieu Baerts matthieu.baerts@tessares.net
commit cdb50525345cf5a8359ee391032ef606a7826f08 upstream.
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features.
Here are some helpers that will be used to mark subtests as skipped if a feature is not supported. Marking as a fix for the commit introducing this selftest to help with the backports.
While at it, also check if kallsyms feature is available as it will also be used in the following commits to check if MPTCP features are available before starting a test.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: b08fbf241064 ("selftests: add test-cases for MPTCP MP_JOIN") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 27 ++++++++++++++++++++++++ 1 file changed, 27 insertions(+)
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -142,6 +142,7 @@ cleanup_partial() check_tools() { mptcp_lib_check_mptcp + mptcp_lib_check_kallsyms
if ! ip -Version &> /dev/null; then echo "SKIP: Could not run test without ip tool" @@ -191,6 +192,32 @@ cleanup() cleanup_partial }
+# $1: msg +print_title() +{ + printf "%03u %-36s %s" "${TEST_COUNT}" "${TEST_NAME}" "${1}" +} + +# [ $1: fail msg ] +mark_as_skipped() +{ + local msg="${1:-"Feature not supported"}" + + mptcp_lib_fail_if_expected_feature "${msg}" + + print_title "[ skip ] ${msg}" + printf "\n" +} + +# $@: condition +continue_if() +{ + if ! "${@}"; then + mark_as_skipped + return 1 + fi +} + skip_test() { if [ "${#only_tests_ids[@]}" -eq 0 ] && [ "${#only_tests_names[@]}" -eq 0 ]; then
From: Matthieu Baerts matthieu.baerts@tessares.net
commit 47867f0a7e831e24e5eab3330667ce9682d50fb1 upstream.
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features.
One of them is the MPTCP MIB counters introduced in commit fc518953bc9c ("mptcp: add and use MIB counter infrastructure") and more later. The MPTCP Join selftest heavily relies on these counters.
If a counter is not supported by the kernel, it is not displayed when using 'nstat -z'. We can then detect that and skip the verification. A new helper (get_counter()) has been added to do the required checks and return an error if the counter is not available.
Note that if we expect to have these features available and if SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var is set to 1, the tests will be marked as failed instead of skipped.
This new helper also makes sure we get the exact counter we want to avoid issues we had in the past, e.g. with MPTcpExtRmAddr and MPTcpExtRmAddrDrop sharing the same prefix. While at it, we uniform the way we fetch a MIB counter.
Note for the backports: we rarely change these modified blocks so if there is are conflicts, it is very likely because a counter is not used in the older kernels and we don't need that chunk.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: b08fbf241064 ("selftests: add test-cases for MPTCP MP_JOIN") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 230 +++++++++++++----------- 1 file changed, 130 insertions(+), 100 deletions(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -500,11 +500,25 @@ wait_local_port_listen() done }
-rm_addr_count() +# $1: ns ; $2: counter +get_counter() { - local ns=${1} + local ns="${1}" + local counter="${2}" + local count + + count=$(ip netns exec ${ns} nstat -asz "${counter}" | awk 'NR==1 {next} {print $2}') + if [ -z "${count}" ]; then + mptcp_lib_fail_if_expected_feature "${counter} counter" + return 1 + fi
- ip netns exec ${ns} nstat -as | grep MPTcpExtRmAddr | awk '{print $2}' + echo "${count}" +} + +rm_addr_count() +{ + get_counter "${1}" "MPTcpExtRmAddr" }
# $1: ns, $2: old rm_addr counter in $ns @@ -527,11 +541,11 @@ wait_mpj() local ns="${1}" local cnt old_cnt
- old_cnt=$(ip netns exec ${ns} nstat -as | grep MPJoinAckRx | awk '{print $2}') + old_cnt=$(get_counter ${ns} "MPTcpExtMPJoinAckRx")
local i for i in $(seq 10); do - cnt=$(ip netns exec ${ns} nstat -as | grep MPJoinAckRx | awk '{print $2}') + cnt=$(get_counter ${ns} "MPTcpExtMPJoinAckRx") [ "$cnt" = "${old_cnt}" ] || break sleep 0.1 done @@ -1190,12 +1204,13 @@ chk_csum_nr() fi
printf "%-${nr_blank}s %s" " " "sum" - count=$(ip netns exec $ns1 nstat -as | grep MPTcpExtDataCsumErr | awk '{print $2}') - [ -z "$count" ] && count=0 + count=$(get_counter ${ns1} "MPTcpExtDataCsumErr") if [ "$count" != "$csum_ns1" ]; then extra_msg="$extra_msg ns1=$count" fi - if { [ "$count" != $csum_ns1 ] && [ $allow_multi_errors_ns1 -eq 0 ]; } || + if [ -z "$count" ]; then + echo -n "[skip]" + elif { [ "$count" != $csum_ns1 ] && [ $allow_multi_errors_ns1 -eq 0 ]; } || { [ "$count" -lt $csum_ns1 ] && [ $allow_multi_errors_ns1 -eq 1 ]; }; then echo "[fail] got $count data checksum error[s] expected $csum_ns1" fail_test @@ -1204,12 +1219,13 @@ chk_csum_nr() echo -n "[ ok ]" fi echo -n " - csum " - count=$(ip netns exec $ns2 nstat -as | grep MPTcpExtDataCsumErr | awk '{print $2}') - [ -z "$count" ] && count=0 + count=$(get_counter ${ns2} "MPTcpExtDataCsumErr") if [ "$count" != "$csum_ns2" ]; then extra_msg="$extra_msg ns2=$count" fi - if { [ "$count" != $csum_ns2 ] && [ $allow_multi_errors_ns2 -eq 0 ]; } || + if [ -z "$count" ]; then + echo -n "[skip]" + elif { [ "$count" != $csum_ns2 ] && [ $allow_multi_errors_ns2 -eq 0 ]; } || { [ "$count" -lt $csum_ns2 ] && [ $allow_multi_errors_ns2 -eq 1 ]; }; then echo "[fail] got $count data checksum error[s] expected $csum_ns2" fail_test @@ -1251,12 +1267,13 @@ chk_fail_nr() fi
printf "%-${nr_blank}s %s" " " "ftx" - count=$(ip netns exec $ns_tx nstat -as | grep MPTcpExtMPFailTx | awk '{print $2}') - [ -z "$count" ] && count=0 + count=$(get_counter ${ns_tx} "MPTcpExtMPFailTx") if [ "$count" != "$fail_tx" ]; then extra_msg="$extra_msg,tx=$count" fi - if { [ "$count" != "$fail_tx" ] && [ $allow_tx_lost -eq 0 ]; } || + if [ -z "$count" ]; then + echo -n "[skip]" + elif { [ "$count" != "$fail_tx" ] && [ $allow_tx_lost -eq 0 ]; } || { [ "$count" -gt "$fail_tx" ] && [ $allow_tx_lost -eq 1 ]; }; then echo "[fail] got $count MP_FAIL[s] TX expected $fail_tx" fail_test @@ -1266,12 +1283,13 @@ chk_fail_nr() fi
echo -n " - failrx" - count=$(ip netns exec $ns_rx nstat -as | grep MPTcpExtMPFailRx | awk '{print $2}') - [ -z "$count" ] && count=0 + count=$(get_counter ${ns_rx} "MPTcpExtMPFailRx") if [ "$count" != "$fail_rx" ]; then extra_msg="$extra_msg,rx=$count" fi - if { [ "$count" != "$fail_rx" ] && [ $allow_rx_lost -eq 0 ]; } || + if [ -z "$count" ]; then + echo -n "[skip]" + elif { [ "$count" != "$fail_rx" ] && [ $allow_rx_lost -eq 0 ]; } || { [ "$count" -gt "$fail_rx" ] && [ $allow_rx_lost -eq 1 ]; }; then echo "[fail] got $count MP_FAIL[s] RX expected $fail_rx" fail_test @@ -1303,10 +1321,11 @@ chk_fclose_nr() fi
printf "%-${nr_blank}s %s" " " "ctx" - count=$(ip netns exec $ns_tx nstat -as | grep MPTcpExtMPFastcloseTx | awk '{print $2}') - [ -z "$count" ] && count=0 - [ "$count" != "$fclose_tx" ] && extra_msg="$extra_msg,tx=$count" - if [ "$count" != "$fclose_tx" ]; then + count=$(get_counter ${ns_tx} "MPTcpExtMPFastcloseTx") + if [ -z "$count" ]; then + echo -n "[skip]" + elif [ "$count" != "$fclose_tx" ]; then + extra_msg="$extra_msg,tx=$count" echo "[fail] got $count MP_FASTCLOSE[s] TX expected $fclose_tx" fail_test dump_stats=1 @@ -1315,10 +1334,11 @@ chk_fclose_nr() fi
echo -n " - fclzrx" - count=$(ip netns exec $ns_rx nstat -as | grep MPTcpExtMPFastcloseRx | awk '{print $2}') - [ -z "$count" ] && count=0 - [ "$count" != "$fclose_rx" ] && extra_msg="$extra_msg,rx=$count" - if [ "$count" != "$fclose_rx" ]; then + count=$(get_counter ${ns_rx} "MPTcpExtMPFastcloseRx") + if [ -z "$count" ]; then + echo -n "[skip]" + elif [ "$count" != "$fclose_rx" ]; then + extra_msg="$extra_msg,rx=$count" echo "[fail] got $count MP_FASTCLOSE[s] RX expected $fclose_rx" fail_test dump_stats=1 @@ -1349,9 +1369,10 @@ chk_rst_nr() fi
printf "%-${nr_blank}s %s" " " "rtx" - count=$(ip netns exec $ns_tx nstat -as | grep MPTcpExtMPRstTx | awk '{print $2}') - [ -z "$count" ] && count=0 - if [ $count -lt $rst_tx ]; then + count=$(get_counter ${ns_tx} "MPTcpExtMPRstTx") + if [ -z "$count" ]; then + echo -n "[skip]" + elif [ $count -lt $rst_tx ]; then echo "[fail] got $count MP_RST[s] TX expected $rst_tx" fail_test dump_stats=1 @@ -1360,9 +1381,10 @@ chk_rst_nr() fi
echo -n " - rstrx " - count=$(ip netns exec $ns_rx nstat -as | grep MPTcpExtMPRstRx | awk '{print $2}') - [ -z "$count" ] && count=0 - if [ "$count" -lt "$rst_rx" ]; then + count=$(get_counter ${ns_rx} "MPTcpExtMPRstRx") + if [ -z "$count" ]; then + echo -n "[skip]" + elif [ "$count" -lt "$rst_rx" ]; then echo "[fail] got $count MP_RST[s] RX expected $rst_rx" fail_test dump_stats=1 @@ -1383,9 +1405,10 @@ chk_infi_nr() local dump_stats
printf "%-${nr_blank}s %s" " " "itx" - count=$(ip netns exec $ns2 nstat -as | grep InfiniteMapTx | awk '{print $2}') - [ -z "$count" ] && count=0 - if [ "$count" != "$infi_tx" ]; then + count=$(get_counter ${ns2} "MPTcpExtInfiniteMapTx") + if [ -z "$count" ]; then + echo -n "[skip]" + elif [ "$count" != "$infi_tx" ]; then echo "[fail] got $count infinite map[s] TX expected $infi_tx" fail_test dump_stats=1 @@ -1394,9 +1417,10 @@ chk_infi_nr() fi
echo -n " - infirx" - count=$(ip netns exec $ns1 nstat -as | grep InfiniteMapRx | awk '{print $2}') - [ -z "$count" ] && count=0 - if [ "$count" != "$infi_rx" ]; then + count=$(get_counter ${ns1} "MPTcpExtInfiniteMapRx") + if [ -z "$count" ]; then + echo "[skip]" + elif [ "$count" != "$infi_rx" ]; then echo "[fail] got $count infinite map[s] RX expected $infi_rx" fail_test dump_stats=1 @@ -1428,9 +1452,10 @@ chk_join_nr() fi
printf "%03u %-36s %s" "${TEST_COUNT}" "${title}" "syn" - count=$(ip netns exec $ns1 nstat -as | grep MPTcpExtMPJoinSynRx | awk '{print $2}') - [ -z "$count" ] && count=0 - if [ "$count" != "$syn_nr" ]; then + count=$(get_counter ${ns1} "MPTcpExtMPJoinSynRx") + if [ -z "$count" ]; then + echo -n "[skip]" + elif [ "$count" != "$syn_nr" ]; then echo "[fail] got $count JOIN[s] syn expected $syn_nr" fail_test dump_stats=1 @@ -1440,9 +1465,10 @@ chk_join_nr()
echo -n " - synack" with_cookie=$(ip netns exec $ns2 sysctl -n net.ipv4.tcp_syncookies) - count=$(ip netns exec $ns2 nstat -as | grep MPTcpExtMPJoinSynAckRx | awk '{print $2}') - [ -z "$count" ] && count=0 - if [ "$count" != "$syn_ack_nr" ]; then + count=$(get_counter ${ns2} "MPTcpExtMPJoinSynAckRx") + if [ -z "$count" ]; then + echo -n "[skip]" + elif [ "$count" != "$syn_ack_nr" ]; then # simult connections exceeding the limit with cookie enabled could go up to # synack validation as the conn limit can be enforced reliably only after # the subflow creation @@ -1458,9 +1484,10 @@ chk_join_nr() fi
echo -n " - ack" - count=$(ip netns exec $ns1 nstat -as | grep MPTcpExtMPJoinAckRx | awk '{print $2}') - [ -z "$count" ] && count=0 - if [ "$count" != "$ack_nr" ]; then + count=$(get_counter ${ns1} "MPTcpExtMPJoinAckRx") + if [ -z "$count" ]; then + echo "[skip]" + elif [ "$count" != "$ack_nr" ]; then echo "[fail] got $count JOIN[s] ack expected $ack_nr" fail_test dump_stats=1 @@ -1492,12 +1519,12 @@ chk_stale_nr() local recover_nr
printf "%-${nr_blank}s %-18s" " " "stale" - stale_nr=$(ip netns exec $ns nstat -as | grep MPTcpExtSubflowStale | awk '{print $2}') - [ -z "$stale_nr" ] && stale_nr=0 - recover_nr=$(ip netns exec $ns nstat -as | grep MPTcpExtSubflowRecover | awk '{print $2}') - [ -z "$recover_nr" ] && recover_nr=0
- if [ $stale_nr -lt $stale_min ] || + stale_nr=$(get_counter ${ns} "MPTcpExtSubflowStale") + recover_nr=$(get_counter ${ns} "MPTcpExtSubflowRecover") + if [ -z "$stale_nr" ] || [ -z "$recover_nr" ]; then + echo "[skip]" + elif [ $stale_nr -lt $stale_min ] || { [ $stale_max -gt 0 ] && [ $stale_nr -gt $stale_max ]; } || [ $((stale_nr - recover_nr)) -ne $stale_delta ]; then echo "[fail] got $stale_nr stale[s] $recover_nr recover[s], " \ @@ -1533,12 +1560,12 @@ chk_add_nr() timeout=$(ip netns exec $ns1 sysctl -n net.mptcp.add_addr_timeout)
printf "%-${nr_blank}s %s" " " "add" - count=$(ip netns exec $ns2 nstat -as MPTcpExtAddAddr | grep MPTcpExtAddAddr | awk '{print $2}') - [ -z "$count" ] && count=0 - + count=$(get_counter ${ns2} "MPTcpExtAddAddr") + if [ -z "$count" ]; then + echo -n "[skip]" # if the test configured a short timeout tolerate greater then expected # add addrs options, due to retransmissions - if [ "$count" != "$add_nr" ] && { [ "$timeout" -gt 1 ] || [ "$count" -lt "$add_nr" ]; }; then + elif [ "$count" != "$add_nr" ] && { [ "$timeout" -gt 1 ] || [ "$count" -lt "$add_nr" ]; }; then echo "[fail] got $count ADD_ADDR[s] expected $add_nr" fail_test dump_stats=1 @@ -1547,9 +1574,10 @@ chk_add_nr() fi
echo -n " - echo " - count=$(ip netns exec $ns1 nstat -as | grep MPTcpExtEchoAdd | awk '{print $2}') - [ -z "$count" ] && count=0 - if [ "$count" != "$echo_nr" ]; then + count=$(get_counter ${ns1} "MPTcpExtEchoAdd") + if [ -z "$count" ]; then + echo -n "[skip]" + elif [ "$count" != "$echo_nr" ]; then echo "[fail] got $count ADD_ADDR echo[s] expected $echo_nr" fail_test dump_stats=1 @@ -1559,9 +1587,10 @@ chk_add_nr()
if [ $port_nr -gt 0 ]; then echo -n " - pt " - count=$(ip netns exec $ns2 nstat -as | grep MPTcpExtPortAdd | awk '{print $2}') - [ -z "$count" ] && count=0 - if [ "$count" != "$port_nr" ]; then + count=$(get_counter ${ns2} "MPTcpExtPortAdd") + if [ -z "$count" ]; then + echo "[skip]" + elif [ "$count" != "$port_nr" ]; then echo "[fail] got $count ADD_ADDR[s] with a port-number expected $port_nr" fail_test dump_stats=1 @@ -1570,10 +1599,10 @@ chk_add_nr() fi
printf "%-${nr_blank}s %s" " " "syn" - count=$(ip netns exec $ns1 nstat -as | grep MPTcpExtMPJoinPortSynRx | - awk '{print $2}') - [ -z "$count" ] && count=0 - if [ "$count" != "$syn_nr" ]; then + count=$(get_counter ${ns1} "MPTcpExtMPJoinPortSynRx") + if [ -z "$count" ]; then + echo -n "[skip]" + elif [ "$count" != "$syn_nr" ]; then echo "[fail] got $count JOIN[s] syn with a different \ port-number expected $syn_nr" fail_test @@ -1583,10 +1612,10 @@ chk_add_nr() fi
echo -n " - synack" - count=$(ip netns exec $ns2 nstat -as | grep MPTcpExtMPJoinPortSynAckRx | - awk '{print $2}') - [ -z "$count" ] && count=0 - if [ "$count" != "$syn_ack_nr" ]; then + count=$(get_counter ${ns2} "MPTcpExtMPJoinPortSynAckRx") + if [ -z "$count" ]; then + echo -n "[skip]" + elif [ "$count" != "$syn_ack_nr" ]; then echo "[fail] got $count JOIN[s] synack with a different \ port-number expected $syn_ack_nr" fail_test @@ -1596,10 +1625,10 @@ chk_add_nr() fi
echo -n " - ack" - count=$(ip netns exec $ns1 nstat -as | grep MPTcpExtMPJoinPortAckRx | - awk '{print $2}') - [ -z "$count" ] && count=0 - if [ "$count" != "$ack_nr" ]; then + count=$(get_counter ${ns1} "MPTcpExtMPJoinPortAckRx") + if [ -z "$count" ]; then + echo "[skip]" + elif [ "$count" != "$ack_nr" ]; then echo "[fail] got $count JOIN[s] ack with a different \ port-number expected $ack_nr" fail_test @@ -1609,10 +1638,10 @@ chk_add_nr() fi
printf "%-${nr_blank}s %s" " " "syn" - count=$(ip netns exec $ns1 nstat -as | grep MPTcpExtMismatchPortSynRx | - awk '{print $2}') - [ -z "$count" ] && count=0 - if [ "$count" != "$mis_syn_nr" ]; then + count=$(get_counter ${ns1} "MPTcpExtMismatchPortSynRx") + if [ -z "$count" ]; then + echo -n "[skip]" + elif [ "$count" != "$mis_syn_nr" ]; then echo "[fail] got $count JOIN[s] syn with a mismatched \ port-number expected $mis_syn_nr" fail_test @@ -1622,10 +1651,10 @@ chk_add_nr() fi
echo -n " - ack " - count=$(ip netns exec $ns1 nstat -as | grep MPTcpExtMismatchPortAckRx | - awk '{print $2}') - [ -z "$count" ] && count=0 - if [ "$count" != "$mis_ack_nr" ]; then + count=$(get_counter ${ns1} "MPTcpExtMismatchPortAckRx") + if [ -z "$count" ]; then + echo "[skip]" + elif [ "$count" != "$mis_ack_nr" ]; then echo "[fail] got $count JOIN[s] ack with a mismatched \ port-number expected $mis_ack_nr" fail_test @@ -1669,9 +1698,10 @@ chk_rm_nr() fi
printf "%-${nr_blank}s %s" " " "rm " - count=$(ip netns exec $addr_ns nstat -as | grep MPTcpExtRmAddr | awk '{print $2}') - [ -z "$count" ] && count=0 - if [ "$count" != "$rm_addr_nr" ]; then + count=$(get_counter ${addr_ns} "MPTcpExtRmAddr") + if [ -z "$count" ]; then + echo -n "[skip]" + elif [ "$count" != "$rm_addr_nr" ]; then echo "[fail] got $count RM_ADDR[s] expected $rm_addr_nr" fail_test dump_stats=1 @@ -1680,29 +1710,27 @@ chk_rm_nr() fi
echo -n " - rmsf " - count=$(ip netns exec $subflow_ns nstat -as | grep MPTcpExtRmSubflow | awk '{print $2}') - [ -z "$count" ] && count=0 - if [ -n "$simult" ]; then + count=$(get_counter ${subflow_ns} "MPTcpExtRmSubflow") + if [ -z "$count" ]; then + echo -n "[skip]" + elif [ -n "$simult" ]; then local cnt suffix
- cnt=$(ip netns exec $addr_ns nstat -as | grep MPTcpExtRmSubflow | awk '{print $2}') + cnt=$(get_counter ${addr_ns} "MPTcpExtRmSubflow")
# in case of simult flush, the subflow removal count on each side is # unreliable - [ -z "$cnt" ] && cnt=0 count=$((count + cnt)) [ "$count" != "$rm_subflow_nr" ] && suffix="$count in [$rm_subflow_nr:$((rm_subflow_nr*2))]" if [ $count -ge "$rm_subflow_nr" ] && \ [ "$count" -le "$((rm_subflow_nr *2 ))" ]; then - echo "[ ok ] $suffix" + echo -n "[ ok ] $suffix" else echo "[fail] got $count RM_SUBFLOW[s] expected in range [$rm_subflow_nr:$((rm_subflow_nr*2))]" fail_test dump_stats=1 fi - return - fi - if [ "$count" != "$rm_subflow_nr" ]; then + elif [ "$count" != "$rm_subflow_nr" ]; then echo "[fail] got $count RM_SUBFLOW[s] expected $rm_subflow_nr" fail_test dump_stats=1 @@ -1723,9 +1751,10 @@ chk_prio_nr() local dump_stats
printf "%-${nr_blank}s %s" " " "ptx" - count=$(ip netns exec $ns1 nstat -as | grep MPTcpExtMPPrioTx | awk '{print $2}') - [ -z "$count" ] && count=0 - if [ "$count" != "$mp_prio_nr_tx" ]; then + count=$(get_counter ${ns1} "MPTcpExtMPPrioTx") + if [ -z "$count" ]; then + echo -n "[skip]" + elif [ "$count" != "$mp_prio_nr_tx" ]; then echo "[fail] got $count MP_PRIO[s] TX expected $mp_prio_nr_tx" fail_test dump_stats=1 @@ -1734,9 +1763,10 @@ chk_prio_nr() fi
echo -n " - prx " - count=$(ip netns exec $ns1 nstat -as | grep MPTcpExtMPPrioRx | awk '{print $2}') - [ -z "$count" ] && count=0 - if [ "$count" != "$mp_prio_nr_rx" ]; then + count=$(get_counter ${ns1} "MPTcpExtMPPrioRx") + if [ -z "$count" ]; then + echo "[skip]" + elif [ "$count" != "$mp_prio_nr_rx" ]; then echo "[fail] got $count MP_PRIO[s] RX expected $mp_prio_nr_rx" fail_test dump_stats=1 @@ -1812,7 +1842,7 @@ wait_attempt_fail() while [ $time -lt $timeout_ms ]; do local cnt
- cnt=$(ip netns exec $ns nstat -as TcpAttemptFails | grep TcpAttemptFails | awk '{print $2}') + cnt=$(get_counter ${ns} "TcpAttemptFails")
[ "$cnt" = 1 ] && return 1 time=$((time + 100))
From: Matthieu Baerts matthieu.baerts@tessares.net
commit 4a0b866a3f7d3c22033f40e93e94befc6fe51bce upstream.
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features.
Some tests are using IPTables and/or TC commands to force some behaviours. If one of these commands fails -- likely because some features are not available due to missing kernel config -- we should intercept the error and skip the tests requiring these features.
Note that if we expect to have these features available and if SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var is set to 1, the tests will be marked as failed instead of skipped.
This patch also replaces the 'exit 1' by 'return 1' not to stop the selftest in the middle without the conclusion if there is an issue with NF or TC.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 8d014eaa9254 ("selftests: mptcp: add ADD_ADDR timeout test case") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 88 +++++++++++++++--------- 1 file changed, 57 insertions(+), 31 deletions(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -286,11 +286,15 @@ reset_with_add_addr_timeout() fi
ip netns exec $ns1 sysctl -q net.mptcp.add_addr_timeout=1 - ip netns exec $ns2 $tables -A OUTPUT -p tcp \ - -m tcp --tcp-option 30 \ - -m bpf --bytecode \ - "$CBPF_MPTCP_SUBOPTION_ADD_ADDR" \ - -j DROP + + if ! ip netns exec $ns2 $tables -A OUTPUT -p tcp \ + -m tcp --tcp-option 30 \ + -m bpf --bytecode \ + "$CBPF_MPTCP_SUBOPTION_ADD_ADDR" \ + -j DROP; then + mark_as_skipped "unable to set the 'add addr' rule" + return 1 + fi }
# $1: test name @@ -334,17 +338,12 @@ reset_with_allow_join_id0() # tc action pedit offset 162 out of bounds # # Netfilter is used to mark packets with enough data. -reset_with_fail() +setup_fail_rules() { - reset "${1}" || return 1 - - ip netns exec $ns1 sysctl -q net.mptcp.checksum_enabled=1 - ip netns exec $ns2 sysctl -q net.mptcp.checksum_enabled=1 - check_invert=1 validate_checksum=1 - local i="$2" - local ip="${3:-4}" + local i="$1" + local ip="${2:-4}" local tables
tables="${iptables}" @@ -359,15 +358,32 @@ reset_with_fail() -p tcp \ -m length --length 150:9999 \ -m statistic --mode nth --packet 1 --every 99999 \ - -j MARK --set-mark 42 || exit 1 + -j MARK --set-mark 42 || return ${ksft_skip}
- tc -n $ns2 qdisc add dev ns2eth$i clsact || exit 1 + tc -n $ns2 qdisc add dev ns2eth$i clsact || return ${ksft_skip} tc -n $ns2 filter add dev ns2eth$i egress \ protocol ip prio 1000 \ handle 42 fw \ action pedit munge offset 148 u8 invert \ pipe csum tcp \ - index 100 || exit 1 + index 100 || return ${ksft_skip} +} + +reset_with_fail() +{ + reset "${1}" || return 1 + shift + + ip netns exec $ns1 sysctl -q net.mptcp.checksum_enabled=1 + ip netns exec $ns2 sysctl -q net.mptcp.checksum_enabled=1 + + local rc=0 + setup_fail_rules "${@}" || rc=$? + + if [ ${rc} -eq ${ksft_skip} ]; then + mark_as_skipped "unable to set the 'fail' rules" + return 1 + fi }
reset_with_events() @@ -382,6 +398,25 @@ reset_with_events() evts_ns2_pid=$! }
+reset_with_tcp_filter() +{ + reset "${1}" || return 1 + shift + + local ns="${!1}" + local src="${2}" + local target="${3}" + + if ! ip netns exec "${ns}" ${iptables} \ + -A INPUT \ + -s "${src}" \ + -p tcp \ + -j "${target}"; then + mark_as_skipped "unable to set the filter rules" + return 1 + fi +} + fail_test() { ret=1 @@ -745,15 +780,6 @@ pm_nl_check_endpoint() fi }
-filter_tcp_from() -{ - local ns="${1}" - local src="${2}" - local target="${3}" - - ip netns exec "${ns}" ${iptables} -A INPUT -s "${src}" -p tcp -j "${target}" -} - do_transfer() { local listener_ns="$1" @@ -1935,23 +1961,23 @@ subflows_error_tests() fi
# multiple subflows, with subflow creation error - if reset "multi subflows, with failing subflow"; then + if reset_with_tcp_filter "multi subflows, with failing subflow" ns1 10.0.3.2 REJECT && + continue_if mptcp_lib_kallsyms_has "mptcp_pm_subflow_check_next$"; then pm_nl_set_limits $ns1 0 2 pm_nl_set_limits $ns2 0 2 pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow pm_nl_add_endpoint $ns2 10.0.2.2 flags subflow - filter_tcp_from $ns1 10.0.3.2 REJECT run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow chk_join_nr 1 1 1 fi
# multiple subflows, with subflow timeout on MPJ - if reset "multi subflows, with subflow timeout"; then + if reset_with_tcp_filter "multi subflows, with subflow timeout" ns1 10.0.3.2 DROP && + continue_if mptcp_lib_kallsyms_has "mptcp_pm_subflow_check_next$"; then pm_nl_set_limits $ns1 0 2 pm_nl_set_limits $ns2 0 2 pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow pm_nl_add_endpoint $ns2 10.0.2.2 flags subflow - filter_tcp_from $ns1 10.0.3.2 DROP run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow chk_join_nr 1 1 1 fi @@ -1959,11 +1985,11 @@ subflows_error_tests() # multiple subflows, check that the endpoint corresponding to # closed subflow (due to reset) is not reused if additional # subflows are added later - if reset "multi subflows, fair usage on close"; then + if reset_with_tcp_filter "multi subflows, fair usage on close" ns1 10.0.3.2 REJECT && + continue_if mptcp_lib_kallsyms_has "mptcp_pm_subflow_check_next$"; then pm_nl_set_limits $ns1 0 1 pm_nl_set_limits $ns2 0 1 pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow - filter_tcp_from $ns1 10.0.3.2 REJECT run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow &
# mpj subflow will be in TW after the reset
From: Matthieu Baerts matthieu.baerts@tessares.net
commit d4c81bbb8600257fd3076d0196cb08bd2e5bdf24 upstream.
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features.
At some points, a new feature caused internal behaviour changes we are verifying in the selftests, see the Fixes tag below. It was not a uAPI change but because in these selftests, we check some internal behaviours, it is normal we have to adapt them from time to time after having added some features.
It is possible to look for "mptcp_pm_subflow_check_next" in kallsyms because it was needed to introduce the mentioned feature. So we can know in advance what the behaviour we are expecting here instead of supporting the two behaviours.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 86e39e04482b ("mptcp: keep track of local endpoint still available for each msk") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -2089,11 +2089,18 @@ signal_address_tests() # the peer could possibly miss some addr notification, allow retransmission ip netns exec $ns1 sysctl -q net.mptcp.add_addr_timeout=1 run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow - chk_join_nr 3 3 3
- # the server will not signal the address terminating - # the MPC subflow - chk_add_nr 3 3 + # It is not directly linked to the commit introducing this + # symbol but for the parent one which is linked anyway. + if ! mptcp_lib_kallsyms_has "mptcp_pm_subflow_check_next$"; then + chk_join_nr 3 3 2 + chk_add_nr 4 4 + else + chk_join_nr 3 3 3 + # the server will not signal the address terminating + # the MPC subflow + chk_add_nr 3 3 + fi fi }
From: Matthieu Baerts matthieu.baerts@tessares.net
commit ae947bb2c253ff5f395bb70cb9db8700543bf398 upstream.
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features.
One of them is the support of MP_FASTCLOSE introduced in commit f284c0c77321 ("mptcp: implement fastclose xmit path").
If the MIB counter is not available, the test cannot be verified and the behaviour will not be the expected one. So we can skip the test if the counter is missing.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 01542c9bf9ab ("selftests: mptcp: add fastclose testcase") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -261,6 +261,19 @@ reset() return 0 }
+# $1: test name ; $2: counter to check +reset_check_counter() +{ + reset "${1}" || return 1 + + local counter="${2}" + + if ! nstat -asz "${counter}" | grep -wq "${counter}"; then + mark_as_skipped "counter '${counter}' is not available" + return 1 + fi +} + # $1: test name reset_with_cookies() { @@ -3081,14 +3094,14 @@ fullmesh_tests()
fastclose_tests() { - if reset "fastclose test"; then + if reset_check_counter "fastclose test" "MPTcpExtMPFastcloseTx"; then run_tests $ns1 $ns2 10.0.1.1 1024 0 fastclose_client chk_join_nr 0 0 0 chk_fclose_nr 1 1 chk_rst_nr 1 1 invert fi
- if reset "fastclose server test"; then + if reset_check_counter "fastclose server test" "MPTcpExtMPFastcloseRx"; then run_tests $ns1 $ns2 10.0.1.1 1024 0 fastclose_server chk_join_nr 0 0 0 chk_fclose_nr 1 1 invert
From: Matthieu Baerts matthieu.baerts@tessares.net
commit 425ba803124b90cb9124d99f13b372a89dc151d9 upstream.
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features.
At some points, a new feature caused internal behaviour changes we are verifying in the selftests, see the Fixes tag below. It was not a UAPI change but because in these selftests, we check some internal behaviours, it is normal we have to adapt them from time to time after having added some features.
It looks like there is no external sign we can use to predict the expected behaviour. Instead of accepting different behaviours and thus not really checking for the expected behaviour, we are looking here for a specific kernel version. That's not ideal but it looks better than removing the test because it cannot support older kernel versions.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 6fa0174a7c86 ("mptcp: more careful RM_ADDR generation") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -2354,7 +2354,12 @@ remove_tests() pm_nl_add_endpoint $ns2 10.0.4.2 flags subflow run_tests $ns1 $ns2 10.0.1.1 0 -8 -8 slow chk_join_nr 3 3 3 - chk_rm_nr 0 3 simult + + if mptcp_lib_kversion_ge 5.18; then + chk_rm_nr 0 3 simult + else + chk_rm_nr 3 3 + fi fi
# addresses flush
From: Matthieu Baerts matthieu.baerts@tessares.net
commit 36c4127ae8dd0ebac6d56d8a1b272dd483471c40 upstream.
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features.
One of them is the support of the implicit endpoints introduced by commit d045b9eb95a9 ("mptcp: introduce implicit endpoints").
It is possible to look for "mptcp_subflow_send_ack" in kallsyms because it was needed to introduce the mentioned feature. So we can know in advance if the feature is supported instead of trying and accepting any results.
Note that here and in the following commits, we re-do the same check for each sub-test of the same function for a few reasons. The main one is not to break the ID assign to each test in order to be able to easily compare results between different kernel versions. Also, we can still run a specific test even if it is skipped. Another reason is that it makes it clear during the review that a specific subtest will be skipped or not under certain conditions. At the end, it looks OK to call the exact same helper multiple times: it is not a critical path and it is the same code that is executed, not really more cases to maintain.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 69c6ce7b6eca ("selftests: mptcp: add implicit endpoint test case") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -3232,8 +3232,10 @@ userspace_tests()
endpoint_tests() { + # subflow_rebuild_header is needed to support the implicit flag # userspace pm type prevents add_addr - if reset "implicit EP"; then + if reset "implicit EP" && + mptcp_lib_kallsyms_has "subflow_rebuild_header$"; then pm_nl_set_limits $ns1 2 2 pm_nl_set_limits $ns2 2 2 pm_nl_add_endpoint $ns1 10.0.2.1 flags signal @@ -3253,7 +3255,8 @@ endpoint_tests() kill_tests_wait fi
- if reset "delete and re-add"; then + if reset "delete and re-add" && + mptcp_lib_kallsyms_has "subflow_rebuild_header$"; then pm_nl_set_limits $ns1 1 1 pm_nl_set_limits $ns2 1 1 pm_nl_add_endpoint $ns2 10.0.2.2 id 2 dev ns2eth2 flags subflow
From: Matthieu Baerts matthieu.baerts@tessares.net
commit 07216a3c5d926bf1b6b360a0073747228a1f9b7f upstream.
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features.
Commit bccefb762439 ("selftests: mptcp: simplify pm_nl_change_endpoint") has simplified the way the backup flag is set on an endpoint. Instead of doing:
./pm_nl_ctl set 10.0.2.1 flags backup
Now we do:
./pm_nl_ctl set id 1 flags backup
The new way is easier to maintain but it is also incompatible with older kernels not supporting the implicit endpoints putting in place the infrastructure to set flags per ID, hence the second Fixes tag.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: bccefb762439 ("selftests: mptcp: simplify pm_nl_change_endpoint") Cc: stable@vger.kernel.org Fixes: 4cf86ae84c71 ("mptcp: strict local address ID selection") Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -2639,7 +2639,8 @@ mixed_tests() backup_tests() { # single subflow, backup - if reset "single subflow, backup"; then + if reset "single subflow, backup" && + continue_if mptcp_lib_kallsyms_has "subflow_rebuild_header$"; then pm_nl_set_limits $ns1 0 1 pm_nl_set_limits $ns2 0 1 pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow,backup @@ -2649,7 +2650,8 @@ backup_tests() fi
# single address, backup - if reset "single address, backup"; then + if reset "single address, backup" && + continue_if mptcp_lib_kallsyms_has "subflow_rebuild_header$"; then pm_nl_set_limits $ns1 0 1 pm_nl_add_endpoint $ns1 10.0.2.1 flags signal pm_nl_set_limits $ns2 1 1 @@ -2660,7 +2662,8 @@ backup_tests() fi
# single address with port, backup - if reset "single address with port, backup"; then + if reset "single address with port, backup" && + continue_if mptcp_lib_kallsyms_has "subflow_rebuild_header$"; then pm_nl_set_limits $ns1 0 1 pm_nl_add_endpoint $ns1 10.0.2.1 flags signal port 10100 pm_nl_set_limits $ns2 1 1
From: Matthieu Baerts matthieu.baerts@tessares.net
commit 9db34c4294af9999edc773d96744e2d2d4eb5060 upstream.
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features.
One of them is the support of the fullmesh flag for the in-kernel PM introduced by commit 2843ff6f36db ("mptcp: remote addresses fullmesh") and commit 1a0d6136c5f0 ("mptcp: local addresses fullmesh").
It looks like there is no easy external sign we can use to predict the expected behaviour. We could add the flag and then check if it has been added but for that, and for each fullmesh test, we would need to setup a new environment, do the checks, clean it and then only start the test from yet another clean environment. To keep it simple and avoid introducing new issues, we look for a specific kernel version. That's not ideal but an acceptable solution for this case.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 6a0653b96f5d ("selftests: mptcp: add fullmesh setting tests") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -3058,7 +3058,8 @@ fullmesh_tests() fi
# set fullmesh flag - if reset "set fullmesh flag test"; then + if reset "set fullmesh flag test" && + continue_if mptcp_lib_kversion_ge 5.18; then pm_nl_set_limits $ns1 4 4 pm_nl_add_endpoint $ns1 10.0.2.1 flags subflow pm_nl_set_limits $ns2 4 4 @@ -3068,7 +3069,8 @@ fullmesh_tests() fi
# set nofullmesh flag - if reset "set nofullmesh flag test"; then + if reset "set nofullmesh flag test" && + continue_if mptcp_lib_kversion_ge 5.18; then pm_nl_set_limits $ns1 4 4 pm_nl_add_endpoint $ns1 10.0.2.1 flags subflow,fullmesh pm_nl_set_limits $ns2 4 4 @@ -3078,7 +3080,8 @@ fullmesh_tests() fi
# set backup,fullmesh flags - if reset "set backup,fullmesh flags test"; then + if reset "set backup,fullmesh flags test" && + continue_if mptcp_lib_kversion_ge 5.18; then pm_nl_set_limits $ns1 4 4 pm_nl_add_endpoint $ns1 10.0.2.1 flags subflow pm_nl_set_limits $ns2 4 4 @@ -3089,7 +3092,8 @@ fullmesh_tests() fi
# set nobackup,nofullmesh flags - if reset "set nobackup,nofullmesh flags test"; then + if reset "set nobackup,nofullmesh flags test" && + continue_if mptcp_lib_kversion_ge 5.18; then pm_nl_set_limits $ns1 4 4 pm_nl_set_limits $ns2 4 4 pm_nl_add_endpoint $ns2 10.0.2.2 flags subflow,backup,fullmesh
From: Matthieu Baerts matthieu.baerts@tessares.net
commit f2b492b04a167261e1c38eb76f78fb4294473a49 upstream.
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features.
One of them is the support of the userspace PM introduced by commit 4638de5aefe5 ("mptcp: handle local addrs announced by userspace PMs") and the following ones.
It is possible to look for the MPTCP pm_type's sysctl knob to know in advance if the userspace PM is available.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 5ac1d2d63451 ("selftests: mptcp: Add tests for userspace PM type") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 26 +++++++++++++++--------- 1 file changed, 17 insertions(+), 9 deletions(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -84,7 +84,7 @@ init_partial() ip netns add $netns || exit $ksft_skip ip -net $netns link set lo up ip netns exec $netns sysctl -q net.mptcp.enabled=1 - ip netns exec $netns sysctl -q net.mptcp.pm_type=0 + ip netns exec $netns sysctl -q net.mptcp.pm_type=0 2>/dev/null || true ip netns exec $netns sysctl -q net.ipv4.conf.all.rp_filter=0 ip netns exec $netns sysctl -q net.ipv4.conf.default.rp_filter=0 if [ $checksum -eq 1 ]; then @@ -3151,7 +3151,8 @@ fail_tests() userspace_tests() { # userspace pm type prevents add_addr - if reset "userspace pm type prevents add_addr"; then + if reset "userspace pm type prevents add_addr" && + continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then set_userspace_pm $ns1 pm_nl_set_limits $ns1 0 2 pm_nl_set_limits $ns2 0 2 @@ -3162,7 +3163,8 @@ userspace_tests() fi
# userspace pm type does not echo add_addr without daemon - if reset "userspace pm no echo w/o daemon"; then + if reset "userspace pm no echo w/o daemon" && + continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then set_userspace_pm $ns2 pm_nl_set_limits $ns1 0 2 pm_nl_set_limits $ns2 0 2 @@ -3173,7 +3175,8 @@ userspace_tests() fi
# userspace pm type rejects join - if reset "userspace pm type rejects join"; then + if reset "userspace pm type rejects join" && + continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then set_userspace_pm $ns1 pm_nl_set_limits $ns1 1 1 pm_nl_set_limits $ns2 1 1 @@ -3183,7 +3186,8 @@ userspace_tests() fi
# userspace pm type does not send join - if reset "userspace pm type does not send join"; then + if reset "userspace pm type does not send join" && + continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then set_userspace_pm $ns2 pm_nl_set_limits $ns1 1 1 pm_nl_set_limits $ns2 1 1 @@ -3193,7 +3197,8 @@ userspace_tests() fi
# userspace pm type prevents mp_prio - if reset "userspace pm type prevents mp_prio"; then + if reset "userspace pm type prevents mp_prio" && + continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then set_userspace_pm $ns1 pm_nl_set_limits $ns1 1 1 pm_nl_set_limits $ns2 1 1 @@ -3204,7 +3209,8 @@ userspace_tests() fi
# userspace pm type prevents rm_addr - if reset "userspace pm type prevents rm_addr"; then + if reset "userspace pm type prevents rm_addr" && + continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then set_userspace_pm $ns1 set_userspace_pm $ns2 pm_nl_set_limits $ns1 0 1 @@ -3216,7 +3222,8 @@ userspace_tests() fi
# userspace pm add & remove address - if reset_with_events "userspace pm add & remove address"; then + if reset_with_events "userspace pm add & remove address" && + continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then set_userspace_pm $ns1 pm_nl_set_limits $ns2 1 1 run_tests $ns1 $ns2 10.0.1.1 0 userspace_1 0 slow @@ -3227,7 +3234,8 @@ userspace_tests() fi
# userspace pm create destroy subflow - if reset_with_events "userspace pm create destroy subflow"; then + if reset_with_events "userspace pm create destroy subflow" && + continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then set_userspace_pm $ns2 pm_nl_set_limits $ns1 0 1 run_tests $ns1 $ns2 10.0.1.1 0 0 userspace_1 slow
From: Matthieu Baerts matthieu.baerts@tessares.net
commit ff8897b5189495b47895ca247b860a29dc04b36b upstream.
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features.
One of them is the support of the MP_FAIL / infinite mapping introduced by commit 1e39e5a32ad7 ("mptcp: infinite mapping sending") and the following ones.
It is possible to look for one of the infinite mapping counters to know in advance if the this feature is available.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: b6e074e171bc ("selftests: mptcp: add infinite map testcase") Cc: stable@vger.kernel.org Fixes: 2ba18161d407 ("selftests: mptcp: add MP_FAIL reset testcase") Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -384,7 +384,7 @@ setup_fail_rules()
reset_with_fail() { - reset "${1}" || return 1 + reset_check_counter "${1}" "MPTcpExtInfiniteMapTx" || return 1 shift
ip netns exec $ns1 sysctl -q net.mptcp.checksum_enabled=1
From: Matthieu Baerts matthieu.baerts@tessares.net
commit 632978f0a961b4591a05ba9e39eab24541d83e84 upstream.
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features.
One of them is the support of sending an MP_PRIO signal for the initial subflow, introduced by commit c157bbe776b7 ("mptcp: allow the in kernel PM to set MPC subflow priority").
It is possible to look for "mptcp_subflow_send_ack" in kallsyms because it was needed to introduce the mentioned feature. So we can know in advance if the feature is supported instead of trying and accepting any results.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 914f6a59b10f ("selftests: mptcp: add MPC backup tests") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -2673,14 +2673,16 @@ backup_tests() chk_prio_nr 1 1 fi
- if reset "mpc backup"; then + if reset "mpc backup" && + continue_if mptcp_lib_kallsyms_doesnt_have "mptcp_subflow_send_ack$"; then pm_nl_add_endpoint $ns2 10.0.1.2 flags subflow,backup run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow chk_join_nr 0 0 0 chk_prio_nr 0 1 fi
- if reset "mpc backup both sides"; then + if reset "mpc backup both sides" && + continue_if mptcp_lib_kallsyms_doesnt_have "mptcp_subflow_send_ack$"; then pm_nl_add_endpoint $ns1 10.0.1.1 flags subflow,backup pm_nl_add_endpoint $ns2 10.0.1.2 flags subflow,backup run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow @@ -2688,14 +2690,16 @@ backup_tests() chk_prio_nr 1 1 fi
- if reset "mpc switch to backup"; then + if reset "mpc switch to backup" && + continue_if mptcp_lib_kallsyms_doesnt_have "mptcp_subflow_send_ack$"; then pm_nl_add_endpoint $ns2 10.0.1.2 flags subflow run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow backup chk_join_nr 0 0 0 chk_prio_nr 0 1 fi
- if reset "mpc switch to backup both sides"; then + if reset "mpc switch to backup both sides" && + continue_if mptcp_lib_kallsyms_doesnt_have "mptcp_subflow_send_ack$"; then pm_nl_add_endpoint $ns1 10.0.1.1 flags subflow pm_nl_add_endpoint $ns2 10.0.1.2 flags subflow run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow backup
From: Matthieu Baerts matthieu.baerts@tessares.net
commit 0471bb479af03874b09350fcfe51d3743a5608de upstream.
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features.
One of them is the support of PM listener events introduced by commit f8c9dfbd875b ("mptcp: add pm listener events").
It is possible to look for "mptcp_event_pm_listener" in kallsyms to know in advance if the kernel supports this feature.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 178d023208eb ("selftests: mptcp: listener test for in-kernel PM") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 5 +++++ 1 file changed, 5 insertions(+)
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -2734,6 +2734,11 @@ verify_listener_events() $e_saddr $e_sport fi
+ if ! mptcp_lib_kallsyms_has "mptcp_event_pm_listener$"; then + printf "[skip]: event not supported\n" + return + fi + type=$(grep "type:$e_type," $evt | sed --unbuffered -n 's/.*(type:)([[:digit:]]*).*$/\2/p;q') family=$(grep "type:$e_type," $evt |
From: Matthieu Baerts matthieu.baerts@tessares.net
commit 96b84195df61d374d8028cf426a115ae085031ec upstream.
The alignment was different from the other tests because tabs were used instead of spaces.
While at it, also use 'echo' instead of 'printf' to print the result to keep the same style as done in the other sub-tests. And, even if it should be better with, also remove 'stdbuf' and sed's '--unbuffered' option because they are not used in the other subtests and they are not available when using a minimal environment with busybox.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 178d023208eb ("selftests: mptcp: listener test for in-kernel PM") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 30 +++++++++++------------- 1 file changed, 14 insertions(+), 16 deletions(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -2725,43 +2725,41 @@ verify_listener_events() local family local saddr local sport + local name
if [ $e_type = $LISTENER_CREATED ]; then - stdbuf -o0 -e0 printf "\t\t\t\t\t CREATE_LISTENER %s:%s"\ - $e_saddr $e_sport + name="LISTENER_CREATED" elif [ $e_type = $LISTENER_CLOSED ]; then - stdbuf -o0 -e0 printf "\t\t\t\t\t CLOSE_LISTENER %s:%s "\ - $e_saddr $e_sport + name="LISTENER_CLOSED" + else + name="$e_type" fi
+ printf "%-${nr_blank}s %s %s:%s " " " "$name" "$e_saddr" "$e_sport" + if ! mptcp_lib_kallsyms_has "mptcp_event_pm_listener$"; then printf "[skip]: event not supported\n" return fi
- type=$(grep "type:$e_type," $evt | - sed --unbuffered -n 's/.*(type:)([[:digit:]]*).*$/\2/p;q') - family=$(grep "type:$e_type," $evt | - sed --unbuffered -n 's/.*(family:)([[:digit:]]*).*$/\2/p;q') - sport=$(grep "type:$e_type," $evt | - sed --unbuffered -n 's/.*(sport:)([[:digit:]]*).*$/\2/p;q') + type=$(grep "type:$e_type," $evt | sed -n 's/.*(type:)([[:digit:]]*).*$/\2/p;q') + family=$(grep "type:$e_type," $evt | sed -n 's/.*(family:)([[:digit:]]*).*$/\2/p;q') + sport=$(grep "type:$e_type," $evt | sed -n 's/.*(sport:)([[:digit:]]*).*$/\2/p;q') if [ $family ] && [ $family = $AF_INET6 ]; then - saddr=$(grep "type:$e_type," $evt | - sed --unbuffered -n 's/.*(saddr6:)([0-9a-f:.]*).*$/\2/p;q') + saddr=$(grep "type:$e_type," $evt | sed -n 's/.*(saddr6:)([0-9a-f:.]*).*$/\2/p;q') else - saddr=$(grep "type:$e_type," $evt | - sed --unbuffered -n 's/.*(saddr4:)([0-9.]*).*$/\2/p;q') + saddr=$(grep "type:$e_type," $evt | sed -n 's/.*(saddr4:)([0-9.]*).*$/\2/p;q') fi
if [ $type ] && [ $type = $e_type ] && [ $family ] && [ $family = $e_family ] && [ $saddr ] && [ $saddr = $e_saddr ] && [ $sport ] && [ $sport = $e_sport ]; then - stdbuf -o0 -e0 printf "[ ok ]\n" + echo "[ ok ]" return 0 fi fail_test - stdbuf -o0 -e0 printf "[fail]\n" + echo "[fail]" }
add_addr_ports_tests()
From: Matthieu Baerts matthieu.baerts@tessares.net
commit 6673851be0fc1bfc3353ffb52ff26ae5468f12c9 upstream.
Selftests are supposed to run on any kernels, including the old ones not supporting all MPTCP features.
One of them is the support of a mix of subflows in v4 and v6 by the in-kernel PM introduced by commit b9d69db87fb7 ("mptcp: let the in-kernel PM use mixed IPv4 and IPv6 addresses").
It looks like there is no external sign we can use to predict the expected behaviour. Instead of accepting different behaviours and thus not really checking for the expected behaviour, we are looking here for a specific kernel version. That's not ideal but it looks better than removing the test because it cannot support older kernel versions.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: ad3493746ebe ("selftests: mptcp: add test-cases for mixed v4/v6 subflows") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -2597,7 +2597,8 @@ v4mapped_tests()
mixed_tests() { - if reset "IPv4 sockets do not use IPv6 addresses"; then + if reset "IPv4 sockets do not use IPv6 addresses" && + continue_if mptcp_lib_kversion_ge 6.3; then pm_nl_set_limits $ns1 0 1 pm_nl_set_limits $ns2 1 1 pm_nl_add_endpoint $ns1 dead:beef:2::1 flags signal @@ -2606,7 +2607,8 @@ mixed_tests() fi
# Need an IPv6 mptcp socket to allow subflows of both families - if reset "simult IPv4 and IPv6 subflows"; then + if reset "simult IPv4 and IPv6 subflows" && + continue_if mptcp_lib_kversion_ge 6.3; then pm_nl_set_limits $ns1 0 1 pm_nl_set_limits $ns2 1 1 pm_nl_add_endpoint $ns1 10.0.1.1 flags signal @@ -2615,7 +2617,8 @@ mixed_tests() fi
# cross families subflows will not be created even in fullmesh mode - if reset "simult IPv4 and IPv6 subflows, fullmesh 1x1"; then + if reset "simult IPv4 and IPv6 subflows, fullmesh 1x1" && + continue_if mptcp_lib_kversion_ge 6.3; then pm_nl_set_limits $ns1 0 4 pm_nl_set_limits $ns2 1 4 pm_nl_add_endpoint $ns2 dead:beef:2::2 flags subflow,fullmesh @@ -2626,7 +2629,8 @@ mixed_tests()
# fullmesh still tries to create all the possibly subflows with # matching family - if reset "simult IPv4 and IPv6 subflows, fullmesh 2x2"; then + if reset "simult IPv4 and IPv6 subflows, fullmesh 2x2" && + continue_if mptcp_lib_kversion_ge 6.3; then pm_nl_set_limits $ns1 0 4 pm_nl_set_limits $ns2 2 4 pm_nl_add_endpoint $ns1 10.0.2.1 flags signal
From: Roberto Sassu roberto.sassu@huawei.com
commit 935d44acf621aa0688fef8312dec3e5940f38f4e upstream.
Ensure that file_seals is non-NULL before using it in the memfd_create() syscall. One situation in which memfd_file_seals_ptr() could return a NULL pointer when CONFIG_SHMEM=n, oopsing the kernel.
Link: https://lkml.kernel.org/r/20230607132427.2867435-1-roberto.sassu@huaweicloud... Fixes: 47b9012ecdc7 ("shmem: add sealing support to hugetlb-backed memfd") Signed-off-by: Roberto Sassu roberto.sassu@huawei.com Cc: Marc-Andr Lureau marcandre.lureau@redhat.com Cc: Mike Kravetz mike.kravetz@oracle.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/memfd.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/mm/memfd.c +++ b/mm/memfd.c @@ -375,12 +375,15 @@ SYSCALL_DEFINE2(memfd_create,
inode->i_mode &= ~0111; file_seals = memfd_file_seals_ptr(file); - *file_seals &= ~F_SEAL_SEAL; - *file_seals |= F_SEAL_EXEC; + if (file_seals) { + *file_seals &= ~F_SEAL_SEAL; + *file_seals |= F_SEAL_EXEC; + } } else if (flags & MFD_ALLOW_SEALING) { /* MFD_EXEC and MFD_ALLOW_SEALING are set */ file_seals = memfd_file_seals_ptr(file); - *file_seals &= ~F_SEAL_SEAL; + if (file_seals) + *file_seals &= ~F_SEAL_SEAL; }
fd_install(fd, file);
From: Rafael Aquini aquini@redhat.com
commit 54abe19e00cfcc5a72773d15cd00ed19ab763439 upstream.
When commit 19343b5bdd16 ("mm/page-writeback: introduce tracepoint for wait_on_page_writeback()") repurposed the writeback_dirty_page trace event as a template to create its new wait_on_page_writeback trace event, it ended up opening a window to NULL pointer dereference crashes due to the (infrequent) occurrence of a race where an access to a page in the swap-cache happens concurrently with the moment this page is being written to disk and the tracepoint is enabled:
BUG: kernel NULL pointer dereference, address: 0000000000000040 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 800000010ec0a067 P4D 800000010ec0a067 PUD 102353067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 1 PID: 1320 Comm: shmem-worker Kdump: loaded Not tainted 6.4.0-rc5+ #13 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230301gitf80f052277c8-1.fc37 03/01/2023 RIP: 0010:trace_event_raw_event_writeback_folio_template+0x76/0xf0 Code: 4d 85 e4 74 5c 49 8b 3c 24 e8 06 98 ee ff 48 89 c7 e8 9e 8b ee ff ba 20 00 00 00 48 89 ef 48 89 c6 e8 fe d4 1a 00 49 8b 04 24 <48> 8b 40 40 48 89 43 28 49 8b 45 20 48 89 e7 48 89 43 30 e8 a2 4d RSP: 0000:ffffaad580b6fb60 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff90e38035c01c RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff90e38035c044 RBP: ffff90e38035c024 R08: 0000000000000002 R09: 0000000000000006 R10: ffff90e38035c02e R11: 0000000000000020 R12: ffff90e380bac000 R13: ffffe3a7456d9200 R14: 0000000000001b81 R15: ffffe3a7456d9200 FS: 00007f2e4e8a15c0(0000) GS:ffff90e3fbc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000040 CR3: 00000001150c6003 CR4: 0000000000170ee0 Call Trace: <TASK> ? __die+0x20/0x70 ? page_fault_oops+0x76/0x170 ? kernelmode_fixup_or_oops+0x84/0x110 ? exc_page_fault+0x65/0x150 ? asm_exc_page_fault+0x22/0x30 ? trace_event_raw_event_writeback_folio_template+0x76/0xf0 folio_wait_writeback+0x6b/0x80 shmem_swapin_folio+0x24a/0x500 ? filemap_get_entry+0xe3/0x140 shmem_get_folio_gfp+0x36e/0x7c0 ? find_busiest_group+0x43/0x1a0 shmem_fault+0x76/0x2a0 ? __update_load_avg_cfs_rq+0x281/0x2f0 __do_fault+0x33/0x130 do_read_fault+0x118/0x160 do_pte_missing+0x1ed/0x2a0 __handle_mm_fault+0x566/0x630 handle_mm_fault+0x91/0x210 do_user_addr_fault+0x22c/0x740 exc_page_fault+0x65/0x150 asm_exc_page_fault+0x22/0x30
This problem arises from the fact that the repurposed writeback_dirty_page trace event code was written assuming that every pointer to mapping (struct address_space) would come from a file-mapped page-cache object, thus mapping->host would always be populated, and that was a valid case before commit 19343b5bdd16. The swap-cache address space (swapper_spaces), however, doesn't populate its ->host (struct inode) pointer, thus leading to the crashes in the corner-case aforementioned.
commit 19343b5bdd16 ended up breaking the assignment of __entry->name and __entry->ino for the wait_on_page_writeback tracepoint -- both dependent on mapping->host carrying a pointer to a valid inode. The assignment of __entry->name was fixed by commit 68f23b89067f ("memcg: fix a crash in wb_workfn when a device disappears"), and this commit fixes the remaining case, for __entry->ino.
Link: https://lkml.kernel.org/r/20230606233613.1290819-1-aquini@redhat.com Fixes: 19343b5bdd16 ("mm/page-writeback: introduce tracepoint for wait_on_page_writeback()") Signed-off-by: Rafael Aquini aquini@redhat.com Reviewed-by: Yafang Shao laoar.shao@gmail.com Cc: Aristeu Rozanski aris@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/trace/events/writeback.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/trace/events/writeback.h +++ b/include/trace/events/writeback.h @@ -68,7 +68,7 @@ DECLARE_EVENT_CLASS(writeback_folio_temp strscpy_pad(__entry->name, bdi_dev_name(mapping ? inode_to_bdi(mapping->host) : NULL), 32); - __entry->ino = mapping ? mapping->host->i_ino : 0; + __entry->ino = (mapping && mapping->host) ? mapping->host->i_ino : 0; __entry->index = folio->index; ),
From: Prathu Baronia prathubaronia2011@gmail.com
commit 2049a7d0cbc6ac8e370e836ed68597be04a7dc49 upstream.
Since gfp flags have been shifted to gfp_types.h so update the path in the gfp-translate script.
Link: https://lkml.kernel.org/r/20230608154450.21758-1-prathubaronia2011@gmail.com Fixes: cb5a065b4ea9c ("headers/deps: mm: Split <linux/gfp_types.h> out of <linux/gfp.h>") Signed-off-by: Prathu Baronia prathubaronia2011@gmail.com Reviewed-by: David Hildenbrand david@redhat.com Cc: Masahiro Yamada masahiroy@kernel.org Cc: Nathan Chancellor nathan@kernel.org Cc: Nick Desaulniers ndesaulniers@google.com Cc: Nicolas Schier nicolas@fjasle.eu Cc: Ingo Molnar mingo@kernel.org Cc: Yury Norov yury.norov@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- scripts/gfp-translate | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/scripts/gfp-translate +++ b/scripts/gfp-translate @@ -63,11 +63,11 @@ fi
# Extract GFP flags from the kernel source TMPFILE=`mktemp -t gfptranslate-XXXXXX` || exit 1 -grep -q ___GFP $SOURCE/include/linux/gfp.h +grep -q ___GFP $SOURCE/include/linux/gfp_types.h if [ $? -eq 0 ]; then - grep "^#define ___GFP" $SOURCE/include/linux/gfp.h | sed -e 's/u$//' | grep -v GFP_BITS > $TMPFILE + grep "^#define ___GFP" $SOURCE/include/linux/gfp_types.h | sed -e 's/u$//' | grep -v GFP_BITS > $TMPFILE else - grep "^#define __GFP" $SOURCE/include/linux/gfp.h | sed -e 's/(__force gfp_t)//' | sed -e 's/u)/)/' | grep -v GFP_BITS | sed -e 's/)//) //' > $TMPFILE + grep "^#define __GFP" $SOURCE/include/linux/gfp_types.h | sed -e 's/(__force gfp_t)//' | sed -e 's/u)/)/' | grep -v GFP_BITS | sed -e 's/)//) //' > $TMPFILE fi
# Parse the flags
From: Ryusuke Konishi konishi.ryusuke@gmail.com
commit 679bd7ebdd315bf457a4740b306ae99f1d0a403d upstream.
As a result of analysis of a syzbot report, it turned out that in three cases where nilfs2 allocates block device buffers directly via sb_getblk, concurrent reads to the device can corrupt the allocated buffers.
Nilfs2 uses sb_getblk for segment summary blocks, that make up a log header, and the super root block, that is the trailer, and when moving and writing the second super block after fs resize.
In any of these, since the uptodate flag is not set when storing metadata to be written in the allocated buffers, the stored metadata will be overwritten if a device read of the same block occurs concurrently before the write. This causes metadata corruption and misbehavior in the log write itself, causing warnings in nilfs_btree_assign() as reported.
Fix these issues by setting an uptodate flag on the buffer head on the first or before modifying each buffer obtained with sb_getblk, and clearing the flag on failure.
When setting the uptodate flag, the lock_buffer/unlock_buffer pair is used to perform necessary exclusive control, and the buffer is filled to ensure that uninitialized bytes are not mixed into the data read from others. As for buffers for segment summary blocks, they are filled incrementally, so if the uptodate flag was unset on their allocation, set the flag and zero fill the buffer once at that point.
Also, regarding the superblock move routine, the starting point of the memset call to zerofill the block is incorrectly specified, which can cause a buffer overflow on file systems with block sizes greater than 4KiB. In addition, if the superblock is moved within a large block, it is necessary to assume the possibility that the data in the superblock will be destroyed by zero-filling before copying. So fix these potential issues as well.
Link: https://lkml.kernel.org/r/20230609035732.20426-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi konishi.ryusuke@gmail.com Reported-by: syzbot+31837fe952932efc8fb9@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/00000000000030000a05e981f475@google.com Tested-by: Ryusuke Konishi konishi.ryusuke@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nilfs2/segbuf.c | 6 ++++++ fs/nilfs2/segment.c | 7 +++++++ fs/nilfs2/super.c | 23 ++++++++++++++++++++++- 3 files changed, 35 insertions(+), 1 deletion(-)
--- a/fs/nilfs2/segbuf.c +++ b/fs/nilfs2/segbuf.c @@ -101,6 +101,12 @@ int nilfs_segbuf_extend_segsum(struct ni if (unlikely(!bh)) return -ENOMEM;
+ lock_buffer(bh); + if (!buffer_uptodate(bh)) { + memset(bh->b_data, 0, bh->b_size); + set_buffer_uptodate(bh); + } + unlock_buffer(bh); nilfs_segbuf_add_segsum_buffer(segbuf, bh); return 0; } --- a/fs/nilfs2/segment.c +++ b/fs/nilfs2/segment.c @@ -981,10 +981,13 @@ static void nilfs_segctor_fill_in_super_ unsigned int isz, srsz;
bh_sr = NILFS_LAST_SEGBUF(&sci->sc_segbufs)->sb_super_root; + + lock_buffer(bh_sr); raw_sr = (struct nilfs_super_root *)bh_sr->b_data; isz = nilfs->ns_inode_size; srsz = NILFS_SR_BYTES(isz);
+ raw_sr->sr_sum = 0; /* Ensure initialization within this update */ raw_sr->sr_bytes = cpu_to_le16(srsz); raw_sr->sr_nongc_ctime = cpu_to_le64(nilfs_doing_gc() ? @@ -998,6 +1001,8 @@ static void nilfs_segctor_fill_in_super_ nilfs_write_inode_common(nilfs->ns_sufile, (void *)raw_sr + NILFS_SR_SUFILE_OFFSET(isz), 1); memset((void *)raw_sr + srsz, 0, nilfs->ns_blocksize - srsz); + set_buffer_uptodate(bh_sr); + unlock_buffer(bh_sr); }
static void nilfs_redirty_inodes(struct list_head *head) @@ -1780,6 +1785,7 @@ static void nilfs_abort_logs(struct list list_for_each_entry(segbuf, logs, sb_list) { list_for_each_entry(bh, &segbuf->sb_segsum_buffers, b_assoc_buffers) { + clear_buffer_uptodate(bh); if (bh->b_page != bd_page) { if (bd_page) end_page_writeback(bd_page); @@ -1791,6 +1797,7 @@ static void nilfs_abort_logs(struct list b_assoc_buffers) { clear_buffer_async_write(bh); if (bh == segbuf->sb_super_root) { + clear_buffer_uptodate(bh); if (bh->b_page != bd_page) { end_page_writeback(bd_page); bd_page = bh->b_page; --- a/fs/nilfs2/super.c +++ b/fs/nilfs2/super.c @@ -372,10 +372,31 @@ static int nilfs_move_2nd_super(struct s goto out; } nsbp = (void *)nsbh->b_data + offset; - memset(nsbp, 0, nilfs->ns_blocksize);
+ lock_buffer(nsbh); if (sb2i >= 0) { + /* + * The position of the second superblock only changes by 4KiB, + * which is larger than the maximum superblock data size + * (= 1KiB), so there is no need to use memmove() to allow + * overlap between source and destination. + */ memcpy(nsbp, nilfs->ns_sbp[sb2i], nilfs->ns_sbsize); + + /* + * Zero fill after copy to avoid overwriting in case of move + * within the same block. + */ + memset(nsbh->b_data, 0, offset); + memset((void *)nsbp + nilfs->ns_sbsize, 0, + nsbh->b_size - offset - nilfs->ns_sbsize); + } else { + memset(nsbh->b_data, 0, nsbh->b_size); + } + set_buffer_uptodate(nsbh); + unlock_buffer(nsbh); + + if (sb2i >= 0) { brelse(nilfs->ns_sbh[sb2i]); nilfs->ns_sbh[sb2i] = nsbh; nilfs->ns_sbp[sb2i] = nsbp;
From: Ryusuke Konishi konishi.ryusuke@gmail.com
commit 782e53d0c14420858dbf0f8f797973c150d3b6d7 upstream.
In a syzbot stress test that deliberately causes file system errors on nilfs2 with a corrupted disk image, it has been reported that nilfs_clear_dirty_page() called from nilfs_clear_dirty_pages() can cause a general protection fault.
In nilfs_clear_dirty_pages(), when looking up dirty pages from the page cache and calling nilfs_clear_dirty_page() for each dirty page/folio retrieved, the back reference from the argument page to "mapping" may have been changed to NULL (and possibly others). It is necessary to check this after locking the page/folio.
So, fix this issue by not calling nilfs_clear_dirty_page() on a page/folio after locking it in nilfs_clear_dirty_pages() if the back reference "mapping" from the page/folio is different from the "mapping" that held the page/folio just before.
Link: https://lkml.kernel.org/r/20230612021456.3682-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi konishi.ryusuke@gmail.com Reported-by: syzbot+53369d11851d8f26735c@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/000000000000da4f6b05eb9bf593@google.com Tested-by: Ryusuke Konishi konishi.ryusuke@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nilfs2/page.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
--- a/fs/nilfs2/page.c +++ b/fs/nilfs2/page.c @@ -370,7 +370,15 @@ void nilfs_clear_dirty_pages(struct addr struct folio *folio = fbatch.folios[i];
folio_lock(folio); - nilfs_clear_dirty_page(&folio->page, silent); + + /* + * This folio may have been removed from the address + * space by truncation or invalidation when the lock + * was acquired. Skip processing in that case. + */ + if (likely(folio->mapping == mapping)) + nilfs_clear_dirty_page(&folio->page, silent); + folio_unlock(folio); } folio_batch_release(&fbatch);
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
commit 22db06337f590d01d79f60f181d8dfe5a9ef9085 upstream.
The addition of might_sleep() to down_timeout() caused the latter to enable interrupts unconditionally in some cases, which in turn broke the ACPI S3 wakeup path in acpi_suspend_enter(), where down_timeout() is called by acpi_disable_all_gpes() via acpi_ut_acquire_mutex().
Namely, if CONFIG_DEBUG_ATOMIC_SLEEP is set, might_sleep() causes might_resched() to be used and if CONFIG_PREEMPT_VOLUNTARY is set, this triggers __cond_resched() which may call preempt_schedule_common(), so __schedule() gets invoked and it ends up with enabled interrupts (in the prev == next case).
Now, enabling interrupts early in the S3 wakeup path causes the kernel to crash.
Address this by modifying acpi_suspend_enter() to disable GPEs without attempting to acquire the sleeping lock which is not needed in that code path anyway.
Fixes: 99409b935c9a ("locking/semaphore: Add might_sleep() to down_*() family") Reported-by: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Acked-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: 5.15+ stable@vger.kernel.org # 5.15+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/acpi/acpica/achware.h | 2 -- drivers/acpi/sleep.c | 16 ++++++++++++---- include/acpi/acpixf.h | 1 + 3 files changed, 13 insertions(+), 6 deletions(-)
--- a/drivers/acpi/acpica/achware.h +++ b/drivers/acpi/acpica/achware.h @@ -101,8 +101,6 @@ acpi_status acpi_hw_get_gpe_status(struct acpi_gpe_event_info *gpe_event_info, acpi_event_status *event_status);
-acpi_status acpi_hw_disable_all_gpes(void); - acpi_status acpi_hw_enable_all_runtime_gpes(void);
acpi_status acpi_hw_enable_all_wakeup_gpes(void); --- a/drivers/acpi/sleep.c +++ b/drivers/acpi/sleep.c @@ -636,11 +636,19 @@ static int acpi_suspend_enter(suspend_st }
/* - * Disable and clear GPE status before interrupt is enabled. Some GPEs - * (like wakeup GPE) haven't handler, this can avoid such GPE misfire. - * acpi_leave_sleep_state will reenable specific GPEs later + * Disable all GPE and clear their status bits before interrupts are + * enabled. Some GPEs (like wakeup GPEs) have no handlers and this can + * prevent them from producing spurious interrups. + * + * acpi_leave_sleep_state() will reenable specific GPEs later. + * + * Because this code runs on one CPU with disabled interrupts (all of + * the other CPUs are offline at this time), it need not acquire any + * sleeping locks which may trigger an implicit preemption point even + * if there is no contention, so avoid doing that by using a low-level + * library routine here. */ - acpi_disable_all_gpes(); + acpi_hw_disable_all_gpes(); /* Allow EC transactions to happen. */ acpi_ec_unblock_transactions();
--- a/include/acpi/acpixf.h +++ b/include/acpi/acpixf.h @@ -761,6 +761,7 @@ ACPI_HW_DEPENDENT_RETURN_STATUS(acpi_sta acpi_event_status *event_status)) ACPI_HW_DEPENDENT_RETURN_UINT32(u32 acpi_dispatch_gpe(acpi_handle gpe_device, u32 gpe_number)) +ACPI_HW_DEPENDENT_RETURN_STATUS(acpi_status acpi_hw_disable_all_gpes(void)) ACPI_HW_DEPENDENT_RETURN_STATUS(acpi_status acpi_disable_all_gpes(void)) ACPI_HW_DEPENDENT_RETURN_STATUS(acpi_status acpi_enable_all_runtime_gpes(void)) ACPI_HW_DEPENDENT_RETURN_STATUS(acpi_status acpi_enable_all_wakeup_gpes(void))
From: Hans de Goede hdegoede@redhat.com
commit 0bb619f9227aa370330d2b309733d74750705053 upstream.
Since commit 955fb8719efb ("thermal/intel/intel_soc_dts_iosf: Use Intel TCC library") intel_soc_dts_iosf is reporting the wrong temperature.
The driver expects tj_max to be in milli-degrees-celcius but after the switch to the TCC library this is now in degrees celcius so instead of e.g. 90000 it is set to 90 causing a temperature 45 degrees below tj_max to be reported as -44910 milli-degrees instead of as 45000 milli-degrees.
Fix this by adding back the lost factor of 1000.
Fixes: 955fb8719efb ("thermal/intel/intel_soc_dts_iosf: Use Intel TCC library") Reported-by: Bernhard Krug b.krug@elektronenpumpe.de Signed-off-by: Hans de Goede hdegoede@redhat.com Acked-by: Zhang Rui rui.zhang@intel.com Cc: 6.3+ stable@vger.kernel.org # 6.3+ Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/thermal/intel/intel_soc_dts_iosf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/thermal/intel/intel_soc_dts_iosf.c +++ b/drivers/thermal/intel/intel_soc_dts_iosf.c @@ -401,7 +401,7 @@ struct intel_soc_dts_sensors *intel_soc_ spin_lock_init(&sensors->intr_notify_lock); mutex_init(&sensors->dts_update_lock); sensors->intr_type = intr_type; - sensors->tj_max = tj_max; + sensors->tj_max = tj_max * 1000; if (intr_type == INTEL_SOC_DTS_INTERRUPT_NONE) notification = false; else
From: Gavin Shan gshan@redhat.com
commit 2230f9e1171a2e9731422a14d1bbc313c0b719d1 upstream.
We run into guest hang in edk2 firmware when KSM is kept as running on the host. The edk2 firmware is waiting for status 0x80 from QEMU's pflash device (TYPE_PFLASH_CFI01) during the operation of sector erasing or buffered write. The status is returned by reading the memory region of the pflash device and the read request should have been forwarded to QEMU and emulated by it. Unfortunately, the read request is covered by an illegal stage2 mapping when the guest hang issue occurs. The read request is completed with QEMU bypassed and wrong status is fetched. The edk2 firmware runs into an infinite loop with the wrong status.
The illegal stage2 mapping is populated due to same page sharing by KSM at (C) even the associated memory slot has been marked as invalid at (B) when the memory slot is requested to be deleted. It's notable that the active and inactive memory slots can't be swapped when we're in the middle of kvm_mmu_notifier_change_pte() because kvm->mn_active_invalidate_count is elevated, and kvm_swap_active_memslots() will busy loop until it reaches to zero again. Besides, the swapping from the active to the inactive memory slots is also avoided by holding &kvm->srcu in __kvm_handle_hva_range(), corresponding to synchronize_srcu_expedited() in kvm_swap_active_memslots().
CPU-A CPU-B ----- ----- ioctl(kvm_fd, KVM_SET_USER_MEMORY_REGION) kvm_vm_ioctl_set_memory_region kvm_set_memory_region __kvm_set_memory_region kvm_set_memslot(kvm, old, NULL, KVM_MR_DELETE) kvm_invalidate_memslot kvm_copy_memslot kvm_replace_memslot kvm_swap_active_memslots (A) kvm_arch_flush_shadow_memslot (B) same page sharing by KSM kvm_mmu_notifier_invalidate_range_start : kvm_mmu_notifier_change_pte kvm_handle_hva_range __kvm_handle_hva_range kvm_set_spte_gfn (C) : kvm_mmu_notifier_invalidate_range_end
Fix the issue by skipping the invalid memory slot at (C) to avoid the illegal stage2 mapping so that the read request for the pflash's status is forwarded to QEMU and emulated by it. In this way, the correct pflash's status can be returned from QEMU to break the infinite loop in the edk2 firmware.
We tried a git-bisect and the first problematic commit is cd4c71835228 (" KVM: arm64: Convert to the gfn-based MMU notifier callbacks"). With this, clean_dcache_guest_page() is called after the memory slots are iterated in kvm_mmu_notifier_change_pte(). clean_dcache_guest_page() is called before the iteration on the memory slots before this commit. This change literally enlarges the racy window between kvm_mmu_notifier_change_pte() and memory slot removal so that we're able to reproduce the issue in a practical test case. However, the issue exists since commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup").
Cc: stable@vger.kernel.org # v3.9+ Fixes: d5d8184d35c9 ("KVM: ARM: Memory virtualization setup") Reported-by: Shuai Hu hshuai@redhat.com Reported-by: Zhenyu Zhang zhenyzha@redhat.com Signed-off-by: Gavin Shan gshan@redhat.com Reviewed-by: David Hildenbrand david@redhat.com Reviewed-by: Oliver Upton oliver.upton@linux.dev Reviewed-by: Peter Xu peterx@redhat.com Reviewed-by: Sean Christopherson seanjc@google.com Reviewed-by: Shaoqin Huang shahuang@redhat.com Message-Id: 20230615054259.14911-1-gshan@redhat.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- virt/kvm/kvm_main.c | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-)
--- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -683,6 +683,24 @@ static __always_inline int kvm_handle_hv
return __kvm_handle_hva_range(kvm, &range); } + +static bool kvm_change_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range) +{ + /* + * Skipping invalid memslots is correct if and only change_pte() is + * surrounded by invalidate_range_{start,end}(), which is currently + * guaranteed by the primary MMU. If that ever changes, KVM needs to + * unmap the memslot instead of skipping the memslot to ensure that KVM + * doesn't hold references to the old PFN. + */ + WARN_ON_ONCE(!READ_ONCE(kvm->mn_active_invalidate_count)); + + if (range->slot->flags & KVM_MEMSLOT_INVALID) + return false; + + return kvm_set_spte_gfn(kvm, range); +} + static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn, struct mm_struct *mm, unsigned long address, @@ -704,7 +722,7 @@ static void kvm_mmu_notifier_change_pte( if (!READ_ONCE(kvm->mmu_invalidate_in_progress)) return;
- kvm_handle_hva_range(mn, address, address + 1, pte, kvm_set_spte_gfn); + kvm_handle_hva_range(mn, address, address + 1, pte, kvm_change_spte_gfn); }
void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start,
From: Lorenzo Stoakes lstoakes@gmail.com
commit 95a301eefa82057571207edd06ea36218985a75e upstream.
In __vmalloc_area_node() we always warn_alloc() when an allocation performed by vm_area_alloc_pages() fails unless it was due to a pending fatal signal.
However, huge page allocations instigated either by vmalloc_huge() or __vmalloc_node_range() (or a caller that invokes this like kvmalloc() or kvmalloc_node()) always falls back to order-0 allocations if the huge page allocation fails.
This renders the warning useless and noisy, especially as all callers appear to be aware that this may fallback. This has already resulted in at least one bug report from a user who was confused by this (see link).
Therefore, simply update the code to only output this warning for order-0 pages when no fatal signal is pending.
Link: https://bugzilla.suse.com/show_bug.cgi?id=1211410 Link: https://lkml.kernel.org/r/20230605201107.83298-1-lstoakes@gmail.com Fixes: 80b1d8fdfad1 ("mm: vmalloc: correct use of __GFP_NOWARN mask in __vmalloc_area_node()") Signed-off-by: Lorenzo Stoakes lstoakes@gmail.com Acked-by: Vlastimil Babka vbabka@suse.cz Reviewed-by: Baoquan He bhe@redhat.com Acked-by: Michal Hocko mhocko@suse.com Reviewed-by: Uladzislau Rezki (Sony) urezki@gmail.com Reviewed-by: David Hildenbrand david@redhat.com Cc: Christoph Hellwig hch@infradead.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/vmalloc.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-)
--- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -3046,11 +3046,20 @@ static void *__vmalloc_area_node(struct * allocation request, free them via vfree() if any. */ if (area->nr_pages != nr_small_pages) { - /* vm_area_alloc_pages() can also fail due to a fatal signal */ - if (!fatal_signal_pending(current)) + /* + * vm_area_alloc_pages() can fail due to insufficient memory but + * also:- + * + * - a pending fatal signal + * - insufficient huge page-order pages + * + * Since we always retry allocations at order-0 in the huge page + * case a warning for either is spurious. + */ + if (!fatal_signal_pending(current) && page_order == 0) warn_alloc(gfp_mask, NULL, - "vmalloc error: size %lu, page order %u, failed to allocate pages", - area->nr_pages * PAGE_SIZE, page_order); + "vmalloc error: size %lu, failed to allocate pages", + area->nr_pages * PAGE_SIZE); goto fail; }
From: Liam R. Howlett Liam.Howlett@oracle.com
commit 77795f900e2a07c1cbedc375789aefb43843b6c2 upstream.
The return of do_mprotect_pkey() can still be incorrectly returned as success if there is a gap that spans to or beyond the end address passed in. Update the check to ensure that the end address has indeed been seen.
Link: https://lore.kernel.org/all/CABi2SkXjN+5iFoBhxk71t3cmunTk-s=rB4T7qo0UQRh17s4... Link: https://lkml.kernel.org/r/20230606182912.586576-1-Liam.Howlett@oracle.com Fixes: 82f951340f25 ("mm/mprotect: fix do_mprotect_pkey() return on error") Signed-off-by: Liam R. Howlett Liam.Howlett@oracle.com Reported-by: Jeff Xu jeffxu@chromium.org Reviewed-by: Lorenzo Stoakes lstoakes@gmail.com Acked-by: David Hildenbrand david@redhat.com Acked-by: Vlastimil Babka vbabka@suse.cz Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/mprotect.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -838,7 +838,7 @@ static int do_mprotect_pkey(unsigned lon } tlb_finish_mmu(&tlb);
- if (!error && vma_iter_end(&vmi) < end) + if (!error && tmp < end) error = -ENOMEM;
out:
From: Dexuan Cui decui@microsoft.com
commit ec97e112985c2581ee61854a4b74f080f6cdfc2c upstream.
Commit 572086325ce9 ("Drivers: hv: vmbus: Cleanup synic memory free path") says "Any memory allocations that succeeded will be freed when the caller cleans up by calling hv_synic_free()", but if the get_zeroed_page() in hv_synic_alloc() fails, currently hv_synic_free() is not really called in vmbus_bus_init(), consequently there will be a memory leak, e.g. hv_context.hv_numa_map is not freed in the error path. Fix this by updating the goto labels.
Cc: stable@kernel.org Signed-off-by: Dexuan Cui decui@microsoft.com Fixes: 4df4cb9e99f8 ("x86/hyperv: Initialize clockevents earlier in CPU onlining") Reviewed-by: Michael Kelley mikelley@microsoft.com Link: https://lore.kernel.org/r/20230504224155.10484-1-decui@microsoft.com Signed-off-by: Wei Liu wei.liu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hv/vmbus_drv.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
--- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -1525,7 +1525,7 @@ static int vmbus_bus_init(void) ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "hyperv/vmbus:online", hv_synic_init, hv_synic_cleanup); if (ret < 0) - goto err_cpuhp; + goto err_alloc; hyperv_cpuhp_online = ret;
ret = vmbus_connect(); @@ -1577,9 +1577,8 @@ static int vmbus_bus_init(void)
err_connect: cpuhp_remove_state(hyperv_cpuhp_online); -err_cpuhp: - hv_synic_free(); err_alloc: + hv_synic_free(); if (vmbus_irq == -1) { hv_remove_vmbus_handler(); } else {
From: Michael Kelley mikelley@microsoft.com
commit 320805ab61e5f1e2a5729ae266e16bec2904050c upstream.
vmbus_wait_for_unload() may be called in the panic path after other CPUs are stopped. vmbus_wait_for_unload() currently loops through online CPUs looking for the UNLOAD response message. But the values of CONFIG_KEXEC_CORE and crash_kexec_post_notifiers affect the path used to stop the other CPUs, and in one of the paths the stopped CPUs are removed from cpu_online_mask. This removal happens in both x86/x64 and arm64 architectures. In such a case, vmbus_wait_for_unload() only checks the panic'ing CPU, and misses the UNLOAD response message except when the panic'ing CPU is CPU 0. vmbus_wait_for_unload() eventually times out, but only after waiting 100 seconds.
Fix this by looping through *present* CPUs in vmbus_wait_for_unload(). The cpu_present_mask is not modified by stopping the other CPUs in the panic path, nor should it be.
Also, in a CoCo VM the synic_message_page is not allocated in hv_synic_alloc(), but is set and cleared in hv_synic_enable_regs() and hv_synic_disable_regs() such that it is set only when the CPU is online. If not all present CPUs are online when vmbus_wait_for_unload() is called, the synic_message_page might be NULL. Add a check for this.
Fixes: cd95aad55793 ("Drivers: hv: vmbus: handle various crash scenarios") Cc: stable@vger.kernel.org Reported-by: John Starks jostarks@microsoft.com Signed-off-by: Michael Kelley mikelley@microsoft.com Reviewed-by: Vitaly Kuznetsov vkuznets@redhat.com Link: https://lore.kernel.org/r/1684422832-38476-1-git-send-email-mikelley@microso... Signed-off-by: Wei Liu wei.liu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hv/channel_mgmt.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-)
--- a/drivers/hv/channel_mgmt.c +++ b/drivers/hv/channel_mgmt.c @@ -829,11 +829,22 @@ static void vmbus_wait_for_unload(void) if (completion_done(&vmbus_connection.unload_event)) goto completed;
- for_each_online_cpu(cpu) { + for_each_present_cpu(cpu) { struct hv_per_cpu_context *hv_cpu = per_cpu_ptr(hv_context.cpu_context, cpu);
+ /* + * In a CoCo VM the synic_message_page is not allocated + * in hv_synic_alloc(). Instead it is set/cleared in + * hv_synic_enable_regs() and hv_synic_disable_regs() + * such that it is set only when the CPU is online. If + * not all present CPUs are online, the message page + * might be NULL, so skip such CPUs. + */ page_addr = hv_cpu->synic_message_page; + if (!page_addr) + continue; + msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
@@ -867,11 +878,14 @@ completed: * maybe-pending messages on all CPUs to be able to receive new * messages after we reconnect. */ - for_each_online_cpu(cpu) { + for_each_present_cpu(cpu) { struct hv_per_cpu_context *hv_cpu = per_cpu_ptr(hv_context.cpu_context, cpu);
page_addr = hv_cpu->synic_message_page; + if (!page_addr) + continue; + msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT; msg->header.message_type = HVMSG_NONE; }
From: Dexuan Cui decui@microsoft.com
commit 440b5e3663271b0ffbd4908115044a6a51fb938b upstream.
Since day 1 of the driver, there has been a race between hv_pci_query_relations() and survey_child_resources(): during fast device hotplug, hv_pci_query_relations() may error out due to device-remove and the stack variable 'comp' is no longer valid; however, pci_devices_present_work() -> survey_child_resources() -> complete() may be running on another CPU and accessing the no-longer-valid 'comp'. Fix the race by flushing the workqueue before we exit from hv_pci_query_relations().
Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs") Signed-off-by: Dexuan Cui decui@microsoft.com Reviewed-by: Michael Kelley mikelley@microsoft.com Acked-by: Lorenzo Pieralisi lpieralisi@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230615044451.5580-2-decui@microsoft.com Signed-off-by: Wei Liu wei.liu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/controller/pci-hyperv.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+)
--- a/drivers/pci/controller/pci-hyperv.c +++ b/drivers/pci/controller/pci-hyperv.c @@ -3308,6 +3308,24 @@ static int hv_pci_query_relations(struct if (!ret) ret = wait_for_response(hdev, &comp);
+ /* + * In the case of fast device addition/removal, it's possible that + * vmbus_sendpacket() or wait_for_response() returns -ENODEV but we + * already got a PCI_BUS_RELATIONS* message from the host and the + * channel callback already scheduled a work to hbus->wq, which can be + * running pci_devices_present_work() -> survey_child_resources() -> + * complete(&hbus->survey_event), even after hv_pci_query_relations() + * exits and the stack variable 'comp' is no longer valid; as a result, + * a hang or a page fault may happen when the complete() calls + * raw_spin_lock_irqsave(). Flush hbus->wq before we exit from + * hv_pci_query_relations() to avoid the issues. Note: if 'ret' is + * -ENODEV, there can't be any more work item scheduled to hbus->wq + * after the flush_workqueue(): see vmbus_onoffer_rescind() -> + * vmbus_reset_channel_cb(), vmbus_rescind_cleanup() -> + * channel->rescind = true. + */ + flush_workqueue(hbus->wq); + return ret; }
From: Dexuan Cui decui@microsoft.com
commit a847234e24d03d01a9566d1d9dcce018cc018d67 upstream.
This reverts commit d6af2ed29c7c1c311b96dac989dcb991e90ee195.
The statement "the hv_pci_bus_exit() call releases structures of all its child devices" in commit d6af2ed29c7c is not true: in the path hv_pci_probe() -> hv_pci_enter_d0() -> hv_pci_bus_exit(hdev, true): the parameter "keep_devs" is true, so hv_pci_bus_exit() does *not* release the child "struct hv_pci_dev *hpdev" that is created earlier in pci_devices_present_work() -> new_pcichild_device().
The commit d6af2ed29c7c was originally made in July 2020 for RHEL 7.7, where the old version of hv_pci_bus_exit() was used; when the commit was rebased and merged into the upstream, people didn't notice that it's not really necessary. The commit itself doesn't cause any issue, but it makes hv_pci_probe() more complicated. Revert it to facilitate some upcoming changes to hv_pci_probe().
Signed-off-by: Dexuan Cui decui@microsoft.com Reviewed-by: Michael Kelley mikelley@microsoft.com Acked-by: Wei Hu weh@microsoft.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230615044451.5580-5-decui@microsoft.com Signed-off-by: Wei Liu wei.liu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/controller/pci-hyperv.c | 71 +++++++++++++++++------------------- 1 file changed, 34 insertions(+), 37 deletions(-)
--- a/drivers/pci/controller/pci-hyperv.c +++ b/drivers/pci/controller/pci-hyperv.c @@ -3238,8 +3238,10 @@ static int hv_pci_enter_d0(struct hv_dev struct pci_bus_d0_entry *d0_entry; struct hv_pci_compl comp_pkt; struct pci_packet *pkt; + bool retry = true; int ret;
+enter_d0_retry: /* * Tell the host that the bus is ready to use, and moved into the * powered-on state. This includes telling the host which region @@ -3266,6 +3268,38 @@ static int hv_pci_enter_d0(struct hv_dev if (ret) goto exit;
+ /* + * In certain case (Kdump) the pci device of interest was + * not cleanly shut down and resource is still held on host + * side, the host could return invalid device status. + * We need to explicitly request host to release the resource + * and try to enter D0 again. + */ + if (comp_pkt.completion_status < 0 && retry) { + retry = false; + + dev_err(&hdev->device, "Retrying D0 Entry\n"); + + /* + * Hv_pci_bus_exit() calls hv_send_resource_released() + * to free up resources of its child devices. + * In the kdump kernel we need to set the + * wslot_res_allocated to 255 so it scans all child + * devices to release resources allocated in the + * normal kernel before panic happened. + */ + hbus->wslot_res_allocated = 255; + + ret = hv_pci_bus_exit(hdev, true); + + if (ret == 0) { + kfree(pkt); + goto enter_d0_retry; + } + dev_err(&hdev->device, + "Retrying D0 failed with ret %d\n", ret); + } + if (comp_pkt.completion_status < 0) { dev_err(&hdev->device, "PCI Pass-through VSP failed D0 Entry with status %x\n", @@ -3511,7 +3545,6 @@ static int hv_pci_probe(struct hv_device struct hv_pcibus_device *hbus; u16 dom_req, dom; char *name; - bool enter_d0_retry = true; int ret;
/* @@ -3651,47 +3684,11 @@ static int hv_pci_probe(struct hv_device if (ret) goto free_fwnode;
-retry: ret = hv_pci_query_relations(hdev); if (ret) goto free_irq_domain;
ret = hv_pci_enter_d0(hdev); - /* - * In certain case (Kdump) the pci device of interest was - * not cleanly shut down and resource is still held on host - * side, the host could return invalid device status. - * We need to explicitly request host to release the resource - * and try to enter D0 again. - * Since the hv_pci_bus_exit() call releases structures - * of all its child devices, we need to start the retry from - * hv_pci_query_relations() call, requesting host to send - * the synchronous child device relations message before this - * information is needed in hv_send_resources_allocated() - * call later. - */ - if (ret == -EPROTO && enter_d0_retry) { - enter_d0_retry = false; - - dev_err(&hdev->device, "Retrying D0 Entry\n"); - - /* - * Hv_pci_bus_exit() calls hv_send_resources_released() - * to free up resources of its child devices. - * In the kdump kernel we need to set the - * wslot_res_allocated to 255 so it scans all child - * devices to release resources allocated in the - * normal kernel before panic happened. - */ - hbus->wslot_res_allocated = 255; - ret = hv_pci_bus_exit(hdev, true); - - if (ret == 0) - goto retry; - - dev_err(&hdev->device, - "Retrying D0 failed with ret %d\n", ret); - } if (ret) goto free_irq_domain;
From: Dexuan Cui decui@microsoft.com
commit add9195e69c94b32e96f78c2f9cea68f0e850b3f upstream.
The hpdev->state is never really useful. The only use in hv_pci_eject_device() and hv_eject_device_work() is not really necessary.
Signed-off-by: Dexuan Cui decui@microsoft.com Reviewed-by: Michael Kelley mikelley@microsoft.com Acked-by: Lorenzo Pieralisi lpieralisi@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230615044451.5580-4-decui@microsoft.com Signed-off-by: Wei Liu wei.liu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/controller/pci-hyperv.c | 12 ------------ 1 file changed, 12 deletions(-)
--- a/drivers/pci/controller/pci-hyperv.c +++ b/drivers/pci/controller/pci-hyperv.c @@ -553,19 +553,10 @@ struct hv_dr_state { struct hv_pcidev_description func[]; };
-enum hv_pcichild_state { - hv_pcichild_init = 0, - hv_pcichild_requirements, - hv_pcichild_resourced, - hv_pcichild_ejecting, - hv_pcichild_maximum -}; - struct hv_pci_dev { /* List protected by pci_rescan_remove_lock */ struct list_head list_entry; refcount_t refs; - enum hv_pcichild_state state; struct pci_slot *pci_slot; struct hv_pcidev_description desc; bool reported_missing; @@ -2751,8 +2742,6 @@ static void hv_eject_device_work(struct hpdev = container_of(work, struct hv_pci_dev, wrk); hbus = hpdev->hbus;
- WARN_ON(hpdev->state != hv_pcichild_ejecting); - /* * Ejection can come before or after the PCI bus has been set up, so * attempt to find it and tear down the bus state, if it exists. This @@ -2809,7 +2798,6 @@ static void hv_pci_eject_device(struct h return; }
- hpdev->state = hv_pcichild_ejecting; get_pcichild(hpdev); INIT_WORK(&hpdev->wrk, hv_eject_device_work); queue_work(hbus->wq, &hpdev->wrk);
From: Dexuan Cui decui@microsoft.com
commit 2738d5ab7929a845b654cd171a1e275c37eb428e upstream.
When the host tries to remove a PCI device, the host first sends a PCI_EJECT message to the guest, and the guest is supposed to gracefully remove the PCI device and send a PCI_EJECTION_COMPLETE message to the host; the host then sends a VMBus message CHANNELMSG_RESCIND_CHANNELOFFER to the guest (when the guest receives this message, the device is already unassigned from the guest) and the guest can do some final cleanup work; if the guest fails to respond to the PCI_EJECT message within one minute, the host sends the VMBus message CHANNELMSG_RESCIND_CHANNELOFFER and removes the PCI device forcibly.
In the case of fast device addition/removal, it's possible that the PCI device driver is still configuring MSI-X interrupts when the guest receives the PCI_EJECT message; the channel callback calls hv_pci_eject_device(), which sets hpdev->state to hv_pcichild_ejecting, and schedules a work hv_eject_device_work(); if the PCI device driver is calling pci_alloc_irq_vectors() -> ... -> hv_compose_msi_msg(), we can break the while loop in hv_compose_msi_msg() due to the updated hpdev->state, and leave data->chip_data with its default value of NULL; later, when the PCI device driver calls request_irq() -> ... -> hv_irq_unmask(), the guest crashes in hv_arch_irq_unmask() due to data->chip_data being NULL.
Fix the issue by not testing hpdev->state in the while loop: when the guest receives PCI_EJECT, the device is still assigned to the guest, and the guest has one minute to finish the device removal gracefully. We don't really need to (and we should not) test hpdev->state in the loop.
Fixes: de0aa7b2f97d ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()") Signed-off-by: Dexuan Cui decui@microsoft.com Reviewed-by: Michael Kelley mikelley@microsoft.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230615044451.5580-3-decui@microsoft.com Signed-off-by: Wei Liu wei.liu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/controller/pci-hyperv.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
--- a/drivers/pci/controller/pci-hyperv.c +++ b/drivers/pci/controller/pci-hyperv.c @@ -634,6 +634,11 @@ static void hv_arch_irq_unmask(struct ir pbus = pdev->bus; hbus = container_of(pbus->sysdata, struct hv_pcibus_device, sysdata); int_desc = data->chip_data; + if (!int_desc) { + dev_warn(&hbus->hdev->device, "%s() can not unmask irq %u\n", + __func__, data->irq); + return; + }
spin_lock_irqsave(&hbus->retarget_msi_interrupt_lock, flags);
@@ -1902,12 +1907,6 @@ static void hv_compose_msi_msg(struct ir hv_pci_onchannelcallback(hbus); spin_unlock_irqrestore(&channel->sched_lock, flags);
- if (hpdev->state == hv_pcichild_ejecting) { - dev_err_once(&hbus->hdev->device, - "the device is being ejected\n"); - goto enable_tasklet; - } - udelay(100); }
From: Dexuan Cui decui@microsoft.com
commit 067d6ec7ed5b49380688e06c1e5f883a71bef4fe upstream.
In the case of fast device addition/removal, it's possible that hv_eject_device_work() can start to run before create_root_hv_pci_bus() starts to run; as a result, the pci_get_domain_bus_and_slot() in hv_eject_device_work() can return a 'pdev' of NULL, and hv_eject_device_work() can remove the 'hpdev', and immediately send a message PCI_EJECTION_COMPLETE to the host, and the host immediately unassigns the PCI device from the guest; meanwhile, create_root_hv_pci_bus() and the PCI device driver can be probing the dead PCI device and reporting timeout errors.
Fix the issue by adding a per-bus mutex 'state_lock' and grabbing the mutex before powering on the PCI bus in hv_pci_enter_d0(): when hv_eject_device_work() starts to run, it's able to find the 'pdev' and call pci_stop_and_remove_bus_device(pdev): if the PCI device driver has loaded, the PCI device driver's probe() function is already called in create_root_hv_pci_bus() -> pci_bus_add_devices(), and now hv_eject_device_work() -> pci_stop_and_remove_bus_device() is able to call the PCI device driver's remove() function and remove the device reliably; if the PCI device driver hasn't loaded yet, the function call hv_eject_device_work() -> pci_stop_and_remove_bus_device() is able to remove the PCI device reliably and the PCI device driver's probe() function won't be called; if the PCI device driver's probe() is already running (e.g., systemd-udev is loading the PCI device driver), it must be holding the per-device lock, and after the probe() finishes and releases the lock, hv_eject_device_work() -> pci_stop_and_remove_bus_device() is able to proceed to remove the device reliably.
Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs") Signed-off-by: Dexuan Cui decui@microsoft.com Reviewed-by: Michael Kelley mikelley@microsoft.com Acked-by: Lorenzo Pieralisi lpieralisi@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230615044451.5580-6-decui@microsoft.com Signed-off-by: Wei Liu wei.liu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/controller/pci-hyperv.c | 29 ++++++++++++++++++++++++++--- 1 file changed, 26 insertions(+), 3 deletions(-)
--- a/drivers/pci/controller/pci-hyperv.c +++ b/drivers/pci/controller/pci-hyperv.c @@ -489,7 +489,10 @@ struct hv_pcibus_device { struct fwnode_handle *fwnode; /* Protocol version negotiated with the host */ enum pci_protocol_version_t protocol_version; + + struct mutex state_lock; enum hv_pcibus_state state; + struct hv_device *hdev; resource_size_t low_mmio_space; resource_size_t high_mmio_space; @@ -2512,6 +2515,8 @@ static void pci_devices_present_work(str if (!dr) return;
+ mutex_lock(&hbus->state_lock); + /* First, mark all existing children as reported missing. */ spin_lock_irqsave(&hbus->device_list_lock, flags); list_for_each_entry(hpdev, &hbus->children, list_entry) { @@ -2593,6 +2598,8 @@ static void pci_devices_present_work(str break; }
+ mutex_unlock(&hbus->state_lock); + kfree(dr); }
@@ -2741,6 +2748,8 @@ static void hv_eject_device_work(struct hpdev = container_of(work, struct hv_pci_dev, wrk); hbus = hpdev->hbus;
+ mutex_lock(&hbus->state_lock); + /* * Ejection can come before or after the PCI bus has been set up, so * attempt to find it and tear down the bus state, if it exists. This @@ -2777,6 +2786,8 @@ static void hv_eject_device_work(struct put_pcichild(hpdev); put_pcichild(hpdev); /* hpdev has been freed. Do not use it any more. */ + + mutex_unlock(&hbus->state_lock); }
/** @@ -3567,6 +3578,7 @@ static int hv_pci_probe(struct hv_device return -ENOMEM;
hbus->bridge = bridge; + mutex_init(&hbus->state_lock); hbus->state = hv_pcibus_init; hbus->wslot_res_allocated = -1;
@@ -3675,9 +3687,11 @@ static int hv_pci_probe(struct hv_device if (ret) goto free_irq_domain;
+ mutex_lock(&hbus->state_lock); + ret = hv_pci_enter_d0(hdev); if (ret) - goto free_irq_domain; + goto release_state_lock;
ret = hv_pci_allocate_bridge_windows(hbus); if (ret) @@ -3695,12 +3709,15 @@ static int hv_pci_probe(struct hv_device if (ret) goto free_windows;
+ mutex_unlock(&hbus->state_lock); return 0;
free_windows: hv_pci_free_bridge_windows(hbus); exit_d0: (void) hv_pci_bus_exit(hdev, true); +release_state_lock: + mutex_unlock(&hbus->state_lock); free_irq_domain: irq_domain_remove(hbus->irq_domain); free_fwnode: @@ -3950,20 +3967,26 @@ static int hv_pci_resume(struct hv_devic if (ret) goto out;
+ mutex_lock(&hbus->state_lock); + ret = hv_pci_enter_d0(hdev); if (ret) - goto out; + goto release_state_lock;
ret = hv_send_resources_allocated(hdev); if (ret) - goto out; + goto release_state_lock;
prepopulate_bars(hbus);
hv_pci_restore_msi_state(hbus);
hbus->state = hv_pcibus_installed; + mutex_unlock(&hbus->state_lock); return 0; + +release_state_lock: + mutex_unlock(&hbus->state_lock); out: vmbus_close(hdev->channel); return ret;
From: Jens Axboe axboe@kernel.dk
commit b1dc492087db0f2e5a45f1072a743d04618dd6be upstream.
If we have cmsg attached AND we transferred partial data at least, clear msg_controllen on retry so we don't attempt to send that again.
Cc: stable@vger.kernel.org # 5.10+ Fixes: cac9e4418f4c ("io_uring/net: save msghdr->msg_control for retries") Reported-by: Stefan Metzmacher metze@samba.org Reviewed-by: Stefan Metzmacher metze@samba.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- io_uring/net.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/io_uring/net.c +++ b/io_uring/net.c @@ -326,6 +326,8 @@ int io_sendmsg(struct io_kiocb *req, uns if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK)) return io_setup_async_msg(req, kmsg, issue_flags); if (ret > 0 && io_net_retry(sock, flags)) { + kmsg->msg.msg_controllen = 0; + kmsg->msg.msg_control = NULL; sr->done_io += ret; req->flags |= REQ_F_PARTIAL_IO; return io_setup_async_msg(req, kmsg, issue_flags);
From: Jens Axboe axboe@kernel.dk
commit 78d0d2063bab954d19a1696feae4c7706a626d48 upstream.
We cannot sanely handle partial retries for recvmsg if we have cmsg attached. If we don't, then we'd just be overwriting the initial cmsg header on retries. Alternatively we could increment and handle this appropriately, but it doesn't seem worth the complication.
Move the MSG_WAITALL check into the non-multishot case while at it, since MSG_WAITALL is explicitly disabled for multishot anyway.
Link: https://lore.kernel.org/io-uring/0b0d4411-c8fd-4272-770b-e030af6919a0@kernel... Cc: stable@vger.kernel.org # 5.10+ Reported-by: Stefan Metzmacher metze@samba.org Reviewed-by: Stefan Metzmacher metze@samba.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- io_uring/net.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
--- a/io_uring/net.c +++ b/io_uring/net.c @@ -789,16 +789,19 @@ retry_multishot: flags = sr->msg_flags; if (force_nonblock) flags |= MSG_DONTWAIT; - if (flags & MSG_WAITALL) - min_ret = iov_iter_count(&kmsg->msg.msg_iter);
kmsg->msg.msg_get_inq = 1; - if (req->flags & REQ_F_APOLL_MULTISHOT) + if (req->flags & REQ_F_APOLL_MULTISHOT) { ret = io_recvmsg_multishot(sock, sr, kmsg, flags, &mshot_finished); - else + } else { + /* disable partial retry for recvmsg with cmsg attached */ + if (flags & MSG_WAITALL && !kmsg->msg.msg_controllen) + min_ret = iov_iter_count(&kmsg->msg.msg_iter); + ret = __sys_recvmsg_sock(sock, &kmsg->msg, sr->umsg, kmsg->uaddr, flags); + }
if (ret < min_ret) { if (ret == -EAGAIN && force_nonblock) {
From: Paolo Abeni pabeni@redhat.com
commit c2b2ae3925b65070adb27d5a31a31c376f26dec7 upstream.
Currently the mptcp code has assumes that disconnect() can fail only at mptcp_sendmsg_fastopen() time - to avoid a deadlock scenario - and don't even bother returning an error code.
Soon mptcp_disconnect() will handle more error conditions: let's track them explicitly.
As a bonus, explicitly annotate TCP-level disconnect as not failing: the mptcp code never blocks for event on the subflows.
Fixes: 7d803344fdc3 ("mptcp: fix deadlock in fastopen error path") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Tested-by: Christoph Paasch cpaasch@apple.com Reviewed-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/protocol.c | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-)
--- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -1696,7 +1696,13 @@ static int mptcp_sendmsg_fastopen(struct if (ret && ret != -EINPROGRESS && ret != -ERESTARTSYS && ret != -EINTR) *copied_syn = 0; } else if (ret && ret != -EINPROGRESS) { - mptcp_disconnect(sk, 0); + /* The disconnect() op called by tcp_sendmsg_fastopen()/ + * __inet_stream_connect() can fail, due to looking check, + * see mptcp_disconnect(). + * Attempt it again outside the problematic scope. + */ + if (!mptcp_disconnect(sk, 0)) + sk->sk_socket->state = SS_UNCONNECTED; }
return ret; @@ -2360,7 +2366,10 @@ static void __mptcp_close_ssk(struct soc
need_push = (flags & MPTCP_CF_PUSH) && __mptcp_retransmit_pending_data(sk); if (!dispose_it) { - tcp_disconnect(ssk, 0); + /* The MPTCP code never wait on the subflow sockets, TCP-level + * disconnect should never fail + */ + WARN_ON_ONCE(tcp_disconnect(ssk, 0)); msk->subflow->state = SS_UNCONNECTED; mptcp_subflow_ctx_reset(subflow); release_sock(ssk); @@ -2787,7 +2796,7 @@ void mptcp_subflow_shutdown(struct sock break; fallthrough; case TCP_SYN_SENT: - tcp_disconnect(ssk, O_NONBLOCK); + WARN_ON_ONCE(tcp_disconnect(ssk, O_NONBLOCK)); break; default: if (__mptcp_check_fallback(mptcp_sk(sk))) { @@ -3047,11 +3056,10 @@ static int mptcp_disconnect(struct sock
/* We are on the fastopen error path. We can't call straight into the * subflows cleanup code due to lock nesting (we are already under - * msk->firstsocket lock). Do nothing and leave the cleanup to the - * caller. + * msk->firstsocket lock). */ if (msk->fastopening) - return 0; + return -EBUSY;
mptcp_listen_inuse_dec(sk); inet_sk_state_store(sk, TCP_CLOSE);
From: Paolo Abeni pabeni@redhat.com
commit 0ad529d9fd2bfa3fc619552a8d2fb2f2ef0bce2e upstream.
Christoph reported a divide by zero bug in mptcp_recvmsg():
divide error: 0000 [#1] PREEMPT SMP CPU: 1 PID: 19978 Comm: syz-executor.6 Not tainted 6.4.0-rc2-gffcc7899081b #20 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014 RIP: 0010:__tcp_select_window+0x30e/0x420 net/ipv4/tcp_output.c:3018 Code: 11 ff 0f b7 cd c1 e9 0c b8 ff ff ff ff d3 e0 89 c1 f7 d1 01 cb 21 c3 eb 17 e8 2e 83 11 ff 31 db eb 0e e8 25 83 11 ff 89 d8 99 <f7> 7c 24 04 29 d3 65 48 8b 04 25 28 00 00 00 48 3b 44 24 10 75 60 RSP: 0018:ffffc90000a07a18 EFLAGS: 00010246 RAX: 000000000000ffd7 RBX: 000000000000ffd7 RCX: 0000000000040000 RDX: 0000000000000000 RSI: 000000000003ffff RDI: 0000000000040000 RBP: 000000000000ffd7 R08: ffffffff820cf297 R09: 0000000000000001 R10: 0000000000000000 R11: ffffffff8103d1a0 R12: 0000000000003f00 R13: 0000000000300000 R14: ffff888101cf3540 R15: 0000000000180000 FS: 00007f9af4c09640(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b33824000 CR3: 000000012f241001 CR4: 0000000000170ee0 Call Trace: <TASK> __tcp_cleanup_rbuf+0x138/0x1d0 net/ipv4/tcp.c:1611 mptcp_recvmsg+0xcb8/0xdd0 net/mptcp/protocol.c:2034 inet_recvmsg+0x127/0x1f0 net/ipv4/af_inet.c:861 ____sys_recvmsg+0x269/0x2b0 net/socket.c:1019 ___sys_recvmsg+0xe6/0x260 net/socket.c:2764 do_recvmmsg+0x1a5/0x470 net/socket.c:2858 __do_sys_recvmmsg net/socket.c:2937 [inline] __se_sys_recvmmsg net/socket.c:2953 [inline] __x64_sys_recvmmsg+0xa6/0x130 net/socket.c:2953 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7f9af58fc6a9 Code: 5c c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4f 37 0d 00 f7 d8 64 89 01 48 RSP: 002b:00007f9af4c08cd8 EFLAGS: 00000246 ORIG_RAX: 000000000000012b RAX: ffffffffffffffda RBX: 00000000006bc050 RCX: 00007f9af58fc6a9 RDX: 0000000000000001 RSI: 0000000020000140 RDI: 0000000000000004 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000f00 R11: 0000000000000246 R12: 00000000006bc05c R13: fffffffffffffea8 R14: 00000000006bc050 R15: 000000000001fe40 </TASK>
mptcp_recvmsg is allowed to release the msk socket lock when blocking, and before re-acquiring it another thread could have switched the sock to TCP_LISTEN status - with a prior connect(AF_UNSPEC) - also clearing icsk_ack.rcv_mss.
Address the issue preventing the disconnect if some other process is concurrently performing a blocking syscall on the same socket, alike commit 4faeee0cf8a5 ("tcp: deny tcp_disconnect() when threads are waiting").
Fixes: a6b118febbab ("mptcp: add receive buffer auto-tuning") Cc: stable@vger.kernel.org Reported-by: Christoph Paasch cpaasch@apple.com Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/404 Signed-off-by: Paolo Abeni pabeni@redhat.com Tested-by: Christoph Paasch cpaasch@apple.com Reviewed-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/protocol.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -3054,6 +3054,12 @@ static int mptcp_disconnect(struct sock { struct mptcp_sock *msk = mptcp_sk(sk);
+ /* Deny disconnect if other threads are blocked in sk_wait_event() + * or inet_wait_for_connect(). + */ + if (sk->sk_wait_pending) + return -EBUSY; + /* We are on the fastopen error path. We can't call straight into the * subflows cleanup code due to lock nesting (we are already under * msk->firstsocket lock). @@ -3120,6 +3126,7 @@ struct sock *mptcp_sk_clone_init(const s inet_sk(nsk)->pinet6 = mptcp_inet6_sk(nsk); #endif
+ nsk->sk_wait_pending = 0; __mptcp_init_sock(nsk);
msk = mptcp_sk(nsk);
From: Paolo Abeni pabeni@redhat.com
commit 56a666c48b038e91b76471289e2cf60c79d326b9 upstream.
At passive MPJ time, if the msk socket lock is held by the user, the new subflow is appended to the msk->join_list under the msk data lock.
In mptcp_release_cb()/__mptcp_flush_join_list(), the subflows in that list are moved from the join_list into the conn_list under the msk socket lock.
Append and removal could race, possibly corrupting such list. Address the issue splicing the join list into a temporary one while still under the msk data lock.
Found by code inspection, the race itself should be almost impossible to trigger in practice.
Fixes: 3e5014909b56 ("mptcp: cleanup MPJ subflow list handling") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/protocol.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)
--- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -837,12 +837,12 @@ static bool __mptcp_finish_join(struct m return true; }
-static void __mptcp_flush_join_list(struct sock *sk) +static void __mptcp_flush_join_list(struct sock *sk, struct list_head *join_list) { struct mptcp_subflow_context *tmp, *subflow; struct mptcp_sock *msk = mptcp_sk(sk);
- list_for_each_entry_safe(subflow, tmp, &msk->join_list, node) { + list_for_each_entry_safe(subflow, tmp, join_list, node) { struct sock *ssk = mptcp_subflow_tcp_sock(subflow); bool slow = lock_sock_fast(ssk);
@@ -3314,9 +3314,14 @@ static void mptcp_release_cb(struct sock for (;;) { unsigned long flags = (msk->cb_flags & MPTCP_FLAGS_PROCESS_CTX_NEED) | msk->push_pending; + struct list_head join_list; + if (!flags) break;
+ INIT_LIST_HEAD(&join_list); + list_splice_init(&msk->join_list, &join_list); + /* the following actions acquire the subflow socket lock * * 1) can't be invoked in atomic scope @@ -3327,8 +3332,9 @@ static void mptcp_release_cb(struct sock msk->push_pending = 0; msk->cb_flags &= ~flags; spin_unlock_bh(&sk->sk_lock.slock); + if (flags & BIT(MPTCP_FLUSH_JOIN_LIST)) - __mptcp_flush_join_list(sk); + __mptcp_flush_join_list(sk, &join_list); if (flags & BIT(MPTCP_PUSH_PENDING)) __mptcp_push_pending(sk, 0); if (flags & BIT(MPTCP_RETRANSMIT))
From: Paolo Abeni pabeni@redhat.com
commit 81c1d029016001f994ce1c46849c5e9900d8eab8 upstream.
An orphaned msk releases the used resources via the worker, when the latter first see the msk in CLOSED status.
If the msk status transitions to TCP_CLOSE in the release callback invoked by the worker's final release_sock(), such instance of the workqueue will not take any action.
Additionally the MPTCP code prevents scheduling the worker once the socket reaches the CLOSE status: such msk resources will be leaked.
The only code path that can trigger the above scenario is the __mptcp_check_send_data_fin() in fallback mode.
Address the issue removing the special handling of fallback socket in __mptcp_check_send_data_fin(), consolidating the state machine for fallback and non fallback socket.
Since non-fallback sockets do not send and do not receive data_fin, the mptcp code can update the msk internal status to match the next step in the SM every time data fin (ack) should be generated or received.
As a consequence we can remove a bunch of checks for fallback from the fastpath.
Fixes: 6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/protocol.c | 41 +++++++++++++++-------------------------- net/mptcp/subflow.c | 17 ++++++++++------- 2 files changed, 25 insertions(+), 33 deletions(-)
--- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -44,7 +44,7 @@ enum { static struct percpu_counter mptcp_sockets_allocated ____cacheline_aligned_in_smp;
static void __mptcp_destroy_sock(struct sock *sk); -static void __mptcp_check_send_data_fin(struct sock *sk); +static void mptcp_check_send_data_fin(struct sock *sk);
DEFINE_PER_CPU(struct mptcp_delegated_action, mptcp_delegated_actions); static struct net_device mptcp_napi_dev; @@ -411,8 +411,7 @@ static bool mptcp_pending_data_fin_ack(s { struct mptcp_sock *msk = mptcp_sk(sk);
- return !__mptcp_check_fallback(msk) && - ((1 << sk->sk_state) & + return ((1 << sk->sk_state) & (TCPF_FIN_WAIT1 | TCPF_CLOSING | TCPF_LAST_ACK)) && msk->write_seq == READ_ONCE(msk->snd_una); } @@ -570,9 +569,6 @@ static bool mptcp_check_data_fin(struct u64 rcv_data_fin_seq; bool ret = false;
- if (__mptcp_check_fallback(msk)) - return ret; - /* Need to ack a DATA_FIN received from a peer while this side * of the connection is in ESTABLISHED, FIN_WAIT1, or FIN_WAIT2. * msk->rcv_data_fin was set when parsing the incoming options @@ -610,7 +606,8 @@ static bool mptcp_check_data_fin(struct }
ret = true; - mptcp_send_ack(msk); + if (!__mptcp_check_fallback(msk)) + mptcp_send_ack(msk); mptcp_close_wake_up(sk); } return ret; @@ -1596,7 +1593,7 @@ out: if (!mptcp_timer_pending(sk)) mptcp_reset_timer(sk); if (do_check_data_fin) - __mptcp_check_send_data_fin(sk); + mptcp_check_send_data_fin(sk); }
static void __mptcp_subflow_push_pending(struct sock *sk, struct sock *ssk, bool first) @@ -2651,8 +2648,6 @@ static void mptcp_worker(struct work_str if (unlikely((1 << state) & (TCPF_CLOSE | TCPF_LISTEN))) goto unlock;
- mptcp_check_data_fin_ack(sk); - mptcp_check_fastclose(msk);
mptcp_pm_nl_work(msk); @@ -2660,7 +2655,8 @@ static void mptcp_worker(struct work_str if (test_and_clear_bit(MPTCP_WORK_EOF, &msk->flags)) mptcp_check_for_eof(msk);
- __mptcp_check_send_data_fin(sk); + mptcp_check_send_data_fin(sk); + mptcp_check_data_fin_ack(sk); mptcp_check_data_fin(sk);
if (test_and_clear_bit(MPTCP_WORK_CLOSE_SUBFLOW, &msk->flags)) @@ -2803,6 +2799,12 @@ void mptcp_subflow_shutdown(struct sock pr_debug("Fallback"); ssk->sk_shutdown |= how; tcp_shutdown(ssk, how); + + /* simulate the data_fin ack reception to let the state + * machine move forward + */ + WRITE_ONCE(mptcp_sk(sk)->snd_una, mptcp_sk(sk)->snd_nxt); + mptcp_schedule_work(sk); } else { pr_debug("Sending DATA_FIN on subflow %p", ssk); tcp_send_ack(ssk); @@ -2842,7 +2844,7 @@ static int mptcp_close_state(struct sock return next & TCP_ACTION_FIN; }
-static void __mptcp_check_send_data_fin(struct sock *sk) +static void mptcp_check_send_data_fin(struct sock *sk) { struct mptcp_subflow_context *subflow; struct mptcp_sock *msk = mptcp_sk(sk); @@ -2860,19 +2862,6 @@ static void __mptcp_check_send_data_fin(
WRITE_ONCE(msk->snd_nxt, msk->write_seq);
- /* fallback socket will not get data_fin/ack, can move to the next - * state now - */ - if (__mptcp_check_fallback(msk)) { - WRITE_ONCE(msk->snd_una, msk->write_seq); - if ((1 << sk->sk_state) & (TCPF_CLOSING | TCPF_LAST_ACK)) { - inet_sk_state_store(sk, TCP_CLOSE); - mptcp_close_wake_up(sk); - } else if (sk->sk_state == TCP_FIN_WAIT1) { - inet_sk_state_store(sk, TCP_FIN_WAIT2); - } - } - mptcp_for_each_subflow(msk, subflow) { struct sock *tcp_sk = mptcp_subflow_tcp_sock(subflow);
@@ -2892,7 +2881,7 @@ static void __mptcp_wr_shutdown(struct s WRITE_ONCE(msk->write_seq, msk->write_seq + 1); WRITE_ONCE(msk->snd_data_fin_enable, 1);
- __mptcp_check_send_data_fin(sk); + mptcp_check_send_data_fin(sk); }
static void __mptcp_destroy_sock(struct sock *sk) --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -1749,14 +1749,16 @@ static void subflow_state_change(struct { struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); struct sock *parent = subflow->conn; + struct mptcp_sock *msk;
__subflow_state_change(sk);
+ msk = mptcp_sk(parent); if (subflow_simultaneous_connect(sk)) { mptcp_propagate_sndbuf(parent, sk); mptcp_do_fallback(sk); - mptcp_rcv_space_init(mptcp_sk(parent), sk); - pr_fallback(mptcp_sk(parent)); + mptcp_rcv_space_init(msk, sk); + pr_fallback(msk); subflow->conn_finished = 1; mptcp_set_connected(parent); } @@ -1772,11 +1774,12 @@ static void subflow_state_change(struct
subflow_sched_work_if_closed(mptcp_sk(parent), sk);
- if (__mptcp_check_fallback(mptcp_sk(parent)) && - !subflow->rx_eof && subflow_is_done(sk)) { - subflow->rx_eof = 1; - mptcp_subflow_eof(parent); - } + /* when the fallback subflow closes the rx side, trigger a 'dummy' + * ingress data fin, so that the msk state will follow along + */ + if (__mptcp_check_fallback(msk) && subflow_is_done(sk) && msk->first == sk && + mptcp_update_rcv_data_fin(msk, READ_ONCE(msk->ack_seq), true)) + mptcp_schedule_work(parent); }
void mptcp_subflow_queue_clean(struct sock *listener_sk, struct sock *listener_ssk)
From: Paolo Abeni pabeni@redhat.com
commit 57fc0f1ceaa4016354cf6f88533e20b56190e41a upstream.
The MPTCP protocol access the listener subflow in a lockless manner in a couple of places (poll, diag). That works only if the msk itself leaves the listener status only after that the subflow itself has been closed/disconnected. Otherwise we risk deadlock in diag, as reported by Christoph.
Address the issue ensuring that the first subflow (the listener one) is always disconnected before updating the msk socket status.
Reported-by: Christoph Paasch cpaasch@apple.com Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/407 Fixes: b29fcfb54cd7 ("mptcp: full disconnect implementation") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/pm_netlink.c | 1 + net/mptcp/protocol.c | 31 +++++++++++++++++++------------ 2 files changed, 20 insertions(+), 12 deletions(-)
--- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -1047,6 +1047,7 @@ static int mptcp_pm_nl_create_listen_soc if (err) return err;
+ inet_sk_state_store(newsk, TCP_LISTEN); err = kernel_listen(ssock, backlog); if (err) return err; --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -2385,13 +2385,6 @@ static void __mptcp_close_ssk(struct soc kfree_rcu(subflow, rcu); } else { /* otherwise tcp will dispose of the ssk and subflow ctx */ - if (ssk->sk_state == TCP_LISTEN) { - tcp_set_state(ssk, TCP_CLOSE); - mptcp_subflow_queue_clean(sk, ssk); - inet_csk_listen_stop(ssk); - mptcp_event_pm_listener(ssk, MPTCP_EVENT_LISTENER_CLOSED); - } - __tcp_close(ssk, 0);
/* close acquired an extra ref */ @@ -2926,10 +2919,24 @@ static __poll_t mptcp_check_readable(str return EPOLLIN | EPOLLRDNORM; }
-static void mptcp_listen_inuse_dec(struct sock *sk) +static void mptcp_check_listen_stop(struct sock *sk) { - if (inet_sk_state_load(sk) == TCP_LISTEN) - sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1); + struct sock *ssk; + + if (inet_sk_state_load(sk) != TCP_LISTEN) + return; + + sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1); + ssk = mptcp_sk(sk)->first; + if (WARN_ON_ONCE(!ssk || inet_sk_state_load(ssk) != TCP_LISTEN)) + return; + + lock_sock_nested(ssk, SINGLE_DEPTH_NESTING); + mptcp_subflow_queue_clean(sk, ssk); + inet_csk_listen_stop(ssk); + mptcp_event_pm_listener(ssk, MPTCP_EVENT_LISTENER_CLOSED); + tcp_set_state(ssk, TCP_CLOSE); + release_sock(ssk); }
bool __mptcp_close(struct sock *sk, long timeout) @@ -2942,7 +2949,7 @@ bool __mptcp_close(struct sock *sk, long WRITE_ONCE(sk->sk_shutdown, SHUTDOWN_MASK);
if ((1 << sk->sk_state) & (TCPF_LISTEN | TCPF_CLOSE)) { - mptcp_listen_inuse_dec(sk); + mptcp_check_listen_stop(sk); inet_sk_state_store(sk, TCP_CLOSE); goto cleanup; } @@ -3056,7 +3063,7 @@ static int mptcp_disconnect(struct sock if (msk->fastopening) return -EBUSY;
- mptcp_listen_inuse_dec(sk); + mptcp_check_listen_stop(sk); inet_sk_state_store(sk, TCP_CLOSE);
mptcp_stop_timer(sk);
From: Xiu Jianfeng xiujianfeng@huawei.com
commit 6f363f5aa845561f7ea496d8b1175e3204470486 upstream.
We found a refcount UAF bug as follows:
refcount_t: addition on 0; use-after-free. WARNING: CPU: 1 PID: 342 at lib/refcount.c:25 refcount_warn_saturate+0xa0/0x148 Workqueue: events cpuset_hotplug_workfn Call trace: refcount_warn_saturate+0xa0/0x148 __refcount_add.constprop.0+0x5c/0x80 css_task_iter_advance_css_set+0xd8/0x210 css_task_iter_advance+0xa8/0x120 css_task_iter_next+0x94/0x158 update_tasks_root_domain+0x58/0x98 rebuild_root_domains+0xa0/0x1b0 rebuild_sched_domains_locked+0x144/0x188 cpuset_hotplug_workfn+0x138/0x5a0 process_one_work+0x1e8/0x448 worker_thread+0x228/0x3e0 kthread+0xe0/0xf0 ret_from_fork+0x10/0x20
then a kernel panic will be triggered as below:
Unable to handle kernel paging request at virtual address 00000000c0000010 Call trace: cgroup_apply_control_disable+0xa4/0x16c rebind_subsystems+0x224/0x590 cgroup_destroy_root+0x64/0x2e0 css_free_rwork_fn+0x198/0x2a0 process_one_work+0x1d4/0x4bc worker_thread+0x158/0x410 kthread+0x108/0x13c ret_from_fork+0x10/0x18
The race that cause this bug can be shown as below:
(hotplug cpu) | (umount cpuset) mutex_lock(&cpuset_mutex) | mutex_lock(&cgroup_mutex) cpuset_hotplug_workfn | rebuild_root_domains | rebind_subsystems update_tasks_root_domain | spin_lock_irq(&css_set_lock) css_task_iter_start | list_move_tail(&cset->e_cset_node[ss->id] while(css_task_iter_next) | &dcgrp->e_csets[ss->id]); css_task_iter_end | spin_unlock_irq(&css_set_lock) mutex_unlock(&cpuset_mutex) | mutex_unlock(&cgroup_mutex)
Inside css_task_iter_start/next/end, css_set_lock is hold and then released, so when iterating task(left side), the css_set may be moved to another list(right side), then it->cset_head points to the old list head and it->cset_pos->next points to the head node of new list, which can't be used as struct css_set.
To fix this issue, switch from all css_sets to only scgrp's css_sets to patch in-flight iterators to preserve correct iteration, and then update it->cset_head as well.
Reported-by: Gaosheng Cui cuigaosheng1@huawei.com Link: https://www.spinics.net/lists/cgroups/msg37935.html Suggested-by: Michal Koutný mkoutny@suse.com Link: https://lore.kernel.org/all/20230526114139.70274-1-xiujianfeng@huaweicloud.c... Signed-off-by: Xiu Jianfeng xiujianfeng@huawei.com Fixes: 2d8f243a5e6e ("cgroup: implement cgroup->e_csets[]") Cc: stable@vger.kernel.org # v3.16+ Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/cgroup/cgroup.c | 20 +++++++++++++++++--- 1 file changed, 17 insertions(+), 3 deletions(-)
--- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -1788,7 +1788,7 @@ int rebind_subsystems(struct cgroup_root { struct cgroup *dcgrp = &dst_root->cgrp; struct cgroup_subsys *ss; - int ssid, i, ret; + int ssid, ret; u16 dfl_disable_ss_mask = 0;
lockdep_assert_held(&cgroup_mutex); @@ -1832,7 +1832,8 @@ int rebind_subsystems(struct cgroup_root struct cgroup_root *src_root = ss->root; struct cgroup *scgrp = &src_root->cgrp; struct cgroup_subsys_state *css = cgroup_css(scgrp, ss); - struct css_set *cset; + struct css_set *cset, *cset_pos; + struct css_task_iter *it;
WARN_ON(!css || cgroup_css(dcgrp, ss));
@@ -1850,9 +1851,22 @@ int rebind_subsystems(struct cgroup_root css->cgroup = dcgrp;
spin_lock_irq(&css_set_lock); - hash_for_each(css_set_table, i, cset, hlist) + WARN_ON(!list_empty(&dcgrp->e_csets[ss->id])); + list_for_each_entry_safe(cset, cset_pos, &scgrp->e_csets[ss->id], + e_cset_node[ss->id]) { list_move_tail(&cset->e_cset_node[ss->id], &dcgrp->e_csets[ss->id]); + /* + * all css_sets of scgrp together in same order to dcgrp, + * patch in-flight iterators to preserve correct iteration. + * since the iterator is always advanced right away and + * finished when it->cset_pos meets it->cset_head, so only + * update it->cset_head is enough here. + */ + list_for_each_entry(it, &cset->task_iters, iters_node) + if (it->cset_head == &scgrp->e_csets[ss->id]) + it->cset_head = &dcgrp->e_csets[ss->id]; + } spin_unlock_irq(&css_set_lock);
if (ss->css_rstat_flush) {
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
commit f0cc749254d12c78e93dae3b27b21dc9546843d0 upstream.
syzbot is again reporting circular locking dependency between cpu_hotplug_lock and freezer_mutex. Do like what we did with commit 57dcd64c7e036299 ("cgroup,freezer: hold cpu_hotplug_lock before freezer_mutex").
Reported-by: syzbot syzbot+2ab700fe1829880a2ec6@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=2ab700fe1829880a2ec6 Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Tested-by: syzbot syzbot+2ab700fe1829880a2ec6@syzkaller.appspotmail.com Fixes: f5d39b020809 ("freezer,sched: Rewrite core freezer logic") Cc: stable@vger.kernel.org # v6.1+ Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/cgroup/legacy_freezer.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/kernel/cgroup/legacy_freezer.c +++ b/kernel/cgroup/legacy_freezer.c @@ -108,16 +108,18 @@ static int freezer_css_online(struct cgr struct freezer *freezer = css_freezer(css); struct freezer *parent = parent_freezer(freezer);
+ cpus_read_lock(); mutex_lock(&freezer_mutex);
freezer->state |= CGROUP_FREEZER_ONLINE;
if (parent && (parent->state & CGROUP_FREEZING)) { freezer->state |= CGROUP_FREEZING_PARENT | CGROUP_FROZEN; - static_branch_inc(&freezer_active); + static_branch_inc_cpuslocked(&freezer_active); }
mutex_unlock(&freezer_mutex); + cpus_read_unlock(); return 0; }
@@ -132,14 +134,16 @@ static void freezer_css_offline(struct c { struct freezer *freezer = css_freezer(css);
+ cpus_read_lock(); mutex_lock(&freezer_mutex);
if (freezer->state & CGROUP_FREEZING) - static_branch_dec(&freezer_active); + static_branch_dec_cpuslocked(&freezer_active);
freezer->state = 0;
mutex_unlock(&freezer_mutex); + cpus_read_unlock(); }
static void freezer_css_free(struct cgroup_subsys_state *css)
From: Jiawen Wu jiawenwu@trustnetic.com
commit 408c090002c8ca5da3da1417d1d675583379fae6 upstream.
PHY address and device address are passed in the wrong order.
Cc: stable@vger.kernel.org Fixes: 4e4aafcddbbf ("net: mdio: Add dedicated C45 API to MDIO bus drivers") Signed-off-by: Jiawen Wu jiawenwu@trustnetic.com Reviewed-by: Andrew Lunn andrew@lunn.ch Reviewed-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Link: https://lore.kernel.org/r/20230619094948.84452-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/phy/mdio_bus.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c index 389f33a12534..8b3618d3da4a 100644 --- a/drivers/net/phy/mdio_bus.c +++ b/drivers/net/phy/mdio_bus.c @@ -1287,7 +1287,7 @@ EXPORT_SYMBOL_GPL(mdiobus_modify_changed); * @mask: bit mask of bits to clear * @set: bit mask of bits to set */ -int mdiobus_c45_modify_changed(struct mii_bus *bus, int devad, int addr, +int mdiobus_c45_modify_changed(struct mii_bus *bus, int addr, int devad, u32 regnum, u16 mask, u16 set) { int err;
From: Jisheng Zhang jszhang@kernel.org
commit f334ad47683606b682b4166b800d8b372d315436 upstream.
mmc host drivers should have enabled the asynchronous probe option, but it seems like we didn't set it for litex_mmc when introducing litex mmc support, so let's set it now.
Tested with linux-on-litex-vexriscv on sipeed tang nano 20K fpga.
Signed-off-by: Jisheng Zhang jszhang@kernel.org Acked-by: Gabriel Somlo gsomlo@gmail.com Fixes: 92e099104729 ("mmc: Add driver for LiteX's LiteSDCard interface") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230617085319.2139-1-jszhang@kernel.org Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/litex_mmc.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/mmc/host/litex_mmc.c +++ b/drivers/mmc/host/litex_mmc.c @@ -649,6 +649,7 @@ static struct platform_driver litex_mmc_ .driver = { .name = "litex-mmc", .of_match_table = litex_match, + .probe_type = PROBE_PREFER_ASYNCHRONOUS, }, }; module_platform_driver(litex_mmc_driver);
From: Stephan Gerhold stephan@gerhold.net
commit e6f9e590b72e12bbb86b1b8be7e1981f357392ad upstream.
While SDHCI claims to support 64-bit DMA on MSM8916 it does not seem to be properly functional. It is not immediately obvious because SDHCI is usually used with IOMMU bypassed on this SoC, and all physical memory has 32-bit addresses. But when trying to enable the IOMMU it quickly fails with an error such as the following:
arm-smmu 1e00000.iommu: Unhandled context fault: fsr=0x402, iova=0xfffff200, fsynr=0xe0000, cbfrsynra=0x140, cb=3 mmc1: ADMA error: 0x02000000 mmc1: sdhci: ============ SDHCI REGISTER DUMP =========== mmc1: sdhci: Sys addr: 0x00000000 | Version: 0x00002e02 mmc1: sdhci: Blk size: 0x00000008 | Blk cnt: 0x00000000 mmc1: sdhci: Argument: 0x00000000 | Trn mode: 0x00000013 mmc1: sdhci: Present: 0x03f80206 | Host ctl: 0x00000019 mmc1: sdhci: Power: 0x0000000f | Blk gap: 0x00000000 mmc1: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007 mmc1: sdhci: Timeout: 0x0000000a | Int stat: 0x00000001 mmc1: sdhci: Int enab: 0x03ff900b | Sig enab: 0x03ff100b mmc1: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000 mmc1: sdhci: Caps: 0x322dc8b2 | Caps_1: 0x00008007 mmc1: sdhci: Cmd: 0x0000333a | Max curr: 0x00000000 mmc1: sdhci: Resp[0]: 0x00000920 | Resp[1]: 0x5b590000 mmc1: sdhci: Resp[2]: 0xe6487f80 | Resp[3]: 0x0a404094 mmc1: sdhci: Host ctl2: 0x00000008 mmc1: sdhci: ADMA Err: 0x00000001 | ADMA Ptr: 0x0000000ffffff224 mmc1: sdhci_msm: ----------- VENDOR REGISTER DUMP ----------- mmc1: sdhci_msm: DLL sts: 0x00000000 | DLL cfg: 0x60006400 | DLL cfg2: 0x00000000 mmc1: sdhci_msm: DLL cfg3: 0x00000000 | DLL usr ctl: 0x00000000 | DDR cfg: 0x00000000 mmc1: sdhci_msm: Vndr func: 0x00018a9c | Vndr func2 : 0xf88018a8 Vndr func3: 0x00000000 mmc1: sdhci: ============================================ mmc1: sdhci: fffffffff200: DMA 0x0000ffffffffe100, LEN 0x0008, Attr=0x21 mmc1: sdhci: fffffffff20c: DMA 0x0000000000000000, LEN 0x0000, Attr=0x03
Looking closely it's obvious that only the 32-bit part of the address (0xfffff200) arrives at the SMMU, the higher 16-bit (0xffff...) get lost somewhere. This might not be a limitation of the SDHCI itself but perhaps the bus/interconnect it is connected to, or even the connection to the SMMU.
Work around this by setting SDHCI_QUIRK2_BROKEN_64_BIT_DMA to avoid using 64-bit addresses.
Signed-off-by: Stephan Gerhold stephan@gerhold.net Acked-by: Adrian Hunter adrian.hunter@intel.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230518-msm8916-64bit-v1-1-5694b0f35211@gerhold.n... Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/sdhci-msm.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/mmc/host/sdhci-msm.c +++ b/drivers/mmc/host/sdhci-msm.c @@ -2479,6 +2479,9 @@ static inline void sdhci_msm_get_of_prop msm_host->ddr_config = DDR_CONFIG_POR_VAL;
of_property_read_u32(node, "qcom,dll-config", &msm_host->dll_config); + + if (of_device_is_compatible(node, "qcom,msm8916-sdhci")) + host->quirks2 |= SDHCI_QUIRK2_BROKEN_64_BIT_DMA; }
static int sdhci_msm_gcc_reset(struct device *dev, struct sdhci_host *host)
From: Martin Hundebøll martin@geanix.com
commit 3c40eb8145325b0f5b93b8a169146078cb2c49d6 upstream.
The call to mmc_request_done() can schedule, so it must not be called from irq context. Wake the irq thread if it needs to be called, and let its existing logic do its work.
Fixes the following kernel bug, which appears when running an RT patched kernel on the AmLogic Meson AXG A113X SoC: [ 11.111407] BUG: scheduling while atomic: kworker/0:1H/75/0x00010001 [ 11.111438] Modules linked in: [ 11.111451] CPU: 0 PID: 75 Comm: kworker/0:1H Not tainted 6.4.0-rc3-rt2-rtx-00081-gfd07f41ed6b4-dirty #1 [ 11.111461] Hardware name: RTX AXG A113X Linux Platform Board (DT) [ 11.111469] Workqueue: kblockd blk_mq_run_work_fn [ 11.111492] Call trace: [ 11.111497] dump_backtrace+0xac/0xe8 [ 11.111510] show_stack+0x18/0x28 [ 11.111518] dump_stack_lvl+0x48/0x60 [ 11.111530] dump_stack+0x18/0x24 [ 11.111537] __schedule_bug+0x4c/0x68 [ 11.111548] __schedule+0x80/0x574 [ 11.111558] schedule_loop+0x2c/0x50 [ 11.111567] schedule_rtlock+0x14/0x20 [ 11.111576] rtlock_slowlock_locked+0x468/0x730 [ 11.111587] rt_spin_lock+0x40/0x64 [ 11.111596] __wake_up_common_lock+0x5c/0xc4 [ 11.111610] __wake_up+0x18/0x24 [ 11.111620] mmc_blk_mq_req_done+0x68/0x138 [ 11.111633] mmc_request_done+0x104/0x118 [ 11.111644] meson_mmc_request_done+0x38/0x48 [ 11.111654] meson_mmc_irq+0x128/0x1f0 [ 11.111663] __handle_irq_event_percpu+0x70/0x114 [ 11.111674] handle_irq_event_percpu+0x18/0x4c [ 11.111683] handle_irq_event+0x80/0xb8 [ 11.111691] handle_fasteoi_irq+0xa4/0x120 [ 11.111704] handle_irq_desc+0x20/0x38 [ 11.111712] generic_handle_domain_irq+0x1c/0x28 [ 11.111721] gic_handle_irq+0x8c/0xa8 [ 11.111735] call_on_irq_stack+0x24/0x4c [ 11.111746] do_interrupt_handler+0x88/0x94 [ 11.111757] el1_interrupt+0x34/0x64 [ 11.111769] el1h_64_irq_handler+0x18/0x24 [ 11.111779] el1h_64_irq+0x64/0x68 [ 11.111786] __add_wait_queue+0x0/0x4c [ 11.111795] mmc_blk_rw_wait+0x84/0x118 [ 11.111804] mmc_blk_mq_issue_rq+0x5c4/0x654 [ 11.111814] mmc_mq_queue_rq+0x194/0x214 [ 11.111822] blk_mq_dispatch_rq_list+0x3ac/0x528 [ 11.111834] __blk_mq_sched_dispatch_requests+0x340/0x4d0 [ 11.111847] blk_mq_sched_dispatch_requests+0x38/0x70 [ 11.111858] blk_mq_run_work_fn+0x3c/0x70 [ 11.111865] process_one_work+0x17c/0x1f0 [ 11.111876] worker_thread+0x1d4/0x26c [ 11.111885] kthread+0xe4/0xf4 [ 11.111894] ret_from_fork+0x10/0x20
Fixes: 51c5d8447bd7 ("MMC: meson: initial support for GX platforms") Cc: stable@vger.kernel.org Signed-off-by: Martin Hundebøll martin@geanix.com Link: https://lore.kernel.org/r/20230607082713.517157-1-martin@geanix.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/meson-gx-mmc.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-)
--- a/drivers/mmc/host/meson-gx-mmc.c +++ b/drivers/mmc/host/meson-gx-mmc.c @@ -1006,11 +1006,8 @@ static irqreturn_t meson_mmc_irq(int irq
if (data && !cmd->error) data->bytes_xfered = data->blksz * data->blocks; - if (meson_mmc_bounce_buf_read(data) || - meson_mmc_get_next_command(cmd)) - ret = IRQ_WAKE_THREAD; - else - ret = IRQ_HANDLED; + + return IRQ_WAKE_THREAD; }
out: @@ -1022,9 +1019,6 @@ out: writel(start, host->regs + SD_EMMC_START); }
- if (ret == IRQ_HANDLED) - meson_mmc_request_done(host->mmc, cmd->mrq); - return ret; }
From: Christophe Kerello christophe.kerello@foss.st.com
commit 47b3ad6b7842f49d374a01b054a4b1461a621bdc upstream.
The way that the timeout is currently calculated could lead to a u64 timeout value in mmci_start_command(). This value is then cast in a u32 register that leads to mmc erase failed issue with some SD cards.
Fixes: 8266c585f489 ("mmc: mmci: add hardware busy timeout feature") Signed-off-by: Yann Gautier yann.gautier@foss.st.com Signed-off-by: Christophe Kerello christophe.kerello@foss.st.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230613134146.418016-1-yann.gautier@foss.st.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/mmci.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/mmc/host/mmci.c +++ b/drivers/mmc/host/mmci.c @@ -1735,7 +1735,8 @@ static void mmci_set_max_busy_timeout(st return;
if (host->variant->busy_timeout && mmc->actual_clock) - max_busy_timeout = ~0UL / (mmc->actual_clock / MSEC_PER_SEC); + max_busy_timeout = U32_MAX / DIV_ROUND_UP(mmc->actual_clock, + MSEC_PER_SEC);
mmc->max_busy_timeout = max_busy_timeout; }
From: Sergey Shtylyov s.shtylyov@omp.ru
commit 8d0caeedcd05a721f3cc2537b0ea212ec4027307 upstream.
The driver overrides the error codes and IRQ0 returned by platform_get_irq() to -EINVAL, so if it returns -EPROBE_DEFER, the driver will fail the probe permanently instead of the deferred probing. Switch to propagating the error codes upstream. Since commit ce753ad1549c ("platform: finally disallow IRQ0 in platform_get_irq() and its ilk") IRQ0 is no longer returned by those APIs, so we now can safely ignore it...
Fixes: 682798a596a6 ("mmc: sdhci-spear: Handle return value of platform_get_irq") Cc: stable@vger.kernel.org # v5.19+ Signed-off-by: Sergey Shtylyov s.shtylyov@omp.ru Acked-by: Viresh Kumar viresh.kumar@linaro.org Acked-by: Adrian Hunter adrian.hunter@intel.com Link: https://lore.kernel.org/r/20230617203622.6812-10-s.shtylyov@omp.ru Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/sdhci-spear.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/mmc/host/sdhci-spear.c +++ b/drivers/mmc/host/sdhci-spear.c @@ -65,8 +65,8 @@ static int sdhci_probe(struct platform_d host->hw_name = "sdhci"; host->ops = &sdhci_pltfm_ops; host->irq = platform_get_irq(pdev, 0); - if (host->irq <= 0) { - ret = -EINVAL; + if (host->irq < 0) { + ret = host->irq; goto err_host; } host->quirks = SDHCI_QUIRK_BROKEN_ADMA;
From: Sergey Shtylyov s.shtylyov@omp.ru
commit 71150ac12558bcd9d75e6e24cf7c872c2efd80f3 upstream.
The driver overrides the error codes and IRQ0 returned by platform_get_irq() to -EINVAL, so if it returns -EPROBE_DEFER, the driver will fail the probe permanently instead of the deferred probing. Switch to propagating the error codes upstream. Since commit ce753ad1549c ("platform: finally disallow IRQ0 in platform_get_irq() and its ilk") IRQ0 is no longer returned by those APIs, so we now can safely ignore it...
Fixes: 660fc733bd74 ("mmc: bcm2835: Add new driver for the sdhost controller.") Cc: stable@vger.kernel.org # v5.19+ Signed-off-by: Sergey Shtylyov s.shtylyov@omp.ru Link: https://lore.kernel.org/r/20230617203622.6812-2-s.shtylyov@omp.ru Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/bcm2835.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/mmc/host/bcm2835.c +++ b/drivers/mmc/host/bcm2835.c @@ -1403,8 +1403,8 @@ static int bcm2835_probe(struct platform host->max_clk = clk_get_rate(clk);
host->irq = platform_get_irq(pdev, 0); - if (host->irq <= 0) { - ret = -EINVAL; + if (host->irq < 0) { + ret = host->irq; goto err; }
From: Sergey Shtylyov s.shtylyov@omp.ru
commit c2df53c5806cfd746dae08e07bc8c4ad247c3b70 upstream.
The driver overrides the error codes and IRQ0 returned by platform_get_irq() to -EINVAL, so if it returns -EPROBE_DEFER, the driver will fail the probe permanently instead of the deferred probing. Switch to propagating the error codes upstream. Since commit ce753ad1549c ("platform: finally disallow IRQ0 in platform_get_irq() and its ilk") IRQ0 is no longer returned by those APIs, so we now can safely ignore it...
Fixes: 2408a08583d2 ("mmc: sunxi-mmc: Handle return value of platform_get_irq") Cc: stable@vger.kernel.org # v5.19+ Signed-off-by: Sergey Shtylyov s.shtylyov@omp.ru Reviewed-by: Jernej Skrabec jernej.skrabec@gmail.com Link: https://lore.kernel.org/r/20230617203622.6812-12-s.shtylyov@omp.ru Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/sunxi-mmc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/mmc/host/sunxi-mmc.c +++ b/drivers/mmc/host/sunxi-mmc.c @@ -1350,8 +1350,8 @@ static int sunxi_mmc_resource_request(st return ret;
host->irq = platform_get_irq(pdev, 0); - if (host->irq <= 0) { - ret = -EINVAL; + if (host->irq < 0) { + ret = host->irq; goto error_disable_mmc; }
From: Sergey Shtylyov s.shtylyov@omp.ru
commit b8ada54fa1b83f3b6480d4cced71354301750153 upstream.
The driver overrides the error codes and IRQ0 returned by platform_get_irq() to -EINVAL, so if it returns -EPROBE_DEFER, the driver will fail the probe permanently instead of the deferred probing. Switch to propagating the error codes upstream. Since commit ce753ad1549c ("platform: finally disallow IRQ0 in platform_get_irq() and its ilk") IRQ0 is no longer returned by those APIs, so we now can safely ignore it...
Fixes: cbcaac6d7dd2 ("mmc: meson-gx-mmc: Fix platform_get_irq's error checking") Cc: stable@vger.kernel.org # v5.19+ Signed-off-by: Sergey Shtylyov s.shtylyov@omp.ru Reviewed-by: Neil Armstrong neil.armstrong@linaro.org Link: https://lore.kernel.org/r/20230617203622.6812-3-s.shtylyov@omp.ru Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/meson-gx-mmc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/mmc/host/meson-gx-mmc.c +++ b/drivers/mmc/host/meson-gx-mmc.c @@ -1202,8 +1202,8 @@ static int meson_mmc_probe(struct platfo return PTR_ERR(host->regs);
host->irq = platform_get_irq(pdev, 0); - if (host->irq <= 0) - return -EINVAL; + if (host->irq < 0) + return host->irq;
cd_irq = platform_get_irq_optional(pdev, 1); mmc_gpio_set_cd_irq(mmc, cd_irq);
From: Krister Johansen kjlx@templeofstupid.com
commit 0108a4e9f3584a7a2c026d1601b0682ff7335d95 upstream.
When subprograms are in use, the main program is not jit'd after the subprograms because jit_subprogs sets a value for prog->bpf_func upon success. Subsequent calls to the JIT are bypassed when this value is non-NULL. This leads to a situation where the main program and its func[0] counterpart are both in the bpf kallsyms tree, but only func[0] has an extable. Extables are only created during JIT. Now there are two nearly identical program ksym entries in the tree, but only one has an extable. Depending upon how the entries are placed, there's a chance that a fault will call search_extable on the aux with the NULL entry.
Since jit_subprogs already copies state from func[0] to the main program, include the extable pointer in this state duplication. Additionally, ensure that the copy of the main program in func[0] is not added to the bpf_prog_kallsyms table. Instead, let the main program get added later in bpf_prog_load(). This ensures there is only a single copy of the main program in the kallsyms table, and that its tag matches the tag observed by tooling like bpftool.
Cc: stable@vger.kernel.org Fixes: 1c2a088a6626 ("bpf: x64: add JIT support for multi-function programs") Signed-off-by: Krister Johansen kjlx@templeofstupid.com Acked-by: Yonghong Song yhs@fb.com Acked-by: Ilya Leoshkevich iii@linux.ibm.com Tested-by: Ilya Leoshkevich iii@linux.ibm.com Link: https://lore.kernel.org/r/6de9b2f4b4724ef56efbb0339daaa66c8b68b1e7.168661666... Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/verifier.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -16198,9 +16198,10 @@ static int jit_subprogs(struct bpf_verif }
/* finally lock prog and jit images for all functions and - * populate kallsysm + * populate kallsysm. Begin at the first subprogram, since + * bpf_prog_load will add the kallsyms for the main program. */ - for (i = 0; i < env->subprog_cnt; i++) { + for (i = 1; i < env->subprog_cnt; i++) { bpf_prog_lock_ro(func[i]); bpf_prog_kallsyms_add(func[i]); } @@ -16226,6 +16227,8 @@ static int jit_subprogs(struct bpf_verif prog->jited = 1; prog->bpf_func = func[0]->bpf_func; prog->jited_len = func[0]->jited_len; + prog->aux->extable = func[0]->aux->extable; + prog->aux->num_exentries = func[0]->aux->num_exentries; prog->aux->func = func; prog->aux->func_cnt = env->subprog_cnt; bpf_prog_jit_attempt_done(prog);
From: Mukesh Sisodiya mukesh.sisodiya@intel.com
commit 4e9f0ec38852c18faa9689322e758575af33e5d4 upstream.
Add support for AX1690i and AX1690s devices with PCIE id 0x7AF0.
Cc: stable@vger.kernel.org # 6.1+ Signed-off-by: Mukesh Sisodiya mukesh.sisodiya@intel.com Signed-off-by: Gregory Greenman gregory.greenman@intel.com Signed-off-by: Johannes Berg johannes.berg@intel.com Link: https://lore.kernel.org/r/20230619150233.461290-2-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireless/intel/iwlwifi/pcie/drv.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/net/wireless/intel/iwlwifi/pcie/drv.c +++ b/drivers/net/wireless/intel/iwlwifi/pcie/drv.c @@ -547,6 +547,8 @@ static const struct iwl_dev_info iwl_dev IWL_DEV_INFO(0x54F0, 0x1692, iwlax411_2ax_cfg_so_gf4_a0, iwl_ax411_killer_1690i_name), IWL_DEV_INFO(0x7A70, 0x1691, iwlax411_2ax_cfg_so_gf4_a0, iwl_ax411_killer_1690s_name), IWL_DEV_INFO(0x7A70, 0x1692, iwlax411_2ax_cfg_so_gf4_a0, iwl_ax411_killer_1690i_name), + IWL_DEV_INFO(0x7AF0, 0x1691, iwlax411_2ax_cfg_so_gf4_a0, iwl_ax411_killer_1690s_name), + IWL_DEV_INFO(0x7AF0, 0x1692, iwlax411_2ax_cfg_so_gf4_a0, iwl_ax411_killer_1690i_name),
IWL_DEV_INFO(0x271C, 0x0214, iwl9260_2ac_cfg, iwl9260_1_name), IWL_DEV_INFO(0x7E40, 0x1691, iwl_cfg_ma_a0_gf4_a0, iwl_ax411_killer_1690s_name),
From: Neil Armstrong neil.armstrong@linaro.org
[ Upstream commit 9d7054fb3ac2e8d252aae1268f20623f244e644f ]
Now spi_geni_grab_gpi_chan() errors are correctly reported, the -EPROBE_DEFER error should be returned from probe in case the GPI dma driver is built as module and/or not probed yet.
Fixes: b59c122484ec ("spi: spi-geni-qcom: Add support for GPI dma") Fixes: 6532582c353f ("spi: spi-geni-qcom: fix error handling in spi_geni_grab_gpi_chan()") Signed-off-by: Neil Armstrong neil.armstrong@linaro.org Link: https://lore.kernel.org/r/20230615-topic-sm8550-upstream-fix-spi-geni-qcom-p... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-geni-qcom.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/spi/spi-geni-qcom.c b/drivers/spi/spi-geni-qcom.c index b106faf21a723..baf477383682d 100644 --- a/drivers/spi/spi-geni-qcom.c +++ b/drivers/spi/spi-geni-qcom.c @@ -646,6 +646,8 @@ static int spi_geni_init(struct spi_geni_master *mas) geni_se_select_mode(se, GENI_GPI_DMA); dev_dbg(mas->dev, "Using GPI DMA mode for SPI\n"); break; + } else if (ret == -EPROBE_DEFER) { + goto out_pm; } /* * in case of failure to get gpi dma channel, we can still do the
From: Teresa Remmet t.remmet@phytec.de
[ Upstream commit 7257d930aadcd62d1c7971ab14f3b1126356abdc ]
L3_OUT and L4_OUT Bit fields range from Bit 0:4 and thus the mask should be 0x1F instead of 0x0F.
Fixes: 0935ff5f1f0a ("regulator: pca9450: add pca9450 pmic driver") Signed-off-by: Teresa Remmet t.remmet@phytec.de Reviewed-by: Frieder Schrempf frieder.schrempf@kontron.de Link: https://lore.kernel.org/r/20230614125240.3946519-1-t.remmet@phytec.de Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/regulator/pca9450.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/regulator/pca9450.h b/include/linux/regulator/pca9450.h index 3c01c2bf84f53..505c908dbb817 100644 --- a/include/linux/regulator/pca9450.h +++ b/include/linux/regulator/pca9450.h @@ -196,11 +196,11 @@ enum {
/* PCA9450_REG_LDO3_VOLT bits */ #define LDO3_EN_MASK 0xC0 -#define LDO3OUT_MASK 0x0F +#define LDO3OUT_MASK 0x1F
/* PCA9450_REG_LDO4_VOLT bits */ #define LDO4_EN_MASK 0xC0 -#define LDO4OUT_MASK 0x0F +#define LDO4OUT_MASK 0x1F
/* PCA9450_REG_LDO5_VOLT bits */ #define LDO5L_EN_MASK 0xC0
From: Russ Weight russell.h.weight@intel.com
[ Upstream commit c8e796895e2310b6130e7577248da1d771431a77 ]
The max_raw_write member of the regmap_spi_avmm_bus structure is defined as: .max_raw_write = SPI_AVMM_VAL_SIZE * MAX_WRITE_CNT
SPI_AVMM_VAL_SIZE == 4 and MAX_WRITE_CNT == 1 so this results in a maximum write transfer size of 4 bytes which provides only enough space to transfer the address of the target register. It provides no space for the value to be transferred. This bug became an issue (divide-by-zero in _regmap_raw_write()) after the following was accepted into mainline:
commit 3981514180c9 ("regmap: Account for register length when chunking")
Change max_raw_write to include space (4 additional bytes) for both the register address and value:
.max_raw_write = SPI_AVMM_REG_SIZE + SPI_AVMM_VAL_SIZE * MAX_WRITE_CNT
Fixes: 7f9fb67358a2 ("regmap: add Intel SPI Slave to AVMM Bus Bridge support") Reviewed-by: Matthew Gerlach matthew.gerlach@linux.intel.com Signed-off-by: Russ Weight russell.h.weight@intel.com Link: https://lore.kernel.org/r/20230620202824.380313-1-russell.h.weight@intel.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/base/regmap/regmap-spi-avmm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/base/regmap/regmap-spi-avmm.c b/drivers/base/regmap/regmap-spi-avmm.c index 4c2b94b3e30be..6af692844c196 100644 --- a/drivers/base/regmap/regmap-spi-avmm.c +++ b/drivers/base/regmap/regmap-spi-avmm.c @@ -660,7 +660,7 @@ static const struct regmap_bus regmap_spi_avmm_bus = { .reg_format_endian_default = REGMAP_ENDIAN_NATIVE, .val_format_endian_default = REGMAP_ENDIAN_NATIVE, .max_raw_read = SPI_AVMM_VAL_SIZE * MAX_READ_CNT, - .max_raw_write = SPI_AVMM_VAL_SIZE * MAX_WRITE_CNT, + .max_raw_write = SPI_AVMM_REG_SIZE + SPI_AVMM_VAL_SIZE * MAX_WRITE_CNT, .free_context = spi_avmm_bridge_ctx_free, };
From: Namjae Jeon linkinjeon@kernel.org
[ Upstream commit 211db0ac9e3dc6c46f2dd53395b34d76af929faf ]
Since vfs_path_lookup is exported, It should not be internal. Move vfs_path_lookup prototype in internal.h to linux/namei.h.
Suggested-by: Al Viro viro@zeniv.linux.org.uk Reviewed-by: Christian Brauner brauner@kernel.org Signed-off-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Al Viro viro@zeniv.linux.org.uk Stable-dep-of: 40b268d384a2 ("ksmbd: add mnt_want_write to ksmbd vfs functions") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/internal.h | 2 -- fs/ksmbd/vfs.c | 2 -- include/linux/namei.h | 2 ++ 3 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/fs/internal.h b/fs/internal.h index dc4eb91a577a8..071a7517f1a74 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -59,8 +59,6 @@ extern int finish_clean_context(struct fs_context *fc); */ extern int filename_lookup(int dfd, struct filename *name, unsigned flags, struct path *path, struct path *root); -extern int vfs_path_lookup(struct dentry *, struct vfsmount *, - const char *, unsigned int, struct path *); int do_rmdir(int dfd, struct filename *name); int do_unlinkat(int dfd, struct filename *name); int may_linkat(struct mnt_idmap *idmap, const struct path *link); diff --git a/fs/ksmbd/vfs.c b/fs/ksmbd/vfs.c index f6a8b26593090..1ad97df1dfb6f 100644 --- a/fs/ksmbd/vfs.c +++ b/fs/ksmbd/vfs.c @@ -19,8 +19,6 @@ #include <linux/sched/xacct.h> #include <linux/crc32c.h>
-#include "../internal.h" /* for vfs_path_lookup */ - #include "glob.h" #include "oplock.h" #include "connection.h" diff --git a/include/linux/namei.h b/include/linux/namei.h index 0d797f3367cad..ba9b32b4d1b07 100644 --- a/include/linux/namei.h +++ b/include/linux/namei.h @@ -63,6 +63,8 @@ extern struct dentry *kern_path_create(int, const char *, struct path *, unsigne extern struct dentry *user_path_create(int, const char __user *, struct path *, unsigned int); extern void done_path_create(struct path *, struct dentry *); extern struct dentry *kern_path_locked(const char *, struct path *); +int vfs_path_lookup(struct dentry *, struct vfsmount *, const char *, + unsigned int, struct path *);
extern struct dentry *try_lookup_one_len(const char *, struct dentry *, int); extern struct dentry *lookup_one_len(const char *, struct dentry *, int);
From: Al Viro viro@zeniv.linux.org.uk
[ Upstream commit 9bc37e04823b5280dd0f22b6680fc23fe81ca325 ]
Pass the dentry of a source file and the dentry of a destination directory to lock parent inodes for rename. As soon as this function returns, ->d_parent of the source file dentry is stable and inodes are properly locked for calling vfs-rename. This helper is needed for ksmbd server. rename request of SMB protocol has to rename an opened file, no matter which directory it's in.
Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Al Viro viro@zeniv.linux.org.uk Stable-dep-of: 40b268d384a2 ("ksmbd: add mnt_want_write to ksmbd vfs functions") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/namei.c | 68 ++++++++++++++++++++++++++++++++++++------- include/linux/namei.h | 1 + 2 files changed, 58 insertions(+), 11 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c index edfedfbccaef4..6bc1964e2214e 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2980,20 +2980,10 @@ static inline int may_create(struct mnt_idmap *idmap, return inode_permission(idmap, dir, MAY_WRITE | MAY_EXEC); }
-/* - * p1 and p2 should be directories on the same fs. - */ -struct dentry *lock_rename(struct dentry *p1, struct dentry *p2) +static struct dentry *lock_two_directories(struct dentry *p1, struct dentry *p2) { struct dentry *p;
- if (p1 == p2) { - inode_lock_nested(p1->d_inode, I_MUTEX_PARENT); - return NULL; - } - - mutex_lock(&p1->d_sb->s_vfs_rename_mutex); - p = d_ancestor(p2, p1); if (p) { inode_lock_nested(p2->d_inode, I_MUTEX_PARENT); @@ -3012,8 +3002,64 @@ struct dentry *lock_rename(struct dentry *p1, struct dentry *p2) inode_lock_nested(p2->d_inode, I_MUTEX_PARENT2); return NULL; } + +/* + * p1 and p2 should be directories on the same fs. + */ +struct dentry *lock_rename(struct dentry *p1, struct dentry *p2) +{ + if (p1 == p2) { + inode_lock_nested(p1->d_inode, I_MUTEX_PARENT); + return NULL; + } + + mutex_lock(&p1->d_sb->s_vfs_rename_mutex); + return lock_two_directories(p1, p2); +} EXPORT_SYMBOL(lock_rename);
+/* + * c1 and p2 should be on the same fs. + */ +struct dentry *lock_rename_child(struct dentry *c1, struct dentry *p2) +{ + if (READ_ONCE(c1->d_parent) == p2) { + /* + * hopefully won't need to touch ->s_vfs_rename_mutex at all. + */ + inode_lock_nested(p2->d_inode, I_MUTEX_PARENT); + /* + * now that p2 is locked, nobody can move in or out of it, + * so the test below is safe. + */ + if (likely(c1->d_parent == p2)) + return NULL; + + /* + * c1 got moved out of p2 while we'd been taking locks; + * unlock and fall back to slow case. + */ + inode_unlock(p2->d_inode); + } + + mutex_lock(&c1->d_sb->s_vfs_rename_mutex); + /* + * nobody can move out of any directories on this fs. + */ + if (likely(c1->d_parent != p2)) + return lock_two_directories(c1->d_parent, p2); + + /* + * c1 got moved into p2 while we were taking locks; + * we need p2 locked and ->s_vfs_rename_mutex unlocked, + * for consistency with lock_rename(). + */ + inode_lock_nested(p2->d_inode, I_MUTEX_PARENT); + mutex_unlock(&c1->d_sb->s_vfs_rename_mutex); + return NULL; +} +EXPORT_SYMBOL(lock_rename_child); + void unlock_rename(struct dentry *p1, struct dentry *p2) { inode_unlock(p1->d_inode); diff --git a/include/linux/namei.h b/include/linux/namei.h index ba9b32b4d1b07..5864e4d82e567 100644 --- a/include/linux/namei.h +++ b/include/linux/namei.h @@ -83,6 +83,7 @@ extern int follow_down(struct path *path, unsigned int flags); extern int follow_up(struct path *);
extern struct dentry *lock_rename(struct dentry *, struct dentry *); +extern struct dentry *lock_rename_child(struct dentry *, struct dentry *); extern void unlock_rename(struct dentry *, struct dentry *);
extern int __must_check nd_jump_link(const struct path *path);
From: Namjae Jeon linkinjeon@kernel.org
[ Upstream commit 74d7970febf7e9005375aeda0df821d2edffc9f7 ]
Al pointed out that ksmbd has racy issue from using ->d_parent and ->d_name in ksmbd_vfs_unlink and smb2_vfs_rename(). and use new lock_rename_child() to lock stable parent while underlying rename racy. Introduce vfs_path_parent_lookup helper to avoid out of share access and export vfs functions like the following ones to use vfs_path_parent_lookup(). - rename __lookup_hash() to lookup_one_qstr_excl(). - export lookup_one_qstr_excl(). - export getname_kernel() and putname().
vfs_path_parent_lookup() is used for parent lookup of destination file using absolute pathname given from FILE_RENAME_INFORMATION request.
Signed-off-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Steve French stfrench@microsoft.com Stable-dep-of: 40b268d384a2 ("ksmbd: add mnt_want_write to ksmbd vfs functions") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ksmbd/smb2pdu.c | 147 ++++---------- fs/ksmbd/vfs.c | 435 ++++++++++++++++++------------------------ fs/ksmbd/vfs.h | 19 +- fs/ksmbd/vfs_cache.c | 5 +- fs/namei.c | 57 ++++-- include/linux/namei.h | 6 + 6 files changed, 283 insertions(+), 386 deletions(-)
diff --git a/fs/ksmbd/smb2pdu.c b/fs/ksmbd/smb2pdu.c index 36843bef05c5f..06a335356e373 100644 --- a/fs/ksmbd/smb2pdu.c +++ b/fs/ksmbd/smb2pdu.c @@ -2515,7 +2515,7 @@ static int smb2_creat(struct ksmbd_work *work, struct path *path, char *name, return rc; }
- rc = ksmbd_vfs_kern_path(work, name, 0, path, 0); + rc = ksmbd_vfs_kern_path_locked(work, name, 0, path, 0); if (rc) { pr_err("cannot get linux path (%s), err = %d\n", name, rc); @@ -2806,8 +2806,10 @@ int smb2_open(struct ksmbd_work *work) goto err_out1; }
- rc = ksmbd_vfs_kern_path(work, name, LOOKUP_NO_SYMLINKS, &path, 1); + rc = ksmbd_vfs_kern_path_locked(work, name, LOOKUP_NO_SYMLINKS, &path, 1); if (!rc) { + file_present = true; + if (req->CreateOptions & FILE_DELETE_ON_CLOSE_LE) { /* * If file exists with under flags, return access @@ -2816,7 +2818,6 @@ int smb2_open(struct ksmbd_work *work) if (req->CreateDisposition == FILE_OVERWRITE_IF_LE || req->CreateDisposition == FILE_OPEN_IF_LE) { rc = -EACCES; - path_put(&path); goto err_out; }
@@ -2824,26 +2825,23 @@ int smb2_open(struct ksmbd_work *work) ksmbd_debug(SMB, "User does not have write permission\n"); rc = -EACCES; - path_put(&path); goto err_out; } } else if (d_is_symlink(path.dentry)) { rc = -EACCES; - path_put(&path); goto err_out; } - }
- if (rc) { + file_present = true; + idmap = mnt_idmap(path.mnt); + } else { if (rc != -ENOENT) goto err_out; ksmbd_debug(SMB, "can not get linux path for %s, rc = %d\n", name, rc); rc = 0; - } else { - file_present = true; - idmap = mnt_idmap(path.mnt); } + if (stream_name) { if (req->CreateOptions & FILE_DIRECTORY_FILE_LE) { if (s_type == DATA_STREAM) { @@ -2971,8 +2969,9 @@ int smb2_open(struct ksmbd_work *work)
if ((daccess & FILE_DELETE_LE) || (req->CreateOptions & FILE_DELETE_ON_CLOSE_LE)) { - rc = ksmbd_vfs_may_delete(idmap, - path.dentry); + rc = inode_permission(idmap, + d_inode(path.dentry->d_parent), + MAY_EXEC | MAY_WRITE); if (rc) goto err_out; } @@ -3343,10 +3342,13 @@ int smb2_open(struct ksmbd_work *work) }
err_out: - if (file_present || created) - path_put(&path); + if (file_present || created) { + inode_unlock(d_inode(path.dentry->d_parent)); + dput(path.dentry); + } ksmbd_revert_fsids(work); err_out1: + if (rc) { if (rc == -EINVAL) rsp->hdr.Status = STATUS_INVALID_PARAMETER; @@ -5485,44 +5487,19 @@ int smb2_echo(struct ksmbd_work *work)
static int smb2_rename(struct ksmbd_work *work, struct ksmbd_file *fp, - struct mnt_idmap *idmap, struct smb2_file_rename_info *file_info, struct nls_table *local_nls) { struct ksmbd_share_config *share = fp->tcon->share_conf; - char *new_name = NULL, *abs_oldname = NULL, *old_name = NULL; - char *pathname = NULL; - struct path path; - bool file_present = true; - int rc; + char *new_name = NULL; + int rc, flags = 0;
ksmbd_debug(SMB, "setting FILE_RENAME_INFO\n"); - pathname = kmalloc(PATH_MAX, GFP_KERNEL); - if (!pathname) - return -ENOMEM; - - abs_oldname = file_path(fp->filp, pathname, PATH_MAX); - if (IS_ERR(abs_oldname)) { - rc = -EINVAL; - goto out; - } - old_name = strrchr(abs_oldname, '/'); - if (old_name && old_name[1] != '\0') { - old_name++; - } else { - ksmbd_debug(SMB, "can't get last component in path %s\n", - abs_oldname); - rc = -ENOENT; - goto out; - } - new_name = smb2_get_name(file_info->FileName, le32_to_cpu(file_info->FileNameLength), local_nls); - if (IS_ERR(new_name)) { - rc = PTR_ERR(new_name); - goto out; - } + if (IS_ERR(new_name)) + return PTR_ERR(new_name);
if (strchr(new_name, ':')) { int s_type; @@ -5548,7 +5525,7 @@ static int smb2_rename(struct ksmbd_work *work, if (rc) goto out;
- rc = ksmbd_vfs_setxattr(idmap, + rc = ksmbd_vfs_setxattr(file_mnt_idmap(fp->filp), fp->filp->f_path.dentry, xattr_stream_name, NULL, 0, 0); @@ -5563,47 +5540,18 @@ static int smb2_rename(struct ksmbd_work *work, }
ksmbd_debug(SMB, "new name %s\n", new_name); - rc = ksmbd_vfs_kern_path(work, new_name, LOOKUP_NO_SYMLINKS, &path, 1); - if (rc) { - if (rc != -ENOENT) - goto out; - file_present = false; - } else { - path_put(&path); - } - if (ksmbd_share_veto_filename(share, new_name)) { rc = -ENOENT; ksmbd_debug(SMB, "Can't rename vetoed file: %s\n", new_name); goto out; }
- if (file_info->ReplaceIfExists) { - if (file_present) { - rc = ksmbd_vfs_remove_file(work, new_name); - if (rc) { - if (rc != -ENOTEMPTY) - rc = -EINVAL; - ksmbd_debug(SMB, "cannot delete %s, rc %d\n", - new_name, rc); - goto out; - } - } - } else { - if (file_present && - strncmp(old_name, path.dentry->d_name.name, strlen(old_name))) { - rc = -EEXIST; - ksmbd_debug(SMB, - "cannot rename already existing file\n"); - goto out; - } - } + if (!file_info->ReplaceIfExists) + flags = RENAME_NOREPLACE;
- rc = ksmbd_vfs_fp_rename(work, fp, new_name); + rc = ksmbd_vfs_rename(work, &fp->filp->f_path, new_name, flags); out: - kfree(pathname); - if (!IS_ERR(new_name)) - kfree(new_name); + kfree(new_name); return rc; }
@@ -5643,18 +5591,17 @@ static int smb2_create_link(struct ksmbd_work *work, }
ksmbd_debug(SMB, "target name is %s\n", target_name); - rc = ksmbd_vfs_kern_path(work, link_name, LOOKUP_NO_SYMLINKS, &path, 0); + rc = ksmbd_vfs_kern_path_locked(work, link_name, LOOKUP_NO_SYMLINKS, + &path, 0); if (rc) { if (rc != -ENOENT) goto out; file_present = false; - } else { - path_put(&path); }
if (file_info->ReplaceIfExists) { if (file_present) { - rc = ksmbd_vfs_remove_file(work, link_name); + rc = ksmbd_vfs_remove_file(work, &path); if (rc) { rc = -EINVAL; ksmbd_debug(SMB, "cannot delete %s\n", @@ -5674,6 +5621,10 @@ static int smb2_create_link(struct ksmbd_work *work, if (rc) rc = -EINVAL; out: + if (file_present) { + inode_unlock(d_inode(path.dentry->d_parent)); + path_put(&path); + } if (!IS_ERR(link_name)) kfree(link_name); kfree(pathname); @@ -5851,12 +5802,6 @@ static int set_rename_info(struct ksmbd_work *work, struct ksmbd_file *fp, struct smb2_file_rename_info *rename_info, unsigned int buf_len) { - struct mnt_idmap *idmap; - struct ksmbd_file *parent_fp; - struct dentry *parent; - struct dentry *dentry = fp->filp->f_path.dentry; - int ret; - if (!(fp->daccess & FILE_DELETE_LE)) { pr_err("no right to delete : 0x%x\n", fp->daccess); return -EACCES; @@ -5866,32 +5811,10 @@ static int set_rename_info(struct ksmbd_work *work, struct ksmbd_file *fp, le32_to_cpu(rename_info->FileNameLength)) return -EINVAL;
- idmap = file_mnt_idmap(fp->filp); - if (ksmbd_stream_fd(fp)) - goto next; - - parent = dget_parent(dentry); - ret = ksmbd_vfs_lock_parent(idmap, parent, dentry); - if (ret) { - dput(parent); - return ret; - } - - parent_fp = ksmbd_lookup_fd_inode(d_inode(parent)); - inode_unlock(d_inode(parent)); - dput(parent); + if (!le32_to_cpu(rename_info->FileNameLength)) + return -EINVAL;
- if (parent_fp) { - if (parent_fp->daccess & FILE_DELETE_LE) { - pr_err("parent dir is opened with delete access\n"); - ksmbd_fd_put(work, parent_fp); - return -ESHARE; - } - ksmbd_fd_put(work, parent_fp); - } -next: - return smb2_rename(work, fp, idmap, rename_info, - work->conn->local_nls); + return smb2_rename(work, fp, rename_info, work->conn->local_nls); }
static int set_file_disposition_info(struct ksmbd_file *fp, diff --git a/fs/ksmbd/vfs.c b/fs/ksmbd/vfs.c index 1ad97df1dfb6f..448c442ad3f49 100644 --- a/fs/ksmbd/vfs.c +++ b/fs/ksmbd/vfs.c @@ -18,6 +18,7 @@ #include <linux/vmalloc.h> #include <linux/sched/xacct.h> #include <linux/crc32c.h> +#include <linux/namei.h>
#include "glob.h" #include "oplock.h" @@ -35,19 +36,6 @@ #include "mgmt/user_session.h" #include "mgmt/user_config.h"
-static char *extract_last_component(char *path) -{ - char *p = strrchr(path, '/'); - - if (p && p[1] != '\0') { - *p = '\0'; - p++; - } else { - p = NULL; - } - return p; -} - static void ksmbd_vfs_inherit_owner(struct ksmbd_work *work, struct inode *parent_inode, struct inode *inode) @@ -61,65 +49,77 @@ static void ksmbd_vfs_inherit_owner(struct ksmbd_work *work,
/** * ksmbd_vfs_lock_parent() - lock parent dentry if it is stable - * - * the parent dentry got by dget_parent or @parent could be - * unstable, we try to lock a parent inode and lookup the - * child dentry again. - * - * the reference count of @parent isn't incremented. */ -int ksmbd_vfs_lock_parent(struct mnt_idmap *idmap, struct dentry *parent, - struct dentry *child) +int ksmbd_vfs_lock_parent(struct dentry *parent, struct dentry *child) { - struct dentry *dentry; - int ret = 0; - inode_lock_nested(d_inode(parent), I_MUTEX_PARENT); - dentry = lookup_one(idmap, child->d_name.name, parent, - child->d_name.len); - if (IS_ERR(dentry)) { - ret = PTR_ERR(dentry); - goto out_err; - } - - if (dentry != child) { - ret = -ESTALE; - dput(dentry); - goto out_err; + if (child->d_parent != parent) { + inode_unlock(d_inode(parent)); + return -ENOENT; }
- dput(dentry); return 0; -out_err: - inode_unlock(d_inode(parent)); - return ret; }
-int ksmbd_vfs_may_delete(struct mnt_idmap *idmap, - struct dentry *dentry) +static int ksmbd_vfs_path_lookup_locked(struct ksmbd_share_config *share_conf, + char *pathname, unsigned int flags, + struct path *path) { - struct dentry *parent; - int ret; + struct qstr last; + struct filename *filename; + struct path *root_share_path = &share_conf->vfs_path; + int err, type; + struct path parent_path; + struct dentry *d; + + if (pathname[0] == '\0') { + pathname = share_conf->path; + root_share_path = NULL; + } else { + flags |= LOOKUP_BENEATH; + }
- parent = dget_parent(dentry); - ret = ksmbd_vfs_lock_parent(idmap, parent, dentry); - if (ret) { - dput(parent); - return ret; + filename = getname_kernel(pathname); + if (IS_ERR(filename)) + return PTR_ERR(filename); + + err = vfs_path_parent_lookup(filename, flags, + &parent_path, &last, &type, + root_share_path); + putname(filename); + if (err) + return err; + + if (unlikely(type != LAST_NORM)) { + path_put(&parent_path); + return -ENOENT; }
- ret = inode_permission(idmap, d_inode(parent), - MAY_EXEC | MAY_WRITE); + inode_lock_nested(parent_path.dentry->d_inode, I_MUTEX_PARENT); + d = lookup_one_qstr_excl(&last, parent_path.dentry, 0); + if (IS_ERR(d)) + goto err_out;
- inode_unlock(d_inode(parent)); - dput(parent); - return ret; + if (d_is_negative(d)) { + dput(d); + goto err_out; + } + + path->dentry = d; + path->mnt = share_conf->vfs_path.mnt; + path_put(&parent_path); + + return 0; + +err_out: + inode_unlock(parent_path.dentry->d_inode); + path_put(&parent_path); + return -ENOENT; }
int ksmbd_vfs_query_maximal_access(struct mnt_idmap *idmap, struct dentry *dentry, __le32 *daccess) { - struct dentry *parent; int ret = 0;
*daccess = cpu_to_le32(FILE_READ_ATTRIBUTES | READ_CONTROL); @@ -136,18 +136,9 @@ int ksmbd_vfs_query_maximal_access(struct mnt_idmap *idmap, if (!inode_permission(idmap, d_inode(dentry), MAY_OPEN | MAY_EXEC)) *daccess |= FILE_EXECUTE_LE;
- parent = dget_parent(dentry); - ret = ksmbd_vfs_lock_parent(idmap, parent, dentry); - if (ret) { - dput(parent); - return ret; - } - - if (!inode_permission(idmap, d_inode(parent), MAY_EXEC | MAY_WRITE)) + if (!inode_permission(idmap, d_inode(dentry->d_parent), MAY_EXEC | MAY_WRITE)) *daccess |= FILE_DELETE_LE;
- inode_unlock(d_inode(parent)); - dput(parent); return ret; }
@@ -580,54 +571,32 @@ int ksmbd_vfs_fsync(struct ksmbd_work *work, u64 fid, u64 p_id) * * Return: 0 on success, otherwise error */ -int ksmbd_vfs_remove_file(struct ksmbd_work *work, char *name) +int ksmbd_vfs_remove_file(struct ksmbd_work *work, const struct path *path) { struct mnt_idmap *idmap; - struct path path; - struct dentry *parent; + struct dentry *parent = path->dentry->d_parent; int err;
if (ksmbd_override_fsids(work)) return -ENOMEM;
- err = ksmbd_vfs_kern_path(work, name, LOOKUP_NO_SYMLINKS, &path, false); - if (err) { - ksmbd_debug(VFS, "can't get %s, err %d\n", name, err); - ksmbd_revert_fsids(work); - return err; - } - - idmap = mnt_idmap(path.mnt); - parent = dget_parent(path.dentry); - err = ksmbd_vfs_lock_parent(idmap, parent, path.dentry); - if (err) { - dput(parent); - path_put(&path); - ksmbd_revert_fsids(work); - return err; - } - - if (!d_inode(path.dentry)->i_nlink) { + if (!d_inode(path->dentry)->i_nlink) { err = -ENOENT; goto out_err; }
- if (S_ISDIR(d_inode(path.dentry)->i_mode)) { - err = vfs_rmdir(idmap, d_inode(parent), path.dentry); + idmap = mnt_idmap(path->mnt); + if (S_ISDIR(d_inode(path->dentry)->i_mode)) { + err = vfs_rmdir(idmap, d_inode(parent), path->dentry); if (err && err != -ENOTEMPTY) - ksmbd_debug(VFS, "%s: rmdir failed, err %d\n", name, - err); + ksmbd_debug(VFS, "rmdir failed, err %d\n", err); } else { - err = vfs_unlink(idmap, d_inode(parent), path.dentry, NULL); + err = vfs_unlink(idmap, d_inode(parent), path->dentry, NULL); if (err) - ksmbd_debug(VFS, "%s: unlink failed, err %d\n", name, - err); + ksmbd_debug(VFS, "unlink failed, err %d\n", err); }
out_err: - inode_unlock(d_inode(parent)); - dput(parent); - path_put(&path); ksmbd_revert_fsids(work); return err; } @@ -686,149 +655,114 @@ int ksmbd_vfs_link(struct ksmbd_work *work, const char *oldname, return err; }
-static int ksmbd_validate_entry_in_use(struct dentry *src_dent) +int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path, + char *newname, int flags) { - struct dentry *dst_dent; - - spin_lock(&src_dent->d_lock); - list_for_each_entry(dst_dent, &src_dent->d_subdirs, d_child) { - struct ksmbd_file *child_fp; + struct dentry *old_parent, *new_dentry, *trap; + struct dentry *old_child = old_path->dentry; + struct path new_path; + struct qstr new_last; + struct renamedata rd; + struct filename *to; + struct ksmbd_share_config *share_conf = work->tcon->share_conf; + struct ksmbd_file *parent_fp; + int new_type; + int err, lookup_flags = LOOKUP_NO_SYMLINKS;
- if (d_really_is_negative(dst_dent)) - continue; + if (ksmbd_override_fsids(work)) + return -ENOMEM;
- child_fp = ksmbd_lookup_fd_inode(d_inode(dst_dent)); - if (child_fp) { - spin_unlock(&src_dent->d_lock); - ksmbd_debug(VFS, "Forbid rename, sub file/dir is in use\n"); - return -EACCES; - } + to = getname_kernel(newname); + if (IS_ERR(to)) { + err = PTR_ERR(to); + goto revert_fsids; } - spin_unlock(&src_dent->d_lock);
- return 0; -} +retry: + err = vfs_path_parent_lookup(to, lookup_flags | LOOKUP_BENEATH, + &new_path, &new_last, &new_type, + &share_conf->vfs_path); + if (err) + goto out1;
-static int __ksmbd_vfs_rename(struct ksmbd_work *work, - struct mnt_idmap *src_idmap, - struct dentry *src_dent_parent, - struct dentry *src_dent, - struct mnt_idmap *dst_idmap, - struct dentry *dst_dent_parent, - struct dentry *trap_dent, - char *dst_name) -{ - struct dentry *dst_dent; - int err; + if (old_path->mnt != new_path.mnt) { + err = -EXDEV; + goto out2; + }
- if (!work->tcon->posix_extensions) { - err = ksmbd_validate_entry_in_use(src_dent); - if (err) - return err; + trap = lock_rename_child(old_child, new_path.dentry); + + old_parent = dget(old_child->d_parent); + if (d_unhashed(old_child)) { + err = -EINVAL; + goto out3; }
- if (d_really_is_negative(src_dent_parent)) - return -ENOENT; - if (d_really_is_negative(dst_dent_parent)) - return -ENOENT; - if (d_really_is_negative(src_dent)) - return -ENOENT; - if (src_dent == trap_dent) - return -EINVAL; + parent_fp = ksmbd_lookup_fd_inode(d_inode(old_child->d_parent)); + if (parent_fp) { + if (parent_fp->daccess & FILE_DELETE_LE) { + pr_err("parent dir is opened with delete access\n"); + err = -ESHARE; + ksmbd_fd_put(work, parent_fp); + goto out3; + } + ksmbd_fd_put(work, parent_fp); + }
- if (ksmbd_override_fsids(work)) - return -ENOMEM; + new_dentry = lookup_one_qstr_excl(&new_last, new_path.dentry, + lookup_flags | LOOKUP_RENAME_TARGET); + if (IS_ERR(new_dentry)) { + err = PTR_ERR(new_dentry); + goto out3; + }
- dst_dent = lookup_one(dst_idmap, dst_name, - dst_dent_parent, strlen(dst_name)); - err = PTR_ERR(dst_dent); - if (IS_ERR(dst_dent)) { - pr_err("lookup failed %s [%d]\n", dst_name, err); - goto out; + if (d_is_symlink(new_dentry)) { + err = -EACCES; + goto out4; }
- err = -ENOTEMPTY; - if (dst_dent != trap_dent && !d_really_is_positive(dst_dent)) { - struct renamedata rd = { - .old_mnt_idmap = src_idmap, - .old_dir = d_inode(src_dent_parent), - .old_dentry = src_dent, - .new_mnt_idmap = dst_idmap, - .new_dir = d_inode(dst_dent_parent), - .new_dentry = dst_dent, - }; - err = vfs_rename(&rd); + if ((flags & RENAME_NOREPLACE) && d_is_positive(new_dentry)) { + err = -EEXIST; + goto out4; } - if (err) - pr_err("vfs_rename failed err %d\n", err); - if (dst_dent) - dput(dst_dent); -out: - ksmbd_revert_fsids(work); - return err; -}
-int ksmbd_vfs_fp_rename(struct ksmbd_work *work, struct ksmbd_file *fp, - char *newname) -{ - struct mnt_idmap *idmap; - struct path dst_path; - struct dentry *src_dent_parent, *dst_dent_parent; - struct dentry *src_dent, *trap_dent, *src_child; - char *dst_name; - int err; + if (old_child == trap) { + err = -EINVAL; + goto out4; + }
- dst_name = extract_last_component(newname); - if (!dst_name) { - dst_name = newname; - newname = ""; + if (new_dentry == trap) { + err = -ENOTEMPTY; + goto out4; }
- src_dent_parent = dget_parent(fp->filp->f_path.dentry); - src_dent = fp->filp->f_path.dentry; + rd.old_mnt_idmap = mnt_idmap(old_path->mnt), + rd.old_dir = d_inode(old_parent), + rd.old_dentry = old_child, + rd.new_mnt_idmap = mnt_idmap(new_path.mnt), + rd.new_dir = new_path.dentry->d_inode, + rd.new_dentry = new_dentry, + rd.flags = flags, + err = vfs_rename(&rd); + if (err) + ksmbd_debug(VFS, "vfs_rename failed err %d\n", err);
- err = ksmbd_vfs_kern_path(work, newname, - LOOKUP_NO_SYMLINKS | LOOKUP_DIRECTORY, - &dst_path, false); - if (err) { - ksmbd_debug(VFS, "Cannot get path for %s [%d]\n", newname, err); - goto out; +out4: + dput(new_dentry); +out3: + dput(old_parent); + unlock_rename(old_parent, new_path.dentry); +out2: + path_put(&new_path); + + if (retry_estale(err, lookup_flags)) { + lookup_flags |= LOOKUP_REVAL; + goto retry; } - dst_dent_parent = dst_path.dentry; - - trap_dent = lock_rename(src_dent_parent, dst_dent_parent); - dget(src_dent); - dget(dst_dent_parent); - idmap = file_mnt_idmap(fp->filp); - src_child = lookup_one(idmap, src_dent->d_name.name, src_dent_parent, - src_dent->d_name.len); - if (IS_ERR(src_child)) { - err = PTR_ERR(src_child); - goto out_lock; - } - - if (src_child != src_dent) { - err = -ESTALE; - dput(src_child); - goto out_lock; - } - dput(src_child); - - err = __ksmbd_vfs_rename(work, - idmap, - src_dent_parent, - src_dent, - mnt_idmap(dst_path.mnt), - dst_dent_parent, - trap_dent, - dst_name); -out_lock: - dput(src_dent); - dput(dst_dent_parent); - unlock_rename(src_dent_parent, dst_dent_parent); - path_put(&dst_path); -out: - dput(src_dent_parent); +out1: + putname(to); +revert_fsids: + ksmbd_revert_fsids(work); return err; }
@@ -1079,14 +1013,16 @@ int ksmbd_vfs_remove_xattr(struct mnt_idmap *idmap, return vfs_removexattr(idmap, dentry, attr_name); }
-int ksmbd_vfs_unlink(struct mnt_idmap *idmap, - struct dentry *dir, struct dentry *dentry) +int ksmbd_vfs_unlink(struct file *filp) { int err = 0; + struct dentry *dir, *dentry = filp->f_path.dentry; + struct mnt_idmap *idmap = file_mnt_idmap(filp);
- err = ksmbd_vfs_lock_parent(idmap, dir, dentry); + dir = dget_parent(dentry); + err = ksmbd_vfs_lock_parent(dir, dentry); if (err) - return err; + goto out; dget(dentry);
if (S_ISDIR(d_inode(dentry)->i_mode)) @@ -1098,6 +1034,8 @@ int ksmbd_vfs_unlink(struct mnt_idmap *idmap, inode_unlock(d_inode(dir)); if (err) ksmbd_debug(VFS, "failed to delete, err %d\n", err); +out: + dput(dir);
return err; } @@ -1200,7 +1138,7 @@ static int ksmbd_vfs_lookup_in_dir(const struct path *dir, char *name, }
/** - * ksmbd_vfs_kern_path() - lookup a file and get path info + * ksmbd_vfs_kern_path_locked() - lookup a file and get path info * @name: file path that is relative to share * @flags: lookup flags * @path: if lookup succeed, return path info @@ -1208,24 +1146,20 @@ static int ksmbd_vfs_lookup_in_dir(const struct path *dir, char *name, * * Return: 0 on success, otherwise error */ -int ksmbd_vfs_kern_path(struct ksmbd_work *work, char *name, - unsigned int flags, struct path *path, bool caseless) +int ksmbd_vfs_kern_path_locked(struct ksmbd_work *work, char *name, + unsigned int flags, struct path *path, + bool caseless) { struct ksmbd_share_config *share_conf = work->tcon->share_conf; int err; + struct path parent_path;
- flags |= LOOKUP_BENEATH; - err = vfs_path_lookup(share_conf->vfs_path.dentry, - share_conf->vfs_path.mnt, - name, - flags, - path); + err = ksmbd_vfs_path_lookup_locked(share_conf, name, flags, path); if (!err) - return 0; + return err;
if (caseless) { char *filepath; - struct path parent; size_t path_len, remain_len;
filepath = kstrdup(name, GFP_KERNEL); @@ -1235,10 +1169,10 @@ int ksmbd_vfs_kern_path(struct ksmbd_work *work, char *name, path_len = strlen(filepath); remain_len = path_len;
- parent = share_conf->vfs_path; - path_get(&parent); + parent_path = share_conf->vfs_path; + path_get(&parent_path);
- while (d_can_lookup(parent.dentry)) { + while (d_can_lookup(parent_path.dentry)) { char *filename = filepath + path_len - remain_len; char *next = strchrnul(filename, '/'); size_t filename_len = next - filename; @@ -1247,12 +1181,11 @@ int ksmbd_vfs_kern_path(struct ksmbd_work *work, char *name, if (filename_len == 0) break;
- err = ksmbd_vfs_lookup_in_dir(&parent, filename, + err = ksmbd_vfs_lookup_in_dir(&parent_path, filename, filename_len, work->conn->um); - path_put(&parent); if (err) - goto out; + goto out2;
next[0] = '\0';
@@ -1260,23 +1193,31 @@ int ksmbd_vfs_kern_path(struct ksmbd_work *work, char *name, share_conf->vfs_path.mnt, filepath, flags, - &parent); + path); if (err) - goto out; - else if (is_last) { - *path = parent; - goto out; - } + goto out2; + else if (is_last) + goto out1; + path_put(&parent_path); + parent_path = *path;
next[0] = '/'; remain_len -= filename_len + 1; }
- path_put(&parent); err = -EINVAL; -out: +out2: + path_put(&parent_path); +out1: kfree(filepath); } + + if (!err) { + err = ksmbd_vfs_lock_parent(parent_path.dentry, path->dentry); + if (err) + dput(path->dentry); + path_put(&parent_path); + } return err; }
diff --git a/fs/ksmbd/vfs.h b/fs/ksmbd/vfs.h index 9d676ab0cd25b..a4ae89f3230de 100644 --- a/fs/ksmbd/vfs.h +++ b/fs/ksmbd/vfs.h @@ -71,9 +71,7 @@ struct ksmbd_kstat { __le32 file_attributes; };
-int ksmbd_vfs_lock_parent(struct mnt_idmap *idmap, struct dentry *parent, - struct dentry *child); -int ksmbd_vfs_may_delete(struct mnt_idmap *idmap, struct dentry *dentry); +int ksmbd_vfs_lock_parent(struct dentry *parent, struct dentry *child); int ksmbd_vfs_query_maximal_access(struct mnt_idmap *idmap, struct dentry *dentry, __le32 *daccess); int ksmbd_vfs_create(struct ksmbd_work *work, const char *name, umode_t mode); @@ -84,12 +82,12 @@ int ksmbd_vfs_write(struct ksmbd_work *work, struct ksmbd_file *fp, char *buf, size_t count, loff_t *pos, bool sync, ssize_t *written); int ksmbd_vfs_fsync(struct ksmbd_work *work, u64 fid, u64 p_id); -int ksmbd_vfs_remove_file(struct ksmbd_work *work, char *name); +int ksmbd_vfs_remove_file(struct ksmbd_work *work, const struct path *path); int ksmbd_vfs_link(struct ksmbd_work *work, const char *oldname, const char *newname); int ksmbd_vfs_getattr(const struct path *path, struct kstat *stat); -int ksmbd_vfs_fp_rename(struct ksmbd_work *work, struct ksmbd_file *fp, - char *newname); +int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path, + char *newname, int flags); int ksmbd_vfs_truncate(struct ksmbd_work *work, struct ksmbd_file *fp, loff_t size); struct srv_copychunk; @@ -116,9 +114,9 @@ int ksmbd_vfs_xattr_stream_name(char *stream_name, char **xattr_stream_name, size_t *xattr_stream_name_size, int s_type); int ksmbd_vfs_remove_xattr(struct mnt_idmap *idmap, struct dentry *dentry, char *attr_name); -int ksmbd_vfs_kern_path(struct ksmbd_work *work, - char *name, unsigned int flags, struct path *path, - bool caseless); +int ksmbd_vfs_kern_path_locked(struct ksmbd_work *work, char *name, + unsigned int flags, struct path *path, + bool caseless); struct dentry *ksmbd_vfs_kern_path_create(struct ksmbd_work *work, const char *name, unsigned int flags, @@ -131,8 +129,7 @@ struct file_allocated_range_buffer; int ksmbd_vfs_fqar_lseek(struct ksmbd_file *fp, loff_t start, loff_t length, struct file_allocated_range_buffer *ranges, unsigned int in_count, unsigned int *out_count); -int ksmbd_vfs_unlink(struct mnt_idmap *idmap, struct dentry *dir, - struct dentry *dentry); +int ksmbd_vfs_unlink(struct file *filp); void *ksmbd_vfs_init_kstat(char **p, struct ksmbd_kstat *ksmbd_kstat); int ksmbd_vfs_fill_dentry_attrs(struct ksmbd_work *work, struct mnt_idmap *idmap, diff --git a/fs/ksmbd/vfs_cache.c b/fs/ksmbd/vfs_cache.c index 054a7d2e0f489..2d0138e72d783 100644 --- a/fs/ksmbd/vfs_cache.c +++ b/fs/ksmbd/vfs_cache.c @@ -244,7 +244,6 @@ void ksmbd_release_inode_hash(void)
static void __ksmbd_inode_close(struct ksmbd_file *fp) { - struct dentry *dir, *dentry; struct ksmbd_inode *ci = fp->f_ci; int err; struct file *filp; @@ -263,11 +262,9 @@ static void __ksmbd_inode_close(struct ksmbd_file *fp) if (atomic_dec_and_test(&ci->m_count)) { write_lock(&ci->m_lock); if (ci->m_flags & (S_DEL_ON_CLS | S_DEL_PENDING)) { - dentry = filp->f_path.dentry; - dir = dentry->d_parent; ci->m_flags &= ~(S_DEL_ON_CLS | S_DEL_PENDING); write_unlock(&ci->m_lock); - ksmbd_vfs_unlink(file_mnt_idmap(filp), dir, dentry); + ksmbd_vfs_unlink(filp); write_lock(&ci->m_lock); } write_unlock(&ci->m_lock); diff --git a/fs/namei.c b/fs/namei.c index 6bc1964e2214e..c1d59ae5af83e 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -254,6 +254,7 @@ getname_kernel(const char * filename)
return result; } +EXPORT_SYMBOL(getname_kernel);
void putname(struct filename *name) { @@ -271,6 +272,7 @@ void putname(struct filename *name) } else __putname(name); } +EXPORT_SYMBOL(putname);
/** * check_acl - perform ACL permission checking @@ -1581,8 +1583,9 @@ static struct dentry *lookup_dcache(const struct qstr *name, * when directory is guaranteed to have no in-lookup children * at all. */ -static struct dentry *__lookup_hash(const struct qstr *name, - struct dentry *base, unsigned int flags) +struct dentry *lookup_one_qstr_excl(const struct qstr *name, + struct dentry *base, + unsigned int flags) { struct dentry *dentry = lookup_dcache(name, base, flags); struct dentry *old; @@ -1606,6 +1609,7 @@ static struct dentry *__lookup_hash(const struct qstr *name, } return dentry; } +EXPORT_SYMBOL(lookup_one_qstr_excl);
static struct dentry *lookup_fast(struct nameidata *nd) { @@ -2532,16 +2536,17 @@ static int path_parentat(struct nameidata *nd, unsigned flags, }
/* Note: this does not consume "name" */ -static int filename_parentat(int dfd, struct filename *name, - unsigned int flags, struct path *parent, - struct qstr *last, int *type) +static int __filename_parentat(int dfd, struct filename *name, + unsigned int flags, struct path *parent, + struct qstr *last, int *type, + const struct path *root) { int retval; struct nameidata nd;
if (IS_ERR(name)) return PTR_ERR(name); - set_nameidata(&nd, dfd, name, NULL); + set_nameidata(&nd, dfd, name, root); retval = path_parentat(&nd, flags | LOOKUP_RCU, parent); if (unlikely(retval == -ECHILD)) retval = path_parentat(&nd, flags, parent); @@ -2556,6 +2561,13 @@ static int filename_parentat(int dfd, struct filename *name, return retval; }
+static int filename_parentat(int dfd, struct filename *name, + unsigned int flags, struct path *parent, + struct qstr *last, int *type) +{ + return __filename_parentat(dfd, name, flags, parent, last, type, NULL); +} + /* does lookup, returns the object with parent locked */ static struct dentry *__kern_path_locked(struct filename *name, struct path *path) { @@ -2571,7 +2583,7 @@ static struct dentry *__kern_path_locked(struct filename *name, struct path *pat return ERR_PTR(-EINVAL); } inode_lock_nested(path->dentry->d_inode, I_MUTEX_PARENT); - d = __lookup_hash(&last, path->dentry, 0); + d = lookup_one_qstr_excl(&last, path->dentry, 0); if (IS_ERR(d)) { inode_unlock(path->dentry->d_inode); path_put(path); @@ -2599,6 +2611,24 @@ int kern_path(const char *name, unsigned int flags, struct path *path) } EXPORT_SYMBOL(kern_path);
+/** + * vfs_path_parent_lookup - lookup a parent path relative to a dentry-vfsmount pair + * @filename: filename structure + * @flags: lookup flags + * @parent: pointer to struct path to fill + * @last: last component + * @type: type of the last component + * @root: pointer to struct path of the base directory + */ +int vfs_path_parent_lookup(struct filename *filename, unsigned int flags, + struct path *parent, struct qstr *last, int *type, + const struct path *root) +{ + return __filename_parentat(AT_FDCWD, filename, flags, parent, last, + type, root); +} +EXPORT_SYMBOL(vfs_path_parent_lookup); + /** * vfs_path_lookup - lookup a file path relative to a dentry-vfsmount pair * @dentry: pointer to dentry of the base directory @@ -3852,7 +3882,8 @@ static struct dentry *filename_create(int dfd, struct filename *name, if (last.name[last.len] && !want_dir) create_flags = 0; inode_lock_nested(path->dentry->d_inode, I_MUTEX_PARENT); - dentry = __lookup_hash(&last, path->dentry, reval_flag | create_flags); + dentry = lookup_one_qstr_excl(&last, path->dentry, + reval_flag | create_flags); if (IS_ERR(dentry)) goto unlock;
@@ -4212,7 +4243,7 @@ int do_rmdir(int dfd, struct filename *name) goto exit2;
inode_lock_nested(path.dentry->d_inode, I_MUTEX_PARENT); - dentry = __lookup_hash(&last, path.dentry, lookup_flags); + dentry = lookup_one_qstr_excl(&last, path.dentry, lookup_flags); error = PTR_ERR(dentry); if (IS_ERR(dentry)) goto exit3; @@ -4345,7 +4376,7 @@ int do_unlinkat(int dfd, struct filename *name) goto exit2; retry_deleg: inode_lock_nested(path.dentry->d_inode, I_MUTEX_PARENT); - dentry = __lookup_hash(&last, path.dentry, lookup_flags); + dentry = lookup_one_qstr_excl(&last, path.dentry, lookup_flags); error = PTR_ERR(dentry); if (!IS_ERR(dentry)) {
@@ -4909,7 +4940,8 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd, retry_deleg: trap = lock_rename(new_path.dentry, old_path.dentry);
- old_dentry = __lookup_hash(&old_last, old_path.dentry, lookup_flags); + old_dentry = lookup_one_qstr_excl(&old_last, old_path.dentry, + lookup_flags); error = PTR_ERR(old_dentry); if (IS_ERR(old_dentry)) goto exit3; @@ -4917,7 +4949,8 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd, error = -ENOENT; if (d_is_negative(old_dentry)) goto exit4; - new_dentry = __lookup_hash(&new_last, new_path.dentry, lookup_flags | target_flags); + new_dentry = lookup_one_qstr_excl(&new_last, new_path.dentry, + lookup_flags | target_flags); error = PTR_ERR(new_dentry); if (IS_ERR(new_dentry)) goto exit4; diff --git a/include/linux/namei.h b/include/linux/namei.h index 5864e4d82e567..1463cbda48886 100644 --- a/include/linux/namei.h +++ b/include/linux/namei.h @@ -57,12 +57,18 @@ static inline int user_path_at(int dfd, const char __user *name, unsigned flags, return user_path_at_empty(dfd, name, flags, path, NULL); }
+struct dentry *lookup_one_qstr_excl(const struct qstr *name, + struct dentry *base, + unsigned int flags); extern int kern_path(const char *, unsigned, struct path *);
extern struct dentry *kern_path_create(int, const char *, struct path *, unsigned int); extern struct dentry *user_path_create(int, const char __user *, struct path *, unsigned int); extern void done_path_create(struct path *, struct dentry *); extern struct dentry *kern_path_locked(const char *, struct path *); +int vfs_path_parent_lookup(struct filename *filename, unsigned int flags, + struct path *parent, struct qstr *last, int *type, + const struct path *root); int vfs_path_lookup(struct dentry *, struct vfsmount *, const char *, unsigned int, struct path *);
From: Namjae Jeon linkinjeon@kernel.org
[ Upstream commit 40b268d384a22276dca1450549f53eed60e21deb ]
ksmbd is doing write access using vfs helpers. There are the cases that mnt_want_write() is not called in vfs helper. This patch add missing mnt_want_write() to ksmbd vfs functions.
Cc: stable@vger.kernel.org Cc: Amir Goldstein amir73il@gmail.com Signed-off-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ksmbd/smb2pdu.c | 26 ++++------ fs/ksmbd/smbacl.c | 10 ++-- fs/ksmbd/vfs.c | 117 ++++++++++++++++++++++++++++++++++--------- fs/ksmbd/vfs.h | 17 +++---- fs/ksmbd/vfs_cache.c | 2 +- 5 files changed, 117 insertions(+), 55 deletions(-)
diff --git a/fs/ksmbd/smb2pdu.c b/fs/ksmbd/smb2pdu.c index 06a335356e373..1d365eb79c3ef 100644 --- a/fs/ksmbd/smb2pdu.c +++ b/fs/ksmbd/smb2pdu.c @@ -2311,7 +2311,7 @@ static int smb2_set_ea(struct smb2_ea_info *eabuf, unsigned int buf_len, /* delete the EA only when it exits */ if (rc > 0) { rc = ksmbd_vfs_remove_xattr(idmap, - path->dentry, + path, attr_name);
if (rc < 0) { @@ -2325,8 +2325,7 @@ static int smb2_set_ea(struct smb2_ea_info *eabuf, unsigned int buf_len, /* if the EA doesn't exist, just do nothing. */ rc = 0; } else { - rc = ksmbd_vfs_setxattr(idmap, - path->dentry, attr_name, value, + rc = ksmbd_vfs_setxattr(idmap, path, attr_name, value, le16_to_cpu(eabuf->EaValueLength), 0); if (rc < 0) { ksmbd_debug(SMB, @@ -2383,8 +2382,7 @@ static noinline int smb2_set_stream_name_xattr(const struct path *path, return -EBADF; }
- rc = ksmbd_vfs_setxattr(idmap, path->dentry, - xattr_stream_name, NULL, 0, 0); + rc = ksmbd_vfs_setxattr(idmap, path, xattr_stream_name, NULL, 0, 0); if (rc < 0) pr_err("Failed to store XATTR stream name :%d\n", rc); return 0; @@ -2412,7 +2410,7 @@ static int smb2_remove_smb_xattrs(const struct path *path) if (!strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) && !strncmp(&name[XATTR_USER_PREFIX_LEN], STREAM_PREFIX, STREAM_PREFIX_LEN)) { - err = ksmbd_vfs_remove_xattr(idmap, path->dentry, + err = ksmbd_vfs_remove_xattr(idmap, path, name); if (err) ksmbd_debug(SMB, "remove xattr failed : %s\n", @@ -2459,8 +2457,7 @@ static void smb2_new_xattrs(struct ksmbd_tree_connect *tcon, const struct path * da.flags = XATTR_DOSINFO_ATTRIB | XATTR_DOSINFO_CREATE_TIME | XATTR_DOSINFO_ITIME;
- rc = ksmbd_vfs_set_dos_attrib_xattr(mnt_idmap(path->mnt), - path->dentry, &da); + rc = ksmbd_vfs_set_dos_attrib_xattr(mnt_idmap(path->mnt), path, &da); if (rc) ksmbd_debug(SMB, "failed to store file attribute into xattr\n"); } @@ -3034,7 +3031,7 @@ int smb2_open(struct ksmbd_work *work) struct inode *inode = d_inode(path.dentry);
posix_acl_rc = ksmbd_vfs_inherit_posix_acl(idmap, - path.dentry, + &path, d_inode(path.dentry->d_parent)); if (posix_acl_rc) ksmbd_debug(SMB, "inherit posix acl failed : %d\n", posix_acl_rc); @@ -3050,7 +3047,7 @@ int smb2_open(struct ksmbd_work *work) if (rc) { if (posix_acl_rc) ksmbd_vfs_set_init_posix_acl(idmap, - path.dentry); + &path);
if (test_share_config_flag(work->tcon->share_conf, KSMBD_SHARE_FLAG_ACL_XATTR)) { @@ -3090,7 +3087,7 @@ int smb2_open(struct ksmbd_work *work)
rc = ksmbd_vfs_set_sd_xattr(conn, idmap, - path.dentry, + &path, pntsd, pntsd_size); kfree(pntsd); @@ -5526,7 +5523,7 @@ static int smb2_rename(struct ksmbd_work *work, goto out;
rc = ksmbd_vfs_setxattr(file_mnt_idmap(fp->filp), - fp->filp->f_path.dentry, + &fp->filp->f_path, xattr_stream_name, NULL, 0, 0); if (rc < 0) { @@ -5691,8 +5688,7 @@ static int set_file_basic_info(struct ksmbd_file *fp, da.flags = XATTR_DOSINFO_ATTRIB | XATTR_DOSINFO_CREATE_TIME | XATTR_DOSINFO_ITIME;
- rc = ksmbd_vfs_set_dos_attrib_xattr(idmap, - filp->f_path.dentry, &da); + rc = ksmbd_vfs_set_dos_attrib_xattr(idmap, &filp->f_path, &da); if (rc) ksmbd_debug(SMB, "failed to restore file attribute in EA\n"); @@ -7547,7 +7543,7 @@ static inline int fsctl_set_sparse(struct ksmbd_work *work, u64 id,
da.attr = le32_to_cpu(fp->f_ci->m_fattr); ret = ksmbd_vfs_set_dos_attrib_xattr(idmap, - fp->filp->f_path.dentry, &da); + &fp->filp->f_path, &da); if (ret) fp->f_ci->m_fattr = old_fattr; } diff --git a/fs/ksmbd/smbacl.c b/fs/ksmbd/smbacl.c index 0a5862a61c773..ad919a4239d0a 100644 --- a/fs/ksmbd/smbacl.c +++ b/fs/ksmbd/smbacl.c @@ -1162,8 +1162,7 @@ int smb_inherit_dacl(struct ksmbd_conn *conn, pntsd_size += sizeof(struct smb_acl) + nt_size; }
- ksmbd_vfs_set_sd_xattr(conn, idmap, - path->dentry, pntsd, pntsd_size); + ksmbd_vfs_set_sd_xattr(conn, idmap, path, pntsd, pntsd_size); kfree(pntsd); }
@@ -1383,7 +1382,7 @@ int set_info_sec(struct ksmbd_conn *conn, struct ksmbd_tree_connect *tcon, newattrs.ia_valid |= ATTR_MODE; newattrs.ia_mode = (inode->i_mode & ~0777) | (fattr.cf_mode & 0777);
- ksmbd_vfs_remove_acl_xattrs(idmap, path->dentry); + ksmbd_vfs_remove_acl_xattrs(idmap, path); /* Update posix acls */ if (IS_ENABLED(CONFIG_FS_POSIX_ACL) && fattr.cf_dacls) { rc = set_posix_acl(idmap, path->dentry, @@ -1414,9 +1413,8 @@ int set_info_sec(struct ksmbd_conn *conn, struct ksmbd_tree_connect *tcon,
if (test_share_config_flag(tcon->share_conf, KSMBD_SHARE_FLAG_ACL_XATTR)) { /* Update WinACL in xattr */ - ksmbd_vfs_remove_sd_xattrs(idmap, path->dentry); - ksmbd_vfs_set_sd_xattr(conn, idmap, - path->dentry, pntsd, ntsd_len); + ksmbd_vfs_remove_sd_xattrs(idmap, path); + ksmbd_vfs_set_sd_xattr(conn, idmap, path, pntsd, ntsd_len); }
out: diff --git a/fs/ksmbd/vfs.c b/fs/ksmbd/vfs.c index 448c442ad3f49..fb19deba58cd9 100644 --- a/fs/ksmbd/vfs.c +++ b/fs/ksmbd/vfs.c @@ -166,6 +166,10 @@ int ksmbd_vfs_create(struct ksmbd_work *work, const char *name, umode_t mode) return err; }
+ err = mnt_want_write(path.mnt); + if (err) + goto out_err; + mode |= S_IFREG; err = vfs_create(mnt_idmap(path.mnt), d_inode(path.dentry), dentry, mode, true); @@ -175,6 +179,9 @@ int ksmbd_vfs_create(struct ksmbd_work *work, const char *name, umode_t mode) } else { pr_err("File(%s): creation failed (err:%d)\n", name, err); } + mnt_drop_write(path.mnt); + +out_err: done_path_create(&path, dentry); return err; } @@ -205,30 +212,35 @@ int ksmbd_vfs_mkdir(struct ksmbd_work *work, const char *name, umode_t mode) return err; }
+ err = mnt_want_write(path.mnt); + if (err) + goto out_err2; + idmap = mnt_idmap(path.mnt); mode |= S_IFDIR; err = vfs_mkdir(idmap, d_inode(path.dentry), dentry, mode); - if (err) { - goto out; - } else if (d_unhashed(dentry)) { + if (!err && d_unhashed(dentry)) { struct dentry *d;
d = lookup_one(idmap, dentry->d_name.name, dentry->d_parent, dentry->d_name.len); if (IS_ERR(d)) { err = PTR_ERR(d); - goto out; + goto out_err1; } if (unlikely(d_is_negative(d))) { dput(d); err = -ENOENT; - goto out; + goto out_err1; }
ksmbd_vfs_inherit_owner(work, d_inode(path.dentry), d_inode(d)); dput(d); } -out: + +out_err1: + mnt_drop_write(path.mnt); +out_err2: done_path_create(&path, dentry); if (err) pr_err("mkdir(%s): creation failed (err:%d)\n", name, err); @@ -439,7 +451,7 @@ static int ksmbd_vfs_stream_write(struct ksmbd_file *fp, char *buf, loff_t *pos, memcpy(&stream_buf[*pos], buf, count);
err = ksmbd_vfs_setxattr(idmap, - fp->filp->f_path.dentry, + &fp->filp->f_path, fp->stream.name, (void *)stream_buf, size, @@ -585,6 +597,10 @@ int ksmbd_vfs_remove_file(struct ksmbd_work *work, const struct path *path) goto out_err; }
+ err = mnt_want_write(path->mnt); + if (err) + goto out_err; + idmap = mnt_idmap(path->mnt); if (S_ISDIR(d_inode(path->dentry)->i_mode)) { err = vfs_rmdir(idmap, d_inode(parent), path->dentry); @@ -595,6 +611,7 @@ int ksmbd_vfs_remove_file(struct ksmbd_work *work, const struct path *path) if (err) ksmbd_debug(VFS, "unlink failed, err %d\n", err); } + mnt_drop_write(path->mnt);
out_err: ksmbd_revert_fsids(work); @@ -640,11 +657,16 @@ int ksmbd_vfs_link(struct ksmbd_work *work, const char *oldname, goto out3; }
+ err = mnt_want_write(newpath.mnt); + if (err) + goto out3; + err = vfs_link(oldpath.dentry, mnt_idmap(newpath.mnt), d_inode(newpath.dentry), dentry, NULL); if (err) ksmbd_debug(VFS, "vfs_link failed err %d\n", err); + mnt_drop_write(newpath.mnt);
out3: done_path_create(&newpath, dentry); @@ -690,6 +712,10 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path, goto out2; }
+ err = mnt_want_write(old_path->mnt); + if (err) + goto out2; + trap = lock_rename_child(old_child, new_path.dentry);
old_parent = dget(old_child->d_parent); @@ -752,6 +778,7 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path, out3: dput(old_parent); unlock_rename(old_parent, new_path.dentry); + mnt_drop_write(old_path->mnt); out2: path_put(&new_path);
@@ -892,19 +919,24 @@ ssize_t ksmbd_vfs_getxattr(struct mnt_idmap *idmap, * Return: 0 on success, otherwise error */ int ksmbd_vfs_setxattr(struct mnt_idmap *idmap, - struct dentry *dentry, const char *attr_name, + const struct path *path, const char *attr_name, void *attr_value, size_t attr_size, int flags) { int err;
+ err = mnt_want_write(path->mnt); + if (err) + return err; + err = vfs_setxattr(idmap, - dentry, + path->dentry, attr_name, attr_value, attr_size, flags); if (err) ksmbd_debug(VFS, "setxattr failed, err %d\n", err); + mnt_drop_write(path->mnt); return err; }
@@ -1008,9 +1040,18 @@ int ksmbd_vfs_fqar_lseek(struct ksmbd_file *fp, loff_t start, loff_t length, }
int ksmbd_vfs_remove_xattr(struct mnt_idmap *idmap, - struct dentry *dentry, char *attr_name) + const struct path *path, char *attr_name) { - return vfs_removexattr(idmap, dentry, attr_name); + int err; + + err = mnt_want_write(path->mnt); + if (err) + return err; + + err = vfs_removexattr(idmap, path->dentry, attr_name); + mnt_drop_write(path->mnt); + + return err; }
int ksmbd_vfs_unlink(struct file *filp) @@ -1019,6 +1060,10 @@ int ksmbd_vfs_unlink(struct file *filp) struct dentry *dir, *dentry = filp->f_path.dentry; struct mnt_idmap *idmap = file_mnt_idmap(filp);
+ err = mnt_want_write(filp->f_path.mnt); + if (err) + return err; + dir = dget_parent(dentry); err = ksmbd_vfs_lock_parent(dir, dentry); if (err) @@ -1036,6 +1081,7 @@ int ksmbd_vfs_unlink(struct file *filp) ksmbd_debug(VFS, "failed to delete, err %d\n", err); out: dput(dir); + mnt_drop_write(filp->f_path.mnt);
return err; } @@ -1239,13 +1285,13 @@ struct dentry *ksmbd_vfs_kern_path_create(struct ksmbd_work *work, }
int ksmbd_vfs_remove_acl_xattrs(struct mnt_idmap *idmap, - struct dentry *dentry) + const struct path *path) { char *name, *xattr_list = NULL; ssize_t xattr_list_len; int err = 0;
- xattr_list_len = ksmbd_vfs_listxattr(dentry, &xattr_list); + xattr_list_len = ksmbd_vfs_listxattr(path->dentry, &xattr_list); if (xattr_list_len < 0) { goto out; } else if (!xattr_list_len) { @@ -1253,6 +1299,10 @@ int ksmbd_vfs_remove_acl_xattrs(struct mnt_idmap *idmap, goto out; }
+ err = mnt_want_write(path->mnt); + if (err) + goto out; + for (name = xattr_list; name - xattr_list < xattr_list_len; name += strlen(name) + 1) { ksmbd_debug(SMB, "%s, len %zd\n", name, strlen(name)); @@ -1261,25 +1311,26 @@ int ksmbd_vfs_remove_acl_xattrs(struct mnt_idmap *idmap, sizeof(XATTR_NAME_POSIX_ACL_ACCESS) - 1) || !strncmp(name, XATTR_NAME_POSIX_ACL_DEFAULT, sizeof(XATTR_NAME_POSIX_ACL_DEFAULT) - 1)) { - err = vfs_remove_acl(idmap, dentry, name); + err = vfs_remove_acl(idmap, path->dentry, name); if (err) ksmbd_debug(SMB, "remove acl xattr failed : %s\n", name); } } + mnt_drop_write(path->mnt); + out: kvfree(xattr_list); return err; }
-int ksmbd_vfs_remove_sd_xattrs(struct mnt_idmap *idmap, - struct dentry *dentry) +int ksmbd_vfs_remove_sd_xattrs(struct mnt_idmap *idmap, const struct path *path) { char *name, *xattr_list = NULL; ssize_t xattr_list_len; int err = 0;
- xattr_list_len = ksmbd_vfs_listxattr(dentry, &xattr_list); + xattr_list_len = ksmbd_vfs_listxattr(path->dentry, &xattr_list); if (xattr_list_len < 0) { goto out; } else if (!xattr_list_len) { @@ -1292,7 +1343,7 @@ int ksmbd_vfs_remove_sd_xattrs(struct mnt_idmap *idmap, ksmbd_debug(SMB, "%s, len %zd\n", name, strlen(name));
if (!strncmp(name, XATTR_NAME_SD, XATTR_NAME_SD_LEN)) { - err = ksmbd_vfs_remove_xattr(idmap, dentry, name); + err = ksmbd_vfs_remove_xattr(idmap, path, name); if (err) ksmbd_debug(SMB, "remove xattr failed : %s\n", name); } @@ -1369,13 +1420,14 @@ static struct xattr_smb_acl *ksmbd_vfs_make_xattr_posix_acl(struct mnt_idmap *id
int ksmbd_vfs_set_sd_xattr(struct ksmbd_conn *conn, struct mnt_idmap *idmap, - struct dentry *dentry, + const struct path *path, struct smb_ntsd *pntsd, int len) { int rc; struct ndr sd_ndr = {0}, acl_ndr = {0}; struct xattr_ntacl acl = {0}; struct xattr_smb_acl *smb_acl, *def_smb_acl = NULL; + struct dentry *dentry = path->dentry; struct inode *inode = d_inode(dentry);
acl.version = 4; @@ -1427,7 +1479,7 @@ int ksmbd_vfs_set_sd_xattr(struct ksmbd_conn *conn, goto out; }
- rc = ksmbd_vfs_setxattr(idmap, dentry, + rc = ksmbd_vfs_setxattr(idmap, path, XATTR_NAME_SD, sd_ndr.data, sd_ndr.offset, 0); if (rc < 0) @@ -1517,7 +1569,7 @@ int ksmbd_vfs_get_sd_xattr(struct ksmbd_conn *conn, }
int ksmbd_vfs_set_dos_attrib_xattr(struct mnt_idmap *idmap, - struct dentry *dentry, + const struct path *path, struct xattr_dos_attrib *da) { struct ndr n; @@ -1527,7 +1579,7 @@ int ksmbd_vfs_set_dos_attrib_xattr(struct mnt_idmap *idmap, if (err) return err;
- err = ksmbd_vfs_setxattr(idmap, dentry, XATTR_NAME_DOS_ATTRIBUTE, + err = ksmbd_vfs_setxattr(idmap, path, XATTR_NAME_DOS_ATTRIBUTE, (void *)n.data, n.offset, 0); if (err) ksmbd_debug(SMB, "failed to store dos attribute in xattr\n"); @@ -1764,10 +1816,11 @@ void ksmbd_vfs_posix_lock_unblock(struct file_lock *flock) }
int ksmbd_vfs_set_init_posix_acl(struct mnt_idmap *idmap, - struct dentry *dentry) + struct path *path) { struct posix_acl_state acl_state; struct posix_acl *acls; + struct dentry *dentry = path->dentry; struct inode *inode = d_inode(dentry); int rc;
@@ -1797,6 +1850,11 @@ int ksmbd_vfs_set_init_posix_acl(struct mnt_idmap *idmap, return -ENOMEM; } posix_state_to_acl(&acl_state, acls->a_entries); + + rc = mnt_want_write(path->mnt); + if (rc) + goto out_err; + rc = set_posix_acl(idmap, dentry, ACL_TYPE_ACCESS, acls); if (rc < 0) ksmbd_debug(SMB, "Set posix acl(ACL_TYPE_ACCESS) failed, rc : %d\n", @@ -1808,16 +1866,20 @@ int ksmbd_vfs_set_init_posix_acl(struct mnt_idmap *idmap, ksmbd_debug(SMB, "Set posix acl(ACL_TYPE_DEFAULT) failed, rc : %d\n", rc); } + mnt_drop_write(path->mnt); + +out_err: free_acl_state(&acl_state); posix_acl_release(acls); return rc; }
int ksmbd_vfs_inherit_posix_acl(struct mnt_idmap *idmap, - struct dentry *dentry, struct inode *parent_inode) + struct path *path, struct inode *parent_inode) { struct posix_acl *acls; struct posix_acl_entry *pace; + struct dentry *dentry = path->dentry; struct inode *inode = d_inode(dentry); int rc, i;
@@ -1836,6 +1898,10 @@ int ksmbd_vfs_inherit_posix_acl(struct mnt_idmap *idmap, } }
+ rc = mnt_want_write(path->mnt); + if (rc) + goto out_err; + rc = set_posix_acl(idmap, dentry, ACL_TYPE_ACCESS, acls); if (rc < 0) ksmbd_debug(SMB, "Set posix acl(ACL_TYPE_ACCESS) failed, rc : %d\n", @@ -1847,6 +1913,9 @@ int ksmbd_vfs_inherit_posix_acl(struct mnt_idmap *idmap, ksmbd_debug(SMB, "Set posix acl(ACL_TYPE_DEFAULT) failed, rc : %d\n", rc); } + mnt_drop_write(path->mnt); + +out_err: posix_acl_release(acls); return rc; } diff --git a/fs/ksmbd/vfs.h b/fs/ksmbd/vfs.h index a4ae89f3230de..8c0931d4d5310 100644 --- a/fs/ksmbd/vfs.h +++ b/fs/ksmbd/vfs.h @@ -108,12 +108,12 @@ ssize_t ksmbd_vfs_casexattr_len(struct mnt_idmap *idmap, struct dentry *dentry, char *attr_name, int attr_name_len); int ksmbd_vfs_setxattr(struct mnt_idmap *idmap, - struct dentry *dentry, const char *attr_name, + const struct path *path, const char *attr_name, void *attr_value, size_t attr_size, int flags); int ksmbd_vfs_xattr_stream_name(char *stream_name, char **xattr_stream_name, size_t *xattr_stream_name_size, int s_type); int ksmbd_vfs_remove_xattr(struct mnt_idmap *idmap, - struct dentry *dentry, char *attr_name); + const struct path *path, char *attr_name); int ksmbd_vfs_kern_path_locked(struct ksmbd_work *work, char *name, unsigned int flags, struct path *path, bool caseless); @@ -139,26 +139,25 @@ void ksmbd_vfs_posix_lock_wait(struct file_lock *flock); int ksmbd_vfs_posix_lock_wait_timeout(struct file_lock *flock, long timeout); void ksmbd_vfs_posix_lock_unblock(struct file_lock *flock); int ksmbd_vfs_remove_acl_xattrs(struct mnt_idmap *idmap, - struct dentry *dentry); -int ksmbd_vfs_remove_sd_xattrs(struct mnt_idmap *idmap, - struct dentry *dentry); + const struct path *path); +int ksmbd_vfs_remove_sd_xattrs(struct mnt_idmap *idmap, const struct path *path); int ksmbd_vfs_set_sd_xattr(struct ksmbd_conn *conn, struct mnt_idmap *idmap, - struct dentry *dentry, + const struct path *path, struct smb_ntsd *pntsd, int len); int ksmbd_vfs_get_sd_xattr(struct ksmbd_conn *conn, struct mnt_idmap *idmap, struct dentry *dentry, struct smb_ntsd **pntsd); int ksmbd_vfs_set_dos_attrib_xattr(struct mnt_idmap *idmap, - struct dentry *dentry, + const struct path *path, struct xattr_dos_attrib *da); int ksmbd_vfs_get_dos_attrib_xattr(struct mnt_idmap *idmap, struct dentry *dentry, struct xattr_dos_attrib *da); int ksmbd_vfs_set_init_posix_acl(struct mnt_idmap *idmap, - struct dentry *dentry); + struct path *path); int ksmbd_vfs_inherit_posix_acl(struct mnt_idmap *idmap, - struct dentry *dentry, + struct path *path, struct inode *parent_inode); #endif /* __KSMBD_VFS_H__ */ diff --git a/fs/ksmbd/vfs_cache.c b/fs/ksmbd/vfs_cache.c index 2d0138e72d783..f41f8d6108ce9 100644 --- a/fs/ksmbd/vfs_cache.c +++ b/fs/ksmbd/vfs_cache.c @@ -252,7 +252,7 @@ static void __ksmbd_inode_close(struct ksmbd_file *fp) if (ksmbd_stream_fd(fp) && (ci->m_flags & S_DEL_ON_CLS_STREAM)) { ci->m_flags &= ~S_DEL_ON_CLS_STREAM; err = ksmbd_vfs_remove_xattr(file_mnt_idmap(filp), - filp->f_path.dentry, + &filp->f_path, fp->stream.name); if (err) pr_err("remove xattr failed : %s\n",
From: Andrew Powers-Holmes aholmes@omnom.net
commit 568a67e742dfa90b19a23305317164c5c350b71e upstream.
The register and range mappings for the PCIe controller in Rockchip's RK356x SoCs are incorrect. Replace them with corrected values from the vendor BSP sources, updated to match current DT schema.
These values are also used in u-boot.
Fixes: 66b51ea7d70f ("arm64: dts: rockchip: Add rk3568 PCIe2x1 controller") Cc: stable@vger.kernel.org Signed-off-by: Andrew Powers-Holmes aholmes@omnom.net Signed-off-by: Jonas Karlman jonas@kwiboo.se Signed-off-by: Nicolas Frattaroli frattaroli.nicolas@gmail.com Tested-by: Diederik de Haas didi.debian@cknow.org Link: https://lore.kernel.org/r/20230601132516.153934-1-frattaroli.nicolas@gmail.c... Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/rockchip/rk3568.dtsi | 14 ++++++++------ arch/arm64/boot/dts/rockchip/rk356x.dtsi | 7 ++++--- 2 files changed, 12 insertions(+), 9 deletions(-)
--- a/arch/arm64/boot/dts/rockchip/rk3568.dtsi +++ b/arch/arm64/boot/dts/rockchip/rk3568.dtsi @@ -94,9 +94,10 @@ power-domains = <&power RK3568_PD_PIPE>; reg = <0x3 0xc0400000 0x0 0x00400000>, <0x0 0xfe270000 0x0 0x00010000>, - <0x3 0x7f000000 0x0 0x01000000>; - ranges = <0x01000000 0x0 0x3ef00000 0x3 0x7ef00000 0x0 0x00100000>, - <0x02000000 0x0 0x00000000 0x3 0x40000000 0x0 0x3ef00000>; + <0x0 0xf2000000 0x0 0x00100000>; + ranges = <0x01000000 0x0 0xf2100000 0x0 0xf2100000 0x0 0x00100000>, + <0x02000000 0x0 0xf2200000 0x0 0xf2200000 0x0 0x01e00000>, + <0x03000000 0x0 0x40000000 0x3 0x40000000 0x0 0x40000000>; reg-names = "dbi", "apb", "config"; resets = <&cru SRST_PCIE30X1_POWERUP>; reset-names = "pipe"; @@ -146,9 +147,10 @@ power-domains = <&power RK3568_PD_PIPE>; reg = <0x3 0xc0800000 0x0 0x00400000>, <0x0 0xfe280000 0x0 0x00010000>, - <0x3 0xbf000000 0x0 0x01000000>; - ranges = <0x01000000 0x0 0x3ef00000 0x3 0xbef00000 0x0 0x00100000>, - <0x02000000 0x0 0x00000000 0x3 0x80000000 0x0 0x3ef00000>; + <0x0 0xf0000000 0x0 0x00100000>; + ranges = <0x01000000 0x0 0xf0100000 0x0 0xf0100000 0x0 0x00100000>, + <0x02000000 0x0 0xf0200000 0x0 0xf0200000 0x0 0x01e00000>, + <0x03000000 0x0 0x40000000 0x3 0x80000000 0x0 0x40000000>; reg-names = "dbi", "apb", "config"; resets = <&cru SRST_PCIE30X2_POWERUP>; reset-names = "pipe"; --- a/arch/arm64/boot/dts/rockchip/rk356x.dtsi +++ b/arch/arm64/boot/dts/rockchip/rk356x.dtsi @@ -952,7 +952,7 @@ compatible = "rockchip,rk3568-pcie"; reg = <0x3 0xc0000000 0x0 0x00400000>, <0x0 0xfe260000 0x0 0x00010000>, - <0x3 0x3f000000 0x0 0x01000000>; + <0x0 0xf4000000 0x0 0x00100000>; reg-names = "dbi", "apb", "config"; interrupts = <GIC_SPI 75 IRQ_TYPE_LEVEL_HIGH>, <GIC_SPI 74 IRQ_TYPE_LEVEL_HIGH>, @@ -982,8 +982,9 @@ phys = <&combphy2 PHY_TYPE_PCIE>; phy-names = "pcie-phy"; power-domains = <&power RK3568_PD_PIPE>; - ranges = <0x01000000 0x0 0x3ef00000 0x3 0x3ef00000 0x0 0x00100000 - 0x02000000 0x0 0x00000000 0x3 0x00000000 0x0 0x3ef00000>; + ranges = <0x01000000 0x0 0xf4100000 0x0 0xf4100000 0x0 0x00100000>, + <0x02000000 0x0 0xf4200000 0x0 0xf4200000 0x0 0x01e00000>, + <0x03000000 0x0 0x40000000 0x3 0x00000000 0x0 0x40000000>; resets = <&cru SRST_PCIE20_POWERUP>; reset-names = "pipe"; #address-cells = <3>;
From: Ming Lei ming.lei@redhat.com
commit 9c39b7a905d84b7da5f59d80f2e455853fea7217 upstream.
When __blkcg_rstat_flush() is called from cgroup_rstat_flush*() code path, interrupt is always disabled.
When we start to flush blkcg per-cpu stats list in __blkg_release() for avoiding to leak blkcg_gq's reference in commit 20cb1c2fb756 ("blk-cgroup: Flush stats before releasing blkcg_gq"), local irq isn't disabled yet, then lockdep warning may be triggered because the dependent cgroup locks may be acquired from irq(soft irq) handler.
Fix the issue by disabling local irq always.
Fixes: 20cb1c2fb756 ("blk-cgroup: Flush stats before releasing blkcg_gq") Reported-by: Shinichiro Kawasaki shinichiro.kawasaki@wdc.com Closes: https://lore.kernel.org/linux-block/pz2wzwnmn5tk3pwpskmjhli6g3qly7eoknilb26o... Cc: stable@vger.kernel.org Cc: Jay Shin jaeshin@redhat.com Cc: Tejun Heo tj@kernel.org Cc: Waiman Long longman@redhat.com Signed-off-by: Ming Lei ming.lei@redhat.com Reviewed-by: Waiman Long longman@redhat.com Link: https://lore.kernel.org/r/20230622084249.1208005-1-ming.lei@redhat.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- block/blk-cgroup.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -907,6 +907,7 @@ static void __blkcg_rstat_flush(struct b struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu); struct llist_node *lnode; struct blkg_iostat_set *bisc, *next_bisc; + unsigned long flags;
rcu_read_lock();
@@ -920,7 +921,7 @@ static void __blkcg_rstat_flush(struct b * When flushing from cgroup, cgroup_rstat_lock is always held, so * this lock won't cause contention most of time. */ - raw_spin_lock(&blkg_stat_lock); + raw_spin_lock_irqsave(&blkg_stat_lock, flags);
/* * Iterate only the iostat_cpu's queued in the lockless list. @@ -946,7 +947,7 @@ static void __blkcg_rstat_flush(struct b blkcg_iostat_update(parent, &blkg->iostat.cur, &blkg->iostat.last); } - raw_spin_unlock(&blkg_stat_lock); + raw_spin_unlock_irqrestore(&blkg_stat_lock, flags); out: rcu_read_unlock(); }
From: Jens Axboe axboe@kernel.dk
Commit ef7dfac51d8ed961b742218f526bd589f3900a59 upstream.
We selectively grab the ctx->uring_lock for poll update/removal, but we really should grab it from the start to fully synchronize with linked timeouts. Normally this is indeed the case, but if requests are forced async by the application, we don't fully cover removal and timer disarm within the uring_lock.
Make this simpler by having consistent locking state for poll removal.
Cc: stable@vger.kernel.org # 6.1+ Reported-by: Querijn Voet querijnqyn@gmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- io_uring/poll.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)
--- a/io_uring/poll.c +++ b/io_uring/poll.c @@ -977,8 +977,9 @@ int io_poll_remove(struct io_kiocb *req, struct io_hash_bucket *bucket; struct io_kiocb *preq; int ret2, ret = 0; - bool locked; + bool locked = true;
+ io_ring_submit_lock(ctx, issue_flags); preq = io_poll_find(ctx, true, &cd, &ctx->cancel_table, &bucket); ret2 = io_poll_disarm(preq); if (bucket) @@ -990,12 +991,10 @@ int io_poll_remove(struct io_kiocb *req, goto out; }
- io_ring_submit_lock(ctx, issue_flags); preq = io_poll_find(ctx, true, &cd, &ctx->cancel_table_locked, &bucket); ret2 = io_poll_disarm(preq); if (bucket) spin_unlock(&bucket->lock); - io_ring_submit_unlock(ctx, issue_flags); if (ret2) { ret = ret2; goto out; @@ -1019,7 +1018,7 @@ found: if (poll_update->update_user_data) preq->cqe.user_data = poll_update->new_user_data;
- ret2 = io_poll_add(preq, issue_flags); + ret2 = io_poll_add(preq, issue_flags & ~IO_URING_F_UNLOCKED); /* successfully updated, don't complete poll request */ if (!ret2 || ret2 == -EIOCBQUEUED) goto out; @@ -1027,9 +1026,9 @@ found:
req_set_fail(preq); io_req_set_res(preq, -ECANCELED, 0); - locked = !(issue_flags & IO_URING_F_UNLOCKED); io_req_task_complete(preq, &locked); out: + io_ring_submit_unlock(ctx, issue_flags); if (ret < 0) { req_set_fail(req); return ret;
From: Lee Jones lee@kernel.org
commit d082d48737c75d2b3cc1f972b8c8674c25131534 upstream.
KPTI keeps around two PGDs: one for userspace and another for the kernel. Among other things, set_pgd() contains infrastructure to ensure that updates to the kernel PGD are reflected in the user PGD as well.
One side-effect of this is that set_pgd() expects to be passed whole pages. Unfortunately, init_trampoline_kaslr() passes in a single entry: 'trampoline_pgd_entry'.
When KPTI is on, set_pgd() will update 'trampoline_pgd_entry' (an 8-Byte globally stored [.bss] variable) and will then proceed to replicate that value into the non-existent neighboring user page (located +4k away), leading to the corruption of other global [.bss] stored variables.
Fix it by directly assigning 'trampoline_pgd_entry' and avoiding set_pgd().
[ dhansen: tweak subject and changelog ]
Fixes: 0925dda5962e ("x86/mm/KASLR: Use only one PUD entry for real mode trampoline") Suggested-by: Dave Hansen dave.hansen@linux.intel.com Signed-off-by: Lee Jones lee@kernel.org Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20230614163859.924309-1-lee@kernel.org/g Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/mm/kaslr.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/arch/x86/mm/kaslr.c +++ b/arch/x86/mm/kaslr.c @@ -172,10 +172,10 @@ void __meminit init_trampoline_kaslr(voi set_p4d(p4d_tramp, __p4d(_KERNPG_TABLE | __pa(pud_page_tramp)));
- set_pgd(&trampoline_pgd_entry, - __pgd(_KERNPG_TABLE | __pa(p4d_page_tramp))); + trampoline_pgd_entry = + __pgd(_KERNPG_TABLE | __pa(p4d_page_tramp)); } else { - set_pgd(&trampoline_pgd_entry, - __pgd(_KERNPG_TABLE | __pa(pud_page_tramp))); + trampoline_pgd_entry = + __pgd(_KERNPG_TABLE | __pa(pud_page_tramp)); } }
From: Chen Aotian chenaotian2@163.com
[ Upstream commit a61675294735570daca3779bd1dbb3715f7232bd ]
After replacing e->info, it is necessary to free the old einfo.
Fixes: f25da51fdc38 ("ieee802154: hwsim: add replacement for fakelb") Reviewed-by: Miquel Raynal miquel.raynal@bootlin.com Reviewed-by: Alexander Aring aahringo@redhat.com Signed-off-by: Chen Aotian chenaotian2@163.com Link: https://lore.kernel.org/r/20230409022048.61223-1-chenaotian2@163.com Signed-off-by: Stefan Schmidt stefan@datenfreihafen.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ieee802154/mac802154_hwsim.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ieee802154/mac802154_hwsim.c b/drivers/net/ieee802154/mac802154_hwsim.c index 8445c2189d116..31cba9aa76366 100644 --- a/drivers/net/ieee802154/mac802154_hwsim.c +++ b/drivers/net/ieee802154/mac802154_hwsim.c @@ -685,7 +685,7 @@ static int hwsim_del_edge_nl(struct sk_buff *msg, struct genl_info *info) static int hwsim_set_edge_lqi(struct sk_buff *msg, struct genl_info *info) { struct nlattr *edge_attrs[MAC802154_HWSIM_EDGE_ATTR_MAX + 1]; - struct hwsim_edge_info *einfo; + struct hwsim_edge_info *einfo, *einfo_old; struct hwsim_phy *phy_v0; struct hwsim_edge *e; u32 v0, v1; @@ -723,8 +723,10 @@ static int hwsim_set_edge_lqi(struct sk_buff *msg, struct genl_info *info) list_for_each_entry_rcu(e, &phy_v0->edges, list) { if (e->endpoint->idx == v1) { einfo->lqi = lqi; - rcu_assign_pointer(e->info, einfo); + einfo_old = rcu_replace_pointer(e->info, einfo, + lockdep_is_held(&hwsim_phys_lock)); rcu_read_unlock(); + kfree_rcu(einfo_old, rcu); mutex_unlock(&hwsim_phys_lock); return 0; }
From: Benedict Wong benedictwong@google.com
[ Upstream commit 1f8b6df6a997a430b0c48b504638154b520781ad ]
This change allows inbound traffic through nested IPsec tunnels to successfully match policies and templates, while retaining the secpath stack trace as necessary for netfilter policies.
Specifically, this patch marks secpath entries that have already matched against a relevant policy as having been verified, allowing it to be treated as optional and skipped after a tunnel decapsulation (during which the src/dst/proto/etc may have changed, and the correct policy chain no long be resolvable).
This approach is taken as opposed to the iteration in b0355dbbf13c, where the secpath was cleared, since that breaks subsequent validations that rely on the existence of the secpath entries (netfilter policies, or transport-in-tunnel mode, where policies remain resolvable).
Fixes: b0355dbbf13c ("Fix XFRM-I support for nested ESP tunnels") Test: Tested against Android Kernel Unit Tests Test: Tested against Android CTS Signed-off-by: Benedict Wong benedictwong@google.com Signed-off-by: Steffen Klassert steffen.klassert@secunet.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/xfrm.h | 1 + net/xfrm/xfrm_input.c | 1 + net/xfrm/xfrm_policy.c | 12 ++++++++++++ 3 files changed, 14 insertions(+)
diff --git a/include/net/xfrm.h b/include/net/xfrm.h index 3e1f70e8e4247..47ecf1d4719b0 100644 --- a/include/net/xfrm.h +++ b/include/net/xfrm.h @@ -1049,6 +1049,7 @@ struct xfrm_offload { struct sec_path { int len; int olen; + int verified_cnt;
struct xfrm_state *xvec[XFRM_MAX_DEPTH]; struct xfrm_offload ovec[XFRM_MAX_OFFLOAD_DEPTH]; diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c index 436d29640ac2c..9f294a20dcece 100644 --- a/net/xfrm/xfrm_input.c +++ b/net/xfrm/xfrm_input.c @@ -131,6 +131,7 @@ struct sec_path *secpath_set(struct sk_buff *skb) memset(sp->ovec, 0, sizeof(sp->ovec)); sp->olen = 0; sp->len = 0; + sp->verified_cnt = 0;
return sp; } diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index 6d15788b51231..ff58ce6c030ca 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -3349,6 +3349,13 @@ xfrm_policy_ok(const struct xfrm_tmpl *tmpl, const struct sec_path *sp, int star if (xfrm_state_ok(tmpl, sp->xvec[idx], family, if_id)) return ++idx; if (sp->xvec[idx]->props.mode != XFRM_MODE_TRANSPORT) { + if (idx < sp->verified_cnt) { + /* Secpath entry previously verified, consider optional and + * continue searching + */ + continue; + } + if (start == -1) start = -2-idx; break; @@ -3723,6 +3730,9 @@ int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb, * Order is _important_. Later we will implement * some barriers, but at the moment barriers * are implied between each two transformations. + * Upon success, marks secpath entries as having been + * verified to allow them to be skipped in future policy + * checks (e.g. nested tunnels). */ for (i = xfrm_nr-1, k = 0; i >= 0; i--) { k = xfrm_policy_ok(tpp[i], sp, k, family, if_id); @@ -3741,6 +3751,8 @@ int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb, }
xfrm_pols_put(pols, npols); + sp->verified_cnt = k; + return 1; } XFRM_INC_STATS(net, LINUX_MIB_XFRMINPOLBLOCK);
From: Benedict Wong benedictwong@google.com
[ Upstream commit a287f5b0cfc6804c5b12a4be13c7c9fe27869e90 ]
This change adds methods in the XFRM-I input path that ensures that policies are checked prior to processing of the subsequent decapsulated packet, after which the relevant policies may no longer be resolvable (due to changing src/dst/proto/etc).
Notably, raw ESP/AH packets did not perform policy checks inherently, whereas all other encapsulated packets (UDP, TCP encapsulated) do policy checks after calling xfrm_input handling in the respective encapsulation layer.
Fixes: b0355dbbf13c ("Fix XFRM-I support for nested ESP tunnels") Test: Verified with additional Android Kernel Unit tests Test: Verified against Android CTS Signed-off-by: Benedict Wong benedictwong@google.com Signed-off-by: Steffen Klassert steffen.klassert@secunet.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/xfrm/xfrm_interface_core.c | 54 +++++++++++++++++++++++++++++++--- 1 file changed, 50 insertions(+), 4 deletions(-)
diff --git a/net/xfrm/xfrm_interface_core.c b/net/xfrm/xfrm_interface_core.c index 1f99dc4690271..35279c220bd78 100644 --- a/net/xfrm/xfrm_interface_core.c +++ b/net/xfrm/xfrm_interface_core.c @@ -310,6 +310,52 @@ static void xfrmi_scrub_packet(struct sk_buff *skb, bool xnet) skb->mark = 0; }
+static int xfrmi_input(struct sk_buff *skb, int nexthdr, __be32 spi, + int encap_type, unsigned short family) +{ + struct sec_path *sp; + + sp = skb_sec_path(skb); + if (sp && (sp->len || sp->olen) && + !xfrm_policy_check(NULL, XFRM_POLICY_IN, skb, family)) + goto discard; + + XFRM_SPI_SKB_CB(skb)->family = family; + if (family == AF_INET) { + XFRM_SPI_SKB_CB(skb)->daddroff = offsetof(struct iphdr, daddr); + XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip4 = NULL; + } else { + XFRM_SPI_SKB_CB(skb)->daddroff = offsetof(struct ipv6hdr, daddr); + XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 = NULL; + } + + return xfrm_input(skb, nexthdr, spi, encap_type); +discard: + kfree_skb(skb); + return 0; +} + +static int xfrmi4_rcv(struct sk_buff *skb) +{ + return xfrmi_input(skb, ip_hdr(skb)->protocol, 0, 0, AF_INET); +} + +static int xfrmi6_rcv(struct sk_buff *skb) +{ + return xfrmi_input(skb, skb_network_header(skb)[IP6CB(skb)->nhoff], + 0, 0, AF_INET6); +} + +static int xfrmi4_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type) +{ + return xfrmi_input(skb, nexthdr, spi, encap_type, AF_INET); +} + +static int xfrmi6_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type) +{ + return xfrmi_input(skb, nexthdr, spi, encap_type, AF_INET6); +} + static int xfrmi_rcv_cb(struct sk_buff *skb, int err) { const struct xfrm_mode *inner_mode; @@ -945,8 +991,8 @@ static struct pernet_operations xfrmi_net_ops = { };
static struct xfrm6_protocol xfrmi_esp6_protocol __read_mostly = { - .handler = xfrm6_rcv, - .input_handler = xfrm_input, + .handler = xfrmi6_rcv, + .input_handler = xfrmi6_input, .cb_handler = xfrmi_rcv_cb, .err_handler = xfrmi6_err, .priority = 10, @@ -996,8 +1042,8 @@ static struct xfrm6_tunnel xfrmi_ip6ip_handler __read_mostly = { #endif
static struct xfrm4_protocol xfrmi_esp4_protocol __read_mostly = { - .handler = xfrm4_rcv, - .input_handler = xfrm_input, + .handler = xfrmi4_rcv, + .input_handler = xfrmi4_input, .cb_handler = xfrmi_rcv_cb, .err_handler = xfrmi4_err, .priority = 10,
From: Reiji Watanabe reijiw@google.com
[ Upstream commit 8681f71759010503892f9e3ddb05f65c0f21b690 ]
Restore the host's PMUSERENR_EL0 value instead of clearing it, before returning back to userspace, as the host's EL0 might have a direct access to PMU registers (some bits of PMUSERENR_EL0 for might not be zero for the host EL0).
Fixes: 83a7a4d643d3 ("arm64: perf: Enable PMU counter userspace access for perf event") Signed-off-by: Reiji Watanabe reijiw@google.com Signed-off-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20230603025035.3781797-2-reijiw@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/kvm/hyp/include/hyp/switch.h | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h index 33f4d42003296..168365167a137 100644 --- a/arch/arm64/kvm/hyp/include/hyp/switch.h +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h @@ -81,7 +81,12 @@ static inline void __activate_traps_common(struct kvm_vcpu *vcpu) * EL1 instead of being trapped to EL2. */ if (kvm_arm_support_pmu_v3()) { + struct kvm_cpu_context *hctxt; + write_sysreg(0, pmselr_el0); + + hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt; + ctxt_sys_reg(hctxt, PMUSERENR_EL0) = read_sysreg(pmuserenr_el0); write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0); }
@@ -105,8 +110,12 @@ static inline void __deactivate_traps_common(struct kvm_vcpu *vcpu) write_sysreg(vcpu->arch.mdcr_el2_host, mdcr_el2);
write_sysreg(0, hstr_el2); - if (kvm_arm_support_pmu_v3()) - write_sysreg(0, pmuserenr_el0); + if (kvm_arm_support_pmu_v3()) { + struct kvm_cpu_context *hctxt; + + hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt; + write_sysreg(ctxt_sys_reg(hctxt, PMUSERENR_EL0), pmuserenr_el0); + }
if (cpus_have_final_cap(ARM64_SME)) { sysreg_clear_set_s(SYS_HFGRTR_EL2, 0,
From: Leon Romanovsky leonro@nvidia.com
[ Upstream commit bf06fcf4be0feefebd27deb8b60ad262f4230489 ]
Offloaded policies are deleted through two flows: netdev is going down and policy flush.
In both cases, the code lacks relevant call to delete offloaded policy.
Fixes: 919e43fad516 ("xfrm: add an interface to offload policy") Signed-off-by: Leon Romanovsky leonro@nvidia.com Reviewed-by: Simon Horman simon.horman@corigine.com Signed-off-by: Steffen Klassert steffen.klassert@secunet.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/xfrm/xfrm_policy.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index ff58ce6c030ca..e7617c9959c31 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -1831,6 +1831,7 @@ int xfrm_policy_flush(struct net *net, u8 type, bool task_valid)
__xfrm_policy_unlink(pol, dir); spin_unlock_bh(&net->xfrm.xfrm_policy_lock); + xfrm_dev_policy_delete(pol); cnt++; xfrm_audit_policy_delete(pol, 1, task_valid); xfrm_policy_kill(pol); @@ -1869,6 +1870,7 @@ int xfrm_dev_policy_flush(struct net *net, struct net_device *dev,
__xfrm_policy_unlink(pol, dir); spin_unlock_bh(&net->xfrm.xfrm_policy_lock); + xfrm_dev_policy_delete(pol); cnt++; xfrm_audit_policy_delete(pol, 1, task_valid); xfrm_policy_kill(pol);
From: Maxim Mikityanskiy maxim@isovalent.com
[ Upstream commit 713274f1f2c896d37017efee333fd44149710119 ]
The following scenario describes a bug in the verifier where it incorrectly concludes about equivalent scalar IDs which could lead to verifier bypass in privileged mode:
1. Prepare a 32-bit rogue number. 2. Put the rogue number into the upper half of a 64-bit register, and roll a random (unknown to the verifier) bit in the lower half. The rest of the bits should be zero (although variations are possible). 3. Assign an ID to the register by MOVing it to another arbitrary register. 4. Perform a 32-bit spill of the register, then perform a 32-bit fill to another register. Due to a bug in the verifier, the ID will be preserved, although the new register will contain only the lower 32 bits, i.e. all zeros except one random bit.
At this point there are two registers with different values but the same ID, which means the integrity of the verifier state has been corrupted.
5. Compare the new 32-bit register with 0. In the branch where it's equal to 0, the verifier will believe that the original 64-bit register is also 0, because it has the same ID, but its actual value still contains the rogue number in the upper half. Some optimizations of the verifier prevent the actual bypass, so extra care is needed: the comparison must be between two registers, and both branches must be reachable (this is why one random bit is needed). Both branches are still suitable for the bypass. 6. Right shift the original register by 32 bits to pop the rogue number. 7. Use the rogue number as an offset with any pointer. The verifier will believe that the offset is 0, while in reality it's the given number.
The fix is similar to the 32-bit BPF_MOV handling in check_alu_op for SCALAR_VALUE. If the spill is narrowing the actual register value, don't keep the ID, make sure it's reset to 0.
Fixes: 354e8f1970f8 ("bpf: Support <8-byte scalar spill and refill") Signed-off-by: Maxim Mikityanskiy maxim@isovalent.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Tested-by: Andrii Nakryiko andrii@kernel.org # Checked veristat delta Acked-by: Yonghong Song yhs@fb.com Link: https://lore.kernel.org/bpf/20230607123951.558971-2-maxtram95@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/verifier.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 1f6c0c5a1741d..b039990137444 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -3559,6 +3559,9 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env, return err; } save_register_state(state, spi, reg, size); + /* Break the relation on a narrowing spill. */ + if (fls64(reg->umax_value) > BITS_PER_BYTE * size) + state->stack[spi].spilled_ptr.id = 0; } else if (!reg && !(off % BPF_REG_SIZE) && is_bpf_st_mem(insn) && insn->imm != 0 && env->bpf_capable) { struct bpf_reg_state fake_reg = {};
From: Maciej Żenczykowski maze@google.com
[ Upstream commit 1166a530a84758bb9e6b448fc8c195ed413f5ded ]
Before Linux v5.8 an AF_INET6 SOCK_DGRAM (udp/udplite) socket with SOL_UDP, UDP_ENCAP, UDP_ENCAP_ESPINUDP{,_NON_IKE} enabled would just unconditionally use xfrm4_udp_encap_rcv(), afterwards such a socket would use the newly added xfrm6_udp_encap_rcv() which only handles IPv6 packets.
Cc: Sabrina Dubroca sd@queasysnail.net Cc: Steffen Klassert steffen.klassert@secunet.com Cc: Jakub Kicinski kuba@kernel.org Cc: Benedict Wong benedictwong@google.com Cc: Yan Yan evitayan@google.com Fixes: 0146dca70b87 ("xfrm: add support for UDPv6 encapsulation of ESP") Signed-off-by: Maciej Żenczykowski maze@google.com Reviewed-by: Simon Horman simon.horman@corigine.com Reviewed-by: Sabrina Dubroca sd@queasysnail.net Signed-off-by: Steffen Klassert steffen.klassert@secunet.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/xfrm4_input.c | 1 + net/ipv6/xfrm6_input.c | 3 +++ 2 files changed, 4 insertions(+)
diff --git a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c index ad2afeef4f106..eac206a290d05 100644 --- a/net/ipv4/xfrm4_input.c +++ b/net/ipv4/xfrm4_input.c @@ -164,6 +164,7 @@ int xfrm4_udp_encap_rcv(struct sock *sk, struct sk_buff *skb) kfree_skb(skb); return 0; } +EXPORT_SYMBOL(xfrm4_udp_encap_rcv);
int xfrm4_rcv(struct sk_buff *skb) { diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c index 04cbeefd89828..4907ab241d6be 100644 --- a/net/ipv6/xfrm6_input.c +++ b/net/ipv6/xfrm6_input.c @@ -86,6 +86,9 @@ int xfrm6_udp_encap_rcv(struct sock *sk, struct sk_buff *skb) __be32 *udpdata32; __u16 encap_type = up->encap_type;
+ if (skb->protocol == htons(ETH_P_IP)) + return xfrm4_udp_encap_rcv(sk, skb); + /* if this is not encapsulated socket, then just return now */ if (!encap_type) return 1;
From: Yonghong Song yhs@fb.com
[ Upstream commit ad96f1c9138e0897bee7f7c5e54b3e24f8b62f57 ]
The sysctl net/core/bpf_jit_enable does not work now due to commit 1022a5498f6f ("bpf, x86_64: Use bpf_jit_binary_pack_alloc"). The commit saved the jitted insns into 'rw_image' instead of 'image' which caused bpf_jit_dump not dumping proper content.
With 'echo 2 > /proc/sys/net/core/bpf_jit_enable', run './test_progs -t fentry_test'. Without this patch, one of jitted image for one particular prog is:
flen=17 proglen=92 pass=4 image=0000000014c64883 from=test_progs pid=1807 00000000: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 00000010: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 00000020: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 00000030: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 00000040: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 00000050: cc cc cc cc cc cc cc cc cc cc cc cc
With this patch, the jitte image for the same prog is:
flen=17 proglen=92 pass=4 image=00000000b90254b7 from=test_progs pid=1809 00000000: f3 0f 1e fa 0f 1f 44 00 00 66 90 55 48 89 e5 f3 00000010: 0f 1e fa 31 f6 48 8b 57 00 48 83 fa 07 75 2b 48 00000020: 8b 57 10 83 fa 09 75 22 48 8b 57 08 48 81 e2 ff 00000030: 00 00 00 48 83 fa 08 75 11 48 8b 7f 18 be 01 00 00000040: 00 00 48 83 ff 0a 74 02 31 f6 48 bf 18 d0 14 00 00000050: 00 c9 ff ff 48 89 77 00 31 c0 c9 c3
Fixes: 1022a5498f6f ("bpf, x86_64: Use bpf_jit_binary_pack_alloc") Signed-off-by: Yonghong Song yhs@fb.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: Song Liu song@kernel.org Link: https://lore.kernel.org/bpf/20230609005439.3173569-1-yhs@fb.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/net/bpf_jit_comp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index 1056bbf55b172..438adb695daab 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -2570,7 +2570,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog) }
if (bpf_jit_enable > 1) - bpf_jit_dump(prog->len, proglen, pass + 1, image); + bpf_jit_dump(prog->len, proglen, pass + 1, rw_image);
if (image) { if (!prog->is_func || extra_pass) {
From: Magali Lemes magali.lemes@canonical.com
[ Upstream commit d113c395c67b62fc0d3f2004c0afc406aca0a2b7 ]
TLS selftests use the ChaCha20-Poly1305 and SM4 algorithms, which are not FIPS compliant. When fips=1, this set of tests fails. Add a check and only run these tests if not in FIPS mode.
Fixes: 4f336e88a870 ("selftests/tls: add CHACHA20-POLY1305 to tls selftests") Fixes: e506342a03c7 ("selftests/tls: add SM4 GCM/CCM to tls selftests") Reviewed-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Magali Lemes magali.lemes@canonical.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/tls.c | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c index 2cbb12736596d..c0ad8385441f2 100644 --- a/tools/testing/selftests/net/tls.c +++ b/tools/testing/selftests/net/tls.c @@ -25,6 +25,8 @@ #define TLS_PAYLOAD_MAX_LEN 16384 #define SOL_TLS 282
+static int fips_enabled; + struct tls_crypto_info_keys { union { struct tls12_crypto_info_aes_gcm_128 aes128; @@ -235,7 +237,7 @@ FIXTURE_VARIANT(tls) { uint16_t tls_version; uint16_t cipher_type; - bool nopad; + bool nopad, fips_non_compliant; };
FIXTURE_VARIANT_ADD(tls, 12_aes_gcm) @@ -254,24 +256,28 @@ FIXTURE_VARIANT_ADD(tls, 12_chacha) { .tls_version = TLS_1_2_VERSION, .cipher_type = TLS_CIPHER_CHACHA20_POLY1305, + .fips_non_compliant = true, };
FIXTURE_VARIANT_ADD(tls, 13_chacha) { .tls_version = TLS_1_3_VERSION, .cipher_type = TLS_CIPHER_CHACHA20_POLY1305, + .fips_non_compliant = true, };
FIXTURE_VARIANT_ADD(tls, 13_sm4_gcm) { .tls_version = TLS_1_3_VERSION, .cipher_type = TLS_CIPHER_SM4_GCM, + .fips_non_compliant = true, };
FIXTURE_VARIANT_ADD(tls, 13_sm4_ccm) { .tls_version = TLS_1_3_VERSION, .cipher_type = TLS_CIPHER_SM4_CCM, + .fips_non_compliant = true, };
FIXTURE_VARIANT_ADD(tls, 12_aes_ccm) @@ -311,6 +317,9 @@ FIXTURE_SETUP(tls) int one = 1; int ret;
+ if (fips_enabled && variant->fips_non_compliant) + SKIP(return, "Unsupported cipher in FIPS mode"); + tls_crypto_info_init(variant->tls_version, variant->cipher_type, &tls12);
@@ -1820,4 +1829,17 @@ TEST(tls_v6ops) { close(sfd); }
+static void __attribute__((constructor)) fips_check(void) { + int res; + FILE *f; + + f = fopen("/proc/sys/crypto/fips_enabled", "r"); + if (f) { + res = fscanf(f, "%d", &fips_enabled); + if (res != 1) + ksft_print_msg("ERROR: Couldn't read /proc/sys/crypto/fips_enabled\n"); + fclose(f); + } +} + TEST_HARNESS_MAIN
From: Magali Lemes magali.lemes@canonical.com
[ Upstream commit cb43c60e64ca67fcc9d23bd08f51d2ab8209d9d7 ]
The vrf-xfrm-tests tests use the hmac(md5) and cbc(des3_ede) algorithms for performing authentication and encryption, respectively. This causes the tests to fail when fips=1 is set, since these algorithms are not allowed in FIPS mode. Therefore, switch from hmac(md5) and cbc(des3_ede) to hmac(sha1) and cbc(aes), which are FIPS compliant.
Fixes: 3f251d741150 ("selftests: Add tests for vrf and xfrms") Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: Magali Lemes magali.lemes@canonical.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/vrf-xfrm-tests.sh | 32 +++++++++---------- 1 file changed, 16 insertions(+), 16 deletions(-)
diff --git a/tools/testing/selftests/net/vrf-xfrm-tests.sh b/tools/testing/selftests/net/vrf-xfrm-tests.sh index 184da81f554ff..452638ae8aed8 100755 --- a/tools/testing/selftests/net/vrf-xfrm-tests.sh +++ b/tools/testing/selftests/net/vrf-xfrm-tests.sh @@ -264,60 +264,60 @@ setup_xfrm() ip -netns host1 xfrm state add src ${HOST1_4} dst ${HOST2_4} \ proto esp spi ${SPI_1} reqid 0 mode tunnel \ replay-window 4 replay-oseq 0x4 \ - auth-trunc 'hmac(md5)' ${AUTH_1} 96 \ - enc 'cbc(des3_ede)' ${ENC_1} \ + auth-trunc 'hmac(sha1)' ${AUTH_1} 96 \ + enc 'cbc(aes)' ${ENC_1} \ sel src ${h1_4} dst ${h2_4} ${devarg}
ip -netns host2 xfrm state add src ${HOST1_4} dst ${HOST2_4} \ proto esp spi ${SPI_1} reqid 0 mode tunnel \ replay-window 4 replay-oseq 0x4 \ - auth-trunc 'hmac(md5)' ${AUTH_1} 96 \ - enc 'cbc(des3_ede)' ${ENC_1} \ + auth-trunc 'hmac(sha1)' ${AUTH_1} 96 \ + enc 'cbc(aes)' ${ENC_1} \ sel src ${h1_4} dst ${h2_4}
ip -netns host1 xfrm state add src ${HOST2_4} dst ${HOST1_4} \ proto esp spi ${SPI_2} reqid 0 mode tunnel \ replay-window 4 replay-oseq 0x4 \ - auth-trunc 'hmac(md5)' ${AUTH_2} 96 \ - enc 'cbc(des3_ede)' ${ENC_2} \ + auth-trunc 'hmac(sha1)' ${AUTH_2} 96 \ + enc 'cbc(aes)' ${ENC_2} \ sel src ${h2_4} dst ${h1_4} ${devarg}
ip -netns host2 xfrm state add src ${HOST2_4} dst ${HOST1_4} \ proto esp spi ${SPI_2} reqid 0 mode tunnel \ replay-window 4 replay-oseq 0x4 \ - auth-trunc 'hmac(md5)' ${AUTH_2} 96 \ - enc 'cbc(des3_ede)' ${ENC_2} \ + auth-trunc 'hmac(sha1)' ${AUTH_2} 96 \ + enc 'cbc(aes)' ${ENC_2} \ sel src ${h2_4} dst ${h1_4}
ip -6 -netns host1 xfrm state add src ${HOST1_6} dst ${HOST2_6} \ proto esp spi ${SPI_1} reqid 0 mode tunnel \ replay-window 4 replay-oseq 0x4 \ - auth-trunc 'hmac(md5)' ${AUTH_1} 96 \ - enc 'cbc(des3_ede)' ${ENC_1} \ + auth-trunc 'hmac(sha1)' ${AUTH_1} 96 \ + enc 'cbc(aes)' ${ENC_1} \ sel src ${h1_6} dst ${h2_6} ${devarg}
ip -6 -netns host2 xfrm state add src ${HOST1_6} dst ${HOST2_6} \ proto esp spi ${SPI_1} reqid 0 mode tunnel \ replay-window 4 replay-oseq 0x4 \ - auth-trunc 'hmac(md5)' ${AUTH_1} 96 \ - enc 'cbc(des3_ede)' ${ENC_1} \ + auth-trunc 'hmac(sha1)' ${AUTH_1} 96 \ + enc 'cbc(aes)' ${ENC_1} \ sel src ${h1_6} dst ${h2_6}
ip -6 -netns host1 xfrm state add src ${HOST2_6} dst ${HOST1_6} \ proto esp spi ${SPI_2} reqid 0 mode tunnel \ replay-window 4 replay-oseq 0x4 \ - auth-trunc 'hmac(md5)' ${AUTH_2} 96 \ - enc 'cbc(des3_ede)' ${ENC_2} \ + auth-trunc 'hmac(sha1)' ${AUTH_2} 96 \ + enc 'cbc(aes)' ${ENC_2} \ sel src ${h2_6} dst ${h1_6} ${devarg}
ip -6 -netns host2 xfrm state add src ${HOST2_6} dst ${HOST1_6} \ proto esp spi ${SPI_2} reqid 0 mode tunnel \ replay-window 4 replay-oseq 0x4 \ - auth-trunc 'hmac(md5)' ${AUTH_2} 96 \ - enc 'cbc(des3_ede)' ${ENC_2} \ + auth-trunc 'hmac(sha1)' ${AUTH_2} 96 \ + enc 'cbc(aes)' ${ENC_2} \ sel src ${h2_6} dst ${h1_6} }
From: Magali Lemes magali.lemes@canonical.com
[ Upstream commit d7a2fc1437f71cb058c7b11bc33dfc19e4bf277a ]
There are some MD5 tests which fail when the kernel is in FIPS mode, since MD5 is not FIPS compliant. Add a check and only run those tests if FIPS mode is not enabled.
Fixes: f0bee1ebb5594 ("fcnal-test: Add TCP MD5 tests") Fixes: 5cad8bce26e01 ("fcnal-test: Add TCP MD5 tests for VRF") Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: Magali Lemes magali.lemes@canonical.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/fcnal-test.sh | 27 ++++++++++++++++------- 1 file changed, 19 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/net/fcnal-test.sh b/tools/testing/selftests/net/fcnal-test.sh index 21ca91473c095..ee6880ac3e5ed 100755 --- a/tools/testing/selftests/net/fcnal-test.sh +++ b/tools/testing/selftests/net/fcnal-test.sh @@ -92,6 +92,13 @@ NSC_CMD="ip netns exec ${NSC}"
which ping6 > /dev/null 2>&1 && ping6=$(which ping6) || ping6=$(which ping)
+# Check if FIPS mode is enabled +if [ -f /proc/sys/crypto/fips_enabled ]; then + fips_enabled=`cat /proc/sys/crypto/fips_enabled` +else + fips_enabled=0 +fi + ################################################################################ # utilities
@@ -1216,7 +1223,7 @@ ipv4_tcp_novrf() run_cmd nettest -d ${NSA_DEV} -r ${a} log_test_addr ${a} $? 1 "No server, device client, local conn"
- ipv4_tcp_md5_novrf + [ "$fips_enabled" = "1" ] || ipv4_tcp_md5_novrf }
ipv4_tcp_vrf() @@ -1270,9 +1277,11 @@ ipv4_tcp_vrf() log_test_addr ${a} $? 1 "Global server, local connection"
# run MD5 tests - setup_vrf_dup - ipv4_tcp_md5 - cleanup_vrf_dup + if [ "$fips_enabled" = "0" ]; then + setup_vrf_dup + ipv4_tcp_md5 + cleanup_vrf_dup + fi
# # enable VRF global server @@ -2772,7 +2781,7 @@ ipv6_tcp_novrf() log_test_addr ${a} $? 1 "No server, device client, local conn" done
- ipv6_tcp_md5_novrf + [ "$fips_enabled" = "1" ] || ipv6_tcp_md5_novrf }
ipv6_tcp_vrf() @@ -2842,9 +2851,11 @@ ipv6_tcp_vrf() log_test_addr ${a} $? 1 "Global server, local connection"
# run MD5 tests - setup_vrf_dup - ipv6_tcp_md5 - cleanup_vrf_dup + if [ "$fips_enabled" = "0" ]; then + setup_vrf_dup + ipv6_tcp_md5 + cleanup_vrf_dup + fi
# # enable VRF global server
From: Sebastian Andrzej Siewior bigeasy@linutronix.de
[ Upstream commit f015b900bc3285322029b4a7d132d6aeb0e51857 ]
With offloading enabled, esp_xmit() gets invoked very late, from within validate_xmit_xfrm() which is after validate_xmit_skb() validates and linearizes the skb if the underlying device does not support fragments.
esp_output_tail() may add a fragment to the skb while adding the auth tag/ IV. Devices without the proper support will then send skb->data points to with the correct length so the packet will have garbage at the end. A pcap sniffer will claim that the proper data has been sent since it parses the skb properly.
It is not affected with INET_ESP_OFFLOAD disabled.
Linearize the skb after offloading if the sending hardware requires it. It was tested on v4, v6 has been adopted.
Fixes: 7785bba299a8d ("esp: Add a software GRO codepath") Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Signed-off-by: Steffen Klassert steffen.klassert@secunet.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/esp4_offload.c | 3 +++ net/ipv6/esp6_offload.c | 3 +++ 2 files changed, 6 insertions(+)
diff --git a/net/ipv4/esp4_offload.c b/net/ipv4/esp4_offload.c index 3969fa805679c..ee848be59e65a 100644 --- a/net/ipv4/esp4_offload.c +++ b/net/ipv4/esp4_offload.c @@ -340,6 +340,9 @@ static int esp_xmit(struct xfrm_state *x, struct sk_buff *skb, netdev_features_
secpath_reset(skb);
+ if (skb_needs_linearize(skb, skb->dev->features) && + __skb_linearize(skb)) + return -ENOMEM; return 0; }
diff --git a/net/ipv6/esp6_offload.c b/net/ipv6/esp6_offload.c index 75c02992c520f..7723402689973 100644 --- a/net/ipv6/esp6_offload.c +++ b/net/ipv6/esp6_offload.c @@ -374,6 +374,9 @@ static int esp6_xmit(struct xfrm_state *x, struct sk_buff *skb, netdev_features
secpath_reset(skb);
+ if (skb_needs_linearize(skb, skb->dev->features) && + __skb_linearize(skb)) + return -ENOMEM; return 0; }
From: Yevgeny Kliteynik kliteyn@nvidia.com
[ Upstream commit ef4c5afc783dc3d47640270a9b94713229c697e8 ]
When TUNNEL_L3_TO_L2 decap action was created, a pointer to a local variable was passed as its HW action data, resulting in attempt to free invalid address:
BUG: KASAN: invalid-free in mlx5dr_action_destroy+0x318/0x410 [mlx5_core]
Fixes: 4781df92f4da ("net/mlx5: DR, Move STEv0 modify header logic") Signed-off-by: Yevgeny Kliteynik kliteyn@nvidia.com Reviewed-by: Alex Vesker valex@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/mellanox/mlx5/core/steering/dr_action.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c index ee104cf04392f..438c3bfae762a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c @@ -1403,9 +1403,13 @@ dr_action_create_reformat_action(struct mlx5dr_domain *dmn, } case DR_ACTION_TYP_TNL_L3_TO_L2: { - u8 hw_actions[ACTION_CACHE_LINE_SIZE] = {}; + u8 *hw_actions; int ret;
+ hw_actions = kzalloc(ACTION_CACHE_LINE_SIZE, GFP_KERNEL); + if (!hw_actions) + return -ENOMEM; + ret = mlx5dr_ste_set_action_decap_l3_list(dmn->ste_ctx, data, data_sz, hw_actions, @@ -1413,6 +1417,7 @@ dr_action_create_reformat_action(struct mlx5dr_domain *dmn, &action->rewrite->num_of_actions); if (ret) { mlx5dr_dbg(dmn, "Failed creating decap l3 action list\n"); + kfree(hw_actions); return ret; }
@@ -1420,6 +1425,7 @@ dr_action_create_reformat_action(struct mlx5dr_domain *dmn, DR_CHUNK_SIZE_8); if (!action->rewrite->chunk) { mlx5dr_dbg(dmn, "Failed allocating modify header chunk\n"); + kfree(hw_actions); return -ENOMEM; }
@@ -1433,6 +1439,7 @@ dr_action_create_reformat_action(struct mlx5dr_domain *dmn, if (ret) { mlx5dr_dbg(dmn, "Writing decap l3 actions to ICM failed\n"); mlx5dr_icm_free_chunk(action->rewrite->chunk); + kfree(hw_actions); return ret; } return 0;
From: Íñigo Huguet ihuguet@redhat.com
[ Upstream commit 4aaf2c52834b7f95acdf9fb0211a1b60adbf421b ]
When running workloads heavy unbalanced towards TX (high TX, low RX traffic), sfc driver can retain the CPU during too long times. Although in many cases this is not enough to be visible, it can affect performance and system responsiveness.
A way to reproduce it is to use a debug kernel and run some parallel netperf TX tests. In some systems, this will lead to this message being logged: kernel:watchdog: BUG: soft lockup - CPU#12 stuck for 22s!
The reason is that sfc driver doesn't account any NAPI budget for the TX completion events work. With high-TX/low-RX traffic, this makes that the CPU is held for long time for NAPI poll.
Documentations says "drivers can process completions for any number of Tx packets but should only process up to budget number of Rx packets". However, many drivers do limit the amount of TX completions that they process in a single NAPI poll.
In the same way, this patch adds a limit for the TX work in sfc. With the patch applied, the watchdog warning never appears.
Tested with netperf in different combinations: single process / parallel processes, TCP / UDP and different sizes of UDP messages. Repeated the tests before and after the patch, without any noticeable difference in network or CPU performance.
Test hardware: Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50GHz (4 cores, 2 threads/core) Solarflare Communications XtremeScale X2522-25G Network Adapter
Fixes: 5227ecccea2d ("sfc: remove tx and MCDI handling from NAPI budget consideration") Fixes: d19a53721863 ("sfc_ef100: TX path for EF100 NICs") Reported-by: Fei Liu feliu@redhat.com Signed-off-by: Íñigo Huguet ihuguet@redhat.com Acked-by: Martin Habets habetsm.xilinx@gmail.com Link: https://lore.kernel.org/r/20230615084929.10506-1-ihuguet@redhat.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/sfc/ef10.c | 25 ++++++++++++++++++------- drivers/net/ethernet/sfc/ef100_nic.c | 7 ++++++- drivers/net/ethernet/sfc/ef100_tx.c | 4 ++-- drivers/net/ethernet/sfc/ef100_tx.h | 2 +- drivers/net/ethernet/sfc/tx_common.c | 4 +++- drivers/net/ethernet/sfc/tx_common.h | 2 +- 6 files changed, 31 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c index d30459dbfe8f8..b63e47af63655 100644 --- a/drivers/net/ethernet/sfc/ef10.c +++ b/drivers/net/ethernet/sfc/ef10.c @@ -2950,7 +2950,7 @@ static u32 efx_ef10_extract_event_ts(efx_qword_t *event) return tstamp; }
-static void +static int efx_ef10_handle_tx_event(struct efx_channel *channel, efx_qword_t *event) { struct efx_nic *efx = channel->efx; @@ -2958,13 +2958,14 @@ efx_ef10_handle_tx_event(struct efx_channel *channel, efx_qword_t *event) unsigned int tx_ev_desc_ptr; unsigned int tx_ev_q_label; unsigned int tx_ev_type; + int work_done; u64 ts_part;
if (unlikely(READ_ONCE(efx->reset_pending))) - return; + return 0;
if (unlikely(EFX_QWORD_FIELD(*event, ESF_DZ_TX_DROP_EVENT))) - return; + return 0;
/* Get the transmit queue */ tx_ev_q_label = EFX_QWORD_FIELD(*event, ESF_DZ_TX_QLABEL); @@ -2973,8 +2974,7 @@ efx_ef10_handle_tx_event(struct efx_channel *channel, efx_qword_t *event) if (!tx_queue->timestamping) { /* Transmit completion */ tx_ev_desc_ptr = EFX_QWORD_FIELD(*event, ESF_DZ_TX_DESCR_INDX); - efx_xmit_done(tx_queue, tx_ev_desc_ptr & tx_queue->ptr_mask); - return; + return efx_xmit_done(tx_queue, tx_ev_desc_ptr & tx_queue->ptr_mask); }
/* Transmit timestamps are only available for 8XXX series. They result @@ -3000,6 +3000,7 @@ efx_ef10_handle_tx_event(struct efx_channel *channel, efx_qword_t *event) * fields in the event. */ tx_ev_type = EFX_QWORD_FIELD(*event, ESF_EZ_TX_SOFT1); + work_done = 0;
switch (tx_ev_type) { case TX_TIMESTAMP_EVENT_TX_EV_COMPLETION: @@ -3016,6 +3017,7 @@ efx_ef10_handle_tx_event(struct efx_channel *channel, efx_qword_t *event) tx_queue->completed_timestamp_major = ts_part;
efx_xmit_done_single(tx_queue); + work_done = 1; break;
default: @@ -3026,6 +3028,8 @@ efx_ef10_handle_tx_event(struct efx_channel *channel, efx_qword_t *event) EFX_QWORD_VAL(*event)); break; } + + return work_done; }
static void @@ -3081,13 +3085,16 @@ static void efx_ef10_handle_driver_generated_event(struct efx_channel *channel, } }
+#define EFX_NAPI_MAX_TX 512 + static int efx_ef10_ev_process(struct efx_channel *channel, int quota) { struct efx_nic *efx = channel->efx; efx_qword_t event, *p_event; unsigned int read_ptr; - int ev_code; + int spent_tx = 0; int spent = 0; + int ev_code;
if (quota <= 0) return spent; @@ -3126,7 +3133,11 @@ static int efx_ef10_ev_process(struct efx_channel *channel, int quota) } break; case ESE_DZ_EV_CODE_TX_EV: - efx_ef10_handle_tx_event(channel, &event); + spent_tx += efx_ef10_handle_tx_event(channel, &event); + if (spent_tx >= EFX_NAPI_MAX_TX) { + spent = quota; + goto out; + } break; case ESE_DZ_EV_CODE_DRIVER_EV: efx_ef10_handle_driver_event(channel, &event); diff --git a/drivers/net/ethernet/sfc/ef100_nic.c b/drivers/net/ethernet/sfc/ef100_nic.c index 4dc643b0d2db4..7adde9639c8ab 100644 --- a/drivers/net/ethernet/sfc/ef100_nic.c +++ b/drivers/net/ethernet/sfc/ef100_nic.c @@ -253,6 +253,8 @@ static void ef100_ev_read_ack(struct efx_channel *channel) efx_reg(channel->efx, ER_GZ_EVQ_INT_PRIME)); }
+#define EFX_NAPI_MAX_TX 512 + static int ef100_ev_process(struct efx_channel *channel, int quota) { struct efx_nic *efx = channel->efx; @@ -260,6 +262,7 @@ static int ef100_ev_process(struct efx_channel *channel, int quota) bool evq_phase, old_evq_phase; unsigned int read_ptr; efx_qword_t *p_event; + int spent_tx = 0; int spent = 0; bool ev_phase; int ev_type; @@ -295,7 +298,9 @@ static int ef100_ev_process(struct efx_channel *channel, int quota) efx_mcdi_process_event(channel, p_event); break; case ESE_GZ_EF100_EV_TX_COMPLETION: - ef100_ev_tx(channel, p_event); + spent_tx += ef100_ev_tx(channel, p_event); + if (spent_tx >= EFX_NAPI_MAX_TX) + spent = quota; break; case ESE_GZ_EF100_EV_DRIVER: netif_info(efx, drv, efx->net_dev, diff --git a/drivers/net/ethernet/sfc/ef100_tx.c b/drivers/net/ethernet/sfc/ef100_tx.c index 29ffaf35559d6..849e5555bd128 100644 --- a/drivers/net/ethernet/sfc/ef100_tx.c +++ b/drivers/net/ethernet/sfc/ef100_tx.c @@ -346,7 +346,7 @@ void ef100_tx_write(struct efx_tx_queue *tx_queue) ef100_tx_push_buffers(tx_queue); }
-void ef100_ev_tx(struct efx_channel *channel, const efx_qword_t *p_event) +int ef100_ev_tx(struct efx_channel *channel, const efx_qword_t *p_event) { unsigned int tx_done = EFX_QWORD_FIELD(*p_event, ESF_GZ_EV_TXCMPL_NUM_DESC); @@ -357,7 +357,7 @@ void ef100_ev_tx(struct efx_channel *channel, const efx_qword_t *p_event) unsigned int tx_index = (tx_queue->read_count + tx_done - 1) & tx_queue->ptr_mask;
- efx_xmit_done(tx_queue, tx_index); + return efx_xmit_done(tx_queue, tx_index); }
/* Add a socket buffer to a TX queue diff --git a/drivers/net/ethernet/sfc/ef100_tx.h b/drivers/net/ethernet/sfc/ef100_tx.h index e9e11540fcdea..d9a0819c5a72c 100644 --- a/drivers/net/ethernet/sfc/ef100_tx.h +++ b/drivers/net/ethernet/sfc/ef100_tx.h @@ -20,7 +20,7 @@ void ef100_tx_init(struct efx_tx_queue *tx_queue); void ef100_tx_write(struct efx_tx_queue *tx_queue); unsigned int ef100_tx_max_skb_descs(struct efx_nic *efx);
-void ef100_ev_tx(struct efx_channel *channel, const efx_qword_t *p_event); +int ef100_ev_tx(struct efx_channel *channel, const efx_qword_t *p_event);
netdev_tx_t ef100_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb); int __ef100_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb, diff --git a/drivers/net/ethernet/sfc/tx_common.c b/drivers/net/ethernet/sfc/tx_common.c index 67e789b96c437..755aa92bf8236 100644 --- a/drivers/net/ethernet/sfc/tx_common.c +++ b/drivers/net/ethernet/sfc/tx_common.c @@ -249,7 +249,7 @@ void efx_xmit_done_check_empty(struct efx_tx_queue *tx_queue) } }
-void efx_xmit_done(struct efx_tx_queue *tx_queue, unsigned int index) +int efx_xmit_done(struct efx_tx_queue *tx_queue, unsigned int index) { unsigned int fill_level, pkts_compl = 0, bytes_compl = 0; unsigned int efv_pkts_compl = 0; @@ -279,6 +279,8 @@ void efx_xmit_done(struct efx_tx_queue *tx_queue, unsigned int index) }
efx_xmit_done_check_empty(tx_queue); + + return pkts_compl + efv_pkts_compl; }
/* Remove buffers put into a tx_queue for the current packet. diff --git a/drivers/net/ethernet/sfc/tx_common.h b/drivers/net/ethernet/sfc/tx_common.h index d87aecbc7bf1a..1e9f42938aac9 100644 --- a/drivers/net/ethernet/sfc/tx_common.h +++ b/drivers/net/ethernet/sfc/tx_common.h @@ -28,7 +28,7 @@ static inline bool efx_tx_buffer_in_use(struct efx_tx_buffer *buffer) }
void efx_xmit_done_check_empty(struct efx_tx_queue *tx_queue); -void efx_xmit_done(struct efx_tx_queue *tx_queue, unsigned int index); +int efx_xmit_done(struct efx_tx_queue *tx_queue, unsigned int index);
void efx_enqueue_unwind(struct efx_tx_queue *tx_queue, unsigned int insert_count);
From: Stefan Wahren stefan.wahren@i2se.com
[ Upstream commit 92717c2356cb62c89e8a3dc37cbbab2502562524 ]
In case the QCA7000 is not available via SPI (e.g. in reset), the driver will cause a high load. The reason for this is that the synchronization is never finished and schedule() is never called. Since the synchronization is not timing critical, it's safe to drop this from the scheduling condition.
Signed-off-by: Stefan Wahren stefan.wahren@i2se.com Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000") Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/qualcomm/qca_spi.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c b/drivers/net/ethernet/qualcomm/qca_spi.c index c865a4be05eec..4a1b94e5a8ea9 100644 --- a/drivers/net/ethernet/qualcomm/qca_spi.c +++ b/drivers/net/ethernet/qualcomm/qca_spi.c @@ -582,8 +582,7 @@ qcaspi_spi_thread(void *data) while (!kthread_should_stop()) { set_current_state(TASK_INTERRUPTIBLE); if ((qca->intr_req == qca->intr_svc) && - (qca->txr.skb[qca->txr.head] == NULL) && - (qca->sync == QCASPI_SYNC_READY)) + !qca->txr.skb[qca->txr.head]) schedule();
set_current_state(TASK_RUNNING);
From: Sergey Shtylyov s.shtylyov@omp.ru
[ Upstream commit 0c4dc0f054891a2cbde0426b0c0fdf232d89f47f ]
The driver overrides the error codes returned by platform_get_irq() to -EINVAL, so if it returns -EPROBE_DEFER, the driver will fail the probe permanently instead of the deferred probing. Switch to propagating the error codes upstream.
Fixes: 208489032bdd ("mmc: mediatek: Add Mediatek MMC driver") Signed-off-by: Sergey Shtylyov s.shtylyov@omp.ru Link: https://lore.kernel.org/r/20230617203622.6812-4-s.shtylyov@omp.ru Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mmc/host/mtk-sd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mmc/host/mtk-sd.c b/drivers/mmc/host/mtk-sd.c index edade0e54a0c2..9785ec91654f7 100644 --- a/drivers/mmc/host/mtk-sd.c +++ b/drivers/mmc/host/mtk-sd.c @@ -2680,7 +2680,7 @@ static int msdc_drv_probe(struct platform_device *pdev)
host->irq = platform_get_irq(pdev, 0); if (host->irq < 0) { - ret = -EINVAL; + ret = host->irq; goto host_free; }
From: Sergey Shtylyov s.shtylyov@omp.ru
[ Upstream commit 8d84064da0d4672e74f984e8710f27881137472c ]
The driver overrides the error codes returned by platform_get_irq() to -ENXIO, so if it returns -EPROBE_DEFER, the driver will fail the probe permanently instead of the deferred probing. Switch to propagating the error codes upstream.
Fixes: 9ec36cafe43b ("of/irq: do irq resolution in platform_get_irq") Signed-off-by: Sergey Shtylyov s.shtylyov@omp.ru Link: https://lore.kernel.org/r/20230617203622.6812-5-s.shtylyov@omp.ru Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mmc/host/mvsdio.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mmc/host/mvsdio.c b/drivers/mmc/host/mvsdio.c index 629efbe639c4f..b4f6a0a2fcb51 100644 --- a/drivers/mmc/host/mvsdio.c +++ b/drivers/mmc/host/mvsdio.c @@ -704,7 +704,7 @@ static int mvsd_probe(struct platform_device *pdev) } irq = platform_get_irq(pdev, 0); if (irq < 0) - return -ENXIO; + return irq;
mmc = mmc_alloc_host(sizeof(struct mvsd_host), &pdev->dev); if (!mmc) {
From: Sergey Shtylyov s.shtylyov@omp.ru
[ Upstream commit aedf4ba1ad00aaa94c1b66c73ecaae95e2564b95 ]
The driver overrides the error codes returned by platform_get_irq() to -ENXIO, so if it returns -EPROBE_DEFER, the driver will fail the probe permanently instead of the deferred probing. Switch to propagating the error codes upstream.
Fixes: 9ec36cafe43b ("of/irq: do irq resolution in platform_get_irq") Signed-off-by: Sergey Shtylyov s.shtylyov@omp.ru Link: https://lore.kernel.org/r/20230617203622.6812-6-s.shtylyov@omp.ru Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mmc/host/omap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mmc/host/omap.c b/drivers/mmc/host/omap.c index 57d39283924da..cc2213ea324f1 100644 --- a/drivers/mmc/host/omap.c +++ b/drivers/mmc/host/omap.c @@ -1343,7 +1343,7 @@ static int mmc_omap_probe(struct platform_device *pdev)
irq = platform_get_irq(pdev, 0); if (irq < 0) - return -ENXIO; + return irq;
res = platform_get_resource(pdev, IORESOURCE_MEM, 0); host->virt_base = devm_ioremap_resource(&pdev->dev, res);
From: Sergey Shtylyov s.shtylyov@omp.ru
[ Upstream commit fb51b74a57859b707c3e8055ed0c25a7ca4f6a29 ]
The driver overrides the error codes returned by platform_get_irq() to -ENXIO, so if it returns -EPROBE_DEFER, the driver will fail the probe permanently instead of the deferred probing. Switch to propagating the error codes upstream.
Fixes: 9ec36cafe43b ("of/irq: do irq resolution in platform_get_irq") Signed-off-by: Sergey Shtylyov s.shtylyov@omp.ru Link: https://lore.kernel.org/r/20230617203622.6812-7-s.shtylyov@omp.ru Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mmc/host/omap_hsmmc.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/mmc/host/omap_hsmmc.c b/drivers/mmc/host/omap_hsmmc.c index 4bd7447552055..2db3a16e63c48 100644 --- a/drivers/mmc/host/omap_hsmmc.c +++ b/drivers/mmc/host/omap_hsmmc.c @@ -1791,9 +1791,11 @@ static int omap_hsmmc_probe(struct platform_device *pdev) }
res = platform_get_resource(pdev, IORESOURCE_MEM, 0); - irq = platform_get_irq(pdev, 0); - if (res == NULL || irq < 0) + if (!res) return -ENXIO; + irq = platform_get_irq(pdev, 0); + if (irq < 0) + return irq;
base = devm_ioremap_resource(&pdev->dev, res); if (IS_ERR(base))
From: Sergey Shtylyov s.shtylyov@omp.ru
[ Upstream commit 3c482e1e830d79b9be8afb900a965135c01f7893 ]
The driver overrides the error codes returned by platform_get_irq() to -EINVAL, so if it returns -EPROBE_DEFER, the driver will fail the probe permanently instead of the deferred probing. Switch to propagating the error codes upstream.
Fixes: ff65ffe46d28 ("mmc: Add Actions Semi Owl SoCs SD/MMC driver") Signed-off-by: Sergey Shtylyov s.shtylyov@omp.ru Link: https://lore.kernel.org/r/20230617203622.6812-8-s.shtylyov@omp.ru Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mmc/host/owl-mmc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mmc/host/owl-mmc.c b/drivers/mmc/host/owl-mmc.c index 3dc143b039397..679b8b0b310e5 100644 --- a/drivers/mmc/host/owl-mmc.c +++ b/drivers/mmc/host/owl-mmc.c @@ -638,7 +638,7 @@ static int owl_mmc_probe(struct platform_device *pdev)
owl_host->irq = platform_get_irq(pdev, 0); if (owl_host->irq < 0) { - ret = -EINVAL; + ret = owl_host->irq; goto err_release_channel; }
From: Sergey Shtylyov s.shtylyov@omp.ru
[ Upstream commit b465dea5e1540c7d7b5211adaf94926980d3014b ]
The driver overrides the error codes returned by platform_get_irq() to -EINVAL, so if it returns -EPROBE_DEFER, the driver will fail the probe permanently instead of the deferred probing. Switch to propagating the error codes upstream.
Fixes: 1b7ba57ecc86 ("mmc: sdhci-acpi: Handle return value of platform_get_irq") Signed-off-by: Sergey Shtylyov s.shtylyov@omp.ru Acked-by: Adrian Hunter adrian.hunter@intel.com Link: https://lore.kernel.org/r/20230617203622.6812-9-s.shtylyov@omp.ru Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mmc/host/sdhci-acpi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mmc/host/sdhci-acpi.c b/drivers/mmc/host/sdhci-acpi.c index 8f0e639236b15..edf2e6c14dc6f 100644 --- a/drivers/mmc/host/sdhci-acpi.c +++ b/drivers/mmc/host/sdhci-acpi.c @@ -829,7 +829,7 @@ static int sdhci_acpi_probe(struct platform_device *pdev) host->ops = &sdhci_acpi_ops_dflt; host->irq = platform_get_irq(pdev, 0); if (host->irq < 0) { - err = -EINVAL; + err = host->irq; goto err_free; }
From: Sergey Shtylyov s.shtylyov@omp.ru
[ Upstream commit 5b067d7f855c61df7f8e2e8ccbcee133c282415e ]
The driver overrides the error codes returned by platform_get_irq() to -ENXIO, so if it returns -EPROBE_DEFER, the driver will fail the probe permanently instead of the deferred probing. Switch to propagating the error codes upstream.
Fixes: 9ec36cafe43b ("of/irq: do irq resolution in platform_get_irq") Signed-off-by: Sergey Shtylyov s.shtylyov@omp.ru Link: https://lore.kernel.org/r/20230617203622.6812-11-s.shtylyov@omp.ru Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mmc/host/sh_mmcif.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mmc/host/sh_mmcif.c b/drivers/mmc/host/sh_mmcif.c index 0fd4c9d644dd5..5cf53348372a4 100644 --- a/drivers/mmc/host/sh_mmcif.c +++ b/drivers/mmc/host/sh_mmcif.c @@ -1400,7 +1400,7 @@ static int sh_mmcif_probe(struct platform_device *pdev) irq[0] = platform_get_irq(pdev, 0); irq[1] = platform_get_irq_optional(pdev, 1); if (irq[0] < 0) - return -ENXIO; + return irq[0];
reg = devm_platform_ioremap_resource(pdev, 0); if (IS_ERR(reg))
From: Sergey Shtylyov s.shtylyov@omp.ru
[ Upstream commit 413db499730248431c1005b392e8ed82c4fa19bf ]
The driver overrides the error codes returned by platform_get_irq_byname() to -ENODEV, so if it returns -EPROBE_DEFER, the driver will fail the probe permanently instead of the deferred probing. Switch to propagating error codes upstream.
Fixes: 9ec36cafe43b ("of/irq: do irq resolution in platform_get_irq") Signed-off-by: Sergey Shtylyov s.shtylyov@omp.ru Link: https://lore.kernel.org/r/20230617203622.6812-13-s.shtylyov@omp.ru Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mmc/host/usdhi6rol0.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/mmc/host/usdhi6rol0.c b/drivers/mmc/host/usdhi6rol0.c index 99515be6e5e57..2032e4e1ee68b 100644 --- a/drivers/mmc/host/usdhi6rol0.c +++ b/drivers/mmc/host/usdhi6rol0.c @@ -1757,8 +1757,10 @@ static int usdhi6_probe(struct platform_device *pdev) irq_cd = platform_get_irq_byname(pdev, "card detect"); irq_sd = platform_get_irq_byname(pdev, "data"); irq_sdio = platform_get_irq_byname(pdev, "SDIO"); - if (irq_sd < 0 || irq_sdio < 0) - return -ENODEV; + if (irq_sd < 0) + return irq_sd; + if (irq_sdio < 0) + return irq_sdio;
mmc = mmc_alloc_host(sizeof(struct usdhi6_host), dev); if (!mmc)
From: Terin Stock terin@cloudflare.com
[ Upstream commit d7fce52fdf96663ddc2eb21afecff3775588612a ]
When using encapsulation the original packet's headers are copied to the inner headers. This preserves the space for an inner mac header, which is not used by the inner payloads for the encapsulation types supported by IPVS. If a packet is using GUE or GRE encapsulation and needs to be segmented, flow can be passed to __skb_udp_tunnel_segment() which calculates a negative tunnel header length. A negative tunnel header length causes pskb_may_pull() to fail, dropping the packet.
This can be observed by attaching probes to ip_vs_in_hook(), __dev_queue_xmit(), and __skb_udp_tunnel_segment():
perf probe --add '__dev_queue_xmit skb->inner_mac_header \ skb->inner_network_header skb->mac_header skb->network_header' perf probe --add '__skb_udp_tunnel_segment:7 tnl_hlen' perf probe -m ip_vs --add 'ip_vs_in_hook skb->inner_mac_header \ skb->inner_network_header skb->mac_header skb->network_header'
These probes the headers and tunnel header length for packets which traverse the IPVS encapsulation path. A TCP packet can be forced into the segmentation path by being smaller than a calculated clamped MSS, but larger than the advertised MSS.
probe:ip_vs_in_hook: inner_mac_header=0x0 inner_network_header=0x0 mac_header=0x44 network_header=0x52 probe:ip_vs_in_hook: inner_mac_header=0x44 inner_network_header=0x52 mac_header=0x44 network_header=0x32 probe:dev_queue_xmit: inner_mac_header=0x44 inner_network_header=0x52 mac_header=0x44 network_header=0x32 probe:__skb_udp_tunnel_segment_L7: tnl_hlen=-2
When using veth-based encapsulation, the interfaces are set to be mac-less, which does not preserve space for an inner mac header. This prevents this issue from occurring.
In our real-world testing of sending a 32KB file we observed operation time increasing from ~75ms for veth-based encapsulation to over 1.5s using IPVS encapsulation due to retries from dropped packets.
This changeset modifies the packet on the encapsulation path in ip_vs_tunnel_xmit() and ip_vs_tunnel_xmit_v6() to remove the inner mac header offset. This fixes UDP segmentation for both encapsulation types, and corrects the inner headers for any IPIP flows that may use it.
Fixes: 84c0d5e96f3a ("ipvs: allow tunneling with gue encapsulation") Signed-off-by: Terin Stock terin@cloudflare.com Acked-by: Julian Anastasov ja@ssi.bg Acked-by: Simon Horman horms@kernel.org Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/ipvs/ip_vs_xmit.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c index 80448885c3d71..b452eb3ddcecb 100644 --- a/net/netfilter/ipvs/ip_vs_xmit.c +++ b/net/netfilter/ipvs/ip_vs_xmit.c @@ -1225,6 +1225,7 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp, skb->transport_header = skb->network_header;
skb_set_inner_ipproto(skb, next_protocol); + skb_set_inner_mac_header(skb, skb_inner_network_offset(skb));
if (tun_type == IP_VS_CONN_F_TUNNEL_TYPE_GUE) { bool check = false; @@ -1373,6 +1374,7 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp, skb->transport_header = skb->network_header;
skb_set_inner_ipproto(skb, next_protocol); + skb_set_inner_mac_header(skb, skb_inner_network_offset(skb));
if (tun_type == IP_VS_CONN_F_TUNNEL_TYPE_GUE) { bool check = false;
From: Arınç ÜNAL arinc.unal@arinc9.com
[ Upstream commit 4ae90f90e4909e3014e2dc6a0627964617a7b824 ]
All MT7530 switch IP variants share the MT7530_MFC register, but the current driver only writes it for the switch variant that is integrated in the MT7621 SoC. Modify the code to include all MT7530 derivatives.
Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Suggested-by: Vladimir Oltean olteanv@gmail.com Signed-off-by: Arınç ÜNAL arinc.unal@arinc9.com Reviewed-by: Vladimir Oltean olteanv@gmail.com Reviewed-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Reviewed-by: Florian Fainelli florian.fainelli@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/mt7530.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c index 0c81f5830b90a..7dbbd5e509d3b 100644 --- a/drivers/net/dsa/mt7530.c +++ b/drivers/net/dsa/mt7530.c @@ -1013,7 +1013,7 @@ mt753x_cpu_port_enable(struct dsa_switch *ds, int port) UNU_FFP(BIT(port)));
/* Set CPU port number */ - if (priv->id == ID_MT7621) + if (priv->id == ID_MT7530 || priv->id == ID_MT7621) mt7530_rmw(priv, MT7530_MFC, CPU_MASK, CPU_EN | CPU_PORT(port));
/* CPU port gets connected to all user ports of
From: Arınç ÜNAL arinc.unal@arinc9.com
[ Upstream commit d7c66073559386b836bded7cdc8b66ee5c049129 ]
BPDUs are link-local frames, therefore they must be trapped to the CPU port. Currently, the MT7530 switch treats BPDUs as regular multicast frames, therefore flooding them to user ports. To fix this, set BPDUs to be trapped to the CPU port. Group this on mt7530_setup() and mt7531_setup_common() into mt753x_trap_frames() and call that.
Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Signed-off-by: Arınç ÜNAL arinc.unal@arinc9.com Reviewed-by: Vladimir Oltean olteanv@gmail.com Reviewed-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Reviewed-by: Florian Fainelli florian.fainelli@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/mt7530.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c index 7dbbd5e509d3b..40be635d2ecc9 100644 --- a/drivers/net/dsa/mt7530.c +++ b/drivers/net/dsa/mt7530.c @@ -991,6 +991,14 @@ static void mt7530_setup_port5(struct dsa_switch *ds, phy_interface_t interface) mutex_unlock(&priv->reg_mutex); }
+static void +mt753x_trap_frames(struct mt7530_priv *priv) +{ + /* Trap BPDUs to the CPU port(s) */ + mt7530_rmw(priv, MT753X_BPC, MT753X_BPDU_PORT_FW_MASK, + MT753X_BPDU_CPU_ONLY); +} + static int mt753x_cpu_port_enable(struct dsa_switch *ds, int port) { @@ -2214,6 +2222,8 @@ mt7530_setup(struct dsa_switch *ds)
priv->p6_interface = PHY_INTERFACE_MODE_NA;
+ mt753x_trap_frames(priv); + /* Enable and reset MIB counters */ mt7530_mib_reset(ds);
@@ -2320,8 +2330,8 @@ mt7531_setup_common(struct dsa_switch *ds) BIT(cpu_dp->index)); break; } - mt7530_rmw(priv, MT753X_BPC, MT753X_BPDU_PORT_FW_MASK, - MT753X_BPDU_CPU_ONLY); + + mt753x_trap_frames(priv);
/* Enable and reset MIB counters */ mt7530_mib_reset(ds);
From: Arınç ÜNAL arinc.unal@arinc9.com
[ Upstream commit 8332cf6fd7c7087dbc2067115b33979c9851bbc4 ]
LLDP frames are link-local frames, therefore they must be trapped to the CPU port. Currently, the MT753X switches treat LLDP frames as regular multicast frames, therefore flooding them to user ports. To fix this, set LLDP frames to be trapped to the CPU port(s).
Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Signed-off-by: Arınç ÜNAL arinc.unal@arinc9.com Reviewed-by: Vladimir Oltean olteanv@gmail.com Reviewed-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Reviewed-by: Florian Fainelli florian.fainelli@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/mt7530.c | 4 ++++ drivers/net/dsa/mt7530.h | 5 +++++ 2 files changed, 9 insertions(+)
diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c index 40be635d2ecc9..e542f5dbe5831 100644 --- a/drivers/net/dsa/mt7530.c +++ b/drivers/net/dsa/mt7530.c @@ -997,6 +997,10 @@ mt753x_trap_frames(struct mt7530_priv *priv) /* Trap BPDUs to the CPU port(s) */ mt7530_rmw(priv, MT753X_BPC, MT753X_BPDU_PORT_FW_MASK, MT753X_BPDU_CPU_ONLY); + + /* Trap LLDP frames with :0E MAC DA to the CPU port(s) */ + mt7530_rmw(priv, MT753X_RGAC2, MT753X_R0E_PORT_FW_MASK, + MT753X_R0E_PORT_FW(MT753X_BPDU_CPU_ONLY)); }
static int diff --git a/drivers/net/dsa/mt7530.h b/drivers/net/dsa/mt7530.h index 6b2fc6290ea84..31c0f7156b699 100644 --- a/drivers/net/dsa/mt7530.h +++ b/drivers/net/dsa/mt7530.h @@ -65,6 +65,11 @@ enum mt753x_id { #define MT753X_BPC 0x24 #define MT753X_BPDU_PORT_FW_MASK GENMASK(2, 0)
+/* Register for :03 and :0E MAC DA frame control */ +#define MT753X_RGAC2 0x2c +#define MT753X_R0E_PORT_FW_MASK GENMASK(18, 16) +#define MT753X_R0E_PORT_FW(x) FIELD_PREP(MT753X_R0E_PORT_FW_MASK, x) + enum mt753x_bpdu_port_fw { MT753X_BPDU_FOLLOW_MFC, MT753X_BPDU_CPU_EXCLUDE = 4,
From: Vladimir Oltean olteanv@gmail.com
[ Upstream commit b79d7c14f48083abb3fb061370c0c64a569edf4c ]
Since the introduction of the OF bindings, DSA has always had a policy that in case multiple CPU ports are present in the device tree, the numerically smallest one is always chosen.
The MT7530 switch family, except the switch on the MT7988 SoC, has 2 CPU ports, 5 and 6, where port 6 is preferable on the MT7531BE switch because it has higher bandwidth.
The MT7530 driver developers had 3 options: - to modify DSA when the MT7531 switch support was introduced, such as to prefer the better port - to declare both CPU ports in device trees as CPU ports, and live with the sub-optimal performance resulting from not preferring the better port - to declare just port 6 in the device tree as a CPU port
Of course they chose the path of least resistance (3rd option), kicking the can down the road. The hardware description in the device tree is supposed to be stable - developers are not supposed to adopt the strategy of piecemeal hardware description, where the device tree is updated in lockstep with the features that the kernel currently supports.
Now, as a result of the fact that they did that, any attempts to modify the device tree and describe both CPU ports as CPU ports would make DSA change its default selection from port 6 to 5, effectively resulting in a performance degradation visible to users with the MT7531BE switch as can be seen below.
Without preferring port 6:
[ ID][Role] Interval Transfer Bitrate Retr [ 5][TX-C] 0.00-20.00 sec 374 MBytes 157 Mbits/sec 734 sender [ 5][TX-C] 0.00-20.00 sec 373 MBytes 156 Mbits/sec receiver [ 7][RX-C] 0.00-20.00 sec 1.81 GBytes 778 Mbits/sec 0 sender [ 7][RX-C] 0.00-20.00 sec 1.81 GBytes 777 Mbits/sec receiver
With preferring port 6:
[ ID][Role] Interval Transfer Bitrate Retr [ 5][TX-C] 0.00-20.00 sec 1.99 GBytes 856 Mbits/sec 273 sender [ 5][TX-C] 0.00-20.00 sec 1.99 GBytes 855 Mbits/sec receiver [ 7][RX-C] 0.00-20.00 sec 1.72 GBytes 737 Mbits/sec 15 sender [ 7][RX-C] 0.00-20.00 sec 1.71 GBytes 736 Mbits/sec receiver
Using one port for WAN and the other ports for LAN is a very popular use case which is what this test emulates.
As such, this change proposes that we retroactively modify stable kernels (which don't support the modification of the CPU port assignments, so as to let user space fix the problem and restore the throughput) to keep the mt7530 driver preferring port 6 even with device trees where the hardware is more fully described.
Fixes: c288575f7810 ("net: dsa: mt7530: Add the support of MT7531 switch") Signed-off-by: Vladimir Oltean olteanv@gmail.com Signed-off-by: Arınç ÜNAL arinc.unal@arinc9.com Reviewed-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Reviewed-by: Florian Fainelli florian.fainelli@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/mt7530.c | 15 +++++++++++++++ include/net/dsa.h | 8 ++++++++ net/dsa/dsa.c | 24 +++++++++++++++++++++++- 3 files changed, 46 insertions(+), 1 deletion(-)
diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c index e542f5dbe5831..566545b6554ba 100644 --- a/drivers/net/dsa/mt7530.c +++ b/drivers/net/dsa/mt7530.c @@ -419,6 +419,20 @@ static void mt7530_pll_setup(struct mt7530_priv *priv) core_set(priv, CORE_TRGMII_GSW_CLK_CG, REG_GSWCK_EN); }
+/* If port 6 is available as a CPU port, always prefer that as the default, + * otherwise don't care. + */ +static struct dsa_port * +mt753x_preferred_default_local_cpu_port(struct dsa_switch *ds) +{ + struct dsa_port *cpu_dp = dsa_to_port(ds, 6); + + if (dsa_port_is_cpu(cpu_dp)) + return cpu_dp; + + return NULL; +} + /* Setup port 6 interface mode and TRGMII TX circuit */ static int mt7530_pad_clk_setup(struct dsa_switch *ds, phy_interface_t interface) @@ -3191,6 +3205,7 @@ static int mt753x_set_mac_eee(struct dsa_switch *ds, int port, static const struct dsa_switch_ops mt7530_switch_ops = { .get_tag_protocol = mtk_get_tag_protocol, .setup = mt753x_setup, + .preferred_default_local_cpu_port = mt753x_preferred_default_local_cpu_port, .get_strings = mt7530_get_strings, .get_ethtool_stats = mt7530_get_ethtool_stats, .get_sset_count = mt7530_get_sset_count, diff --git a/include/net/dsa.h b/include/net/dsa.h index a15f17a38eca6..def06ef676dd8 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -973,6 +973,14 @@ struct dsa_switch_ops { struct phy_device *phy); void (*port_disable)(struct dsa_switch *ds, int port);
+ /* + * Compatibility between device trees defining multiple CPU ports and + * drivers which are not OK to use by default the numerically smallest + * CPU port of a switch for its local ports. This can return NULL, + * meaning "don't know/don't care". + */ + struct dsa_port *(*preferred_default_local_cpu_port)(struct dsa_switch *ds); + /* * Port's MAC EEE settings */ diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c index e5f156940c671..6cd8607a3928f 100644 --- a/net/dsa/dsa.c +++ b/net/dsa/dsa.c @@ -402,6 +402,24 @@ static int dsa_tree_setup_default_cpu(struct dsa_switch_tree *dst) return 0; }
+static struct dsa_port * +dsa_switch_preferred_default_local_cpu_port(struct dsa_switch *ds) +{ + struct dsa_port *cpu_dp; + + if (!ds->ops->preferred_default_local_cpu_port) + return NULL; + + cpu_dp = ds->ops->preferred_default_local_cpu_port(ds); + if (!cpu_dp) + return NULL; + + if (WARN_ON(!dsa_port_is_cpu(cpu_dp) || cpu_dp->ds != ds)) + return NULL; + + return cpu_dp; +} + /* Perform initial assignment of CPU ports to user ports and DSA links in the * fabric, giving preference to CPU ports local to each switch. Default to * using the first CPU port in the switch tree if the port does not have a CPU @@ -409,12 +427,16 @@ static int dsa_tree_setup_default_cpu(struct dsa_switch_tree *dst) */ static int dsa_tree_setup_cpu_ports(struct dsa_switch_tree *dst) { - struct dsa_port *cpu_dp, *dp; + struct dsa_port *preferred_cpu_dp, *cpu_dp, *dp;
list_for_each_entry(cpu_dp, &dst->ports, list) { if (!dsa_port_is_cpu(cpu_dp)) continue;
+ preferred_cpu_dp = dsa_switch_preferred_default_local_cpu_port(cpu_dp->ds); + if (preferred_cpu_dp && preferred_cpu_dp != cpu_dp) + continue; + /* Prefer a local CPU port */ dsa_switch_for_each_port(dp, cpu_dp->ds) { /* Prefer the first local CPU port found */
From: Ross Lagerwall ross.lagerwall@citrix.com
[ Upstream commit 7580e0a78eb29e7bb1a772eba4088250bbb70d41 ]
We have seen a bug where the NIC incorrectly changes the length in the IP header of a padded packet to include the padding bytes. The driver already has a workaround for this so do the workaround for this NIC too. This resolves the issue.
The NIC in question identifies itself as follows:
[ 8.828494] be2net 0000:02:00.0: FW version is 10.7.110.31 [ 8.834759] be2net 0000:02:00.0: Emulex OneConnect(be3): PF FLEX10 port 1
02:00.0 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 01)
Fixes: ca34fe38f06d ("be2net: fix wrong usage of adapter->generation") Signed-off-by: Ross Lagerwall ross.lagerwall@citrix.com Link: https://lore.kernel.org/r/20230616164549.2863037-1-ross.lagerwall@citrix.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/emulex/benet/be_main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c index 46fe3d74e2e98..7236c4f70cf51 100644 --- a/drivers/net/ethernet/emulex/benet/be_main.c +++ b/drivers/net/ethernet/emulex/benet/be_main.c @@ -1136,8 +1136,8 @@ static struct sk_buff *be_lancer_xmit_workarounds(struct be_adapter *adapter, eth_hdr_len = ntohs(skb->protocol) == ETH_P_8021Q ? VLAN_ETH_HLEN : ETH_HLEN; if (skb->len <= 60 && - (lancer_chip(adapter) || skb_vlan_tag_present(skb)) && - is_ipv4_pkt(skb)) { + (lancer_chip(adapter) || BE3_chip(adapter) || + skb_vlan_tag_present(skb)) && is_ipv4_pkt(skb)) { ip = (struct iphdr *)ip_hdr(skb); pskb_trim(skb, eth_hdr_len + ntohs(ip->tot_len)); }
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit 4bedf9eee016286c835e3d8fa981ddece5338795 ]
Add bound flag to rule and chain transactions as in 6a0a8d10a366 ("netfilter: nf_tables: use-after-free in failing rule with bound set") to skip them in case that the chain is already bound from the abort path.
This patch fixes an imbalance in the chain use refcnt that triggers a WARN_ON on the table and chain destroy path.
This patch also disallows nested chain bindings, which is not supported from userspace.
The logic to deal with chain binding in nft_data_hold() and nft_data_release() is not correct. The NFT_TRANS_PREPARE state needs a special handling in case a chain is bound but next expressions in the same rule fail to initialize as described by 1240eb93f061 ("netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE").
The chain is left bound if rule construction fails, so the objects stored in this chain (and the chain itself) are released by the transaction records from the abort path, follow up patch ("netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain") completes this error handling.
When deleting an existing rule, chain bound flag is set off so the rule expression .destroy path releases the objects.
Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/netfilter/nf_tables.h | 21 +++++++- net/netfilter/nf_tables_api.c | 86 +++++++++++++++++++----------- net/netfilter/nft_immediate.c | 87 +++++++++++++++++++++++++++---- 3 files changed, 153 insertions(+), 41 deletions(-)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index 300bce4d67cec..91bf88dab245f 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -1009,7 +1009,10 @@ static inline struct nft_userdata *nft_userdata(const struct nft_rule *rule) return (void *)&rule->data[rule->dlen]; }
-void nf_tables_rule_release(const struct nft_ctx *ctx, struct nft_rule *rule); +void nft_rule_expr_activate(const struct nft_ctx *ctx, struct nft_rule *rule); +void nft_rule_expr_deactivate(const struct nft_ctx *ctx, struct nft_rule *rule, + enum nft_trans_phase phase); +void nf_tables_rule_destroy(const struct nft_ctx *ctx, struct nft_rule *rule);
static inline void nft_set_elem_update_expr(const struct nft_set_ext *ext, struct nft_regs *regs, @@ -1092,6 +1095,7 @@ int nft_setelem_validate(const struct nft_ctx *ctx, struct nft_set *set, const struct nft_set_iter *iter, struct nft_set_elem *elem); int nft_set_catchall_validate(const struct nft_ctx *ctx, struct nft_set *set); +int nf_tables_bind_chain(const struct nft_ctx *ctx, struct nft_chain *chain);
enum nft_chain_types { NFT_CHAIN_T_DEFAULT = 0, @@ -1128,11 +1132,17 @@ int nft_chain_validate_dependency(const struct nft_chain *chain, int nft_chain_validate_hooks(const struct nft_chain *chain, unsigned int hook_flags);
+static inline bool nft_chain_binding(const struct nft_chain *chain) +{ + return chain->flags & NFT_CHAIN_BINDING; +} + static inline bool nft_chain_is_bound(struct nft_chain *chain) { return (chain->flags & NFT_CHAIN_BINDING) && chain->bound; }
+int nft_chain_add(struct nft_table *table, struct nft_chain *chain); void nft_chain_del(struct nft_chain *chain); void nf_tables_chain_destroy(struct nft_ctx *ctx);
@@ -1567,6 +1577,7 @@ struct nft_trans_rule { struct nft_rule *rule; struct nft_flow_rule *flow; u32 rule_id; + bool bound; };
#define nft_trans_rule(trans) \ @@ -1575,6 +1586,8 @@ struct nft_trans_rule { (((struct nft_trans_rule *)trans->data)->flow) #define nft_trans_rule_id(trans) \ (((struct nft_trans_rule *)trans->data)->rule_id) +#define nft_trans_rule_bound(trans) \ + (((struct nft_trans_rule *)trans->data)->bound)
struct nft_trans_set { struct nft_set *set; @@ -1599,15 +1612,19 @@ struct nft_trans_set { (((struct nft_trans_set *)trans->data)->gc_int)
struct nft_trans_chain { + struct nft_chain *chain; bool update; char *name; struct nft_stats __percpu *stats; u8 policy; + bool bound; u32 chain_id; struct nft_base_chain *basechain; struct list_head hook_list; };
+#define nft_trans_chain(trans) \ + (((struct nft_trans_chain *)trans->data)->chain) #define nft_trans_chain_update(trans) \ (((struct nft_trans_chain *)trans->data)->update) #define nft_trans_chain_name(trans) \ @@ -1616,6 +1633,8 @@ struct nft_trans_chain { (((struct nft_trans_chain *)trans->data)->stats) #define nft_trans_chain_policy(trans) \ (((struct nft_trans_chain *)trans->data)->policy) +#define nft_trans_chain_bound(trans) \ + (((struct nft_trans_chain *)trans->data)->bound) #define nft_trans_chain_id(trans) \ (((struct nft_trans_chain *)trans->data)->chain_id) #define nft_trans_basechain(trans) \ diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 8f63514656a17..281ee688128ab 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -195,6 +195,48 @@ static void nft_set_trans_bind(const struct nft_ctx *ctx, struct nft_set *set) } }
+static void nft_chain_trans_bind(const struct nft_ctx *ctx, struct nft_chain *chain) +{ + struct nftables_pernet *nft_net; + struct net *net = ctx->net; + struct nft_trans *trans; + + if (!nft_chain_binding(chain)) + return; + + nft_net = nft_pernet(net); + list_for_each_entry_reverse(trans, &nft_net->commit_list, list) { + switch (trans->msg_type) { + case NFT_MSG_NEWCHAIN: + if (nft_trans_chain(trans) == chain) + nft_trans_chain_bound(trans) = true; + break; + case NFT_MSG_NEWRULE: + if (trans->ctx.chain == chain) + nft_trans_rule_bound(trans) = true; + break; + } + } +} + +int nf_tables_bind_chain(const struct nft_ctx *ctx, struct nft_chain *chain) +{ + if (!nft_chain_binding(chain)) + return 0; + + if (nft_chain_binding(ctx->chain)) + return -EOPNOTSUPP; + + if (chain->bound) + return -EBUSY; + + chain->bound = true; + chain->use++; + nft_chain_trans_bind(ctx, chain); + + return 0; +} + static int nft_netdev_register_hooks(struct net *net, struct list_head *hook_list) { @@ -340,8 +382,9 @@ static struct nft_trans *nft_trans_chain_add(struct nft_ctx *ctx, int msg_type) ntohl(nla_get_be32(ctx->nla[NFTA_CHAIN_ID])); } } - + nft_trans_chain(trans) = ctx->chain; nft_trans_commit_list_add_tail(ctx->net, trans); + return trans; }
@@ -359,8 +402,7 @@ static int nft_delchain(struct nft_ctx *ctx) return 0; }
-static void nft_rule_expr_activate(const struct nft_ctx *ctx, - struct nft_rule *rule) +void nft_rule_expr_activate(const struct nft_ctx *ctx, struct nft_rule *rule) { struct nft_expr *expr;
@@ -373,9 +415,8 @@ static void nft_rule_expr_activate(const struct nft_ctx *ctx, } }
-static void nft_rule_expr_deactivate(const struct nft_ctx *ctx, - struct nft_rule *rule, - enum nft_trans_phase phase) +void nft_rule_expr_deactivate(const struct nft_ctx *ctx, struct nft_rule *rule, + enum nft_trans_phase phase) { struct nft_expr *expr;
@@ -2221,7 +2262,7 @@ static int nft_basechain_init(struct nft_base_chain *basechain, u8 family, return 0; }
-static int nft_chain_add(struct nft_table *table, struct nft_chain *chain) +int nft_chain_add(struct nft_table *table, struct nft_chain *chain) { int err;
@@ -3427,8 +3468,7 @@ static int nf_tables_getrule(struct sk_buff *skb, const struct nfnl_info *info, return err; }
-static void nf_tables_rule_destroy(const struct nft_ctx *ctx, - struct nft_rule *rule) +void nf_tables_rule_destroy(const struct nft_ctx *ctx, struct nft_rule *rule) { struct nft_expr *expr, *next;
@@ -3445,7 +3485,7 @@ static void nf_tables_rule_destroy(const struct nft_ctx *ctx, kfree(rule); }
-void nf_tables_rule_release(const struct nft_ctx *ctx, struct nft_rule *rule) +static void nf_tables_rule_release(const struct nft_ctx *ctx, struct nft_rule *rule) { nft_rule_expr_deactivate(ctx, rule, NFT_TRANS_RELEASE); nf_tables_rule_destroy(ctx, rule); @@ -6570,7 +6610,6 @@ static int nf_tables_newsetelem(struct sk_buff *skb, void nft_data_hold(const struct nft_data *data, enum nft_data_types type) { struct nft_chain *chain; - struct nft_rule *rule;
if (type == NFT_DATA_VERDICT) { switch (data->verdict.code) { @@ -6578,15 +6617,6 @@ void nft_data_hold(const struct nft_data *data, enum nft_data_types type) case NFT_GOTO: chain = data->verdict.chain; chain->use++; - - if (!nft_chain_is_bound(chain)) - break; - - chain->table->use++; - list_for_each_entry(rule, &chain->rules, list) - chain->use++; - - nft_chain_add(chain->table, chain); break; } } @@ -9583,7 +9613,7 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action) kfree(nft_trans_chain_name(trans)); nft_trans_destroy(trans); } else { - if (nft_chain_is_bound(trans->ctx.chain)) { + if (nft_trans_chain_bound(trans)) { nft_trans_destroy(trans); break; } @@ -9601,6 +9631,10 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action) nft_trans_destroy(trans); break; case NFT_MSG_NEWRULE: + if (nft_trans_rule_bound(trans)) { + nft_trans_destroy(trans); + break; + } trans->ctx.chain->use--; list_del_rcu(&nft_trans_rule(trans)->list); nft_rule_expr_deactivate(&trans->ctx, @@ -10164,22 +10198,12 @@ static int nft_verdict_init(const struct nft_ctx *ctx, struct nft_data *data, static void nft_verdict_uninit(const struct nft_data *data) { struct nft_chain *chain; - struct nft_rule *rule;
switch (data->verdict.code) { case NFT_JUMP: case NFT_GOTO: chain = data->verdict.chain; chain->use--; - - if (!nft_chain_is_bound(chain)) - break; - - chain->table->use--; - list_for_each_entry(rule, &chain->rules, list) - chain->use--; - - nft_chain_del(chain); break; } } diff --git a/net/netfilter/nft_immediate.c b/net/netfilter/nft_immediate.c index c9d2f7c29f530..0492a04064f11 100644 --- a/net/netfilter/nft_immediate.c +++ b/net/netfilter/nft_immediate.c @@ -76,11 +76,9 @@ static int nft_immediate_init(const struct nft_ctx *ctx, switch (priv->data.verdict.code) { case NFT_JUMP: case NFT_GOTO: - if (nft_chain_is_bound(chain)) { - err = -EBUSY; - goto err1; - } - chain->bound = true; + err = nf_tables_bind_chain(ctx, chain); + if (err < 0) + return err; break; default: break; @@ -98,6 +96,31 @@ static void nft_immediate_activate(const struct nft_ctx *ctx, const struct nft_expr *expr) { const struct nft_immediate_expr *priv = nft_expr_priv(expr); + const struct nft_data *data = &priv->data; + struct nft_ctx chain_ctx; + struct nft_chain *chain; + struct nft_rule *rule; + + if (priv->dreg == NFT_REG_VERDICT) { + switch (data->verdict.code) { + case NFT_JUMP: + case NFT_GOTO: + chain = data->verdict.chain; + if (!nft_chain_binding(chain)) + break; + + chain_ctx = *ctx; + chain_ctx.chain = chain; + + list_for_each_entry(rule, &chain->rules, list) + nft_rule_expr_activate(&chain_ctx, rule); + + nft_clear(ctx->net, chain); + break; + default: + break; + } + }
return nft_data_hold(&priv->data, nft_dreg_to_type(priv->dreg)); } @@ -107,6 +130,40 @@ static void nft_immediate_deactivate(const struct nft_ctx *ctx, enum nft_trans_phase phase) { const struct nft_immediate_expr *priv = nft_expr_priv(expr); + const struct nft_data *data = &priv->data; + struct nft_ctx chain_ctx; + struct nft_chain *chain; + struct nft_rule *rule; + + if (priv->dreg == NFT_REG_VERDICT) { + switch (data->verdict.code) { + case NFT_JUMP: + case NFT_GOTO: + chain = data->verdict.chain; + if (!nft_chain_binding(chain)) + break; + + chain_ctx = *ctx; + chain_ctx.chain = chain; + + list_for_each_entry(rule, &chain->rules, list) + nft_rule_expr_deactivate(&chain_ctx, rule, phase); + + switch (phase) { + case NFT_TRANS_PREPARE: + nft_deactivate_next(ctx->net, chain); + break; + default: + nft_chain_del(chain); + chain->bound = false; + chain->table->use--; + break; + } + break; + default: + break; + } + }
if (phase == NFT_TRANS_COMMIT) return; @@ -131,15 +188,27 @@ static void nft_immediate_destroy(const struct nft_ctx *ctx, case NFT_GOTO: chain = data->verdict.chain;
- if (!nft_chain_is_bound(chain)) + if (!nft_chain_binding(chain)) + break; + + /* Rule construction failed, but chain is already bound: + * let the transaction records release this chain and its rules. + */ + if (chain->bound) { + chain->use--; break; + }
+ /* Rule has been deleted, release chain and its rules. */ chain_ctx = *ctx; chain_ctx.chain = chain;
- list_for_each_entry_safe(rule, n, &chain->rules, list) - nf_tables_rule_release(&chain_ctx, rule); - + chain->use--; + list_for_each_entry_safe(rule, n, &chain->rules, list) { + chain->use--; + list_del(&rule->list); + nf_tables_rule_destroy(&chain_ctx, rule); + } nf_tables_chain_destroy(&chain_ctx); break; default:
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit 26b5a5712eb85e253724e56a54c17f8519bd8e4e ]
Add a new state to deal with rule expressions deactivation from the newrule error path, otherwise the anonymous set remains in the list in inactive state for the next generation. Mark the set/chain transaction as unbound so the abort path releases this object, set it as inactive in the next generation so it is not reachable anymore from this transaction and reference counter is dropped.
Fixes: 1240eb93f061 ("netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/netfilter/nf_tables.h | 2 ++ net/netfilter/nf_tables_api.c | 45 ++++++++++++++++++++++++++----- net/netfilter/nft_immediate.c | 3 +++ 3 files changed, 43 insertions(+), 7 deletions(-)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index 91bf88dab245f..55c14754e5e4e 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -901,6 +901,7 @@ struct nft_expr_type {
enum nft_trans_phase { NFT_TRANS_PREPARE, + NFT_TRANS_PREPARE_ERROR, NFT_TRANS_ABORT, NFT_TRANS_COMMIT, NFT_TRANS_RELEASE @@ -1096,6 +1097,7 @@ int nft_setelem_validate(const struct nft_ctx *ctx, struct nft_set *set, struct nft_set_elem *elem); int nft_set_catchall_validate(const struct nft_ctx *ctx, struct nft_set *set); int nf_tables_bind_chain(const struct nft_ctx *ctx, struct nft_chain *chain); +void nf_tables_unbind_chain(const struct nft_ctx *ctx, struct nft_chain *chain);
enum nft_chain_types { NFT_CHAIN_T_DEFAULT = 0, diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 281ee688128ab..837925cf0b848 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -171,7 +171,8 @@ static void nft_trans_destroy(struct nft_trans *trans) kfree(trans); }
-static void nft_set_trans_bind(const struct nft_ctx *ctx, struct nft_set *set) +static void __nft_set_trans_bind(const struct nft_ctx *ctx, struct nft_set *set, + bool bind) { struct nftables_pernet *nft_net; struct net *net = ctx->net; @@ -185,17 +186,28 @@ static void nft_set_trans_bind(const struct nft_ctx *ctx, struct nft_set *set) switch (trans->msg_type) { case NFT_MSG_NEWSET: if (nft_trans_set(trans) == set) - nft_trans_set_bound(trans) = true; + nft_trans_set_bound(trans) = bind; break; case NFT_MSG_NEWSETELEM: if (nft_trans_elem_set(trans) == set) - nft_trans_elem_set_bound(trans) = true; + nft_trans_elem_set_bound(trans) = bind; break; } } }
-static void nft_chain_trans_bind(const struct nft_ctx *ctx, struct nft_chain *chain) +static void nft_set_trans_bind(const struct nft_ctx *ctx, struct nft_set *set) +{ + return __nft_set_trans_bind(ctx, set, true); +} + +static void nft_set_trans_unbind(const struct nft_ctx *ctx, struct nft_set *set) +{ + return __nft_set_trans_bind(ctx, set, false); +} + +static void __nft_chain_trans_bind(const struct nft_ctx *ctx, + struct nft_chain *chain, bool bind) { struct nftables_pernet *nft_net; struct net *net = ctx->net; @@ -209,16 +221,22 @@ static void nft_chain_trans_bind(const struct nft_ctx *ctx, struct nft_chain *ch switch (trans->msg_type) { case NFT_MSG_NEWCHAIN: if (nft_trans_chain(trans) == chain) - nft_trans_chain_bound(trans) = true; + nft_trans_chain_bound(trans) = bind; break; case NFT_MSG_NEWRULE: if (trans->ctx.chain == chain) - nft_trans_rule_bound(trans) = true; + nft_trans_rule_bound(trans) = bind; break; } } }
+static void nft_chain_trans_bind(const struct nft_ctx *ctx, + struct nft_chain *chain) +{ + __nft_chain_trans_bind(ctx, chain, true); +} + int nf_tables_bind_chain(const struct nft_ctx *ctx, struct nft_chain *chain) { if (!nft_chain_binding(chain)) @@ -237,6 +255,11 @@ int nf_tables_bind_chain(const struct nft_ctx *ctx, struct nft_chain *chain) return 0; }
+void nf_tables_unbind_chain(const struct nft_ctx *ctx, struct nft_chain *chain) +{ + __nft_chain_trans_bind(ctx, chain, false); +} + static int nft_netdev_register_hooks(struct net *net, struct list_head *hook_list) { @@ -3821,7 +3844,7 @@ static int nf_tables_newrule(struct sk_buff *skb, const struct nfnl_info *info, if (flow) nft_flow_rule_destroy(flow); err_release_rule: - nft_rule_expr_deactivate(&ctx, rule, NFT_TRANS_PREPARE); + nft_rule_expr_deactivate(&ctx, rule, NFT_TRANS_PREPARE_ERROR); nf_tables_rule_destroy(&ctx, rule); err_release_expr: for (i = 0; i < n; i++) { @@ -5114,6 +5137,13 @@ void nf_tables_deactivate_set(const struct nft_ctx *ctx, struct nft_set *set, enum nft_trans_phase phase) { switch (phase) { + case NFT_TRANS_PREPARE_ERROR: + nft_set_trans_unbind(ctx, set); + if (nft_set_is_anonymous(set)) + nft_deactivate_next(ctx->net, set); + + set->use--; + break; case NFT_TRANS_PREPARE: if (nft_set_is_anonymous(set)) nft_deactivate_next(ctx->net, set); @@ -7626,6 +7656,7 @@ void nf_tables_deactivate_flowtable(const struct nft_ctx *ctx, enum nft_trans_phase phase) { switch (phase) { + case NFT_TRANS_PREPARE_ERROR: case NFT_TRANS_PREPARE: case NFT_TRANS_ABORT: case NFT_TRANS_RELEASE: diff --git a/net/netfilter/nft_immediate.c b/net/netfilter/nft_immediate.c index 0492a04064f11..3d76ebfe8939b 100644 --- a/net/netfilter/nft_immediate.c +++ b/net/netfilter/nft_immediate.c @@ -150,6 +150,9 @@ static void nft_immediate_deactivate(const struct nft_ctx *ctx, nft_rule_expr_deactivate(&chain_ctx, rule, phase);
switch (phase) { + case NFT_TRANS_PREPARE_ERROR: + nf_tables_unbind_chain(ctx, chain); + fallthrough; case NFT_TRANS_PREPARE: nft_deactivate_next(ctx->net, chain); break;
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit 628bd3e49cba1c066228e23d71a852c23e26da73 ]
set .destroy callback releases the references to other objects in maps. This is very late and it results in spurious EBUSY errors. Drop refcount from the preparation phase instead, update set backend not to drop reference counter from set .destroy path.
Exceptions: NFT_TRANS_PREPARE_ERROR does not require to drop the reference counter because the transaction abort path releases the map references for each element since the set is unbound. The abort path also deals with releasing reference counter for new elements added to unbound sets.
Fixes: 591054469b3e ("netfilter: nf_tables: revisit chain/object refcounting from elements") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/netfilter/nf_tables.h | 5 +- net/netfilter/nf_tables_api.c | 147 ++++++++++++++++++++++++++---- net/netfilter/nft_set_bitmap.c | 5 +- net/netfilter/nft_set_hash.c | 23 ++++- net/netfilter/nft_set_pipapo.c | 14 ++- net/netfilter/nft_set_rbtree.c | 5 +- 6 files changed, 167 insertions(+), 32 deletions(-)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index 55c14754e5e4e..a0f63c7d1d687 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -472,7 +472,8 @@ struct nft_set_ops { int (*init)(const struct nft_set *set, const struct nft_set_desc *desc, const struct nlattr * const nla[]); - void (*destroy)(const struct nft_set *set); + void (*destroy)(const struct nft_ctx *ctx, + const struct nft_set *set); void (*gc_init)(const struct nft_set *set);
unsigned int elemsize; @@ -809,6 +810,8 @@ int nft_set_elem_expr_clone(const struct nft_ctx *ctx, struct nft_set *set, struct nft_expr *expr_array[]); void nft_set_elem_destroy(const struct nft_set *set, void *elem, bool destroy_expr); +void nf_tables_set_elem_destroy(const struct nft_ctx *ctx, + const struct nft_set *set, void *elem);
/** * struct nft_set_gc_batch_head - nf_tables set garbage collection batch diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 837925cf0b848..dee60bad5c198 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -561,6 +561,58 @@ static int nft_trans_set_add(const struct nft_ctx *ctx, int msg_type, return __nft_trans_set_add(ctx, msg_type, set, NULL); }
+static void nft_setelem_data_deactivate(const struct net *net, + const struct nft_set *set, + struct nft_set_elem *elem); + +static int nft_mapelem_deactivate(const struct nft_ctx *ctx, + struct nft_set *set, + const struct nft_set_iter *iter, + struct nft_set_elem *elem) +{ + nft_setelem_data_deactivate(ctx->net, set, elem); + + return 0; +} + +struct nft_set_elem_catchall { + struct list_head list; + struct rcu_head rcu; + void *elem; +}; + +static void nft_map_catchall_deactivate(const struct nft_ctx *ctx, + struct nft_set *set) +{ + u8 genmask = nft_genmask_next(ctx->net); + struct nft_set_elem_catchall *catchall; + struct nft_set_elem elem; + struct nft_set_ext *ext; + + list_for_each_entry(catchall, &set->catchall_list, list) { + ext = nft_set_elem_ext(set, catchall->elem); + if (!nft_set_elem_active(ext, genmask)) + continue; + + elem.priv = catchall->elem; + nft_setelem_data_deactivate(ctx->net, set, &elem); + break; + } +} + +static void nft_map_deactivate(const struct nft_ctx *ctx, struct nft_set *set) +{ + struct nft_set_iter iter = { + .genmask = nft_genmask_next(ctx->net), + .fn = nft_mapelem_deactivate, + }; + + set->ops->walk(ctx, set, &iter); + WARN_ON_ONCE(iter.err); + + nft_map_catchall_deactivate(ctx, set); +} + static int nft_delset(const struct nft_ctx *ctx, struct nft_set *set) { int err; @@ -569,6 +621,9 @@ static int nft_delset(const struct nft_ctx *ctx, struct nft_set *set) if (err < 0) return err;
+ if (set->flags & (NFT_SET_MAP | NFT_SET_OBJECT)) + nft_map_deactivate(ctx, set); + nft_deactivate_next(ctx->net, set); ctx->table->use--;
@@ -3596,12 +3651,6 @@ int nft_setelem_validate(const struct nft_ctx *ctx, struct nft_set *set, return 0; }
-struct nft_set_elem_catchall { - struct list_head list; - struct rcu_head rcu; - void *elem; -}; - int nft_set_catchall_validate(const struct nft_ctx *ctx, struct nft_set *set) { u8 genmask = nft_genmask_next(ctx->net); @@ -4928,7 +4977,7 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, for (i = 0; i < set->num_exprs; i++) nft_expr_destroy(&ctx, set->exprs[i]); err_set_destroy: - ops->destroy(set); + ops->destroy(&ctx, set); err_set_init: kfree(set->name); err_set_name: @@ -4943,7 +4992,7 @@ static void nft_set_catchall_destroy(const struct nft_ctx *ctx,
list_for_each_entry_safe(catchall, next, &set->catchall_list, list) { list_del_rcu(&catchall->list); - nft_set_elem_destroy(set, catchall->elem, true); + nf_tables_set_elem_destroy(ctx, set, catchall->elem); kfree_rcu(catchall, rcu); } } @@ -4958,7 +5007,7 @@ static void nft_set_destroy(const struct nft_ctx *ctx, struct nft_set *set) for (i = 0; i < set->num_exprs; i++) nft_expr_destroy(ctx, set->exprs[i]);
- set->ops->destroy(set); + set->ops->destroy(ctx, set); nft_set_catchall_destroy(ctx, set); kfree(set->name); kvfree(set); @@ -5123,10 +5172,60 @@ static void nf_tables_unbind_set(const struct nft_ctx *ctx, struct nft_set *set, } }
+static void nft_setelem_data_activate(const struct net *net, + const struct nft_set *set, + struct nft_set_elem *elem); + +static int nft_mapelem_activate(const struct nft_ctx *ctx, + struct nft_set *set, + const struct nft_set_iter *iter, + struct nft_set_elem *elem) +{ + nft_setelem_data_activate(ctx->net, set, elem); + + return 0; +} + +static void nft_map_catchall_activate(const struct nft_ctx *ctx, + struct nft_set *set) +{ + u8 genmask = nft_genmask_next(ctx->net); + struct nft_set_elem_catchall *catchall; + struct nft_set_elem elem; + struct nft_set_ext *ext; + + list_for_each_entry(catchall, &set->catchall_list, list) { + ext = nft_set_elem_ext(set, catchall->elem); + if (!nft_set_elem_active(ext, genmask)) + continue; + + elem.priv = catchall->elem; + nft_setelem_data_activate(ctx->net, set, &elem); + break; + } +} + +static void nft_map_activate(const struct nft_ctx *ctx, struct nft_set *set) +{ + struct nft_set_iter iter = { + .genmask = nft_genmask_next(ctx->net), + .fn = nft_mapelem_activate, + }; + + set->ops->walk(ctx, set, &iter); + WARN_ON_ONCE(iter.err); + + nft_map_catchall_activate(ctx, set); +} + void nf_tables_activate_set(const struct nft_ctx *ctx, struct nft_set *set) { - if (nft_set_is_anonymous(set)) + if (nft_set_is_anonymous(set)) { + if (set->flags & (NFT_SET_MAP | NFT_SET_OBJECT)) + nft_map_activate(ctx, set); + nft_clear(ctx->net, set); + }
set->use++; } @@ -5145,13 +5244,20 @@ void nf_tables_deactivate_set(const struct nft_ctx *ctx, struct nft_set *set, set->use--; break; case NFT_TRANS_PREPARE: - if (nft_set_is_anonymous(set)) - nft_deactivate_next(ctx->net, set); + if (nft_set_is_anonymous(set)) { + if (set->flags & (NFT_SET_MAP | NFT_SET_OBJECT)) + nft_map_deactivate(ctx, set);
+ nft_deactivate_next(ctx->net, set); + } set->use--; return; case NFT_TRANS_ABORT: case NFT_TRANS_RELEASE: + if (nft_set_is_anonymous(set) && + set->flags & (NFT_SET_MAP | NFT_SET_OBJECT)) + nft_map_deactivate(ctx, set); + set->use--; fallthrough; default: @@ -5904,6 +6010,7 @@ static void nft_set_elem_expr_destroy(const struct nft_ctx *ctx, __nft_set_elem_expr_destroy(ctx, expr); }
+/* Drop references and destroy. Called from gc, dynset and abort path. */ void nft_set_elem_destroy(const struct nft_set *set, void *elem, bool destroy_expr) { @@ -5925,11 +6032,11 @@ void nft_set_elem_destroy(const struct nft_set *set, void *elem, } EXPORT_SYMBOL_GPL(nft_set_elem_destroy);
-/* Only called from commit path, nft_setelem_data_deactivate() already deals - * with the refcounting from the preparation phase. +/* Destroy element. References have been already dropped in the preparation + * path via nft_setelem_data_deactivate(). */ -static void nf_tables_set_elem_destroy(const struct nft_ctx *ctx, - const struct nft_set *set, void *elem) +void nf_tables_set_elem_destroy(const struct nft_ctx *ctx, + const struct nft_set *set, void *elem) { struct nft_set_ext *ext = nft_set_elem_ext(set, elem);
@@ -6562,7 +6669,7 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set, if (obj) obj->use--; err_elem_userdata: - nf_tables_set_elem_destroy(ctx, set, elem.priv); + nft_set_elem_destroy(set, elem.priv, true); err_parse_data: if (nla[NFTA_SET_ELEM_DATA] != NULL) nft_data_release(&elem.data.val, desc.type); @@ -9700,6 +9807,9 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action) case NFT_MSG_DESTROYSET: trans->ctx.table->use++; nft_clear(trans->ctx.net, nft_trans_set(trans)); + if (nft_trans_set(trans)->flags & (NFT_SET_MAP | NFT_SET_OBJECT)) + nft_map_activate(&trans->ctx, nft_trans_set(trans)); + nft_trans_destroy(trans); break; case NFT_MSG_NEWSETELEM: @@ -10469,6 +10579,9 @@ static void __nft_release_table(struct net *net, struct nft_table *table) list_for_each_entry_safe(set, ns, &table->sets, list) { list_del(&set->list); table->use--; + if (set->flags & (NFT_SET_MAP | NFT_SET_OBJECT)) + nft_map_deactivate(&ctx, set); + nft_set_destroy(&ctx, set); } list_for_each_entry_safe(obj, ne, &table->objects, list) { diff --git a/net/netfilter/nft_set_bitmap.c b/net/netfilter/nft_set_bitmap.c index 96081ac8d2b4c..1e5e7a181e0bc 100644 --- a/net/netfilter/nft_set_bitmap.c +++ b/net/netfilter/nft_set_bitmap.c @@ -271,13 +271,14 @@ static int nft_bitmap_init(const struct nft_set *set, return 0; }
-static void nft_bitmap_destroy(const struct nft_set *set) +static void nft_bitmap_destroy(const struct nft_ctx *ctx, + const struct nft_set *set) { struct nft_bitmap *priv = nft_set_priv(set); struct nft_bitmap_elem *be, *n;
list_for_each_entry_safe(be, n, &priv->list, head) - nft_set_elem_destroy(set, be, true); + nf_tables_set_elem_destroy(ctx, set, be); }
static bool nft_bitmap_estimate(const struct nft_set_desc *desc, u32 features, diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c index 76de6c8d98655..0b73cb0e752f7 100644 --- a/net/netfilter/nft_set_hash.c +++ b/net/netfilter/nft_set_hash.c @@ -400,19 +400,31 @@ static int nft_rhash_init(const struct nft_set *set, return 0; }
+struct nft_rhash_ctx { + const struct nft_ctx ctx; + const struct nft_set *set; +}; + static void nft_rhash_elem_destroy(void *ptr, void *arg) { - nft_set_elem_destroy(arg, ptr, true); + struct nft_rhash_ctx *rhash_ctx = arg; + + nf_tables_set_elem_destroy(&rhash_ctx->ctx, rhash_ctx->set, ptr); }
-static void nft_rhash_destroy(const struct nft_set *set) +static void nft_rhash_destroy(const struct nft_ctx *ctx, + const struct nft_set *set) { struct nft_rhash *priv = nft_set_priv(set); + struct nft_rhash_ctx rhash_ctx = { + .ctx = *ctx, + .set = set, + };
cancel_delayed_work_sync(&priv->gc_work); rcu_barrier(); rhashtable_free_and_destroy(&priv->ht, nft_rhash_elem_destroy, - (void *)set); + (void *)&rhash_ctx); }
/* Number of buckets is stored in u32, so cap our result to 1U<<31 */ @@ -643,7 +655,8 @@ static int nft_hash_init(const struct nft_set *set, return 0; }
-static void nft_hash_destroy(const struct nft_set *set) +static void nft_hash_destroy(const struct nft_ctx *ctx, + const struct nft_set *set) { struct nft_hash *priv = nft_set_priv(set); struct nft_hash_elem *he; @@ -653,7 +666,7 @@ static void nft_hash_destroy(const struct nft_set *set) for (i = 0; i < priv->buckets; i++) { hlist_for_each_entry_safe(he, next, &priv->table[i], node) { hlist_del_rcu(&he->node); - nft_set_elem_destroy(set, he, true); + nf_tables_set_elem_destroy(ctx, set, he); } } } diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c index 15e451dc3fc46..c867b5b772e86 100644 --- a/net/netfilter/nft_set_pipapo.c +++ b/net/netfilter/nft_set_pipapo.c @@ -2148,10 +2148,12 @@ static int nft_pipapo_init(const struct nft_set *set,
/** * nft_set_pipapo_match_destroy() - Destroy elements from key mapping array + * @ctx: context * @set: nftables API set representation * @m: matching data pointing to key mapping array */ -static void nft_set_pipapo_match_destroy(const struct nft_set *set, +static void nft_set_pipapo_match_destroy(const struct nft_ctx *ctx, + const struct nft_set *set, struct nft_pipapo_match *m) { struct nft_pipapo_field *f; @@ -2168,15 +2170,17 @@ static void nft_set_pipapo_match_destroy(const struct nft_set *set,
e = f->mt[r].e;
- nft_set_elem_destroy(set, e, true); + nf_tables_set_elem_destroy(ctx, set, e); } }
/** * nft_pipapo_destroy() - Free private data for set and all committed elements + * @ctx: context * @set: nftables API set representation */ -static void nft_pipapo_destroy(const struct nft_set *set) +static void nft_pipapo_destroy(const struct nft_ctx *ctx, + const struct nft_set *set) { struct nft_pipapo *priv = nft_set_priv(set); struct nft_pipapo_match *m; @@ -2186,7 +2190,7 @@ static void nft_pipapo_destroy(const struct nft_set *set) if (m) { rcu_barrier();
- nft_set_pipapo_match_destroy(set, m); + nft_set_pipapo_match_destroy(ctx, set, m);
#ifdef NFT_PIPAPO_ALIGN free_percpu(m->scratch_aligned); @@ -2203,7 +2207,7 @@ static void nft_pipapo_destroy(const struct nft_set *set) m = priv->clone;
if (priv->dirty) - nft_set_pipapo_match_destroy(set, m); + nft_set_pipapo_match_destroy(ctx, set, m);
#ifdef NFT_PIPAPO_ALIGN free_percpu(priv->clone->scratch_aligned); diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c index 2f114aa10f1a7..5c05c9b990fba 100644 --- a/net/netfilter/nft_set_rbtree.c +++ b/net/netfilter/nft_set_rbtree.c @@ -664,7 +664,8 @@ static int nft_rbtree_init(const struct nft_set *set, return 0; }
-static void nft_rbtree_destroy(const struct nft_set *set) +static void nft_rbtree_destroy(const struct nft_ctx *ctx, + const struct nft_set *set) { struct nft_rbtree *priv = nft_set_priv(set); struct nft_rbtree_elem *rbe; @@ -675,7 +676,7 @@ static void nft_rbtree_destroy(const struct nft_set *set) while ((node = priv->root.rb_node) != NULL) { rb_erase(node, &priv->root); rbe = rb_entry(node, struct nft_rbtree_elem, node); - nft_set_elem_destroy(set, rbe, true); + nf_tables_set_elem_destroy(ctx, set, rbe); } }
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit 2b84e215f87443c74ac0aa7f76bb172d43a87033 ]
The .walk callback iterates over the current active set, but it might be useful to iterate over the next generation set. Use the generation mask to determine what set view (either current or next generation) is use for the walk iteration.
Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_set_pipapo.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c index c867b5b772e86..0452ee586c1cc 100644 --- a/net/netfilter/nft_set_pipapo.c +++ b/net/netfilter/nft_set_pipapo.c @@ -1974,12 +1974,16 @@ static void nft_pipapo_walk(const struct nft_ctx *ctx, struct nft_set *set, struct nft_set_iter *iter) { struct nft_pipapo *priv = nft_set_priv(set); + struct net *net = read_pnet(&set->net); struct nft_pipapo_match *m; struct nft_pipapo_field *f; int i, r;
rcu_read_lock(); - m = rcu_dereference(priv->match); + if (iter->genmask == nft_genmask_cur(net)) + m = rcu_dereference(priv->match); + else + m = priv->clone;
if (unlikely(!m)) goto out;
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit c88c535b592d3baeee74009f3eceeeaf0fdd5e1b ]
Anonymous sets come with NFT_SET_CONSTANT from userspace. Although API allows to create anonymous sets without NFT_SET_CONSTANT, it makes no sense to allow to add and to delete elements for bound anonymous sets.
Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index dee60bad5c198..4bf532a2a60b9 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -6714,7 +6714,8 @@ static int nf_tables_newsetelem(struct sk_buff *skb, if (IS_ERR(set)) return PTR_ERR(set);
- if (!list_empty(&set->bindings) && set->flags & NFT_SET_CONSTANT) + if (!list_empty(&set->bindings) && + (set->flags & (NFT_SET_CONSTANT | NFT_SET_ANONYMOUS))) return -EBUSY;
nft_ctx_init(&ctx, net, skb, info->nlh, family, table, NULL, nla); @@ -6988,7 +6989,9 @@ static int nf_tables_delsetelem(struct sk_buff *skb, set = nft_set_lookup(table, nla[NFTA_SET_ELEM_LIST_SET], genmask); if (IS_ERR(set)) return PTR_ERR(set); - if (!list_empty(&set->bindings) && set->flags & NFT_SET_CONSTANT) + + if (!list_empty(&set->bindings) && + (set->flags & (NFT_SET_CONSTANT | NFT_SET_ANONYMOUS))) return -EBUSY;
nft_ctx_init(&ctx, net, skb, info->nlh, family, table, NULL, nla);
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit 938154b93be8cd611ddfd7bafc1849f3c4355201 ]
Add a new list to track set transaction and to check for unbound anonymous sets before entering the commit phase.
Bail out at the end of the transaction handling if an anonymous set remains unbound.
Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/netfilter/nf_tables.h | 3 +++ net/netfilter/nf_tables_api.c | 35 ++++++++++++++++++++++++++++--- 2 files changed, 35 insertions(+), 3 deletions(-)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index a0f63c7d1d687..730c63ceee567 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -1565,6 +1565,7 @@ static inline void nft_set_elem_clear_busy(struct nft_set_ext *ext) * struct nft_trans - nf_tables object update in transaction * * @list: used internally + * @binding_list: list of objects with possible bindings * @msg_type: message type * @put_net: ctx->net needs to be put * @ctx: transaction context @@ -1572,6 +1573,7 @@ static inline void nft_set_elem_clear_busy(struct nft_set_ext *ext) */ struct nft_trans { struct list_head list; + struct list_head binding_list; int msg_type; bool put_net; struct nft_ctx ctx; @@ -1716,6 +1718,7 @@ static inline int nft_request_module(struct net *net, const char *fmt, ...) { re struct nftables_pernet { struct list_head tables; struct list_head commit_list; + struct list_head binding_list; struct list_head module_list; struct list_head notify_list; struct mutex commit_mutex; diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 4bf532a2a60b9..2e3a374db1b9f 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -153,6 +153,7 @@ static struct nft_trans *nft_trans_alloc_gfp(const struct nft_ctx *ctx, return NULL;
INIT_LIST_HEAD(&trans->list); + INIT_LIST_HEAD(&trans->binding_list); trans->msg_type = msg_type; trans->ctx = *ctx;
@@ -165,9 +166,15 @@ static struct nft_trans *nft_trans_alloc(const struct nft_ctx *ctx, return nft_trans_alloc_gfp(ctx, msg_type, size, GFP_KERNEL); }
-static void nft_trans_destroy(struct nft_trans *trans) +static void nft_trans_list_del(struct nft_trans *trans) { list_del(&trans->list); + list_del(&trans->binding_list); +} + +static void nft_trans_destroy(struct nft_trans *trans) +{ + nft_trans_list_del(trans); kfree(trans); }
@@ -359,6 +366,14 @@ static void nft_trans_commit_list_add_tail(struct net *net, struct nft_trans *tr { struct nftables_pernet *nft_net = nft_pernet(net);
+ switch (trans->msg_type) { + case NFT_MSG_NEWSET: + if (!nft_trans_set_update(trans) && + nft_set_is_anonymous(nft_trans_set(trans))) + list_add_tail(&trans->binding_list, &nft_net->binding_list); + break; + } + list_add_tail(&trans->list, &nft_net->commit_list); }
@@ -9027,7 +9042,7 @@ static void nf_tables_trans_destroy_work(struct work_struct *w) synchronize_rcu();
list_for_each_entry_safe(trans, next, &head, list) { - list_del(&trans->list); + nft_trans_list_del(trans); nft_commit_release(trans); } } @@ -9394,6 +9409,19 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb) return 0; }
+ list_for_each_entry(trans, &nft_net->binding_list, binding_list) { + switch (trans->msg_type) { + case NFT_MSG_NEWSET: + if (!nft_trans_set_update(trans) && + nft_set_is_anonymous(nft_trans_set(trans)) && + !nft_trans_set_bound(trans)) { + pr_warn_once("nftables ruleset with unbound set\n"); + return -EINVAL; + } + break; + } + } + /* 0. Validate ruleset, otherwise roll back for error reporting. */ if (nf_tables_validate(net) < 0) return -EAGAIN; @@ -9893,7 +9921,7 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action)
list_for_each_entry_safe_reverse(trans, next, &nft_net->commit_list, list) { - list_del(&trans->list); + nft_trans_list_del(trans); nf_tables_abort_release(trans); }
@@ -10669,6 +10697,7 @@ static int __net_init nf_tables_init_net(struct net *net)
INIT_LIST_HEAD(&nft_net->tables); INIT_LIST_HEAD(&nft_net->commit_list); + INIT_LIST_HEAD(&nft_net->binding_list); INIT_LIST_HEAD(&nft_net->module_list); INIT_LIST_HEAD(&nft_net->notify_list); mutex_init(&nft_net->commit_mutex);
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit 62e1e94b246e685d89c3163aaef4b160e42ceb02 ]
Use binding list to track set transaction and to check for unbound chains before entering the commit phase.
Bail out if chain binding remain unused before entering the commit step.
Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 2e3a374db1b9f..f065542750376 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -372,6 +372,11 @@ static void nft_trans_commit_list_add_tail(struct net *net, struct nft_trans *tr nft_set_is_anonymous(nft_trans_set(trans))) list_add_tail(&trans->binding_list, &nft_net->binding_list); break; + case NFT_MSG_NEWCHAIN: + if (!nft_trans_chain_update(trans) && + nft_chain_binding(nft_trans_chain(trans))) + list_add_tail(&trans->binding_list, &nft_net->binding_list); + break; }
list_add_tail(&trans->list, &nft_net->commit_list); @@ -9419,6 +9424,14 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb) return -EINVAL; } break; + case NFT_MSG_NEWCHAIN: + if (!nft_trans_chain_update(trans) && + nft_chain_binding(nft_trans_chain(trans)) && + !nft_trans_chain_bound(trans)) { + pr_warn_once("nftables ruleset with unbound chain\n"); + return -EINVAL; + } + break; } }
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit b770283c98e0eee9133c47bc03b6cc625dc94723 ]
Disallow updates of set timeout and garbage collection parameters for anonymous sets.
Fixes: 123b99619cca ("netfilter: nf_tables: honor set timeout and garbage collection updates") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index f065542750376..b9e276820c722 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -4894,6 +4894,9 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, if (info->nlh->nlmsg_flags & NLM_F_REPLACE) return -EOPNOTSUPP;
+ if (nft_set_is_anonymous(set)) + return -EOPNOTSUPP; + err = nft_set_expr_alloc(&ctx, set, nla, exprs, &num_exprs, flags); if (err < 0) return err;
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit 62f9a68a36d4441a6c412b81faed102594bc6670 ]
Move the alias from xt_osf to nfnetlink_osf.
Fixes: f9324952088f ("netfilter: nfnetlink_osf: extract nfnetlink_subsystem code from xt_osf.c") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nfnetlink_osf.c | 1 + net/netfilter/xt_osf.c | 1 - 2 files changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/nfnetlink_osf.c b/net/netfilter/nfnetlink_osf.c index ee6840bd59337..8f1bfa6ccc2d9 100644 --- a/net/netfilter/nfnetlink_osf.c +++ b/net/netfilter/nfnetlink_osf.c @@ -439,3 +439,4 @@ module_init(nfnl_osf_init); module_exit(nfnl_osf_fini);
MODULE_LICENSE("GPL"); +MODULE_ALIAS_NFNL_SUBSYS(NFNL_SUBSYS_OSF); diff --git a/net/netfilter/xt_osf.c b/net/netfilter/xt_osf.c index e1990baf3a3b7..dc9485854002a 100644 --- a/net/netfilter/xt_osf.c +++ b/net/netfilter/xt_osf.c @@ -71,4 +71,3 @@ MODULE_AUTHOR("Evgeniy Polyakov zbr@ioremap.net"); MODULE_DESCRIPTION("Passive OS fingerprint matching."); MODULE_ALIAS("ipt_osf"); MODULE_ALIAS("ip6t_osf"); -MODULE_ALIAS_NFNL_SUBSYS(NFNL_SUBSYS_OSF);
From: Francesco Dolcini francesco.dolcini@toradex.com
[ Upstream commit a129b41fe0a8b4da828c46b10f5244ca07a3fec3 ]
This reverts commit da9ef50f545f86ffe6ff786174d26500c4db737a.
This fixes a regression in which the link would come up, but no communication was possible.
The reverted commit was also removing a comment about DP83867_PHYCR_FORCE_LINK_GOOD, this is not added back in this commits since it seems that this is unrelated to the original code change.
Closes: https://lore.kernel.org/all/ZGuDJos8D7N0J6Z2@francesco-nb.int.toradex.com/ Fixes: da9ef50f545f ("net: phy: dp83867: perform soft reset and retain established link") Signed-off-by: Francesco Dolcini francesco.dolcini@toradex.com Reviewed-by: Andrew Lunn andrew@lunn.ch Reviewed-by: Praneeth Bajjuri praneeth@ti.com Link: https://lore.kernel.org/r/20230619154435.355485-1-francesco@dolcini.it Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/phy/dp83867.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c index 9f7ff88200484..76fe7e7d9ac91 100644 --- a/drivers/net/phy/dp83867.c +++ b/drivers/net/phy/dp83867.c @@ -905,7 +905,7 @@ static int dp83867_phy_reset(struct phy_device *phydev) { int err;
- err = phy_write(phydev, DP83867_CTRL, DP83867_SW_RESTART); + err = phy_write(phydev, DP83867_CTRL, DP83867_SW_RESET); if (err < 0) return err;
From: Florent Revest revest@chromium.org
[ Upstream commit 9724160b3942b0a967b91a59f81da5593f28b8ba ]
When building a kernel with LLVM=1, LLVM_IAS=0 and CONFIG_KASAN=y, LLVM leaves DWARF tags for the "asan.module_ctor" & co symbols. In turn, pahole creates BTF_KIND_FUNC entries for these and this makes the BTF metadata validation fail because they contain a dot.
In a dramatic turn of event, this BTF verification failure can cause the netfilter_bpf initialization to fail, causing netfilter_core to free the netfilter_helper hashmap and netfilter_ftp to trigger a use-after-free. The risk of u-a-f in netfilter will be addressed separately but the existence of "asan.module_ctor" debug info under some build conditions sounds like a good enough reason to accept functions that contain dots in BTF.
Although using only LLVM=1 is the recommended way to compile clang-based kernels, users can certainly do LLVM=1, LLVM_IAS=0 as well and we still try to support that combination according to Nick. To clarify:
- > v5.10 kernel, LLVM=1 (LLVM_IAS=0 is not the default) is recommended, but user can still have LLVM=1, LLVM_IAS=0 to trigger the issue
- <= 5.10 kernel, LLVM=1 (LLVM_IAS=0 is the default) is recommended in which case GNU as will be used
Fixes: 1dc92851849c ("bpf: kernel side support for BTF Var and DataSec") Signed-off-by: Florent Revest revest@chromium.org Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: Andrii Nakryiko andrii@kernel.org Cc: Yonghong Song yhs@meta.com Cc: Nick Desaulniers ndesaulniers@google.com Link: https://lore.kernel.org/bpf/20230615145607.3469985-1-revest@chromium.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/btf.c | 20 ++++++++------------ 1 file changed, 8 insertions(+), 12 deletions(-)
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 46bce5577fb48..3cc3232f8c334 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -735,13 +735,12 @@ static bool btf_name_offset_valid(const struct btf *btf, u32 offset) return offset < btf->hdr.str_len; }
-static bool __btf_name_char_ok(char c, bool first, bool dot_ok) +static bool __btf_name_char_ok(char c, bool first) { if ((first ? !isalpha(c) : !isalnum(c)) && c != '_' && - ((c == '.' && !dot_ok) || - c != '.')) + c != '.') return false; return true; } @@ -758,20 +757,20 @@ static const char *btf_str_by_offset(const struct btf *btf, u32 offset) return NULL; }
-static bool __btf_name_valid(const struct btf *btf, u32 offset, bool dot_ok) +static bool __btf_name_valid(const struct btf *btf, u32 offset) { /* offset must be valid */ const char *src = btf_str_by_offset(btf, offset); const char *src_limit;
- if (!__btf_name_char_ok(*src, true, dot_ok)) + if (!__btf_name_char_ok(*src, true)) return false;
/* set a limit on identifier length */ src_limit = src + KSYM_NAME_LEN; src++; while (*src && src < src_limit) { - if (!__btf_name_char_ok(*src, false, dot_ok)) + if (!__btf_name_char_ok(*src, false)) return false; src++; } @@ -779,17 +778,14 @@ static bool __btf_name_valid(const struct btf *btf, u32 offset, bool dot_ok) return !*src; }
-/* Only C-style identifier is permitted. This can be relaxed if - * necessary. - */ static bool btf_name_valid_identifier(const struct btf *btf, u32 offset) { - return __btf_name_valid(btf, offset, false); + return __btf_name_valid(btf, offset); }
static bool btf_name_valid_section(const struct btf *btf, u32 offset) { - return __btf_name_valid(btf, offset, true); + return __btf_name_valid(btf, offset); }
static const char *__btf_name_by_offset(const struct btf *btf, u32 offset) @@ -4450,7 +4446,7 @@ static s32 btf_var_check_meta(struct btf_verifier_env *env, }
if (!t->name_off || - !__btf_name_valid(env->btf, t->name_off, true)) { + !__btf_name_valid(env->btf, t->name_off)) { btf_verifier_log_type(env, t, "Invalid name"); return -EINVAL; }
From: Jiri Olsa jolsa@kernel.org
[ Upstream commit db8eae6bc5c702d8e3ab2d0c6bb5976c131576eb ]
We currently allow to create perf link for program with expected_attach_type == BPF_TRACE_KPROBE_MULTI.
This will cause crash when we call helpers like get_attach_cookie or get_func_ip in such program, because it will call the kprobe_multi's version (current->bpf_ctx context setup) of those helpers while it expects perf_link's current->bpf_ctx context setup.
Making sure that we use BPF_TRACE_KPROBE_MULTI expected_attach_type only for programs attaching through kprobe_multi link.
Fixes: ca74823c6e16 ("bpf: Add cookie support to programs attached with kprobe multi link") Signed-off-by: Jiri Olsa jolsa@kernel.org Signed-off-by: Andrii Nakryiko andrii@kernel.org Signed-off-by: Daniel Borkmann daniel@iogearbox.net Link: https://lore.kernel.org/bpf/20230618131414.75649-1-jolsa@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/syscall.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index adc83cb82f379..8bd5e3741d418 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -3412,6 +3412,11 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog, return prog->enforce_expected_attach_type && prog->expected_attach_type != attach_type ? -EINVAL : 0; + case BPF_PROG_TYPE_KPROBE: + if (prog->expected_attach_type == BPF_TRACE_KPROBE_MULTI && + attach_type != BPF_TRACE_KPROBE_MULTI) + return -EINVAL; + return 0; default: return 0; }
From: Jens Axboe axboe@kernel.dk
[ Upstream commit 26fed83653d0154704cadb7afc418f315c7ac1f0 ]
Rather than assign the user pointer to msghdr->msg_control, assign it to msghdr->msg_control_user to make sparse happy. They are in a union so the end result is the same, but let's avoid new sparse warnings and squash this one.
Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202306210654.mDMcyMuB-lkp@intel.com/ Fixes: cac9e4418f4c ("io_uring/net: save msghdr->msg_control for retries") Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- io_uring/net.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/io_uring/net.c b/io_uring/net.c index 49dc43d1d9fd5..2cf671cef0b61 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -203,7 +203,7 @@ static int io_sendmsg_copy_hdr(struct io_kiocb *req, ret = sendmsg_copy_msghdr(&iomsg->msg, sr->umsg, sr->msg_flags, &iomsg->free_iov); /* save msg_control as sys_sendmsg() overwrites it */ - sr->msg_control = iomsg->msg.msg_control; + sr->msg_control = iomsg->msg.msg_control_user; return ret; }
@@ -302,7 +302,7 @@ int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
if (req_has_async_data(req)) { kmsg = req->async_data; - kmsg->msg.msg_control = sr->msg_control; + kmsg->msg.msg_control_user = sr->msg_control; } else { ret = io_sendmsg_copy_hdr(req, &iomsg); if (ret)
From: Danielle Ratson danieller@nvidia.com
[ Upstream commit c7c059fba6fb19c3bc924925c984772e733cb594 ]
When mirroring to a gretap in hardware the device expects to be programmed with the egress port and all the encapsulating headers. This requires the driver to resolve the path the packet will take in the software data path and program the device accordingly.
If the path cannot be resolved (in this case because of an unresolved neighbor), then mirror installation fails until the path is resolved. This results in a race that causes the test to sometimes fail.
Fix this by setting the neighbor's state to permanent in a couple of tests, so that it is always valid.
Fixes: 35c31d5c323f ("selftests: forwarding: Test mirror-to-gretap w/ UL 802.1d") Fixes: 239e754af854 ("selftests: forwarding: Test mirror-to-gretap w/ UL 802.1q") Signed-off-by: Danielle Ratson danieller@nvidia.com Reviewed-by: Petr Machata petrm@nvidia.com Signed-off-by: Petr Machata petrm@nvidia.com Link: https://lore.kernel.org/r/268816ac729cb6028c7a34d4dda6f4ec7af55333.168726460... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../testing/selftests/net/forwarding/mirror_gre_bridge_1d.sh | 4 ++++ .../testing/selftests/net/forwarding/mirror_gre_bridge_1q.sh | 4 ++++ 2 files changed, 8 insertions(+)
diff --git a/tools/testing/selftests/net/forwarding/mirror_gre_bridge_1d.sh b/tools/testing/selftests/net/forwarding/mirror_gre_bridge_1d.sh index c5095da7f6bf8..aec752a22e9ec 100755 --- a/tools/testing/selftests/net/forwarding/mirror_gre_bridge_1d.sh +++ b/tools/testing/selftests/net/forwarding/mirror_gre_bridge_1d.sh @@ -93,12 +93,16 @@ cleanup()
test_gretap() { + ip neigh replace 192.0.2.130 lladdr $(mac_get $h3) \ + nud permanent dev br2 full_test_span_gre_dir gt4 ingress 8 0 "mirror to gretap" full_test_span_gre_dir gt4 egress 0 8 "mirror to gretap" }
test_ip6gretap() { + ip neigh replace 2001:db8:2::2 lladdr $(mac_get $h3) \ + nud permanent dev br2 full_test_span_gre_dir gt6 ingress 8 0 "mirror to ip6gretap" full_test_span_gre_dir gt6 egress 0 8 "mirror to ip6gretap" } diff --git a/tools/testing/selftests/net/forwarding/mirror_gre_bridge_1q.sh b/tools/testing/selftests/net/forwarding/mirror_gre_bridge_1q.sh index 9ff22f28032dd..0cf4c47a46f9b 100755 --- a/tools/testing/selftests/net/forwarding/mirror_gre_bridge_1q.sh +++ b/tools/testing/selftests/net/forwarding/mirror_gre_bridge_1q.sh @@ -90,12 +90,16 @@ cleanup()
test_gretap() { + ip neigh replace 192.0.2.130 lladdr $(mac_get $h3) \ + nud permanent dev br1 full_test_span_gre_dir gt4 ingress 8 0 "mirror to gretap" full_test_span_gre_dir gt4 egress 0 8 "mirror to gretap" }
test_ip6gretap() { + ip neigh replace 2001:db8:2::2 lladdr $(mac_get $h3) \ + nud permanent dev br1 full_test_span_gre_dir gt6 ingress 8 0 "mirror to ip6gretap" full_test_span_gre_dir gt6 egress 0 8 "mirror to ip6gretap" }
From: Shyam Sundar S K Shyam-sundar.S-k@amd.com
[ Upstream commit 146b6f6855e7656e8329910606595220c761daac ]
Power source notify handler is getting registered even when none of the PMF feature in enabled leading to a crash.
... [ 22.592162] Call Trace: [ 22.592164] <TASK> [ 22.592164] ? rcu_note_context_switch+0x5e0/0x660 [ 22.592166] ? __warn+0x81/0x130 [ 22.592171] ? rcu_note_context_switch+0x5e0/0x660 [ 22.592172] ? report_bug+0x171/0x1a0 [ 22.592175] ? prb_read_valid+0x1b/0x30 [ 22.592177] ? handle_bug+0x3c/0x80 [ 22.592178] ? exc_invalid_op+0x17/0x70 [ 22.592179] ? asm_exc_invalid_op+0x1a/0x20 [ 22.592182] ? rcu_note_context_switch+0x5e0/0x660 [ 22.592183] ? acpi_ut_delete_object_desc+0x86/0xb0 [ 22.592186] ? acpi_ut_update_ref_count.part.0+0x22d/0x930 [ 22.592187] __schedule+0xc0/0x1410 [ 22.592189] ? ktime_get+0x3c/0xa0 [ 22.592191] ? lapic_next_event+0x1d/0x30 [ 22.592193] ? hrtimer_start_range_ns+0x25b/0x350 [ 22.592196] schedule+0x5e/0xd0 [ 22.592197] schedule_hrtimeout_range_clock+0xbe/0x140 [ 22.592199] ? __pfx_hrtimer_wakeup+0x10/0x10 [ 22.592200] usleep_range_state+0x64/0x90 [ 22.592203] amd_pmf_send_cmd+0x106/0x2a0 [amd_pmf bddfe0fe3712aaa99acce3d5487405c5213c6616] [ 22.592207] amd_pmf_update_slider+0x56/0x1b0 [amd_pmf bddfe0fe3712aaa99acce3d5487405c5213c6616] [ 22.592210] amd_pmf_set_sps_power_limits+0x72/0x80 [amd_pmf bddfe0fe3712aaa99acce3d5487405c5213c6616] [ 22.592213] amd_pmf_pwr_src_notify_call+0x49/0x90 [amd_pmf bddfe0fe3712aaa99acce3d5487405c5213c6616] [ 22.592216] notifier_call_chain+0x5a/0xd0 [ 22.592218] atomic_notifier_call_chain+0x32/0x50 ...
Fix this by moving the registration of source change notify handler only when SPS(Static Slider) is advertised as supported.
Reported-by: Allen Zhong allen@atr.me Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217571 Fixes: 4c71ae414474 ("platform/x86/amd/pmf: Add support SPS PMF feature") Tested-by: Patil Rajesh Reddy Patil.Reddy@amd.com Reviewed-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Shyam Sundar S K Shyam-sundar.S-k@amd.com Link: https://lore.kernel.org/r/20230622060309.310001-1-Shyam-sundar.S-k@amd.com Reviewed-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/platform/x86/amd/pmf/core.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/platform/x86/amd/pmf/core.c b/drivers/platform/x86/amd/pmf/core.c index dc9803e1a4b9b..73d2357e32f8e 100644 --- a/drivers/platform/x86/amd/pmf/core.c +++ b/drivers/platform/x86/amd/pmf/core.c @@ -297,6 +297,8 @@ static void amd_pmf_init_features(struct amd_pmf_dev *dev) /* Enable Static Slider */ if (is_apmf_func_supported(dev, APMF_FUNC_STATIC_SLIDER_GRANULAR)) { amd_pmf_init_sps(dev); + dev->pwr_src_notifier.notifier_call = amd_pmf_pwr_src_notify_call; + power_supply_reg_notifier(&dev->pwr_src_notifier); dev_dbg(dev->dev, "SPS enabled and Platform Profiles registered\n"); }
@@ -315,8 +317,10 @@ static void amd_pmf_init_features(struct amd_pmf_dev *dev)
static void amd_pmf_deinit_features(struct amd_pmf_dev *dev) { - if (is_apmf_func_supported(dev, APMF_FUNC_STATIC_SLIDER_GRANULAR)) + if (is_apmf_func_supported(dev, APMF_FUNC_STATIC_SLIDER_GRANULAR)) { + power_supply_unreg_notifier(&dev->pwr_src_notifier); amd_pmf_deinit_sps(dev); + }
if (is_apmf_func_supported(dev, APMF_FUNC_AUTO_MODE)) { amd_pmf_deinit_auto_mode(dev); @@ -399,9 +403,6 @@ static int amd_pmf_probe(struct platform_device *pdev) apmf_install_handler(dev); amd_pmf_dbgfs_register(dev);
- dev->pwr_src_notifier.notifier_call = amd_pmf_pwr_src_notify_call; - power_supply_reg_notifier(&dev->pwr_src_notifier); - dev_info(dev->dev, "registered PMF device successfully\n");
return 0; @@ -411,7 +412,6 @@ static int amd_pmf_remove(struct platform_device *pdev) { struct amd_pmf_dev *dev = platform_get_drvdata(pdev);
- power_supply_unreg_notifier(&dev->pwr_src_notifier); amd_pmf_deinit_features(dev); apmf_acpi_deinit(dev); amd_pmf_dbgfs_unregister(dev);
From: Eric Dumazet edumazet@google.com
[ Upstream commit 2174a08db80d1efeea382e25ac41c4e7511eb6d6 ]
syzbot managed to trigger a divide error [1] in netem.
It could happen if q->rate changes while netem_enqueue() is running, since q->rate is read twice.
It turns out netem_change() always lacked proper synchronization.
[1] divide error: 0000 [#1] SMP KASAN CPU: 1 PID: 7867 Comm: syz-executor.1 Not tainted 6.1.30-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023 RIP: 0010:div64_u64 include/linux/math64.h:69 [inline] RIP: 0010:packet_time_ns net/sched/sch_netem.c:357 [inline] RIP: 0010:netem_enqueue+0x2067/0x36d0 net/sched/sch_netem.c:576 Code: 89 e2 48 69 da 00 ca 9a 3b 42 80 3c 28 00 4c 8b a4 24 88 00 00 00 74 0d 4c 89 e7 e8 c3 4f 3b fd 48 8b 4c 24 18 48 89 d8 31 d2 <49> f7 34 24 49 01 c7 4c 8b 64 24 48 4d 01 f7 4c 89 e3 48 c1 eb 03 RSP: 0018:ffffc9000dccea60 EFLAGS: 00010246 RAX: 000001a442624200 RBX: 000001a442624200 RCX: ffff888108a4f000 RDX: 0000000000000000 RSI: 000000000000070d RDI: 000000000000070d RBP: ffffc9000dcceb90 R08: ffffffff849c5e26 R09: fffffbfff10e1297 R10: 0000000000000000 R11: dffffc0000000001 R12: ffff888108a4f358 R13: dffffc0000000000 R14: 0000001a8cd9a7ec R15: 0000000000000000 FS: 00007fa73fe18700(0000) GS:ffff8881f6b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa73fdf7718 CR3: 000000011d36e000 CR4: 0000000000350ee0 Call Trace: <TASK> [<ffffffff84714385>] __dev_xmit_skb net/core/dev.c:3931 [inline] [<ffffffff84714385>] __dev_queue_xmit+0xcf5/0x3370 net/core/dev.c:4290 [<ffffffff84d22df2>] dev_queue_xmit include/linux/netdevice.h:3030 [inline] [<ffffffff84d22df2>] neigh_hh_output include/net/neighbour.h:531 [inline] [<ffffffff84d22df2>] neigh_output include/net/neighbour.h:545 [inline] [<ffffffff84d22df2>] ip_finish_output2+0xb92/0x10d0 net/ipv4/ip_output.c:235 [<ffffffff84d21e63>] __ip_finish_output+0xc3/0x2b0 [<ffffffff84d10a81>] ip_finish_output+0x31/0x2a0 net/ipv4/ip_output.c:323 [<ffffffff84d10f14>] NF_HOOK_COND include/linux/netfilter.h:298 [inline] [<ffffffff84d10f14>] ip_output+0x224/0x2a0 net/ipv4/ip_output.c:437 [<ffffffff84d123b5>] dst_output include/net/dst.h:444 [inline] [<ffffffff84d123b5>] ip_local_out net/ipv4/ip_output.c:127 [inline] [<ffffffff84d123b5>] __ip_queue_xmit+0x1425/0x2000 net/ipv4/ip_output.c:542 [<ffffffff84d12fdc>] ip_queue_xmit+0x4c/0x70 net/ipv4/ip_output.c:556
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Eric Dumazet edumazet@google.com Cc: Stephen Hemminger stephen@networkplumber.org Cc: Jamal Hadi Salim jhs@mojatatu.com Cc: Cong Wang xiyou.wangcong@gmail.com Cc: Jiri Pirko jiri@resnulli.us Reviewed-by: Jamal Hadi Salim jhs@mojatatu.com Reviewed-by: Simon Horman simon.horman@corigine.com Link: https://lore.kernel.org/r/20230620184425.1179809-1-edumazet@google.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/sch_netem.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c index 6ef3021e1169a..e79be1b3e74da 100644 --- a/net/sched/sch_netem.c +++ b/net/sched/sch_netem.c @@ -966,6 +966,7 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt, if (ret < 0) return ret;
+ sch_tree_lock(sch); /* backup q->clg and q->loss_model */ old_clg = q->clg; old_loss_model = q->loss_model; @@ -974,7 +975,7 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt, ret = get_loss_clg(q, tb[TCA_NETEM_LOSS]); if (ret) { q->loss_model = old_loss_model; - return ret; + goto unlock; } } else { q->loss_model = CLG_RANDOM; @@ -1041,6 +1042,8 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt, /* capping jitter to the range acceptable by tabledist() */ q->jitter = min_t(s64, abs(q->jitter), INT_MAX);
+unlock: + sch_tree_unlock(sch); return ret;
get_table_failure: @@ -1050,7 +1053,8 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt, */ q->clg = old_clg; q->loss_model = old_loss_model; - return ret; + + goto unlock; }
static int netem_init(struct Qdisc *sch, struct nlattr *opt,
From: Maciej Żenczykowski maze@google.com
[ Upstream commit a9628e88776eb7d045cf46467f1afdd0f7fe72ea ]
This reverts commit 1f86123b9749 ("net: align SO_RCVMARK required privileges with SO_MARK") because the reasoning in the commit message is not really correct: SO_RCVMARK is used for 'reading' incoming skb mark (via cmsg), as such it is more equivalent to 'getsockopt(SO_MARK)' which has no priv check and retrieves the socket mark, rather than 'setsockopt(SO_MARK) which sets the socket mark and does require privs.
Additionally incoming skb->mark may already be visible if sysctl_fwmark_reflect and/or sysctl_tcp_fwmark_accept are enabled.
Furthermore, it is easier to block the getsockopt via bpf (either cgroup setsockopt hook, or via syscall filters) then to unblock it if it requires CAP_NET_RAW/ADMIN.
On Android the socket mark is (among other things) used to store the network identifier a socket is bound to. Setting it is privileged, but retrieving it is not. We'd like unprivileged userspace to be able to read the network id of incoming packets (where mark is set via iptables [to be moved to bpf])...
An alternative would be to add another sysctl to control whether setting SO_RCVMARK is privilged or not. (or even a MASK of which bits in the mark can be exposed) But this seems like over-engineering...
Note: This is a non-trivial revert, due to later merged commit e42c7beee71d ("bpf: net: Consider has_current_bpf_ctx() when testing capable() in sk_setsockopt()") which changed both 'ns_capable' into 'sockopt_ns_capable' calls.
Fixes: 1f86123b9749 ("net: align SO_RCVMARK required privileges with SO_MARK") Cc: Larysa Zaremba larysa.zaremba@intel.com Cc: Simon Horman simon.horman@corigine.com Cc: Paolo Abeni pabeni@redhat.com Cc: Eyal Birger eyal.birger@gmail.com Cc: Jakub Kicinski kuba@kernel.org Cc: Eric Dumazet edumazet@google.com Cc: Patrick Rohr prohr@google.com Signed-off-by: Maciej Żenczykowski maze@google.com Reviewed-by: Simon Horman simon.horman@corigine.com Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://lore.kernel.org/r/20230618103130.51628-1-maze@google.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/sock.c | 6 ------ 1 file changed, 6 deletions(-)
diff --git a/net/core/sock.c b/net/core/sock.c index 3fd71f343c9f2..b34c48f802e98 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1362,12 +1362,6 @@ int sk_setsockopt(struct sock *sk, int level, int optname, __sock_set_mark(sk, val); break; case SO_RCVMARK: - if (!sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) && - !sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) { - ret = -EPERM; - break; - } - sock_valbool_flag(sk, SOCK_RCVMARK, valbool); break;
From: Nicolas Frattaroli frattaroli.nicolas@gmail.com
[ Upstream commit cf9ae4a0077496e8224d68fc88e3df13dd7e5f37 ]
In pre-production prototypes (of which I only know one person having one, Peter Geis), GPIO0 pin A5 was tied to the SDMMC power enable pin on the CM4 connector. On all production models, this is not the case; instead, this pin is used for the nEXTRST signal, and the SDMMC power enable pin is always pulled high.
Since everyone currently using the SOQuartz device trees will want this change, it is made to the tree without splitting the trees into two separate ones of which users will then inevitably choose the wrong one.
This fixes USB and PCIe on a wide variety of CM4IO-compatible boards which use the nEXTRST signal.
Fixes: 5859b5a9c3ac ("arm64: dts: rockchip: add SoQuartz CM4IO dts") Signed-off-by: Nicolas Frattaroli frattaroli.nicolas@gmail.com Link: https://lore.kernel.org/r/20230421152610.21688-1-frattaroli.nicolas@gmail.co... Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Sasha Levin sashal@kernel.org --- .../boot/dts/rockchip/rk3566-soquartz-cm4.dts | 18 +++++++----- .../boot/dts/rockchip/rk3566-soquartz.dtsi | 29 +++++++++---------- 2 files changed, 24 insertions(+), 23 deletions(-)
diff --git a/arch/arm64/boot/dts/rockchip/rk3566-soquartz-cm4.dts b/arch/arm64/boot/dts/rockchip/rk3566-soquartz-cm4.dts index 263ce40770dde..cddf6cd2fecb1 100644 --- a/arch/arm64/boot/dts/rockchip/rk3566-soquartz-cm4.dts +++ b/arch/arm64/boot/dts/rockchip/rk3566-soquartz-cm4.dts @@ -28,6 +28,16 @@ regulator-max-microvolt = <5000000>; vin-supply = <&vcc12v_dcin>; }; + + vcc_sd_pwr: vcc-sd-pwr-regulator { + compatible = "regulator-fixed"; + regulator-name = "vcc_sd_pwr"; + regulator-always-on; + regulator-boot-on; + regulator-min-microvolt = <3300000>; + regulator-max-microvolt = <3300000>; + vin-supply = <&vcc3v3_sys>; + }; };
/* phy for pcie */ @@ -130,13 +140,7 @@ };
&sdmmc0 { - vmmc-supply = <&sdmmc_pwr>; - status = "okay"; -}; - -&sdmmc_pwr { - regulator-min-microvolt = <3300000>; - regulator-max-microvolt = <3300000>; + vmmc-supply = <&vcc_sd_pwr>; status = "okay"; };
diff --git a/arch/arm64/boot/dts/rockchip/rk3566-soquartz.dtsi b/arch/arm64/boot/dts/rockchip/rk3566-soquartz.dtsi index 102e448bc026a..31aa2b8efe393 100644 --- a/arch/arm64/boot/dts/rockchip/rk3566-soquartz.dtsi +++ b/arch/arm64/boot/dts/rockchip/rk3566-soquartz.dtsi @@ -104,16 +104,6 @@ regulator-max-microvolt = <3300000>; vin-supply = <&vcc5v0_sys>; }; - - sdmmc_pwr: sdmmc-pwr-regulator { - compatible = "regulator-fixed"; - enable-active-high; - gpio = <&gpio0 RK_PA5 GPIO_ACTIVE_HIGH>; - pinctrl-names = "default"; - pinctrl-0 = <&sdmmc_pwr_h>; - regulator-name = "sdmmc_pwr"; - status = "disabled"; - }; };
&cpu0 { @@ -155,6 +145,19 @@ status = "disabled"; };
+&gpio0 { + nextrst-hog { + gpio-hog; + /* + * GPIO_ACTIVE_LOW + output-low here means that the pin is set + * to high, because output-low decides the value pre-inversion. + */ + gpios = <RK_PA5 GPIO_ACTIVE_LOW>; + line-name = "nEXTRST"; + output-low; + }; +}; + &gpu { mali-supply = <&vdd_gpu>; status = "okay"; @@ -538,12 +541,6 @@ rockchip,pins = <2 RK_PC2 RK_FUNC_GPIO &pcfg_pull_none>; }; }; - - sdmmc-pwr { - sdmmc_pwr_h: sdmmc-pwr-h { - rockchip,pins = <0 RK_PA5 RK_FUNC_GPIO &pcfg_pull_none>; - }; - }; };
&pmu_io_domains {
From: Jiawen Wu jiawenwu@trustnetic.com
[ Upstream commit 8c00914e5438e3636f26b4f814b3297ae2a1b9ee ]
In case of gpio-regmap, IRQ chip is added by regmap-irq and associated with GPIO chip by gpiochip_irqchip_add_domain(). The initialization flag was not added in gpiochip_irqchip_add_domain(), causing gpiochip_to_irq() to return -EPROBE_DEFER.
Fixes: 5467801f1fcb ("gpio: Restrict usage of GPIO chip irq members before initialization") Signed-off-by: Jiawen Wu jiawenwu@trustnetic.com Reviewed-by: Andy Shevchenko andy.shevchenko@gmail.com Reviewed-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpiolib.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index 4472214fcd43a..1cfddbb3443d0 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -1763,6 +1763,14 @@ int gpiochip_irqchip_add_domain(struct gpio_chip *gc, gc->to_irq = gpiochip_to_irq; gc->irq.domain = domain;
+ /* + * Using barrier() here to prevent compiler from reordering + * gc->irq.initialized before adding irqdomain. + */ + barrier(); + + gc->irq.initialized = true; + return 0; } EXPORT_SYMBOL_GPL(gpiochip_irqchip_add_domain);
From: Jiasheng Jiang jiasheng@iscas.ac.cn
[ Upstream commit c1bcb976d8feb107ff2c12caaf12ac5e70f44d5f ]
Add the missing check for platform_get_irq() and return error code if it fails.
The returned error code will be dealed with in builtin_platform_driver(sifive_gpio_driver) and the driver will not be registered.
Fixes: f52d6d8b43e5 ("gpio: sifive: To get gpio irq offset from device tree data") Signed-off-by: Jiasheng Jiang jiasheng@iscas.ac.cn Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpio-sifive.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/gpio/gpio-sifive.c b/drivers/gpio/gpio-sifive.c index bc5660f61c570..0f1e1226ebbe8 100644 --- a/drivers/gpio/gpio-sifive.c +++ b/drivers/gpio/gpio-sifive.c @@ -221,8 +221,12 @@ static int sifive_gpio_probe(struct platform_device *pdev) return -ENODEV; }
- for (i = 0; i < ngpio; i++) - chip->irq_number[i] = platform_get_irq(pdev, i); + for (i = 0; i < ngpio; i++) { + ret = platform_get_irq(pdev, i); + if (ret < 0) + return ret; + chip->irq_number[i] = ret; + }
ret = bgpio_init(&chip->gc, dev, 4, chip->base + SIFIVE_GPIO_INPUT_VAL,
From: Su Hui suhui@nfschina.com
[ Upstream commit 5b00369fcf6d1ff9050b94800dc596925ff3623f ]
Move allocation code down to avoid memory leak.
Fixes: 29f54745f245 ("iommu/amd: Add missing domain type checks") Signed-off-by: Su Hui suhui@nfschina.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Jerry Snitselaar jsnitsel@redhat.com Reviewed-by: Vasant Hegde vasant.hegde@amd.com Link: https://lore.kernel.org/r/20230608021933.856045-1-suhui@nfschina.com Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iommu/amd/iommu.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index cd0e3fb78c3ff..e9eac86cddbd2 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -2069,10 +2069,6 @@ static struct protection_domain *protection_domain_alloc(unsigned int type) int mode = DEFAULT_PGTABLE_LEVEL; int ret;
- domain = kzalloc(sizeof(*domain), GFP_KERNEL); - if (!domain) - return NULL; - /* * Force IOMMU v1 page table when iommu=pt and * when allocating domain for pass-through devices. @@ -2088,6 +2084,10 @@ static struct protection_domain *protection_domain_alloc(unsigned int type) return NULL; }
+ domain = kzalloc(sizeof(*domain), GFP_KERNEL); + if (!domain) + return NULL; + switch (pgtable) { case AMD_IOMMU_V1: ret = protection_domain_init_v1(domain, mode);
From: Michael Walle mwalle@kernel.org
[ Upstream commit ff7a1790fbf92f1bdd0966d3f0da3ea808ede876 ]
Up until commit 6a45b0e2589f ("gpiolib: Introduce gpiochip_irqchip_add_domain()") all irq_domains were allocated by gpiolib itself and thus gpiolib also takes care of freeing it.
With gpiochip_irqchip_add_domain() a user of gpiolib can associate an irq_domain with the gpio_chip. This irq_domain is not managed by gpiolib and therefore must not be freed by gpiolib.
Fixes: 6a45b0e2589f ("gpiolib: Introduce gpiochip_irqchip_add_domain()") Reported-by: Jiawen Wu jiawenwu@trustnetic.com Signed-off-by: Michael Walle mwalle@kernel.org Reviewed-by: Linus Walleij linus.walleij@linaro.org Reviewed-by: Andy Shevchenko andy@kernel.org Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpiolib.c | 3 ++- include/linux/gpio/driver.h | 8 ++++++++ 2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index 1cfddbb3443d0..ce8f71e06eb62 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -1716,7 +1716,7 @@ static void gpiochip_irqchip_remove(struct gpio_chip *gc) }
/* Remove all IRQ mappings and delete the domain */ - if (gc->irq.domain) { + if (!gc->irq.domain_is_allocated_externally && gc->irq.domain) { unsigned int irq;
for (offset = 0; offset < gc->ngpio; offset++) { @@ -1762,6 +1762,7 @@ int gpiochip_irqchip_add_domain(struct gpio_chip *gc,
gc->to_irq = gpiochip_to_irq; gc->irq.domain = domain; + gc->irq.domain_is_allocated_externally = true;
/* * Using barrier() here to prevent compiler from reordering diff --git a/include/linux/gpio/driver.h b/include/linux/gpio/driver.h index ccd8a512d8546..b769ff6b00fdd 100644 --- a/include/linux/gpio/driver.h +++ b/include/linux/gpio/driver.h @@ -244,6 +244,14 @@ struct gpio_irq_chip { */ bool initialized;
+ /** + * @domain_is_allocated_externally: + * + * True it the irq_domain was allocated outside of gpiolib, in which + * case gpiolib won't free the irq_domain itself. + */ + bool domain_is_allocated_externally; + /** * @init_hw: optional routine to initialize hardware before * an IRQ chip will be added. This is quite useful when
From: Maurizio Lombardi mlombard@redhat.com
[ Upstream commit 13247018d68f21e7132924b9853f7e2c423588b6 ]
If the initiator suddenly stops sending data during a login while keeping the TCP connection open, the login_work won't be scheduled and will never release the login semaphore; concurrent login operations will therefore get stuck and fail.
The bug is due to the inability of the login timeout code to properly handle this particular case.
Fix the problem by replacing the old per-NP login timer with a new per-connection timer.
The timer is started when an initiator connects to the target; if it expires, it sends a SIGINT signal to the thread pointed at by the conn->login_kworker pointer.
conn->login_kworker is set by calling the iscsit_set_login_timer_kworker() helper, initially it will point to the np thread; When the login operation's control is in the process of being passed from the NP-thread to login_work, the conn->login_worker pointer is set to NULL. Finally, login_kworker will be changed to point to the worker thread executing the login_work job.
If conn->login_kworker is NULL when the timer expires, it means that the login operation hasn't been completed yet but login_work isn't running, in this case the timer will mark the login process as failed and will schedule login_work so the latter will be forced to free the resources it holds.
Signed-off-by: Maurizio Lombardi mlombard@redhat.com Link: https://lore.kernel.org/r/20230508162219.1731964-2-mlombard@redhat.com Reviewed-by: Mike Christie michael.christie@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/target/iscsi/iscsi_target.c | 2 - drivers/target/iscsi/iscsi_target_login.c | 63 ++------------------ drivers/target/iscsi/iscsi_target_nego.c | 70 +++++++++++++---------- drivers/target/iscsi/iscsi_target_util.c | 51 +++++++++++++++++ drivers/target/iscsi/iscsi_target_util.h | 4 ++ include/target/iscsi/iscsi_target_core.h | 6 +- 6 files changed, 103 insertions(+), 93 deletions(-)
diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c index 07e196b44b91d..c70ce1fa399f8 100644 --- a/drivers/target/iscsi/iscsi_target.c +++ b/drivers/target/iscsi/iscsi_target.c @@ -363,8 +363,6 @@ struct iscsi_np *iscsit_add_np( init_completion(&np->np_restart_comp); INIT_LIST_HEAD(&np->np_list);
- timer_setup(&np->np_login_timer, iscsi_handle_login_thread_timeout, 0); - ret = iscsi_target_setup_login_socket(np, sockaddr); if (ret != 0) { kfree(np); diff --git a/drivers/target/iscsi/iscsi_target_login.c b/drivers/target/iscsi/iscsi_target_login.c index 274bdd7845ca9..90b870f234f03 100644 --- a/drivers/target/iscsi/iscsi_target_login.c +++ b/drivers/target/iscsi/iscsi_target_login.c @@ -811,59 +811,6 @@ void iscsi_post_login_handler( iscsit_dec_conn_usage_count(conn); }
-void iscsi_handle_login_thread_timeout(struct timer_list *t) -{ - struct iscsi_np *np = from_timer(np, t, np_login_timer); - - spin_lock_bh(&np->np_thread_lock); - pr_err("iSCSI Login timeout on Network Portal %pISpc\n", - &np->np_sockaddr); - - if (np->np_login_timer_flags & ISCSI_TF_STOP) { - spin_unlock_bh(&np->np_thread_lock); - return; - } - - if (np->np_thread) - send_sig(SIGINT, np->np_thread, 1); - - np->np_login_timer_flags &= ~ISCSI_TF_RUNNING; - spin_unlock_bh(&np->np_thread_lock); -} - -static void iscsi_start_login_thread_timer(struct iscsi_np *np) -{ - /* - * This used the TA_LOGIN_TIMEOUT constant because at this - * point we do not have access to ISCSI_TPG_ATTRIB(tpg)->login_timeout - */ - spin_lock_bh(&np->np_thread_lock); - np->np_login_timer_flags &= ~ISCSI_TF_STOP; - np->np_login_timer_flags |= ISCSI_TF_RUNNING; - mod_timer(&np->np_login_timer, jiffies + TA_LOGIN_TIMEOUT * HZ); - - pr_debug("Added timeout timer to iSCSI login request for" - " %u seconds.\n", TA_LOGIN_TIMEOUT); - spin_unlock_bh(&np->np_thread_lock); -} - -static void iscsi_stop_login_thread_timer(struct iscsi_np *np) -{ - spin_lock_bh(&np->np_thread_lock); - if (!(np->np_login_timer_flags & ISCSI_TF_RUNNING)) { - spin_unlock_bh(&np->np_thread_lock); - return; - } - np->np_login_timer_flags |= ISCSI_TF_STOP; - spin_unlock_bh(&np->np_thread_lock); - - del_timer_sync(&np->np_login_timer); - - spin_lock_bh(&np->np_thread_lock); - np->np_login_timer_flags &= ~ISCSI_TF_RUNNING; - spin_unlock_bh(&np->np_thread_lock); -} - int iscsit_setup_np( struct iscsi_np *np, struct sockaddr_storage *sockaddr) @@ -1123,10 +1070,13 @@ static struct iscsit_conn *iscsit_alloc_conn(struct iscsi_np *np) spin_lock_init(&conn->nopin_timer_lock); spin_lock_init(&conn->response_queue_lock); spin_lock_init(&conn->state_lock); + spin_lock_init(&conn->login_worker_lock); + spin_lock_init(&conn->login_timer_lock);
timer_setup(&conn->nopin_response_timer, iscsit_handle_nopin_response_timeout, 0); timer_setup(&conn->nopin_timer, iscsit_handle_nopin_timeout, 0); + timer_setup(&conn->login_timer, iscsit_login_timeout, 0);
if (iscsit_conn_set_transport(conn, np->np_transport) < 0) goto free_conn; @@ -1304,7 +1254,7 @@ static int __iscsi_target_login_thread(struct iscsi_np *np) goto new_sess_out; }
- iscsi_start_login_thread_timer(np); + iscsit_start_login_timer(conn, current);
pr_debug("Moving to TARG_CONN_STATE_XPT_UP.\n"); conn->conn_state = TARG_CONN_STATE_XPT_UP; @@ -1417,8 +1367,6 @@ static int __iscsi_target_login_thread(struct iscsi_np *np) if (ret < 0) goto new_sess_out;
- iscsi_stop_login_thread_timer(np); - if (ret == 1) { tpg_np = conn->tpg_np;
@@ -1434,7 +1382,7 @@ static int __iscsi_target_login_thread(struct iscsi_np *np) new_sess_out: new_sess = true; old_sess_out: - iscsi_stop_login_thread_timer(np); + iscsit_stop_login_timer(conn); tpg_np = conn->tpg_np; iscsi_target_login_sess_out(conn, zero_tsih, new_sess); new_sess = false; @@ -1448,7 +1396,6 @@ static int __iscsi_target_login_thread(struct iscsi_np *np) return 1;
exit: - iscsi_stop_login_thread_timer(np); spin_lock_bh(&np->np_thread_lock); np->np_thread_state = ISCSI_NP_THREAD_EXIT; spin_unlock_bh(&np->np_thread_lock); diff --git a/drivers/target/iscsi/iscsi_target_nego.c b/drivers/target/iscsi/iscsi_target_nego.c index 24040c118e49d..e3a5644a70b36 100644 --- a/drivers/target/iscsi/iscsi_target_nego.c +++ b/drivers/target/iscsi/iscsi_target_nego.c @@ -535,25 +535,6 @@ static void iscsi_target_login_drop(struct iscsit_conn *conn, struct iscsi_login iscsi_target_login_sess_out(conn, zero_tsih, true); }
-struct conn_timeout { - struct timer_list timer; - struct iscsit_conn *conn; -}; - -static void iscsi_target_login_timeout(struct timer_list *t) -{ - struct conn_timeout *timeout = from_timer(timeout, t, timer); - struct iscsit_conn *conn = timeout->conn; - - pr_debug("Entering iscsi_target_login_timeout >>>>>>>>>>>>>>>>>>>\n"); - - if (conn->login_kworker) { - pr_debug("Sending SIGINT to conn->login_kworker %s/%d\n", - conn->login_kworker->comm, conn->login_kworker->pid); - send_sig(SIGINT, conn->login_kworker, 1); - } -} - static void iscsi_target_do_login_rx(struct work_struct *work) { struct iscsit_conn *conn = container_of(work, @@ -562,12 +543,15 @@ static void iscsi_target_do_login_rx(struct work_struct *work) struct iscsi_np *np = login->np; struct iscsi_portal_group *tpg = conn->tpg; struct iscsi_tpg_np *tpg_np = conn->tpg_np; - struct conn_timeout timeout; int rc, zero_tsih = login->zero_tsih; bool state;
pr_debug("entering iscsi_target_do_login_rx, conn: %p, %s:%d\n", conn, current->comm, current->pid); + + spin_lock(&conn->login_worker_lock); + set_bit(LOGIN_FLAGS_WORKER_RUNNING, &conn->login_flags); + spin_unlock(&conn->login_worker_lock); /* * If iscsi_target_do_login_rx() has been invoked by ->sk_data_ready() * before initial PDU processing in iscsi_target_start_negotiation() @@ -597,19 +581,16 @@ static void iscsi_target_do_login_rx(struct work_struct *work) goto err; }
- conn->login_kworker = current; allow_signal(SIGINT); - - timeout.conn = conn; - timer_setup_on_stack(&timeout.timer, iscsi_target_login_timeout, 0); - mod_timer(&timeout.timer, jiffies + TA_LOGIN_TIMEOUT * HZ); - pr_debug("Starting login timer for %s/%d\n", current->comm, current->pid); + rc = iscsit_set_login_timer_kworker(conn, current); + if (rc < 0) { + /* The login timer has already expired */ + pr_debug("iscsi_target_do_login_rx, login failed\n"); + goto err; + }
rc = conn->conn_transport->iscsit_get_login_rx(conn, login); - del_timer_sync(&timeout.timer); - destroy_timer_on_stack(&timeout.timer); flush_signals(current); - conn->login_kworker = NULL;
if (rc < 0) goto err; @@ -646,7 +627,17 @@ static void iscsi_target_do_login_rx(struct work_struct *work) if (iscsi_target_sk_check_and_clear(conn, LOGIN_FLAGS_WRITE_ACTIVE)) goto err; + + /* + * Set the login timer thread pointer to NULL to prevent the + * login process from getting stuck if the initiator + * stops sending data. + */ + rc = iscsit_set_login_timer_kworker(conn, NULL); + if (rc < 0) + goto err; } else if (rc == 1) { + iscsit_stop_login_timer(conn); cancel_delayed_work(&conn->login_work); iscsi_target_nego_release(conn); iscsi_post_login_handler(np, conn, zero_tsih); @@ -656,6 +647,7 @@ static void iscsi_target_do_login_rx(struct work_struct *work)
err: iscsi_target_restore_sock_callbacks(conn); + iscsit_stop_login_timer(conn); cancel_delayed_work(&conn->login_work); iscsi_target_login_drop(conn, login); iscsit_deaccess_np(np, tpg, tpg_np); @@ -1368,14 +1360,30 @@ int iscsi_target_start_negotiation( * and perform connection cleanup now. */ ret = iscsi_target_do_login(conn, login); - if (!ret && iscsi_target_sk_check_and_clear(conn, LOGIN_FLAGS_INITIAL_PDU)) - ret = -1; + if (!ret) { + spin_lock(&conn->login_worker_lock); + + if (iscsi_target_sk_check_and_clear(conn, LOGIN_FLAGS_INITIAL_PDU)) + ret = -1; + else if (!test_bit(LOGIN_FLAGS_WORKER_RUNNING, &conn->login_flags)) { + if (iscsit_set_login_timer_kworker(conn, NULL) < 0) { + /* + * The timeout has expired already. + * Schedule login_work to perform the cleanup. + */ + schedule_delayed_work(&conn->login_work, 0); + } + } + + spin_unlock(&conn->login_worker_lock); + }
if (ret < 0) { iscsi_target_restore_sock_callbacks(conn); iscsi_remove_failed_auth_entry(conn); } if (ret != 0) { + iscsit_stop_login_timer(conn); cancel_delayed_work_sync(&conn->login_work); iscsi_target_nego_release(conn); } diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c index 26dc8ed3045b6..b14835fcb0334 100644 --- a/drivers/target/iscsi/iscsi_target_util.c +++ b/drivers/target/iscsi/iscsi_target_util.c @@ -1040,6 +1040,57 @@ void iscsit_stop_nopin_timer(struct iscsit_conn *conn) spin_unlock_bh(&conn->nopin_timer_lock); }
+void iscsit_login_timeout(struct timer_list *t) +{ + struct iscsit_conn *conn = from_timer(conn, t, login_timer); + struct iscsi_login *login = conn->login; + + pr_debug("Entering iscsi_target_login_timeout >>>>>>>>>>>>>>>>>>>\n"); + + spin_lock_bh(&conn->login_timer_lock); + login->login_failed = 1; + + if (conn->login_kworker) { + pr_debug("Sending SIGINT to conn->login_kworker %s/%d\n", + conn->login_kworker->comm, conn->login_kworker->pid); + send_sig(SIGINT, conn->login_kworker, 1); + } else { + schedule_delayed_work(&conn->login_work, 0); + } + spin_unlock_bh(&conn->login_timer_lock); +} + +void iscsit_start_login_timer(struct iscsit_conn *conn, struct task_struct *kthr) +{ + pr_debug("Login timer started\n"); + + conn->login_kworker = kthr; + mod_timer(&conn->login_timer, jiffies + TA_LOGIN_TIMEOUT * HZ); +} + +int iscsit_set_login_timer_kworker(struct iscsit_conn *conn, struct task_struct *kthr) +{ + struct iscsi_login *login = conn->login; + int ret = 0; + + spin_lock_bh(&conn->login_timer_lock); + if (login->login_failed) { + /* The timer has already expired */ + ret = -1; + } else { + conn->login_kworker = kthr; + } + spin_unlock_bh(&conn->login_timer_lock); + + return ret; +} + +void iscsit_stop_login_timer(struct iscsit_conn *conn) +{ + pr_debug("Login timer stopped\n"); + timer_delete_sync(&conn->login_timer); +} + int iscsit_send_tx_data( struct iscsit_cmd *cmd, struct iscsit_conn *conn, diff --git a/drivers/target/iscsi/iscsi_target_util.h b/drivers/target/iscsi/iscsi_target_util.h index 33ea799a08508..24b8e577575a6 100644 --- a/drivers/target/iscsi/iscsi_target_util.h +++ b/drivers/target/iscsi/iscsi_target_util.h @@ -56,6 +56,10 @@ extern void iscsit_handle_nopin_timeout(struct timer_list *t); extern void __iscsit_start_nopin_timer(struct iscsit_conn *); extern void iscsit_start_nopin_timer(struct iscsit_conn *); extern void iscsit_stop_nopin_timer(struct iscsit_conn *); +extern void iscsit_login_timeout(struct timer_list *t); +extern void iscsit_start_login_timer(struct iscsit_conn *, struct task_struct *kthr); +extern void iscsit_stop_login_timer(struct iscsit_conn *); +extern int iscsit_set_login_timer_kworker(struct iscsit_conn *, struct task_struct *kthr); extern int iscsit_send_tx_data(struct iscsit_cmd *, struct iscsit_conn *, int); extern int iscsit_fe_sendpage_sg(struct iscsit_cmd *, struct iscsit_conn *); extern int iscsit_tx_login_rsp(struct iscsit_conn *, u8, u8); diff --git a/include/target/iscsi/iscsi_target_core.h b/include/target/iscsi/iscsi_target_core.h index 229118156a1f6..42f4a4c0c1003 100644 --- a/include/target/iscsi/iscsi_target_core.h +++ b/include/target/iscsi/iscsi_target_core.h @@ -562,12 +562,14 @@ struct iscsit_conn { #define LOGIN_FLAGS_READ_ACTIVE 2 #define LOGIN_FLAGS_WRITE_ACTIVE 3 #define LOGIN_FLAGS_CLOSED 4 +#define LOGIN_FLAGS_WORKER_RUNNING 5 unsigned long login_flags; struct delayed_work login_work; struct iscsi_login *login; struct timer_list nopin_timer; struct timer_list nopin_response_timer; struct timer_list transport_timer; + struct timer_list login_timer; struct task_struct *login_kworker; /* Spinlock used for add/deleting cmd's from conn_cmd_list */ spinlock_t cmd_lock; @@ -576,6 +578,8 @@ struct iscsit_conn { spinlock_t nopin_timer_lock; spinlock_t response_queue_lock; spinlock_t state_lock; + spinlock_t login_timer_lock; + spinlock_t login_worker_lock; /* libcrypto RX and TX contexts for crc32c */ struct ahash_request *conn_rx_hash; struct ahash_request *conn_tx_hash; @@ -792,7 +796,6 @@ struct iscsi_np { enum np_thread_state_table np_thread_state; bool enabled; atomic_t np_reset_count; - enum iscsi_timer_flags_table np_login_timer_flags; u32 np_exports; enum np_flags_table np_flags; spinlock_t np_thread_lock; @@ -800,7 +803,6 @@ struct iscsi_np { struct socket *np_socket; struct sockaddr_storage np_sockaddr; struct task_struct *np_thread; - struct timer_list np_login_timer; void *np_context; struct iscsit_transport *np_transport; struct list_head np_list;
From: Maurizio Lombardi mlombard@redhat.com
[ Upstream commit 98a8c2bf938a5973716f280da618077a3d255976 ]
Signed-off-by: Maurizio Lombardi mlombard@redhat.com Link: https://lore.kernel.org/r/20230508162219.1731964-3-mlombard@redhat.com Reviewed-by: Mike Christie michael.christie@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/target/iscsi/iscsi_target_core.h | 1 - 1 file changed, 1 deletion(-)
diff --git a/include/target/iscsi/iscsi_target_core.h b/include/target/iscsi/iscsi_target_core.h index 42f4a4c0c1003..4c15420e8965d 100644 --- a/include/target/iscsi/iscsi_target_core.h +++ b/include/target/iscsi/iscsi_target_core.h @@ -568,7 +568,6 @@ struct iscsit_conn { struct iscsi_login *login; struct timer_list nopin_timer; struct timer_list nopin_response_timer; - struct timer_list transport_timer; struct timer_list login_timer; struct task_struct *login_kworker; /* Spinlock used for add/deleting cmd's from conn_cmd_list */
From: Maurizio Lombardi mlombard@redhat.com
[ Upstream commit 2a737d3b8c792400118d6cf94958f559de9c5e59 ]
The tpg->np_login_sem is a semaphore that is used to serialize the login process when multiple login threads run concurrently against the same target portal group.
The iscsi_target_locate_portal() function finds the tpg, calls iscsit_access_np() against the np_login_sem semaphore and saves the tpg pointer in conn->tpg;
If iscsi_target_locate_portal() fails, the caller will check for the conn->tpg pointer and, if it's not NULL, then it will assume that iscsi_target_locate_portal() called iscsit_access_np() on the semaphore.
Make sure that conn->tpg gets initialized only if iscsit_access_np() was successful, otherwise iscsit_deaccess_np() may end up being called against a semaphore we never took, allowing more than one thread to access the same tpg.
Signed-off-by: Maurizio Lombardi mlombard@redhat.com Link: https://lore.kernel.org/r/20230508162219.1731964-4-mlombard@redhat.com Reviewed-by: Mike Christie michael.christie@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/target/iscsi/iscsi_target_nego.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/target/iscsi/iscsi_target_nego.c b/drivers/target/iscsi/iscsi_target_nego.c index e3a5644a70b36..fa3fb5f4e6bc4 100644 --- a/drivers/target/iscsi/iscsi_target_nego.c +++ b/drivers/target/iscsi/iscsi_target_nego.c @@ -1122,6 +1122,7 @@ int iscsi_target_locate_portal( iscsi_target_set_sock_callbacks(conn);
login->np = np; + conn->tpg = NULL;
login_req = (struct iscsi_login_req *) login->req; payload_length = ntoh24(login_req->dlength); @@ -1189,7 +1190,6 @@ int iscsi_target_locate_portal( */ sessiontype = strncmp(s_buf, DISCOVERY, 9); if (!sessiontype) { - conn->tpg = iscsit_global->discovery_tpg; if (!login->leading_connection) goto get_target;
@@ -1206,9 +1206,11 @@ int iscsi_target_locate_portal( * Serialize access across the discovery struct iscsi_portal_group to * process login attempt. */ + conn->tpg = iscsit_global->discovery_tpg; if (iscsit_access_np(np, conn->tpg) < 0) { iscsit_tx_login_rsp(conn, ISCSI_STATUS_CLS_TARGET_ERR, ISCSI_LOGIN_STATUS_SVC_UNAVAILABLE); + conn->tpg = NULL; ret = -1; goto out; }
From: Denis Arefev arefev@swemel.ru
[ Upstream commit 16a9c24f24fbe4564284eb575b18cc20586b9270 ]
Added a variable check and transition in case of an error
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Denis Arefev arefev@swemel.ru Reviewed-by: Ping Cheng ping.cheng@wacom.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/wacom_sys.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/hid/wacom_sys.c b/drivers/hid/wacom_sys.c index fb538a6c4add8..aff4a21a46b6a 100644 --- a/drivers/hid/wacom_sys.c +++ b/drivers/hid/wacom_sys.c @@ -2417,8 +2417,13 @@ static int wacom_parse_and_register(struct wacom *wacom, bool wireless) goto fail_quirks; }
- if (features->device_type & WACOM_DEVICETYPE_WL_MONITOR) + if (features->device_type & WACOM_DEVICETYPE_WL_MONITOR) { error = hid_hw_open(hdev); + if (error) { + hid_err(hdev, "hw open failed\n"); + goto fail_quirks; + } + }
wacom_set_shared_values(wacom_wac); devres_close_group(&hdev->dev, wacom);
From: Marc Zyngier maz@kernel.org
[ Upstream commit 8d0f019e4c4f2ee2de81efd9bf1c27e9fb3c0460 ]
Add the missing Set/Way CMOs that apply to tagged memory.
Signed-off-by: Marc Zyngier maz@kernel.org Reviewed-by: Cornelia Huck cohuck@redhat.com Reviewed-by: Steven Price steven.price@arm.com Reviewed-by: Oliver Upton oliver.upton@linux.dev Link: https://lore.kernel.org/r/20230515204601.1270428-2-maz@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/include/asm/sysreg.h | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index 9e3ecba3c4e67..d4a85c1db2e2a 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -115,8 +115,14 @@ #define SB_BARRIER_INSN __SYS_BARRIER_INSN(0, 7, 31)
#define SYS_DC_ISW sys_insn(1, 0, 7, 6, 2) +#define SYS_DC_IGSW sys_insn(1, 0, 7, 6, 4) +#define SYS_DC_IGDSW sys_insn(1, 0, 7, 6, 6) #define SYS_DC_CSW sys_insn(1, 0, 7, 10, 2) +#define SYS_DC_CGSW sys_insn(1, 0, 7, 10, 4) +#define SYS_DC_CGDSW sys_insn(1, 0, 7, 10, 6) #define SYS_DC_CISW sys_insn(1, 0, 7, 14, 2) +#define SYS_DC_CIGSW sys_insn(1, 0, 7, 14, 4) +#define SYS_DC_CIGDSW sys_insn(1, 0, 7, 14, 6)
/* * Automatically generated definitions for system registers, the
From: Steve French stfrench@microsoft.com
[ Upstream commit b535cc796a4b4942cd189652588e8d37c1f5925a ]
If plen is null when passed in, we only checked for null in one of the two places where it could be used. Although plen is always valid (not null) for current callers of the SMB2_change_notify function, this change makes it more consistent.
Reported-by: kernel test robot lkp@intel.com Reported-by: Dan Carpenter error27@gmail.com Closes: https://lore.kernel.org/all/202305251831.3V1gbbFs-lkp@intel.com/ Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/cifs/smb2pdu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c index 02228a590cd54..a2a95cd113df8 100644 --- a/fs/cifs/smb2pdu.c +++ b/fs/cifs/smb2pdu.c @@ -3777,7 +3777,7 @@ SMB2_change_notify(const unsigned int xid, struct cifs_tcon *tcon, if (*out_data == NULL) { rc = -ENOMEM; goto cnotify_exit; - } else + } else if (plen) *plen = le32_to_cpu(smb_rsp->OutputBufferLength); }
From: Hans Verkuil hverkuil-cisco@xs4all.nl
[ Upstream commit fe4526d99e2e06b08bb80316c3a596ea6a807b75 ]
Explicitly disable the CEC adapter in cec_devnode_unregister()
Usually this does not really do anything important, but for drivers that use the CEC pin framework this is needed to properly stop the hrtimer. Without this a crash would happen when such a driver is unloaded with rmmod.
Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/cec/core/cec-adap.c | 5 ++++- drivers/media/cec/core/cec-core.c | 2 ++ drivers/media/cec/core/cec-priv.h | 1 + 3 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c index 4f5ab3cae8a71..ac18707fddcd2 100644 --- a/drivers/media/cec/core/cec-adap.c +++ b/drivers/media/cec/core/cec-adap.c @@ -1582,7 +1582,7 @@ static void cec_claim_log_addrs(struct cec_adapter *adap, bool block) * * This function is called with adap->lock held. */ -static int cec_adap_enable(struct cec_adapter *adap) +int cec_adap_enable(struct cec_adapter *adap) { bool enable; int ret = 0; @@ -1592,6 +1592,9 @@ static int cec_adap_enable(struct cec_adapter *adap) if (adap->needs_hpd) enable = enable && adap->phys_addr != CEC_PHYS_ADDR_INVALID;
+ if (adap->devnode.unregistered) + enable = false; + if (enable == adap->is_enabled) return 0;
diff --git a/drivers/media/cec/core/cec-core.c b/drivers/media/cec/core/cec-core.c index af358e901b5f3..7e153c5cad04f 100644 --- a/drivers/media/cec/core/cec-core.c +++ b/drivers/media/cec/core/cec-core.c @@ -191,6 +191,8 @@ static void cec_devnode_unregister(struct cec_adapter *adap) mutex_lock(&adap->lock); __cec_s_phys_addr(adap, CEC_PHYS_ADDR_INVALID, false); __cec_s_log_addrs(adap, NULL, false); + // Disable the adapter (since adap->devnode.unregistered is true) + cec_adap_enable(adap); mutex_unlock(&adap->lock);
cdev_device_del(&devnode->cdev, &devnode->dev); diff --git a/drivers/media/cec/core/cec-priv.h b/drivers/media/cec/core/cec-priv.h index b78df931aa74b..ed1f8c67626bf 100644 --- a/drivers/media/cec/core/cec-priv.h +++ b/drivers/media/cec/core/cec-priv.h @@ -47,6 +47,7 @@ int cec_monitor_pin_cnt_inc(struct cec_adapter *adap); void cec_monitor_pin_cnt_dec(struct cec_adapter *adap); int cec_adap_status(struct seq_file *file, void *priv); int cec_thread_func(void *_adap); +int cec_adap_enable(struct cec_adapter *adap); void __cec_s_phys_addr(struct cec_adapter *adap, u16 phys_addr, bool block); int __cec_s_log_addrs(struct cec_adapter *adap, struct cec_log_addrs *log_addrs, bool block);
From: Hans Verkuil hverkuil-cisco@xs4all.nl
[ Upstream commit 73af6c7511038249cad3d5f3b44bf8d78ac0f499 ]
When a message was received the last_initiator is set to 0xff. This will force the signal free time for the next transmit to that for a new initiator. However, if a new transmit is already in progress, then don't set last_initiator, since that's the initiator of the current transmit. Overwriting this would cause the signal free time of a following transmit to be that of the new initiator instead of a next transmit.
Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/cec/core/cec-adap.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c index ac18707fddcd2..b1512f9c5895c 100644 --- a/drivers/media/cec/core/cec-adap.c +++ b/drivers/media/cec/core/cec-adap.c @@ -1090,7 +1090,8 @@ void cec_received_msg_ts(struct cec_adapter *adap, mutex_lock(&adap->lock); dprintk(2, "%s: %*ph\n", __func__, msg->len, msg->msg);
- adap->last_initiator = 0xff; + if (!adap->transmit_in_progress) + adap->last_initiator = 0xff;
/* Check if this message was for us (directed or broadcast). */ if (!cec_msg_is_broadcast(msg))
From: Osama Muhammad osmtendev@gmail.com
[ Upstream commit 9b9e46aa07273ceb96866b2e812b46f1ee0b8d2f ]
This patch fixes the error checking in nfcsim.c. The DebugFS kernel API is developed in a way that the caller can safely ignore the errors that occur during the creation of DebugFS nodes.
Signed-off-by: Osama Muhammad osmtendev@gmail.com Reviewed-by: Simon Horman simon.horman@corigine.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nfc/nfcsim.c | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/drivers/nfc/nfcsim.c b/drivers/nfc/nfcsim.c index 85bf8d586c707..0f6befe8be1e2 100644 --- a/drivers/nfc/nfcsim.c +++ b/drivers/nfc/nfcsim.c @@ -336,10 +336,6 @@ static struct dentry *nfcsim_debugfs_root; static void nfcsim_debugfs_init(void) { nfcsim_debugfs_root = debugfs_create_dir("nfcsim", NULL); - - if (!nfcsim_debugfs_root) - pr_err("Could not create debugfs entry\n"); - }
static void nfcsim_debugfs_remove(void)
From: Shida Zhang zhangshida@kylinos.cn
[ Upstream commit 8fd9f4232d8152c650fd15127f533a0f6d0a4b2b ]
This fixes the following warning reported by gcc 10.2.1 under x86_64:
../fs/btrfs/tree-log.c: In function ‘btrfs_log_inode’: ../fs/btrfs/tree-log.c:6211:9: error: ‘last_range_start’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 6211 | ret = insert_dir_log_key(trans, log, path, key.objectid, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 6212 | first_dir_index, last_dir_index); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../fs/btrfs/tree-log.c:6161:6: note: ‘last_range_start’ was declared here 6161 | u64 last_range_start; | ^~~~~~~~~~~~~~~~
This might be a false positive fixed in later compiler versions but we want to have it fixed.
Reported-by: k2ci kernel-bot@kylinos.cn Reviewed-by: Anand Jain anand.jain@oracle.com Signed-off-by: Shida Zhang zhangshida@kylinos.cn Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/tree-log.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 200cea6e49e51..b91fa398b8141 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -6163,7 +6163,7 @@ static int log_delayed_deletions_incremental(struct btrfs_trans_handle *trans, { struct btrfs_root *log = inode->root->log_root; const struct btrfs_delayed_item *curr; - u64 last_range_start; + u64 last_range_start = 0; u64 last_range_end = 0; struct btrfs_key key;
From: Dan Carpenter dan.carpenter@linaro.org
[ Upstream commit 016da9c65fec9f0e78c4909ed9a0f2d567af6775 ]
The "udc" pointer was never set in the probe() function so it will lead to a NULL dereference in udc_pci_remove() when we do:
usb_del_gadget_udc(&udc->gadget);
Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Link: https://lore.kernel.org/r/ZG+A/dNpFWAlCChk@kili Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/gadget/udc/amd5536udc_pci.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/usb/gadget/udc/amd5536udc_pci.c b/drivers/usb/gadget/udc/amd5536udc_pci.c index c80f9bd51b750..a36913ae31f9e 100644 --- a/drivers/usb/gadget/udc/amd5536udc_pci.c +++ b/drivers/usb/gadget/udc/amd5536udc_pci.c @@ -170,6 +170,9 @@ static int udc_pci_probe( retval = -ENODEV; goto err_probe; } + + udc = dev; + return 0;
err_probe:
From: min15.li min15.li@samsung.com
[ Upstream commit 31a5978243d24d77be4bacca56c78a0fbc43b00d ]
In the function nvme_passthru_end(), only the value of the command opcode is checked, without checking the command type (IO command or Admin command). When we send a Dataset Management command (The opcode of the Dataset Management command is the same as the Set Feature command), kernel thinks it is a set feature command, then sets the controller's keep alive interval, and calls nvme_keep_alive_work().
Signed-off-by: min15.li min15.li@samsung.com Reviewed-by: Kanchan Joshi joshi.k@samsung.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/core.c | 4 +++- drivers/nvme/host/ioctl.c | 2 +- drivers/nvme/host/nvme.h | 2 +- drivers/nvme/target/passthru.c | 2 +- 4 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index c015393beeee8..f12109230dc8d 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -1115,7 +1115,7 @@ u32 nvme_passthru_start(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u8 opcode) } EXPORT_SYMBOL_NS_GPL(nvme_passthru_start, NVME_TARGET_PASSTHRU);
-void nvme_passthru_end(struct nvme_ctrl *ctrl, u32 effects, +void nvme_passthru_end(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u32 effects, struct nvme_command *cmd, int status) { if (effects & NVME_CMD_EFFECTS_CSE_MASK) { @@ -1132,6 +1132,8 @@ void nvme_passthru_end(struct nvme_ctrl *ctrl, u32 effects, nvme_queue_scan(ctrl); flush_work(&ctrl->scan_work); } + if (ns) + return;
switch (cmd->common.opcode) { case nvme_admin_set_features: diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c index d24ea2e051564..0264ec74a4361 100644 --- a/drivers/nvme/host/ioctl.c +++ b/drivers/nvme/host/ioctl.c @@ -254,7 +254,7 @@ static int nvme_submit_user_cmd(struct request_queue *q, blk_mq_free_request(req);
if (effects) - nvme_passthru_end(ctrl, effects, cmd, ret); + nvme_passthru_end(ctrl, ns, effects, cmd, ret);
return ret; } diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index a2d4f59e0535a..2dd52739236e1 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -1077,7 +1077,7 @@ u32 nvme_command_effects(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u8 opcode); u32 nvme_passthru_start(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u8 opcode); int nvme_execute_rq(struct request *rq, bool at_head); -void nvme_passthru_end(struct nvme_ctrl *ctrl, u32 effects, +void nvme_passthru_end(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u32 effects, struct nvme_command *cmd, int status); struct nvme_ctrl *nvme_ctrl_from_file(struct file *file); struct nvme_ns *nvme_find_get_ns(struct nvme_ctrl *ctrl, unsigned nsid); diff --git a/drivers/nvme/target/passthru.c b/drivers/nvme/target/passthru.c index 511c980d538df..71a9c1cc57f59 100644 --- a/drivers/nvme/target/passthru.c +++ b/drivers/nvme/target/passthru.c @@ -243,7 +243,7 @@ static void nvmet_passthru_execute_cmd_work(struct work_struct *w) blk_mq_free_request(rq);
if (effects) - nvme_passthru_end(ctrl, effects, req->cmd, status); + nvme_passthru_end(ctrl, ns, effects, req->cmd, status); }
static enum rq_end_io_ret nvmet_passthru_req_done(struct request *rq,
From: Uday Shankar ushankar@purestorage.com
[ Upstream commit ea4d453b9ec9ea279c39744cd0ecb47ef48ede35 ]
With TBKAS on, the completion of one command can defer sending a keep alive for up to twice the delay between successive runs of nvme_keep_alive_work. The current delay of KATO / 2 thus makes it possible for one command to defer sending a keep alive for up to KATO, which can result in the controller detecting a KATO. The following trace demonstrates the issue, taking KATO = 8 for simplicity:
1. t = 0: run nvme_keep_alive_work, no keep-alive sent 2. t = ε: I/O completion seen, set comp_seen = true 3. t = 4: run nvme_keep_alive_work, see comp_seen == true, skip sending keep-alive, set comp_seen = false 4. t = 8: run nvme_keep_alive_work, see comp_seen == false, send a keep-alive command.
Here, there is a delay of 8 - ε between receiving a command completion and sending the next command. With ε small, the controller is likely to detect a keep alive timeout.
Fix this by running nvme_keep_alive_work with a delay of KATO / 4 whenever TBKAS is on. Going through the above trace now gives us a worst-case delay of 4 - ε, which is in line with the recommendation of sending a command every KATO / 2 in the NVMe specification.
Reported-by: Costa Sapuntzakis costa@purestorage.com Reported-by: Randy Jennings randyj@purestorage.com Signed-off-by: Uday Shankar ushankar@purestorage.com Reviewed-by: Hannes Reinecke hare@suse.de Reviewed-by: Sagi Grimberg sagi@grimberg.me Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/core.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index f12109230dc8d..3838a6622c34b 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -1163,9 +1163,25 @@ EXPORT_SYMBOL_NS_GPL(nvme_passthru_end, NVME_TARGET_PASSTHRU); * The host should send Keep Alive commands at half of the Keep Alive Timeout * accounting for transport roundtrip times [..]. */ +static unsigned long nvme_keep_alive_work_period(struct nvme_ctrl *ctrl) +{ + unsigned long delay = ctrl->kato * HZ / 2; + + /* + * When using Traffic Based Keep Alive, we need to run + * nvme_keep_alive_work at twice the normal frequency, as one + * command completion can postpone sending a keep alive command + * by up to twice the delay between runs. + */ + if (ctrl->ctratt & NVME_CTRL_ATTR_TBKAS) + delay /= 2; + return delay; +} + static void nvme_queue_keep_alive_work(struct nvme_ctrl *ctrl) { - queue_delayed_work(nvme_wq, &ctrl->ka_work, ctrl->kato * HZ / 2); + queue_delayed_work(nvme_wq, &ctrl->ka_work, + nvme_keep_alive_work_period(ctrl)); }
static enum rq_end_io_ret nvme_keep_alive_end_io(struct request *rq,
From: Uday Shankar ushankar@purestorage.com
[ Upstream commit 774a9636514764ddc0d072ae0d1d1c01a47e6ddd ]
When a command completes, we set a flag which will skip sending a keep alive at the next run of nvme_keep_alive_work when TBKAS is on. However, if the command was submitted long ago, it's possible that the controller may have also restarted its keep alive timer (as a result of receiving the command) long ago. The following trace demonstrates the issue, assuming TBKAS is on and KATO = 8 for simplicity:
1. t = 0: submit I/O commands A, B, C, D, E 2. t = 0.5: commands A, B, C, D, E reach controller, restart its keep alive timer 3. t = 1: A completes 4. t = 2: run nvme_keep_alive_work, see recent completion, do nothing 5. t = 3: B completes 6. t = 4: run nvme_keep_alive_work, see recent completion, do nothing 7. t = 5: C completes 8. t = 6: run nvme_keep_alive_work, see recent completion, do nothing 9. t = 7: D completes 10. t = 8: run nvme_keep_alive_work, see recent completion, do nothing 11. t = 9: E completes
At this point, 8.5 seconds have passed without restarting the controller's keep alive timer, so the controller will detect a keep alive timeout.
Fix this by checking the IO start time when deciding to defer sending a keep alive command. Only set comp_seen if the command started after the most recent run of nvme_keep_alive_work. With this change, the completions of B, C, and D will not set comp_seen and the run of nvme_keep_alive_work at t = 4 will send a keep alive.
Reported-by: Costa Sapuntzakis costa@purestorage.com Reported-by: Randy Jennings randyj@purestorage.com Signed-off-by: Uday Shankar ushankar@purestorage.com Reviewed-by: Hannes Reinecke hare@suse.de Reviewed-by: Sagi Grimberg sagi@grimberg.me Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/core.c | 14 +++++++++++++- drivers/nvme/host/nvme.h | 1 + 2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 3838a6622c34b..d87b8dfda622a 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -397,7 +397,16 @@ void nvme_complete_rq(struct request *req) trace_nvme_complete_rq(req); nvme_cleanup_cmd(req);
- if (ctrl->kas) + /* + * Completions of long-running commands should not be able to + * defer sending of periodic keep alives, since the controller + * may have completed processing such commands a long time ago + * (arbitrarily close to command submission time). + * req->deadline - req->timeout is the command submission time + * in jiffies. + */ + if (ctrl->kas && + req->deadline - req->timeout >= ctrl->ka_last_check_time) ctrl->comp_seen = true;
switch (nvme_decide_disposition(req)) { @@ -1200,6 +1209,7 @@ static enum rq_end_io_ret nvme_keep_alive_end_io(struct request *rq, return RQ_END_IO_NONE; }
+ ctrl->ka_last_check_time = jiffies; ctrl->comp_seen = false; spin_lock_irqsave(&ctrl->lock, flags); if (ctrl->state == NVME_CTRL_LIVE || @@ -1218,6 +1228,8 @@ static void nvme_keep_alive_work(struct work_struct *work) bool comp_seen = ctrl->comp_seen; struct request *rq;
+ ctrl->ka_last_check_time = jiffies; + if ((ctrl->ctratt & NVME_CTRL_ATTR_TBKAS) && comp_seen) { dev_dbg(ctrl->device, "reschedule traffic based keep-alive timer\n"); diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index 2dd52739236e1..8657811f8b887 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -328,6 +328,7 @@ struct nvme_ctrl { struct delayed_work ka_work; struct delayed_work failfast_work; struct nvme_command ka_cmd; + unsigned long ka_last_check_time; struct work_struct fw_act_work; unsigned long events;
From: Uday Shankar ushankar@purestorage.com
[ Upstream commit c7275ce6a5fd32ca9f5a6294ed89cf0523181af9 ]
Upon keep alive completion, nvme_keep_alive_work is scheduled with the same delay every time. If keep alive commands are completing slowly, this may cause a keep alive timeout. The following trace illustrates the issue, taking KATO = 8 and TBKAS off for simplicity:
1. t = 0: run nvme_keep_alive_work, send keep alive 2. t = ε: keep alive reaches controller, controller restarts its keep alive timer 3. t = 4: host receives keep alive completion, schedules nvme_keep_alive_work with delay 4 4. t = 8: run nvme_keep_alive_work, send keep alive
Here, a keep alive having RTT of 4 causes a delay of at least 8 - ε between the controller receiving successive keep alives. With ε small, the controller is likely to detect a keep alive timeout.
Fix this by calculating the RTT of the keep alive command, and adjusting the scheduling delay of the next keep alive work accordingly.
Reported-by: Costa Sapuntzakis costa@purestorage.com Reported-by: Randy Jennings randyj@purestorage.com Signed-off-by: Uday Shankar ushankar@purestorage.com Reviewed-by: Hannes Reinecke hare@suse.de Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/core.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index d87b8dfda622a..8a632bf7f5a8f 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -1199,6 +1199,20 @@ static enum rq_end_io_ret nvme_keep_alive_end_io(struct request *rq, struct nvme_ctrl *ctrl = rq->end_io_data; unsigned long flags; bool startka = false; + unsigned long rtt = jiffies - (rq->deadline - rq->timeout); + unsigned long delay = nvme_keep_alive_work_period(ctrl); + + /* + * Subtract off the keepalive RTT so nvme_keep_alive_work runs + * at the desired frequency. + */ + if (rtt <= delay) { + delay -= rtt; + } else { + dev_warn(ctrl->device, "long keepalive RTT (%u ms)\n", + jiffies_to_msecs(rtt)); + delay = 0; + }
blk_mq_free_request(rq);
@@ -1217,7 +1231,7 @@ static enum rq_end_io_ret nvme_keep_alive_end_io(struct request *rq, startka = true; spin_unlock_irqrestore(&ctrl->lock, flags); if (startka) - nvme_queue_keep_alive_work(ctrl); + queue_delayed_work(nvme_wq, &ctrl->ka_work, delay); return RQ_END_IO_NONE; }
From: Hans de Goede hdegoede@redhat.com
[ Upstream commit 20a99a291d564a559cc2fd013b4824a3bb3f1db7 ]
Some devices have a wrong entry in their button array which points to a GPIO which is required in another driver, so soc_button_array must not claim it.
A specific example of this is the Lenovo Yoga Book X90F / X90L, where the PNP0C40 home button entry points to a GPIO which is not a home button and which is required by the lenovo-yogabook driver.
Add a DMI quirk table which can specify an ACPI GPIO resource index which should be skipped; and add an entry for the Lenovo Yoga Book X90F / X90L to this new DMI quirk table.
Signed-off-by: Hans de Goede hdegoede@redhat.com Link: https://lore.kernel.org/r/20230414072116.4497-1-hdegoede@redhat.com Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/input/misc/soc_button_array.c | 30 +++++++++++++++++++++++++++ 1 file changed, 30 insertions(+)
diff --git a/drivers/input/misc/soc_button_array.c b/drivers/input/misc/soc_button_array.c index 09489380afda7..e79f5497948b8 100644 --- a/drivers/input/misc/soc_button_array.c +++ b/drivers/input/misc/soc_button_array.c @@ -108,6 +108,27 @@ static const struct dmi_system_id dmi_use_low_level_irq[] = { {} /* Terminating entry */ };
+/* + * Some devices have a wrong entry which points to a GPIO which is + * required in another driver, so this driver must not claim it. + */ +static const struct dmi_system_id dmi_invalid_acpi_index[] = { + { + /* + * Lenovo Yoga Book X90F / X90L, the PNP0C40 home button entry + * points to a GPIO which is not a home button and which is + * required by the lenovo-yogabook driver. + */ + .matches = { + DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Intel Corporation"), + DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "CHERRYVIEW D1 PLATFORM"), + DMI_EXACT_MATCH(DMI_PRODUCT_VERSION, "YETI-11"), + }, + .driver_data = (void *)1l, + }, + {} /* Terminating entry */ +}; + /* * Get the Nth GPIO number from the ACPI object. */ @@ -137,6 +158,8 @@ soc_button_device_create(struct platform_device *pdev, struct platform_device *pd; struct gpio_keys_button *gpio_keys; struct gpio_keys_platform_data *gpio_keys_pdata; + const struct dmi_system_id *dmi_id; + int invalid_acpi_index = -1; int error, gpio, irq; int n_buttons = 0;
@@ -154,10 +177,17 @@ soc_button_device_create(struct platform_device *pdev, gpio_keys = (void *)(gpio_keys_pdata + 1); n_buttons = 0;
+ dmi_id = dmi_first_match(dmi_invalid_acpi_index); + if (dmi_id) + invalid_acpi_index = (long)dmi_id->driver_data; + for (info = button_info; info->name; info++) { if (info->autorepeat != autorepeat) continue;
+ if (info->acpi_index == invalid_acpi_index) + continue; + error = soc_button_lookup_gpio(&pdev->dev, info->acpi_index, &gpio, &irq); if (error || irq < 0) { /*
From: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
[ Upstream commit ca8fc6814844d8787e7fec61b2544a871ea8b675 ]
The WCD938x audio codec Soundwire interface part is not a DAI and does not allow sound-dai-cells:
sc7280-idp.dtb: codec@0,4: '#sound-dai-cells' does not match any of the regexes: 'pinctrl-[0-9]+'
Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Reviewed-by: Douglas Anderson dianders@chromium.org Reviewed-by: Konrad Dybcio konrad.dybcio@linaro.org Signed-off-by: Bjorn Andersson andersson@kernel.org Link: https://lore.kernel.org/r/20230220095401.64196-1-krzysztof.kozlowski@linaro.... Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/qcom/sc7280-idp.dtsi | 2 -- 1 file changed, 2 deletions(-)
diff --git a/arch/arm64/boot/dts/qcom/sc7280-idp.dtsi b/arch/arm64/boot/dts/qcom/sc7280-idp.dtsi index 8b5293e7fd2a3..a0e767ff252c5 100644 --- a/arch/arm64/boot/dts/qcom/sc7280-idp.dtsi +++ b/arch/arm64/boot/dts/qcom/sc7280-idp.dtsi @@ -480,7 +480,6 @@ wcd_rx: codec@0,4 { compatible = "sdw20217010d00"; reg = <0 4>; - #sound-dai-cells = <1>; qcom,rx-port-mapping = <1 2 3 4 5>; }; }; @@ -491,7 +490,6 @@ wcd_tx: codec@0,3 { compatible = "sdw20217010d00"; reg = <0 3>; - #sound-dai-cells = <1>; qcom,tx-port-mapping = <1 2 3 4>; }; };
From: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
[ Upstream commit 16bd455d0897d1b8b7a9aee2ed51d75b14a34563 ]
The WCD938x audio codec Soundwire interface part is not a DAI and does not allow sound-dai-cells:
sc7280-herobrine-crd.dtb: codec@0,4: '#sound-dai-cells' does not match any of the regexes: 'pinctrl-[0-9]+'
Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Reviewed-by: Douglas Anderson dianders@chromium.org Reviewed-by: Konrad Dybcio konrad.dybcio@linaro.org Signed-off-by: Bjorn Andersson andersson@kernel.org Link: https://lore.kernel.org/r/20230220095401.64196-2-krzysztof.kozlowski@linaro.... Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/qcom/sc7280-qcard.dtsi | 2 -- 1 file changed, 2 deletions(-)
diff --git a/arch/arm64/boot/dts/qcom/sc7280-qcard.dtsi b/arch/arm64/boot/dts/qcom/sc7280-qcard.dtsi index 88204f794ccb7..1080d701d45a2 100644 --- a/arch/arm64/boot/dts/qcom/sc7280-qcard.dtsi +++ b/arch/arm64/boot/dts/qcom/sc7280-qcard.dtsi @@ -419,7 +419,6 @@ wcd_rx: codec@0,4 { compatible = "sdw20217010d00"; reg = <0 4>; - #sound-dai-cells = <1>; qcom,rx-port-mapping = <1 2 3 4 5>; }; }; @@ -428,7 +427,6 @@ wcd_tx: codec@0,3 { compatible = "sdw20217010d00"; reg = <0 3>; - #sound-dai-cells = <1>; qcom,tx-port-mapping = <1 2 3 4>; }; };
From: Vineeth Vijayan vneethv@linux.ibm.com
[ Upstream commit 89c0c62e947a01e7a36b54582fd9c9e346170255 ]
Currently, if the device is offline and all the channel paths are either configured or varied offline, the associated subchannel gets unregistered. Don't unregister the subchannel, instead unregister offline device.
Signed-off-by: Vineeth Vijayan vneethv@linux.ibm.com Reviewed-by: Peter Oberparleiter oberpar@linux.ibm.com Signed-off-by: Alexander Gordeev agordeev@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/s390/cio/device.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/s390/cio/device.c b/drivers/s390/cio/device.c index d5c43e9b51289..c0d620ffea618 100644 --- a/drivers/s390/cio/device.c +++ b/drivers/s390/cio/device.c @@ -1376,6 +1376,7 @@ void ccw_device_set_notoper(struct ccw_device *cdev) enum io_sch_action { IO_SCH_UNREG, IO_SCH_ORPH_UNREG, + IO_SCH_UNREG_CDEV, IO_SCH_ATTACH, IO_SCH_UNREG_ATTACH, IO_SCH_ORPH_ATTACH, @@ -1408,7 +1409,7 @@ static enum io_sch_action sch_get_action(struct subchannel *sch) } if ((sch->schib.pmcw.pam & sch->opm) == 0) { if (ccw_device_notify(cdev, CIO_NO_PATH) != NOTIFY_OK) - return IO_SCH_UNREG; + return IO_SCH_UNREG_CDEV; return IO_SCH_DISC; } if (device_is_disconnected(cdev)) @@ -1470,6 +1471,7 @@ static int io_subchannel_sch_event(struct subchannel *sch, int process) case IO_SCH_ORPH_ATTACH: ccw_device_set_disconnected(cdev); break; + case IO_SCH_UNREG_CDEV: case IO_SCH_UNREG_ATTACH: case IO_SCH_UNREG: if (!cdev) @@ -1503,6 +1505,7 @@ static int io_subchannel_sch_event(struct subchannel *sch, int process) if (rc) goto out; break; + case IO_SCH_UNREG_CDEV: case IO_SCH_UNREG_ATTACH: spin_lock_irqsave(sch->lock, flags); sch_set_cdev(sch, NULL);
From: Clark Wang xiaoning.wang@nxp.com
[ Upstream commit 9728fb3ce11729aa8c276825ddf504edeb00611d ]
When all bits of IER are set to 0, we still can observe the lpspi irq events when using DMA mode to transfer data.
So disable irq to avoid the too much irq events.
Signed-off-by: Clark Wang xiaoning.wang@nxp.com Link: https://lore.kernel.org/r/20230505063557.3962220-1-xiaoning.wang@nxp.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-fsl-lpspi.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/spi/spi-fsl-lpspi.c b/drivers/spi/spi-fsl-lpspi.c index 34488de555871..457fe6bc7e41e 100644 --- a/drivers/spi/spi-fsl-lpspi.c +++ b/drivers/spi/spi-fsl-lpspi.c @@ -910,9 +910,14 @@ static int fsl_lpspi_probe(struct platform_device *pdev) ret = fsl_lpspi_dma_init(&pdev->dev, fsl_lpspi, controller); if (ret == -EPROBE_DEFER) goto out_pm_get; - if (ret < 0) dev_err(&pdev->dev, "dma setup error %d, use pio\n", ret); + else + /* + * disable LPSPI module IRQ when enable DMA mode successfully, + * to prevent the unexpected LPSPI module IRQ events. + */ + disable_irq(irq);
ret = devm_spi_register_controller(&pdev->dev, controller); if (ret < 0) {
From: Srinivas Kandagatla srinivas.kandagatla@linaro.org
[ Upstream commit 2d7c2f9272de6347a9cec0fc07708913692c0ae3 ]
regmap-sdw does not support multi register writes, so there is no point in setting this flag. This also leads to incorrect programming of WSA codecs with regmap_multi_reg_write() call.
This invalid configuration should have been rejected by regmap-sdw.
Signed-off-by: Srinivas Kandagatla srinivas.kandagatla@linaro.org Link: https://lore.kernel.org/r/20230523165414.14560-1-srinivas.kandagatla@linaro.... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/wcd938x-sdw.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/sound/soc/codecs/wcd938x-sdw.c b/sound/soc/codecs/wcd938x-sdw.c index 402286dfaea44..9c10200ff34b2 100644 --- a/sound/soc/codecs/wcd938x-sdw.c +++ b/sound/soc/codecs/wcd938x-sdw.c @@ -1190,7 +1190,6 @@ static const struct regmap_config wcd938x_regmap_config = { .readable_reg = wcd938x_readable_register, .writeable_reg = wcd938x_writeable_register, .volatile_reg = wcd938x_volatile_register, - .can_multi_write = true, };
static const struct sdw_slave_ops wcd9380_slave_ops = {
From: Herve Codina herve.codina@bootlin.com
[ Upstream commit 8938f75a5e35c597a647c28984a0304da7a33d63 ]
In the error path, a of_node_put() for platform is missing. Just add it.
Signed-off-by: Herve Codina herve.codina@bootlin.com Acked-by: Kuninori Morimoto kuninori.morimoto.gx@renesas.com Link: https://lore.kernel.org/r/20230523151223.109551-9-herve.codina@bootlin.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/generic/simple-card.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/sound/soc/generic/simple-card.c b/sound/soc/generic/simple-card.c index e98932c167542..5f8468ff36562 100644 --- a/sound/soc/generic/simple-card.c +++ b/sound/soc/generic/simple-card.c @@ -416,6 +416,7 @@ static int __simple_for_each_link(struct asoc_simple_priv *priv,
if (ret < 0) { of_node_put(codec); + of_node_put(plat); of_node_put(np); goto error; }
From: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
[ Upstream commit 700581ede41d029403feec935df4616309696fd7 ]
A BIOS/DMI update seems to have broken some devices, let's add a new mapping.
Link: https://github.com/thesofproject/linux/issues/4323 Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Reviewed-by: Rander Wang rander.wang@intel.com Signed-off-by: Bard Liao yung-chuan.liao@linux.intel.com Link: https://lore.kernel.org/r/20230515074859.3097-1-yung-chuan.liao@linux.intel.... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/soundwire/dmi-quirks.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/drivers/soundwire/dmi-quirks.c b/drivers/soundwire/dmi-quirks.c index 58ea013fa918a..2a1096dab63d3 100644 --- a/drivers/soundwire/dmi-quirks.c +++ b/drivers/soundwire/dmi-quirks.c @@ -99,6 +99,13 @@ static const struct dmi_system_id adr_remap_quirk_table[] = { }, .driver_data = (void *)intel_tgl_bios, }, + { + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "HP"), + DMI_MATCH(DMI_BOARD_NAME, "8709"), + }, + .driver_data = (void *)intel_tgl_bios, + }, { /* quirk used for NUC15 'Bishop County' LAPBC510 and LAPBC710 skews */ .matches = {
From: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
[ Upstream commit 99e09b9c0ab43346c52f2787ca4e5c4b1798362e ]
Reverse actions in qcom_swrm_startup() error paths to avoid leaking stream memory and keeping runtime PM unbalanced.
Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Reviewed-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Link: https://lore.kernel.org/r/20230517163736.997553-1-krzysztof.kozlowski@linaro... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/soundwire/qcom.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-)
diff --git a/drivers/soundwire/qcom.c b/drivers/soundwire/qcom.c index 30575ed20947e..0dcdbd4e1ec3a 100644 --- a/drivers/soundwire/qcom.c +++ b/drivers/soundwire/qcom.c @@ -1098,8 +1098,10 @@ static int qcom_swrm_startup(struct snd_pcm_substream *substream, }
sruntime = sdw_alloc_stream(dai->name); - if (!sruntime) - return -ENOMEM; + if (!sruntime) { + ret = -ENOMEM; + goto err_alloc; + }
ctrl->sruntime[dai->id] = sruntime;
@@ -1109,12 +1111,19 @@ static int qcom_swrm_startup(struct snd_pcm_substream *substream, if (ret < 0 && ret != -ENOTSUPP) { dev_err(dai->dev, "Failed to set sdw stream on %s\n", codec_dai->name); - sdw_release_stream(sruntime); - return ret; + goto err_set_stream; } }
return 0; + +err_set_stream: + sdw_release_stream(sruntime); +err_alloc: + pm_runtime_mark_last_busy(ctrl->dev); + pm_runtime_put_autosuspend(ctrl->dev); + + return ret; }
static void qcom_swrm_shutdown(struct snd_pcm_substream *substream,
From: Hao Yao hao.yao@intel.com
[ Upstream commit fb109fba728407fa4a84d659b5cb87cd8399d7b3 ]
When int3472 is loaded before GPIO driver, acpi_get_and_request_gpiod() failed but the returned gpio descriptor is not NULL, it will cause panic in later gpiod_put(), so set the gpio_desc to NULL in register error handling to avoid such crash.
Signed-off-by: Hao Yao hao.yao@intel.com Signed-off-by: Bingbu Cao bingbu.cao@intel.com Link: https://lore.kernel.org/r/20230524035135.90315-1-bingbu.cao@intel.com Reviewed-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../platform/x86/intel/int3472/clk_and_regulator.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/drivers/platform/x86/intel/int3472/clk_and_regulator.c b/drivers/platform/x86/intel/int3472/clk_and_regulator.c index 1086c3d834945..399f0623ca1b5 100644 --- a/drivers/platform/x86/intel/int3472/clk_and_regulator.c +++ b/drivers/platform/x86/intel/int3472/clk_and_regulator.c @@ -101,9 +101,11 @@ int skl_int3472_register_clock(struct int3472_discrete_device *int3472,
int3472->clock.ena_gpio = acpi_get_and_request_gpiod(path, agpio->pin_table[0], "int3472,clk-enable"); - if (IS_ERR(int3472->clock.ena_gpio)) - return dev_err_probe(int3472->dev, PTR_ERR(int3472->clock.ena_gpio), - "getting clk-enable GPIO\n"); + if (IS_ERR(int3472->clock.ena_gpio)) { + ret = PTR_ERR(int3472->clock.ena_gpio); + int3472->clock.ena_gpio = NULL; + return dev_err_probe(int3472->dev, ret, "getting clk-enable GPIO\n"); + }
if (polarity == GPIO_ACTIVE_LOW) gpiod_toggle_active_low(int3472->clock.ena_gpio); @@ -199,8 +201,9 @@ int skl_int3472_register_regulator(struct int3472_discrete_device *int3472, int3472->regulator.gpio = acpi_get_and_request_gpiod(path, agpio->pin_table[0], "int3472,regulator"); if (IS_ERR(int3472->regulator.gpio)) { - dev_err(int3472->dev, "Failed to get regulator GPIO line\n"); - return PTR_ERR(int3472->regulator.gpio); + ret = PTR_ERR(int3472->regulator.gpio); + int3472->regulator.gpio = NULL; + return dev_err_probe(int3472->dev, ret, "getting regulator GPIO\n"); }
/* Ensure the pin is in output mode and non-active state */
From: Edson Juliano Drosdeck edson.drosdeck@gmail.com
[ Upstream commit e384dba03e3294ce7ea69e4da558e9bf8f0e8946 ]
Add entries for Positivo laptops: CW14Q01P, K1424G, N14ZP74G to the DMI table, so that active-high jack-detect will work properly on these laptops.
Signed-off-by: Edson Juliano Drosdeck edson.drosdeck@gmail.com Link: https://lore.kernel.org/r/20230529181911.632851-1-edson.drosdeck@gmail.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/nau8824.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+)
diff --git a/sound/soc/codecs/nau8824.c b/sound/soc/codecs/nau8824.c index 4f19fd9b65d11..5a4db8944d06a 100644 --- a/sound/soc/codecs/nau8824.c +++ b/sound/soc/codecs/nau8824.c @@ -1903,6 +1903,30 @@ static const struct dmi_system_id nau8824_quirk_table[] = { }, .driver_data = (void *)(NAU8824_MONO_SPEAKER), }, + { + /* Positivo CW14Q01P */ + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Positivo Tecnologia SA"), + DMI_MATCH(DMI_BOARD_NAME, "CW14Q01P"), + }, + .driver_data = (void *)(NAU8824_JD_ACTIVE_HIGH), + }, + { + /* Positivo K1424G */ + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Positivo Tecnologia SA"), + DMI_MATCH(DMI_BOARD_NAME, "K1424G"), + }, + .driver_data = (void *)(NAU8824_JD_ACTIVE_HIGH), + }, + { + /* Positivo N14ZP74G */ + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Positivo Tecnologia SA"), + DMI_MATCH(DMI_BOARD_NAME, "N14ZP74G"), + }, + .driver_data = (void *)(NAU8824_JD_ACTIVE_HIGH), + }, {} };
From: Sicong Jiang kevin.jiangsc@gmail.com
[ Upstream commit 57d1e8900495cf1751cec74db16fe1a0fe47efbb ]
Thinkpad Neo14 Ryzen Edition uses Ryzen 6800H processor, and adding to quirks list for acp6x will enable internal mic.
Signed-off-by: Sicong Jiang kevin.jiangsc@gmail.com Link: https://lore.kernel.org/r/20230531090635.89565-1-kevin.jiangsc@gmail.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/amd/yc/acp6x-mach.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/sound/soc/amd/yc/acp6x-mach.c b/sound/soc/amd/yc/acp6x-mach.c index 84b401b685f7f..c1ca3ceac5f2f 100644 --- a/sound/soc/amd/yc/acp6x-mach.c +++ b/sound/soc/amd/yc/acp6x-mach.c @@ -171,6 +171,13 @@ static const struct dmi_system_id yc_acp_quirk_table[] = { DMI_MATCH(DMI_PRODUCT_NAME, "21CL"), } }, + { + .driver_data = &acp6x_card, + .matches = { + DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_NAME, "21EF"), + } + }, { .driver_data = &acp6x_card, .matches = {
From: Andreas Gruenbacher agruenba@redhat.com
[ Upstream commit fa58cc888d67e640e354d8b3ceef877ea167b0cf ]
When a direct I/O write is performed, iomap_dio_rw() invalidates the part of the page cache which the write is going to before carrying out the write. In the odd case, the direct I/O write will be reading from the same page it is writing to. gfs2 carries out writes with page faults disabled, so it should have been obvious that this page invalidation can cause iomap_dio_rw() to never make any progress. Currently, gfs2 will end up in an endless retry loop in gfs2_file_direct_write() instead, though.
Break this endless loop by limiting the number of retries and falling back to buffered I/O after that.
Also simplify should_fault_in_pages() sightly and add a comment to make the above case easier to understand.
Reported-by: Jan Kara jack@suse.cz Signed-off-by: Andreas Gruenbacher agruenba@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/gfs2/file.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c index 300844f50dcd2..cb62c8f07d1e7 100644 --- a/fs/gfs2/file.c +++ b/fs/gfs2/file.c @@ -784,9 +784,13 @@ static inline bool should_fault_in_pages(struct iov_iter *i, if (!user_backed_iter(i)) return false;
+ /* + * Try to fault in multiple pages initially. When that doesn't result + * in any progress, fall back to a single page. + */ size = PAGE_SIZE; offs = offset_in_page(iocb->ki_pos); - if (*prev_count != count || !*window_size) { + if (*prev_count != count) { size_t nr_dirtied;
nr_dirtied = max(current->nr_dirtied_pause - @@ -870,6 +874,7 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from, struct gfs2_inode *ip = GFS2_I(inode); size_t prev_count = 0, window_size = 0; size_t written = 0; + bool enough_retries; ssize_t ret;
/* @@ -913,11 +918,17 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from, if (ret > 0) written = ret;
+ enough_retries = prev_count == iov_iter_count(from) && + window_size <= PAGE_SIZE; if (should_fault_in_pages(from, iocb, &prev_count, &window_size)) { gfs2_glock_dq(gh); window_size -= fault_in_iov_iter_readable(from, window_size); - if (window_size) - goto retry; + if (window_size) { + if (!enough_retries) + goto retry; + /* fall back to buffered I/O */ + ret = 0; + } } out_unlock: if (gfs2_holder_queued(gh))
From: Alexander Gordeev agordeev@linux.ibm.com
[ Upstream commit 03c5c83b70dca3729a3eb488e668e5044bd9a5ea ]
Avoid linker error for randomly generated config file that has CONFIG_BRANCH_PROFILE_NONE enabled and make it similar to riscv, x86 and also to commit 4bf3ec384edf ("s390: disable branch profiling for vdso").
Reviewed-by: Vasily Gorbik gor@linux.ibm.com Signed-off-by: Alexander Gordeev agordeev@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/s390/purgatory/Makefile | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/s390/purgatory/Makefile b/arch/s390/purgatory/Makefile index 32573b4f9bd20..cc8cf5abea158 100644 --- a/arch/s390/purgatory/Makefile +++ b/arch/s390/purgatory/Makefile @@ -26,6 +26,7 @@ KBUILD_CFLAGS += -Wno-pointer-sign -Wno-sign-compare KBUILD_CFLAGS += -fno-zero-initialized-in-bss -fno-builtin -ffreestanding KBUILD_CFLAGS += -Os -m64 -msoft-float -fno-common KBUILD_CFLAGS += -fno-stack-protector +KBUILD_CFLAGS += -DDISABLE_BRANCH_PROFILING KBUILD_CFLAGS += $(CLANG_FLAGS) KBUILD_CFLAGS += $(call cc-option,-fno-PIE) KBUILD_AFLAGS := $(filter-out -DCC_USING_EXPOLINE,$(KBUILD_AFLAGS))
From: Chancel Liu chancel.liu@nxp.com
[ Upstream commit 32cf0046a652116d6a216d575f3049a9ff9dd80d ]
There's an issue on SAI synchronous mode that TX/RX side can't get BCLK from RX/TX it sync with if BYP bit is asserted. It's a workaround to fix it that enable SION of IOMUX pad control and assert BCI.
For example if TX sync with RX which means both TX and RX are using clk form RX and BYP=1. TX can get BCLK only if the following two conditions are valid: 1. SION of RX BCLK IOMUX pad is set to 1 2. BCI of TX is set to 1
Signed-off-by: Chancel Liu chancel.liu@nxp.com Acked-by: Shengjiu Wang shengjiu.wang@gmail.com Link: https://lore.kernel.org/r/20230530103012.3448838-1-chancel.liu@nxp.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/fsl/fsl_sai.c | 11 +++++++++-- sound/soc/fsl/fsl_sai.h | 1 + 2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/sound/soc/fsl/fsl_sai.c b/sound/soc/fsl/fsl_sai.c index 990bba0be1fb1..8a64f3c1d1556 100644 --- a/sound/soc/fsl/fsl_sai.c +++ b/sound/soc/fsl/fsl_sai.c @@ -491,14 +491,21 @@ static int fsl_sai_set_bclk(struct snd_soc_dai *dai, bool tx, u32 freq) regmap_update_bits(sai->regmap, reg, FSL_SAI_CR2_MSEL_MASK, FSL_SAI_CR2_MSEL(sai->mclk_id[tx]));
- if (savediv == 1) + if (savediv == 1) { regmap_update_bits(sai->regmap, reg, FSL_SAI_CR2_DIV_MASK | FSL_SAI_CR2_BYP, FSL_SAI_CR2_BYP); - else + if (fsl_sai_dir_is_synced(sai, adir)) + regmap_update_bits(sai->regmap, FSL_SAI_xCR2(tx, ofs), + FSL_SAI_CR2_BCI, FSL_SAI_CR2_BCI); + else + regmap_update_bits(sai->regmap, FSL_SAI_xCR2(tx, ofs), + FSL_SAI_CR2_BCI, 0); + } else { regmap_update_bits(sai->regmap, reg, FSL_SAI_CR2_DIV_MASK | FSL_SAI_CR2_BYP, savediv / 2 - 1); + }
if (sai->soc_data->max_register >= FSL_SAI_MCTL) { /* SAI is in master mode at this point, so enable MCLK */ diff --git a/sound/soc/fsl/fsl_sai.h b/sound/soc/fsl/fsl_sai.h index 197748a888d5f..a53c4f0e25faf 100644 --- a/sound/soc/fsl/fsl_sai.h +++ b/sound/soc/fsl/fsl_sai.h @@ -116,6 +116,7 @@
/* SAI Transmit and Receive Configuration 2 Register */ #define FSL_SAI_CR2_SYNC BIT(30) +#define FSL_SAI_CR2_BCI BIT(28) #define FSL_SAI_CR2_MSEL_MASK (0x3 << 26) #define FSL_SAI_CR2_MSEL_BUS 0 #define FSL_SAI_CR2_MSEL_MCLK1 BIT(26)
From: Min-Hua Chen minhuadotchen@gmail.com
[ Upstream commit 8cde87b007dad2e461015ff70352af56ceb02c75 ]
This patch fixes the following sparse warning:
net/sched/sch_api.c:2305:1: sparse: warning: symbol 'tc_skip_wrapper' was not declared. Should it be static?
No functional change intended.
Signed-off-by: Min-Hua Chen minhuadotchen@gmail.com Acked-by: Pedro Tammela pctammela@mojatatu.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/sch_api.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index 3f7311529cc00..34c90c9c2fcad 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -2324,7 +2324,9 @@ static struct pernet_operations psched_net_ops = { .exit = psched_net_exit, };
+#if IS_ENABLED(CONFIG_RETPOLINE) DEFINE_STATIC_KEY_FALSE(tc_skip_wrapper); +#endif
static int __init pktsched_init(void) {
From: Sayed, Karimuddin karimuddin.sayed@intel.com
[ Upstream commit 1a93f10c5b12bd766a537b24a50fca5373467303 ]
Add "Intel Reference boad" and "Intel NUC 13" SSID in the alc256. Enable jack headset volume buttons
Reviewed-by: Kai Vehmanen kai.vehmanen@linux.intel.com Signed-off-by: Sayed, Karimuddin karimuddin.sayed@intel.com Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Link: https://lore.kernel.org/r/20230602193812.66768-1-pierre-louis.bossart@linux.... Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/pci/hda/patch_realtek.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 920e44ba998a5..eb049014f87ac 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -9594,6 +9594,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x10ec, 0x124c, "Intel Reference board", ALC295_FIXUP_CHROME_BOOK), SND_PCI_QUIRK(0x10ec, 0x1252, "Intel Reference board", ALC295_FIXUP_CHROME_BOOK), SND_PCI_QUIRK(0x10ec, 0x1254, "Intel Reference board", ALC295_FIXUP_CHROME_BOOK), + SND_PCI_QUIRK(0x10ec, 0x12cc, "Intel Reference board", ALC225_FIXUP_HEADSET_JACK), SND_PCI_QUIRK(0x10f7, 0x8338, "Panasonic CF-SZ6", ALC269_FIXUP_HEADSET_MODE), SND_PCI_QUIRK(0x144d, 0xc109, "Samsung Ativ book 9 (NP900X3G)", ALC269_FIXUP_INV_DMIC), SND_PCI_QUIRK(0x144d, 0xc169, "Samsung Notebook 9 Pen (NP930SBE-K01US)", ALC298_FIXUP_SAMSUNG_AMP), @@ -9814,6 +9815,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x8086, 0x2074, "Intel NUC 8", ALC233_FIXUP_INTEL_NUC8_DMIC), SND_PCI_QUIRK(0x8086, 0x2080, "Intel NUC 8 Rugged", ALC256_FIXUP_INTEL_NUC8_RUGGED), SND_PCI_QUIRK(0x8086, 0x2081, "Intel NUC 10", ALC256_FIXUP_INTEL_NUC10), + SND_PCI_QUIRK(0x8086, 0x3038, "Intel NUC 13", ALC225_FIXUP_HEADSET_JACK), SND_PCI_QUIRK(0xf111, 0x0001, "Framework Laptop", ALC295_FIXUP_FRAMEWORK_LAPTOP_MIC_NO_PRESENCE),
#if 0
From: Simon Horman horms@kernel.org
[ Upstream commit 7ebfd881abe9e0ea9557b29dab6aa28d294fabb4 ]
Rather than casting pci1xxxx_i2c_shutdown to an incompatible function type, update the type to match that expected by __devm_add_action.
Reported by clang-16 with W-1:
.../i2c-mchp-pci1xxxx.c:1159:29: error: cast from 'void (*)(struct pci1xxxx_i2c *)' to 'void (*)(void *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict] ret = devm_add_action(dev, (void (*)(void *))pci1xxxx_i2c_shutdown, i2c); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ./include/linux/device.h:251:29: note: expanded from macro 'devm_add_action' __devm_add_action(release, action, data, #action) ^~~~~~
No functional change intended. Compile tested only.
Signed-off-by: Simon Horman horms@kernel.org Reviewed-by: Horatiu Vultur horatiu.vultur@microchip.com Reviewed-by: Andi Shyti andi.shyti@kernel.org Reviewed-by: Tharun Kumar Ptharunkumar.pasumarthi@microchip.com Signed-off-by: Wolfram Sang wsa@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/busses/i2c-mchp-pci1xxxx.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/i2c/busses/i2c-mchp-pci1xxxx.c b/drivers/i2c/busses/i2c-mchp-pci1xxxx.c index b21ffd6df9276..5ef136c3ecb12 100644 --- a/drivers/i2c/busses/i2c-mchp-pci1xxxx.c +++ b/drivers/i2c/busses/i2c-mchp-pci1xxxx.c @@ -1118,8 +1118,10 @@ static int pci1xxxx_i2c_resume(struct device *dev) static DEFINE_SIMPLE_DEV_PM_OPS(pci1xxxx_i2c_pm_ops, pci1xxxx_i2c_suspend, pci1xxxx_i2c_resume);
-static void pci1xxxx_i2c_shutdown(struct pci1xxxx_i2c *i2c) +static void pci1xxxx_i2c_shutdown(void *data) { + struct pci1xxxx_i2c *i2c = data; + pci1xxxx_i2c_config_padctrl(i2c, false); pci1xxxx_i2c_configure_core_reg(i2c, false); } @@ -1156,7 +1158,7 @@ static int pci1xxxx_i2c_probe_pci(struct pci_dev *pdev, init_completion(&i2c->i2c_xfer_done); pci1xxxx_i2c_init(i2c);
- ret = devm_add_action(dev, (void (*)(void *))pci1xxxx_i2c_shutdown, i2c); + ret = devm_add_action(dev, pci1xxxx_i2c_shutdown, i2c); if (ret) return ret;
From: David Zheng david.zheng@intel.com
[ Upstream commit 1acfc6e753ed978b36d722f54e57fe4d1e8a6ffa ]
With IC_INTR_RX_FULL slave interrupt handler reads data in a loop until RX FIFO is empty. When testing with the slave-eeprom, each transaction has 2 bytes for address/index and 1 byte for value, the address byte can be written as data byte due to dropping STOP condition.
In the test below, the master continuously writes to the slave, first 2 bytes are index, 3rd byte is value and follow by a STOP condition.
i2c_write: i2c-3 #0 a=04b f=0000 l=3 [00-D1-D1] i2c_write: i2c-3 #0 a=04b f=0000 l=3 [00-D2-D2] i2c_write: i2c-3 #0 a=04b f=0000 l=3 [00-D3-D3]
Upon receiving STOP condition slave eeprom would reset `idx_write_cnt` so next 2 bytes can be treated as buffer index for upcoming transaction. Supposedly the slave eeprom buffer would be written as
EEPROM[0x00D1] = 0xD1 EEPROM[0x00D2] = 0xD2 EEPROM[0x00D3] = 0xD3
When CPU load is high the slave irq handler may not read fast enough, the interrupt status can be seen as 0x204 with both DW_IC_INTR_STOP_DET (0x200) and DW_IC_INTR_RX_FULL (0x4) bits. The slave device may see the transactions below.
0x1 STATUS SLAVE_ACTIVITY=0x1 : RAW_INTR_STAT=0x1594 : INTR_STAT=0x4 0x1 STATUS SLAVE_ACTIVITY=0x1 : RAW_INTR_STAT=0x1594 : INTR_STAT=0x4 0x1 STATUS SLAVE_ACTIVITY=0x1 : RAW_INTR_STAT=0x1594 : INTR_STAT=0x4 0x1 STATUS SLAVE_ACTIVITY=0x1 : RAW_INTR_STAT=0x1794 : INTR_STAT=0x204 0x1 STATUS SLAVE_ACTIVITY=0x0 : RAW_INTR_STAT=0x1790 : INTR_STAT=0x200 0x1 STATUS SLAVE_ACTIVITY=0x1 : RAW_INTR_STAT=0x1594 : INTR_STAT=0x4 0x1 STATUS SLAVE_ACTIVITY=0x1 : RAW_INTR_STAT=0x1594 : INTR_STAT=0x4 0x1 STATUS SLAVE_ACTIVITY=0x1 : RAW_INTR_STAT=0x1594 : INTR_STAT=0x4
After `D1` is received, read loop continues to read `00` which is the first bype of next index. Since STOP condition is ignored by the loop, eeprom buffer index increased to `D2` and `00` is written as value.
So the slave eeprom buffer becomes
EEPROM[0x00D1] = 0xD1 EEPROM[0x00D2] = 0x00 EEPROM[0x00D3] = 0xD3
The fix is to use `FIRST_DATA_BYTE` (bit 11) in `IC_DATA_CMD` to split the transactions. The first index byte in this case would have bit 11 set. Check this indication to inject I2C_SLAVE_WRITE_REQUESTED event which will reset `idx_write_cnt` in slave eeprom.
Signed-off-by: David Zheng david.zheng@intel.com Acked-by: Jarkko Nikula jarkko.nikula@linux.intel.com Signed-off-by: Wolfram Sang wsa@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/busses/i2c-designware-core.h | 1 + drivers/i2c/busses/i2c-designware-slave.c | 4 ++++ 2 files changed, 5 insertions(+)
diff --git a/drivers/i2c/busses/i2c-designware-core.h b/drivers/i2c/busses/i2c-designware-core.h index 050d8c63ad3c5..0164d92163308 100644 --- a/drivers/i2c/busses/i2c-designware-core.h +++ b/drivers/i2c/busses/i2c-designware-core.h @@ -40,6 +40,7 @@ #define DW_IC_CON_BUS_CLEAR_CTRL BIT(11)
#define DW_IC_DATA_CMD_DAT GENMASK(7, 0) +#define DW_IC_DATA_CMD_FIRST_DATA_BYTE BIT(11)
/* * Registers offset diff --git a/drivers/i2c/busses/i2c-designware-slave.c b/drivers/i2c/busses/i2c-designware-slave.c index cec25054bb244..2e079cf20bb5b 100644 --- a/drivers/i2c/busses/i2c-designware-slave.c +++ b/drivers/i2c/busses/i2c-designware-slave.c @@ -176,6 +176,10 @@ static irqreturn_t i2c_dw_isr_slave(int this_irq, void *dev_id)
do { regmap_read(dev->map, DW_IC_DATA_CMD, &tmp); + if (tmp & DW_IC_DATA_CMD_FIRST_DATA_BYTE) + i2c_slave_event(dev->slave, + I2C_SLAVE_WRITE_REQUESTED, + &val); val = tmp; i2c_slave_event(dev->slave, I2C_SLAVE_WRITE_RECEIVED, &val);
From: Linus Walleij linus.walleij@linaro.org
[ Upstream commit 4a672d500bfd6bb87092c33d5a2572c3d0a1cf83 ]
Several device tree files get the polarity of the pendown-gpios wrong: this signal is active low. Fix up all incorrect flags, so that operating systems can rely on the flag being correctly set.
Signed-off-by: Linus Walleij linus.walleij@linaro.org Link: https://lore.kernel.org/r/20230510105156.1134320-1-linus.walleij@linaro.org Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/boot/dts/am57xx-cl-som-am57x.dts | 2 +- arch/arm/boot/dts/at91sam9261ek.dts | 2 +- arch/arm/boot/dts/imx7d-pico-hobbit.dts | 2 +- arch/arm/boot/dts/imx7d-sdb.dts | 2 +- arch/arm/boot/dts/omap3-cm-t3x.dtsi | 2 +- arch/arm/boot/dts/omap3-devkit8000-lcd-common.dtsi | 2 +- arch/arm/boot/dts/omap3-lilly-a83x.dtsi | 2 +- arch/arm/boot/dts/omap3-overo-common-lcd35.dtsi | 2 +- arch/arm/boot/dts/omap3-overo-common-lcd43.dtsi | 2 +- arch/arm/boot/dts/omap3-pandora-common.dtsi | 2 +- arch/arm/boot/dts/omap5-cm-t54.dts | 2 +- 11 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/arch/arm/boot/dts/am57xx-cl-som-am57x.dts b/arch/arm/boot/dts/am57xx-cl-som-am57x.dts index 2fc9a5d5e0c0d..625b9b311b49d 100644 --- a/arch/arm/boot/dts/am57xx-cl-som-am57x.dts +++ b/arch/arm/boot/dts/am57xx-cl-som-am57x.dts @@ -527,7 +527,7 @@
interrupt-parent = <&gpio1>; interrupts = <31 0>; - pendown-gpio = <&gpio1 31 0>; + pendown-gpio = <&gpio1 31 GPIO_ACTIVE_LOW>;
ti,x-min = /bits/ 16 <0x0>; diff --git a/arch/arm/boot/dts/at91sam9261ek.dts b/arch/arm/boot/dts/at91sam9261ek.dts index 88869ca874d1a..045cb253f23a6 100644 --- a/arch/arm/boot/dts/at91sam9261ek.dts +++ b/arch/arm/boot/dts/at91sam9261ek.dts @@ -156,7 +156,7 @@ compatible = "ti,ads7843"; interrupts-extended = <&pioC 2 IRQ_TYPE_EDGE_BOTH>; spi-max-frequency = <3000000>; - pendown-gpio = <&pioC 2 GPIO_ACTIVE_HIGH>; + pendown-gpio = <&pioC 2 GPIO_ACTIVE_LOW>;
ti,x-min = /bits/ 16 <150>; ti,x-max = /bits/ 16 <3830>; diff --git a/arch/arm/boot/dts/imx7d-pico-hobbit.dts b/arch/arm/boot/dts/imx7d-pico-hobbit.dts index d917dc4f2f227..6ad39dca70096 100644 --- a/arch/arm/boot/dts/imx7d-pico-hobbit.dts +++ b/arch/arm/boot/dts/imx7d-pico-hobbit.dts @@ -64,7 +64,7 @@ interrupt-parent = <&gpio2>; interrupts = <7 0>; spi-max-frequency = <1000000>; - pendown-gpio = <&gpio2 7 0>; + pendown-gpio = <&gpio2 7 GPIO_ACTIVE_LOW>; vcc-supply = <®_3p3v>; ti,x-min = /bits/ 16 <0>; ti,x-max = /bits/ 16 <4095>; diff --git a/arch/arm/boot/dts/imx7d-sdb.dts b/arch/arm/boot/dts/imx7d-sdb.dts index f483bc0afe5ea..234e5fc647b22 100644 --- a/arch/arm/boot/dts/imx7d-sdb.dts +++ b/arch/arm/boot/dts/imx7d-sdb.dts @@ -205,7 +205,7 @@ pinctrl-0 = <&pinctrl_tsc2046_pendown>; interrupt-parent = <&gpio2>; interrupts = <29 0>; - pendown-gpio = <&gpio2 29 GPIO_ACTIVE_HIGH>; + pendown-gpio = <&gpio2 29 GPIO_ACTIVE_LOW>; touchscreen-max-pressure = <255>; wakeup-source; }; diff --git a/arch/arm/boot/dts/omap3-cm-t3x.dtsi b/arch/arm/boot/dts/omap3-cm-t3x.dtsi index e61b8a2bfb7de..51baedf1603bd 100644 --- a/arch/arm/boot/dts/omap3-cm-t3x.dtsi +++ b/arch/arm/boot/dts/omap3-cm-t3x.dtsi @@ -227,7 +227,7 @@
interrupt-parent = <&gpio2>; interrupts = <25 0>; /* gpio_57 */ - pendown-gpio = <&gpio2 25 GPIO_ACTIVE_HIGH>; + pendown-gpio = <&gpio2 25 GPIO_ACTIVE_LOW>;
ti,x-min = /bits/ 16 <0x0>; ti,x-max = /bits/ 16 <0x0fff>; diff --git a/arch/arm/boot/dts/omap3-devkit8000-lcd-common.dtsi b/arch/arm/boot/dts/omap3-devkit8000-lcd-common.dtsi index 3decc2d78a6ca..a7f99ae0c1fe9 100644 --- a/arch/arm/boot/dts/omap3-devkit8000-lcd-common.dtsi +++ b/arch/arm/boot/dts/omap3-devkit8000-lcd-common.dtsi @@ -54,7 +54,7 @@
interrupt-parent = <&gpio1>; interrupts = <27 0>; /* gpio_27 */ - pendown-gpio = <&gpio1 27 GPIO_ACTIVE_HIGH>; + pendown-gpio = <&gpio1 27 GPIO_ACTIVE_LOW>;
ti,x-min = /bits/ 16 <0x0>; ti,x-max = /bits/ 16 <0x0fff>; diff --git a/arch/arm/boot/dts/omap3-lilly-a83x.dtsi b/arch/arm/boot/dts/omap3-lilly-a83x.dtsi index c595afe4181d7..d310b5c7bac36 100644 --- a/arch/arm/boot/dts/omap3-lilly-a83x.dtsi +++ b/arch/arm/boot/dts/omap3-lilly-a83x.dtsi @@ -311,7 +311,7 @@ interrupt-parent = <&gpio1>; interrupts = <8 0>; /* boot6 / gpio_8 */ spi-max-frequency = <1000000>; - pendown-gpio = <&gpio1 8 GPIO_ACTIVE_HIGH>; + pendown-gpio = <&gpio1 8 GPIO_ACTIVE_LOW>; vcc-supply = <®_vcc3>; pinctrl-names = "default"; pinctrl-0 = <&tsc2048_pins>; diff --git a/arch/arm/boot/dts/omap3-overo-common-lcd35.dtsi b/arch/arm/boot/dts/omap3-overo-common-lcd35.dtsi index 1d6e88f99eb31..c3570acc35fad 100644 --- a/arch/arm/boot/dts/omap3-overo-common-lcd35.dtsi +++ b/arch/arm/boot/dts/omap3-overo-common-lcd35.dtsi @@ -149,7 +149,7 @@
interrupt-parent = <&gpio4>; interrupts = <18 0>; /* gpio_114 */ - pendown-gpio = <&gpio4 18 GPIO_ACTIVE_HIGH>; + pendown-gpio = <&gpio4 18 GPIO_ACTIVE_LOW>;
ti,x-min = /bits/ 16 <0x0>; ti,x-max = /bits/ 16 <0x0fff>; diff --git a/arch/arm/boot/dts/omap3-overo-common-lcd43.dtsi b/arch/arm/boot/dts/omap3-overo-common-lcd43.dtsi index 7e30f9d45790e..d95a0e130058c 100644 --- a/arch/arm/boot/dts/omap3-overo-common-lcd43.dtsi +++ b/arch/arm/boot/dts/omap3-overo-common-lcd43.dtsi @@ -160,7 +160,7 @@
interrupt-parent = <&gpio4>; interrupts = <18 0>; /* gpio_114 */ - pendown-gpio = <&gpio4 18 GPIO_ACTIVE_HIGH>; + pendown-gpio = <&gpio4 18 GPIO_ACTIVE_LOW>;
ti,x-min = /bits/ 16 <0x0>; ti,x-max = /bits/ 16 <0x0fff>; diff --git a/arch/arm/boot/dts/omap3-pandora-common.dtsi b/arch/arm/boot/dts/omap3-pandora-common.dtsi index 559853764487f..4c3b6bab179cc 100644 --- a/arch/arm/boot/dts/omap3-pandora-common.dtsi +++ b/arch/arm/boot/dts/omap3-pandora-common.dtsi @@ -651,7 +651,7 @@ pinctrl-0 = <&penirq_pins>; interrupt-parent = <&gpio3>; interrupts = <30 IRQ_TYPE_NONE>; /* GPIO_94 */ - pendown-gpio = <&gpio3 30 GPIO_ACTIVE_HIGH>; + pendown-gpio = <&gpio3 30 GPIO_ACTIVE_LOW>; vcc-supply = <&vaux4>;
ti,x-min = /bits/ 16 <0>; diff --git a/arch/arm/boot/dts/omap5-cm-t54.dts b/arch/arm/boot/dts/omap5-cm-t54.dts index 2d87b9fc230ee..af288d63a26a4 100644 --- a/arch/arm/boot/dts/omap5-cm-t54.dts +++ b/arch/arm/boot/dts/omap5-cm-t54.dts @@ -354,7 +354,7 @@
interrupt-parent = <&gpio1>; interrupts = <15 0>; /* gpio1_wk15 */ - pendown-gpio = <&gpio1 15 GPIO_ACTIVE_HIGH>; + pendown-gpio = <&gpio1 15 GPIO_ACTIVE_LOW>;
ti,x-min = /bits/ 16 <0x0>;
From: Nitesh Shetty nj.shetty@samsung.com
[ Upstream commit 8cfb98196cceec35416041c6b91212d2b99392e4 ]
Memory/pages are not freed, when unloading nullblk driver.
Steps to reproduce issue 1.free -h total used free shared buff/cache available Mem: 7.8Gi 260Mi 7.1Gi 3.0Mi 395Mi 7.3Gi Swap: 0B 0B 0B 2.modprobe null_blk memory_backed=1 3.dd if=/dev/urandom of=/dev/nullb0 oflag=direct bs=1M count=1000 4.modprobe -r null_blk 5.free -h total used free shared buff/cache available Mem: 7.8Gi 1.2Gi 6.1Gi 3.0Mi 398Mi 6.3Gi Swap: 0B 0B 0B
Signed-off-by: Anuj Gupta anuj20.g@samsung.com Signed-off-by: Nitesh Shetty nj.shetty@samsung.com Link: https://lore.kernel.org/r/20230605062354.24785-1-nj.shetty@samsung.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/block/null_blk/main.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c index 14491952047f5..3b6b4cb400f42 100644 --- a/drivers/block/null_blk/main.c +++ b/drivers/block/null_blk/main.c @@ -2212,6 +2212,7 @@ static void null_destroy_dev(struct nullb *nullb) struct nullb_device *dev = nullb->dev;
null_del_dev(nullb); + null_free_device_storage(dev, false); null_free_dev(dev); }
From: Inki Dae inki.dae@samsung.com
[ Upstream commit 4a059559809fd1ddbf16f847c4d2237309c08edf ]
Fix a wrong error return by dropping an error return.
When vidi driver is remvoed, if ctx->raw_edid isn't same as fake_edid_info then only what we have to is to free ctx->raw_edid so that driver removing can work correctly - it's not an error case.
Signed-off-by: Inki Dae inki.dae@samsung.com Reviewed-by: Andi Shyti andi.shyti@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/exynos/exynos_drm_vidi.c | 2 -- 1 file changed, 2 deletions(-)
diff --git a/drivers/gpu/drm/exynos/exynos_drm_vidi.c b/drivers/gpu/drm/exynos/exynos_drm_vidi.c index 4d56c8c799c5a..f5e1adfcaa514 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_vidi.c +++ b/drivers/gpu/drm/exynos/exynos_drm_vidi.c @@ -469,8 +469,6 @@ static int vidi_remove(struct platform_device *pdev) if (ctx->raw_edid != (struct edid *)fake_edid_info) { kfree(ctx->raw_edid); ctx->raw_edid = NULL; - - return -EINVAL; }
component_del(&pdev->dev, &vidi_component_ops);
From: Min Li lm0963hack@gmail.com
[ Upstream commit 48bfd02569f5db49cc033f259e66d57aa6efc9a3 ]
If it is async, runqueue_node is freed in g2d_runqueue_worker on another worker thread. So in extreme cases, if g2d_runqueue_worker runs first, and then executes the following if statement, there will be use-after-free.
Signed-off-by: Min Li lm0963hack@gmail.com Reviewed-by: Andi Shyti andi.shyti@kernel.org Signed-off-by: Inki Dae inki.dae@samsung.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/exynos/exynos_drm_g2d.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c b/drivers/gpu/drm/exynos/exynos_drm_g2d.c index ec784e58da5c1..414e585ec7dd0 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c +++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c @@ -1335,7 +1335,7 @@ int exynos_g2d_exec_ioctl(struct drm_device *drm_dev, void *data, /* Let the runqueue know that there is work to do. */ queue_work(g2d->g2d_workq, &g2d->runqueue_work);
- if (runqueue_node->async) + if (req->async) goto out;
wait_for_completion(&runqueue_node->complete);
From: Min Li lm0963hack@gmail.com
[ Upstream commit 982b173a6c6d9472730c3116051977e05d17c8c5 ]
Userspace can race to free the gobj(robj converted from), robj should not be accessed again after drm_gem_object_put, otherwith it will result in use-after-free.
Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Min Li lm0963hack@gmail.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/radeon/radeon_gem.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c index 261fcbae88d78..75d79c3110389 100644 --- a/drivers/gpu/drm/radeon/radeon_gem.c +++ b/drivers/gpu/drm/radeon/radeon_gem.c @@ -459,7 +459,6 @@ int radeon_gem_set_domain_ioctl(struct drm_device *dev, void *data, struct radeon_device *rdev = dev->dev_private; struct drm_radeon_gem_set_domain *args = data; struct drm_gem_object *gobj; - struct radeon_bo *robj; int r;
/* for now if someone requests domain CPU - @@ -472,13 +471,12 @@ int radeon_gem_set_domain_ioctl(struct drm_device *dev, void *data, up_read(&rdev->exclusive_lock); return -ENOENT; } - robj = gem_to_radeon_bo(gobj);
r = radeon_gem_set_domain(gobj, args->read_domains, args->write_domain);
drm_gem_object_put(gobj); up_read(&rdev->exclusive_lock); - r = radeon_gem_handle_lockup(robj->rdev, r); + r = radeon_gem_handle_lockup(rdev, r); return r; }
From: Rong Tao rongtao@cestc.cn
[ Upstream commit 57380fd1249b20ef772549af2c58ef57b21faba7 ]
Add cpu_relax() for arm64 instead of directly assert(), and add assert.h header file. Also, add smp_wmb and smp_mb for arm64.
Compilation error as follows, avoid __always_inline undefined.
$ make cc -Wall -pthread -O2 -ggdb -flto -fwhole-program -c -o ring.o ring.c In file included from ring.c:10: main.h: In function ‘busy_wait’: main.h:99:21: warning: implicit declaration of function ‘assert’ [-Wimplicit-function-declaration] 99 | #define cpu_relax() assert(0) | ^~~~~~ main.h:107:17: note: in expansion of macro ‘cpu_relax’ 107 | cpu_relax(); | ^~~~~~~~~ main.h:12:1: note: ‘assert’ is defined in header ‘<assert.h>’; did you forget to ‘#include <assert.h>’? 11 | #include <stdbool.h> +++ |+#include <assert.h> 12 | main.h: At top level: main.h:143:23: error: expected ‘;’ before ‘void’ 143 | static __always_inline | ^ | ; 144 | void __read_once_size(const volatile void *p, void *res, int size) | ~~~~ main.h:158:23: error: expected ‘;’ before ‘void’ 158 | static __always_inline void __write_once_size(volatile void *p, void *res, int size) | ^~~~~ | ; make: *** [<builtin>: ring.o] Error 1
Signed-off-by: Rong Tao rongtao@cestc.cn Message-Id: tencent_F53E159DD7925174445D830DA19FACF44B07@qq.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/virtio/ringtest/main.h | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/tools/virtio/ringtest/main.h b/tools/virtio/ringtest/main.h index b68920d527503..d18dd317e27f9 100644 --- a/tools/virtio/ringtest/main.h +++ b/tools/virtio/ringtest/main.h @@ -8,6 +8,7 @@ #ifndef MAIN_H #define MAIN_H
+#include <assert.h> #include <stdbool.h>
extern int param; @@ -95,6 +96,8 @@ extern unsigned ring_size; #define cpu_relax() asm ("rep; nop" ::: "memory") #elif defined(__s390x__) #define cpu_relax() barrier() +#elif defined(__aarch64__) +#define cpu_relax() asm ("yield" ::: "memory") #else #define cpu_relax() assert(0) #endif @@ -112,6 +115,8 @@ static inline void busy_wait(void)
#if defined(__x86_64__) || defined(__i386__) #define smp_mb() asm volatile("lock; addl $0,-132(%%rsp)" ::: "memory", "cc") +#elif defined(__aarch64__) +#define smp_mb() asm volatile("dmb ish" ::: "memory") #else /* * Not using __ATOMIC_SEQ_CST since gcc docs say they are only synchronized @@ -136,10 +141,16 @@ static inline void busy_wait(void)
#if defined(__i386__) || defined(__x86_64__) || defined(__s390x__) #define smp_wmb() barrier() +#elif defined(__aarch64__) +#define smp_wmb() asm volatile("dmb ishst" ::: "memory") #else #define smp_wmb() smp_release() #endif
+#ifndef __always_inline +#define __always_inline inline __attribute__((always_inline)) +#endif + static __always_inline void __read_once_size(const volatile void *p, void *res, int size) {
From: Shannon Nelson shannon.nelson@amd.com
[ Upstream commit 376daf317753ccb6b1ecbdece66018f7f6313a7f ]
As is done in the net, iscsi, and vsock vhost support, let the vdpa vqs know about the features that have been negotiated. This allows vhost to more safely make decisions based on the features, such as when using PACKED vs split queues.
Signed-off-by: Shannon Nelson shannon.nelson@amd.com Acked-by: Jason Wang jasowang@redhat.com Message-Id: 20230424225031.18947-2-shannon.nelson@amd.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vhost/vdpa.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index 779fc44677162..22e6b23ac96ff 100644 --- a/drivers/vhost/vdpa.c +++ b/drivers/vhost/vdpa.c @@ -385,7 +385,10 @@ static long vhost_vdpa_set_features(struct vhost_vdpa *v, u64 __user *featurep) { struct vdpa_device *vdpa = v->vdpa; const struct vdpa_config_ops *ops = vdpa->config; + struct vhost_dev *d = &v->vdev; + u64 actual_features; u64 features; + int i;
/* * It's not allowed to change the features after they have @@ -400,6 +403,16 @@ static long vhost_vdpa_set_features(struct vhost_vdpa *v, u64 __user *featurep) if (vdpa_set_features(vdpa, features)) return -EINVAL;
+ /* let the vqs know what has been configured */ + actual_features = ops->get_driver_features(vdpa); + for (i = 0; i < d->nvqs; ++i) { + struct vhost_virtqueue *vq = d->vqs[i]; + + mutex_lock(&vq->mutex); + vq->acked_features = actual_features; + mutex_unlock(&vq->mutex); + } + return 0; }
From: Andrey Smetanin asmetanin@yandex-team.ru
[ Upstream commit 1f5d2e3bab16369d5d4b4020a25db4ab1f4f082c ]
Fix possible virtqueue used buffers leak and corresponding stuck in case of temporary -EIO from sendmsg() which is produced by tun driver while backend device is not up.
In case of no-retriable error and zcopy do not revert upend_idx to pass packet data (that is update used_idx in corresponding vhost_zerocopy_signal_used()) as if packet data has been transferred successfully.
v2: set vq->heads[ubuf->desc].len equal to VHOST_DMA_DONE_LEN in case of fake successful transmit.
Signed-off-by: Andrey Smetanin asmetanin@yandex-team.ru Message-Id: 20230424204411.24888-1-asmetanin@yandex-team.ru Signed-off-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: Andrey Smetanin asmetanin@yandex-team.ru Acked-by: Jason Wang jasowang@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vhost/net.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 07181cd8d52e6..ae2273196b0c9 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -935,13 +935,18 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
err = sock->ops->sendmsg(sock, &msg, len); if (unlikely(err < 0)) { + bool retry = err == -EAGAIN || err == -ENOMEM || err == -ENOBUFS; + if (zcopy_used) { if (vq->heads[ubuf->desc].len == VHOST_DMA_IN_PROGRESS) vhost_net_ubuf_put(ubufs); - nvq->upend_idx = ((unsigned)nvq->upend_idx - 1) - % UIO_MAXIOV; + if (retry) + nvq->upend_idx = ((unsigned)nvq->upend_idx - 1) + % UIO_MAXIOV; + else + vq->heads[ubuf->desc].len = VHOST_DMA_DONE_LEN; } - if (err == -EAGAIN || err == -ENOMEM || err == -ENOBUFS) { + if (retry) { vhost_discard_vq_desc(vq, 1); vhost_net_enable_vq(net, vq); break;
From: Omar Sandoval osandov@fb.com
[ Upstream commit b9f174c811e3ae4ae8959dc57e6adb9990e913f4 ]
Commits ffb1b4a41016 ("x86/unwind/orc: Add 'signal' field to ORC metadata") and fb799447ae29 ("x86,objtool: Split UNWIND_HINT_EMPTY in two") changed the ORC format. Although ORC is internal to the kernel, it's the only way for external tools to get reliable kernel stack traces on x86-64. In particular, the drgn debugger [1] uses ORC for stack unwinding, and these format changes broke it [2]. As the drgn maintainer, I don't care how often or how much the kernel changes the ORC format as long as I have a way to detect the change.
It suffices to store a version identifier in the vmlinux and kernel module ELF files (to use when parsing ORC sections from ELF), and in kernel memory (to use when parsing ORC from a core dump+symbol table). Rather than hard-coding a version number that needs to be manually bumped, Peterz suggested hashing the definitions from orc_types.h. If there is a format change that isn't caught by this, the hashing script can be updated.
This patch adds an .orc_header allocated ELF section containing the 20-byte hash to vmlinux and kernel modules, along with the corresponding __start_orc_header and __stop_orc_header symbols in vmlinux.
1: https://github.com/osandov/drgn 2: https://github.com/osandov/drgn/issues/303
Fixes: ffb1b4a41016 ("x86/unwind/orc: Add 'signal' field to ORC metadata") Fixes: fb799447ae29 ("x86,objtool: Split UNWIND_HINT_EMPTY in two") Signed-off-by: Omar Sandoval osandov@fb.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Acked-by: Josh Poimboeuf jpoimboe@kernel.org Link: https://lkml.kernel.org/r/aef9c8dc43915b886a8c48509a12ec1b006ca1ca.168669080... Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/Makefile | 12 ++++++++++++ arch/x86/include/asm/Kbuild | 1 + arch/x86/include/asm/orc_header.h | 19 +++++++++++++++++++ arch/x86/kernel/unwind_orc.c | 3 +++ include/asm-generic/vmlinux.lds.h | 3 +++ scripts/mod/modpost.c | 5 +++++ scripts/orc_hash.sh | 16 ++++++++++++++++ 7 files changed, 59 insertions(+) create mode 100644 arch/x86/include/asm/orc_header.h create mode 100644 scripts/orc_hash.sh
diff --git a/arch/x86/Makefile b/arch/x86/Makefile index b39975977c037..fdc2e3abd6152 100644 --- a/arch/x86/Makefile +++ b/arch/x86/Makefile @@ -305,6 +305,18 @@ ifeq ($(RETPOLINE_CFLAGS),) endif endif
+ifdef CONFIG_UNWINDER_ORC +orc_hash_h := arch/$(SRCARCH)/include/generated/asm/orc_hash.h +orc_hash_sh := $(srctree)/scripts/orc_hash.sh +targets += $(orc_hash_h) +quiet_cmd_orc_hash = GEN $@ + cmd_orc_hash = mkdir -p $(dir $@); \ + $(CONFIG_SHELL) $(orc_hash_sh) < $< > $@ +$(orc_hash_h): $(srctree)/arch/x86/include/asm/orc_types.h $(orc_hash_sh) FORCE + $(call if_changed,orc_hash) +archprepare: $(orc_hash_h) +endif + archclean: $(Q)rm -rf $(objtree)/arch/i386 $(Q)rm -rf $(objtree)/arch/x86_64 diff --git a/arch/x86/include/asm/Kbuild b/arch/x86/include/asm/Kbuild index 1e51650b79d7c..4f1ce5fc4e194 100644 --- a/arch/x86/include/asm/Kbuild +++ b/arch/x86/include/asm/Kbuild @@ -1,6 +1,7 @@ # SPDX-License-Identifier: GPL-2.0
+generated-y += orc_hash.h generated-y += syscalls_32.h generated-y += syscalls_64.h generated-y += syscalls_x32.h diff --git a/arch/x86/include/asm/orc_header.h b/arch/x86/include/asm/orc_header.h new file mode 100644 index 0000000000000..07bacf3e160ea --- /dev/null +++ b/arch/x86/include/asm/orc_header.h @@ -0,0 +1,19 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* Copyright (c) Meta Platforms, Inc. and affiliates. */ + +#ifndef _ORC_HEADER_H +#define _ORC_HEADER_H + +#include <linux/types.h> +#include <linux/compiler.h> +#include <asm/orc_hash.h> + +/* + * The header is currently a 20-byte hash of the ORC entry definition; see + * scripts/orc_hash.sh. + */ +#define ORC_HEADER \ + __used __section(".orc_header") __aligned(4) \ + static const u8 orc_header[] = { ORC_HASH } + +#endif /* _ORC_HEADER_H */ diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c index 37307b40f8daf..c960f250624ab 100644 --- a/arch/x86/kernel/unwind_orc.c +++ b/arch/x86/kernel/unwind_orc.c @@ -7,6 +7,9 @@ #include <asm/unwind.h> #include <asm/orc_types.h> #include <asm/orc_lookup.h> +#include <asm/orc_header.h> + +ORC_HEADER;
#define orc_warn(fmt, ...) \ printk_deferred_once(KERN_WARNING "WARNING: " fmt, ##__VA_ARGS__) diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index d1f57e4868ed3..7058b01e9f146 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -839,6 +839,9 @@
#ifdef CONFIG_UNWINDER_ORC #define ORC_UNWIND_TABLE \ + .orc_header : AT(ADDR(.orc_header) - LOAD_OFFSET) { \ + BOUNDED_SECTION_BY(.orc_header, _orc_header) \ + } \ . = ALIGN(4); \ .orc_unwind_ip : AT(ADDR(.orc_unwind_ip) - LOAD_OFFSET) { \ BOUNDED_SECTION_BY(.orc_unwind_ip, _orc_unwind_ip) \ diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c index 9466b6a2abae4..5b3964b39709f 100644 --- a/scripts/mod/modpost.c +++ b/scripts/mod/modpost.c @@ -1985,6 +1985,11 @@ static void add_header(struct buffer *b, struct module *mod) buf_printf(b, "#include <linux/vermagic.h>\n"); buf_printf(b, "#include <linux/compiler.h>\n"); buf_printf(b, "\n"); + buf_printf(b, "#ifdef CONFIG_UNWINDER_ORC\n"); + buf_printf(b, "#include <asm/orc_header.h>\n"); + buf_printf(b, "ORC_HEADER;\n"); + buf_printf(b, "#endif\n"); + buf_printf(b, "\n"); buf_printf(b, "BUILD_SALT;\n"); buf_printf(b, "BUILD_LTO_INFO;\n"); buf_printf(b, "\n"); diff --git a/scripts/orc_hash.sh b/scripts/orc_hash.sh new file mode 100644 index 0000000000000..466611aa0053f --- /dev/null +++ b/scripts/orc_hash.sh @@ -0,0 +1,16 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0-or-later +# Copyright (c) Meta Platforms, Inc. and affiliates. + +set -e + +printf '%s' '#define ORC_HASH ' + +awk ' +/^#define ORC_(REG|TYPE)_/ { print } +/^struct orc_entry {$/ { p=1 } +p { print } +/^}/ { p=0 }' | + sha1sum | + cut -d " " -f 1 | + sed 's/([0-9a-f]{2})/0x\1,/g'
From: Dheeraj Kumar Srivastava dheerajkumar.srivastava@amd.com
[ Upstream commit 85d38d5810e285d5aec7fb5283107d1da70c12a9 ]
When booting with "intremap=off" and "x2apic_phys" on the kernel command line, the physical x2APIC driver ends up being used even when x2APIC mode is disabled ("intremap=off" disables x2APIC mode). This happens because the first compound condition check in x2apic_phys_probe() is false due to x2apic_mode == 0 and so the following one returns true after default_acpi_madt_oem_check() having already selected the physical x2APIC driver.
This results in the following panic:
kernel BUG at arch/x86/kernel/apic/io_apic.c:2409! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0-rc2-ver4.1rc2 #2 Hardware name: Dell Inc. PowerEdge R6515/07PXPY, BIOS 2.3.6 07/06/2021 RIP: 0010:setup_IO_APIC+0x9c/0xaf0 Call Trace: <TASK> ? native_read_msr apic_intr_mode_init x86_late_time_init start_kernel x86_64_start_reservations x86_64_start_kernel secondary_startup_64_no_verify </TASK>
which is:
setup_IO_APIC: apic_printk(APIC_VERBOSE, "ENABLING IO-APIC IRQs\n"); for_each_ioapic(ioapic) BUG_ON(mp_irqdomain_create(ioapic));
Return 0 to denote that x2APIC has not been enabled when probing the physical x2APIC driver.
[ bp: Massage commit message heavily. ]
Fixes: 9ebd680bd029 ("x86, apic: Use probe routines to simplify apic selection") Signed-off-by: Dheeraj Kumar Srivastava dheerajkumar.srivastava@amd.com Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Reviewed-by: Kishon Vijay Abraham I kvijayab@amd.com Reviewed-by: Vasant Hegde vasant.hegde@amd.com Reviewed-by: Cyrill Gorcunov gorcunov@gmail.com Reviewed-by: Thomas Gleixner tglx@linutronix.de Link: https://lore.kernel.org/r/20230616212236.1389-1-dheerajkumar.srivastava@amd.... Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kernel/apic/x2apic_phys.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c index 6bde05a86b4ed..896bc41cb2ba7 100644 --- a/arch/x86/kernel/apic/x2apic_phys.c +++ b/arch/x86/kernel/apic/x2apic_phys.c @@ -97,7 +97,10 @@ static void init_x2apic_ldr(void)
static int x2apic_phys_probe(void) { - if (x2apic_mode && (x2apic_phys || x2apic_fadt_phys())) + if (!x2apic_mode) + return 0; + + if (x2apic_phys || x2apic_fadt_phys()) return 1;
return apic == &apic_x2apic_phys;
From: Clark Wang xiaoning.wang@nxp.com
[ Upstream commit e69b9bc170c6d93ee375a5cbfd15f74c0fb59bdd ]
Claim clkhi and clklo as integer type to avoid possible calculation errors caused by data overflow.
Fixes: a55fa9d0e42e ("i2c: imx-lpi2c: add low power i2c bus driver") Signed-off-by: Clark Wang xiaoning.wang@nxp.com Signed-off-by: Carlos Song carlos.song@nxp.com Reviewed-by: Andi Shyti andi.shyti@kernel.org Signed-off-by: Wolfram Sang wsa@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/busses/i2c-imx-lpi2c.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/i2c/busses/i2c-imx-lpi2c.c b/drivers/i2c/busses/i2c-imx-lpi2c.c index a49b14d52a986..ff12018bc2060 100644 --- a/drivers/i2c/busses/i2c-imx-lpi2c.c +++ b/drivers/i2c/busses/i2c-imx-lpi2c.c @@ -201,8 +201,8 @@ static void lpi2c_imx_stop(struct lpi2c_imx_struct *lpi2c_imx) /* CLKLO = I2C_CLK_RATIO * CLKHI, SETHOLD = CLKHI, DATAVD = CLKHI/2 */ static int lpi2c_imx_config(struct lpi2c_imx_struct *lpi2c_imx) { - u8 prescale, filt, sethold, clkhi, clklo, datavd; - unsigned int clk_rate, clk_cycle; + u8 prescale, filt, sethold, datavd; + unsigned int clk_rate, clk_cycle, clkhi, clklo; enum lpi2c_imx_pincfg pincfg; unsigned int temp;
From: Pablo Neira Ayuso pablo@netfilter.org
commit 043d2acf57227db1fdaaa620b2a420acfaa56d6e upstream.
Otherwise the module reference counter is leaked.
Fixes b9703ed44ffb ("netfilter: nf_tables: support for adding new devices to an existing netdev chain") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_tables_api.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -2664,6 +2664,8 @@ static int nf_tables_updchain(struct nft nft_trans_basechain(trans) = basechain; INIT_LIST_HEAD(&nft_trans_chain_hooks(trans)); list_splice(&hook.list, &nft_trans_chain_hooks(trans)); + if (nla[NFTA_CHAIN_HOOK]) + module_put(hook.type->owner);
nft_trans_commit_list_add_tail(ctx->net, trans);
From: Marc Zyngier maz@kernel.org
commit 1caa71a7a600f7781ce05ef1e84701c459653663 upstream.
When reworking the vgic locking, the vgic distributor registration got simplified, which was a very good cleanup. But just a tad too radical, as we now register the *native* vgic only, ignoring the GICv2-on-GICv3 that allows pre-historic VMs (or so I thought) to run.
As it turns out, QEMU still defaults to GICv2 in some cases, and this breaks Nathan's setup!
Fix it by propagating the *requested* vgic type rather than the host's version.
Fixes: 59112e9c390b ("KVM: arm64: vgic: Fix a circular locking issue") Reported-by: Nathan Chancellor nathan@kernel.org Tested-by: Nathan Chancellor nathan@kernel.org Signed-off-by: Marc Zyngier maz@kernel.org link: https://lore.kernel.org/r/20230606221525.GA2269598@dev-arch.thelio-3990X Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kvm/vgic/vgic-init.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
--- a/arch/arm64/kvm/vgic/vgic-init.c +++ b/arch/arm64/kvm/vgic/vgic-init.c @@ -446,6 +446,7 @@ int vgic_lazy_init(struct kvm *kvm) int kvm_vgic_map_resources(struct kvm *kvm) { struct vgic_dist *dist = &kvm->arch.vgic; + enum vgic_type type; gpa_t dist_base; int ret = 0;
@@ -460,10 +461,13 @@ int kvm_vgic_map_resources(struct kvm *k if (!irqchip_in_kernel(kvm)) goto out;
- if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V2) + if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V2) { ret = vgic_v2_map_resources(kvm); - else + type = VGIC_V2; + } else { ret = vgic_v3_map_resources(kvm); + type = VGIC_V3; + }
if (ret) { __kvm_vgic_destroy(kvm); @@ -473,8 +477,7 @@ int kvm_vgic_map_resources(struct kvm *k dist_base = dist->vgic_dist_base; mutex_unlock(&kvm->arch.config_lock);
- ret = vgic_register_dist_iodev(kvm, dist_base, - kvm_vgic_global_state.type); + ret = vgic_register_dist_iodev(kvm, dist_base, type); if (ret) { kvm_err("Unable to register VGIC dist MMIO regions\n"); kvm_vgic_destroy(kvm);
From: Namjae Jeon linkinjeon@kernel.org
commit 48b47f0caaa8a9f05ed803cb4f335fa3a7bfc622 upstream.
Uninitialized rd.delegated_inode can be used in vfs_rename(). Fix this by setting rd.delegated_inode to NULL to avoid the uninitialized read.
Fixes: 74d7970febf7 ("ksmbd: fix racy issue from using ->d_parent and ->d_name") Reported-by: Coverity Scan scan-admin@coverity.com Signed-off-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ksmbd/vfs.c | 1 + 1 file changed, 1 insertion(+)
--- a/fs/ksmbd/vfs.c +++ b/fs/ksmbd/vfs.c @@ -769,6 +769,7 @@ retry: rd.new_dir = new_path.dentry->d_inode, rd.new_dentry = new_dentry, rd.flags = flags, + rd.delegated_inode = NULL, err = vfs_rename(&rd); if (err) ksmbd_debug(VFS, "vfs_rename failed err %d\n", err);
From: Namjae Jeon linkinjeon@kernel.org
commit df14afeed2e6c1bbadef7d2f9c46887bbd6d8d94 upstream.
There is a case that file_present is true and path is uninitialized. This patch change file_present is set to false by default and set to true when patch is initialized.
Fixes: 74d7970febf7 ("ksmbd: fix racy issue from using ->d_parent and ->d_name") Reported-by: Coverity Scan scan-admin@coverity.com Signed-off-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ksmbd/smb2pdu.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/fs/ksmbd/smb2pdu.c +++ b/fs/ksmbd/smb2pdu.c @@ -5560,7 +5560,7 @@ static int smb2_create_link(struct ksmbd { char *link_name = NULL, *target_name = NULL, *pathname = NULL; struct path path; - bool file_present = true; + bool file_present = false; int rc;
if (buf_len < (u64)sizeof(struct smb2_file_link_info) + @@ -5593,8 +5593,8 @@ static int smb2_create_link(struct ksmbd if (rc) { if (rc != -ENOENT) goto out; - file_present = false; - } + } else + file_present = true;
if (file_info->ReplaceIfExists) { if (file_present) {
From: Namjae Jeon linkinjeon@kernel.org
commit 6fe55c2799bc29624770c26f98ba7b06214f43e0 upstream.
last component point filename struct. Currently putname is called after vfs_path_parent_lookup(). And then last component is used for lookup_one_qstr_excl(). name in last component is freed by previous calling putname(). And It cause file lookup failure when testing generic/464 test of xfstest.
Fixes: 74d7970febf7 ("ksmbd: fix racy issue from using ->d_parent and ->d_name") Signed-off-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ksmbd/vfs.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/fs/ksmbd/vfs.c +++ b/fs/ksmbd/vfs.c @@ -86,12 +86,14 @@ static int ksmbd_vfs_path_lookup_locked( err = vfs_path_parent_lookup(filename, flags, &parent_path, &last, &type, root_share_path); - putname(filename); - if (err) + if (err) { + putname(filename); return err; + }
if (unlikely(type != LAST_NORM)) { path_put(&parent_path); + putname(filename); return -ENOENT; }
@@ -108,12 +110,14 @@ static int ksmbd_vfs_path_lookup_locked( path->dentry = d; path->mnt = share_conf->vfs_path.mnt; path_put(&parent_path); + putname(filename);
return 0;
err_out: inode_unlock(parent_path.dentry->d_inode); path_put(&parent_path); + putname(filename); return -ENOENT; }
On Mon, 26 Jun 2023 20:08:26 +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.3.10 release. There are 199 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 28 Jun 2023 18:07:23 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.3.10-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.3.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v6.3: 11 builds: 11 pass, 0 fail 28 boots: 28 pass, 0 fail 130 tests: 130 pass, 0 fail
Linux version: 6.3.10-rc1-g3d49488718bf Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra194-p3509-0000+p3668-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
On 6/26/23 11:08 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.3.10 release. There are 199 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 28 Jun 2023 18:07:23 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.3.10-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.3.y and the diffstat can be found below.
thanks,
greg k-h
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos re@w6rz.net
* Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.3.10 release. There are 199 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 28 Jun 2023 18:07:23 +0000. Anything received after that time might be too late.
Hi Greg
6.3.10-rc1
compiles, boots and runs here on x86_64 (AMD Ryzen 5 PRO 4650G, Slackware64-15.0)
Tested-by: Markus Reichelt lkt+2023@mareichelt.com
Hi Greg,
From: Greg Kroah-Hartman gregkh@linuxfoundation.org Sent: Monday, June 26, 2023 7:08 PM
This is the start of the stable review cycle for the 6.3.10 release. There are 199 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 28 Jun 2023 18:07:23 +0000. Anything received after that time might be too late.
Thank you for the release!
CIP configurations built and booted okay with Linux 6.3.10-rc1 (3d49488718bf): https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/pipelines/91... https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/commits/linu...
Tested-by: Chris Paterson (CIP) chris.paterson2@renesas.com
Kind regards, Chris
On Mon, Jun 26, 2023 at 08:08:26PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.3.10 release. There are 199 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 28 Jun 2023 18:07:23 +0000. Anything received after that time might be too late.
Build results: total: 153 pass: 153 fail: 0 Qemu test results: total: 520 pass: 520 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
On Mon, 26 Jun 2023 at 23:46, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.3.10 release. There are 199 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 28 Jun 2023 18:07:23 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.3.10-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.3.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build * kernel: 6.3.10-rc1 * git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc * git branch: linux-6.3.y * git commit: 3d49488718bf7f62bf7e49db5f677e4ad0d5a900 * git describe: v6.3.9-200-g3d49488718bf * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.3.y/build/v6.3.9-...
## Test Regressions (compared to v6.3.9)
## Metric Regressions (compared to v6.3.9)
## Test Fixes (compared to v6.3.9)
## Metric Fixes (compared to v6.3.9)
## Test result summary total: 189021, pass: 155522, fail: 3063, skip: 30272, xfail: 164
## Build Summary * arc: 5 total, 5 passed, 0 failed * arm: 145 total, 144 passed, 1 failed * arm64: 54 total, 53 passed, 1 failed * i386: 41 total, 40 passed, 1 failed * mips: 30 total, 28 passed, 2 failed * parisc: 4 total, 4 passed, 0 failed * powerpc: 38 total, 36 passed, 2 failed * riscv: 26 total, 25 passed, 1 failed * s390: 16 total, 14 passed, 2 failed * sh: 14 total, 12 passed, 2 failed * sparc: 8 total, 8 passed, 0 failed * x86_64: 46 total, 46 passed, 0 failed
## Test suites summary * boot * fwts * kselftest-android * kselftest-arm64 * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-drivers-dma-buf * kselftest-efivarfs * kselftest-exec * kselftest-filesystems * kselftest-filesystems-binderfs * kselftest-firmware * kselftest-fpu * kselftest-ftrace * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-kvm * kselftest-lib * kselftest-livepatch * kselftest-membarrier * kselftest-mincore * kselftest-mqueue * kselftest-net * kselftest-net-forwarding * kselftest-net-mptcp * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-user_events * kselftest-vDSO * kselftest-watchdog * kselftest-x86 * kselftest-zram * kunit * kvm-unit-tests * libgpiod * libhugetlbfs * log-parser-boot * log-parser-test * ltp-cap_bounds * ltp-commands * ltp-containers * ltp-controllers * ltp-cpuhotplug * ltp-crypto * ltp-cve * ltp-dio * ltp-fcntl-locktests * ltp-filecaps * ltp-fs * ltp-fs_bind * ltp-fs_perms_simple * ltp-fsx * ltp-hugetlb * ltp-io * ltp-ipc * ltp-math * ltp-mm * ltp-nptl * ltp-pty * ltp-sched * ltp-securebits * ltp-smoke * ltp-syscalls * ltp-tracing * network-basic-tests * perf * rcutorture * v4l2-compliance * vdso
-- Linaro LKFT https://lkft.linaro.org
On Mon, Jun 26, 2023 at 08:08:26PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.3.10 release. There are 199 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Tested-by: Conor Dooley conor.dooley@microchip.com
Cheers, Conor.
This is the start of the stable review cycle for the 6.3.10 release. There are 199 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 28 Jun 2023 18:07:23 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.3.10-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.3.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my x86_64 and ARM64 test systems. No errors or regressions.
Tested-by: Allen Pais apais@linux.microsoft.com
Thanks.
linux-stable-mirror@lists.linaro.org