This is the start of the stable review cycle for the 5.4.232 release. There are 156 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 22 Feb 2023 13:35:35 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.232-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.4.232-rc1
Joerg Roedel jroedel@suse.de iommu/amd: Pass gfp flags to iommu_map_page() in amd_iommu_map()
Dan Carpenter error27@gmail.com net: sched: sch: Fix off by one in htb_activate_prios()
Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com ASoC: SOF: Intel: hda-dai: fix possible stream_tag leak
Ryusuke Konishi konishi.ryusuke@gmail.com nilfs2: fix underflow in second superblock position calculations
Greg Kroah-Hartman gregkh@linuxfoundation.org kvm: initialize all of the kvm_debugregs structure before sending it to userspace
Natalia Petrova n.petrova@fintech.ru i40e: Add checking for null for nlmsg_find_attr()
Guillaume Nault gnault@redhat.com ipv6: Fix tcp socket connection with DSCP.
Guillaume Nault gnault@redhat.com ipv6: Fix datagram socket connection with DSCP.
Jason Xing kernelxing@tencent.com ixgbe: add double of VLAN header when computing the max MTU
Jakub Kicinski kuba@kernel.org net: mpls: fix stale pointer if allocation fails during device rename
Cristian Ciocaltea cristian.ciocaltea@collabora.com net: stmmac: Restrict warning on disabling DMA store and fwd mode
Michael Chan michael.chan@broadcom.com bnxt_en: Fix mqprio and XDP ring checking logic
Johannes Zink j.zink@pengutronix.de net: stmmac: fix order of dwmac5 FlexPPS parametrization sequence
Miko Larsson mikoxyzzz@gmail.com net/usb: kalmia: Don't pass act_len in usb_bulk_msg error path
Kuniyuki Iwashima kuniyu@amazon.com dccp/tcp: Avoid negative sk_forward_alloc by ipv6_pinfo.pktoptions.
Pietro Borrello borrello@diag.uniroma1.it sctp: sctp_sock_filter(): avoid list_entry() on possibly empty list
Rafał Miłecki rafal@milecki.pl net: bgmac: fix BCM5358 support by setting correct flags
Jason Xing kernelxing@tencent.com i40e: add double of VLAN header when computing the max MTU
Jason Xing kernelxing@tencent.com ixgbe: allow to increase MTU to 3K with XDP enabled
Andrew Morton akpm@linux-foundation.org revert "squashfs: harden sanity check in squashfs_read_xattr_id_table"
Felix Riemann felix.riemann@sma.de net: Fix unwanted sign extension in netdev_stats_to_stats64()
Aaron Thompson dev@aaront.org Revert "mm: Always release pages to the buddy allocator in memblock_free_late()."
Mike Kravetz mike.kravetz@oracle.com hugetlb: check for undefined shift on 32 bit architectures
Munehisa Kamata kamatam@amazon.com sched/psi: Fix use-after-free in ep_remove_wait_queue()
Kailang Yang kailang@realtek.com ALSA: hda/realtek - fixed wrong gpio assigned
Bo Liu bo.liu@senarytech.com ALSA: hda/conexant: add a new hda codec SN6180
Yang Yingliang yangyingliang@huawei.com mmc: mmc_spi: fix error handling in mmc_spi_probe()
Yang Yingliang yangyingliang@huawei.com mmc: sdio: fix possible resource leaks in some error paths
Ido Schimmel idosch@nvidia.com ipv4: Fix incorrect route flushing when source address is deleted
Shaoying Xu shaoyi@amazon.com Revert "ipv4: Fix incorrect route flushing when source address is deleted"
Brian Foster bfoster@redhat.com xfs: sync lazy sb accounting on quiesce of read-only mounts
Darrick J. Wong djwong@kernel.org xfs: prevent UAF in xfs_log_item_in_current_chkpt
Darrick J. Wong darrick.wong@oracle.com xfs: fix the forward progress assertion in xfs_iwalk_run_callbacks
Darrick J. Wong darrick.wong@oracle.com xfs: ensure inobt record walks always make forward progress
Darrick J. Wong darrick.wong@oracle.com xfs: fix missing CoW blocks writeback conversion retry
Darrick J. Wong darrick.wong@oracle.com xfs: only relog deferred intent items if free space in the log gets low
Darrick J. Wong darrick.wong@oracle.com xfs: expose the log push threshold
Darrick J. Wong darrick.wong@oracle.com xfs: periodically relog deferred intent items
Darrick J. Wong darrick.wong@oracle.com xfs: change the order in which child and parent defer ops are finished
Darrick J. Wong darrick.wong@oracle.com xfs: fix an incore inode UAF in xfs_bui_recover
Darrick J. Wong darrick.wong@oracle.com xfs: clean up xfs_bui_item_recover iget/trans_alloc/ilock ordering
Darrick J. Wong darrick.wong@oracle.com xfs: clean up bmap intent item recovery checking
Darrick J. Wong darrick.wong@oracle.com xfs: xfs_defer_capture should absorb remaining transaction reservation
Darrick J. Wong darrick.wong@oracle.com xfs: xfs_defer_capture should absorb remaining block reservations
Darrick J. Wong darrick.wong@oracle.com xfs: proper replay of deferred ops queued during log recovery
Dave Chinner dchinner@redhat.com xfs: fix finobt btree block recovery ordering
Darrick J. Wong darrick.wong@oracle.com xfs: log new intent items created as part of finishing recovered intent items
Christoph Hellwig hch@lst.de xfs: refactor xfs_defer_finish_noroll
Christoph Hellwig hch@lst.de xfs: turn dfp_intent into a xfs_log_item
Christoph Hellwig hch@lst.de xfs: merge the ->diff_items defer op into ->create_intent
Christoph Hellwig hch@lst.de xfs: merge the ->log_item defer op into ->create_intent
Christoph Hellwig hch@lst.de xfs: factor out a xfs_defer_create_intent helper
Christoph Hellwig hch@lst.de xfs: remove the xfs_inode_log_item_t typedef
Christoph Hellwig hch@lst.de xfs: remove the xfs_efd_log_item_t typedef
Christoph Hellwig hch@lst.de xfs: remove the xfs_efi_log_item_t typedef
Florian Westphal fw@strlen.de netfilter: nft_tproxy: restrict to prerouting hook
Anand Jain anand.jain@oracle.com btrfs: free device in btrfs_close_devices for a single device filesystem
Seth Jenkins sethjenkins@google.com aio: fix mremap after fork null-deref
Amit Engel Amit.Engel@dell.com nvme-fc: fix a missing queue put in nvmet_fc_ls_create_association
Vasily Gorbik gor@linux.ibm.com s390/decompressor: specify __decompress() buf len to avoid overflow
Kees Cook keescook@chromium.org net: sched: sch: Bounds check priority
Andrey Konovalov andrey.konovalov@linaro.org net: stmmac: do not stop RX_CLK in Rx LPI state for qcs404 SoC
Hyunwoo Kim v4bel@theori.io net/rose: Fix to not accept on connected socket
Shunsuke Mie mie@igel.co.jp tools/virtio: fix the vringh test for virtio ring changes
Arnd Bergmann arnd@arndb.de ASoC: cs42l56: fix DT probe
Eduard Zingerman eddyz87@gmail.com selftests/bpf: Verify copy_register_state() preserves parent/live fields
Mike Kravetz mike.kravetz@oracle.com migrate: hugetlb: check for hugetlb shared PMD in node migration
Toke Høiland-Jørgensen toke@redhat.com bpf: Always return target ifindex in bpf_fib_lookup
Andy Shevchenko andriy.shevchenko@linux.intel.com nvme-pci: Move enumeration by class to be last in the table
Heiner Kallweit hkallweit1@gmail.com arm64: dts: meson-axg: Make mmc host controller interrupts level-sensitive
Heiner Kallweit hkallweit1@gmail.com arm64: dts: meson-g12-common: Make mmc host controller interrupts level-sensitive
Heiner Kallweit hkallweit1@gmail.com arm64: dts: meson-gx: Make mmc host controller interrupts level-sensitive
Guo Ren guoren@linux.alibaba.com riscv: Fixup race condition on PG_dcache_clean in flush_icache_pte
Xiubo Li xiubli@redhat.com ceph: flush cap releases when the session is flushed
Prashant Malani pmalani@chromium.org usb: typec: altmodes/displayport: Fix probe pin assign check
Mark Pearson mpearson-lenovo@squebb.ca usb: core: add quirk for Alcor Link AK9563 smartcard reader
Alan Stern stern@rowland.harvard.edu net: USB: Fix wrong-direction WARNING in plusb.c
Andy Shevchenko andriy.shevchenko@linux.intel.com pinctrl: intel: Restore the pins that used to be in Direct IRQ mode
Maxim Korotkov korotkov.maxim.s@gmail.com pinctrl: single: fix potential NULL dereference
Joel Stanley joel@jms.id.au pinctrl: aspeed: Fix confusing types in return value
Dan Carpenter error27@gmail.com ALSA: pci: lx6464es: fix a debug loop
Hangbin Liu liuhangbin@gmail.com selftests: forwarding: lib: quote the sysctl values
Pietro Borrello borrello@diag.uniroma1.it rds: rds_rm_zerocopy_callback() use list_first_entry()
Anirudh Venkataramanan anirudh.venkataramanan@intel.com ice: Do not use WQ_MEM_RECLAIM flag for workqueue
Neel Patel neel.patel@amd.com ionic: clean interrupt before enabling queue to avoid credit race
Heiner Kallweit hkallweit1@gmail.com net: phy: meson-gxl: use MMD access dummy stubs for GXL, internal PHY
Qi Zheng zhengqi.arch@bytedance.com bonding: fix error checking in bond_debug_reregister()
Christian Hopps chopps@chopps.org xfrm: fix bug with DSCP copy to v6 from v4 tunnel
Yang Yingliang yangyingliang@huawei.com RDMA/usnic: use iommu_map_atomic() under spin_lock()
Tom Murphy murphyt7@tcd.ie iommu: Add gfp parameter to iommu_ops::map
Dragos Tatulea dtatulea@nvidia.com IB/IPoIB: Fix legacy IPoIB due to wrong number of queues
Dean Luick dean.luick@cornelisnetworks.com IB/hfi1: Restore allocated resources on failed copyout
Devid Antonio Filoni devid.filoni@egluetechnologies.com can: j1939: do not wait 250 ms if the same addr was already claimed
Shiju Jose shiju.jose@huawei.com tracing: Fix poll() and select() do not work on per_cpu trace_pipe and trace_pipe_raw
Artemii Karasev karasev@ispras.ru ALSA: emux: Avoid potential array out-of-bound in snd_emux_xg_control()
Alexander Potapenko glider@google.com btrfs: zlib: zero-initialize zlib workspace
Josef Bacik josef@toxicpanda.com btrfs: limit device extents to the device size
Andreas Kemnade andreas@kemnade.info iio:adc:twl6030: Enable measurement of VAC
Minsuk Kang linuxlovemin@yonsei.ac.kr wifi: brcmfmac: Check the count value of channel spec to prevent out-of-bounds reads
Chao Yu chao@kernel.org f2fs: fix to do sanity check on i_extra_isize in is_alive()
Dongliang Mu dzm91@hust.edu.cn fbdev: smscufx: fix error handling code in ufx_usb_probe
Michael Ellerman mpe@ellerman.id.au powerpc/imc-pmu: Revert nest_init_lock to being a mutex
Ilpo Järvinen ilpo.jarvinen@linux.intel.com serial: 8250_dma: Fix DMA Rx rearm race
Ilpo Järvinen ilpo.jarvinen@linux.intel.com serial: 8250_dma: Fix DMA Rx completion race
Zhang Xiaoxu zhangxiaoxu5@huawei.com xprtrdma: Fix regbuf data not freed in rpcrdma_req_create()
Andrea Righi andrea.righi@canonical.com mm: swap: properly update readahead statistics in unuse_pte_range()
Michael Walle michael@walle.cc nvmem: core: fix cell removal on error
Phillip Lougher phillip@squashfs.org.uk Squashfs: fix handling and sanity checking of xattr_ids count
Longlong Xia xialonglong1@huawei.com mm/swapfile: add cond_resched() in get_swap_pages()
Zheng Yongjun zhengyongjun3@huawei.com fpga: stratix10-soc: Fix return value check in s10_ops_write_init()
Mike Kravetz mike.kravetz@oracle.com mm: hugetlb: proc: check for hugetlb shared PMD in /proc/PID/smaps
Andreas Schwab schwab@suse.de riscv: disable generation of unwind tables
Helge Deller deller@gmx.de parisc: Wire up PTRACE_GETREGS/PTRACE_SETREGS for compat case
Helge Deller deller@gmx.de parisc: Fix return code of pdc_iodc_print()
Andreas Kemnade andreas@kemnade.info iio:adc:twl6030: Enable measurements of VUSB, VBAT and others
Xiongfeng Wang wangxiongfeng2@huawei.com iio: adc: berlin2-adc: Add missing of_node_put() in error path
Dmitry Perchanov dmitry.perchanov@intel.com iio: hid: fix the retval in accel_3d_capture_sample
Ard Biesheuvel ardb@kernel.org efi: Accept version 2 of memory attributes table
Alexander Egorenkov egorenar@linux.ibm.com watchdog: diag288_wdt: fix __diag288() inline assembly
Alexander Egorenkov egorenar@linux.ibm.com watchdog: diag288_wdt: do not use stack buffers for hardware data
Samuel Thibault samuel.thibault@ens-lyon.org fbcon: Check font dimension limits
Werner Sembach wse@tuxedocomputers.com Input: i8042 - add Clevo PCX0DX to i8042 quirk table
Werner Sembach wse@tuxedocomputers.com Input: i8042 - add TUXEDO devices to i8042 quirk tables
Werner Sembach wse@tuxedocomputers.com Input: i8042 - merge quirk tables
Werner Sembach wse@tuxedocomputers.com Input: i8042 - move __initconst to fix code styling warning
George Kennedy george.kennedy@oracle.com vc_screen: move load of struct vc_data pointer in vcs_read() to avoid UAF
Udipto Goswami quic_ugoswami@quicinc.com usb: gadget: f_fs: Fix unbalanced spinlock in __ffs_ep0_queue_wait
Neil Armstrong neil.armstrong@linaro.org usb: dwc3: qcom: enable vbus override when in OTG dr-mode
Wesley Cheng wcheng@codeaurora.org usb: dwc3: dwc3-qcom: Fix typo in the dwc3 vbus override API
Olivier Moysan olivier.moysan@foss.st.com iio: adc: stm32-dfsdm: fill module aliases
Hyunwoo Kim v4bel@theori.io net/x25: Fix to not accept on connected socket
Randy Dunlap rdunlap@infradead.org i2c: rk3x: fix a bunch of kernel-doc warnings
Mike Christie michael.christie@oracle.com scsi: iscsi_tcp: Fix UAF during login when accessing the shost ipaddress
Maurizio Lombardi mlombard@redhat.com scsi: target: core: Fix warning on RT kernels
Anton Gusev aagusev@ispras.ru efi: fix potential NULL deref in efi_mem_reserve_persistent
Fedor Pchelkin pchelkin@ispras.ru net: openvswitch: fix flow memory leak in ovs_flow_cmd_new
Parav Pandit parav@nvidia.com virtio-net: Keep stop() to follow mirror sequence of open()
Andrei Gherzan andrei.gherzan@canonical.com selftests: net: udpgso_bench_tx: Cater for pending datagrams zerocopy benchmarking
Andrei Gherzan andrei.gherzan@canonical.com selftests: net: udpgso_bench: Fix racing bug between the rx/tx programs
Andrei Gherzan andrei.gherzan@canonical.com selftests: net: udpgso_bench_rx/tx: Stop when wrong CLI args are provided
Andrei Gherzan andrei.gherzan@canonical.com selftests: net: udpgso_bench_rx: Fix 'used uninitialized' compiler warning
Damien Le Moal damien.lemoal@opensource.wdc.com ata: libata: Fix sata_down_spd_limit() when no link speed is reported
Ziyang Xuan william.xuanziyang@huawei.com can: j1939: fix errant WARN_ON_ONCE in j1939_session_deactivate
Chris Healy healych@amazon.com net: phy: meson-gxl: Add generic dummy stubs for MMD register access
Fedor Pchelkin pchelkin@ispras.ru squashfs: harden sanity check in squashfs_read_xattr_id_table
Florian Westphal fw@strlen.de netfilter: br_netfilter: disable sabotage_in hook after first suppression
Hyunwoo Kim v4bel@theori.io netrom: Fix use-after-free caused by accept on already connected socket
Al Viro viro@zeniv.linux.org.uk fix "direction" argument of iov_iter_kvec()
Al Viro viro@zeniv.linux.org.uk fix iov_iter_bvec() "direction" argument
Al Viro viro@zeniv.linux.org.uk WRITE is "data source", not destination...
Martin K. Petersen martin.petersen@oracle.com scsi: Revert "scsi: core: map PQ=1, PDT=other values to SCSI_SCAN_TARGET_PRESENT"
Pierluigi Passaro pierluigi.p@variscite.com arm64: dts: imx8mm: Fix pad control for UART1_DTE_RX
Artemii Karasev karasev@ispras.ru ALSA: hda/via: Avoid potential array out-of-bound in add_secret_dac_path()
Andy Shevchenko andriy.shevchenko@linux.intel.com ASoC: Intel: bytcr_rt5651: Drop reference count of ACPI device after use
Yuan Can yuancan@huawei.com bus: sunxi-rsb: Fix error handling in sunxi_rsb_init()
Takashi Sakamoto o-takashi@sakamocchi.jp firewire: fix memory leak for payload of request subaction to IEC 61883-1 FCP region
-------------
Diffstat:
Makefile | 4 +- arch/arm64/boot/dts/amlogic/meson-axg.dtsi | 4 +- arch/arm64/boot/dts/amlogic/meson-g12-common.dtsi | 6 +- arch/arm64/boot/dts/amlogic/meson-gx.dtsi | 6 +- arch/arm64/boot/dts/freescale/imx8mm-pinfunc.h | 2 +- arch/parisc/kernel/firmware.c | 5 +- arch/parisc/kernel/ptrace.c | 15 +- arch/powerpc/perf/imc-pmu.c | 14 +- arch/riscv/Makefile | 3 + arch/riscv/mm/cacheflush.c | 4 +- arch/s390/boot/compressed/decompressor.c | 2 +- arch/x86/kvm/x86.c | 3 +- drivers/ata/libata-core.c | 2 +- drivers/bus/sunxi-rsb.c | 8 +- drivers/firewire/core-cdev.c | 4 +- drivers/firmware/efi/efi.c | 2 + drivers/firmware/efi/memattr.c | 2 +- drivers/fpga/stratix10-soc.c | 4 +- drivers/fsi/fsi-sbefifo.c | 6 +- drivers/i2c/busses/i2c-rk3x.c | 44 +- drivers/iio/accel/hid-sensor-accel-3d.c | 1 + drivers/iio/adc/berlin2-adc.c | 4 +- drivers/iio/adc/stm32-dfsdm-adc.c | 1 + drivers/iio/adc/twl6030-gpadc.c | 32 + drivers/infiniband/hw/hfi1/file_ops.c | 7 +- drivers/infiniband/hw/usnic/usnic_uiom.c | 8 +- drivers/infiniband/ulp/ipoib/ipoib_main.c | 8 + drivers/input/serio/i8042-x86ia64io.h | 1188 ++++++++------ drivers/iommu/amd_iommu.c | 5 +- drivers/iommu/arm-smmu-v3.c | 2 +- drivers/iommu/arm-smmu.c | 2 +- drivers/iommu/dma-iommu.c | 6 +- drivers/iommu/exynos-iommu.c | 2 +- drivers/iommu/intel-iommu.c | 2 +- drivers/iommu/iommu.c | 43 +- drivers/iommu/ipmmu-vmsa.c | 2 +- drivers/iommu/msm_iommu.c | 2 +- drivers/iommu/mtk_iommu.c | 2 +- drivers/iommu/mtk_iommu_v1.c | 2 +- drivers/iommu/omap-iommu.c | 2 +- drivers/iommu/qcom_iommu.c | 2 +- drivers/iommu/rockchip-iommu.c | 2 +- drivers/iommu/s390-iommu.c | 2 +- drivers/iommu/tegra-gart.c | 2 +- drivers/iommu/tegra-smmu.c | 2 +- drivers/iommu/virtio-iommu.c | 2 +- drivers/mmc/core/sdio_bus.c | 17 +- drivers/mmc/core/sdio_cis.c | 12 - drivers/mmc/host/mmc_spi.c | 8 +- drivers/net/bonding/bond_debugfs.c | 2 +- drivers/net/ethernet/broadcom/bgmac-bcma.c | 6 +- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 8 +- drivers/net/ethernet/intel/i40e/i40e_main.c | 4 +- drivers/net/ethernet/intel/ice/ice_main.c | 2 +- drivers/net/ethernet/intel/ixgbe/ixgbe.h | 2 + drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 28 +- drivers/net/ethernet/pensando/ionic/ionic_lif.c | 15 +- .../ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c | 2 + drivers/net/ethernet/stmicro/stmmac/dwmac5.c | 3 +- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 +- .../net/ethernet/stmicro/stmmac/stmmac_platform.c | 2 +- drivers/net/phy/meson-gxl.c | 4 + drivers/net/usb/kalmia.c | 8 +- drivers/net/usb/plusb.c | 4 +- drivers/net/virtio_net.c | 2 +- .../broadcom/brcm80211/brcmfmac/cfg80211.c | 17 + drivers/nvme/host/pci.c | 3 +- drivers/nvme/target/fc.c | 4 +- drivers/nvmem/core.c | 3 +- drivers/pinctrl/aspeed/pinctrl-aspeed.c | 2 +- drivers/pinctrl/intel/pinctrl-intel.c | 16 +- drivers/pinctrl/pinctrl-single.c | 2 + drivers/scsi/iscsi_tcp.c | 9 +- drivers/scsi/scsi_scan.c | 7 +- drivers/target/target_core_file.c | 4 +- drivers/target/target_core_tmr.c | 4 +- drivers/tty/serial/8250/8250_dma.c | 26 +- drivers/tty/vt/vc_screen.c | 9 +- drivers/usb/core/quirks.c | 3 + drivers/usb/dwc3/dwc3-qcom.c | 10 +- drivers/usb/gadget/function/f_fs.c | 4 +- drivers/usb/typec/altmodes/displayport.c | 8 +- drivers/video/fbdev/core/fbcon.c | 7 +- drivers/video/fbdev/smscufx.c | 46 +- drivers/watchdog/diag288_wdt.c | 15 +- drivers/xen/pvcalls-back.c | 8 +- fs/aio.c | 4 + fs/btrfs/volumes.c | 18 +- fs/btrfs/zlib.c | 2 +- fs/ceph/mds_client.c | 6 + fs/f2fs/gc.c | 18 +- fs/nilfs2/ioctl.c | 7 + fs/nilfs2/super.c | 9 + fs/nilfs2/the_nilfs.c | 8 +- fs/proc/task_mmu.c | 4 +- fs/squashfs/squashfs_fs.h | 2 +- fs/squashfs/squashfs_fs_sb.h | 2 +- fs/squashfs/xattr.h | 4 +- fs/squashfs/xattr_id.c | 2 +- fs/xfs/libxfs/xfs_defer.c | 358 +++- fs/xfs/libxfs/xfs_defer.h | 49 +- fs/xfs/libxfs/xfs_inode_fork.c | 2 +- fs/xfs/libxfs/xfs_trans_inode.c | 2 +- fs/xfs/xfs_aops.c | 4 +- fs/xfs/xfs_bmap_item.c | 238 +-- fs/xfs/xfs_bmap_item.h | 3 +- fs/xfs/xfs_extfree_item.c | 175 +- fs/xfs/xfs_extfree_item.h | 18 +- fs/xfs/xfs_icreate_item.c | 1 + fs/xfs/xfs_inode.c | 4 +- fs/xfs/xfs_inode_item.c | 2 +- fs/xfs/xfs_inode_item.h | 4 +- fs/xfs/xfs_iwalk.c | 27 +- fs/xfs/xfs_log.c | 68 +- fs/xfs/xfs_log.h | 3 + fs/xfs/xfs_log_cil.c | 8 +- fs/xfs/xfs_log_recover.c | 160 +- fs/xfs/xfs_mount.c | 3 +- fs/xfs/xfs_refcount_item.c | 173 +- fs/xfs/xfs_refcount_item.h | 3 +- fs/xfs/xfs_rmap_item.c | 161 +- fs/xfs/xfs_rmap_item.h | 3 +- fs/xfs/xfs_stats.c | 4 + fs/xfs/xfs_stats.h | 1 + fs/xfs/xfs_super.c | 8 +- fs/xfs/xfs_trace.h | 1 + fs/xfs/xfs_trans.h | 10 + include/linux/hugetlb.h | 18 +- include/linux/iommu.h | 21 +- include/linux/stmmac.h | 1 + include/net/sock.h | 13 + kernel/sched/psi.c | 7 +- kernel/trace/trace.c | 3 - mm/memblock.c | 8 +- mm/mempolicy.c | 3 +- mm/swapfile.c | 13 +- net/bridge/br_netfilter_hooks.c | 1 + net/can/j1939/address-claim.c | 40 + net/can/j1939/transport.c | 4 - net/core/dev.c | 2 +- net/core/filter.c | 3 +- net/dccp/ipv6.c | 7 +- net/ipv6/datagram.c | 2 +- net/ipv6/tcp_ipv6.c | 11 +- net/mpls/af_mpls.c | 4 + net/netfilter/nft_tproxy.c | 8 + net/netrom/af_netrom.c | 5 + net/openvswitch/datapath.c | 12 +- net/rds/message.c | 6 +- net/rose/af_rose.c | 8 + net/sched/sch_htb.c | 5 +- net/sctp/diag.c | 4 +- net/sunrpc/xprtrdma/verbs.c | 4 +- net/x25/af_x25.c | 6 + net/xfrm/xfrm_input.c | 3 +- sound/pci/hda/patch_conexant.c | 1 + sound/pci/hda/patch_realtek.c | 2 +- sound/pci/hda/patch_via.c | 3 + sound/pci/lx6464es/lx_core.c | 11 +- sound/soc/codecs/cs42l56.c | 6 - sound/soc/intel/boards/bytcr_rt5651.c | 2 +- sound/soc/sof/intel/hda-dai.c | 8 +- sound/synth/emux/emux_nrpn.c | 3 + .../selftests/bpf/verifier/search_pruning.c | 36 + tools/testing/selftests/net/fib_tests.sh | 1727 ++++++++++++++++++++ tools/testing/selftests/net/forwarding/lib.sh | 4 +- tools/testing/selftests/net/udpgso_bench.sh | 24 +- tools/testing/selftests/net/udpgso_bench_rx.c | 4 +- tools/testing/selftests/net/udpgso_bench_tx.c | 36 +- tools/virtio/linux/bug.h | 8 +- tools/virtio/linux/build_bug.h | 7 + tools/virtio/linux/cpumask.h | 7 + tools/virtio/linux/gfp.h | 7 + tools/virtio/linux/kernel.h | 1 + tools/virtio/linux/kmsan.h | 12 + tools/virtio/linux/scatterlist.h | 1 + tools/virtio/linux/topology.h | 7 + 177 files changed, 4190 insertions(+), 1340 deletions(-)
From: Takashi Sakamoto o-takashi@sakamocchi.jp
commit 531390a243ef47448f8bad01c186c2787666bf4d upstream.
This patch is fix for Linux kernel v2.6.33 or later.
For request subaction to IEC 61883-1 FCP region, Linux FireWire subsystem have had an issue of use-after-free. The subsystem allows multiple user space listeners to the region, while data of the payload was likely released before the listeners execute read(2) to access to it for copying to user space.
The issue was fixed by a commit 281e20323ab7 ("firewire: core: fix use-after-free regression in FCP handler"). The object of payload is duplicated in kernel space for each listener. When the listener executes ioctl(2) with FW_CDEV_IOC_SEND_RESPONSE request, the object is going to be released.
However, it causes memory leak since the commit relies on call of release_request() in drivers/firewire/core-cdev.c. Against the expectation, the function is never called due to the design of release_client_resource(). The function delegates release task to caller when called with non-NULL fourth argument. The implementation of ioctl_send_response() is the case. It should release the object explicitly.
This commit fixes the bug.
Cc: stable@vger.kernel.org Fixes: 281e20323ab7 ("firewire: core: fix use-after-free regression in FCP handler") Signed-off-by: Takashi Sakamoto o-takashi@sakamocchi.jp Link: https://lore.kernel.org/r/20230117090610.93792-2-o-takashi@sakamocchi.jp Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/firewire/core-cdev.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/firewire/core-cdev.c +++ b/drivers/firewire/core-cdev.c @@ -818,8 +818,10 @@ static int ioctl_send_response(struct cl
r = container_of(resource, struct inbound_transaction_resource, resource); - if (is_fcp_request(r->request)) + if (is_fcp_request(r->request)) { + kfree(r->data); goto out; + }
if (a->length != fw_get_response_length(r->request)) { ret = -EINVAL;
From: Yuan Can yuancan@huawei.com
[ Upstream commit f71eaf2708be7831428eacae7db25d8ec6b8b4c5 ]
The sunxi_rsb_init() returns the platform_driver_register() directly without checking its return value, if platform_driver_register() failed, the sunxi_rsb_bus is not unregistered. Fix by unregister sunxi_rsb_bus when platform_driver_register() failed.
Fixes: d787dcdb9c8f ("bus: sunxi-rsb: Add driver for Allwinner Reduced Serial Bus") Signed-off-by: Yuan Can yuancan@huawei.com Reviewed-by: Jernej Skrabec jernej.skrabec@gmail.com Link: https://lore.kernel.org/r/20221123094200.12036-1-yuancan@huawei.com Signed-off-by: Jernej Skrabec jernej.skrabec@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/bus/sunxi-rsb.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/bus/sunxi-rsb.c b/drivers/bus/sunxi-rsb.c index f8c29b888e6b..98cbb18f17fa 100644 --- a/drivers/bus/sunxi-rsb.c +++ b/drivers/bus/sunxi-rsb.c @@ -781,7 +781,13 @@ static int __init sunxi_rsb_init(void) return ret; }
- return platform_driver_register(&sunxi_rsb_driver); + ret = platform_driver_register(&sunxi_rsb_driver); + if (ret) { + bus_unregister(&sunxi_rsb_bus); + return ret; + } + + return 0; } module_init(sunxi_rsb_init);
From: Andy Shevchenko andriy.shevchenko@linux.intel.com
[ Upstream commit 721858823d7cdc8f2a897579b040e935989f6f02 ]
Theoretically the device might gone if its reference count drops to 0. This might be the case when we try to find the first physical node of the ACPI device. We need to keep reference to it until we get a result of the above mentioned call. Refactor the code to drop the reference count at the correct place.
While at it, move to acpi_dev_put() as symmetrical call to the acpi_dev_get_first_match_dev().
Fixes: 02c0a3b3047f ("ASoC: Intel: bytcr_rt5651: add MCLK, quirks and cleanups") Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Acked-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Link: https://lore.kernel.org/r/20230112112852.67714-3-andriy.shevchenko@linux.int... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/intel/boards/bytcr_rt5651.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/soc/intel/boards/bytcr_rt5651.c b/sound/soc/intel/boards/bytcr_rt5651.c index 921c09cdb480..0c1c8628b991 100644 --- a/sound/soc/intel/boards/bytcr_rt5651.c +++ b/sound/soc/intel/boards/bytcr_rt5651.c @@ -919,7 +919,6 @@ static int snd_byt_rt5651_mc_probe(struct platform_device *pdev) if (adev) { snprintf(byt_rt5651_codec_name, sizeof(byt_rt5651_codec_name), "i2c-%s", acpi_dev_name(adev)); - put_device(&adev->dev); byt_rt5651_dais[dai_index].codecs->name = byt_rt5651_codec_name; } else { dev_err(&pdev->dev, "Error cannot find '%s' dev\n", mach->id); @@ -928,6 +927,7 @@ static int snd_byt_rt5651_mc_probe(struct platform_device *pdev)
codec_dev = bus_find_device_by_name(&i2c_bus_type, NULL, byt_rt5651_codec_name); + acpi_dev_put(adev); if (!codec_dev) return -EPROBE_DEFER;
From: Artemii Karasev karasev@ispras.ru
[ Upstream commit b9cee506da2b7920b5ea02ccd8e78a907d0ee7aa ]
snd_hda_get_connections() can return a negative error code. It may lead to accessing 'conn' array at a negative index.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Artemii Karasev karasev@ispras.ru Fixes: 30b4503378c9 ("ALSA: hda - Expose secret DAC-AA connection of some VIA codecs") Link: https://lore.kernel.org/r/20230119082259.3634-1-karasev@ispras.ru Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/pci/hda/patch_via.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/sound/pci/hda/patch_via.c b/sound/pci/hda/patch_via.c index 3edb4e25797d..4a74ccf7cf3e 100644 --- a/sound/pci/hda/patch_via.c +++ b/sound/pci/hda/patch_via.c @@ -821,6 +821,9 @@ static int add_secret_dac_path(struct hda_codec *codec) return 0; nums = snd_hda_get_connections(codec, spec->gen.mixer_nid, conn, ARRAY_SIZE(conn) - 1); + if (nums < 0) + return nums; + for (i = 0; i < nums; i++) { if (get_wcaps_type(get_wcaps(codec, conn[i])) == AC_WID_AUD_OUT) return 0;
From: Pierluigi Passaro pierluigi.p@variscite.com
[ Upstream commit 47123900f3e4a7f769631d6ec15abf44086276f6 ]
According section 8.2.5.313 Select Input Register (IOMUXC_UART1_RXD_SELECT_INPUT) of i.MX 8M Mini Applications Processor Reference Manual, Rev. 3, 11/2020 the required setting for this specific pin configuration is "1"
Signed-off-by: Pierluigi Passaro pierluigi.p@variscite.com Reviewed-by: Fabio Estevam festevam@gmail.com Fixes: c1c9d41319c3 ("dt-bindings: imx: Add pinctrl binding doc for imx8mm") Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/freescale/imx8mm-pinfunc.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/freescale/imx8mm-pinfunc.h b/arch/arm64/boot/dts/freescale/imx8mm-pinfunc.h index 93b44efdbc52..35a60b0d3a4f 100644 --- a/arch/arm64/boot/dts/freescale/imx8mm-pinfunc.h +++ b/arch/arm64/boot/dts/freescale/imx8mm-pinfunc.h @@ -585,7 +585,7 @@ #define MX8MM_IOMUXC_UART1_RXD_GPIO5_IO22 0x234 0x49C 0x000 0x5 0x0 #define MX8MM_IOMUXC_UART1_RXD_TPSMP_HDATA24 0x234 0x49C 0x000 0x7 0x0 #define MX8MM_IOMUXC_UART1_TXD_UART1_DCE_TX 0x238 0x4A0 0x000 0x0 0x0 -#define MX8MM_IOMUXC_UART1_TXD_UART1_DTE_RX 0x238 0x4A0 0x4F4 0x0 0x0 +#define MX8MM_IOMUXC_UART1_TXD_UART1_DTE_RX 0x238 0x4A0 0x4F4 0x0 0x1 #define MX8MM_IOMUXC_UART1_TXD_ECSPI3_MOSI 0x238 0x4A0 0x000 0x1 0x0 #define MX8MM_IOMUXC_UART1_TXD_GPIO5_IO23 0x238 0x4A0 0x000 0x5 0x0 #define MX8MM_IOMUXC_UART1_TXD_TPSMP_HDATA25 0x238 0x4A0 0x000 0x7 0x0
From: Martin K. Petersen martin.petersen@oracle.com
[ Upstream commit 15600159bcc6abbeae6b33a849bef90dca28b78f ]
This reverts commit 948e922fc44611ee2de0c89583ca958cb5307d36.
Not all targets that return PQ=1 and PDT=0 should be ignored. While the SCSI spec is vague in this department, there appears to be a critical mass of devices which rely on devices being accessible with this combination of reported values.
Fixes: 948e922fc446 ("scsi: core: map PQ=1, PDT=other values to SCSI_SCAN_TARGET_PRESENT") Link: https://lore.kernel.org/r/yq1lelrleqr.fsf@ca-mkp.ca.oracle.com Acked-by: Bart Van Assche bvanassche@acm.org Acked-by: Martin Wilck mwilck@suse.com Acked-by: Hannes Reinecke hare@suse.de Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/scsi_scan.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index 3fd109fd9335..d236322ced30 100644 --- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c @@ -1130,8 +1130,7 @@ static int scsi_probe_and_add_lun(struct scsi_target *starget, * that no LUN is present, so don't add sdev in these cases. * Two specific examples are: * 1) NetApp targets: return PQ=1, PDT=0x1f - * 2) IBM/2145 targets: return PQ=1, PDT=0 - * 3) USB UFI: returns PDT=0x1f, with the PQ bits being "reserved" + * 2) USB UFI: returns PDT=0x1f, with the PQ bits being "reserved" * in the UFI 1.0 spec (we cannot rely on reserved bits). * * References: @@ -1145,8 +1144,8 @@ static int scsi_probe_and_add_lun(struct scsi_target *starget, * PDT=00h Direct-access device (floppy) * PDT=1Fh none (no FDD connected to the requested logical unit) */ - if (((result[0] >> 5) == 1 || - (starget->pdt_1f_for_no_lun && (result[0] & 0x1f) == 0x1f)) && + if (((result[0] >> 5) == 1 || starget->pdt_1f_for_no_lun) && + (result[0] & 0x1f) == 0x1f && !scsi_is_wlun(lun)) { SCSI_LOG_SCAN_BUS(3, sdev_printk(KERN_INFO, sdev, "scsi scan: peripheral device type"
From: Al Viro viro@zeniv.linux.org.uk
[ Upstream commit 974c36fb828aeae7b4f9063f94860ae6c5633efd ]
Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/fsi/fsi-sbefifo.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/fsi/fsi-sbefifo.c b/drivers/fsi/fsi-sbefifo.c index c8ccc99e214f..84a60d2d8e8a 100644 --- a/drivers/fsi/fsi-sbefifo.c +++ b/drivers/fsi/fsi-sbefifo.c @@ -640,7 +640,7 @@ static void sbefifo_collect_async_ffdc(struct sbefifo *sbefifo) } ffdc_iov.iov_base = ffdc; ffdc_iov.iov_len = SBEFIFO_MAX_FFDC_SIZE; - iov_iter_kvec(&ffdc_iter, WRITE, &ffdc_iov, 1, SBEFIFO_MAX_FFDC_SIZE); + iov_iter_kvec(&ffdc_iter, READ, &ffdc_iov, 1, SBEFIFO_MAX_FFDC_SIZE); cmd[0] = cpu_to_be32(2); cmd[1] = cpu_to_be32(SBEFIFO_CMD_GET_SBE_FFDC); rc = sbefifo_do_command(sbefifo, cmd, 2, &ffdc_iter); @@ -737,7 +737,7 @@ int sbefifo_submit(struct device *dev, const __be32 *command, size_t cmd_len, rbytes = (*resp_len) * sizeof(__be32); resp_iov.iov_base = response; resp_iov.iov_len = rbytes; - iov_iter_kvec(&resp_iter, WRITE, &resp_iov, 1, rbytes); + iov_iter_kvec(&resp_iter, READ, &resp_iov, 1, rbytes);
/* Perform the command */ mutex_lock(&sbefifo->lock); @@ -817,7 +817,7 @@ static ssize_t sbefifo_user_read(struct file *file, char __user *buf, /* Prepare iov iterator */ resp_iov.iov_base = buf; resp_iov.iov_len = len; - iov_iter_init(&resp_iter, WRITE, &resp_iov, 1, len); + iov_iter_init(&resp_iter, READ, &resp_iov, 1, len);
/* Perform the command */ mutex_lock(&sbefifo->lock);
From: Al Viro viro@zeniv.linux.org.uk
[ Upstream commit b676668d99155e6859d99bbf2df18b3f03851902 ]
Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/target/target_core_file.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/target/target_core_file.c b/drivers/target/target_core_file.c index 7143d03f0e02..18fbbe510d01 100644 --- a/drivers/target/target_core_file.c +++ b/drivers/target/target_core_file.c @@ -340,7 +340,7 @@ static int fd_do_rw(struct se_cmd *cmd, struct file *fd, len += sg->length; }
- iov_iter_bvec(&iter, READ, bvec, sgl_nents, len); + iov_iter_bvec(&iter, is_write, bvec, sgl_nents, len); if (is_write) ret = vfs_iter_write(fd, &iter, &pos, 0); else @@ -477,7 +477,7 @@ fd_execute_write_same(struct se_cmd *cmd) len += se_dev->dev_attrib.block_size; }
- iov_iter_bvec(&iter, READ, bvec, nolb, len); + iov_iter_bvec(&iter, WRITE, bvec, nolb, len); ret = vfs_iter_write(fd_dev->fd_file, &iter, &pos, 0);
kfree(bvec);
From: Al Viro viro@zeniv.linux.org.uk
[ Upstream commit fc02f33787d8dd227b54f263eba983d5b249c032 ]
Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/xen/pvcalls-back.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c index 9439de2ca0e4..9c267e27d9d9 100644 --- a/drivers/xen/pvcalls-back.c +++ b/drivers/xen/pvcalls-back.c @@ -129,13 +129,13 @@ static bool pvcalls_conn_back_read(void *opaque) if (masked_prod < masked_cons) { vec[0].iov_base = data->in + masked_prod; vec[0].iov_len = wanted; - iov_iter_kvec(&msg.msg_iter, WRITE, vec, 1, wanted); + iov_iter_kvec(&msg.msg_iter, READ, vec, 1, wanted); } else { vec[0].iov_base = data->in + masked_prod; vec[0].iov_len = array_size - masked_prod; vec[1].iov_base = data->in; vec[1].iov_len = wanted - vec[0].iov_len; - iov_iter_kvec(&msg.msg_iter, WRITE, vec, 2, wanted); + iov_iter_kvec(&msg.msg_iter, READ, vec, 2, wanted); }
atomic_set(&map->read, 0); @@ -188,13 +188,13 @@ static bool pvcalls_conn_back_write(struct sock_mapping *map) if (pvcalls_mask(prod, array_size) > pvcalls_mask(cons, array_size)) { vec[0].iov_base = data->out + pvcalls_mask(cons, array_size); vec[0].iov_len = size; - iov_iter_kvec(&msg.msg_iter, READ, vec, 1, size); + iov_iter_kvec(&msg.msg_iter, WRITE, vec, 1, size); } else { vec[0].iov_base = data->out + pvcalls_mask(cons, array_size); vec[0].iov_len = array_size - pvcalls_mask(cons, array_size); vec[1].iov_base = data->out; vec[1].iov_len = size - vec[0].iov_len; - iov_iter_kvec(&msg.msg_iter, READ, vec, 2, size); + iov_iter_kvec(&msg.msg_iter, WRITE, vec, 2, size); }
atomic_set(&map->write, 0);
From: Hyunwoo Kim v4bel@theori.io
[ Upstream commit 611792920925fb088ddccbe2783c7f92fdfb6b64 ]
If you call listen() and accept() on an already connect()ed AF_NETROM socket, accept() can successfully connect. This is because when the peer socket sends data to sendmsg, the skb with its own sk stored in the connected socket's sk->sk_receive_queue is connected, and nr_accept() dequeues the skb waiting in the sk->sk_receive_queue.
As a result, nr_accept() allocates and returns a sock with the sk of the parent AF_NETROM socket.
And here use-after-free can happen through complex race conditions: ``` cpu0 cpu1 1. socket_2 = socket(AF_NETROM) . . listen(socket_2) accepted_socket = accept(socket_2) 2. socket_1 = socket(AF_NETROM) nr_create() // sk refcount : 1 connect(socket_1) 3. write(accepted_socket) nr_sendmsg() nr_output() nr_kick() nr_send_iframe() nr_transmit_buffer() nr_route_frame() nr_loopback_queue() nr_loopback_timer() nr_rx_frame() nr_process_rx_frame(sk, skb); // sk : socket_1's sk nr_state3_machine() nr_queue_rx_frame() sock_queue_rcv_skb() sock_queue_rcv_skb_reason() __sock_queue_rcv_skb() __skb_queue_tail(list, skb); // list : socket_1's sk->sk_receive_queue 4. listen(socket_1) nr_listen() uaf_socket = accept(socket_1) nr_accept() skb_dequeue(&sk->sk_receive_queue); 5. close(accepted_socket) nr_release() nr_write_internal(sk, NR_DISCREQ) nr_transmit_buffer() // NR_DISCREQ nr_route_frame() nr_loopback_queue() nr_loopback_timer() nr_rx_frame() // sk : socket_1's sk nr_process_rx_frame() // NR_STATE_3 nr_state3_machine() // NR_DISCREQ nr_disconnect() nr_sk(sk)->state = NR_STATE_0; 6. close(socket_1) // sk refcount : 3 nr_release() // NR_STATE_0 sock_put(sk); // sk refcount : 0 sk_free(sk); close(uaf_socket) nr_release() sock_hold(sk); // UAF ```
KASAN report by syzbot: ``` BUG: KASAN: use-after-free in nr_release+0x66/0x460 net/netrom/af_netrom.c:520 Write of size 4 at addr ffff8880235d8080 by task syz-executor564/5128
Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd1/0x138 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:306 [inline] print_report+0x15e/0x461 mm/kasan/report.c:417 kasan_report+0xbf/0x1f0 mm/kasan/report.c:517 check_region_inline mm/kasan/generic.c:183 [inline] kasan_check_range+0x141/0x190 mm/kasan/generic.c:189 instrument_atomic_read_write include/linux/instrumented.h:102 [inline] atomic_fetch_add_relaxed include/linux/atomic/atomic-instrumented.h:116 [inline] __refcount_add include/linux/refcount.h:193 [inline] __refcount_inc include/linux/refcount.h:250 [inline] refcount_inc include/linux/refcount.h:267 [inline] sock_hold include/net/sock.h:775 [inline] nr_release+0x66/0x460 net/netrom/af_netrom.c:520 __sock_release+0xcd/0x280 net/socket.c:650 sock_close+0x1c/0x20 net/socket.c:1365 __fput+0x27c/0xa90 fs/file_table.c:320 task_work_run+0x16f/0x270 kernel/task_work.c:179 exit_task_work include/linux/task_work.h:38 [inline] do_exit+0xaa8/0x2950 kernel/exit.c:867 do_group_exit+0xd4/0x2a0 kernel/exit.c:1012 get_signal+0x21c3/0x2450 kernel/signal.c:2859 arch_do_signal_or_restart+0x79/0x5c0 arch/x86/kernel/signal.c:306 exit_to_user_mode_loop kernel/entry/common.c:168 [inline] exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203 __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline] syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296 do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f6c19e3c9b9 Code: Unable to access opcode bytes at 0x7f6c19e3c98f. RSP: 002b:00007fffd4ba2ce8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133 RAX: 0000000000000116 RBX: 0000000000000003 RCX: 00007f6c19e3c9b9 RDX: 0000000000000318 RSI: 00000000200bd000 RDI: 0000000000000006 RBP: 0000000000000003 R08: 000000000000000d R09: 000000000000000d R10: 0000000000000000 R11: 0000000000000246 R12: 000055555566a2c0 R13: 0000000000000011 R14: 0000000000000000 R15: 0000000000000000 </TASK>
Allocated by task 5128: kasan_save_stack+0x22/0x40 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 ____kasan_kmalloc mm/kasan/common.c:371 [inline] ____kasan_kmalloc mm/kasan/common.c:330 [inline] __kasan_kmalloc+0xa3/0xb0 mm/kasan/common.c:380 kasan_kmalloc include/linux/kasan.h:211 [inline] __do_kmalloc_node mm/slab_common.c:968 [inline] __kmalloc+0x5a/0xd0 mm/slab_common.c:981 kmalloc include/linux/slab.h:584 [inline] sk_prot_alloc+0x140/0x290 net/core/sock.c:2038 sk_alloc+0x3a/0x7a0 net/core/sock.c:2091 nr_create+0xb6/0x5f0 net/netrom/af_netrom.c:433 __sock_create+0x359/0x790 net/socket.c:1515 sock_create net/socket.c:1566 [inline] __sys_socket_create net/socket.c:1603 [inline] __sys_socket_create net/socket.c:1588 [inline] __sys_socket+0x133/0x250 net/socket.c:1636 __do_sys_socket net/socket.c:1649 [inline] __se_sys_socket net/socket.c:1647 [inline] __x64_sys_socket+0x73/0xb0 net/socket.c:1647 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
Freed by task 5128: kasan_save_stack+0x22/0x40 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 kasan_save_free_info+0x2b/0x40 mm/kasan/generic.c:518 ____kasan_slab_free mm/kasan/common.c:236 [inline] ____kasan_slab_free+0x13b/0x1a0 mm/kasan/common.c:200 kasan_slab_free include/linux/kasan.h:177 [inline] __cache_free mm/slab.c:3394 [inline] __do_kmem_cache_free mm/slab.c:3580 [inline] __kmem_cache_free+0xcd/0x3b0 mm/slab.c:3587 sk_prot_free net/core/sock.c:2074 [inline] __sk_destruct+0x5df/0x750 net/core/sock.c:2166 sk_destruct net/core/sock.c:2181 [inline] __sk_free+0x175/0x460 net/core/sock.c:2192 sk_free+0x7c/0xa0 net/core/sock.c:2203 sock_put include/net/sock.h:1991 [inline] nr_release+0x39e/0x460 net/netrom/af_netrom.c:554 __sock_release+0xcd/0x280 net/socket.c:650 sock_close+0x1c/0x20 net/socket.c:1365 __fput+0x27c/0xa90 fs/file_table.c:320 task_work_run+0x16f/0x270 kernel/task_work.c:179 exit_task_work include/linux/task_work.h:38 [inline] do_exit+0xaa8/0x2950 kernel/exit.c:867 do_group_exit+0xd4/0x2a0 kernel/exit.c:1012 get_signal+0x21c3/0x2450 kernel/signal.c:2859 arch_do_signal_or_restart+0x79/0x5c0 arch/x86/kernel/signal.c:306 exit_to_user_mode_loop kernel/entry/common.c:168 [inline] exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203 __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline] syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296 do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x63/0xcd ```
To fix this issue, nr_listen() returns -EINVAL for sockets that successfully nr_connect().
Reported-by: syzbot+caa188bdfc1eeafeb418@syzkaller.appspotmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Hyunwoo Kim v4bel@theori.io Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/netrom/af_netrom.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c index 58d5373c513c..7da77ddba5f4 100644 --- a/net/netrom/af_netrom.c +++ b/net/netrom/af_netrom.c @@ -378,6 +378,11 @@ static int nr_listen(struct socket *sock, int backlog) struct sock *sk = sock->sk;
lock_sock(sk); + if (sock->state != SS_UNCONNECTED) { + release_sock(sk); + return -EINVAL; + } + if (sk->sk_state != TCP_LISTEN) { memset(&nr_sk(sk)->user_addr, 0, AX25_ADDR_LEN); sk->sk_max_ack_backlog = backlog;
From: Florian Westphal fw@strlen.de
[ Upstream commit 2b272bb558f1d3a5aa95ed8a82253786fd1a48ba ]
When using a xfrm interface in a bridged setup (the outgoing device is bridged), the incoming packets in the xfrm interface are only tracked in the outgoing direction.
$ brctl show bridge name interfaces br_eth1 eth1
$ conntrack -L tcp 115 SYN_SENT src=192... dst=192... [UNREPLIED] ...
If br_netfilter is enabled, the first (encrypted) packet is received onR eth1, conntrack hooks are called from br_netfilter emulation which allocates nf_bridge info for this skb.
If the packet is for local machine, skb gets passed up the ip stack. The skb passes through ip prerouting a second time. br_netfilter ip_sabotage_in supresses the re-invocation of the hooks.
After this, skb gets decrypted in xfrm layer and appears in network stack a second time (after decryption).
Then, ip_sabotage_in is called again and suppresses netfilter hook invocation, even though the bridge layer never called them for the plaintext incarnation of the packet.
Free the bridge info after the first suppression to avoid this.
I was unable to figure out where the regression comes from, as far as i can see br_netfilter always had this problem; i did not expect that skb is looped again with different headers.
Fixes: c4b0e771f906 ("netfilter: avoid using skb->nf_bridge directly") Reported-and-tested-by: Wolfgang Nothdurft wolfgang@linogate.de Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/bridge/br_netfilter_hooks.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c index 01e33724d10c..43cb7aab4eed 100644 --- a/net/bridge/br_netfilter_hooks.c +++ b/net/bridge/br_netfilter_hooks.c @@ -871,6 +871,7 @@ static unsigned int ip_sabotage_in(void *priv, if (nf_bridge && !nf_bridge->in_prerouting && !netif_is_l3_master(skb->dev) && !netif_is_l3_slave(skb->dev)) { + nf_bridge_info_free(skb); state->okfn(state->net, state->sk, skb); return NF_STOLEN; }
From: Fedor Pchelkin pchelkin@ispras.ru
[ Upstream commit 72e544b1b28325fe78a4687b980871a7e4101f76 ]
While mounting a corrupted filesystem, a signed integer '*xattr_ids' can become less than zero. This leads to the incorrect computation of 'len' and 'indexes' values which can cause null-ptr-deref in copy_bio_to_actor() or out-of-bounds accesses in the next sanity checks inside squashfs_read_xattr_id_table().
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Link: https://lkml.kernel.org/r/20230117105226.329303-2-pchelkin@ispras.ru Fixes: 506220d2ba21 ("squashfs: add more sanity checks in xattr id lookup") Reported-by: syzbot+082fa4af80a5bb1a9843@syzkaller.appspotmail.com Signed-off-by: Fedor Pchelkin pchelkin@ispras.ru Signed-off-by: Alexey Khoroshilov khoroshilov@ispras.ru Cc: Phillip Lougher phillip@squashfs.org.uk Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/squashfs/xattr_id.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/squashfs/xattr_id.c b/fs/squashfs/xattr_id.c index 087cab8c78f4..f6d78cbc3e74 100644 --- a/fs/squashfs/xattr_id.c +++ b/fs/squashfs/xattr_id.c @@ -76,7 +76,7 @@ __le64 *squashfs_read_xattr_id_table(struct super_block *sb, u64 table_start, /* Sanity check values */
/* there is always at least one xattr id */ - if (*xattr_ids == 0) + if (*xattr_ids <= 0) return ERR_PTR(-EINVAL);
len = SQUASHFS_XATTR_BLOCK_BYTES(*xattr_ids);
From: Chris Healy healych@amazon.com
[ Upstream commit afc2336f89dc0fc0ef25b92366814524b0fd90fb ]
The Meson G12A Internal PHY does not support standard IEEE MMD extended register access, therefore add generic dummy stubs to fail the read and write MMD calls. This is necessary to prevent the core PHY code from erroneously believing that EEE is supported by this PHY even though this PHY does not support EEE, as MMD register access returns all FFFFs.
Fixes: 5c3407abb338 ("net: phy: meson-gxl: add g12a support") Reviewed-by: Heiner Kallweit hkallweit1@gmail.com Signed-off-by: Chris Healy healych@amazon.com Reviewed-by: Jerome Brunet jbrunet@baylibre.com Link: https://lore.kernel.org/r/20230130231402.471493-1-cphealy@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/phy/meson-gxl.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/phy/meson-gxl.c b/drivers/net/phy/meson-gxl.c index e8f2ca625837..f7a9e6599a64 100644 --- a/drivers/net/phy/meson-gxl.c +++ b/drivers/net/phy/meson-gxl.c @@ -245,6 +245,8 @@ static struct phy_driver meson_gxl_phy[] = { .config_intr = meson_gxl_config_intr, .suspend = genphy_suspend, .resume = genphy_resume, + .read_mmd = genphy_read_mmd_unsupported, + .write_mmd = genphy_write_mmd_unsupported, }, };
From: Ziyang Xuan william.xuanziyang@huawei.com
[ Upstream commit d0553680f94c49bbe0e39eb50d033ba563b4212d ]
The conclusion "j1939_session_deactivate() should be called with a session ref-count of at least 2" is incorrect. In some concurrent scenarios, j1939_session_deactivate can be called with the session ref-count less than 2. But there is not any problem because it will check the session active state before session putting in j1939_session_deactivate_locked().
Here is the concurrent scenario of the problem reported by syzbot and my reproduction log.
cpu0 cpu1 j1939_xtp_rx_eoma j1939_xtp_rx_abort_one j1939_session_get_by_addr [kref == 2] j1939_session_get_by_addr [kref == 3] j1939_session_deactivate [kref == 2] j1939_session_put [kref == 1] j1939_session_completed j1939_session_deactivate WARN_ON_ONCE(kref < 2)
===================================================== WARNING: CPU: 1 PID: 21 at net/can/j1939/transport.c:1088 j1939_session_deactivate+0x5f/0x70 CPU: 1 PID: 21 Comm: ksoftirqd/1 Not tainted 5.14.0-rc7+ #32 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014 RIP: 0010:j1939_session_deactivate+0x5f/0x70 Call Trace: j1939_session_deactivate_activate_next+0x11/0x28 j1939_xtp_rx_eoma+0x12a/0x180 j1939_tp_recv+0x4a2/0x510 j1939_can_recv+0x226/0x380 can_rcv_filter+0xf8/0x220 can_receive+0x102/0x220 ? process_backlog+0xf0/0x2c0 can_rcv+0x53/0xf0 __netif_receive_skb_one_core+0x67/0x90 ? process_backlog+0x97/0x2c0 __netif_receive_skb+0x22/0x80
Fixes: 0c71437dd50d ("can: j1939: j1939_session_deactivate(): clarify lifetime of session object") Reported-by: syzbot+9981a614060dcee6eeca@syzkaller.appspotmail.com Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com Acked-by: Oleksij Rempel o.rempel@pengutronix.de Link: https://lore.kernel.org/all/20210906094200.95868-1-william.xuanziyang@huawei... Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- net/can/j1939/transport.c | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/net/can/j1939/transport.c b/net/can/j1939/transport.c index 9ca19dfe3e83..9c8c7c5dc9c3 100644 --- a/net/can/j1939/transport.c +++ b/net/can/j1939/transport.c @@ -1087,10 +1087,6 @@ static bool j1939_session_deactivate(struct j1939_session *session) bool active;
j1939_session_list_lock(priv); - /* This function should be called with a session ref-count of at - * least 2. - */ - WARN_ON_ONCE(kref_read(&session->kref) < 2); active = j1939_session_deactivate_locked(session); j1939_session_list_unlock(priv);
From: Damien Le Moal damien.lemoal@opensource.wdc.com
[ Upstream commit 69f2c9346313ba3d3dfa4091ff99df26c67c9021 ]
Commit 2dc0b46b5ea3 ("libata: sata_down_spd_limit should return if driver has not recorded sstatus speed") changed the behavior of sata_down_spd_limit() to return doing nothing if a drive does not report a current link speed, to avoid reducing the link speed to the lowest 1.5 Gbps speed.
However, the change assumed that a speed was recorded before probing (e.g. before a suspend/resume) and set in link->sata_spd. This causes problems with adapters/drives combination failing to establish a link speed during probe autonegotiation. One example reported of this problem is an mvebu adapter with a 3Gbps port-multiplier box: autonegotiation fails, leaving no recorded link speed and no reported current link speed. Probe retries also fail as no action is taken by sata_set_spd() after each retry.
Fix this by returning early in sata_down_spd_limit() only if we do have a recorded link speed, that is, if link->sata_spd is not 0. With this fix, a failed probe not leading to a recorded link speed is retried at the lower 1.5 Gbps speed, with the link speed potentially increased later on the second revalidate of the device if the device reports that it supports higher link speeds.
Reported-by: Marius Dinu marius@psihoexpert.ro Fixes: 2dc0b46b5ea3 ("libata: sata_down_spd_limit should return if driver has not recorded sstatus speed") Reviewed-by: Niklas Cassel niklas.cassel@wdc.com Tested-by: Marius Dinu marius@psihoexpert.ro Signed-off-by: Damien Le Moal damien.lemoal@opensource.wdc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/ata/libata-core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c index fbb1676aa33f..c06f618b1aa3 100644 --- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -3096,7 +3096,7 @@ int sata_down_spd_limit(struct ata_link *link, u32 spd_limit) */ if (spd > 1) mask &= (1 << (spd - 1)) - 1; - else + else if (link->sata_spd) return -EINVAL;
/* were we already at the bottom? */
From: Andrei Gherzan andrei.gherzan@canonical.com
[ Upstream commit c03c80e3a03ffb4f790901d60797e9810539d946 ]
This change fixes the following compiler warning:
/usr/include/x86_64-linux-gnu/bits/error.h:40:5: warning: ‘gso_size’ may be used uninitialized [-Wmaybe-uninitialized] 40 | __error_noreturn (__status, __errnum, __format, __va_arg_pack ()); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ udpgso_bench_rx.c: In function ‘main’: udpgso_bench_rx.c:253:23: note: ‘gso_size’ was declared here 253 | int ret, len, gso_size, budget = 256;
Fixes: 3327a9c46352 ("selftests: add functionals test for UDP GRO") Signed-off-by: Andrei Gherzan andrei.gherzan@canonical.com Reviewed-by: Willem de Bruijn willemb@google.com Link: https://lore.kernel.org/r/20230201001612.515730-1-andrei.gherzan@canonical.c... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/udpgso_bench_rx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/udpgso_bench_rx.c b/tools/testing/selftests/net/udpgso_bench_rx.c index 6a193425c367..d0895bd1933f 100644 --- a/tools/testing/selftests/net/udpgso_bench_rx.c +++ b/tools/testing/selftests/net/udpgso_bench_rx.c @@ -250,7 +250,7 @@ static int recv_msg(int fd, char *buf, int len, int *gso_size) static void do_flush_udp(int fd) { static char rbuf[ETH_MAX_MTU]; - int ret, len, gso_size, budget = 256; + int ret, len, gso_size = 0, budget = 256;
len = cfg_read_all ? sizeof(rbuf) : 0; while (budget--) {
From: Andrei Gherzan andrei.gherzan@canonical.com
[ Upstream commit db9b47ee9f5f375ab0c5daeb20321c75b4fa657d ]
Leaving unrecognized arguments buried in the output, can easily hide a CLI/script typo. Avoid this by exiting when wrong arguments are provided to the udpgso_bench test programs.
Fixes: 3a687bef148d ("selftests: udp gso benchmark") Signed-off-by: Andrei Gherzan andrei.gherzan@canonical.com Cc: Willem de Bruijn willemb@google.com Reviewed-by: Willem de Bruijn willemb@google.com Link: https://lore.kernel.org/r/20230201001612.515730-2-andrei.gherzan@canonical.c... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/udpgso_bench_rx.c | 2 ++ tools/testing/selftests/net/udpgso_bench_tx.c | 2 ++ 2 files changed, 4 insertions(+)
diff --git a/tools/testing/selftests/net/udpgso_bench_rx.c b/tools/testing/selftests/net/udpgso_bench_rx.c index d0895bd1933f..4058c7451e70 100644 --- a/tools/testing/selftests/net/udpgso_bench_rx.c +++ b/tools/testing/selftests/net/udpgso_bench_rx.c @@ -336,6 +336,8 @@ static void parse_opts(int argc, char **argv) cfg_verify = true; cfg_read_all = true; break; + default: + exit(1); } }
diff --git a/tools/testing/selftests/net/udpgso_bench_tx.c b/tools/testing/selftests/net/udpgso_bench_tx.c index f1fdaa270291..b47b5c32039f 100644 --- a/tools/testing/selftests/net/udpgso_bench_tx.c +++ b/tools/testing/selftests/net/udpgso_bench_tx.c @@ -490,6 +490,8 @@ static void parse_opts(int argc, char **argv) case 'z': cfg_zerocopy = true; break; + default: + exit(1); } }
From: Andrei Gherzan andrei.gherzan@canonical.com
[ Upstream commit dafe93b9ee21028d625dce347118b82659652eff ]
"udpgro_bench.sh" invokes udpgso_bench_rx/udpgso_bench_tx programs subsequently and while doing so, there is a chance that the rx one is not ready to accept socket connections. This racing bug could fail the test with at least one of the following:
./udpgso_bench_tx: connect: Connection refused ./udpgso_bench_tx: sendmsg: Connection refused ./udpgso_bench_tx: write: Connection refused
This change addresses this by making udpgro_bench.sh wait for the rx program to be ready before firing off the tx one - up to a 10s timeout.
Fixes: 3a687bef148d ("selftests: udp gso benchmark") Signed-off-by: Andrei Gherzan andrei.gherzan@canonical.com Cc: Paolo Abeni pabeni@redhat.com Cc: Willem de Bruijn willemb@google.com Reviewed-by: Willem de Bruijn willemb@google.com Link: https://lore.kernel.org/r/20230201001612.515730-3-andrei.gherzan@canonical.c... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/udpgso_bench.sh | 24 +++++++++++++++++---- 1 file changed, 20 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/net/udpgso_bench.sh b/tools/testing/selftests/net/udpgso_bench.sh index dc932fd65363..640bc43452fa 100755 --- a/tools/testing/selftests/net/udpgso_bench.sh +++ b/tools/testing/selftests/net/udpgso_bench.sh @@ -7,6 +7,7 @@ readonly GREEN='\033[0;92m' readonly YELLOW='\033[0;33m' readonly RED='\033[0;31m' readonly NC='\033[0m' # No Color +readonly TESTPORT=8000
readonly KSFT_PASS=0 readonly KSFT_FAIL=1 @@ -56,11 +57,26 @@ trap wake_children EXIT
run_one() { local -r args=$@ + local nr_socks=0 + local i=0 + local -r timeout=10 + + ./udpgso_bench_rx -p "$TESTPORT" & + ./udpgso_bench_rx -p "$TESTPORT" -t & + + # Wait for the above test program to get ready to receive connections. + while [ "$i" -lt "$timeout" ]; do + nr_socks="$(ss -lnHi | grep -c "*:${TESTPORT}")" + [ "$nr_socks" -eq 2 ] && break + i=$((i + 1)) + sleep 1 + done + if [ "$nr_socks" -ne 2 ]; then + echo "timed out while waiting for udpgso_bench_rx" + exit 1 + fi
- ./udpgso_bench_rx & - ./udpgso_bench_rx -t & - - ./udpgso_bench_tx ${args} + ./udpgso_bench_tx -p "$TESTPORT" ${args} }
run_in_netns() {
From: Andrei Gherzan andrei.gherzan@canonical.com
[ Upstream commit 329c9cd769c2e306957df031efff656c40922c76 ]
The test tool can check that the zerocopy number of completions value is valid taking into consideration the number of datagram send calls. This can catch the system into a state where the datagrams are still in the system (for example in a qdisk, waiting for the network interface to return a completion notification, etc).
This change adds a retry logic of computing the number of completions up to a configurable (via CLI) timeout (default: 2 seconds).
Fixes: 79ebc3c26010 ("net/udpgso_bench_tx: options to exercise TX CMSG") Signed-off-by: Andrei Gherzan andrei.gherzan@canonical.com Cc: Willem de Bruijn willemb@google.com Cc: Paolo Abeni pabeni@redhat.com Reviewed-by: Willem de Bruijn willemb@google.com Link: https://lore.kernel.org/r/20230201001612.515730-4-andrei.gherzan@canonical.c... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/udpgso_bench_tx.c | 34 +++++++++++++++---- 1 file changed, 27 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/net/udpgso_bench_tx.c b/tools/testing/selftests/net/udpgso_bench_tx.c index b47b5c32039f..477392715a9a 100644 --- a/tools/testing/selftests/net/udpgso_bench_tx.c +++ b/tools/testing/selftests/net/udpgso_bench_tx.c @@ -62,6 +62,7 @@ static int cfg_payload_len = (1472 * 42); static int cfg_port = 8000; static int cfg_runtime_ms = -1; static bool cfg_poll; +static int cfg_poll_loop_timeout_ms = 2000; static bool cfg_segment; static bool cfg_sendmmsg; static bool cfg_tcp; @@ -235,16 +236,17 @@ static void flush_errqueue_recv(int fd) } }
-static void flush_errqueue(int fd, const bool do_poll) +static void flush_errqueue(int fd, const bool do_poll, + unsigned long poll_timeout, const bool poll_err) { if (do_poll) { struct pollfd fds = {0}; int ret;
fds.fd = fd; - ret = poll(&fds, 1, 500); + ret = poll(&fds, 1, poll_timeout); if (ret == 0) { - if (cfg_verbose) + if ((cfg_verbose) && (poll_err)) fprintf(stderr, "poll timeout\n"); } else if (ret < 0) { error(1, errno, "poll"); @@ -254,6 +256,20 @@ static void flush_errqueue(int fd, const bool do_poll) flush_errqueue_recv(fd); }
+static void flush_errqueue_retry(int fd, unsigned long num_sends) +{ + unsigned long tnow, tstop; + bool first_try = true; + + tnow = gettimeofday_ms(); + tstop = tnow + cfg_poll_loop_timeout_ms; + do { + flush_errqueue(fd, true, tstop - tnow, first_try); + first_try = false; + tnow = gettimeofday_ms(); + } while ((stat_zcopies != num_sends) && (tnow < tstop)); +} + static int send_tcp(int fd, char *data) { int ret, done = 0, count = 0; @@ -413,7 +429,8 @@ static int send_udp_segment(int fd, char *data)
static void usage(const char *filepath) { - error(1, 0, "Usage: %s [-46acmHPtTuvz] [-C cpu] [-D dst ip] [-l secs] [-M messagenr] [-p port] [-s sendsize] [-S gsosize]", + error(1, 0, "Usage: %s [-46acmHPtTuvz] [-C cpu] [-D dst ip] [-l secs] " + "[-L secs] [-M messagenr] [-p port] [-s sendsize] [-S gsosize]", filepath); }
@@ -423,7 +440,7 @@ static void parse_opts(int argc, char **argv) int max_len, hdrlen; int c;
- while ((c = getopt(argc, argv, "46acC:D:Hl:mM:p:s:PS:tTuvz")) != -1) { + while ((c = getopt(argc, argv, "46acC:D:Hl:L:mM:p:s:PS:tTuvz")) != -1) { switch (c) { case '4': if (cfg_family != PF_UNSPEC) @@ -452,6 +469,9 @@ static void parse_opts(int argc, char **argv) case 'l': cfg_runtime_ms = strtoul(optarg, NULL, 10) * 1000; break; + case 'L': + cfg_poll_loop_timeout_ms = strtoul(optarg, NULL, 10) * 1000; + break; case 'm': cfg_sendmmsg = true; break; @@ -679,7 +699,7 @@ int main(int argc, char **argv) num_sends += send_udp(fd, buf[i]); num_msgs++; if ((cfg_zerocopy && ((num_msgs & 0xF) == 0)) || cfg_tx_tstamp) - flush_errqueue(fd, cfg_poll); + flush_errqueue(fd, cfg_poll, 500, true);
if (cfg_msg_nr && num_msgs >= cfg_msg_nr) break; @@ -698,7 +718,7 @@ int main(int argc, char **argv) } while (!interrupted && (cfg_runtime_ms == -1 || tnow < tstop));
if (cfg_zerocopy || cfg_tx_tstamp) - flush_errqueue(fd, true); + flush_errqueue_retry(fd, num_sends);
if (close(fd)) error(1, errno, "close");
From: Parav Pandit parav@nvidia.com
[ Upstream commit 63b114042d8a9c02d9939889177c36dbdb17a588 ]
Cited commit in fixes tag frees rxq xdp info while RQ NAPI is still enabled and packet processing may be ongoing.
Follow the mirror sequence of open() in the stop() callback. This ensures that when rxq info is unregistered, no rx packet processing is ongoing.
Fixes: 754b8a21a96d ("virtio_net: setup xdp_rxq_info") Acked-by: Michael S. Tsirkin mst@redhat.com Reviewed-by: Jiri Pirko jiri@nvidia.com Signed-off-by: Parav Pandit parav@nvidia.com Link: https://lore.kernel.org/r/20230202163516.12559-1-parav@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/virtio_net.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 579df7c5411d..5212d9cb0372 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1910,8 +1910,8 @@ static int virtnet_close(struct net_device *dev) cancel_delayed_work_sync(&vi->refill);
for (i = 0; i < vi->max_queue_pairs; i++) { - xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq); napi_disable(&vi->rq[i].napi); + xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq); virtnet_napi_tx_disable(&vi->sq[i].napi); }
From: Fedor Pchelkin pchelkin@ispras.ru
[ Upstream commit 0c598aed445eb45b0ee7ba405f7ece99ee349c30 ]
Syzkaller reports a memory leak of new_flow in ovs_flow_cmd_new() as it is not freed when an allocation of a key fails.
BUG: memory leak unreferenced object 0xffff888116668000 (size 632): comm "syz-executor231", pid 1090, jiffies 4294844701 (age 18.871s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000defa3494>] kmem_cache_zalloc include/linux/slab.h:654 [inline] [<00000000defa3494>] ovs_flow_alloc+0x19/0x180 net/openvswitch/flow_table.c:77 [<00000000c67d8873>] ovs_flow_cmd_new+0x1de/0xd40 net/openvswitch/datapath.c:957 [<0000000010a539a8>] genl_family_rcv_msg_doit+0x22d/0x330 net/netlink/genetlink.c:739 [<00000000dff3302d>] genl_family_rcv_msg net/netlink/genetlink.c:783 [inline] [<00000000dff3302d>] genl_rcv_msg+0x328/0x590 net/netlink/genetlink.c:800 [<000000000286dd87>] netlink_rcv_skb+0x153/0x430 net/netlink/af_netlink.c:2515 [<0000000061fed410>] genl_rcv+0x24/0x40 net/netlink/genetlink.c:811 [<000000009dc0f111>] netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline] [<000000009dc0f111>] netlink_unicast+0x545/0x7f0 net/netlink/af_netlink.c:1339 [<000000004a5ee816>] netlink_sendmsg+0x8e7/0xde0 net/netlink/af_netlink.c:1934 [<00000000482b476f>] sock_sendmsg_nosec net/socket.c:651 [inline] [<00000000482b476f>] sock_sendmsg+0x152/0x190 net/socket.c:671 [<00000000698574ba>] ____sys_sendmsg+0x70a/0x870 net/socket.c:2356 [<00000000d28d9e11>] ___sys_sendmsg+0xf3/0x170 net/socket.c:2410 [<0000000083ba9120>] __sys_sendmsg+0xe5/0x1b0 net/socket.c:2439 [<00000000c00628f8>] do_syscall_64+0x30/0x40 arch/x86/entry/common.c:46 [<000000004abfdcf4>] entry_SYSCALL_64_after_hwframe+0x61/0xc6
To fix this the patch rearranges the goto labels to reflect the order of object allocations and adds appropriate goto statements on the error paths.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 68bb10101e6b ("openvswitch: Fix flow lookup to use unmasked key") Signed-off-by: Fedor Pchelkin pchelkin@ispras.ru Signed-off-by: Alexey Khoroshilov khoroshilov@ispras.ru Acked-by: Eelco Chaudron echaudro@redhat.com Reviewed-by: Simon Horman simon.horman@corigine.com Link: https://lore.kernel.org/r/20230201210218.361970-1-pchelkin@ispras.ru Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/openvswitch/datapath.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c index a8a8396dd983..4c537e74b18c 100644 --- a/net/openvswitch/datapath.c +++ b/net/openvswitch/datapath.c @@ -941,14 +941,14 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info) key = kzalloc(sizeof(*key), GFP_KERNEL); if (!key) { error = -ENOMEM; - goto err_kfree_key; + goto err_kfree_flow; }
ovs_match_init(&match, key, false, &mask); error = ovs_nla_get_match(net, &match, a[OVS_FLOW_ATTR_KEY], a[OVS_FLOW_ATTR_MASK], log); if (error) - goto err_kfree_flow; + goto err_kfree_key;
ovs_flow_mask_key(&new_flow->key, key, true, &mask);
@@ -956,14 +956,14 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info) error = ovs_nla_get_identifier(&new_flow->id, a[OVS_FLOW_ATTR_UFID], key, log); if (error) - goto err_kfree_flow; + goto err_kfree_key;
/* Validate actions. */ error = ovs_nla_copy_actions(net, a[OVS_FLOW_ATTR_ACTIONS], &new_flow->key, &acts, log); if (error) { OVS_NLERR(log, "Flow actions may not be safe on all matching packets."); - goto err_kfree_flow; + goto err_kfree_key; }
reply = ovs_flow_cmd_alloc_info(acts, &new_flow->id, info, false, @@ -1063,10 +1063,10 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info) kfree_skb(reply); err_kfree_acts: ovs_nla_free_flow_actions(acts); -err_kfree_flow: - ovs_flow_free(new_flow, false); err_kfree_key: kfree(key); +err_kfree_flow: + ovs_flow_free(new_flow, false); error: return error; }
From: Anton Gusev aagusev@ispras.ru
[ Upstream commit 966d47e1f27c45507c5df82b2a2157e5a4fd3909 ]
When iterating on a linked list, a result of memremap is dereferenced without checking it for NULL.
This patch adds a check that falls back on allocating a new page in case memremap doesn't succeed.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 18df7577adae ("efi/memreserve: deal with memreserve entries in unmapped memory") Signed-off-by: Anton Gusev aagusev@ispras.ru [ardb: return -ENOMEM instead of breaking out of the loop] Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/efi/efi.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c index eb98018ab420..ed31b08855f9 100644 --- a/drivers/firmware/efi/efi.c +++ b/drivers/firmware/efi/efi.c @@ -1022,6 +1022,8 @@ int __ref efi_mem_reserve_persistent(phys_addr_t addr, u64 size) /* first try to find a slot in an existing linked list entry */ for (prsv = efi_memreserve_root->next; prsv; ) { rsv = memremap(prsv, sizeof(*rsv), MEMREMAP_WB); + if (!rsv) + return -ENOMEM; index = atomic_fetch_add_unless(&rsv->count, 1, rsv->size); if (index < rsv->size) { rsv->entry[index].base = addr;
From: Maurizio Lombardi mlombard@redhat.com
[ Upstream commit 84ed64b1a7a7fcd507598dee7708c1f225123711 ]
Calling spin_lock_irqsave() does not disable the interrupts on realtime kernels, remove the warning and replace assert_spin_locked() with lockdep_assert_held().
Signed-off-by: Maurizio Lombardi mlombard@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Link: https://lore.kernel.org/r/20230110125310.55884-1-mlombard@redhat.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/target/target_core_tmr.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c index feeba3966617..6928ebf0be9c 100644 --- a/drivers/target/target_core_tmr.c +++ b/drivers/target/target_core_tmr.c @@ -82,8 +82,8 @@ static bool __target_check_io_state(struct se_cmd *se_cmd, { struct se_session *sess = se_cmd->se_sess;
- assert_spin_locked(&sess->sess_cmd_lock); - WARN_ON_ONCE(!irqs_disabled()); + lockdep_assert_held(&sess->sess_cmd_lock); + /* * If command already reached CMD_T_COMPLETE state within * target_complete_cmd() or CMD_T_FABRIC_STOP due to shutdown,
From: Mike Christie michael.christie@oracle.com
[ Upstream commit f484a794e4ee2a9ce61f52a78e810ac45f3fe3b3 ]
If during iscsi_sw_tcp_session_create() iscsi_tcp_r2tpool_alloc() fails, userspace could be accessing the host's ipaddress attr. If we then free the session via iscsi_session_teardown() while userspace is still accessing the session we will hit a use after free bug.
Set the tcp_sw_host->session after we have completed session creation and can no longer fail.
Link: https://lore.kernel.org/r/20230117193937.21244-3-michael.christie@oracle.com Signed-off-by: Mike Christie michael.christie@oracle.com Reviewed-by: Lee Duncan lduncan@suse.com Acked-by: Ding Hui dinghui@sangfor.com.cn Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/iscsi_tcp.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c index b5dd1caae5e9..9320a0a92bb2 100644 --- a/drivers/scsi/iscsi_tcp.c +++ b/drivers/scsi/iscsi_tcp.c @@ -770,7 +770,7 @@ static int iscsi_sw_tcp_host_get_param(struct Scsi_Host *shost, enum iscsi_host_param param, char *buf) { struct iscsi_sw_tcp_host *tcp_sw_host = iscsi_host_priv(shost); - struct iscsi_session *session = tcp_sw_host->session; + struct iscsi_session *session; struct iscsi_conn *conn; struct iscsi_tcp_conn *tcp_conn; struct iscsi_sw_tcp_conn *tcp_sw_conn; @@ -779,6 +779,7 @@ static int iscsi_sw_tcp_host_get_param(struct Scsi_Host *shost,
switch (param) { case ISCSI_HOST_PARAM_IPADDRESS: + session = tcp_sw_host->session; if (!session) return -ENOTCONN;
@@ -867,12 +868,14 @@ iscsi_sw_tcp_session_create(struct iscsi_endpoint *ep, uint16_t cmds_max, if (!cls_session) goto remove_host; session = cls_session->dd_data; - tcp_sw_host = iscsi_host_priv(shost); - tcp_sw_host->session = session;
shost->can_queue = session->scsi_cmds_max; if (iscsi_tcp_r2tpool_alloc(session)) goto remove_session; + + /* We are now fully setup so expose the session to sysfs. */ + tcp_sw_host = iscsi_host_priv(shost); + tcp_sw_host->session = session; return cls_session;
remove_session:
From: Randy Dunlap rdunlap@infradead.org
[ Upstream commit 0582d984793d30442da88fe458674502bad1ad29 ]
Fix multiple W=1 kernel-doc warnings in i2c-rk3x.c:
drivers/i2c/busses/i2c-rk3x.c:83: warning: missing initial short description on line: * struct i2c_spec_values: drivers/i2c/busses/i2c-rk3x.c:139: warning: missing initial short description on line: * struct rk3x_i2c_calced_timings: drivers/i2c/busses/i2c-rk3x.c:162: warning: missing initial short description on line: * struct rk3x_i2c_soc_data: drivers/i2c/busses/i2c-rk3x.c:242: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * Generate a START condition, which triggers a REG_INT_START interrupt. drivers/i2c/busses/i2c-rk3x.c:261: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * Generate a STOP condition, which triggers a REG_INT_STOP interrupt. drivers/i2c/busses/i2c-rk3x.c:304: warning: expecting prototype for Setup a read according to i2c(). Prototype was for rk3x_i2c_prepare_read() instead drivers/i2c/busses/i2c-rk3x.c:335: warning: expecting prototype for Fill the transmit buffer with data from i2c(). Prototype was for rk3x_i2c_fill_transmit_buf() instead drivers/i2c/busses/i2c-rk3x.c:535: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * Get timing values of I2C specification drivers/i2c/busses/i2c-rk3x.c:552: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * Calculate divider values for desired SCL frequency drivers/i2c/busses/i2c-rk3x.c:713: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * Calculate timing values for desired SCL frequency drivers/i2c/busses/i2c-rk3x.c:963: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * Setup I2C registers for an I2C operation specified by msgs, num.
Signed-off-by: Randy Dunlap rdunlap@infradead.org Signed-off-by: Wolfram Sang wsa@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/busses/i2c-rk3x.c | 44 +++++++++++++++++------------------ 1 file changed, 22 insertions(+), 22 deletions(-)
diff --git a/drivers/i2c/busses/i2c-rk3x.c b/drivers/i2c/busses/i2c-rk3x.c index 1107a5e7229e..ac3ae14a4c07 100644 --- a/drivers/i2c/busses/i2c-rk3x.c +++ b/drivers/i2c/busses/i2c-rk3x.c @@ -79,7 +79,7 @@ enum { #define DEFAULT_SCL_RATE (100 * 1000) /* Hz */
/** - * struct i2c_spec_values: + * struct i2c_spec_values - I2C specification values for various modes * @min_hold_start_ns: min hold time (repeated) START condition * @min_low_ns: min LOW period of the SCL clock * @min_high_ns: min HIGH period of the SCL cloc @@ -135,7 +135,7 @@ static const struct i2c_spec_values fast_mode_plus_spec = { };
/** - * struct rk3x_i2c_calced_timings: + * struct rk3x_i2c_calced_timings - calculated V1 timings * @div_low: Divider output for low * @div_high: Divider output for high * @tuning: Used to adjust setup/hold data time, @@ -158,7 +158,7 @@ enum rk3x_i2c_state { };
/** - * struct rk3x_i2c_soc_data: + * struct rk3x_i2c_soc_data - SOC-specific data * @grf_offset: offset inside the grf regmap for setting the i2c type * @calc_timings: Callback function for i2c timing information calculated */ @@ -238,7 +238,8 @@ static inline void rk3x_i2c_clean_ipd(struct rk3x_i2c *i2c) }
/** - * Generate a START condition, which triggers a REG_INT_START interrupt. + * rk3x_i2c_start - Generate a START condition, which triggers a REG_INT_START interrupt. + * @i2c: target controller data */ static void rk3x_i2c_start(struct rk3x_i2c *i2c) { @@ -257,8 +258,8 @@ static void rk3x_i2c_start(struct rk3x_i2c *i2c) }
/** - * Generate a STOP condition, which triggers a REG_INT_STOP interrupt. - * + * rk3x_i2c_stop - Generate a STOP condition, which triggers a REG_INT_STOP interrupt. + * @i2c: target controller data * @error: Error code to return in rk3x_i2c_xfer */ static void rk3x_i2c_stop(struct rk3x_i2c *i2c, int error) @@ -297,7 +298,8 @@ static void rk3x_i2c_stop(struct rk3x_i2c *i2c, int error) }
/** - * Setup a read according to i2c->msg + * rk3x_i2c_prepare_read - Setup a read according to i2c->msg + * @i2c: target controller data */ static void rk3x_i2c_prepare_read(struct rk3x_i2c *i2c) { @@ -328,7 +330,8 @@ static void rk3x_i2c_prepare_read(struct rk3x_i2c *i2c) }
/** - * Fill the transmit buffer with data from i2c->msg + * rk3x_i2c_fill_transmit_buf - Fill the transmit buffer with data from i2c->msg + * @i2c: target controller data */ static void rk3x_i2c_fill_transmit_buf(struct rk3x_i2c *i2c) { @@ -531,11 +534,10 @@ static irqreturn_t rk3x_i2c_irq(int irqno, void *dev_id) }
/** - * Get timing values of I2C specification - * + * rk3x_i2c_get_spec - Get timing values of I2C specification * @speed: Desired SCL frequency * - * Returns: Matched i2c spec values. + * Return: Matched i2c_spec_values. */ static const struct i2c_spec_values *rk3x_i2c_get_spec(unsigned int speed) { @@ -548,13 +550,12 @@ static const struct i2c_spec_values *rk3x_i2c_get_spec(unsigned int speed) }
/** - * Calculate divider values for desired SCL frequency - * + * rk3x_i2c_v0_calc_timings - Calculate divider values for desired SCL frequency * @clk_rate: I2C input clock rate * @t: Known I2C timing information * @t_calc: Caculated rk3x private timings that would be written into regs * - * Returns: 0 on success, -EINVAL if the goal SCL rate is too slow. In that case + * Return: %0 on success, -%EINVAL if the goal SCL rate is too slow. In that case * a best-effort divider value is returned in divs. If the target rate is * too high, we silently use the highest possible rate. */ @@ -709,13 +710,12 @@ static int rk3x_i2c_v0_calc_timings(unsigned long clk_rate, }
/** - * Calculate timing values for desired SCL frequency - * + * rk3x_i2c_v1_calc_timings - Calculate timing values for desired SCL frequency * @clk_rate: I2C input clock rate * @t: Known I2C timing information * @t_calc: Caculated rk3x private timings that would be written into regs * - * Returns: 0 on success, -EINVAL if the goal SCL rate is too slow. In that case + * Return: %0 on success, -%EINVAL if the goal SCL rate is too slow. In that case * a best-effort divider value is returned in divs. If the target rate is * too high, we silently use the highest possible rate. * The following formulas are v1's method to calculate timings. @@ -959,14 +959,14 @@ static int rk3x_i2c_clk_notifier_cb(struct notifier_block *nb, unsigned long }
/** - * Setup I2C registers for an I2C operation specified by msgs, num. - * - * Must be called with i2c->lock held. - * + * rk3x_i2c_setup - Setup I2C registers for an I2C operation specified by msgs, num. + * @i2c: target controller data * @msgs: I2C msgs to process * @num: Number of msgs * - * returns: Number of I2C msgs processed or negative in case of error + * Must be called with i2c->lock held. + * + * Return: Number of I2C msgs processed or negative in case of error */ static int rk3x_i2c_setup(struct rk3x_i2c *i2c, struct i2c_msg *msgs, int num) {
From: Hyunwoo Kim v4bel@theori.io
[ Upstream commit f2b0b5210f67c56a3bcdf92ff665fb285d6e0067 ]
When listen() and accept() are called on an x25 socket that connect() succeeds, accept() succeeds immediately. This is because x25_connect() queues the skb to sk->sk_receive_queue, and x25_accept() dequeues it.
This creates a child socket with the sk of the parent x25 socket, which can cause confusion.
Fix x25_listen() to return -EINVAL if the socket has already been successfully connect()ed to avoid this issue.
Signed-off-by: Hyunwoo Kim v4bel@theori.io Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/x25/af_x25.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/net/x25/af_x25.c b/net/x25/af_x25.c index c94aa587e0c9..43dd489ad6db 100644 --- a/net/x25/af_x25.c +++ b/net/x25/af_x25.c @@ -492,6 +492,12 @@ static int x25_listen(struct socket *sock, int backlog) int rc = -EOPNOTSUPP;
lock_sock(sk); + if (sock->state != SS_UNCONNECTED) { + rc = -EINVAL; + release_sock(sk); + return rc; + } + if (sk->sk_state != TCP_LISTEN) { memset(&x25_sk(sk)->dest_addr, 0, X25_ADDR_LEN); sk->sk_max_ack_backlog = backlog;
From: Olivier Moysan olivier.moysan@foss.st.com
[ Upstream commit cc3304052a89ab6ac887ed9224420a27e3d354e1 ]
When STM32 DFSDM driver is built as module, no modalias information is available. This prevents module to be loaded by udev. Add MODULE_DEVICE_TABLE() to fill module aliases.
Fixes: e2e6771c6462 ("IIO: ADC: add STM32 DFSDM sigma delta ADC support") Signed-off-by: Olivier Moysan olivier.moysan@foss.st.com Link: https://lore.kernel.org/r/20221202152848.45585-1-olivier.moysan@foss.st.com Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iio/adc/stm32-dfsdm-adc.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/iio/adc/stm32-dfsdm-adc.c b/drivers/iio/adc/stm32-dfsdm-adc.c index c2948defa785..0c92eb30f0d8 100644 --- a/drivers/iio/adc/stm32-dfsdm-adc.c +++ b/drivers/iio/adc/stm32-dfsdm-adc.c @@ -1544,6 +1544,7 @@ static const struct of_device_id stm32_dfsdm_adc_match[] = { }, {} }; +MODULE_DEVICE_TABLE(of, stm32_dfsdm_adc_match);
static int stm32_dfsdm_adc_probe(struct platform_device *pdev) {
From: Wesley Cheng wcheng@codeaurora.org
[ Upstream commit 8e6cb5d27e8246d9c986ec162d066a502d2b602b ]
There was an extra character in the dwc3_qcom_vbus_override_enable() function. Removed the extra character.
Signed-off-by: Wesley Cheng wcheng@codeaurora.org Signed-off-by: Bryan O'Donoghue bryan.odonoghue@linaro.org Reviewed-by: Bjorn Andersson bjorn.andersson@linaro.org Link: https://lore.kernel.org/r/20210704013314.200951-2-bryan.odonoghue@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Stable-dep-of: eb320f76e31d ("usb: dwc3: qcom: enable vbus override when in OTG dr-mode") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/dwc3/dwc3-qcom.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/dwc3/dwc3-qcom.c b/drivers/usb/dwc3/dwc3-qcom.c index aed35276e0e0..03321457a4a1 100644 --- a/drivers/usb/dwc3/dwc3-qcom.c +++ b/drivers/usb/dwc3/dwc3-qcom.c @@ -102,7 +102,7 @@ static inline void dwc3_qcom_clrbits(void __iomem *base, u32 offset, u32 val) readl(base + offset); }
-static void dwc3_qcom_vbus_overrride_enable(struct dwc3_qcom *qcom, bool enable) +static void dwc3_qcom_vbus_override_enable(struct dwc3_qcom *qcom, bool enable) { if (enable) { dwc3_qcom_setbits(qcom->qscratch_base, QSCRATCH_SS_PHY_CTRL, @@ -123,7 +123,7 @@ static int dwc3_qcom_vbus_notifier(struct notifier_block *nb, struct dwc3_qcom *qcom = container_of(nb, struct dwc3_qcom, vbus_nb);
/* enable vbus override for device mode */ - dwc3_qcom_vbus_overrride_enable(qcom, event); + dwc3_qcom_vbus_override_enable(qcom, event); qcom->mode = event ? USB_DR_MODE_PERIPHERAL : USB_DR_MODE_HOST;
return NOTIFY_DONE; @@ -135,7 +135,7 @@ static int dwc3_qcom_host_notifier(struct notifier_block *nb, struct dwc3_qcom *qcom = container_of(nb, struct dwc3_qcom, host_nb);
/* disable vbus override in host mode */ - dwc3_qcom_vbus_overrride_enable(qcom, !event); + dwc3_qcom_vbus_override_enable(qcom, !event); qcom->mode = event ? USB_DR_MODE_HOST : USB_DR_MODE_PERIPHERAL;
return NOTIFY_DONE; @@ -670,7 +670,7 @@ static int dwc3_qcom_probe(struct platform_device *pdev)
/* enable vbus override for device mode */ if (qcom->mode == USB_DR_MODE_PERIPHERAL) - dwc3_qcom_vbus_overrride_enable(qcom, true); + dwc3_qcom_vbus_override_enable(qcom, true);
/* register extcon to override sw_vbus on Vbus change later */ ret = dwc3_qcom_register_extcon(qcom);
From: Neil Armstrong neil.armstrong@linaro.org
[ Upstream commit eb320f76e31dc835b9f57f04af1a2353b13bb7d8 ]
With vbus override enabled when in OTG dr_mode, Host<->Peripheral switch now works on SM8550, otherwise the DWC3 seems to be stuck in Host mode only.
Fixes: a4333c3a6ba9 ("usb: dwc3: Add Qualcomm DWC3 glue driver") Reviewed-by: Bryan O'Donoghue bryan.odonoghue@linaro.org Signed-off-by: Neil Armstrong neil.armstrong@linaro.org Link: https://lore.kernel.org/r/20230123-topic-sm8550-upstream-dwc3-qcom-otg-v2-1-... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/dwc3/dwc3-qcom.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/dwc3-qcom.c b/drivers/usb/dwc3/dwc3-qcom.c index 03321457a4a1..2dcdeb52fc29 100644 --- a/drivers/usb/dwc3/dwc3-qcom.c +++ b/drivers/usb/dwc3/dwc3-qcom.c @@ -669,7 +669,7 @@ static int dwc3_qcom_probe(struct platform_device *pdev) qcom->mode = usb_get_dr_mode(&qcom->dwc3->dev);
/* enable vbus override for device mode */ - if (qcom->mode == USB_DR_MODE_PERIPHERAL) + if (qcom->mode != USB_DR_MODE_HOST) dwc3_qcom_vbus_override_enable(qcom, true);
/* register extcon to override sw_vbus on Vbus change later */
From: Udipto Goswami quic_ugoswami@quicinc.com
[ Upstream commit 921deb9da15851425ccbb6ee409dc2fd8fbdfe6b ]
__ffs_ep0_queue_wait executes holding the spinlock of &ffs->ev.waitq.lock and unlocks it after the assignments to usb_request are done. However in the code if the request is already NULL we bail out returning -EINVAL but never unlocked the spinlock.
Fix this by adding spin_unlock_irq &ffs->ev.waitq.lock before returning.
Fixes: 6a19da111057 ("usb: gadget: f_fs: Prevent race during ffs_ep0_queue_wait") Reviewed-by: John Keeping john@metanate.com Signed-off-by: Udipto Goswami quic_ugoswami@quicinc.com Link: https://lore.kernel.org/r/20230124091149.18647-1-quic_ugoswami@quicinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/gadget/function/f_fs.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c index 431ab6d07497..5fe749036773 100644 --- a/drivers/usb/gadget/function/f_fs.c +++ b/drivers/usb/gadget/function/f_fs.c @@ -278,8 +278,10 @@ static int __ffs_ep0_queue_wait(struct ffs_data *ffs, char *data, size_t len) struct usb_request *req = ffs->ep0req; int ret;
- if (!req) + if (!req) { + spin_unlock_irq(&ffs->ev.waitq.lock); return -EINVAL; + }
req->zero = len < le16_to_cpu(ffs->ev.setup.wLength);
From: George Kennedy george.kennedy@oracle.com
[ Upstream commit 226fae124b2dac217ea5436060d623ff3385bc34 ]
After a call to console_unlock() in vcs_read() the vc_data struct can be freed by vc_deallocate(). Because of that, the struct vc_data pointer load must be done at the top of while loop in vcs_read() to avoid a UAF when vcs_size() is called.
Syzkaller reported a UAF in vcs_size().
BUG: KASAN: use-after-free in vcs_size (drivers/tty/vt/vc_screen.c:215) Read of size 4 at addr ffff8881137479a8 by task 4a005ed81e27e65/1537
CPU: 0 PID: 1537 Comm: 4a005ed81e27e65 Not tainted 6.2.0-rc5 #1 Hardware name: Red Hat KVM, BIOS 1.15.0-2.module Call Trace: <TASK> __asan_report_load4_noabort (mm/kasan/report_generic.c:350) vcs_size (drivers/tty/vt/vc_screen.c:215) vcs_read (drivers/tty/vt/vc_screen.c:415) vfs_read (fs/read_write.c:468 fs/read_write.c:450) ... </TASK>
Allocated by task 1191: ... kmalloc_trace (mm/slab_common.c:1069) vc_allocate (./include/linux/slab.h:580 ./include/linux/slab.h:720 drivers/tty/vt/vt.c:1128 drivers/tty/vt/vt.c:1108) con_install (drivers/tty/vt/vt.c:3383) tty_init_dev (drivers/tty/tty_io.c:1301 drivers/tty/tty_io.c:1413 drivers/tty/tty_io.c:1390) tty_open (drivers/tty/tty_io.c:2080 drivers/tty/tty_io.c:2126) chrdev_open (fs/char_dev.c:415) do_dentry_open (fs/open.c:883) vfs_open (fs/open.c:1014) ...
Freed by task 1548: ... kfree (mm/slab_common.c:1021) vc_port_destruct (drivers/tty/vt/vt.c:1094) tty_port_destructor (drivers/tty/tty_port.c:296) tty_port_put (drivers/tty/tty_port.c:312) vt_disallocate_all (drivers/tty/vt/vt_ioctl.c:662 (discriminator 2)) vt_ioctl (drivers/tty/vt/vt_ioctl.c:903) tty_ioctl (drivers/tty/tty_io.c:2776) ...
The buggy address belongs to the object at ffff888113747800 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 424 bytes inside of 1024-byte region [ffff888113747800, ffff888113747c00)
The buggy address belongs to the physical page: page:00000000b3fe6c7c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x113740 head:00000000b3fe6c7c order:3 compound_mapcount:0 subpages_mapcount:0 compound_pincount:0 anon flags: 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff) raw: 0017ffffc0010200 ffff888100042dc0 0000000000000000 dead000000000001 raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff888113747880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888113747900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888113747980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff888113747a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888113747a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Disabling lock debugging due to kernel taint
Fixes: ac751efa6a0d ("console: rename acquire/release_console_sem() to console_lock/unlock()") Reported-by: syzkaller syzkaller@googlegroups.com Suggested-by: Jiri Slaby jirislaby@kernel.org Signed-off-by: George Kennedy george.kennedy@oracle.com Link: https://lore.kernel.org/r/1674577014-12374-1-git-send-email-george.kennedy@o... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/tty/vt/vc_screen.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/tty/vt/vc_screen.c b/drivers/tty/vt/vc_screen.c index 778f83ea2249..e61fd04a0d8d 100644 --- a/drivers/tty/vt/vc_screen.c +++ b/drivers/tty/vt/vc_screen.c @@ -265,10 +265,6 @@ vcs_read(struct file *file, char __user *buf, size_t count, loff_t *ppos)
uni_mode = use_unicode(inode); attr = use_attributes(inode); - ret = -ENXIO; - vc = vcs_vc(inode, &viewed); - if (!vc) - goto unlock_out;
ret = -EINVAL; if (pos < 0) @@ -288,6 +284,11 @@ vcs_read(struct file *file, char __user *buf, size_t count, loff_t *ppos) ssize_t orig_count; long p = pos;
+ ret = -ENXIO; + vc = vcs_vc(inode, &viewed); + if (!vc) + goto unlock_out; + /* Check whether we are above size each round, * as copy_to_user at the end of this loop * could sleep.
From: Werner Sembach wse@tuxedocomputers.com
[ Upstream commit 95a9916c909f0b1d95e24b4232b4bc38ff755415 ]
Move __intconst from before i8042_dmi_laptop_table[] to after it for consistent code styling.
Signed-off-by: Werner Sembach wse@tuxedocomputers.com Reviewed-by: Hans de Goede hdegoede@redhat.com Link: https://lore.kernel.org/r/20220629112725.12922-2-wse@tuxedocomputers.com Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Stable-dep-of: 9c445d2637c9 ("Input: i8042 - add Clevo PCX0DX to i8042 quirk table") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/input/serio/i8042-x86ia64io.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/input/serio/i8042-x86ia64io.h b/drivers/input/serio/i8042-x86ia64io.h index 0282c4c55e9d..f0611dabea35 100644 --- a/drivers/input/serio/i8042-x86ia64io.h +++ b/drivers/input/serio/i8042-x86ia64io.h @@ -791,7 +791,7 @@ static const struct dmi_system_id __initconst i8042_dmi_nopnp_table[] = { { } };
-static const struct dmi_system_id __initconst i8042_dmi_laptop_table[] = { +static const struct dmi_system_id i8042_dmi_laptop_table[] __initconst = { { .matches = { DMI_MATCH(DMI_CHASSIS_TYPE, "8"), /* Portable */
From: Werner Sembach wse@tuxedocomputers.com
[ Upstream commit ff946268a0813c35b790dfbe07c3bfaa7bfb869c ]
Merge i8042 quirk tables to reduce code duplication for devices that need more than one quirk. Before every quirk had its own table with devices needing that quirk. If a new quirk needed to be added a new table had to be created. When a device needed multiple quirks, it appeared in multiple tables. Now only one table called i8042_dmi_quirk_table exists. In it every device has one entry and required quirks are coded in the .driver_data field of the struct dmi_system_id used by this table. Multiple quirks for one device can be applied by bitwise-or of the new SERIO_QUIRK_* defines.
Also align quirkable options with command line parameters and make vendor wide quirks per device overwriteable on a per device basis. The first match is honored while following matches are ignored. So when a vendor wide quirk is defined in the table, a device can inserted before and therefore ignoring the vendor wide define.
Signed-off-by: Werner Sembach wse@tuxedocomputers.com Reviewed-by: Hans de Goede hdegoede@redhat.com Link: https://lore.kernel.org/r/20220629112725.12922-3-wse@tuxedocomputers.com Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Stable-dep-of: 9c445d2637c9 ("Input: i8042 - add Clevo PCX0DX to i8042 quirk table") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/input/serio/i8042-x86ia64io.h | 1100 +++++++++++++------------ 1 file changed, 595 insertions(+), 505 deletions(-)
diff --git a/drivers/input/serio/i8042-x86ia64io.h b/drivers/input/serio/i8042-x86ia64io.h index f0611dabea35..73374c15eb27 100644 --- a/drivers/input/serio/i8042-x86ia64io.h +++ b/drivers/input/serio/i8042-x86ia64io.h @@ -67,654 +67,735 @@ static inline void i8042_write_command(int val)
#include <linux/dmi.h>
-static const struct dmi_system_id __initconst i8042_dmi_noloop_table[] = { +#define SERIO_QUIRK_NOKBD BIT(0) +#define SERIO_QUIRK_NOAUX BIT(1) +#define SERIO_QUIRK_NOMUX BIT(2) +#define SERIO_QUIRK_FORCEMUX BIT(3) +#define SERIO_QUIRK_UNLOCK BIT(4) +#define SERIO_QUIRK_PROBE_DEFER BIT(5) +#define SERIO_QUIRK_RESET_ALWAYS BIT(6) +#define SERIO_QUIRK_RESET_NEVER BIT(7) +#define SERIO_QUIRK_DIECT BIT(8) +#define SERIO_QUIRK_DUMBKBD BIT(9) +#define SERIO_QUIRK_NOLOOP BIT(10) +#define SERIO_QUIRK_NOTIMEOUT BIT(11) +#define SERIO_QUIRK_KBDRESET BIT(12) +#define SERIO_QUIRK_DRITEK BIT(13) +#define SERIO_QUIRK_NOPNP BIT(14) + +/* Quirk table for different mainboards. Options similar or identical to i8042 + * module parameters. + * ORDERING IS IMPORTANT! The first match will be apllied and the rest ignored. + * This allows entries to overwrite vendor wide quirks on a per device basis. + * Where this is irrelevant, entries are sorted case sensitive by DMI_SYS_VENDOR + * and/or DMI_BOARD_VENDOR to make it easier to avoid dublicate entries. + */ +static const struct dmi_system_id i8042_dmi_quirk_table[] __initconst = { { - /* - * Arima-Rioworks HDAMB - - * AUX LOOP command does not raise AUX IRQ - */ .matches = { - DMI_MATCH(DMI_BOARD_VENDOR, "RIOWORKS"), - DMI_MATCH(DMI_BOARD_NAME, "HDAMB"), - DMI_MATCH(DMI_BOARD_VERSION, "Rev E"), + DMI_MATCH(DMI_SYS_VENDOR, "ALIENWARE"), + DMI_MATCH(DMI_PRODUCT_NAME, "Sentia"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { - /* ASUS G1S */ .matches = { - DMI_MATCH(DMI_BOARD_VENDOR, "ASUSTeK Computer Inc."), - DMI_MATCH(DMI_BOARD_NAME, "G1S"), - DMI_MATCH(DMI_BOARD_VERSION, "1.0"), + DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), + DMI_MATCH(DMI_PRODUCT_NAME, "X750LN"), }, + .driver_data = (void *)(SERIO_QUIRK_NOLOOP) }, { - /* ASUS P65UP5 - AUX LOOP command does not raise AUX IRQ */ + /* Asus X450LCP */ .matches = { - DMI_MATCH(DMI_BOARD_VENDOR, "ASUSTeK Computer INC."), - DMI_MATCH(DMI_BOARD_NAME, "P/I-P65UP5"), - DMI_MATCH(DMI_BOARD_VERSION, "REV 2.X"), + DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), + DMI_MATCH(DMI_PRODUCT_NAME, "X450LCP"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_RESET_NEVER) }, { + /* ASUS ZenBook UX425UA */ .matches = { DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), - DMI_MATCH(DMI_PRODUCT_NAME, "X750LN"), + DMI_MATCH(DMI_PRODUCT_NAME, "ZenBook UX425UA"), }, + .driver_data = (void *)(SERIO_QUIRK_PROBE_DEFER | SERIO_QUIRK_RESET_NEVER) }, { + /* ASUS ZenBook UM325UA */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Compaq"), - DMI_MATCH(DMI_PRODUCT_NAME , "ProLiant"), - DMI_MATCH(DMI_PRODUCT_VERSION, "8500"), + DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), + DMI_MATCH(DMI_PRODUCT_NAME, "ZenBook UX325UA_UM325UA"), }, + .driver_data = (void *)(SERIO_QUIRK_PROBE_DEFER | SERIO_QUIRK_RESET_NEVER) }, + /* + * On some Asus laptops, just running self tests cause problems. + */ { .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Compaq"), - DMI_MATCH(DMI_PRODUCT_NAME , "ProLiant"), - DMI_MATCH(DMI_PRODUCT_VERSION, "DL760"), + DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), + DMI_MATCH(DMI_CHASSIS_TYPE, "10"), /* Notebook */ }, + .driver_data = (void *)(SERIO_QUIRK_RESET_NEVER) }, { - /* Dell Embedded Box PC 3000 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), - DMI_MATCH(DMI_PRODUCT_NAME, "Embedded Box PC 3000"), + DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), + DMI_MATCH(DMI_CHASSIS_TYPE, "31"), /* Convertible Notebook */ }, + .driver_data = (void *)(SERIO_QUIRK_RESET_NEVER) }, { - /* OQO Model 01 */ + /* ASUS P65UP5 - AUX LOOP command does not raise AUX IRQ */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "OQO"), - DMI_MATCH(DMI_PRODUCT_NAME, "ZEPTO"), - DMI_MATCH(DMI_PRODUCT_VERSION, "00"), + DMI_MATCH(DMI_BOARD_VENDOR, "ASUSTeK Computer INC."), + DMI_MATCH(DMI_BOARD_NAME, "P/I-P65UP5"), + DMI_MATCH(DMI_BOARD_VERSION, "REV 2.X"), }, + .driver_data = (void *)(SERIO_QUIRK_NOLOOP) }, { - /* ULI EV4873 - AUX LOOP does not work properly */ + /* ASUS G1S */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "ULI"), - DMI_MATCH(DMI_PRODUCT_NAME, "EV4873"), - DMI_MATCH(DMI_PRODUCT_VERSION, "5a"), + DMI_MATCH(DMI_BOARD_VENDOR, "ASUSTeK Computer Inc."), + DMI_MATCH(DMI_BOARD_NAME, "G1S"), + DMI_MATCH(DMI_BOARD_VERSION, "1.0"), }, + .driver_data = (void *)(SERIO_QUIRK_NOLOOP) }, { - /* Microsoft Virtual Machine */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), - DMI_MATCH(DMI_PRODUCT_NAME, "Virtual Machine"), - DMI_MATCH(DMI_PRODUCT_VERSION, "VS2005R2"), + DMI_MATCH(DMI_SYS_VENDOR, "Acer"), + DMI_MATCH(DMI_PRODUCT_NAME, "Aspire 1360"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { - /* Medion MAM 2070 */ + /* Acer Aspire 5710 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Notebook"), - DMI_MATCH(DMI_PRODUCT_NAME, "MAM 2070"), - DMI_MATCH(DMI_PRODUCT_VERSION, "5a"), + DMI_MATCH(DMI_SYS_VENDOR, "Acer"), + DMI_MATCH(DMI_PRODUCT_NAME, "Aspire 5710"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { - /* Medion Akoya E7225 */ + /* Acer Aspire 7738 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Medion"), - DMI_MATCH(DMI_PRODUCT_NAME, "Akoya E7225"), - DMI_MATCH(DMI_PRODUCT_VERSION, "1.0"), + DMI_MATCH(DMI_SYS_VENDOR, "Acer"), + DMI_MATCH(DMI_PRODUCT_NAME, "Aspire 7738"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { - /* Blue FB5601 */ + /* Acer Aspire 5536 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "blue"), - DMI_MATCH(DMI_PRODUCT_NAME, "FB5601"), - DMI_MATCH(DMI_PRODUCT_VERSION, "M606"), + DMI_MATCH(DMI_SYS_VENDOR, "Acer"), + DMI_MATCH(DMI_PRODUCT_NAME, "Aspire 5536"), + DMI_MATCH(DMI_PRODUCT_VERSION, "0100"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { - /* Gigabyte M912 */ + /* + * Acer Aspire 5738z + * Touchpad stops working in mux mode when dis- + re-enabled + * with the touchpad enable/disable toggle hotkey + */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "GIGABYTE"), - DMI_MATCH(DMI_PRODUCT_NAME, "M912"), - DMI_MATCH(DMI_PRODUCT_VERSION, "01"), + DMI_MATCH(DMI_SYS_VENDOR, "Acer"), + DMI_MATCH(DMI_PRODUCT_NAME, "Aspire 5738"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { - /* Gigabyte M1022M netbook */ + /* Acer Aspire One 150 */ .matches = { - DMI_MATCH(DMI_BOARD_VENDOR, "Gigabyte Technology Co.,Ltd."), - DMI_MATCH(DMI_BOARD_NAME, "M1022E"), - DMI_MATCH(DMI_BOARD_VERSION, "1.02"), + DMI_MATCH(DMI_SYS_VENDOR, "Acer"), + DMI_MATCH(DMI_PRODUCT_NAME, "AOA150"), }, + .driver_data = (void *)(SERIO_QUIRK_RESET_ALWAYS) }, { - /* Gigabyte Spring Peak - defines wrong chassis type */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "GIGABYTE"), - DMI_MATCH(DMI_PRODUCT_NAME, "Spring Peak"), + DMI_MATCH(DMI_SYS_VENDOR, "Acer"), + DMI_MATCH(DMI_PRODUCT_NAME, "Aspire A114-31"), }, + .driver_data = (void *)(SERIO_QUIRK_RESET_ALWAYS) }, { - /* Gigabyte T1005 - defines wrong chassis type ("Other") */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "GIGABYTE"), - DMI_MATCH(DMI_PRODUCT_NAME, "T1005"), + DMI_MATCH(DMI_SYS_VENDOR, "Acer"), + DMI_MATCH(DMI_PRODUCT_NAME, "Aspire A314-31"), }, + .driver_data = (void *)(SERIO_QUIRK_RESET_ALWAYS) }, { - /* Gigabyte T1005M/P - defines wrong chassis type ("Other") */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "GIGABYTE"), - DMI_MATCH(DMI_PRODUCT_NAME, "T1005M/P"), + DMI_MATCH(DMI_SYS_VENDOR, "Acer"), + DMI_MATCH(DMI_PRODUCT_NAME, "Aspire A315-31"), }, + .driver_data = (void *)(SERIO_QUIRK_RESET_ALWAYS) }, { .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Hewlett-Packard"), - DMI_MATCH(DMI_PRODUCT_NAME, "HP Pavilion dv9700"), - DMI_MATCH(DMI_PRODUCT_VERSION, "Rev 1"), + DMI_MATCH(DMI_SYS_VENDOR, "Acer"), + DMI_MATCH(DMI_PRODUCT_NAME, "Aspire ES1-132"), }, + .driver_data = (void *)(SERIO_QUIRK_RESET_ALWAYS) }, { .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "PEGATRON CORPORATION"), - DMI_MATCH(DMI_PRODUCT_NAME, "C15B"), + DMI_MATCH(DMI_SYS_VENDOR, "Acer"), + DMI_MATCH(DMI_PRODUCT_NAME, "Aspire ES1-332"), }, + .driver_data = (void *)(SERIO_QUIRK_RESET_ALWAYS) }, { .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "ByteSpeed LLC"), - DMI_MATCH(DMI_PRODUCT_NAME, "ByteSpeed Laptop C15B"), + DMI_MATCH(DMI_SYS_VENDOR, "Acer"), + DMI_MATCH(DMI_PRODUCT_NAME, "Aspire ES1-432"), }, + .driver_data = (void *)(SERIO_QUIRK_RESET_ALWAYS) }, - { } -}; - -/* - * Some Fujitsu notebooks are having trouble with touchpads if - * active multiplexing mode is activated. Luckily they don't have - * external PS/2 ports so we can safely disable it. - * ... apparently some Toshibas don't like MUX mode either and - * die horrible death on reboot. - */ -static const struct dmi_system_id __initconst i8042_dmi_nomux_table[] = { { - /* Fujitsu Lifebook P7010/P7010D */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"), - DMI_MATCH(DMI_PRODUCT_NAME, "P7010"), + DMI_MATCH(DMI_SYS_VENDOR, "Acer"), + DMI_MATCH(DMI_PRODUCT_NAME, "TravelMate Spin B118-RN"), }, + .driver_data = (void *)(SERIO_QUIRK_RESET_ALWAYS) }, + /* + * Some Wistron based laptops need us to explicitly enable the 'Dritek + * keyboard extension' to make their extra keys start generating scancodes. + * Originally, this was just confined to older laptops, but a few Acer laptops + * have turned up in 2007 that also need this again. + */ { - /* Fujitsu Lifebook P7010 */ + /* Acer Aspire 5100 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU SIEMENS"), - DMI_MATCH(DMI_PRODUCT_NAME, "0000000000"), + DMI_MATCH(DMI_SYS_VENDOR, "Acer"), + DMI_MATCH(DMI_PRODUCT_NAME, "Aspire 5100"), }, + .driver_data = (void *)(SERIO_QUIRK_DRITEK) }, { - /* Fujitsu Lifebook P5020D */ + /* Acer Aspire 5610 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"), - DMI_MATCH(DMI_PRODUCT_NAME, "LifeBook P Series"), + DMI_MATCH(DMI_SYS_VENDOR, "Acer"), + DMI_MATCH(DMI_PRODUCT_NAME, "Aspire 5610"), }, + .driver_data = (void *)(SERIO_QUIRK_DRITEK) }, { - /* Fujitsu Lifebook S2000 */ + /* Acer Aspire 5630 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"), - DMI_MATCH(DMI_PRODUCT_NAME, "LifeBook S Series"), + DMI_MATCH(DMI_SYS_VENDOR, "Acer"), + DMI_MATCH(DMI_PRODUCT_NAME, "Aspire 5630"), }, + .driver_data = (void *)(SERIO_QUIRK_DRITEK) }, { - /* Fujitsu Lifebook S6230 */ + /* Acer Aspire 5650 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"), - DMI_MATCH(DMI_PRODUCT_NAME, "LifeBook S6230"), + DMI_MATCH(DMI_SYS_VENDOR, "Acer"), + DMI_MATCH(DMI_PRODUCT_NAME, "Aspire 5650"), }, + .driver_data = (void *)(SERIO_QUIRK_DRITEK) }, { - /* Fujitsu Lifebook T725 laptop */ + /* Acer Aspire 5680 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"), - DMI_MATCH(DMI_PRODUCT_NAME, "LIFEBOOK T725"), + DMI_MATCH(DMI_SYS_VENDOR, "Acer"), + DMI_MATCH(DMI_PRODUCT_NAME, "Aspire 5680"), }, + .driver_data = (void *)(SERIO_QUIRK_DRITEK) }, { - /* Fujitsu Lifebook U745 */ + /* Acer Aspire 5720 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"), - DMI_MATCH(DMI_PRODUCT_NAME, "LIFEBOOK U745"), + DMI_MATCH(DMI_SYS_VENDOR, "Acer"), + DMI_MATCH(DMI_PRODUCT_NAME, "Aspire 5720"), }, + .driver_data = (void *)(SERIO_QUIRK_DRITEK) }, { - /* Fujitsu T70H */ + /* Acer Aspire 9110 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"), - DMI_MATCH(DMI_PRODUCT_NAME, "FMVLT70H"), + DMI_MATCH(DMI_SYS_VENDOR, "Acer"), + DMI_MATCH(DMI_PRODUCT_NAME, "Aspire 9110"), }, + .driver_data = (void *)(SERIO_QUIRK_DRITEK) }, { - /* Fujitsu-Siemens Lifebook T3010 */ + /* Acer TravelMate 660 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU SIEMENS"), - DMI_MATCH(DMI_PRODUCT_NAME, "LIFEBOOK T3010"), + DMI_MATCH(DMI_SYS_VENDOR, "Acer"), + DMI_MATCH(DMI_PRODUCT_NAME, "TravelMate 660"), }, + .driver_data = (void *)(SERIO_QUIRK_DRITEK) }, { - /* Fujitsu-Siemens Lifebook E4010 */ + /* Acer TravelMate 2490 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU SIEMENS"), - DMI_MATCH(DMI_PRODUCT_NAME, "LIFEBOOK E4010"), + DMI_MATCH(DMI_SYS_VENDOR, "Acer"), + DMI_MATCH(DMI_PRODUCT_NAME, "TravelMate 2490"), }, + .driver_data = (void *)(SERIO_QUIRK_DRITEK) }, { - /* Fujitsu-Siemens Amilo Pro 2010 */ + /* Acer TravelMate 4280 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU SIEMENS"), - DMI_MATCH(DMI_PRODUCT_NAME, "AMILO Pro V2010"), + DMI_MATCH(DMI_SYS_VENDOR, "Acer"), + DMI_MATCH(DMI_PRODUCT_NAME, "TravelMate 4280"), }, + .driver_data = (void *)(SERIO_QUIRK_DRITEK) }, { - /* Fujitsu-Siemens Amilo Pro 2030 */ + /* Amoi M636/A737 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU SIEMENS"), - DMI_MATCH(DMI_PRODUCT_NAME, "AMILO PRO V2030"), + DMI_MATCH(DMI_SYS_VENDOR, "Amoi Electronics CO.,LTD."), + DMI_MATCH(DMI_PRODUCT_NAME, "M636/A737 platform"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { - /* - * No data is coming from the touchscreen unless KBC - * is in legacy mode. - */ - /* Panasonic CF-29 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Matsushita"), - DMI_MATCH(DMI_PRODUCT_NAME, "CF-29"), + DMI_MATCH(DMI_SYS_VENDOR, "ByteSpeed LLC"), + DMI_MATCH(DMI_PRODUCT_NAME, "ByteSpeed Laptop C15B"), }, + .driver_data = (void *)(SERIO_QUIRK_NOLOOP) }, { - /* - * HP Pavilion DV4017EA - - * errors on MUX ports are reported without raising AUXDATA - * causing "spurious NAK" messages. - */ + /* Compal HEL80I */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Hewlett-Packard"), - DMI_MATCH(DMI_PRODUCT_NAME, "Pavilion dv4000 (EA032EA#ABF)"), + DMI_MATCH(DMI_SYS_VENDOR, "COMPAL"), + DMI_MATCH(DMI_PRODUCT_NAME, "HEL80I"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { - /* - * HP Pavilion ZT1000 - - * like DV4017EA does not raise AUXERR for errors on MUX ports. - */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Hewlett-Packard"), - DMI_MATCH(DMI_PRODUCT_NAME, "HP Pavilion Notebook PC"), - DMI_MATCH(DMI_PRODUCT_VERSION, "HP Pavilion Notebook ZT1000"), + DMI_MATCH(DMI_SYS_VENDOR, "Compaq"), + DMI_MATCH(DMI_PRODUCT_NAME, "ProLiant"), + DMI_MATCH(DMI_PRODUCT_VERSION, "8500"), }, + .driver_data = (void *)(SERIO_QUIRK_NOLOOP) }, { - /* - * HP Pavilion DV4270ca - - * like DV4017EA does not raise AUXERR for errors on MUX ports. - */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Hewlett-Packard"), - DMI_MATCH(DMI_PRODUCT_NAME, "Pavilion dv4000 (EH476UA#ABL)"), + DMI_MATCH(DMI_SYS_VENDOR, "Compaq"), + DMI_MATCH(DMI_PRODUCT_NAME, "ProLiant"), + DMI_MATCH(DMI_PRODUCT_VERSION, "DL760"), }, + .driver_data = (void *)(SERIO_QUIRK_NOLOOP) }, { + /* Advent 4211 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "TOSHIBA"), - DMI_MATCH(DMI_PRODUCT_NAME, "Satellite P10"), + DMI_MATCH(DMI_SYS_VENDOR, "DIXONSXP"), + DMI_MATCH(DMI_PRODUCT_NAME, "Advent 4211"), }, + .driver_data = (void *)(SERIO_QUIRK_RESET_ALWAYS) }, { + /* Dell Embedded Box PC 3000 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "TOSHIBA"), - DMI_MATCH(DMI_PRODUCT_NAME, "EQUIUM A110"), + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), + DMI_MATCH(DMI_PRODUCT_NAME, "Embedded Box PC 3000"), }, + .driver_data = (void *)(SERIO_QUIRK_NOLOOP) }, { + /* Dell XPS M1530 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "TOSHIBA"), - DMI_MATCH(DMI_PRODUCT_NAME, "SATELLITE C850D"), + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), + DMI_MATCH(DMI_PRODUCT_NAME, "XPS M1530"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { + /* Dell Vostro 1510 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "ALIENWARE"), - DMI_MATCH(DMI_PRODUCT_NAME, "Sentia"), + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), + DMI_MATCH(DMI_PRODUCT_NAME, "Vostro1510"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { - /* Sharp Actius MM20 */ + /* Dell Vostro V13 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "SHARP"), - DMI_MATCH(DMI_PRODUCT_NAME, "PC-MM20 Series"), + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), + DMI_MATCH(DMI_PRODUCT_NAME, "Vostro V13"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_NOTIMEOUT) }, { - /* Sony Vaio FS-115b */ + /* Dell Vostro 1320 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Sony Corporation"), - DMI_MATCH(DMI_PRODUCT_NAME, "VGN-FS115B"), + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), + DMI_MATCH(DMI_PRODUCT_NAME, "Vostro 1320"), }, + .driver_data = (void *)(SERIO_QUIRK_RESET_ALWAYS) }, { - /* - * Sony Vaio FZ-240E - - * reset and GET ID commands issued via KBD port are - * sometimes being delivered to AUX3. - */ + /* Dell Vostro 1520 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Sony Corporation"), - DMI_MATCH(DMI_PRODUCT_NAME, "VGN-FZ240E"), + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), + DMI_MATCH(DMI_PRODUCT_NAME, "Vostro 1520"), }, + .driver_data = (void *)(SERIO_QUIRK_RESET_ALWAYS) }, { - /* - * Most (all?) VAIOs do not have external PS/2 ports nor - * they implement active multiplexing properly, and - * MUX discovery usually messes up keyboard/touchpad. - */ + /* Dell Vostro 1720 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Sony Corporation"), - DMI_MATCH(DMI_BOARD_NAME, "VAIO"), + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), + DMI_MATCH(DMI_PRODUCT_NAME, "Vostro 1720"), }, + .driver_data = (void *)(SERIO_QUIRK_RESET_ALWAYS) }, { - /* Amoi M636/A737 */ + /* Entroware Proteus */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Amoi Electronics CO.,LTD."), - DMI_MATCH(DMI_PRODUCT_NAME, "M636/A737 platform"), + DMI_MATCH(DMI_SYS_VENDOR, "Entroware"), + DMI_MATCH(DMI_PRODUCT_NAME, "Proteus"), + DMI_MATCH(DMI_PRODUCT_VERSION, "EL07R4"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_RESET_ALWAYS) }, + /* + * Some Fujitsu notebooks are having trouble with touchpads if + * active multiplexing mode is activated. Luckily they don't have + * external PS/2 ports so we can safely disable it. + * ... apparently some Toshibas don't like MUX mode either and + * die horrible death on reboot. + */ { - /* Lenovo 3000 n100 */ + /* Fujitsu Lifebook P7010/P7010D */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), - DMI_MATCH(DMI_PRODUCT_NAME, "076804U"), + DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"), + DMI_MATCH(DMI_PRODUCT_NAME, "P7010"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { - /* Lenovo XiaoXin Air 12 */ + /* Fujitsu Lifebook P5020D */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), - DMI_MATCH(DMI_PRODUCT_NAME, "80UN"), + DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"), + DMI_MATCH(DMI_PRODUCT_NAME, "LifeBook P Series"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { + /* Fujitsu Lifebook S2000 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Acer"), - DMI_MATCH(DMI_PRODUCT_NAME, "Aspire 1360"), + DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"), + DMI_MATCH(DMI_PRODUCT_NAME, "LifeBook S Series"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { - /* Acer Aspire 5710 */ + /* Fujitsu Lifebook S6230 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Acer"), - DMI_MATCH(DMI_PRODUCT_NAME, "Aspire 5710"), + DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"), + DMI_MATCH(DMI_PRODUCT_NAME, "LifeBook S6230"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { - /* Acer Aspire 7738 */ + /* Fujitsu Lifebook T725 laptop */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Acer"), - DMI_MATCH(DMI_PRODUCT_NAME, "Aspire 7738"), + DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"), + DMI_MATCH(DMI_PRODUCT_NAME, "LIFEBOOK T725"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_NOTIMEOUT) }, { - /* Gericom Bellagio */ + /* Fujitsu Lifebook U745 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Gericom"), - DMI_MATCH(DMI_PRODUCT_NAME, "N34AS6"), + DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"), + DMI_MATCH(DMI_PRODUCT_NAME, "LIFEBOOK U745"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { - /* IBM 2656 */ + /* Fujitsu T70H */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "IBM"), - DMI_MATCH(DMI_PRODUCT_NAME, "2656"), + DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"), + DMI_MATCH(DMI_PRODUCT_NAME, "FMVLT70H"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { - /* Dell XPS M1530 */ + /* Fujitsu A544 laptop */ + /* https://bugzilla.redhat.com/show_bug.cgi?id=1111138 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), - DMI_MATCH(DMI_PRODUCT_NAME, "XPS M1530"), + DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"), + DMI_MATCH(DMI_PRODUCT_NAME, "LIFEBOOK A544"), }, + .driver_data = (void *)(SERIO_QUIRK_NOTIMEOUT) }, { - /* Compal HEL80I */ + /* Fujitsu AH544 laptop */ + /* https://bugzilla.kernel.org/show_bug.cgi?id=69731 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "COMPAL"), - DMI_MATCH(DMI_PRODUCT_NAME, "HEL80I"), + DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"), + DMI_MATCH(DMI_PRODUCT_NAME, "LIFEBOOK AH544"), }, + .driver_data = (void *)(SERIO_QUIRK_NOTIMEOUT) }, { - /* Dell Vostro 1510 */ + /* Fujitsu U574 laptop */ + /* https://bugzilla.kernel.org/show_bug.cgi?id=69731 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), - DMI_MATCH(DMI_PRODUCT_NAME, "Vostro1510"), + DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"), + DMI_MATCH(DMI_PRODUCT_NAME, "LIFEBOOK U574"), }, + .driver_data = (void *)(SERIO_QUIRK_NOTIMEOUT) }, { - /* Acer Aspire 5536 */ + /* Fujitsu UH554 laptop */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Acer"), - DMI_MATCH(DMI_PRODUCT_NAME, "Aspire 5536"), - DMI_MATCH(DMI_PRODUCT_VERSION, "0100"), + DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"), + DMI_MATCH(DMI_PRODUCT_NAME, "LIFEBOOK UH544"), }, + .driver_data = (void *)(SERIO_QUIRK_NOTIMEOUT) }, { - /* Dell Vostro V13 */ + /* Fujitsu Lifebook P7010 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), - DMI_MATCH(DMI_PRODUCT_NAME, "Vostro V13"), + DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU SIEMENS"), + DMI_MATCH(DMI_PRODUCT_NAME, "0000000000"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { - /* Newer HP Pavilion dv4 models */ + /* Fujitsu-Siemens Lifebook T3010 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Hewlett-Packard"), - DMI_MATCH(DMI_PRODUCT_NAME, "HP Pavilion dv4 Notebook PC"), + DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU SIEMENS"), + DMI_MATCH(DMI_PRODUCT_NAME, "LIFEBOOK T3010"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { - /* Asus X450LCP */ + /* Fujitsu-Siemens Lifebook E4010 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), - DMI_MATCH(DMI_PRODUCT_NAME, "X450LCP"), + DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU SIEMENS"), + DMI_MATCH(DMI_PRODUCT_NAME, "LIFEBOOK E4010"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { - /* Avatar AVIU-145A6 */ + /* Fujitsu-Siemens Amilo Pro 2010 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Intel"), - DMI_MATCH(DMI_PRODUCT_NAME, "IC4I"), + DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU SIEMENS"), + DMI_MATCH(DMI_PRODUCT_NAME, "AMILO Pro V2010"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { - /* TUXEDO BU1406 */ + /* Fujitsu-Siemens Amilo Pro 2030 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Notebook"), - DMI_MATCH(DMI_PRODUCT_NAME, "N24_25BU"), + DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU SIEMENS"), + DMI_MATCH(DMI_PRODUCT_NAME, "AMILO PRO V2030"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { - /* Lenovo LaVie Z */ + /* Gigabyte M912 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), - DMI_MATCH(DMI_PRODUCT_VERSION, "Lenovo LaVie Z"), + DMI_MATCH(DMI_SYS_VENDOR, "GIGABYTE"), + DMI_MATCH(DMI_PRODUCT_NAME, "M912"), + DMI_MATCH(DMI_PRODUCT_VERSION, "01"), }, + .driver_data = (void *)(SERIO_QUIRK_NOLOOP) }, { - /* - * Acer Aspire 5738z - * Touchpad stops working in mux mode when dis- + re-enabled - * with the touchpad enable/disable toggle hotkey - */ + /* Gigabyte Spring Peak - defines wrong chassis type */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Acer"), - DMI_MATCH(DMI_PRODUCT_NAME, "Aspire 5738"), + DMI_MATCH(DMI_SYS_VENDOR, "GIGABYTE"), + DMI_MATCH(DMI_PRODUCT_NAME, "Spring Peak"), }, + .driver_data = (void *)(SERIO_QUIRK_NOLOOP) }, { - /* Entroware Proteus */ + /* Gigabyte T1005 - defines wrong chassis type ("Other") */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Entroware"), - DMI_MATCH(DMI_PRODUCT_NAME, "Proteus"), - DMI_MATCH(DMI_PRODUCT_VERSION, "EL07R4"), + DMI_MATCH(DMI_SYS_VENDOR, "GIGABYTE"), + DMI_MATCH(DMI_PRODUCT_NAME, "T1005"), }, + .driver_data = (void *)(SERIO_QUIRK_NOLOOP) }, - { } -}; - -static const struct dmi_system_id i8042_dmi_forcemux_table[] __initconst = { { - /* - * Sony Vaio VGN-CS series require MUX or the touch sensor - * buttons will disturb touchpad operation - */ + /* Gigabyte T1005M/P - defines wrong chassis type ("Other") */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Sony Corporation"), - DMI_MATCH(DMI_PRODUCT_NAME, "VGN-CS"), + DMI_MATCH(DMI_SYS_VENDOR, "GIGABYTE"), + DMI_MATCH(DMI_PRODUCT_NAME, "T1005M/P"), }, + .driver_data = (void *)(SERIO_QUIRK_NOLOOP) }, - { } -}; - -/* - * On some Asus laptops, just running self tests cause problems. - */ -static const struct dmi_system_id i8042_dmi_noselftest_table[] = { + /* + * Some laptops need keyboard reset before probing for the trackpad to get + * it detected, initialised & finally work. + */ { + /* Gigabyte P35 v2 - Elantech touchpad */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), - DMI_MATCH(DMI_CHASSIS_TYPE, "10"), /* Notebook */ - }, - }, { - .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), - DMI_MATCH(DMI_CHASSIS_TYPE, "31"), /* Convertible Notebook */ + DMI_MATCH(DMI_SYS_VENDOR, "GIGABYTE"), + DMI_MATCH(DMI_PRODUCT_NAME, "P35V2"), }, + .driver_data = (void *)(SERIO_QUIRK_KBDRESET) }, - { } -}; -static const struct dmi_system_id __initconst i8042_dmi_reset_table[] = { - { - /* MSI Wind U-100 */ + { + /* Aorus branded Gigabyte X3 Plus - Elantech touchpad */ .matches = { - DMI_MATCH(DMI_BOARD_NAME, "U-100"), - DMI_MATCH(DMI_BOARD_VENDOR, "MICRO-STAR INTERNATIONAL CO., LTD"), + DMI_MATCH(DMI_SYS_VENDOR, "GIGABYTE"), + DMI_MATCH(DMI_PRODUCT_NAME, "X3"), }, + .driver_data = (void *)(SERIO_QUIRK_KBDRESET) }, { - /* LG Electronics X110 */ + /* Gigabyte P34 - Elantech touchpad */ .matches = { - DMI_MATCH(DMI_BOARD_NAME, "X110"), - DMI_MATCH(DMI_BOARD_VENDOR, "LG Electronics Inc."), + DMI_MATCH(DMI_SYS_VENDOR, "GIGABYTE"), + DMI_MATCH(DMI_PRODUCT_NAME, "P34"), }, + .driver_data = (void *)(SERIO_QUIRK_KBDRESET) }, { - /* Acer Aspire One 150 */ + /* Gigabyte P57 - Elantech touchpad */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Acer"), - DMI_MATCH(DMI_PRODUCT_NAME, "AOA150"), + DMI_MATCH(DMI_SYS_VENDOR, "GIGABYTE"), + DMI_MATCH(DMI_PRODUCT_NAME, "P57"), }, + .driver_data = (void *)(SERIO_QUIRK_KBDRESET) }, { + /* Gericom Bellagio */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Acer"), - DMI_MATCH(DMI_PRODUCT_NAME, "Aspire A114-31"), + DMI_MATCH(DMI_SYS_VENDOR, "Gericom"), + DMI_MATCH(DMI_PRODUCT_NAME, "N34AS6"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { + /* Gigabyte M1022M netbook */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Acer"), - DMI_MATCH(DMI_PRODUCT_NAME, "Aspire A314-31"), + DMI_MATCH(DMI_BOARD_VENDOR, "Gigabyte Technology Co.,Ltd."), + DMI_MATCH(DMI_BOARD_NAME, "M1022E"), + DMI_MATCH(DMI_BOARD_VERSION, "1.02"), }, + .driver_data = (void *)(SERIO_QUIRK_NOLOOP) }, { .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Acer"), - DMI_MATCH(DMI_PRODUCT_NAME, "Aspire A315-31"), + DMI_MATCH(DMI_SYS_VENDOR, "Hewlett-Packard"), + DMI_MATCH(DMI_PRODUCT_NAME, "HP Pavilion dv9700"), + DMI_MATCH(DMI_PRODUCT_VERSION, "Rev 1"), }, + .driver_data = (void *)(SERIO_QUIRK_NOLOOP) }, { + /* + * HP Pavilion DV4017EA - + * errors on MUX ports are reported without raising AUXDATA + * causing "spurious NAK" messages. + */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Acer"), - DMI_MATCH(DMI_PRODUCT_NAME, "Aspire ES1-132"), + DMI_MATCH(DMI_SYS_VENDOR, "Hewlett-Packard"), + DMI_MATCH(DMI_PRODUCT_NAME, "Pavilion dv4000 (EA032EA#ABF)"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { + /* + * HP Pavilion ZT1000 - + * like DV4017EA does not raise AUXERR for errors on MUX ports. + */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Acer"), - DMI_MATCH(DMI_PRODUCT_NAME, "Aspire ES1-332"), + DMI_MATCH(DMI_SYS_VENDOR, "Hewlett-Packard"), + DMI_MATCH(DMI_PRODUCT_NAME, "HP Pavilion Notebook PC"), + DMI_MATCH(DMI_PRODUCT_VERSION, "HP Pavilion Notebook ZT1000"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { + /* + * HP Pavilion DV4270ca - + * like DV4017EA does not raise AUXERR for errors on MUX ports. + */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Acer"), - DMI_MATCH(DMI_PRODUCT_NAME, "Aspire ES1-432"), + DMI_MATCH(DMI_SYS_VENDOR, "Hewlett-Packard"), + DMI_MATCH(DMI_PRODUCT_NAME, "Pavilion dv4000 (EH476UA#ABL)"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { + /* Newer HP Pavilion dv4 models */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Acer"), - DMI_MATCH(DMI_PRODUCT_NAME, "TravelMate Spin B118-RN"), + DMI_MATCH(DMI_SYS_VENDOR, "Hewlett-Packard"), + DMI_MATCH(DMI_PRODUCT_NAME, "HP Pavilion dv4 Notebook PC"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_NOTIMEOUT) }, { - /* Advent 4211 */ + /* IBM 2656 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "DIXONSXP"), - DMI_MATCH(DMI_PRODUCT_NAME, "Advent 4211"), + DMI_MATCH(DMI_SYS_VENDOR, "IBM"), + DMI_MATCH(DMI_PRODUCT_NAME, "2656"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { - /* Medion Akoya Mini E1210 */ + /* Avatar AVIU-145A6 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "MEDION"), - DMI_MATCH(DMI_PRODUCT_NAME, "E1210"), + DMI_MATCH(DMI_SYS_VENDOR, "Intel"), + DMI_MATCH(DMI_PRODUCT_NAME, "IC4I"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { - /* Medion Akoya E1222 */ + /* Intel MBO Desktop D845PESV */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "MEDION"), - DMI_MATCH(DMI_PRODUCT_NAME, "E122X"), + DMI_MATCH(DMI_BOARD_VENDOR, "Intel Corporation"), + DMI_MATCH(DMI_BOARD_NAME, "D845PESV"), }, + .driver_data = (void *)(SERIO_QUIRK_NOPNP) }, { - /* Mivvy M310 */ + /* + * Intel NUC D54250WYK - does not have i8042 controller but + * declares PS/2 devices in DSDT. + */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "VIOOO"), - DMI_MATCH(DMI_PRODUCT_NAME, "N10"), + DMI_MATCH(DMI_BOARD_VENDOR, "Intel Corporation"), + DMI_MATCH(DMI_BOARD_NAME, "D54250WYK"), }, + .driver_data = (void *)(SERIO_QUIRK_NOPNP) }, { - /* Dell Vostro 1320 */ + /* Lenovo 3000 n100 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), - DMI_MATCH(DMI_PRODUCT_NAME, "Vostro 1320"), + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_NAME, "076804U"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { - /* Dell Vostro 1520 */ + /* Lenovo XiaoXin Air 12 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), - DMI_MATCH(DMI_PRODUCT_NAME, "Vostro 1520"), + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_NAME, "80UN"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { - /* Dell Vostro 1720 */ + /* Lenovo LaVie Z */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), - DMI_MATCH(DMI_PRODUCT_NAME, "Vostro 1720"), + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_VERSION, "Lenovo LaVie Z"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { /* Lenovo Ideapad U455 */ @@ -722,6 +803,7 @@ static const struct dmi_system_id __initconst i8042_dmi_reset_table[] = { DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), DMI_MATCH(DMI_PRODUCT_NAME, "20046"), }, + .driver_data = (void *)(SERIO_QUIRK_RESET_ALWAYS) }, { /* Lenovo ThinkPad L460 */ @@ -729,13 +811,7 @@ static const struct dmi_system_id __initconst i8042_dmi_reset_table[] = { DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad L460"), }, - }, - { - /* Clevo P650RS, 650RP6, Sager NP8152-S, and others */ - .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Notebook"), - DMI_MATCH(DMI_PRODUCT_NAME, "P65xRP"), - }, + .driver_data = (void *)(SERIO_QUIRK_RESET_ALWAYS) }, { /* Lenovo ThinkPad Twist S230u */ @@ -743,275 +819,269 @@ static const struct dmi_system_id __initconst i8042_dmi_reset_table[] = { DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), DMI_MATCH(DMI_PRODUCT_NAME, "33474HU"), }, + .driver_data = (void *)(SERIO_QUIRK_RESET_ALWAYS) }, { - /* Entroware Proteus */ + /* LG Electronics X110 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Entroware"), - DMI_MATCH(DMI_PRODUCT_NAME, "Proteus"), - DMI_MATCH(DMI_PRODUCT_VERSION, "EL07R4"), + DMI_MATCH(DMI_BOARD_VENDOR, "LG Electronics Inc."), + DMI_MATCH(DMI_BOARD_NAME, "X110"), }, + .driver_data = (void *)(SERIO_QUIRK_RESET_ALWAYS) }, - { } -}; - -#ifdef CONFIG_PNP -static const struct dmi_system_id __initconst i8042_dmi_nopnp_table[] = { { - /* Intel MBO Desktop D845PESV */ + /* Medion Akoya Mini E1210 */ .matches = { - DMI_MATCH(DMI_BOARD_NAME, "D845PESV"), - DMI_MATCH(DMI_BOARD_VENDOR, "Intel Corporation"), + DMI_MATCH(DMI_SYS_VENDOR, "MEDION"), + DMI_MATCH(DMI_PRODUCT_NAME, "E1210"), }, + .driver_data = (void *)(SERIO_QUIRK_RESET_ALWAYS) }, { - /* - * Intel NUC D54250WYK - does not have i8042 controller but - * declares PS/2 devices in DSDT. - */ + /* Medion Akoya E1222 */ .matches = { - DMI_MATCH(DMI_BOARD_NAME, "D54250WYK"), - DMI_MATCH(DMI_BOARD_VENDOR, "Intel Corporation"), + DMI_MATCH(DMI_SYS_VENDOR, "MEDION"), + DMI_MATCH(DMI_PRODUCT_NAME, "E122X"), }, + .driver_data = (void *)(SERIO_QUIRK_RESET_ALWAYS) }, { /* MSI Wind U-100 */ .matches = { - DMI_MATCH(DMI_BOARD_NAME, "U-100"), - DMI_MATCH(DMI_BOARD_VENDOR, "MICRO-STAR INTERNATIONAL CO., LTD"), - }, - }, - { - /* Acer Aspire 5 A515 */ - .matches = { - DMI_MATCH(DMI_BOARD_NAME, "Grumpy_PK"), - DMI_MATCH(DMI_BOARD_VENDOR, "PK"), - }, - }, - { } -}; - -static const struct dmi_system_id i8042_dmi_laptop_table[] __initconst = { - { - .matches = { - DMI_MATCH(DMI_CHASSIS_TYPE, "8"), /* Portable */ - }, - }, - { - .matches = { - DMI_MATCH(DMI_CHASSIS_TYPE, "9"), /* Laptop */ + DMI_MATCH(DMI_BOARD_VENDOR, "MICRO-STAR INTERNATIONAL CO., LTD"), + DMI_MATCH(DMI_BOARD_NAME, "U-100"), }, + .driver_data = (void *)(SERIO_QUIRK_RESET_ALWAYS | SERIO_QUIRK_NOPNP) }, { + /* + * No data is coming from the touchscreen unless KBC + * is in legacy mode. + */ + /* Panasonic CF-29 */ .matches = { - DMI_MATCH(DMI_CHASSIS_TYPE, "10"), /* Notebook */ + DMI_MATCH(DMI_SYS_VENDOR, "Matsushita"), + DMI_MATCH(DMI_PRODUCT_NAME, "CF-29"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { + /* Medion Akoya E7225 */ .matches = { - DMI_MATCH(DMI_CHASSIS_TYPE, "14"), /* Sub-Notebook */ + DMI_MATCH(DMI_SYS_VENDOR, "Medion"), + DMI_MATCH(DMI_PRODUCT_NAME, "Akoya E7225"), + DMI_MATCH(DMI_PRODUCT_VERSION, "1.0"), }, + .driver_data = (void *)(SERIO_QUIRK_NOLOOP) }, - { } -}; -#endif - -static const struct dmi_system_id __initconst i8042_dmi_notimeout_table[] = { { - /* Dell Vostro V13 */ + /* Microsoft Virtual Machine */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), - DMI_MATCH(DMI_PRODUCT_NAME, "Vostro V13"), + DMI_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), + DMI_MATCH(DMI_PRODUCT_NAME, "Virtual Machine"), + DMI_MATCH(DMI_PRODUCT_VERSION, "VS2005R2"), }, + .driver_data = (void *)(SERIO_QUIRK_NOLOOP) }, { - /* Newer HP Pavilion dv4 models */ + /* Medion MAM 2070 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Hewlett-Packard"), - DMI_MATCH(DMI_PRODUCT_NAME, "HP Pavilion dv4 Notebook PC"), + DMI_MATCH(DMI_SYS_VENDOR, "Notebook"), + DMI_MATCH(DMI_PRODUCT_NAME, "MAM 2070"), + DMI_MATCH(DMI_PRODUCT_VERSION, "5a"), }, + .driver_data = (void *)(SERIO_QUIRK_NOLOOP) }, { - /* Fujitsu A544 laptop */ - /* https://bugzilla.redhat.com/show_bug.cgi?id=1111138 */ + /* TUXEDO BU1406 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"), - DMI_MATCH(DMI_PRODUCT_NAME, "LIFEBOOK A544"), + DMI_MATCH(DMI_SYS_VENDOR, "Notebook"), + DMI_MATCH(DMI_PRODUCT_NAME, "N24_25BU"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { - /* Fujitsu AH544 laptop */ - /* https://bugzilla.kernel.org/show_bug.cgi?id=69731 */ + /* Clevo P650RS, 650RP6, Sager NP8152-S, and others */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"), - DMI_MATCH(DMI_PRODUCT_NAME, "LIFEBOOK AH544"), + DMI_MATCH(DMI_SYS_VENDOR, "Notebook"), + DMI_MATCH(DMI_PRODUCT_NAME, "P65xRP"), }, + .driver_data = (void *)(SERIO_QUIRK_RESET_ALWAYS) }, { - /* Fujitsu Lifebook T725 laptop */ + /* OQO Model 01 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"), - DMI_MATCH(DMI_PRODUCT_NAME, "LIFEBOOK T725"), + DMI_MATCH(DMI_SYS_VENDOR, "OQO"), + DMI_MATCH(DMI_PRODUCT_NAME, "ZEPTO"), + DMI_MATCH(DMI_PRODUCT_VERSION, "00"), }, + .driver_data = (void *)(SERIO_QUIRK_NOLOOP) }, { - /* Fujitsu U574 laptop */ - /* https://bugzilla.kernel.org/show_bug.cgi?id=69731 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"), - DMI_MATCH(DMI_PRODUCT_NAME, "LIFEBOOK U574"), + DMI_MATCH(DMI_SYS_VENDOR, "PEGATRON CORPORATION"), + DMI_MATCH(DMI_PRODUCT_NAME, "C15B"), }, + .driver_data = (void *)(SERIO_QUIRK_NOLOOP) }, { - /* Fujitsu UH554 laptop */ + /* Acer Aspire 5 A515 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"), - DMI_MATCH(DMI_PRODUCT_NAME, "LIFEBOOK UH544"), + DMI_MATCH(DMI_BOARD_VENDOR, "PK"), + DMI_MATCH(DMI_BOARD_NAME, "Grumpy_PK"), }, + .driver_data = (void *)(SERIO_QUIRK_NOPNP) }, - { } -}; - -/* - * Some Wistron based laptops need us to explicitly enable the 'Dritek - * keyboard extension' to make their extra keys start generating scancodes. - * Originally, this was just confined to older laptops, but a few Acer laptops - * have turned up in 2007 that also need this again. - */ -static const struct dmi_system_id __initconst i8042_dmi_dritek_table[] = { { - /* Acer Aspire 5100 */ + /* ULI EV4873 - AUX LOOP does not work properly */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Acer"), - DMI_MATCH(DMI_PRODUCT_NAME, "Aspire 5100"), + DMI_MATCH(DMI_SYS_VENDOR, "ULI"), + DMI_MATCH(DMI_PRODUCT_NAME, "EV4873"), + DMI_MATCH(DMI_PRODUCT_VERSION, "5a"), }, + .driver_data = (void *)(SERIO_QUIRK_NOLOOP) }, { - /* Acer Aspire 5610 */ + /* + * Arima-Rioworks HDAMB - + * AUX LOOP command does not raise AUX IRQ + */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Acer"), - DMI_MATCH(DMI_PRODUCT_NAME, "Aspire 5610"), + DMI_MATCH(DMI_BOARD_VENDOR, "RIOWORKS"), + DMI_MATCH(DMI_BOARD_NAME, "HDAMB"), + DMI_MATCH(DMI_BOARD_VERSION, "Rev E"), }, + .driver_data = (void *)(SERIO_QUIRK_NOLOOP) }, { - /* Acer Aspire 5630 */ + /* Sharp Actius MM20 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Acer"), - DMI_MATCH(DMI_PRODUCT_NAME, "Aspire 5630"), + DMI_MATCH(DMI_SYS_VENDOR, "SHARP"), + DMI_MATCH(DMI_PRODUCT_NAME, "PC-MM20 Series"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { - /* Acer Aspire 5650 */ + /* + * Sony Vaio FZ-240E - + * reset and GET ID commands issued via KBD port are + * sometimes being delivered to AUX3. + */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Acer"), - DMI_MATCH(DMI_PRODUCT_NAME, "Aspire 5650"), + DMI_MATCH(DMI_SYS_VENDOR, "Sony Corporation"), + DMI_MATCH(DMI_PRODUCT_NAME, "VGN-FZ240E"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { - /* Acer Aspire 5680 */ + /* + * Most (all?) VAIOs do not have external PS/2 ports nor + * they implement active multiplexing properly, and + * MUX discovery usually messes up keyboard/touchpad. + */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Acer"), - DMI_MATCH(DMI_PRODUCT_NAME, "Aspire 5680"), + DMI_MATCH(DMI_SYS_VENDOR, "Sony Corporation"), + DMI_MATCH(DMI_BOARD_NAME, "VAIO"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { - /* Acer Aspire 5720 */ + /* Sony Vaio FS-115b */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Acer"), - DMI_MATCH(DMI_PRODUCT_NAME, "Aspire 5720"), + DMI_MATCH(DMI_SYS_VENDOR, "Sony Corporation"), + DMI_MATCH(DMI_PRODUCT_NAME, "VGN-FS115B"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { - /* Acer Aspire 9110 */ + /* + * Sony Vaio VGN-CS series require MUX or the touch sensor + * buttons will disturb touchpad operation + */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Acer"), - DMI_MATCH(DMI_PRODUCT_NAME, "Aspire 9110"), + DMI_MATCH(DMI_SYS_VENDOR, "Sony Corporation"), + DMI_MATCH(DMI_PRODUCT_NAME, "VGN-CS"), }, + .driver_data = (void *)(SERIO_QUIRK_FORCEMUX) }, { - /* Acer TravelMate 660 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Acer"), - DMI_MATCH(DMI_PRODUCT_NAME, "TravelMate 660"), + DMI_MATCH(DMI_SYS_VENDOR, "TOSHIBA"), + DMI_MATCH(DMI_PRODUCT_NAME, "Satellite P10"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { - /* Acer TravelMate 2490 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Acer"), - DMI_MATCH(DMI_PRODUCT_NAME, "TravelMate 2490"), + DMI_MATCH(DMI_SYS_VENDOR, "TOSHIBA"), + DMI_MATCH(DMI_PRODUCT_NAME, "EQUIUM A110"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, { - /* Acer TravelMate 4280 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Acer"), - DMI_MATCH(DMI_PRODUCT_NAME, "TravelMate 4280"), + DMI_MATCH(DMI_SYS_VENDOR, "TOSHIBA"), + DMI_MATCH(DMI_PRODUCT_NAME, "SATELLITE C850D"), }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, - { } -}; - -/* - * Some laptops need keyboard reset before probing for the trackpad to get - * it detected, initialised & finally work. - */ -static const struct dmi_system_id __initconst i8042_dmi_kbdreset_table[] = { { - /* Gigabyte P35 v2 - Elantech touchpad */ + /* Mivvy M310 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "GIGABYTE"), - DMI_MATCH(DMI_PRODUCT_NAME, "P35V2"), + DMI_MATCH(DMI_SYS_VENDOR, "VIOOO"), + DMI_MATCH(DMI_PRODUCT_NAME, "N10"), }, + .driver_data = (void *)(SERIO_QUIRK_RESET_ALWAYS) }, - { - /* Aorus branded Gigabyte X3 Plus - Elantech touchpad */ + /* + * Some laptops need keyboard reset before probing for the trackpad to get + * it detected, initialised & finally work. + */ + { + /* Schenker XMG C504 - Elantech touchpad */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "GIGABYTE"), - DMI_MATCH(DMI_PRODUCT_NAME, "X3"), + DMI_MATCH(DMI_SYS_VENDOR, "XMG"), + DMI_MATCH(DMI_PRODUCT_NAME, "C504"), }, + .driver_data = (void *)(SERIO_QUIRK_KBDRESET) }, { - /* Gigabyte P34 - Elantech touchpad */ + /* Blue FB5601 */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "GIGABYTE"), - DMI_MATCH(DMI_PRODUCT_NAME, "P34"), + DMI_MATCH(DMI_SYS_VENDOR, "blue"), + DMI_MATCH(DMI_PRODUCT_NAME, "FB5601"), + DMI_MATCH(DMI_PRODUCT_VERSION, "M606"), }, + .driver_data = (void *)(SERIO_QUIRK_NOLOOP) }, + { } +}; + +#ifdef CONFIG_PNP +static const struct dmi_system_id i8042_dmi_laptop_table[] __initconst = { { - /* Gigabyte P57 - Elantech touchpad */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "GIGABYTE"), - DMI_MATCH(DMI_PRODUCT_NAME, "P57"), + DMI_MATCH(DMI_CHASSIS_TYPE, "8"), /* Portable */ }, }, { - /* Schenker XMG C504 - Elantech touchpad */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "XMG"), - DMI_MATCH(DMI_PRODUCT_NAME, "C504"), + DMI_MATCH(DMI_CHASSIS_TYPE, "9"), /* Laptop */ }, }, - { } -}; - -static const struct dmi_system_id i8042_dmi_probe_defer_table[] __initconst = { { - /* ASUS ZenBook UX425UA */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), - DMI_MATCH(DMI_PRODUCT_NAME, "ZenBook UX425UA"), + DMI_MATCH(DMI_CHASSIS_TYPE, "10"), /* Notebook */ }, }, { - /* ASUS ZenBook UM325UA */ .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), - DMI_MATCH(DMI_PRODUCT_NAME, "ZenBook UX325UA_UM325UA"), + DMI_MATCH(DMI_CHASSIS_TYPE, "14"), /* Sub-Notebook */ }, }, { } }; +#endif
#endif /* CONFIG_X86 */
@@ -1166,11 +1236,6 @@ static int __init i8042_pnp_init(void) bool pnp_data_busted = false; int err;
-#ifdef CONFIG_X86 - if (dmi_check_system(i8042_dmi_nopnp_table)) - i8042_nopnp = true; -#endif - if (i8042_nopnp) { pr_info("PNP detection disabled\n"); return 0; @@ -1274,6 +1339,59 @@ static inline int i8042_pnp_init(void) { return 0; } static inline void i8042_pnp_exit(void) { } #endif /* CONFIG_PNP */
+ +#ifdef CONFIG_X86 +static void __init i8042_check_quirks(void) +{ + const struct dmi_system_id *device_quirk_info; + uintptr_t quirks; + + device_quirk_info = dmi_first_match(i8042_dmi_quirk_table); + if (!device_quirk_info) + return; + + quirks = (uintptr_t)device_quirk_info->driver_data; + + if (quirks & SERIO_QUIRK_NOKBD) + i8042_nokbd = true; + if (quirks & SERIO_QUIRK_NOAUX) + i8042_noaux = true; + if (quirks & SERIO_QUIRK_NOMUX) + i8042_nomux = true; + if (quirks & SERIO_QUIRK_FORCEMUX) + i8042_nomux = false; + if (quirks & SERIO_QUIRK_UNLOCK) + i8042_unlock = true; + if (quirks & SERIO_QUIRK_PROBE_DEFER) + i8042_probe_defer = true; + /* Honor module parameter when value is not default */ + if (i8042_reset == I8042_RESET_DEFAULT) { + if (quirks & SERIO_QUIRK_RESET_ALWAYS) + i8042_reset = I8042_RESET_ALWAYS; + if (quirks & SERIO_QUIRK_RESET_NEVER) + i8042_reset = I8042_RESET_NEVER; + } + if (quirks & SERIO_QUIRK_DIECT) + i8042_direct = true; + if (quirks & SERIO_QUIRK_DUMBKBD) + i8042_dumbkbd = true; + if (quirks & SERIO_QUIRK_NOLOOP) + i8042_noloop = true; + if (quirks & SERIO_QUIRK_NOTIMEOUT) + i8042_notimeout = true; + if (quirks & SERIO_QUIRK_KBDRESET) + i8042_kbdreset = true; + if (quirks & SERIO_QUIRK_DRITEK) + i8042_dritek = true; +#ifdef CONFIG_PNP + if (quirks & SERIO_QUIRK_NOPNP) + i8042_nopnp = true; +#endif +} +#else +static inline void i8042_check_quirks(void) {} +#endif + static int __init i8042_platform_init(void) { int retval; @@ -1296,45 +1414,17 @@ static int __init i8042_platform_init(void) i8042_kbd_irq = I8042_MAP_IRQ(1); i8042_aux_irq = I8042_MAP_IRQ(12);
- retval = i8042_pnp_init(); - if (retval) - return retval; - #if defined(__ia64__) - i8042_reset = I8042_RESET_ALWAYS; + i8042_reset = I8042_RESET_ALWAYS; #endif
-#ifdef CONFIG_X86 - /* Honor module parameter when value is not default */ - if (i8042_reset == I8042_RESET_DEFAULT) { - if (dmi_check_system(i8042_dmi_reset_table)) - i8042_reset = I8042_RESET_ALWAYS; - - if (dmi_check_system(i8042_dmi_noselftest_table)) - i8042_reset = I8042_RESET_NEVER; - } - - if (dmi_check_system(i8042_dmi_noloop_table)) - i8042_noloop = true; - - if (dmi_check_system(i8042_dmi_nomux_table)) - i8042_nomux = true; - - if (dmi_check_system(i8042_dmi_forcemux_table)) - i8042_nomux = false; - - if (dmi_check_system(i8042_dmi_notimeout_table)) - i8042_notimeout = true; - - if (dmi_check_system(i8042_dmi_dritek_table)) - i8042_dritek = true; - - if (dmi_check_system(i8042_dmi_kbdreset_table)) - i8042_kbdreset = true; + i8042_check_quirks();
- if (dmi_check_system(i8042_dmi_probe_defer_table)) - i8042_probe_defer = true; + retval = i8042_pnp_init(); + if (retval) + return retval;
+#ifdef CONFIG_X86 /* * A20 was already enabled during early kernel init. But some buggy * BIOSes (in MSI Laptops) require A20 to be enabled using 8042 to
From: Werner Sembach wse@tuxedocomputers.com
[ Upstream commit a6a87c36165e6791eeaed88025cde270536c3198 ]
A lot of modern Clevo barebones have touchpad and/or keyboard issues after suspend fixable with nomux + reset + noloop + nopnp. Luckily, none of them have an external PS/2 port so this can safely be set for all of them.
I'm not entirely sure if every device listed really needs all four quirks, but after testing and production use. No negative effects could be observed when setting all four.
The list is quite massive as neither the TUXEDO nor the Clevo dmi strings have been very consistent historically. I tried to keep the list as short as possible without risking on missing an affected device.
This is revision 3. The Clevo N150CU barebone is still removed as it might have problems with the fix and needs further investigations. The SchenkerTechnologiesGmbH System-/Board-Vendor string variations are added. This is now based in the quirk table refactor. This now also includes the additional noaux flag for the NS7xMU.
Signed-off-by: Werner Sembach wse@tuxedocomputers.com Reviewed-by: Hans de Goede hdegoede@redhat.com Link: https://lore.kernel.org/r/20220629112725.12922-5-wse@tuxedocomputers.com Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Stable-dep-of: 9c445d2637c9 ("Input: i8042 - add Clevo PCX0DX to i8042 quirk table") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/input/serio/i8042-x86ia64io.h | 129 ++++++++++++++++++++++++++ 1 file changed, 129 insertions(+)
diff --git a/drivers/input/serio/i8042-x86ia64io.h b/drivers/input/serio/i8042-x86ia64io.h index 73374c15eb27..ea9996debb88 100644 --- a/drivers/input/serio/i8042-x86ia64io.h +++ b/drivers/input/serio/i8042-x86ia64io.h @@ -1025,6 +1025,29 @@ static const struct dmi_system_id i8042_dmi_quirk_table[] __initconst = { }, .driver_data = (void *)(SERIO_QUIRK_NOMUX) }, + /* + * A lot of modern Clevo barebones have touchpad and/or keyboard issues + * after suspend fixable with nomux + reset + noloop + nopnp. Luckily, + * none of them have an external PS/2 port so this can safely be set for + * all of them. These two are based on a Clevo design, but have the + * board_name changed. + */ + { + .matches = { + DMI_MATCH(DMI_BOARD_VENDOR, "TUXEDO"), + DMI_MATCH(DMI_BOARD_NAME, "AURA1501"), + }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_RESET_ALWAYS | + SERIO_QUIRK_NOLOOP | SERIO_QUIRK_NOPNP) + }, + { + .matches = { + DMI_MATCH(DMI_BOARD_VENDOR, "TUXEDO"), + DMI_MATCH(DMI_BOARD_NAME, "EDUBOOK1502"), + }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_RESET_ALWAYS | + SERIO_QUIRK_NOLOOP | SERIO_QUIRK_NOPNP) + }, { /* Mivvy M310 */ .matches = { @@ -1054,6 +1077,112 @@ static const struct dmi_system_id i8042_dmi_quirk_table[] __initconst = { }, .driver_data = (void *)(SERIO_QUIRK_NOLOOP) }, + /* + * A lot of modern Clevo barebones have touchpad and/or keyboard issues + * after suspend fixable with nomux + reset + noloop + nopnp. Luckily, + * none of them have an external PS/2 port so this can safely be set for + * all of them. + * Clevo barebones come with board_vendor and/or system_vendor set to + * either the very generic string "Notebook" and/or a different value + * for each individual reseller. The only somewhat universal way to + * identify them is by board_name. + */ + { + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "LAPQC71A"), + }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_RESET_ALWAYS | + SERIO_QUIRK_NOLOOP | SERIO_QUIRK_NOPNP) + }, + { + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "LAPQC71B"), + }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_RESET_ALWAYS | + SERIO_QUIRK_NOLOOP | SERIO_QUIRK_NOPNP) + }, + { + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "N140CU"), + }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_RESET_ALWAYS | + SERIO_QUIRK_NOLOOP | SERIO_QUIRK_NOPNP) + }, + { + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "N141CU"), + }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_RESET_ALWAYS | + SERIO_QUIRK_NOLOOP | SERIO_QUIRK_NOPNP) + }, + { + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "NH5xAx"), + }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_RESET_ALWAYS | + SERIO_QUIRK_NOLOOP | SERIO_QUIRK_NOPNP) + }, + { + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "NL5xRU"), + }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_RESET_ALWAYS | + SERIO_QUIRK_NOLOOP | SERIO_QUIRK_NOPNP) + }, + /* + * At least one modern Clevo barebone has the touchpad connected both + * via PS/2 and i2c interface. This causes a race condition between the + * psmouse and i2c-hid driver. Since the full capability of the touchpad + * is available via the i2c interface and the device has no external + * PS/2 port, it is safe to just ignore all ps2 mouses here to avoid + * this issue. The known affected device is the + * TUXEDO InfinityBook S17 Gen6 / Clevo NS70MU which comes with one of + * the two different dmi strings below. NS50MU is not a typo! + */ + { + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "NS50MU"), + }, + .driver_data = (void *)(SERIO_QUIRK_NOAUX | SERIO_QUIRK_NOMUX | + SERIO_QUIRK_RESET_ALWAYS | SERIO_QUIRK_NOLOOP | + SERIO_QUIRK_NOPNP) + }, + { + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "NS50_70MU"), + }, + .driver_data = (void *)(SERIO_QUIRK_NOAUX | SERIO_QUIRK_NOMUX | + SERIO_QUIRK_RESET_ALWAYS | SERIO_QUIRK_NOLOOP | + SERIO_QUIRK_NOPNP) + }, + { + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "NJ50_70CU"), + }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_RESET_ALWAYS | + SERIO_QUIRK_NOLOOP | SERIO_QUIRK_NOPNP) + }, + { + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "PB50_70DFx,DDx"), + }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_RESET_ALWAYS | + SERIO_QUIRK_NOLOOP | SERIO_QUIRK_NOPNP) + }, + { + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "X170SM"), + }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_RESET_ALWAYS | + SERIO_QUIRK_NOLOOP | SERIO_QUIRK_NOPNP) + }, + { + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "X170KM-G"), + }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_RESET_ALWAYS | + SERIO_QUIRK_NOLOOP | SERIO_QUIRK_NOPNP) + }, { } };
From: Werner Sembach wse@tuxedocomputers.com
[ Upstream commit 9c445d2637c938a800fcc8b5f0b10e60c94460c7 ]
The Clevo PCX0DX/TUXEDO XP1511, need quirks for the keyboard to not be occasionally unresponsive after resume.
Signed-off-by: Werner Sembach wse@tuxedocomputers.com Cc: stable@vger.kernel.org Reviewed-by: Hans de Goede hdegoede@redhat.com Reviewed-by: Mattijs Korpershoek mkorpershoek@baylibre.com Link: https://lore.kernel.org/r/20230110134524.553620-1-wse@tuxedocomputers.com Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/input/serio/i8042-x86ia64io.h | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/drivers/input/serio/i8042-x86ia64io.h b/drivers/input/serio/i8042-x86ia64io.h index ea9996debb88..6b2e88da3076 100644 --- a/drivers/input/serio/i8042-x86ia64io.h +++ b/drivers/input/serio/i8042-x86ia64io.h @@ -1169,6 +1169,13 @@ static const struct dmi_system_id i8042_dmi_quirk_table[] __initconst = { .driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_RESET_ALWAYS | SERIO_QUIRK_NOLOOP | SERIO_QUIRK_NOPNP) }, + { + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "PCX0DX"), + }, + .driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_RESET_ALWAYS | + SERIO_QUIRK_NOLOOP | SERIO_QUIRK_NOPNP) + }, { .matches = { DMI_MATCH(DMI_BOARD_NAME, "X170SM"),
From: Samuel Thibault samuel.thibault@ens-lyon.org
commit 2b09d5d364986f724f17001ccfe4126b9b43a0be upstream.
blit_x and blit_y are u32, so fbcon currently cannot support fonts larger than 32x32.
The 32x32 case also needs shifting an unsigned int, to properly set bit 31, otherwise we get "UBSAN: shift-out-of-bounds in fbcon_set_font", as reported on:
http://lore.kernel.org/all/IA1PR07MB98308653E259A6F2CE94A4AFABCE9@IA1PR07MB9... Kernel Branch: 6.2.0-rc5-next-20230124 Kernel config: https://drive.google.com/file/d/1F-LszDAizEEH0ZX0HcSR06v5q8FPl2Uv/view?usp=s... Reproducer: https://drive.google.com/file/d/1mP1jcLBY7vWCNM60OMf-ogw-urQRjNrm/view?usp=s...
Reported-by: Sanan Hasanov sanan.hasanov@Knights.ucf.edu Signed-off-by: Samuel Thibault samuel.thibault@ens-lyon.org Fixes: 2d2699d98492 ("fbcon: font setting should check limitation of driver") Cc: stable@vger.kernel.org Tested-by: Miko Larsson mikoxyzzz@gmail.com Reviewed-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Helge Deller deller@gmx.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/video/fbdev/core/fbcon.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/drivers/video/fbdev/core/fbcon.c +++ b/drivers/video/fbdev/core/fbcon.c @@ -2497,9 +2497,12 @@ static int fbcon_set_font(struct vc_data h > FBCON_SWAP(info->var.rotate, info->var.yres, info->var.xres)) return -EINVAL;
+ if (font->width > 32 || font->height > 32) + return -EINVAL; + /* Make sure drawing engine can handle the font */ - if (!(info->pixmap.blit_x & (1 << (font->width - 1))) || - !(info->pixmap.blit_y & (1 << (font->height - 1)))) + if (!(info->pixmap.blit_x & BIT(font->width - 1)) || + !(info->pixmap.blit_y & BIT(font->height - 1))) return -EINVAL;
/* Make sure driver can handle the font length */
From: Alexander Egorenkov egorenar@linux.ibm.com
commit fe8973a3ad0905cb9ba2d42db42ed51de14737df upstream.
With CONFIG_VMAP_STACK=y the stack is allocated from the vmalloc space. Data passed to a hardware or a hypervisor interface that requires V=R can no longer be allocated on the stack.
Use kmalloc() to get memory for a diag288 command.
Signed-off-by: Alexander Egorenkov egorenar@linux.ibm.com Reviewed-by: Heiko Carstens hca@linux.ibm.com Cc: stable@vger.kernel.org Signed-off-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/watchdog/diag288_wdt.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-)
--- a/drivers/watchdog/diag288_wdt.c +++ b/drivers/watchdog/diag288_wdt.c @@ -272,12 +272,21 @@ static int __init diag288_init(void) char ebc_begin[] = { 194, 197, 199, 201, 213 }; + char *ebc_cmd;
watchdog_set_nowayout(&wdt_dev, nowayout_info);
if (MACHINE_IS_VM) { - if (__diag288_vm(WDT_FUNC_INIT, 15, - ebc_begin, sizeof(ebc_begin)) != 0) { + ebc_cmd = kmalloc(sizeof(ebc_begin), GFP_KERNEL); + if (!ebc_cmd) { + pr_err("The watchdog cannot be initialized\n"); + return -ENOMEM; + } + memcpy(ebc_cmd, ebc_begin, sizeof(ebc_begin)); + ret = __diag288_vm(WDT_FUNC_INIT, 15, + ebc_cmd, sizeof(ebc_begin)); + kfree(ebc_cmd); + if (ret != 0) { pr_err("The watchdog cannot be initialized\n"); return -EINVAL; }
From: Alexander Egorenkov egorenar@linux.ibm.com
commit 32e40f9506b9e32917eb73154f93037b443124d1 upstream.
The DIAG 288 statement consumes an EBCDIC string the address of which is passed in a register. Use a "memory" clobber to tell the compiler that memory is accessed within the inline assembly.
Signed-off-by: Alexander Egorenkov egorenar@linux.ibm.com Reviewed-by: Heiko Carstens hca@linux.ibm.com Cc: stable@vger.kernel.org Signed-off-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/watchdog/diag288_wdt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/watchdog/diag288_wdt.c +++ b/drivers/watchdog/diag288_wdt.c @@ -86,7 +86,7 @@ static int __diag288(unsigned int func, "1:\n" EX_TABLE(0b, 1b) : "+d" (err) : "d"(__func), "d"(__timeout), - "d"(__action), "d"(__len) : "1", "cc"); + "d"(__action), "d"(__len) : "1", "cc", "memory"); return err; }
From: Ard Biesheuvel ardb@kernel.org
commit 636ab417a7aec4ee993916e688eb5c5977570836 upstream.
UEFI v2.10 introduces version 2 of the memory attributes table, which turns the reserved field into a flags field, but is compatible with version 1 in all other respects. So let's not complain about version 2 if we encounter it.
Cc: stable@vger.kernel.org Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/firmware/efi/memattr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/firmware/efi/memattr.c +++ b/drivers/firmware/efi/memattr.c @@ -32,7 +32,7 @@ int __init efi_memattr_init(void) return -ENOMEM; }
- if (tbl->version > 1) { + if (tbl->version > 2) { pr_warn("Unexpected EFI Memory Attributes table version %d\n", tbl->version); goto unmap;
From: Dmitry Perchanov dmitry.perchanov@intel.com
commit f7b23d1c35d8b8de1425bdfccaefd01f3b7c9d1c upstream.
Return value should be zero for success. This was forgotten for timestamp feature. Verified on RealSense cameras.
Fixes: a96cd0f901ee ("iio: accel: hid-sensor-accel-3d: Add timestamp") Signed-off-by: Dmitry Perchanov dmitry.perchanov@intel.com Link: https://lore.kernel.org/r/a6dc426498221c81fa71045b41adf782ebd42136.camel@int... Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/accel/hid-sensor-accel-3d.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/iio/accel/hid-sensor-accel-3d.c +++ b/drivers/iio/accel/hid-sensor-accel-3d.c @@ -279,6 +279,7 @@ static int accel_3d_capture_sample(struc hid_sensor_convert_timestamp( &accel_state->common_attributes, *(int64_t *)raw_data); + ret = 0; break; default: break;
From: Xiongfeng Wang wangxiongfeng2@huawei.com
commit cbd3a0153cd18a2cbef6bf3cf31bb406c3fc9f55 upstream.
of_get_parent() will return a device_node pointer with refcount incremented. We need to use of_node_put() on it when done. Add the missing of_node_put() in the error path of berlin2_adc_probe();
Fixes: 70f1937911ca ("iio: adc: add support for Berlin") Signed-off-by: Xiongfeng Wang wangxiongfeng2@huawei.com Link: https://lore.kernel.org/r/20221129020316.191731-1-wangxiongfeng2@huawei.com Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/adc/berlin2-adc.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/iio/adc/berlin2-adc.c +++ b/drivers/iio/adc/berlin2-adc.c @@ -289,8 +289,10 @@ static int berlin2_adc_probe(struct plat int ret;
indio_dev = devm_iio_device_alloc(&pdev->dev, sizeof(*priv)); - if (!indio_dev) + if (!indio_dev) { + of_node_put(parent_np); return -ENOMEM; + }
priv = iio_priv(indio_dev); platform_set_drvdata(pdev, indio_dev);
From: Andreas Kemnade andreas@kemnade.info
commit f804bd0dc28683a93a60f271aaefb2fc5b0853dd upstream.
Some inputs need to be wired up to produce proper measurements, without this change only near zero values are reported.
Signed-off-by: Andreas Kemnade andreas@kemnade.info Fixes: 1696f36482e70 ("iio: twl6030-gpadc: TWL6030, TWL6032 GPADC driver") Link: https://lore.kernel.org/r/20221201181635.3522962-1-andreas@kemnade.info Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/adc/twl6030-gpadc.c | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+)
--- a/drivers/iio/adc/twl6030-gpadc.c +++ b/drivers/iio/adc/twl6030-gpadc.c @@ -57,6 +57,18 @@ #define TWL6030_GPADCS BIT(1) #define TWL6030_GPADCR BIT(0)
+#define USB_VBUS_CTRL_SET 0x04 +#define USB_ID_CTRL_SET 0x06 + +#define TWL6030_MISC1 0xE4 +#define VBUS_MEAS 0x01 +#define ID_MEAS 0x01 + +#define VAC_MEAS 0x04 +#define VBAT_MEAS 0x02 +#define BB_MEAS 0x01 + + /** * struct twl6030_chnl_calib - channel calibration * @gain: slope coefficient for ideal curve @@ -927,6 +939,26 @@ static int twl6030_gpadc_probe(struct pl return ret; }
+ ret = twl_i2c_write_u8(TWL_MODULE_USB, VBUS_MEAS, USB_VBUS_CTRL_SET); + if (ret < 0) { + dev_err(dev, "failed to wire up inputs\n"); + return ret; + } + + ret = twl_i2c_write_u8(TWL_MODULE_USB, ID_MEAS, USB_ID_CTRL_SET); + if (ret < 0) { + dev_err(dev, "failed to wire up inputs\n"); + return ret; + } + + ret = twl_i2c_write_u8(TWL6030_MODULE_ID0, + VBAT_MEAS | BB_MEAS | BB_MEAS, + TWL6030_MISC1); + if (ret < 0) { + dev_err(dev, "failed to wire up inputs\n"); + return ret; + } + indio_dev->name = DRIVER_NAME; indio_dev->dev.parent = dev; indio_dev->info = &twl6030_gpadc_iio_info;
From: Helge Deller deller@gmx.de
commit 5d1335dabb3c493a3d6d5b233953b6ac7b6c1ff2 upstream.
There is an off-by-one if the printed string includes a new-line char.
Cc: stable@vger.kernel.org Signed-off-by: Helge Deller deller@gmx.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/parisc/kernel/firmware.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/arch/parisc/kernel/firmware.c +++ b/arch/parisc/kernel/firmware.c @@ -1229,7 +1229,7 @@ static char __attribute__((aligned(64))) */ int pdc_iodc_print(const unsigned char *str, unsigned count) { - unsigned int i; + unsigned int i, found = 0; unsigned long flags;
for (i = 0; i < count;) { @@ -1238,6 +1238,7 @@ int pdc_iodc_print(const unsigned char * iodc_dbuf[i+0] = '\r'; iodc_dbuf[i+1] = '\n'; i += 2; + found = 1; goto print; default: iodc_dbuf[i] = str[i]; @@ -1254,7 +1255,7 @@ print: __pa(iodc_retbuf), 0, __pa(iodc_dbuf), i, 0); spin_unlock_irqrestore(&pdc_lock, flags);
- return i; + return i - found; }
#if !defined(BOOTLOADER)
From: Helge Deller deller@gmx.de
commit 316f1f42b5cc1d95124c1f0387c867c1ba7b6d0e upstream.
Wire up the missing ptrace requests PTRACE_GETREGS, PTRACE_SETREGS, PTRACE_GETFPREGS and PTRACE_SETFPREGS when running 32-bit applications on 64-bit kernels.
Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org # 4.7+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/parisc/kernel/ptrace.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-)
--- a/arch/parisc/kernel/ptrace.c +++ b/arch/parisc/kernel/ptrace.c @@ -128,6 +128,12 @@ long arch_ptrace(struct task_struct *chi unsigned long tmp; long ret = -EIO;
+ unsigned long user_regs_struct_size = sizeof(struct user_regs_struct); +#ifdef CONFIG_64BIT + if (is_compat_task()) + user_regs_struct_size /= 2; +#endif + switch (request) {
/* Read the word at location addr in the USER area. For ptraced @@ -183,14 +189,14 @@ long arch_ptrace(struct task_struct *chi return copy_regset_to_user(child, task_user_regset_view(current), REGSET_GENERAL, - 0, sizeof(struct user_regs_struct), + 0, user_regs_struct_size, datap);
case PTRACE_SETREGS: /* Set all gp regs in the child. */ return copy_regset_from_user(child, task_user_regset_view(current), REGSET_GENERAL, - 0, sizeof(struct user_regs_struct), + 0, user_regs_struct_size, datap);
case PTRACE_GETFPREGS: /* Get the child FPU state. */ @@ -304,6 +310,11 @@ long compat_arch_ptrace(struct task_stru } } break; + case PTRACE_GETREGS: + case PTRACE_SETREGS: + case PTRACE_GETFPREGS: + case PTRACE_SETFPREGS: + return arch_ptrace(child, request, addr, data);
default: ret = compat_ptrace_request(child, request, addr, data);
From: Andreas Schwab schwab@suse.de
commit 2f394c0e7d1129a35156e492bc8f445fb20f43ac upstream.
GCC 13 will enable -fasynchronous-unwind-tables by default on riscv. In the kernel, we don't have any use for unwind tables yet, so disable them. More importantly, the .eh_frame section brings relocations (R_RISC_32_PCREL, R_RISCV_SET{6,8,16}, R_RISCV_SUB{6,8,16}) into modules that we are not prepared to handle.
Signed-off-by: Andreas Schwab schwab@suse.de Link: https://lore.kernel.org/r/mvmzg9xybqu.fsf@suse.de Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/Makefile | 3 +++ 1 file changed, 3 insertions(+)
--- a/arch/riscv/Makefile +++ b/arch/riscv/Makefile @@ -75,6 +75,9 @@ ifeq ($(CONFIG_PERF_EVENTS),y) KBUILD_CFLAGS += -fno-omit-frame-pointer endif
+# Avoid generating .eh_frame sections. +KBUILD_CFLAGS += -fno-asynchronous-unwind-tables -fno-unwind-tables + KBUILD_CFLAGS_MODULE += $(call cc-option,-mno-relax) KBUILD_AFLAGS_MODULE += $(call as-option,-Wa$(comma)-mno-relax)
From: Mike Kravetz mike.kravetz@oracle.com
commit 3489dbb696d25602aea8c3e669a6d43b76bd5358 upstream.
Patch series "Fixes for hugetlb mapcount at most 1 for shared PMDs".
This issue of mapcount in hugetlb pages referenced by shared PMDs was discussed in [1]. The following two patches address user visible behavior caused by this issue.
[1] https://lore.kernel.org/linux-mm/Y9BF+OCdWnCSilEu@monkey/
This patch (of 2):
A hugetlb page will have a mapcount of 1 if mapped by multiple processes via a shared PMD. This is because only the first process increases the map count, and subsequent processes just add the shared PMD page to their page table.
page_mapcount is being used to decide if a hugetlb page is shared or private in /proc/PID/smaps. Pages referenced via a shared PMD were incorrectly being counted as private.
To fix, check for a shared PMD if mapcount is 1. If a shared PMD is found count the hugetlb page as shared. A new helper to check for a shared PMD is added.
[akpm@linux-foundation.org: simplification, per David] [akpm@linux-foundation.org: hugetlb.h: include page_ref.h for page_count()] Link: https://lkml.kernel.org/r/20230126222721.222195-2-mike.kravetz@oracle.com Fixes: 25ee01a2fca0 ("mm: hugetlb: proc: add hugetlb-related fields to /proc/PID/smaps") Signed-off-by: Mike Kravetz mike.kravetz@oracle.com Acked-by: Peter Xu peterx@redhat.com Cc: David Hildenbrand david@redhat.com Cc: James Houghton jthoughton@google.com Cc: Matthew Wilcox willy@infradead.org Cc: Michal Hocko mhocko@suse.com Cc: Muchun Song songmuchun@bytedance.com Cc: Naoya Horiguchi naoya.horiguchi@linux.dev Cc: Vishal Moola (Oracle) vishal.moola@gmail.com Cc: Yang Shi shy828301@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/proc/task_mmu.c | 4 +--- include/linux/hugetlb.h | 13 +++++++++++++ 2 files changed, 14 insertions(+), 3 deletions(-)
--- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -723,9 +723,7 @@ static int smaps_hugetlb_range(pte_t *pt page = device_private_entry_to_page(swpent); } if (page) { - int mapcount = page_mapcount(page); - - if (mapcount >= 2) + if (page_mapcount(page) >= 2 || hugetlb_pmd_shared(pte)) mss->shared_hugetlb += huge_page_size(hstate_vma(vma)); else mss->private_hugetlb += huge_page_size(hstate_vma(vma)); --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -7,6 +7,7 @@ #include <linux/fs.h> #include <linux/hugetlb_inline.h> #include <linux/cgroup.h> +#include <linux/page_ref.h> #include <linux/list.h> #include <linux/kref.h> #include <asm/pgtable.h> @@ -744,4 +745,16 @@ static inline spinlock_t *huge_pte_lock( return ptl; }
+#ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE +static inline bool hugetlb_pmd_shared(pte_t *pte) +{ + return page_count(virt_to_page(pte)) > 1; +} +#else +static inline bool hugetlb_pmd_shared(pte_t *pte) +{ + return false; +} +#endif + #endif /* _LINUX_HUGETLB_H */
From: Zheng Yongjun zhengyongjun3@huawei.com
commit 65ea840afd508194b0ee903256162aa87e46ec30 upstream.
In case of error, the function stratix10_svc_allocate_memory() returns ERR_PTR() and never returns NULL. The NULL test in the return value check should be replaced with IS_ERR().
Fixes: e7eef1d7633a ("fpga: add intel stratix10 soc fpga manager driver") Signed-off-by: Zheng Yongjun zhengyongjun3@huawei.com Reviewed-by: Russ Weight russell.h.weight@intel.com Cc: stable@vger.kernel.org Acked-by: Xu Yilun yilun.xu@intel.com Link: https://lore.kernel.org/r/20221126071430.19540-1-zhengyongjun3@huawei.com Signed-off-by: Xu Yilun yilun.xu@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/fpga/stratix10-soc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/fpga/stratix10-soc.c +++ b/drivers/fpga/stratix10-soc.c @@ -218,9 +218,9 @@ static int s10_ops_write_init(struct fpg /* Allocate buffers from the service layer's pool. */ for (i = 0; i < NUM_SVC_BUFS; i++) { kbuf = stratix10_svc_allocate_memory(priv->chan, SVC_BUF_SIZE); - if (!kbuf) { + if (IS_ERR(kbuf)) { s10_free_buffers(mgr); - ret = -ENOMEM; + ret = PTR_ERR(kbuf); goto init_done; }
From: Longlong Xia xialonglong1@huawei.com
commit 7717fc1a12f88701573f9ed897cc4f6699c661e3 upstream.
The softlockup still occurs in get_swap_pages() under memory pressure. 64 CPU cores, 64GB memory, and 28 zram devices, the disksize of each zram device is 50MB with same priority as si. Use the stress-ng tool to increase memory pressure, causing the system to oom frequently.
The plist_for_each_entry_safe() loops in get_swap_pages() could reach tens of thousands of times to find available space (extreme case: cond_resched() is not called in scan_swap_map_slots()). Let's add cond_resched() into get_swap_pages() when failed to find available space to avoid softlockup.
Link: https://lkml.kernel.org/r/20230128094757.1060525-1-xialonglong1@huawei.com Signed-off-by: Longlong Xia xialonglong1@huawei.com Reviewed-by: "Huang, Ying" ying.huang@intel.com Cc: Chen Wandun chenwandun@huawei.com Cc: Huang Ying ying.huang@intel.com Cc: Kefeng Wang wangkefeng.wang@huawei.com Cc: Nanyong Sun sunnanyong@huawei.com Cc: Hugh Dickins hughd@google.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/swapfile.c | 1 + 1 file changed, 1 insertion(+)
--- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1061,6 +1061,7 @@ start_over: goto check_out; pr_debug("scan_swap_map of si %d failed to find offset\n", si->type); + cond_resched();
spin_lock(&swap_avail_lock); nextsi:
From: Phillip Lougher phillip@squashfs.org.uk
commit f65c4bbbd682b0877b669828b4e033b8d5d0a2dc upstream.
A Sysbot [1] corrupted filesystem exposes two flaws in the handling and sanity checking of the xattr_ids count in the filesystem. Both of these flaws cause computation overflow due to incorrect typing.
In the corrupted filesystem the xattr_ids value is 4294967071, which stored in a signed variable becomes the negative number -225.
Flaw 1 (64-bit systems only):
The signed integer xattr_ids variable causes sign extension.
This causes variable overflow in the SQUASHFS_XATTR_*(A) macros. The variable is first multiplied by sizeof(struct squashfs_xattr_id) where the type of the sizeof operator is "unsigned long".
On a 64-bit system this is 64-bits in size, and causes the negative number to be sign extended and widened to 64-bits and then become unsigned. This produces the very large number 18446744073709548016 or 2^64 - 3600. This number when rounded up by SQUASHFS_METADATA_SIZE - 1 (8191 bytes) and divided by SQUASHFS_METADATA_SIZE overflows and produces a length of 0 (stored in len).
Flaw 2 (32-bit systems only):
On a 32-bit system the integer variable is not widened by the unsigned long type of the sizeof operator (32-bits), and the signedness of the variable has no effect due it always being treated as unsigned.
The above corrupted xattr_ids value of 4294967071, when multiplied overflows and produces the number 4294963696 or 2^32 - 3400. This number when rounded up by SQUASHFS_METADATA_SIZE - 1 (8191 bytes) and divided by SQUASHFS_METADATA_SIZE overflows again and produces a length of 0.
The effect of the 0 length computation:
In conjunction with the corrupted xattr_ids field, the filesystem also has a corrupted xattr_table_start value, where it matches the end of filesystem value of 850.
This causes the following sanity check code to fail because the incorrectly computed len of 0 matches the incorrect size of the table reported by the superblock (0 bytes).
len = SQUASHFS_XATTR_BLOCK_BYTES(*xattr_ids); indexes = SQUASHFS_XATTR_BLOCKS(*xattr_ids);
/* * The computed size of the index table (len bytes) should exactly * match the table start and end points */ start = table_start + sizeof(*id_table); end = msblk->bytes_used;
if (len != (end - start)) return ERR_PTR(-EINVAL);
Changing the xattr_ids variable to be "usigned int" fixes the flaw on a 64-bit system. This relies on the fact the computation is widened by the unsigned long type of the sizeof operator.
Casting the variable to u64 in the above macro fixes this flaw on a 32-bit system.
It also means 64-bit systems do not implicitly rely on the type of the sizeof operator to widen the computation.
[1] https://lore.kernel.org/lkml/000000000000cd44f005f1a0f17f@google.com/
Link: https://lkml.kernel.org/r/20230127061842.10965-1-phillip@squashfs.org.uk Fixes: 506220d2ba21 ("squashfs: add more sanity checks in xattr id lookup") Signed-off-by: Phillip Lougher phillip@squashfs.org.uk Reported-by: syzbot+082fa4af80a5bb1a9843@syzkaller.appspotmail.com Cc: Alexey Khoroshilov khoroshilov@ispras.ru Cc: Fedor Pchelkin pchelkin@ispras.ru Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/squashfs/squashfs_fs.h | 2 +- fs/squashfs/squashfs_fs_sb.h | 2 +- fs/squashfs/xattr.h | 4 ++-- fs/squashfs/xattr_id.c | 2 +- 4 files changed, 5 insertions(+), 5 deletions(-)
--- a/fs/squashfs/squashfs_fs.h +++ b/fs/squashfs/squashfs_fs.h @@ -183,7 +183,7 @@ static inline int squashfs_block_size(__ #define SQUASHFS_ID_BLOCK_BYTES(A) (SQUASHFS_ID_BLOCKS(A) *\ sizeof(u64)) /* xattr id lookup table defines */ -#define SQUASHFS_XATTR_BYTES(A) ((A) * sizeof(struct squashfs_xattr_id)) +#define SQUASHFS_XATTR_BYTES(A) (((u64) (A)) * sizeof(struct squashfs_xattr_id))
#define SQUASHFS_XATTR_BLOCK(A) (SQUASHFS_XATTR_BYTES(A) / \ SQUASHFS_METADATA_SIZE) --- a/fs/squashfs/squashfs_fs_sb.h +++ b/fs/squashfs/squashfs_fs_sb.h @@ -63,7 +63,7 @@ struct squashfs_sb_info { long long bytes_used; unsigned int inodes; unsigned int fragments; - int xattr_ids; + unsigned int xattr_ids; unsigned int ids; }; #endif --- a/fs/squashfs/xattr.h +++ b/fs/squashfs/xattr.h @@ -10,12 +10,12 @@
#ifdef CONFIG_SQUASHFS_XATTR extern __le64 *squashfs_read_xattr_id_table(struct super_block *, u64, - u64 *, int *); + u64 *, unsigned int *); extern int squashfs_xattr_lookup(struct super_block *, unsigned int, int *, unsigned int *, unsigned long long *); #else static inline __le64 *squashfs_read_xattr_id_table(struct super_block *sb, - u64 start, u64 *xattr_table_start, int *xattr_ids) + u64 start, u64 *xattr_table_start, unsigned int *xattr_ids) { struct squashfs_xattr_id_table *id_table;
--- a/fs/squashfs/xattr_id.c +++ b/fs/squashfs/xattr_id.c @@ -56,7 +56,7 @@ int squashfs_xattr_lookup(struct super_b * Read uncompressed xattr id lookup table indexes from disk into memory */ __le64 *squashfs_read_xattr_id_table(struct super_block *sb, u64 table_start, - u64 *xattr_table_start, int *xattr_ids) + u64 *xattr_table_start, unsigned int *xattr_ids) { struct squashfs_sb_info *msblk = sb->s_fs_info; unsigned int len, indexes;
From: Michael Walle michael@walle.cc
commit db3546d58b5a0fa581d9c9f2bdc2856fa6c5e43e upstream.
nvmem_add_cells() could return an error after some cells are already added to the provider. In this case, the added cells are not removed. Remove any registered cells if nvmem_add_cells() fails.
Fixes: fa72d847d68d7 ("nvmem: check the return value of nvmem_add_cells()") Cc: stable@vger.kernel.org Signed-off-by: Michael Walle michael@walle.cc Signed-off-by: Srinivas Kandagatla srinivas.kandagatla@linaro.org Link: https://lore.kernel.org/r/20230127104015.23839-9-srinivas.kandagatla@linaro.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/nvmem/core.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/drivers/nvmem/core.c +++ b/drivers/nvmem/core.c @@ -439,7 +439,7 @@ struct nvmem_device *nvmem_register(cons if (config->cells) { rval = nvmem_add_cells(nvmem, config->cells, config->ncells); if (rval) - goto err_teardown_compat; + goto err_remove_cells; }
rval = nvmem_add_cells_from_table(nvmem); @@ -456,7 +456,6 @@ struct nvmem_device *nvmem_register(cons
err_remove_cells: nvmem_device_remove_all_cells(nvmem); -err_teardown_compat: if (config->compat) nvmem_sysfs_remove_compat(nvmem, config); err_device_del:
From: Andrea Righi andrea.righi@canonical.com
commit ebc5951eea499314f6fbbde20e295f1345c67330 upstream.
In unuse_pte_range() we blindly swap-in pages without checking if the swap entry is already present in the swap cache.
By doing this, the hit/miss ratio used by the swap readahead heuristic is not properly updated and this leads to non-optimal performance during swapoff.
Tracing the distribution of the readahead size returned by the swap readahead heuristic during swapoff shows that a small readahead size is used most of the time as if we had only misses (this happens both with cluster and vma readahead), for example:
r::swapin_nr_pages(unsigned long offset):unsigned long:$retval COUNT EVENT 36948 $retval = 8 44151 $retval = 4 49290 $retval = 1 527771 $retval = 2
Checking if the swap entry is present in the swap cache, instead, allows to properly update the readahead statistics and the heuristic behaves in a better way during swapoff, selecting a bigger readahead size:
r::swapin_nr_pages(unsigned long offset):unsigned long:$retval COUNT EVENT 1618 $retval = 1 4960 $retval = 2 41315 $retval = 4 103521 $retval = 8
In terms of swapoff performance the result is the following:
Testing environment ===================
- Host: CPU: 1.8GHz Intel Core i7-8565U (quad-core, 8MB cache) HDD: PC401 NVMe SK hynix 512GB MEM: 16GB
- Guest (kvm): 8GB of RAM virtio block driver 16GB swap file on ext4 (/swapfile)
Test case ========= - allocate 85% of memory - `systemctl hibernate` to force all the pages to be swapped-out to the swap file - resume the system - measure the time that swapoff takes to complete: # /usr/bin/time swapoff /swapfile
Result (swapoff time) ====== 5.6 vanilla 5.6 w/ this patch ----------- ----------------- cluster-readahead 22.09s 12.19s vma-readahead 18.20s 15.33s
Conclusion ==========
The specific use case this patch is addressing is to improve swapoff performance in cloud environments when a VM has been hibernated, resumed and all the memory needs to be forced back to RAM by disabling swap.
This change allows to better exploits the advantages of the readahead heuristic during swapoff and this improvement allows to to speed up the resume process of such VMs.
[andrea.righi@canonical.com: update changelog] Link: http://lkml.kernel.org/r/20200418084705.GA147642@xps-13 Signed-off-by: Andrea Righi andrea.righi@canonical.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Reviewed-by: "Huang, Ying" ying.huang@intel.com Cc: Minchan Kim minchan@kernel.org Cc: Anchal Agarwal anchalag@amazon.com Cc: Hugh Dickins hughd@google.com Cc: Vineeth Remanan Pillai vpillai@digitalocean.com Cc: Kelley Nielsen kelleynnn@gmail.com Link: http://lkml.kernel.org/r/20200416180132.GB3352@xps-13 Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Luiz Capitulino luizcap@amazon.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/swapfile.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
Add missing SOB.
--- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1951,10 +1951,14 @@ static int unuse_pte_range(struct vm_are
pte_unmap(pte); swap_map = &si->swap_map[offset]; - vmf.vma = vma; - vmf.address = addr; - vmf.pmd = pmd; - page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE, &vmf); + page = lookup_swap_cache(entry, vma, addr); + if (!page) { + vmf.vma = vma; + vmf.address = addr; + vmf.pmd = pmd; + page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE, + &vmf); + } if (!page) { if (*swap_map == 0 || *swap_map == SWAP_MAP_BAD) goto try_next;
From: Zhang Xiaoxu zhangxiaoxu5@huawei.com
commit 9181f40fb2952fd59ecb75e7158620c9c669eee3 upstream.
If rdma receive buffer allocate failed, should call rpcrdma_regbuf_free() to free the send buffer, otherwise, the buffer data will be leaked.
Fixes: bb93a1ae2bf4 ("xprtrdma: Allocate req's regbufs at xprt create time") Signed-off-by: Zhang Xiaoxu zhangxiaoxu5@huawei.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com [Harshit: Backport to 5.4.y] Also make the same change for 'req->rl_rdmabuf' at the same time as this will also have the same memory leak problem as 'req->rl_sendbuf' (This is because commit b78de1dca00376aaba7a58bb5fe21c1606524abe is not in 5.4.y) Signed-off-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sunrpc/xprtrdma/verbs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -1034,9 +1034,9 @@ struct rpcrdma_req *rpcrdma_req_create(s return req;
out4: - kfree(req->rl_sendbuf); + rpcrdma_regbuf_free(req->rl_sendbuf); out3: - kfree(req->rl_rdmabuf); + rpcrdma_regbuf_free(req->rl_rdmabuf); out2: kfree(req); out1:
From: Ilpo Järvinen ilpo.jarvinen@linux.intel.com
commit 31352811e13dc2313f101b890fd4b1ce760b5fe7 upstream.
__dma_rx_complete() is called from two places: - Through the DMA completion callback dma_rx_complete() - From serial8250_rx_dma_flush() after IIR_RLSI or IIR_RX_TIMEOUT The former does not hold port's lock during __dma_rx_complete() which allows these two to race and potentially insert the same data twice.
Extend port's lock coverage in dma_rx_complete() to prevent the race and check if the DMA Rx is still pending completion before calling into __dma_rx_complete().
Reported-by: Gilles BULOZ gilles.buloz@kontron.com Tested-by: Gilles BULOZ gilles.buloz@kontron.com Fixes: 9ee4b83e51f7 ("serial: 8250: Add support for dmaengine") Cc: stable@vger.kernel.org Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Link: https://lore.kernel.org/r/20230130114841.25749-2-ilpo.jarvinen@linux.intel.c... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/8250/8250_dma.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-)
--- a/drivers/tty/serial/8250/8250_dma.c +++ b/drivers/tty/serial/8250/8250_dma.c @@ -59,6 +59,18 @@ static void __dma_rx_complete(void *para tty_flip_buffer_push(tty_port); }
+static void dma_rx_complete(void *param) +{ + struct uart_8250_port *p = param; + struct uart_8250_dma *dma = p->dma; + unsigned long flags; + + spin_lock_irqsave(&p->port.lock, flags); + if (dma->rx_running) + __dma_rx_complete(p); + spin_unlock_irqrestore(&p->port.lock, flags); +} + int serial8250_tx_dma(struct uart_8250_port *p) { struct uart_8250_dma *dma = p->dma; @@ -121,7 +133,7 @@ int serial8250_rx_dma(struct uart_8250_p return -EBUSY;
dma->rx_running = 1; - desc->callback = __dma_rx_complete; + desc->callback = dma_rx_complete; desc->callback_param = p;
dma->rx_cookie = dmaengine_submit(desc);
From: Ilpo Järvinen ilpo.jarvinen@linux.intel.com
commit 57e9af7831dcf211c5c689c2a6f209f4abdf0bce upstream.
As DMA Rx can be completed from two places, it is possible that DMA Rx completes before DMA completion callback had a chance to complete it. Once the previous DMA Rx has been completed, a new one can be started on the next UART interrupt. The following race is possible (uart_unlock_and_check_sysrq_irqrestore() replaced with spin_unlock_irqrestore() for simplicity/clarity):
CPU0 CPU1 dma_rx_complete() serial8250_handle_irq() spin_lock_irqsave(&port->lock) handle_rx_dma() serial8250_rx_dma_flush() __dma_rx_complete() dma->rx_running = 0 // Complete DMA Rx spin_unlock_irqrestore(&port->lock)
serial8250_handle_irq() spin_lock_irqsave(&port->lock) handle_rx_dma() serial8250_rx_dma() dma->rx_running = 1 // Setup a new DMA Rx spin_unlock_irqrestore(&port->lock)
spin_lock_irqsave(&port->lock) // sees dma->rx_running = 1 __dma_rx_complete() dma->rx_running = 0 // Incorrectly complete // running DMA Rx
This race seems somewhat theoretical to occur for real but handle it correctly regardless. Check what is the DMA status before complething anything in __dma_rx_complete().
Reported-by: Gilles BULOZ gilles.buloz@kontron.com Tested-by: Gilles BULOZ gilles.buloz@kontron.com Fixes: 9ee4b83e51f7 ("serial: 8250: Add support for dmaengine") Cc: stable@vger.kernel.org Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Link: https://lore.kernel.org/r/20230130114841.25749-3-ilpo.jarvinen@linux.intel.c... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/8250/8250_dma.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
--- a/drivers/tty/serial/8250/8250_dma.c +++ b/drivers/tty/serial/8250/8250_dma.c @@ -46,15 +46,23 @@ static void __dma_rx_complete(void *para struct uart_8250_dma *dma = p->dma; struct tty_port *tty_port = &p->port.state->port; struct dma_tx_state state; + enum dma_status dma_status; int count;
- dma->rx_running = 0; - dmaengine_tx_status(dma->rxchan, dma->rx_cookie, &state); + /* + * New DMA Rx can be started during the completion handler before it + * could acquire port's lock and it might still be ongoing. Don't to + * anything in such case. + */ + dma_status = dmaengine_tx_status(dma->rxchan, dma->rx_cookie, &state); + if (dma_status == DMA_IN_PROGRESS) + return;
count = dma->rx_size - state.residue;
tty_insert_flip_string(tty_port, dma->rx_buf, count); p->port.icount.rx += count; + dma->rx_running = 0;
tty_flip_buffer_push(tty_port); }
From: Michael Ellerman mpe@ellerman.id.au
commit ad53db4acb415976761d7302f5b02e97f2bd097e upstream.
The recent commit 76d588dddc45 ("powerpc/imc-pmu: Fix use of mutex in IRQs disabled section") fixed warnings (and possible deadlocks) in the IMC PMU driver by converting the locking to use spinlocks.
It also converted the init-time nest_init_lock to a spinlock, even though it's not used at runtime in IRQ disabled sections or while holding other spinlocks.
This leads to warnings such as:
BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:49 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0 preempt_count: 1, expected: 0 CPU: 7 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc2-14719-gf12cd06109f4-dirty #1 Hardware name: Mambo,Simulated-System POWER9 0x4e1203 opal:v6.6.6 PowerNV Call Trace: dump_stack_lvl+0x74/0xa8 (unreliable) __might_resched+0x178/0x1a0 __cpuhp_setup_state+0x64/0x1e0 init_imc_pmu+0xe48/0x1250 opal_imc_counters_probe+0x30c/0x6a0 platform_probe+0x78/0x110 really_probe+0x104/0x420 __driver_probe_device+0xb0/0x170 driver_probe_device+0x58/0x180 __driver_attach+0xd8/0x250 bus_for_each_dev+0xb4/0x140 driver_attach+0x34/0x50 bus_add_driver+0x1e8/0x2d0 driver_register+0xb4/0x1c0 __platform_driver_register+0x38/0x50 opal_imc_driver_init+0x2c/0x40 do_one_initcall+0x80/0x360 kernel_init_freeable+0x310/0x3b8 kernel_init+0x30/0x1a0 ret_from_kernel_thread+0x5c/0x64
Fix it by converting nest_init_lock back to a mutex, so that we can call sleeping functions while holding it. There is no interaction between nest_init_lock and the runtime spinlocks used by the actual PMU routines.
Fixes: 76d588dddc45 ("powerpc/imc-pmu: Fix use of mutex in IRQs disabled section") Tested-by: Kajol Jainkjain@linux.ibm.com Reviewed-by: Kajol Jainkjain@linux.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20230130014401.540543-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/perf/imc-pmu.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
--- a/arch/powerpc/perf/imc-pmu.c +++ b/arch/powerpc/perf/imc-pmu.c @@ -21,7 +21,7 @@ * Used to avoid races in counting the nest-pmu units during hotplug * register and unregister */ -static DEFINE_SPINLOCK(nest_init_lock); +static DEFINE_MUTEX(nest_init_lock); static DEFINE_PER_CPU(struct imc_pmu_ref *, local_nest_imc_refc); static struct imc_pmu **per_nest_pmu_arr; static cpumask_t nest_imc_cpumask; @@ -1605,7 +1605,7 @@ static void imc_common_mem_free(struct i static void imc_common_cpuhp_mem_free(struct imc_pmu *pmu_ptr) { if (pmu_ptr->domain == IMC_DOMAIN_NEST) { - spin_lock(&nest_init_lock); + mutex_lock(&nest_init_lock); if (nest_pmus == 1) { cpuhp_remove_state(CPUHP_AP_PERF_POWERPC_NEST_IMC_ONLINE); kfree(nest_imc_refc); @@ -1615,7 +1615,7 @@ static void imc_common_cpuhp_mem_free(st
if (nest_pmus > 0) nest_pmus--; - spin_unlock(&nest_init_lock); + mutex_unlock(&nest_init_lock); }
/* Free core_imc memory */ @@ -1772,11 +1772,11 @@ int init_imc_pmu(struct device_node *par * rest. To handle the cpuhotplug callback unregister, we track * the number of nest pmus in "nest_pmus". */ - spin_lock(&nest_init_lock); + mutex_lock(&nest_init_lock); if (nest_pmus == 0) { ret = init_nest_pmu_ref(); if (ret) { - spin_unlock(&nest_init_lock); + mutex_unlock(&nest_init_lock); kfree(per_nest_pmu_arr); per_nest_pmu_arr = NULL; goto err_free_mem; @@ -1784,7 +1784,7 @@ int init_imc_pmu(struct device_node *par /* Register for cpu hotplug notification. */ ret = nest_pmu_cpumask_init(); if (ret) { - spin_unlock(&nest_init_lock); + mutex_unlock(&nest_init_lock); kfree(nest_imc_refc); kfree(per_nest_pmu_arr); per_nest_pmu_arr = NULL; @@ -1792,7 +1792,7 @@ int init_imc_pmu(struct device_node *par } } nest_pmus++; - spin_unlock(&nest_init_lock); + mutex_unlock(&nest_init_lock); break; case IMC_DOMAIN_CORE: ret = core_imc_pmu_cpumask_init();
From: Dongliang Mu dzm91@hust.edu.cn
commit b76449ee75e21acfe9fa4c653d8598f191ed7d68 upstream.
The current error handling code in ufx_usb_probe have many unmatching issues, e.g., missing ufx_free_usb_list, destroy_modedb label should only include framebuffer_release, fb_dealloc_cmap only matches fb_alloc_cmap.
My local syzkaller reports a memory leak bug:
memory leak in ufx_usb_probe
BUG: memory leak unreferenced object 0xffff88802f879580 (size 128): comm "kworker/0:7", pid 17416, jiffies 4295067474 (age 46.710s) hex dump (first 32 bytes): 80 21 7c 2e 80 88 ff ff 18 d0 d0 0c 80 88 ff ff .!|............. 00 d0 d0 0c 80 88 ff ff e0 ff ff ff 0f 00 00 00 ................ backtrace: [<ffffffff814c99a0>] kmalloc_trace+0x20/0x90 mm/slab_common.c:1045 [<ffffffff824d219c>] kmalloc include/linux/slab.h:553 [inline] [<ffffffff824d219c>] kzalloc include/linux/slab.h:689 [inline] [<ffffffff824d219c>] ufx_alloc_urb_list drivers/video/fbdev/smscufx.c:1873 [inline] [<ffffffff824d219c>] ufx_usb_probe+0x11c/0x15a0 drivers/video/fbdev/smscufx.c:1655 [<ffffffff82d17927>] usb_probe_interface+0x177/0x370 drivers/usb/core/driver.c:396 [<ffffffff82712f0d>] call_driver_probe drivers/base/dd.c:560 [inline] [<ffffffff82712f0d>] really_probe+0x12d/0x390 drivers/base/dd.c:639 [<ffffffff8271322f>] __driver_probe_device+0xbf/0x140 drivers/base/dd.c:778 [<ffffffff827132da>] driver_probe_device+0x2a/0x120 drivers/base/dd.c:808 [<ffffffff82713c27>] __device_attach_driver+0xf7/0x150 drivers/base/dd.c:936 [<ffffffff82710137>] bus_for_each_drv+0xb7/0x100 drivers/base/bus.c:427 [<ffffffff827136b5>] __device_attach+0x105/0x2d0 drivers/base/dd.c:1008 [<ffffffff82711d36>] bus_probe_device+0xc6/0xe0 drivers/base/bus.c:487 [<ffffffff8270e242>] device_add+0x642/0xdc0 drivers/base/core.c:3517 [<ffffffff82d14d5f>] usb_set_configuration+0x8ef/0xb80 drivers/usb/core/message.c:2170 [<ffffffff82d2576c>] usb_generic_driver_probe+0x8c/0xc0 drivers/usb/core/generic.c:238 [<ffffffff82d16ffc>] usb_probe_device+0x5c/0x140 drivers/usb/core/driver.c:293 [<ffffffff82712f0d>] call_driver_probe drivers/base/dd.c:560 [inline] [<ffffffff82712f0d>] really_probe+0x12d/0x390 drivers/base/dd.c:639 [<ffffffff8271322f>] __driver_probe_device+0xbf/0x140 drivers/base/dd.c:778
Fix this bug by rewriting the error handling code in ufx_usb_probe.
Reported-by: syzkaller syzkaller@googlegroups.com Tested-by: Dongliang Mu dzm91@hust.edu.cn Signed-off-by: Dongliang Mu dzm91@hust.edu.cn Signed-off-by: Helge Deller deller@gmx.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/video/fbdev/smscufx.c | 46 ++++++++++++++++++++++++++++-------------- 1 file changed, 31 insertions(+), 15 deletions(-)
--- a/drivers/video/fbdev/smscufx.c +++ b/drivers/video/fbdev/smscufx.c @@ -1622,7 +1622,7 @@ static int ufx_usb_probe(struct usb_inte struct usb_device *usbdev; struct ufx_data *dev; struct fb_info *info; - int retval; + int retval = -ENOMEM; u32 id_rev, fpga_rev;
/* usb initialization */ @@ -1654,15 +1654,17 @@ static int ufx_usb_probe(struct usb_inte
if (!ufx_alloc_urb_list(dev, WRITES_IN_FLIGHT, MAX_TRANSFER)) { dev_err(dev->gdev, "ufx_alloc_urb_list failed\n"); - goto e_nomem; + goto put_ref; }
/* We don't register a new USB class. Our client interface is fbdev */
/* allocates framebuffer driver structure, not framebuffer memory */ info = framebuffer_alloc(0, &usbdev->dev); - if (!info) - goto e_nomem; + if (!info) { + dev_err(dev->gdev, "framebuffer_alloc failed\n"); + goto free_urb_list; + }
dev->info = info; info->par = dev; @@ -1705,22 +1707,34 @@ static int ufx_usb_probe(struct usb_inte check_warn_goto_error(retval, "unable to find common mode for display and adapter");
retval = ufx_reg_set_bits(dev, 0x4000, 0x00000001); - check_warn_goto_error(retval, "error %d enabling graphics engine", retval); + if (retval < 0) { + dev_err(dev->gdev, "error %d enabling graphics engine", retval); + goto setup_modes; + }
/* ready to begin using device */ atomic_set(&dev->usb_active, 1);
dev_dbg(dev->gdev, "checking var"); retval = ufx_ops_check_var(&info->var, info); - check_warn_goto_error(retval, "error %d ufx_ops_check_var", retval); + if (retval < 0) { + dev_err(dev->gdev, "error %d ufx_ops_check_var", retval); + goto reset_active; + }
dev_dbg(dev->gdev, "setting par"); retval = ufx_ops_set_par(info); - check_warn_goto_error(retval, "error %d ufx_ops_set_par", retval); + if (retval < 0) { + dev_err(dev->gdev, "error %d ufx_ops_set_par", retval); + goto reset_active; + }
dev_dbg(dev->gdev, "registering framebuffer"); retval = register_framebuffer(info); - check_warn_goto_error(retval, "error %d register_framebuffer", retval); + if (retval < 0) { + dev_err(dev->gdev, "error %d register_framebuffer", retval); + goto reset_active; + }
dev_info(dev->gdev, "SMSC UDX USB device /dev/fb%d attached. %dx%d resolution." " Using %dK framebuffer memory\n", info->node, @@ -1728,21 +1742,23 @@ static int ufx_usb_probe(struct usb_inte
return 0;
-error: - fb_dealloc_cmap(&info->cmap); -destroy_modedb: +reset_active: + atomic_set(&dev->usb_active, 0); +setup_modes: fb_destroy_modedb(info->monspecs.modedb); vfree(info->screen_base); fb_destroy_modelist(&info->modelist); +error: + fb_dealloc_cmap(&info->cmap); +destroy_modedb: framebuffer_release(info); +free_urb_list: + if (dev->urbs.count > 0) + ufx_free_urb_list(dev); put_ref: kref_put(&dev->kref, ufx_free); /* ref for framebuffer */ kref_put(&dev->kref, ufx_free); /* last ref from kref_init */ return retval; - -e_nomem: - retval = -ENOMEM; - goto put_ref; }
static void ufx_usb_disconnect(struct usb_interface *interface)
From: Chao Yu chao@kernel.org
commit d3b7b4afd6b2c344eabf9cc26b8bfa903c164c7c upstream.
syzbot found a f2fs bug:
BUG: KASAN: slab-out-of-bounds in data_blkaddr fs/f2fs/f2fs.h:2891 [inline] BUG: KASAN: slab-out-of-bounds in is_alive fs/f2fs/gc.c:1117 [inline] BUG: KASAN: slab-out-of-bounds in gc_data_segment fs/f2fs/gc.c:1520 [inline] BUG: KASAN: slab-out-of-bounds in do_garbage_collect+0x386a/0x3df0 fs/f2fs/gc.c:1734 Read of size 4 at addr ffff888076557568 by task kworker/u4:3/52
CPU: 1 PID: 52 Comm: kworker/u4:3 Not tainted 6.1.0-rc4-syzkaller-00362-gfef7fd48922d #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022 Workqueue: writeback wb_workfn (flush-7:0) Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:284 [inline] print_report+0x15e/0x45d mm/kasan/report.c:395 kasan_report+0xbb/0x1f0 mm/kasan/report.c:495 data_blkaddr fs/f2fs/f2fs.h:2891 [inline] is_alive fs/f2fs/gc.c:1117 [inline] gc_data_segment fs/f2fs/gc.c:1520 [inline] do_garbage_collect+0x386a/0x3df0 fs/f2fs/gc.c:1734 f2fs_gc+0x88c/0x20a0 fs/f2fs/gc.c:1831 f2fs_balance_fs+0x544/0x6b0 fs/f2fs/segment.c:410 f2fs_write_inode+0x57e/0xe20 fs/f2fs/inode.c:753 write_inode fs/fs-writeback.c:1440 [inline] __writeback_single_inode+0xcfc/0x1440 fs/fs-writeback.c:1652 writeback_sb_inodes+0x54d/0xf90 fs/fs-writeback.c:1870 wb_writeback+0x2c5/0xd70 fs/fs-writeback.c:2044 wb_do_writeback fs/fs-writeback.c:2187 [inline] wb_workfn+0x2dc/0x12f0 fs/fs-writeback.c:2227 process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289 worker_thread+0x665/0x1080 kernel/workqueue.c:2436 kthread+0x2e4/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
The root cause is that we forgot to do sanity check on .i_extra_isize in below path, result in accessing invalid address later, fix it. - gc_data_segment - is_alive - data_blkaddr - offset_in_addr
Reported-by: syzbot+f8f3dfa4abc489e768a1@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-f2fs-devel/0000000000003cb3c405ed5c17f9@google... Signed-off-by: Chao Yu chao@kernel.org Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/f2fs/gc.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)
--- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -612,7 +612,7 @@ static bool is_alive(struct f2fs_sb_info { struct page *node_page; nid_t nid; - unsigned int ofs_in_node, max_addrs; + unsigned int ofs_in_node, max_addrs, base; block_t source_blkaddr;
nid = le32_to_cpu(sum->nid); @@ -638,11 +638,17 @@ static bool is_alive(struct f2fs_sb_info return false; }
- max_addrs = IS_INODE(node_page) ? DEF_ADDRS_PER_INODE : - DEF_ADDRS_PER_BLOCK; - if (ofs_in_node >= max_addrs) { - f2fs_err(sbi, "Inconsistent ofs_in_node:%u in summary, ino:%u, nid:%u, max:%u", - ofs_in_node, dni->ino, dni->nid, max_addrs); + if (IS_INODE(node_page)) { + base = offset_in_addr(F2FS_INODE(node_page)); + max_addrs = DEF_ADDRS_PER_INODE; + } else { + base = 0; + max_addrs = DEF_ADDRS_PER_BLOCK; + } + + if (base + ofs_in_node >= max_addrs) { + f2fs_err(sbi, "Inconsistent blkaddr offset: base:%u, ofs_in_node:%u, max:%u, ino:%u, nid:%u", + base, ofs_in_node, max_addrs, dni->ino, dni->nid); f2fs_put_page(node_page, 1); return false; }
From: Minsuk Kang linuxlovemin@yonsei.ac.kr
commit 4920ab131b2dbae7464b72bdcac465d070254209 upstream.
This patch fixes slab-out-of-bounds reads in brcmfmac that occur in brcmf_construct_chaninfo() and brcmf_enable_bw40_2g() when the count value of channel specifications provided by the device is greater than the length of 'list->element[]', decided by the size of the 'list' allocated with kzalloc(). The patch adds checks that make the functions free the buffer and return -EINVAL if that is the case. Note that the negative return is handled by the caller, brcmf_setup_wiphybands() or brcmf_cfg80211_attach().
Found by a modified version of syzkaller.
Crash Report from brcmf_construct_chaninfo(): ================================================================== BUG: KASAN: slab-out-of-bounds in brcmf_setup_wiphybands+0x1238/0x1430 Read of size 4 at addr ffff888115f24600 by task kworker/0:2/1896
CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G W O 5.14.0+ #132 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Workqueue: usb_hub_wq hub_event Call Trace: dump_stack_lvl+0x57/0x7d print_address_description.constprop.0.cold+0x93/0x334 kasan_report.cold+0x83/0xdf brcmf_setup_wiphybands+0x1238/0x1430 brcmf_cfg80211_attach+0x2118/0x3fd0 brcmf_attach+0x389/0xd40 brcmf_usb_probe+0x12de/0x1690 usb_probe_interface+0x25f/0x710 really_probe+0x1be/0xa90 __driver_probe_device+0x2ab/0x460 driver_probe_device+0x49/0x120 __device_attach_driver+0x18a/0x250 bus_for_each_drv+0x123/0x1a0 __device_attach+0x207/0x330 bus_probe_device+0x1a2/0x260 device_add+0xa61/0x1ce0 usb_set_configuration+0x984/0x1770 usb_generic_driver_probe+0x69/0x90 usb_probe_device+0x9c/0x220 really_probe+0x1be/0xa90 __driver_probe_device+0x2ab/0x460 driver_probe_device+0x49/0x120 __device_attach_driver+0x18a/0x250 bus_for_each_drv+0x123/0x1a0 __device_attach+0x207/0x330 bus_probe_device+0x1a2/0x260 device_add+0xa61/0x1ce0 usb_new_device.cold+0x463/0xf66 hub_event+0x10d5/0x3330 process_one_work+0x873/0x13e0 worker_thread+0x8b/0xd10 kthread+0x379/0x450 ret_from_fork+0x1f/0x30
Allocated by task 1896: kasan_save_stack+0x1b/0x40 __kasan_kmalloc+0x7c/0x90 kmem_cache_alloc_trace+0x19e/0x330 brcmf_setup_wiphybands+0x290/0x1430 brcmf_cfg80211_attach+0x2118/0x3fd0 brcmf_attach+0x389/0xd40 brcmf_usb_probe+0x12de/0x1690 usb_probe_interface+0x25f/0x710 really_probe+0x1be/0xa90 __driver_probe_device+0x2ab/0x460 driver_probe_device+0x49/0x120 __device_attach_driver+0x18a/0x250 bus_for_each_drv+0x123/0x1a0 __device_attach+0x207/0x330 bus_probe_device+0x1a2/0x260 device_add+0xa61/0x1ce0 usb_set_configuration+0x984/0x1770 usb_generic_driver_probe+0x69/0x90 usb_probe_device+0x9c/0x220 really_probe+0x1be/0xa90 __driver_probe_device+0x2ab/0x460 driver_probe_device+0x49/0x120 __device_attach_driver+0x18a/0x250 bus_for_each_drv+0x123/0x1a0 __device_attach+0x207/0x330 bus_probe_device+0x1a2/0x260 device_add+0xa61/0x1ce0 usb_new_device.cold+0x463/0xf66 hub_event+0x10d5/0x3330 process_one_work+0x873/0x13e0 worker_thread+0x8b/0xd10 kthread+0x379/0x450 ret_from_fork+0x1f/0x30
The buggy address belongs to the object at ffff888115f24000 which belongs to the cache kmalloc-2k of size 2048 The buggy address is located 1536 bytes inside of 2048-byte region [ffff888115f24000, ffff888115f24800)
Memory state around the buggy address: ffff888115f24500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff888115f24580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888115f24600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^ ffff888115f24680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff888115f24700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ==================================================================
Crash Report from brcmf_enable_bw40_2g(): ================================================================== BUG: KASAN: slab-out-of-bounds in brcmf_cfg80211_attach+0x3d11/0x3fd0 Read of size 4 at addr ffff888103787600 by task kworker/0:2/1896
CPU: 0 PID: 1896 Comm: kworker/0:2 Tainted: G W O 5.14.0+ #132 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Workqueue: usb_hub_wq hub_event Call Trace: dump_stack_lvl+0x57/0x7d print_address_description.constprop.0.cold+0x93/0x334 kasan_report.cold+0x83/0xdf brcmf_cfg80211_attach+0x3d11/0x3fd0 brcmf_attach+0x389/0xd40 brcmf_usb_probe+0x12de/0x1690 usb_probe_interface+0x25f/0x710 really_probe+0x1be/0xa90 __driver_probe_device+0x2ab/0x460 driver_probe_device+0x49/0x120 __device_attach_driver+0x18a/0x250 bus_for_each_drv+0x123/0x1a0 __device_attach+0x207/0x330 bus_probe_device+0x1a2/0x260 device_add+0xa61/0x1ce0 usb_set_configuration+0x984/0x1770 usb_generic_driver_probe+0x69/0x90 usb_probe_device+0x9c/0x220 really_probe+0x1be/0xa90 __driver_probe_device+0x2ab/0x460 driver_probe_device+0x49/0x120 __device_attach_driver+0x18a/0x250 bus_for_each_drv+0x123/0x1a0 __device_attach+0x207/0x330 bus_probe_device+0x1a2/0x260 device_add+0xa61/0x1ce0 usb_new_device.cold+0x463/0xf66 hub_event+0x10d5/0x3330 process_one_work+0x873/0x13e0 worker_thread+0x8b/0xd10 kthread+0x379/0x450 ret_from_fork+0x1f/0x30
Allocated by task 1896: kasan_save_stack+0x1b/0x40 __kasan_kmalloc+0x7c/0x90 kmem_cache_alloc_trace+0x19e/0x330 brcmf_cfg80211_attach+0x3302/0x3fd0 brcmf_attach+0x389/0xd40 brcmf_usb_probe+0x12de/0x1690 usb_probe_interface+0x25f/0x710 really_probe+0x1be/0xa90 __driver_probe_device+0x2ab/0x460 driver_probe_device+0x49/0x120 __device_attach_driver+0x18a/0x250 bus_for_each_drv+0x123/0x1a0 __device_attach+0x207/0x330 bus_probe_device+0x1a2/0x260 device_add+0xa61/0x1ce0 usb_set_configuration+0x984/0x1770 usb_generic_driver_probe+0x69/0x90 usb_probe_device+0x9c/0x220 really_probe+0x1be/0xa90 __driver_probe_device+0x2ab/0x460 driver_probe_device+0x49/0x120 __device_attach_driver+0x18a/0x250 bus_for_each_drv+0x123/0x1a0 __device_attach+0x207/0x330 bus_probe_device+0x1a2/0x260 device_add+0xa61/0x1ce0 usb_new_device.cold+0x463/0xf66 hub_event+0x10d5/0x3330 process_one_work+0x873/0x13e0 worker_thread+0x8b/0xd10 kthread+0x379/0x450 ret_from_fork+0x1f/0x30
The buggy address belongs to the object at ffff888103787000 which belongs to the cache kmalloc-2k of size 2048 The buggy address is located 1536 bytes inside of 2048-byte region [ffff888103787000, ffff888103787800)
Memory state around the buggy address: ffff888103787500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff888103787580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888103787600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^ ffff888103787680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff888103787700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ==================================================================
Reported-by: Dokyung Song dokyungs@yonsei.ac.kr Reported-by: Jisoo Jang jisoo.jang@yonsei.ac.kr Reported-by: Minsuk Kang linuxlovemin@yonsei.ac.kr Reviewed-by: Arend van Spriel arend.vanspriel@broadcom.com Signed-off-by: Minsuk Kang linuxlovemin@yonsei.ac.kr Signed-off-by: Kalle Valo kvalo@kernel.org Link: https://lore.kernel.org/r/20221116142952.518241-1-linuxlovemin@yonsei.ac.kr Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c | 17 ++++++++++++ 1 file changed, 17 insertions(+)
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c @@ -87,6 +87,9 @@ #define BRCMF_ASSOC_PARAMS_FIXED_SIZE \ (sizeof(struct brcmf_assoc_params_le) - sizeof(u16))
+#define BRCMF_MAX_CHANSPEC_LIST \ + (BRCMF_DCMD_MEDLEN / sizeof(__le32) - 1) + static bool check_vif_up(struct brcmf_cfg80211_vif *vif) { if (!test_bit(BRCMF_VIF_STATUS_READY, &vif->sme_state)) { @@ -6067,6 +6070,13 @@ static int brcmf_construct_chaninfo(stru band->channels[i].flags = IEEE80211_CHAN_DISABLED;
total = le32_to_cpu(list->count); + if (total > BRCMF_MAX_CHANSPEC_LIST) { + bphy_err(drvr, "Invalid count of channel Spec. (%u)\n", + total); + err = -EINVAL; + goto fail_pbuf; + } + for (i = 0; i < total; i++) { ch.chspec = (u16)le32_to_cpu(list->element[i]); cfg->d11inf.decchspec(&ch); @@ -6212,6 +6222,13 @@ static int brcmf_enable_bw40_2g(struct b band = cfg_to_wiphy(cfg)->bands[NL80211_BAND_2GHZ]; list = (struct brcmf_chanspec_list *)pbuf; num_chan = le32_to_cpu(list->count); + if (num_chan > BRCMF_MAX_CHANSPEC_LIST) { + bphy_err(drvr, "Invalid count of channel Spec. (%u)\n", + num_chan); + kfree(pbuf); + return -EINVAL; + } + for (i = 0; i < num_chan; i++) { ch.chspec = (u16)le32_to_cpu(list->element[i]); cfg->d11inf.decchspec(&ch);
From: Andreas Kemnade andreas@kemnade.info
[ Upstream commit bffb7d9d1a3dbd09e083b88aefd093b3b10abbfb ]
VAC needs to be wired up to produce proper measurements, without this change only near zero values are reported.
Reported-by: kernel test robot lkp@intel.com Reported-by: Julia Lawall julia.lawall@lip6.fr Fixes: 1696f36482e7 ("iio: twl6030-gpadc: TWL6030, TWL6032 GPADC driver") Signed-off-by: Andreas Kemnade andreas@kemnade.info Link: https://lore.kernel.org/r/20221217221305.671117-1-andreas@kemnade.info Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iio/adc/twl6030-gpadc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iio/adc/twl6030-gpadc.c b/drivers/iio/adc/twl6030-gpadc.c index 845e994e113ec..0a76081747580 100644 --- a/drivers/iio/adc/twl6030-gpadc.c +++ b/drivers/iio/adc/twl6030-gpadc.c @@ -952,7 +952,7 @@ static int twl6030_gpadc_probe(struct platform_device *pdev) }
ret = twl_i2c_write_u8(TWL6030_MODULE_ID0, - VBAT_MEAS | BB_MEAS | BB_MEAS, + VBAT_MEAS | BB_MEAS | VAC_MEAS, TWL6030_MISC1); if (ret < 0) { dev_err(dev, "failed to wire up inputs\n");
From: Josef Bacik josef@toxicpanda.com
commit 3c538de0f2a74d50aff7278c092f88ae59cee688 upstream.
There was a recent regression in btrfs/177 that started happening with the size class patches ("btrfs: introduce size class to block group allocator"). This however isn't a regression introduced by those patches, but rather the bug was uncovered by a change in behavior in these patches. The patches triggered more chunk allocations in the ^free-space-tree case, which uncovered a race with device shrink.
The problem is we will set the device total size to the new size, and use this to find a hole for a device extent. However during shrink we may have device extents allocated past this range, so we could potentially find a hole in a range past our new shrink size. We don't actually limit our found extent to the device size anywhere, we assume that we will not find a hole past our device size. This isn't true with shrink as we're relocating block groups and thus creating holes past the device size.
Fix this by making sure we do not search past the new device size, and if we wander into any device extents that start after our device size simply break from the loop and use whatever hole we've already found.
CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Josef Bacik josef@toxicpanda.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/volumes.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1701,7 +1701,7 @@ again: goto out; }
- while (1) { + while (search_start < search_end) { l = path->nodes[0]; slot = path->slots[0]; if (slot >= btrfs_header_nritems(l)) { @@ -1724,6 +1724,9 @@ again: if (key.type != BTRFS_DEV_EXTENT_KEY) goto next;
+ if (key.offset > search_end) + break; + if (key.offset > search_start) { hole_size = key.offset - search_start;
@@ -1794,6 +1797,7 @@ next: else ret = 0;
+ ASSERT(max_hole_start + max_hole_size <= search_end); out: btrfs_free_path(path); *start = max_hole_start;
From: Alexander Potapenko glider@google.com
commit eadd7deca0ad8a83edb2b894d8326c78e78635d6 upstream.
KMSAN reports uses of uninitialized memory in zlib's longest_match() called on memory originating from zlib_alloc_workspace(). This issue is known by zlib maintainers and is claimed to be harmless, but to be on the safe side we'd better initialize the memory.
Link: https://zlib.net/zlib_faq.html#faq36 Reported-by: syzbot+14d9e7602ebdf7ec0a60@syzkaller.appspotmail.com CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Alexander Potapenko glider@google.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/zlib.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/btrfs/zlib.c +++ b/fs/btrfs/zlib.c @@ -74,7 +74,7 @@ static struct list_head *zlib_alloc_work
workspacesize = max(zlib_deflate_workspacesize(MAX_WBITS, MAX_MEM_LEVEL), zlib_inflate_workspacesize()); - workspace->strm.workspace = kvmalloc(workspacesize, GFP_KERNEL); + workspace->strm.workspace = kvzalloc(workspacesize, GFP_KERNEL); workspace->level = level; workspace->buf = kmalloc(PAGE_SIZE, GFP_KERNEL); if (!workspace->strm.workspace || !workspace->buf)
From: Artemii Karasev karasev@ispras.ru
commit 6a32425f953b955b4ff82f339d01df0b713caa5d upstream.
snd_emux_xg_control() can be called with an argument 'param' greater than size of 'control' array. It may lead to accessing 'control' array at a wrong index.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Artemii Karasev karasev@ispras.ru Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230207132026.2870-1-karasev@ispras.ru Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/synth/emux/emux_nrpn.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/sound/synth/emux/emux_nrpn.c +++ b/sound/synth/emux/emux_nrpn.c @@ -349,6 +349,9 @@ int snd_emux_xg_control(struct snd_emux_port *port, struct snd_midi_channel *chan, int param) { + if (param >= ARRAY_SIZE(chan->control)) + return -EINVAL; + return send_converted_effect(xg_effects, ARRAY_SIZE(xg_effects), port, chan, param, chan->control[param],
From: Shiju Jose shiju.jose@huawei.com
commit 3e46d910d8acf94e5360126593b68bf4fee4c4a1 upstream.
poll() and select() on per_cpu trace_pipe and trace_pipe_raw do not work since kernel 6.1-rc6. This issue is seen after the commit 42fb0a1e84ff525ebe560e2baf9451ab69127e2b ("tracing/ring-buffer: Have polling block on watermark").
This issue is firstly detected and reported, when testing the CXL error events in the rasdaemon and also erified using the test application for poll() and select().
This issue occurs for the per_cpu case, when calling the ring_buffer_poll_wait(), in kernel/trace/ring_buffer.c, with the buffer_percent > 0 and then wait until the percentage of pages are available. The default value set for the buffer_percent is 50 in the kernel/trace/trace.c.
As a fix, allow userspace application could set buffer_percent as 0 through the buffer_percent_fops, so that the task will wake up as soon as data is added to any of the specific cpu buffer.
Link: https://lore.kernel.org/linux-trace-kernel/20230202182309.742-2-shiju.jose@h...
Cc: mhiramat@kernel.org Cc: mchehab@kernel.org Cc: linux-edac@vger.kernel.org Cc: stable@vger.kernel.org Fixes: 42fb0a1e84ff5 ("tracing/ring-buffer: Have polling block on watermark") Signed-off-by: Shiju Jose shiju.jose@huawei.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace.c | 3 --- 1 file changed, 3 deletions(-)
--- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -8298,9 +8298,6 @@ buffer_percent_write(struct file *filp, if (val > 100) return -EINVAL;
- if (!val) - val = 1; - tr->buffer_percent = val;
(*ppos)++;
From: Devid Antonio Filoni devid.filoni@egluetechnologies.com
commit 4ae5e1e97c44f4654516c1d41591a462ed62fa7b upstream.
The ISO 11783-5 standard, in "4.5.2 - Address claim requirements", states: d) No CF shall begin, or resume, transmission on the network until 250 ms after it has successfully claimed an address except when responding to a request for address-claimed.
But "Figure 6" and "Figure 7" in "4.5.4.2 - Address-claim prioritization" show that the CF begins the transmission after 250 ms from the first AC (address-claimed) message even if it sends another AC message during that time window to resolve the address contention with another CF.
As stated in "4.4.2.3 - Address-claimed message": In order to successfully claim an address, the CF sending an address claimed message shall not receive a contending claim from another CF for at least 250 ms.
As stated in "4.4.3.2 - NAME management (NM) message": 1) A commanding CF can d) request that a CF with a specified NAME transmit the address- claimed message with its current NAME. 2) A target CF shall d) send an address-claimed message in response to a request for a matching NAME
Taking the above arguments into account, the 250 ms wait is requested only during network initialization.
Do not restart the timer on AC message if both the NAME and the address match and so if the address has already been claimed (timer has expired) or the AC message has been sent to resolve the contention with another CF (timer is still running).
Signed-off-by: Devid Antonio Filoni devid.filoni@egluetechnologies.com Acked-by: Oleksij Rempel o.rempel@pengutronix.de Link: https://lore.kernel.org/all/20221125170418.34575-1-devid.filoni@egluetechnol... Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/can/j1939/address-claim.c | 40 +++++++++++++++++++++++++++++++++++ 1 file changed, 40 insertions(+)
diff --git a/net/can/j1939/address-claim.c b/net/can/j1939/address-claim.c index f33c47327927..ca4ad6cdd5cb 100644 --- a/net/can/j1939/address-claim.c +++ b/net/can/j1939/address-claim.c @@ -165,6 +165,46 @@ static void j1939_ac_process(struct j1939_priv *priv, struct sk_buff *skb) * leaving this function. */ ecu = j1939_ecu_get_by_name_locked(priv, name); + + if (ecu && ecu->addr == skcb->addr.sa) { + /* The ISO 11783-5 standard, in "4.5.2 - Address claim + * requirements", states: + * d) No CF shall begin, or resume, transmission on the + * network until 250 ms after it has successfully claimed + * an address except when responding to a request for + * address-claimed. + * + * But "Figure 6" and "Figure 7" in "4.5.4.2 - Address-claim + * prioritization" show that the CF begins the transmission + * after 250 ms from the first AC (address-claimed) message + * even if it sends another AC message during that time window + * to resolve the address contention with another CF. + * + * As stated in "4.4.2.3 - Address-claimed message": + * In order to successfully claim an address, the CF sending + * an address claimed message shall not receive a contending + * claim from another CF for at least 250 ms. + * + * As stated in "4.4.3.2 - NAME management (NM) message": + * 1) A commanding CF can + * d) request that a CF with a specified NAME transmit + * the address-claimed message with its current NAME. + * 2) A target CF shall + * d) send an address-claimed message in response to a + * request for a matching NAME + * + * Taking the above arguments into account, the 250 ms wait is + * requested only during network initialization. + * + * Do not restart the timer on AC message if both the NAME and + * the address match and so if the address has already been + * claimed (timer has expired) or the AC message has been sent + * to resolve the contention with another CF (timer is still + * running). + */ + goto out_ecu_put; + } + if (!ecu && j1939_address_is_unicast(skcb->addr.sa)) ecu = j1939_ecu_create_locked(priv, name);
From: Dean Luick dean.luick@cornelisnetworks.com
[ Upstream commit 6601fc0d15ffc20654e39486f9bef35567106d68 ]
Fix a resource leak if an error occurs.
Fixes: f404ca4c7ea8 ("IB/hfi1: Refactor hfi_user_exp_rcv_setup() IOCTL") Signed-off-by: Dean Luick dean.luick@cornelisnetworks.com Signed-off-by: Dennis Dalessandro dennis.dalessandro@cornelisnetworks.com Link: https://lore.kernel.org/r/167354736291.2132367.10894218740150168180.stgit@aw... Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/hfi1/file_ops.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/hw/hfi1/file_ops.c b/drivers/infiniband/hw/hfi1/file_ops.c index efd977f70f9ea..607e2636a6d16 100644 --- a/drivers/infiniband/hw/hfi1/file_ops.c +++ b/drivers/infiniband/hw/hfi1/file_ops.c @@ -1363,12 +1363,15 @@ static int user_exp_rcv_setup(struct hfi1_filedata *fd, unsigned long arg, addr = arg + offsetof(struct hfi1_tid_info, tidcnt); if (copy_to_user((void __user *)addr, &tinfo.tidcnt, sizeof(tinfo.tidcnt))) - return -EFAULT; + ret = -EFAULT;
addr = arg + offsetof(struct hfi1_tid_info, length); - if (copy_to_user((void __user *)addr, &tinfo.length, + if (!ret && copy_to_user((void __user *)addr, &tinfo.length, sizeof(tinfo.length))) ret = -EFAULT; + + if (ret) + hfi1_user_exp_rcv_invalid(fd, &tinfo); }
return ret;
From: Dragos Tatulea dtatulea@nvidia.com
[ Upstream commit e632291a2dbce45a24cddeb5fe28fe71d724ba43 ]
The cited commit creates child PKEY interfaces over netlink will multiple tx and rx queues, but some devices doesn't support more than 1 tx and 1 rx queues. This causes to a crash when traffic is sent over the PKEY interface due to the parent having a single queue but the child having multiple queues.
This patch fixes the number of queues to 1 for legacy IPoIB at the earliest possible point in time.
BUG: kernel NULL pointer dereference, address: 000000000000036b PGD 0 P4D 0 Oops: 0000 [#1] SMP CPU: 4 PID: 209665 Comm: python3 Not tainted 6.1.0_for_upstream_min_debug_2022_12_12_17_02 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:kmem_cache_alloc+0xcb/0x450 Code: ce 7e 49 8b 50 08 49 83 78 10 00 4d 8b 28 0f 84 cb 02 00 00 4d 85 ed 0f 84 c2 02 00 00 41 8b 44 24 28 48 8d 4a 01 49 8b 3c 24 <49> 8b 5c 05 00 4c 89 e8 65 48 0f c7 0f 0f 94 c0 84 c0 74 b8 41 8b RSP: 0018:ffff88822acbbab8 EFLAGS: 00010202 RAX: 0000000000000070 RBX: ffff8881c28e3e00 RCX: 00000000064f8dae RDX: 00000000064f8dad RSI: 0000000000000a20 RDI: 0000000000030d00 RBP: 0000000000000a20 R08: ffff8882f5d30d00 R09: ffff888104032f40 R10: ffff88810fade828 R11: 736f6d6570736575 R12: ffff88810081c000 R13: 00000000000002fb R14: ffffffff817fc865 R15: 0000000000000000 FS: 00007f9324ff9700(0000) GS:ffff8882f5d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000036b CR3: 00000001125af004 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> skb_clone+0x55/0xd0 ip6_finish_output2+0x3fe/0x690 ip6_finish_output+0xfa/0x310 ip6_send_skb+0x1e/0x60 udp_v6_send_skb+0x1e5/0x420 udpv6_sendmsg+0xb3c/0xe60 ? ip_mc_finish_output+0x180/0x180 ? __switch_to_asm+0x3a/0x60 ? __switch_to_asm+0x34/0x60 sock_sendmsg+0x33/0x40 __sys_sendto+0x103/0x160 ? _copy_to_user+0x21/0x30 ? kvm_clock_get_cycles+0xd/0x10 ? ktime_get_ts64+0x49/0xe0 __x64_sys_sendto+0x25/0x30 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7f9374f1ed14 Code: 42 41 f8 ff 44 8b 4c 24 2c 4c 8b 44 24 20 89 c5 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 34 89 ef 48 89 44 24 08 e8 68 41 f8 ff 48 8b RSP: 002b:00007f9324ff7bd0 EFLAGS: 00000293 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007f9324ff7cc8 RCX: 00007f9374f1ed14 RDX: 00000000000002fb RSI: 00007f93000052f0 RDI: 0000000000000030 RBP: 0000000000000000 R08: 00007f9324ff7d40 R09: 000000000000001c R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000 R13: 000000012a05f200 R14: 0000000000000001 R15: 00007f9374d57bdc </TASK>
Fixes: dbc94a0fb817 ("IB/IPoIB: Fix queue count inconsistency for PKEY child interfaces") Signed-off-by: Dragos Tatulea dtatulea@nvidia.com Link: https://lore.kernel.org/r/95eb6b74c7cf49fa46281f9d056d685c9fa11d38.167458457... Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/ulp/ipoib/ipoib_main.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 69ecf37053a81..3c3cc6af0a1ef 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -2171,6 +2171,14 @@ int ipoib_intf_init(struct ib_device *hca, u8 port, const char *name, rn->attach_mcast = ipoib_mcast_attach; rn->detach_mcast = ipoib_mcast_detach; rn->hca = hca; + + rc = netif_set_real_num_tx_queues(dev, 1); + if (rc) + goto out; + + rc = netif_set_real_num_rx_queues(dev, 1); + if (rc) + goto out; }
priv->rn_ops = dev->netdev_ops;
From: Tom Murphy murphyt7@tcd.ie
[ Upstream commit 781ca2de89bae1b1d2c96df9ef33e9a324415995 ]
Add a gfp_t parameter to the iommu_ops::map function. Remove the needless locking in the AMD iommu driver.
The iommu_ops::map function (or the iommu_map function which calls it) was always supposed to be sleepable (according to Joerg's comment in this thread: https://lore.kernel.org/patchwork/patch/977520/ ) and so should probably have had a "might_sleep()" since it was written. However currently the dma-iommu api can call iommu_map in an atomic context, which it shouldn't do. This doesn't cause any problems because any iommu driver which uses the dma-iommu api uses gfp_atomic in it's iommu_ops::map function. But doing this wastes the memory allocators atomic pools.
Signed-off-by: Tom Murphy murphyt7@tcd.ie Reviewed-by: Robin Murphy robin.murphy@arm.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Joerg Roedel jroedel@suse.de Stable-dep-of: b7e08a5a63a1 ("RDMA/usnic: use iommu_map_atomic() under spin_lock()") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iommu/amd_iommu.c | 3 ++- drivers/iommu/arm-smmu-v3.c | 2 +- drivers/iommu/arm-smmu.c | 2 +- drivers/iommu/dma-iommu.c | 6 ++--- drivers/iommu/exynos-iommu.c | 2 +- drivers/iommu/intel-iommu.c | 2 +- drivers/iommu/iommu.c | 43 +++++++++++++++++++++++++++++----- drivers/iommu/ipmmu-vmsa.c | 2 +- drivers/iommu/msm_iommu.c | 2 +- drivers/iommu/mtk_iommu.c | 2 +- drivers/iommu/mtk_iommu_v1.c | 2 +- drivers/iommu/omap-iommu.c | 2 +- drivers/iommu/qcom_iommu.c | 2 +- drivers/iommu/rockchip-iommu.c | 2 +- drivers/iommu/s390-iommu.c | 2 +- drivers/iommu/tegra-gart.c | 2 +- drivers/iommu/tegra-smmu.c | 2 +- drivers/iommu/virtio-iommu.c | 2 +- include/linux/iommu.h | 21 ++++++++++++++++- 19 files changed, 77 insertions(+), 26 deletions(-)
diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index c392930253a30..5e269f3dca3dd 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -3098,7 +3098,8 @@ static int amd_iommu_attach_device(struct iommu_domain *dom, }
static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova, - phys_addr_t paddr, size_t page_size, int iommu_prot) + phys_addr_t paddr, size_t page_size, int iommu_prot, + gfp_t gfp) { struct protection_domain *domain = to_pdomain(dom); int prot = 0; diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 02c2fb551f381..4f64c3a9ee88d 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -2451,7 +2451,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) }
static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova, - phys_addr_t paddr, size_t size, int prot) + phys_addr_t paddr, size_t size, int prot, gfp_t gfp) { struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 2185ea5191c15..775acae84d7d2 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -1160,7 +1160,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) }
static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova, - phys_addr_t paddr, size_t size, int prot) + phys_addr_t paddr, size_t size, int prot, gfp_t gfp) { struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops; struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu; diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 9c3e630c6c4c8..4fc8fb92d45ef 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -477,7 +477,7 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys, if (!iova) return DMA_MAPPING_ERROR;
- if (iommu_map(domain, iova, phys - iova_off, size, prot)) { + if (iommu_map_atomic(domain, iova, phys - iova_off, size, prot)) { iommu_dma_free_iova(cookie, iova, size); return DMA_MAPPING_ERROR; } @@ -612,7 +612,7 @@ static void *iommu_dma_alloc_remap(struct device *dev, size_t size, arch_dma_prep_coherent(sg_page(sg), sg->length); }
- if (iommu_map_sg(domain, iova, sgt.sgl, sgt.orig_nents, ioprot) + if (iommu_map_sg_atomic(domain, iova, sgt.sgl, sgt.orig_nents, ioprot) < size) goto out_free_sg;
@@ -872,7 +872,7 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg, * We'll leave any physical concatenation to the IOMMU driver's * implementation - it knows better than we do. */ - if (iommu_map_sg(domain, iova, sg, nents, prot) < iova_len) + if (iommu_map_sg_atomic(domain, iova, sg, nents, prot) < iova_len) goto out_free_iova;
return __finalise_sg(dev, sg, nents, iova); diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c index 31a9b9885653f..acf21e4237949 100644 --- a/drivers/iommu/exynos-iommu.c +++ b/drivers/iommu/exynos-iommu.c @@ -1077,7 +1077,7 @@ static int lv2set_page(sysmmu_pte_t *pent, phys_addr_t paddr, size_t size, */ static int exynos_iommu_map(struct iommu_domain *iommu_domain, unsigned long l_iova, phys_addr_t paddr, size_t size, - int prot) + int prot, gfp_t gfp) { struct exynos_iommu_domain *domain = to_exynos_domain(iommu_domain); sysmmu_pte_t *entry; diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index ff120d7ed3424..2a53fa7d2b47d 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -5458,7 +5458,7 @@ static void intel_iommu_aux_detach_device(struct iommu_domain *domain,
static int intel_iommu_map(struct iommu_domain *domain, unsigned long iova, phys_addr_t hpa, - size_t size, int iommu_prot) + size_t size, int iommu_prot, gfp_t gfp) { struct dmar_domain *dmar_domain = to_dmar_domain(domain); u64 max_addr; diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index c5758fb696cc8..ef8f0d1ac5ab1 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1858,8 +1858,8 @@ static size_t iommu_pgsize(struct iommu_domain *domain, return pgsize; }
-int iommu_map(struct iommu_domain *domain, unsigned long iova, - phys_addr_t paddr, size_t size, int prot) +int __iommu_map(struct iommu_domain *domain, unsigned long iova, + phys_addr_t paddr, size_t size, int prot, gfp_t gfp) { const struct iommu_ops *ops = domain->ops; unsigned long orig_iova = iova; @@ -1896,8 +1896,8 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,
pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx\n", iova, &paddr, pgsize); + ret = ops->map(domain, iova, paddr, pgsize, prot, gfp);
- ret = ops->map(domain, iova, paddr, pgsize, prot); if (ret) break;
@@ -1917,8 +1917,22 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,
return ret; } + +int iommu_map(struct iommu_domain *domain, unsigned long iova, + phys_addr_t paddr, size_t size, int prot) +{ + might_sleep(); + return __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL); +} EXPORT_SYMBOL_GPL(iommu_map);
+int iommu_map_atomic(struct iommu_domain *domain, unsigned long iova, + phys_addr_t paddr, size_t size, int prot) +{ + return __iommu_map(domain, iova, paddr, size, prot, GFP_ATOMIC); +} +EXPORT_SYMBOL_GPL(iommu_map_atomic); + static size_t __iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size, struct iommu_iotlb_gather *iotlb_gather) @@ -1995,8 +2009,9 @@ size_t iommu_unmap_fast(struct iommu_domain *domain, } EXPORT_SYMBOL_GPL(iommu_unmap_fast);
-size_t iommu_map_sg(struct iommu_domain *domain, unsigned long iova, - struct scatterlist *sg, unsigned int nents, int prot) +size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova, + struct scatterlist *sg, unsigned int nents, int prot, + gfp_t gfp) { size_t len = 0, mapped = 0; phys_addr_t start; @@ -2007,7 +2022,9 @@ size_t iommu_map_sg(struct iommu_domain *domain, unsigned long iova, phys_addr_t s_phys = sg_phys(sg);
if (len && s_phys != start + len) { - ret = iommu_map(domain, iova + mapped, start, len, prot); + ret = __iommu_map(domain, iova + mapped, start, + len, prot, gfp); + if (ret) goto out_err;
@@ -2035,8 +2052,22 @@ size_t iommu_map_sg(struct iommu_domain *domain, unsigned long iova, return 0;
} + +size_t iommu_map_sg(struct iommu_domain *domain, unsigned long iova, + struct scatterlist *sg, unsigned int nents, int prot) +{ + might_sleep(); + return __iommu_map_sg(domain, iova, sg, nents, prot, GFP_KERNEL); +} EXPORT_SYMBOL_GPL(iommu_map_sg);
+size_t iommu_map_sg_atomic(struct iommu_domain *domain, unsigned long iova, + struct scatterlist *sg, unsigned int nents, int prot) +{ + return __iommu_map_sg(domain, iova, sg, nents, prot, GFP_ATOMIC); +} +EXPORT_SYMBOL_GPL(iommu_map_sg_atomic); + int iommu_domain_window_enable(struct iommu_domain *domain, u32 wnd_nr, phys_addr_t paddr, u64 size, int prot) { diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c index 584eefab1dbcd..33874c7aea42c 100644 --- a/drivers/iommu/ipmmu-vmsa.c +++ b/drivers/iommu/ipmmu-vmsa.c @@ -724,7 +724,7 @@ static void ipmmu_detach_device(struct iommu_domain *io_domain, }
static int ipmmu_map(struct iommu_domain *io_domain, unsigned long iova, - phys_addr_t paddr, size_t size, int prot) + phys_addr_t paddr, size_t size, int prot, gfp_t gfp) { struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c index cba0097eba39c..696a6c53a69fd 100644 --- a/drivers/iommu/msm_iommu.c +++ b/drivers/iommu/msm_iommu.c @@ -504,7 +504,7 @@ static void msm_iommu_detach_dev(struct iommu_domain *domain, }
static int msm_iommu_map(struct iommu_domain *domain, unsigned long iova, - phys_addr_t pa, size_t len, int prot) + phys_addr_t pa, size_t len, int prot, gfp_t gfp) { struct msm_priv *priv = to_msm_priv(domain); unsigned long flags; diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c index 18d7c818a174c..dad44e10b3314 100644 --- a/drivers/iommu/mtk_iommu.c +++ b/drivers/iommu/mtk_iommu.c @@ -427,7 +427,7 @@ static void mtk_iommu_detach_device(struct iommu_domain *domain, }
static int mtk_iommu_map(struct iommu_domain *domain, unsigned long iova, - phys_addr_t paddr, size_t size, int prot) + phys_addr_t paddr, size_t size, int prot, gfp_t gfp) { struct mtk_iommu_domain *dom = to_mtk_domain(domain); struct mtk_iommu_data *data = mtk_iommu_get_m4u_data(); diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c index e31bd281e59d6..fa602bfe69512 100644 --- a/drivers/iommu/mtk_iommu_v1.c +++ b/drivers/iommu/mtk_iommu_v1.c @@ -295,7 +295,7 @@ static void mtk_iommu_detach_device(struct iommu_domain *domain, }
static int mtk_iommu_map(struct iommu_domain *domain, unsigned long iova, - phys_addr_t paddr, size_t size, int prot) + phys_addr_t paddr, size_t size, int prot, gfp_t gfp) { struct mtk_iommu_domain *dom = to_mtk_domain(domain); unsigned int page_num = size >> MT2701_IOMMU_PAGE_SHIFT; diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c index 09c6e1c680db9..be551cc34be45 100644 --- a/drivers/iommu/omap-iommu.c +++ b/drivers/iommu/omap-iommu.c @@ -1339,7 +1339,7 @@ static u32 iotlb_init_entry(struct iotlb_entry *e, u32 da, u32 pa, int pgsz) }
static int omap_iommu_map(struct iommu_domain *domain, unsigned long da, - phys_addr_t pa, size_t bytes, int prot) + phys_addr_t pa, size_t bytes, int prot, gfp_t gfp) { struct omap_iommu_domain *omap_domain = to_omap_domain(domain); struct device *dev = omap_domain->dev; diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c index b6e546b62a7cb..e377f7771e30b 100644 --- a/drivers/iommu/qcom_iommu.c +++ b/drivers/iommu/qcom_iommu.c @@ -419,7 +419,7 @@ static void qcom_iommu_detach_dev(struct iommu_domain *domain, struct device *de }
static int qcom_iommu_map(struct iommu_domain *domain, unsigned long iova, - phys_addr_t paddr, size_t size, int prot) + phys_addr_t paddr, size_t size, int prot, gfp_t gfp) { int ret; unsigned long flags; diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c index 0df091934361b..96f37d9d4d93c 100644 --- a/drivers/iommu/rockchip-iommu.c +++ b/drivers/iommu/rockchip-iommu.c @@ -758,7 +758,7 @@ static int rk_iommu_map_iova(struct rk_iommu_domain *rk_domain, u32 *pte_addr, }
static int rk_iommu_map(struct iommu_domain *domain, unsigned long _iova, - phys_addr_t paddr, size_t size, int prot) + phys_addr_t paddr, size_t size, int prot, gfp_t gfp) { struct rk_iommu_domain *rk_domain = to_rk_domain(domain); unsigned long flags; diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c index 3b0b18e23187c..1137f3ddcb851 100644 --- a/drivers/iommu/s390-iommu.c +++ b/drivers/iommu/s390-iommu.c @@ -265,7 +265,7 @@ static int s390_iommu_update_trans(struct s390_domain *s390_domain, }
static int s390_iommu_map(struct iommu_domain *domain, unsigned long iova, - phys_addr_t paddr, size_t size, int prot) + phys_addr_t paddr, size_t size, int prot, gfp_t gfp) { struct s390_domain *s390_domain = to_s390_domain(domain); int flags = ZPCI_PTE_VALID, rc = 0; diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c index 3924f7c055440..3fb7ba72507de 100644 --- a/drivers/iommu/tegra-gart.c +++ b/drivers/iommu/tegra-gart.c @@ -178,7 +178,7 @@ static inline int __gart_iommu_map(struct gart_device *gart, unsigned long iova, }
static int gart_iommu_map(struct iommu_domain *domain, unsigned long iova, - phys_addr_t pa, size_t bytes, int prot) + phys_addr_t pa, size_t bytes, int prot, gfp_t gfp) { struct gart_device *gart = gart_handle; int ret; diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c index dd486233e2828..576be3f245daa 100644 --- a/drivers/iommu/tegra-smmu.c +++ b/drivers/iommu/tegra-smmu.c @@ -651,7 +651,7 @@ static void tegra_smmu_set_pte(struct tegra_smmu_as *as, unsigned long iova, }
static int tegra_smmu_map(struct iommu_domain *domain, unsigned long iova, - phys_addr_t paddr, size_t size, int prot) + phys_addr_t paddr, size_t size, int prot, gfp_t gfp) { struct tegra_smmu_as *as = to_smmu_as(domain); dma_addr_t pte_dma; diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index 60e659a24f90b..37e2267acf295 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -715,7 +715,7 @@ static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev) }
static int viommu_map(struct iommu_domain *domain, unsigned long iova, - phys_addr_t paddr, size_t size, int prot) + phys_addr_t paddr, size_t size, int prot, gfp_t gfp) { int ret; u32 flags; diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 29bac5345563a..6ca3fb2873d7d 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -256,7 +256,7 @@ struct iommu_ops { int (*attach_dev)(struct iommu_domain *domain, struct device *dev); void (*detach_dev)(struct iommu_domain *domain, struct device *dev); int (*map)(struct iommu_domain *domain, unsigned long iova, - phys_addr_t paddr, size_t size, int prot); + phys_addr_t paddr, size_t size, int prot, gfp_t gfp); size_t (*unmap)(struct iommu_domain *domain, unsigned long iova, size_t size, struct iommu_iotlb_gather *iotlb_gather); void (*flush_iotlb_all)(struct iommu_domain *domain); @@ -421,6 +421,8 @@ extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev); extern struct iommu_domain *iommu_get_dma_domain(struct device *dev); extern int iommu_map(struct iommu_domain *domain, unsigned long iova, phys_addr_t paddr, size_t size, int prot); +extern int iommu_map_atomic(struct iommu_domain *domain, unsigned long iova, + phys_addr_t paddr, size_t size, int prot); extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size); extern size_t iommu_unmap_fast(struct iommu_domain *domain, @@ -428,6 +430,9 @@ extern size_t iommu_unmap_fast(struct iommu_domain *domain, struct iommu_iotlb_gather *iotlb_gather); extern size_t iommu_map_sg(struct iommu_domain *domain, unsigned long iova, struct scatterlist *sg,unsigned int nents, int prot); +extern size_t iommu_map_sg_atomic(struct iommu_domain *domain, + unsigned long iova, struct scatterlist *sg, + unsigned int nents, int prot); extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova); extern void iommu_set_fault_handler(struct iommu_domain *domain, iommu_fault_handler_t handler, void *token); @@ -662,6 +667,13 @@ static inline int iommu_map(struct iommu_domain *domain, unsigned long iova, return -ENODEV; }
+static inline int iommu_map_atomic(struct iommu_domain *domain, + unsigned long iova, phys_addr_t paddr, + size_t size, int prot) +{ + return -ENODEV; +} + static inline size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size) { @@ -682,6 +694,13 @@ static inline size_t iommu_map_sg(struct iommu_domain *domain, return 0; }
+static inline size_t iommu_map_sg_atomic(struct iommu_domain *domain, + unsigned long iova, struct scatterlist *sg, + unsigned int nents, int prot) +{ + return 0; +} + static inline void iommu_flush_tlb_all(struct iommu_domain *domain) { }
From: Yang Yingliang yangyingliang@huawei.com
[ Upstream commit b7e08a5a63a11627601915473c3b569c1f6c6c06 ]
usnic_uiom_map_sorted_intervals() is called under spin_lock(), iommu_map() might sleep, use iommu_map_atomic() to avoid potential sleep in atomic context.
Fixes: e3cf00d0a87f ("IB/usnic: Add Cisco VIC low-level hardware driver") Signed-off-by: Yang Yingliang yangyingliang@huawei.com Link: https://lore.kernel.org/r/20230129093757.637354-1-yangyingliang@huawei.com Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/usnic/usnic_uiom.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c index 62e6ffa9ad78e..1d060ae47933e 100644 --- a/drivers/infiniband/hw/usnic/usnic_uiom.c +++ b/drivers/infiniband/hw/usnic/usnic_uiom.c @@ -281,8 +281,8 @@ static int usnic_uiom_map_sorted_intervals(struct list_head *intervals, size = pa_end - pa_start + PAGE_SIZE; usnic_dbg("va 0x%lx pa %pa size 0x%zx flags 0x%x", va_start, &pa_start, size, flags); - err = iommu_map(pd->domain, va_start, pa_start, - size, flags); + err = iommu_map_atomic(pd->domain, va_start, + pa_start, size, flags); if (err) { usnic_err("Failed to map va 0x%lx pa %pa size 0x%zx with err %d\n", va_start, &pa_start, size, err); @@ -298,8 +298,8 @@ static int usnic_uiom_map_sorted_intervals(struct list_head *intervals, size = pa - pa_start + PAGE_SIZE; usnic_dbg("va 0x%lx pa %pa size 0x%zx flags 0x%x\n", va_start, &pa_start, size, flags); - err = iommu_map(pd->domain, va_start, pa_start, - size, flags); + err = iommu_map_atomic(pd->domain, va_start, + pa_start, size, flags); if (err) { usnic_err("Failed to map va 0x%lx pa %pa size 0x%zx with err %d\n", va_start, &pa_start, size, err);
From: Christian Hopps chopps@chopps.org
[ Upstream commit 6028da3f125fec34425dbd5fec18e85d372b2af6 ]
When copying the DSCP bits for decap-dscp into IPv6 don't assume the outer encap is always IPv6. Instead, as with the inner IPv4 case, copy the DSCP bits from the correctly saved "tos" value in the control block.
Fixes: 227620e29509 ("[IPSEC]: Separate inner/outer mode processing on input") Signed-off-by: Christian Hopps chopps@chopps.org Acked-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Steffen Klassert steffen.klassert@secunet.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/xfrm/xfrm_input.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c index e120df0a6da13..4d8d7cf3d1994 100644 --- a/net/xfrm/xfrm_input.c +++ b/net/xfrm/xfrm_input.c @@ -274,8 +274,7 @@ static int xfrm6_remove_tunnel_encap(struct xfrm_state *x, struct sk_buff *skb) goto out;
if (x->props.flags & XFRM_STATE_DECAP_DSCP) - ipv6_copy_dscp(ipv6_get_dsfield(ipv6_hdr(skb)), - ipipv6_hdr(skb)); + ipv6_copy_dscp(XFRM_MODE_SKB_CB(skb)->tos, ipipv6_hdr(skb)); if (!(x->props.flags & XFRM_STATE_NOECN)) ipip6_ecn_decapsulate(skb);
From: Qi Zheng zhengqi.arch@bytedance.com
[ Upstream commit cbe83191d40d8925b7a99969d037d2a0caf69294 ]
Since commit ff9fb72bc077 ("debugfs: return error values, not NULL") changed return value of debugfs_rename() in error cases from %NULL to %ERR_PTR(-ERROR), we should also check error values instead of NULL.
Fixes: ff9fb72bc077 ("debugfs: return error values, not NULL") Signed-off-by: Qi Zheng zhengqi.arch@bytedance.com Acked-by: Jay Vosburgh jay.vosburgh@canonical.com Link: https://lore.kernel.org/r/20230202093256.32458-1-zhengqi.arch@bytedance.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/bonding/bond_debugfs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/bonding/bond_debugfs.c b/drivers/net/bonding/bond_debugfs.c index f3f86ef68ae0c..8b6cf2bf9025a 100644 --- a/drivers/net/bonding/bond_debugfs.c +++ b/drivers/net/bonding/bond_debugfs.c @@ -76,7 +76,7 @@ void bond_debug_reregister(struct bonding *bond)
d = debugfs_rename(bonding_debug_root, bond->debug_dir, bonding_debug_root, bond->dev->name); - if (d) { + if (!IS_ERR(d)) { bond->debug_dir = d; } else { netdev_warn(bond->dev, "failed to reregister, so just unregister old one\n");
From: Heiner Kallweit hkallweit1@gmail.com
[ Upstream commit 69ff53e4a4c9498eeed7d1441f68a1481dc69251 ]
Jerome provided the information that also the GXL internal PHY doesn't support MMD register access and EEE. MMD reads return 0xffff, what results in e.g. completely wrong ethtool --show-eee output. Therefore use the MMD dummy stubs.
Fixes: d853d145ea3e ("net: phy: add an option to disable EEE advertisement") Suggested-by: Jerome Brunet jbrunet@baylibre.com Signed-off-by: Heiner Kallweit hkallweit1@gmail.com Link: https://lore.kernel.org/r/84432fe4-0be4-bc82-4e5c-557206b40f56@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/phy/meson-gxl.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/phy/meson-gxl.c b/drivers/net/phy/meson-gxl.c index f7a9e6599a642..39151ec6f65e2 100644 --- a/drivers/net/phy/meson-gxl.c +++ b/drivers/net/phy/meson-gxl.c @@ -235,6 +235,8 @@ static struct phy_driver meson_gxl_phy[] = { .config_intr = meson_gxl_config_intr, .suspend = genphy_suspend, .resume = genphy_resume, + .read_mmd = genphy_read_mmd_unsupported, + .write_mmd = genphy_write_mmd_unsupported, }, { PHY_ID_MATCH_EXACT(0x01803301), .name = "Meson G12A Internal PHY",
From: Neel Patel neel.patel@amd.com
[ Upstream commit e8797a058466b60fc5a3291b92430c93ba90eaff ]
Clear the interrupt credits before enabling the queue rather than after to be sure that the enabled queue starts at 0 and that we don't wipe away possible credits after enabling the queue.
Fixes: 0f3154e6bcb3 ("ionic: Add Tx and Rx handling") Signed-off-by: Neel Patel neel.patel@amd.com Signed-off-by: Shannon Nelson shannon.nelson@amd.com Reviewed-by: Leon Romanovsky leonro@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/pensando/ionic/ionic_lif.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/pensando/ionic/ionic_lif.c b/drivers/net/ethernet/pensando/ionic/ionic_lif.c index f9c303d76658a..d0841836cf705 100644 --- a/drivers/net/ethernet/pensando/ionic/ionic_lif.c +++ b/drivers/net/ethernet/pensando/ionic/ionic_lif.c @@ -190,6 +190,7 @@ static int ionic_qcq_enable(struct ionic_qcq *qcq) .oper = IONIC_Q_ENABLE, }, }; + int ret;
idev = &lif->ionic->idev; dev = lif->ionic->dev; @@ -197,16 +198,24 @@ static int ionic_qcq_enable(struct ionic_qcq *qcq) dev_dbg(dev, "q_enable.index %d q_enable.qtype %d\n", ctx.cmd.q_control.index, ctx.cmd.q_control.type);
+ if (qcq->flags & IONIC_QCQ_F_INTR) + ionic_intr_clean(idev->intr_ctrl, qcq->intr.index); + + ret = ionic_adminq_post_wait(lif, &ctx); + if (ret) + return ret; + + if (qcq->napi.poll) + napi_enable(&qcq->napi); + if (qcq->flags & IONIC_QCQ_F_INTR) { irq_set_affinity_hint(qcq->intr.vector, &qcq->intr.affinity_mask); - napi_enable(&qcq->napi); - ionic_intr_clean(idev->intr_ctrl, qcq->intr.index); ionic_intr_mask(idev->intr_ctrl, qcq->intr.index, IONIC_INTR_MASK_CLEAR); }
- return ionic_adminq_post_wait(lif, &ctx); + return 0; }
static int ionic_qcq_disable(struct ionic_qcq *qcq)
From: Anirudh Venkataramanan anirudh.venkataramanan@intel.com
[ Upstream commit 4d159f7884f78b1aacb99b4fc37d1e3cb1194e39 ]
When both ice and the irdma driver are loaded, a warning in check_flush_dependency is being triggered. This is due to ice driver workqueue being allocated with the WQ_MEM_RECLAIM flag and the irdma one is not.
According to kernel documentation, this flag should be set if the workqueue will be involved in the kernel's memory reclamation flow. Since it is not, there is no need for the ice driver's WQ to have this flag set so remove it.
Example trace:
[ +0.000004] workqueue: WQ_MEM_RECLAIM ice:ice_service_task [ice] is flushing !WQ_MEM_RECLAIM infiniband:0x0 [ +0.000139] WARNING: CPU: 0 PID: 728 at kernel/workqueue.c:2632 check_flush_dependency+0x178/0x1a0 [ +0.000011] Modules linked in: bonding tls xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_cha in_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc rfkill vfat fat intel_rapl_msr intel _rapl_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct1 0dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_ core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_cm iw_cm iTCO_wdt iTCO_vendor_support ipmi_ssif irdma mei_me ib_uverbs ib_core intel_uncore joydev pcspkr i2c_i801 acpi_ipmi mei lpc_ich i2c_smbus intel_pch_thermal ioatdma ipmi_si acpi_power_meter acpi_pad xfs libcrc32c sd_mod t10_pi crc64_rocksoft crc64 sg ahci ixgbe libahci ice i40e igb crc32c_intel mdio i2c_algo_bit liba ta dca wmi dm_mirror dm_region_hash dm_log dm_mod ipmi_devintf ipmi_msghandler fuse [ +0.000161] [last unloaded: bonding] [ +0.000006] CPU: 0 PID: 728 Comm: kworker/0:2 Tainted: G S 6.2.0-rc2_next-queue-13jan-00458-gc20aabd57164 #1 [ +0.000006] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0010.010620200716 01/06/2020 [ +0.000003] Workqueue: ice ice_service_task [ice] [ +0.000127] RIP: 0010:check_flush_dependency+0x178/0x1a0 [ +0.000005] Code: 89 8e 02 01 e8 49 3d 40 00 49 8b 55 18 48 8d 8d d0 00 00 00 48 8d b3 d0 00 00 00 4d 89 e0 48 c7 c7 e0 3b 08 9f e8 bb d3 07 01 <0f> 0b e9 be fe ff ff 80 3d 24 89 8e 02 00 0f 85 6b ff ff ff e9 06 [ +0.000004] RSP: 0018:ffff88810a39f990 EFLAGS: 00010282 [ +0.000005] RAX: 0000000000000000 RBX: ffff888141bc2400 RCX: 0000000000000000 [ +0.000004] RDX: 0000000000000001 RSI: dffffc0000000000 RDI: ffffffffa1213a80 [ +0.000003] RBP: ffff888194bf3400 R08: ffffed117b306112 R09: ffffed117b306112 [ +0.000003] R10: ffff888bd983088b R11: ffffed117b306111 R12: 0000000000000000 [ +0.000003] R13: ffff888111f84d00 R14: ffff88810a3943ac R15: ffff888194bf3400 [ +0.000004] FS: 0000000000000000(0000) GS:ffff888bd9800000(0000) knlGS:0000000000000000 [ +0.000003] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000003] CR2: 000056035b208b60 CR3: 000000017795e005 CR4: 00000000007706f0 [ +0.000003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ +0.000003] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ +0.000002] PKRU: 55555554 [ +0.000003] Call Trace: [ +0.000002] <TASK> [ +0.000003] __flush_workqueue+0x203/0x840 [ +0.000006] ? mutex_unlock+0x84/0xd0 [ +0.000008] ? __pfx_mutex_unlock+0x10/0x10 [ +0.000004] ? __pfx___flush_workqueue+0x10/0x10 [ +0.000006] ? mutex_lock+0xa3/0xf0 [ +0.000005] ib_cache_cleanup_one+0x39/0x190 [ib_core] [ +0.000174] __ib_unregister_device+0x84/0xf0 [ib_core] [ +0.000094] ib_unregister_device+0x25/0x30 [ib_core] [ +0.000093] irdma_ib_unregister_device+0x97/0xc0 [irdma] [ +0.000064] ? __pfx_irdma_ib_unregister_device+0x10/0x10 [irdma] [ +0.000059] ? up_write+0x5c/0x90 [ +0.000005] irdma_remove+0x36/0x90 [irdma] [ +0.000062] auxiliary_bus_remove+0x32/0x50 [ +0.000007] device_release_driver_internal+0xfa/0x1c0 [ +0.000005] bus_remove_device+0x18a/0x260 [ +0.000007] device_del+0x2e5/0x650 [ +0.000005] ? __pfx_device_del+0x10/0x10 [ +0.000003] ? mutex_unlock+0x84/0xd0 [ +0.000004] ? __pfx_mutex_unlock+0x10/0x10 [ +0.000004] ? _raw_spin_unlock+0x18/0x40 [ +0.000005] ice_unplug_aux_dev+0x52/0x70 [ice] [ +0.000160] ice_service_task+0x1309/0x14f0 [ice] [ +0.000134] ? __pfx___schedule+0x10/0x10 [ +0.000006] process_one_work+0x3b1/0x6c0 [ +0.000008] worker_thread+0x69/0x670 [ +0.000005] ? __kthread_parkme+0xec/0x110 [ +0.000007] ? __pfx_worker_thread+0x10/0x10 [ +0.000005] kthread+0x17f/0x1b0 [ +0.000005] ? __pfx_kthread+0x10/0x10 [ +0.000004] ret_from_fork+0x29/0x50 [ +0.000009] </TASK>
Fixes: 940b61af02f4 ("ice: Initialize PF and setup miscellaneous interrupt") Signed-off-by: Anirudh Venkataramanan anirudh.venkataramanan@intel.com Signed-off-by: Marcin Szycik marcin.szycik@linux.intel.com Tested-by: Jakub Andrysiak jakub.andrysiak@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Reviewed-by: Leon Romanovsky leonro@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 7d28563ab7946..ae1d305672259 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -3211,7 +3211,7 @@ static int __init ice_module_init(void) pr_info("%s - version %s\n", ice_driver_string, ice_drv_ver); pr_info("%s\n", ice_copyright);
- ice_wq = alloc_workqueue("%s", WQ_MEM_RECLAIM, 0, KBUILD_MODNAME); + ice_wq = alloc_workqueue("%s", 0, 0, KBUILD_MODNAME); if (!ice_wq) { pr_err("Failed to create workqueue\n"); return -ENOMEM;
From: Pietro Borrello borrello@diag.uniroma1.it
[ Upstream commit f753a68980cf4b59a80fe677619da2b1804f526d ]
rds_rm_zerocopy_callback() uses list_entry() on the head of a list causing a type confusion. Use list_first_entry() to actually access the first element of the rs_zcookie_queue list.
Fixes: 9426bbc6de99 ("rds: use list structure to track information for zerocopy completion notification") Reviewed-by: Willem de Bruijn willemb@google.com Signed-off-by: Pietro Borrello borrello@diag.uniroma1.it Link: https://lore.kernel.org/r/20230202-rds-zerocopy-v3-1-83b0df974f9a@diag.uniro... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/rds/message.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/rds/message.c b/net/rds/message.c index 92b6b22884d4c..be6a0a073b12a 100644 --- a/net/rds/message.c +++ b/net/rds/message.c @@ -104,9 +104,9 @@ static void rds_rm_zerocopy_callback(struct rds_sock *rs, spin_lock_irqsave(&q->lock, flags); head = &q->zcookie_head; if (!list_empty(head)) { - info = list_entry(head, struct rds_msg_zcopy_info, - rs_zcookie_next); - if (info && rds_zcookie_add(info, cookie)) { + info = list_first_entry(head, struct rds_msg_zcopy_info, + rs_zcookie_next); + if (rds_zcookie_add(info, cookie)) { spin_unlock_irqrestore(&q->lock, flags); kfree(rds_info_from_znotifier(znotif)); /* caller invokes rds_wake_sk_sleep() */
From: Hangbin Liu liuhangbin@gmail.com
[ Upstream commit 3a082086aa200852545cf15159213582c0c80eba ]
When set/restore sysctl value, we should quote the value as some keys may have multi values, e.g. net.ipv4.ping_group_range
Fixes: f5ae57784ba8 ("selftests: forwarding: lib: Add sysctl_set(), sysctl_restore()") Signed-off-by: Hangbin Liu liuhangbin@gmail.com Reviewed-by: Petr Machata petrm@nvidia.com Link: https://lore.kernel.org/r/20230208032110.879205-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/forwarding/lib.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh index f190ad58e00d4..4d8845a68cf28 100644 --- a/tools/testing/selftests/net/forwarding/lib.sh +++ b/tools/testing/selftests/net/forwarding/lib.sh @@ -562,14 +562,14 @@ sysctl_set() local value=$1; shift
SYSCTL_ORIG[$key]=$(sysctl -n $key) - sysctl -qw $key=$value + sysctl -qw $key="$value" }
sysctl_restore() { local key=$1; shift
- sysctl -qw $key=${SYSCTL_ORIG["$key"]} + sysctl -qw $key="${SYSCTL_ORIG[$key]}" }
forwarding_enable()
From: Dan Carpenter error27@gmail.com
[ Upstream commit 5dac9f8dc25fefd9d928b98f6477ff3daefd73e3 ]
This loop accidentally reuses the "i" iterator for both the inside and the outside loop. The value of MAX_STREAM_BUFFER is 5. I believe that chip->rmh.stat_len is in the 2-12 range. If the value of .stat_len is 4 or more then it will loop exactly one time, but if it's less then it is a forever loop.
It looks like it was supposed to combined into one loop where conditions are checked.
Fixes: 8e6320064c33 ("ALSA: lx_core: Remove useless #if 0 .. #endif") Signed-off-by: Dan Carpenter error27@gmail.com Link: https://lore.kernel.org/r/Y9jnJTis/mRFJAQp@kili Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/pci/lx6464es/lx_core.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/sound/pci/lx6464es/lx_core.c b/sound/pci/lx6464es/lx_core.c index dd3a873777eb5..00975e86473c5 100644 --- a/sound/pci/lx6464es/lx_core.c +++ b/sound/pci/lx6464es/lx_core.c @@ -493,12 +493,11 @@ int lx_buffer_ask(struct lx6464es *chip, u32 pipe, int is_capture, dev_dbg(chip->card->dev, "CMD_08_ASK_BUFFERS: needed %d, freed %d\n", *r_needed, *r_freed); - for (i = 0; i < MAX_STREAM_BUFFER; ++i) { - for (i = 0; i != chip->rmh.stat_len; ++i) - dev_dbg(chip->card->dev, - " stat[%d]: %x, %x\n", i, - chip->rmh.stat[i], - chip->rmh.stat[i] & MASK_DATA_SIZE); + for (i = 0; i < MAX_STREAM_BUFFER && i < chip->rmh.stat_len; + ++i) { + dev_dbg(chip->card->dev, " stat[%d]: %x, %x\n", i, + chip->rmh.stat[i], + chip->rmh.stat[i] & MASK_DATA_SIZE); } }
From: Joel Stanley joel@jms.id.au
[ Upstream commit 287a344a11f1ebd31055cf9b22c88d7005f108d7 ]
The function signature is int, but we return a bool. Instead return a negative errno as the kerneldoc suggests.
Fixes: 4d3d0e4272d8 ("pinctrl: Add core support for Aspeed SoCs") Signed-off-by: Joel Stanley joel@jms.id.au Reviewed-by: Andrew Jeffery andrew@aj.id.au Link: https://lore.kernel.org/r/20230119231856.52014-1-joel@jms.id.au Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pinctrl/aspeed/pinctrl-aspeed.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pinctrl/aspeed/pinctrl-aspeed.c b/drivers/pinctrl/aspeed/pinctrl-aspeed.c index 22aca6d182c0c..2c1e799b029e7 100644 --- a/drivers/pinctrl/aspeed/pinctrl-aspeed.c +++ b/drivers/pinctrl/aspeed/pinctrl-aspeed.c @@ -115,7 +115,7 @@ static int aspeed_disable_sig(struct aspeed_pinmux_data *ctx, int ret = 0;
if (!exprs) - return true; + return -EINVAL;
while (*exprs && !ret) { ret = aspeed_sig_expr_disable(ctx, *exprs);
From: Maxim Korotkov korotkov.maxim.s@gmail.com
[ Upstream commit d2d73e6d4822140445ad4a7b1c6091e0f5fe703b ]
Added checking of pointer "function" in pcs_set_mux(). pinmux_generic_get_function() can return NULL and the pointer "function" was dereferenced without checking against NULL.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 571aec4df5b7 ("pinctrl: single: Use generic pinmux helpers for managing functions") Signed-off-by: Maxim Korotkov korotkov.maxim.s@gmail.com Reviewed-by: Tony Lindgren tony@atomide.com Link: https://lore.kernel.org/r/20221118104332.943-1-korotkov.maxim.s@gmail.com Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pinctrl/pinctrl-single.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/pinctrl/pinctrl-single.c b/drivers/pinctrl/pinctrl-single.c index 20c89023d312e..ce5be6f0b7aac 100644 --- a/drivers/pinctrl/pinctrl-single.c +++ b/drivers/pinctrl/pinctrl-single.c @@ -345,6 +345,8 @@ static int pcs_set_mux(struct pinctrl_dev *pctldev, unsigned fselector, if (!pcs->fmask) return 0; function = pinmux_generic_get_function(pctldev, fselector); + if (!function) + return -EINVAL; func = function->data; if (!func) return -EINVAL;
From: Andy Shevchenko andriy.shevchenko@linux.intel.com
[ Upstream commit a8520be3ffef3d25b53bf171a7ebe17ee0154175 ]
If the firmware mangled the register contents too much, check the saved value for the Direct IRQ mode. If it matches, we will restore the pin state.
Reported-by: Jim Minter jimminter@microsoft.com Fixes: 6989ea4881c8 ("pinctrl: intel: Save and restore pins in "direct IRQ" mode") Tested-by: Jim Minter jimminter@microsoft.com Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Acked-by: Mika Westerberg mika.westerberg@linux.intel.com Link: https://lore.kernel.org/r/20230206141558.20916-1-andriy.shevchenko@linux.int... Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pinctrl/intel/pinctrl-intel.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/drivers/pinctrl/intel/pinctrl-intel.c b/drivers/pinctrl/intel/pinctrl-intel.c index e0cb76f7e5407..32c6326337f73 100644 --- a/drivers/pinctrl/intel/pinctrl-intel.c +++ b/drivers/pinctrl/intel/pinctrl-intel.c @@ -1510,6 +1510,12 @@ int intel_pinctrl_probe_by_uid(struct platform_device *pdev) EXPORT_SYMBOL_GPL(intel_pinctrl_probe_by_uid);
#ifdef CONFIG_PM_SLEEP +static bool __intel_gpio_is_direct_irq(u32 value) +{ + return (value & PADCFG0_GPIROUTIOXAPIC) && (value & PADCFG0_GPIOTXDIS) && + (__intel_gpio_get_gpio_mode(value) == PADCFG0_PMODE_GPIO); +} + static bool intel_pinctrl_should_save(struct intel_pinctrl *pctrl, unsigned int pin) { const struct pin_desc *pd = pin_desc_get(pctrl->pctldev, pin); @@ -1543,8 +1549,7 @@ static bool intel_pinctrl_should_save(struct intel_pinctrl *pctrl, unsigned int * See https://bugzilla.kernel.org/show_bug.cgi?id=214749. */ value = readl(intel_get_padcfg(pctrl, pin, PADCFG0)); - if ((value & PADCFG0_GPIROUTIOXAPIC) && (value & PADCFG0_GPIOTXDIS) && - (__intel_gpio_get_gpio_mode(value) == PADCFG0_PMODE_GPIO)) + if (__intel_gpio_is_direct_irq(value)) return true;
return false; @@ -1656,7 +1661,12 @@ int intel_pinctrl_resume_noirq(struct device *dev) void __iomem *padcfg; u32 val;
- if (!intel_pinctrl_should_save(pctrl, desc->number)) + if (!(intel_pinctrl_should_save(pctrl, desc->number) || + /* + * If the firmware mangled the register contents too much, + * check the saved value for the Direct IRQ mode. + */ + __intel_gpio_is_direct_irq(pads[i].padcfg0))) continue;
padcfg = intel_get_padcfg(pctrl, desc->number, PADCFG0);
From: Alan Stern stern@rowland.harvard.edu
commit 811d581194f7412eda97acc03d17fc77824b561f upstream.
The syzbot fuzzer detected a bug in the plusb network driver: A zero-length control-OUT transfer was treated as a read instead of a write. In modern kernels this error provokes a WARNING:
usb 1-1: BOGUS control dir, pipe 80000280 doesn't match bRequestType c0 WARNING: CPU: 0 PID: 4645 at drivers/usb/core/urb.c:411 usb_submit_urb+0x14a7/0x1880 drivers/usb/core/urb.c:411 Modules linked in: CPU: 1 PID: 4645 Comm: dhcpcd Not tainted 6.2.0-rc6-syzkaller-00050-g9f266ccaa2f5 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023 RIP: 0010:usb_submit_urb+0x14a7/0x1880 drivers/usb/core/urb.c:411 ... Call Trace: <TASK> usb_start_wait_urb+0x101/0x4b0 drivers/usb/core/message.c:58 usb_internal_control_msg drivers/usb/core/message.c:102 [inline] usb_control_msg+0x320/0x4a0 drivers/usb/core/message.c:153 __usbnet_read_cmd+0xb9/0x390 drivers/net/usb/usbnet.c:2010 usbnet_read_cmd+0x96/0xf0 drivers/net/usb/usbnet.c:2068 pl_vendor_req drivers/net/usb/plusb.c:60 [inline] pl_set_QuickLink_features drivers/net/usb/plusb.c:75 [inline] pl_reset+0x2f/0xf0 drivers/net/usb/plusb.c:85 usbnet_open+0xcc/0x5d0 drivers/net/usb/usbnet.c:889 __dev_open+0x297/0x4d0 net/core/dev.c:1417 __dev_change_flags+0x587/0x750 net/core/dev.c:8530 dev_change_flags+0x97/0x170 net/core/dev.c:8602 devinet_ioctl+0x15a2/0x1d70 net/ipv4/devinet.c:1147 inet_ioctl+0x33f/0x380 net/ipv4/af_inet.c:979 sock_do_ioctl+0xcc/0x230 net/socket.c:1169 sock_ioctl+0x1f8/0x680 net/socket.c:1286 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl fs/ioctl.c:856 [inline] __x64_sys_ioctl+0x197/0x210 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
The fix is to call usbnet_write_cmd() instead of usbnet_read_cmd() and remove the USB_DIR_IN flag.
Reported-and-tested-by: syzbot+2a0e7abd24f1eb90ce25@syzkaller.appspotmail.com Signed-off-by: Alan Stern stern@rowland.harvard.edu Fixes: 090ffa9d0e90 ("[PATCH] USB: usbnet (9/9) module for pl2301/2302 cables") CC: stable@vger.kernel.org Link: https://lore.kernel.org/r/00000000000052099f05f3b3e298@google.com/ Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/usb/plusb.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
--- a/drivers/net/usb/plusb.c +++ b/drivers/net/usb/plusb.c @@ -57,9 +57,7 @@ static inline int pl_vendor_req(struct usbnet *dev, u8 req, u8 val, u8 index) { - return usbnet_read_cmd(dev, req, - USB_DIR_IN | USB_TYPE_VENDOR | - USB_RECIP_DEVICE, + return usbnet_write_cmd(dev, req, USB_TYPE_VENDOR | USB_RECIP_DEVICE, val, index, NULL, 0); }
From: Mark Pearson mpearson-lenovo@squebb.ca
commit 303e724d7b1e1a0a93daf0b1ab5f7c4f53543b34 upstream.
The Alcor Link AK9563 smartcard reader used on some Lenovo platforms doesn't work. If LPM is enabled the reader will provide an invalid usb config descriptor. Added quirk to disable LPM.
Verified fix on Lenovo P16 G1 and T14 G3
Tested-by: Miroslav Zatko mzatko@mirexoft.com Tested-by: Dennis Wassenberg dennis.wassenberg@secunet.com Cc: stable@vger.kernel.org Signed-off-by: Dennis Wassenberg dennis.wassenberg@secunet.com Signed-off-by: Mark Pearson mpearson-lenovo@squebb.ca Link: https://lore.kernel.org/r/20230208181223.1092654-1-mpearson-lenovo@squebb.ca Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/core/quirks.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/usb/core/quirks.c +++ b/drivers/usb/core/quirks.c @@ -527,6 +527,9 @@ static const struct usb_device_id usb_qu /* DJI CineSSD */ { USB_DEVICE(0x2ca3, 0x0031), .driver_info = USB_QUIRK_NO_LPM },
+ /* Alcor Link AK9563 SC Reader used in 2022 Lenovo ThinkPads */ + { USB_DEVICE(0x2ce3, 0x9563), .driver_info = USB_QUIRK_NO_LPM }, + /* DELL USB GEN2 */ { USB_DEVICE(0x413c, 0xb062), .driver_info = USB_QUIRK_NO_LPM | USB_QUIRK_RESET_RESUME },
From: Prashant Malani pmalani@chromium.org
commit 54e5c00a4eb0a4c663445b245f641bbfab142430 upstream.
While checking Pin Assignments of the port and partner during probe, we don't take into account whether the peripheral is a plug or receptacle.
This manifests itself in a mode entry failure on certain docks and dongles with captive cables. For instance, the Startech.com Type-C to DP dongle (Model #CDP2DP) advertises its DP VDO as 0x405. This would fail the Pin Assignment compatibility check, despite it supporting Pin Assignment C as a UFP.
Update the check to use the correct DP Pin Assign macros that take the peripheral's receptacle bit into account.
Fixes: c1e5c2f0cb8a ("usb: typec: altmodes/displayport: correct pin assignment for UFP receptacles") Cc: stable@vger.kernel.org Reported-by: Diana Zigterman dzigterman@chromium.org Signed-off-by: Prashant Malani pmalani@chromium.org Link: https://lore.kernel.org/r/20230208205318.131385-1-pmalani@chromium.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/typec/altmodes/displayport.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/usb/typec/altmodes/displayport.c +++ b/drivers/usb/typec/altmodes/displayport.c @@ -522,10 +522,10 @@ int dp_altmode_probe(struct typec_altmod /* FIXME: Port can only be DFP_U. */
/* Make sure we have compatiple pin configurations */ - if (!(DP_CAP_DFP_D_PIN_ASSIGN(port->vdo) & - DP_CAP_UFP_D_PIN_ASSIGN(alt->vdo)) && - !(DP_CAP_UFP_D_PIN_ASSIGN(port->vdo) & - DP_CAP_DFP_D_PIN_ASSIGN(alt->vdo))) + if (!(DP_CAP_PIN_ASSIGN_DFP_D(port->vdo) & + DP_CAP_PIN_ASSIGN_UFP_D(alt->vdo)) && + !(DP_CAP_PIN_ASSIGN_UFP_D(port->vdo) & + DP_CAP_PIN_ASSIGN_DFP_D(alt->vdo))) return -ENODEV;
ret = sysfs_create_group(&alt->dev.kobj, &dp_altmode_group);
From: Xiubo Li xiubli@redhat.com
commit e7d84c6a1296d059389f7342d9b4b7defb518d3a upstream.
MDS expects the completed cap release prior to responding to the session flush for cache drop.
Cc: stable@vger.kernel.org Link: http://tracker.ceph.com/issues/38009 Signed-off-by: Xiubo Li xiubli@redhat.com Reviewed-by: Venky Shankar vshankar@redhat.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ceph/mds_client.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -3151,6 +3151,12 @@ static void handle_session(struct ceph_m break;
case CEPH_SESSION_FLUSHMSG: + /* flush cap releases */ + spin_lock(&session->s_cap_lock); + if (session->s_num_cap_releases) + ceph_flush_cap_releases(mdsc, session); + spin_unlock(&session->s_cap_lock); + send_flushmsg_ack(mdsc, session, seq); break;
From: Guo Ren guoren@linux.alibaba.com
commit 950b879b7f0251317d26bae0687e72592d607532 upstream.
In commit 588a513d3425 ("arm64: Fix race condition on PG_dcache_clean in __sync_icache_dcache()"), we found RISC-V has the same issue as the previous arm64. The previous implementation didn't guarantee the correct sequence of operations, which means flush_icache_all() hasn't been called when the PG_dcache_clean was set. That would cause a risk of page synchronization.
Fixes: 08f051eda33b ("RISC-V: Flush I$ when making a dirty page executable") Signed-off-by: Guo Ren guoren@linux.alibaba.com Signed-off-by: Guo Ren guoren@kernel.org Reviewed-by: Andrew Jones ajones@ventanamicro.com Reviewed-by: Conor Dooley conor.dooley@microchip.com Link: https://lore.kernel.org/r/20230127035306.1819561-1-guoren@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/mm/cacheflush.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/arch/riscv/mm/cacheflush.c +++ b/arch/riscv/mm/cacheflush.c @@ -71,6 +71,8 @@ void flush_icache_pte(pte_t pte) { struct page *page = pte_page(pte);
- if (!test_and_set_bit(PG_dcache_clean, &page->flags)) + if (!test_bit(PG_dcache_clean, &page->flags)) { flush_icache_all(); + set_bit(PG_dcache_clean, &page->flags); + } }
From: Heiner Kallweit hkallweit1@gmail.com
commit 66e45351f7d6798751f98001d1fcd572024d87f0 upstream.
The usage of edge-triggered interrupts lead to lost interrupts under load, see [0]. This was confirmed to be fixed by using level-triggered interrupts. The report was about SDIO. However, as the host controller is the same for SD and MMC, apply the change to all mmc controller instances.
[0] https://www.spinics.net/lists/linux-mmc/msg73991.html
Fixes: ef8d2ffedf18 ("ARM64: dts: meson-gxbb: add MMC support") Cc: stable@vger.kernel.org Signed-off-by: Heiner Kallweit hkallweit1@gmail.com Acked-by: Neil Armstrong neil.armstrong@linaro.org Link: https://lore.kernel.org/r/76e042e0-a610-5ed5-209f-c4d7f879df44@gmail.com Signed-off-by: Neil Armstrong neil.armstrong@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/amlogic/meson-gx.dtsi | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/arch/arm64/boot/dts/amlogic/meson-gx.dtsi +++ b/arch/arm64/boot/dts/amlogic/meson-gx.dtsi @@ -528,21 +528,21 @@ sd_emmc_a: mmc@70000 { compatible = "amlogic,meson-gx-mmc", "amlogic,meson-gxbb-mmc"; reg = <0x0 0x70000 0x0 0x800>; - interrupts = <GIC_SPI 216 IRQ_TYPE_EDGE_RISING>; + interrupts = <GIC_SPI 216 IRQ_TYPE_LEVEL_HIGH>; status = "disabled"; };
sd_emmc_b: mmc@72000 { compatible = "amlogic,meson-gx-mmc", "amlogic,meson-gxbb-mmc"; reg = <0x0 0x72000 0x0 0x800>; - interrupts = <GIC_SPI 217 IRQ_TYPE_EDGE_RISING>; + interrupts = <GIC_SPI 217 IRQ_TYPE_LEVEL_HIGH>; status = "disabled"; };
sd_emmc_c: mmc@74000 { compatible = "amlogic,meson-gx-mmc", "amlogic,meson-gxbb-mmc"; reg = <0x0 0x74000 0x0 0x800>; - interrupts = <GIC_SPI 218 IRQ_TYPE_EDGE_RISING>; + interrupts = <GIC_SPI 218 IRQ_TYPE_LEVEL_HIGH>; status = "disabled"; }; };
From: Heiner Kallweit hkallweit1@gmail.com
commit ac8db4cceed218cca21c84f9d75ce88182d8b04f upstream.
The usage of edge-triggered interrupts lead to lost interrupts under load, see [0]. This was confirmed to be fixed by using level-triggered interrupts. The report was about SDIO. However, as the host controller is the same for SD and MMC, apply the change to all mmc controller instances.
[0] https://www.spinics.net/lists/linux-mmc/msg73991.html
Fixes: 4759fd87b928 ("arm64: dts: meson: g12a: add mmc nodes") Tested-by: FUKAUMI Naoki naoki@radxa.com Tested-by: Martin Blumenstingl martin.blumenstingl@googlemail.com Tested-by: Jerome Brunet jbrunet@baylibre.com Cc: stable@vger.kernel.org Signed-off-by: Heiner Kallweit hkallweit1@gmail.com Acked-by: Neil Armstrong neil.armstrong@linaro.org Link: https://lore.kernel.org/r/27d89baa-b8fa-baca-541b-ef17a97cde3c@gmail.com Signed-off-by: Neil Armstrong neil.armstrong@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/amlogic/meson-g12-common.dtsi | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/arch/arm64/boot/dts/amlogic/meson-g12-common.dtsi +++ b/arch/arm64/boot/dts/amlogic/meson-g12-common.dtsi @@ -2317,7 +2317,7 @@ sd_emmc_a: sd@ffe03000 { compatible = "amlogic,meson-axg-mmc"; reg = <0x0 0xffe03000 0x0 0x800>; - interrupts = <GIC_SPI 189 IRQ_TYPE_EDGE_RISING>; + interrupts = <GIC_SPI 189 IRQ_TYPE_LEVEL_HIGH>; status = "disabled"; clocks = <&clkc CLKID_SD_EMMC_A>, <&clkc CLKID_SD_EMMC_A_CLK0>, @@ -2329,7 +2329,7 @@ sd_emmc_b: sd@ffe05000 { compatible = "amlogic,meson-axg-mmc"; reg = <0x0 0xffe05000 0x0 0x800>; - interrupts = <GIC_SPI 190 IRQ_TYPE_EDGE_RISING>; + interrupts = <GIC_SPI 190 IRQ_TYPE_LEVEL_HIGH>; status = "disabled"; clocks = <&clkc CLKID_SD_EMMC_B>, <&clkc CLKID_SD_EMMC_B_CLK0>, @@ -2341,7 +2341,7 @@ sd_emmc_c: mmc@ffe07000 { compatible = "amlogic,meson-axg-mmc"; reg = <0x0 0xffe07000 0x0 0x800>; - interrupts = <GIC_SPI 191 IRQ_TYPE_EDGE_RISING>; + interrupts = <GIC_SPI 191 IRQ_TYPE_LEVEL_HIGH>; status = "disabled"; clocks = <&clkc CLKID_SD_EMMC_C>, <&clkc CLKID_SD_EMMC_C_CLK0>,
From: Heiner Kallweit hkallweit1@gmail.com
commit d182bcf300772d8b2e5f43e47fa0ebda2b767cc4 upstream.
The usage of edge-triggered interrupts lead to lost interrupts under load, see [0]. This was confirmed to be fixed by using level-triggered interrupts. The report was about SDIO. However, as the host controller is the same for SD and MMC, apply the change to all mmc controller instances.
[0] https://www.spinics.net/lists/linux-mmc/msg73991.html
Fixes: 221cf34bac54 ("ARM64: dts: meson-axg: enable the eMMC controller") Reported-by: Peter Suti peter.suti@streamunlimited.com Tested-by: Vyacheslav Bocharov adeep@lexina.in Tested-by: Peter Suti peter.suti@streamunlimited.com Cc: stable@vger.kernel.org Signed-off-by: Heiner Kallweit hkallweit1@gmail.com Acked-by: Neil Armstrong neil.armstrong@linaro.org Link: https://lore.kernel.org/r/c00655d3-02f8-6f5f-4239-ca2412420cad@gmail.com Signed-off-by: Neil Armstrong neil.armstrong@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/amlogic/meson-axg.dtsi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm64/boot/dts/amlogic/meson-axg.dtsi +++ b/arch/arm64/boot/dts/amlogic/meson-axg.dtsi @@ -1705,7 +1705,7 @@ sd_emmc_b: sd@5000 { compatible = "amlogic,meson-axg-mmc"; reg = <0x0 0x5000 0x0 0x800>; - interrupts = <GIC_SPI 217 IRQ_TYPE_EDGE_RISING>; + interrupts = <GIC_SPI 217 IRQ_TYPE_LEVEL_HIGH>; status = "disabled"; clocks = <&clkc CLKID_SD_EMMC_B>, <&clkc CLKID_SD_EMMC_B_CLK0>, @@ -1717,7 +1717,7 @@ sd_emmc_c: mmc@7000 { compatible = "amlogic,meson-axg-mmc"; reg = <0x0 0x7000 0x0 0x800>; - interrupts = <GIC_SPI 218 IRQ_TYPE_EDGE_RISING>; + interrupts = <GIC_SPI 218 IRQ_TYPE_LEVEL_HIGH>; status = "disabled"; clocks = <&clkc CLKID_SD_EMMC_C>, <&clkc CLKID_SD_EMMC_C_CLK0>,
From: Andy Shevchenko andriy.shevchenko@linux.intel.com
commit 0b85f59d30b91bd2b93ea7ef0816a4b7e7039e8c upstream.
It's unusual that we have enumeration by class in the middle of the table. It might potentially be problematic in the future if we add another entry after it.
So, move class matching entry to be the last in the ID table.
Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Reviewed-by: Keith Busch kbusch@kernel.org Reviewed-by: Sagi Grimberg sagi@grimberg.me Reviewed-by: Chaitanya Kulkarni chaitanya.kulkarni@wdc.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Gwendal Grignou gwendal@chromium.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/nvme/host/pci.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -3199,7 +3199,6 @@ static const struct pci_device_id nvme_i NVME_QUIRK_IGNORE_DEV_SUBNQN, }, { PCI_DEVICE(0x1c5c, 0x1504), /* SK Hynix PC400 */ .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, }, - { PCI_DEVICE_CLASS(PCI_CLASS_STORAGE_EXPRESS, 0xffffff) }, { PCI_DEVICE(0x2646, 0x2263), /* KINGSTON A2000 NVMe SSD */ .driver_data = NVME_QUIRK_NO_DEEPEST_PS, }, { PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2001), @@ -3209,6 +3208,8 @@ static const struct pci_device_id nvme_i .driver_data = NVME_QUIRK_SINGLE_VECTOR | NVME_QUIRK_128_BYTES_SQES | NVME_QUIRK_SHARED_TAGS }, + + { PCI_DEVICE_CLASS(PCI_CLASS_STORAGE_EXPRESS, 0xffffff) }, { 0, } }; MODULE_DEVICE_TABLE(pci, nvme_id_table);
From: Toke Høiland-Jørgensen toke@redhat.com
commit d1c362e1dd68a421cf9033404cf141a4ab734a5d upstream.
The bpf_fib_lookup() helper performs a neighbour lookup for the destination IP and returns BPF_FIB_LKUP_NO_NEIGH if this fails, with the expectation that the BPF program will pass the packet up the stack in this case. However, with the addition of bpf_redirect_neigh() that can be used instead to perform the neighbour lookup, at the cost of a bit of duplicated work.
For that we still need the target ifindex, and since bpf_fib_lookup() already has that at the time it performs the neighbour lookup, there is really no reason why it can't just return it in any case. So let's just always return the ifindex if the FIB lookup itself succeeds.
Signed-off-by: Toke Høiland-Jørgensen toke@redhat.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Cc: David Ahern dsahern@gmail.com Link: https://lore.kernel.org/bpf/20201009184234.134214-1-toke@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/filter.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/core/filter.c +++ b/net/core/filter.c @@ -4617,7 +4617,6 @@ static int bpf_fib_set_fwd_params(struct memcpy(params->smac, dev->dev_addr, ETH_ALEN); params->h_vlan_TCI = 0; params->h_vlan_proto = 0; - params->ifindex = dev->ifindex;
return 0; } @@ -4714,6 +4713,7 @@ static int bpf_ipv4_fib_lookup(struct ne dev = nhc->nhc_dev;
params->rt_metric = res.fi->fib_priority; + params->ifindex = dev->ifindex;
/* xdp and cls_bpf programs are run in RCU-bh so * rcu_read_lock_bh is not needed here @@ -4839,6 +4839,7 @@ static int bpf_ipv6_fib_lookup(struct ne
dev = res.nh->fib_nh_dev; params->rt_metric = res.f6i->fib6_metric; + params->ifindex = dev->ifindex;
/* xdp and cls_bpf programs are run in RCU-bh so rcu_read_lock_bh is * not needed here.
From: Mike Kravetz mike.kravetz@oracle.com
commit 73bdf65ea74857d7fb2ec3067a3cec0e261b1462 upstream.
migrate_pages/mempolicy semantics state that CAP_SYS_NICE is required to move pages shared with another process to a different node. page_mapcount
1 is being used to determine if a hugetlb page is shared. However, a
hugetlb page will have a mapcount of 1 if mapped by multiple processes via a shared PMD. As a result, hugetlb pages shared by multiple processes and mapped with a shared PMD can be moved by a process without CAP_SYS_NICE.
To fix, check for a shared PMD if mapcount is 1. If a shared PMD is found consider the page shared.
Link: https://lkml.kernel.org/r/20230126222721.222195-3-mike.kravetz@oracle.com Fixes: e2d8cf405525 ("migrate: add hugepage migration code to migrate_pages()") Signed-off-by: Mike Kravetz mike.kravetz@oracle.com Acked-by: Peter Xu peterx@redhat.com Acked-by: David Hildenbrand david@redhat.com Cc: James Houghton jthoughton@google.com Cc: Matthew Wilcox willy@infradead.org Cc: Michal Hocko mhocko@suse.com Cc: Muchun Song songmuchun@bytedance.com Cc: Naoya Horiguchi naoya.horiguchi@linux.dev Cc: Vishal Moola (Oracle) vishal.moola@gmail.com Cc: Yang Shi shy828301@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/mempolicy.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -571,7 +571,8 @@ static int queue_pages_hugetlb(pte_t *pt goto unlock; /* With MPOL_MF_MOVE, we migrate only unshared hugepage. */ if (flags & (MPOL_MF_MOVE_ALL) || - (flags & MPOL_MF_MOVE && page_mapcount(page) == 1)) + (flags & MPOL_MF_MOVE && page_mapcount(page) == 1 && + !hugetlb_pmd_shared(pte))) isolate_huge_page(page, qp->pagelist); unlock: spin_unlock(ptl);
From: Eduard Zingerman eddyz87@gmail.com
[ Upstream commit b9fa9bc839291020b362ab5392e5f18ba79657ac ]
A testcase to check that verifier.c:copy_register_state() preserves register parentage chain and livness information.
Signed-off-by: Eduard Zingerman eddyz87@gmail.com Link: https://lore.kernel.org/r/20230106142214.1040390-3-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../selftests/bpf/verifier/search_pruning.c | 36 +++++++++++++++++++ 1 file changed, 36 insertions(+)
diff --git a/tools/testing/selftests/bpf/verifier/search_pruning.c b/tools/testing/selftests/bpf/verifier/search_pruning.c index 7e50cb80873a5..7e36078f8f482 100644 --- a/tools/testing/selftests/bpf/verifier/search_pruning.c +++ b/tools/testing/selftests/bpf/verifier/search_pruning.c @@ -154,3 +154,39 @@ .result_unpriv = ACCEPT, .insn_processed = 15, }, +/* The test performs a conditional 64-bit write to a stack location + * fp[-8], this is followed by an unconditional 8-bit write to fp[-8], + * then data is read from fp[-8]. This sequence is unsafe. + * + * The test would be mistakenly marked as safe w/o dst register parent + * preservation in verifier.c:copy_register_state() function. + * + * Note the usage of BPF_F_TEST_STATE_FREQ to force creation of the + * checkpoint state after conditional 64-bit assignment. + */ +{ + "write tracking and register parent chain bug", + .insns = { + /* r6 = ktime_get_ns() */ + BPF_EMIT_CALL(BPF_FUNC_ktime_get_ns), + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), + /* r0 = ktime_get_ns() */ + BPF_EMIT_CALL(BPF_FUNC_ktime_get_ns), + /* if r0 > r6 goto +1 */ + BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_6, 1), + /* *(u64 *)(r10 - 8) = 0xdeadbeef */ + BPF_ST_MEM(BPF_DW, BPF_REG_FP, -8, 0xdeadbeef), + /* r1 = 42 */ + BPF_MOV64_IMM(BPF_REG_1, 42), + /* *(u8 *)(r10 - 8) = r1 */ + BPF_STX_MEM(BPF_B, BPF_REG_FP, BPF_REG_1, -8), + /* r2 = *(u64 *)(r10 - 8) */ + BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_FP, -8), + /* exit(0) */ + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .flags = BPF_F_TEST_STATE_FREQ, + .errstr = "invalid read from stack off -8+1 size 8", + .result = REJECT, +},
From: Arnd Bergmann arnd@arndb.de
[ Upstream commit e18c6da62edc780e4f4f3c9ce07bdacd69505182 ]
While looking through legacy platform data users, I noticed that the DT probing never uses data from the DT properties, as the platform_data structure gets overwritten directly after it is initialized.
There have never been any boards defining the platform_data in the mainline kernel either, so this driver so far only worked with patched kernels or with the default values.
For the benefit of possible downstream users, fix the DT probe by no longer overwriting the data.
Signed-off-by: Arnd Bergmann arnd@arndb.de Acked-by: Charles Keepax ckeepax@opensource.cirrus.com Link: https://lore.kernel.org/r/20230126162203.2986339-1-arnd@kernel.org Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/cs42l56.c | 6 ------ 1 file changed, 6 deletions(-)
diff --git a/sound/soc/codecs/cs42l56.c b/sound/soc/codecs/cs42l56.c index 8be7d83f0ce9a..732405587c5a4 100644 --- a/sound/soc/codecs/cs42l56.c +++ b/sound/soc/codecs/cs42l56.c @@ -1192,18 +1192,12 @@ static int cs42l56_i2c_probe(struct i2c_client *i2c_client, if (pdata) { cs42l56->pdata = *pdata; } else { - pdata = devm_kzalloc(&i2c_client->dev, sizeof(*pdata), - GFP_KERNEL); - if (!pdata) - return -ENOMEM; - if (i2c_client->dev.of_node) { ret = cs42l56_handle_of_data(i2c_client, &cs42l56->pdata); if (ret != 0) return ret; } - cs42l56->pdata = *pdata; }
if (cs42l56->pdata.gpio_nreset) {
From: Shunsuke Mie mie@igel.co.jp
[ Upstream commit 3f7b75abf41cc4143aa295f62acbb060a012868d ]
Fix the build caused by missing kmsan_handle_dma() and is_power_of_2() that are used in drivers/virtio/virtio_ring.c.
Signed-off-by: Shunsuke Mie mie@igel.co.jp Message-Id: 20230110034310.779744-1-mie@igel.co.jp Signed-off-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/virtio/linux/bug.h | 8 +++----- tools/virtio/linux/build_bug.h | 7 +++++++ tools/virtio/linux/cpumask.h | 7 +++++++ tools/virtio/linux/gfp.h | 7 +++++++ tools/virtio/linux/kernel.h | 1 + tools/virtio/linux/kmsan.h | 12 ++++++++++++ tools/virtio/linux/scatterlist.h | 1 + tools/virtio/linux/topology.h | 7 +++++++ 8 files changed, 45 insertions(+), 5 deletions(-) create mode 100644 tools/virtio/linux/build_bug.h create mode 100644 tools/virtio/linux/cpumask.h create mode 100644 tools/virtio/linux/gfp.h create mode 100644 tools/virtio/linux/kmsan.h create mode 100644 tools/virtio/linux/topology.h
diff --git a/tools/virtio/linux/bug.h b/tools/virtio/linux/bug.h index b14c2c3b6b857..74aef964f5099 100644 --- a/tools/virtio/linux/bug.h +++ b/tools/virtio/linux/bug.h @@ -1,11 +1,9 @@ /* SPDX-License-Identifier: GPL-2.0 */ -#ifndef BUG_H -#define BUG_H +#ifndef _LINUX_BUG_H +#define _LINUX_BUG_H
#define BUG_ON(__BUG_ON_cond) assert(!(__BUG_ON_cond))
-#define BUILD_BUG_ON(x) - #define BUG() abort()
-#endif /* BUG_H */ +#endif /* _LINUX_BUG_H */ diff --git a/tools/virtio/linux/build_bug.h b/tools/virtio/linux/build_bug.h new file mode 100644 index 0000000000000..cdbb75e28a604 --- /dev/null +++ b/tools/virtio/linux/build_bug.h @@ -0,0 +1,7 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_BUILD_BUG_H +#define _LINUX_BUILD_BUG_H + +#define BUILD_BUG_ON(x) + +#endif /* _LINUX_BUILD_BUG_H */ diff --git a/tools/virtio/linux/cpumask.h b/tools/virtio/linux/cpumask.h new file mode 100644 index 0000000000000..307da69d6b26c --- /dev/null +++ b/tools/virtio/linux/cpumask.h @@ -0,0 +1,7 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_CPUMASK_H +#define _LINUX_CPUMASK_H + +#include <linux/kernel.h> + +#endif /* _LINUX_CPUMASK_H */ diff --git a/tools/virtio/linux/gfp.h b/tools/virtio/linux/gfp.h new file mode 100644 index 0000000000000..43d146f236f14 --- /dev/null +++ b/tools/virtio/linux/gfp.h @@ -0,0 +1,7 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __LINUX_GFP_H +#define __LINUX_GFP_H + +#include <linux/topology.h> + +#endif diff --git a/tools/virtio/linux/kernel.h b/tools/virtio/linux/kernel.h index 6683b4a70b059..3325cdf229410 100644 --- a/tools/virtio/linux/kernel.h +++ b/tools/virtio/linux/kernel.h @@ -10,6 +10,7 @@ #include <stdarg.h>
#include <linux/compiler.h> +#include <linux/log2.h> #include <linux/types.h> #include <linux/printk.h> #include <linux/bug.h> diff --git a/tools/virtio/linux/kmsan.h b/tools/virtio/linux/kmsan.h new file mode 100644 index 0000000000000..272b5aa285d5a --- /dev/null +++ b/tools/virtio/linux/kmsan.h @@ -0,0 +1,12 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_KMSAN_H +#define _LINUX_KMSAN_H + +#include <linux/gfp.h> + +inline void kmsan_handle_dma(struct page *page, size_t offset, size_t size, + enum dma_data_direction dir) +{ +} + +#endif /* _LINUX_KMSAN_H */ diff --git a/tools/virtio/linux/scatterlist.h b/tools/virtio/linux/scatterlist.h index 369ee308b6686..74d9e1825748e 100644 --- a/tools/virtio/linux/scatterlist.h +++ b/tools/virtio/linux/scatterlist.h @@ -2,6 +2,7 @@ #ifndef SCATTERLIST_H #define SCATTERLIST_H #include <linux/kernel.h> +#include <linux/bug.h>
struct scatterlist { unsigned long page_link; diff --git a/tools/virtio/linux/topology.h b/tools/virtio/linux/topology.h new file mode 100644 index 0000000000000..910794afb993a --- /dev/null +++ b/tools/virtio/linux/topology.h @@ -0,0 +1,7 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_TOPOLOGY_H +#define _LINUX_TOPOLOGY_H + +#include <linux/cpumask.h> + +#endif /* _LINUX_TOPOLOGY_H */
From: Hyunwoo Kim v4bel@theori.io
[ Upstream commit 14caefcf9837a2be765a566005ad82cd0d2a429f ]
If you call listen() and accept() on an already connect()ed rose socket, accept() can successfully connect. This is because when the peer socket sends data to sendmsg, the skb with its own sk stored in the connected socket's sk->sk_receive_queue is connected, and rose_accept() dequeues the skb waiting in the sk->sk_receive_queue.
This creates a child socket with the sk of the parent rose socket, which can cause confusion.
Fix rose_listen() to return -EINVAL if the socket has already been successfully connected, and add lock_sock to prevent this issue.
Signed-off-by: Hyunwoo Kim v4bel@theori.io Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://lore.kernel.org/r/20230125105944.GA133314@ubuntu Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/rose/af_rose.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c index 95dda29058a0e..6fb158172ddc2 100644 --- a/net/rose/af_rose.c +++ b/net/rose/af_rose.c @@ -465,6 +465,12 @@ static int rose_listen(struct socket *sock, int backlog) { struct sock *sk = sock->sk;
+ lock_sock(sk); + if (sock->state != SS_UNCONNECTED) { + release_sock(sk); + return -EINVAL; + } + if (sk->sk_state != TCP_LISTEN) { struct rose_sock *rose = rose_sk(sk);
@@ -474,8 +480,10 @@ static int rose_listen(struct socket *sock, int backlog) memset(rose->dest_digis, 0, AX25_ADDR_LEN * ROSE_MAX_DIGIS); sk->sk_max_ack_backlog = backlog; sk->sk_state = TCP_LISTEN; + release_sock(sk); return 0; } + release_sock(sk);
return -EOPNOTSUPP; }
From: Andrey Konovalov andrey.konovalov@linaro.org
[ Upstream commit 54aa39a513dbf2164ca462a19f04519b2407a224 ]
Currently in phy_init_eee() the driver unconditionally configures the PHY to stop RX_CLK after entering Rx LPI state. This causes an LPI interrupt storm on my qcs404-base board.
Change the PHY initialization so that for "qcom,qcs404-ethqos" compatible device RX_CLK continues to run even in Rx LPI state.
Signed-off-by: Andrey Konovalov andrey.konovalov@linaro.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c | 2 ++ drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 ++- include/linux/stmmac.h | 1 + 3 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c index bfc4a92f1d92b..78be62ecc9a9a 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c @@ -505,6 +505,8 @@ static int qcom_ethqos_probe(struct platform_device *pdev) plat_dat->has_gmac4 = 1; plat_dat->pmt = 1; plat_dat->tso_en = of_property_read_bool(np, "snps,tso"); + if (of_device_is_compatible(np, "qcom,qcs404-ethqos")) + plat_dat->rx_clk_runs_in_lpi = 1;
ret = stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res); if (ret) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 3079e52546663..6a3b0f76d9729 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -932,7 +932,8 @@ static void stmmac_mac_link_up(struct phylink_config *config,
stmmac_mac_set(priv, priv->ioaddr, true); if (phy && priv->dma_cap.eee) { - priv->eee_active = phy_init_eee(phy, 1) >= 0; + priv->eee_active = + phy_init_eee(phy, !priv->plat->rx_clk_runs_in_lpi) >= 0; priv->eee_enabled = stmmac_eee_init(priv); stmmac_set_eee_pls(priv, priv->hw, true); } diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index 0b35747c9837a..88b107e20fc7c 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -178,6 +178,7 @@ struct plat_stmmacenet_data { int rss_en; int mac_port_sel_speed; bool en_tx_lpi_clockgating; + bool rx_clk_runs_in_lpi; int has_xgmac; bool sph_disable; };
From: Kees Cook keescook@chromium.org
[ Upstream commit de5ca4c3852f896cacac2bf259597aab5e17d9e3 ]
Nothing was explicitly bounds checking the priority index used to access clpriop[]. WARN and bail out early if it's pathological. Seen with GCC 13:
../net/sched/sch_htb.c: In function 'htb_activate_prios': ../net/sched/sch_htb.c:437:44: warning: array subscript [0, 31] is outside array bounds of 'struct htb_prio[8]' [-Warray-bounds=] 437 | if (p->inner.clprio[prio].feed.rb_node) | ~~~~~~~~~~~~~~~^~~~~~ ../net/sched/sch_htb.c:131:41: note: while referencing 'clprio' 131 | struct htb_prio clprio[TC_HTB_NUMPRIO]; | ^~~~~~
Cc: Jamal Hadi Salim jhs@mojatatu.com Cc: Cong Wang xiyou.wangcong@gmail.com Cc: Jiri Pirko jiri@resnulli.us Cc: "David S. Miller" davem@davemloft.net Cc: Eric Dumazet edumazet@google.com Cc: Jakub Kicinski kuba@kernel.org Cc: Paolo Abeni pabeni@redhat.com Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook keescook@chromium.org Reviewed-by: Simon Horman simon.horman@corigine.com Reviewed-by: Cong Wang cong.wang@bytedance.com Link: https://lore.kernel.org/r/20230127224036.never.561-kees@kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/sch_htb.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c index 8184c87da8bec..e635713cb41dd 100644 --- a/net/sched/sch_htb.c +++ b/net/sched/sch_htb.c @@ -405,7 +405,10 @@ static void htb_activate_prios(struct htb_sched *q, struct htb_class *cl) while (cl->cmode == HTB_MAY_BORROW && p && mask) { m = mask; while (m) { - int prio = ffz(~m); + unsigned int prio = ffz(~m); + + if (WARN_ON_ONCE(prio > ARRAY_SIZE(p->inner.clprio))) + break; m &= ~(1 << prio);
if (p->inner.clprio[prio].feed.rb_node)
Hello,
On Mon, 2023-02-20 at 14:35 +0100, Greg Kroah-Hartman wrote:
From: Kees Cook keescook@chromium.org
[ Upstream commit de5ca4c3852f896cacac2bf259597aab5e17d9e3 ]
Nothing was explicitly bounds checking the priority index used to access clpriop[]. WARN and bail out early if it's pathological. Seen with GCC 13:
../net/sched/sch_htb.c: In function 'htb_activate_prios': ../net/sched/sch_htb.c:437:44: warning: array subscript [0, 31] is outside array bounds of 'struct htb_prio[8]' [-Warray-bounds=] 437 | if (p->inner.clprio[prio].feed.rb_node) | ~~~~~~~~~~~~~~~^~~~~~ ../net/sched/sch_htb.c:131:41: note: while referencing 'clprio' 131 | struct htb_prio clprio[TC_HTB_NUMPRIO]; | ^~~~~~
Cc: Jamal Hadi Salim jhs@mojatatu.com Cc: Cong Wang xiyou.wangcong@gmail.com Cc: Jiri Pirko jiri@resnulli.us Cc: "David S. Miller" davem@davemloft.net Cc: Eric Dumazet edumazet@google.com Cc: Jakub Kicinski kuba@kernel.org Cc: Paolo Abeni pabeni@redhat.com Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook keescook@chromium.org Reviewed-by: Simon Horman simon.horman@corigine.com Reviewed-by: Cong Wang cong.wang@bytedance.com Link: https://lore.kernel.org/r/20230127224036.never.561-kees@kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org
This one has a follow-up which I don't see among the patches reaching the netdev ML:
commit 9cec2aaffe969f2a3e18b5ec105fc20bb908e475 Author: Dan Carpenter error27@gmail.com Date: Mon Feb 6 16:18:32 2023 +0300
net: sched: sch: Fix off by one in htb_activate_prios()
Cheers,
Paolo
On Tue, Feb 21, 2023 at 08:45:18AM +0100, Paolo Abeni wrote:
Hello,
On Mon, 2023-02-20 at 14:35 +0100, Greg Kroah-Hartman wrote:
From: Kees Cook keescook@chromium.org
[ Upstream commit de5ca4c3852f896cacac2bf259597aab5e17d9e3 ]
Nothing was explicitly bounds checking the priority index used to access clpriop[]. WARN and bail out early if it's pathological. Seen with GCC 13:
../net/sched/sch_htb.c: In function 'htb_activate_prios': ../net/sched/sch_htb.c:437:44: warning: array subscript [0, 31] is outside array bounds of 'struct htb_prio[8]' [-Warray-bounds=] 437 | if (p->inner.clprio[prio].feed.rb_node) | ~~~~~~~~~~~~~~~^~~~~~ ../net/sched/sch_htb.c:131:41: note: while referencing 'clprio' 131 | struct htb_prio clprio[TC_HTB_NUMPRIO]; | ^~~~~~
Cc: Jamal Hadi Salim jhs@mojatatu.com Cc: Cong Wang xiyou.wangcong@gmail.com Cc: Jiri Pirko jiri@resnulli.us Cc: "David S. Miller" davem@davemloft.net Cc: Eric Dumazet edumazet@google.com Cc: Jakub Kicinski kuba@kernel.org Cc: Paolo Abeni pabeni@redhat.com Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook keescook@chromium.org Reviewed-by: Simon Horman simon.horman@corigine.com Reviewed-by: Cong Wang cong.wang@bytedance.com Link: https://lore.kernel.org/r/20230127224036.never.561-kees@kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org
This one has a follow-up which I don't see among the patches reaching the netdev ML:
commit 9cec2aaffe969f2a3e18b5ec105fc20bb908e475 Author: Dan Carpenter error27@gmail.com Date: Mon Feb 6 16:18:32 2023 +0300
net: sched: sch: Fix off by one in htb_activate_prios()
This too is in the queue for 5.4 and newer kernels, are you sure you didn't miss that in this series?
thanks,
greg k-h
On Tue, 2023-02-21 at 09:41 +0100, Greg Kroah-Hartman wrote:
On Tue, Feb 21, 2023 at 08:45:18AM +0100, Paolo Abeni wrote:
Hello,
On Mon, 2023-02-20 at 14:35 +0100, Greg Kroah-Hartman wrote:
From: Kees Cook keescook@chromium.org
[ Upstream commit de5ca4c3852f896cacac2bf259597aab5e17d9e3 ]
Nothing was explicitly bounds checking the priority index used to access clpriop[]. WARN and bail out early if it's pathological. Seen with GCC 13:
../net/sched/sch_htb.c: In function 'htb_activate_prios': ../net/sched/sch_htb.c:437:44: warning: array subscript [0, 31] is outside array bounds of 'struct htb_prio[8]' [-Warray-bounds=] 437 | if (p->inner.clprio[prio].feed.rb_node) | ~~~~~~~~~~~~~~~^~~~~~ ../net/sched/sch_htb.c:131:41: note: while referencing 'clprio' 131 | struct htb_prio clprio[TC_HTB_NUMPRIO]; | ^~~~~~
Cc: Jamal Hadi Salim jhs@mojatatu.com Cc: Cong Wang xiyou.wangcong@gmail.com Cc: Jiri Pirko jiri@resnulli.us Cc: "David S. Miller" davem@davemloft.net Cc: Eric Dumazet edumazet@google.com Cc: Jakub Kicinski kuba@kernel.org Cc: Paolo Abeni pabeni@redhat.com Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook keescook@chromium.org Reviewed-by: Simon Horman simon.horman@corigine.com Reviewed-by: Cong Wang cong.wang@bytedance.com Link: https://lore.kernel.org/r/20230127224036.never.561-kees@kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org
This one has a follow-up which I don't see among the patches reaching the netdev ML:
commit 9cec2aaffe969f2a3e18b5ec105fc20bb908e475 Author: Dan Carpenter error27@gmail.com Date: Mon Feb 6 16:18:32 2023 +0300
net: sched: sch: Fix off by one in htb_activate_prios()
This too is in the queue for 5.4 and newer kernels, are you sure you didn't miss that in this series?
I missed it, sorry. I checked only my inbox and netdev, and it was not there, but I see it in the stable queue.
Sorry for the noise,
Paolo
From: Vasily Gorbik gor@linux.ibm.com
[ Upstream commit 7ab41c2c08a32132ba8c14624910e2fe8ce4ba4b ]
Historically calls to __decompress() didn't specify "out_len" parameter on many architectures including s390, expecting that no writes beyond uncompressed kernel image are performed. This has changed since commit 2aa14b1ab2c4 ("zstd: import usptream v1.5.2") which includes zstd library commit 6a7ede3dfccb ("Reduce size of dctx by reutilizing dst buffer (#2751)"). Now zstd decompression code might store literal buffer in the unwritten portion of the destination buffer. Since "out_len" is not set, it is considered to be unlimited and hence free to use for optimization needs. On s390 this might corrupt initrd or ipl report which are often placed right after the decompressor buffer. Luckily the size of uncompressed kernel image is already known to the decompressor, so to avoid the problem simply specify it in the "out_len" parameter.
Link: https://github.com/facebook/zstd/commit/6a7ede3dfccb Signed-off-by: Vasily Gorbik gor@linux.ibm.com Tested-by: Alexander Egorenkov egorenar@linux.ibm.com Link: https://lore.kernel.org/r/patch-1.thread-41c676.git-41c676c2d153.your-ad-her... Signed-off-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/s390/boot/compressed/decompressor.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/s390/boot/compressed/decompressor.c b/arch/s390/boot/compressed/decompressor.c index 45046630c56ac..c42ab33bd4524 100644 --- a/arch/s390/boot/compressed/decompressor.c +++ b/arch/s390/boot/compressed/decompressor.c @@ -80,6 +80,6 @@ void *decompress_kernel(void) void *output = (void *)decompress_offset;
__decompress(_compressed_start, _compressed_end - _compressed_start, - NULL, NULL, output, 0, NULL, error); + NULL, NULL, output, vmlinux.image_size, NULL, error); return output; }
From: Amit Engel Amit.Engel@dell.com
[ Upstream commit 0cab4404874f2de52617de8400c844891c6ea1ce ]
As part of nvmet_fc_ls_create_association there is a case where nvmet_fc_alloc_target_queue fails right after a new association with an admin queue is created. In this case, no one releases the get taken in nvmet_fc_alloc_target_assoc. This fix is adding the missing put.
Signed-off-by: Amit Engel Amit.Engel@dell.com Reviewed-by: James Smart jsmart2021@gmail.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/target/fc.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/target/fc.c b/drivers/nvme/target/fc.c index 9b07e8c7689ab..f74fc6481731d 100644 --- a/drivers/nvme/target/fc.c +++ b/drivers/nvme/target/fc.c @@ -1362,8 +1362,10 @@ nvmet_fc_ls_create_association(struct nvmet_fc_tgtport *tgtport, else { queue = nvmet_fc_alloc_target_queue(iod->assoc, 0, be16_to_cpu(rqst->assoc_cmd.sqsize)); - if (!queue) + if (!queue) { ret = VERR_QUEUE_ALLOC_FAIL; + nvmet_fc_tgt_a_put(iod->assoc); + } } }
From: Seth Jenkins sethjenkins@google.com
commit 81e9d6f8647650a7bead74c5f926e29970e834d1 upstream.
Commit e4a0d3e720e7 ("aio: Make it possible to remap aio ring") introduced a null-deref if mremap is called on an old aio mapping after fork as mm->ioctx_table will be set to NULL.
[jmoyer@redhat.com: fix 80 column issue] Link: https://lkml.kernel.org/r/x49sffq4nvg.fsf@segfault.boston.devel.redhat.com Fixes: e4a0d3e720e7 ("aio: Make it possible to remap aio ring") Signed-off-by: Seth Jenkins sethjenkins@google.com Signed-off-by: Jeff Moyer jmoyer@redhat.com Cc: Alexander Viro viro@zeniv.linux.org.uk Cc: Benjamin LaHaise bcrl@kvack.org Cc: Jann Horn jannh@google.com Cc: Pavel Emelyanov xemul@parallels.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/aio.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/fs/aio.c +++ b/fs/aio.c @@ -336,6 +336,9 @@ static int aio_ring_mremap(struct vm_are spin_lock(&mm->ioctx_lock); rcu_read_lock(); table = rcu_dereference(mm->ioctx_table); + if (!table) + goto out_unlock; + for (i = 0; i < table->nr; i++) { struct kioctx *ctx;
@@ -349,6 +352,7 @@ static int aio_ring_mremap(struct vm_are } }
+out_unlock: rcu_read_unlock(); spin_unlock(&mm->ioctx_lock); return res;
From: Anand Jain anand.jain@oracle.com
commit 5f58d783fd7823b2c2d5954d1126e702f94bfc4c upstream.
We have this check to make sure we don't accidentally add older devices that may have disappeared and re-appeared with an older generation from being added to an fs_devices (such as a replace source device). This makes sense, we don't want stale disks in our file system. However for single disks this doesn't really make sense.
I've seen this in testing, but I was provided a reproducer from a project that builds btrfs images on loopback devices. The loopback device gets cached with the new generation, and then if it is re-used to generate a new file system we'll fail to mount it because the new fs is "older" than what we have in cache.
Fix this by freeing the cache when closing the device for a single device filesystem. This will ensure that the mount command passed device path is scanned successfully during the next mount.
CC: stable@vger.kernel.org # 5.10+ Reported-by: Daan De Meyer daandemeyer@fb.com Signed-off-by: Josef Bacik josef@toxicpanda.com Signed-off-by: Anand Jain anand.jain@oracle.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Anand Jain anand.jain@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/volumes.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
--- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -354,6 +354,7 @@ void btrfs_free_device(struct btrfs_devi static void free_fs_devices(struct btrfs_fs_devices *fs_devices) { struct btrfs_device *device; + WARN_ON(fs_devices->opened); while (!list_empty(&fs_devices->devices)) { device = list_entry(fs_devices->devices.next, @@ -1401,6 +1402,17 @@ int btrfs_close_devices(struct btrfs_fs_ if (!fs_devices->opened) { seed_devices = fs_devices->seed; fs_devices->seed = NULL; + + /* + * If the struct btrfs_fs_devices is not assembled with any + * other device, it can be re-initialized during the next mount + * without the needing device-scan step. Therefore, it can be + * fully freed. + */ + if (fs_devices->num_devices == 1) { + list_del(&fs_devices->fs_list); + free_fs_devices(fs_devices); + } } mutex_unlock(&uuid_mutex);
From: Florian Westphal fw@strlen.de
commit 18bbc3213383a82b05383827f4b1b882e3f0a5a5 upstream.
TPROXY is only allowed from prerouting, but nft_tproxy doesn't check this. This fixes a crash (null dereference) when using tproxy from e.g. output.
Fixes: 4ed8eb6570a4 ("netfilter: nf_tables: Add native tproxy support") Reported-by: Shell Chen xierch@gmail.com Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Qingfang DENG dqfext@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nft_tproxy.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/net/netfilter/nft_tproxy.c +++ b/net/netfilter/nft_tproxy.c @@ -289,6 +289,13 @@ static int nft_tproxy_dump(struct sk_buf return 0; }
+static int nft_tproxy_validate(const struct nft_ctx *ctx, + const struct nft_expr *expr, + const struct nft_data **data) +{ + return nft_chain_validate_hooks(ctx->chain, 1 << NF_INET_PRE_ROUTING); +} + static struct nft_expr_type nft_tproxy_type; static const struct nft_expr_ops nft_tproxy_ops = { .type = &nft_tproxy_type, @@ -296,6 +303,7 @@ static const struct nft_expr_ops nft_tpr .eval = nft_tproxy_eval, .init = nft_tproxy_init, .dump = nft_tproxy_dump, + .validate = nft_tproxy_validate, };
static struct nft_expr_type nft_tproxy_type __read_mostly = {
From: Christoph Hellwig hch@lst.de
commit 82ff450b2d936d778361a1de43eb078cc043c7fe upstream.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Brian Foster bfoster@redhat.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Chandan Babu R chandan.babu@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_extfree_item.c | 2 +- fs/xfs/xfs_extfree_item.h | 10 +++++----- fs/xfs/xfs_log_recover.c | 4 ++-- fs/xfs/xfs_super.c | 2 +- 4 files changed, 9 insertions(+), 9 deletions(-)
--- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -161,7 +161,7 @@ xfs_efi_init(
ASSERT(nextents > 0); if (nextents > XFS_EFI_MAX_FAST_EXTENTS) { - size = (uint)(sizeof(xfs_efi_log_item_t) + + size = (uint)(sizeof(struct xfs_efi_log_item) + ((nextents - 1) * sizeof(xfs_extent_t))); efip = kmem_zalloc(size, 0); } else { --- a/fs/xfs/xfs_extfree_item.h +++ b/fs/xfs/xfs_extfree_item.h @@ -50,13 +50,13 @@ struct kmem_zone; * of commit failure or log I/O errors. Note that the EFD is not inserted in the * AIL, so at this point both the EFI and EFD are freed. */ -typedef struct xfs_efi_log_item { +struct xfs_efi_log_item { struct xfs_log_item efi_item; atomic_t efi_refcount; atomic_t efi_next_extent; unsigned long efi_flags; /* misc flags */ xfs_efi_log_format_t efi_format; -} xfs_efi_log_item_t; +};
/* * This is the "extent free done" log item. It is used to log @@ -65,7 +65,7 @@ typedef struct xfs_efi_log_item { */ typedef struct xfs_efd_log_item { struct xfs_log_item efd_item; - xfs_efi_log_item_t *efd_efip; + struct xfs_efi_log_item *efd_efip; uint efd_next_extent; xfs_efd_log_format_t efd_format; } xfs_efd_log_item_t; @@ -78,10 +78,10 @@ typedef struct xfs_efd_log_item { extern struct kmem_zone *xfs_efi_zone; extern struct kmem_zone *xfs_efd_zone;
-xfs_efi_log_item_t *xfs_efi_init(struct xfs_mount *, uint); +struct xfs_efi_log_item *xfs_efi_init(struct xfs_mount *, uint); int xfs_efi_copy_format(xfs_log_iovec_t *buf, xfs_efi_log_format_t *dst_efi_fmt); -void xfs_efi_item_free(xfs_efi_log_item_t *); +void xfs_efi_item_free(struct xfs_efi_log_item *); void xfs_efi_release(struct xfs_efi_log_item *);
int xfs_efi_recover(struct xfs_mount *mp, --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -3384,7 +3384,7 @@ xlog_recover_efd_pass2( struct xlog_recover_item *item) { xfs_efd_log_format_t *efd_formatp; - xfs_efi_log_item_t *efip = NULL; + struct xfs_efi_log_item *efip = NULL; struct xfs_log_item *lip; uint64_t efi_id; struct xfs_ail_cursor cur; @@ -3405,7 +3405,7 @@ xlog_recover_efd_pass2( lip = xfs_trans_ail_cursor_first(ailp, &cur, 0); while (lip != NULL) { if (lip->li_type == XFS_LI_EFI) { - efip = (xfs_efi_log_item_t *)lip; + efip = (struct xfs_efi_log_item *)lip; if (efip->efi_format.efi_id == efi_id) { /* * Drop the EFD reference to the EFI. This --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1920,7 +1920,7 @@ xfs_init_zones(void) if (!xfs_efd_zone) goto out_destroy_buf_item_zone;
- xfs_efi_zone = kmem_zone_init((sizeof(xfs_efi_log_item_t) + + xfs_efi_zone = kmem_zone_init((sizeof(struct xfs_efi_log_item) + ((XFS_EFI_MAX_FAST_EXTENTS - 1) * sizeof(xfs_extent_t))), "xfs_efi_item"); if (!xfs_efi_zone)
From: Christoph Hellwig hch@lst.de
commit c84e819090f39e96e4d432c9047a50d2424f99e0 upstream.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Brian Foster bfoster@redhat.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Chandan Babu R chandan.babu@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_extfree_item.h | 4 ++-- fs/xfs/xfs_super.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-)
--- a/fs/xfs/xfs_extfree_item.h +++ b/fs/xfs/xfs_extfree_item.h @@ -63,12 +63,12 @@ struct xfs_efi_log_item { * the fact that some extents earlier mentioned in an efi item * have been freed. */ -typedef struct xfs_efd_log_item { +struct xfs_efd_log_item { struct xfs_log_item efd_item; struct xfs_efi_log_item *efd_efip; uint efd_next_extent; xfs_efd_log_format_t efd_format; -} xfs_efd_log_item_t; +};
/* * Max number of extents in fast allocation path. --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1914,7 +1914,7 @@ xfs_init_zones(void) if (!xfs_buf_item_zone) goto out_destroy_trans_zone;
- xfs_efd_zone = kmem_zone_init((sizeof(xfs_efd_log_item_t) + + xfs_efd_zone = kmem_zone_init((sizeof(struct xfs_efd_log_item) + ((XFS_EFD_MAX_FAST_EXTENTS - 1) * sizeof(xfs_extent_t))), "xfs_efd_item"); if (!xfs_efd_zone)
From: Christoph Hellwig hch@lst.de
commit fd9cbe51215198ccffa64169c98eae35b0916088 upstream.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Brian Foster bfoster@redhat.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Chandan Babu R chandan.babu@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/libxfs/xfs_inode_fork.c | 2 +- fs/xfs/libxfs/xfs_trans_inode.c | 2 +- fs/xfs/xfs_inode.c | 4 ++-- fs/xfs/xfs_inode_item.c | 2 +- fs/xfs/xfs_inode_item.h | 4 ++-- fs/xfs/xfs_super.c | 4 ++-- 6 files changed, 9 insertions(+), 9 deletions(-)
--- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -592,7 +592,7 @@ void xfs_iflush_fork( xfs_inode_t *ip, xfs_dinode_t *dip, - xfs_inode_log_item_t *iip, + struct xfs_inode_log_item *iip, int whichfork) { char *cp; --- a/fs/xfs/libxfs/xfs_trans_inode.c +++ b/fs/xfs/libxfs/xfs_trans_inode.c @@ -27,7 +27,7 @@ xfs_trans_ijoin( struct xfs_inode *ip, uint lock_flags) { - xfs_inode_log_item_t *iip; + struct xfs_inode_log_item *iip;
ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL)); if (ip->i_itemp == NULL) --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -2555,7 +2555,7 @@ xfs_ifree_cluster( xfs_daddr_t blkno; xfs_buf_t *bp; xfs_inode_t *ip; - xfs_inode_log_item_t *iip; + struct xfs_inode_log_item *iip; struct xfs_log_item *lip; struct xfs_perag *pag; struct xfs_ino_geometry *igeo = M_IGEO(mp); @@ -2617,7 +2617,7 @@ xfs_ifree_cluster( */ list_for_each_entry(lip, &bp->b_li_list, li_bio_list) { if (lip->li_type == XFS_LI_INODE) { - iip = (xfs_inode_log_item_t *)lip; + iip = (struct xfs_inode_log_item *)lip; ASSERT(iip->ili_logged == 1); lip->li_cb = xfs_istale_done; xfs_trans_ail_copy_lsn(mp->m_ail, --- a/fs/xfs/xfs_inode_item.c +++ b/fs/xfs/xfs_inode_item.c @@ -781,7 +781,7 @@ xfs_iflush_abort( xfs_inode_t *ip, bool stale) { - xfs_inode_log_item_t *iip = ip->i_itemp; + struct xfs_inode_log_item *iip = ip->i_itemp;
if (iip) { if (test_bit(XFS_LI_IN_AIL, &iip->ili_item.li_flags)) { --- a/fs/xfs/xfs_inode_item.h +++ b/fs/xfs/xfs_inode_item.h @@ -13,7 +13,7 @@ struct xfs_bmbt_rec; struct xfs_inode; struct xfs_mount;
-typedef struct xfs_inode_log_item { +struct xfs_inode_log_item { struct xfs_log_item ili_item; /* common portion */ struct xfs_inode *ili_inode; /* inode ptr */ xfs_lsn_t ili_flush_lsn; /* lsn at last flush */ @@ -23,7 +23,7 @@ typedef struct xfs_inode_log_item { unsigned int ili_last_fields; /* fields when flushed */ unsigned int ili_fields; /* fields to be logged */ unsigned int ili_fsync_fields; /* logged since last fsync */ -} xfs_inode_log_item_t; +};
static inline int xfs_inode_clean(xfs_inode_t *ip) { --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1934,8 +1934,8 @@ xfs_init_zones(void) goto out_destroy_efi_zone;
xfs_ili_zone = - kmem_zone_init_flags(sizeof(xfs_inode_log_item_t), "xfs_ili", - KM_ZONE_SPREAD, NULL); + kmem_zone_init_flags(sizeof(struct xfs_inode_log_item), + "xfs_ili", KM_ZONE_SPREAD, NULL); if (!xfs_ili_zone) goto out_destroy_inode_zone; xfs_icreate_zone = kmem_zone_init(sizeof(struct xfs_icreate_item),
From: Christoph Hellwig hch@lst.de
commit e046e949486ec92d83b2ccdf0e7e9144f74ef028 upstream.
Create a helper that encapsulates the whole logic to create a defer intent. This reorders some of the work that was done, but none of that has an affect on the operation as only fields that don't directly interact are affected.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Brian Foster bfoster@redhat.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Chandan Babu R chandan.babu@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/libxfs/xfs_defer.c | 39 +++++++++++++++++++++++---------------- 1 file changed, 23 insertions(+), 16 deletions(-)
--- a/fs/xfs/libxfs/xfs_defer.c +++ b/fs/xfs/libxfs/xfs_defer.c @@ -178,6 +178,23 @@ static const struct xfs_defer_op_type *d [XFS_DEFER_OPS_TYPE_AGFL_FREE] = &xfs_agfl_free_defer_type, };
+static void +xfs_defer_create_intent( + struct xfs_trans *tp, + struct xfs_defer_pending *dfp, + bool sort) +{ + const struct xfs_defer_op_type *ops = defer_op_types[dfp->dfp_type]; + struct list_head *li; + + if (sort) + list_sort(tp->t_mountp, &dfp->dfp_work, ops->diff_items); + + dfp->dfp_intent = ops->create_intent(tp, dfp->dfp_count); + list_for_each(li, &dfp->dfp_work) + ops->log_item(tp, dfp->dfp_intent, li); +} + /* * For each pending item in the intake list, log its intent item and the * associated extents, then add the entire intake list to the end of @@ -187,17 +204,11 @@ STATIC void xfs_defer_create_intents( struct xfs_trans *tp) { - struct list_head *li; struct xfs_defer_pending *dfp; - const struct xfs_defer_op_type *ops;
list_for_each_entry(dfp, &tp->t_dfops, dfp_list) { - ops = defer_op_types[dfp->dfp_type]; - dfp->dfp_intent = ops->create_intent(tp, dfp->dfp_count); trace_xfs_defer_create_intent(tp->t_mountp, dfp); - list_sort(tp->t_mountp, &dfp->dfp_work, ops->diff_items); - list_for_each(li, &dfp->dfp_work) - ops->log_item(tp, dfp->dfp_intent, li); + xfs_defer_create_intent(tp, dfp, true); } }
@@ -427,17 +438,13 @@ xfs_defer_finish_noroll( } if (error == -EAGAIN) { /* - * Caller wants a fresh transaction, so log a - * new log intent item to replace the old one - * and roll the transaction. See "Requesting - * a Fresh Transaction while Finishing - * Deferred Work" above. + * Caller wants a fresh transaction, so log a new log + * intent item to replace the old one and roll the + * transaction. See "Requesting a Fresh Transaction + * while Finishing Deferred Work" above. */ - dfp->dfp_intent = ops->create_intent(*tp, - dfp->dfp_count); dfp->dfp_done = NULL; - list_for_each(li, &dfp->dfp_work) - ops->log_item(*tp, dfp->dfp_intent, li); + xfs_defer_create_intent(*tp, dfp, false); } else { /* Done with the dfp, free it. */ list_del(&dfp->dfp_list);
From: Christoph Hellwig hch@lst.de
commit c1f09188e8de0ae65433cb9c8ace4feb66359bcc upstream.
These are aways called together, and my merging them we reduce the amount of indirect calls, improve type safety and in general clean up the code a bit.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Brian Foster bfoster@redhat.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Chandan Babu R chandan.babu@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/libxfs/xfs_defer.c | 6 +---- fs/xfs/libxfs/xfs_defer.h | 4 +-- fs/xfs/xfs_bmap_item.c | 47 +++++++++++++++++-------------------------- fs/xfs/xfs_extfree_item.c | 49 ++++++++++++++++++--------------------------- fs/xfs/xfs_refcount_item.c | 48 ++++++++++++++++++-------------------------- fs/xfs/xfs_rmap_item.c | 48 ++++++++++++++++++-------------------------- 6 files changed, 83 insertions(+), 119 deletions(-)
--- a/fs/xfs/libxfs/xfs_defer.c +++ b/fs/xfs/libxfs/xfs_defer.c @@ -185,14 +185,12 @@ xfs_defer_create_intent( bool sort) { const struct xfs_defer_op_type *ops = defer_op_types[dfp->dfp_type]; - struct list_head *li;
if (sort) list_sort(tp->t_mountp, &dfp->dfp_work, ops->diff_items);
- dfp->dfp_intent = ops->create_intent(tp, dfp->dfp_count); - list_for_each(li, &dfp->dfp_work) - ops->log_item(tp, dfp->dfp_intent, li); + dfp->dfp_intent = ops->create_intent(tp, &dfp->dfp_work, + dfp->dfp_count); }
/* --- a/fs/xfs/libxfs/xfs_defer.h +++ b/fs/xfs/libxfs/xfs_defer.h @@ -50,8 +50,8 @@ struct xfs_defer_op_type { void (*finish_cleanup)(struct xfs_trans *, void *, int); void (*cancel_item)(struct list_head *); int (*diff_items)(void *, struct list_head *, struct list_head *); - void *(*create_intent)(struct xfs_trans *, uint); - void (*log_item)(struct xfs_trans *, void *, struct list_head *); + void *(*create_intent)(struct xfs_trans *tp, struct list_head *items, + unsigned int count); unsigned int max_items; };
--- a/fs/xfs/xfs_bmap_item.c +++ b/fs/xfs/xfs_bmap_item.c @@ -278,27 +278,6 @@ xfs_bmap_update_diff_items( return ba->bi_owner->i_ino - bb->bi_owner->i_ino; }
-/* Get an BUI. */ -STATIC void * -xfs_bmap_update_create_intent( - struct xfs_trans *tp, - unsigned int count) -{ - struct xfs_bui_log_item *buip; - - ASSERT(count == XFS_BUI_MAX_FAST_EXTENTS); - ASSERT(tp != NULL); - - buip = xfs_bui_init(tp->t_mountp); - ASSERT(buip != NULL); - - /* - * Get a log_item_desc to point at the new item. - */ - xfs_trans_add_item(tp, &buip->bui_item); - return buip; -} - /* Set the map extent flags for this mapping. */ static void xfs_trans_set_bmap_flags( @@ -326,16 +305,12 @@ xfs_trans_set_bmap_flags( STATIC void xfs_bmap_update_log_item( struct xfs_trans *tp, - void *intent, - struct list_head *item) + struct xfs_bui_log_item *buip, + struct xfs_bmap_intent *bmap) { - struct xfs_bui_log_item *buip = intent; - struct xfs_bmap_intent *bmap; uint next_extent; struct xfs_map_extent *map;
- bmap = container_of(item, struct xfs_bmap_intent, bi_list); - tp->t_flags |= XFS_TRANS_DIRTY; set_bit(XFS_LI_DIRTY, &buip->bui_item.li_flags);
@@ -355,6 +330,23 @@ xfs_bmap_update_log_item( bmap->bi_bmap.br_state); }
+STATIC void * +xfs_bmap_update_create_intent( + struct xfs_trans *tp, + struct list_head *items, + unsigned int count) +{ + struct xfs_bui_log_item *buip = xfs_bui_init(tp->t_mountp); + struct xfs_bmap_intent *bmap; + + ASSERT(count == XFS_BUI_MAX_FAST_EXTENTS); + + xfs_trans_add_item(tp, &buip->bui_item); + list_for_each_entry(bmap, items, bi_list) + xfs_bmap_update_log_item(tp, buip, bmap); + return buip; +} + /* Get an BUD so we can process all the deferred rmap updates. */ STATIC void * xfs_bmap_update_create_done( @@ -419,7 +411,6 @@ const struct xfs_defer_op_type xfs_bmap_ .diff_items = xfs_bmap_update_diff_items, .create_intent = xfs_bmap_update_create_intent, .abort_intent = xfs_bmap_update_abort_intent, - .log_item = xfs_bmap_update_log_item, .create_done = xfs_bmap_update_create_done, .finish_item = xfs_bmap_update_finish_item, .cancel_item = xfs_bmap_update_cancel_item, --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -412,41 +412,16 @@ xfs_extent_free_diff_items( XFS_FSB_TO_AGNO(mp, rb->xefi_startblock); }
-/* Get an EFI. */ -STATIC void * -xfs_extent_free_create_intent( - struct xfs_trans *tp, - unsigned int count) -{ - struct xfs_efi_log_item *efip; - - ASSERT(tp != NULL); - ASSERT(count > 0); - - efip = xfs_efi_init(tp->t_mountp, count); - ASSERT(efip != NULL); - - /* - * Get a log_item_desc to point at the new item. - */ - xfs_trans_add_item(tp, &efip->efi_item); - return efip; -} - /* Log a free extent to the intent item. */ STATIC void xfs_extent_free_log_item( struct xfs_trans *tp, - void *intent, - struct list_head *item) + struct xfs_efi_log_item *efip, + struct xfs_extent_free_item *free) { - struct xfs_efi_log_item *efip = intent; - struct xfs_extent_free_item *free; uint next_extent; struct xfs_extent *extp;
- free = container_of(item, struct xfs_extent_free_item, xefi_list); - tp->t_flags |= XFS_TRANS_DIRTY; set_bit(XFS_LI_DIRTY, &efip->efi_item.li_flags);
@@ -462,6 +437,24 @@ xfs_extent_free_log_item( extp->ext_len = free->xefi_blockcount; }
+STATIC void * +xfs_extent_free_create_intent( + struct xfs_trans *tp, + struct list_head *items, + unsigned int count) +{ + struct xfs_mount *mp = tp->t_mountp; + struct xfs_efi_log_item *efip = xfs_efi_init(mp, count); + struct xfs_extent_free_item *free; + + ASSERT(count > 0); + + xfs_trans_add_item(tp, &efip->efi_item); + list_for_each_entry(free, items, xefi_list) + xfs_extent_free_log_item(tp, efip, free); + return efip; +} + /* Get an EFD so we can process all the free extents. */ STATIC void * xfs_extent_free_create_done( @@ -516,7 +509,6 @@ const struct xfs_defer_op_type xfs_exten .diff_items = xfs_extent_free_diff_items, .create_intent = xfs_extent_free_create_intent, .abort_intent = xfs_extent_free_abort_intent, - .log_item = xfs_extent_free_log_item, .create_done = xfs_extent_free_create_done, .finish_item = xfs_extent_free_finish_item, .cancel_item = xfs_extent_free_cancel_item, @@ -582,7 +574,6 @@ const struct xfs_defer_op_type xfs_agfl_ .diff_items = xfs_extent_free_diff_items, .create_intent = xfs_extent_free_create_intent, .abort_intent = xfs_extent_free_abort_intent, - .log_item = xfs_extent_free_log_item, .create_done = xfs_extent_free_create_done, .finish_item = xfs_agfl_free_finish_item, .cancel_item = xfs_extent_free_cancel_item, --- a/fs/xfs/xfs_refcount_item.c +++ b/fs/xfs/xfs_refcount_item.c @@ -284,27 +284,6 @@ xfs_refcount_update_diff_items( XFS_FSB_TO_AGNO(mp, rb->ri_startblock); }
-/* Get an CUI. */ -STATIC void * -xfs_refcount_update_create_intent( - struct xfs_trans *tp, - unsigned int count) -{ - struct xfs_cui_log_item *cuip; - - ASSERT(tp != NULL); - ASSERT(count > 0); - - cuip = xfs_cui_init(tp->t_mountp, count); - ASSERT(cuip != NULL); - - /* - * Get a log_item_desc to point at the new item. - */ - xfs_trans_add_item(tp, &cuip->cui_item); - return cuip; -} - /* Set the phys extent flags for this reverse mapping. */ static void xfs_trans_set_refcount_flags( @@ -328,16 +307,12 @@ xfs_trans_set_refcount_flags( STATIC void xfs_refcount_update_log_item( struct xfs_trans *tp, - void *intent, - struct list_head *item) + struct xfs_cui_log_item *cuip, + struct xfs_refcount_intent *refc) { - struct xfs_cui_log_item *cuip = intent; - struct xfs_refcount_intent *refc; uint next_extent; struct xfs_phys_extent *ext;
- refc = container_of(item, struct xfs_refcount_intent, ri_list); - tp->t_flags |= XFS_TRANS_DIRTY; set_bit(XFS_LI_DIRTY, &cuip->cui_item.li_flags);
@@ -354,6 +329,24 @@ xfs_refcount_update_log_item( xfs_trans_set_refcount_flags(ext, refc->ri_type); }
+STATIC void * +xfs_refcount_update_create_intent( + struct xfs_trans *tp, + struct list_head *items, + unsigned int count) +{ + struct xfs_mount *mp = tp->t_mountp; + struct xfs_cui_log_item *cuip = xfs_cui_init(mp, count); + struct xfs_refcount_intent *refc; + + ASSERT(count > 0); + + xfs_trans_add_item(tp, &cuip->cui_item); + list_for_each_entry(refc, items, ri_list) + xfs_refcount_update_log_item(tp, cuip, refc); + return cuip; +} + /* Get an CUD so we can process all the deferred refcount updates. */ STATIC void * xfs_refcount_update_create_done( @@ -432,7 +425,6 @@ const struct xfs_defer_op_type xfs_refco .diff_items = xfs_refcount_update_diff_items, .create_intent = xfs_refcount_update_create_intent, .abort_intent = xfs_refcount_update_abort_intent, - .log_item = xfs_refcount_update_log_item, .create_done = xfs_refcount_update_create_done, .finish_item = xfs_refcount_update_finish_item, .finish_cleanup = xfs_refcount_update_finish_cleanup, --- a/fs/xfs/xfs_rmap_item.c +++ b/fs/xfs/xfs_rmap_item.c @@ -352,41 +352,16 @@ xfs_rmap_update_diff_items( XFS_FSB_TO_AGNO(mp, rb->ri_bmap.br_startblock); }
-/* Get an RUI. */ -STATIC void * -xfs_rmap_update_create_intent( - struct xfs_trans *tp, - unsigned int count) -{ - struct xfs_rui_log_item *ruip; - - ASSERT(tp != NULL); - ASSERT(count > 0); - - ruip = xfs_rui_init(tp->t_mountp, count); - ASSERT(ruip != NULL); - - /* - * Get a log_item_desc to point at the new item. - */ - xfs_trans_add_item(tp, &ruip->rui_item); - return ruip; -} - /* Log rmap updates in the intent item. */ STATIC void xfs_rmap_update_log_item( struct xfs_trans *tp, - void *intent, - struct list_head *item) + struct xfs_rui_log_item *ruip, + struct xfs_rmap_intent *rmap) { - struct xfs_rui_log_item *ruip = intent; - struct xfs_rmap_intent *rmap; uint next_extent; struct xfs_map_extent *map;
- rmap = container_of(item, struct xfs_rmap_intent, ri_list); - tp->t_flags |= XFS_TRANS_DIRTY; set_bit(XFS_LI_DIRTY, &ruip->rui_item.li_flags);
@@ -406,6 +381,24 @@ xfs_rmap_update_log_item( rmap->ri_bmap.br_state); }
+STATIC void * +xfs_rmap_update_create_intent( + struct xfs_trans *tp, + struct list_head *items, + unsigned int count) +{ + struct xfs_mount *mp = tp->t_mountp; + struct xfs_rui_log_item *ruip = xfs_rui_init(mp, count); + struct xfs_rmap_intent *rmap; + + ASSERT(count > 0); + + xfs_trans_add_item(tp, &ruip->rui_item); + list_for_each_entry(rmap, items, ri_list) + xfs_rmap_update_log_item(tp, ruip, rmap); + return ruip; +} + /* Get an RUD so we can process all the deferred rmap updates. */ STATIC void * xfs_rmap_update_create_done( @@ -476,7 +469,6 @@ const struct xfs_defer_op_type xfs_rmap_ .diff_items = xfs_rmap_update_diff_items, .create_intent = xfs_rmap_update_create_intent, .abort_intent = xfs_rmap_update_abort_intent, - .log_item = xfs_rmap_update_log_item, .create_done = xfs_rmap_update_create_done, .finish_item = xfs_rmap_update_finish_item, .finish_cleanup = xfs_rmap_update_finish_cleanup,
From: Christoph Hellwig hch@lst.de
commit d367a868e46b025a8ced8e00ef2b3a3c2f3bf732 upstream.
This avoids a per-item indirect call, and also simplifies the interface a bit.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Brian Foster bfoster@redhat.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Chandan Babu R chandan.babu@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/libxfs/xfs_defer.c | 5 +---- fs/xfs/libxfs/xfs_defer.h | 3 +-- fs/xfs/xfs_bmap_item.c | 9 ++++++--- fs/xfs/xfs_extfree_item.c | 7 ++++--- fs/xfs/xfs_refcount_item.c | 6 ++++-- fs/xfs/xfs_rmap_item.c | 6 ++++-- 6 files changed, 20 insertions(+), 16 deletions(-)
--- a/fs/xfs/libxfs/xfs_defer.c +++ b/fs/xfs/libxfs/xfs_defer.c @@ -186,11 +186,8 @@ xfs_defer_create_intent( { const struct xfs_defer_op_type *ops = defer_op_types[dfp->dfp_type];
- if (sort) - list_sort(tp->t_mountp, &dfp->dfp_work, ops->diff_items); - dfp->dfp_intent = ops->create_intent(tp, &dfp->dfp_work, - dfp->dfp_count); + dfp->dfp_count, sort); }
/* --- a/fs/xfs/libxfs/xfs_defer.h +++ b/fs/xfs/libxfs/xfs_defer.h @@ -49,9 +49,8 @@ struct xfs_defer_op_type { void **); void (*finish_cleanup)(struct xfs_trans *, void *, int); void (*cancel_item)(struct list_head *); - int (*diff_items)(void *, struct list_head *, struct list_head *); void *(*create_intent)(struct xfs_trans *tp, struct list_head *items, - unsigned int count); + unsigned int count, bool sort); unsigned int max_items; };
--- a/fs/xfs/xfs_bmap_item.c +++ b/fs/xfs/xfs_bmap_item.c @@ -334,14 +334,18 @@ STATIC void * xfs_bmap_update_create_intent( struct xfs_trans *tp, struct list_head *items, - unsigned int count) + unsigned int count, + bool sort) { - struct xfs_bui_log_item *buip = xfs_bui_init(tp->t_mountp); + struct xfs_mount *mp = tp->t_mountp; + struct xfs_bui_log_item *buip = xfs_bui_init(mp); struct xfs_bmap_intent *bmap;
ASSERT(count == XFS_BUI_MAX_FAST_EXTENTS);
xfs_trans_add_item(tp, &buip->bui_item); + if (sort) + list_sort(mp, items, xfs_bmap_update_diff_items); list_for_each_entry(bmap, items, bi_list) xfs_bmap_update_log_item(tp, buip, bmap); return buip; @@ -408,7 +412,6 @@ xfs_bmap_update_cancel_item(
const struct xfs_defer_op_type xfs_bmap_update_defer_type = { .max_items = XFS_BUI_MAX_FAST_EXTENTS, - .diff_items = xfs_bmap_update_diff_items, .create_intent = xfs_bmap_update_create_intent, .abort_intent = xfs_bmap_update_abort_intent, .create_done = xfs_bmap_update_create_done, --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -441,7 +441,8 @@ STATIC void * xfs_extent_free_create_intent( struct xfs_trans *tp, struct list_head *items, - unsigned int count) + unsigned int count, + bool sort) { struct xfs_mount *mp = tp->t_mountp; struct xfs_efi_log_item *efip = xfs_efi_init(mp, count); @@ -450,6 +451,8 @@ xfs_extent_free_create_intent( ASSERT(count > 0);
xfs_trans_add_item(tp, &efip->efi_item); + if (sort) + list_sort(mp, items, xfs_extent_free_diff_items); list_for_each_entry(free, items, xefi_list) xfs_extent_free_log_item(tp, efip, free); return efip; @@ -506,7 +509,6 @@ xfs_extent_free_cancel_item(
const struct xfs_defer_op_type xfs_extent_free_defer_type = { .max_items = XFS_EFI_MAX_FAST_EXTENTS, - .diff_items = xfs_extent_free_diff_items, .create_intent = xfs_extent_free_create_intent, .abort_intent = xfs_extent_free_abort_intent, .create_done = xfs_extent_free_create_done, @@ -571,7 +573,6 @@ xfs_agfl_free_finish_item( /* sub-type with special handling for AGFL deferred frees */ const struct xfs_defer_op_type xfs_agfl_free_defer_type = { .max_items = XFS_EFI_MAX_FAST_EXTENTS, - .diff_items = xfs_extent_free_diff_items, .create_intent = xfs_extent_free_create_intent, .abort_intent = xfs_extent_free_abort_intent, .create_done = xfs_extent_free_create_done, --- a/fs/xfs/xfs_refcount_item.c +++ b/fs/xfs/xfs_refcount_item.c @@ -333,7 +333,8 @@ STATIC void * xfs_refcount_update_create_intent( struct xfs_trans *tp, struct list_head *items, - unsigned int count) + unsigned int count, + bool sort) { struct xfs_mount *mp = tp->t_mountp; struct xfs_cui_log_item *cuip = xfs_cui_init(mp, count); @@ -342,6 +343,8 @@ xfs_refcount_update_create_intent( ASSERT(count > 0);
xfs_trans_add_item(tp, &cuip->cui_item); + if (sort) + list_sort(mp, items, xfs_refcount_update_diff_items); list_for_each_entry(refc, items, ri_list) xfs_refcount_update_log_item(tp, cuip, refc); return cuip; @@ -422,7 +425,6 @@ xfs_refcount_update_cancel_item(
const struct xfs_defer_op_type xfs_refcount_update_defer_type = { .max_items = XFS_CUI_MAX_FAST_EXTENTS, - .diff_items = xfs_refcount_update_diff_items, .create_intent = xfs_refcount_update_create_intent, .abort_intent = xfs_refcount_update_abort_intent, .create_done = xfs_refcount_update_create_done, --- a/fs/xfs/xfs_rmap_item.c +++ b/fs/xfs/xfs_rmap_item.c @@ -385,7 +385,8 @@ STATIC void * xfs_rmap_update_create_intent( struct xfs_trans *tp, struct list_head *items, - unsigned int count) + unsigned int count, + bool sort) { struct xfs_mount *mp = tp->t_mountp; struct xfs_rui_log_item *ruip = xfs_rui_init(mp, count); @@ -394,6 +395,8 @@ xfs_rmap_update_create_intent( ASSERT(count > 0);
xfs_trans_add_item(tp, &ruip->rui_item); + if (sort) + list_sort(mp, items, xfs_rmap_update_diff_items); list_for_each_entry(rmap, items, ri_list) xfs_rmap_update_log_item(tp, ruip, rmap); return ruip; @@ -466,7 +469,6 @@ xfs_rmap_update_cancel_item(
const struct xfs_defer_op_type xfs_rmap_update_defer_type = { .max_items = XFS_RUI_MAX_FAST_EXTENTS, - .diff_items = xfs_rmap_update_diff_items, .create_intent = xfs_rmap_update_create_intent, .abort_intent = xfs_rmap_update_abort_intent, .create_done = xfs_rmap_update_create_done,
From: Christoph Hellwig hch@lst.de
commit 13a8333339072b8654c1d2c75550ee9f41ee15de upstream.
All defer op instance place their own extension of the log item into the dfp_intent field. Replace that with a xfs_log_item to improve type safety and make the code easier to follow.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Brian Foster bfoster@redhat.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Chandan Babu R chandan.babu@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/libxfs/xfs_defer.h | 11 ++++++----- fs/xfs/xfs_bmap_item.c | 12 ++++++------ fs/xfs/xfs_extfree_item.c | 12 ++++++------ fs/xfs/xfs_refcount_item.c | 12 ++++++------ fs/xfs/xfs_rmap_item.c | 12 ++++++------ 5 files changed, 30 insertions(+), 29 deletions(-)
--- a/fs/xfs/libxfs/xfs_defer.h +++ b/fs/xfs/libxfs/xfs_defer.h @@ -28,7 +28,7 @@ enum xfs_defer_ops_type { struct xfs_defer_pending { struct list_head dfp_list; /* pending items */ struct list_head dfp_work; /* work items */ - void *dfp_intent; /* log intent item */ + struct xfs_log_item *dfp_intent; /* log intent item */ void *dfp_done; /* log done item */ unsigned int dfp_count; /* # extent items */ enum xfs_defer_ops_type dfp_type; @@ -43,14 +43,15 @@ void xfs_defer_move(struct xfs_trans *dt
/* Description of a deferred type. */ struct xfs_defer_op_type { - void (*abort_intent)(void *); - void *(*create_done)(struct xfs_trans *, void *, unsigned int); + struct xfs_log_item *(*create_intent)(struct xfs_trans *tp, + struct list_head *items, unsigned int count, bool sort); + void (*abort_intent)(struct xfs_log_item *intent); + void *(*create_done)(struct xfs_trans *tp, struct xfs_log_item *intent, + unsigned int count); int (*finish_item)(struct xfs_trans *, struct list_head *, void *, void **); void (*finish_cleanup)(struct xfs_trans *, void *, int); void (*cancel_item)(struct list_head *); - void *(*create_intent)(struct xfs_trans *tp, struct list_head *items, - unsigned int count, bool sort); unsigned int max_items; };
--- a/fs/xfs/xfs_bmap_item.c +++ b/fs/xfs/xfs_bmap_item.c @@ -330,7 +330,7 @@ xfs_bmap_update_log_item( bmap->bi_bmap.br_state); }
-STATIC void * +static struct xfs_log_item * xfs_bmap_update_create_intent( struct xfs_trans *tp, struct list_head *items, @@ -348,17 +348,17 @@ xfs_bmap_update_create_intent( list_sort(mp, items, xfs_bmap_update_diff_items); list_for_each_entry(bmap, items, bi_list) xfs_bmap_update_log_item(tp, buip, bmap); - return buip; + return &buip->bui_item; }
/* Get an BUD so we can process all the deferred rmap updates. */ STATIC void * xfs_bmap_update_create_done( struct xfs_trans *tp, - void *intent, + struct xfs_log_item *intent, unsigned int count) { - return xfs_trans_get_bud(tp, intent); + return xfs_trans_get_bud(tp, BUI_ITEM(intent)); }
/* Process a deferred rmap update. */ @@ -394,9 +394,9 @@ xfs_bmap_update_finish_item( /* Abort all pending BUIs. */ STATIC void xfs_bmap_update_abort_intent( - void *intent) + struct xfs_log_item *intent) { - xfs_bui_release(intent); + xfs_bui_release(BUI_ITEM(intent)); }
/* Cancel a deferred rmap update. */ --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -437,7 +437,7 @@ xfs_extent_free_log_item( extp->ext_len = free->xefi_blockcount; }
-STATIC void * +static struct xfs_log_item * xfs_extent_free_create_intent( struct xfs_trans *tp, struct list_head *items, @@ -455,17 +455,17 @@ xfs_extent_free_create_intent( list_sort(mp, items, xfs_extent_free_diff_items); list_for_each_entry(free, items, xefi_list) xfs_extent_free_log_item(tp, efip, free); - return efip; + return &efip->efi_item; }
/* Get an EFD so we can process all the free extents. */ STATIC void * xfs_extent_free_create_done( struct xfs_trans *tp, - void *intent, + struct xfs_log_item *intent, unsigned int count) { - return xfs_trans_get_efd(tp, intent, count); + return xfs_trans_get_efd(tp, EFI_ITEM(intent), count); }
/* Process a free extent. */ @@ -491,9 +491,9 @@ xfs_extent_free_finish_item( /* Abort all pending EFIs. */ STATIC void xfs_extent_free_abort_intent( - void *intent) + struct xfs_log_item *intent) { - xfs_efi_release(intent); + xfs_efi_release(EFI_ITEM(intent)); }
/* Cancel a free extent. */ --- a/fs/xfs/xfs_refcount_item.c +++ b/fs/xfs/xfs_refcount_item.c @@ -329,7 +329,7 @@ xfs_refcount_update_log_item( xfs_trans_set_refcount_flags(ext, refc->ri_type); }
-STATIC void * +static struct xfs_log_item * xfs_refcount_update_create_intent( struct xfs_trans *tp, struct list_head *items, @@ -347,17 +347,17 @@ xfs_refcount_update_create_intent( list_sort(mp, items, xfs_refcount_update_diff_items); list_for_each_entry(refc, items, ri_list) xfs_refcount_update_log_item(tp, cuip, refc); - return cuip; + return &cuip->cui_item; }
/* Get an CUD so we can process all the deferred refcount updates. */ STATIC void * xfs_refcount_update_create_done( struct xfs_trans *tp, - void *intent, + struct xfs_log_item *intent, unsigned int count) { - return xfs_trans_get_cud(tp, intent); + return xfs_trans_get_cud(tp, CUI_ITEM(intent)); }
/* Process a deferred refcount update. */ @@ -407,9 +407,9 @@ xfs_refcount_update_finish_cleanup( /* Abort all pending CUIs. */ STATIC void xfs_refcount_update_abort_intent( - void *intent) + struct xfs_log_item *intent) { - xfs_cui_release(intent); + xfs_cui_release(CUI_ITEM(intent)); }
/* Cancel a deferred refcount update. */ --- a/fs/xfs/xfs_rmap_item.c +++ b/fs/xfs/xfs_rmap_item.c @@ -381,7 +381,7 @@ xfs_rmap_update_log_item( rmap->ri_bmap.br_state); }
-STATIC void * +static struct xfs_log_item * xfs_rmap_update_create_intent( struct xfs_trans *tp, struct list_head *items, @@ -399,17 +399,17 @@ xfs_rmap_update_create_intent( list_sort(mp, items, xfs_rmap_update_diff_items); list_for_each_entry(rmap, items, ri_list) xfs_rmap_update_log_item(tp, ruip, rmap); - return ruip; + return &ruip->rui_item; }
/* Get an RUD so we can process all the deferred rmap updates. */ STATIC void * xfs_rmap_update_create_done( struct xfs_trans *tp, - void *intent, + struct xfs_log_item *intent, unsigned int count) { - return xfs_trans_get_rud(tp, intent); + return xfs_trans_get_rud(tp, RUI_ITEM(intent)); }
/* Process a deferred rmap update. */ @@ -451,9 +451,9 @@ xfs_rmap_update_finish_cleanup( /* Abort all pending RUIs. */ STATIC void xfs_rmap_update_abort_intent( - void *intent) + struct xfs_log_item *intent) { - xfs_rui_release(intent); + xfs_rui_release(RUI_ITEM(intent)); }
/* Cancel a deferred rmap update. */
From: Christoph Hellwig hch@lst.de
commit bb47d79750f1a68a75d4c7defc2da934ba31de14 upstream.
Split out a helper that operates on a single xfs_defer_pending structure to untangle the code.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Brian Foster bfoster@redhat.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Chandan Babu R chandan.babu@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/libxfs/xfs_defer.c | 128 +++++++++++++++++++++------------------------- 1 file changed, 59 insertions(+), 69 deletions(-)
--- a/fs/xfs/libxfs/xfs_defer.c +++ b/fs/xfs/libxfs/xfs_defer.c @@ -360,6 +360,53 @@ xfs_defer_cancel_list( }
/* + * Log an intent-done item for the first pending intent, and finish the work + * items. + */ +static int +xfs_defer_finish_one( + struct xfs_trans *tp, + struct xfs_defer_pending *dfp) +{ + const struct xfs_defer_op_type *ops = defer_op_types[dfp->dfp_type]; + void *state = NULL; + struct list_head *li, *n; + int error; + + trace_xfs_defer_pending_finish(tp->t_mountp, dfp); + + dfp->dfp_done = ops->create_done(tp, dfp->dfp_intent, dfp->dfp_count); + list_for_each_safe(li, n, &dfp->dfp_work) { + list_del(li); + dfp->dfp_count--; + error = ops->finish_item(tp, li, dfp->dfp_done, &state); + if (error == -EAGAIN) { + /* + * Caller wants a fresh transaction; put the work item + * back on the list and log a new log intent item to + * replace the old one. See "Requesting a Fresh + * Transaction while Finishing Deferred Work" above. + */ + list_add(li, &dfp->dfp_work); + dfp->dfp_count++; + dfp->dfp_done = NULL; + xfs_defer_create_intent(tp, dfp, false); + } + + if (error) + goto out; + } + + /* Done with the dfp, free it. */ + list_del(&dfp->dfp_list); + kmem_free(dfp); +out: + if (ops->finish_cleanup) + ops->finish_cleanup(tp, state, error); + return error; +} + +/* * Finish all the pending work. This involves logging intent items for * any work items that wandered in since the last transaction roll (if * one has even happened), rolling the transaction, and finishing the @@ -372,11 +419,7 @@ xfs_defer_finish_noroll( struct xfs_trans **tp) { struct xfs_defer_pending *dfp; - struct list_head *li; - struct list_head *n; - void *state; int error = 0; - const struct xfs_defer_op_type *ops; LIST_HEAD(dop_pending);
ASSERT((*tp)->t_flags & XFS_TRANS_PERM_LOG_RES); @@ -385,83 +428,30 @@ xfs_defer_finish_noroll(
/* Until we run out of pending work to finish... */ while (!list_empty(&dop_pending) || !list_empty(&(*tp)->t_dfops)) { - /* log intents and pull in intake items */ xfs_defer_create_intents(*tp); list_splice_tail_init(&(*tp)->t_dfops, &dop_pending);
- /* - * Roll the transaction. - */ error = xfs_defer_trans_roll(tp); if (error) - goto out; + goto out_shutdown;
- /* Log an intent-done item for the first pending item. */ dfp = list_first_entry(&dop_pending, struct xfs_defer_pending, dfp_list); - ops = defer_op_types[dfp->dfp_type]; - trace_xfs_defer_pending_finish((*tp)->t_mountp, dfp); - dfp->dfp_done = ops->create_done(*tp, dfp->dfp_intent, - dfp->dfp_count); - - /* Finish the work items. */ - state = NULL; - list_for_each_safe(li, n, &dfp->dfp_work) { - list_del(li); - dfp->dfp_count--; - error = ops->finish_item(*tp, li, dfp->dfp_done, - &state); - if (error == -EAGAIN) { - /* - * Caller wants a fresh transaction; - * put the work item back on the list - * and jump out. - */ - list_add(li, &dfp->dfp_work); - dfp->dfp_count++; - break; - } else if (error) { - /* - * Clean up after ourselves and jump out. - * xfs_defer_cancel will take care of freeing - * all these lists and stuff. - */ - if (ops->finish_cleanup) - ops->finish_cleanup(*tp, state, error); - goto out; - } - } - if (error == -EAGAIN) { - /* - * Caller wants a fresh transaction, so log a new log - * intent item to replace the old one and roll the - * transaction. See "Requesting a Fresh Transaction - * while Finishing Deferred Work" above. - */ - dfp->dfp_done = NULL; - xfs_defer_create_intent(*tp, dfp, false); - } else { - /* Done with the dfp, free it. */ - list_del(&dfp->dfp_list); - kmem_free(dfp); - } - - if (ops->finish_cleanup) - ops->finish_cleanup(*tp, state, error); - } - -out: - if (error) { - xfs_defer_trans_abort(*tp, &dop_pending); - xfs_force_shutdown((*tp)->t_mountp, SHUTDOWN_CORRUPT_INCORE); - trace_xfs_defer_finish_error(*tp, error); - xfs_defer_cancel_list((*tp)->t_mountp, &dop_pending); - xfs_defer_cancel(*tp); - return error; + error = xfs_defer_finish_one(*tp, dfp); + if (error && error != -EAGAIN) + goto out_shutdown; }
trace_xfs_defer_finish_done(*tp, _RET_IP_); return 0; + +out_shutdown: + xfs_defer_trans_abort(*tp, &dop_pending); + xfs_force_shutdown((*tp)->t_mountp, SHUTDOWN_CORRUPT_INCORE); + trace_xfs_defer_finish_error(*tp, error); + xfs_defer_cancel_list((*tp)->t_mountp, &dop_pending); + xfs_defer_cancel(*tp); + return error; }
int
From: "Darrick J. Wong" darrick.wong@oracle.com
commit 93293bcbde93567efaf4e6bcd58cad270e1fcbf5 upstream.
[Slightly edit fs/xfs/xfs_bmap_item.c & fs/xfs/xfs_refcount_item.c to resolve merge conflicts]
During a code inspection, I found a serious bug in the log intent item recovery code when an intent item cannot complete all the work and decides to requeue itself to get that done. When this happens, the item recovery creates a new incore deferred op representing the remaining work and attaches it to the transaction that it allocated. At the end of _item_recover, it moves the entire chain of deferred ops to the dummy parent_tp that xlog_recover_process_intents passed to it, but fail to log a new intent item for the remaining work before committing the transaction for the single unit of work.
xlog_finish_defer_ops logs those new intent items once recovery has finished dealing with the intent items that it recovered, but this isn't sufficient. If the log is forced to disk after a recovered log item decides to requeue itself and the system goes down before we call xlog_finish_defer_ops, the second log recovery will never see the new intent item and therefore has no idea that there was more work to do. It will finish recovery leaving the filesystem in a corrupted state.
The same logic applies to /any/ deferred ops added during intent item recovery, not just the one handling the remaining work.
Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Chandan Babu R chandan.babu@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/libxfs/xfs_defer.c | 26 ++++++++++++++++++++++++-- fs/xfs/libxfs/xfs_defer.h | 6 ++++++ fs/xfs/xfs_bmap_item.c | 2 +- fs/xfs/xfs_refcount_item.c | 2 +- 4 files changed, 32 insertions(+), 4 deletions(-)
--- a/fs/xfs/libxfs/xfs_defer.c +++ b/fs/xfs/libxfs/xfs_defer.c @@ -186,8 +186,9 @@ xfs_defer_create_intent( { const struct xfs_defer_op_type *ops = defer_op_types[dfp->dfp_type];
- dfp->dfp_intent = ops->create_intent(tp, &dfp->dfp_work, - dfp->dfp_count, sort); + if (!dfp->dfp_intent) + dfp->dfp_intent = ops->create_intent(tp, &dfp->dfp_work, + dfp->dfp_count, sort); }
/* @@ -390,6 +391,7 @@ xfs_defer_finish_one( list_add(li, &dfp->dfp_work); dfp->dfp_count++; dfp->dfp_done = NULL; + dfp->dfp_intent = NULL; xfs_defer_create_intent(tp, dfp, false); }
@@ -552,3 +554,23 @@ xfs_defer_move(
xfs_defer_reset(stp); } + +/* + * Prepare a chain of fresh deferred ops work items to be completed later. Log + * recovery requires the ability to put off until later the actual finishing + * work so that it can process unfinished items recovered from the log in + * correct order. + * + * Create and log intent items for all the work that we're capturing so that we + * can be assured that the items will get replayed if the system goes down + * before log recovery gets a chance to finish the work it put off. Then we + * move the chain from stp to dtp. + */ +void +xfs_defer_capture( + struct xfs_trans *dtp, + struct xfs_trans *stp) +{ + xfs_defer_create_intents(stp); + xfs_defer_move(dtp, stp); +} --- a/fs/xfs/libxfs/xfs_defer.h +++ b/fs/xfs/libxfs/xfs_defer.h @@ -61,4 +61,10 @@ extern const struct xfs_defer_op_type xf extern const struct xfs_defer_op_type xfs_extent_free_defer_type; extern const struct xfs_defer_op_type xfs_agfl_free_defer_type;
+/* + * Functions to capture a chain of deferred operations and continue them later. + * This doesn't normally happen except log recovery. + */ +void xfs_defer_capture(struct xfs_trans *dtp, struct xfs_trans *stp); + #endif /* __XFS_DEFER_H__ */ --- a/fs/xfs/xfs_bmap_item.c +++ b/fs/xfs/xfs_bmap_item.c @@ -541,7 +541,7 @@ xfs_bui_recover( }
set_bit(XFS_BUI_RECOVERED, &buip->bui_flags); - xfs_defer_move(parent_tp, tp); + xfs_defer_capture(parent_tp, tp); error = xfs_trans_commit(tp); xfs_iunlock(ip, XFS_ILOCK_EXCL); xfs_irele(ip); --- a/fs/xfs/xfs_refcount_item.c +++ b/fs/xfs/xfs_refcount_item.c @@ -574,7 +574,7 @@ xfs_cui_recover(
xfs_refcount_finish_one_cleanup(tp, rcur, error); set_bit(XFS_CUI_RECOVERED, &cuip->cui_flags); - xfs_defer_move(parent_tp, tp); + xfs_defer_capture(parent_tp, tp); error = xfs_trans_commit(tp); return error;
From: Dave Chinner dchinner@redhat.com
commit 671459676ab0e1d371c8d6b184ad1faa05b6941e upstream.
[ In 5.4.y, xlog_recover_get_buf_lsn() is defined inside fs/xfs/xfs_log_recover.c ]
Nathan popped up on #xfs and pointed out that we fail to handle finobt btree blocks in xlog_recover_get_buf_lsn(). This means they always fall through the entire magic number matching code to "recover immediately". Whilst most of the time this is the correct behaviour, occasionally it will be incorrect and could potentially overwrite more recent metadata because we don't check the LSN in the on disk metadata at all.
This bug has been present since the finobt was first introduced, and is a potential cause of the occasional xfs_iget_check_free_state() failures we see that indicate that the inode btree state does not match the on disk inode state.
Fixes: aafc3c246529 ("xfs: support the XFS_BTNUM_FINOBT free inode btree type") Reported-by: Nathan Scott nathans@redhat.com Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Brian Foster bfoster@redhat.com Signed-off-by: Chandan Babu R chandan.babu@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_log_recover.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -2206,6 +2206,8 @@ xlog_recover_get_buf_lsn( case XFS_ABTC_MAGIC: case XFS_RMAP_CRC_MAGIC: case XFS_REFC_CRC_MAGIC: + case XFS_FIBT_CRC_MAGIC: + case XFS_FIBT_MAGIC: case XFS_IBT_CRC_MAGIC: case XFS_IBT_MAGIC: { struct xfs_btree_block *btb = blk;
From: "Darrick J. Wong" darrick.wong@oracle.com
commit e6fff81e487089e47358a028526a9f63cdbcd503 upstream.
When we replay unfinished intent items that have been recovered from the log, it's possible that the replay will cause the creation of more deferred work items. As outlined in commit 509955823cc9c ("xfs: log recovery should replay deferred ops in order"), later work items have an implicit ordering dependency on earlier work items. Therefore, recovery must replay the items (both recovered and created) in the same order that they would have been during normal operation.
For log recovery, we enforce this ordering by using an empty transaction to collect deferred ops that get created in the process of recovering a log intent item to prevent them from being committed before the rest of the recovered intent items. After we finish committing all the recovered log items, we allocate a transaction with an enormous block reservation, splice our huge list of created deferred ops into that transaction, and commit it, thereby finishing all those ops.
This is /really/ hokey -- it's the one place in XFS where we allow nested transactions; the splicing of the defer ops list is is inelegant and has to be done twice per recovery function; and the broken way we handle inode pointers and block reservations cause subtle use-after-free and allocator problems that will be fixed by this patch and the two patches after it.
Therefore, replace the hokey empty transaction with a structure designed to capture each chain of deferred ops that are created as part of recovering a single unfinished log intent. Finally, refactor the loop that replays those chains to do so using one transaction per chain.
Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Brian Foster bfoster@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Chandan Babu R chandan.babu@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/libxfs/xfs_defer.c | 89 ++++++++++++++++++++++++-- fs/xfs/libxfs/xfs_defer.h | 19 +++++ fs/xfs/xfs_bmap_item.c | 18 +---- fs/xfs/xfs_bmap_item.h | 3 fs/xfs/xfs_extfree_item.c | 9 +- fs/xfs/xfs_extfree_item.h | 4 - fs/xfs/xfs_log_recover.c | 151 +++++++++++++++++++++++++-------------------- fs/xfs/xfs_refcount_item.c | 18 +---- fs/xfs/xfs_refcount_item.h | 3 fs/xfs/xfs_rmap_item.c | 8 +- fs/xfs/xfs_rmap_item.h | 3 11 files changed, 213 insertions(+), 112 deletions(-)
--- a/fs/xfs/libxfs/xfs_defer.c +++ b/fs/xfs/libxfs/xfs_defer.c @@ -563,14 +563,89 @@ xfs_defer_move( * * Create and log intent items for all the work that we're capturing so that we * can be assured that the items will get replayed if the system goes down - * before log recovery gets a chance to finish the work it put off. Then we - * move the chain from stp to dtp. + * before log recovery gets a chance to finish the work it put off. The entire + * deferred ops state is transferred to the capture structure and the + * transaction is then ready for the caller to commit it. If there are no + * intent items to capture, this function returns NULL. + */ +static struct xfs_defer_capture * +xfs_defer_ops_capture( + struct xfs_trans *tp) +{ + struct xfs_defer_capture *dfc; + + if (list_empty(&tp->t_dfops)) + return NULL; + + /* Create an object to capture the defer ops. */ + dfc = kmem_zalloc(sizeof(*dfc), KM_NOFS); + INIT_LIST_HEAD(&dfc->dfc_list); + INIT_LIST_HEAD(&dfc->dfc_dfops); + + xfs_defer_create_intents(tp); + + /* Move the dfops chain and transaction state to the capture struct. */ + list_splice_init(&tp->t_dfops, &dfc->dfc_dfops); + dfc->dfc_tpflags = tp->t_flags & XFS_TRANS_LOWMODE; + tp->t_flags &= ~XFS_TRANS_LOWMODE; + + return dfc; +} + +/* Release all resources that we used to capture deferred ops. */ +void +xfs_defer_ops_release( + struct xfs_mount *mp, + struct xfs_defer_capture *dfc) +{ + xfs_defer_cancel_list(mp, &dfc->dfc_dfops); + kmem_free(dfc); +} + +/* + * Capture any deferred ops and commit the transaction. This is the last step + * needed to finish a log intent item that we recovered from the log. + */ +int +xfs_defer_ops_capture_and_commit( + struct xfs_trans *tp, + struct list_head *capture_list) +{ + struct xfs_mount *mp = tp->t_mountp; + struct xfs_defer_capture *dfc; + int error; + + /* If we don't capture anything, commit transaction and exit. */ + dfc = xfs_defer_ops_capture(tp); + if (!dfc) + return xfs_trans_commit(tp); + + /* Commit the transaction and add the capture structure to the list. */ + error = xfs_trans_commit(tp); + if (error) { + xfs_defer_ops_release(mp, dfc); + return error; + } + + list_add_tail(&dfc->dfc_list, capture_list); + return 0; +} + +/* + * Attach a chain of captured deferred ops to a new transaction and free the + * capture structure. */ void -xfs_defer_capture( - struct xfs_trans *dtp, - struct xfs_trans *stp) +xfs_defer_ops_continue( + struct xfs_defer_capture *dfc, + struct xfs_trans *tp) { - xfs_defer_create_intents(stp); - xfs_defer_move(dtp, stp); + ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES); + ASSERT(!(tp->t_flags & XFS_TRANS_DIRTY)); + + /* Move captured dfops chain and state to the transaction. */ + list_splice_init(&dfc->dfc_dfops, &tp->t_dfops); + tp->t_flags |= dfc->dfc_tpflags; + + kmem_free(dfc); } --- a/fs/xfs/libxfs/xfs_defer.h +++ b/fs/xfs/libxfs/xfs_defer.h @@ -7,6 +7,7 @@ #define __XFS_DEFER_H__
struct xfs_defer_op_type; +struct xfs_defer_capture;
/* * Header for deferred operation list. @@ -62,9 +63,25 @@ extern const struct xfs_defer_op_type xf extern const struct xfs_defer_op_type xfs_agfl_free_defer_type;
/* + * This structure enables a dfops user to detach the chain of deferred + * operations from a transaction so that they can be continued later. + */ +struct xfs_defer_capture { + /* List of other capture structures. */ + struct list_head dfc_list; + + /* Deferred ops state saved from the transaction. */ + struct list_head dfc_dfops; + unsigned int dfc_tpflags; +}; + +/* * Functions to capture a chain of deferred operations and continue them later. * This doesn't normally happen except log recovery. */ -void xfs_defer_capture(struct xfs_trans *dtp, struct xfs_trans *stp); +int xfs_defer_ops_capture_and_commit(struct xfs_trans *tp, + struct list_head *capture_list); +void xfs_defer_ops_continue(struct xfs_defer_capture *d, struct xfs_trans *tp); +void xfs_defer_ops_release(struct xfs_mount *mp, struct xfs_defer_capture *d);
#endif /* __XFS_DEFER_H__ */ --- a/fs/xfs/xfs_bmap_item.c +++ b/fs/xfs/xfs_bmap_item.c @@ -425,8 +425,8 @@ const struct xfs_defer_op_type xfs_bmap_ */ int xfs_bui_recover( - struct xfs_trans *parent_tp, - struct xfs_bui_log_item *buip) + struct xfs_bui_log_item *buip, + struct list_head *capture_list) { int error = 0; unsigned int bui_type; @@ -442,7 +442,7 @@ xfs_bui_recover( struct xfs_trans *tp; struct xfs_inode *ip = NULL; struct xfs_bmbt_irec irec; - struct xfs_mount *mp = parent_tp->t_mountp; + struct xfs_mount *mp = buip->bui_item.li_mountp;
ASSERT(!test_bit(XFS_BUI_RECOVERED, &buip->bui_flags));
@@ -491,12 +491,7 @@ xfs_bui_recover( XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK), 0, 0, &tp); if (error) return error; - /* - * Recovery stashes all deferred ops during intent processing and - * finishes them on completion. Transfer current dfops state to this - * transaction and transfer the result back before we return. - */ - xfs_defer_move(tp, parent_tp); + budp = xfs_trans_get_bud(tp, buip);
/* Grab the inode. */ @@ -541,15 +536,12 @@ xfs_bui_recover( }
set_bit(XFS_BUI_RECOVERED, &buip->bui_flags); - xfs_defer_capture(parent_tp, tp); - error = xfs_trans_commit(tp); + error = xfs_defer_ops_capture_and_commit(tp, capture_list); xfs_iunlock(ip, XFS_ILOCK_EXCL); xfs_irele(ip); - return error;
err_inode: - xfs_defer_move(parent_tp, tp); xfs_trans_cancel(tp); if (ip) { xfs_iunlock(ip, XFS_ILOCK_EXCL); --- a/fs/xfs/xfs_bmap_item.h +++ b/fs/xfs/xfs_bmap_item.h @@ -77,6 +77,7 @@ extern struct kmem_zone *xfs_bud_zone; struct xfs_bui_log_item *xfs_bui_init(struct xfs_mount *); void xfs_bui_item_free(struct xfs_bui_log_item *); void xfs_bui_release(struct xfs_bui_log_item *); -int xfs_bui_recover(struct xfs_trans *parent_tp, struct xfs_bui_log_item *buip); +int xfs_bui_recover(struct xfs_bui_log_item *buip, + struct list_head *capture_list);
#endif /* __XFS_BMAP_ITEM_H__ */ --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -586,9 +586,10 @@ const struct xfs_defer_op_type xfs_agfl_ */ int xfs_efi_recover( - struct xfs_mount *mp, - struct xfs_efi_log_item *efip) + struct xfs_efi_log_item *efip, + struct list_head *capture_list) { + struct xfs_mount *mp = efip->efi_item.li_mountp; struct xfs_efd_log_item *efdp; struct xfs_trans *tp; int i; @@ -637,8 +638,8 @@ xfs_efi_recover( }
set_bit(XFS_EFI_RECOVERED, &efip->efi_flags); - error = xfs_trans_commit(tp); - return error; + + return xfs_defer_ops_capture_and_commit(tp, capture_list);
abort_error: xfs_trans_cancel(tp); --- a/fs/xfs/xfs_extfree_item.h +++ b/fs/xfs/xfs_extfree_item.h @@ -84,7 +84,7 @@ int xfs_efi_copy_format(xfs_log_iovec_ void xfs_efi_item_free(struct xfs_efi_log_item *); void xfs_efi_release(struct xfs_efi_log_item *);
-int xfs_efi_recover(struct xfs_mount *mp, - struct xfs_efi_log_item *efip); +int xfs_efi_recover(struct xfs_efi_log_item *efip, + struct list_head *capture_list);
#endif /* __XFS_EXTFREE_ITEM_H__ */ --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -4587,9 +4587,9 @@ xlog_recover_process_data( /* Recover the EFI if necessary. */ STATIC int xlog_recover_process_efi( - struct xfs_mount *mp, struct xfs_ail *ailp, - struct xfs_log_item *lip) + struct xfs_log_item *lip, + struct list_head *capture_list) { struct xfs_efi_log_item *efip; int error; @@ -4602,7 +4602,7 @@ xlog_recover_process_efi( return 0;
spin_unlock(&ailp->ail_lock); - error = xfs_efi_recover(mp, efip); + error = xfs_efi_recover(efip, capture_list); spin_lock(&ailp->ail_lock);
return error; @@ -4627,9 +4627,9 @@ xlog_recover_cancel_efi( /* Recover the RUI if necessary. */ STATIC int xlog_recover_process_rui( - struct xfs_mount *mp, struct xfs_ail *ailp, - struct xfs_log_item *lip) + struct xfs_log_item *lip, + struct list_head *capture_list) { struct xfs_rui_log_item *ruip; int error; @@ -4642,7 +4642,7 @@ xlog_recover_process_rui( return 0;
spin_unlock(&ailp->ail_lock); - error = xfs_rui_recover(mp, ruip); + error = xfs_rui_recover(ruip, capture_list); spin_lock(&ailp->ail_lock);
return error; @@ -4667,9 +4667,9 @@ xlog_recover_cancel_rui( /* Recover the CUI if necessary. */ STATIC int xlog_recover_process_cui( - struct xfs_trans *parent_tp, struct xfs_ail *ailp, - struct xfs_log_item *lip) + struct xfs_log_item *lip, + struct list_head *capture_list) { struct xfs_cui_log_item *cuip; int error; @@ -4682,7 +4682,7 @@ xlog_recover_process_cui( return 0;
spin_unlock(&ailp->ail_lock); - error = xfs_cui_recover(parent_tp, cuip); + error = xfs_cui_recover(cuip, capture_list); spin_lock(&ailp->ail_lock);
return error; @@ -4707,9 +4707,9 @@ xlog_recover_cancel_cui( /* Recover the BUI if necessary. */ STATIC int xlog_recover_process_bui( - struct xfs_trans *parent_tp, struct xfs_ail *ailp, - struct xfs_log_item *lip) + struct xfs_log_item *lip, + struct list_head *capture_list) { struct xfs_bui_log_item *buip; int error; @@ -4722,7 +4722,7 @@ xlog_recover_process_bui( return 0;
spin_unlock(&ailp->ail_lock); - error = xfs_bui_recover(parent_tp, buip); + error = xfs_bui_recover(buip, capture_list); spin_lock(&ailp->ail_lock);
return error; @@ -4761,37 +4761,65 @@ static inline bool xlog_item_is_intent(s /* Take all the collected deferred ops and finish them in order. */ static int xlog_finish_defer_ops( - struct xfs_trans *parent_tp) + struct xfs_mount *mp, + struct list_head *capture_list) { - struct xfs_mount *mp = parent_tp->t_mountp; + struct xfs_defer_capture *dfc, *next; struct xfs_trans *tp; int64_t freeblks; - uint resblks; - int error; + uint64_t resblks; + int error = 0;
- /* - * We're finishing the defer_ops that accumulated as a result of - * recovering unfinished intent items during log recovery. We - * reserve an itruncate transaction because it is the largest - * permanent transaction type. Since we're the only user of the fs - * right now, take 93% (15/16) of the available free blocks. Use - * weird math to avoid a 64-bit division. - */ - freeblks = percpu_counter_sum(&mp->m_fdblocks); - if (freeblks <= 0) - return -ENOSPC; - resblks = min_t(int64_t, UINT_MAX, freeblks); - resblks = (resblks * 15) >> 4; - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks, - 0, XFS_TRANS_RESERVE, &tp); - if (error) - return error; - /* transfer all collected dfops to this transaction */ - xfs_defer_move(tp, parent_tp); + list_for_each_entry_safe(dfc, next, capture_list, dfc_list) { + /* + * We're finishing the defer_ops that accumulated as a result + * of recovering unfinished intent items during log recovery. + * We reserve an itruncate transaction because it is the + * largest permanent transaction type. Since we're the only + * user of the fs right now, take 93% (15/16) of the available + * free blocks. Use weird math to avoid a 64-bit division. + */ + freeblks = percpu_counter_sum(&mp->m_fdblocks); + if (freeblks <= 0) + return -ENOSPC; + + resblks = min_t(uint64_t, UINT_MAX, freeblks); + resblks = (resblks * 15) >> 4; + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks, + 0, XFS_TRANS_RESERVE, &tp); + if (error) + return error; + + /* + * Transfer to this new transaction all the dfops we captured + * from recovering a single intent item. + */ + list_del_init(&dfc->dfc_list); + xfs_defer_ops_continue(dfc, tp); + + error = xfs_trans_commit(tp); + if (error) + return error; + }
- return xfs_trans_commit(tp); + ASSERT(list_empty(capture_list)); + return 0; }
+/* Release all the captured defer ops and capture structures in this list. */ +static void +xlog_abort_defer_ops( + struct xfs_mount *mp, + struct list_head *capture_list) +{ + struct xfs_defer_capture *dfc; + struct xfs_defer_capture *next; + + list_for_each_entry_safe(dfc, next, capture_list, dfc_list) { + list_del_init(&dfc->dfc_list); + xfs_defer_ops_release(mp, dfc); + } +} /* * When this is called, all of the log intent items which did not have * corresponding log done items should be in the AIL. What we do now @@ -4812,35 +4840,23 @@ STATIC int xlog_recover_process_intents( struct xlog *log) { - struct xfs_trans *parent_tp; + LIST_HEAD(capture_list); struct xfs_ail_cursor cur; struct xfs_log_item *lip; struct xfs_ail *ailp; - int error; + int error = 0; #if defined(DEBUG) || defined(XFS_WARN) xfs_lsn_t last_lsn; #endif
- /* - * The intent recovery handlers commit transactions to complete recovery - * for individual intents, but any new deferred operations that are - * queued during that process are held off until the very end. The - * purpose of this transaction is to serve as a container for deferred - * operations. Each intent recovery handler must transfer dfops here - * before its local transaction commits, and we'll finish the entire - * list below. - */ - error = xfs_trans_alloc_empty(log->l_mp, &parent_tp); - if (error) - return error; - ailp = log->l_ailp; spin_lock(&ailp->ail_lock); - lip = xfs_trans_ail_cursor_first(ailp, &cur, 0); #if defined(DEBUG) || defined(XFS_WARN) last_lsn = xlog_assign_lsn(log->l_curr_cycle, log->l_curr_block); #endif - while (lip != NULL) { + for (lip = xfs_trans_ail_cursor_first(ailp, &cur, 0); + lip != NULL; + lip = xfs_trans_ail_cursor_next(ailp, &cur)) { /* * We're done when we see something other than an intent. * There should be no intents left in the AIL now. @@ -4862,35 +4878,40 @@ xlog_recover_process_intents(
/* * NOTE: If your intent processing routine can create more - * deferred ops, you /must/ attach them to the dfops in this - * routine or else those subsequent intents will get + * deferred ops, you /must/ attach them to the capture list in + * the recover routine or else those subsequent intents will be * replayed in the wrong order! */ switch (lip->li_type) { case XFS_LI_EFI: - error = xlog_recover_process_efi(log->l_mp, ailp, lip); + error = xlog_recover_process_efi(ailp, lip, &capture_list); break; case XFS_LI_RUI: - error = xlog_recover_process_rui(log->l_mp, ailp, lip); + error = xlog_recover_process_rui(ailp, lip, &capture_list); break; case XFS_LI_CUI: - error = xlog_recover_process_cui(parent_tp, ailp, lip); + error = xlog_recover_process_cui(ailp, lip, &capture_list); break; case XFS_LI_BUI: - error = xlog_recover_process_bui(parent_tp, ailp, lip); + error = xlog_recover_process_bui(ailp, lip, &capture_list); break; } if (error) - goto out; - lip = xfs_trans_ail_cursor_next(ailp, &cur); + break; } -out: + xfs_trans_ail_cursor_done(&cur); spin_unlock(&ailp->ail_lock); - if (!error) - error = xlog_finish_defer_ops(parent_tp); - xfs_trans_cancel(parent_tp); + if (error) + goto err; + + error = xlog_finish_defer_ops(log->l_mp, &capture_list); + if (error) + goto err;
+ return 0; +err: + xlog_abort_defer_ops(log->l_mp, &capture_list); return error; }
--- a/fs/xfs/xfs_refcount_item.c +++ b/fs/xfs/xfs_refcount_item.c @@ -439,8 +439,8 @@ const struct xfs_defer_op_type xfs_refco */ int xfs_cui_recover( - struct xfs_trans *parent_tp, - struct xfs_cui_log_item *cuip) + struct xfs_cui_log_item *cuip, + struct list_head *capture_list) { int i; int error = 0; @@ -456,7 +456,7 @@ xfs_cui_recover( xfs_extlen_t new_len; struct xfs_bmbt_irec irec; bool requeue_only = false; - struct xfs_mount *mp = parent_tp->t_mountp; + struct xfs_mount *mp = cuip->cui_item.li_mountp;
ASSERT(!test_bit(XFS_CUI_RECOVERED, &cuip->cui_flags));
@@ -511,12 +511,7 @@ xfs_cui_recover( mp->m_refc_maxlevels * 2, 0, XFS_TRANS_RESERVE, &tp); if (error) return error; - /* - * Recovery stashes all deferred ops during intent processing and - * finishes them on completion. Transfer current dfops state to this - * transaction and transfer the result back before we return. - */ - xfs_defer_move(tp, parent_tp); + cudp = xfs_trans_get_cud(tp, cuip);
for (i = 0; i < cuip->cui_format.cui_nextents; i++) { @@ -574,13 +569,10 @@ xfs_cui_recover(
xfs_refcount_finish_one_cleanup(tp, rcur, error); set_bit(XFS_CUI_RECOVERED, &cuip->cui_flags); - xfs_defer_capture(parent_tp, tp); - error = xfs_trans_commit(tp); - return error; + return xfs_defer_ops_capture_and_commit(tp, capture_list);
abort_error: xfs_refcount_finish_one_cleanup(tp, rcur, error); - xfs_defer_move(parent_tp, tp); xfs_trans_cancel(tp); return error; } --- a/fs/xfs/xfs_refcount_item.h +++ b/fs/xfs/xfs_refcount_item.h @@ -80,6 +80,7 @@ extern struct kmem_zone *xfs_cud_zone; struct xfs_cui_log_item *xfs_cui_init(struct xfs_mount *, uint); void xfs_cui_item_free(struct xfs_cui_log_item *); void xfs_cui_release(struct xfs_cui_log_item *); -int xfs_cui_recover(struct xfs_trans *parent_tp, struct xfs_cui_log_item *cuip); +int xfs_cui_recover(struct xfs_cui_log_item *cuip, + struct list_head *capture_list);
#endif /* __XFS_REFCOUNT_ITEM_H__ */ --- a/fs/xfs/xfs_rmap_item.c +++ b/fs/xfs/xfs_rmap_item.c @@ -483,9 +483,10 @@ const struct xfs_defer_op_type xfs_rmap_ */ int xfs_rui_recover( - struct xfs_mount *mp, - struct xfs_rui_log_item *ruip) + struct xfs_rui_log_item *ruip, + struct list_head *capture_list) { + struct xfs_mount *mp = ruip->rui_item.li_mountp; int i; int error = 0; struct xfs_map_extent *rmap; @@ -592,8 +593,7 @@ xfs_rui_recover(
xfs_rmap_finish_one_cleanup(tp, rcur, error); set_bit(XFS_RUI_RECOVERED, &ruip->rui_flags); - error = xfs_trans_commit(tp); - return error; + return xfs_defer_ops_capture_and_commit(tp, capture_list);
abort_error: xfs_rmap_finish_one_cleanup(tp, rcur, error); --- a/fs/xfs/xfs_rmap_item.h +++ b/fs/xfs/xfs_rmap_item.h @@ -82,6 +82,7 @@ int xfs_rui_copy_format(struct xfs_log_i struct xfs_rui_log_format *dst_rui_fmt); void xfs_rui_item_free(struct xfs_rui_log_item *); void xfs_rui_release(struct xfs_rui_log_item *); -int xfs_rui_recover(struct xfs_mount *mp, struct xfs_rui_log_item *ruip); +int xfs_rui_recover(struct xfs_rui_log_item *ruip, + struct list_head *capture_list);
#endif /* __XFS_RMAP_ITEM_H__ */
From: "Darrick J. Wong" darrick.wong@oracle.com
commit 4f9a60c48078c0efa3459678fa8d6e050e8ada5d upstream.
When xfs_defer_capture extracts the deferred ops and transaction state from a transaction, it should record the remaining block reservations so that when we continue the dfops chain, we can reserve the same number of blocks to use. We capture the reservations for both data and realtime volumes.
This adds the requirement that every log intent item recovery function must be careful to reserve enough blocks to handle both itself and all defer ops that it can queue. On the other hand, this enables us to do away with the handwaving block estimation nonsense that was going on in xlog_finish_defer_ops.
Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Brian Foster bfoster@redhat.com Signed-off-by: Chandan Babu R chandan.babu@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/libxfs/xfs_defer.c | 4 ++++ fs/xfs/libxfs/xfs_defer.h | 4 ++++ fs/xfs/xfs_log_recover.c | 21 +++------------------ 3 files changed, 11 insertions(+), 18 deletions(-)
--- a/fs/xfs/libxfs/xfs_defer.c +++ b/fs/xfs/libxfs/xfs_defer.c @@ -589,6 +589,10 @@ xfs_defer_ops_capture( dfc->dfc_tpflags = tp->t_flags & XFS_TRANS_LOWMODE; tp->t_flags &= ~XFS_TRANS_LOWMODE;
+ /* Capture the remaining block reservations along with the dfops. */ + dfc->dfc_blkres = tp->t_blk_res - tp->t_blk_res_used; + dfc->dfc_rtxres = tp->t_rtx_res - tp->t_rtx_res_used; + return dfc; }
--- a/fs/xfs/libxfs/xfs_defer.h +++ b/fs/xfs/libxfs/xfs_defer.h @@ -73,6 +73,10 @@ struct xfs_defer_capture { /* Deferred ops state saved from the transaction. */ struct list_head dfc_dfops; unsigned int dfc_tpflags; + + /* Block reservations for the data and rt devices. */ + unsigned int dfc_blkres; + unsigned int dfc_rtxres; };
/* --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -4766,27 +4766,12 @@ xlog_finish_defer_ops( { struct xfs_defer_capture *dfc, *next; struct xfs_trans *tp; - int64_t freeblks; - uint64_t resblks; int error = 0;
list_for_each_entry_safe(dfc, next, capture_list, dfc_list) { - /* - * We're finishing the defer_ops that accumulated as a result - * of recovering unfinished intent items during log recovery. - * We reserve an itruncate transaction because it is the - * largest permanent transaction type. Since we're the only - * user of the fs right now, take 93% (15/16) of the available - * free blocks. Use weird math to avoid a 64-bit division. - */ - freeblks = percpu_counter_sum(&mp->m_fdblocks); - if (freeblks <= 0) - return -ENOSPC; - - resblks = min_t(uint64_t, UINT_MAX, freeblks); - resblks = (resblks * 15) >> 4; - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks, - 0, XFS_TRANS_RESERVE, &tp); + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, + dfc->dfc_blkres, dfc->dfc_rtxres, + XFS_TRANS_RESERVE, &tp); if (error) return error;
From: "Darrick J. Wong" darrick.wong@oracle.com
commit 929b92f64048d90d23e40a59c47adf59f5026903 upstream.
When xfs_defer_capture extracts the deferred ops and transaction state from a transaction, it should record the transaction reservation type from the old transaction so that when we continue the dfops chain, we still use the same reservation parameters.
Doing this means that the log item recovery functions get to determine the transaction reservation instead of abusing tr_itruncate in yet another part of xfs.
Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Brian Foster bfoster@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Chandan Babu R chandan.babu@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/libxfs/xfs_defer.c | 3 +++ fs/xfs/libxfs/xfs_defer.h | 3 +++ fs/xfs/xfs_log_recover.c | 17 ++++++++++++++--- 3 files changed, 20 insertions(+), 3 deletions(-)
--- a/fs/xfs/libxfs/xfs_defer.c +++ b/fs/xfs/libxfs/xfs_defer.c @@ -593,6 +593,9 @@ xfs_defer_ops_capture( dfc->dfc_blkres = tp->t_blk_res - tp->t_blk_res_used; dfc->dfc_rtxres = tp->t_rtx_res - tp->t_rtx_res_used;
+ /* Preserve the log reservation size. */ + dfc->dfc_logres = tp->t_log_res; + return dfc; }
--- a/fs/xfs/libxfs/xfs_defer.h +++ b/fs/xfs/libxfs/xfs_defer.h @@ -77,6 +77,9 @@ struct xfs_defer_capture { /* Block reservations for the data and rt devices. */ unsigned int dfc_blkres; unsigned int dfc_rtxres; + + /* Log reservation saved from the transaction. */ + unsigned int dfc_logres; };
/* --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -4769,9 +4769,20 @@ xlog_finish_defer_ops( int error = 0;
list_for_each_entry_safe(dfc, next, capture_list, dfc_list) { - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, - dfc->dfc_blkres, dfc->dfc_rtxres, - XFS_TRANS_RESERVE, &tp); + struct xfs_trans_res resv; + + /* + * Create a new transaction reservation from the captured + * information. Set logcount to 1 to force the new transaction + * to regrant every roll so that we can make forward progress + * in recovery no matter how full the log might be. + */ + resv.tr_logres = dfc->dfc_logres; + resv.tr_logcount = 1; + resv.tr_logflags = XFS_TRANS_PERM_LOG_RES; + + error = xfs_trans_alloc(mp, &resv, dfc->dfc_blkres, + dfc->dfc_rtxres, XFS_TRANS_RESERVE, &tp); if (error) return error;
From: "Darrick J. Wong" darrick.wong@oracle.com
commit 919522e89f8e71fc6a8f8abe17be4011573c6ea0 upstream.
The bmap intent item checking code in xfs_bui_item_recover is spread all over the function. We should check the recovered log item at the top before we allocate any resources or do anything else, so do that.
Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Dave Chinner dchinner@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Chandan Babu R chandan.babu@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_bmap_item.c | 38 ++++++++++++-------------------------- 1 file changed, 12 insertions(+), 26 deletions(-)
--- a/fs/xfs/xfs_bmap_item.c +++ b/fs/xfs/xfs_bmap_item.c @@ -434,9 +434,7 @@ xfs_bui_recover( xfs_fsblock_t startblock_fsb; xfs_fsblock_t inode_fsb; xfs_filblks_t count; - bool op_ok; struct xfs_bud_log_item *budp; - enum xfs_bmap_intent_type type; int whichfork; xfs_exntst_t state; struct xfs_trans *tp; @@ -462,16 +460,19 @@ xfs_bui_recover( XFS_FSB_TO_DADDR(mp, bmap->me_startblock)); inode_fsb = XFS_BB_TO_FSB(mp, XFS_FSB_TO_DADDR(mp, XFS_INO_TO_FSB(mp, bmap->me_owner))); - switch (bmap->me_flags & XFS_BMAP_EXTENT_TYPE_MASK) { + state = (bmap->me_flags & XFS_BMAP_EXTENT_UNWRITTEN) ? + XFS_EXT_UNWRITTEN : XFS_EXT_NORM; + whichfork = (bmap->me_flags & XFS_BMAP_EXTENT_ATTR_FORK) ? + XFS_ATTR_FORK : XFS_DATA_FORK; + bui_type = bmap->me_flags & XFS_BMAP_EXTENT_TYPE_MASK; + switch (bui_type) { case XFS_BMAP_MAP: case XFS_BMAP_UNMAP: - op_ok = true; break; default: - op_ok = false; - break; + return -EFSCORRUPTED; } - if (!op_ok || startblock_fsb == 0 || + if (startblock_fsb == 0 || bmap->me_len == 0 || inode_fsb == 0 || startblock_fsb >= mp->m_sb.sb_dblocks || @@ -502,32 +503,17 @@ xfs_bui_recover( if (VFS_I(ip)->i_nlink == 0) xfs_iflags_set(ip, XFS_IRECOVERY);
- /* Process deferred bmap item. */ - state = (bmap->me_flags & XFS_BMAP_EXTENT_UNWRITTEN) ? - XFS_EXT_UNWRITTEN : XFS_EXT_NORM; - whichfork = (bmap->me_flags & XFS_BMAP_EXTENT_ATTR_FORK) ? - XFS_ATTR_FORK : XFS_DATA_FORK; - bui_type = bmap->me_flags & XFS_BMAP_EXTENT_TYPE_MASK; - switch (bui_type) { - case XFS_BMAP_MAP: - case XFS_BMAP_UNMAP: - type = bui_type; - break; - default: - XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp); - error = -EFSCORRUPTED; - goto err_inode; - } xfs_trans_ijoin(tp, ip, 0);
count = bmap->me_len; - error = xfs_trans_log_finish_bmap_update(tp, budp, type, ip, whichfork, - bmap->me_startoff, bmap->me_startblock, &count, state); + error = xfs_trans_log_finish_bmap_update(tp, budp, bui_type, ip, + whichfork, bmap->me_startoff, bmap->me_startblock, + &count, state); if (error) goto err_inode;
if (count > 0) { - ASSERT(type == XFS_BMAP_UNMAP); + ASSERT(bui_type == XFS_BMAP_UNMAP); irec.br_startblock = bmap->me_startblock; irec.br_blockcount = count; irec.br_startoff = bmap->me_startoff;
From: "Darrick J. Wong" darrick.wong@oracle.com
commit 64a3f3315bc60f710a0a25c1798ac0ea58c6fa1f upstream.
In most places in XFS, we have a specific order in which we gather resources: grab the inode, allocate a transaction, then lock the inode. xfs_bui_item_recover doesn't do it in that order, so fix it to be more consistent. This also makes the error bailout code a bit less weird.
Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Dave Chinner dchinner@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Brian Foster bfoster@redhat.com Signed-off-by: Chandan Babu R chandan.babu@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_bmap_item.c | 38 ++++++++++++++++++++++++-------------- 1 file changed, 24 insertions(+), 14 deletions(-)
--- a/fs/xfs/xfs_bmap_item.c +++ b/fs/xfs/xfs_bmap_item.c @@ -22,6 +22,7 @@ #include "xfs_bmap_btree.h" #include "xfs_trans_space.h" #include "xfs_error.h" +#include "xfs_quota.h"
kmem_zone_t *xfs_bui_zone; kmem_zone_t *xfs_bud_zone; @@ -488,21 +489,26 @@ xfs_bui_recover( return -EFSCORRUPTED; }
- error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, - XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK), 0, 0, &tp); + /* Grab the inode. */ + error = xfs_iget(mp, NULL, bmap->me_owner, 0, 0, &ip); if (error) return error;
- budp = xfs_trans_get_bud(tp, buip); - - /* Grab the inode. */ - error = xfs_iget(mp, tp, bmap->me_owner, 0, XFS_ILOCK_EXCL, &ip); + error = xfs_qm_dqattach(ip); if (error) - goto err_inode; + goto err_rele;
if (VFS_I(ip)->i_nlink == 0) xfs_iflags_set(ip, XFS_IRECOVERY);
+ /* Allocate transaction and do the work. */ + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, + XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK), 0, 0, &tp); + if (error) + goto err_rele; + + budp = xfs_trans_get_bud(tp, buip); + xfs_ilock(ip, XFS_ILOCK_EXCL); xfs_trans_ijoin(tp, ip, 0);
count = bmap->me_len; @@ -510,7 +516,7 @@ xfs_bui_recover( whichfork, bmap->me_startoff, bmap->me_startblock, &count, state); if (error) - goto err_inode; + goto err_cancel;
if (count > 0) { ASSERT(bui_type == XFS_BMAP_UNMAP); @@ -522,16 +528,20 @@ xfs_bui_recover( }
set_bit(XFS_BUI_RECOVERED, &buip->bui_flags); + /* Commit transaction, which frees the transaction. */ error = xfs_defer_ops_capture_and_commit(tp, capture_list); + if (error) + goto err_unlock; + xfs_iunlock(ip, XFS_ILOCK_EXCL); xfs_irele(ip); - return error; + return 0;
-err_inode: +err_cancel: xfs_trans_cancel(tp); - if (ip) { - xfs_iunlock(ip, XFS_ILOCK_EXCL); - xfs_irele(ip); - } +err_unlock: + xfs_iunlock(ip, XFS_ILOCK_EXCL); +err_rele: + xfs_irele(ip); return error; }
From: "Darrick J. Wong" darrick.wong@oracle.com
commit ff4ab5e02a0447dd1e290883eb6cd7d94848e590 upstream.
In xfs_bui_item_recover, there exists a use-after-free bug with regards to the inode that is involved in the bmap replay operation. If the mapping operation does not complete, we call xfs_bmap_unmap_extent to create a deferred op to finish the unmapping work, and we retain a pointer to the incore inode.
Unfortunately, the very next thing we do is commit the transaction and drop the inode. If reclaim tears down the inode before we try to finish the defer ops, we dereference garbage and blow up. Therefore, create a way to join inodes to the defer ops freezer so that we can maintain the xfs_inode reference until we're done with the inode.
Note: This imposes the requirement that there be enough memory to keep every incore inode in memory throughout recovery.
Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Brian Foster bfoster@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Chandan Babu R chandan.babu@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/libxfs/xfs_defer.c | 43 ++++++++++++++++++++++++++++++++++++++----- fs/xfs/libxfs/xfs_defer.h | 11 +++++++++-- fs/xfs/xfs_bmap_item.c | 7 +++++-- fs/xfs/xfs_extfree_item.c | 2 +- fs/xfs/xfs_log_recover.c | 7 ++++++- fs/xfs/xfs_refcount_item.c | 2 +- fs/xfs/xfs_rmap_item.c | 2 +- 7 files changed, 61 insertions(+), 13 deletions(-)
--- a/fs/xfs/libxfs/xfs_defer.c +++ b/fs/xfs/libxfs/xfs_defer.c @@ -16,6 +16,7 @@ #include "xfs_inode.h" #include "xfs_inode_item.h" #include "xfs_trace.h" +#include "xfs_icache.h"
/* * Deferred Operations in XFS @@ -567,10 +568,14 @@ xfs_defer_move( * deferred ops state is transferred to the capture structure and the * transaction is then ready for the caller to commit it. If there are no * intent items to capture, this function returns NULL. + * + * If capture_ip is not NULL, the capture structure will obtain an extra + * reference to the inode. */ static struct xfs_defer_capture * xfs_defer_ops_capture( - struct xfs_trans *tp) + struct xfs_trans *tp, + struct xfs_inode *capture_ip) { struct xfs_defer_capture *dfc;
@@ -596,6 +601,15 @@ xfs_defer_ops_capture( /* Preserve the log reservation size. */ dfc->dfc_logres = tp->t_log_res;
+ /* + * Grab an extra reference to this inode and attach it to the capture + * structure. + */ + if (capture_ip) { + ihold(VFS_I(capture_ip)); + dfc->dfc_capture_ip = capture_ip; + } + return dfc; }
@@ -606,24 +620,33 @@ xfs_defer_ops_release( struct xfs_defer_capture *dfc) { xfs_defer_cancel_list(mp, &dfc->dfc_dfops); + if (dfc->dfc_capture_ip) + xfs_irele(dfc->dfc_capture_ip); kmem_free(dfc); }
/* * Capture any deferred ops and commit the transaction. This is the last step - * needed to finish a log intent item that we recovered from the log. + * needed to finish a log intent item that we recovered from the log. If any + * of the deferred ops operate on an inode, the caller must pass in that inode + * so that the reference can be transferred to the capture structure. The + * caller must hold ILOCK_EXCL on the inode, and must unlock it before calling + * xfs_defer_ops_continue. */ int xfs_defer_ops_capture_and_commit( struct xfs_trans *tp, + struct xfs_inode *capture_ip, struct list_head *capture_list) { struct xfs_mount *mp = tp->t_mountp; struct xfs_defer_capture *dfc; int error;
+ ASSERT(!capture_ip || xfs_isilocked(capture_ip, XFS_ILOCK_EXCL)); + /* If we don't capture anything, commit transaction and exit. */ - dfc = xfs_defer_ops_capture(tp); + dfc = xfs_defer_ops_capture(tp, capture_ip); if (!dfc) return xfs_trans_commit(tp);
@@ -640,16 +663,26 @@ xfs_defer_ops_capture_and_commit(
/* * Attach a chain of captured deferred ops to a new transaction and free the - * capture structure. + * capture structure. If an inode was captured, it will be passed back to the + * caller with ILOCK_EXCL held and joined to the transaction with lockflags==0. + * The caller now owns the inode reference. */ void xfs_defer_ops_continue( struct xfs_defer_capture *dfc, - struct xfs_trans *tp) + struct xfs_trans *tp, + struct xfs_inode **captured_ipp) { ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES); ASSERT(!(tp->t_flags & XFS_TRANS_DIRTY));
+ /* Lock and join the captured inode to the new transaction. */ + if (dfc->dfc_capture_ip) { + xfs_ilock(dfc->dfc_capture_ip, XFS_ILOCK_EXCL); + xfs_trans_ijoin(tp, dfc->dfc_capture_ip, 0); + } + *captured_ipp = dfc->dfc_capture_ip; + /* Move captured dfops chain and state to the transaction. */ list_splice_init(&dfc->dfc_dfops, &tp->t_dfops); tp->t_flags |= dfc->dfc_tpflags; --- a/fs/xfs/libxfs/xfs_defer.h +++ b/fs/xfs/libxfs/xfs_defer.h @@ -80,6 +80,12 @@ struct xfs_defer_capture {
/* Log reservation saved from the transaction. */ unsigned int dfc_logres; + + /* + * An inode reference that must be maintained to complete the deferred + * work. + */ + struct xfs_inode *dfc_capture_ip; };
/* @@ -87,8 +93,9 @@ struct xfs_defer_capture { * This doesn't normally happen except log recovery. */ int xfs_defer_ops_capture_and_commit(struct xfs_trans *tp, - struct list_head *capture_list); -void xfs_defer_ops_continue(struct xfs_defer_capture *d, struct xfs_trans *tp); + struct xfs_inode *capture_ip, struct list_head *capture_list); +void xfs_defer_ops_continue(struct xfs_defer_capture *d, struct xfs_trans *tp, + struct xfs_inode **captured_ipp); void xfs_defer_ops_release(struct xfs_mount *mp, struct xfs_defer_capture *d);
#endif /* __XFS_DEFER_H__ */ --- a/fs/xfs/xfs_bmap_item.c +++ b/fs/xfs/xfs_bmap_item.c @@ -528,8 +528,11 @@ xfs_bui_recover( }
set_bit(XFS_BUI_RECOVERED, &buip->bui_flags); - /* Commit transaction, which frees the transaction. */ - error = xfs_defer_ops_capture_and_commit(tp, capture_list); + /* + * Commit transaction, which frees the transaction and saves the inode + * for later replay activities. + */ + error = xfs_defer_ops_capture_and_commit(tp, ip, capture_list); if (error) goto err_unlock;
--- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -639,7 +639,7 @@ xfs_efi_recover(
set_bit(XFS_EFI_RECOVERED, &efip->efi_flags);
- return xfs_defer_ops_capture_and_commit(tp, capture_list); + return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
abort_error: xfs_trans_cancel(tp); --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -4766,6 +4766,7 @@ xlog_finish_defer_ops( { struct xfs_defer_capture *dfc, *next; struct xfs_trans *tp; + struct xfs_inode *ip; int error = 0;
list_for_each_entry_safe(dfc, next, capture_list, dfc_list) { @@ -4791,9 +4792,13 @@ xlog_finish_defer_ops( * from recovering a single intent item. */ list_del_init(&dfc->dfc_list); - xfs_defer_ops_continue(dfc, tp); + xfs_defer_ops_continue(dfc, tp, &ip);
error = xfs_trans_commit(tp); + if (ip) { + xfs_iunlock(ip, XFS_ILOCK_EXCL); + xfs_irele(ip); + } if (error) return error; } --- a/fs/xfs/xfs_refcount_item.c +++ b/fs/xfs/xfs_refcount_item.c @@ -569,7 +569,7 @@ xfs_cui_recover(
xfs_refcount_finish_one_cleanup(tp, rcur, error); set_bit(XFS_CUI_RECOVERED, &cuip->cui_flags); - return xfs_defer_ops_capture_and_commit(tp, capture_list); + return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
abort_error: xfs_refcount_finish_one_cleanup(tp, rcur, error); --- a/fs/xfs/xfs_rmap_item.c +++ b/fs/xfs/xfs_rmap_item.c @@ -593,7 +593,7 @@ xfs_rui_recover(
xfs_rmap_finish_one_cleanup(tp, rcur, error); set_bit(XFS_RUI_RECOVERED, &ruip->rui_flags); - return xfs_defer_ops_capture_and_commit(tp, capture_list); + return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
abort_error: xfs_rmap_finish_one_cleanup(tp, rcur, error);
From: "Darrick J. Wong" darrick.wong@oracle.com
commit 27dada070d59c28a441f1907d2cec891b17dcb26 upstream.
The defer ops code has been finishing items in the wrong order -- if a top level defer op creates items A and B, and finishing item A creates more defer ops A1 and A2, we'll put the new items on the end of the chain and process them in the order A B A1 A2. This is kind of weird, since it's convenient for programmers to be able to think of A and B as an ordered sequence where all the sub-tasks for A must finish before we move on to B, e.g. A A1 A2 D.
Right now, our log intent items are not so complex that this matters, but this will become important for the atomic extent swapping patchset. In order to maintain correct reference counting of extents, we have to unmap and remap extents in that order, and we want to complete that work before moving on to the next range that the user wants to swap. This patch fixes defer ops to satsify that requirement.
The primary symptom of the incorrect order was noticed in an early performance analysis of the atomic extent swap code. An astonishingly large number of deferred work items accumulated when userspace requested an atomic update of two very fragmented files. The cause of this was traced to the same ordering bug in the inner loop of xfs_defer_finish_noroll.
If the ->finish_item method of a deferred operation queues new deferred operations, those new deferred ops are appended to the tail of the pending work list. To illustrate, say that a caller creates a transaction t0 with four deferred operations D0-D3. The first thing defer ops does is roll the transaction to t1, leaving us with:
t1: D0(t0), D1(t0), D2(t0), D3(t0)
Let's say that finishing each of D0-D3 will create two new deferred ops. After finish D0 and roll, we'll have the following chain:
t2: D1(t0), D2(t0), D3(t0), d4(t1), d5(t1)
d4 and d5 were logged to t1. Notice that while we're about to start work on D1, we haven't actually completed all the work implied by D0 being finished. So far we've been careful (or lucky) to structure the dfops callers such that D1 doesn't depend on d4 or d5 being finished, but this is a potential logic bomb.
There's a second problem lurking. Let's see what happens as we finish D1-D3:
t3: D2(t0), D3(t0), d4(t1), d5(t1), d6(t2), d7(t2) t4: D3(t0), d4(t1), d5(t1), d6(t2), d7(t2), d8(t3), d9(t3) t5: d4(t1), d5(t1), d6(t2), d7(t2), d8(t3), d9(t3), d10(t4), d11(t4)
Let's say that d4-d11 are simple work items that don't queue any other operations, which means that we can complete each d4 and roll to t6:
t6: d5(t1), d6(t2), d7(t2), d8(t3), d9(t3), d10(t4), d11(t4) t7: d6(t2), d7(t2), d8(t3), d9(t3), d10(t4), d11(t4) ... t11: d10(t4), d11(t4) t12: d11(t4) <done>
When we try to roll to transaction #12, we're holding defer op d11, which we logged way back in t4. This means that the tail of the log is pinned at t4. If the log is very small or there are a lot of other threads updating metadata, this means that we might have wrapped the log and cannot get roll to t11 because there isn't enough space left before we'd run into t4.
Let's shift back to the original failure. I mentioned before that I discovered this flaw while developing the atomic file update code. In that scenario, we have a defer op (D0) that finds a range of file blocks to remap, creates a handful of new defer ops to do that, and then asks to be continued with however much work remains.
So, D0 is the original swapext deferred op. The first thing defer ops does is rolls to t1:
t1: D0(t0)
We try to finish D0, logging d1 and d2 in the process, but can't get all the work done. We log a done item and a new intent item for the work that D0 still has to do, and roll to t2:
t2: D0'(t1), d1(t1), d2(t1)
We roll and try to finish D0', but still can't get all the work done, so we log a done item and a new intent item for it, requeue D0 a second time, and roll to t3:
t3: D0''(t2), d1(t1), d2(t1), d3(t2), d4(t2)
If it takes 48 more rolls to complete D0, then we'll finally dispense with D0 in t50:
t50: D<fifty primes>(t49), d1(t1), ..., d102(t50)
We then try to roll again to get a chain like this:
t51: d1(t1), d2(t1), ..., d101(t50), d102(t50) ... t152: d102(t50) <done>
Notice that in rolling to transaction #51, we're holding on to a log intent item for d1 that was logged in transaction #1. This means that the tail of the log is pinned at t1. If the log is very small or there are a lot of other threads updating metadata, this means that we might have wrapped the log and cannot roll to t51 because there isn't enough space left before we'd run into t1. This is of course problem #2 again.
But notice the third problem with this scenario: we have 102 defer ops tied to this transaction! Each of these items are backed by pinned kernel memory, which means that we risk OOM if the chains get too long.
Yikes. Problem #1 is a subtle logic bomb that could hit someone in the future; problem #2 applies (rarely) to the current upstream, and problem
This is not how incremental deferred operations were supposed to work. The dfops design of logging in the same transaction an intent-done item and a new intent item for the work remaining was to make it so that we only have to juggle enough deferred work items to finish that one small piece of work. Deferred log item recovery will find that first unfinished work item and restart it, no matter how many other intent items might follow it in the log. Therefore, it's ok to put the new intents at the start of the dfops chain.
For the first example, the chains look like this:
t2: d4(t1), d5(t1), D1(t0), D2(t0), D3(t0) t3: d5(t1), D1(t0), D2(t0), D3(t0) ... t9: d9(t7), D3(t0) t10: D3(t0) t11: d10(t10), d11(t10) t12: d11(t10)
For the second example, the chains look like this:
t1: D0(t0) t2: d1(t1), d2(t1), D0'(t1) t3: d2(t1), D0'(t1) t4: D0'(t1) t5: d1(t4), d2(t4), D0''(t4) ... t148: D0<50 primes>(t147) t149: d101(t148), d102(t148) t150: d102(t148) <done>
This actually sucks more for pinning the log tail (we try to roll to t10 while holding an intent item that was logged in t1) but we've solved problem #1. We've also reduced the maximum chain length from:
sum(all the new items) + nr_original_items
to:
max(new items that each original item creates) + nr_original_items
This solves problem #3 by sharply reducing the number of defer ops that can be attached to a transaction at any given time. The change makes the problem of log tail pinning worse, but is improvement we need to solve problem #2. Actually solving #2, however, is left to the next patch.
Note that a subsequent analysis of some hard-to-trigger reflink and COW livelocks on extremely fragmented filesystems (or systems running a lot of IO threads) showed the same symptoms -- uncomfortably large numbers of incore deferred work items and occasional stalls in the transaction grant code while waiting for log reservations. I think this patch and the next one will also solve these problems.
As originally written, the code used list_splice_tail_init instead of list_splice_init, so change that, and leave a short comment explaining our actions.
Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Dave Chinner dchinner@redhat.com Reviewed-by: Brian Foster bfoster@redhat.com Signed-off-by: Chandan Babu R chandan.babu@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/libxfs/xfs_defer.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
--- a/fs/xfs/libxfs/xfs_defer.c +++ b/fs/xfs/libxfs/xfs_defer.c @@ -431,8 +431,17 @@ xfs_defer_finish_noroll(
/* Until we run out of pending work to finish... */ while (!list_empty(&dop_pending) || !list_empty(&(*tp)->t_dfops)) { + /* + * Deferred items that are created in the process of finishing + * other deferred work items should be queued at the head of + * the pending list, which puts them ahead of the deferred work + * that was created by the caller. This keeps the number of + * pending work items to a minimum, which decreases the amount + * of time that any one intent item can stick around in memory, + * pinning the log tail. + */ xfs_defer_create_intents(*tp); - list_splice_tail_init(&(*tp)->t_dfops, &dop_pending); + list_splice_init(&(*tp)->t_dfops, &dop_pending);
error = xfs_defer_trans_roll(tp); if (error)
From: "Darrick J. Wong" darrick.wong@oracle.com
commit 4e919af7827a6adfc28e82cd6c4ffcfcc3dd6118 upstream.
[ Modify xfs_{bmap|extfree|refcount|rmap}_item.c to fix merge conflicts ]
There's a subtle design flaw in the deferred log item code that can lead to pinning the log tail. Taking up the defer ops chain examples from the previous commit, we can get trapped in sequences like this:
Caller hands us a transaction t0 with D0-D3 attached. The defer ops chain will look like the following if the transaction rolls succeed:
t1: D0(t0), D1(t0), D2(t0), D3(t0) t2: d4(t1), d5(t1), D1(t0), D2(t0), D3(t0) t3: d5(t1), D1(t0), D2(t0), D3(t0) ... t9: d9(t7), D3(t0) t10: D3(t0) t11: d10(t10), d11(t10) t12: d11(t10)
In transaction 9, we finish d9 and try to roll to t10 while holding onto an intent item for D3 that we logged in t0.
The previous commit changed the order in which we place new defer ops in the defer ops processing chain to reduce the maximum chain length. Now make xfs_defer_finish_noroll capable of relogging the entire chain periodically so that we can always move the log tail forward. Most chains will never get relogged, except for operations that generate very long chains (large extents containing many blocks with different sharing levels) or are on filesystems with small logs and a lot of ongoing metadata updates.
Callers are now required to ensure that the transaction reservation is large enough to handle logging done items and new intent items for the maximum possible chain length. Most callers are careful to keep the chain lengths low, so the overhead should be minimal.
The decision to relog an intent item is made based on whether the intent was logged in a previous checkpoint, since there's no point in relogging an intent into the same checkpoint.
Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Brian Foster bfoster@redhat.com Signed-off-by: Chandan Babu R chandan.babu@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/libxfs/xfs_defer.c | 42 ++++++++++++++++++ fs/xfs/xfs_bmap_item.c | 83 +++++++++++++++++++++++------------ fs/xfs/xfs_extfree_item.c | 104 ++++++++++++++++++++++++++++----------------- fs/xfs/xfs_refcount_item.c | 95 ++++++++++++++++++++++++++--------------- fs/xfs/xfs_rmap_item.c | 93 +++++++++++++++++++++++++--------------- fs/xfs/xfs_stats.c | 4 + fs/xfs/xfs_stats.h | 1 fs/xfs/xfs_trace.h | 1 fs/xfs/xfs_trans.h | 10 ++++ 9 files changed, 300 insertions(+), 133 deletions(-)
--- a/fs/xfs/libxfs/xfs_defer.c +++ b/fs/xfs/libxfs/xfs_defer.c @@ -17,6 +17,7 @@ #include "xfs_inode_item.h" #include "xfs_trace.h" #include "xfs_icache.h" +#include "xfs_log.h"
/* * Deferred Operations in XFS @@ -362,6 +363,42 @@ xfs_defer_cancel_list( }
/* + * Prevent a log intent item from pinning the tail of the log by logging a + * done item to release the intent item; and then log a new intent item. + * The caller should provide a fresh transaction and roll it after we're done. + */ +static int +xfs_defer_relog( + struct xfs_trans **tpp, + struct list_head *dfops) +{ + struct xfs_defer_pending *dfp; + + ASSERT((*tpp)->t_flags & XFS_TRANS_PERM_LOG_RES); + + list_for_each_entry(dfp, dfops, dfp_list) { + /* + * If the log intent item for this deferred op is not a part of + * the current log checkpoint, relog the intent item to keep + * the log tail moving forward. We're ok with this being racy + * because an incorrect decision means we'll be a little slower + * at pushing the tail. + */ + if (dfp->dfp_intent == NULL || + xfs_log_item_in_current_chkpt(dfp->dfp_intent)) + continue; + + trace_xfs_defer_relog_intent((*tpp)->t_mountp, dfp); + XFS_STATS_INC((*tpp)->t_mountp, defer_relog); + dfp->dfp_intent = xfs_trans_item_relog(dfp->dfp_intent, *tpp); + } + + if ((*tpp)->t_flags & XFS_TRANS_DIRTY) + return xfs_defer_trans_roll(tpp); + return 0; +} + +/* * Log an intent-done item for the first pending intent, and finish the work * items. */ @@ -447,6 +484,11 @@ xfs_defer_finish_noroll( if (error) goto out_shutdown;
+ /* Possibly relog intent items to keep the log moving. */ + error = xfs_defer_relog(tp, &dop_pending); + if (error) + goto out_shutdown; + dfp = list_first_entry(&dop_pending, struct xfs_defer_pending, dfp_list); error = xfs_defer_finish_one(*tp, dfp); --- a/fs/xfs/xfs_bmap_item.c +++ b/fs/xfs/xfs_bmap_item.c @@ -125,34 +125,6 @@ xfs_bui_item_release( xfs_bui_release(BUI_ITEM(lip)); }
-static const struct xfs_item_ops xfs_bui_item_ops = { - .iop_size = xfs_bui_item_size, - .iop_format = xfs_bui_item_format, - .iop_unpin = xfs_bui_item_unpin, - .iop_release = xfs_bui_item_release, -}; - -/* - * Allocate and initialize an bui item with the given number of extents. - */ -struct xfs_bui_log_item * -xfs_bui_init( - struct xfs_mount *mp) - -{ - struct xfs_bui_log_item *buip; - - buip = kmem_zone_zalloc(xfs_bui_zone, 0); - - xfs_log_item_init(mp, &buip->bui_item, XFS_LI_BUI, &xfs_bui_item_ops); - buip->bui_format.bui_nextents = XFS_BUI_MAX_FAST_EXTENTS; - buip->bui_format.bui_id = (uintptr_t)(void *)buip; - atomic_set(&buip->bui_next_extent, 0); - atomic_set(&buip->bui_refcount, 2); - - return buip; -} - static inline struct xfs_bud_log_item *BUD_ITEM(struct xfs_log_item *lip) { return container_of(lip, struct xfs_bud_log_item, bud_item); @@ -548,3 +520,58 @@ err_rele: xfs_irele(ip); return error; } + +/* Relog an intent item to push the log tail forward. */ +static struct xfs_log_item * +xfs_bui_item_relog( + struct xfs_log_item *intent, + struct xfs_trans *tp) +{ + struct xfs_bud_log_item *budp; + struct xfs_bui_log_item *buip; + struct xfs_map_extent *extp; + unsigned int count; + + count = BUI_ITEM(intent)->bui_format.bui_nextents; + extp = BUI_ITEM(intent)->bui_format.bui_extents; + + tp->t_flags |= XFS_TRANS_DIRTY; + budp = xfs_trans_get_bud(tp, BUI_ITEM(intent)); + set_bit(XFS_LI_DIRTY, &budp->bud_item.li_flags); + + buip = xfs_bui_init(tp->t_mountp); + memcpy(buip->bui_format.bui_extents, extp, count * sizeof(*extp)); + atomic_set(&buip->bui_next_extent, count); + xfs_trans_add_item(tp, &buip->bui_item); + set_bit(XFS_LI_DIRTY, &buip->bui_item.li_flags); + return &buip->bui_item; +} + +static const struct xfs_item_ops xfs_bui_item_ops = { + .iop_size = xfs_bui_item_size, + .iop_format = xfs_bui_item_format, + .iop_unpin = xfs_bui_item_unpin, + .iop_release = xfs_bui_item_release, + .iop_relog = xfs_bui_item_relog, +}; + +/* + * Allocate and initialize an bui item with the given number of extents. + */ +struct xfs_bui_log_item * +xfs_bui_init( + struct xfs_mount *mp) + +{ + struct xfs_bui_log_item *buip; + + buip = kmem_zone_zalloc(xfs_bui_zone, 0); + + xfs_log_item_init(mp, &buip->bui_item, XFS_LI_BUI, &xfs_bui_item_ops); + buip->bui_format.bui_nextents = XFS_BUI_MAX_FAST_EXTENTS; + buip->bui_format.bui_id = (uintptr_t)(void *)buip; + atomic_set(&buip->bui_next_extent, 0); + atomic_set(&buip->bui_refcount, 2); + + return buip; +} --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -139,44 +139,6 @@ xfs_efi_item_release( xfs_efi_release(EFI_ITEM(lip)); }
-static const struct xfs_item_ops xfs_efi_item_ops = { - .iop_size = xfs_efi_item_size, - .iop_format = xfs_efi_item_format, - .iop_unpin = xfs_efi_item_unpin, - .iop_release = xfs_efi_item_release, -}; - - -/* - * Allocate and initialize an efi item with the given number of extents. - */ -struct xfs_efi_log_item * -xfs_efi_init( - struct xfs_mount *mp, - uint nextents) - -{ - struct xfs_efi_log_item *efip; - uint size; - - ASSERT(nextents > 0); - if (nextents > XFS_EFI_MAX_FAST_EXTENTS) { - size = (uint)(sizeof(struct xfs_efi_log_item) + - ((nextents - 1) * sizeof(xfs_extent_t))); - efip = kmem_zalloc(size, 0); - } else { - efip = kmem_zone_zalloc(xfs_efi_zone, 0); - } - - xfs_log_item_init(mp, &efip->efi_item, XFS_LI_EFI, &xfs_efi_item_ops); - efip->efi_format.efi_nextents = nextents; - efip->efi_format.efi_id = (uintptr_t)(void *)efip; - atomic_set(&efip->efi_next_extent, 0); - atomic_set(&efip->efi_refcount, 2); - - return efip; -} - /* * Copy an EFI format buffer from the given buf, and into the destination * EFI format structure. @@ -645,3 +607,69 @@ abort_error: xfs_trans_cancel(tp); return error; } + +/* Relog an intent item to push the log tail forward. */ +static struct xfs_log_item * +xfs_efi_item_relog( + struct xfs_log_item *intent, + struct xfs_trans *tp) +{ + struct xfs_efd_log_item *efdp; + struct xfs_efi_log_item *efip; + struct xfs_extent *extp; + unsigned int count; + + count = EFI_ITEM(intent)->efi_format.efi_nextents; + extp = EFI_ITEM(intent)->efi_format.efi_extents; + + tp->t_flags |= XFS_TRANS_DIRTY; + efdp = xfs_trans_get_efd(tp, EFI_ITEM(intent), count); + efdp->efd_next_extent = count; + memcpy(efdp->efd_format.efd_extents, extp, count * sizeof(*extp)); + set_bit(XFS_LI_DIRTY, &efdp->efd_item.li_flags); + + efip = xfs_efi_init(tp->t_mountp, count); + memcpy(efip->efi_format.efi_extents, extp, count * sizeof(*extp)); + atomic_set(&efip->efi_next_extent, count); + xfs_trans_add_item(tp, &efip->efi_item); + set_bit(XFS_LI_DIRTY, &efip->efi_item.li_flags); + return &efip->efi_item; +} + +static const struct xfs_item_ops xfs_efi_item_ops = { + .iop_size = xfs_efi_item_size, + .iop_format = xfs_efi_item_format, + .iop_unpin = xfs_efi_item_unpin, + .iop_release = xfs_efi_item_release, + .iop_relog = xfs_efi_item_relog, +}; + +/* + * Allocate and initialize an efi item with the given number of extents. + */ +struct xfs_efi_log_item * +xfs_efi_init( + struct xfs_mount *mp, + uint nextents) + +{ + struct xfs_efi_log_item *efip; + uint size; + + ASSERT(nextents > 0); + if (nextents > XFS_EFI_MAX_FAST_EXTENTS) { + size = (uint)(sizeof(struct xfs_efi_log_item) + + ((nextents - 1) * sizeof(xfs_extent_t))); + efip = kmem_zalloc(size, 0); + } else { + efip = kmem_zone_zalloc(xfs_efi_zone, 0); + } + + xfs_log_item_init(mp, &efip->efi_item, XFS_LI_EFI, &xfs_efi_item_ops); + efip->efi_format.efi_nextents = nextents; + efip->efi_format.efi_id = (uintptr_t)(void *)efip; + atomic_set(&efip->efi_next_extent, 0); + atomic_set(&efip->efi_refcount, 2); + + return efip; +} --- a/fs/xfs/xfs_refcount_item.c +++ b/fs/xfs/xfs_refcount_item.c @@ -123,40 +123,6 @@ xfs_cui_item_release( xfs_cui_release(CUI_ITEM(lip)); }
-static const struct xfs_item_ops xfs_cui_item_ops = { - .iop_size = xfs_cui_item_size, - .iop_format = xfs_cui_item_format, - .iop_unpin = xfs_cui_item_unpin, - .iop_release = xfs_cui_item_release, -}; - -/* - * Allocate and initialize an cui item with the given number of extents. - */ -struct xfs_cui_log_item * -xfs_cui_init( - struct xfs_mount *mp, - uint nextents) - -{ - struct xfs_cui_log_item *cuip; - - ASSERT(nextents > 0); - if (nextents > XFS_CUI_MAX_FAST_EXTENTS) - cuip = kmem_zalloc(xfs_cui_log_item_sizeof(nextents), - 0); - else - cuip = kmem_zone_zalloc(xfs_cui_zone, 0); - - xfs_log_item_init(mp, &cuip->cui_item, XFS_LI_CUI, &xfs_cui_item_ops); - cuip->cui_format.cui_nextents = nextents; - cuip->cui_format.cui_id = (uintptr_t)(void *)cuip; - atomic_set(&cuip->cui_next_extent, 0); - atomic_set(&cuip->cui_refcount, 2); - - return cuip; -} - static inline struct xfs_cud_log_item *CUD_ITEM(struct xfs_log_item *lip) { return container_of(lip, struct xfs_cud_log_item, cud_item); @@ -576,3 +542,64 @@ abort_error: xfs_trans_cancel(tp); return error; } + +/* Relog an intent item to push the log tail forward. */ +static struct xfs_log_item * +xfs_cui_item_relog( + struct xfs_log_item *intent, + struct xfs_trans *tp) +{ + struct xfs_cud_log_item *cudp; + struct xfs_cui_log_item *cuip; + struct xfs_phys_extent *extp; + unsigned int count; + + count = CUI_ITEM(intent)->cui_format.cui_nextents; + extp = CUI_ITEM(intent)->cui_format.cui_extents; + + tp->t_flags |= XFS_TRANS_DIRTY; + cudp = xfs_trans_get_cud(tp, CUI_ITEM(intent)); + set_bit(XFS_LI_DIRTY, &cudp->cud_item.li_flags); + + cuip = xfs_cui_init(tp->t_mountp, count); + memcpy(cuip->cui_format.cui_extents, extp, count * sizeof(*extp)); + atomic_set(&cuip->cui_next_extent, count); + xfs_trans_add_item(tp, &cuip->cui_item); + set_bit(XFS_LI_DIRTY, &cuip->cui_item.li_flags); + return &cuip->cui_item; +} + +static const struct xfs_item_ops xfs_cui_item_ops = { + .iop_size = xfs_cui_item_size, + .iop_format = xfs_cui_item_format, + .iop_unpin = xfs_cui_item_unpin, + .iop_release = xfs_cui_item_release, + .iop_relog = xfs_cui_item_relog, +}; + +/* + * Allocate and initialize an cui item with the given number of extents. + */ +struct xfs_cui_log_item * +xfs_cui_init( + struct xfs_mount *mp, + uint nextents) + +{ + struct xfs_cui_log_item *cuip; + + ASSERT(nextents > 0); + if (nextents > XFS_CUI_MAX_FAST_EXTENTS) + cuip = kmem_zalloc(xfs_cui_log_item_sizeof(nextents), + 0); + else + cuip = kmem_zone_zalloc(xfs_cui_zone, 0); + + xfs_log_item_init(mp, &cuip->cui_item, XFS_LI_CUI, &xfs_cui_item_ops); + cuip->cui_format.cui_nextents = nextents; + cuip->cui_format.cui_id = (uintptr_t)(void *)cuip; + atomic_set(&cuip->cui_next_extent, 0); + atomic_set(&cuip->cui_refcount, 2); + + return cuip; +} --- a/fs/xfs/xfs_rmap_item.c +++ b/fs/xfs/xfs_rmap_item.c @@ -122,39 +122,6 @@ xfs_rui_item_release( xfs_rui_release(RUI_ITEM(lip)); }
-static const struct xfs_item_ops xfs_rui_item_ops = { - .iop_size = xfs_rui_item_size, - .iop_format = xfs_rui_item_format, - .iop_unpin = xfs_rui_item_unpin, - .iop_release = xfs_rui_item_release, -}; - -/* - * Allocate and initialize an rui item with the given number of extents. - */ -struct xfs_rui_log_item * -xfs_rui_init( - struct xfs_mount *mp, - uint nextents) - -{ - struct xfs_rui_log_item *ruip; - - ASSERT(nextents > 0); - if (nextents > XFS_RUI_MAX_FAST_EXTENTS) - ruip = kmem_zalloc(xfs_rui_log_item_sizeof(nextents), 0); - else - ruip = kmem_zone_zalloc(xfs_rui_zone, 0); - - xfs_log_item_init(mp, &ruip->rui_item, XFS_LI_RUI, &xfs_rui_item_ops); - ruip->rui_format.rui_nextents = nextents; - ruip->rui_format.rui_id = (uintptr_t)(void *)ruip; - atomic_set(&ruip->rui_next_extent, 0); - atomic_set(&ruip->rui_refcount, 2); - - return ruip; -} - /* * Copy an RUI format buffer from the given buf, and into the destination * RUI format structure. The RUI/RUD items were designed not to need any @@ -600,3 +567,63 @@ abort_error: xfs_trans_cancel(tp); return error; } + +/* Relog an intent item to push the log tail forward. */ +static struct xfs_log_item * +xfs_rui_item_relog( + struct xfs_log_item *intent, + struct xfs_trans *tp) +{ + struct xfs_rud_log_item *rudp; + struct xfs_rui_log_item *ruip; + struct xfs_map_extent *extp; + unsigned int count; + + count = RUI_ITEM(intent)->rui_format.rui_nextents; + extp = RUI_ITEM(intent)->rui_format.rui_extents; + + tp->t_flags |= XFS_TRANS_DIRTY; + rudp = xfs_trans_get_rud(tp, RUI_ITEM(intent)); + set_bit(XFS_LI_DIRTY, &rudp->rud_item.li_flags); + + ruip = xfs_rui_init(tp->t_mountp, count); + memcpy(ruip->rui_format.rui_extents, extp, count * sizeof(*extp)); + atomic_set(&ruip->rui_next_extent, count); + xfs_trans_add_item(tp, &ruip->rui_item); + set_bit(XFS_LI_DIRTY, &ruip->rui_item.li_flags); + return &ruip->rui_item; +} + +static const struct xfs_item_ops xfs_rui_item_ops = { + .iop_size = xfs_rui_item_size, + .iop_format = xfs_rui_item_format, + .iop_unpin = xfs_rui_item_unpin, + .iop_release = xfs_rui_item_release, + .iop_relog = xfs_rui_item_relog, +}; + +/* + * Allocate and initialize an rui item with the given number of extents. + */ +struct xfs_rui_log_item * +xfs_rui_init( + struct xfs_mount *mp, + uint nextents) + +{ + struct xfs_rui_log_item *ruip; + + ASSERT(nextents > 0); + if (nextents > XFS_RUI_MAX_FAST_EXTENTS) + ruip = kmem_zalloc(xfs_rui_log_item_sizeof(nextents), 0); + else + ruip = kmem_zone_zalloc(xfs_rui_zone, 0); + + xfs_log_item_init(mp, &ruip->rui_item, XFS_LI_RUI, &xfs_rui_item_ops); + ruip->rui_format.rui_nextents = nextents; + ruip->rui_format.rui_id = (uintptr_t)(void *)ruip; + atomic_set(&ruip->rui_next_extent, 0); + atomic_set(&ruip->rui_refcount, 2); + + return ruip; +} --- a/fs/xfs/xfs_stats.c +++ b/fs/xfs/xfs_stats.c @@ -23,6 +23,7 @@ int xfs_stats_format(struct xfsstats __p uint64_t xs_xstrat_bytes = 0; uint64_t xs_write_bytes = 0; uint64_t xs_read_bytes = 0; + uint64_t defer_relog = 0;
static const struct xstats_entry { char *desc; @@ -70,10 +71,13 @@ int xfs_stats_format(struct xfsstats __p xs_xstrat_bytes += per_cpu_ptr(stats, i)->s.xs_xstrat_bytes; xs_write_bytes += per_cpu_ptr(stats, i)->s.xs_write_bytes; xs_read_bytes += per_cpu_ptr(stats, i)->s.xs_read_bytes; + defer_relog += per_cpu_ptr(stats, i)->s.defer_relog; }
len += scnprintf(buf + len, PATH_MAX-len, "xpc %Lu %Lu %Lu\n", xs_xstrat_bytes, xs_write_bytes, xs_read_bytes); + len += scnprintf(buf + len, PATH_MAX-len, "defer_relog %llu\n", + defer_relog); len += scnprintf(buf + len, PATH_MAX-len, "debug %u\n", #if defined(DEBUG) 1); --- a/fs/xfs/xfs_stats.h +++ b/fs/xfs/xfs_stats.h @@ -137,6 +137,7 @@ struct __xfsstats { uint64_t xs_xstrat_bytes; uint64_t xs_write_bytes; uint64_t xs_read_bytes; + uint64_t defer_relog; };
#define xfsstats_offset(f) (offsetof(struct __xfsstats, f)/sizeof(uint32_t)) --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -2418,6 +2418,7 @@ DEFINE_DEFER_PENDING_EVENT(xfs_defer_cre DEFINE_DEFER_PENDING_EVENT(xfs_defer_cancel_list); DEFINE_DEFER_PENDING_EVENT(xfs_defer_pending_finish); DEFINE_DEFER_PENDING_EVENT(xfs_defer_pending_abort); +DEFINE_DEFER_PENDING_EVENT(xfs_defer_relog_intent);
#define DEFINE_BMAP_FREE_DEFERRED_EVENT DEFINE_PHYS_EXTENT_DEFERRED_EVENT DEFINE_BMAP_FREE_DEFERRED_EVENT(xfs_bmap_free_defer); --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -77,6 +77,8 @@ struct xfs_item_ops { void (*iop_release)(struct xfs_log_item *); xfs_lsn_t (*iop_committed)(struct xfs_log_item *, xfs_lsn_t); void (*iop_error)(struct xfs_log_item *, xfs_buf_t *); + struct xfs_log_item *(*iop_relog)(struct xfs_log_item *intent, + struct xfs_trans *tp); };
/* @@ -244,4 +246,12 @@ void xfs_trans_buf_copy_type(struct xfs
extern kmem_zone_t *xfs_trans_zone;
+static inline struct xfs_log_item * +xfs_trans_item_relog( + struct xfs_log_item *lip, + struct xfs_trans *tp) +{ + return lip->li_ops->iop_relog(lip, tp); +} + #endif /* __XFS_TRANS_H__ */
From: "Darrick J. Wong" darrick.wong@oracle.com
commit ed1575daf71e4e21d8ae735b6e687c95454aaa17 upstream.
Separate the computation of the log push threshold and the push logic in xlog_grant_push_ail. This enables higher level code to determine (for example) that it is holding on to a logged intent item and the log is so busy that it is more than 75% full. In that case, it would be desirable to move the log item towards the head to release the tail, which we will cover in the next patch.
Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Brian Foster bfoster@redhat.com Signed-off-by: Chandan Babu R chandan.babu@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_icreate_item.c | 1 + fs/xfs/xfs_log.c | 40 ++++++++++++++++++++++++++++++---------- fs/xfs/xfs_log.h | 2 ++ 3 files changed, 33 insertions(+), 10 deletions(-)
--- a/fs/xfs/xfs_icreate_item.c +++ b/fs/xfs/xfs_icreate_item.c @@ -10,6 +10,7 @@ #include "xfs_trans.h" #include "xfs_trans_priv.h" #include "xfs_icreate_item.h" +#include "xfs_log_priv.h" #include "xfs_log.h"
kmem_zone_t *xfs_icreate_zone; /* inode create item zone */ --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -1537,14 +1537,14 @@ xlog_commit_record( }
/* - * Push on the buffer cache code if we ever use more than 75% of the on-disk - * log space. This code pushes on the lsn which would supposedly free up - * the 25% which we want to leave free. We may need to adopt a policy which - * pushes on an lsn which is further along in the log once we reach the high - * water mark. In this manner, we would be creating a low water mark. + * Compute the LSN that we'd need to push the log tail towards in order to have + * (a) enough on-disk log space to log the number of bytes specified, (b) at + * least 25% of the log space free, and (c) at least 256 blocks free. If the + * log free space already meets all three thresholds, this function returns + * NULLCOMMITLSN. */ -STATIC void -xlog_grant_push_ail( +xfs_lsn_t +xlog_grant_push_threshold( struct xlog *log, int need_bytes) { @@ -1570,7 +1570,7 @@ xlog_grant_push_ail( free_threshold = max(free_threshold, (log->l_logBBsize >> 2)); free_threshold = max(free_threshold, 256); if (free_blocks >= free_threshold) - return; + return NULLCOMMITLSN;
xlog_crack_atomic_lsn(&log->l_tail_lsn, &threshold_cycle, &threshold_block); @@ -1590,13 +1590,33 @@ xlog_grant_push_ail( if (XFS_LSN_CMP(threshold_lsn, last_sync_lsn) > 0) threshold_lsn = last_sync_lsn;
+ return threshold_lsn; +} + +/* + * Push the tail of the log if we need to do so to maintain the free log space + * thresholds set out by xlog_grant_push_threshold. We may need to adopt a + * policy which pushes on an lsn which is further along in the log once we + * reach the high water mark. In this manner, we would be creating a low water + * mark. + */ +STATIC void +xlog_grant_push_ail( + struct xlog *log, + int need_bytes) +{ + xfs_lsn_t threshold_lsn; + + threshold_lsn = xlog_grant_push_threshold(log, need_bytes); + if (threshold_lsn == NULLCOMMITLSN || XLOG_FORCED_SHUTDOWN(log)) + return; + /* * Get the transaction layer to kick the dirty buffers out to * disk asynchronously. No point in trying to do this if * the filesystem is shutting down. */ - if (!XLOG_FORCED_SHUTDOWN(log)) - xfs_ail_push(log->l_ailp, threshold_lsn); + xfs_ail_push(log->l_ailp, threshold_lsn); }
/* --- a/fs/xfs/xfs_log.h +++ b/fs/xfs/xfs_log.h @@ -146,4 +146,6 @@ void xfs_log_quiesce(struct xfs_mount *m bool xfs_log_check_lsn(struct xfs_mount *, xfs_lsn_t); bool xfs_log_in_recovery(struct xfs_mount *);
+xfs_lsn_t xlog_grant_push_threshold(struct xlog *log, int need_bytes); + #endif /* __XFS_LOG_H__ */
From: "Darrick J. Wong" darrick.wong@oracle.com
commit 74f4d6a1e065c92428c5b588099e307a582d79d9 upstream.
Now that we have the ability to ask the log how far the tail needs to be pushed to maintain its free space targets, augment the decision to relog an intent item so that we only do it if the log has hit the 75% full threshold. There's no point in relogging an intent into the same checkpoint, and there's no need to relog if there's plenty of free space in the log.
Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Brian Foster bfoster@redhat.com Signed-off-by: Chandan Babu R chandan.babu@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/libxfs/xfs_defer.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
--- a/fs/xfs/libxfs/xfs_defer.c +++ b/fs/xfs/libxfs/xfs_defer.c @@ -372,7 +372,10 @@ xfs_defer_relog( struct xfs_trans **tpp, struct list_head *dfops) { + struct xlog *log = (*tpp)->t_mountp->m_log; struct xfs_defer_pending *dfp; + xfs_lsn_t threshold_lsn = NULLCOMMITLSN; +
ASSERT((*tpp)->t_flags & XFS_TRANS_PERM_LOG_RES);
@@ -388,6 +391,19 @@ xfs_defer_relog( xfs_log_item_in_current_chkpt(dfp->dfp_intent)) continue;
+ /* + * Figure out where we need the tail to be in order to maintain + * the minimum required free space in the log. Only sample + * the log threshold once per call. + */ + if (threshold_lsn == NULLCOMMITLSN) { + threshold_lsn = xlog_grant_push_threshold(log, 0); + if (threshold_lsn == NULLCOMMITLSN) + break; + } + if (XFS_LSN_CMP(dfp->dfp_intent->li_lsn, threshold_lsn) >= 0) + continue; + trace_xfs_defer_relog_intent((*tpp)->t_mountp, dfp); XFS_STATS_INC((*tpp)->t_mountp, defer_relog); dfp->dfp_intent = xfs_trans_item_relog(dfp->dfp_intent, *tpp);
From: "Darrick J. Wong" darrick.wong@oracle.com
commit c2f09217a4305478c55adc9a98692488dd19cd32 upstream.
[ Set xfs_writepage_ctx->fork to XFS_DATA_FORK since 5.4.y tracks current extent's fork in this variable ]
In commit 7588cbeec6df, we tried to fix a race stemming from the lack of coordination between higher level code that wants to allocate and remap CoW fork extents into the data fork. Christoph cites as examples the always_cow mode, and a directio write completion racing with writeback.
According to the comments before the goto retry, we want to restart the lookup to catch the extent in the data fork, but we don't actually reset whichfork or cow_fsb, which means the second try executes using stale information. Up until now I think we've gotten lucky that either there's something left in the CoW fork to cause cow_fsb to be reset, or either data/cow fork sequence numbers have advanced enough to force a fresh lookup from the data fork. However, if we reach the retry with an empty stable CoW fork and a stable data fork, neither of those things happens. The retry foolishly re-calls xfs_convert_blocks on the CoW fork which fails again. This time, we toss the write.
I've recently been working on extending reflink to the realtime device. When the realtime extent size is larger than a single block, we have to force the page cache to CoW the entire rt extent if a write (or fallocate) are not aligned with the rt extent size. The strategy I've chosen to deal with this is derived from Dave's blocksize > pagesize series: dirtying around the write range, and ensuring that writeback always starts mapping on an rt extent boundary. This has brought this race front and center, since generic/522 blows up immediately.
However, I'm pretty sure this is a bug outright, independent of that.
Fixes: 7588cbeec6df ("xfs: retry COW fork delalloc conversion when no extent was found") Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Chandan Babu R chandan.babu@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_aops.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/fs/xfs/xfs_aops.c +++ b/fs/xfs/xfs_aops.c @@ -495,7 +495,7 @@ xfs_map_blocks( ssize_t count = i_blocksize(inode); xfs_fileoff_t offset_fsb = XFS_B_TO_FSBT(mp, offset); xfs_fileoff_t end_fsb = XFS_B_TO_FSB(mp, offset + count); - xfs_fileoff_t cow_fsb = NULLFILEOFF; + xfs_fileoff_t cow_fsb; struct xfs_bmbt_irec imap; struct xfs_iext_cursor icur; int retries = 0; @@ -529,6 +529,8 @@ xfs_map_blocks( * landed in a hole and we skip the block. */ retry: + cow_fsb = NULLFILEOFF; + wpc->fork = XFS_DATA_FORK; xfs_ilock(ip, XFS_ILOCK_SHARED); ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE || (ip->i_df.if_flags & XFS_IFEXTENTS));
From: "Darrick J. Wong" darrick.wong@oracle.com
commit 27c14b5daa82861220d6fa6e27b51f05f21ffaa7 upstream.
[ In xfs_iwalk_ag(), Replace a call to XFS_IS_CORRUPT() with a call to ASSERT() ]
The aim of the inode btree record iterator function is to call a callback on every record in the btree. To avoid having to tear down and recreate the inode btree cursor around every callback, it caches a certain number of records in a memory buffer. After each batch of callback invocations, we have to perform a btree lookup to find the next record after where we left off.
However, if the keys of the inode btree are corrupt, the lookup might put us in the wrong part of the inode btree, causing the walk function to loop forever. Therefore, we add extra cursor tracking to make sure that we never go backwards neither when performing the lookup nor when jumping to the next inobt record. This also fixes an off by one error where upon resume the lookup should have been for the inode /after/ the point at which we stopped.
Found by fuzzing xfs/460 with keys[2].startino = ones causing bulkstat and quotacheck to hang.
Fixes: a211432c27ff ("xfs: create simplified inode walk function") Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Chandan Babu R chandanrlinux@gmail.com Signed-off-by: Chandan Babu R chandan.babu@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_iwalk.c | 27 ++++++++++++++++++++++++--- 1 file changed, 24 insertions(+), 3 deletions(-)
--- a/fs/xfs/xfs_iwalk.c +++ b/fs/xfs/xfs_iwalk.c @@ -55,6 +55,9 @@ struct xfs_iwalk_ag { /* Where do we start the traversal? */ xfs_ino_t startino;
+ /* What was the last inode number we saw when iterating the inobt? */ + xfs_ino_t lastino; + /* Array of inobt records we cache. */ struct xfs_inobt_rec_incore *recs;
@@ -300,6 +303,9 @@ xfs_iwalk_ag_start( return error; XFS_WANT_CORRUPTED_RETURN(mp, *has_more == 1);
+ iwag->lastino = XFS_AGINO_TO_INO(mp, agno, + irec->ir_startino + XFS_INODES_PER_CHUNK - 1); + /* * If the LE lookup yielded an inobt record before the cursor position, * skip it and see if there's another one after it. @@ -346,15 +352,17 @@ xfs_iwalk_run_callbacks( struct xfs_mount *mp = iwag->mp; struct xfs_trans *tp = iwag->tp; struct xfs_inobt_rec_incore *irec; - xfs_agino_t restart; + xfs_agino_t next_agino; int error;
+ next_agino = XFS_INO_TO_AGINO(mp, iwag->lastino) + 1; + ASSERT(iwag->nr_recs > 0);
/* Delete cursor but remember the last record we cached... */ xfs_iwalk_del_inobt(tp, curpp, agi_bpp, 0); irec = &iwag->recs[iwag->nr_recs - 1]; - restart = irec->ir_startino + XFS_INODES_PER_CHUNK - 1; + ASSERT(next_agino == irec->ir_startino + XFS_INODES_PER_CHUNK);
error = xfs_iwalk_ag_recs(iwag); if (error) @@ -371,7 +379,7 @@ xfs_iwalk_run_callbacks( if (error) return error;
- return xfs_inobt_lookup(*curpp, restart, XFS_LOOKUP_GE, has_more); + return xfs_inobt_lookup(*curpp, next_agino, XFS_LOOKUP_GE, has_more); }
/* Walk all inodes in a single AG, from @iwag->startino to the end of the AG. */ @@ -395,6 +403,7 @@ xfs_iwalk_ag(
while (!error && has_more) { struct xfs_inobt_rec_incore *irec; + xfs_ino_t rec_fsino;
cond_resched(); if (xfs_pwork_want_abort(&iwag->pwork)) @@ -406,6 +415,15 @@ xfs_iwalk_ag( if (error || !has_more) break;
+ /* Make sure that we always move forward. */ + rec_fsino = XFS_AGINO_TO_INO(mp, agno, irec->ir_startino); + if (iwag->lastino != NULLFSINO && iwag->lastino >= rec_fsino) { + ASSERT(iwag->lastino < rec_fsino); + error = -EFSCORRUPTED; + goto out; + } + iwag->lastino = rec_fsino + XFS_INODES_PER_CHUNK - 1; + /* No allocated inodes in this chunk; skip it. */ if (iwag->skip_empty && irec->ir_freecount == irec->ir_count) { error = xfs_btree_increment(cur, 0, &has_more); @@ -534,6 +552,7 @@ xfs_iwalk( .trim_start = 1, .skip_empty = 1, .pwork = XFS_PWORK_SINGLE_THREADED, + .lastino = NULLFSINO, }; xfs_agnumber_t agno = XFS_INO_TO_AGNO(mp, startino); int error; @@ -622,6 +641,7 @@ xfs_iwalk_threaded( iwag->data = data; iwag->startino = startino; iwag->sz_recs = xfs_iwalk_prefetch(inode_records); + iwag->lastino = NULLFSINO; xfs_pwork_queue(&pctl, &iwag->pwork); startino = XFS_AGINO_TO_INO(mp, agno + 1, 0); if (flags & XFS_INOBT_WALK_SAME_AG) @@ -695,6 +715,7 @@ xfs_inobt_walk( .startino = startino, .sz_recs = xfs_inobt_walk_prefetch(inobt_records), .pwork = XFS_PWORK_SINGLE_THREADED, + .lastino = NULLFSINO, }; xfs_agnumber_t agno = XFS_INO_TO_AGNO(mp, startino); int error;
From: "Darrick J. Wong" darrick.wong@oracle.com
commit a5336d6bb2d02d0e9d4d3c8be04b80b8b68d56c8 upstream.
In commit 27c14b5daa82 we started tracking the last inode seen during an inode walk to avoid infinite loops if a corrupt inobt record happens to have a lower ir_startino than the record preceeding it. Unfortunately, the assertion trips over the case where there are completely empty inobt records (which can happen quite easily on 64k page filesystems) because we advance the tracking cursor without actually putting the empty record into the processing buffer. Fix the assert to allow for this case.
Reported-by: zlang@redhat.com Fixes: 27c14b5daa82 ("xfs: ensure inobt record walks always make forward progress") Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Zorro Lang zlang@redhat.com Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Chandan Babu R chandan.babu@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_iwalk.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/xfs/xfs_iwalk.c +++ b/fs/xfs/xfs_iwalk.c @@ -362,7 +362,7 @@ xfs_iwalk_run_callbacks( /* Delete cursor but remember the last record we cached... */ xfs_iwalk_del_inobt(tp, curpp, agi_bpp, 0); irec = &iwag->recs[iwag->nr_recs - 1]; - ASSERT(next_agino == irec->ir_startino + XFS_INODES_PER_CHUNK); + ASSERT(next_agino >= irec->ir_startino + XFS_INODES_PER_CHUNK);
error = xfs_iwalk_ag_recs(iwag); if (error)
From: "Darrick J. Wong" djwong@kernel.org
commit f8d92a66e810acbef6ddbc0bd0cbd9b117ce8acd upstream.
[ Continue to interpret xfs_log_item->li_seq as an LSN rather than a CIL sequence number. ]
While I was running with KASAN and lockdep enabled, I stumbled upon an KASAN report about a UAF to a freed CIL checkpoint. Looking at the comment for xfs_log_item_in_current_chkpt, it seems pretty obvious to me that the original patch to xfs_defer_finish_noroll should have done something to lock the CIL to prevent it from switching the CIL contexts while the predicate runs.
For upper level code that needs to know if a given log item is new enough not to need relogging, add a new wrapper that takes the CIL context lock long enough to sample the current CIL context. This is kind of racy in that the CIL can switch the contexts immediately after sampling, but that's ok because the consequence is that the defer ops code is a little slow to relog items.
================================================================== BUG: KASAN: use-after-free in xfs_log_item_in_current_chkpt+0x139/0x160 [xfs] Read of size 8 at addr ffff88804ea5f608 by task fsstress/527999
CPU: 1 PID: 527999 Comm: fsstress Tainted: G D 5.16.0-rc4-xfsx #rc4 Call Trace: <TASK> dump_stack_lvl+0x45/0x59 print_address_description.constprop.0+0x1f/0x140 kasan_report.cold+0x83/0xdf xfs_log_item_in_current_chkpt+0x139/0x160 xfs_defer_finish_noroll+0x3bb/0x1e30 __xfs_trans_commit+0x6c8/0xcf0 xfs_reflink_remap_extent+0x66f/0x10e0 xfs_reflink_remap_blocks+0x2dd/0xa90 xfs_file_remap_range+0x27b/0xc30 vfs_dedupe_file_range_one+0x368/0x420 vfs_dedupe_file_range+0x37c/0x5d0 do_vfs_ioctl+0x308/0x1260 __x64_sys_ioctl+0xa1/0x170 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f2c71a2950b Code: 0f 1e fa 48 8b 05 85 39 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 55 39 0d 00 f7 d8 64 89 01 48 RSP: 002b:00007ffe8c0e03c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00005600862a8740 RCX: 00007f2c71a2950b RDX: 00005600862a7be0 RSI: 00000000c0189436 RDI: 0000000000000004 RBP: 000000000000000b R08: 0000000000000027 R09: 0000000000000003 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000005a R13: 00005600862804a8 R14: 0000000000016000 R15: 00005600862a8a20 </TASK>
Allocated by task 464064: kasan_save_stack+0x1e/0x50 __kasan_kmalloc+0x81/0xa0 kmem_alloc+0xcd/0x2c0 [xfs] xlog_cil_ctx_alloc+0x17/0x1e0 [xfs] xlog_cil_push_work+0x141/0x13d0 [xfs] process_one_work+0x7f6/0x1380 worker_thread+0x59d/0x1040 kthread+0x3b0/0x490 ret_from_fork+0x1f/0x30
Freed by task 51: kasan_save_stack+0x1e/0x50 kasan_set_track+0x21/0x30 kasan_set_free_info+0x20/0x30 __kasan_slab_free+0xed/0x130 slab_free_freelist_hook+0x7f/0x160 kfree+0xde/0x340 xlog_cil_committed+0xbfd/0xfe0 [xfs] xlog_cil_process_committed+0x103/0x1c0 [xfs] xlog_state_do_callback+0x45d/0xbd0 [xfs] xlog_ioend_work+0x116/0x1c0 [xfs] process_one_work+0x7f6/0x1380 worker_thread+0x59d/0x1040 kthread+0x3b0/0x490 ret_from_fork+0x1f/0x30
Last potentially related work creation: kasan_save_stack+0x1e/0x50 __kasan_record_aux_stack+0xb7/0xc0 insert_work+0x48/0x2e0 __queue_work+0x4e7/0xda0 queue_work_on+0x69/0x80 xlog_cil_push_now.isra.0+0x16b/0x210 [xfs] xlog_cil_force_seq+0x1b7/0x850 [xfs] xfs_log_force_seq+0x1c7/0x670 [xfs] xfs_file_fsync+0x7c1/0xa60 [xfs] __x64_sys_fsync+0x52/0x80 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae
The buggy address belongs to the object at ffff88804ea5f600 which belongs to the cache kmalloc-256 of size 256 The buggy address is located 8 bytes inside of 256-byte region [ffff88804ea5f600, ffff88804ea5f700) The buggy address belongs to the page: page:ffffea00013a9780 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88804ea5ea00 pfn:0x4ea5e head:ffffea00013a9780 order:1 compound_mapcount:0 flags: 0x4fff80000010200(slab|head|node=1|zone=1|lastcpupid=0xfff) raw: 04fff80000010200 ffffea0001245908 ffffea00011bd388 ffff888004c42b40 raw: ffff88804ea5ea00 0000000000100009 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff88804ea5f500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88804ea5f580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88804ea5f600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff88804ea5f680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88804ea5f700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ==================================================================
Fixes: 4e919af7827a ("xfs: periodically relog deferred intent items") Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Chandan Babu R chandan.babu@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_log_cil.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-)
--- a/fs/xfs/xfs_log_cil.c +++ b/fs/xfs/xfs_log_cil.c @@ -1178,21 +1178,19 @@ out_shutdown: */ bool xfs_log_item_in_current_chkpt( - struct xfs_log_item *lip) + struct xfs_log_item *lip) { - struct xfs_cil_ctx *ctx; + struct xfs_cil *cil = lip->li_mountp->m_log->l_cilp;
if (list_empty(&lip->li_cil)) return false;
- ctx = lip->li_mountp->m_log->l_cilp->xc_ctx; - /* * li_seq is written on the first commit of a log item to record the * first checkpoint it is written to. Hence if it is different to the * current sequence, we're in a new checkpoint. */ - if (XFS_LSN_CMP(lip->li_seq, ctx->sequence) != 0) + if (XFS_LSN_CMP(lip->li_seq, READ_ONCE(cil->xc_current_sequence)) != 0) return false; return true; }
From: Brian Foster bfoster@redhat.com
commit 50d25484bebe94320c49dd1347d3330c7063bbdb upstream.
[ Modify xfs_log_unmount_write() to return zero when the log is in a read-only state ]
xfs_log_sbcount() syncs the superblock specifically to accumulate the in-core percpu superblock counters and commit them to disk. This is required to maintain filesystem consistency across quiesce (freeze, read-only mount/remount) or unmount when lazy superblock accounting is enabled because individual transactions do not update the superblock directly.
This mechanism works as expected for writable mounts, but xfs_log_sbcount() skips the update for read-only mounts. Read-only mounts otherwise still allow log recovery and write out an unmount record during log quiesce. If a read-only mount performs log recovery, it can modify the in-core superblock counters and write an unmount record when the filesystem unmounts without ever syncing the in-core counters. This leaves the filesystem with a clean log but in an inconsistent state with regard to lazy sb counters.
Update xfs_log_sbcount() to use the same logic xfs_log_unmount_write() uses to determine when to write an unmount record. This ensures that lazy accounting is always synced before the log is cleaned. Refactor this logic into a new helper to distinguish between a writable filesystem and a writable log. Specifically, the log is writable unless the filesystem is mounted with the norecovery mount option, the underlying log device is read-only, or the filesystem is shutdown. Drop the freeze state check because the update is already allowed during the freezing process and no context calls this function on an already frozen fs. Also, retain the shutdown check in xfs_log_unmount_write() to catch the case where the preceding log force might have triggered a shutdown.
Signed-off-by: Brian Foster bfoster@redhat.com Reviewed-by: Gao Xiang hsiangkao@redhat.com Reviewed-by: Allison Henderson allison.henderson@oracle.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Bill O'Donnell billodo@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Chandan Babu R chandan.babu@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_log.c | 28 ++++++++++++++++++++-------- fs/xfs/xfs_log.h | 1 + fs/xfs/xfs_mount.c | 3 +-- 3 files changed, 22 insertions(+), 10 deletions(-)
--- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -369,6 +369,25 @@ xlog_tic_add_region(xlog_ticket_t *tic, tic->t_res_num++; }
+bool +xfs_log_writable( + struct xfs_mount *mp) +{ + /* + * Never write to the log on norecovery mounts, if the block device is + * read-only, or if the filesystem is shutdown. Read-only mounts still + * allow internal writes for log recovery and unmount purposes, so don't + * restrict that case here. + */ + if (mp->m_flags & XFS_MOUNT_NORECOVERY) + return false; + if (xfs_readonly_buftarg(mp->m_log->l_targ)) + return false; + if (XFS_FORCED_SHUTDOWN(mp)) + return false; + return true; +} + /* * Replenish the byte reservation required by moving the grant write head. */ @@ -895,15 +914,8 @@ xfs_log_unmount_write(xfs_mount_t *mp) #endif int error;
- /* - * Don't write out unmount record on norecovery mounts or ro devices. - * Or, if we are doing a forced umount (typically because of IO errors). - */ - if (mp->m_flags & XFS_MOUNT_NORECOVERY || - xfs_readonly_buftarg(log->l_targ)) { - ASSERT(mp->m_flags & XFS_MOUNT_RDONLY); + if (!xfs_log_writable(mp)) return 0; - }
error = xfs_log_force(mp, XFS_LOG_SYNC); ASSERT(error || !(XLOG_FORCED_SHUTDOWN(log))); --- a/fs/xfs/xfs_log.h +++ b/fs/xfs/xfs_log.h @@ -132,6 +132,7 @@ int xfs_log_reserve(struct xfs_mount * int xfs_log_regrant(struct xfs_mount *mp, struct xlog_ticket *tic); void xfs_log_unmount(struct xfs_mount *mp); int xfs_log_force_umount(struct xfs_mount *mp, int logerror); +bool xfs_log_writable(struct xfs_mount *mp);
struct xlog_ticket *xfs_log_ticket_get(struct xlog_ticket *ticket); void xfs_log_ticket_put(struct xlog_ticket *ticket); --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -1218,8 +1218,7 @@ xfs_fs_writable( int xfs_log_sbcount(xfs_mount_t *mp) { - /* allow this to proceed during the freeze sequence... */ - if (!xfs_fs_writable(mp, SB_FREEZE_COMPLETE)) + if (!xfs_log_writable(mp)) return 0;
/*
From: Shaoying Xu shaoyi@amazon.com
This reverts commit 2537b637eac0bd546f63e1492a34edd30878e8d4 that deleted the whole fib_tests.sh by mistake and caused fib_tests failure in kselftests run.
Signed-off-by: Shaoying Xu shaoyi@amazon.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/fib_semantics.c | 1 tools/testing/selftests/net/fib_tests.sh | 1727 +++++++++++++++++++++++++++++++ 2 files changed, 1727 insertions(+), 1 deletion(-) create mode 100755 tools/testing/selftests/net/fib_tests.sh
--- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -421,7 +421,6 @@ static struct fib_info *fib_find_info(st nfi->fib_prefsrc == fi->fib_prefsrc && nfi->fib_priority == fi->fib_priority && nfi->fib_type == fi->fib_type && - nfi->fib_tb_id == fi->fib_tb_id && memcmp(nfi->fib_metrics, fi->fib_metrics, sizeof(u32) * RTAX_MAX) == 0 && !((nfi->fib_flags ^ fi->fib_flags) & ~RTNH_COMPARE_MASK) && --- /dev/null +++ b/tools/testing/selftests/net/fib_tests.sh @@ -0,0 +1,1727 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +# This test is for checking IPv4 and IPv6 FIB behavior in response to +# different events. + +ret=0 +# Kselftest framework requirement - SKIP code is 4. +ksft_skip=4 + +# all tests in this script. Can be overridden with -t option +TESTS="unregister down carrier nexthop suppress ipv6_rt ipv4_rt ipv6_addr_metric ipv4_addr_metric ipv6_route_metrics ipv4_route_metrics ipv4_route_v6_gw rp_filter" + +VERBOSE=0 +PAUSE_ON_FAIL=no +PAUSE=no +IP="ip -netns ns1" +NS_EXEC="ip netns exec ns1" + +which ping6 > /dev/null 2>&1 && ping6=$(which ping6) || ping6=$(which ping) + +log_test() +{ + local rc=$1 + local expected=$2 + local msg="$3" + + if [ ${rc} -eq ${expected} ]; then + printf " TEST: %-60s [ OK ]\n" "${msg}" + nsuccess=$((nsuccess+1)) + else + ret=1 + nfail=$((nfail+1)) + printf " TEST: %-60s [FAIL]\n" "${msg}" + if [ "${PAUSE_ON_FAIL}" = "yes" ]; then + echo + echo "hit enter to continue, 'q' to quit" + read a + [ "$a" = "q" ] && exit 1 + fi + fi + + if [ "${PAUSE}" = "yes" ]; then + echo + echo "hit enter to continue, 'q' to quit" + read a + [ "$a" = "q" ] && exit 1 + fi +} + +setup() +{ + set -e + ip netns add ns1 + ip netns set ns1 auto + $IP link set dev lo up + ip netns exec ns1 sysctl -qw net.ipv4.ip_forward=1 + ip netns exec ns1 sysctl -qw net.ipv6.conf.all.forwarding=1 + + $IP link add dummy0 type dummy + $IP link set dev dummy0 up + $IP address add 198.51.100.1/24 dev dummy0 + $IP -6 address add 2001:db8:1::1/64 dev dummy0 + set +e + +} + +cleanup() +{ + $IP link del dev dummy0 &> /dev/null + ip netns del ns1 + ip netns del ns2 &> /dev/null +} + +get_linklocal() +{ + local dev=$1 + local addr + + addr=$($IP -6 -br addr show dev ${dev} | \ + awk '{ + for (i = 3; i <= NF; ++i) { + if ($i ~ /^fe80/) + print $i + } + }' + ) + addr=${addr//*} + + [ -z "$addr" ] && return 1 + + echo $addr + + return 0 +} + +fib_unreg_unicast_test() +{ + echo + echo "Single path route test" + + setup + + echo " Start point" + $IP route get fibmatch 198.51.100.2 &> /dev/null + log_test $? 0 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:1::2 &> /dev/null + log_test $? 0 "IPv6 fibmatch" + + set -e + $IP link del dev dummy0 + set +e + + echo " Nexthop device deleted" + $IP route get fibmatch 198.51.100.2 &> /dev/null + log_test $? 2 "IPv4 fibmatch - no route" + $IP -6 route get fibmatch 2001:db8:1::2 &> /dev/null + log_test $? 2 "IPv6 fibmatch - no route" + + cleanup +} + +fib_unreg_multipath_test() +{ + + echo + echo "Multipath route test" + + setup + + set -e + $IP link add dummy1 type dummy + $IP link set dev dummy1 up + $IP address add 192.0.2.1/24 dev dummy1 + $IP -6 address add 2001:db8:2::1/64 dev dummy1 + + $IP route add 203.0.113.0/24 \ + nexthop via 198.51.100.2 dev dummy0 \ + nexthop via 192.0.2.2 dev dummy1 + $IP -6 route add 2001:db8:3::/64 \ + nexthop via 2001:db8:1::2 dev dummy0 \ + nexthop via 2001:db8:2::2 dev dummy1 + set +e + + echo " Start point" + $IP route get fibmatch 203.0.113.1 &> /dev/null + log_test $? 0 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:3::1 &> /dev/null + log_test $? 0 "IPv6 fibmatch" + + set -e + $IP link del dev dummy0 + set +e + + echo " One nexthop device deleted" + $IP route get fibmatch 203.0.113.1 &> /dev/null + log_test $? 2 "IPv4 - multipath route removed on delete" + + $IP -6 route get fibmatch 2001:db8:3::1 &> /dev/null + # In IPv6 we do not flush the entire multipath route. + log_test $? 0 "IPv6 - multipath down to single path" + + set -e + $IP link del dev dummy1 + set +e + + echo " Second nexthop device deleted" + $IP -6 route get fibmatch 2001:db8:3::1 &> /dev/null + log_test $? 2 "IPv6 - no route" + + cleanup +} + +fib_unreg_test() +{ + fib_unreg_unicast_test + fib_unreg_multipath_test +} + +fib_down_unicast_test() +{ + echo + echo "Single path, admin down" + + setup + + echo " Start point" + $IP route get fibmatch 198.51.100.2 &> /dev/null + log_test $? 0 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:1::2 &> /dev/null + log_test $? 0 "IPv6 fibmatch" + + set -e + $IP link set dev dummy0 down + set +e + + echo " Route deleted on down" + $IP route get fibmatch 198.51.100.2 &> /dev/null + log_test $? 2 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:1::2 &> /dev/null + log_test $? 2 "IPv6 fibmatch" + + cleanup +} + +fib_down_multipath_test_do() +{ + local down_dev=$1 + local up_dev=$2 + + $IP route get fibmatch 203.0.113.1 \ + oif $down_dev &> /dev/null + log_test $? 2 "IPv4 fibmatch on down device" + $IP -6 route get fibmatch 2001:db8:3::1 \ + oif $down_dev &> /dev/null + log_test $? 2 "IPv6 fibmatch on down device" + + $IP route get fibmatch 203.0.113.1 \ + oif $up_dev &> /dev/null + log_test $? 0 "IPv4 fibmatch on up device" + $IP -6 route get fibmatch 2001:db8:3::1 \ + oif $up_dev &> /dev/null + log_test $? 0 "IPv6 fibmatch on up device" + + $IP route get fibmatch 203.0.113.1 | \ + grep $down_dev | grep -q "dead linkdown" + log_test $? 0 "IPv4 flags on down device" + $IP -6 route get fibmatch 2001:db8:3::1 | \ + grep $down_dev | grep -q "dead linkdown" + log_test $? 0 "IPv6 flags on down device" + + $IP route get fibmatch 203.0.113.1 | \ + grep $up_dev | grep -q "dead linkdown" + log_test $? 1 "IPv4 flags on up device" + $IP -6 route get fibmatch 2001:db8:3::1 | \ + grep $up_dev | grep -q "dead linkdown" + log_test $? 1 "IPv6 flags on up device" +} + +fib_down_multipath_test() +{ + echo + echo "Admin down multipath" + + setup + + set -e + $IP link add dummy1 type dummy + $IP link set dev dummy1 up + + $IP address add 192.0.2.1/24 dev dummy1 + $IP -6 address add 2001:db8:2::1/64 dev dummy1 + + $IP route add 203.0.113.0/24 \ + nexthop via 198.51.100.2 dev dummy0 \ + nexthop via 192.0.2.2 dev dummy1 + $IP -6 route add 2001:db8:3::/64 \ + nexthop via 2001:db8:1::2 dev dummy0 \ + nexthop via 2001:db8:2::2 dev dummy1 + set +e + + echo " Verify start point" + $IP route get fibmatch 203.0.113.1 &> /dev/null + log_test $? 0 "IPv4 fibmatch" + + $IP -6 route get fibmatch 2001:db8:3::1 &> /dev/null + log_test $? 0 "IPv6 fibmatch" + + set -e + $IP link set dev dummy0 down + set +e + + echo " One device down, one up" + fib_down_multipath_test_do "dummy0" "dummy1" + + set -e + $IP link set dev dummy0 up + $IP link set dev dummy1 down + set +e + + echo " Other device down and up" + fib_down_multipath_test_do "dummy1" "dummy0" + + set -e + $IP link set dev dummy0 down + set +e + + echo " Both devices down" + $IP route get fibmatch 203.0.113.1 &> /dev/null + log_test $? 2 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:3::1 &> /dev/null + log_test $? 2 "IPv6 fibmatch" + + $IP link del dev dummy1 + cleanup +} + +fib_down_test() +{ + fib_down_unicast_test + fib_down_multipath_test +} + +# Local routes should not be affected when carrier changes. +fib_carrier_local_test() +{ + echo + echo "Local carrier tests - single path" + + setup + + set -e + $IP link set dev dummy0 carrier on + set +e + + echo " Start point" + $IP route get fibmatch 198.51.100.1 &> /dev/null + log_test $? 0 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:1::1 &> /dev/null + log_test $? 0 "IPv6 fibmatch" + + $IP route get fibmatch 198.51.100.1 | \ + grep -q "linkdown" + log_test $? 1 "IPv4 - no linkdown flag" + $IP -6 route get fibmatch 2001:db8:1::1 | \ + grep -q "linkdown" + log_test $? 1 "IPv6 - no linkdown flag" + + set -e + $IP link set dev dummy0 carrier off + sleep 1 + set +e + + echo " Carrier off on nexthop" + $IP route get fibmatch 198.51.100.1 &> /dev/null + log_test $? 0 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:1::1 &> /dev/null + log_test $? 0 "IPv6 fibmatch" + + $IP route get fibmatch 198.51.100.1 | \ + grep -q "linkdown" + log_test $? 1 "IPv4 - linkdown flag set" + $IP -6 route get fibmatch 2001:db8:1::1 | \ + grep -q "linkdown" + log_test $? 1 "IPv6 - linkdown flag set" + + set -e + $IP address add 192.0.2.1/24 dev dummy0 + $IP -6 address add 2001:db8:2::1/64 dev dummy0 + set +e + + echo " Route to local address with carrier down" + $IP route get fibmatch 192.0.2.1 &> /dev/null + log_test $? 0 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:2::1 &> /dev/null + log_test $? 0 "IPv6 fibmatch" + + $IP route get fibmatch 192.0.2.1 | \ + grep -q "linkdown" + log_test $? 1 "IPv4 linkdown flag set" + $IP -6 route get fibmatch 2001:db8:2::1 | \ + grep -q "linkdown" + log_test $? 1 "IPv6 linkdown flag set" + + cleanup +} + +fib_carrier_unicast_test() +{ + ret=0 + + echo + echo "Single path route carrier test" + + setup + + set -e + $IP link set dev dummy0 carrier on + set +e + + echo " Start point" + $IP route get fibmatch 198.51.100.2 &> /dev/null + log_test $? 0 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:1::2 &> /dev/null + log_test $? 0 "IPv6 fibmatch" + + $IP route get fibmatch 198.51.100.2 | \ + grep -q "linkdown" + log_test $? 1 "IPv4 no linkdown flag" + $IP -6 route get fibmatch 2001:db8:1::2 | \ + grep -q "linkdown" + log_test $? 1 "IPv6 no linkdown flag" + + set -e + $IP link set dev dummy0 carrier off + sleep 1 + set +e + + echo " Carrier down" + $IP route get fibmatch 198.51.100.2 &> /dev/null + log_test $? 0 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:1::2 &> /dev/null + log_test $? 0 "IPv6 fibmatch" + + $IP route get fibmatch 198.51.100.2 | \ + grep -q "linkdown" + log_test $? 0 "IPv4 linkdown flag set" + $IP -6 route get fibmatch 2001:db8:1::2 | \ + grep -q "linkdown" + log_test $? 0 "IPv6 linkdown flag set" + + set -e + $IP address add 192.0.2.1/24 dev dummy0 + $IP -6 address add 2001:db8:2::1/64 dev dummy0 + set +e + + echo " Second address added with carrier down" + $IP route get fibmatch 192.0.2.2 &> /dev/null + log_test $? 0 "IPv4 fibmatch" + $IP -6 route get fibmatch 2001:db8:2::2 &> /dev/null + log_test $? 0 "IPv6 fibmatch" + + $IP route get fibmatch 192.0.2.2 | \ + grep -q "linkdown" + log_test $? 0 "IPv4 linkdown flag set" + $IP -6 route get fibmatch 2001:db8:2::2 | \ + grep -q "linkdown" + log_test $? 0 "IPv6 linkdown flag set" + + cleanup +} + +fib_carrier_test() +{ + fib_carrier_local_test + fib_carrier_unicast_test +} + +fib_rp_filter_test() +{ + echo + echo "IPv4 rp_filter tests" + + setup + + set -e + ip netns add ns2 + ip netns set ns2 auto + + ip -netns ns2 link set dev lo up + + $IP link add name veth1 type veth peer name veth2 + $IP link set dev veth2 netns ns2 + $IP address add 192.0.2.1/24 dev veth1 + ip -netns ns2 address add 192.0.2.1/24 dev veth2 + $IP link set dev veth1 up + ip -netns ns2 link set dev veth2 up + + $IP link set dev lo address 52:54:00:6a:c7:5e + $IP link set dev veth1 address 52:54:00:6a:c7:5e + ip -netns ns2 link set dev lo address 52:54:00:6a:c7:5e + ip -netns ns2 link set dev veth2 address 52:54:00:6a:c7:5e + + # 1. (ns2) redirect lo's egress to veth2's egress + ip netns exec ns2 tc qdisc add dev lo parent root handle 1: fq_codel + ip netns exec ns2 tc filter add dev lo parent 1: protocol arp basic \ + action mirred egress redirect dev veth2 + ip netns exec ns2 tc filter add dev lo parent 1: protocol ip basic \ + action mirred egress redirect dev veth2 + + # 2. (ns1) redirect veth1's ingress to lo's ingress + $NS_EXEC tc qdisc add dev veth1 ingress + $NS_EXEC tc filter add dev veth1 ingress protocol arp basic \ + action mirred ingress redirect dev lo + $NS_EXEC tc filter add dev veth1 ingress protocol ip basic \ + action mirred ingress redirect dev lo + + # 3. (ns1) redirect lo's egress to veth1's egress + $NS_EXEC tc qdisc add dev lo parent root handle 1: fq_codel + $NS_EXEC tc filter add dev lo parent 1: protocol arp basic \ + action mirred egress redirect dev veth1 + $NS_EXEC tc filter add dev lo parent 1: protocol ip basic \ + action mirred egress redirect dev veth1 + + # 4. (ns2) redirect veth2's ingress to lo's ingress + ip netns exec ns2 tc qdisc add dev veth2 ingress + ip netns exec ns2 tc filter add dev veth2 ingress protocol arp basic \ + action mirred ingress redirect dev lo + ip netns exec ns2 tc filter add dev veth2 ingress protocol ip basic \ + action mirred ingress redirect dev lo + + $NS_EXEC sysctl -qw net.ipv4.conf.all.rp_filter=1 + $NS_EXEC sysctl -qw net.ipv4.conf.all.accept_local=1 + $NS_EXEC sysctl -qw net.ipv4.conf.all.route_localnet=1 + ip netns exec ns2 sysctl -qw net.ipv4.conf.all.rp_filter=1 + ip netns exec ns2 sysctl -qw net.ipv4.conf.all.accept_local=1 + ip netns exec ns2 sysctl -qw net.ipv4.conf.all.route_localnet=1 + set +e + + run_cmd "ip netns exec ns2 ping -w1 -c1 192.0.2.1" + log_test $? 0 "rp_filter passes local packets" + + run_cmd "ip netns exec ns2 ping -w1 -c1 127.0.0.1" + log_test $? 0 "rp_filter passes loopback packets" + + cleanup +} + +################################################################################ +# Tests on nexthop spec + +# run 'ip route add' with given spec +add_rt() +{ + local desc="$1" + local erc=$2 + local vrf=$3 + local pfx=$4 + local gw=$5 + local dev=$6 + local cmd out rc + + [ "$vrf" = "-" ] && vrf="default" + [ -n "$gw" ] && gw="via $gw" + [ -n "$dev" ] && dev="dev $dev" + + cmd="$IP route add vrf $vrf $pfx $gw $dev" + if [ "$VERBOSE" = "1" ]; then + printf "\n COMMAND: $cmd\n" + fi + + out=$(eval $cmd 2>&1) + rc=$? + if [ "$VERBOSE" = "1" -a -n "$out" ]; then + echo " $out" + fi + log_test $rc $erc "$desc" +} + +fib4_nexthop() +{ + echo + echo "IPv4 nexthop tests" + + echo "<<< write me >>>" +} + +fib6_nexthop() +{ + local lldummy=$(get_linklocal dummy0) + local llv1=$(get_linklocal dummy0) + + if [ -z "$lldummy" ]; then + echo "Failed to get linklocal address for dummy0" + return 1 + fi + if [ -z "$llv1" ]; then + echo "Failed to get linklocal address for veth1" + return 1 + fi + + echo + echo "IPv6 nexthop tests" + + add_rt "Directly connected nexthop, unicast address" 0 \ + - 2001:db8:101::/64 2001:db8:1::2 + add_rt "Directly connected nexthop, unicast address with device" 0 \ + - 2001:db8:102::/64 2001:db8:1::2 "dummy0" + add_rt "Gateway is linklocal address" 0 \ + - 2001:db8:103::1/64 $llv1 "veth0" + + # fails because LL address requires a device + add_rt "Gateway is linklocal address, no device" 2 \ + - 2001:db8:104::1/64 $llv1 + + # local address can not be a gateway + add_rt "Gateway can not be local unicast address" 2 \ + - 2001:db8:105::/64 2001:db8:1::1 + add_rt "Gateway can not be local unicast address, with device" 2 \ + - 2001:db8:106::/64 2001:db8:1::1 "dummy0" + add_rt "Gateway can not be a local linklocal address" 2 \ + - 2001:db8:107::1/64 $lldummy "dummy0" + + # VRF tests + add_rt "Gateway can be local address in a VRF" 0 \ + - 2001:db8:108::/64 2001:db8:51::2 + add_rt "Gateway can be local address in a VRF, with device" 0 \ + - 2001:db8:109::/64 2001:db8:51::2 "veth0" + add_rt "Gateway can be local linklocal address in a VRF" 0 \ + - 2001:db8:110::1/64 $llv1 "veth0" + + add_rt "Redirect to VRF lookup" 0 \ + - 2001:db8:111::/64 "" "red" + + add_rt "VRF route, gateway can be local address in default VRF" 0 \ + red 2001:db8:112::/64 2001:db8:51::1 + + # local address in same VRF fails + add_rt "VRF route, gateway can not be a local address" 2 \ + red 2001:db8:113::1/64 2001:db8:2::1 + add_rt "VRF route, gateway can not be a local addr with device" 2 \ + red 2001:db8:114::1/64 2001:db8:2::1 "dummy1" +} + +# Default VRF: +# dummy0 - 198.51.100.1/24 2001:db8:1::1/64 +# veth0 - 192.0.2.1/24 2001:db8:51::1/64 +# +# VRF red: +# dummy1 - 192.168.2.1/24 2001:db8:2::1/64 +# veth1 - 192.0.2.2/24 2001:db8:51::2/64 +# +# [ dummy0 veth0 ]--[ veth1 dummy1 ] + +fib_nexthop_test() +{ + setup + + set -e + + $IP -4 rule add pref 32765 table local + $IP -4 rule del pref 0 + $IP -6 rule add pref 32765 table local + $IP -6 rule del pref 0 + + $IP link add red type vrf table 1 + $IP link set red up + $IP -4 route add vrf red unreachable default metric 4278198272 + $IP -6 route add vrf red unreachable default metric 4278198272 + + $IP link add veth0 type veth peer name veth1 + $IP link set dev veth0 up + $IP address add 192.0.2.1/24 dev veth0 + $IP -6 address add 2001:db8:51::1/64 dev veth0 + + $IP link set dev veth1 vrf red up + $IP address add 192.0.2.2/24 dev veth1 + $IP -6 address add 2001:db8:51::2/64 dev veth1 + + $IP link add dummy1 type dummy + $IP link set dev dummy1 vrf red up + $IP address add 192.168.2.1/24 dev dummy1 + $IP -6 address add 2001:db8:2::1/64 dev dummy1 + set +e + + sleep 1 + fib4_nexthop + fib6_nexthop + + ( + $IP link del dev dummy1 + $IP link del veth0 + $IP link del red + ) 2>/dev/null + cleanup +} + +fib_suppress_test() +{ + echo + echo "FIB rule with suppress_prefixlength" + setup + + $IP link add dummy1 type dummy + $IP link set dummy1 up + $IP -6 route add default dev dummy1 + $IP -6 rule add table main suppress_prefixlength 0 + ping -f -c 1000 -W 1 1234::1 >/dev/null 2>&1 + $IP -6 rule del table main suppress_prefixlength 0 + $IP link del dummy1 + + # If we got here without crashing, we're good. + log_test 0 0 "FIB rule suppress test" + + cleanup +} + +################################################################################ +# Tests on route add and replace + +run_cmd() +{ + local cmd="$1" + local out + local stderr="2>/dev/null" + + if [ "$VERBOSE" = "1" ]; then + printf " COMMAND: $cmd\n" + stderr= + fi + + out=$(eval $cmd $stderr) + rc=$? + if [ "$VERBOSE" = "1" -a -n "$out" ]; then + echo " $out" + fi + + [ "$VERBOSE" = "1" ] && echo + + return $rc +} + +check_expected() +{ + local out="$1" + local expected="$2" + local rc=0 + + [ "${out}" = "${expected}" ] && return 0 + + if [ -z "${out}" ]; then + if [ "$VERBOSE" = "1" ]; then + printf "\nNo route entry found\n" + printf "Expected:\n" + printf " ${expected}\n" + fi + return 1 + fi + + # tricky way to convert output to 1-line without ip's + # messy ''; this drops all extra white space + out=$(echo ${out}) + if [ "${out}" != "${expected}" ]; then + rc=1 + if [ "${VERBOSE}" = "1" ]; then + printf " Unexpected route entry. Have:\n" + printf " ${out}\n" + printf " Expected:\n" + printf " ${expected}\n\n" + fi + fi + + return $rc +} + +# add route for a prefix, flushing any existing routes first +# expected to be the first step of a test +add_route6() +{ + local pfx="$1" + local nh="$2" + local out + + if [ "$VERBOSE" = "1" ]; then + echo + echo " ##################################################" + echo + fi + + run_cmd "$IP -6 ro flush ${pfx}" + [ $? -ne 0 ] && exit 1 + + out=$($IP -6 ro ls match ${pfx}) + if [ -n "$out" ]; then + echo "Failed to flush routes for prefix used for tests." + exit 1 + fi + + run_cmd "$IP -6 ro add ${pfx} ${nh}" + if [ $? -ne 0 ]; then + echo "Failed to add initial route for test." + exit 1 + fi +} + +# add initial route - used in replace route tests +add_initial_route6() +{ + add_route6 "2001:db8:104::/64" "$1" +} + +check_route6() +{ + local pfx + local expected="$1" + local out + local rc=0 + + set -- $expected + pfx=$1 + + out=$($IP -6 ro ls match ${pfx} | sed -e 's/ pref medium//') + check_expected "${out}" "${expected}" +} + +route_cleanup() +{ + $IP li del red 2>/dev/null + $IP li del dummy1 2>/dev/null + $IP li del veth1 2>/dev/null + $IP li del veth3 2>/dev/null + + cleanup &> /dev/null +} + +route_setup() +{ + route_cleanup + setup + + [ "${VERBOSE}" = "1" ] && set -x + set -e + + ip netns add ns2 + ip netns set ns2 auto + ip -netns ns2 link set dev lo up + ip netns exec ns2 sysctl -qw net.ipv4.ip_forward=1 + ip netns exec ns2 sysctl -qw net.ipv6.conf.all.forwarding=1 + + $IP li add veth1 type veth peer name veth2 + $IP li add veth3 type veth peer name veth4 + + $IP li set veth1 up + $IP li set veth3 up + $IP li set veth2 netns ns2 up + $IP li set veth4 netns ns2 up + ip -netns ns2 li add dummy1 type dummy + ip -netns ns2 li set dummy1 up + + $IP -6 addr add 2001:db8:101::1/64 dev veth1 nodad + $IP -6 addr add 2001:db8:103::1/64 dev veth3 nodad + $IP addr add 172.16.101.1/24 dev veth1 + $IP addr add 172.16.103.1/24 dev veth3 + + ip -netns ns2 -6 addr add 2001:db8:101::2/64 dev veth2 nodad + ip -netns ns2 -6 addr add 2001:db8:103::2/64 dev veth4 nodad + ip -netns ns2 -6 addr add 2001:db8:104::1/64 dev dummy1 nodad + + ip -netns ns2 addr add 172.16.101.2/24 dev veth2 + ip -netns ns2 addr add 172.16.103.2/24 dev veth4 + ip -netns ns2 addr add 172.16.104.1/24 dev dummy1 + + set +e +} + +# assumption is that basic add of a single path route works +# otherwise just adding an address on an interface is broken +ipv6_rt_add() +{ + local rc + + echo + echo "IPv6 route add / append tests" + + # route add same prefix - fails with EEXISTS b/c ip adds NLM_F_EXCL + add_route6 "2001:db8:104::/64" "via 2001:db8:101::2" + run_cmd "$IP -6 ro add 2001:db8:104::/64 via 2001:db8:103::2" + log_test $? 2 "Attempt to add duplicate route - gw" + + # route add same prefix - fails with EEXISTS b/c ip adds NLM_F_EXCL + add_route6 "2001:db8:104::/64" "via 2001:db8:101::2" + run_cmd "$IP -6 ro add 2001:db8:104::/64 dev veth3" + log_test $? 2 "Attempt to add duplicate route - dev only" + + # route add same prefix - fails with EEXISTS b/c ip adds NLM_F_EXCL + add_route6 "2001:db8:104::/64" "via 2001:db8:101::2" + run_cmd "$IP -6 ro add unreachable 2001:db8:104::/64" + log_test $? 2 "Attempt to add duplicate route - reject route" + + # route append with same prefix adds a new route + # - iproute2 sets NLM_F_CREATE | NLM_F_APPEND + add_route6 "2001:db8:104::/64" "via 2001:db8:101::2" + run_cmd "$IP -6 ro append 2001:db8:104::/64 via 2001:db8:103::2" + check_route6 "2001:db8:104::/64 metric 1024 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" + log_test $? 0 "Append nexthop to existing route - gw" + + # insert mpath directly + add_route6 "2001:db8:104::/64" "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + check_route6 "2001:db8:104::/64 metric 1024 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" + log_test $? 0 "Add multipath route" + + add_route6 "2001:db8:104::/64" "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + run_cmd "$IP -6 ro add 2001:db8:104::/64 nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + log_test $? 2 "Attempt to add duplicate multipath route" + + # insert of a second route without append but different metric + add_route6 "2001:db8:104::/64" "via 2001:db8:101::2" + run_cmd "$IP -6 ro add 2001:db8:104::/64 via 2001:db8:103::2 metric 512" + rc=$? + if [ $rc -eq 0 ]; then + run_cmd "$IP -6 ro add 2001:db8:104::/64 via 2001:db8:103::3 metric 256" + rc=$? + fi + log_test $rc 0 "Route add with different metrics" + + run_cmd "$IP -6 ro del 2001:db8:104::/64 metric 512" + rc=$? + if [ $rc -eq 0 ]; then + check_route6 "2001:db8:104::/64 via 2001:db8:103::3 dev veth3 metric 256 2001:db8:104::/64 via 2001:db8:101::2 dev veth1 metric 1024" + rc=$? + fi + log_test $rc 0 "Route delete with metric" +} + +ipv6_rt_replace_single() +{ + # single path with single path + # + add_initial_route6 "via 2001:db8:101::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 via 2001:db8:103::2" + check_route6 "2001:db8:104::/64 via 2001:db8:103::2 dev veth3 metric 1024" + log_test $? 0 "Single path with single path" + + # single path with multipath + # + add_initial_route6 "nexthop via 2001:db8:101::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:101::3 nexthop via 2001:db8:103::2" + check_route6 "2001:db8:104::/64 metric 1024 nexthop via 2001:db8:101::3 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" + log_test $? 0 "Single path with multipath" + + # single path with single path using MULTIPATH attribute + # + add_initial_route6 "via 2001:db8:101::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:103::2" + check_route6 "2001:db8:104::/64 via 2001:db8:103::2 dev veth3 metric 1024" + log_test $? 0 "Single path with single path via multipath attribute" + + # route replace fails - invalid nexthop + add_initial_route6 "via 2001:db8:101::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 via 2001:db8:104::2" + if [ $? -eq 0 ]; then + # previous command is expected to fail so if it returns 0 + # that means the test failed. + log_test 0 1 "Invalid nexthop" + else + check_route6 "2001:db8:104::/64 via 2001:db8:101::2 dev veth1 metric 1024" + log_test $? 0 "Invalid nexthop" + fi + + # replace non-existent route + # - note use of change versus replace since ip adds NLM_F_CREATE + # for replace + add_initial_route6 "via 2001:db8:101::2" + run_cmd "$IP -6 ro change 2001:db8:105::/64 via 2001:db8:101::2" + log_test $? 2 "Single path - replace of non-existent route" +} + +ipv6_rt_replace_mpath() +{ + # multipath with multipath + add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:101::3 nexthop via 2001:db8:103::3" + check_route6 "2001:db8:104::/64 metric 1024 nexthop via 2001:db8:101::3 dev veth1 weight 1 nexthop via 2001:db8:103::3 dev veth3 weight 1" + log_test $? 0 "Multipath with multipath" + + # multipath with single + add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 via 2001:db8:101::3" + check_route6 "2001:db8:104::/64 via 2001:db8:101::3 dev veth1 metric 1024" + log_test $? 0 "Multipath with single path" + + # multipath with single + add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:101::3" + check_route6 "2001:db8:104::/64 via 2001:db8:101::3 dev veth1 metric 1024" + log_test $? 0 "Multipath with single path via multipath attribute" + + # multipath with dev-only + add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 dev veth1" + check_route6 "2001:db8:104::/64 dev veth1 metric 1024" + log_test $? 0 "Multipath with dev-only" + + # route replace fails - invalid nexthop 1 + add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:111::3 nexthop via 2001:db8:103::3" + check_route6 "2001:db8:104::/64 metric 1024 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" + log_test $? 0 "Multipath - invalid first nexthop" + + # route replace fails - invalid nexthop 2 + add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:101::3 nexthop via 2001:db8:113::3" + check_route6 "2001:db8:104::/64 metric 1024 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" + log_test $? 0 "Multipath - invalid second nexthop" + + # multipath non-existent route + add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + run_cmd "$IP -6 ro change 2001:db8:105::/64 nexthop via 2001:db8:101::3 nexthop via 2001:db8:103::3" + log_test $? 2 "Multipath - replace of non-existent route" +} + +ipv6_rt_replace() +{ + echo + echo "IPv6 route replace tests" + + ipv6_rt_replace_single + ipv6_rt_replace_mpath +} + +ipv6_route_test() +{ + route_setup + + ipv6_rt_add + ipv6_rt_replace + + route_cleanup +} + +ip_addr_metric_check() +{ + ip addr help 2>&1 | grep -q metric + if [ $? -ne 0 ]; then + echo "iproute2 command does not support metric for addresses. Skipping test" + return 1 + fi + + return 0 +} + +ipv6_addr_metric_test() +{ + local rc + + echo + echo "IPv6 prefix route tests" + + ip_addr_metric_check || return 1 + + setup + + set -e + $IP li add dummy1 type dummy + $IP li add dummy2 type dummy + $IP li set dummy1 up + $IP li set dummy2 up + + # default entry is metric 256 + run_cmd "$IP -6 addr add dev dummy1 2001:db8:104::1/64" + run_cmd "$IP -6 addr add dev dummy2 2001:db8:104::2/64" + set +e + + check_route6 "2001:db8:104::/64 dev dummy1 proto kernel metric 256 2001:db8:104::/64 dev dummy2 proto kernel metric 256" + log_test $? 0 "Default metric" + + set -e + run_cmd "$IP -6 addr flush dev dummy1" + run_cmd "$IP -6 addr add dev dummy1 2001:db8:104::1/64 metric 257" + set +e + + check_route6 "2001:db8:104::/64 dev dummy2 proto kernel metric 256 2001:db8:104::/64 dev dummy1 proto kernel metric 257" + log_test $? 0 "User specified metric on first device" + + set -e + run_cmd "$IP -6 addr flush dev dummy2" + run_cmd "$IP -6 addr add dev dummy2 2001:db8:104::2/64 metric 258" + set +e + + check_route6 "2001:db8:104::/64 dev dummy1 proto kernel metric 257 2001:db8:104::/64 dev dummy2 proto kernel metric 258" + log_test $? 0 "User specified metric on second device" + + run_cmd "$IP -6 addr del dev dummy1 2001:db8:104::1/64 metric 257" + rc=$? + if [ $rc -eq 0 ]; then + check_route6 "2001:db8:104::/64 dev dummy2 proto kernel metric 258" + rc=$? + fi + log_test $rc 0 "Delete of address on first device" + + run_cmd "$IP -6 addr change dev dummy2 2001:db8:104::2/64 metric 259" + rc=$? + if [ $rc -eq 0 ]; then + check_route6 "2001:db8:104::/64 dev dummy2 proto kernel metric 259" + rc=$? + fi + log_test $rc 0 "Modify metric of address" + + # verify prefix route removed on down + run_cmd "ip netns exec ns1 sysctl -qw net.ipv6.conf.all.keep_addr_on_down=1" + run_cmd "$IP li set dev dummy2 down" + rc=$? + if [ $rc -eq 0 ]; then + out=$($IP -6 ro ls match 2001:db8:104::/64) + check_expected "${out}" "" + rc=$? + fi + log_test $rc 0 "Prefix route removed on link down" + + # verify prefix route re-inserted with assigned metric + run_cmd "$IP li set dev dummy2 up" + rc=$? + if [ $rc -eq 0 ]; then + check_route6 "2001:db8:104::/64 dev dummy2 proto kernel metric 259" + rc=$? + fi + log_test $rc 0 "Prefix route with metric on link up" + + # verify peer metric added correctly + set -e + run_cmd "$IP -6 addr flush dev dummy2" + run_cmd "$IP -6 addr add dev dummy2 2001:db8:104::1 peer 2001:db8:104::2 metric 260" + set +e + + check_route6 "2001:db8:104::1 dev dummy2 proto kernel metric 260" + log_test $? 0 "Set metric with peer route on local side" + check_route6 "2001:db8:104::2 dev dummy2 proto kernel metric 260" + log_test $? 0 "Set metric with peer route on peer side" + + set -e + run_cmd "$IP -6 addr change dev dummy2 2001:db8:104::1 peer 2001:db8:104::3 metric 261" + set +e + + check_route6 "2001:db8:104::1 dev dummy2 proto kernel metric 261" + log_test $? 0 "Modify metric and peer address on local side" + check_route6 "2001:db8:104::3 dev dummy2 proto kernel metric 261" + log_test $? 0 "Modify metric and peer address on peer side" + + $IP li del dummy1 + $IP li del dummy2 + cleanup +} + +ipv6_route_metrics_test() +{ + local rc + + echo + echo "IPv6 routes with metrics" + + route_setup + + # + # single path with metrics + # + run_cmd "$IP -6 ro add 2001:db8:111::/64 via 2001:db8:101::2 mtu 1400" + rc=$? + if [ $rc -eq 0 ]; then + check_route6 "2001:db8:111::/64 via 2001:db8:101::2 dev veth1 metric 1024 mtu 1400" + rc=$? + fi + log_test $rc 0 "Single path route with mtu metric" + + + # + # multipath via separate routes with metrics + # + run_cmd "$IP -6 ro add 2001:db8:112::/64 via 2001:db8:101::2 mtu 1400" + run_cmd "$IP -6 ro append 2001:db8:112::/64 via 2001:db8:103::2" + rc=$? + if [ $rc -eq 0 ]; then + check_route6 "2001:db8:112::/64 metric 1024 mtu 1400 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" + rc=$? + fi + log_test $rc 0 "Multipath route via 2 single routes with mtu metric on first" + + # second route is coalesced to first to make a multipath route. + # MTU of the second path is hidden from display! + run_cmd "$IP -6 ro add 2001:db8:113::/64 via 2001:db8:101::2" + run_cmd "$IP -6 ro append 2001:db8:113::/64 via 2001:db8:103::2 mtu 1400" + rc=$? + if [ $rc -eq 0 ]; then + check_route6 "2001:db8:113::/64 metric 1024 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" + rc=$? + fi + log_test $rc 0 "Multipath route via 2 single routes with mtu metric on 2nd" + + run_cmd "$IP -6 ro del 2001:db8:113::/64 via 2001:db8:101::2" + if [ $? -eq 0 ]; then + check_route6 "2001:db8:113::/64 via 2001:db8:103::2 dev veth3 metric 1024 mtu 1400" + log_test $? 0 " MTU of second leg" + fi + + # + # multipath with metrics + # + run_cmd "$IP -6 ro add 2001:db8:115::/64 mtu 1400 nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + rc=$? + if [ $rc -eq 0 ]; then + check_route6 "2001:db8:115::/64 metric 1024 mtu 1400 nexthop via 2001:db8:101::2 dev veth1 weight 1 nexthop via 2001:db8:103::2 dev veth3 weight 1" + rc=$? + fi + log_test $rc 0 "Multipath route with mtu metric" + + $IP -6 ro add 2001:db8:104::/64 via 2001:db8:101::2 mtu 1300 + run_cmd "ip netns exec ns1 ${ping6} -w1 -c1 -s 1500 2001:db8:104::1" + log_test $? 0 "Using route with mtu metric" + + run_cmd "$IP -6 ro add 2001:db8:114::/64 via 2001:db8:101::2 congctl lock foo" + log_test $? 2 "Invalid metric (fails metric_convert)" + + route_cleanup +} + +# add route for a prefix, flushing any existing routes first +# expected to be the first step of a test +add_route() +{ + local pfx="$1" + local nh="$2" + local out + + if [ "$VERBOSE" = "1" ]; then + echo + echo " ##################################################" + echo + fi + + run_cmd "$IP ro flush ${pfx}" + [ $? -ne 0 ] && exit 1 + + out=$($IP ro ls match ${pfx}) + if [ -n "$out" ]; then + echo "Failed to flush routes for prefix used for tests." + exit 1 + fi + + run_cmd "$IP ro add ${pfx} ${nh}" + if [ $? -ne 0 ]; then + echo "Failed to add initial route for test." + exit 1 + fi +} + +# add initial route - used in replace route tests +add_initial_route() +{ + add_route "172.16.104.0/24" "$1" +} + +check_route() +{ + local pfx + local expected="$1" + local out + + set -- $expected + pfx=$1 + [ "${pfx}" = "unreachable" ] && pfx=$2 + + out=$($IP ro ls match ${pfx}) + check_expected "${out}" "${expected}" +} + +# assumption is that basic add of a single path route works +# otherwise just adding an address on an interface is broken +ipv4_rt_add() +{ + local rc + + echo + echo "IPv4 route add / append tests" + + # route add same prefix - fails with EEXISTS b/c ip adds NLM_F_EXCL + add_route "172.16.104.0/24" "via 172.16.101.2" + run_cmd "$IP ro add 172.16.104.0/24 via 172.16.103.2" + log_test $? 2 "Attempt to add duplicate route - gw" + + # route add same prefix - fails with EEXISTS b/c ip adds NLM_F_EXCL + add_route "172.16.104.0/24" "via 172.16.101.2" + run_cmd "$IP ro add 172.16.104.0/24 dev veth3" + log_test $? 2 "Attempt to add duplicate route - dev only" + + # route add same prefix - fails with EEXISTS b/c ip adds NLM_F_EXCL + add_route "172.16.104.0/24" "via 172.16.101.2" + run_cmd "$IP ro add unreachable 172.16.104.0/24" + log_test $? 2 "Attempt to add duplicate route - reject route" + + # iproute2 prepend only sets NLM_F_CREATE + # - adds a new route; does NOT convert existing route to ECMP + add_route "172.16.104.0/24" "via 172.16.101.2" + run_cmd "$IP ro prepend 172.16.104.0/24 via 172.16.103.2" + check_route "172.16.104.0/24 via 172.16.103.2 dev veth3 172.16.104.0/24 via 172.16.101.2 dev veth1" + log_test $? 0 "Add new nexthop for existing prefix" + + # route append with same prefix adds a new route + # - iproute2 sets NLM_F_CREATE | NLM_F_APPEND + add_route "172.16.104.0/24" "via 172.16.101.2" + run_cmd "$IP ro append 172.16.104.0/24 via 172.16.103.2" + check_route "172.16.104.0/24 via 172.16.101.2 dev veth1 172.16.104.0/24 via 172.16.103.2 dev veth3" + log_test $? 0 "Append nexthop to existing route - gw" + + add_route "172.16.104.0/24" "via 172.16.101.2" + run_cmd "$IP ro append 172.16.104.0/24 dev veth3" + check_route "172.16.104.0/24 via 172.16.101.2 dev veth1 172.16.104.0/24 dev veth3 scope link" + log_test $? 0 "Append nexthop to existing route - dev only" + + add_route "172.16.104.0/24" "via 172.16.101.2" + run_cmd "$IP ro append unreachable 172.16.104.0/24" + check_route "172.16.104.0/24 via 172.16.101.2 dev veth1 unreachable 172.16.104.0/24" + log_test $? 0 "Append nexthop to existing route - reject route" + + run_cmd "$IP ro flush 172.16.104.0/24" + run_cmd "$IP ro add unreachable 172.16.104.0/24" + run_cmd "$IP ro append 172.16.104.0/24 via 172.16.103.2" + check_route "unreachable 172.16.104.0/24 172.16.104.0/24 via 172.16.103.2 dev veth3" + log_test $? 0 "Append nexthop to existing reject route - gw" + + run_cmd "$IP ro flush 172.16.104.0/24" + run_cmd "$IP ro add unreachable 172.16.104.0/24" + run_cmd "$IP ro append 172.16.104.0/24 dev veth3" + check_route "unreachable 172.16.104.0/24 172.16.104.0/24 dev veth3 scope link" + log_test $? 0 "Append nexthop to existing reject route - dev only" + + # insert mpath directly + add_route "172.16.104.0/24" "nexthop via 172.16.101.2 nexthop via 172.16.103.2" + check_route "172.16.104.0/24 nexthop via 172.16.101.2 dev veth1 weight 1 nexthop via 172.16.103.2 dev veth3 weight 1" + log_test $? 0 "add multipath route" + + add_route "172.16.104.0/24" "nexthop via 172.16.101.2 nexthop via 172.16.103.2" + run_cmd "$IP ro add 172.16.104.0/24 nexthop via 172.16.101.2 nexthop via 172.16.103.2" + log_test $? 2 "Attempt to add duplicate multipath route" + + # insert of a second route without append but different metric + add_route "172.16.104.0/24" "via 172.16.101.2" + run_cmd "$IP ro add 172.16.104.0/24 via 172.16.103.2 metric 512" + rc=$? + if [ $rc -eq 0 ]; then + run_cmd "$IP ro add 172.16.104.0/24 via 172.16.103.3 metric 256" + rc=$? + fi + log_test $rc 0 "Route add with different metrics" + + run_cmd "$IP ro del 172.16.104.0/24 metric 512" + rc=$? + if [ $rc -eq 0 ]; then + check_route "172.16.104.0/24 via 172.16.101.2 dev veth1 172.16.104.0/24 via 172.16.103.3 dev veth3 metric 256" + rc=$? + fi + log_test $rc 0 "Route delete with metric" +} + +ipv4_rt_replace_single() +{ + # single path with single path + # + add_initial_route "via 172.16.101.2" + run_cmd "$IP ro replace 172.16.104.0/24 via 172.16.103.2" + check_route "172.16.104.0/24 via 172.16.103.2 dev veth3" + log_test $? 0 "Single path with single path" + + # single path with multipath + # + add_initial_route "nexthop via 172.16.101.2" + run_cmd "$IP ro replace 172.16.104.0/24 nexthop via 172.16.101.3 nexthop via 172.16.103.2" + check_route "172.16.104.0/24 nexthop via 172.16.101.3 dev veth1 weight 1 nexthop via 172.16.103.2 dev veth3 weight 1" + log_test $? 0 "Single path with multipath" + + # single path with reject + # + add_initial_route "nexthop via 172.16.101.2" + run_cmd "$IP ro replace unreachable 172.16.104.0/24" + check_route "unreachable 172.16.104.0/24" + log_test $? 0 "Single path with reject route" + + # single path with single path using MULTIPATH attribute + # + add_initial_route "via 172.16.101.2" + run_cmd "$IP ro replace 172.16.104.0/24 nexthop via 172.16.103.2" + check_route "172.16.104.0/24 via 172.16.103.2 dev veth3" + log_test $? 0 "Single path with single path via multipath attribute" + + # route replace fails - invalid nexthop + add_initial_route "via 172.16.101.2" + run_cmd "$IP ro replace 172.16.104.0/24 via 2001:db8:104::2" + if [ $? -eq 0 ]; then + # previous command is expected to fail so if it returns 0 + # that means the test failed. + log_test 0 1 "Invalid nexthop" + else + check_route "172.16.104.0/24 via 172.16.101.2 dev veth1" + log_test $? 0 "Invalid nexthop" + fi + + # replace non-existent route + # - note use of change versus replace since ip adds NLM_F_CREATE + # for replace + add_initial_route "via 172.16.101.2" + run_cmd "$IP ro change 172.16.105.0/24 via 172.16.101.2" + log_test $? 2 "Single path - replace of non-existent route" +} + +ipv4_rt_replace_mpath() +{ + # multipath with multipath + add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2" + run_cmd "$IP ro replace 172.16.104.0/24 nexthop via 172.16.101.3 nexthop via 172.16.103.3" + check_route "172.16.104.0/24 nexthop via 172.16.101.3 dev veth1 weight 1 nexthop via 172.16.103.3 dev veth3 weight 1" + log_test $? 0 "Multipath with multipath" + + # multipath with single + add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2" + run_cmd "$IP ro replace 172.16.104.0/24 via 172.16.101.3" + check_route "172.16.104.0/24 via 172.16.101.3 dev veth1" + log_test $? 0 "Multipath with single path" + + # multipath with single + add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2" + run_cmd "$IP ro replace 172.16.104.0/24 nexthop via 172.16.101.3" + check_route "172.16.104.0/24 via 172.16.101.3 dev veth1" + log_test $? 0 "Multipath with single path via multipath attribute" + + # multipath with reject + add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2" + run_cmd "$IP ro replace unreachable 172.16.104.0/24" + check_route "unreachable 172.16.104.0/24" + log_test $? 0 "Multipath with reject route" + + # route replace fails - invalid nexthop 1 + add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2" + run_cmd "$IP ro replace 172.16.104.0/24 nexthop via 172.16.111.3 nexthop via 172.16.103.3" + check_route "172.16.104.0/24 nexthop via 172.16.101.2 dev veth1 weight 1 nexthop via 172.16.103.2 dev veth3 weight 1" + log_test $? 0 "Multipath - invalid first nexthop" + + # route replace fails - invalid nexthop 2 + add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2" + run_cmd "$IP ro replace 172.16.104.0/24 nexthop via 172.16.101.3 nexthop via 172.16.113.3" + check_route "172.16.104.0/24 nexthop via 172.16.101.2 dev veth1 weight 1 nexthop via 172.16.103.2 dev veth3 weight 1" + log_test $? 0 "Multipath - invalid second nexthop" + + # multipath non-existent route + add_initial_route "nexthop via 172.16.101.2 nexthop via 172.16.103.2" + run_cmd "$IP ro change 172.16.105.0/24 nexthop via 172.16.101.3 nexthop via 172.16.103.3" + log_test $? 2 "Multipath - replace of non-existent route" +} + +ipv4_rt_replace() +{ + echo + echo "IPv4 route replace tests" + + ipv4_rt_replace_single + ipv4_rt_replace_mpath +} + +ipv4_route_test() +{ + route_setup + + ipv4_rt_add + ipv4_rt_replace + + route_cleanup +} + +ipv4_addr_metric_test() +{ + local rc + + echo + echo "IPv4 prefix route tests" + + ip_addr_metric_check || return 1 + + setup + + set -e + $IP li add dummy1 type dummy + $IP li add dummy2 type dummy + $IP li set dummy1 up + $IP li set dummy2 up + + # default entry is metric 256 + run_cmd "$IP addr add dev dummy1 172.16.104.1/24" + run_cmd "$IP addr add dev dummy2 172.16.104.2/24" + set +e + + check_route "172.16.104.0/24 dev dummy1 proto kernel scope link src 172.16.104.1 172.16.104.0/24 dev dummy2 proto kernel scope link src 172.16.104.2" + log_test $? 0 "Default metric" + + set -e + run_cmd "$IP addr flush dev dummy1" + run_cmd "$IP addr add dev dummy1 172.16.104.1/24 metric 257" + set +e + + check_route "172.16.104.0/24 dev dummy2 proto kernel scope link src 172.16.104.2 172.16.104.0/24 dev dummy1 proto kernel scope link src 172.16.104.1 metric 257" + log_test $? 0 "User specified metric on first device" + + set -e + run_cmd "$IP addr flush dev dummy2" + run_cmd "$IP addr add dev dummy2 172.16.104.2/24 metric 258" + set +e + + check_route "172.16.104.0/24 dev dummy1 proto kernel scope link src 172.16.104.1 metric 257 172.16.104.0/24 dev dummy2 proto kernel scope link src 172.16.104.2 metric 258" + log_test $? 0 "User specified metric on second device" + + run_cmd "$IP addr del dev dummy1 172.16.104.1/24 metric 257" + rc=$? + if [ $rc -eq 0 ]; then + check_route "172.16.104.0/24 dev dummy2 proto kernel scope link src 172.16.104.2 metric 258" + rc=$? + fi + log_test $rc 0 "Delete of address on first device" + + run_cmd "$IP addr change dev dummy2 172.16.104.2/24 metric 259" + rc=$? + if [ $rc -eq 0 ]; then + check_route "172.16.104.0/24 dev dummy2 proto kernel scope link src 172.16.104.2 metric 259" + rc=$? + fi + log_test $rc 0 "Modify metric of address" + + # verify prefix route removed on down + run_cmd "$IP li set dev dummy2 down" + rc=$? + if [ $rc -eq 0 ]; then + out=$($IP ro ls match 172.16.104.0/24) + check_expected "${out}" "" + rc=$? + fi + log_test $rc 0 "Prefix route removed on link down" + + # verify prefix route re-inserted with assigned metric + run_cmd "$IP li set dev dummy2 up" + rc=$? + if [ $rc -eq 0 ]; then + check_route "172.16.104.0/24 dev dummy2 proto kernel scope link src 172.16.104.2 metric 259" + rc=$? + fi + log_test $rc 0 "Prefix route with metric on link up" + + # explicitly check for metric changes on edge scenarios + run_cmd "$IP addr flush dev dummy2" + run_cmd "$IP addr add dev dummy2 172.16.104.0/24 metric 259" + run_cmd "$IP addr change dev dummy2 172.16.104.0/24 metric 260" + rc=$? + if [ $rc -eq 0 ]; then + check_route "172.16.104.0/24 dev dummy2 proto kernel scope link src 172.16.104.0 metric 260" + rc=$? + fi + log_test $rc 0 "Modify metric of .0/24 address" + + run_cmd "$IP addr flush dev dummy2" + run_cmd "$IP addr add dev dummy2 172.16.104.1/32 peer 172.16.104.2 metric 260" + rc=$? + if [ $rc -eq 0 ]; then + check_route "172.16.104.2 dev dummy2 proto kernel scope link src 172.16.104.1 metric 260" + rc=$? + fi + log_test $rc 0 "Set metric of address with peer route" + + run_cmd "$IP addr change dev dummy2 172.16.104.1/32 peer 172.16.104.3 metric 261" + rc=$? + if [ $rc -eq 0 ]; then + check_route "172.16.104.3 dev dummy2 proto kernel scope link src 172.16.104.1 metric 261" + rc=$? + fi + log_test $rc 0 "Modify metric and peer address for peer route" + + $IP li del dummy1 + $IP li del dummy2 + cleanup +} + +ipv4_route_metrics_test() +{ + local rc + + echo + echo "IPv4 route add / append tests" + + route_setup + + run_cmd "$IP ro add 172.16.111.0/24 via 172.16.101.2 mtu 1400" + rc=$? + if [ $rc -eq 0 ]; then + check_route "172.16.111.0/24 via 172.16.101.2 dev veth1 mtu 1400" + rc=$? + fi + log_test $rc 0 "Single path route with mtu metric" + + + run_cmd "$IP ro add 172.16.112.0/24 mtu 1400 nexthop via 172.16.101.2 nexthop via 172.16.103.2" + rc=$? + if [ $rc -eq 0 ]; then + check_route "172.16.112.0/24 mtu 1400 nexthop via 172.16.101.2 dev veth1 weight 1 nexthop via 172.16.103.2 dev veth3 weight 1" + rc=$? + fi + log_test $rc 0 "Multipath route with mtu metric" + + $IP ro add 172.16.104.0/24 via 172.16.101.2 mtu 1300 + run_cmd "ip netns exec ns1 ping -w1 -c1 -s 1500 172.16.104.1" + log_test $? 0 "Using route with mtu metric" + + run_cmd "$IP ro add 172.16.111.0/24 via 172.16.101.2 congctl lock foo" + log_test $? 2 "Invalid metric (fails metric_convert)" + + route_cleanup +} + +ipv4_route_v6_gw_test() +{ + local rc + + echo + echo "IPv4 route with IPv6 gateway tests" + + route_setup + sleep 2 + + # + # single path route + # + run_cmd "$IP ro add 172.16.104.0/24 via inet6 2001:db8:101::2" + rc=$? + log_test $rc 0 "Single path route with IPv6 gateway" + if [ $rc -eq 0 ]; then + check_route "172.16.104.0/24 via inet6 2001:db8:101::2 dev veth1" + fi + + run_cmd "ip netns exec ns1 ping -w1 -c1 172.16.104.1" + log_test $rc 0 "Single path route with IPv6 gateway - ping" + + run_cmd "$IP ro del 172.16.104.0/24 via inet6 2001:db8:101::2" + rc=$? + log_test $rc 0 "Single path route delete" + if [ $rc -eq 0 ]; then + check_route "172.16.112.0/24" + fi + + # + # multipath - v6 then v4 + # + run_cmd "$IP ro add 172.16.104.0/24 nexthop via inet6 2001:db8:101::2 dev veth1 nexthop via 172.16.103.2 dev veth3" + rc=$? + log_test $rc 0 "Multipath route add - v6 nexthop then v4" + if [ $rc -eq 0 ]; then + check_route "172.16.104.0/24 nexthop via inet6 2001:db8:101::2 dev veth1 weight 1 nexthop via 172.16.103.2 dev veth3 weight 1" + fi + + run_cmd "$IP ro del 172.16.104.0/24 nexthop via 172.16.103.2 dev veth3 nexthop via inet6 2001:db8:101::2 dev veth1" + log_test $? 2 " Multipath route delete - nexthops in wrong order" + + run_cmd "$IP ro del 172.16.104.0/24 nexthop via inet6 2001:db8:101::2 dev veth1 nexthop via 172.16.103.2 dev veth3" + log_test $? 0 " Multipath route delete exact match" + + # + # multipath - v4 then v6 + # + run_cmd "$IP ro add 172.16.104.0/24 nexthop via 172.16.103.2 dev veth3 nexthop via inet6 2001:db8:101::2 dev veth1" + rc=$? + log_test $rc 0 "Multipath route add - v4 nexthop then v6" + if [ $rc -eq 0 ]; then + check_route "172.16.104.0/24 nexthop via 172.16.103.2 dev veth3 weight 1 nexthop via inet6 2001:db8:101::2 dev veth1 weight 1" + fi + + run_cmd "$IP ro del 172.16.104.0/24 nexthop via inet6 2001:db8:101::2 dev veth1 nexthop via 172.16.103.2 dev veth3" + log_test $? 2 " Multipath route delete - nexthops in wrong order" + + run_cmd "$IP ro del 172.16.104.0/24 nexthop via 172.16.103.2 dev veth3 nexthop via inet6 2001:db8:101::2 dev veth1" + log_test $? 0 " Multipath route delete exact match" + + route_cleanup +} + +################################################################################ +# usage + +usage() +{ + cat <<EOF +usage: ${0##*/} OPTS + + -t <test> Test(s) to run (default: all) + (options: $TESTS) + -p Pause on fail + -P Pause after each test before cleanup + -v verbose mode (show commands and output) +EOF +} + +################################################################################ +# main + +while getopts :t:pPhv o +do + case $o in + t) TESTS=$OPTARG;; + p) PAUSE_ON_FAIL=yes;; + P) PAUSE=yes;; + v) VERBOSE=$(($VERBOSE + 1));; + h) usage; exit 0;; + *) usage; exit 1;; + esac +done + +PEER_CMD="ip netns exec ${PEER_NS}" + +# make sure we don't pause twice +[ "${PAUSE}" = "yes" ] && PAUSE_ON_FAIL=no + +if [ "$(id -u)" -ne 0 ];then + echo "SKIP: Need root privileges" + exit $ksft_skip; +fi + +if [ ! -x "$(command -v ip)" ]; then + echo "SKIP: Could not run test without ip tool" + exit $ksft_skip +fi + +ip route help 2>&1 | grep -q fibmatch +if [ $? -ne 0 ]; then + echo "SKIP: iproute2 too old, missing fibmatch" + exit $ksft_skip +fi + +# start clean +cleanup &> /dev/null + +for t in $TESTS +do + case $t in + fib_unreg_test|unregister) fib_unreg_test;; + fib_down_test|down) fib_down_test;; + fib_carrier_test|carrier) fib_carrier_test;; + fib_rp_filter_test|rp_filter) fib_rp_filter_test;; + fib_nexthop_test|nexthop) fib_nexthop_test;; + fib_suppress_test|suppress) fib_suppress_test;; + ipv6_route_test|ipv6_rt) ipv6_route_test;; + ipv4_route_test|ipv4_rt) ipv4_route_test;; + ipv6_addr_metric) ipv6_addr_metric_test;; + ipv4_addr_metric) ipv4_addr_metric_test;; + ipv6_route_metrics) ipv6_route_metrics_test;; + ipv4_route_metrics) ipv4_route_metrics_test;; + ipv4_route_v6_gw) ipv4_route_v6_gw_test;; + + help) echo "Test names: $TESTS"; exit 0;; + esac +done + +if [ "$TESTS" != "none" ]; then + printf "\nTests passed: %3d\n" ${nsuccess} + printf "Tests failed: %3d\n" ${nfail} +fi + +exit $ret
From: Ido Schimmel idosch@nvidia.com
[ Upstream commit f96a3d74554df537b6db5c99c27c80e7afadc8d1 ]
Cited commit added the table ID to the FIB info structure, but did not prevent structures with different table IDs from being consolidated. This can lead to routes being flushed from a VRF when an address is deleted from a different VRF.
Fix by taking the table ID into account when looking for a matching FIB info. This is already done for FIB info structures backed by a nexthop object in fib_find_info_nh().
Add test cases that fail before the fix:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests Regular FIB info TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ] Identical FIB info with different table ID TEST: Route removed from VRF when source address deleted [FAIL] TEST: Route in default VRF not removed [ OK ] RTNETLINK answers: File exists TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [FAIL]
Tests passed: 6 Tests failed: 2
And pass after:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests Regular FIB info TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ] Identical FIB info with different table ID TEST: Route removed from VRF when source address deleted [ OK ] TEST: Route in default VRF not removed [ OK ] TEST: Route removed in default VRF when source address deleted [ OK ] TEST: Route in VRF is not removed by address delete [ OK ]
Tests passed: 8 Tests failed: 0
Fixes: 5a56a0b3a45d ("net: Don't delete routes in different VRFs") Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Shaoying Xu shaoyi@amazon.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/fib_semantics.c | 1 + 1 file changed, 1 insertion(+)
--- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -421,6 +421,7 @@ static struct fib_info *fib_find_info(st nfi->fib_prefsrc == fi->fib_prefsrc && nfi->fib_priority == fi->fib_priority && nfi->fib_type == fi->fib_type && + nfi->fib_tb_id == fi->fib_tb_id && memcmp(nfi->fib_metrics, fi->fib_metrics, sizeof(u32) * RTAX_MAX) == 0 && !((nfi->fib_flags ^ fi->fib_flags) & ~RTNH_COMPARE_MASK) &&
From: Yang Yingliang yangyingliang@huawei.com
commit 605d9fb9556f8f5fb4566f4df1480f280f308ded upstream.
If sdio_add_func() or sdio_init_func() fails, sdio_remove_func() can not release the resources, because the sdio function is not presented in these two cases, it won't call of_node_put() or put_device().
To fix these leaks, make sdio_func_present() only control whether device_del() needs to be called or not, then always call of_node_put() and put_device().
In error case in sdio_init_func(), the reference of 'card->dev' is not get, to avoid redundant put in sdio_free_func_cis(), move the get_device() to sdio_alloc_func() and put_device() to sdio_release_func(), it can keep the get/put function be balanced.
Without this patch, while doing fault inject test, it can get the following leak reports, after this fix, the leak is gone.
unreferenced object 0xffff888112514000 (size 2048): comm "kworker/3:2", pid 65, jiffies 4294741614 (age 124.774s) hex dump (first 32 bytes): 00 e0 6f 12 81 88 ff ff 60 58 8d 06 81 88 ff ff ..o.....`X...... 10 40 51 12 81 88 ff ff 10 40 51 12 81 88 ff ff .@Q......@Q..... backtrace: [<000000009e5931da>] kmalloc_trace+0x21/0x110 [<000000002f839ccb>] mmc_alloc_card+0x38/0xb0 [mmc_core] [<0000000004adcbf6>] mmc_sdio_init_card+0xde/0x170 [mmc_core] [<000000007538fea0>] mmc_attach_sdio+0xcb/0x1b0 [mmc_core] [<00000000d4fdeba7>] mmc_rescan+0x54a/0x640 [mmc_core]
unreferenced object 0xffff888112511000 (size 2048): comm "kworker/3:2", pid 65, jiffies 4294741623 (age 124.766s) hex dump (first 32 bytes): 00 40 51 12 81 88 ff ff e0 58 8d 06 81 88 ff ff .@Q......X...... 10 10 51 12 81 88 ff ff 10 10 51 12 81 88 ff ff ..Q.......Q..... backtrace: [<000000009e5931da>] kmalloc_trace+0x21/0x110 [<00000000fcbe706c>] sdio_alloc_func+0x35/0x100 [mmc_core] [<00000000c68f4b50>] mmc_attach_sdio.cold.18+0xb1/0x395 [mmc_core] [<00000000d4fdeba7>] mmc_rescan+0x54a/0x640 [mmc_core]
Fixes: 3d10a1ba0d37 ("sdio: fix reference counting in sdio_remove_func()") Signed-off-by: Yang Yingliang yangyingliang@huawei.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230130125808.3471254-1-yangyingliang@huawei.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/core/sdio_bus.c | 17 ++++++++++++++--- drivers/mmc/core/sdio_cis.c | 12 ------------ 2 files changed, 14 insertions(+), 15 deletions(-)
--- a/drivers/mmc/core/sdio_bus.c +++ b/drivers/mmc/core/sdio_bus.c @@ -269,6 +269,12 @@ static void sdio_release_func(struct dev if (!(func->card->quirks & MMC_QUIRK_NONSTD_SDIO)) sdio_free_func_cis(func);
+ /* + * We have now removed the link to the tuples in the + * card structure, so remove the reference. + */ + put_device(&func->card->dev); + kfree(func->info); kfree(func->tmpbuf); kfree(func); @@ -299,6 +305,12 @@ struct sdio_func *sdio_alloc_func(struct
device_initialize(&func->dev);
+ /* + * We may link to tuples in the card structure, + * we need make sure we have a reference to it. + */ + get_device(&func->card->dev); + func->dev.parent = &card->dev; func->dev.bus = &sdio_bus_type; func->dev.release = sdio_release_func; @@ -352,10 +364,9 @@ int sdio_add_func(struct sdio_func *func */ void sdio_remove_func(struct sdio_func *func) { - if (!sdio_func_present(func)) - return; + if (sdio_func_present(func)) + device_del(&func->dev);
- device_del(&func->dev); of_node_put(func->dev.of_node); put_device(&func->dev); } --- a/drivers/mmc/core/sdio_cis.c +++ b/drivers/mmc/core/sdio_cis.c @@ -384,12 +384,6 @@ int sdio_read_func_cis(struct sdio_func return ret;
/* - * Since we've linked to tuples in the card structure, - * we must make sure we have a reference to it. - */ - get_device(&func->card->dev); - - /* * Vendor/device id is optional for function CIS, so * copy it from the card structure as needed. */ @@ -414,11 +408,5 @@ void sdio_free_func_cis(struct sdio_func }
func->tuples = NULL; - - /* - * We have now removed the link to the tuples in the - * card structure, so remove the reference. - */ - put_device(&func->card->dev); }
From: Yang Yingliang yangyingliang@huawei.com
commit cf4c9d2ac1e42c7d18b921bec39486896645b714 upstream.
If mmc_add_host() fails, it doesn't need to call mmc_remove_host(), or it will cause null-ptr-deref, because of deleting a not added device in mmc_remove_host().
To fix this, goto label 'fail_glue_init', if mmc_add_host() fails, and change the label 'fail_add_host' to 'fail_gpiod_request'.
Fixes: 15a0580ced08 ("mmc_spi host driver") Signed-off-by: Yang Yingliang yangyingliang@huawei.com Cc:stable@vger.kernel.org Link: https://lore.kernel.org/r/20230131013835.3564011-1-yangyingliang@huawei.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/mmc_spi.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/mmc/host/mmc_spi.c +++ b/drivers/mmc/host/mmc_spi.c @@ -1420,7 +1420,7 @@ static int mmc_spi_probe(struct spi_devi
status = mmc_add_host(mmc); if (status != 0) - goto fail_add_host; + goto fail_glue_init;
/* * Index 0 is card detect @@ -1428,7 +1428,7 @@ static int mmc_spi_probe(struct spi_devi */ status = mmc_gpiod_request_cd(mmc, NULL, 0, false, 1, NULL); if (status == -EPROBE_DEFER) - goto fail_add_host; + goto fail_gpiod_request; if (!status) { /* * The platform has a CD GPIO signal that may support @@ -1443,7 +1443,7 @@ static int mmc_spi_probe(struct spi_devi /* Index 1 is write protect/read only */ status = mmc_gpiod_request_ro(mmc, NULL, 1, 0, NULL); if (status == -EPROBE_DEFER) - goto fail_add_host; + goto fail_gpiod_request; if (!status) has_ro = true;
@@ -1457,7 +1457,7 @@ static int mmc_spi_probe(struct spi_devi ? ", cd polling" : ""); return 0;
-fail_add_host: +fail_gpiod_request: mmc_remove_host(mmc); fail_glue_init: if (host->dma_dev)
From: Bo Liu bo.liu@senarytech.com
commit 18d7e16c917a08f08778ecf2b780d63648d5d923 upstream.
The current kernel does not support the SN6180 codec chip. Add the SN6180 codec configuration item to kernel.
Signed-off-by: Bo Liu bo.liu@senarytech.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/1675908828-1012-1-git-send-email-bo.liu@senarytech... Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_conexant.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/pci/hda/patch_conexant.c +++ b/sound/pci/hda/patch_conexant.c @@ -1093,6 +1093,7 @@ static const struct hda_device_id snd_hd HDA_CODEC_ENTRY(0x14f11f86, "CX8070", patch_conexant_auto), HDA_CODEC_ENTRY(0x14f12008, "CX8200", patch_conexant_auto), HDA_CODEC_ENTRY(0x14f120d0, "CX11970", patch_conexant_auto), + HDA_CODEC_ENTRY(0x14f120d1, "SN6180", patch_conexant_auto), HDA_CODEC_ENTRY(0x14f15045, "CX20549 (Venice)", patch_conexant_auto), HDA_CODEC_ENTRY(0x14f15047, "CX20551 (Waikiki)", patch_conexant_auto), HDA_CODEC_ENTRY(0x14f15051, "CX20561 (Hermosa)", patch_conexant_auto),
From: Kailang Yang kailang@realtek.com
commit 2bdccfd290d421b50df4ec6a68d832dad1310748 upstream.
GPIO2 PIN use for output. Mask Dir and Data need to assign for 0x4. Not 0x3. This fixed was for Lenovo Desktop(0x17aa1056). GPIO2 use for AMP enable.
Signed-off-by: Kailang Yang kailang@realtek.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/8d02bb9ac8134f878cd08607fdf088fd@realtek.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -770,7 +770,7 @@ do_sku: alc_setup_gpio(codec, 0x02); break; case 7: - alc_setup_gpio(codec, 0x03); + alc_setup_gpio(codec, 0x04); break; case 5: default:
From: Munehisa Kamata kamatam@amazon.com
commit c2dbe32d5db5c4ead121cf86dabd5ab691fb47fe upstream.
If a non-root cgroup gets removed when there is a thread that registered trigger and is polling on a pressure file within the cgroup, the polling waitqueue gets freed in the following path:
do_rmdir cgroup_rmdir kernfs_drain_open_files cgroup_file_release cgroup_pressure_release psi_trigger_destroy
However, the polling thread still has a reference to the pressure file and will access the freed waitqueue when the file is closed or upon exit:
fput ep_eventpoll_release ep_free ep_remove_wait_queue remove_wait_queue
This results in use-after-free as pasted below.
The fundamental problem here is that cgroup_file_release() (and consequently waitqueue's lifetime) is not tied to the file's real lifetime. Using wake_up_pollfree() here might be less than ideal, but it is in line with the comment at commit 42288cb44c4b ("wait: add wake_up_pollfree()") since the waitqueue's lifetime is not tied to file's one and can be considered as another special case. While this would be fixable by somehow making cgroup_file_release() be tied to the fput(), it would require sizable refactoring at cgroups or higher layer which might be more justifiable if we identify more cases like this.
BUG: KASAN: use-after-free in _raw_spin_lock_irqsave+0x60/0xc0 Write of size 4 at addr ffff88810e625328 by task a.out/4404
CPU: 19 PID: 4404 Comm: a.out Not tainted 6.2.0-rc6 #38 Hardware name: Amazon EC2 c5a.8xlarge/, BIOS 1.0 10/16/2017 Call Trace: <TASK> dump_stack_lvl+0x73/0xa0 print_report+0x16c/0x4e0 kasan_report+0xc3/0xf0 kasan_check_range+0x2d2/0x310 _raw_spin_lock_irqsave+0x60/0xc0 remove_wait_queue+0x1a/0xa0 ep_free+0x12c/0x170 ep_eventpoll_release+0x26/0x30 __fput+0x202/0x400 task_work_run+0x11d/0x170 do_exit+0x495/0x1130 do_group_exit+0x100/0x100 get_signal+0xd67/0xde0 arch_do_signal_or_restart+0x2a/0x2b0 exit_to_user_mode_prepare+0x94/0x100 syscall_exit_to_user_mode+0x20/0x40 do_syscall_64+0x52/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd </TASK>
Allocated by task 4404:
kasan_set_track+0x3d/0x60 __kasan_kmalloc+0x85/0x90 psi_trigger_create+0x113/0x3e0 pressure_write+0x146/0x2e0 cgroup_file_write+0x11c/0x250 kernfs_fop_write_iter+0x186/0x220 vfs_write+0x3d8/0x5c0 ksys_write+0x90/0x110 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd
Freed by task 4407:
kasan_set_track+0x3d/0x60 kasan_save_free_info+0x27/0x40 ____kasan_slab_free+0x11d/0x170 slab_free_freelist_hook+0x87/0x150 __kmem_cache_free+0xcb/0x180 psi_trigger_destroy+0x2e8/0x310 cgroup_file_release+0x4f/0xb0 kernfs_drain_open_files+0x165/0x1f0 kernfs_drain+0x162/0x1a0 __kernfs_remove+0x1fb/0x310 kernfs_remove_by_name_ns+0x95/0xe0 cgroup_addrm_files+0x67f/0x700 cgroup_destroy_locked+0x283/0x3c0 cgroup_rmdir+0x29/0x100 kernfs_iop_rmdir+0xd1/0x140 vfs_rmdir+0xfe/0x240 do_rmdir+0x13d/0x280 __x64_sys_rmdir+0x2c/0x30 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes: 0e94682b73bf ("psi: introduce psi monitor") Signed-off-by: Munehisa Kamata kamatam@amazon.com Signed-off-by: Mengchi Cheng mengcc@amazon.com Signed-off-by: Ingo Molnar mingo@kernel.org Acked-by: Suren Baghdasaryan surenb@google.com Acked-by: Peter Zijlstra peterz@infradead.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/lkml/20230106224859.4123476-1-kamatam@amazon.com/ Link: https://lore.kernel.org/r/20230214212705.4058045-1-kamatam@amazon.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/sched/psi.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
--- a/kernel/sched/psi.c +++ b/kernel/sched/psi.c @@ -1092,10 +1092,11 @@ void psi_trigger_destroy(struct psi_trig
group = t->group; /* - * Wakeup waiters to stop polling. Can happen if cgroup is deleted - * from under a polling process. + * Wakeup waiters to stop polling and clear the queue to prevent it from + * being accessed later. Can happen if cgroup is deleted from under a + * polling process. */ - wake_up_interruptible(&t->event_wait); + wake_up_pollfree(&t->event_wait);
mutex_lock(&group->trigger_lock);
From: Mike Kravetz mike.kravetz@oracle.com
commit ec4288fe63966b26d53907212ecd05dfa81dd2cc upstream.
Users can specify the hugetlb page size in the mmap, shmget and memfd_create system calls. This is done by using 6 bits within the flags argument to encode the base-2 logarithm of the desired page size. The routine hstate_sizelog() uses the log2 value to find the corresponding hugetlb hstate structure. Converting the log2 value (page_size_log) to potential hugetlb page size is the simple statement:
1UL << page_size_log
Because only 6 bits are used for page_size_log, the left shift can not be greater than 63. This is fine on 64 bit architectures where a long is 64 bits. However, if a value greater than 31 is passed on a 32 bit architecture (where long is 32 bits) the shift will result in undefined behavior. This was generally not an issue as the result of the undefined shift had to exactly match hugetlb page size to proceed.
Recent improvements in runtime checking have resulted in this undefined behavior throwing errors such as reported below.
Fix by comparing page_size_log to BITS_PER_LONG before doing shift.
Link: https://lkml.kernel.org/r/20230216013542.138708-1-mike.kravetz@oracle.com Link: https://lore.kernel.org/lkml/CA+G9fYuei_Tr-vN9GS7SfFyU1y9hNysnf=PB7kT0=yv4Mi... Fixes: 42d7395feb56 ("mm: support more pagesizes for MAP_HUGETLB/SHM_HUGETLB") Signed-off-by: Mike Kravetz mike.kravetz@oracle.com Reported-by: Naresh Kamboju naresh.kamboju@linaro.org Reviewed-by: Jesper Juhl jesperjuhl76@gmail.com Acked-by: Muchun Song songmuchun@bytedance.com Tested-by: Linux Kernel Functional Testing lkft@linaro.org Tested-by: Naresh Kamboju naresh.kamboju@linaro.org Cc: Anders Roxell anders.roxell@linaro.org Cc: Andi Kleen ak@linux.intel.com Cc: Sasha Levin sashal@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/hugetlb.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -397,7 +397,10 @@ static inline struct hstate *hstate_size if (!page_size_log) return &default_hstate;
- return size_to_hstate(1UL << page_size_log); + if (page_size_log < BITS_PER_LONG) + return size_to_hstate(1UL << page_size_log); + + return NULL; }
static inline struct hstate *hstate_vma(struct vm_area_struct *vma)
From: Aaron Thompson dev@aaront.org
commit 647037adcad00f2bab8828d3d41cd0553d41f3bd upstream.
This reverts commit 115d9d77bb0f9152c60b6e8646369fa7f6167593.
The pages being freed by memblock_free_late() have already been initialized, but if they are in the deferred init range, __free_one_page() might access nearby uninitialized pages when trying to coalesce buddies. This can, for example, trigger this BUG:
BUG: unable to handle page fault for address: ffffe964c02580c8 RIP: 0010:__list_del_entry_valid+0x3f/0x70 <TASK> __free_one_page+0x139/0x410 __free_pages_ok+0x21d/0x450 memblock_free_late+0x8c/0xb9 efi_free_boot_services+0x16b/0x25c efi_enter_virtual_mode+0x403/0x446 start_kernel+0x678/0x714 secondary_startup_64_no_verify+0xd2/0xdb </TASK>
A proper fix will be more involved so revert this change for the time being.
Fixes: 115d9d77bb0f ("mm: Always release pages to the buddy allocator in memblock_free_late().") Signed-off-by: Aaron Thompson dev@aaront.org Link: https://lore.kernel.org/r/20230207082151.1303-1-dev@aaront.org Signed-off-by: Mike Rapoport (IBM) rppt@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/memblock.c | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-)
--- a/mm/memblock.c +++ b/mm/memblock.c @@ -1546,13 +1546,7 @@ void __init __memblock_free_late(phys_ad end = PFN_DOWN(base + size);
for (; cursor < end; cursor++) { - /* - * Reserved pages are always initialized by the end of - * memblock_free_all() (by memmap_init() and, if deferred - * initialization is enabled, memmap_init_reserved_pages()), so - * these pages can be released directly to the buddy allocator. - */ - __free_pages_core(pfn_to_page(cursor), 0); + memblock_free_pages(pfn_to_page(cursor), cursor, 0); totalram_pages_inc(); } }
From: Felix Riemann felix.riemann@sma.de
commit 9b55d3f0a69af649c62cbc2633e6d695bb3cc583 upstream.
When converting net_device_stats to rtnl_link_stats64 sign extension is triggered on ILP32 machines as 6c1c509778 changed the previous "ulong -> u64" conversion to "long -> u64" by accessing the net_device_stats fields through a (signed) atomic_long_t.
This causes for example the received bytes counter to jump to 16EiB after having received 2^31 bytes. Casting the atomic value to "unsigned long" beforehand converting it into u64 avoids this.
Fixes: 6c1c5097781f ("net: add atomic_long_t to net_device_stats fields") Signed-off-by: Felix Riemann felix.riemann@sma.de Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/dev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/core/dev.c +++ b/net/core/dev.c @@ -9467,7 +9467,7 @@ void netdev_stats_to_stats64(struct rtnl
BUILD_BUG_ON(n > sizeof(*stats64) / sizeof(u64)); for (i = 0; i < n; i++) - dst[i] = atomic_long_read(&src[i]); + dst[i] = (unsigned long)atomic_long_read(&src[i]); /* zero out counters that only exist in rtnl_link_stats64 */ memset((char *)stats64 + n * sizeof(u64), 0, sizeof(*stats64) - n * sizeof(u64));
From: Andrew Morton akpm@linux-foundation.org
commit a5b21d8d791cd4db609d0bbcaa9e0c7e019888d1 upstream.
This fix was nacked by Philip, for reasons identified in the email linked below.
Link: https://lkml.kernel.org/r/68f15d67-8945-2728-1f17-5b53a80ec52d@squashfs.org.... Fixes: 72e544b1b28325 ("squashfs: harden sanity check in squashfs_read_xattr_id_table") Cc: Alexey Khoroshilov khoroshilov@ispras.ru Cc: Fedor Pchelkin pchelkin@ispras.ru Cc: Phillip Lougher phillip@squashfs.org.uk Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/squashfs/xattr_id.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/squashfs/xattr_id.c +++ b/fs/squashfs/xattr_id.c @@ -76,7 +76,7 @@ __le64 *squashfs_read_xattr_id_table(str /* Sanity check values */
/* there is always at least one xattr id */ - if (*xattr_ids <= 0) + if (*xattr_ids == 0) return ERR_PTR(-EINVAL);
len = SQUASHFS_XATTR_BLOCK_BYTES(*xattr_ids);
From: Jason Xing kernelxing@tencent.com
commit f9cd6a4418bac6a046ee78382423b1ae7565fb24 upstream.
Recently I encountered one case where I cannot increase the MTU size directly from 1500 to a much bigger value with XDP enabled if the server is equipped with IXGBE card, which happened on thousands of servers in production environment. After applying the current patch, we can set the maximum MTU size to 3K.
This patch follows the behavior of changing MTU as i40e/ice does.
[1] commit 23b44513c3e6 ("ice: allow 3k MTU for XDP") [2] commit 0c8493d90b6b ("i40e: add XDP support for pass and drop actions")
Fixes: fabf1bce103a ("ixgbe: Prevent unsupported configurations with XDP") Signed-off-by: Jason Xing kernelxing@tencent.com Reviewed-by: Alexander Duyck alexanderduyck@fb.com Tested-by: Chandan Kumar Rout chandanx.rout@intel.com (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 25 ++++++++++++++++--------- 1 file changed, 16 insertions(+), 9 deletions(-)
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -6722,6 +6722,18 @@ static void ixgbe_free_all_rx_resources( }
/** + * ixgbe_max_xdp_frame_size - returns the maximum allowed frame size for XDP + * @adapter: device handle, pointer to adapter + */ +static int ixgbe_max_xdp_frame_size(struct ixgbe_adapter *adapter) +{ + if (PAGE_SIZE >= 8192 || adapter->flags2 & IXGBE_FLAG2_RX_LEGACY) + return IXGBE_RXBUFFER_2K; + else + return IXGBE_RXBUFFER_3K; +} + +/** * ixgbe_change_mtu - Change the Maximum Transfer Unit * @netdev: network interface device structure * @new_mtu: new value for maximum frame size @@ -6732,18 +6744,13 @@ static int ixgbe_change_mtu(struct net_d { struct ixgbe_adapter *adapter = netdev_priv(netdev);
- if (adapter->xdp_prog) { + if (ixgbe_enabled_xdp_adapter(adapter)) { int new_frame_size = new_mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN; - int i; - - for (i = 0; i < adapter->num_rx_queues; i++) { - struct ixgbe_ring *ring = adapter->rx_ring[i];
- if (new_frame_size > ixgbe_rx_bufsz(ring)) { - e_warn(probe, "Requested MTU size is not supported with XDP\n"); - return -EINVAL; - } + if (new_frame_size > ixgbe_max_xdp_frame_size(adapter)) { + e_warn(probe, "Requested MTU size is not supported with XDP\n"); + return -EINVAL; } }
From: Jason Xing kernelxing@tencent.com
commit ce45ffb815e8e238f05de1630be3969b6bb15e4e upstream.
Include the second VLAN HLEN into account when computing the maximum MTU size as other drivers do.
Fixes: 0c8493d90b6b ("i40e: add XDP support for pass and drop actions") Signed-off-by: Jason Xing kernelxing@tencent.com Reviewed-by: Alexander Duyck alexanderduyck@fb.com Tested-by: Chandan Kumar Rout chandanx.rout@intel.com (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -2702,7 +2702,7 @@ static int i40e_change_mtu(struct net_de struct i40e_pf *pf = vsi->back;
if (i40e_enabled_xdp_vsi(vsi)) { - int frame_size = new_mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN; + int frame_size = new_mtu + I40E_PACKET_HDR_PAD;
if (frame_size > i40e_max_xdp_frame_size(vsi)) return -EINVAL;
From: Rafał Miłecki rafal@milecki.pl
commit d61615c366a489646a1bfe5b33455f916762d5f4 upstream.
Code blocks handling BCMA_CHIP_ID_BCM5357 and BCMA_CHIP_ID_BCM53572 were incorrectly unified. Chip package values are not unique and cannot be checked independently. They are meaningful only in a context of a given chip.
Packages BCM5358 and BCM47188 share the same value but then belong to different chips. Code unification resulted in treating BCM5358 as BCM47188 and broke its initialization.
Link: https://github.com/openwrt/openwrt/issues/8278 Fixes: cb1b0f90acfe ("net: ethernet: bgmac: unify code of the same family") Cc: Jon Mason jdmason@kudzu.us Signed-off-by: Rafał Miłecki rafal@milecki.pl Reviewed-by: Florian Fainelli f.fainelli@gmail.com Link: https://lore.kernel.org/r/20230208091637.16291-1-zajec5@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/broadcom/bgmac-bcma.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/net/ethernet/broadcom/bgmac-bcma.c +++ b/drivers/net/ethernet/broadcom/bgmac-bcma.c @@ -228,12 +228,12 @@ static int bgmac_probe(struct bcma_devic bgmac->feature_flags |= BGMAC_FEAT_CLKCTLST; bgmac->feature_flags |= BGMAC_FEAT_FLW_CTRL1; bgmac->feature_flags |= BGMAC_FEAT_SW_TYPE_PHY; - if (ci->pkg == BCMA_PKG_ID_BCM47188 || - ci->pkg == BCMA_PKG_ID_BCM47186) { + if ((ci->id == BCMA_CHIP_ID_BCM5357 && ci->pkg == BCMA_PKG_ID_BCM47186) || + (ci->id == BCMA_CHIP_ID_BCM53572 && ci->pkg == BCMA_PKG_ID_BCM47188)) { bgmac->feature_flags |= BGMAC_FEAT_SW_TYPE_RGMII; bgmac->feature_flags |= BGMAC_FEAT_IOST_ATTACHED; } - if (ci->pkg == BCMA_PKG_ID_BCM5358) + if (ci->id == BCMA_CHIP_ID_BCM5357 && ci->pkg == BCMA_PKG_ID_BCM5358) bgmac->feature_flags |= BGMAC_FEAT_SW_TYPE_EPHYRMII; break; case BCMA_CHIP_ID_BCM53573:
From: Pietro Borrello borrello@diag.uniroma1.it
commit a1221703a0f75a9d81748c516457e0fc76951496 upstream.
Use list_is_first() to check whether tsp->asoc matches the first element of ep->asocs, as the list is not guaranteed to have an entry.
Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file") Signed-off-by: Pietro Borrello borrello@diag.uniroma1.it Acked-by: Xin Long lucien.xin@gmail.com Link: https://lore.kernel.org/r/20230208-sctp-filter-v2-1-6e1f4017f326@diag.unirom... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sctp/diag.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
--- a/net/sctp/diag.c +++ b/net/sctp/diag.c @@ -349,11 +349,9 @@ static int sctp_sock_filter(struct sctp_ struct sctp_comm_param *commp = p; struct sock *sk = ep->base.sk; const struct inet_diag_req_v2 *r = commp->r; - struct sctp_association *assoc = - list_entry(ep->asocs.next, struct sctp_association, asocs);
/* find the ep only once through the transports by this condition */ - if (tsp->asoc != assoc) + if (!list_is_first(&tsp->asoc->asocs, &ep->asocs)) return 0;
if (r->sdiag_family != AF_UNSPEC && sk->sk_family != r->sdiag_family)
From: Kuniyuki Iwashima kuniyu@amazon.com
commit ca43ccf41224b023fc290073d5603a755fd12eed upstream.
Eric Dumazet pointed out [0] that when we call skb_set_owner_r() for ipv6_pinfo.pktoptions, sk_rmem_schedule() has not been called, resulting in a negative sk_forward_alloc.
We add a new helper which clones a skb and sets its owner only when sk_rmem_schedule() succeeds.
Note that we move skb_set_owner_r() forward in (dccp|tcp)_v6_do_rcv() because tcp_send_synack() can make sk_forward_alloc negative before ipv6_opt_accepted() in the crossed SYN-ACK or self-connect() cases.
[0]: https://lore.kernel.org/netdev/CANn89iK9oc20Jdi_41jb9URdF210r7d1Y-+uypbMSbOf...
Fixes: 323fbd0edf3f ("net: dccp: Add handling of IPV6_PKTOPTIONS to dccp_v6_do_rcv()") Fixes: 3df80d9320bc ("[DCCP]: Introduce DCCPv6") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/sock.h | 13 +++++++++++++ net/dccp/ipv6.c | 7 ++----- net/ipv6/tcp_ipv6.c | 10 +++------- 3 files changed, 18 insertions(+), 12 deletions(-)
--- a/include/net/sock.h +++ b/include/net/sock.h @@ -2167,6 +2167,19 @@ static inline __must_check bool skb_set_ return false; }
+static inline struct sk_buff *skb_clone_and_charge_r(struct sk_buff *skb, struct sock *sk) +{ + skb = skb_clone(skb, sk_gfp_mask(sk, GFP_ATOMIC)); + if (skb) { + if (sk_rmem_schedule(sk, skb, skb->truesize)) { + skb_set_owner_r(skb, sk); + return skb; + } + __kfree_skb(skb); + } + return NULL; +} + void sk_reset_timer(struct sock *sk, struct timer_list *timer, unsigned long expires);
--- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -541,11 +541,9 @@ static struct sock *dccp_v6_request_recv *own_req = inet_ehash_nolisten(newsk, req_to_sk(req_unhash), NULL); /* Clone pktoptions received with SYN, if we own the req */ if (*own_req && ireq->pktopts) { - newnp->pktoptions = skb_clone(ireq->pktopts, GFP_ATOMIC); + newnp->pktoptions = skb_clone_and_charge_r(ireq->pktopts, newsk); consume_skb(ireq->pktopts); ireq->pktopts = NULL; - if (newnp->pktoptions) - skb_set_owner_r(newnp->pktoptions, newsk); }
return newsk; @@ -605,7 +603,7 @@ static int dccp_v6_do_rcv(struct sock *s --ANK (980728) */ if (np->rxopt.all) - opt_skb = skb_clone(skb, GFP_ATOMIC); + opt_skb = skb_clone_and_charge_r(skb, sk);
if (sk->sk_state == DCCP_OPEN) { /* Fast path */ if (dccp_rcv_established(sk, skb, dccp_hdr(skb), skb->len)) @@ -669,7 +667,6 @@ ipv6_pktoptions: np->flow_label = ip6_flowlabel(ipv6_hdr(opt_skb)); if (ipv6_opt_accepted(sk, opt_skb, &DCCP_SKB_CB(opt_skb)->header.h6)) { - skb_set_owner_r(opt_skb, sk); memmove(IP6CB(opt_skb), &DCCP_SKB_CB(opt_skb)->header.h6, sizeof(struct inet6_skb_parm)); --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1318,14 +1318,11 @@ static struct sock *tcp_v6_syn_recv_sock
/* Clone pktoptions received with SYN, if we own the req */ if (ireq->pktopts) { - newnp->pktoptions = skb_clone(ireq->pktopts, - sk_gfp_mask(sk, GFP_ATOMIC)); + newnp->pktoptions = skb_clone_and_charge_r(ireq->pktopts, newsk); consume_skb(ireq->pktopts); ireq->pktopts = NULL; - if (newnp->pktoptions) { + if (newnp->pktoptions) tcp_v6_restore_cb(newnp->pktoptions); - skb_set_owner_r(newnp->pktoptions, newsk); - } } } else { if (!req_unhash && found_dup_sk) { @@ -1393,7 +1390,7 @@ static int tcp_v6_do_rcv(struct sock *sk --ANK (980728) */ if (np->rxopt.all) - opt_skb = skb_clone(skb, sk_gfp_mask(sk, GFP_ATOMIC)); + opt_skb = skb_clone_and_charge_r(skb, sk);
if (sk->sk_state == TCP_ESTABLISHED) { /* Fast path */ struct dst_entry *dst; @@ -1475,7 +1472,6 @@ ipv6_pktoptions: if (np->repflow) np->flow_label = ip6_flowlabel(ipv6_hdr(opt_skb)); if (ipv6_opt_accepted(sk, opt_skb, &TCP_SKB_CB(opt_skb)->header.h6)) { - skb_set_owner_r(opt_skb, sk); tcp_v6_restore_cb(opt_skb); opt_skb = xchg(&np->pktoptions, opt_skb); } else {
From: Miko Larsson mikoxyzzz@gmail.com
commit c68f345b7c425b38656e1791a0486769a8797016 upstream.
syzbot reported that act_len in kalmia_send_init_packet() is uninitialized when passing it to the first usb_bulk_msg error path. Jiri Pirko noted that it's pointless to pass it in the error path, and that the value that would be printed in the second error path would be the value of act_len from the first call to usb_bulk_msg.[1]
With this in mind, let's just not pass act_len to the usb_bulk_msg error paths.
1: https://lore.kernel.org/lkml/Y9pY61y1nwTuzMOa@nanopsycho/
Fixes: d40261236e8e ("net/usb: Add Samsung Kalmia driver for Samsung GT-B3730") Reported-and-tested-by: syzbot+cd80c5ef5121bfe85b55@syzkaller.appspotmail.com Signed-off-by: Miko Larsson mikoxyzzz@gmail.com Reviewed-by: Alexander Duyck alexanderduyck@fb.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/usb/kalmia.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/net/usb/kalmia.c +++ b/drivers/net/usb/kalmia.c @@ -65,8 +65,8 @@ kalmia_send_init_packet(struct usbnet *d init_msg, init_msg_len, &act_len, KALMIA_USB_TIMEOUT); if (status != 0) { netdev_err(dev->net, - "Error sending init packet. Status %i, length %i\n", - status, act_len); + "Error sending init packet. Status %i\n", + status); return status; } else if (act_len != init_msg_len) { @@ -83,8 +83,8 @@ kalmia_send_init_packet(struct usbnet *d
if (status != 0) netdev_err(dev->net, - "Error receiving init result. Status %i, length %i\n", - status, act_len); + "Error receiving init result. Status %i\n", + status); else if (act_len != expected_len) netdev_err(dev->net, "Unexpected init result length: %i\n", act_len);
From: Johannes Zink j.zink@pengutronix.de
commit 4562c65ec852067c6196abdcf2d925f08841dcbc upstream.
So far changing the period by just setting new period values while running did not work.
The order as indicated by the publicly available reference manual of the i.MX8MP [1] indicates a sequence:
* initiate the programming sequence * set the values for PPS period and start time * start the pulse train generation.
This is currently not used in dwmac5_flex_pps_config(), which instead does:
* initiate the programming sequence and immediately start the pulse train generation * set the values for PPS period and start time
This caused the period values written not to take effect until the FlexPPS output was disabled and re-enabled again.
This patch fix the order and allows the period to be set immediately.
[1] https://www.nxp.com/webapp/Download?colCode=IMX8MPRM
Fixes: 9a8a02c9d46d ("net: stmmac: Add Flexible PPS support") Signed-off-by: Johannes Zink j.zink@pengutronix.de Link: https://lore.kernel.org/r/20230210143937.3427483-1-j.zink@pengutronix.de Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/stmicro/stmmac/dwmac5.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac5.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac5.c @@ -520,9 +520,9 @@ int dwmac5_flex_pps_config(void __iomem return 0; }
- val |= PPSCMDx(index, 0x2); val |= TRGTMODSELx(index, 0x2); val |= PPSEN0; + writel(val, ioaddr + MAC_PPS_CONTROL);
writel(cfg->start.tv_sec, ioaddr + MAC_PPSx_TARGET_TIME_SEC(index));
@@ -547,6 +547,7 @@ int dwmac5_flex_pps_config(void __iomem writel(period - 1, ioaddr + MAC_PPSx_WIDTH(index));
/* Finally, activate it */ + val |= PPSCMDx(index, 0x2); writel(val, ioaddr + MAC_PPS_CONTROL); return 0; }
From: Michael Chan michael.chan@broadcom.com
commit 2038cc592811209de20c4e094ca08bfb1e6fbc6c upstream.
In bnxt_reserve_rings(), there is logic to check that the number of TX rings reserved is enough to cover all the mqprio TCs, but it fails to account for the TX XDP rings. So the check will always fail if there are mqprio TCs and TX XDP rings. As a result, the driver always fails to initialize after the XDP program is attached and the device will be brought down. A subsequent ifconfig up will also fail because the number of TX rings is set to an inconsistent number. Fix the check to properly account for TX XDP rings. If the check fails, set the number of TX rings back to a consistent number after calling netdev_reset_tc().
Fixes: 674f50a5b026 ("bnxt_en: Implement new method to reserve rings.") Reviewed-by: Hongguang Gao hongguang.gao@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -8205,10 +8205,14 @@ int bnxt_reserve_rings(struct bnxt *bp, netdev_err(bp->dev, "ring reservation/IRQ init failure rc: %d\n", rc); return rc; } - if (tcs && (bp->tx_nr_rings_per_tc * tcs != bp->tx_nr_rings)) { + if (tcs && (bp->tx_nr_rings_per_tc * tcs != + bp->tx_nr_rings - bp->tx_nr_rings_xdp)) { netdev_err(bp->dev, "tx ring reservation failure\n"); netdev_reset_tc(bp->dev); - bp->tx_nr_rings_per_tc = bp->tx_nr_rings; + if (bp->tx_nr_rings_xdp) + bp->tx_nr_rings_per_tc = bp->tx_nr_rings_xdp; + else + bp->tx_nr_rings_per_tc = bp->tx_nr_rings; return -ENOMEM; } return 0;
From: Cristian Ciocaltea cristian.ciocaltea@collabora.com
commit 05d7623a892a9da62da0e714428e38f09e4a64d8 upstream.
When setting 'snps,force_thresh_dma_mode' DT property, the following warning is always emitted, regardless the status of force_sf_dma_mode:
dwmac-starfive 10020000.ethernet: force_sf_dma_mode is ignored if force_thresh_dma_mode is set.
Do not print the rather misleading message when DMA store and forward mode is already disabled.
Fixes: e2a240c7d3bc ("driver:net:stmmac: Disable DMA store and forward mode if platform data force_thresh_dma_mode is set.") Signed-off-by: Cristian Ciocaltea cristian.ciocaltea@collabora.com Link: https://lore.kernel.org/r/20230210202126.877548-1-cristian.ciocaltea@collabo... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c @@ -554,7 +554,7 @@ stmmac_probe_config_dt(struct platform_d dma_cfg->mixed_burst = of_property_read_bool(np, "snps,mixed-burst");
plat->force_thresh_dma_mode = of_property_read_bool(np, "snps,force_thresh_dma_mode"); - if (plat->force_thresh_dma_mode) { + if (plat->force_thresh_dma_mode && plat->force_sf_dma_mode) { plat->force_sf_dma_mode = 0; dev_warn(&pdev->dev, "force_sf_dma_mode is ignored if force_thresh_dma_mode is set.\n");
From: Jakub Kicinski kuba@kernel.org
commit fda6c89fe3d9aca073495a664e1d5aea28cd4377 upstream.
lianhui reports that when MPLS fails to register the sysctl table under new location (during device rename) the old pointers won't get overwritten and may be freed again (double free).
Handle this gracefully. The best option would be unregistering the MPLS from the device completely on failure, but unfortunately mpls_ifdown() can fail. So failing fully is also unreliable.
Another option is to register the new table first then only remove old one if the new one succeeds. That requires more code, changes order of notifications and two tables may be visible at the same time.
sysctl point is not used in the rest of the code - set to NULL on failures and skip unregister if already NULL.
Reported-by: lianhui tang bluetlh@gmail.com Fixes: 0fae3bf018d9 ("mpls: handle device renames for per-device sysctls") Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mpls/af_mpls.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/net/mpls/af_mpls.c +++ b/net/mpls/af_mpls.c @@ -1428,6 +1428,7 @@ static int mpls_dev_sysctl_register(stru free: kfree(table); out: + mdev->sysctl = NULL; return -ENOBUFS; }
@@ -1437,6 +1438,9 @@ static void mpls_dev_sysctl_unregister(s struct net *net = dev_net(dev); struct ctl_table *table;
+ if (!mdev->sysctl) + return; + table = mdev->sysctl->ctl_table_arg; unregister_net_sysctl_table(mdev->sysctl); kfree(table);
From: Jason Xing kernelxing@tencent.com
commit 0967bf837784a11c65d66060623a74e65211af0b upstream.
Include the second VLAN HLEN into account when computing the maximum MTU size as other drivers do.
Fixes: fabf1bce103a ("ixgbe: Prevent unsupported configurations with XDP") Signed-off-by: Jason Xing kernelxing@tencent.com Reviewed-by: Alexander Duyck alexanderduyck@fb.com Tested-by: Chandan Kumar Rout chandanx.rout@intel.com (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/intel/ixgbe/ixgbe.h | 2 ++ drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 3 +-- 2 files changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h @@ -67,6 +67,8 @@ #define IXGBE_RXBUFFER_4K 4096 #define IXGBE_MAX_RXBUFFER 16384 /* largest size for a single descriptor */
+#define IXGBE_PKT_HDR_PAD (ETH_HLEN + ETH_FCS_LEN + (VLAN_HLEN * 2)) + /* Attempt to maximize the headroom available for incoming frames. We * use a 2K buffer for receives and need 1536/1534 to store the data for * the frame. This leaves us with 512 bytes of room. From that we need --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -6745,8 +6745,7 @@ static int ixgbe_change_mtu(struct net_d struct ixgbe_adapter *adapter = netdev_priv(netdev);
if (ixgbe_enabled_xdp_adapter(adapter)) { - int new_frame_size = new_mtu + ETH_HLEN + ETH_FCS_LEN + - VLAN_HLEN; + int new_frame_size = new_mtu + IXGBE_PKT_HDR_PAD;
if (new_frame_size > ixgbe_max_xdp_frame_size(adapter)) { e_warn(probe, "Requested MTU size is not supported with XDP\n");
From: Guillaume Nault gnault@redhat.com
commit e010ae08c71fda8be3d6bda256837795a0b3ea41 upstream.
Take into account the IPV6_TCLASS socket option (DSCP) in ip6_datagram_flow_key_init(). Otherwise fib6_rule_match() can't properly match the DSCP value, resulting in invalid route lookup.
For example:
ip route add unreachable table main 2001:db8::10/124
ip route add table 100 2001:db8::10/124 dev eth0 ip -6 rule add dsfield 0x04 table 100
echo test | socat - UDP6:[2001:db8::11]:54321,ipv6-tclass=0x04
Without this patch, socat fails at connect() time ("No route to host") because the fib-rule doesn't jump to table 100 and the lookup ends up being done in the main table.
Fixes: 2cc67cc731d9 ("[IPV6] ROUTE: Routing by Traffic Class.") Signed-off-by: Guillaume Nault gnault@redhat.com Reviewed-by: Eric Dumazet edumazet@google.com Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/datagram.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/ipv6/datagram.c +++ b/net/ipv6/datagram.c @@ -50,7 +50,7 @@ static void ip6_datagram_flow_key_init(s fl6->flowi6_mark = sk->sk_mark; fl6->fl6_dport = inet->inet_dport; fl6->fl6_sport = inet->inet_sport; - fl6->flowlabel = np->flow_label; + fl6->flowlabel = ip6_make_flowinfo(np->tclass, np->flow_label); fl6->flowi6_uid = sk->sk_uid;
if (!fl6->flowi6_oif)
From: Guillaume Nault gnault@redhat.com
commit 8230680f36fd1525303d1117768c8852314c488c upstream.
Take into account the IPV6_TCLASS socket option (DSCP) in tcp_v6_connect(). Otherwise fib6_rule_match() can't properly match the DSCP value, resulting in invalid route lookup.
For example:
ip route add unreachable table main 2001:db8::10/124
ip route add table 100 2001:db8::10/124 dev eth0 ip -6 rule add dsfield 0x04 table 100
echo test | socat - TCP6:[2001:db8::11]:54321,ipv6-tclass=0x04
Without this patch, socat fails at connect() time ("No route to host") because the fib-rule doesn't jump to table 100 and the lookup ends up being done in the main table.
Fixes: 2cc67cc731d9 ("[IPV6] ROUTE: Routing by Traffic Class.") Signed-off-by: Guillaume Nault gnault@redhat.com Reviewed-by: Eric Dumazet edumazet@google.com Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/tcp_ipv6.c | 1 + 1 file changed, 1 insertion(+)
--- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -264,6 +264,7 @@ static int tcp_v6_connect(struct sock *s fl6.flowi6_proto = IPPROTO_TCP; fl6.daddr = sk->sk_v6_daddr; fl6.saddr = saddr ? *saddr : np->saddr; + fl6.flowlabel = ip6_make_flowinfo(np->tclass, np->flow_label); fl6.flowi6_oif = sk->sk_bound_dev_if; fl6.flowi6_mark = sk->sk_mark; fl6.fl6_dport = usin->sin6_port;
From: Natalia Petrova n.petrova@fintech.ru
[ Upstream commit 7fa0b526f865cb42aa33917fd02a92cb03746f4d ]
The result of nlmsg_find_attr() 'br_spec' is dereferenced in nla_for_each_nested(), but it can take NULL value in nla_find() function, which will result in an error.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 51616018dd1b ("i40e: Add support for getlink, setlink ndo ops") Signed-off-by: Natalia Petrova n.petrova@fintech.ru Reviewed-by: Jesse Brandeburg jesse.brandeburg@intel.com Tested-by: Gurucharan G gurucharanx.g@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Link: https://lore.kernel.org/r/20230209172833.3596034-1-anthony.l.nguyen@intel.co... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/i40e/i40e_main.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 1226b4fdbeac0..3f983d69f10eb 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -12535,6 +12535,8 @@ static int i40e_ndo_bridge_setlink(struct net_device *dev, }
br_spec = nlmsg_find_attr(nlh, sizeof(struct ifinfomsg), IFLA_AF_SPEC); + if (!br_spec) + return -EINVAL;
nla_for_each_nested(attr, br_spec, rem) { __u16 mode;
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
commit 2c10b61421a28e95a46ab489fd56c0f442ff6952 upstream.
When calling the KVM_GET_DEBUGREGS ioctl, on some configurations, there might be some unitialized portions of the kvm_debugregs structure that could be copied to userspace. Prevent this as is done in the other kvm ioctls, by setting the whole structure to 0 before copying anything into it.
Bonus is that this reduces the lines of code as the explicit flag setting and reserved space zeroing out can be removed.
Cc: Sean Christopherson seanjc@google.com Cc: Paolo Bonzini pbonzini@redhat.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Ingo Molnar mingo@redhat.com Cc: Borislav Petkov bp@alien8.de Cc: Dave Hansen dave.hansen@linux.intel.com Cc: x86@kernel.org Cc: "H. Peter Anvin" hpa@zytor.com Cc: stable stable@kernel.org Reported-by: Xingyuan Mo hdthky0@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Message-Id: 20230214103304.3689213-1-gregkh@linuxfoundation.org Tested-by: Xingyuan Mo hdthky0@gmail.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/x86.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3948,12 +3948,11 @@ static void kvm_vcpu_ioctl_x86_get_debug { unsigned long val;
+ memset(dbgregs, 0, sizeof(*dbgregs)); memcpy(dbgregs->db, vcpu->arch.db, sizeof(vcpu->arch.db)); kvm_get_dr(vcpu, 6, &val); dbgregs->dr6 = val; dbgregs->dr7 = vcpu->arch.dr7; - dbgregs->flags = 0; - memset(&dbgregs->reserved, 0, sizeof(dbgregs->reserved)); }
static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
From: Ryusuke Konishi konishi.ryusuke@gmail.com
commit 99b9402a36f0799f25feee4465bfa4b8dfa74b4d upstream.
Macro NILFS_SB2_OFFSET_BYTES, which computes the position of the second superblock, underflows when the argument device size is less than 4096 bytes. Therefore, when using this macro, it is necessary to check in advance that the device size is not less than a lower limit, or at least that underflow does not occur.
The current nilfs2 implementation lacks this check, causing out-of-bound block access when mounting devices smaller than 4096 bytes:
I/O error, dev loop0, sector 36028797018963960 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 NILFS (loop0): unable to read secondary superblock (blocksize = 1024)
In addition, when trying to resize the filesystem to a size below 4096 bytes, this underflow occurs in nilfs_resize_fs(), passing a huge number of segments to nilfs_sufile_resize(), corrupting parameters such as the number of segments in superblocks. This causes excessive loop iterations in nilfs_sufile_resize() during a subsequent resize ioctl, causing semaphore ns_segctor_sem to block for a long time and hang the writer thread:
INFO: task segctord:5067 blocked for more than 143 seconds. Not tainted 6.2.0-rc8-syzkaller-00015-gf6feea56f66d #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:segctord state:D stack:23456 pid:5067 ppid:2 flags:0x00004000 Call Trace: <TASK> context_switch kernel/sched/core.c:5293 [inline] __schedule+0x1409/0x43f0 kernel/sched/core.c:6606 schedule+0xc3/0x190 kernel/sched/core.c:6682 rwsem_down_write_slowpath+0xfcf/0x14a0 kernel/locking/rwsem.c:1190 nilfs_transaction_lock+0x25c/0x4f0 fs/nilfs2/segment.c:357 nilfs_segctor_thread_construct fs/nilfs2/segment.c:2486 [inline] nilfs_segctor_thread+0x52f/0x1140 fs/nilfs2/segment.c:2570 kthread+0x270/0x300 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 </TASK> ... Call Trace: <TASK> folio_mark_accessed+0x51c/0xf00 mm/swap.c:515 __nilfs_get_page_block fs/nilfs2/page.c:42 [inline] nilfs_grab_buffer+0x3d3/0x540 fs/nilfs2/page.c:61 nilfs_mdt_submit_block+0xd7/0x8f0 fs/nilfs2/mdt.c:121 nilfs_mdt_read_block+0xeb/0x430 fs/nilfs2/mdt.c:176 nilfs_mdt_get_block+0x12d/0xbb0 fs/nilfs2/mdt.c:251 nilfs_sufile_get_segment_usage_block fs/nilfs2/sufile.c:92 [inline] nilfs_sufile_truncate_range fs/nilfs2/sufile.c:679 [inline] nilfs_sufile_resize+0x7a3/0x12b0 fs/nilfs2/sufile.c:777 nilfs_resize_fs+0x20c/0xed0 fs/nilfs2/super.c:422 nilfs_ioctl_resize fs/nilfs2/ioctl.c:1033 [inline] nilfs_ioctl+0x137c/0x2440 fs/nilfs2/ioctl.c:1301 ...
This fixes these issues by inserting appropriate minimum device size checks or anti-underflow checks, depending on where the macro is used.
Link: https://lkml.kernel.org/r/0000000000004e1dfa05f4a48e6b@google.com Link: https://lkml.kernel.org/r/20230214224043.24141-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi konishi.ryusuke@gmail.com Reported-by: syzbot+f0c4082ce5ebebdac63b@syzkaller.appspotmail.com Tested-by: Ryusuke Konishi konishi.ryusuke@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nilfs2/ioctl.c | 7 +++++++ fs/nilfs2/super.c | 9 +++++++++ fs/nilfs2/the_nilfs.c | 8 +++++++- 3 files changed, 23 insertions(+), 1 deletion(-)
--- a/fs/nilfs2/ioctl.c +++ b/fs/nilfs2/ioctl.c @@ -1130,7 +1130,14 @@ static int nilfs_ioctl_set_alloc_range(s
minseg = range[0] + segbytes - 1; do_div(minseg, segbytes); + + if (range[1] < 4096) + goto out; + maxseg = NILFS_SB2_OFFSET_BYTES(range[1]); + if (maxseg < segbytes) + goto out; + do_div(maxseg, segbytes); maxseg--;
--- a/fs/nilfs2/super.c +++ b/fs/nilfs2/super.c @@ -404,6 +404,15 @@ int nilfs_resize_fs(struct super_block * goto out;
/* + * Prevent underflow in second superblock position calculation. + * The exact minimum size check is done in nilfs_sufile_resize(). + */ + if (newsize < 4096) { + ret = -ENOSPC; + goto out; + } + + /* * Write lock is required to protect some functions depending * on the number of segments, the number of reserved segments, * and so forth. --- a/fs/nilfs2/the_nilfs.c +++ b/fs/nilfs2/the_nilfs.c @@ -517,9 +517,15 @@ static int nilfs_load_super_block(struct { struct nilfs_super_block **sbp = nilfs->ns_sbp; struct buffer_head **sbh = nilfs->ns_sbh; - u64 sb2off = NILFS_SB2_OFFSET_BYTES(nilfs->ns_bdev->bd_inode->i_size); + u64 sb2off, devsize = nilfs->ns_bdev->bd_inode->i_size; int valid[2], swp = 0;
+ if (devsize < NILFS_SEG_MIN_BLOCKS * NILFS_MIN_BLOCK_SIZE + 4096) { + nilfs_msg(sb, KERN_ERR, "device size too small"); + return -EINVAL; + } + sb2off = NILFS_SB2_OFFSET_BYTES(devsize); + sbp[0] = nilfs_read_super_block(sb, NILFS_SB_OFFSET_BYTES, blocksize, &sbh[0]); sbp[1] = nilfs_read_super_block(sb, sb2off, blocksize, &sbh[1]);
From: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
commit 1f810d2b6b2fbdc5279644d8b2c140b1f7c9d43d upstream.
The HDaudio stream allocation is done first, and in a second step the LOSIDV parameter is programmed for the multi-link used by a codec.
This leads to a possible stream_tag leak, e.g. if a DisplayAudio link is not used. This would happen when a non-Intel graphics card is used and userspace unconditionally uses the Intel Display Audio PCMs without checking if they are connected to a receiver with jack controls.
We should first check that there is a valid multi-link entry to configure before allocating a stream_tag. This change aligns the dma_assign and dma_cleanup phases.
Complements: b0cd60f3e9f5 ("ALSA/ASoC: hda: clarify bus_get_link() and bus_link_get() helpers") Link: https://github.com/thesofproject/linux/issues/4151 Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Reviewed-by: Ranjani Sridharan ranjani.sridharan@linux.intel.com Reviewed-by: Rander Wang rander.wang@intel.com Reviewed-by: Bard Liao yung-chuan.liao@linux.intel.com Signed-off-by: Peter Ujfalusi peter.ujfalusi@linux.intel.com Link: https://lore.kernel.org/r/20230216162340.19480-1-peter.ujfalusi@linux.intel.... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/sof/intel/hda-dai.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/sound/soc/sof/intel/hda-dai.c +++ b/sound/soc/sof/intel/hda-dai.c @@ -211,6 +211,10 @@ static int hda_link_hw_params(struct snd int stream_tag; int ret;
+ link = snd_hdac_ext_bus_get_link(bus, codec_dai->component->name); + if (!link) + return -EINVAL; + /* get stored dma data if resuming from system suspend */ link_dev = snd_soc_dai_get_dma_data(dai, substream); if (!link_dev) { @@ -231,10 +235,6 @@ static int hda_link_hw_params(struct snd if (ret < 0) return ret;
- link = snd_hdac_ext_bus_get_link(bus, codec_dai->component->name); - if (!link) - return -EINVAL; - /* set the stream tag in the codec dai dma params */ if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK) snd_soc_dai_set_tdm_slot(codec_dai, stream_tag, 0, 0, 0);
From: Dan Carpenter error27@gmail.com
commit 9cec2aaffe969f2a3e18b5ec105fc20bb908e475 upstream.
The > needs be >= to prevent an out of bounds access.
Fixes: de5ca4c3852f ("net: sched: sch: Bounds check priority") Signed-off-by: Dan Carpenter error27@gmail.com Reviewed-by: Simon Horman simon.horman@corigine.com Reviewed-by: Kees Cook keescook@chromium.org Link: https://lore.kernel.org/r/Y+D+KN18FQI2DKLq@kili Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/sch_htb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/sched/sch_htb.c +++ b/net/sched/sch_htb.c @@ -407,7 +407,7 @@ static void htb_activate_prios(struct ht while (m) { unsigned int prio = ffz(~m);
- if (WARN_ON_ONCE(prio > ARRAY_SIZE(p->inner.clprio))) + if (WARN_ON_ONCE(prio >= ARRAY_SIZE(p->inner.clprio))) break; m &= ~(1 << prio);
From: Joerg Roedel jroedel@suse.de
commit 3057fb9377eb5e73386dd0d8804bf72bdd23e391 upstream.
A recent commit added a gfp parameter to amd_iommu_map() to make it callable from atomic context, but forgot to pass it down to iommu_map_page() and left GFP_KERNEL there. This caused sleep-while-atomic warnings and needs to be fixed.
Reported-by: Qian Cai cai@lca.pw Reported-by: Dan Carpenter dan.carpenter@oracle.com Fixes: 781ca2de89ba ("iommu: Add gfp parameter to iommu_ops::map") Reviewed-by: Jerry Snitselaar jsnitsel@redhat.com Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iommu/amd_iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -3114,7 +3114,7 @@ static int amd_iommu_map(struct iommu_do prot |= IOMMU_PROT_IW;
mutex_lock(&domain->api_lock); - ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL); + ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp); mutex_unlock(&domain->api_lock);
domain_flush_np_cache(domain, iova, page_size);
On Mon, 20 Feb 2023 at 19:14, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.4.232 release. There are 156 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 22 Feb 2023 13:35:35 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.232-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build * kernel: 5.4.232-rc1 * git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc * git branch: linux-5.4.y * git commit: 01caaff111842f1fb3c245d4387cb2ba7aed627c * git describe: v5.4.231-157-g01caaff11184 * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.4.y/build/v5.4.23...
## Test Regressions (compared to v5.4.231)
## Metric Regressions (compared to v5.4.231)
## Test Fixes (compared to v5.4.231)
## Metric Fixes (compared to v5.4.231)
## Test result summary total: 128121, pass: 104609, fail: 3232, skip: 19905, xfail: 375
## Build Summary * arc: 5 total, 5 passed, 0 failed * arm: 148 total, 147 passed, 1 failed * arm64: 48 total, 44 passed, 4 failed * i386: 28 total, 22 passed, 6 failed * mips: 31 total, 29 passed, 2 failed * parisc: 8 total, 8 passed, 0 failed * powerpc: 34 total, 32 passed, 2 failed * riscv: 16 total, 12 passed, 4 failed * s390: 8 total, 8 passed, 0 failed * sh: 14 total, 12 passed, 2 failed * sparc: 8 total, 8 passed, 0 failed * x86_64: 41 total, 39 passed, 2 failed
## Test suites summary * boot * fwts * igt-gpu-tools * kselftest-android * kselftest-arm64 * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-drivers-dma-buf * kselftest-efivarfs * kselftest-filesystems * kselftest-filesystems-binderfs * kselftest-firmware * kselftest-fpu * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-kvm * kselftest-lib * kselftest-livepatch * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-net-forwarding * kselftest-net-mptcp * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-vm * kselftest-x86 * kselftest-zram * kunit * kvm-unit-tests * libgpiod * libhugetlbfs * log-parser-boot * log-parser-test * ltp-cap_bounds * ltp-commands * ltp-containers * ltp-controllers * ltp-cpuhotplug * ltp-crypto * ltp-cve * ltp-dio * ltp-fcntl-locktests * ltp-filecaps * ltp-fs * ltp-fs_bind * ltp-fs_perms_simple * ltp-fsx * ltp-hugetlb * ltp-io * ltp-ipc * ltp-math * ltp-mm * ltp-nptl * ltp-open-posix-tests * ltp-pty * ltp-sched * ltp-securebits * ltp-smoke * ltp-syscalls * ltp-tracing * network-basic-tests * packetdrill * perf * rcutorture * v4l2-compliance * vdso
-- Linaro LKFT https://lkft.linaro.org
Hi Greg,
On Mon, Feb 20, 2023 at 02:34:04PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.232 release. There are 156 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 22 Feb 2023 13:35:35 +0000. Anything received after that time might be too late.
Build test (gcc version 11.3.1 20230210): mips: 65 configs -> no failure arm: 106 configs -> no failure arm64: 2 configs -> no failure x86_64: 4 configs -> no failure alpha allmodconfig -> no failure powerpc allmodconfig -> no failure riscv allmodconfig -> no failure s390 allmodconfig -> no failure xtensa allmodconfig -> no failure
Boot test: x86_64: Booted on my test laptop. No regression. x86_64: Booted on qemu. No regression. [1]
[1]. https://openqa.qa.codethink.co.uk/tests/2905
Tested-by: Sudip Mukherjee sudip.mukherjee@codethink.co.uk
-- Regards Sudip
On Mon, Feb 20, 2023 at 02:34:04PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.232 release. There are 156 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 22 Feb 2023 13:35:35 +0000. Anything received after that time might be too late.
Build results: total: 159 pass: 159 fail: 0 Qemu test results: total: 450 pass: 450 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
On 2/20/23 05:34, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.232 release. There are 156 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 22 Feb 2023 13:35:35 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.232-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit kernels, build tested on BMIPS_GENERIC:
Tested-by: Florian Fainelli f.fainelli@gmail.com
On 2/20/23 06:34, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.232 release. There are 156 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 22 Feb 2023 13:35:35 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.232-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
On 2023/2/20 21:34, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.232 release. There are 156 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 22 Feb 2023 13:35:35 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.232-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
Tested on arm64 and x86 for 5.4.232-rc1,
Kernel repo:https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git Branch: linux-5.4.y Version: 5.4.232-rc1 Commit: 01caaff111842f1fb3c245d4387cb2ba7aed627c Compiler: gcc version 7.3.0 (GCC)
arm64: -------------------------------------------------------------------- Testcase Result Summary: total: 9017 passed: 9017 failed: 0 timeout: 0 --------------------------------------------------------------------
x86: -------------------------------------------------------------------- Testcase Result Summary: total: 9017 passed: 9017 failed: 0 timeout: 0 -------------------------------------------------------------------- Tested-by: Hulk Robot hulkrobot@huawei.com
linux-stable-mirror@lists.linaro.org