This is the start of the stable review cycle for the 5.17.3 release. There are 343 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 14 Apr 2022 06:28:59 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.17.3-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.17.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.17.3-rc1
Jens Axboe axboe@kernel.dk io_uring: drop the old style inflight file tracking
Jens Axboe axboe@kernel.dk io_uring: defer file assignment
Jens Axboe axboe@kernel.dk io_uring: propagate issue_flags state down to file assignment
Jens Axboe axboe@kernel.dk io_uring: move read/write file prep state into actual opcode handler
Christophe Leroy christophe.leroy@csgroup.eu static_call: Don't make __static_call_return0 static
Waiman Long longman@redhat.com mm/sparsemem: fix 'mem_section' will never be NULL gcc 12 warning
Andre Przywara andre.przywara@arm.com irqchip/gic, gic-v3: Prevent GSI to SGI translations
Christophe Leroy christophe.leroy@csgroup.eu powerpc/64: Fix build failure with allyesconfig in book3s_64_entry.S
Marc Zyngier maz@kernel.org irqchip/gic-v4: Wait for GICR_VPENDBASER.Dirty to clear before descheduling
Nick Desaulniers ndesaulniers@google.com x86/extable: Prefer local labels in .set directives
Peter Zijlstra peterz@infradead.org x86,static_call: Fix __static_call_return0 for i386
Sebastian Andrzej Siewior bigeasy@linutronix.de sched: Teach the forced-newidle balancer about CPU affinity limitation.
Peter Zijlstra peterz@infradead.org sched/core: Fix forceidle balancing
Peter Zijlstra peterz@infradead.org objtool: Fix SLS validation for kcov tail-call replacement
Vincent Mailhol mailhol.vincent@wanadoo.fr x86/bug: Prevent shadowing in __WARN_FLAGS
Kefeng Wang wangkefeng.wang@huawei.com Revert "powerpc: Set max_mapnr correctly"
Kefeng Wang wangkefeng.wang@huawei.com powerpc: Fix virt_addr_valid() for 64-bit Book3E & 32-bit
Andrea Parri (Microsoft) parri.andrea@gmail.com Drivers: hv: vmbus: Replace smp_store_mb() with virt_store_mb()
Paolo Bonzini pbonzini@redhat.com KVM: avoid NULL pointer dereference in kvm_dirty_ring_push
Vinod Koul vkoul@kernel.org dmaengine: Revert "dmaengine: shdma: Fix runtime PM imbalance on error"
Arnaldo Carvalho de Melo acme@redhat.com tools build: Use $(shell ) instead of `` to get embedded libperl's ccopts
Arnaldo Carvalho de Melo acme@redhat.com tools build: Filter out options and warnings not supported by clang
Arnaldo Carvalho de Melo acme@redhat.com perf python: Fix probing for some clang command line options
Arnaldo Carvalho de Melo acme@redhat.com perf build: Don't use -ffat-lto-objects in the python feature test when building with clang-13
Jakub Sitnicki jakub@cloudflare.com bpf: Treat bpf_sk_lookup remote_port as a 2-byte field
Jakub Sitnicki jakub@cloudflare.com selftests/bpf: Fix u8 narrow load checks for bpf_sk_lookup remote_port
Jakub Sitnicki jakub@cloudflare.com bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide
Jakub Kicinski kuba@kernel.org Revert "selftests: net: Add tls config dependency for tls selftests"
Dust Li dust.li@linux.alibaba.com net/smc: send directly on setting TCP_NODELAY
Philip Yang Philip.Yang@amd.com drm/amdkfd: Fix variable set but not used warning
Akihiko Odaki akihiko.odaki@gmail.com Revert "ACPI: processor: idle: Only flush cache on entering C3"
Trond Myklebust trond.myklebust@hammerspace.com SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()
Alex Deucher alexander.deucher@amd.com drm/amdgpu: don't use BACO for reset in S3
Lee Jones lee.jones@linaro.org drm/amdkfd: Create file descriptor after client is added to smi_clients list
Karol Herbst kherbst@redhat.com drm/nouveau/pmu: Add missing callbacks for Tegra devices
Emily Deng Emily.Deng@amd.com drm/amdgpu/vcn: Fix the register setting for vcn1
Alex Deucher alexander.deucher@amd.com drm/amdgpu/smu10: fix SoC/fclk units in auto mode
Benjamin Marty info@benjaminmarty.ch drm/amdgpu/display: change pipe policy for DCN 2.1
CHANDAN VURDIGERE NATARAJ chandan.vurdigerenataraj@amd.com drm/amd/display: Fix by adding FPU protection for dcn30_internal_validate_bw
Daniel Mack daniel@zonque.org drm/panel: ili9341: fix optional regulator handling
Shirish S shirish.s@amd.com amd/display: set backlight only if required
Thomas Zimmermann tzimmermann@suse.de fbdev: Fix unregistering of framebuffers without device
Marc Zyngier maz@kernel.org irqchip/gic-v3: Fix GICR_CTLR.RWP polling
Namhyung Kim namhyung@kernel.org perf/core: Inherit event_caps
Xiaomeng Tong xiam0nd.tong@gmail.com perf: qcom_l2_pmu: fix an incorrect NULL check on list iterator
Christian Lamparter chunkeey@gmail.com ata: sata_dwc_460ex: Fix crash due to OOB write
Kan Liang kan.liang@linux.intel.com perf/x86/intel: Don't extend the pseudo-encoding to GP counters
Dave Hansen dave.hansen@linux.intel.com x86/mm/tlb: Revert retpoline avoidance approach
Reto Buerki reet@codelabs.ch x86/msi: Fix msi message data shadow struct
Shreeya Patel shreeya.patel@collabora.com gpio: Restrict usage of GPIO chip irq members before initialization
Xiaomeng Tong xiam0nd.tong@gmail.com drbd: fix an invalid memory access caused by incorrect use of list iterator
Douglas Miller doug.miller@cornelisnetworks.com RDMA/hfi1: Fix use-after-free bug for mm struct
Guo Ren guoren@kernel.org arm64: patch_text: Fixup last cpu should be master
Manish Chopra manishc@marvell.com qed: fix ethtool register dump
Paulo Alcantara pc@cjr.nz cifs: force new session setup and tcon for dfs
Vinod Koul vkoul@kernel.org spi: core: add dma_map_dev for __spi_unmap_msg()
Kaiwen Hu kevinhu@synology.com btrfs: prevent subvol with swapfile from being deleted
Qu Wenruo wqu@suse.com btrfs: avoid defragging extents whose next extents are not targets
Qu Wenruo wqu@suse.com btrfs: remove device item and update super block in the same transaction
Johannes Thumshirn johannes.thumshirn@wdc.com btrfs: zoned: traverse devices under chunk_mutex in btrfs_can_activate_zone
Ethan Lien ethanlien@synology.com btrfs: fix qgroup reserve overflow the qgroup limit
Kan Liang kan.liang@linux.intel.com perf/x86/intel: Update the FRONTEND MSR mask on Sapphire Rapids
Pawan Gupta pawan.kumar.gupta@linux.intel.com x86/speculation: Restore speculation related MSRs during S3 resume
Pawan Gupta pawan.kumar.gupta@linux.intel.com x86/pm: Save the MSR validity status at context setup
Jens Axboe axboe@kernel.dk io_uring: fix race between timeout flush and removal
Eugene Syromiatnikov esyr@redhat.com io_uring: implement compat handling for IORING_REGISTER_IOWQ_AFF
Jens Axboe axboe@kernel.dk io_uring: defer splice/tee file validity check until command issue
Jens Axboe axboe@kernel.dk io_uring: don't check req->file in io_fsync_prep()
Miaohe Lin linmiaohe@huawei.com mm/mempolicy: fix mpol_new leak in shared_policy_replace
Paolo Bonzini pbonzini@redhat.com mmmremap.c: avoid pointless invalidate_range_start/end on mremap(old_size=0)
Max Filippov jcmvbkbc@gmail.com highmem: fix checks in __kmap_local_sched_{in,out}
Guo Xuenan guoxuenan@huawei.com lz4: fix LZ4_decompress_safe_partial read out of bound
Michael Wu michael@allwinnertech.com mmc: core: Fixup support for writeback-cache for eMMC and SD
Wolfram Sang wsa+renesas@sang-engineering.com mmc: renesas_sdhi: don't overwrite TAP settings when HS400 tuning is complete
Wolfram Sang wsa+renesas@sang-engineering.com mmc: renesas_sdhi: special 4tap settings only apply to HS400
Yann Gautier yann.gautier@foss.st.com mmc: mmci: stm32: correctly check all elements of sg list
Christian Löhle CLoehle@hyperstone.com mmc: block: Check for errors after write on SPI
Pali Rohár pali@kernel.org Revert "mmc: sdhci-xenon: fix annoying 1.8V regulator warning"
Adrian Hunter adrian.hunter@intel.com scsi: ufs: ufs-pci: Add support for Intel MTL
Damien Le Moal damien.lemoal@opensource.wdc.com scsi: mpt3sas: Fix use after free in _scsih_expander_node_remove()
Chanho Park chanho61.park@samsung.com arm64: Add part number for Arm Cortex-A78AE
Denis Nikitin denik@chromium.org perf session: Remap buf if there is no space for event
Adrian Hunter adrian.hunter@intel.com perf tools: Fix perf's libperf_print callback
James Clark james.clark@arm.com perf: arm-spe: Fix perf report --mem-mode
James Clark james.clark@arm.com perf unwind: Don't show unwind error messages when augmenting frame pointer stack
Tony Lindgren tony@atomide.com iommu/omap: Fix regression in probe for NULL pointer dereference
Trond Myklebust trond.myklebust@hammerspace.com SUNRPC: svc_tcp_sendmsg() should handle errors from xdr_alloc_bvec()
Trond Myklebust trond.myklebust@hammerspace.com SUNRPC: Handle low memory situations in call_status()
Trond Myklebust trond.myklebust@hammerspace.com SUNRPC: Handle ENOMEM in call_transmit_status()
Pavel Begunkov asml.silence@gmail.com io_uring: don't touch scm_fp_list after queueing skb
Pavel Begunkov asml.silence@gmail.com io_uring: nospec index for tags on files update
Xiaomeng Tong xiam0nd.tong@gmail.com scsi: ufs: ufshpb: Fix a NULL check on list iterator
Martin K. Petersen martin.petersen@oracle.com scsi: sd: sd_read_cpr() requires VPD pages
Lv Yunlong lyl2019@mail.ustc.edu.cn drbd: Fix five use after free bugs in get_initial_state
Maxim Mikityanskiy maximmi@nvidia.com bpf: Support dual-stack sockets in bpf_tcp_check_syncookie
Roman Li Roman.Li@amd.com drm/amd/display: Remove redundant dsc power gating from init_hw
Meenakshikumar Somasundaram meenakshikumar.somasundaram@amd.com drm/amd/display: Fix for dmub outbox notification enable
Kamal Dasu kdasu.kdev@gmail.com spi: bcm-qspi: fix MSPI only access with bcm_qspi_exec_mem_op()
Jamie Bainbridge jamie.bainbridge@gmail.com qede: confirm skb is allocated before using
Michael Walle michael@walle.cc net: phy: mscc-miim: reject clause 45 register accesses
Taehee Yoo ap420073@gmail.com net: sfc: fix using uninitialized xdp tx_queue
Eric Dumazet edumazet@google.com rxrpc: fix a race in rxrpc_exit_net()
Ilya Maximets i.maximets@ovn.org net: openvswitch: fix leak of nested actions
Andrew Lunn andrew@lunn.ch net: ethernet: mv643xx: Fix over zealous checking of_get_mac_address()
Ilya Maximets i.maximets@ovn.org net: openvswitch: don't send internal clone attribute to the userspace.
Maciej Fijalkowski maciej.fijalkowski@intel.com ice: clear cmd_type_offset_bsz for TX rings
Maciej Fijalkowski maciej.fijalkowski@intel.com ice: xsk: fix VSI state check in ice_xsk_wakeup()
Maciej Fijalkowski maciej.fijalkowski@intel.com ice: synchronize_rcu() when terminating rings
David Ahern dsahern@kernel.org ipv6: Fix stats accounting in ip6_pkt_drop
Anatolii Gerasymenko anatolii.gerasymenko@intel.com ice: Do not skip not enabled queues in ice_vc_dis_qs_msg
Anatolii Gerasymenko anatolii.gerasymenko@intel.com ice: Set txq_teid to ICE_INVAL_TEID on ring creation
Miaoqian Lin linmq006@gmail.com dpaa2-ptp: Fix refcount leak in dpaa2_ptp_probe
Jamie Bainbridge jamie.bainbridge@gmail.com sctp: count singleton chunks in assoc user stats
Niels Dossche dossche.niels@gmail.com IB/rdmavt: add lock to call to rvt_error_qp to prevent a race condition
Paulo Alcantara pc@cjr.nz cifs: fix potential race with cifsd thread
Mark Zhang markzhang@nvidia.com IB/cm: Cancel mad on the DREQ event when the state is MRA_REP_RCVD
Aharon Landau aharonl@nvidia.com RDMA/mlx5: Add a missing update of cache->last_add
Aharon Landau aharonl@nvidia.com RDMA/mlx5: Don't remove cache MRs when a delay is needed
Martin Habets habetsm.xilinx@gmail.com sfc: Do not free an empty page_ring
Ray Jui ray.jui@broadcom.com bnxt_en: Prevent XDP redirect from running when stopping TX queue
Andy Gospodarek gospo@broadcom.com bnxt_en: reserve space inside receive page for skb_shared_info
Pavan Chebbi pavan.chebbi@broadcom.com bnxt_en: Synchronize tx when xdp redirects happen on same ring
Phil Auld pauld@redhat.com arch/arm64: Fix topology initialization for core scheduling
Axel Lin axel.lin@ingics.com regulator: atc260x: Fix missing active_discharge_on setting
Geert Uytterhoeven geert+renesas@glider.be spi: rpc-if: Fix RPM imbalance in probe error path
Axel Lin axel.lin@ingics.com regulator: rtq2134: Fix missing active_discharge_on setting
Liu Ying victor.liu@nxp.com drm/imx: dw_hdmi-imx: Fix bailout in error cases of probe
José Expósito jose.exposito89@gmail.com drm/imx: Fix memory leak in imx_pd_connector_get_modes
Jiasheng Jiang jiasheng@iscas.ac.cn drm/imx: imx-ldb: Check for null pointer after calling kmemdup
Chen-Yu Tsai wens@csie.org net: stmmac: Fix unset max_speed difference between DT and non-DT platforms
Nikolay Aleksandrov razor@blackwall.org net: ipv4: fix route with nexthop object delete warning
Matt Johnston matt@codeconstruct.com.au mctp: Use output netdev to allocate skb headroom
Matt Johnston matt@codeconstruct.com.au mctp: Fix check for dev_hard_header() result
Ivan Vecera ivecera@redhat.com ice: Fix MAC address setting
Ivan Vecera ivecera@redhat.com ice: Clear default forwarding VSI during VSI release
Vladimir Oltean vladimir.oltean@nxp.com Revert "net: dsa: stop updating master MTU from master.c"
Jean-Philippe Brucker jean-philippe@linaro.org skbuff: fix coalescing for page_pool fragment recycling
Eyal Birger eyal.birger@gmail.com vrf: fix packet sniffing for traffic originating from ip tunnels
Ziyang Xuan william.xuanziyang@huawei.com net/tls: fix slab-out-of-bounds bug in decrypt_internal
Taehee Yoo ap420073@gmail.com net: sfc: add missing xdp queue reinitialization
Jason Wang jasowang@redhat.com vdpa: mlx5: prevent cvq work from hogging CPU
Christophe JAILLET christophe.jaillet@wanadoo.fr scsi: zorro7xx: Fix a resource leak in zorro7xx_remove_one()
John Garry john.garry@huawei.com scsi: core: Fix sbitmap depth in scsi_realloc_sdev_budget_map()
Kevin Groeneveld kgroeneveld@lenbrook.com scsi: sr: Fix typo in CDROM(CLOSETRAY|EJECT) handling
Tomas Henzl thenzl@redhat.com scsi: core: scsi_logging: Fix a BUG
ChenXiaoSong chenxiaosong2@huawei.com NFSv4: fix open failure with O_ACCMODE flag
ChenXiaoSong chenxiaosong2@huawei.com Revert "NFSv4: Handle the special Linux file open access mode"
Jeremy Sowden jeremy@azazel.net netfilter: bitwise: fix reduce comparisons
Guilherme G. Piccoli gpiccoli@igalia.com Drivers: hv: vmbus: Fix potential crash on module unload
Andrea Parri (Microsoft) parri.andrea@gmail.com Drivers: hv: vmbus: Fix initialization of device object in vmbus_device_register()
Dan Carpenter dan.carpenter@oracle.com drm/amdgpu: fix off by one in amdgpu_gfx_kiq_acquire()
Mauricio Faria de Oliveira mfo@canonical.com mm: fix race between MADV_FREE reclaim and blkdev direct IO read
John David Anglin dave.anglin@bell.net parisc: Fix patch code locking and flushing
Helge Deller deller@gmx.de parisc: Fix CPU affinity for Lasi, WAX and Dino chips
Naresh Kamboju naresh.kamboju@linaro.org selftests: net: Add tls config dependency for tls selftests
Trond Myklebust trond.myklebust@hammerspace.com NFS: Avoid writeback threads getting stuck in mempool_alloc()
Trond Myklebust trond.myklebust@hammerspace.com NFS: nfsiod should not block forever in mempool_alloc()
Trond Myklebust trond.myklebust@hammerspace.com SUNRPC: Fix socket waits for write buffer space
Haimin Zhang tcs_kernel@tencent.com jfs: prevent NULL deref in diFree
Randy Dunlap rdunlap@infradead.org virtio_console: eliminate anonymous module_init & module_exit
Jiri Slaby jirislaby@kernel.org serial: samsung_tty: do not unlock port->lock for uart_write_wakeup()
Nathan Chancellor nathan@kernel.org x86/Kconfig: Do not allow CONFIG_X86_X32_ABI=y with llvm-objcopy
Peter Zijlstra peterz@infradead.org x86: Annotate call_on_stack()
NeilBrown neilb@suse.de NFS: swap-out must always use STABLE writes.
NeilBrown neilb@suse.de NFS: swap IO handling is slightly different for O_DIRECT IO
NeilBrown neilb@suse.de SUNRPC: remove scheduling boost for "SWAPPER" tasks.
NeilBrown neilb@suse.de SUNRPC/xprt: async tasks mustn't block waiting for memory
Maxime Ripard maxime@cerno.tech clk: Enforce that disjoints limits are invalid
Tony Lindgren tony@atomide.com clk: ti: Preserve node in ti_dt_clocks_register()
Dongli Zhang dongli.zhang@oracle.com xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32
Oded Gabbay ogabbay@kernel.org habanalabs/gaudi: handle axi errors from NIC engines
Oded Gabbay ogabbay@kernel.org habanalabs: reject host map with mmu disabled
Ohad Sharabi osharabi@habana.ai habanalabs: fix possible memory leak in MMU DR fini
Trond Myklebust trond.myklebust@hammerspace.com NFSv4: Protect the state recovery thread against direct reclaim
Xin Xiong xiongx18@fudan.edu.cn NFSv4.2: fix reference count leaks in _nfs42_proc_copy_notify()
Lucas Denefle lucas.denefle@converge.io w1: w1_therm: fixes w1_seq for ds28ea00 sensors
Xiaoke Wang xkernel.wang@foxmail.com staging: wfx: fix an error handling in wfx_init_common()
Jérôme Pouiller jerome.pouiller@silabs.com staging: wfx: apply the necessary SDIO quirks for the Silabs WF200
Viresh Kumar viresh.kumar@linaro.org opp: Expose of-node's name in debugfs
Pierre Gondois Pierre.Gondois@arm.com cpufreq: CPPC: Fix performance/frequency conversion
Sascha Hauer s.hauer@pengutronix.de clk: rockchip: drop CLK_SET_RATE_PARENT from dclk_vop* on rk3568
Amjad Ouled-Ameur aouledameur@baylibre.com phy: amlogic: meson8b-usb2: fix shared reset control use
Amjad Ouled-Ameur aouledameur@baylibre.com phy: amlogic: meson8b-usb2: Use dev_err_probe()
Amjad Ouled-Ameur aouledameur@baylibre.com phy: amlogic: phy-meson-gxl-usb2: fix shared reset controller use
Stefan Wahren stefan.wahren@i2se.com staging: vchiq_core: handle NULL result of find_service_by_handle
Stefan Wahren stefan.wahren@i2se.com staging: vchiq_arm: Avoid NULL ptr deref in vchiq_dump_platform_instances
José Expósito jose.exposito89@gmail.com clk: mediatek: Fix memory leaks on probe
Adam Wujek dev_public@wujek.eu clk: si5341: fix reported clk_rate when output divider is 2
Qinghua Jin qhjin.dev@gmail.com minix: fix bug when opening a file with O_DIRECT
Randy Dunlap rdunlap@infradead.org init/main.c: return 1 from handled __setup() functions
Feng Tang feng.tang@intel.com lib/Kconfig.debug: add ARCH dependency for FUNCTION_ALIGN option
Xiubo Li xiubli@redhat.com ceph: fix memory leak in ceph_readdir when note_last_dentry returns error
Xiubo Li xiubli@redhat.com ceph: fix inode reference leakage in ceph_get_snapdir()
Wang Yufen wangyufen@huawei.com netlabel: fix out-of-bounds memory accesses
Florian Westphal fw@strlen.de netfilter: conntrack: revisit gc autotuning
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: Fix use after free in hci_send_acl
Krzysztof Kozlowski krzk@kernel.org MIPS: ingenic: correct unit node address
Arnd Bergmann arnd@arndb.de iwlwifi: mei: fix building iwlmei
Max Filippov jcmvbkbc@gmail.com xtensa: fix DTC warning unit_address_format
Deren Wu deren.wu@mediatek.com mt76: fix monitor mode crash with sdio driver
Juergen Gross jgross@suse.com xen/usb: harden xen_hcd against malicious backends
H. Nikolaus Schaller hns@goldelico.com usb: dwc3: omap: fix "unbalanced disables for smps10_out1" on omap5evm
Michael Walle michael@walle.cc net: sfp: add 2500base-X quirk for Lantech SFP module
Jorge Lopez jorge.lopez2@hp.com platform/x86: hp-wmi: Fix 0x05 error code reported by several WMI calls
Jorge Lopez jorge.lopez2@hp.com platform/x86: hp-wmi: Fix SW_TABLET_MODE detection method
Gal Pressman gal@nvidia.com net/mlx5e: Remove overzealous validations in netlink EEPROM query
Jakub Kicinski kuba@kernel.org net: limit altnames to 64k total
Jakub Kicinski kuba@kernel.org net: account alternate interface name memory
Michael T. Kloos michael@michaelkloos.com riscv: Fixed misaligned memory access. Fixed pointer comparison.
Vincent Mailhol mailhol.vincent@wanadoo.fr can: etas_es58x: es58x_fd_rx_event_msg(): initialize rx_event_msg before calling es58x_check_msg_len()
Oliver Hartkopp socketcan@hartkopp.net can: isotp: set default value for N_As to 50 micro seconds
Hans de Goede hdegoede@redhat.com platform/x86: x86-android-tablets: Depend on EFI and SPI
Jianglei Nie niejianglei2021@163.com scsi: libfc: Fix use after free in fc_exch_abts_resp()
Hangyu Hua hbh25y@gmail.com powerpc/secvar: fix refcount leak in format_show()
Michael Ellerman mpe@ellerman.id.au powerpc/64e: Tie PPC_BOOK3E_64 to PPC_FSL_BOOK3E
Michael Ellerman mpe@ellerman.id.au powerpc/code-patching: Pre-map patch area
Alexander Lobakin alobakin@pm.me MIPS: fix fortify panic when copying asm exception handlers
Li Chen lchen@ambarella.com PCI: endpoint: Fix misused goto label
Michael Chan michael.chan@broadcom.com bnxt_en: Eliminate unintended link toggle during FW reset
Minghao Chi (CGEL ZTE) chi.minghao@zte.com.cn Bluetooth: use memset avoid memory leaks
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: Fix not checking for valid hdev on bt_dev_{info,warn,err,dbg}
Sean Wang sean.wang@mediatek.com Bluetooth: mediatek: fix the conflict between mtk and msft vendor event
Harold Huang baymaxhuang@gmail.com tuntap: add sanity checks about msg_controllen in sendmsg
Mark Pearson markpearson@lenovo.com platform/x86: thinkpad_acpi: Add dual fan probe
Sven Eckelmann sven@narfation.org macvtap: advertise link netns via netlink
Mateusz Palczewski mateusz.palczewski@intel.com iavf: stop leaking iavf_status as "errno" values
Hangyu Hua hbh25y@gmail.com mips: ralink: fix a refcount leak in ill_acc_of_setup()
Dust Li dust.li@linux.alibaba.com net/smc: correct settings of RMB window update limit
Xiang Chen chenxiang66@hisilicon.com scsi: hisi_sas: Limit users changing debugfs BIST count value
Qi Liu liuqi115@huawei.com scsi: hisi_sas: Free irq vectors in order for v3 HW
Randy Dunlap rdunlap@infradead.org scsi: aha152x: Fix aha152x_setup() __setup handler return value
Hans de Goede hdegoede@redhat.com power: supply: axp288_fuel_gauge: Use acpi_quirk_skip_acpi_ac_and_battery()
Hans de Goede hdegoede@redhat.com power: supply: axp288_charger: Use acpi_quirk_skip_acpi_ac_and_battery()
Yang Li yang.lee@linux.alibaba.com mt76: mt7615: Fix assigning negative values to unsigned variable
Nicholas Piggin npiggin@gmail.com powerpc/64s/hash: Make hash faults work in NMI context
Matt Johnston matt@codeconstruct.com.au mctp: make __mctp_dev_get() take a refcount hold
Johan Almbladh johan.almbladh@anyfinetworks.com mt76: mt7915: fix injected MPDU transmission to not use HW A-MSDU
Damien Le Moal damien.lemoal@opensource.wdc.com scsi: pm8001: Fix memory leak in pm8001_chip_fw_flash_update_req()
Damien Le Moal damien.lemoal@opensource.wdc.com scsi: pm8001: Fix tag leaks on error
Damien Le Moal damien.lemoal@opensource.wdc.com scsi: pm8001: Fix task leak in pm8001_send_abort_all()
Damien Le Moal damien.lemoal@opensource.wdc.com scsi: pm8001: Fix tag values handling
Damien Le Moal damien.lemoal@opensource.wdc.com scsi: pm8001: Fix pm8001_mpi_task_abort_resp()
Damien Le Moal damien.lemoal@opensource.wdc.com scsi: pm8001: Fix pm80xx_pci_mem_copy() interface
Alex Williamson alex.williamson@redhat.com vfio/pci: Stub vfio_pci_vga_rw when !CONFIG_VFIO_PCI_VGA
Alex Deucher alexander.deucher@amd.com drm/amdkfd: make CRAT table missing message informational only
Mike Snitzer snitzer@redhat.com dm: requeue IO if mapping table not yet available
Jordy Zomer jordy@jordyzomer.github.io dm ioctl: prevent potential spectre v1 gadget
Ping-Ke Shih pkshih@realtek.com rtw88: change rtw_info() to proper message level
Ido Schimmel idosch@nvidia.com ipv4: Invalidate neighbour for broadcast address upon address addition
Baochen Qiang quic_bqiang@quicinc.com ath11k: Fix frames flush failure caused by deadlock
Jiri Kosina jkosina@suse.cz rtw89: fix RCU usage in rtw89_core_txq_push()
Jue Wang juew@google.com x86/mce: Work around an erratum on fast string copy instructions
Daniel Thompson daniel.thompson@linaro.org drm/msm/dsi: Remove spurious IRQF_ONESHOT flag
Eric Dumazet edumazet@google.com ipv6: annotate some data-races around sk->sk_prot
Miri Korenblit miriam.rachel.korenblit@intel.com iwlwifi: mvm: move only to an enabled channel
Luca Coelho luciano.coelho@intel.com iwlwifi: fix small doc mistake for iwl_fw_ini_addr_val
Ilan Peer ilan.peer@intel.com iwlwifi: mvm: Correctly set fragmented EBS
Hans de Goede hdegoede@redhat.com usb: dwc3: pci: Set the swnode from inside dwc3_pci_quirks()
José Expósito jose.exposito89@gmail.com HID: apple: Report Magic Keyboard 2021 with fingerprint reader battery over USB
José Expósito jose.exposito89@gmail.com HID: apple: Report Magic Keyboard 2021 battery over USB
Maxim Mikityanskiy maximmi@nvidia.com net/mlx5e: Disable TX queues before registering the netdev
Sung Joon Kim sungkim@amd.com drm/amd/display: reset lane settings after each PHY repeater LT
Kevin Tang kevin3.tang@gmail.com drm/sprd: check the platform_get_resource() return value
Kevin Tang kevin3.tang@gmail.com drm/sprd: fix potential NULL dereference
Hans de Goede hdegoede@redhat.com power: supply: axp288-charger: Set Vhold to 4.4V
Christophe Leroy christophe.leroy@csgroup.eu powerpc/set_memory: Avoid spinlock recursion in change_page_attr()
Sreekanth Reddy sreekanth.reddy@broadcom.com scsi: mpi3mr: Fix memory leaks
Sreekanth Reddy sreekanth.reddy@broadcom.com scsi: mpi3mr: Fix reporting of actual data transfer size
Sreekanth Reddy sreekanth.reddy@broadcom.com scsi: mpi3mr: Fix deadlock while canceling the fw event
Manivannan Sadhasivam mani@kernel.org PCI: pciehp: Add Qualcomm quirk for Command Completed erratum
Sebastian Andrzej Siewior bigeasy@linutronix.de tcp: Don't acquire inet_listen_hashbucket::lock with disabled BH.
Hou Zhiqiang Zhiqiang.Hou@nxp.com PCI: endpoint: Fix alignment fault error in copy tests
Ilya Leoshkevich iii@linux.ibm.com libbpf: Fix accessing the first syscall argument on s390
Ilya Leoshkevich iii@linux.ibm.com libbpf: Fix accessing the first syscall argument on arm64
Ilya Leoshkevich iii@linux.ibm.com libbpf: Fix accessing syscall arguments on powerpc
Marc Zyngier maz@kernel.org KVM: arm64: Do not change the PMU event filter after a VCPU has run
Neal Liu neal_liu@aspeedtech.com usb: ehci: add pci device support for Aspeed platforms
Zhou Guanghui zhouguanghui1@huawei.com iommu/arm-smmu-v3: fix event handling soft lockup
Ricardo Koller ricarkol@google.com kvm: selftests: aarch64: use a tighter assert in vgic_poke_irq()
Ricardo Koller ricarkol@google.com kvm: selftests: aarch64: fix some vgic related comments
Ricardo Koller ricarkol@google.com kvm: selftests: aarch64: fix the failure check in kvm_set_gsi_routing_irqchip_check
Ricardo Koller ricarkol@google.com kvm: selftests: aarch64: pass vgic_irq guest args as a pointer
Ricardo Koller ricarkol@google.com kvm: selftests: aarch64: fix assert in gicv3_access_reg
Pali Rohár pali@kernel.org PCI: aardvark: Fix support for MSI interrupts
Mahesh Rajashekhara mahesh.rajashekhara@microchip.com scsi: smartpqi: Fix kdump issue when controller is locked up
Don Brace don.brace@microchip.com scsi: smartpqi: Fix rmmod stack trace
Rajneesh Bhardwaj rajneesh.bhardwaj@amd.com drm/amdgpu: Fix recursive locking warning
Sourabh Jain sourabhjain@linux.ibm.com powerpc: Set crashkernel offset to mid of RMA region
Eric Dumazet edumazet@google.com net: initialize init_net earlier
Eric Dumazet edumazet@google.com ref_tracker: implement use-after-free detection
Eric Dumazet edumazet@google.com ipv6: make mc_forwarding atomic
Yonghong Song yhs@fb.com libbpf: Fix build issue with llvm-readelf
Avraham Stern avraham.stern@intel.com cfg80211: don't add non transmitted BSS to 6GHz scanned channels
Jedrzej Jagielski jedrzej.jagielski@intel.com i40e: Add sending commands in atomic context
Lorenzo Bianconi lorenzo@kernel.org mt76: dma: initialize skip_unmap in mt76_dma_rx_fill
Ben Greear greearb@candelatech.com mt76: mt7921: fix crash when startup fails.
Evgeny Boger boger@wirenboard.com power: supply: axp20x_battery: properly report current when discharging
Yongzhi Liu lyz_cs@pku.edu.cn drm/v3d: fix missing unlock
Yang Guang yang.guang5@zte.com.cn scsi: bfa: Replace snprintf() with sysfs_emit()
Yang Guang yang.guang5@zte.com.cn scsi: mvsas: Replace snprintf() with sysfs_emit()
Jakub Sitnicki jakub@cloudflare.com bpf: Make dst_port field in struct bpf_sock 16-bit wide
Yongzhi Liu lyz_cs@pku.edu.cn drm/bridge: Add missing pm_runtime_put_sync
Tony Lu tonylu@linux.alibaba.com net/smc: Send directly when TCP_CORK is cleared
Kalle Valo quic_kvalo@quicinc.com ath11k: mhi: use mhi_sync_power_up()
Kalle Valo quic_kvalo@quicinc.com ath11k: pci: fix crash on suspend if board file is not found
Venkateswara Naralasetty quic_vnaralas@quicinc.com ath11k: fix kernel panic during unload/load ath11k modules
Maxim Kiselev bigunclemax@gmail.com powerpc: dts: t104xrdb: fix phy type for FMAN 4/5
Sachin Sant sachinp@linux.ibm.com powerpc/xive: Export XIVE IPI information for online-only processors.
Jack Wang jinpu.wang@ionos.com RDMA/rtrs-clt: Do stop and failover outside reconnect work.
Amit Cohen amcohen@nvidia.com mlxsw: spectrum: Guard against invalid local ports
Tianci.Yin tianci.yin@amd.com drm/amdgpu: Fix an error message in rmmod
Philip Yang Philip.Yang@amd.com drm/amdkfd: svm range restore work deadlock when process exit
Philip Yang Philip.Yang@amd.com drm/amdkfd: Ensure mm remain valid in svm deferred_list work
Philip Yang Philip.Yang@amd.com drm/amdkfd: Don't take process mutex for svm ioctls
Roi Dayan roid@nvidia.com net/mlx5e: TC, Hold sample_attr on stack instead of pointer
Magnus Karlsson magnus.karlsson@intel.com selftests, xsk: Fix bpf_res cleanup test
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: hci_sync: Fix queuing commands when HCI_UNREGISTER is set
Yang Guang yang.guang5@zte.com.cn ptp: replace snprintf with sysfs_emit
Pawel Laszczak pawell@cadence.com usb: cdnsp: fix cdnsp_decode_trb function to properly handle ret value
Wayne Chang waynec@nvidia.com usb: gadget: tegra-xudc: Fix control endpoint's definitions
Wayne Chang waynec@nvidia.com usb: gadget: tegra-xudc: Do not program SPARAM
Nicholas Kazlauskas nicholas.kazlauskas@amd.com drm/amd/display: Use PSR version selected during set_psr_caps
Yongzhi Liu lyz_cs@pku.edu.cn drm/amd/display: Fix memory leak
Xin Xiong xiongx18@fudan.edu.cn drm/amd/amdgpu/amdgpu_cs: fix refcount leak of a dma_fence obj
Soenke Huster soenke.huster@eknoes.de Bluetooth: hci_event: Ignore multiple conn complete events
Jani Nikula jani.nikula@intel.com drm/edid: improve non-desktop quirk logging
Philipp Zabel philipp.zabel@gmail.com drm/edid: remove non_desktop quirk for HPN-3515 and LEN-B800.
Eric Huang jinhuieric.huang@amd.com drm/amdkfd: enable heavy-weight TLB flush on Arcturus
Dale Zhao dale.zhao@amd.com drm/amd/display: Add signal type check when verify stream backends same
Soenke Huster soenke.huster@eknoes.de Bluetooth: fix null ptr deref on hci_sync_conn_complete_evt
Zekun Shen bruceshenzk@gmail.com ath5k: fix OOB in ath5k_eeprom_read_pcal_info_5111
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: hci_sync: Fix compilation warning
Anisse Astier anisse@astier.eu drm: Add orientation quirk for GPD Win Max
Hou Wenlong houwenlong.hwl@antgroup.com KVM: x86/emulator: Emulate RDPID only if it is enabled in guest
Like Xu likexu@tencent.com KVM: x86/pmu: Fix and isolate TSX-specific performance event logic
Jim Mattson jmattson@google.com KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs
Peter Gonda pgonda@google.com KVM: SVM: Fix kvm_cache_regs.h inclusions for is_guest_mode()
Jim Mattson jmattson@google.com KVM: x86/pmu: Use different raw event masks for AMD and Intel
Muchun Song songmuchun@bytedance.com mm: kfence: fix objcgs vector allocation
Zheng Yongjun zhengyongjun3@huawei.com net: dsa: felix: fix possible NULL pointer dereference
Jiasheng Jiang jiasheng@iscas.ac.cn rtc: wm8350: Handle error for wm8350_register_irq
Benjamin Beichler benjamin.beichler@uni-rostock.de um: fix and optimize xor select template for CONFIG64 and timetravel mode
Johannes Berg johannes.berg@intel.com lib/logic_iomem: correct fallback config references
-------------
Diffstat:
Documentation/virt/kvm/devices/vcpu.rst | 2 +- Makefile | 4 +- arch/arm64/include/asm/cputype.h | 2 + arch/arm64/include/asm/kvm_host.h | 1 + arch/arm64/kernel/patching.c | 4 +- arch/arm64/kernel/proton-pack.c | 1 + arch/arm64/kernel/smp.c | 2 +- arch/arm64/kvm/arm.c | 4 + arch/arm64/kvm/pmu-emul.c | 33 +- arch/mips/boot/dts/ingenic/jz4780.dtsi | 2 +- arch/mips/include/asm/setup.h | 2 +- arch/mips/kernel/traps.c | 22 +- arch/mips/ralink/ill_acc.c | 1 + arch/parisc/kernel/patch.c | 25 +- arch/powerpc/boot/dts/fsl/t104xrdb.dtsi | 4 +- arch/powerpc/include/asm/interrupt.h | 2 +- arch/powerpc/include/asm/page.h | 6 +- arch/powerpc/kernel/rtas.c | 6 + arch/powerpc/kernel/secvar-sysfs.c | 9 +- arch/powerpc/kexec/core.c | 15 +- arch/powerpc/kvm/book3s_64_entry.S | 10 +- arch/powerpc/lib/code-patching.c | 14 + arch/powerpc/mm/book3s64/hash_utils.c | 54 +- arch/powerpc/mm/mem.c | 2 +- arch/powerpc/mm/pageattr.c | 32 +- arch/powerpc/perf/callchain.h | 9 +- arch/powerpc/perf/callchain_64.c | 27 - arch/powerpc/platforms/Kconfig.cputype | 3 +- arch/powerpc/sysdev/xive/common.c | 2 +- arch/riscv/lib/memmove.S | 368 +++++++++++--- arch/um/include/asm/xor.h | 4 +- arch/x86/Kconfig | 5 + arch/x86/events/intel/core.c | 8 +- arch/x86/include/asm/asm.h | 20 +- arch/x86/include/asm/bug.h | 4 +- arch/x86/include/asm/irq_stack.h | 3 +- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/include/asm/msi.h | 19 +- arch/x86/include/asm/perf_event.h | 5 + arch/x86/kernel/cpu/mce/core.c | 64 +++ arch/x86/kernel/cpu/mce/internal.h | 5 +- arch/x86/kernel/static_call.c | 5 +- arch/x86/kvm/emulate.c | 4 +- arch/x86/kvm/kvm_emulate.h | 1 + arch/x86/kvm/pmu.c | 18 +- arch/x86/kvm/svm/pmu.c | 9 +- arch/x86/kvm/svm/svm.h | 2 + arch/x86/kvm/svm/svm_onhyperv.c | 1 - arch/x86/kvm/vmx/pmu_intel.c | 14 +- arch/x86/kvm/x86.c | 6 + arch/x86/mm/tlb.c | 37 +- arch/x86/power/cpu.c | 21 +- arch/x86/xen/smp_hvm.c | 6 + arch/x86/xen/time.c | 24 +- arch/xtensa/boot/dts/xtfpga-flash-128m.dtsi | 8 +- arch/xtensa/boot/dts/xtfpga-flash-16m.dtsi | 8 +- arch/xtensa/boot/dts/xtfpga-flash-4m.dtsi | 4 +- drivers/acpi/processor_idle.c | 3 +- drivers/ata/sata_dwc_460ex.c | 6 +- drivers/block/drbd/drbd_int.h | 8 +- drivers/block/drbd/drbd_main.c | 6 +- drivers/block/drbd/drbd_nl.c | 41 +- drivers/block/drbd/drbd_state.c | 18 +- drivers/block/drbd/drbd_state_change.h | 8 +- drivers/bluetooth/btmtk.h | 1 + drivers/bluetooth/btmtksdio.c | 9 +- drivers/bluetooth/btusb.c | 8 - drivers/char/virtio_console.c | 8 +- drivers/clk/clk-si5341.c | 16 +- drivers/clk/clk.c | 24 + drivers/clk/mediatek/clk-mt8192.c | 36 +- drivers/clk/rockchip/clk-rk3568.c | 6 +- drivers/clk/ti/clk.c | 13 +- drivers/cpufreq/cppc_cpufreq.c | 43 +- drivers/dma/sh/shdma-base.c | 4 +- drivers/gpio/gpiolib.c | 19 + drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 6 - drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 +- drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 4 +- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 14 +- drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 1 - drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 24 +- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 80 +-- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 7 +- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 6 + .../drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c | 80 ++- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c | 6 +- drivers/gpu/drm/amd/display/dc/core/dc.c | 66 ++- drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c | 15 +- drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 3 + drivers/gpu/drm/amd/display/dc/dc.h | 3 + drivers/gpu/drm/amd/display/dc/dce/dmub_outbox.c | 25 +- drivers/gpu/drm/amd/display/dc/dce/dmub_outbox.h | 4 +- .../drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c | 11 - .../gpu/drm/amd/display/dc/dcn21/dcn21_resource.c | 2 +- drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c | 2 +- .../gpu/drm/amd/display/dc/dcn31/dcn31_resource.c | 2 + drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 11 + .../gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c | 8 +- drivers/gpu/drm/bridge/nwl-dsi.c | 18 +- drivers/gpu/drm/drm_edid.c | 19 +- drivers/gpu/drm/drm_panel_orientation_quirks.c | 6 + drivers/gpu/drm/imx/dw_hdmi-imx.c | 8 +- drivers/gpu/drm/imx/imx-ldb.c | 2 + drivers/gpu/drm/imx/parallel-display.c | 4 +- drivers/gpu/drm/msm/dsi/dsi_host.c | 2 +- drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gm20b.c | 1 + drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp102.c | 2 +- drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp10b.c | 1 + drivers/gpu/drm/nouveau/nvkm/subdev/pmu/priv.h | 1 + drivers/gpu/drm/panel/panel-ilitek-ili9341.c | 4 +- drivers/gpu/drm/sprd/sprd_dpu.c | 5 + drivers/gpu/drm/sprd/sprd_drm.c | 2 +- drivers/gpu/drm/sprd/sprd_dsi.c | 5 + drivers/gpu/drm/v3d/v3d_gem.c | 6 +- drivers/hid/hid-apple.c | 6 +- drivers/hv/channel_mgmt.c | 6 +- drivers/hv/vmbus_drv.c | 16 +- drivers/infiniband/core/cm.c | 3 +- drivers/infiniband/hw/hfi1/mmu_rb.c | 6 + drivers/infiniband/hw/mlx5/mr.c | 5 +- drivers/infiniband/sw/rdmavt/qp.c | 6 +- drivers/infiniband/ulp/rtrs/rtrs-clt.c | 40 +- drivers/infiniband/ulp/rtrs/rtrs-clt.h | 1 + drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 1 + drivers/iommu/omap-iommu.c | 2 +- drivers/irqchip/irq-gic-v3-its.c | 28 +- drivers/irqchip/irq-gic-v3.c | 14 +- drivers/irqchip/irq-gic.c | 6 + drivers/md/dm-ioctl.c | 2 + drivers/md/dm-rq.c | 7 +- drivers/md/dm.c | 11 +- drivers/misc/habanalabs/common/memory.c | 30 +- drivers/misc/habanalabs/common/mmu/mmu_v1.c | 2 +- drivers/misc/habanalabs/gaudi/gaudi.c | 48 ++ drivers/mmc/core/block.c | 46 +- drivers/mmc/core/quirks.h | 5 + drivers/mmc/host/mmci_stm32_sdmmc.c | 6 +- drivers/mmc/host/renesas_sdhi_core.c | 8 +- drivers/mmc/host/sdhci-xenon.c | 10 - drivers/net/can/usb/etas_es58x/es58x_fd.c | 3 +- drivers/net/dsa/ocelot/felix_vsc9959.c | 4 + drivers/net/ethernet/broadcom/bnxt/bnxt.c | 7 + drivers/net/ethernet/broadcom/bnxt/bnxt.h | 5 +- drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 4 +- drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 14 +- drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h | 2 + drivers/net/ethernet/freescale/dpaa2/dpaa2-ptp.c | 4 +- drivers/net/ethernet/intel/i40e/i40e_common.c | 21 +- drivers/net/ethernet/intel/iavf/iavf.h | 5 +- drivers/net/ethernet/intel/iavf/iavf_main.c | 173 +++++-- drivers/net/ethernet/intel/iavf/iavf_virtchnl.c | 18 +- drivers/net/ethernet/intel/ice/ice.h | 2 +- drivers/net/ethernet/intel/ice/ice_lib.c | 3 + drivers/net/ethernet/intel/ice/ice_main.c | 13 +- drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c | 4 +- drivers/net/ethernet/intel/ice/ice_xsk.c | 6 +- drivers/net/ethernet/marvell/mv643xx_eth.c | 2 +- .../ethernet/mellanox/mlx5/core/en/tc/act/sample.c | 7 +- .../net/ethernet/mellanox/mlx5/core/en/tc/sample.c | 10 +- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 1 + drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 1 - drivers/net/ethernet/mellanox/mlx5/core/en_tc.h | 2 +- .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 6 +- drivers/net/ethernet/mellanox/mlx5/core/port.c | 23 - drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 4 +- drivers/net/ethernet/mellanox/mlxsw/spectrum.h | 7 + drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c | 3 +- .../ethernet/mellanox/mlxsw/spectrum_switchdev.c | 3 +- drivers/net/ethernet/qlogic/qed/qed_debug.c | 2 +- drivers/net/ethernet/qlogic/qede/qede_fp.c | 3 + drivers/net/ethernet/sfc/efx_channels.c | 148 +++--- drivers/net/ethernet/sfc/rx_common.c | 3 + drivers/net/ethernet/sfc/tx.c | 3 + drivers/net/ethernet/sfc/tx_common.c | 2 + .../net/ethernet/stmicro/stmmac/stmmac_platform.c | 3 +- drivers/net/macvtap.c | 6 + drivers/net/mdio/mdio-mscc-miim.c | 6 + drivers/net/phy/sfp-bus.c | 6 + drivers/net/tap.c | 3 +- drivers/net/tun.c | 3 +- drivers/net/vrf.c | 15 +- drivers/net/wireless/ath/ath11k/ahb.c | 2 + drivers/net/wireless/ath/ath11k/mac.c | 2 +- drivers/net/wireless/ath/ath11k/mhi.c | 2 +- drivers/net/wireless/ath/ath11k/pci.c | 10 + drivers/net/wireless/ath/ath5k/eeprom.c | 3 + drivers/net/wireless/intel/iwlwifi/Kconfig | 1 + .../net/wireless/intel/iwlwifi/fw/api/dbg-tlv.h | 5 +- drivers/net/wireless/intel/iwlwifi/mvm/phy-ctxt.c | 31 +- drivers/net/wireless/intel/iwlwifi/mvm/scan.c | 5 +- drivers/net/wireless/mediatek/mt76/dma.c | 1 + drivers/net/wireless/mediatek/mt76/mt76.h | 2 +- drivers/net/wireless/mediatek/mt76/mt7615/mac.c | 2 +- drivers/net/wireless/mediatek/mt76/mt7915/mac.c | 1 + drivers/net/wireless/mediatek/mt76/mt7921/main.c | 1 + drivers/net/wireless/realtek/rtw88/debug.c | 2 +- drivers/net/wireless/realtek/rtw88/debug.h | 1 + drivers/net/wireless/realtek/rtw88/fw.c | 2 +- drivers/net/wireless/realtek/rtw88/mac80211.c | 8 +- drivers/net/wireless/realtek/rtw88/main.c | 8 +- drivers/net/wireless/realtek/rtw88/rtw8821c.c | 2 +- drivers/net/wireless/realtek/rtw88/rtw8822b.c | 4 +- drivers/net/wireless/realtek/rtw88/rtw8822c.c | 4 +- drivers/net/wireless/realtek/rtw88/sar.c | 8 +- drivers/net/wireless/realtek/rtw89/core.c | 5 +- drivers/opp/debugfs.c | 5 + drivers/opp/opp.h | 1 + drivers/parisc/dino.c | 41 +- drivers/parisc/gsc.c | 31 ++ drivers/parisc/gsc.h | 1 + drivers/parisc/lasi.c | 7 +- drivers/parisc/wax.c | 7 +- drivers/pci/controller/pci-aardvark.c | 16 +- drivers/pci/endpoint/functions/pci-epf-test.c | 14 +- drivers/pci/hotplug/pciehp_hpc.c | 2 + drivers/perf/qcom_l2_pmu.c | 6 +- drivers/phy/amlogic/phy-meson-gxl-usb2.c | 5 +- drivers/phy/amlogic/phy-meson8b-usb2.c | 9 +- drivers/platform/x86/Kconfig | 2 +- drivers/platform/x86/hp-wmi.c | 93 ++-- drivers/platform/x86/thinkpad_acpi.c | 15 +- drivers/power/supply/Kconfig | 4 +- drivers/power/supply/axp20x_battery.c | 13 +- drivers/power/supply/axp288_charger.c | 21 +- drivers/power/supply/axp288_fuel_gauge.c | 14 +- drivers/ptp/ptp_sysfs.c | 4 +- drivers/regulator/atc260x-regulator.c | 1 + drivers/regulator/rtq2134-regulator.c | 1 + drivers/rtc/rtc-wm8350.c | 11 +- drivers/scsi/aha152x.c | 6 +- drivers/scsi/bfa/bfad_attr.c | 26 +- drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 68 ++- drivers/scsi/libfc/fc_exch.c | 1 + drivers/scsi/mpi3mr/mpi3mr.h | 4 + drivers/scsi/mpi3mr/mpi3mr_fw.c | 4 +- drivers/scsi/mpi3mr/mpi3mr_os.c | 108 +++- drivers/scsi/mpt3sas/mpt3sas_scsih.c | 5 +- drivers/scsi/mvsas/mv_init.c | 4 +- drivers/scsi/pm8001/pm8001_hwi.c | 79 ++- drivers/scsi/pm8001/pm8001_init.c | 3 +- drivers/scsi/pm8001/pm8001_sas.c | 15 +- drivers/scsi/pm8001/pm8001_sas.h | 2 + drivers/scsi/pm8001/pm80xx_hwi.c | 22 +- drivers/scsi/scsi_logging.c | 2 +- drivers/scsi/scsi_scan.c | 5 + drivers/scsi/sd.c | 2 +- drivers/scsi/smartpqi/smartpqi_init.c | 45 +- drivers/scsi/sr.c | 2 +- drivers/scsi/ufs/ufshcd-pci.c | 17 + drivers/scsi/ufs/ufshpb.c | 11 +- drivers/scsi/zorro7xx.c | 2 + drivers/spi/spi-bcm-qspi.c | 4 +- drivers/spi/spi-rpc-if.c | 8 +- drivers/spi/spi.c | 4 + .../vc04_services/interface/vchiq_arm/vchiq_arm.c | 3 + .../vc04_services/interface/vchiq_arm/vchiq_core.c | 6 + drivers/staging/wfx/bus_sdio.c | 3 - drivers/staging/wfx/main.c | 7 +- drivers/tty/serial/samsung_tty.c | 5 +- drivers/usb/cdns3/cdnsp-debug.h | 305 ++++++------ drivers/usb/dwc3/dwc3-omap.c | 2 +- drivers/usb/dwc3/dwc3-pci.c | 11 +- drivers/usb/gadget/udc/tegra-xudc.c | 20 +- drivers/usb/host/ehci-pci.c | 9 + drivers/usb/host/xen-hcd.c | 57 ++- drivers/vdpa/mlx5/net/mlx5_vnet.c | 21 +- drivers/vfio/pci/vfio_pci_rdwr.c | 2 + drivers/vhost/net.c | 1 + drivers/video/fbdev/core/fbmem.c | 9 +- drivers/w1/slaves/w1_therm.c | 8 +- fs/btrfs/extent_io.h | 2 +- fs/btrfs/inode.c | 22 + fs/btrfs/ioctl.c | 20 +- fs/btrfs/volumes.c | 65 ++- fs/btrfs/zoned.c | 9 +- fs/ceph/dir.c | 11 +- fs/ceph/inode.c | 10 +- fs/cifs/connect.c | 15 +- fs/cifs/netmisc.c | 2 +- fs/file_table.c | 1 + fs/io-wq.h | 1 + fs/io_uring.c | 388 +++++++-------- fs/jfs/inode.c | 3 +- fs/minix/inode.c | 3 +- fs/nfs/dir.c | 10 - fs/nfs/direct.c | 48 +- fs/nfs/file.c | 4 +- fs/nfs/inode.c | 1 - fs/nfs/internal.h | 17 + fs/nfs/nfs42proc.c | 9 +- fs/nfs/nfs4file.c | 6 +- fs/nfs/nfs4state.c | 12 + fs/nfs/pagelist.c | 10 +- fs/nfs/pnfs_nfs.c | 8 +- fs/nfs/write.c | 34 +- include/linux/gpio/driver.h | 9 + include/linux/ipv6.h | 2 +- include/linux/mmzone.h | 11 +- include/linux/nfs_fs.h | 10 +- include/linux/ref_tracker.h | 2 + include/linux/static_call.h | 5 +- include/linux/vfio_pci_core.h | 9 + include/net/arp.h | 1 + include/net/bluetooth/bluetooth.h | 14 +- include/net/bluetooth/hci_core.h | 3 + include/net/mctp.h | 2 - include/net/net_namespace.h | 6 + include/trace/events/sunrpc.h | 1 - include/uapi/linux/bpf.h | 6 +- include/uapi/linux/can/isotp.h | 28 +- init/main.c | 8 +- kernel/Makefile | 3 +- kernel/events/core.c | 3 + kernel/sched/core.c | 16 +- kernel/sched/idle.c | 1 - kernel/sched/sched.h | 6 - kernel/static_call.c | 542 +------------------- kernel/static_call_inline.c | 543 +++++++++++++++++++++ lib/Kconfig.debug | 3 +- lib/logic_iomem.c | 8 +- lib/lz4/lz4_decompress.c | 8 +- lib/ref_tracker.c | 5 + mm/highmem.c | 4 +- mm/kfence/core.c | 11 +- mm/kfence/kfence.h | 3 + mm/mempolicy.c | 1 + mm/mremap.c | 3 + mm/rmap.c | 25 +- net/batman-adv/multicast.c | 2 +- net/bluetooth/hci_conn.c | 1 + net/bluetooth/hci_event.c | 79 ++- net/bluetooth/hci_sync.c | 7 +- net/bluetooth/l2cap_core.c | 1 + net/bpf/test_run.c | 4 +- net/can/isotp.c | 12 +- net/core/dev.c | 3 +- net/core/filter.c | 46 +- net/core/net_namespace.c | 17 +- net/core/rtnetlink.c | 13 +- net/core/skbuff.c | 15 +- net/dsa/master.c | 25 +- net/ipv4/arp.c | 9 +- net/ipv4/fib_frontend.c | 5 +- net/ipv4/fib_semantics.c | 7 +- net/ipv4/inet_hashtables.c | 53 +- net/ipv6/addrconf.c | 4 +- net/ipv6/af_inet6.c | 24 +- net/ipv6/inet6_hashtables.c | 5 +- net/ipv6/ip6_input.c | 2 +- net/ipv6/ip6mr.c | 8 +- net/ipv6/ipv6_sockglue.c | 6 +- net/ipv6/route.c | 2 +- net/mctp/af_mctp.c | 46 +- net/mctp/device.c | 21 +- net/mctp/route.c | 21 +- net/mctp/test/utils.c | 1 - net/netfilter/nf_conntrack_core.c | 85 +++- net/netfilter/nft_bitwise.c | 4 +- net/netlabel/netlabel_kapi.c | 2 + net/openvswitch/actions.c | 2 +- net/openvswitch/flow_netlink.c | 99 +++- net/rxrpc/net_ns.c | 2 +- net/sctp/outqueue.c | 6 +- net/smc/af_smc.c | 8 +- net/smc/smc_core.c | 2 +- net/smc/smc_tx.c | 25 +- net/smc/smc_tx.h | 1 + net/sunrpc/clnt.c | 7 + net/sunrpc/sched.c | 7 - net/sunrpc/svcsock.c | 4 +- net/sunrpc/xprt.c | 23 +- net/sunrpc/xprtrdma/transport.c | 2 +- net/sunrpc/xprtsock.c | 70 ++- net/tls/tls_sw.c | 2 +- net/wireless/scan.c | 9 +- tools/build/feature/Makefile | 9 +- tools/lib/bpf/Makefile | 4 +- tools/lib/bpf/bpf_tracing.h | 14 + tools/objtool/check.c | 11 + tools/perf/Makefile.config | 6 + tools/perf/arch/arm64/util/arm-spe.c | 6 + tools/perf/perf.c | 2 +- tools/perf/tests/dwarf-unwind.c | 2 +- .../perf/util/arm64-frame-pointer-unwind-support.c | 2 +- tools/perf/util/machine.c | 2 +- tools/perf/util/session.c | 15 +- tools/perf/util/setup.py | 8 +- tools/perf/util/unwind-libdw.c | 10 +- tools/perf/util/unwind-libdw.h | 1 + tools/perf/util/unwind-libunwind-local.c | 10 +- tools/perf/util/unwind-libunwind.c | 6 +- tools/perf/util/unwind.h | 13 +- tools/testing/selftests/bpf/progs/test_sk_lookup.c | 3 +- tools/testing/selftests/bpf/xdpxceiver.c | 80 +-- tools/testing/selftests/bpf/xdpxceiver.h | 2 +- tools/testing/selftests/kvm/aarch64/vgic_irq.c | 45 +- tools/testing/selftests/kvm/lib/aarch64/gic_v3.c | 12 +- tools/testing/selftests/kvm/lib/aarch64/vgic.c | 9 +- virt/kvm/kvm_main.c | 2 +- 404 files changed, 4510 insertions(+), 2534 deletions(-)
From: Johannes Berg johannes.berg@intel.com
[ Upstream commit 2a6852cb8ff0c8c1363cac648d68489343813212 ]
Due to some renaming, we ended up with the "indirect iomem" naming in Kconfig, following INDIRECT_PIO. However, clearly I missed following through on that in the ifdefs, but so far INDIRECT_IOMEM_FALLBACK isn't used by any architecture.
Reported-by: Lukas Bulwahn lukas.bulwahn@gmail.com Fixes: ca2e334232b6 ("lib: add iomem emulation (logic_iomem)") Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Richard Weinberger richard@nod.at Signed-off-by: Sasha Levin sashal@kernel.org --- lib/logic_iomem.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/lib/logic_iomem.c b/lib/logic_iomem.c index 8c3365f26e51..b247d412ddef 100644 --- a/lib/logic_iomem.c +++ b/lib/logic_iomem.c @@ -68,7 +68,7 @@ int logic_iomem_add_region(struct resource *resource, } EXPORT_SYMBOL(logic_iomem_add_region);
-#ifndef CONFIG_LOGIC_IOMEM_FALLBACK +#ifndef CONFIG_INDIRECT_IOMEM_FALLBACK static void __iomem *real_ioremap(phys_addr_t offset, size_t size) { WARN(1, "invalid ioremap(0x%llx, 0x%zx)\n", @@ -81,7 +81,7 @@ static void real_iounmap(volatile void __iomem *addr) WARN(1, "invalid iounmap for addr 0x%llx\n", (unsigned long long)(uintptr_t __force)addr); } -#endif /* CONFIG_LOGIC_IOMEM_FALLBACK */ +#endif /* CONFIG_INDIRECT_IOMEM_FALLBACK */
void __iomem *ioremap(phys_addr_t offset, size_t size) { @@ -168,7 +168,7 @@ void iounmap(volatile void __iomem *addr) } EXPORT_SYMBOL(iounmap);
-#ifndef CONFIG_LOGIC_IOMEM_FALLBACK +#ifndef CONFIG_INDIRECT_IOMEM_FALLBACK #define MAKE_FALLBACK(op, sz) \ static u##sz real_raw_read ## op(const volatile void __iomem *addr) \ { \ @@ -213,7 +213,7 @@ static void real_memcpy_toio(volatile void __iomem *addr, const void *buffer, WARN(1, "Invalid memcpy_toio at address 0x%llx\n", (unsigned long long)(uintptr_t __force)addr); } -#endif /* CONFIG_LOGIC_IOMEM_FALLBACK */ +#endif /* CONFIG_INDIRECT_IOMEM_FALLBACK */
#define MAKE_OP(op, sz) \ u##sz __raw_read ## op(const volatile void __iomem *addr) \
From: Benjamin Beichler benjamin.beichler@uni-rostock.de
[ Upstream commit e3a33af812c611d99756e2ec61e9d7068d466bdf ]
Due to dropped inclusion of asm-generic/xor.h, xor_block_8regs symbol is missing with CONFIG64 and break compilation, as the asm/xor_64.h also did not include it. The patch recreate the logic from arch/x86, which check whether AVX is available and add fallbacks for 32bit and 64bit config of um.
A very minor additional "fix" is, the return of the macro parameter instead of NULL, as this is the original intent of the macro, but this does not change the actual behavior.
Fixes: c0ecca6604b8 ("um: enable the use of optimized xor routines in UML") Signed-off-by: Benjamin Beichler benjamin.beichler@uni-rostock.de Acked-By: Anton Ivanov anton.ivanov@cambridgegreys.com Signed-off-by: Richard Weinberger richard@nod.at Signed-off-by: Sasha Levin sashal@kernel.org --- arch/um/include/asm/xor.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/um/include/asm/xor.h b/arch/um/include/asm/xor.h index f512704a9ec7..22b39de73c24 100644 --- a/arch/um/include/asm/xor.h +++ b/arch/um/include/asm/xor.h @@ -4,8 +4,10 @@
#ifdef CONFIG_64BIT #undef CONFIG_X86_32 +#define TT_CPU_INF_XOR_DEFAULT (AVX_SELECT(&xor_block_sse_pf64)) #else #define CONFIG_X86_32 1 +#define TT_CPU_INF_XOR_DEFAULT (AVX_SELECT(&xor_block_8regs)) #endif
#include <asm/cpufeature.h> @@ -16,7 +18,7 @@ #undef XOR_SELECT_TEMPLATE /* pick an arbitrary one - measuring isn't possible with inf-cpu */ #define XOR_SELECT_TEMPLATE(x) \ - (time_travel_mode == TT_MODE_INFCPU ? &xor_block_8regs : NULL) + (time_travel_mode == TT_MODE_INFCPU ? TT_CPU_INF_XOR_DEFAULT : x)) #endif
#endif
From: Jiasheng Jiang jiasheng@iscas.ac.cn
[ Upstream commit 43f0269b6b89c1eec4ef83c48035608f4dcdd886 ]
As the potential failure of the wm8350_register_irq(), it should be better to check it and return error if fails. Also, it need not free 'wm_rtc->rtc' since it will be freed automatically.
Fixes: 077eaf5b40ec ("rtc: rtc-wm8350: add support for WM8350 RTC") Signed-off-by: Jiasheng Jiang jiasheng@iscas.ac.cn Acked-by: Charles Keepax ckeepax@opensource.cirrus.com Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Link: https://lore.kernel.org/r/20220303085030.291793-1-jiasheng@iscas.ac.cn Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/rtc/rtc-wm8350.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/rtc/rtc-wm8350.c b/drivers/rtc/rtc-wm8350.c index 2018614f258f..6eaa9321c074 100644 --- a/drivers/rtc/rtc-wm8350.c +++ b/drivers/rtc/rtc-wm8350.c @@ -432,14 +432,21 @@ static int wm8350_rtc_probe(struct platform_device *pdev) return ret; }
- wm8350_register_irq(wm8350, WM8350_IRQ_RTC_SEC, + ret = wm8350_register_irq(wm8350, WM8350_IRQ_RTC_SEC, wm8350_rtc_update_handler, 0, "RTC Seconds", wm8350); + if (ret) + return ret; + wm8350_mask_irq(wm8350, WM8350_IRQ_RTC_SEC);
- wm8350_register_irq(wm8350, WM8350_IRQ_RTC_ALM, + ret = wm8350_register_irq(wm8350, WM8350_IRQ_RTC_ALM, wm8350_rtc_alarm_handler, 0, "RTC Alarm", wm8350); + if (ret) { + wm8350_free_irq(wm8350, WM8350_IRQ_RTC_SEC, wm8350); + return ret; + }
return 0; }
From: Zheng Yongjun zhengyongjun3@huawei.com
[ Upstream commit 866b7a278cdb51eb158cd8513bc7438fc857804a ]
As the possible failure of the allocation, kzalloc() may return NULL pointer. Therefore, it should be better to check the 'sgi' in order to prevent the dereference of NULL pointer.
Fixes: 23ae3a7877718 ("net: dsa: felix: add stream gate settings for psfp"). Signed-off-by: Zheng Yongjun zhengyongjun3@huawei.com Reviewed-by: Vladimir Oltean vladimir.oltean@nxp.com Link: https://lore.kernel.org/r/20220329090800.130106-1-zhengyongjun3@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/ocelot/felix_vsc9959.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/net/dsa/ocelot/felix_vsc9959.c b/drivers/net/dsa/ocelot/felix_vsc9959.c index 33f0ceae381d..2875b5250856 100644 --- a/drivers/net/dsa/ocelot/felix_vsc9959.c +++ b/drivers/net/dsa/ocelot/felix_vsc9959.c @@ -1940,6 +1940,10 @@ static int vsc9959_psfp_filter_add(struct ocelot *ocelot, int port, case FLOW_ACTION_GATE: size = struct_size(sgi, entries, a->gate.num_entries); sgi = kzalloc(size, GFP_KERNEL); + if (!sgi) { + ret = -ENOMEM; + goto err; + } vsc9959_psfp_parse_gate(a, sgi); ret = vsc9959_psfp_sgi_table_add(ocelot, sgi); if (ret) {
From: Muchun Song songmuchun@bytedance.com
[ Upstream commit 8f0b36497303487d5a32c75789c77859cc2ee895 ]
If the kfence object is allocated to be used for objects vector, then this slot of the pool eventually being occupied permanently since the vector is never freed. The solutions could be (1) freeing vector when the kfence object is freed or (2) allocating all vectors statically.
Since the memory consumption of object vectors is low, it is better to chose (2) to fix the issue and it is also can reduce overhead of vectors allocating in the future.
Link: https://lkml.kernel.org/r/20220328132843.16624-1-songmuchun@bytedance.com Fixes: d3fb45f370d9 ("mm, kfence: insert KFENCE hooks for SLAB") Signed-off-by: Muchun Song songmuchun@bytedance.com Reviewed-by: Marco Elver elver@google.com Reviewed-by: Roman Gushchin roman.gushchin@linux.dev Cc: Alexander Potapenko glider@google.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Xiongchun Duan duanxiongchun@bytedance.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/kfence/core.c | 11 ++++++++++- mm/kfence/kfence.h | 3 +++ 2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/mm/kfence/core.c b/mm/kfence/core.c index 13128fa13062..d4c7978cd75e 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -555,6 +555,8 @@ static bool __init kfence_init_pool(void) * enters __slab_free() slow-path. */ for (i = 0; i < KFENCE_POOL_SIZE / PAGE_SIZE; i++) { + struct slab *slab = page_slab(&pages[i]); + if (!i || (i % 2)) continue;
@@ -562,7 +564,11 @@ static bool __init kfence_init_pool(void) if (WARN_ON(compound_head(&pages[i]) != &pages[i])) goto err;
- __SetPageSlab(&pages[i]); + __folio_set_slab(slab_folio(slab)); +#ifdef CONFIG_MEMCG + slab->memcg_data = (unsigned long)&kfence_metadata[i / 2 - 1].objcg | + MEMCG_DATA_OBJCGS; +#endif }
/* @@ -938,6 +944,9 @@ void __kfence_free(void *addr) { struct kfence_metadata *meta = addr_to_metadata((unsigned long)addr);
+#ifdef CONFIG_MEMCG + KFENCE_WARN_ON(meta->objcg); +#endif /* * If the objects of the cache are SLAB_TYPESAFE_BY_RCU, defer freeing * the object, as the object page may be recycled for other-typed diff --git a/mm/kfence/kfence.h b/mm/kfence/kfence.h index 2a2d5de9d379..9a6c4b1b12a8 100644 --- a/mm/kfence/kfence.h +++ b/mm/kfence/kfence.h @@ -89,6 +89,9 @@ struct kfence_metadata { struct kfence_track free_track; /* For updating alloc_covered on frees. */ u32 alloc_stack_hash; +#ifdef CONFIG_MEMCG + struct obj_cgroup *objcg; +#endif };
extern struct kfence_metadata kfence_metadata[CONFIG_KFENCE_NUM_OBJECTS];
From: Jim Mattson jmattson@google.com
[ Upstream commit 95b065bf5c431c06c68056a03a5853b660640ecc ]
The third nybble of AMD's event select overlaps with Intel's IN_TX and IN_TXCP bits. Therefore, we can't use AMD64_RAW_EVENT_MASK on Intel platforms that support TSX.
Declare a raw_event_mask in the kvm_pmu structure, initialize it in the vendor-specific pmu_refresh() functions, and use that mask for PERF_TYPE_RAW configurations in reprogram_gp_counter().
Fixes: 710c47651431 ("KVM: x86/pmu: Use AMD64_RAW_EVENT_MASK for PERF_TYPE_RAW") Signed-off-by: Jim Mattson jmattson@google.com Message-Id: 20220308012452.3468611-1-jmattson@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/pmu.c | 3 ++- arch/x86/kvm/svm/pmu.c | 1 + arch/x86/kvm/vmx/pmu_intel.c | 1 + 4 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index ec9830d2aabf..17b4e1808b8e 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -509,6 +509,7 @@ struct kvm_pmu { u64 global_ctrl_mask; u64 global_ovf_ctrl_mask; u64 reserved_bits; + u64 raw_event_mask; u8 version; struct kvm_pmc gp_counters[INTEL_PMC_MAX_GENERIC]; struct kvm_pmc fixed_counters[INTEL_PMC_MAX_FIXED]; diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c index b1a02993782b..902b6d700215 100644 --- a/arch/x86/kvm/pmu.c +++ b/arch/x86/kvm/pmu.c @@ -185,6 +185,7 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel) u32 type = PERF_TYPE_RAW; struct kvm *kvm = pmc->vcpu->kvm; struct kvm_pmu_event_filter *filter; + struct kvm_pmu *pmu = vcpu_to_pmu(pmc->vcpu); bool allow_event = true;
if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL) @@ -221,7 +222,7 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel) }
if (type == PERF_TYPE_RAW) - config = eventsel & AMD64_RAW_EVENT_MASK; + config = eventsel & pmu->raw_event_mask;
if (pmc->current_config == eventsel && pmc_resume_counter(pmc)) return; diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c index 5aa45f13b16d..7cc664b4472a 100644 --- a/arch/x86/kvm/svm/pmu.c +++ b/arch/x86/kvm/svm/pmu.c @@ -284,6 +284,7 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << 48) - 1; pmu->reserved_bits = 0xfffffff000280000ull; + pmu->raw_event_mask = AMD64_RAW_EVENT_MASK; pmu->version = 1; /* not applicable to AMD; but clean them to prevent any fall out */ pmu->counter_bitmask[KVM_PMC_FIXED] = 0; diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c index 466d18fc0c5d..8ba12294a7c5 100644 --- a/arch/x86/kvm/vmx/pmu_intel.c +++ b/arch/x86/kvm/vmx/pmu_intel.c @@ -485,6 +485,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu) pmu->counter_bitmask[KVM_PMC_FIXED] = 0; pmu->version = 0; pmu->reserved_bits = 0xffffffff00200000ull; + pmu->raw_event_mask = X86_RAW_EVENT_MASK;
entry = kvm_find_cpuid_entry(vcpu, 0xa, 0); if (!entry || !enable_pmu)
From: Peter Gonda pgonda@google.com
[ Upstream commit 4a9e7b9ea252842bc8b14d495706ac6317fafd5d ]
Include kvm_cache_regs.h to pick up the definition of is_guest_mode(), which is referenced by nested_svm_virtualize_tpr() in svm.h. Remove include from svm_onhpyerv.c which was done only because of lack of include in svm.h.
Fixes: 883b0a91f41ab ("KVM: SVM: Move Nested SVM Implementation to nested.c") Cc: Paolo Bonzini pbonzini@redhat.com Cc: Sean Christopherson seanjc@google.com Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Peter Gonda pgonda@google.com Message-Id: 20220304161032.2270688-1-pgonda@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/svm/svm.h | 2 ++ arch/x86/kvm/svm/svm_onhyperv.c | 1 - 2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h index fa98d6844728..86bcfed6599e 100644 --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -22,6 +22,8 @@ #include <asm/svm.h> #include <asm/sev-common.h>
+#include "kvm_cache_regs.h" + #define __sme_page_pa(x) __sme_set(page_to_pfn(x) << PAGE_SHIFT)
#define IOPM_SIZE PAGE_SIZE * 3 diff --git a/arch/x86/kvm/svm/svm_onhyperv.c b/arch/x86/kvm/svm/svm_onhyperv.c index 98aa981c04ec..8cdc62c74a96 100644 --- a/arch/x86/kvm/svm/svm_onhyperv.c +++ b/arch/x86/kvm/svm/svm_onhyperv.c @@ -4,7 +4,6 @@ */
#include <linux/kvm_host.h> -#include "kvm_cache_regs.h"
#include <asm/mshyperv.h>
From: Jim Mattson jmattson@google.com
[ Upstream commit 9b026073db2f1ad0e4d8b61c83316c8497981037 ]
AMD EPYC CPUs never raise a #GP for a WRMSR to a PerfEvtSeln MSR. Some reserved bits are cleared, and some are not. Specifically, on Zen3/Milan, bits 19 and 42 are not cleared.
When emulating such a WRMSR, KVM should not synthesize a #GP, regardless of which bits are set. However, undocumented bits should not be passed through to the hardware MSR. So, rather than checking for reserved bits and synthesizing a #GP, just clear the reserved bits.
This may seem pedantic, but since KVM currently does not support the "Host/Guest Only" bits (41:40), it is necessary to clear these bits rather than synthesizing #GP, because some popular guests (e.g Linux) will set the "Host Only" bit even on CPUs that don't support EFER.SVME, and they don't expect a #GP.
For example,
root@Ubuntu1804:~# perf stat -e r26 -a sleep 1
Performance counter stats for 'system wide':
0 r26
1.001070977 seconds time elapsed
Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379957] unchecked MSR access error: WRMSR to 0xc0010200 (tried to write 0x0000020000130026) at rIP: 0xffffffff9b276a28 (native_write_msr+0x8/0x30) Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379958] Call Trace: Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379963] amd_pmu_disable_event+0x27/0x90
Fixes: ca724305a2b0 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM") Reported-by: Lotus Fenn lotusf@google.com Signed-off-by: Jim Mattson jmattson@google.com Reviewed-by: Like Xu likexu@tencent.com Reviewed-by: David Dunn daviddunn@google.com Message-Id: 20220226234131.2167175-1-jmattson@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/svm/pmu.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c index 7cc664b4472a..ba40b7fced5a 100644 --- a/arch/x86/kvm/svm/pmu.c +++ b/arch/x86/kvm/svm/pmu.c @@ -262,12 +262,10 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) /* MSR_EVNTSELn */ pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_EVNTSEL); if (pmc) { - if (data == pmc->eventsel) - return 0; - if (!(data & pmu->reserved_bits)) { + data &= ~pmu->reserved_bits; + if (data != pmc->eventsel) reprogram_gp_counter(pmc, data); - return 0; - } + return 0; }
return 1;
From: Like Xu likexu@tencent.com
[ Upstream commit e644896f5106aa3f6d7e8c7adf2e4dc0fce53555 ]
HSW_IN_TX* bits are used in generic code which are not supported on AMD. Worse, these bits overlap with AMD EventSelect[11:8] and hence using HSW_IN_TX* bits unconditionally in generic code is resulting in unintentional pmu behavior on AMD. For example, if EventSelect[11:8] is 0x2, pmc_reprogram_counter() wrongly assumes that HSW_IN_TX_CHECKPOINTED is set and thus forces sampling period to be 0.
Also per the SDM, both bits 32 and 33 "may only be set if the processor supports HLE or RTM" and for "IN_TXCP (bit 33): this bit may only be set for IA32_PERFEVTSEL2."
Opportunistically eliminate code redundancy, because if the HSW_IN_TX* bit is set in pmc->eventsel, it is already set in attr.config.
Reported-by: Ravi Bangoria ravi.bangoria@amd.com Reported-by: Jim Mattson jmattson@google.com Fixes: 103af0a98788 ("perf, kvm: Support the in_tx/in_tx_cp modifiers in KVM arch perfmon emulation v5") Co-developed-by: Ravi Bangoria ravi.bangoria@amd.com Signed-off-by: Ravi Bangoria ravi.bangoria@amd.com Signed-off-by: Like Xu likexu@tencent.com Message-Id: 20220309084257.88931-1-likexu@tencent.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/pmu.c | 15 +++++---------- arch/x86/kvm/vmx/pmu_intel.c | 13 ++++++++++--- 2 files changed, 15 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c index 902b6d700215..eca39f56c231 100644 --- a/arch/x86/kvm/pmu.c +++ b/arch/x86/kvm/pmu.c @@ -96,8 +96,7 @@ static void kvm_perf_overflow(struct perf_event *perf_event,
static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type, u64 config, bool exclude_user, - bool exclude_kernel, bool intr, - bool in_tx, bool in_tx_cp) + bool exclude_kernel, bool intr) { struct perf_event *event; struct perf_event_attr attr = { @@ -116,16 +115,14 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
attr.sample_period = get_sample_period(pmc, pmc->counter);
- if (in_tx) - attr.config |= HSW_IN_TX; - if (in_tx_cp) { + if ((attr.config & HSW_IN_TX_CHECKPOINTED) && + guest_cpuid_is_intel(pmc->vcpu)) { /* * HSW_IN_TX_CHECKPOINTED is not supported with nonzero * period. Just clear the sample period so at least * allocating the counter doesn't fail. */ attr.sample_period = 0; - attr.config |= HSW_IN_TX_CHECKPOINTED; }
event = perf_event_create_kernel_counter(&attr, -1, current, @@ -233,9 +230,7 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel) pmc_reprogram_counter(pmc, type, config, !(eventsel & ARCH_PERFMON_EVENTSEL_USR), !(eventsel & ARCH_PERFMON_EVENTSEL_OS), - eventsel & ARCH_PERFMON_EVENTSEL_INT, - (eventsel & HSW_IN_TX), - (eventsel & HSW_IN_TX_CHECKPOINTED)); + eventsel & ARCH_PERFMON_EVENTSEL_INT); } EXPORT_SYMBOL_GPL(reprogram_gp_counter);
@@ -271,7 +266,7 @@ void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int idx) kvm_x86_ops.pmu_ops->pmc_perf_hw_id(pmc), !(en_field & 0x2), /* exclude user */ !(en_field & 0x1), /* exclude kernel */ - pmi, false, false); + pmi); } EXPORT_SYMBOL_GPL(reprogram_fixed_counter);
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c index 8ba12294a7c5..5fa3870b8988 100644 --- a/arch/x86/kvm/vmx/pmu_intel.c +++ b/arch/x86/kvm/vmx/pmu_intel.c @@ -389,6 +389,7 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) struct kvm_pmc *pmc; u32 msr = msr_info->index; u64 data = msr_info->data; + u64 reserved_bits;
switch (msr) { case MSR_CORE_PERF_FIXED_CTR_CTRL: @@ -443,7 +444,11 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) } else if ((pmc = get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0))) { if (data == pmc->eventsel) return 0; - if (!(data & pmu->reserved_bits)) { + reserved_bits = pmu->reserved_bits; + if ((pmc->idx == 2) && + (pmu->raw_event_mask & HSW_IN_TX_CHECKPOINTED)) + reserved_bits ^= HSW_IN_TX_CHECKPOINTED; + if (!(data & reserved_bits)) { reprogram_gp_counter(pmc, data); return 0; } @@ -534,8 +539,10 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu) entry = kvm_find_cpuid_entry(vcpu, 7, 0); if (entry && (boot_cpu_has(X86_FEATURE_HLE) || boot_cpu_has(X86_FEATURE_RTM)) && - (entry->ebx & (X86_FEATURE_HLE|X86_FEATURE_RTM))) - pmu->reserved_bits ^= HSW_IN_TX|HSW_IN_TX_CHECKPOINTED; + (entry->ebx & (X86_FEATURE_HLE|X86_FEATURE_RTM))) { + pmu->reserved_bits ^= HSW_IN_TX; + pmu->raw_event_mask |= (HSW_IN_TX|HSW_IN_TX_CHECKPOINTED); + }
bitmap_set(pmu->all_valid_pmc_idx, 0, pmu->nr_arch_gp_counters);
From: Hou Wenlong houwenlong.hwl@antgroup.com
[ Upstream commit a836839cbfe60dc434c5476a7429cf2bae36415d ]
When RDTSCP is supported but RDPID is not supported in host, RDPID emulation is available. However, __kvm_get_msr() would only fail when RDTSCP/RDPID both are disabled in guest, so the emulator wouldn't inject a #UD when RDPID is disabled but RDTSCP is enabled in guest.
Fixes: fb6d4d340e05 ("KVM: x86: emulate RDPID") Signed-off-by: Hou Wenlong houwenlong.hwl@antgroup.com Message-Id: 1dfd46ae5b76d3ed87bde3154d51c64ea64c99c1.1646226788.git.houwenlong.hwl@antgroup.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/emulate.c | 4 +++- arch/x86/kvm/kvm_emulate.h | 1 + arch/x86/kvm/x86.c | 6 ++++++ 3 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 02d061a06aa1..de9d8a27387c 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -3523,8 +3523,10 @@ static int em_rdpid(struct x86_emulate_ctxt *ctxt) { u64 tsc_aux = 0;
- if (ctxt->ops->get_msr(ctxt, MSR_TSC_AUX, &tsc_aux)) + if (!ctxt->ops->guest_has_rdpid(ctxt)) return emulate_ud(ctxt); + + ctxt->ops->get_msr(ctxt, MSR_TSC_AUX, &tsc_aux); ctxt->dst.val = tsc_aux; return X86EMUL_CONTINUE; } diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h index 39eded2426ff..a2a7654d8ace 100644 --- a/arch/x86/kvm/kvm_emulate.h +++ b/arch/x86/kvm/kvm_emulate.h @@ -226,6 +226,7 @@ struct x86_emulate_ops { bool (*guest_has_long_mode)(struct x86_emulate_ctxt *ctxt); bool (*guest_has_movbe)(struct x86_emulate_ctxt *ctxt); bool (*guest_has_fxsr)(struct x86_emulate_ctxt *ctxt); + bool (*guest_has_rdpid)(struct x86_emulate_ctxt *ctxt);
void (*set_nmi_mask)(struct x86_emulate_ctxt *ctxt, bool masked);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 9b6166348c94..c81ec70197fb 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -7675,6 +7675,11 @@ static bool emulator_guest_has_fxsr(struct x86_emulate_ctxt *ctxt) return guest_cpuid_has(emul_to_vcpu(ctxt), X86_FEATURE_FXSR); }
+static bool emulator_guest_has_rdpid(struct x86_emulate_ctxt *ctxt) +{ + return guest_cpuid_has(emul_to_vcpu(ctxt), X86_FEATURE_RDPID); +} + static ulong emulator_read_gpr(struct x86_emulate_ctxt *ctxt, unsigned reg) { return kvm_register_read_raw(emul_to_vcpu(ctxt), reg); @@ -7757,6 +7762,7 @@ static const struct x86_emulate_ops emulate_ops = { .guest_has_long_mode = emulator_guest_has_long_mode, .guest_has_movbe = emulator_guest_has_movbe, .guest_has_fxsr = emulator_guest_has_fxsr, + .guest_has_rdpid = emulator_guest_has_rdpid, .set_nmi_mask = emulator_set_nmi_mask, .get_hflags = emulator_get_hflags, .exiting_smm = emulator_exiting_smm,
From: Anisse Astier anisse@astier.eu
[ Upstream commit 0b464ca3e0dd3cec65f28bc6d396d82f19080f69 ]
Panel is 800x1280, but mounted on a laptop form factor, sideways.
Signed-off-by: Anisse Astier anisse@astier.eu Reviewed-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Jani Nikula jani.nikula@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20211229222200.53128-3-anisse@... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/drm_panel_orientation_quirks.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/drm_panel_orientation_quirks.c b/drivers/gpu/drm/drm_panel_orientation_quirks.c index b910978d3e48..4e853acfd1e8 100644 --- a/drivers/gpu/drm/drm_panel_orientation_quirks.c +++ b/drivers/gpu/drm/drm_panel_orientation_quirks.c @@ -180,6 +180,12 @@ static const struct dmi_system_id orientation_data[] = { DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "MicroPC"), }, .driver_data = (void *)&lcd720x1280_rightside_up, + }, { /* GPD Win Max */ + .matches = { + DMI_EXACT_MATCH(DMI_SYS_VENDOR, "GPD"), + DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "G1619-01"), + }, + .driver_data = (void *)&lcd800x1280_rightside_up, }, { /* * GPD Pocket, note that the the DMI data is less generic then * it seems, devices with a board-vendor of "AMI Corporation"
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit 89a0b8b98f49ae34886e67624208c2898e1e4d7f ]
This fixes the following warning:
net/bluetooth/hci_sync.c:5143:5: warning: no previous prototype for ‘hci_le_ext_create_conn_sync’ [-Wmissing-prototypes]
Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/hci_sync.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c index 405d48c3e63e..48c837530a11 100644 --- a/net/bluetooth/hci_sync.c +++ b/net/bluetooth/hci_sync.c @@ -5156,8 +5156,8 @@ static void set_ext_conn_params(struct hci_conn *conn, p->max_ce_len = cpu_to_le16(0x0000); }
-int hci_le_ext_create_conn_sync(struct hci_dev *hdev, struct hci_conn *conn, - u8 own_addr_type) +static int hci_le_ext_create_conn_sync(struct hci_dev *hdev, + struct hci_conn *conn, u8 own_addr_type) { struct hci_cp_le_ext_create_conn *cp; struct hci_cp_le_ext_conn_param *p;
From: Zekun Shen bruceshenzk@gmail.com
[ Upstream commit 564d4eceb97eaf381dd6ef6470b06377bb50c95a ]
The bug was found during fuzzing. Stacktrace locates it in ath5k_eeprom_convert_pcal_info_5111. When none of the curve is selected in the loop, idx can go up to AR5K_EEPROM_N_PD_CURVES. The line makes pd out of bound. pd = &chinfo[pier].pd_curves[idx];
There are many OOB writes using pd later in the code. So I added a sanity check for idx. Checks for other loops involving AR5K_EEPROM_N_PD_CURVES are not needed as the loop index is not used outside the loops.
The patch is NOT tested with real device.
The following is the fuzzing report
BUG: KASAN: slab-out-of-bounds in ath5k_eeprom_read_pcal_info_5111+0x126a/0x1390 [ath5k] Write of size 1 at addr ffff8880174a4d60 by task modprobe/214
CPU: 0 PID: 214 Comm: modprobe Not tainted 5.6.0 #1 Call Trace: dump_stack+0x76/0xa0 print_address_description.constprop.0+0x16/0x200 ? ath5k_eeprom_read_pcal_info_5111+0x126a/0x1390 [ath5k] ? ath5k_eeprom_read_pcal_info_5111+0x126a/0x1390 [ath5k] __kasan_report.cold+0x37/0x7c ? ath5k_eeprom_read_pcal_info_5111+0x126a/0x1390 [ath5k] kasan_report+0xe/0x20 ath5k_eeprom_read_pcal_info_5111+0x126a/0x1390 [ath5k] ? apic_timer_interrupt+0xa/0x20 ? ath5k_eeprom_init_11a_pcal_freq+0xbc0/0xbc0 [ath5k] ? ath5k_pci_eeprom_read+0x228/0x3c0 [ath5k] ath5k_eeprom_init+0x2513/0x6290 [ath5k] ? ath5k_eeprom_init_11a_pcal_freq+0xbc0/0xbc0 [ath5k] ? usleep_range+0xb8/0x100 ? apic_timer_interrupt+0xa/0x20 ? ath5k_eeprom_read_pcal_info_2413+0x2f20/0x2f20 [ath5k] ath5k_hw_init+0xb60/0x1970 [ath5k] ath5k_init_ah+0x6fe/0x2530 [ath5k] ? kasprintf+0xa6/0xe0 ? ath5k_stop+0x140/0x140 [ath5k] ? _dev_notice+0xf6/0xf6 ? apic_timer_interrupt+0xa/0x20 ath5k_pci_probe.cold+0x29a/0x3d6 [ath5k] ? ath5k_pci_eeprom_read+0x3c0/0x3c0 [ath5k] ? mutex_lock+0x89/0xd0 ? ath5k_pci_eeprom_read+0x3c0/0x3c0 [ath5k] local_pci_probe+0xd3/0x160 pci_device_probe+0x23f/0x3e0 ? pci_device_remove+0x280/0x280 ? pci_device_remove+0x280/0x280 really_probe+0x209/0x5d0
Reported-by: Brendan Dolan-Gavitt brendandg@nyu.edu Signed-off-by: Zekun Shen bruceshenzk@gmail.com Signed-off-by: Kalle Valo quic_kvalo@quicinc.com Link: https://lore.kernel.org/r/YckvDdj3mtCkDRIt@a-10-27-26-18.dynapool.vpn.nyu.ed... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/ath/ath5k/eeprom.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/net/wireless/ath/ath5k/eeprom.c b/drivers/net/wireless/ath/ath5k/eeprom.c index 1fbc2c19848f..d444b3d70ba2 100644 --- a/drivers/net/wireless/ath/ath5k/eeprom.c +++ b/drivers/net/wireless/ath/ath5k/eeprom.c @@ -746,6 +746,9 @@ ath5k_eeprom_convert_pcal_info_5111(struct ath5k_hw *ah, int mode, } }
+ if (idx == AR5K_EEPROM_N_PD_CURVES) + goto err_out; + ee->ee_pd_gains[mode] = 1;
pd = &chinfo[pier].pd_curves[idx];
From: Soenke Huster soenke.huster@eknoes.de
[ Upstream commit 3afee2118132e93e5f6fa636dfde86201a860ab3 ]
This event is just specified for SCO and eSCO link types. On the reception of a HCI_Synchronous_Connection_Complete for a BDADDR of an existing LE connection, LE link type and a status that triggers the second case of the packet processing a NULL pointer dereference happens, as conn->link is NULL.
Signed-off-by: Soenke Huster soenke.huster@eknoes.de Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/hci_event.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c index a105b7317560..519f5906ee98 100644 --- a/net/bluetooth/hci_event.c +++ b/net/bluetooth/hci_event.c @@ -4661,6 +4661,19 @@ static void hci_sync_conn_complete_evt(struct hci_dev *hdev, void *data, struct hci_ev_sync_conn_complete *ev = data; struct hci_conn *conn;
+ switch (ev->link_type) { + case SCO_LINK: + case ESCO_LINK: + break; + default: + /* As per Core 5.3 Vol 4 Part E 7.7.35 (p.2219), Link_Type + * for HCI_Synchronous_Connection_Complete is limited to + * either SCO or eSCO + */ + bt_dev_err(hdev, "Ignoring connect complete event for invalid link type"); + return; + } + bt_dev_dbg(hdev, "status 0x%2.2x", ev->status);
hci_dev_lock(hdev);
From: Dale Zhao dale.zhao@amd.com
[ Upstream commit 047db281c026de5971cedb5bb486aa29bd16a39d ]
[Why] For allow eDP hot-plug feature, the stream signal may change to VIRTUAL when plug-out and back to eDP when plug-in. OS will still setPathMode with same timing for each plugging, but eDP gets no stream update as we don't check signal type changing back as keeping it VIRTUAL. It's also unsafe for future cases that stream signal is switched with same timing.
[How] Check stream signal type change include previous HDMI signal case.
Reviewed-by: Aric Cyr Aric.Cyr@amd.com Acked-by: Wayne Lin wayne.lin@amd.com Signed-off-by: Dale Zhao dale.zhao@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c index 18757c158523..ac3071e38e4a 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c @@ -1640,6 +1640,9 @@ static bool are_stream_backends_same( if (is_timing_changed(stream_a, stream_b)) return false;
+ if (stream_a->signal != stream_b->signal) + return false; + if (stream_a->dpms_off != stream_b->dpms_off) return false;
From: Eric Huang jinhuieric.huang@amd.com
[ Upstream commit f61c40c0757a79bcf744314df606c2bc8ae6a729 ]
SDMA FW fixes the hang issue for adding heavy-weight TLB flush on Arcturus, so we can enable it.
Signed-off-by: Eric Huang jinhuieric.huang@amd.com Acked-by: Alex Deucher alexander.deucher@amd.com Reviewed-by: Felix Kuehling Felix.Kuehling@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 6 ------ drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 10 ++++++++-- 2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index f9bab963a948..5df387c4d7fb 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -1813,12 +1813,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu( true); ret = unreserve_bo_and_vms(&ctx, false, false);
- /* Only apply no TLB flush on Aldebaran to - * workaround regressions on other Asics. - */ - if (table_freed && (adev->asic_type != CHIP_ALDEBARAN)) - *table_freed = true; - goto out;
out_unreserve: diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index 4bfc0c8ab764..337953af7c2f 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c @@ -1416,6 +1416,12 @@ static int kfd_ioctl_free_memory_of_gpu(struct file *filep, return ret; }
+static bool kfd_flush_tlb_after_unmap(struct kfd_dev *dev) { + return KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) || + (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 1) && + dev->adev->sdma.instance[0].fw_version >= 18); +} + static int kfd_ioctl_map_memory_to_gpu(struct file *filep, struct kfd_process *p, void *data) { @@ -1503,7 +1509,7 @@ static int kfd_ioctl_map_memory_to_gpu(struct file *filep, }
/* Flush TLBs after waiting for the page table updates to complete */ - if (table_freed) { + if (table_freed || !kfd_flush_tlb_after_unmap(dev)) { for (i = 0; i < args->n_devices; i++) { peer = kfd_device_by_id(devices_arr[i]); if (WARN_ON_ONCE(!peer)) @@ -1603,7 +1609,7 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep, } mutex_unlock(&p->mutex);
- if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)) { + if (kfd_flush_tlb_after_unmap(dev)) { err = amdgpu_amdkfd_gpuvm_sync_memory(dev->adev, (struct kgd_mem *) mem, true); if (err) {
From: Philipp Zabel philipp.zabel@gmail.com
[ Upstream commit 50dc95d561a2552b0d76a9f91b38005195bf2974 ]
Now that there is support for the Microsoft VSDB for HMDs, remove the non-desktop quirk for two devices that are verified to contain it in their EDID: HPN-3515 and LEN-B800. Presumably most of the other Windows Mixed Reality headsets contain it as well, but there are ACR-7FCE and SEC-5194 devices without it.
Tested with LEN-B800.
Signed-off-by: Philipp Zabel philipp.zabel@gmail.com Reviewed-by: Jani Nikula jani.nikula@intel.com Signed-off-by: Jani Nikula jani.nikula@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20220123101653.147333-2-philip... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/drm_edid.c | 2 -- 1 file changed, 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c index b8f5419e514a..a71b82668a98 100644 --- a/drivers/gpu/drm/drm_edid.c +++ b/drivers/gpu/drm/drm_edid.c @@ -212,9 +212,7 @@ static const struct edid_quirk {
/* Windows Mixed Reality Headsets */ EDID_QUIRK('A', 'C', 'R', 0x7fce, EDID_QUIRK_NON_DESKTOP), - EDID_QUIRK('H', 'P', 'N', 0x3515, EDID_QUIRK_NON_DESKTOP), EDID_QUIRK('L', 'E', 'N', 0x0408, EDID_QUIRK_NON_DESKTOP), - EDID_QUIRK('L', 'E', 'N', 0xb800, EDID_QUIRK_NON_DESKTOP), EDID_QUIRK('F', 'U', 'J', 0x1970, EDID_QUIRK_NON_DESKTOP), EDID_QUIRK('D', 'E', 'L', 0x7fce, EDID_QUIRK_NON_DESKTOP), EDID_QUIRK('S', 'E', 'C', 0x144a, EDID_QUIRK_NON_DESKTOP),
From: Jani Nikula jani.nikula@intel.com
[ Upstream commit ce99534e978d4a36787dbe5e5c57749d12e6bf4a ]
Improve non-desktop quirk logging if the EDID indicates non-desktop. If both are set, note about redundant quirk. If there's no quirk but the EDID indicates non-desktop, don't log non-desktop is set to 0.
Cc: Philipp Zabel philipp.zabel@gmail.com Signed-off-by: Jani Nikula jani.nikula@intel.com Reviewed-by: Philipp Zabel philipp.zabel@gmail.com Tested-by: Philipp Zabel philipp.zabel@gmail.com Link: https://patchwork.freedesktop.org/patch/msgid/20211228101051.317989-1-jani.n... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/drm_edid.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c index a71b82668a98..83e5c115e754 100644 --- a/drivers/gpu/drm/drm_edid.c +++ b/drivers/gpu/drm/drm_edid.c @@ -5325,17 +5325,13 @@ u32 drm_add_display_info(struct drm_connector *connector, const struct edid *edi info->width_mm = edid->width_cm * 10; info->height_mm = edid->height_cm * 10;
- info->non_desktop = !!(quirks & EDID_QUIRK_NON_DESKTOP); - drm_get_monitor_range(connector, edid);
- DRM_DEBUG_KMS("non_desktop set to %d\n", info->non_desktop); - if (edid->revision < 3) - return quirks; + goto out;
if (!(edid->input & DRM_EDID_INPUT_DIGITAL)) - return quirks; + goto out;
info->color_formats |= DRM_COLOR_FORMAT_RGB444; drm_parse_cea_ext(connector, edid); @@ -5356,7 +5352,7 @@ u32 drm_add_display_info(struct drm_connector *connector, const struct edid *edi
/* Only defined for 1.4 with digital displays */ if (edid->revision < 4) - return quirks; + goto out;
switch (edid->input & DRM_EDID_DIGITAL_DEPTH_MASK) { case DRM_EDID_DIGITAL_DEPTH_6: @@ -5393,6 +5389,13 @@ u32 drm_add_display_info(struct drm_connector *connector, const struct edid *edi
drm_update_mso(connector, edid);
+out: + if (quirks & EDID_QUIRK_NON_DESKTOP) { + drm_dbg_kms(connector->dev, "Non-desktop display%s\n", + info->non_desktop ? " (redundant quirk)" : ""); + info->non_desktop = true; + } + return quirks; }
From: Soenke Huster soenke.huster@eknoes.de
[ Upstream commit d5ebaa7c5f6f688959e8d40840b2249ede63b8ed ]
When one of the three connection complete events is received multiple times for the same handle, the device is registered multiple times which leads to memory corruptions. Therefore, consequent events for a single connection are ignored.
The conn->state can hold different values, therefore HCI_CONN_HANDLE_UNSET is introduced to identify new connections. To make sure the events do not contain this or another invalid handle HCI_CONN_HANDLE_MAX and checks are introduced.
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=215497 Signed-off-by: Soenke Huster soenke.huster@eknoes.de Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/bluetooth/hci_core.h | 3 ++ net/bluetooth/hci_conn.c | 1 + net/bluetooth/hci_event.c | 63 ++++++++++++++++++++++++-------- 3 files changed, 52 insertions(+), 15 deletions(-)
diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h index e336e9c1dda4..36d727f94ac2 100644 --- a/include/net/bluetooth/hci_core.h +++ b/include/net/bluetooth/hci_core.h @@ -294,6 +294,9 @@ struct adv_monitor {
#define HCI_MAX_SHORT_NAME_LENGTH 10
+#define HCI_CONN_HANDLE_UNSET 0xffff +#define HCI_CONN_HANDLE_MAX 0x0eff + /* Min encryption key size to match with SMP */ #define HCI_MIN_ENC_KEY_SIZE 7
diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c index 3bb2b3b6a1c9..84312c836549 100644 --- a/net/bluetooth/hci_conn.c +++ b/net/bluetooth/hci_conn.c @@ -691,6 +691,7 @@ struct hci_conn *hci_conn_add(struct hci_dev *hdev, int type, bdaddr_t *dst,
bacpy(&conn->dst, dst); bacpy(&conn->src, &hdev->bdaddr); + conn->handle = HCI_CONN_HANDLE_UNSET; conn->hdev = hdev; conn->type = type; conn->role = role; diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c index 519f5906ee98..63b925921c87 100644 --- a/net/bluetooth/hci_event.c +++ b/net/bluetooth/hci_event.c @@ -3068,6 +3068,11 @@ static void hci_conn_complete_evt(struct hci_dev *hdev, void *data, struct hci_ev_conn_complete *ev = data; struct hci_conn *conn;
+ if (__le16_to_cpu(ev->handle) > HCI_CONN_HANDLE_MAX) { + bt_dev_err(hdev, "Ignoring HCI_Connection_Complete for invalid handle"); + return; + } + bt_dev_dbg(hdev, "status 0x%2.2x", ev->status);
hci_dev_lock(hdev); @@ -3106,6 +3111,17 @@ static void hci_conn_complete_evt(struct hci_dev *hdev, void *data, } }
+ /* The HCI_Connection_Complete event is only sent once per connection. + * Processing it more than once per connection can corrupt kernel memory. + * + * As the connection handle is set here for the first time, it indicates + * whether the connection is already set up. + */ + if (conn->handle != HCI_CONN_HANDLE_UNSET) { + bt_dev_err(hdev, "Ignoring HCI_Connection_Complete for existing connection"); + goto unlock; + } + if (!ev->status) { conn->handle = __le16_to_cpu(ev->handle);
@@ -4674,6 +4690,11 @@ static void hci_sync_conn_complete_evt(struct hci_dev *hdev, void *data, return; }
+ if (__le16_to_cpu(ev->handle) > HCI_CONN_HANDLE_MAX) { + bt_dev_err(hdev, "Ignoring HCI_Sync_Conn_Complete for invalid handle"); + return; + } + bt_dev_dbg(hdev, "status 0x%2.2x", ev->status);
hci_dev_lock(hdev); @@ -4697,23 +4718,19 @@ static void hci_sync_conn_complete_evt(struct hci_dev *hdev, void *data, goto unlock; }
+ /* The HCI_Synchronous_Connection_Complete event is only sent once per connection. + * Processing it more than once per connection can corrupt kernel memory. + * + * As the connection handle is set here for the first time, it indicates + * whether the connection is already set up. + */ + if (conn->handle != HCI_CONN_HANDLE_UNSET) { + bt_dev_err(hdev, "Ignoring HCI_Sync_Conn_Complete event for existing connection"); + goto unlock; + } + switch (ev->status) { case 0x00: - /* The synchronous connection complete event should only be - * sent once per new connection. Receiving a successful - * complete event when the connection status is already - * BT_CONNECTED means that the device is misbehaving and sent - * multiple complete event packets for the same new connection. - * - * Registering the device more than once can corrupt kernel - * memory, hence upon detecting this invalid event, we report - * an error and ignore the packet. - */ - if (conn->state == BT_CONNECTED) { - bt_dev_err(hdev, "Ignoring connect complete event for existing connection"); - goto unlock; - } - conn->handle = __le16_to_cpu(ev->handle); conn->state = BT_CONNECTED; conn->type = ev->link_type; @@ -5509,6 +5526,11 @@ static void le_conn_complete_evt(struct hci_dev *hdev, u8 status, struct smp_irk *irk; u8 addr_type;
+ if (handle > HCI_CONN_HANDLE_MAX) { + bt_dev_err(hdev, "Ignoring HCI_LE_Connection_Complete for invalid handle"); + return; + } + hci_dev_lock(hdev);
/* All controllers implicitly stop advertising in the event of a @@ -5550,6 +5572,17 @@ static void le_conn_complete_evt(struct hci_dev *hdev, u8 status, cancel_delayed_work(&conn->le_conn_timeout); }
+ /* The HCI_LE_Connection_Complete event is only sent once per connection. + * Processing it more than once per connection can corrupt kernel memory. + * + * As the connection handle is set here for the first time, it indicates + * whether the connection is already set up. + */ + if (conn->handle != HCI_CONN_HANDLE_UNSET) { + bt_dev_err(hdev, "Ignoring HCI_Connection_Complete for existing connection"); + goto unlock; + } + le_conn_update_addr(conn, bdaddr, bdaddr_type, local_rpa);
/* Lookup the identity address from the stored connection
From: Xin Xiong xiongx18@fudan.edu.cn
[ Upstream commit dfced44f122c500004a48ecc8db516bb6a295a1b ]
This issue takes place in an error path in amdgpu_cs_fence_to_handle_ioctl(). When `info->in.what` falls into default case, the function simply returns -EINVAL, forgetting to decrement the reference count of a dma_fence obj, which is bumped earlier by amdgpu_cs_get_fence(). This may result in reference count leaks.
Fix it by decreasing the refcount of specific object before returning the error code.
Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Xin Xiong xiongx18@fudan.edu.cn Signed-off-by: Xin Tan tanxin.ctf@gmail.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index 06d07502a1f6..a34be65c9eaa 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -1509,6 +1509,7 @@ int amdgpu_cs_fence_to_handle_ioctl(struct drm_device *dev, void *data, return 0;
default: + dma_fence_put(fence); return -EINVAL; } }
From: Yongzhi Liu lyz_cs@pku.edu.cn
[ Upstream commit 5d5c6dba2b43e28845d7d7ed32a36802329a5f52 ]
[why] Resource release is needed on the error handling path to prevent memory leak.
[how] Fix this by adding kfree on the error handling path.
Reviewed-by: Harry Wentland harry.wentland@amd.com Signed-off-by: Yongzhi Liu lyz_cs@pku.edu.cn Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../amd/display/amdgpu_dm/amdgpu_dm_debugfs.c | 80 ++++++++++++++----- 1 file changed, 60 insertions(+), 20 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c index 26719efa5396..12d437d9a0e4 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c @@ -227,8 +227,10 @@ static ssize_t dp_link_settings_read(struct file *f, char __user *buf, break;
r = put_user(*(rd_buf + result), buf); - if (r) + if (r) { + kfree(rd_buf); return r; /* r = -EFAULT */ + }
buf += 1; size -= 1; @@ -389,8 +391,10 @@ static ssize_t dp_phy_settings_read(struct file *f, char __user *buf, break;
r = put_user((*(rd_buf + result)), buf); - if (r) + if (r) { + kfree(rd_buf); return r; /* r = -EFAULT */ + }
buf += 1; size -= 1; @@ -1359,8 +1363,10 @@ static ssize_t dp_dsc_clock_en_read(struct file *f, char __user *buf, break; }
- if (!pipe_ctx) + if (!pipe_ctx) { + kfree(rd_buf); return -ENXIO; + }
dsc = pipe_ctx->stream_res.dsc; if (dsc) @@ -1376,8 +1382,10 @@ static ssize_t dp_dsc_clock_en_read(struct file *f, char __user *buf, break;
r = put_user(*(rd_buf + result), buf); - if (r) + if (r) { + kfree(rd_buf); return r; /* r = -EFAULT */ + }
buf += 1; size -= 1; @@ -1546,8 +1554,10 @@ static ssize_t dp_dsc_slice_width_read(struct file *f, char __user *buf, break; }
- if (!pipe_ctx) + if (!pipe_ctx) { + kfree(rd_buf); return -ENXIO; + }
dsc = pipe_ctx->stream_res.dsc; if (dsc) @@ -1563,8 +1573,10 @@ static ssize_t dp_dsc_slice_width_read(struct file *f, char __user *buf, break;
r = put_user(*(rd_buf + result), buf); - if (r) + if (r) { + kfree(rd_buf); return r; /* r = -EFAULT */ + }
buf += 1; size -= 1; @@ -1731,8 +1743,10 @@ static ssize_t dp_dsc_slice_height_read(struct file *f, char __user *buf, break; }
- if (!pipe_ctx) + if (!pipe_ctx) { + kfree(rd_buf); return -ENXIO; + }
dsc = pipe_ctx->stream_res.dsc; if (dsc) @@ -1748,8 +1762,10 @@ static ssize_t dp_dsc_slice_height_read(struct file *f, char __user *buf, break;
r = put_user(*(rd_buf + result), buf); - if (r) + if (r) { + kfree(rd_buf); return r; /* r = -EFAULT */ + }
buf += 1; size -= 1; @@ -1912,8 +1928,10 @@ static ssize_t dp_dsc_bits_per_pixel_read(struct file *f, char __user *buf, break; }
- if (!pipe_ctx) + if (!pipe_ctx) { + kfree(rd_buf); return -ENXIO; + }
dsc = pipe_ctx->stream_res.dsc; if (dsc) @@ -1929,8 +1947,10 @@ static ssize_t dp_dsc_bits_per_pixel_read(struct file *f, char __user *buf, break;
r = put_user(*(rd_buf + result), buf); - if (r) + if (r) { + kfree(rd_buf); return r; /* r = -EFAULT */ + }
buf += 1; size -= 1; @@ -2088,8 +2108,10 @@ static ssize_t dp_dsc_pic_width_read(struct file *f, char __user *buf, break; }
- if (!pipe_ctx) + if (!pipe_ctx) { + kfree(rd_buf); return -ENXIO; + }
dsc = pipe_ctx->stream_res.dsc; if (dsc) @@ -2105,8 +2127,10 @@ static ssize_t dp_dsc_pic_width_read(struct file *f, char __user *buf, break;
r = put_user(*(rd_buf + result), buf); - if (r) + if (r) { + kfree(rd_buf); return r; /* r = -EFAULT */ + }
buf += 1; size -= 1; @@ -2145,8 +2169,10 @@ static ssize_t dp_dsc_pic_height_read(struct file *f, char __user *buf, break; }
- if (!pipe_ctx) + if (!pipe_ctx) { + kfree(rd_buf); return -ENXIO; + }
dsc = pipe_ctx->stream_res.dsc; if (dsc) @@ -2162,8 +2188,10 @@ static ssize_t dp_dsc_pic_height_read(struct file *f, char __user *buf, break;
r = put_user(*(rd_buf + result), buf); - if (r) + if (r) { + kfree(rd_buf); return r; /* r = -EFAULT */ + }
buf += 1; size -= 1; @@ -2217,8 +2245,10 @@ static ssize_t dp_dsc_chunk_size_read(struct file *f, char __user *buf, break; }
- if (!pipe_ctx) + if (!pipe_ctx) { + kfree(rd_buf); return -ENXIO; + }
dsc = pipe_ctx->stream_res.dsc; if (dsc) @@ -2234,8 +2264,10 @@ static ssize_t dp_dsc_chunk_size_read(struct file *f, char __user *buf, break;
r = put_user(*(rd_buf + result), buf); - if (r) + if (r) { + kfree(rd_buf); return r; /* r = -EFAULT */ + }
buf += 1; size -= 1; @@ -2289,8 +2321,10 @@ static ssize_t dp_dsc_slice_bpg_offset_read(struct file *f, char __user *buf, break; }
- if (!pipe_ctx) + if (!pipe_ctx) { + kfree(rd_buf); return -ENXIO; + }
dsc = pipe_ctx->stream_res.dsc; if (dsc) @@ -2306,8 +2340,10 @@ static ssize_t dp_dsc_slice_bpg_offset_read(struct file *f, char __user *buf, break;
r = put_user(*(rd_buf + result), buf); - if (r) + if (r) { + kfree(rd_buf); return r; /* r = -EFAULT */ + }
buf += 1; size -= 1; @@ -3459,8 +3495,10 @@ static ssize_t dcc_en_bits_read( dc->hwss.get_dcc_en_bits(dc, dcc_en_bits);
rd_buf = kcalloc(rd_buf_size, sizeof(char), GFP_KERNEL); - if (!rd_buf) + if (!rd_buf) { + kfree(dcc_en_bits); return -ENOMEM; + }
for (i = 0; i < num_pipes; i++) offset += snprintf(rd_buf + offset, rd_buf_size - offset, @@ -3473,8 +3511,10 @@ static ssize_t dcc_en_bits_read( if (*pos >= rd_buf_size) break; r = put_user(*(rd_buf + result), buf); - if (r) + if (r) { + kfree(rd_buf); return r; /* r = -EFAULT */ + } buf += 1; size -= 1; *pos += 1;
From: Nicholas Kazlauskas nicholas.kazlauskas@amd.com
[ Upstream commit b80ddeb29d9df449f875f0b6f5de08d7537c02b8 ]
[Why] If the DPCD caps specifies a PSR version newer than PSR_VERSION_1 then we fallback to using PSR_VERSION_1 in amdgpu_dm_set_psr_caps.
This gets overriden with the raw DPCD value in amdgpu_dm_link_setup_psr, which can result in DMCUB hanging if we pass in an unsupported PSR version number.
[How] Fix the hang by using link->psr_settings.psr_version directly during amdgpu_dm_link_setup_psr.
Tested-by: Daniel Wheeler daniel.wheeler@amd.com Reviewed-by: Anthony Koo Anthony.Koo@amd.com Acked-by: Rodrigo Siqueira Rodrigo.Siqueira@amd.com Signed-off-by: Nicholas Kazlauskas nicholas.kazlauskas@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c index c510638b4f99..a009fc654ac9 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c @@ -149,10 +149,8 @@ bool amdgpu_dm_link_setup_psr(struct dc_stream_state *stream)
link = stream->link;
- psr_config.psr_version = link->dpcd_caps.psr_caps.psr_version; - - if (psr_config.psr_version > 0) { - psr_config.psr_exit_link_training_required = 0x1; + if (link->psr_settings.psr_version != DC_PSR_VERSION_UNSUPPORTED) { + psr_config.psr_version = link->psr_settings.psr_version; psr_config.psr_frame_capture_indication_req = 0; psr_config.psr_rfb_setup_time = 0x37; psr_config.psr_sdp_transmit_line_num_deadline = 0x20;
From: Wayne Chang waynec@nvidia.com
[ Upstream commit 62fb61580eb48fc890b7bc9fb5fd263367baeca8 ]
According to the Tegra Technical Reference Manual, SPARAM is a read-only register and should not be programmed in the driver.
The change removes the wrong SPARAM usage.
Signed-off-by: Wayne Chang waynec@nvidia.com Link: https://lore.kernel.org/r/20220107090443.149021-1-waynec@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/gadget/udc/tegra-xudc.c | 8 -------- 1 file changed, 8 deletions(-)
diff --git a/drivers/usb/gadget/udc/tegra-xudc.c b/drivers/usb/gadget/udc/tegra-xudc.c index 43f1b0d461c1..716d9ab2d2ff 100644 --- a/drivers/usb/gadget/udc/tegra-xudc.c +++ b/drivers/usb/gadget/udc/tegra-xudc.c @@ -32,9 +32,6 @@ #include <linux/workqueue.h>
/* XUSB_DEV registers */ -#define SPARAM 0x000 -#define SPARAM_ERSTMAX_MASK GENMASK(20, 16) -#define SPARAM_ERSTMAX(x) (((x) << 16) & SPARAM_ERSTMAX_MASK) #define DB 0x004 #define DB_TARGET_MASK GENMASK(15, 8) #define DB_TARGET(x) (((x) << 8) & DB_TARGET_MASK) @@ -3295,11 +3292,6 @@ static void tegra_xudc_init_event_ring(struct tegra_xudc *xudc) unsigned int i; u32 val;
- val = xudc_readl(xudc, SPARAM); - val &= ~(SPARAM_ERSTMAX_MASK); - val |= SPARAM_ERSTMAX(XUDC_NR_EVENT_RINGS); - xudc_writel(xudc, val, SPARAM); - for (i = 0; i < ARRAY_SIZE(xudc->event_ring); i++) { memset(xudc->event_ring[i], 0, XUDC_EVENT_RING_SIZE * sizeof(*xudc->event_ring[i]));
From: Wayne Chang waynec@nvidia.com
[ Upstream commit 7bd42fb95eb4f98495ccadf467ad15124208ec49 ]
According to the Tegra Technical Reference Manual, the seq_num field of control endpoint is not [31:24] but [31:27]. Bit 24 is reserved and bit 26 is splitxstate.
The change fixes the wrong control endpoint's definitions.
Signed-off-by: Wayne Chang waynec@nvidia.com Link: https://lore.kernel.org/r/20220107091349.149798-1-waynec@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/gadget/udc/tegra-xudc.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/gadget/udc/tegra-xudc.c b/drivers/usb/gadget/udc/tegra-xudc.c index 716d9ab2d2ff..be76f891b9c5 100644 --- a/drivers/usb/gadget/udc/tegra-xudc.c +++ b/drivers/usb/gadget/udc/tegra-xudc.c @@ -272,8 +272,10 @@ BUILD_EP_CONTEXT_RW(deq_hi, deq_hi, 0, 0xffffffff) BUILD_EP_CONTEXT_RW(avg_trb_len, tx_info, 0, 0xffff) BUILD_EP_CONTEXT_RW(max_esit_payload, tx_info, 16, 0xffff) BUILD_EP_CONTEXT_RW(edtla, rsvd[0], 0, 0xffffff) -BUILD_EP_CONTEXT_RW(seq_num, rsvd[0], 24, 0xff) +BUILD_EP_CONTEXT_RW(rsvd, rsvd[0], 24, 0x1) BUILD_EP_CONTEXT_RW(partial_td, rsvd[0], 25, 0x1) +BUILD_EP_CONTEXT_RW(splitxstate, rsvd[0], 26, 0x1) +BUILD_EP_CONTEXT_RW(seq_num, rsvd[0], 27, 0x1f) BUILD_EP_CONTEXT_RW(cerrcnt, rsvd[1], 18, 0x3) BUILD_EP_CONTEXT_RW(data_offset, rsvd[2], 0, 0x1ffff) BUILD_EP_CONTEXT_RW(numtrbs, rsvd[2], 22, 0x1f) @@ -1554,6 +1556,9 @@ static int __tegra_xudc_ep_set_halt(struct tegra_xudc_ep *ep, bool halt) ep_reload(xudc, ep->index);
ep_ctx_write_state(ep->context, EP_STATE_RUNNING); + ep_ctx_write_rsvd(ep->context, 0); + ep_ctx_write_partial_td(ep->context, 0); + ep_ctx_write_splitxstate(ep->context, 0); ep_ctx_write_seq_num(ep->context, 0);
ep_reload(xudc, ep->index); @@ -2809,7 +2814,10 @@ static void tegra_xudc_reset(struct tegra_xudc *xudc) xudc->setup_seq_num = 0; xudc->queued_setup_packet = false;
- ep_ctx_write_seq_num(ep0->context, xudc->setup_seq_num); + ep_ctx_write_rsvd(ep0->context, 0); + ep_ctx_write_partial_td(ep0->context, 0); + ep_ctx_write_splitxstate(ep0->context, 0); + ep_ctx_write_seq_num(ep0->context, 0);
deq_ptr = trb_virt_to_phys(ep0, &ep0->transfer_ring[ep0->deq_ptr]);
From: Pawel Laszczak pawell@cadence.com
[ Upstream commit 03db9289b5ab59437e42a111a34545a7cedb5190 ]
Variable ret in function cdnsp_decode_trb is initialized but not used. To fix this compiler warning patch adds checking whether the data buffer has not been overflowed.
Reported-by: kernel test robot lkp@intel.com Signed-off-by: Pawel Laszczak pawell@cadence.com Link: https://lore.kernel.org/r/20220112053237.14309-1-pawell@gli-login.cadence.co... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/cdns3/cdnsp-debug.h | 305 ++++++++++++++++---------------- 1 file changed, 154 insertions(+), 151 deletions(-)
diff --git a/drivers/usb/cdns3/cdnsp-debug.h b/drivers/usb/cdns3/cdnsp-debug.h index a8776df2d4e0..f0ca865cce2a 100644 --- a/drivers/usb/cdns3/cdnsp-debug.h +++ b/drivers/usb/cdns3/cdnsp-debug.h @@ -182,208 +182,211 @@ static inline const char *cdnsp_decode_trb(char *str, size_t size, u32 field0, int ep_id = TRB_TO_EP_INDEX(field3) - 1; int type = TRB_FIELD_TO_TYPE(field3); unsigned int ep_num; - int ret = 0; + int ret; u32 temp;
ep_num = DIV_ROUND_UP(ep_id, 2);
switch (type) { case TRB_LINK: - ret += snprintf(str, size, - "LINK %08x%08x intr %ld type '%s' flags %c:%c:%c:%c", - field1, field0, GET_INTR_TARGET(field2), - cdnsp_trb_type_string(type), - field3 & TRB_IOC ? 'I' : 'i', - field3 & TRB_CHAIN ? 'C' : 'c', - field3 & TRB_TC ? 'T' : 't', - field3 & TRB_CYCLE ? 'C' : 'c'); + ret = snprintf(str, size, + "LINK %08x%08x intr %ld type '%s' flags %c:%c:%c:%c", + field1, field0, GET_INTR_TARGET(field2), + cdnsp_trb_type_string(type), + field3 & TRB_IOC ? 'I' : 'i', + field3 & TRB_CHAIN ? 'C' : 'c', + field3 & TRB_TC ? 'T' : 't', + field3 & TRB_CYCLE ? 'C' : 'c'); break; case TRB_TRANSFER: case TRB_COMPLETION: case TRB_PORT_STATUS: case TRB_HC_EVENT: - ret += snprintf(str, size, - "ep%d%s(%d) type '%s' TRB %08x%08x status '%s'" - " len %ld slot %ld flags %c:%c", - ep_num, ep_id % 2 ? "out" : "in", - TRB_TO_EP_INDEX(field3), - cdnsp_trb_type_string(type), field1, field0, - cdnsp_trb_comp_code_string(GET_COMP_CODE(field2)), - EVENT_TRB_LEN(field2), TRB_TO_SLOT_ID(field3), - field3 & EVENT_DATA ? 'E' : 'e', - field3 & TRB_CYCLE ? 'C' : 'c'); + ret = snprintf(str, size, + "ep%d%s(%d) type '%s' TRB %08x%08x status '%s'" + " len %ld slot %ld flags %c:%c", + ep_num, ep_id % 2 ? "out" : "in", + TRB_TO_EP_INDEX(field3), + cdnsp_trb_type_string(type), field1, field0, + cdnsp_trb_comp_code_string(GET_COMP_CODE(field2)), + EVENT_TRB_LEN(field2), TRB_TO_SLOT_ID(field3), + field3 & EVENT_DATA ? 'E' : 'e', + field3 & TRB_CYCLE ? 'C' : 'c'); break; case TRB_MFINDEX_WRAP: - ret += snprintf(str, size, "%s: flags %c", - cdnsp_trb_type_string(type), - field3 & TRB_CYCLE ? 'C' : 'c'); + ret = snprintf(str, size, "%s: flags %c", + cdnsp_trb_type_string(type), + field3 & TRB_CYCLE ? 'C' : 'c'); break; case TRB_SETUP: - ret += snprintf(str, size, - "type '%s' bRequestType %02x bRequest %02x " - "wValue %02x%02x wIndex %02x%02x wLength %d " - "length %ld TD size %ld intr %ld Setup ID %ld " - "flags %c:%c:%c", - cdnsp_trb_type_string(type), - field0 & 0xff, - (field0 & 0xff00) >> 8, - (field0 & 0xff000000) >> 24, - (field0 & 0xff0000) >> 16, - (field1 & 0xff00) >> 8, - field1 & 0xff, - (field1 & 0xff000000) >> 16 | - (field1 & 0xff0000) >> 16, - TRB_LEN(field2), GET_TD_SIZE(field2), - GET_INTR_TARGET(field2), - TRB_SETUPID_TO_TYPE(field3), - field3 & TRB_IDT ? 'D' : 'd', - field3 & TRB_IOC ? 'I' : 'i', - field3 & TRB_CYCLE ? 'C' : 'c'); + ret = snprintf(str, size, + "type '%s' bRequestType %02x bRequest %02x " + "wValue %02x%02x wIndex %02x%02x wLength %d " + "length %ld TD size %ld intr %ld Setup ID %ld " + "flags %c:%c:%c", + cdnsp_trb_type_string(type), + field0 & 0xff, + (field0 & 0xff00) >> 8, + (field0 & 0xff000000) >> 24, + (field0 & 0xff0000) >> 16, + (field1 & 0xff00) >> 8, + field1 & 0xff, + (field1 & 0xff000000) >> 16 | + (field1 & 0xff0000) >> 16, + TRB_LEN(field2), GET_TD_SIZE(field2), + GET_INTR_TARGET(field2), + TRB_SETUPID_TO_TYPE(field3), + field3 & TRB_IDT ? 'D' : 'd', + field3 & TRB_IOC ? 'I' : 'i', + field3 & TRB_CYCLE ? 'C' : 'c'); break; case TRB_DATA: - ret += snprintf(str, size, - "type '%s' Buffer %08x%08x length %ld TD size %ld " - "intr %ld flags %c:%c:%c:%c:%c:%c:%c", - cdnsp_trb_type_string(type), - field1, field0, TRB_LEN(field2), - GET_TD_SIZE(field2), - GET_INTR_TARGET(field2), - field3 & TRB_IDT ? 'D' : 'i', - field3 & TRB_IOC ? 'I' : 'i', - field3 & TRB_CHAIN ? 'C' : 'c', - field3 & TRB_NO_SNOOP ? 'S' : 's', - field3 & TRB_ISP ? 'I' : 'i', - field3 & TRB_ENT ? 'E' : 'e', - field3 & TRB_CYCLE ? 'C' : 'c'); + ret = snprintf(str, size, + "type '%s' Buffer %08x%08x length %ld TD size %ld " + "intr %ld flags %c:%c:%c:%c:%c:%c:%c", + cdnsp_trb_type_string(type), + field1, field0, TRB_LEN(field2), + GET_TD_SIZE(field2), + GET_INTR_TARGET(field2), + field3 & TRB_IDT ? 'D' : 'i', + field3 & TRB_IOC ? 'I' : 'i', + field3 & TRB_CHAIN ? 'C' : 'c', + field3 & TRB_NO_SNOOP ? 'S' : 's', + field3 & TRB_ISP ? 'I' : 'i', + field3 & TRB_ENT ? 'E' : 'e', + field3 & TRB_CYCLE ? 'C' : 'c'); break; case TRB_STATUS: - ret += snprintf(str, size, - "Buffer %08x%08x length %ld TD size %ld intr" - "%ld type '%s' flags %c:%c:%c:%c", - field1, field0, TRB_LEN(field2), - GET_TD_SIZE(field2), - GET_INTR_TARGET(field2), - cdnsp_trb_type_string(type), - field3 & TRB_IOC ? 'I' : 'i', - field3 & TRB_CHAIN ? 'C' : 'c', - field3 & TRB_ENT ? 'E' : 'e', - field3 & TRB_CYCLE ? 'C' : 'c'); + ret = snprintf(str, size, + "Buffer %08x%08x length %ld TD size %ld intr" + "%ld type '%s' flags %c:%c:%c:%c", + field1, field0, TRB_LEN(field2), + GET_TD_SIZE(field2), + GET_INTR_TARGET(field2), + cdnsp_trb_type_string(type), + field3 & TRB_IOC ? 'I' : 'i', + field3 & TRB_CHAIN ? 'C' : 'c', + field3 & TRB_ENT ? 'E' : 'e', + field3 & TRB_CYCLE ? 'C' : 'c'); break; case TRB_NORMAL: case TRB_ISOC: case TRB_EVENT_DATA: case TRB_TR_NOOP: - ret += snprintf(str, size, - "type '%s' Buffer %08x%08x length %ld " - "TD size %ld intr %ld " - "flags %c:%c:%c:%c:%c:%c:%c:%c:%c", - cdnsp_trb_type_string(type), - field1, field0, TRB_LEN(field2), - GET_TD_SIZE(field2), - GET_INTR_TARGET(field2), - field3 & TRB_BEI ? 'B' : 'b', - field3 & TRB_IDT ? 'T' : 't', - field3 & TRB_IOC ? 'I' : 'i', - field3 & TRB_CHAIN ? 'C' : 'c', - field3 & TRB_NO_SNOOP ? 'S' : 's', - field3 & TRB_ISP ? 'I' : 'i', - field3 & TRB_ENT ? 'E' : 'e', - field3 & TRB_CYCLE ? 'C' : 'c', - !(field3 & TRB_EVENT_INVALIDATE) ? 'V' : 'v'); + ret = snprintf(str, size, + "type '%s' Buffer %08x%08x length %ld " + "TD size %ld intr %ld " + "flags %c:%c:%c:%c:%c:%c:%c:%c:%c", + cdnsp_trb_type_string(type), + field1, field0, TRB_LEN(field2), + GET_TD_SIZE(field2), + GET_INTR_TARGET(field2), + field3 & TRB_BEI ? 'B' : 'b', + field3 & TRB_IDT ? 'T' : 't', + field3 & TRB_IOC ? 'I' : 'i', + field3 & TRB_CHAIN ? 'C' : 'c', + field3 & TRB_NO_SNOOP ? 'S' : 's', + field3 & TRB_ISP ? 'I' : 'i', + field3 & TRB_ENT ? 'E' : 'e', + field3 & TRB_CYCLE ? 'C' : 'c', + !(field3 & TRB_EVENT_INVALIDATE) ? 'V' : 'v'); break; case TRB_CMD_NOOP: case TRB_ENABLE_SLOT: - ret += snprintf(str, size, "%s: flags %c", - cdnsp_trb_type_string(type), - field3 & TRB_CYCLE ? 'C' : 'c'); + ret = snprintf(str, size, "%s: flags %c", + cdnsp_trb_type_string(type), + field3 & TRB_CYCLE ? 'C' : 'c'); break; case TRB_DISABLE_SLOT: - ret += snprintf(str, size, "%s: slot %ld flags %c", - cdnsp_trb_type_string(type), - TRB_TO_SLOT_ID(field3), - field3 & TRB_CYCLE ? 'C' : 'c'); + ret = snprintf(str, size, "%s: slot %ld flags %c", + cdnsp_trb_type_string(type), + TRB_TO_SLOT_ID(field3), + field3 & TRB_CYCLE ? 'C' : 'c'); break; case TRB_ADDR_DEV: - ret += snprintf(str, size, - "%s: ctx %08x%08x slot %ld flags %c:%c", - cdnsp_trb_type_string(type), field1, field0, - TRB_TO_SLOT_ID(field3), - field3 & TRB_BSR ? 'B' : 'b', - field3 & TRB_CYCLE ? 'C' : 'c'); + ret = snprintf(str, size, + "%s: ctx %08x%08x slot %ld flags %c:%c", + cdnsp_trb_type_string(type), field1, field0, + TRB_TO_SLOT_ID(field3), + field3 & TRB_BSR ? 'B' : 'b', + field3 & TRB_CYCLE ? 'C' : 'c'); break; case TRB_CONFIG_EP: - ret += snprintf(str, size, - "%s: ctx %08x%08x slot %ld flags %c:%c", - cdnsp_trb_type_string(type), field1, field0, - TRB_TO_SLOT_ID(field3), - field3 & TRB_DC ? 'D' : 'd', - field3 & TRB_CYCLE ? 'C' : 'c'); + ret = snprintf(str, size, + "%s: ctx %08x%08x slot %ld flags %c:%c", + cdnsp_trb_type_string(type), field1, field0, + TRB_TO_SLOT_ID(field3), + field3 & TRB_DC ? 'D' : 'd', + field3 & TRB_CYCLE ? 'C' : 'c'); break; case TRB_EVAL_CONTEXT: - ret += snprintf(str, size, - "%s: ctx %08x%08x slot %ld flags %c", - cdnsp_trb_type_string(type), field1, field0, - TRB_TO_SLOT_ID(field3), - field3 & TRB_CYCLE ? 'C' : 'c'); + ret = snprintf(str, size, + "%s: ctx %08x%08x slot %ld flags %c", + cdnsp_trb_type_string(type), field1, field0, + TRB_TO_SLOT_ID(field3), + field3 & TRB_CYCLE ? 'C' : 'c'); break; case TRB_RESET_EP: case TRB_HALT_ENDPOINT: case TRB_FLUSH_ENDPOINT: - ret += snprintf(str, size, - "%s: ep%d%s(%d) ctx %08x%08x slot %ld flags %c", - cdnsp_trb_type_string(type), - ep_num, ep_id % 2 ? "out" : "in", - TRB_TO_EP_INDEX(field3), field1, field0, - TRB_TO_SLOT_ID(field3), - field3 & TRB_CYCLE ? 'C' : 'c'); + ret = snprintf(str, size, + "%s: ep%d%s(%d) ctx %08x%08x slot %ld flags %c", + cdnsp_trb_type_string(type), + ep_num, ep_id % 2 ? "out" : "in", + TRB_TO_EP_INDEX(field3), field1, field0, + TRB_TO_SLOT_ID(field3), + field3 & TRB_CYCLE ? 'C' : 'c'); break; case TRB_STOP_RING: - ret += snprintf(str, size, - "%s: ep%d%s(%d) slot %ld sp %d flags %c", - cdnsp_trb_type_string(type), - ep_num, ep_id % 2 ? "out" : "in", - TRB_TO_EP_INDEX(field3), - TRB_TO_SLOT_ID(field3), - TRB_TO_SUSPEND_PORT(field3), - field3 & TRB_CYCLE ? 'C' : 'c'); + ret = snprintf(str, size, + "%s: ep%d%s(%d) slot %ld sp %d flags %c", + cdnsp_trb_type_string(type), + ep_num, ep_id % 2 ? "out" : "in", + TRB_TO_EP_INDEX(field3), + TRB_TO_SLOT_ID(field3), + TRB_TO_SUSPEND_PORT(field3), + field3 & TRB_CYCLE ? 'C' : 'c'); break; case TRB_SET_DEQ: - ret += snprintf(str, size, - "%s: ep%d%s(%d) deq %08x%08x stream %ld slot %ld flags %c", - cdnsp_trb_type_string(type), - ep_num, ep_id % 2 ? "out" : "in", - TRB_TO_EP_INDEX(field3), field1, field0, - TRB_TO_STREAM_ID(field2), - TRB_TO_SLOT_ID(field3), - field3 & TRB_CYCLE ? 'C' : 'c'); + ret = snprintf(str, size, + "%s: ep%d%s(%d) deq %08x%08x stream %ld slot %ld flags %c", + cdnsp_trb_type_string(type), + ep_num, ep_id % 2 ? "out" : "in", + TRB_TO_EP_INDEX(field3), field1, field0, + TRB_TO_STREAM_ID(field2), + TRB_TO_SLOT_ID(field3), + field3 & TRB_CYCLE ? 'C' : 'c'); break; case TRB_RESET_DEV: - ret += snprintf(str, size, "%s: slot %ld flags %c", - cdnsp_trb_type_string(type), - TRB_TO_SLOT_ID(field3), - field3 & TRB_CYCLE ? 'C' : 'c'); + ret = snprintf(str, size, "%s: slot %ld flags %c", + cdnsp_trb_type_string(type), + TRB_TO_SLOT_ID(field3), + field3 & TRB_CYCLE ? 'C' : 'c'); break; case TRB_ENDPOINT_NRDY: - temp = TRB_TO_HOST_STREAM(field2); - - ret += snprintf(str, size, - "%s: ep%d%s(%d) H_SID %x%s%s D_SID %lx flags %c:%c", - cdnsp_trb_type_string(type), - ep_num, ep_id % 2 ? "out" : "in", - TRB_TO_EP_INDEX(field3), temp, - temp == STREAM_PRIME_ACK ? "(PRIME)" : "", - temp == STREAM_REJECTED ? "(REJECTED)" : "", - TRB_TO_DEV_STREAM(field0), - field3 & TRB_STAT ? 'S' : 's', - field3 & TRB_CYCLE ? 'C' : 'c'); + temp = TRB_TO_HOST_STREAM(field2); + + ret = snprintf(str, size, + "%s: ep%d%s(%d) H_SID %x%s%s D_SID %lx flags %c:%c", + cdnsp_trb_type_string(type), + ep_num, ep_id % 2 ? "out" : "in", + TRB_TO_EP_INDEX(field3), temp, + temp == STREAM_PRIME_ACK ? "(PRIME)" : "", + temp == STREAM_REJECTED ? "(REJECTED)" : "", + TRB_TO_DEV_STREAM(field0), + field3 & TRB_STAT ? 'S' : 's', + field3 & TRB_CYCLE ? 'C' : 'c'); break; default: - ret += snprintf(str, size, - "type '%s' -> raw %08x %08x %08x %08x", - cdnsp_trb_type_string(type), - field0, field1, field2, field3); + ret = snprintf(str, size, + "type '%s' -> raw %08x %08x %08x %08x", + cdnsp_trb_type_string(type), + field0, field1, field2, field3); }
+ if (ret >= size) + pr_info("CDNSP: buffer overflowed.\n"); + return str; }
From: Yang Guang yang.guang5@zte.com.cn
[ Upstream commit e2cf07654efb0fd7bbcb475c6f74be7b5755a8fd ]
coccinelle report: ./drivers/ptp/ptp_sysfs.c:17:8-16: WARNING: use scnprintf or sprintf ./drivers/ptp/ptp_sysfs.c:390:8-16: WARNING: use scnprintf or sprintf
Use sysfs_emit instead of scnprintf or sprintf makes more sense.
Reported-by: Zeal Robot zealci@zte.com.cn Signed-off-by: Yang Guang yang.guang5@zte.com.cn Signed-off-by: David Yang davidcomponentone@gmail.com Acked-by: Richard Cochran richardcochran@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/ptp/ptp_sysfs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/ptp/ptp_sysfs.c b/drivers/ptp/ptp_sysfs.c index 41b92dc2f011..9233bfedeb17 100644 --- a/drivers/ptp/ptp_sysfs.c +++ b/drivers/ptp/ptp_sysfs.c @@ -14,7 +14,7 @@ static ssize_t clock_name_show(struct device *dev, struct device_attribute *attr, char *page) { struct ptp_clock *ptp = dev_get_drvdata(dev); - return snprintf(page, PAGE_SIZE-1, "%s\n", ptp->info->name); + return sysfs_emit(page, "%s\n", ptp->info->name); } static DEVICE_ATTR_RO(clock_name);
@@ -387,7 +387,7 @@ static ssize_t ptp_pin_show(struct device *dev, struct device_attribute *attr,
mutex_unlock(&ptp->pincfg_mux);
- return snprintf(page, PAGE_SIZE, "%u %u\n", func, chan); + return sysfs_emit(page, "%u %u\n", func, chan); }
static ssize_t ptp_pin_store(struct device *dev, struct device_attribute *attr,
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit 0b94f2651f56b9e4aa5f012b0d7eb57308c773cf ]
hci_cmd_sync_queue shall return an error if HCI_UNREGISTER flag has been set as that means hci_unregister_dev has been called so it will likely cause a uaf after the timeout as the hdev will be freed.
Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/hci_sync.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c index 48c837530a11..8f4c5698913d 100644 --- a/net/bluetooth/hci_sync.c +++ b/net/bluetooth/hci_sync.c @@ -379,6 +379,9 @@ int hci_cmd_sync_queue(struct hci_dev *hdev, hci_cmd_sync_work_func_t func, { struct hci_cmd_sync_work_entry *entry;
+ if (hci_dev_test_flag(hdev, HCI_UNREGISTER)) + return -ENODEV; + entry = kmalloc(sizeof(*entry), GFP_KERNEL); if (!entry) return -ENOMEM;
From: Magnus Karlsson magnus.karlsson@intel.com
[ Upstream commit 3b22523bca02b0d5618c08b93d8fd1fb578e1cc3 ]
After commit 710ad98c363a ("veth: Do not record rx queue hint in veth_xmit"), veth no longer receives traffic on the same queue as it was sent on. This breaks the bpf_res test for the AF_XDP selftests as the socket tied to queue 1 will not receive traffic anymore.
Modify the test so that two sockets are tied to queue id 0 using a shared umem instead. When killing the first socket enter the second socket into the xskmap so that traffic will flow to it. This will still test that the resources are not cleaned up until after the second socket dies, without having to rely on veth supporting rx_queue hints.
Reported-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Signed-off-by: Magnus Karlsson magnus.karlsson@intel.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Reviewed-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Acked-by: John Fastabend john.fastabend@gmail.com Link: https://lore.kernel.org/bpf/20220125082945.26179-1-magnus.karlsson@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/bpf/xdpxceiver.c | 80 +++++++++++++++--------- tools/testing/selftests/bpf/xdpxceiver.h | 2 +- 2 files changed, 50 insertions(+), 32 deletions(-)
diff --git a/tools/testing/selftests/bpf/xdpxceiver.c b/tools/testing/selftests/bpf/xdpxceiver.c index ffa5502ad95e..5f8296d29e77 100644 --- a/tools/testing/selftests/bpf/xdpxceiver.c +++ b/tools/testing/selftests/bpf/xdpxceiver.c @@ -266,22 +266,24 @@ static int xsk_configure_umem(struct xsk_umem_info *umem, void *buffer, u64 size }
static int xsk_configure_socket(struct xsk_socket_info *xsk, struct xsk_umem_info *umem, - struct ifobject *ifobject, u32 qid) + struct ifobject *ifobject, bool shared) { - struct xsk_socket_config cfg; + struct xsk_socket_config cfg = {}; struct xsk_ring_cons *rxr; struct xsk_ring_prod *txr;
xsk->umem = umem; cfg.rx_size = xsk->rxqsize; cfg.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS; - cfg.libbpf_flags = 0; + cfg.libbpf_flags = XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD; cfg.xdp_flags = ifobject->xdp_flags; cfg.bind_flags = ifobject->bind_flags; + if (shared) + cfg.bind_flags |= XDP_SHARED_UMEM;
txr = ifobject->tx_on ? &xsk->tx : NULL; rxr = ifobject->rx_on ? &xsk->rx : NULL; - return xsk_socket__create(&xsk->xsk, ifobject->ifname, qid, umem->umem, rxr, txr, &cfg); + return xsk_socket__create(&xsk->xsk, ifobject->ifname, 0, umem->umem, rxr, txr, &cfg); }
static struct option long_options[] = { @@ -387,7 +389,6 @@ static void __test_spec_init(struct test_spec *test, struct ifobject *ifobj_tx, for (i = 0; i < MAX_INTERFACES; i++) { struct ifobject *ifobj = i ? ifobj_rx : ifobj_tx;
- ifobj->umem = &ifobj->umem_arr[0]; ifobj->xsk = &ifobj->xsk_arr[0]; ifobj->use_poll = false; ifobj->pacing_on = true; @@ -401,11 +402,12 @@ static void __test_spec_init(struct test_spec *test, struct ifobject *ifobj_tx, ifobj->tx_on = false; }
+ memset(ifobj->umem, 0, sizeof(*ifobj->umem)); + ifobj->umem->num_frames = DEFAULT_UMEM_BUFFERS; + ifobj->umem->frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE; + for (j = 0; j < MAX_SOCKETS; j++) { - memset(&ifobj->umem_arr[j], 0, sizeof(ifobj->umem_arr[j])); memset(&ifobj->xsk_arr[j], 0, sizeof(ifobj->xsk_arr[j])); - ifobj->umem_arr[j].num_frames = DEFAULT_UMEM_BUFFERS; - ifobj->umem_arr[j].frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE; ifobj->xsk_arr[j].rxqsize = XSK_RING_CONS__DEFAULT_NUM_DESCS; } } @@ -950,7 +952,10 @@ static void tx_stats_validate(struct ifobject *ifobject)
static void thread_common_ops(struct test_spec *test, struct ifobject *ifobject) { + u64 umem_sz = ifobject->umem->num_frames * ifobject->umem->frame_size; int mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE; + int ret, ifindex; + void *bufs; u32 i;
ifobject->ns_fd = switch_namespace(ifobject->nsname); @@ -958,23 +963,20 @@ static void thread_common_ops(struct test_spec *test, struct ifobject *ifobject) if (ifobject->umem->unaligned_mode) mmap_flags |= MAP_HUGETLB;
- for (i = 0; i < test->nb_sockets; i++) { - u64 umem_sz = ifobject->umem->num_frames * ifobject->umem->frame_size; - u32 ctr = 0; - void *bufs; - int ret; + bufs = mmap(NULL, umem_sz, PROT_READ | PROT_WRITE, mmap_flags, -1, 0); + if (bufs == MAP_FAILED) + exit_with_error(errno);
- bufs = mmap(NULL, umem_sz, PROT_READ | PROT_WRITE, mmap_flags, -1, 0); - if (bufs == MAP_FAILED) - exit_with_error(errno); + ret = xsk_configure_umem(ifobject->umem, bufs, umem_sz); + if (ret) + exit_with_error(-ret);
- ret = xsk_configure_umem(&ifobject->umem_arr[i], bufs, umem_sz); - if (ret) - exit_with_error(-ret); + for (i = 0; i < test->nb_sockets; i++) { + u32 ctr = 0;
while (ctr++ < SOCK_RECONF_CTR) { - ret = xsk_configure_socket(&ifobject->xsk_arr[i], &ifobject->umem_arr[i], - ifobject, i); + ret = xsk_configure_socket(&ifobject->xsk_arr[i], ifobject->umem, + ifobject, !!i); if (!ret) break;
@@ -985,8 +987,22 @@ static void thread_common_ops(struct test_spec *test, struct ifobject *ifobject) } }
- ifobject->umem = &ifobject->umem_arr[0]; ifobject->xsk = &ifobject->xsk_arr[0]; + + if (!ifobject->rx_on) + return; + + ifindex = if_nametoindex(ifobject->ifname); + if (!ifindex) + exit_with_error(errno); + + ret = xsk_setup_xdp_prog(ifindex, &ifobject->xsk_map_fd); + if (ret) + exit_with_error(-ret); + + ret = xsk_socket__update_xskmap(ifobject->xsk->xsk, ifobject->xsk_map_fd); + if (ret) + exit_with_error(-ret); }
static void testapp_cleanup_xsk_res(struct ifobject *ifobj) @@ -1142,14 +1158,16 @@ static void testapp_bidi(struct test_spec *test)
static void swap_xsk_resources(struct ifobject *ifobj_tx, struct ifobject *ifobj_rx) { + int ret; + xsk_socket__delete(ifobj_tx->xsk->xsk); - xsk_umem__delete(ifobj_tx->umem->umem); xsk_socket__delete(ifobj_rx->xsk->xsk); - xsk_umem__delete(ifobj_rx->umem->umem); - ifobj_tx->umem = &ifobj_tx->umem_arr[1]; ifobj_tx->xsk = &ifobj_tx->xsk_arr[1]; - ifobj_rx->umem = &ifobj_rx->umem_arr[1]; ifobj_rx->xsk = &ifobj_rx->xsk_arr[1]; + + ret = xsk_socket__update_xskmap(ifobj_rx->xsk->xsk, ifobj_rx->xsk_map_fd); + if (ret) + exit_with_error(-ret); }
static void testapp_bpf_res(struct test_spec *test) @@ -1408,13 +1426,13 @@ static struct ifobject *ifobject_create(void) if (!ifobj->xsk_arr) goto out_xsk_arr;
- ifobj->umem_arr = calloc(MAX_SOCKETS, sizeof(*ifobj->umem_arr)); - if (!ifobj->umem_arr) - goto out_umem_arr; + ifobj->umem = calloc(1, sizeof(*ifobj->umem)); + if (!ifobj->umem) + goto out_umem;
return ifobj;
-out_umem_arr: +out_umem: free(ifobj->xsk_arr); out_xsk_arr: free(ifobj); @@ -1423,7 +1441,7 @@ static struct ifobject *ifobject_create(void)
static void ifobject_delete(struct ifobject *ifobj) { - free(ifobj->umem_arr); + free(ifobj->umem); free(ifobj->xsk_arr); free(ifobj); } diff --git a/tools/testing/selftests/bpf/xdpxceiver.h b/tools/testing/selftests/bpf/xdpxceiver.h index 2f705f44b748..62a3e6388632 100644 --- a/tools/testing/selftests/bpf/xdpxceiver.h +++ b/tools/testing/selftests/bpf/xdpxceiver.h @@ -125,10 +125,10 @@ struct ifobject { struct xsk_socket_info *xsk; struct xsk_socket_info *xsk_arr; struct xsk_umem_info *umem; - struct xsk_umem_info *umem_arr; thread_func_t func_ptr; struct pkt_stream *pkt_stream; int ns_fd; + int xsk_map_fd; u32 dst_ip; u32 src_ip; u32 xdp_flags;
From: Roi Dayan roid@nvidia.com
[ Upstream commit eeed226ed110ed40598e60e29b66643012277be7 ]
In later commit we are going to instantiate multiple attr instances for flow instead of single attr. Parsing TC sample allocates a new memory but there is no symmetric cleanup in the infrastructure. To avoid asymmetric alloc/free use sample_attr as part of the flow attr and not allocated and held as a pointer. This will avoid a cleanup leak when sample action is not on the first attr.
Signed-off-by: Roi Dayan roid@nvidia.com Reviewed-by: Oz Shlomo ozsh@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/mellanox/mlx5/core/en/tc/act/sample.c | 7 +------ drivers/net/ethernet/mellanox/mlx5/core/en/tc/sample.c | 10 +++++----- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 1 - drivers/net/ethernet/mellanox/mlx5/core/en_tc.h | 2 +- .../net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 6 +++--- 5 files changed, 10 insertions(+), 16 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/sample.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/sample.c index 6699bdf5cf01..b895c378cfaf 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/sample.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/sample.c @@ -27,11 +27,7 @@ tc_act_parse_sample(struct mlx5e_tc_act_parse_state *parse_state, struct mlx5e_priv *priv, struct mlx5_flow_attr *attr) { - struct mlx5e_sample_attr *sample_attr; - - sample_attr = kzalloc(sizeof(*attr->sample_attr), GFP_KERNEL); - if (!sample_attr) - return -ENOMEM; + struct mlx5e_sample_attr *sample_attr = &attr->sample_attr;
sample_attr->rate = act->sample.rate; sample_attr->group_num = act->sample.psample_group->group_num; @@ -39,7 +35,6 @@ tc_act_parse_sample(struct mlx5e_tc_act_parse_state *parse_state, if (act->sample.truncate) sample_attr->trunc_size = act->sample.trunc_size;
- attr->sample_attr = sample_attr; flow_flag_set(parse_state->flow, SAMPLE);
return 0; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/sample.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/sample.c index ff4b4f8a5a9d..0faaf9a4b531 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/sample.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/sample.c @@ -513,7 +513,7 @@ mlx5e_tc_sample_offload(struct mlx5e_tc_psample *tc_psample, sample_flow = kzalloc(sizeof(*sample_flow), GFP_KERNEL); if (!sample_flow) return ERR_PTR(-ENOMEM); - sample_attr = attr->sample_attr; + sample_attr = &attr->sample_attr; sample_attr->sample_flow = sample_flow;
/* For NICs with reg_c_preserve support or decap action, use @@ -546,6 +546,7 @@ mlx5e_tc_sample_offload(struct mlx5e_tc_psample *tc_psample, err = PTR_ERR(sample_flow->sampler); goto err_sampler; } + sample_attr->sampler_id = sample_flow->sampler->sampler_id;
/* Create an id mapping reg_c0 value to sample object. */ restore_obj.type = MLX5_MAPPED_OBJ_SAMPLE; @@ -585,8 +586,7 @@ mlx5e_tc_sample_offload(struct mlx5e_tc_psample *tc_psample, pre_attr->outer_match_level = attr->outer_match_level; pre_attr->chain = attr->chain; pre_attr->prio = attr->prio; - pre_attr->sample_attr = attr->sample_attr; - sample_attr->sampler_id = sample_flow->sampler->sampler_id; + pre_attr->sample_attr = *sample_attr; pre_esw_attr = pre_attr->esw_attr; pre_esw_attr->in_mdev = esw_attr->in_mdev; pre_esw_attr->in_rep = esw_attr->in_rep; @@ -633,11 +633,11 @@ mlx5e_tc_sample_unoffload(struct mlx5e_tc_psample *tc_psample, * will hit fw syndromes. */ esw = tc_psample->esw; - sample_flow = attr->sample_attr->sample_flow; + sample_flow = attr->sample_attr.sample_flow; mlx5_eswitch_del_offloaded_rule(esw, sample_flow->pre_rule, sample_flow->pre_attr);
sample_restore_put(tc_psample, sample_flow->restore); - mapping_remove(esw->offloads.reg_c0_obj_pool, attr->sample_attr->restore_obj_id); + mapping_remove(esw->offloads.reg_c0_obj_pool, attr->sample_attr.restore_obj_id); sampler_put(tc_psample, sample_flow->sampler); if (sample_flow->post_act_handle) mlx5e_tc_post_act_del(tc_psample->post_act, sample_flow->post_act_handle); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index b27532a9301e..7e5c00349ccf 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -1634,7 +1634,6 @@ static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv, if (flow_flag_test(flow, L3_TO_L2_DECAP)) mlx5e_detach_decap(priv, flow);
- kfree(attr->sample_attr); kvfree(attr->esw_attr->rx_tun_attr); kvfree(attr->parse_attr); kfree(flow->attr); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h index 5ffae9b13066..2f09e34db9ff 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h @@ -71,7 +71,7 @@ struct mlx5_flow_attr { struct mlx5_fc *counter; struct mlx5_modify_hdr *modify_hdr; struct mlx5_ct_attr ct_attr; - struct mlx5e_sample_attr *sample_attr; + struct mlx5e_sample_attr sample_attr; struct mlx5e_tc_flow_parse_attr *parse_attr; u32 chain; u16 prio; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c index cfcd72bad9af..e7e7b4b0dcdb 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -201,12 +201,12 @@ esw_cleanup_decap_indir(struct mlx5_eswitch *esw, static int esw_setup_sampler_dest(struct mlx5_flow_destination *dest, struct mlx5_flow_act *flow_act, - struct mlx5_flow_attr *attr, + u32 sampler_id, int i) { flow_act->flags |= FLOW_ACT_IGNORE_FLOW_LEVEL; dest[i].type = MLX5_FLOW_DESTINATION_TYPE_FLOW_SAMPLER; - dest[i].sampler_id = attr->sample_attr->sampler_id; + dest[i].sampler_id = sampler_id;
return 0; } @@ -466,7 +466,7 @@ esw_setup_dests(struct mlx5_flow_destination *dest, attr->flags |= MLX5_ESW_ATTR_FLAG_SRC_REWRITE;
if (attr->flags & MLX5_ESW_ATTR_FLAG_SAMPLE) { - esw_setup_sampler_dest(dest, flow_act, attr, *i); + esw_setup_sampler_dest(dest, flow_act, attr->sample_attr.sampler_id, *i); (*i)++; } else if (attr->dest_ft) { esw_setup_ft_dest(dest, flow_act, esw, attr, spec, *i);
From: Philip Yang Philip.Yang@amd.com
[ Upstream commit ac7c48c0cce00d03b3c95fddcccb0a45257e33e3 ]
SVM ioctls take proper svms->lock to handle race conditions, don't need take process mutex to serialize ioctls. This also fixes circular locking warning:
WARNING: possible circular locking dependency detected
Possible unsafe locking scenario:
CPU0 CPU1 ---- ---- lock((work_completion)(&svms->deferred_list_work)); lock(&process->mutex); lock((work_completion)(&svms->deferred_list_work)); lock(&process->mutex);
*** DEADLOCK ***
Signed-off-by: Philip Yang Philip.Yang@amd.com Reviewed-by: Felix Kuehling Felix.Kuehling@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index 337953af7c2f..70122978bdd0 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c @@ -1846,13 +1846,9 @@ static int kfd_ioctl_svm(struct file *filep, struct kfd_process *p, void *data) if (!args->start_addr || !args->size) return -EINVAL;
- mutex_lock(&p->mutex); - r = svm_ioctl(p, args->op, args->start_addr, args->size, args->nattr, args->attrs);
- mutex_unlock(&p->mutex); - return r; } #else
From: Philip Yang Philip.Yang@amd.com
[ Upstream commit 367c9b0f1b8750a704070e7ae85234d591290434 ]
svm_deferred_list work should continue to handle deferred_range_list which maybe split to child range to avoid child range leak, and remove ranges mmu interval notifier to avoid mm mm_count leak. So taking mm reference when adding range to deferred list, to ensure mm is valid in the scheduled deferred_list_work, and drop the mm referrence after range is handled.
Signed-off-by: Philip Yang Philip.Yang@amd.com Reported-by: Ruili Ji ruili.ji@amd.com Reviewed-by: Felix Kuehling Felix.Kuehling@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 62 ++++++++++++++++------------ 1 file changed, 36 insertions(+), 26 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index f2805ba74c80..225affcddbc1 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -1985,10 +1985,9 @@ svm_range_update_notifier_and_interval_tree(struct mm_struct *mm, }
static void -svm_range_handle_list_op(struct svm_range_list *svms, struct svm_range *prange) +svm_range_handle_list_op(struct svm_range_list *svms, struct svm_range *prange, + struct mm_struct *mm) { - struct mm_struct *mm = prange->work_item.mm; - switch (prange->work_item.op) { case SVM_OP_NULL: pr_debug("NULL OP 0x%p prange 0x%p [0x%lx 0x%lx]\n", @@ -2071,34 +2070,41 @@ static void svm_range_deferred_list_work(struct work_struct *work) pr_debug("enter svms 0x%p\n", svms);
p = container_of(svms, struct kfd_process, svms); - /* Avoid mm is gone when inserting mmu notifier */ - mm = get_task_mm(p->lead_thread); - if (!mm) { - pr_debug("svms 0x%p process mm gone\n", svms); - return; - } -retry: - mmap_write_lock(mm); - - /* Checking for the need to drain retry faults must be inside - * mmap write lock to serialize with munmap notifiers. - */ - if (unlikely(atomic_read(&svms->drain_pagefaults))) { - mmap_write_unlock(mm); - svm_range_drain_retry_fault(svms); - goto retry; - }
spin_lock(&svms->deferred_list_lock); while (!list_empty(&svms->deferred_range_list)) { prange = list_first_entry(&svms->deferred_range_list, struct svm_range, deferred_list); - list_del_init(&prange->deferred_list); spin_unlock(&svms->deferred_list_lock);
pr_debug("prange 0x%p [0x%lx 0x%lx] op %d\n", prange, prange->start, prange->last, prange->work_item.op);
+ mm = prange->work_item.mm; +retry: + mmap_write_lock(mm); + + /* Checking for the need to drain retry faults must be inside + * mmap write lock to serialize with munmap notifiers. + */ + if (unlikely(atomic_read(&svms->drain_pagefaults))) { + mmap_write_unlock(mm); + svm_range_drain_retry_fault(svms); + goto retry; + } + + /* Remove from deferred_list must be inside mmap write lock, for + * two race cases: + * 1. unmap_from_cpu may change work_item.op and add the range + * to deferred_list again, cause use after free bug. + * 2. svm_range_list_lock_and_flush_work may hold mmap write + * lock and continue because deferred_list is empty, but + * deferred_list work is actually waiting for mmap lock. + */ + spin_lock(&svms->deferred_list_lock); + list_del_init(&prange->deferred_list); + spin_unlock(&svms->deferred_list_lock); + mutex_lock(&svms->lock); mutex_lock(&prange->migrate_mutex); while (!list_empty(&prange->child_list)) { @@ -2109,19 +2115,20 @@ static void svm_range_deferred_list_work(struct work_struct *work) pr_debug("child prange 0x%p op %d\n", pchild, pchild->work_item.op); list_del_init(&pchild->child_list); - svm_range_handle_list_op(svms, pchild); + svm_range_handle_list_op(svms, pchild, mm); } mutex_unlock(&prange->migrate_mutex);
- svm_range_handle_list_op(svms, prange); + svm_range_handle_list_op(svms, prange, mm); mutex_unlock(&svms->lock); + mmap_write_unlock(mm); + + /* Pairs with mmget in svm_range_add_list_work */ + mmput(mm);
spin_lock(&svms->deferred_list_lock); } spin_unlock(&svms->deferred_list_lock); - - mmap_write_unlock(mm); - mmput(mm); pr_debug("exit svms 0x%p\n", svms); }
@@ -2139,6 +2146,9 @@ svm_range_add_list_work(struct svm_range_list *svms, struct svm_range *prange, prange->work_item.op = op; } else { prange->work_item.op = op; + + /* Pairs with mmput in deferred_list_work */ + mmget(mm); prange->work_item.mm = mm; list_add_tail(&prange->deferred_list, &prange->svms->deferred_range_list);
From: Philip Yang Philip.Yang@amd.com
[ Upstream commit 6225bb3a88d22594aacea2485dc28ca12d596721 ]
kfd_process_notifier_release flush svm_range_restore_work which calls svm_range_list_lock_and_flush_work to flush deferred_list work, but if deferred_list work mmput release the last user, it will call exit_mmap -> notifier_release, it is deadlock with below backtrace.
Move flush svm_range_restore_work to kfd_process_wq_release to avoid deadlock. Then svm_range_restore_work take task->mm ref to avoid mm is gone while validating and mapping ranges to GPU.
Workqueue: events svm_range_deferred_list_work [amdgpu] Call Trace: wait_for_completion+0x94/0x100 __flush_work+0x12a/0x1e0 __cancel_work_timer+0x10e/0x190 cancel_delayed_work_sync+0x13/0x20 kfd_process_notifier_release+0x98/0x2a0 [amdgpu] __mmu_notifier_release+0x74/0x1f0 exit_mmap+0x170/0x200 mmput+0x5d/0x130 svm_range_deferred_list_work+0x104/0x230 [amdgpu] process_one_work+0x220/0x3c0
Signed-off-by: Philip Yang Philip.Yang@amd.com Reported-by: Ruili Ji ruili.ji@amd.com Tested-by: Ruili Ji ruili.ji@amd.com Reviewed-by: Felix Kuehling Felix.Kuehling@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 1 - drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 15 +++++++++------ 2 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c index d1145da5348f..74f162887d3b 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c @@ -1150,7 +1150,6 @@ static void kfd_process_notifier_release(struct mmu_notifier *mn,
cancel_delayed_work_sync(&p->eviction_work); cancel_delayed_work_sync(&p->restore_work); - cancel_delayed_work_sync(&p->svms.restore_work);
mutex_lock(&p->mutex);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 225affcddbc1..1cf9041c9727 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -1643,13 +1643,14 @@ static void svm_range_restore_work(struct work_struct *work)
pr_debug("restore svm ranges\n");
- /* kfd_process_notifier_release destroys this worker thread. So during - * the lifetime of this thread, kfd_process and mm will be valid. - */ p = container_of(svms, struct kfd_process, svms); - mm = p->mm; - if (!mm) + + /* Keep mm reference when svm_range_validate_and_map ranges */ + mm = get_task_mm(p->lead_thread); + if (!mm) { + pr_debug("svms 0x%p process mm gone\n", svms); return; + }
svm_range_list_lock_and_flush_work(svms, mm); mutex_lock(&svms->lock); @@ -1703,6 +1704,7 @@ static void svm_range_restore_work(struct work_struct *work) out_reschedule: mutex_unlock(&svms->lock); mmap_write_unlock(mm); + mmput(mm);
/* If validation failed, reschedule another attempt */ if (evicted_ranges) { @@ -2840,6 +2842,8 @@ void svm_range_list_fini(struct kfd_process *p)
pr_debug("pasid 0x%x svms 0x%p\n", p->pasid, &p->svms);
+ cancel_delayed_work_sync(&p->svms.restore_work); + /* Ensure list work is finished before process is destroyed */ flush_work(&p->svms.deferred_list_work);
@@ -2850,7 +2854,6 @@ void svm_range_list_fini(struct kfd_process *p) atomic_inc(&p->svms.drain_pagefaults); svm_range_drain_retry_fault(&p->svms);
- list_for_each_entry_safe(prange, next, &p->svms.list, list) { svm_range_unlink(prange); svm_range_remove_notifier(prange);
From: Tianci.Yin tianci.yin@amd.com
[ Upstream commit 7270e8957eb9aacf5914605d04865f3829a14bce ]
[why] In rmmod procedure, kfd sends cp a dequeue request, but the request does not get response, then an error message "cp queue pipe 4 queue 0 preemption failed" printed.
[how] Performing kfd suspending after disabling gfxoff can fix it.
Acked-by: Felix Kuehling Felix.Kuehling@amd.com Reviewed-by: Yang Wang kevinyang.wang@amd.com Signed-off-by: Tianci.Yin tianci.yin@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index f18c698137a6..b87dca6d09fa 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -2723,11 +2723,11 @@ static int amdgpu_device_ip_fini_early(struct amdgpu_device *adev) } }
- amdgpu_amdkfd_suspend(adev, false); - amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE); amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE);
+ amdgpu_amdkfd_suspend(adev, false); + /* Workaroud for ASICs need to disable SMC first */ amdgpu_device_smu_fini_early(adev);
From: Amit Cohen amcohen@nvidia.com
[ Upstream commit bcdfd615f83b4bd04678109bf18022d1476e4bbf ]
When processing events generated by the device's firmware, the driver protects itself from events reported for non-existent local ports, but not for the CPU port (local port 0), which exists, but does not have all the fields as any local port.
This can result in a NULL pointer dereference when trying access 'struct mlxsw_sp_port' fields which are not initialized for CPU port.
Commit 63b08b1f6834 ("mlxsw: spectrum: Protect driver from buggy firmware") already handled such issue by bailing early when processing a PUDE event reported for the CPU port.
Generalize the approach by moving the check to a common function and making use of it in all relevant places.
Signed-off-by: Amit Cohen amcohen@nvidia.com Signed-off-by: Ido Schimmel idosch@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 4 +--- drivers/net/ethernet/mellanox/mlxsw/spectrum.h | 7 +++++++ drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c | 3 +-- drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c | 3 +-- 4 files changed, 10 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c index aa411dec62f0..eb1319d63613 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c @@ -2148,13 +2148,11 @@ static void mlxsw_sp_pude_event_func(const struct mlxsw_reg_info *reg, struct mlxsw_sp *mlxsw_sp = priv; struct mlxsw_sp_port *mlxsw_sp_port; enum mlxsw_reg_pude_oper_status status; - unsigned int max_ports; u16 local_port;
- max_ports = mlxsw_core_max_ports(mlxsw_sp->core); local_port = mlxsw_reg_pude_local_port_get(pude_pl);
- if (WARN_ON_ONCE(!local_port || local_port >= max_ports)) + if (WARN_ON_ONCE(!mlxsw_sp_local_port_is_valid(mlxsw_sp, local_port))) return; mlxsw_sp_port = mlxsw_sp->ports[local_port]; if (!mlxsw_sp_port) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h index bb2442e1f705..30942b6ffcf9 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h @@ -481,6 +481,13 @@ int mlxsw_sp_port_vlan_classification_set(struct mlxsw_sp_port *mlxsw_sp_port, bool is_8021ad_tagged, bool is_8021q_tagged); +static inline bool +mlxsw_sp_local_port_is_valid(struct mlxsw_sp *mlxsw_sp, u16 local_port) +{ + unsigned int max_ports = mlxsw_core_max_ports(mlxsw_sp->core); + + return local_port < max_ports && local_port; +}
/* spectrum_buffers.c */ struct mlxsw_sp_hdroom_prio { diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c index 0ff163fbc775..35422e64d89f 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c @@ -568,12 +568,11 @@ void mlxsw_sp1_ptp_got_timestamp(struct mlxsw_sp *mlxsw_sp, bool ingress, u8 domain_number, u16 sequence_id, u64 timestamp) { - unsigned int max_ports = mlxsw_core_max_ports(mlxsw_sp->core); struct mlxsw_sp_port *mlxsw_sp_port; struct mlxsw_sp1_ptp_key key; u8 types;
- if (WARN_ON_ONCE(local_port >= max_ports)) + if (WARN_ON_ONCE(!mlxsw_sp_local_port_is_valid(mlxsw_sp, local_port))) return; mlxsw_sp_port = mlxsw_sp->ports[local_port]; if (!mlxsw_sp_port) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c index 65c1724c63b0..bffdb41fc4ed 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c @@ -2616,7 +2616,6 @@ static void mlxsw_sp_fdb_notify_mac_process(struct mlxsw_sp *mlxsw_sp, char *sfn_pl, int rec_index, bool adding) { - unsigned int max_ports = mlxsw_core_max_ports(mlxsw_sp->core); struct mlxsw_sp_port_vlan *mlxsw_sp_port_vlan; struct mlxsw_sp_bridge_device *bridge_device; struct mlxsw_sp_bridge_port *bridge_port; @@ -2630,7 +2629,7 @@ static void mlxsw_sp_fdb_notify_mac_process(struct mlxsw_sp *mlxsw_sp,
mlxsw_reg_sfn_mac_unpack(sfn_pl, rec_index, mac, &fid, &local_port);
- if (WARN_ON_ONCE(local_port >= max_ports)) + if (WARN_ON_ONCE(!mlxsw_sp_local_port_is_valid(mlxsw_sp, local_port))) return; mlxsw_sp_port = mlxsw_sp->ports[local_port]; if (!mlxsw_sp_port) {
From: Jack Wang jinpu.wang@ionos.com
[ Upstream commit c1289d5d8502d62e5bc50ff066c9d6daabfc3264 ]
We can't do instant reconnect, not to DDoS server, but we should stop and failover earlier, so there is less service interruption.
To avoid deadlock, as error_recovery is called from different callback like rdma event or hb error handler, add a new err recovery_work.
Link: https://lore.kernel.org/r/20220114154753.983568-6-haris.iqbal@ionos.com Signed-off-by: Jack Wang jinpu.wang@ionos.com Reviewed-by: Aleksei Marov aleksei.marov@ionos.com Reviewed-by: Md Haris Iqbal haris.iqbal@ionos.com Signed-off-by: Md Haris Iqbal haris.iqbal@ionos.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/ulp/rtrs/rtrs-clt.c | 40 ++++++++++++++------------ drivers/infiniband/ulp/rtrs/rtrs-clt.h | 1 + 2 files changed, 23 insertions(+), 18 deletions(-)
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c index 759b85f03331..df4d06d4d183 100644 --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c @@ -297,6 +297,7 @@ static bool rtrs_clt_change_state_from_to(struct rtrs_clt_path *clt_path, return changed; }
+static void rtrs_clt_stop_and_destroy_conns(struct rtrs_clt_path *clt_path); static void rtrs_rdma_error_recovery(struct rtrs_clt_con *con) { struct rtrs_clt_path *clt_path = to_clt_path(con->c.path); @@ -304,16 +305,7 @@ static void rtrs_rdma_error_recovery(struct rtrs_clt_con *con) if (rtrs_clt_change_state_from_to(clt_path, RTRS_CLT_CONNECTED, RTRS_CLT_RECONNECTING)) { - struct rtrs_clt_sess *clt = clt_path->clt; - unsigned int delay_ms; - - /* - * Normal scenario, reconnect if we were successfully connected - */ - delay_ms = clt->reconnect_delay_sec * 1000; - queue_delayed_work(rtrs_wq, &clt_path->reconnect_dwork, - msecs_to_jiffies(delay_ms + - prandom_u32() % RTRS_RECONNECT_SEED)); + queue_work(rtrs_wq, &clt_path->err_recovery_work); } else { /* * Error can happen just on establishing new connection, @@ -1511,6 +1503,22 @@ static void rtrs_clt_init_hb(struct rtrs_clt_path *clt_path) static void rtrs_clt_reconnect_work(struct work_struct *work); static void rtrs_clt_close_work(struct work_struct *work);
+static void rtrs_clt_err_recovery_work(struct work_struct *work) +{ + struct rtrs_clt_path *clt_path; + struct rtrs_clt_sess *clt; + int delay_ms; + + clt_path = container_of(work, struct rtrs_clt_path, err_recovery_work); + clt = clt_path->clt; + delay_ms = clt->reconnect_delay_sec * 1000; + rtrs_clt_stop_and_destroy_conns(clt_path); + queue_delayed_work(rtrs_wq, &clt_path->reconnect_dwork, + msecs_to_jiffies(delay_ms + + prandom_u32() % + RTRS_RECONNECT_SEED)); +} + static struct rtrs_clt_path *alloc_path(struct rtrs_clt_sess *clt, const struct rtrs_addr *path, size_t con_num, u32 nr_poll_queues) @@ -1562,6 +1570,7 @@ static struct rtrs_clt_path *alloc_path(struct rtrs_clt_sess *clt, clt_path->state = RTRS_CLT_CONNECTING; atomic_set(&clt_path->connected_cnt, 0); INIT_WORK(&clt_path->close_work, rtrs_clt_close_work); + INIT_WORK(&clt_path->err_recovery_work, rtrs_clt_err_recovery_work); INIT_DELAYED_WORK(&clt_path->reconnect_dwork, rtrs_clt_reconnect_work); rtrs_clt_init_hb(clt_path);
@@ -2326,6 +2335,7 @@ static void rtrs_clt_close_work(struct work_struct *work)
clt_path = container_of(work, struct rtrs_clt_path, close_work);
+ cancel_work_sync(&clt_path->err_recovery_work); cancel_delayed_work_sync(&clt_path->reconnect_dwork); rtrs_clt_stop_and_destroy_conns(clt_path); rtrs_clt_change_state_get_old(clt_path, RTRS_CLT_CLOSED, NULL); @@ -2638,7 +2648,6 @@ static void rtrs_clt_reconnect_work(struct work_struct *work) { struct rtrs_clt_path *clt_path; struct rtrs_clt_sess *clt; - unsigned int delay_ms; int err;
clt_path = container_of(to_delayed_work(work), struct rtrs_clt_path, @@ -2655,8 +2664,6 @@ static void rtrs_clt_reconnect_work(struct work_struct *work) } clt_path->reconnect_attempts++;
- /* Stop everything */ - rtrs_clt_stop_and_destroy_conns(clt_path); msleep(RTRS_RECONNECT_BACKOFF); if (rtrs_clt_change_state_get_old(clt_path, RTRS_CLT_CONNECTING, NULL)) { err = init_path(clt_path); @@ -2669,11 +2676,7 @@ static void rtrs_clt_reconnect_work(struct work_struct *work) reconnect_again: if (rtrs_clt_change_state_get_old(clt_path, RTRS_CLT_RECONNECTING, NULL)) { clt_path->stats->reconnects.fail_cnt++; - delay_ms = clt->reconnect_delay_sec * 1000; - queue_delayed_work(rtrs_wq, &clt_path->reconnect_dwork, - msecs_to_jiffies(delay_ms + - prandom_u32() % - RTRS_RECONNECT_SEED)); + queue_work(rtrs_wq, &clt_path->err_recovery_work); } }
@@ -2908,6 +2911,7 @@ int rtrs_clt_reconnect_from_sysfs(struct rtrs_clt_path *clt_path) &old_state); if (changed) { clt_path->reconnect_attempts = 0; + rtrs_clt_stop_and_destroy_conns(clt_path); queue_delayed_work(rtrs_wq, &clt_path->reconnect_dwork, 0); } if (changed || old_state == RTRS_CLT_RECONNECTING) { diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.h b/drivers/infiniband/ulp/rtrs/rtrs-clt.h index d1b18a154ae0..f848c0392d98 100644 --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.h +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.h @@ -134,6 +134,7 @@ struct rtrs_clt_path { struct rtrs_clt_io_req *reqs; struct delayed_work reconnect_dwork; struct work_struct close_work; + struct work_struct err_recovery_work; unsigned int reconnect_attempts; bool established; struct rtrs_rbuf *rbufs;
From: Sachin Sant sachinp@linux.ibm.com
[ Upstream commit 279d1a72c0f8021520f68ddb0a1346ff9ba1ea8c ]
Cédric pointed out that XIVE IPI information exported via sysfs (debug/powerpc/xive) display empty lines for processors which are not online.
Switch to using for_each_online_cpu() so that information is displayed for online-only processors.
Reported-by: Cédric Le Goater clg@kaod.org Signed-off-by: Sachin Sant sachinp@linux.ibm.com Reviewed-by: Cédric Le Goater clg@kaod.org Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/164146703333.19039.10920919226094771665.sendpatchs... Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/sysdev/xive/common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c index 89c86f32aff8..bb5bda6b2357 100644 --- a/arch/powerpc/sysdev/xive/common.c +++ b/arch/powerpc/sysdev/xive/common.c @@ -1791,7 +1791,7 @@ static int xive_ipi_debug_show(struct seq_file *m, void *private) if (xive_ops->debug_show) xive_ops->debug_show(m, private);
- for_each_possible_cpu(cpu) + for_each_online_cpu(cpu) xive_debug_show_ipi(m, cpu); return 0; }
From: Maxim Kiselev bigunclemax@gmail.com
[ Upstream commit 17846485dff91acce1ad47b508b633dffc32e838 ]
T1040RDB has two RTL8211E-VB phys which requires setting of internal delays for correct work.
Changing the phy-connection-type property to `rgmii-id` will fix this issue.
Signed-off-by: Maxim Kiselev bigunclemax@gmail.com Reviewed-by: Maxim Kochetkov fido_max@inbox.ru Reviewed-by: Vladimir Oltean vladimir.oltean@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20211230151123.1258321-1-bigunclemax@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/boot/dts/fsl/t104xrdb.dtsi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/boot/dts/fsl/t104xrdb.dtsi b/arch/powerpc/boot/dts/fsl/t104xrdb.dtsi index 099a598c74c0..bfe1ed5be337 100644 --- a/arch/powerpc/boot/dts/fsl/t104xrdb.dtsi +++ b/arch/powerpc/boot/dts/fsl/t104xrdb.dtsi @@ -139,12 +139,12 @@ fman@400000 { ethernet@e6000 { phy-handle = <&phy_rgmii_0>; - phy-connection-type = "rgmii"; + phy-connection-type = "rgmii-id"; };
ethernet@e8000 { phy-handle = <&phy_rgmii_1>; - phy-connection-type = "rgmii"; + phy-connection-type = "rgmii-id"; };
mdio0: mdio@fc000 {
From: Venkateswara Naralasetty quic_vnaralas@quicinc.com
[ Upstream commit 22b59cb965f79ee1accf83172441c9ca0ecb632a ]
Call netif_napi_del() from ath11k_ahb_free_ext_irq() to fix the following kernel panic when unload/load ath11k modules for few iterations.
[ 971.201365] Unable to handle kernel paging request at virtual address 6d97a208 [ 971.204227] pgd = 594c2919 [ 971.211478] [6d97a208] *pgd=00000000 [ 971.214120] Internal error: Oops: 5 [#1] PREEMPT SMP ARM [ 971.412024] CPU: 2 PID: 4435 Comm: insmod Not tainted 5.4.89 #0 [ 971.434256] Hardware name: Generic DT based system [ 971.440165] PC is at napi_by_id+0x10/0x40 [ 971.445019] LR is at netif_napi_add+0x160/0x1dc
[ 971.743127] (napi_by_id) from [<807d89a0>] (netif_napi_add+0x160/0x1dc) [ 971.751295] (netif_napi_add) from [<7f1209ac>] (ath11k_ahb_config_irq+0xf8/0x414 [ath11k_ahb]) [ 971.759164] (ath11k_ahb_config_irq [ath11k_ahb]) from [<7f12135c>] (ath11k_ahb_probe+0x40c/0x51c [ath11k_ahb]) [ 971.768567] (ath11k_ahb_probe [ath11k_ahb]) from [<80666864>] (platform_drv_probe+0x48/0x94) [ 971.779670] (platform_drv_probe) from [<80664718>] (really_probe+0x1c8/0x450) [ 971.789389] (really_probe) from [<80664cc4>] (driver_probe_device+0x15c/0x1b8) [ 971.797547] (driver_probe_device) from [<80664f60>] (device_driver_attach+0x44/0x60) [ 971.805795] (device_driver_attach) from [<806650a0>] (__driver_attach+0x124/0x140) [ 971.814822] (__driver_attach) from [<80662adc>] (bus_for_each_dev+0x58/0xa4) [ 971.823328] (bus_for_each_dev) from [<80663a2c>] (bus_add_driver+0xf0/0x1e8) [ 971.831662] (bus_add_driver) from [<806658a4>] (driver_register+0xa8/0xf0) [ 971.839822] (driver_register) from [<8030269c>] (do_one_initcall+0x78/0x1ac) [ 971.847638] (do_one_initcall) from [<80392524>] (do_init_module+0x54/0x200) [ 971.855968] (do_init_module) from [<803945b0>] (load_module+0x1e30/0x1ffc) [ 971.864126] (load_module) from [<803948b0>] (sys_init_module+0x134/0x17c) [ 971.871852] (sys_init_module) from [<80301000>] (ret_fast_syscall+0x0/0x50)
Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.6.0.1-00760-QCAHKSWPL_SILICONZ-1
Signed-off-by: Venkateswara Naralasetty quic_vnaralas@quicinc.com Signed-off-by: Kalle Valo quic_kvalo@quicinc.com Link: https://lore.kernel.org/r/1642583973-21599-1-git-send-email-quic_vnaralas@qu... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/ath/ath11k/ahb.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/wireless/ath/ath11k/ahb.c b/drivers/net/wireless/ath/ath11k/ahb.c index 3fb0aa000825..24bd0520926b 100644 --- a/drivers/net/wireless/ath/ath11k/ahb.c +++ b/drivers/net/wireless/ath/ath11k/ahb.c @@ -391,6 +391,8 @@ static void ath11k_ahb_free_ext_irq(struct ath11k_base *ab)
for (j = 0; j < irq_grp->num_irq; j++) free_irq(ab->irq_num[irq_grp->irqs[j]], irq_grp); + + netif_napi_del(&irq_grp->napi); } }
From: Kalle Valo quic_kvalo@quicinc.com
[ Upstream commit b4f4c56459a5c744f7f066b9fc2b54ea995030c5 ]
Mario reported that the kernel was crashing on suspend if ath11k was not able to find a board file:
[ 473.693286] PM: Suspending system (s2idle) [ 473.693291] printk: Suspending console(s) (use no_console_suspend to debug) [ 474.407787] BUG: unable to handle page fault for address: 0000000000002070 [ 474.407791] #PF: supervisor read access in kernel mode [ 474.407794] #PF: error_code(0x0000) - not-present page [ 474.407798] PGD 0 P4D 0 [ 474.407801] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 474.407805] CPU: 2 PID: 2350 Comm: kworker/u32:14 Tainted: G W 5.16.0 #248 [...] [ 474.407868] Call Trace: [ 474.407870] <TASK> [ 474.407874] ? _raw_spin_lock_irqsave+0x2a/0x60 [ 474.407882] ? lock_timer_base+0x72/0xa0 [ 474.407889] ? _raw_spin_unlock_irqrestore+0x29/0x3d [ 474.407892] ? try_to_del_timer_sync+0x54/0x80 [ 474.407896] ath11k_dp_rx_pktlog_stop+0x49/0xc0 [ath11k] [ 474.407912] ath11k_core_suspend+0x34/0x130 [ath11k] [ 474.407923] ath11k_pci_pm_suspend+0x1b/0x50 [ath11k_pci] [ 474.407928] pci_pm_suspend+0x7e/0x170 [ 474.407935] ? pci_pm_freeze+0xc0/0xc0 [ 474.407939] dpm_run_callback+0x4e/0x150 [ 474.407947] __device_suspend+0x148/0x4c0 [ 474.407951] async_suspend+0x20/0x90 dmesg-efi-164255130401001: Oops#1 Part1 [ 474.407955] async_run_entry_fn+0x33/0x120 [ 474.407959] process_one_work+0x220/0x3f0 [ 474.407966] worker_thread+0x4a/0x3d0 [ 474.407971] kthread+0x17a/0x1a0 [ 474.407975] ? process_one_work+0x3f0/0x3f0 [ 474.407979] ? set_kthread_struct+0x40/0x40 [ 474.407983] ret_from_fork+0x22/0x30 [ 474.407991] </TASK>
The issue here is that board file loading happens after ath11k_pci_probe() succesfully returns (ath11k initialisation happends asynchronously) and the suspend handler is still enabled, of course failing as ath11k is not properly initialised. Fix this by checking ATH11K_FLAG_QMI_FAIL during both suspend and resume.
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03003-QCAHSPSWPL_V1_V2_SILICONZ_LITE-2
Reported-by: Mario Limonciello mario.limonciello@amd.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=215504 Signed-off-by: Kalle Valo quic_kvalo@quicinc.com Link: https://lore.kernel.org/r/20220127090117.2024-1-kvalo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/ath/ath11k/pci.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/drivers/net/wireless/ath/ath11k/pci.c b/drivers/net/wireless/ath/ath11k/pci.c index de71ad594f34..903758751c99 100644 --- a/drivers/net/wireless/ath/ath11k/pci.c +++ b/drivers/net/wireless/ath/ath11k/pci.c @@ -1571,6 +1571,11 @@ static __maybe_unused int ath11k_pci_pm_suspend(struct device *dev) struct ath11k_base *ab = dev_get_drvdata(dev); int ret;
+ if (test_bit(ATH11K_FLAG_QMI_FAIL, &ab->dev_flags)) { + ath11k_dbg(ab, ATH11K_DBG_BOOT, "boot skipping pci suspend as qmi is not initialised\n"); + return 0; + } + ret = ath11k_core_suspend(ab); if (ret) ath11k_warn(ab, "failed to suspend core: %d\n", ret); @@ -1583,6 +1588,11 @@ static __maybe_unused int ath11k_pci_pm_resume(struct device *dev) struct ath11k_base *ab = dev_get_drvdata(dev); int ret;
+ if (test_bit(ATH11K_FLAG_QMI_FAIL, &ab->dev_flags)) { + ath11k_dbg(ab, ATH11K_DBG_BOOT, "boot skipping pci resume as qmi is not initialised\n"); + return 0; + } + ret = ath11k_core_resume(ab); if (ret) ath11k_warn(ab, "failed to resume core: %d\n", ret);
From: Kalle Valo quic_kvalo@quicinc.com
[ Upstream commit 3df6d74aedfdca919cca475d15dfdbc8b05c9e5d ]
If amss.bin was missing ath11k would crash during 'rmmod ath11k_pci'. The reason for that was that we were using mhi_async_power_up() which does not check any errors. But mhi_sync_power_up() on the other hand does check for errors so let's use that to fix the crash.
I was not able to find a reason why an async version was used. ath11k_mhi_start() (which enables state ATH11K_MHI_POWER_ON) is called from ath11k_hif_power_up(), which can sleep. So sync version should be safe to use here.
[ 145.569731] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN PTI [ 145.569789] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] [ 145.569843] CPU: 2 PID: 1628 Comm: rmmod Kdump: loaded Tainted: G W 5.16.0-wt-ath+ #567 [ 145.569898] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0067.2021.0528.1339 05/28/2021 [ 145.569956] RIP: 0010:ath11k_hal_srng_access_begin+0xb5/0x2b0 [ath11k] [ 145.570028] Code: df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 ec 01 00 00 48 8b ab a8 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 ea 48 c1 ea 03 <0f> b6 14 02 48 89 e8 83 e0 07 83 c0 03 45 85 ed 75 48 38 d0 7c 08 [ 145.570089] RSP: 0018:ffffc900025d7ac0 EFLAGS: 00010246 [ 145.570144] RAX: dffffc0000000000 RBX: ffff88814fca2dd8 RCX: 1ffffffff50cb455 [ 145.570196] RDX: 0000000000000000 RSI: ffff88814fca2dd8 RDI: ffff88814fca2e80 [ 145.570252] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffffa8659497 [ 145.570329] R10: fffffbfff50cb292 R11: 0000000000000001 R12: ffff88814fca0000 [ 145.570410] R13: 0000000000000000 R14: ffff88814fca2798 R15: ffff88814fca2dd8 [ 145.570465] FS: 00007fa399988540(0000) GS:ffff888233e00000(0000) knlGS:0000000000000000 [ 145.570519] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 145.570571] CR2: 00007fa399b51421 CR3: 0000000137898002 CR4: 00000000003706e0 [ 145.570623] Call Trace: [ 145.570675] <TASK> [ 145.570727] ? ath11k_ce_tx_process_cb+0x34b/0x860 [ath11k] [ 145.570797] ath11k_ce_tx_process_cb+0x356/0x860 [ath11k] [ 145.570864] ? tasklet_init+0x150/0x150 [ 145.570919] ? ath11k_ce_alloc_pipes+0x280/0x280 [ath11k] [ 145.570986] ? tasklet_clear_sched+0x42/0xe0 [ 145.571042] ? tasklet_kill+0xe9/0x1b0 [ 145.571095] ? tasklet_clear_sched+0xe0/0xe0 [ 145.571148] ? irq_has_action+0x120/0x120 [ 145.571202] ath11k_ce_cleanup_pipes+0x45a/0x580 [ath11k] [ 145.571270] ? ath11k_pci_stop+0x10e/0x170 [ath11k_pci] [ 145.571345] ath11k_core_stop+0x8a/0xc0 [ath11k] [ 145.571434] ath11k_core_deinit+0x9e/0x150 [ath11k] [ 145.571499] ath11k_pci_remove+0xd2/0x260 [ath11k_pci] [ 145.571553] pci_device_remove+0x9a/0x1c0 [ 145.571605] __device_release_driver+0x332/0x660 [ 145.571659] driver_detach+0x1e7/0x2c0 [ 145.571712] bus_remove_driver+0xe2/0x2d0 [ 145.571772] pci_unregister_driver+0x21/0x250 [ 145.571826] __do_sys_delete_module+0x30a/0x4b0 [ 145.571879] ? free_module+0xac0/0xac0 [ 145.571933] ? lockdep_hardirqs_on_prepare.part.0+0x18c/0x370 [ 145.571986] ? syscall_enter_from_user_mode+0x1d/0x50 [ 145.572039] ? lockdep_hardirqs_on+0x79/0x100 [ 145.572097] do_syscall_64+0x3b/0x90 [ 145.572153] entry_SYSCALL_64_after_hwframe+0x44/0xae
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03003-QCAHSPSWPL_V1_V2_SILICONZ_LITE-2
Signed-off-by: Kalle Valo quic_kvalo@quicinc.com Link: https://lore.kernel.org/r/20220127090117.2024-2-kvalo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/ath/ath11k/mhi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/ath/ath11k/mhi.c b/drivers/net/wireless/ath/ath11k/mhi.c index cccaa348cf21..8b2143802816 100644 --- a/drivers/net/wireless/ath/ath11k/mhi.c +++ b/drivers/net/wireless/ath/ath11k/mhi.c @@ -561,7 +561,7 @@ static int ath11k_mhi_set_state(struct ath11k_pci *ab_pci, ret = 0; break; case ATH11K_MHI_POWER_ON: - ret = mhi_async_power_up(ab_pci->mhi_ctrl); + ret = mhi_sync_power_up(ab_pci->mhi_ctrl); break; case ATH11K_MHI_POWER_OFF: mhi_power_down(ab_pci->mhi_ctrl, true);
From: Tony Lu tonylu@linux.alibaba.com
[ Upstream commit ea785a1a573b390a150010b3c5b81e1ccd8c98a8 ]
According to the man page of TCP_CORK [1], if set, don't send out partial frames. All queued partial frames are sent when option is cleared again.
When applications call setsockopt to disable TCP_CORK, this call is protected by lock_sock(), and tries to mod_delayed_work() to 0, in order to send pending data right now. However, the delayed work smc_tx_work is also protected by lock_sock(). There introduces lock contention for sending data.
To fix it, send pending data directly which acts like TCP, without lock_sock() protected in the context of setsockopt (already lock_sock()ed), and cancel unnecessary dealyed work, which is protected by lock.
[1] https://linux.die.net/man/7/tcp
Signed-off-by: Tony Lu tonylu@linux.alibaba.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/smc/af_smc.c | 4 ++-- net/smc/smc_tx.c | 25 +++++++++++++++---------- net/smc/smc_tx.h | 1 + 3 files changed, 18 insertions(+), 12 deletions(-)
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 284befa90967..67fc72047c9c 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -2636,8 +2636,8 @@ static int smc_setsockopt(struct socket *sock, int level, int optname, sk->sk_state != SMC_CLOSED) { if (!val) { SMC_STAT_INC(smc, cork_cnt); - mod_delayed_work(smc->conn.lgr->tx_wq, - &smc->conn.tx_work, 0); + smc_tx_pending(&smc->conn); + cancel_delayed_work(&smc->conn.tx_work); } } break; diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c index be241d53020f..7b0b6e24582f 100644 --- a/net/smc/smc_tx.c +++ b/net/smc/smc_tx.c @@ -597,27 +597,32 @@ int smc_tx_sndbuf_nonempty(struct smc_connection *conn) return rc; }
-/* Wakeup sndbuf consumers from process context - * since there is more data to transmit - */ -void smc_tx_work(struct work_struct *work) +void smc_tx_pending(struct smc_connection *conn) { - struct smc_connection *conn = container_of(to_delayed_work(work), - struct smc_connection, - tx_work); struct smc_sock *smc = container_of(conn, struct smc_sock, conn); int rc;
- lock_sock(&smc->sk); if (smc->sk.sk_err) - goto out; + return;
rc = smc_tx_sndbuf_nonempty(conn); if (!rc && conn->local_rx_ctrl.prod_flags.write_blocked && !atomic_read(&conn->bytes_to_rcv)) conn->local_rx_ctrl.prod_flags.write_blocked = 0; +} + +/* Wakeup sndbuf consumers from process context + * since there is more data to transmit + */ +void smc_tx_work(struct work_struct *work) +{ + struct smc_connection *conn = container_of(to_delayed_work(work), + struct smc_connection, + tx_work); + struct smc_sock *smc = container_of(conn, struct smc_sock, conn);
-out: + lock_sock(&smc->sk); + smc_tx_pending(conn); release_sock(&smc->sk); }
diff --git a/net/smc/smc_tx.h b/net/smc/smc_tx.h index 07e6ad76224a..a59f370b8b43 100644 --- a/net/smc/smc_tx.h +++ b/net/smc/smc_tx.h @@ -27,6 +27,7 @@ static inline int smc_tx_prepared_sends(struct smc_connection *conn) return smc_curs_diff(conn->sndbuf_desc->len, &sent, &prep); }
+void smc_tx_pending(struct smc_connection *conn); void smc_tx_work(struct work_struct *work); void smc_tx_init(struct smc_sock *smc); int smc_tx_sendmsg(struct smc_sock *smc, struct msghdr *msg, size_t len);
From: Yongzhi Liu lyz_cs@pku.edu.cn
[ Upstream commit 46f47807738441e354873546dde0b000106c068a ]
pm_runtime_get_sync() will increase the rumtime PM counter even when it returns an error. Thus a pairing decrement is needed to prevent refcount leak. Fix this by replacing this API with pm_runtime_resume_and_get(), which will not change the runtime PM counter on error. Besides, a matching decrement is needed on the error handling path to keep the counter balanced.
Signed-off-by: Yongzhi Liu lyz_cs@pku.edu.cn Reviewed-by: Laurent Pinchart laurent.pinchart@ideasonboard.com Signed-off-by: Robert Foss robert.foss@linaro.org Link: https://patchwork.freedesktop.org/patch/msgid/1643008835-73961-1-git-send-em... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/bridge/nwl-dsi.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/bridge/nwl-dsi.c b/drivers/gpu/drm/bridge/nwl-dsi.c index 6e484d836cfe..691039aba87f 100644 --- a/drivers/gpu/drm/bridge/nwl-dsi.c +++ b/drivers/gpu/drm/bridge/nwl-dsi.c @@ -861,18 +861,19 @@ nwl_dsi_bridge_mode_set(struct drm_bridge *bridge, memcpy(&dsi->mode, adjusted_mode, sizeof(dsi->mode)); drm_mode_debug_printmodeline(adjusted_mode);
- pm_runtime_get_sync(dev); + if (pm_runtime_resume_and_get(dev) < 0) + return;
if (clk_prepare_enable(dsi->lcdif_clk) < 0) - return; + goto runtime_put; if (clk_prepare_enable(dsi->core_clk) < 0) - return; + goto runtime_put;
/* Step 1 from DSI reset-out instructions */ ret = reset_control_deassert(dsi->rst_pclk); if (ret < 0) { DRM_DEV_ERROR(dev, "Failed to deassert PCLK: %d\n", ret); - return; + goto runtime_put; }
/* Step 2 from DSI reset-out instructions */ @@ -882,13 +883,18 @@ nwl_dsi_bridge_mode_set(struct drm_bridge *bridge, ret = reset_control_deassert(dsi->rst_esc); if (ret < 0) { DRM_DEV_ERROR(dev, "Failed to deassert ESC: %d\n", ret); - return; + goto runtime_put; } ret = reset_control_deassert(dsi->rst_byte); if (ret < 0) { DRM_DEV_ERROR(dev, "Failed to deassert BYTE: %d\n", ret); - return; + goto runtime_put; } + + return; + +runtime_put: + pm_runtime_put_sync(dev); }
static void
From: Jakub Sitnicki jakub@cloudflare.com
[ Upstream commit 4421a582718ab81608d8486734c18083b822390d ]
Menglong Dong reports that the documentation for the dst_port field in struct bpf_sock is inaccurate and confusing. From the BPF program PoV, the field is a zero-padded 16-bit integer in network byte order. The value appears to the BPF user as if laid out in memory as so:
offsetof(struct bpf_sock, dst_port) + 0 <port MSB> + 8 <port LSB> +16 0x00 +24 0x00
32-, 16-, and 8-bit wide loads from the field are all allowed, but only if the offset into the field is 0.
32-bit wide loads from dst_port are especially confusing. The loaded value, after converting to host byte order with bpf_ntohl(dst_port), contains the port number in the upper 16-bits.
Remove the confusion by splitting the field into two 16-bit fields. For backward compatibility, allow 32-bit wide loads from offsetof(struct bpf_sock, dst_port).
While at it, allow loads 8-bit loads at offset [0] and [1] from dst_port.
Reported-by: Menglong Dong imagedong@tencent.com Signed-off-by: Jakub Sitnicki jakub@cloudflare.com Link: https://lore.kernel.org/r/20220130115518.213259-2-jakub@cloudflare.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/uapi/linux/bpf.h | 3 ++- net/core/filter.c | 10 +++++++++- 2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 015bfec0dbfd..51b0f899424a 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -5500,7 +5500,8 @@ struct bpf_sock { __u32 src_ip4; __u32 src_ip6[4]; __u32 src_port; /* host byte order */ - __u32 dst_port; /* network byte order */ + __be16 dst_port; /* network byte order */ + __u16 :16; /* zero padding */ __u32 dst_ip4; __u32 dst_ip6[4]; __u32 state; diff --git a/net/core/filter.c b/net/core/filter.c index 9eb785842258..82fcb7533663 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -8033,6 +8033,7 @@ bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type, struct bpf_insn_access_aux *info) { const int size_default = sizeof(__u32); + int field_size;
if (off < 0 || off >= sizeof(struct bpf_sock)) return false; @@ -8044,7 +8045,6 @@ bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type, case offsetof(struct bpf_sock, family): case offsetof(struct bpf_sock, type): case offsetof(struct bpf_sock, protocol): - case offsetof(struct bpf_sock, dst_port): case offsetof(struct bpf_sock, src_port): case offsetof(struct bpf_sock, rx_queue_mapping): case bpf_ctx_range(struct bpf_sock, src_ip4): @@ -8053,6 +8053,14 @@ bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type, case bpf_ctx_range_till(struct bpf_sock, dst_ip6[0], dst_ip6[3]): bpf_ctx_record_field_size(info, size_default); return bpf_ctx_narrow_access_ok(off, size, size_default); + case bpf_ctx_range(struct bpf_sock, dst_port): + field_size = size == size_default ? + size_default : sizeof_field(struct bpf_sock, dst_port); + bpf_ctx_record_field_size(info, field_size); + return bpf_ctx_narrow_access_ok(off, size, field_size); + case offsetofend(struct bpf_sock, dst_port) ... + offsetof(struct bpf_sock, dst_ip4) - 1: + return false; }
return size == size_default;
From: Yang Guang yang.guang5@zte.com.cn
[ Upstream commit 0ad3867b0f13e45cfee5a1298bfd40eef096116c ]
coccinelle report: ./drivers/scsi/mvsas/mv_init.c:699:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/mvsas/mv_init.c:747:8-16: WARNING: use scnprintf or sprintf
Use sysfs_emit() instead of scnprintf() or sprintf().
Link: https://lore.kernel.org/r/c1711f7cf251730a8ceb5bdfc313bf85662b3395.164318294... Reported-by: Zeal Robot zealci@zte.com.cn Signed-off-by: Yang Guang yang.guang5@zte.com.cn Signed-off-by: David Yang davidcomponentone@gmail.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/mvsas/mv_init.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/mvsas/mv_init.c b/drivers/scsi/mvsas/mv_init.c index dcae2d4464f9..44df7c03aab8 100644 --- a/drivers/scsi/mvsas/mv_init.c +++ b/drivers/scsi/mvsas/mv_init.c @@ -696,7 +696,7 @@ static struct pci_driver mvs_pci_driver = { static ssize_t driver_version_show(struct device *cdev, struct device_attribute *attr, char *buffer) { - return snprintf(buffer, PAGE_SIZE, "%s\n", DRV_VERSION); + return sysfs_emit(buffer, "%s\n", DRV_VERSION); }
static DEVICE_ATTR_RO(driver_version); @@ -744,7 +744,7 @@ static ssize_t interrupt_coalescing_store(struct device *cdev, static ssize_t interrupt_coalescing_show(struct device *cdev, struct device_attribute *attr, char *buffer) { - return snprintf(buffer, PAGE_SIZE, "%d\n", interrupt_coalescing); + return sysfs_emit(buffer, "%d\n", interrupt_coalescing); }
static DEVICE_ATTR_RW(interrupt_coalescing);
From: Yang Guang yang.guang5@zte.com.cn
[ Upstream commit 2245ea91fd3a04cafbe2f54911432a8657528c3b ]
coccinelle report: ./drivers/scsi/bfa/bfad_attr.c:908:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:860:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:888:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:853:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:808:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:728:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:822:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:927:9-17: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:900:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:874:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:714:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:839:8-16: WARNING: use scnprintf or sprintf
Use sysfs_emit() instead of scnprintf() or sprintf().
Link: https://lore.kernel.org/r/def83ff75faec64ba592b867a8499b1367bae303.164318146... Reported-by: Zeal Robot zealci@zte.com.cn Signed-off-by: Yang Guang yang.guang5@zte.com.cn Signed-off-by: David Yang davidcomponentone@gmail.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/bfa/bfad_attr.c | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-)
diff --git a/drivers/scsi/bfa/bfad_attr.c b/drivers/scsi/bfa/bfad_attr.c index f46989bd083c..5a85401e9e2d 100644 --- a/drivers/scsi/bfa/bfad_attr.c +++ b/drivers/scsi/bfa/bfad_attr.c @@ -711,7 +711,7 @@ bfad_im_serial_num_show(struct device *dev, struct device_attribute *attr, char serial_num[BFA_ADAPTER_SERIAL_NUM_LEN];
bfa_get_adapter_serial_num(&bfad->bfa, serial_num); - return snprintf(buf, PAGE_SIZE, "%s\n", serial_num); + return sysfs_emit(buf, "%s\n", serial_num); }
static ssize_t @@ -725,7 +725,7 @@ bfad_im_model_show(struct device *dev, struct device_attribute *attr, char model[BFA_ADAPTER_MODEL_NAME_LEN];
bfa_get_adapter_model(&bfad->bfa, model); - return snprintf(buf, PAGE_SIZE, "%s\n", model); + return sysfs_emit(buf, "%s\n", model); }
static ssize_t @@ -805,7 +805,7 @@ bfad_im_model_desc_show(struct device *dev, struct device_attribute *attr, snprintf(model_descr, BFA_ADAPTER_MODEL_DESCR_LEN, "Invalid Model");
- return snprintf(buf, PAGE_SIZE, "%s\n", model_descr); + return sysfs_emit(buf, "%s\n", model_descr); }
static ssize_t @@ -819,7 +819,7 @@ bfad_im_node_name_show(struct device *dev, struct device_attribute *attr, u64 nwwn;
nwwn = bfa_fcs_lport_get_nwwn(port->fcs_port); - return snprintf(buf, PAGE_SIZE, "0x%llx\n", cpu_to_be64(nwwn)); + return sysfs_emit(buf, "0x%llx\n", cpu_to_be64(nwwn)); }
static ssize_t @@ -836,7 +836,7 @@ bfad_im_symbolic_name_show(struct device *dev, struct device_attribute *attr, bfa_fcs_lport_get_attr(&bfad->bfa_fcs.fabric.bport, &port_attr); strlcpy(symname, port_attr.port_cfg.sym_name.symname, BFA_SYMNAME_MAXLEN); - return snprintf(buf, PAGE_SIZE, "%s\n", symname); + return sysfs_emit(buf, "%s\n", symname); }
static ssize_t @@ -850,14 +850,14 @@ bfad_im_hw_version_show(struct device *dev, struct device_attribute *attr, char hw_ver[BFA_VERSION_LEN];
bfa_get_pci_chip_rev(&bfad->bfa, hw_ver); - return snprintf(buf, PAGE_SIZE, "%s\n", hw_ver); + return sysfs_emit(buf, "%s\n", hw_ver); }
static ssize_t bfad_im_drv_version_show(struct device *dev, struct device_attribute *attr, char *buf) { - return snprintf(buf, PAGE_SIZE, "%s\n", BFAD_DRIVER_VERSION); + return sysfs_emit(buf, "%s\n", BFAD_DRIVER_VERSION); }
static ssize_t @@ -871,7 +871,7 @@ bfad_im_optionrom_version_show(struct device *dev, char optrom_ver[BFA_VERSION_LEN];
bfa_get_adapter_optrom_ver(&bfad->bfa, optrom_ver); - return snprintf(buf, PAGE_SIZE, "%s\n", optrom_ver); + return sysfs_emit(buf, "%s\n", optrom_ver); }
static ssize_t @@ -885,7 +885,7 @@ bfad_im_fw_version_show(struct device *dev, struct device_attribute *attr, char fw_ver[BFA_VERSION_LEN];
bfa_get_adapter_fw_ver(&bfad->bfa, fw_ver); - return snprintf(buf, PAGE_SIZE, "%s\n", fw_ver); + return sysfs_emit(buf, "%s\n", fw_ver); }
static ssize_t @@ -897,7 +897,7 @@ bfad_im_num_of_ports_show(struct device *dev, struct device_attribute *attr, (struct bfad_im_port_s *) shost->hostdata[0]; struct bfad_s *bfad = im_port->bfad;
- return snprintf(buf, PAGE_SIZE, "%d\n", + return sysfs_emit(buf, "%d\n", bfa_get_nports(&bfad->bfa)); }
@@ -905,7 +905,7 @@ static ssize_t bfad_im_drv_name_show(struct device *dev, struct device_attribute *attr, char *buf) { - return snprintf(buf, PAGE_SIZE, "%s\n", BFAD_DRIVER_NAME); + return sysfs_emit(buf, "%s\n", BFAD_DRIVER_NAME); }
static ssize_t @@ -924,14 +924,14 @@ bfad_im_num_of_discovered_ports_show(struct device *dev, rports = kcalloc(nrports, sizeof(struct bfa_rport_qualifier_s), GFP_ATOMIC); if (rports == NULL) - return snprintf(buf, PAGE_SIZE, "Failed\n"); + return sysfs_emit(buf, "Failed\n");
spin_lock_irqsave(&bfad->bfad_lock, flags); bfa_fcs_lport_get_rport_quals(port->fcs_port, rports, &nrports); spin_unlock_irqrestore(&bfad->bfad_lock, flags); kfree(rports);
- return snprintf(buf, PAGE_SIZE, "%d\n", nrports); + return sysfs_emit(buf, "%d\n", nrports); }
static DEVICE_ATTR(serial_number, S_IRUGO,
From: Yongzhi Liu lyz_cs@pku.edu.cn
[ Upstream commit e57c1a3bd5e8e0c7181f65ae55581f0236a8f284 ]
[why] Unlock is needed on the error handling path to prevent dead lock. v3d_submit_cl_ioctl and v3d_submit_csd_ioctl is missing unlock.
[how] Fix this by changing goto target on the error handling path. So changing the goto to target an error handling path that includes drm_gem_unlock reservations.
Signed-off-by: Yongzhi Liu lyz_cs@pku.edu.cn Reviewed-by: Melissa Wen mwen@igalia.com Signed-off-by: Melissa Wen melissa.srw@gmail.com Link: https://patchwork.freedesktop.org/patch/msgid/1643377262-109975-1-git-send-e... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/v3d/v3d_gem.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c index c7ed2e1cbab6..92bc0faee84f 100644 --- a/drivers/gpu/drm/v3d/v3d_gem.c +++ b/drivers/gpu/drm/v3d/v3d_gem.c @@ -798,7 +798,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
if (!render->base.perfmon) { ret = -ENOENT; - goto fail; + goto fail_perfmon; } }
@@ -847,6 +847,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
fail_unreserve: mutex_unlock(&v3d->sched_lock); +fail_perfmon: drm_gem_unlock_reservations(last_job->bo, last_job->bo_count, &acquire_ctx); fail: @@ -1027,7 +1028,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data, args->perfmon_id); if (!job->base.perfmon) { ret = -ENOENT; - goto fail; + goto fail_perfmon; } }
@@ -1056,6 +1057,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
fail_unreserve: mutex_unlock(&v3d->sched_lock); +fail_perfmon: drm_gem_unlock_reservations(clean_job->bo, clean_job->bo_count, &acquire_ctx); fail:
From: Evgeny Boger boger@wirenboard.com
[ Upstream commit d4f408cdcd26921c1268cb8dcbe8ffb6faf837f3 ]
As stated in [1], negative current values are used for discharging batteries.
AXP PMICs internally have two different ADC channels for shunt current measurement: one used during charging and one during discharging. The values reported by these ADCs are unsigned. While the driver properly selects ADC channel to get the data from, it doesn't apply negative sign when reporting discharging current.
[1] Documentation/ABI/testing/sysfs-class-power
Signed-off-by: Evgeny Boger boger@wirenboard.com Acked-by: Chen-Yu Tsai wens@csie.org Signed-off-by: Sebastian Reichel sebastian.reichel@collabora.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/power/supply/axp20x_battery.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/drivers/power/supply/axp20x_battery.c b/drivers/power/supply/axp20x_battery.c index 5d197141f476..9106077c0dbb 100644 --- a/drivers/power/supply/axp20x_battery.c +++ b/drivers/power/supply/axp20x_battery.c @@ -186,7 +186,6 @@ static int axp20x_battery_get_prop(struct power_supply *psy, union power_supply_propval *val) { struct axp20x_batt_ps *axp20x_batt = power_supply_get_drvdata(psy); - struct iio_channel *chan; int ret = 0, reg, val1;
switch (psp) { @@ -266,12 +265,12 @@ static int axp20x_battery_get_prop(struct power_supply *psy, if (ret) return ret;
- if (reg & AXP20X_PWR_STATUS_BAT_CHARGING) - chan = axp20x_batt->batt_chrg_i; - else - chan = axp20x_batt->batt_dischrg_i; - - ret = iio_read_channel_processed(chan, &val->intval); + if (reg & AXP20X_PWR_STATUS_BAT_CHARGING) { + ret = iio_read_channel_processed(axp20x_batt->batt_chrg_i, &val->intval); + } else { + ret = iio_read_channel_processed(axp20x_batt->batt_dischrg_i, &val1); + val->intval = -val1; + } if (ret) return ret;
From: Ben Greear greearb@candelatech.com
[ Upstream commit 827e7799c61b978fbc2cc9dac66cb62401b2b3f0 ]
If the nic fails to start, it is possible that the reset_work has already been scheduled. Ensure the work item is canceled so we do not have use-after-free crash in case cleanup is called before the work item is executed.
This fixes crash on my x86_64 apu2 when mt7921k radio fails to work. Radio still fails, but OS does not crash.
Signed-off-by: Ben Greear greearb@candelatech.com Signed-off-by: Felix Fietkau nbd@nbd.name Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/mediatek/mt76/mt7921/main.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/main.c b/drivers/net/wireless/mediatek/mt76/mt7921/main.c index 7a8d2596c226..4abb7a6e775a 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7921/main.c +++ b/drivers/net/wireless/mediatek/mt76/mt7921/main.c @@ -273,6 +273,7 @@ static void mt7921_stop(struct ieee80211_hw *hw)
cancel_delayed_work_sync(&dev->pm.ps_work); cancel_work_sync(&dev->pm.wake_work); + cancel_work_sync(&dev->reset_work); mt76_connac_free_pending_tx_skbs(&dev->pm, NULL);
mt7921_mutex_acquire(dev);
From: Lorenzo Bianconi lorenzo@kernel.org
[ Upstream commit 577298ec55dfc8b9aece54520f0258c3f93a6573 ]
Even if it is only a false-positive since skip_buf0/skip_buf1 are only used in mt76_dma_tx_cleanup_idx routine, initialize skip_unmap in mt76_dma_rx_fill in order to fix the following UBSAN report:
[ 13.924906] UBSAN: invalid-load in linux-5.15.0/drivers/net/wireless/mediatek/mt76/dma.c:162:13 [ 13.924909] load of value 225 is not a valid value for type '_Bool' [ 13.924912] CPU: 9 PID: 672 Comm: systemd-udevd Not tainted 5.15.0-18-generic #18-Ubuntu [ 13.924914] Hardware name: LENOVO 21A0000CMX/21A0000CMX, BIOS R1MET43W (1.13 ) 11/05/2021 [ 13.924915] Call Trace: [ 13.924917] <TASK> [ 13.924920] show_stack+0x52/0x58 [ 13.924925] dump_stack_lvl+0x4a/0x5f [ 13.924931] dump_stack+0x10/0x12 [ 13.924932] ubsan_epilogue+0x9/0x45 [ 13.924934] __ubsan_handle_load_invalid_value.cold+0x44/0x49 [ 13.924935] ? __iommu_dma_map+0x84/0xf0 [ 13.924939] mt76_dma_add_buf.constprop.0.cold+0x23/0x85 [mt76] [ 13.924949] mt76_dma_rx_fill.isra.0+0x102/0x1f0 [mt76] [ 13.924954] mt76_dma_init+0xc9/0x150 [mt76] [ 13.924959] ? mt7921_dma_enable+0x110/0x110 [mt7921e] [ 13.924966] mt7921_dma_init+0x1e3/0x260 [mt7921e] [ 13.924970] mt7921_register_device+0x29d/0x510 [mt7921e] [ 13.924975] mt7921_pci_probe.part.0+0x17f/0x1b0 [mt7921e] [ 13.924980] mt7921_pci_probe+0x43/0x60 [mt7921e] [ 13.924984] local_pci_probe+0x4b/0x90 [ 13.924987] pci_device_probe+0x115/0x1f0 [ 13.924989] really_probe+0x21e/0x420 [ 13.924992] __driver_probe_device+0x115/0x190 [ 13.924994] driver_probe_device+0x23/0xc0 [ 13.924996] __driver_attach+0xbd/0x1d0 [ 13.924998] ? __device_attach_driver+0x110/0x110 [ 13.924999] bus_for_each_dev+0x7e/0xc0 [ 13.925001] driver_attach+0x1e/0x20 [ 13.925003] bus_add_driver+0x135/0x200 [ 13.925005] driver_register+0x95/0xf0 [ 13.925008] ? 0xffffffffc0766000 [ 13.925010] __pci_register_driver+0x68/0x70 [ 13.925011] mt7921_pci_driver_init+0x23/0x1000 [mt7921e] [ 13.925015] do_one_initcall+0x48/0x1d0 [ 13.925019] ? kmem_cache_alloc_trace+0x19e/0x2e0 [ 13.925022] do_init_module+0x62/0x280 [ 13.925025] load_module+0xac9/0xbb0 [ 13.925027] __do_sys_finit_module+0xbf/0x120 [ 13.925029] __x64_sys_finit_module+0x18/0x20 [ 13.925030] do_syscall_64+0x5c/0xc0 [ 13.925033] ? do_syscall_64+0x69/0xc0 [ 13.925034] ? sysvec_reschedule_ipi+0x78/0xe0 [ 13.925036] ? asm_sysvec_reschedule_ipi+0xa/0x20 [ 13.925039] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 13.925040] RIP: 0033:0x7fbf2b90f94d [ 13.925045] RSP: 002b:00007ffe2ec7e5d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 13.925047] RAX: ffffffffffffffda RBX: 000056106b0634e0 RCX: 00007fbf2b90f94d [ 13.925048] RDX: 0000000000000000 RSI: 00007fbf2baa3441 RDI: 0000000000000013 [ 13.925049] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000002 [ 13.925050] R10: 0000000000000013 R11: 0000000000000246 R12: 00007fbf2baa3441 [ 13.925051] R13: 000056106b062620 R14: 000056106b0610c0 R15: 000056106b0640d0 [ 13.925053] </TASK>
Signed-off-by: Lorenzo Bianconi lorenzo@kernel.org Signed-off-by: Felix Fietkau nbd@nbd.name Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/mediatek/mt76/dma.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/wireless/mediatek/mt76/dma.c b/drivers/net/wireless/mediatek/mt76/dma.c index 3a9af8931c35..3d644925a4e0 100644 --- a/drivers/net/wireless/mediatek/mt76/dma.c +++ b/drivers/net/wireless/mediatek/mt76/dma.c @@ -465,6 +465,7 @@ mt76_dma_rx_fill(struct mt76_dev *dev, struct mt76_queue *q)
qbuf.addr = addr + offset; qbuf.len = len - offset; + qbuf.skip_unmap = false; mt76_dma_add_buf(dev, q, &qbuf, 1, 0, buf, NULL); frames++; }
From: Jedrzej Jagielski jedrzej.jagielski@intel.com
[ Upstream commit 59b3d7350ff35c939b8e173eb2eecac80a5ee046 ]
Change functions: - i40e_aq_add_macvlan - i40e_aq_remove_macvlan - i40e_aq_delete_element - i40e_aq_add_vsi - i40e_aq_update_vsi_params to explicitly use i40e_asq_send_command_atomic(..., true) instead of i40e_asq_send_command, as they use mutexes and do some work in an atomic context. Without this change setting vlan via netdev will fail with call trace cased by bug "BUG: scheduling while atomic".
Signed-off-by: Witold Fijalkowski witoldx.fijalkowski@intel.com Signed-off-by: Jedrzej Jagielski jedrzej.jagielski@intel.com Tested-by: Gurucharan G gurucharanx.g@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/i40e/i40e_common.c | 21 +++++++++++-------- 1 file changed, 12 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c index 9ddeb015eb7e..e830987a8c6d 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_common.c +++ b/drivers/net/ethernet/intel/i40e/i40e_common.c @@ -1899,8 +1899,9 @@ i40e_status i40e_aq_add_vsi(struct i40e_hw *hw,
desc.flags |= cpu_to_le16((u16)(I40E_AQ_FLAG_BUF | I40E_AQ_FLAG_RD));
- status = i40e_asq_send_command(hw, &desc, &vsi_ctx->info, - sizeof(vsi_ctx->info), cmd_details); + status = i40e_asq_send_command_atomic(hw, &desc, &vsi_ctx->info, + sizeof(vsi_ctx->info), + cmd_details, true);
if (status) goto aq_add_vsi_exit; @@ -2287,8 +2288,9 @@ i40e_status i40e_aq_update_vsi_params(struct i40e_hw *hw,
desc.flags |= cpu_to_le16((u16)(I40E_AQ_FLAG_BUF | I40E_AQ_FLAG_RD));
- status = i40e_asq_send_command(hw, &desc, &vsi_ctx->info, - sizeof(vsi_ctx->info), cmd_details); + status = i40e_asq_send_command_atomic(hw, &desc, &vsi_ctx->info, + sizeof(vsi_ctx->info), + cmd_details, true);
vsi_ctx->vsis_allocated = le16_to_cpu(resp->vsi_used); vsi_ctx->vsis_unallocated = le16_to_cpu(resp->vsi_free); @@ -2673,8 +2675,8 @@ i40e_status i40e_aq_add_macvlan(struct i40e_hw *hw, u16 seid, if (buf_size > I40E_AQ_LARGE_BUF) desc.flags |= cpu_to_le16((u16)I40E_AQ_FLAG_LB);
- status = i40e_asq_send_command(hw, &desc, mv_list, buf_size, - cmd_details); + status = i40e_asq_send_command_atomic(hw, &desc, mv_list, buf_size, + cmd_details, true);
return status; } @@ -2715,8 +2717,8 @@ i40e_status i40e_aq_remove_macvlan(struct i40e_hw *hw, u16 seid, if (buf_size > I40E_AQ_LARGE_BUF) desc.flags |= cpu_to_le16((u16)I40E_AQ_FLAG_LB);
- status = i40e_asq_send_command(hw, &desc, mv_list, buf_size, - cmd_details); + status = i40e_asq_send_command_atomic(hw, &desc, mv_list, buf_size, + cmd_details, true);
return status; } @@ -3868,7 +3870,8 @@ i40e_status i40e_aq_delete_element(struct i40e_hw *hw, u16 seid,
cmd->seid = cpu_to_le16(seid);
- status = i40e_asq_send_command(hw, &desc, NULL, 0, cmd_details); + status = i40e_asq_send_command_atomic(hw, &desc, NULL, 0, + cmd_details, true);
return status; }
From: Avraham Stern avraham.stern@intel.com
[ Upstream commit 5666ee154f4696c011dfa8544aaf5591b6b87515 ]
When adding 6GHz channels to scan request based on reported co-located APs, don't add channels that have only APs with "non-transmitted" BSSes if they only match the wildcard SSID since they will be found by probing the "transmitted" BSS.
Signed-off-by: Avraham Stern avraham.stern@intel.com Signed-off-by: Luca Coelho luciano.coelho@intel.com Link: https://lore.kernel.org/r/iwlwifi.20220202104617.f6ddf099f934.I231e55885d364... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/wireless/scan.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/net/wireless/scan.c b/net/wireless/scan.c index b888522f133b..b2fdac96bab0 100644 --- a/net/wireless/scan.c +++ b/net/wireless/scan.c @@ -700,8 +700,12 @@ static bool cfg80211_find_ssid_match(struct cfg80211_colocated_ap *ap,
for (i = 0; i < request->n_ssids; i++) { /* wildcard ssid in the scan request */ - if (!request->ssids[i].ssid_len) + if (!request->ssids[i].ssid_len) { + if (ap->multi_bss && !ap->transmitted_bssid) + continue; + return true; + }
if (ap->ssid_len && ap->ssid_len == request->ssids[i].ssid_len) { @@ -827,6 +831,9 @@ static int cfg80211_scan_6ghz(struct cfg80211_registered_device *rdev) !cfg80211_find_ssid_match(ap, request)) continue;
+ if (!request->n_ssids && ap->multi_bss && !ap->transmitted_bssid) + continue; + cfg80211_scan_req_add_chan(request, chan, true); memcpy(scan_6ghz_params->bssid, ap->bssid, ETH_ALEN); scan_6ghz_params->short_ssid = ap->short_ssid;
From: Yonghong Song yhs@fb.com
[ Upstream commit 0908a66ad1124c1634c33847ac662106f7f2c198 ]
There are cases where clang compiler is packaged in a way readelf is a symbolic link to llvm-readelf. In such cases, llvm-readelf will be used instead of default binutils readelf, and the following error will appear during libbpf build:
# Warning: Num of global symbols in # /home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/build/libbpf/sharedobjs/libbpf-in.o (367) # does NOT match with num of versioned symbols in # /home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/build/libbpf/libbpf.so libbpf.map (383). # Please make sure all LIBBPF_API symbols are versioned in libbpf.map. # --- /home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/build/libbpf/libbpf_global_syms.tmp ... # +++ /home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/build/libbpf/libbpf_versioned_syms.tmp ... # @@ -324,6 +324,22 @@ # btf__str_by_offset # btf__type_by_id # btf__type_cnt # +LIBBPF_0.0.1 # +LIBBPF_0.0.2 # +LIBBPF_0.0.3 # +LIBBPF_0.0.4 # +LIBBPF_0.0.5 # +LIBBPF_0.0.6 # +LIBBPF_0.0.7 # +LIBBPF_0.0.8 # +LIBBPF_0.0.9 # +LIBBPF_0.1.0 # +LIBBPF_0.2.0 # +LIBBPF_0.3.0 # +LIBBPF_0.4.0 # +LIBBPF_0.5.0 # +LIBBPF_0.6.0 # +LIBBPF_0.7.0 # libbpf_attach_type_by_name # libbpf_find_kernel_btf # libbpf_find_vmlinux_btf_id # make[2]: *** [Makefile:184: check_abi] Error 1 # make[1]: *** [Makefile:140: all] Error 2
The above failure is due to different printouts for some ABS versioned symbols. For example, with the same libbpf.so, $ /bin/readelf --dyn-syms --wide tools/lib/bpf/libbpf.so | grep "LIBBPF" | grep ABS 134: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LIBBPF_0.5.0 202: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LIBBPF_0.6.0 ... $ /opt/llvm/bin/readelf --dyn-syms --wide tools/lib/bpf/libbpf.so | grep "LIBBPF" | grep ABS 134: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LIBBPF_0.5.0@@LIBBPF_0.5.0 202: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LIBBPF_0.6.0@@LIBBPF_0.6.0 ... The binutils readelf doesn't print out the symbol LIBBPF_* version and llvm-readelf does. Such a difference caused libbpf build failure with llvm-readelf.
The proposed fix filters out all ABS symbols as they are not part of the comparison. This works for both binutils readelf and llvm-readelf.
Reported-by: Delyan Kratunov delyank@fb.com Signed-off-by: Yonghong Song yhs@fb.com Signed-off-by: Andrii Nakryiko andrii@kernel.org Link: https://lore.kernel.org/bpf/20220204214355.502108-1-yhs@fb.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/lib/bpf/Makefile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/tools/lib/bpf/Makefile +++ b/tools/lib/bpf/Makefile @@ -131,7 +131,7 @@ GLOBAL_SYM_COUNT = $(shell readelf -s -- sort -u | wc -l) VERSIONED_SYM_COUNT = $(shell readelf --dyn-syms --wide $(OUTPUT)libbpf.so | \ sed 's/[.*]//' | \ - awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {print $$NF}' | \ + awk '/GLOBAL/ && /DEFAULT/ && !/UND|ABS/ {print $$NF}' | \ grep -Eo '[^ ]+@LIBBPF_' | cut -d@ -f1 | sort -u | wc -l)
CMD_TARGETS = $(LIB_TARGET) $(PC_FILE) @@ -194,7 +194,7 @@ check_abi: $(OUTPUT)libbpf.so $(VERSION_ sort -u > $(OUTPUT)libbpf_global_syms.tmp; \ readelf --dyn-syms --wide $(OUTPUT)libbpf.so | \ sed 's/[.*]//' | \ - awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {print $$NF}'| \ + awk '/GLOBAL/ && /DEFAULT/ && !/UND|ABS/ {print $$NF}'| \ grep -Eo '[^ ]+@LIBBPF_' | cut -d@ -f1 | \ sort -u > $(OUTPUT)libbpf_versioned_syms.tmp; \ diff -u $(OUTPUT)libbpf_global_syms.tmp \
From: Eric Dumazet edumazet@google.com
[ Upstream commit 145c7a793838add5e004e7d49a67654dc7eba147 ]
This fixes minor data-races in ip6_mc_input() and batadv_mcast_mla_rtr_flags_softif_get_ipv6()
Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/ipv6.h | 2 +- net/batman-adv/multicast.c | 2 +- net/ipv6/addrconf.c | 4 ++-- net/ipv6/ip6_input.c | 2 +- net/ipv6/ip6mr.c | 8 ++++---- 5 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h index a59d25f19385..b8641dc0ee66 100644 --- a/include/linux/ipv6.h +++ b/include/linux/ipv6.h @@ -51,7 +51,7 @@ struct ipv6_devconf { __s32 use_optimistic; #endif #ifdef CONFIG_IPV6_MROUTE - __s32 mc_forwarding; + atomic_t mc_forwarding; #endif __s32 disable_ipv6; __s32 drop_unicast_in_l2_multicast; diff --git a/net/batman-adv/multicast.c b/net/batman-adv/multicast.c index f4004cf0ff6f..9f311fddfaf9 100644 --- a/net/batman-adv/multicast.c +++ b/net/batman-adv/multicast.c @@ -134,7 +134,7 @@ static u8 batadv_mcast_mla_rtr_flags_softif_get_ipv6(struct net_device *dev) { struct inet6_dev *in6_dev = __in6_dev_get(dev);
- if (in6_dev && in6_dev->cnf.mc_forwarding) + if (in6_dev && atomic_read(&in6_dev->cnf.mc_forwarding)) return BATADV_NO_FLAGS; else return BATADV_MCAST_WANT_NO_RTR6; diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index f908e2fd30b2..4df84013c4e6 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -554,7 +554,7 @@ static int inet6_netconf_fill_devconf(struct sk_buff *skb, int ifindex, #ifdef CONFIG_IPV6_MROUTE if ((all || type == NETCONFA_MC_FORWARDING) && nla_put_s32(skb, NETCONFA_MC_FORWARDING, - devconf->mc_forwarding) < 0) + atomic_read(&devconf->mc_forwarding)) < 0) goto nla_put_failure; #endif if ((all || type == NETCONFA_PROXY_NEIGH) && @@ -5539,7 +5539,7 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf, array[DEVCONF_USE_OPTIMISTIC] = cnf->use_optimistic; #endif #ifdef CONFIG_IPV6_MROUTE - array[DEVCONF_MC_FORWARDING] = cnf->mc_forwarding; + array[DEVCONF_MC_FORWARDING] = atomic_read(&cnf->mc_forwarding); #endif array[DEVCONF_DISABLE_IPV6] = cnf->disable_ipv6; array[DEVCONF_ACCEPT_DAD] = cnf->accept_dad; diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c index 80256717868e..d4b1e2c5aa76 100644 --- a/net/ipv6/ip6_input.c +++ b/net/ipv6/ip6_input.c @@ -508,7 +508,7 @@ int ip6_mc_input(struct sk_buff *skb) /* * IPv6 multicast router mode is now supported ;) */ - if (dev_net(skb->dev)->ipv6.devconf_all->mc_forwarding && + if (atomic_read(&dev_net(skb->dev)->ipv6.devconf_all->mc_forwarding) && !(ipv6_addr_type(&hdr->daddr) & (IPV6_ADDR_LOOPBACK|IPV6_ADDR_LINKLOCAL)) && likely(!(IP6CB(skb)->flags & IP6SKB_FORWARDED))) { diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c index 8a2db926b5eb..e3c884678dbe 100644 --- a/net/ipv6/ip6mr.c +++ b/net/ipv6/ip6mr.c @@ -734,7 +734,7 @@ static int mif6_delete(struct mr_table *mrt, int vifi, int notify,
in6_dev = __in6_dev_get(dev); if (in6_dev) { - in6_dev->cnf.mc_forwarding--; + atomic_dec(&in6_dev->cnf.mc_forwarding); inet6_netconf_notify_devconf(dev_net(dev), RTM_NEWNETCONF, NETCONFA_MC_FORWARDING, dev->ifindex, &in6_dev->cnf); @@ -902,7 +902,7 @@ static int mif6_add(struct net *net, struct mr_table *mrt,
in6_dev = __in6_dev_get(dev); if (in6_dev) { - in6_dev->cnf.mc_forwarding++; + atomic_inc(&in6_dev->cnf.mc_forwarding); inet6_netconf_notify_devconf(dev_net(dev), RTM_NEWNETCONF, NETCONFA_MC_FORWARDING, dev->ifindex, &in6_dev->cnf); @@ -1553,7 +1553,7 @@ static int ip6mr_sk_init(struct mr_table *mrt, struct sock *sk) } else { rcu_assign_pointer(mrt->mroute_sk, sk); sock_set_flag(sk, SOCK_RCU_FREE); - net->ipv6.devconf_all->mc_forwarding++; + atomic_inc(&net->ipv6.devconf_all->mc_forwarding); } write_unlock_bh(&mrt_lock);
@@ -1586,7 +1586,7 @@ int ip6mr_sk_done(struct sock *sk) * so the RCU grace period before sk freeing * is guaranteed by sk_destruct() */ - net->ipv6.devconf_all->mc_forwarding--; + atomic_dec(&net->ipv6.devconf_all->mc_forwarding); write_unlock_bh(&mrt_lock); inet6_netconf_notify_devconf(net, RTM_NEWNETCONF, NETCONFA_MC_FORWARDING,
From: Eric Dumazet edumazet@google.com
[ Upstream commit e3ececfe668facd87d920b608349a32607060e66 ]
Whenever ref_tracker_dir_init() is called, mark the struct ref_tracker_dir as dead.
Test the dead status from ref_tracker_alloc() and ref_tracker_free()
This should detect buggy dev_put()/dev_hold() happening too late in netdevice dismantle process.
Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/ref_tracker.h | 2 ++ lib/ref_tracker.c | 5 +++++ 2 files changed, 7 insertions(+)
diff --git a/include/linux/ref_tracker.h b/include/linux/ref_tracker.h index 60f3453be23e..a443abda937d 100644 --- a/include/linux/ref_tracker.h +++ b/include/linux/ref_tracker.h @@ -13,6 +13,7 @@ struct ref_tracker_dir { spinlock_t lock; unsigned int quarantine_avail; refcount_t untracked; + bool dead; struct list_head list; /* List of active trackers */ struct list_head quarantine; /* List of dead trackers */ #endif @@ -26,6 +27,7 @@ static inline void ref_tracker_dir_init(struct ref_tracker_dir *dir, INIT_LIST_HEAD(&dir->quarantine); spin_lock_init(&dir->lock); dir->quarantine_avail = quarantine_count; + dir->dead = false; refcount_set(&dir->untracked, 1); stack_depot_init(); } diff --git a/lib/ref_tracker.c b/lib/ref_tracker.c index a6789c0c626b..32ff6bd497f8 100644 --- a/lib/ref_tracker.c +++ b/lib/ref_tracker.c @@ -20,6 +20,7 @@ void ref_tracker_dir_exit(struct ref_tracker_dir *dir) unsigned long flags; bool leak = false;
+ dir->dead = true; spin_lock_irqsave(&dir->lock, flags); list_for_each_entry_safe(tracker, n, &dir->quarantine, head) { list_del(&tracker->head); @@ -72,6 +73,8 @@ int ref_tracker_alloc(struct ref_tracker_dir *dir, gfp_t gfp_mask = gfp; unsigned long flags;
+ WARN_ON_ONCE(dir->dead); + if (gfp & __GFP_DIRECT_RECLAIM) gfp_mask |= __GFP_NOFAIL; *trackerp = tracker = kzalloc(sizeof(*tracker), gfp_mask); @@ -100,6 +103,8 @@ int ref_tracker_free(struct ref_tracker_dir *dir, unsigned int nr_entries; unsigned long flags;
+ WARN_ON_ONCE(dir->dead); + if (!tracker) { refcount_dec(&dir->untracked); return -EEXIST;
From: Eric Dumazet edumazet@google.com
[ Upstream commit 9c1be1935fb68b2413796cdc03d019b8cf35ab51 ]
While testing a patch that will follow later ("net: add netns refcount tracker to struct nsproxy") I found that devtmpfs_init() was called before init_net was initialized.
This is a bug, because devtmpfs_setup() calls ksys_unshare(CLONE_NEWNS);
This has the effect of increasing init_net refcount, which will be later overwritten to 1, as part of setup_net(&init_net)
We had too many prior patches [1] trying to work around the root cause.
Really, make sure init_net is in BSS section, and that net_ns_init() is called earlier at boot time.
Note that another patch ("vfs: add netns refcount tracker to struct fs_context") also will need net_ns_init() being called before vfs_caches_init()
As a bonus, this patch saves around 4KB in .data section.
[1]
f8c46cb39079 ("netns: do not call pernet ops for not yet set up init_net namespace") b5082df8019a ("net: Initialise init_net.count to 1") 734b65417b24 ("net: Statically initialize init_net.dev_base_head")
v2: fixed a build error reported by kernel build bots (CONFIG_NET=n)
Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/net_namespace.h | 6 ++++++ init/main.c | 2 ++ net/core/dev.c | 3 +-- net/core/net_namespace.c | 17 +++++------------ 4 files changed, 14 insertions(+), 14 deletions(-)
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index 5b61c462e534..374cc7b260fc 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -513,4 +513,10 @@ static inline void fnhe_genid_bump(struct net *net) atomic_inc(&net->fnhe_genid); }
+#ifdef CONFIG_NET +void net_ns_init(void); +#else +static inline void net_ns_init(void) {} +#endif + #endif /* __NET_NET_NAMESPACE_H */ diff --git a/init/main.c b/init/main.c index 65fa2e41a9c0..ada50f5a15e4 100644 --- a/init/main.c +++ b/init/main.c @@ -99,6 +99,7 @@ #include <linux/kcsan.h> #include <linux/init_syscalls.h> #include <linux/stackdepot.h> +#include <net/net_namespace.h>
#include <asm/io.h> #include <asm/bugs.h> @@ -1116,6 +1117,7 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void) key_init(); security_init(); dbg_late_init(); + net_ns_init(); vfs_caches_init(); pagecache_init(); signals_init(); diff --git a/net/core/dev.c b/net/core/dev.c index 1baab07820f6..91cf709c98b3 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -10732,8 +10732,7 @@ static int __net_init netdev_init(struct net *net) BUILD_BUG_ON(GRO_HASH_BUCKETS > 8 * sizeof_field(struct napi_struct, gro_bitmask));
- if (net != &init_net) - INIT_LIST_HEAD(&net->dev_base_head); + INIT_LIST_HEAD(&net->dev_base_head);
net->dev_name_head = netdev_create_hash(); if (net->dev_name_head == NULL) diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index a5b5bb99c644..212e65add951 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -44,13 +44,7 @@ EXPORT_SYMBOL_GPL(net_rwsem); static struct key_tag init_net_key_domain = { .usage = REFCOUNT_INIT(1) }; #endif
-struct net init_net = { - .ns.count = REFCOUNT_INIT(1), - .dev_base_head = LIST_HEAD_INIT(init_net.dev_base_head), -#ifdef CONFIG_KEYS - .key_domain = &init_net_key_domain, -#endif -}; +struct net init_net; EXPORT_SYMBOL(init_net);
static bool init_net_initialized; @@ -1084,7 +1078,7 @@ static void rtnl_net_notifyid(struct net *net, int cmd, int id, u32 portid, rtnl_set_sk_err(net, RTNLGRP_NSID, err); }
-static int __init net_ns_init(void) +void __init net_ns_init(void) { struct net_generic *ng;
@@ -1105,6 +1099,9 @@ static int __init net_ns_init(void)
rcu_assign_pointer(init_net.gen, ng);
+#ifdef CONFIG_KEYS + init_net.key_domain = &init_net_key_domain; +#endif down_write(&pernet_ops_rwsem); if (setup_net(&init_net, &init_user_ns)) panic("Could not setup the initial network namespace"); @@ -1119,12 +1116,8 @@ static int __init net_ns_init(void) RTNL_FLAG_DOIT_UNLOCKED); rtnl_register(PF_UNSPEC, RTM_GETNSID, rtnl_net_getid, rtnl_net_dumpid, RTNL_FLAG_DOIT_UNLOCKED); - - return 0; }
-pure_initcall(net_ns_init); - static void free_exit_list(struct pernet_operations *ops, struct list_head *net_exit_list) { ops_pre_exit_list(ops, net_exit_list);
From: Sourabh Jain sourabhjain@linux.ibm.com
[ Upstream commit 7c5ed82b800d8615cdda00729e7b62e5899f0b13 ]
On large config LPARs (having 192 and more cores), Linux fails to boot due to insufficient memory in the first memblock. It is due to the memory reservation for the crash kernel which starts at 128MB offset of the first memblock. This memory reservation for the crash kernel doesn't leave enough space in the first memblock to accommodate other essential system resources.
The crash kernel start address was set to 128MB offset by default to ensure that the crash kernel get some memory below the RMA region which is used to be of size 256MB. But given that the RMA region size can be 512MB or more, setting the crash kernel offset to mid of RMA size will leave enough space for the kernel to allocate memory for other system resources.
Since the above crash kernel offset change is only applicable to the LPAR platform, the LPAR feature detection is pushed before the crash kernel reservation. The rest of LPAR specific initialization will still be done during pseries_probe_fw_features as usual.
This patch is dependent on changes to paca allocation for boot CPU. It expect boot CPU to discover 1T segment support which is introduced by the patch posted here: https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-January/239175.html
Reported-by: Abdul haleem abdhalee@linux.vnet.ibm.com Signed-off-by: Sourabh Jain sourabhjain@linux.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20220204085601.107257-1-sourabhjain@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kernel/rtas.c | 6 ++++++ arch/powerpc/kexec/core.c | 15 +++++++++++---- 2 files changed, 17 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 733e6ef36758..1f42aabbbab3 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -1313,6 +1313,12 @@ int __init early_init_dt_scan_rtas(unsigned long node, entryp = of_get_flat_dt_prop(node, "linux,rtas-entry", NULL); sizep = of_get_flat_dt_prop(node, "rtas-size", NULL);
+#ifdef CONFIG_PPC64 + /* need this feature to decide the crashkernel offset */ + if (of_get_flat_dt_prop(node, "ibm,hypertas-functions", NULL)) + powerpc_firmware_features |= FW_FEATURE_LPAR; +#endif + if (basep && entryp && sizep) { rtas.base = *basep; rtas.entry = *entryp; diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c index 8b68d9f91a03..abf5897ae88c 100644 --- a/arch/powerpc/kexec/core.c +++ b/arch/powerpc/kexec/core.c @@ -134,11 +134,18 @@ void __init reserve_crashkernel(void) if (!crashk_res.start) { #ifdef CONFIG_PPC64 /* - * On 64bit we split the RMO in half but cap it at half of - * a small SLB (128MB) since the crash kernel needs to place - * itself and some stacks to be in the first segment. + * On the LPAR platform place the crash kernel to mid of + * RMA size (512MB or more) to ensure the crash kernel + * gets enough space to place itself and some stack to be + * in the first segment. At the same time normal kernel + * also get enough space to allocate memory for essential + * system resource in the first segment. Keep the crash + * kernel starts at 128MB offset on other platforms. */ - crashk_res.start = min(0x8000000ULL, (ppc64_rma_size / 2)); + if (firmware_has_feature(FW_FEATURE_LPAR)) + crashk_res.start = ppc64_rma_size / 2; + else + crashk_res.start = min(0x8000000ULL, (ppc64_rma_size / 2)); #else crashk_res.start = KDUMP_KERNELBASE; #endif
From: Rajneesh Bhardwaj rajneesh.bhardwaj@amd.com
[ Upstream commit 447c7997b62a5115ba4da846dcdee4fc12298a6a ]
Noticed the below warning while running a pytorch workload on vega10 GPUs. Change to trylock to avoid conflicts with already held reservation locks.
[ +0.000003] WARNING: possible recursive locking detected [ +0.000003] 5.13.0-kfd-rajneesh #1030 Not tainted [ +0.000004] -------------------------------------------- [ +0.000002] python/4822 is trying to acquire lock: [ +0.000004] ffff932cd9a259f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000203] but task is already holding lock: [ +0.000003] ffff932cbb7181f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: ttm_eu_reserve_buffers+0x270/0x470 [ttm] [ +0.000017] other info that might help us debug this: [ +0.000002] Possible unsafe locking scenario:
[ +0.000003] CPU0 [ +0.000002] ---- [ +0.000002] lock(reservation_ww_class_mutex); [ +0.000004] lock(reservation_ww_class_mutex); [ +0.000003] *** DEADLOCK ***
[ +0.000002] May be due to missing lock nesting notation
[ +0.000003] 7 locks held by python/4822: [ +0.000003] #0: ffff932c4ac028d0 (&process->mutex){+.+.}-{3:3}, at: kfd_ioctl_map_memory_to_gpu+0x10b/0x320 [amdgpu] [ +0.000232] #1: ffff932c55e830a8 (&info->lock#2){+.+.}-{3:3}, at: amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x64/0xf60 [amdgpu] [ +0.000241] #2: ffff932cc45b5e68 (&(*mem)->lock){+.+.}-{3:3}, at: amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0xdf/0xf60 [amdgpu] [ +0.000236] #3: ffffb2b35606fd28 (reservation_ww_class_acquire){+.+.}-{0:0}, at: amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x232/0xf60 [amdgpu] [ +0.000235] #4: ffff932cbb7181f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: ttm_eu_reserve_buffers+0x270/0x470 [ttm] [ +0.000015] #5: ffffffffc045f700 (*(sspp++)){....}-{0:0}, at: drm_dev_enter+0x5/0xa0 [drm] [ +0.000038] #6: ffff932c52da7078 (&vm->eviction_lock){+.+.}-{3:3}, at: amdgpu_vm_bo_update_mapping+0xd5/0x4f0 [amdgpu] [ +0.000195] stack backtrace: [ +0.000003] CPU: 11 PID: 4822 Comm: python Not tainted 5.13.0-kfd-rajneesh #1030 [ +0.000005] Hardware name: GIGABYTE MZ01-CE0-00/MZ01-CE0-00, BIOS F02 08/29/2018 [ +0.000003] Call Trace: [ +0.000003] dump_stack+0x6d/0x89 [ +0.000010] __lock_acquire+0xb93/0x1a90 [ +0.000009] lock_acquire+0x25d/0x2d0 [ +0.000005] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000184] ? lock_is_held_type+0xa2/0x110 [ +0.000006] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000184] __ww_mutex_lock.constprop.17+0xca/0x1060 [ +0.000007] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000183] ? lock_release+0x13f/0x270 [ +0.000005] ? lock_is_held_type+0xa2/0x110 [ +0.000006] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000183] amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000185] ttm_bo_release+0x4c6/0x580 [ttm] [ +0.000010] amdgpu_bo_unref+0x1a/0x30 [amdgpu] [ +0.000183] amdgpu_vm_free_table+0x76/0xa0 [amdgpu] [ +0.000189] amdgpu_vm_free_pts+0xb8/0xf0 [amdgpu] [ +0.000189] amdgpu_vm_update_ptes+0x411/0x770 [amdgpu] [ +0.000191] amdgpu_vm_bo_update_mapping+0x324/0x4f0 [amdgpu] [ +0.000191] amdgpu_vm_bo_update+0x251/0x610 [amdgpu] [ +0.000191] update_gpuvm_pte+0xcc/0x290 [amdgpu] [ +0.000229] ? amdgpu_vm_bo_map+0xd7/0x130 [amdgpu] [ +0.000190] amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x912/0xf60 [amdgpu] [ +0.000234] kfd_ioctl_map_memory_to_gpu+0x182/0x320 [amdgpu] [ +0.000218] kfd_ioctl+0x2b9/0x600 [amdgpu] [ +0.000216] ? kfd_ioctl_unmap_memory_from_gpu+0x270/0x270 [amdgpu] [ +0.000216] ? lock_release+0x13f/0x270 [ +0.000006] ? __fget_files+0x107/0x1e0 [ +0.000007] __x64_sys_ioctl+0x8b/0xd0 [ +0.000007] do_syscall_64+0x36/0x70 [ +0.000004] entry_SYSCALL_64_after_hwframe+0x44/0xae [ +0.000007] RIP: 0033:0x7fbff90a7317 [ +0.000004] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48 [ +0.000005] RSP: 002b:00007fbe301fe648 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.000006] RAX: ffffffffffffffda RBX: 00007fbcc402d820 RCX: 00007fbff90a7317 [ +0.000003] RDX: 00007fbe301fe690 RSI: 00000000c0184b18 RDI: 0000000000000004 [ +0.000003] RBP: 00007fbe301fe690 R08: 0000000000000000 R09: 00007fbcc402d880 [ +0.000003] R10: 0000000002001000 R11: 0000000000000246 R12: 00000000c0184b18 [ +0.000003] R13: 0000000000000004 R14: 00007fbf689593a0 R15: 00007fbcc402d820
Cc: Christian König christian.koenig@amd.com Cc: Felix Kuehling Felix.Kuehling@amd.com Cc: Alex Deucher Alexander.Deucher@amd.com
Reviewed-by: Christian König christian.koenig@amd.com Reviewed-by: Felix Kuehling Felix.Kuehling@amd.com Signed-off-by: Rajneesh Bhardwaj rajneesh.bhardwaj@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 5661b82d84d4..dda53fe30975 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -1303,7 +1303,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo) !(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE)) return;
- dma_resv_lock(bo->base.resv, NULL); + if (WARN_ON_ONCE(!dma_resv_trylock(bo->base.resv))) + return;
r = amdgpu_fill_buffer(abo, AMDGPU_POISON, bo->base.resv, &fence); if (!WARN_ON(r)) {
From: Don Brace don.brace@microchip.com
[ Upstream commit c4ff687d25c05919382a759503bd3821689f4e2f ]
Prevent "BUG: scheduling while atomic: rmmod" stack trace.
Stop setting spin_locks before calling OS functions to remove devices.
Link: https://lore.kernel.org/r/164375207296.440833.4996145011193819683.stgit@brun... Reviewed-by: Scott Benesh scott.benesh@microchip.com Reviewed-by: Scott Teel scott.teel@microchip.com Reviewed-by: Kevin Barnett kevin.barnett@microchip.com Signed-off-by: Don Brace don.brace@microchip.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/smartpqi/smartpqi_init.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/scsi/smartpqi/smartpqi_init.c b/drivers/scsi/smartpqi/smartpqi_init.c index f0897d587454..2db9f874cc51 100644 --- a/drivers/scsi/smartpqi/smartpqi_init.c +++ b/drivers/scsi/smartpqi/smartpqi_init.c @@ -2513,17 +2513,15 @@ static void pqi_remove_all_scsi_devices(struct pqi_ctrl_info *ctrl_info) struct pqi_scsi_dev *device; struct pqi_scsi_dev *next;
- spin_lock_irqsave(&ctrl_info->scsi_device_list_lock, flags); - list_for_each_entry_safe(device, next, &ctrl_info->scsi_device_list, scsi_device_list_entry) { if (pqi_is_device_added(device)) pqi_remove_device(ctrl_info, device); + spin_lock_irqsave(&ctrl_info->scsi_device_list_lock, flags); list_del(&device->scsi_device_list_entry); pqi_free_device(device); + spin_unlock_irqrestore(&ctrl_info->scsi_device_list_lock, flags); } - - spin_unlock_irqrestore(&ctrl_info->scsi_device_list_lock, flags); }
static int pqi_scan_scsi_devices(struct pqi_ctrl_info *ctrl_info)
From: Mahesh Rajashekhara mahesh.rajashekhara@microchip.com
[ Upstream commit 3ada501d602abf02353445c03bb3258146445d90 ]
Avoid dropping into shell if the controller is in locked up state.
Driver issues SIS soft reset to bring back the controller to SIS mode while OS boots into kdump mode.
If the controller is in lockup state, SIS soft reset does not work.
Since the controller lockup code has not been cleared, driver considers the firmware is no longer up and running. Driver returns back an error code to OS and the kdump fails.
Link: https://lore.kernel.org/r/164375212337.440833.11955356190354940369.stgit@bru... Reviewed-by: Kevin Barnett kevin.barnett@microchip.com Reviewed-by: Scott Benesh scott.benesh@microchip.com Reviewed-by: Scott Teel scott.teel@microchip.com Signed-off-by: Mahesh Rajashekhara mahesh.rajashekhara@microchip.com Signed-off-by: Don Brace don.brace@microchip.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/smartpqi/smartpqi_init.c | 39 ++++++++++++++++----------- 1 file changed, 23 insertions(+), 16 deletions(-)
diff --git a/drivers/scsi/smartpqi/smartpqi_init.c b/drivers/scsi/smartpqi/smartpqi_init.c index 2db9f874cc51..f3749e508673 100644 --- a/drivers/scsi/smartpqi/smartpqi_init.c +++ b/drivers/scsi/smartpqi/smartpqi_init.c @@ -7855,6 +7855,21 @@ static int pqi_force_sis_mode(struct pqi_ctrl_info *ctrl_info) return pqi_revert_to_sis_mode(ctrl_info); }
+static void pqi_perform_lockup_action(void) +{ + switch (pqi_lockup_action) { + case PANIC: + panic("FATAL: Smart Family Controller lockup detected"); + break; + case REBOOT: + emergency_restart(); + break; + case NONE: + default: + break; + } +} + static int pqi_ctrl_init(struct pqi_ctrl_info *ctrl_info) { int rc; @@ -7879,8 +7894,15 @@ static int pqi_ctrl_init(struct pqi_ctrl_info *ctrl_info) * commands. */ rc = sis_wait_for_ctrl_ready(ctrl_info); - if (rc) + if (rc) { + if (reset_devices) { + dev_err(&ctrl_info->pci_dev->dev, + "kdump init failed with error %d\n", rc); + pqi_lockup_action = REBOOT; + pqi_perform_lockup_action(); + } return rc; + }
/* * Get the controller properties. This allows us to determine @@ -8605,21 +8627,6 @@ static int pqi_ofa_ctrl_restart(struct pqi_ctrl_info *ctrl_info, unsigned int de return pqi_ctrl_init_resume(ctrl_info); }
-static void pqi_perform_lockup_action(void) -{ - switch (pqi_lockup_action) { - case PANIC: - panic("FATAL: Smart Family Controller lockup detected"); - break; - case REBOOT: - emergency_restart(); - break; - case NONE: - default: - break; - } -} - static struct pqi_raid_error_info pqi_ctrl_offline_raid_error_info = { .data_out_result = PQI_DATA_IN_OUT_HARDWARE_ERROR, .status = SAM_STAT_CHECK_CONDITION,
From: Pali Rohár pali@kernel.org
[ Upstream commit b0b0b8b897f8e12b2368e868bd7cdc5742d5c5a9 ]
Aardvark hardware supports Multi-MSI and MSI_FLAG_MULTI_PCI_MSI is already set for the MSI chip. But when allocating MSI interrupt numbers for Multi-MSI, the numbers need to be properly aligned, otherwise endpoint devices send MSI interrupt with incorrect numbers.
Fix this issue by using function bitmap_find_free_region() instead of bitmap_find_next_zero_area().
To ensure that aligned MSI interrupt numbers are used by endpoint devices, we cannot use Linux virtual irq numbers (as they are random and not properly aligned). Instead we need to use the aligned hwirq numbers.
This change fixes receiving MSI interrupts on Armada 3720 boards and allows using NVMe disks which use Multi-MSI feature with 3 interrupts.
Without this NVMe disks freeze booting as linux nvme-core.c is waiting 60s for an interrupt.
Link: https://lore.kernel.org/r/20220110015018.26359-4-kabel@kernel.org Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pci/controller/pci-aardvark.c | 16 ++++++---------- 1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c index 82e2c618d532..15348be1a8aa 100644 --- a/drivers/pci/controller/pci-aardvark.c +++ b/drivers/pci/controller/pci-aardvark.c @@ -1186,7 +1186,7 @@ static void advk_msi_irq_compose_msi_msg(struct irq_data *data,
msg->address_lo = lower_32_bits(msi_msg); msg->address_hi = upper_32_bits(msi_msg); - msg->data = data->irq; + msg->data = data->hwirq; }
static int advk_msi_set_affinity(struct irq_data *irq_data, @@ -1203,15 +1203,11 @@ static int advk_msi_irq_domain_alloc(struct irq_domain *domain, int hwirq, i;
mutex_lock(&pcie->msi_used_lock); - hwirq = bitmap_find_next_zero_area(pcie->msi_used, MSI_IRQ_NUM, - 0, nr_irqs, 0); - if (hwirq >= MSI_IRQ_NUM) { - mutex_unlock(&pcie->msi_used_lock); - return -ENOSPC; - } - - bitmap_set(pcie->msi_used, hwirq, nr_irqs); + hwirq = bitmap_find_free_region(pcie->msi_used, MSI_IRQ_NUM, + order_base_2(nr_irqs)); mutex_unlock(&pcie->msi_used_lock); + if (hwirq < 0) + return -ENOSPC;
for (i = 0; i < nr_irqs; i++) irq_domain_set_info(domain, virq + i, hwirq + i, @@ -1229,7 +1225,7 @@ static void advk_msi_irq_domain_free(struct irq_domain *domain, struct advk_pcie *pcie = domain->host_data;
mutex_lock(&pcie->msi_used_lock); - bitmap_clear(pcie->msi_used, d->hwirq, nr_irqs); + bitmap_release_region(pcie->msi_used, d->hwirq, order_base_2(nr_irqs)); mutex_unlock(&pcie->msi_used_lock); }
From: Ricardo Koller ricarkol@google.com
[ Upstream commit cc94d47ce16d4147d546e47c8248e8bd12ba5fe5 ]
The val argument in gicv3_access_reg can have any value when used for a read, not necessarily 0. Fix the assert by checking val only for writes.
Signed-off-by: Ricardo Koller ricarkol@google.com Reported-by: Reiji Watanabe reijiw@google.com Cc: Andrew Jones drjones@redhat.com Reviewed-by: Andrew Jones drjones@redhat.com Signed-off-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20220127030858.3269036-2-ricarkol@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/kvm/lib/aarch64/gic_v3.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/lib/aarch64/gic_v3.c b/tools/testing/selftests/kvm/lib/aarch64/gic_v3.c index 00f613c0583c..e4945fe66620 100644 --- a/tools/testing/selftests/kvm/lib/aarch64/gic_v3.c +++ b/tools/testing/selftests/kvm/lib/aarch64/gic_v3.c @@ -159,7 +159,7 @@ static void gicv3_access_reg(uint32_t intid, uint64_t offset, uint32_t cpu_or_dist;
GUEST_ASSERT(bits_per_field <= reg_bits); - GUEST_ASSERT(*val < (1U << bits_per_field)); + GUEST_ASSERT(!write || *val < (1U << bits_per_field)); /* Some registers like IROUTER are 64 bit long. Those are currently not * supported by readl nor writel, so just asserting here until then. */
From: Ricardo Koller ricarkol@google.com
[ Upstream commit 11024a7a0ac26dd31ddfa0f6590e158bdf9ab858 ]
The guest in vgic_irq gets its arguments in a struct. This struct used to fit nicely in a single register so vcpu_args_set() was able to pass it by value by setting x0 with it. Unfortunately, this args struct grew after some commits and some guest args became random (specically kvm_supports_irqfd).
Fix this by passing the guest args as a pointer (after allocating some guest memory for it).
Signed-off-by: Ricardo Koller ricarkol@google.com Reported-by: Reiji Watanabe reijiw@google.com Cc: Andrew Jones drjones@redhat.com Reviewed-by: Andrew Jones drjones@redhat.com Signed-off-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20220127030858.3269036-3-ricarkol@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../testing/selftests/kvm/aarch64/vgic_irq.c | 29 ++++++++++--------- 1 file changed, 16 insertions(+), 13 deletions(-)
diff --git a/tools/testing/selftests/kvm/aarch64/vgic_irq.c b/tools/testing/selftests/kvm/aarch64/vgic_irq.c index 7eca97799917..7f3afee5cc00 100644 --- a/tools/testing/selftests/kvm/aarch64/vgic_irq.c +++ b/tools/testing/selftests/kvm/aarch64/vgic_irq.c @@ -472,10 +472,10 @@ static void test_restore_active(struct test_args *args, struct kvm_inject_desc * guest_restore_active(args, MIN_SPI, 4, f->cmd); }
-static void guest_code(struct test_args args) +static void guest_code(struct test_args *args) { - uint32_t i, nr_irqs = args.nr_irqs; - bool level_sensitive = args.level_sensitive; + uint32_t i, nr_irqs = args->nr_irqs; + bool level_sensitive = args->level_sensitive; struct kvm_inject_desc *f, *inject_fns;
gic_init(GIC_V3, 1, dist, redist); @@ -484,11 +484,11 @@ static void guest_code(struct test_args args) gic_irq_enable(i);
for (i = MIN_SPI; i < nr_irqs; i++) - gic_irq_set_config(i, !args.level_sensitive); + gic_irq_set_config(i, !level_sensitive);
- gic_set_eoi_split(args.eoi_split); + gic_set_eoi_split(args->eoi_split);
- reset_priorities(&args); + reset_priorities(args); gic_set_priority_mask(CPU_PRIO_MASK);
inject_fns = level_sensitive ? inject_level_fns @@ -497,17 +497,17 @@ static void guest_code(struct test_args args) local_irq_enable();
/* Start the tests. */ - for_each_supported_inject_fn(&args, inject_fns, f) { - test_injection(&args, f); - test_preemption(&args, f); - test_injection_failure(&args, f); + for_each_supported_inject_fn(args, inject_fns, f) { + test_injection(args, f); + test_preemption(args, f); + test_injection_failure(args, f); }
/* Restore the active state of IRQs. This would happen when live * migrating IRQs in the middle of being handled. */ - for_each_supported_activate_fn(&args, set_active_fns, f) - test_restore_active(&args, f); + for_each_supported_activate_fn(args, set_active_fns, f) + test_restore_active(args, f);
GUEST_DONE(); } @@ -739,6 +739,7 @@ static void test_vgic(uint32_t nr_irqs, bool level_sensitive, bool eoi_split) int gic_fd; struct kvm_vm *vm; struct kvm_inject_args inject_args; + vm_vaddr_t args_gva;
struct test_args args = { .nr_irqs = nr_irqs, @@ -757,7 +758,9 @@ static void test_vgic(uint32_t nr_irqs, bool level_sensitive, bool eoi_split) vcpu_init_descriptor_tables(vm, VCPU_ID);
/* Setup the guest args page (so it gets the args). */ - vcpu_args_set(vm, 0, 1, args); + args_gva = vm_vaddr_alloc_page(vm); + memcpy(addr_gva2hva(vm, args_gva), &args, sizeof(args)); + vcpu_args_set(vm, 0, 1, args_gva);
gic_fd = vgic_v3_setup(vm, 1, nr_irqs, GICD_BASE_GPA, GICR_BASE_GPA);
From: Ricardo Koller ricarkol@google.com
[ Upstream commit 5b7898648f02083012900e48d063e51ccbdad165 ]
kvm_set_gsi_routing_irqchip_check(expect_failure=true) is used to check the error code returned by the kernel when trying to setup an invalid gsi routing table. The ioctl fails if "pin >= KVM_IRQCHIP_NUM_PINS", so kvm_set_gsi_routing_irqchip_check() should test the error only when "intid >= KVM_IRQCHIP_NUM_PINS+32". The issue is that the test check is "intid >= KVM_IRQCHIP_NUM_PINS", so for a case like "intid = KVM_IRQCHIP_NUM_PINS" the test wrongly assumes that the kernel will return an error. Fix this by using the right check.
Signed-off-by: Ricardo Koller ricarkol@google.com Reported-by: Reiji Watanabe reijiw@google.com Cc: Andrew Jones drjones@redhat.com Signed-off-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20220127030858.3269036-4-ricarkol@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/kvm/aarch64/vgic_irq.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/aarch64/vgic_irq.c b/tools/testing/selftests/kvm/aarch64/vgic_irq.c index 7f3afee5cc00..48e43e24d240 100644 --- a/tools/testing/selftests/kvm/aarch64/vgic_irq.c +++ b/tools/testing/selftests/kvm/aarch64/vgic_irq.c @@ -573,8 +573,8 @@ static void kvm_set_gsi_routing_irqchip_check(struct kvm_vm *vm, kvm_gsi_routing_write(vm, routing); } else { ret = _kvm_gsi_routing_write(vm, routing); - /* The kernel only checks for KVM_IRQCHIP_NUM_PINS. */ - if (intid >= KVM_IRQCHIP_NUM_PINS) + /* The kernel only checks e->irqchip.pin >= KVM_IRQCHIP_NUM_PINS */ + if (((uint64_t)intid + num - 1 - MIN_SPI) >= KVM_IRQCHIP_NUM_PINS) TEST_ASSERT(ret != 0 && errno == EINVAL, "Bad intid %u did not cause KVM_SET_GSI_ROUTING " "error: rc: %i errno: %i", intid, ret, errno);
From: Ricardo Koller ricarkol@google.com
[ Upstream commit a5cd38fd9c47b23abc6df08d6ee6a71b39038185 ]
Fix the formatting of some comments and the wording of one of them (in gicv3_access_reg).
Signed-off-by: Ricardo Koller ricarkol@google.com Reported-by: Reiji Watanabe reijiw@google.com Cc: Andrew Jones drjones@redhat.com Reviewed-by: Andrew Jones drjones@redhat.com Signed-off-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20220127030858.3269036-5-ricarkol@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/kvm/aarch64/vgic_irq.c | 12 ++++++++---- tools/testing/selftests/kvm/lib/aarch64/gic_v3.c | 10 ++++++---- tools/testing/selftests/kvm/lib/aarch64/vgic.c | 3 ++- 3 files changed, 16 insertions(+), 9 deletions(-)
diff --git a/tools/testing/selftests/kvm/aarch64/vgic_irq.c b/tools/testing/selftests/kvm/aarch64/vgic_irq.c index 48e43e24d240..554ca649d470 100644 --- a/tools/testing/selftests/kvm/aarch64/vgic_irq.c +++ b/tools/testing/selftests/kvm/aarch64/vgic_irq.c @@ -306,7 +306,8 @@ static void guest_restore_active(struct test_args *args, uint32_t prio, intid, ap1r; int i;
- /* Set the priorities of the first (KVM_NUM_PRIOS - 1) IRQs + /* + * Set the priorities of the first (KVM_NUM_PRIOS - 1) IRQs * in descending order, so intid+1 can preempt intid. */ for (i = 0, prio = (num - 1) * 8; i < num; i++, prio -= 8) { @@ -315,7 +316,8 @@ static void guest_restore_active(struct test_args *args, gic_set_priority(intid, prio); }
- /* In a real migration, KVM would restore all GIC state before running + /* + * In a real migration, KVM would restore all GIC state before running * guest code. */ for (i = 0; i < num; i++) { @@ -503,7 +505,8 @@ static void guest_code(struct test_args *args) test_injection_failure(args, f); }
- /* Restore the active state of IRQs. This would happen when live + /* + * Restore the active state of IRQs. This would happen when live * migrating IRQs in the middle of being handled. */ for_each_supported_activate_fn(args, set_active_fns, f) @@ -844,7 +847,8 @@ int main(int argc, char **argv) } }
- /* If the user just specified nr_irqs and/or gic_version, then run all + /* + * If the user just specified nr_irqs and/or gic_version, then run all * combinations. */ if (default_args) { diff --git a/tools/testing/selftests/kvm/lib/aarch64/gic_v3.c b/tools/testing/selftests/kvm/lib/aarch64/gic_v3.c index e4945fe66620..263bf3ed8fd5 100644 --- a/tools/testing/selftests/kvm/lib/aarch64/gic_v3.c +++ b/tools/testing/selftests/kvm/lib/aarch64/gic_v3.c @@ -19,7 +19,7 @@ struct gicv3_data { unsigned int nr_spis; };
-#define sgi_base_from_redist(redist_base) (redist_base + SZ_64K) +#define sgi_base_from_redist(redist_base) (redist_base + SZ_64K) #define DIST_BIT (1U << 31)
enum gicv3_intid_range { @@ -105,7 +105,8 @@ static void gicv3_set_eoi_split(bool split) { uint32_t val;
- /* All other fields are read-only, so no need to read CTLR first. In + /* + * All other fields are read-only, so no need to read CTLR first. In * fact, the kernel does the same. */ val = split ? (1U << 1) : 0; @@ -160,8 +161,9 @@ static void gicv3_access_reg(uint32_t intid, uint64_t offset,
GUEST_ASSERT(bits_per_field <= reg_bits); GUEST_ASSERT(!write || *val < (1U << bits_per_field)); - /* Some registers like IROUTER are 64 bit long. Those are currently not - * supported by readl nor writel, so just asserting here until then. + /* + * This function does not support 64 bit accesses. Just asserting here + * until we implement readq/writeq. */ GUEST_ASSERT(reg_bits == 32);
diff --git a/tools/testing/selftests/kvm/lib/aarch64/vgic.c b/tools/testing/selftests/kvm/lib/aarch64/vgic.c index f5cd0c536d85..7c876ccf9294 100644 --- a/tools/testing/selftests/kvm/lib/aarch64/vgic.c +++ b/tools/testing/selftests/kvm/lib/aarch64/vgic.c @@ -152,7 +152,8 @@ static void vgic_poke_irq(int gic_fd, uint32_t intid, attr += SZ_64K; }
- /* All calls will succeed, even with invalid intid's, as long as the + /* + * All calls will succeed, even with invalid intid's, as long as the * addr part of the attr is within 32 bits (checked above). An invalid * intid will just make the read/writes point to above the intended * register space (i.e., ICPENDR after ISPENDR).
From: Ricardo Koller ricarkol@google.com
[ Upstream commit b53de63a89244c196d8a2ea76b6754e3fdb4b626 ]
vgic_poke_irq() checks that the attr argument passed to the vgic device ioctl is sane. Make this check tighter by moving it to after the last attr update.
Signed-off-by: Ricardo Koller ricarkol@google.com Reported-by: Reiji Watanabe reijiw@google.com Cc: Andrew Jones drjones@redhat.com Reviewed-by: Andrew Jones drjones@redhat.com Signed-off-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20220127030858.3269036-6-ricarkol@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/kvm/lib/aarch64/vgic.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/aarch64/vgic.c b/tools/testing/selftests/kvm/lib/aarch64/vgic.c index 7c876ccf9294..5d45046c1b80 100644 --- a/tools/testing/selftests/kvm/lib/aarch64/vgic.c +++ b/tools/testing/selftests/kvm/lib/aarch64/vgic.c @@ -140,9 +140,6 @@ static void vgic_poke_irq(int gic_fd, uint32_t intid, uint64_t val; bool intid_is_private = INTID_IS_SGI(intid) || INTID_IS_PPI(intid);
- /* Check that the addr part of the attr is within 32 bits. */ - assert(attr <= KVM_DEV_ARM_VGIC_OFFSET_MASK); - uint32_t group = intid_is_private ? KVM_DEV_ARM_VGIC_GRP_REDIST_REGS : KVM_DEV_ARM_VGIC_GRP_DIST_REGS;
@@ -152,6 +149,9 @@ static void vgic_poke_irq(int gic_fd, uint32_t intid, attr += SZ_64K; }
+ /* Check that the addr part of the attr is within 32 bits. */ + assert((attr & ~KVM_DEV_ARM_VGIC_OFFSET_MASK) == 0); + /* * All calls will succeed, even with invalid intid's, as long as the * addr part of the attr is within 32 bits (checked above). An invalid
From: Zhou Guanghui zhouguanghui1@huawei.com
[ Upstream commit 30de2b541af98179780054836b48825fcfba4408 ]
During event processing, events are read from the event queue one by one until the queue is empty.If the master device continuously requests address access at the same time and the SMMU generates events, the cyclic processing of the event takes a long time and softlockup warnings may be reported.
arm-smmu-v3 arm-smmu-v3.34.auto: event 0x0a received: arm-smmu-v3 arm-smmu-v3.34.auto: 0x00007f220000280a arm-smmu-v3 arm-smmu-v3.34.auto: 0x000010000000007e arm-smmu-v3 arm-smmu-v3.34.auto: 0x00000000034e8670 watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [irq/268-arm-smm:247] Call trace: _dev_info+0x7c/0xa0 arm_smmu_evtq_thread+0x1c0/0x230 irq_thread_fn+0x30/0x80 irq_thread+0x128/0x210 kthread+0x134/0x138 ret_from_fork+0x10/0x1c Kernel panic - not syncing: softlockup: hung tasks
Fix this by calling cond_resched() after the event information is printed.
Signed-off-by: Zhou Guanghui zhouguanghui1@huawei.com Link: https://lore.kernel.org/r/20220119070754.26528-1-zhouguanghui1@huawei.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 6dc6d8b6b368..f60381cdf1c4 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1558,6 +1558,7 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev) dev_info(smmu->dev, "\t0x%016llx\n", (unsigned long long)evt[i]);
+ cond_resched(); }
/*
From: Neal Liu neal_liu@aspeedtech.com
[ Upstream commit c3c9cee592828528fd228b01d312c7526c584a42 ]
Enable Aspeed quirks in commit 7f2d73788d90 ("usb: ehci: handshake CMD_RUN instead of STS_HALT") to support Aspeed ehci-pci device.
Acked-by: Alan Stern stern@rowland.harvard.edu Signed-off-by: Neal Liu neal_liu@aspeedtech.com Link: https://lore.kernel.org/r/20220208101657.76459-1-neal_liu@aspeedtech.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/host/ehci-pci.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/drivers/usb/host/ehci-pci.c b/drivers/usb/host/ehci-pci.c index e87cf3a00fa4..638f03b89739 100644 --- a/drivers/usb/host/ehci-pci.c +++ b/drivers/usb/host/ehci-pci.c @@ -21,6 +21,9 @@ static const char hcd_name[] = "ehci-pci"; /* defined here to avoid adding to pci_ids.h for single instance use */ #define PCI_DEVICE_ID_INTEL_CE4100_USB 0x2e70
+#define PCI_VENDOR_ID_ASPEED 0x1a03 +#define PCI_DEVICE_ID_ASPEED_EHCI 0x2603 + /*-------------------------------------------------------------------------*/ #define PCI_DEVICE_ID_INTEL_QUARK_X1000_SOC 0x0939 static inline bool is_intel_quark_x1000(struct pci_dev *pdev) @@ -222,6 +225,12 @@ static int ehci_pci_setup(struct usb_hcd *hcd) ehci->has_synopsys_hc_bug = 1; } break; + case PCI_VENDOR_ID_ASPEED: + if (pdev->device == PCI_DEVICE_ID_ASPEED_EHCI) { + ehci_info(ehci, "applying Aspeed HC workaround\n"); + ehci->is_aspeed = 1; + } + break; }
/* optional debug port, normally in the first BAR */
From: Marc Zyngier maz@kernel.org
[ Upstream commit 5177fe91e4cf78a659aada2c9cf712db4d788481 ]
Userspace can specify which events a guest is allowed to use with the KVM_ARM_VCPU_PMU_V3_FILTER attribute. The list of allowed events can be identified by a guest from reading the PMCEID{0,1}_EL0 registers.
Changing the PMU event filter after a VCPU has run can cause reads of the registers performed before the filter is changed to return different values than reads performed with the new event filter in place. The architecture defines the two registers as read-only, and this behaviour contradicts that.
Keep track when the first VCPU has run and deny changes to the PMU event filter to prevent this from happening.
Signed-off-by: Marc Zyngier maz@kernel.org [ Alexandru E: Added commit message, updated ioctl documentation ] Signed-off-by: Alexandru Elisei alexandru.elisei@arm.com Signed-off-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20220127161759.53553-2-alexandru.elisei@arm.com Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/virt/kvm/devices/vcpu.rst | 2 +- arch/arm64/include/asm/kvm_host.h | 1 + arch/arm64/kvm/arm.c | 4 +++ arch/arm64/kvm/pmu-emul.c | 33 +++++++++++++++---------- 4 files changed, 26 insertions(+), 14 deletions(-)
diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst index 60a29972d3f1..d063aaee5bb7 100644 --- a/Documentation/virt/kvm/devices/vcpu.rst +++ b/Documentation/virt/kvm/devices/vcpu.rst @@ -70,7 +70,7 @@ irqchip. -ENODEV PMUv3 not supported or GIC not initialized -ENXIO PMUv3 not properly configured or in-kernel irqchip not configured as required prior to calling this attribute - -EBUSY PMUv3 already initialized + -EBUSY PMUv3 already initialized or a VCPU has already run -EINVAL Invalid filter range ======= ======================================================
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 031e3a2537fc..8234626a945a 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -136,6 +136,7 @@ struct kvm_arch {
/* Memory Tagging Extension enabled for the guest */ bool mte_enabled; + bool ran_once; };
struct kvm_vcpu_fault_info { diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 4dca6ffd03d4..85a2a75f4498 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -634,6 +634,10 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu) if (kvm_vm_is_protected(kvm)) kvm_call_hyp_nvhe(__pkvm_vcpu_init_traps, vcpu);
+ mutex_lock(&kvm->lock); + kvm->arch.ran_once = true; + mutex_unlock(&kvm->lock); + return ret; }
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c index fbcfd4ec6f92..bc771bc1a041 100644 --- a/arch/arm64/kvm/pmu-emul.c +++ b/arch/arm64/kvm/pmu-emul.c @@ -924,6 +924,8 @@ static bool pmu_irq_is_valid(struct kvm *kvm, int irq)
int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr) { + struct kvm *kvm = vcpu->kvm; + if (!kvm_vcpu_has_pmu(vcpu)) return -ENODEV;
@@ -941,7 +943,7 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr) int __user *uaddr = (int __user *)(long)attr->addr; int irq;
- if (!irqchip_in_kernel(vcpu->kvm)) + if (!irqchip_in_kernel(kvm)) return -EINVAL;
if (get_user(irq, uaddr)) @@ -951,7 +953,7 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr) if (!(irq_is_ppi(irq) || irq_is_spi(irq))) return -EINVAL;
- if (!pmu_irq_is_valid(vcpu->kvm, irq)) + if (!pmu_irq_is_valid(kvm, irq)) return -EINVAL;
if (kvm_arm_pmu_irq_initialized(vcpu)) @@ -966,7 +968,7 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr) struct kvm_pmu_event_filter filter; int nr_events;
- nr_events = kvm_pmu_event_mask(vcpu->kvm) + 1; + nr_events = kvm_pmu_event_mask(kvm) + 1;
uaddr = (struct kvm_pmu_event_filter __user *)(long)attr->addr;
@@ -978,12 +980,17 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr) filter.action != KVM_PMU_EVENT_DENY)) return -EINVAL;
- mutex_lock(&vcpu->kvm->lock); + mutex_lock(&kvm->lock); + + if (kvm->arch.ran_once) { + mutex_unlock(&kvm->lock); + return -EBUSY; + }
- if (!vcpu->kvm->arch.pmu_filter) { - vcpu->kvm->arch.pmu_filter = bitmap_alloc(nr_events, GFP_KERNEL_ACCOUNT); - if (!vcpu->kvm->arch.pmu_filter) { - mutex_unlock(&vcpu->kvm->lock); + if (!kvm->arch.pmu_filter) { + kvm->arch.pmu_filter = bitmap_alloc(nr_events, GFP_KERNEL_ACCOUNT); + if (!kvm->arch.pmu_filter) { + mutex_unlock(&kvm->lock); return -ENOMEM; }
@@ -994,17 +1001,17 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr) * events, the default is to allow. */ if (filter.action == KVM_PMU_EVENT_ALLOW) - bitmap_zero(vcpu->kvm->arch.pmu_filter, nr_events); + bitmap_zero(kvm->arch.pmu_filter, nr_events); else - bitmap_fill(vcpu->kvm->arch.pmu_filter, nr_events); + bitmap_fill(kvm->arch.pmu_filter, nr_events); }
if (filter.action == KVM_PMU_EVENT_ALLOW) - bitmap_set(vcpu->kvm->arch.pmu_filter, filter.base_event, filter.nevents); + bitmap_set(kvm->arch.pmu_filter, filter.base_event, filter.nevents); else - bitmap_clear(vcpu->kvm->arch.pmu_filter, filter.base_event, filter.nevents); + bitmap_clear(kvm->arch.pmu_filter, filter.base_event, filter.nevents);
- mutex_unlock(&vcpu->kvm->lock); + mutex_unlock(&kvm->lock);
return 0; }
From: Ilya Leoshkevich iii@linux.ibm.com
[ Upstream commit f07f1503469b11b739892d50c836992ffbe026ee ]
powerpc does not select ARCH_HAS_SYSCALL_WRAPPER, so its syscall handlers take "unpacked" syscall arguments. Indicate this to libbpf using PT_REGS_SYSCALL_REGS macro.
Reported-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Ilya Leoshkevich iii@linux.ibm.com Signed-off-by: Andrii Nakryiko andrii@kernel.org Tested-by: Naveen N. Rao naveen.n.rao@linux.vnet.ibm.com Link: https://lore.kernel.org/bpf/20220209021745.2215452-5-iii@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/lib/bpf/bpf_tracing.h | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/tools/lib/bpf/bpf_tracing.h b/tools/lib/bpf/bpf_tracing.h index e1b505606882..41f5f3149875 100644 --- a/tools/lib/bpf/bpf_tracing.h +++ b/tools/lib/bpf/bpf_tracing.h @@ -178,6 +178,8 @@ #define __PT_RC_REG gpr[3] #define __PT_SP_REG sp #define __PT_IP_REG nip +/* powerpc does not select ARCH_HAS_SYSCALL_WRAPPER. */ +#define PT_REGS_SYSCALL_REGS(ctx) ctx
#elif defined(bpf_target_sparc)
From: Ilya Leoshkevich iii@linux.ibm.com
[ Upstream commit fbca4a2f649730b67488a8b36140ce4d2cf13c63 ]
On arm64, the first syscall argument should be accessed via orig_x0 (see arch/arm64/include/asm/syscall.h). Currently regs[0] is used instead, leading to bpf_syscall_macro test failure.
orig_x0 cannot be added to struct user_pt_regs, since its layout is a part of the ABI. Therefore provide access to it only through PT_REGS_PARM1_CORE_SYSCALL() by using a struct pt_regs flavor.
Reported-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Ilya Leoshkevich iii@linux.ibm.com Signed-off-by: Andrii Nakryiko andrii@kernel.org Link: https://lore.kernel.org/bpf/20220209021745.2215452-10-iii@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/lib/bpf/bpf_tracing.h | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/tools/lib/bpf/bpf_tracing.h b/tools/lib/bpf/bpf_tracing.h index 41f5f3149875..bed07c35b8de 100644 --- a/tools/lib/bpf/bpf_tracing.h +++ b/tools/lib/bpf/bpf_tracing.h @@ -140,6 +140,10 @@
#elif defined(bpf_target_arm64)
+struct pt_regs___arm64 { + unsigned long orig_x0; +}; + /* arm64 provides struct user_pt_regs instead of struct pt_regs to userspace */ #define __PT_REGS_CAST(x) ((const struct user_pt_regs *)(x)) #define __PT_PARM1_REG regs[0] @@ -152,6 +156,8 @@ #define __PT_RC_REG regs[0] #define __PT_SP_REG sp #define __PT_IP_REG pc +#define PT_REGS_PARM1_SYSCALL(x) ({ _Pragma("GCC error "use PT_REGS_PARM1_CORE_SYSCALL() instead""); 0l; }) +#define PT_REGS_PARM1_CORE_SYSCALL(x) BPF_CORE_READ((const struct pt_regs___arm64 *)(x), orig_x0)
#elif defined(bpf_target_mips)
From: Ilya Leoshkevich iii@linux.ibm.com
[ Upstream commit 1f22a6f9f9a0f50218a11a0554709fd34a821fa3 ]
On s390, the first syscall argument should be accessed via orig_gpr2 (see arch/s390/include/asm/syscall.h). Currently gpr[2] is used instead, leading to bpf_syscall_macro test failure.
orig_gpr2 cannot be added to user_pt_regs, since its layout is a part of the ABI. Therefore provide access to it only through PT_REGS_PARM1_CORE_SYSCALL() by using a struct pt_regs flavor.
Reported-by: Andrii Nakryiko andrii@kernel.org Signed-off-by: Ilya Leoshkevich iii@linux.ibm.com Signed-off-by: Andrii Nakryiko andrii@kernel.org Link: https://lore.kernel.org/bpf/20220209021745.2215452-11-iii@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/lib/bpf/bpf_tracing.h | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/tools/lib/bpf/bpf_tracing.h b/tools/lib/bpf/bpf_tracing.h index bed07c35b8de..122d79c8f4b4 100644 --- a/tools/lib/bpf/bpf_tracing.h +++ b/tools/lib/bpf/bpf_tracing.h @@ -112,6 +112,10 @@
#elif defined(bpf_target_s390)
+struct pt_regs___s390 { + unsigned long orig_gpr2; +}; + /* s390 provides user_pt_regs instead of struct pt_regs to userspace */ #define __PT_REGS_CAST(x) ((const user_pt_regs *)(x)) #define __PT_PARM1_REG gprs[2] @@ -124,6 +128,8 @@ #define __PT_RC_REG gprs[2] #define __PT_SP_REG gprs[15] #define __PT_IP_REG psw.addr +#define PT_REGS_PARM1_SYSCALL(x) ({ _Pragma("GCC error "use PT_REGS_PARM1_CORE_SYSCALL() instead""); 0l; }) +#define PT_REGS_PARM1_CORE_SYSCALL(x) BPF_CORE_READ((const struct pt_regs___s390 *)(x), orig_gpr2)
#elif defined(bpf_target_arm)
From: Hou Zhiqiang Zhiqiang.Hou@nxp.com
[ Upstream commit 829cc0e2ea2d61fb6c54bc3f8a17f86c56e11864 ]
The copy test uses the memcpy() to copy data between IO memory spaces. This can trigger an alignment fault error (pasted the error logs below) because memcpy() may use unaligned accesses on a mapped memory that is just IO, which does not support unaligned memory accesses.
Fix it by using the correct memcpy API to copy from/to IO memory.
Alignment fault error logs: Unable to handle kernel paging request at virtual address ffff8000101cd3c1 Mem abort info: ESR = 0x96000021 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x21: alignment fault Data abort info: ISV = 0, ISS = 0x00000021 CM = 0, WnR = 0 swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000081773000 [ffff8000101cd3c1] pgd=1000000082410003, p4d=1000000082410003, pud=1000000082411003, pmd=1000000082412003, pte=0068004000001f13 Internal error: Oops: 96000021 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 6 Comm: kworker/0:0H Not tainted 5.15.0-rc1-next-20210914-dirty #2 Hardware name: LS1012A RDB Board (DT) Workqueue: kpcitest pci_epf_test_cmd_handler pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : __memcpy+0x168/0x230 lr : pci_epf_test_cmd_handler+0x6f0/0xa68 sp : ffff80001003bce0 x29: ffff80001003bce0 x28: ffff800010135000 x27: ffff8000101e5000 x26: ffff8000101cd000 x25: ffff6cda941cf6c8 x24: 0000000000000000 x23: ffff6cda863f2000 x22: ffff6cda9096c800 x21: ffff800010135000 x20: ffff6cda941cf680 x19: ffffaf39fd999000 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: ffffaf39fd2b6000 x14: 0000000000000000 x13: 15f5c8fa2f984d57 x12: 604d132b60275454 x11: 065cee5e5fb428b6 x10: aae662eb17d0cf3e x9 : 1d97c9a1b4ddef37 x8 : 7541b65edebf928c x7 : e71937c4fc595de0 x6 : b8a0e09562430d1c x5 : ffff8000101e5401 x4 : ffff8000101cd401 x3 : ffff8000101e5380 x2 : fffffffffffffff1 x1 : ffff8000101cd3c0 x0 : ffff8000101e5000 Call trace: __memcpy+0x168/0x230 process_one_work+0x1ec/0x370 worker_thread+0x44/0x478 kthread+0x154/0x160 ret_from_fork+0x10/0x20 Code: a984346c a9c4342c f1010042 54fffee8 (a97c3c8e) ---[ end trace 568c28c7b6336335 ]---
Link: https://lore.kernel.org/r/20211217094708.28678-1-Zhiqiang.Hou@nxp.com Signed-off-by: Hou Zhiqiang Zhiqiang.Hou@nxp.com Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Reviewed-by: Kishon Vijay Abraham I kishon@ti.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pci/endpoint/functions/pci-epf-test.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/endpoint/functions/pci-epf-test.c b/drivers/pci/endpoint/functions/pci-epf-test.c index 90d84d3bc868..c7e45633beaf 100644 --- a/drivers/pci/endpoint/functions/pci-epf-test.c +++ b/drivers/pci/endpoint/functions/pci-epf-test.c @@ -285,7 +285,17 @@ static int pci_epf_test_copy(struct pci_epf_test *epf_test) if (ret) dev_err(dev, "Data transfer failed\n"); } else { - memcpy(dst_addr, src_addr, reg->size); + void *buf; + + buf = kzalloc(reg->size, GFP_KERNEL); + if (!buf) { + ret = -ENOMEM; + goto err_map_addr; + } + + memcpy_fromio(buf, src_addr, reg->size); + memcpy_toio(dst_addr, buf, reg->size); + kfree(buf); } ktime_get_ts64(&end); pci_epf_test_print_rate("COPY", reg->size, &start, &end, use_dma);
From: Sebastian Andrzej Siewior bigeasy@linutronix.de
[ Upstream commit 4f9bf2a2f5aacf988e6d5e56b961ba45c5a25248 ]
Commit 9652dc2eb9e40 ("tcp: relax listening_hash operations")
removed the need to disable bottom half while acquiring listening_hash.lock. There are still two callers left which disable bottom half before the lock is acquired.
On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts as a lock to ensure that resources, that are protected by disabling bottom halves, remain protected. This leads to a circular locking dependency if the lock acquired with disabled bottom halves is also acquired with enabled bottom halves followed by disabling bottom halves. This is the reverse locking order. It has been observed with inet_listen_hashbucket::lock:
local_bh_disable() + spin_lock(&ilb->lock): inet_listen() inet_csk_listen_start() sk->sk_prot->hash() := inet_hash() local_bh_disable() __inet_hash() spin_lock(&ilb->lock); acquire(&ilb->lock);
Reverse order: spin_lock(&ilb2->lock) + local_bh_disable(): tcp_seq_next() listening_get_next() spin_lock(&ilb2->lock); acquire(&ilb2->lock);
tcp4_seq_show() get_tcp4_sock() sock_i_ino() read_lock_bh(&sk->sk_callback_lock); acquire(softirq_ctrl) // <---- whoops acquire(&sk->sk_callback_lock)
Drop local_bh_disable() around __inet_hash() which acquires listening_hash->lock. Split inet_unhash() and acquire the listen_hashbucket lock without disabling bottom halves; the inet_ehash lock with disabled bottom halves.
Reported-by: Mike Galbraith efault@gmx.de Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Link: https://lkml.kernel.org/r/12d6f9879a97cd56c09fb53dee343cbb14f7f1f7.camel@gmx... Link: https://lkml.kernel.org/r/X9CheYjuXWc75Spa@hirez.programming.kicks-ass.net Link: https://lore.kernel.org/r/YgQOebeZ10eNx1W6@linutronix.de Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/inet_hashtables.c | 53 ++++++++++++++++++++++--------------- net/ipv6/inet6_hashtables.c | 5 +--- 2 files changed, 33 insertions(+), 25 deletions(-)
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index 30ab717ff1b8..17440840a791 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -637,7 +637,9 @@ int __inet_hash(struct sock *sk, struct sock *osk) int err = 0;
if (sk->sk_state != TCP_LISTEN) { + local_bh_disable(); inet_ehash_nolisten(sk, osk, NULL); + local_bh_enable(); return 0; } WARN_ON(!sk_unhashed(sk)); @@ -669,45 +671,54 @@ int inet_hash(struct sock *sk) { int err = 0;
- if (sk->sk_state != TCP_CLOSE) { - local_bh_disable(); + if (sk->sk_state != TCP_CLOSE) err = __inet_hash(sk, NULL); - local_bh_enable(); - }
return err; } EXPORT_SYMBOL_GPL(inet_hash);
-void inet_unhash(struct sock *sk) +static void __inet_unhash(struct sock *sk, struct inet_listen_hashbucket *ilb) { - struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo; - struct inet_listen_hashbucket *ilb = NULL; - spinlock_t *lock; - if (sk_unhashed(sk)) return;
- if (sk->sk_state == TCP_LISTEN) { - ilb = &hashinfo->listening_hash[inet_sk_listen_hashfn(sk)]; - lock = &ilb->lock; - } else { - lock = inet_ehash_lockp(hashinfo, sk->sk_hash); - } - spin_lock_bh(lock); - if (sk_unhashed(sk)) - goto unlock; - if (rcu_access_pointer(sk->sk_reuseport_cb)) reuseport_stop_listen_sock(sk); if (ilb) { + struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo; + inet_unhash2(hashinfo, sk); ilb->count--; } __sk_nulls_del_node_init_rcu(sk); sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1); -unlock: - spin_unlock_bh(lock); +} + +void inet_unhash(struct sock *sk) +{ + struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo; + + if (sk_unhashed(sk)) + return; + + if (sk->sk_state == TCP_LISTEN) { + struct inet_listen_hashbucket *ilb; + + ilb = &hashinfo->listening_hash[inet_sk_listen_hashfn(sk)]; + /* Don't disable bottom halves while acquiring the lock to + * avoid circular locking dependency on PREEMPT_RT. + */ + spin_lock(&ilb->lock); + __inet_unhash(sk, ilb); + spin_unlock(&ilb->lock); + } else { + spinlock_t *lock = inet_ehash_lockp(hashinfo, sk->sk_hash); + + spin_lock_bh(lock); + __inet_unhash(sk, NULL); + spin_unlock_bh(lock); + } } EXPORT_SYMBOL_GPL(inet_unhash);
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c index 4514444e96c8..4740afecf7c6 100644 --- a/net/ipv6/inet6_hashtables.c +++ b/net/ipv6/inet6_hashtables.c @@ -333,11 +333,8 @@ int inet6_hash(struct sock *sk) { int err = 0;
- if (sk->sk_state != TCP_CLOSE) { - local_bh_disable(); + if (sk->sk_state != TCP_CLOSE) err = __inet_hash(sk, NULL); - local_bh_enable(); - }
return err; }
From: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org
[ Upstream commit 9f72d4757cbe4d1ed669192f6d23817c9e437c4b ]
The Qualcomm PCI bridge device (Device ID 0x0110) found in chipsets such as SM8450 does not set the Command Completed bit unless writes to the Slot Command register change "Control" bits.
This results in timeouts like below:
pcieport 0001:00:00.0: pciehp: Timeout on hotplug command 0x03c0 (issued 2020 msec ago)
Add the device to the Command Completed quirk to mark commands "completed" immediately unless they change the "Control" bits.
Link: https://lore.kernel.org/r/20220210145003.135907-1-manivannan.sadhasivam@lina... Signed-off-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Signed-off-by: Bjorn Helgaas bhelgaas@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pci/hotplug/pciehp_hpc.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index 85dce560831a..040ae076ec0e 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -1086,6 +1086,8 @@ static void quirk_cmd_compl(struct pci_dev *pdev) } DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, PCI_ANY_ID, PCI_CLASS_BRIDGE_PCI, 8, quirk_cmd_compl); +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_QCOM, 0x0110, + PCI_CLASS_BRIDGE_PCI, 8, quirk_cmd_compl); DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_QCOM, 0x0400, PCI_CLASS_BRIDGE_PCI, 8, quirk_cmd_compl); DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_QCOM, 0x0401,
From: Sreekanth Reddy sreekanth.reddy@broadcom.com
[ Upstream commit 580e6742205efe6b0bfa5a6a6079f509d99168e0 ]
During controller reset, the driver tries to flush all the pending firmware event works from worker queue that are queued prior to the reset. However, if any work is waiting for device addition/removal operation to be completed at the SML, then a deadlock is observed. This is due to the controller reset waiting for the device addition/removal to be completed and the device/addition removal is waiting for the controller reset to be completed.
To limit this deadlock, continue with the controller reset handling without canceling the work which is waiting for device addition/removal operation to complete at SML.
Link: https://lore.kernel.org/r/20220210095817.22828-2-sreekanth.reddy@broadcom.co... Signed-off-by: Sreekanth Reddy sreekanth.reddy@broadcom.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/mpi3mr/mpi3mr.h | 4 ++ drivers/scsi/mpi3mr/mpi3mr_fw.c | 2 +- drivers/scsi/mpi3mr/mpi3mr_os.c | 106 ++++++++++++++++++++++++++------ 3 files changed, 91 insertions(+), 21 deletions(-)
diff --git a/drivers/scsi/mpi3mr/mpi3mr.h b/drivers/scsi/mpi3mr/mpi3mr.h index fc4eaf6d1e47..d892ade421bf 100644 --- a/drivers/scsi/mpi3mr/mpi3mr.h +++ b/drivers/scsi/mpi3mr/mpi3mr.h @@ -866,6 +866,8 @@ struct mpi3mr_ioc { * @send_ack: Event acknowledgment required or not * @process_evt: Bottomhalf processing required or not * @evt_ctx: Event context to send in Ack + * @pending_at_sml: waiting for device add/remove API to complete + * @discard: discard this event * @ref_count: kref count * @event_data: Actual MPI3 event data */ @@ -877,6 +879,8 @@ struct mpi3mr_fwevt { bool send_ack; bool process_evt; u32 evt_ctx; + bool pending_at_sml; + bool discard; struct kref ref_count; char event_data[0] __aligned(4); }; diff --git a/drivers/scsi/mpi3mr/mpi3mr_fw.c b/drivers/scsi/mpi3mr/mpi3mr_fw.c index 15bdc21ead66..7193b983ee3b 100644 --- a/drivers/scsi/mpi3mr/mpi3mr_fw.c +++ b/drivers/scsi/mpi3mr/mpi3mr_fw.c @@ -4353,8 +4353,8 @@ int mpi3mr_soft_reset_handler(struct mpi3mr_ioc *mrioc, memset(mrioc->devrem_bitmap, 0, mrioc->devrem_bitmap_sz); memset(mrioc->removepend_bitmap, 0, mrioc->dev_handle_bitmap_sz); memset(mrioc->evtack_cmds_bitmap, 0, mrioc->evtack_cmds_bitmap_sz); - mpi3mr_cleanup_fwevt_list(mrioc); mpi3mr_flush_host_io(mrioc); + mpi3mr_cleanup_fwevt_list(mrioc); mpi3mr_invalidate_devhandles(mrioc); if (mrioc->prepare_for_reset) { mrioc->prepare_for_reset = 0; diff --git a/drivers/scsi/mpi3mr/mpi3mr_os.c b/drivers/scsi/mpi3mr/mpi3mr_os.c index 284117da9086..d205354be63a 100644 --- a/drivers/scsi/mpi3mr/mpi3mr_os.c +++ b/drivers/scsi/mpi3mr/mpi3mr_os.c @@ -285,6 +285,35 @@ static struct mpi3mr_fwevt *mpi3mr_dequeue_fwevt( return fwevt; }
+/** + * mpi3mr_cancel_work - cancel firmware event + * @fwevt: fwevt object which needs to be canceled + * + * Return: Nothing. + */ +static void mpi3mr_cancel_work(struct mpi3mr_fwevt *fwevt) +{ + /* + * Wait on the fwevt to complete. If this returns 1, then + * the event was never executed. + * + * If it did execute, we wait for it to finish, and the put will + * happen from mpi3mr_process_fwevt() + */ + if (cancel_work_sync(&fwevt->work)) { + /* + * Put fwevt reference count after + * dequeuing it from worker queue + */ + mpi3mr_fwevt_put(fwevt); + /* + * Put fwevt reference count to neutralize + * kref_init increment + */ + mpi3mr_fwevt_put(fwevt); + } +} + /** * mpi3mr_cleanup_fwevt_list - Cleanup firmware event list * @mrioc: Adapter instance reference @@ -302,28 +331,25 @@ void mpi3mr_cleanup_fwevt_list(struct mpi3mr_ioc *mrioc) !mrioc->fwevt_worker_thread) return;
- while ((fwevt = mpi3mr_dequeue_fwevt(mrioc)) || - (fwevt = mrioc->current_event)) { + while ((fwevt = mpi3mr_dequeue_fwevt(mrioc))) + mpi3mr_cancel_work(fwevt); + + if (mrioc->current_event) { + fwevt = mrioc->current_event; /* - * Wait on the fwevt to complete. If this returns 1, then - * the event was never executed, and we need a put for the - * reference the work had on the fwevt. - * - * If it did execute, we wait for it to finish, and the put will - * happen from mpi3mr_process_fwevt() + * Don't call cancel_work_sync() API for the + * fwevt work if the controller reset is + * get called as part of processing the + * same fwevt work (or) when worker thread is + * waiting for device add/remove APIs to complete. + * Otherwise we will see deadlock. */ - if (cancel_work_sync(&fwevt->work)) { - /* - * Put fwevt reference count after - * dequeuing it from worker queue - */ - mpi3mr_fwevt_put(fwevt); - /* - * Put fwevt reference count to neutralize - * kref_init increment - */ - mpi3mr_fwevt_put(fwevt); + if (current_work() == &fwevt->work || fwevt->pending_at_sml) { + fwevt->discard = 1; + return; } + + mpi3mr_cancel_work(fwevt); } }
@@ -690,6 +716,24 @@ static struct mpi3mr_tgt_dev *__mpi3mr_get_tgtdev_from_tgtpriv( return tgtdev; }
+/** + * mpi3mr_print_device_event_notice - print notice related to post processing of + * device event after controller reset. + * + * @mrioc: Adapter instance reference + * @device_add: true for device add event and false for device removal event + * + * Return: None. + */ +static void mpi3mr_print_device_event_notice(struct mpi3mr_ioc *mrioc, + bool device_add) +{ + ioc_notice(mrioc, "Device %s was in progress before the reset and\n", + (device_add ? "addition" : "removal")); + ioc_notice(mrioc, "completed after reset, verify whether the exposed devices\n"); + ioc_notice(mrioc, "are matched with attached devices for correctness\n"); +} + /** * mpi3mr_remove_tgtdev_from_host - Remove dev from upper layers * @mrioc: Adapter instance reference @@ -714,8 +758,17 @@ static void mpi3mr_remove_tgtdev_from_host(struct mpi3mr_ioc *mrioc, }
if (tgtdev->starget) { + if (mrioc->current_event) + mrioc->current_event->pending_at_sml = 1; scsi_remove_target(&tgtdev->starget->dev); tgtdev->host_exposed = 0; + if (mrioc->current_event) { + mrioc->current_event->pending_at_sml = 0; + if (mrioc->current_event->discard) { + mpi3mr_print_device_event_notice(mrioc, false); + return; + } + } } ioc_info(mrioc, "%s :Removed handle(0x%04x), wwid(0x%016llx)\n", __func__, tgtdev->dev_handle, (unsigned long long)tgtdev->wwid); @@ -749,11 +802,20 @@ static int mpi3mr_report_tgtdev_to_host(struct mpi3mr_ioc *mrioc, } if (!tgtdev->host_exposed && !mrioc->reset_in_progress) { tgtdev->host_exposed = 1; + if (mrioc->current_event) + mrioc->current_event->pending_at_sml = 1; scsi_scan_target(&mrioc->shost->shost_gendev, 0, tgtdev->perst_id, SCAN_WILD_CARD, SCSI_SCAN_INITIAL); if (!tgtdev->starget) tgtdev->host_exposed = 0; + if (mrioc->current_event) { + mrioc->current_event->pending_at_sml = 0; + if (mrioc->current_event->discard) { + mpi3mr_print_device_event_notice(mrioc, true); + goto out; + } + } } out: if (tgtdev) @@ -1193,6 +1255,8 @@ static void mpi3mr_sastopochg_evt_bh(struct mpi3mr_ioc *mrioc, mpi3mr_sastopochg_evt_debug(mrioc, event_data);
for (i = 0; i < event_data->num_entries; i++) { + if (fwevt->discard) + return; handle = le16_to_cpu(event_data->phy_entry[i].attached_dev_handle); if (!handle) continue; @@ -1324,6 +1388,8 @@ static void mpi3mr_pcietopochg_evt_bh(struct mpi3mr_ioc *mrioc, mpi3mr_pcietopochg_evt_debug(mrioc, event_data);
for (i = 0; i < event_data->num_entries; i++) { + if (fwevt->discard) + return; handle = le16_to_cpu(event_data->port_entry[i].attached_dev_handle); if (!handle) @@ -1362,8 +1428,8 @@ static void mpi3mr_pcietopochg_evt_bh(struct mpi3mr_ioc *mrioc, static void mpi3mr_fwevt_bh(struct mpi3mr_ioc *mrioc, struct mpi3mr_fwevt *fwevt) { - mrioc->current_event = fwevt; mpi3mr_fwevt_del_from_list(mrioc, fwevt); + mrioc->current_event = fwevt;
if (mrioc->stop_drv_processing) goto out;
From: Sreekanth Reddy sreekanth.reddy@broadcom.com
[ Upstream commit 9992246127246a27cc7184f05cce6f62ac48f84e ]
The driver is missing to set the residual size while completing an I/O. Ensure proper data transfer size is reported to the kernel on I/O completion based on the transfer length reported by the firmware.
Link: https://lore.kernel.org/r/20220210095817.22828-7-sreekanth.reddy@broadcom.co... Signed-off-by: Sreekanth Reddy sreekanth.reddy@broadcom.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/mpi3mr/mpi3mr_os.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/scsi/mpi3mr/mpi3mr_os.c b/drivers/scsi/mpi3mr/mpi3mr_os.c index d205354be63a..f7893de35b26 100644 --- a/drivers/scsi/mpi3mr/mpi3mr_os.c +++ b/drivers/scsi/mpi3mr/mpi3mr_os.c @@ -2617,6 +2617,8 @@ void mpi3mr_process_op_reply_desc(struct mpi3mr_ioc *mrioc, scmd->result = DID_OK << 16; goto out_success; } + + scsi_set_resid(scmd, scsi_bufflen(scmd) - xfer_count); if (ioc_status == MPI3_IOCSTATUS_SCSI_DATA_UNDERRUN && xfer_count == 0 && (scsi_status == MPI3_SCSI_STATUS_BUSY || scsi_status == MPI3_SCSI_STATUS_RESERVATION_CONFLICT ||
From: Sreekanth Reddy sreekanth.reddy@broadcom.com
[ Upstream commit d44b5fefb22e139408ae12b864da1ecb9ad9d1d2 ]
Fix memory leaks related to operational reply queue's memory segments which are not getting freed while unloading the driver.
Link: https://lore.kernel.org/r/20220210095817.22828-9-sreekanth.reddy@broadcom.co... Signed-off-by: Sreekanth Reddy sreekanth.reddy@broadcom.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/mpi3mr/mpi3mr_fw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/mpi3mr/mpi3mr_fw.c b/drivers/scsi/mpi3mr/mpi3mr_fw.c index 7193b983ee3b..e44868230197 100644 --- a/drivers/scsi/mpi3mr/mpi3mr_fw.c +++ b/drivers/scsi/mpi3mr/mpi3mr_fw.c @@ -1520,7 +1520,7 @@ static void mpi3mr_free_op_req_q_segments(struct mpi3mr_ioc *mrioc, u16 q_idx) MPI3MR_MAX_SEG_LIST_SIZE, mrioc->req_qinfo[q_idx].q_segment_list, mrioc->req_qinfo[q_idx].q_segment_list_dma); - mrioc->op_reply_qinfo[q_idx].q_segment_list = NULL; + mrioc->req_qinfo[q_idx].q_segment_list = NULL; } } else size = mrioc->req_qinfo[q_idx].segment_qd *
From: Christophe Leroy christophe.leroy@csgroup.eu
[ Upstream commit a4c182ecf33584b9b2d1aa9dad073014a504c01f ]
Commit 1f9ad21c3b38 ("powerpc/mm: Implement set_memory() routines") included a spin_lock() to change_page_attr() in order to safely perform the three step operations. But then commit 9f7853d7609d ("powerpc/mm: Fix set_memory_*() against concurrent accesses") modify it to use pte_update() and do the operation safely against concurrent access.
In the meantime, Maxime reported some spinlock recursion.
[ 15.351649] BUG: spinlock recursion on CPU#0, kworker/0:2/217 [ 15.357540] lock: init_mm+0x3c/0x420, .magic: dead4ead, .owner: kworker/0:2/217, .owner_cpu: 0 [ 15.366563] CPU: 0 PID: 217 Comm: kworker/0:2 Not tainted 5.15.0+ #523 [ 15.373350] Workqueue: events do_free_init [ 15.377615] Call Trace: [ 15.380232] [e4105ac0] [800946a4] do_raw_spin_lock+0xf8/0x120 (unreliable) [ 15.387340] [e4105ae0] [8001f4ec] change_page_attr+0x40/0x1d4 [ 15.393413] [e4105b10] [801424e0] __apply_to_page_range+0x164/0x310 [ 15.400009] [e4105b60] [80169620] free_pcp_prepare+0x1e4/0x4a0 [ 15.406045] [e4105ba0] [8016c5a0] free_unref_page+0x40/0x2b8 [ 15.411979] [e4105be0] [8018724c] kasan_depopulate_vmalloc_pte+0x6c/0x94 [ 15.418989] [e4105c00] [801424e0] __apply_to_page_range+0x164/0x310 [ 15.425451] [e4105c50] [80187834] kasan_release_vmalloc+0xbc/0x134 [ 15.431898] [e4105c70] [8015f7a8] __purge_vmap_area_lazy+0x4e4/0xdd8 [ 15.438560] [e4105d30] [80160d10] _vm_unmap_aliases.part.0+0x17c/0x24c [ 15.445283] [e4105d60] [801642d0] __vunmap+0x2f0/0x5c8 [ 15.450684] [e4105db0] [800e32d0] do_free_init+0x68/0x94 [ 15.456181] [e4105dd0] [8005d094] process_one_work+0x4bc/0x7b8 [ 15.462283] [e4105e90] [8005d614] worker_thread+0x284/0x6e8 [ 15.468227] [e4105f00] [8006aaec] kthread+0x1f0/0x210 [ 15.473489] [e4105f40] [80017148] ret_from_kernel_thread+0x14/0x1c
Remove the read / modify / write sequence to make the operation atomic and remove the spin_lock() in change_page_attr().
To do the operation atomically, we can't use pte modification helpers anymore. Because all platforms have different combination of bits, it is not easy to use those bits directly. But all have the _PAGE_KERNEL_{RO/ROX/RW/RWX} set of flags. All we need it to compare two sets to know which bits are set or cleared.
For instance, by comparing _PAGE_KERNEL_ROX and _PAGE_KERNEL_RO you know which bit gets cleared and which bit get set when changing exec permission.
Reported-by: Maxime Bizon mbizon@freebox.fr Signed-off-by: Christophe Leroy christophe.leroy@csgroup.eu Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/all/20211212112152.GA27070@sakura/ Link: https://lore.kernel.org/r/43c3c76a1175ae6dc1a3d3b5c3f7ecb48f683eea.164034401... Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/mm/pageattr.c | 32 +++++++++++++------------------- 1 file changed, 13 insertions(+), 19 deletions(-)
diff --git a/arch/powerpc/mm/pageattr.c b/arch/powerpc/mm/pageattr.c index 3bb9d168e3b3..85753e32a4de 100644 --- a/arch/powerpc/mm/pageattr.c +++ b/arch/powerpc/mm/pageattr.c @@ -15,12 +15,14 @@ #include <asm/pgtable.h>
+static pte_basic_t pte_update_delta(pte_t *ptep, unsigned long addr, + unsigned long old, unsigned long new) +{ + return pte_update(&init_mm, addr, ptep, old & ~new, new & ~old, 0); +} + /* - * Updates the attributes of a page in three steps: - * - * 1. take the page_table_lock - * 2. install the new entry with the updated attributes - * 3. flush the TLB + * Updates the attributes of a page atomically. * * This sequence is safe against concurrent updates, and also allows updating the * attributes of a page currently being executed or accessed. @@ -28,25 +30,21 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data) { long action = (long)data; - pte_t pte;
- spin_lock(&init_mm.page_table_lock); - - pte = ptep_get(ptep); - - /* modify the PTE bits as desired, then apply */ + /* modify the PTE bits as desired */ switch (action) { case SET_MEMORY_RO: - pte = pte_wrprotect(pte); + /* Don't clear DIRTY bit */ + pte_update_delta(ptep, addr, _PAGE_KERNEL_RW & ~_PAGE_DIRTY, _PAGE_KERNEL_RO); break; case SET_MEMORY_RW: - pte = pte_mkwrite(pte_mkdirty(pte)); + pte_update_delta(ptep, addr, _PAGE_KERNEL_RO, _PAGE_KERNEL_RW); break; case SET_MEMORY_NX: - pte = pte_exprotect(pte); + pte_update_delta(ptep, addr, _PAGE_KERNEL_ROX, _PAGE_KERNEL_RO); break; case SET_MEMORY_X: - pte = pte_mkexec(pte); + pte_update_delta(ptep, addr, _PAGE_KERNEL_RO, _PAGE_KERNEL_ROX); break; case SET_MEMORY_NP: pte_update(&init_mm, addr, ptep, _PAGE_PRESENT, 0, 0); @@ -59,16 +57,12 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data) break; }
- pte_update(&init_mm, addr, ptep, ~0UL, pte_val(pte), 0); - /* See ptesync comment in radix__set_pte_at() */ if (radix_enabled()) asm volatile("ptesync": : :"memory");
flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
- spin_unlock(&init_mm.page_table_lock); - return 0; }
From: Hans de Goede hdegoede@redhat.com
[ Upstream commit 5ac121b81b4051e7fc83d5b3456a5e499d5bd147 ]
The AXP288's recommended and factory default Vhold value (minimum input voltage below which the input current draw will be reduced) is 4.4V. This lines up with other charger IC's such as the TI bq2419x/bq2429x series which use 4.36V or 4.44V.
For some reason some BIOS-es initialize Vhold to 4.6V or even 4.7V which combined with the typical voltage drop over typically low wire gauge micro-USB cables leads to the input-current getting capped below 1A (with a 2A capable dedicated charger) based on Vhold.
This leads to slow charging, or even to the device slowly discharging if the device is in heavy use.
As the Linux AXP288 drivers use the builtin BC1.2 charger detection and send the input-current-limit according to the detected charger there really is no reason not to use the recommended 4.4V Vhold.
Set Vhold to 4.4V to fix the slow charging issue on various devices.
There is one exception, the special-case of the HP X2 2-in-1s which combine this BC1.2 capable PMIC with a Type-C port and a 5V/3A factory provided charger with a Type-C plug which does not do BC1.2. These have their input-current-limit hardcoded to 3A (like under Windows) and use a higher Vhold on purpose to limit the current when used with other chargers. To avoid touching Vhold on these HP X2 laptops the code setting Vhold is added to an else branch of the if checking for these models.
Note this also fixes the sofar unused VBUS_ISPOUT_VHOLD_SET_MASK define, which was wrong.
Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Sebastian Reichel sebastian.reichel@collabora.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/power/supply/axp288_charger.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/power/supply/axp288_charger.c b/drivers/power/supply/axp288_charger.c index ec41f6cd3f93..c498e62ab4e2 100644 --- a/drivers/power/supply/axp288_charger.c +++ b/drivers/power/supply/axp288_charger.c @@ -42,11 +42,11 @@ #define VBUS_ISPOUT_CUR_LIM_1500MA 0x1 /* 1500mA */ #define VBUS_ISPOUT_CUR_LIM_2000MA 0x2 /* 2000mA */ #define VBUS_ISPOUT_CUR_NO_LIM 0x3 /* 2500mA */ -#define VBUS_ISPOUT_VHOLD_SET_MASK 0x31 +#define VBUS_ISPOUT_VHOLD_SET_MASK 0x38 #define VBUS_ISPOUT_VHOLD_SET_BIT_POS 0x3 #define VBUS_ISPOUT_VHOLD_SET_OFFSET 4000 /* 4000mV */ #define VBUS_ISPOUT_VHOLD_SET_LSB_RES 100 /* 100mV */ -#define VBUS_ISPOUT_VHOLD_SET_4300MV 0x3 /* 4300mV */ +#define VBUS_ISPOUT_VHOLD_SET_4400MV 0x4 /* 4400mV */ #define VBUS_ISPOUT_VBUS_PATH_DIS BIT(7)
#define CHRG_CCCV_CC_MASK 0xf /* 4 bits */ @@ -769,6 +769,16 @@ static int charger_init_hw_regs(struct axp288_chrg_info *info) ret = axp288_charger_vbus_path_select(info, true); if (ret < 0) return ret; + } else { + /* Set Vhold to the factory default / recommended 4.4V */ + val = VBUS_ISPOUT_VHOLD_SET_4400MV << VBUS_ISPOUT_VHOLD_SET_BIT_POS; + ret = regmap_update_bits(info->regmap, AXP20X_VBUS_IPSOUT_MGMT, + VBUS_ISPOUT_VHOLD_SET_MASK, val); + if (ret < 0) { + dev_err(&info->pdev->dev, "register(%x) write error(%d)\n", + AXP20X_VBUS_IPSOUT_MGMT, ret); + return ret; + } }
/* Read current charge voltage and current limit */
From: Kevin Tang kevin3.tang@gmail.com
[ Upstream commit 8668658aebb0a19d877d5a81c004baf716c4aaa6 ]
'drm' could be null in sprd_drm_shutdown, and drm_warn maybe dereference it, remove this warning log.
Reported-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Kevin Tang kevin3.tang@gmail.com Reviewed-by: Javier Martinez Canillas javierm@redhat.com Acked-by: Thomas Zimmermann tzimmermann@suse.de Link: https://lore.kernel.org/all/20220117084044.9210-1-kevin3.tang@gmail.com
v1 -> v2: - Split checking platform_get_resource() return value to a separate patch - Use dev_warn() instead of removing the warning log
Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/sprd/sprd_drm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/sprd/sprd_drm.c b/drivers/gpu/drm/sprd/sprd_drm.c index a077e2d4d721..af2be97d5ed0 100644 --- a/drivers/gpu/drm/sprd/sprd_drm.c +++ b/drivers/gpu/drm/sprd/sprd_drm.c @@ -155,7 +155,7 @@ static void sprd_drm_shutdown(struct platform_device *pdev) struct drm_device *drm = platform_get_drvdata(pdev);
if (!drm) { - drm_warn(drm, "drm device is not available, no shutdown\n"); + dev_warn(&pdev->dev, "drm device is not available, no shutdown\n"); return; }
From: Kevin Tang kevin3.tang@gmail.com
[ Upstream commit 73792e6e66be1225837cc1a40f1e39b1d077751c ]
platform_get_resource() may fail and return NULL, so check it's value before using it.
Reported-by: Zou Wei zou_wei@huawei.com Signed-off-by: Kevin Tang kevin3.tang@gmail.com Reviewed-by: Javier Martinez Canillas javierm@redhat.com Acked-by: Thomas Zimmermann tzimmermann@suse.de Link: https://lore.kernel.org/all/20220117084156.9338-1-kevin3.tang@gmail.com
v1 -> v2: - new patch
Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/sprd/sprd_dpu.c | 5 +++++ drivers/gpu/drm/sprd/sprd_dsi.c | 5 +++++ 2 files changed, 10 insertions(+)
diff --git a/drivers/gpu/drm/sprd/sprd_dpu.c b/drivers/gpu/drm/sprd/sprd_dpu.c index 06a3414ee43a..1637203ea103 100644 --- a/drivers/gpu/drm/sprd/sprd_dpu.c +++ b/drivers/gpu/drm/sprd/sprd_dpu.c @@ -790,6 +790,11 @@ static int sprd_dpu_context_init(struct sprd_dpu *dpu, int ret;
res = platform_get_resource(pdev, IORESOURCE_MEM, 0); + if (!res) { + dev_err(dev, "failed to get I/O resource\n"); + return -EINVAL; + } + ctx->base = devm_ioremap(dev, res->start, resource_size(res)); if (!ctx->base) { dev_err(dev, "failed to map dpu registers\n"); diff --git a/drivers/gpu/drm/sprd/sprd_dsi.c b/drivers/gpu/drm/sprd/sprd_dsi.c index 911b3cddc264..12b67a5d5923 100644 --- a/drivers/gpu/drm/sprd/sprd_dsi.c +++ b/drivers/gpu/drm/sprd/sprd_dsi.c @@ -907,6 +907,11 @@ static int sprd_dsi_context_init(struct sprd_dsi *dsi, struct resource *res;
res = platform_get_resource(pdev, IORESOURCE_MEM, 0); + if (!res) { + dev_err(dev, "failed to get I/O resource\n"); + return -EINVAL; + } + ctx->base = devm_ioremap(dev, res->start, resource_size(res)); if (!ctx->base) { drm_err(dsi->drm, "failed to map dsi host registers\n");
From: Sung Joon Kim sungkim@amd.com
[ Upstream commit 3b853c316c9321e195414a6fb121d1c2d45b1e87 ]
[why] In LTTPR non-transparent mode, we need to reset the cached lane settings before performing link training on the next PHY repeater. Otherwise, the cached lane settings will be used for the next clock recovery e.g. VS = MAX (3) which should not be the case according to the DP specs. We expect to use minimum lane settings on each clock recovery sequence.
[how] Reset DPCD and HW lane settings on each repeater LT. Set training pattern to 0 for the repeater that failed LT at the proper place.
Reviewed-by: Meenakshikumar Somasundaram Meenakshikumar.Somasundaram@amd.com Reviewed-by: Jun Lei Jun.Lei@amd.com Acked-by: Jasdeep Dhillon jdhillon@amd.com Signed-off-by: Sung Joon Kim sungkim@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c index 61b8f29a0c30..49d5271dcfdc 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c @@ -2378,22 +2378,27 @@ static enum link_training_result dp_perform_8b_10b_link_training( repeater_id--) { status = perform_clock_recovery_sequence(link, link_res, lt_settings, repeater_id);
- if (status != LINK_TRAINING_SUCCESS) + if (status != LINK_TRAINING_SUCCESS) { + repeater_training_done(link, repeater_id); break; + }
status = perform_channel_equalization_sequence(link, link_res, lt_settings, repeater_id);
+ repeater_training_done(link, repeater_id); + if (status != LINK_TRAINING_SUCCESS) break;
- repeater_training_done(link, repeater_id); + for (lane = 0; lane < LANE_COUNT_DP_MAX; lane++) { + lt_settings->dpcd_lane_settings[lane].raw = 0; + lt_settings->hw_lane_settings[lane].VOLTAGE_SWING = 0; + lt_settings->hw_lane_settings[lane].PRE_EMPHASIS = 0; + } } - - for (lane = 0; lane < (uint8_t)lt_settings->link_settings.lane_count; lane++) - lt_settings->dpcd_lane_settings[lane].raw = 0; }
if (status == LINK_TRAINING_SUCCESS) {
From: Maxim Mikityanskiy maximmi@nvidia.com
[ Upstream commit d08c6e2a4d0308a7922d7ef3b1b3af45d4096aad ]
Normally, the queues are disabled when the channels are deactivated, and enabled when the channels are activated. However, on register, the channels are not active, but the queues are enabled by default. This change fixes it, preventing mlx5e_xmit from running when the channels are deactivated in the beginning.
Signed-off-by: Maxim Mikityanskiy maximmi@nvidia.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 3667f5ef5990..169e3524bb1c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -5345,6 +5345,7 @@ mlx5e_create_netdev(struct mlx5_core_dev *mdev, const struct mlx5e_profile *prof }
netif_carrier_off(netdev); + netif_tx_disable(netdev); dev_net_set(netdev, mlx5_core_net(mdev));
return netdev;
From: José Expósito jose.exposito89@gmail.com
[ Upstream commit 8ae5c16c9d421d43f32f66d2308031f1bd3f9336 ]
Like the Apple Magic Keyboard 2015, when connected over USB, the 2021 version registers 2 different interfaces. One of them is used to report the battery level.
However, unlike when connected over Bluetooth, the battery level is not reported automatically and it is required to fetch it manually.
Add the APPLE_RDESC_BATTERY quirk to fix the battery report descriptor and manually fetch the battery level.
Tested with the ANSI, ISO and JIS variants of the keyboard.
Signed-off-by: José Expósito jose.exposito89@gmail.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-apple.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hid/hid-apple.c b/drivers/hid/hid-apple.c index 7dc89dc6b0f0..18de4ccb0fb2 100644 --- a/drivers/hid/hid-apple.c +++ b/drivers/hid/hid-apple.c @@ -748,7 +748,7 @@ static const struct hid_device_id apple_devices[] = { { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_GEYSER1_TP_ONLY), .driver_data = APPLE_NUMLOCK_EMULATION | APPLE_HAS_FN }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_MAGIC_KEYBOARD_2021), - .driver_data = APPLE_HAS_FN | APPLE_ISO_TILDE_QUIRK }, + .driver_data = APPLE_HAS_FN | APPLE_ISO_TILDE_QUIRK | APPLE_RDESC_BATTERY }, { HID_BLUETOOTH_DEVICE(BT_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_MAGIC_KEYBOARD_2021), .driver_data = APPLE_HAS_FN | APPLE_ISO_TILDE_QUIRK }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_MAGIC_KEYBOARD_FINGERPRINT_2021),
From: José Expósito jose.exposito89@gmail.com
[ Upstream commit cbfcfbfc384890a062a5d0cc4792df094a6cc7a8 ]
Like the Apple Magic Keyboard 2015, when connected over USB, the 2021 version with fingerprint reader registers 2 different interfaces. One of them is used to report the battery level.
However, unlike when connected over Bluetooth, the battery level is not reported automatically and it is required to fetch it manually.
Add the APPLE_RDESC_BATTERY quirk to fix the battery report descriptor and manually fetch the battery level.
Tested with the ANSI variant of the keyboard with and without numpad.
Signed-off-by: José Expósito jose.exposito89@gmail.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-apple.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/hid/hid-apple.c b/drivers/hid/hid-apple.c index 18de4ccb0fb2..590376d776a1 100644 --- a/drivers/hid/hid-apple.c +++ b/drivers/hid/hid-apple.c @@ -752,11 +752,11 @@ static const struct hid_device_id apple_devices[] = { { HID_BLUETOOTH_DEVICE(BT_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_MAGIC_KEYBOARD_2021), .driver_data = APPLE_HAS_FN | APPLE_ISO_TILDE_QUIRK }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_MAGIC_KEYBOARD_FINGERPRINT_2021), - .driver_data = APPLE_HAS_FN | APPLE_ISO_TILDE_QUIRK }, + .driver_data = APPLE_HAS_FN | APPLE_ISO_TILDE_QUIRK | APPLE_RDESC_BATTERY }, { HID_BLUETOOTH_DEVICE(BT_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_MAGIC_KEYBOARD_FINGERPRINT_2021), .driver_data = APPLE_HAS_FN | APPLE_ISO_TILDE_QUIRK }, { HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_MAGIC_KEYBOARD_NUMPAD_2021), - .driver_data = APPLE_HAS_FN | APPLE_ISO_TILDE_QUIRK }, + .driver_data = APPLE_HAS_FN | APPLE_ISO_TILDE_QUIRK | APPLE_RDESC_BATTERY }, { HID_BLUETOOTH_DEVICE(BT_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_MAGIC_KEYBOARD_NUMPAD_2021), .driver_data = APPLE_HAS_FN | APPLE_ISO_TILDE_QUIRK },
From: Hans de Goede hdegoede@redhat.com
[ Upstream commit e285cb403994419e997749c9a52b9370884ae0c8 ]
The quirk handling may need to set some different properties which means using a different swnode, move the setting of the swnode to inside dwc3_pci_quirks() so that the quirk handling can choose a different swnode.
Signed-off-by: Hans de Goede hdegoede@redhat.com Link: https://lore.kernel.org/r/20220213130524.18748-4-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/dwc3/dwc3-pci.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/drivers/usb/dwc3/dwc3-pci.c b/drivers/usb/dwc3/dwc3-pci.c index 06d0e88ec8af..4d9608cc55f7 100644 --- a/drivers/usb/dwc3/dwc3-pci.c +++ b/drivers/usb/dwc3/dwc3-pci.c @@ -185,7 +185,8 @@ static const struct software_node dwc3_pci_amd_mr_swnode = { .properties = dwc3_pci_mr_properties, };
-static int dwc3_pci_quirks(struct dwc3_pci *dwc) +static int dwc3_pci_quirks(struct dwc3_pci *dwc, + const struct software_node *swnode) { struct pci_dev *pdev = dwc->pci;
@@ -242,7 +243,7 @@ static int dwc3_pci_quirks(struct dwc3_pci *dwc) } }
- return 0; + return device_add_software_node(&dwc->dwc3->dev, swnode); }
#ifdef CONFIG_PM @@ -307,11 +308,7 @@ static int dwc3_pci_probe(struct pci_dev *pci, const struct pci_device_id *id) dwc->dwc3->dev.parent = dev; ACPI_COMPANION_SET(&dwc->dwc3->dev, ACPI_COMPANION(dev));
- ret = device_add_software_node(&dwc->dwc3->dev, (void *)id->driver_data); - if (ret < 0) - goto err; - - ret = dwc3_pci_quirks(dwc); + ret = dwc3_pci_quirks(dwc, (void *)id->driver_data); if (ret) goto err;
From: Ilan Peer ilan.peer@intel.com
[ Upstream commit d8d4dd26b9e0469baf5017f0544d852fd4e3fb6d ]
Currently, fragmented EBS was set for a channel only if the 'hb_type' was set to fragmented or balanced scan. However, 'hb_type' is set only in case of CDB, and thus fragmented EBS is never set for a channel for non-CDB devices. Fix it.
Signed-off-by: Ilan Peer ilan.peer@intel.com Signed-off-by: Luca Coelho luciano.coelho@intel.com Link: https://lore.kernel.org/r/iwlwifi.20220204122220.a6165ac9b9d5.I654eafa62fd64... Signed-off-by: Luca Coelho luciano.coelho@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/intel/iwlwifi/mvm/scan.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c index 5f92a09db374..4cd507cb412d 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c @@ -1893,7 +1893,10 @@ static u8 iwl_mvm_scan_umac_chan_flags_v2(struct iwl_mvm *mvm, IWL_SCAN_CHANNEL_FLAG_CACHE_ADD;
/* set fragmented ebs for fragmented scan on HB channels */ - if (iwl_mvm_is_scan_fragmented(params->hb_type)) + if ((!iwl_mvm_is_cdb_supported(mvm) && + iwl_mvm_is_scan_fragmented(params->type)) || + (iwl_mvm_is_cdb_supported(mvm) && + iwl_mvm_is_scan_fragmented(params->hb_type))) flags |= IWL_SCAN_CHANNEL_FLAG_EBS_FRAG;
return flags;
From: Luca Coelho luciano.coelho@intel.com
[ Upstream commit 3009c797c4b3840495e8f48d8d07f48d2ddfed80 ]
There was a small copy and paste mistake in the doc declaration of iwl_fw_ini_addr_val. Fix it.
Signed-off-by: Luca Coelho luciano.coelho@intel.com Link: https://lore.kernel.org/r/iwlwifi.20220205112029.aeec71c397b3.I0ba3234419eb8... Signed-off-by: Luca Coelho luciano.coelho@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/intel/iwlwifi/fw/api/dbg-tlv.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/fw/api/dbg-tlv.h b/drivers/net/wireless/intel/iwlwifi/fw/api/dbg-tlv.h index 456b7eaac570..061fe6cc6cf5 100644 --- a/drivers/net/wireless/intel/iwlwifi/fw/api/dbg-tlv.h +++ b/drivers/net/wireless/intel/iwlwifi/fw/api/dbg-tlv.h @@ -1,6 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */ /* - * Copyright (C) 2018-2021 Intel Corporation + * Copyright (C) 2018-2022 Intel Corporation */ #ifndef __iwl_fw_dbg_tlv_h__ #define __iwl_fw_dbg_tlv_h__ @@ -249,11 +249,10 @@ struct iwl_fw_ini_hcmd_tlv { } __packed; /* FW_TLV_DEBUG_HCMD_API_S_VER_1 */
/** -* struct iwl_fw_ini_conf_tlv - preset configuration TLV +* struct iwl_fw_ini_addr_val - Address and value to set it to * * @address: the base address * @value: value to set at address - */ struct iwl_fw_ini_addr_val { __le32 address;
From: Miri Korenblit miriam.rachel.korenblit@intel.com
[ Upstream commit e04135c07755d001b5cde61048c69a7cc84bb94b ]
During disassociation we're decreasing the phy's ref count. If the ref count becomes 0, we're configuring the phy ctxt to the default channel (the lowest channel which the device can operate on). Currently we're not checking whether the the default channel is enabled or not. Fix it by configuring the phy ctxt to the lowest channel which is enabled.
Signed-off-by: Miri Korenblit miriam.rachel.korenblit@intel.com Signed-off-by: Luca Coelho luciano.coelho@intel.com Link: https://lore.kernel.org/r/iwlwifi.20220210181930.03f281b6a6bc.I5b63d43ec4199... Signed-off-by: Luca Coelho luciano.coelho@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/wireless/intel/iwlwifi/mvm/phy-ctxt.c | 31 +++++++++++++------ 1 file changed, 22 insertions(+), 9 deletions(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/phy-ctxt.c b/drivers/net/wireless/intel/iwlwifi/mvm/phy-ctxt.c index 9af40b0fa37a..a6e6673bf4ee 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/phy-ctxt.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/phy-ctxt.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause /* - * Copyright (C) 2012-2014, 2018-2021 Intel Corporation + * Copyright (C) 2012-2014, 2018-2022 Intel Corporation * Copyright (C) 2013-2014 Intel Mobile Communications GmbH * Copyright (C) 2017 Intel Deutschland GmbH */ @@ -349,18 +349,31 @@ void iwl_mvm_phy_ctxt_unref(struct iwl_mvm *mvm, struct iwl_mvm_phy_ctxt *ctxt) * otherwise we might not be able to reuse this phy. */ if (ctxt->ref == 0) { - struct ieee80211_channel *chan; + struct ieee80211_channel *chan = NULL; struct cfg80211_chan_def chandef; - struct ieee80211_supported_band *sband = NULL; - enum nl80211_band band = NL80211_BAND_2GHZ; + struct ieee80211_supported_band *sband; + enum nl80211_band band; + int channel;
- while (!sband && band < NUM_NL80211_BANDS) - sband = mvm->hw->wiphy->bands[band++]; + for (band = NL80211_BAND_2GHZ; band < NUM_NL80211_BANDS; band++) { + sband = mvm->hw->wiphy->bands[band];
- if (WARN_ON(!sband)) - return; + if (!sband) + continue; + + for (channel = 0; channel < sband->n_channels; channel++) + if (!(sband->channels[channel].flags & + IEEE80211_CHAN_DISABLED)) { + chan = &sband->channels[channel]; + break; + }
- chan = &sband->channels[0]; + if (chan) + break; + } + + if (WARN_ON(!chan)) + return;
cfg80211_chandef_create(&chandef, chan, NL80211_CHAN_NO_HT); iwl_mvm_phy_ctxt_changed(mvm, ctxt, &chandef, 1, 1);
From: Eric Dumazet edumazet@google.com
[ Upstream commit 086d49058cd8471046ae9927524708820f5fd1c7 ]
IPv6 has this hack changing sk->sk_prot when an IPv6 socket is 'converted' to an IPv4 one with IPV6_ADDRFORM option.
This operation is only performed for TCP and UDP, knowing their 'struct proto' for the two network families are populated in the same way, and can not disappear while a reader might use and dereference sk->sk_prot.
If we think about it all reads of sk->sk_prot while either socket lock or RTNL is not acquired should be using READ_ONCE().
Also note that other layers like MPTCP, XFRM, CHELSIO_TLS also write over sk->sk_prot.
BUG: KCSAN: data-race in inet6_recvmsg / ipv6_setsockopt
write to 0xffff8881386f7aa8 of 8 bytes by task 26932 on cpu 0: do_ipv6_setsockopt net/ipv6/ipv6_sockglue.c:492 [inline] ipv6_setsockopt+0x3758/0x3910 net/ipv6/ipv6_sockglue.c:1019 udpv6_setsockopt+0x85/0x90 net/ipv6/udp.c:1649 sock_common_setsockopt+0x5d/0x70 net/core/sock.c:3489 __sys_setsockopt+0x209/0x2a0 net/socket.c:2180 __do_sys_setsockopt net/socket.c:2191 [inline] __se_sys_setsockopt net/socket.c:2188 [inline] __x64_sys_setsockopt+0x62/0x70 net/socket.c:2188 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff8881386f7aa8 of 8 bytes by task 26911 on cpu 1: inet6_recvmsg+0x7a/0x210 net/ipv6/af_inet6.c:659 ____sys_recvmsg+0x16c/0x320 ___sys_recvmsg net/socket.c:2674 [inline] do_recvmmsg+0x3f5/0xae0 net/socket.c:2768 __sys_recvmmsg net/socket.c:2847 [inline] __do_sys_recvmmsg net/socket.c:2870 [inline] __se_sys_recvmmsg net/socket.c:2863 [inline] __x64_sys_recvmmsg+0xde/0x160 net/socket.c:2863 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0xffffffff85e0e980 -> 0xffffffff85e01580
Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 26911 Comm: syz-executor.3 Not tainted 5.17.0-rc2-syzkaller-00316-g0457e5153e0e-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/af_inet6.c | 24 ++++++++++++++++++------ net/ipv6/ipv6_sockglue.c | 6 ++++-- 2 files changed, 22 insertions(+), 8 deletions(-)
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index 8fe7900f1949..7d7b7523d126 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -441,11 +441,14 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) { struct sock *sk = sock->sk; u32 flags = BIND_WITH_LOCK; + const struct proto *prot; int err = 0;
+ /* IPV6_ADDRFORM can change sk->sk_prot under us. */ + prot = READ_ONCE(sk->sk_prot); /* If the socket has its own bind function then use it. */ - if (sk->sk_prot->bind) - return sk->sk_prot->bind(sk, uaddr, addr_len); + if (prot->bind) + return prot->bind(sk, uaddr, addr_len);
if (addr_len < SIN6_LEN_RFC2133) return -EINVAL; @@ -555,6 +558,7 @@ int inet6_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) void __user *argp = (void __user *)arg; struct sock *sk = sock->sk; struct net *net = sock_net(sk); + const struct proto *prot;
switch (cmd) { case SIOCADDRT: @@ -572,9 +576,11 @@ int inet6_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) case SIOCSIFDSTADDR: return addrconf_set_dstaddr(net, argp); default: - if (!sk->sk_prot->ioctl) + /* IPV6_ADDRFORM can change sk->sk_prot under us. */ + prot = READ_ONCE(sk->sk_prot); + if (!prot->ioctl) return -ENOIOCTLCMD; - return sk->sk_prot->ioctl(sk, cmd, arg); + return prot->ioctl(sk, cmd, arg); } /*NOTREACHED*/ return 0; @@ -636,11 +642,14 @@ INDIRECT_CALLABLE_DECLARE(int udpv6_sendmsg(struct sock *, struct msghdr *, int inet6_sendmsg(struct socket *sock, struct msghdr *msg, size_t size) { struct sock *sk = sock->sk; + const struct proto *prot;
if (unlikely(inet_send_prepare(sk))) return -EAGAIN;
- return INDIRECT_CALL_2(sk->sk_prot->sendmsg, tcp_sendmsg, udpv6_sendmsg, + /* IPV6_ADDRFORM can change sk->sk_prot under us. */ + prot = READ_ONCE(sk->sk_prot); + return INDIRECT_CALL_2(prot->sendmsg, tcp_sendmsg, udpv6_sendmsg, sk, msg, size); }
@@ -650,13 +659,16 @@ int inet6_recvmsg(struct socket *sock, struct msghdr *msg, size_t size, int flags) { struct sock *sk = sock->sk; + const struct proto *prot; int addr_len = 0; int err;
if (likely(!(flags & MSG_ERRQUEUE))) sock_rps_record_flow(sk);
- err = INDIRECT_CALL_2(sk->sk_prot->recvmsg, tcp_recvmsg, udpv6_recvmsg, + /* IPV6_ADDRFORM can change sk->sk_prot under us. */ + prot = READ_ONCE(sk->sk_prot); + err = INDIRECT_CALL_2(prot->recvmsg, tcp_recvmsg, udpv6_recvmsg, sk, msg, size, flags & MSG_DONTWAIT, flags & ~MSG_DONTWAIT, &addr_len); if (err >= 0) diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c index a733803a710c..222f6bf220ba 100644 --- a/net/ipv6/ipv6_sockglue.c +++ b/net/ipv6/ipv6_sockglue.c @@ -475,7 +475,8 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, sock_prot_inuse_add(net, sk->sk_prot, -1); sock_prot_inuse_add(net, &tcp_prot, 1);
- sk->sk_prot = &tcp_prot; + /* Paired with READ_ONCE(sk->sk_prot) in net/ipv6/af_inet6.c */ + WRITE_ONCE(sk->sk_prot, &tcp_prot); icsk->icsk_af_ops = &ipv4_specific; sk->sk_socket->ops = &inet_stream_ops; sk->sk_family = PF_INET; @@ -489,7 +490,8 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, sock_prot_inuse_add(net, sk->sk_prot, -1); sock_prot_inuse_add(net, prot, 1);
- sk->sk_prot = prot; + /* Paired with READ_ONCE(sk->sk_prot) in net/ipv6/af_inet6.c */ + WRITE_ONCE(sk->sk_prot, prot); sk->sk_socket->ops = &inet_dgram_ops; sk->sk_family = PF_INET; }
From: Daniel Thompson daniel.thompson@linaro.org
[ Upstream commit 24b176d8827d167ac3b379317f60c0985f6e95aa ]
Quoting the header comments, IRQF_ONESHOT is "Used by threaded interrupts which need to keep the irq line disabled until the threaded handler has been run.". When applied to an interrupt that doesn't request a threaded irq then IRQF_ONESHOT has a lesser known (undocumented?) side effect, which it to disable the forced threading of irqs (and for "normal" kernels it is a nop). In this case I can find no evidence that suppressing forced threading is intentional. Had it been intentional then a driver must adopt the raw_spinlock API in order to avoid deadlocks on PREEMPT_RT kernels (and avoid calling any kernel API that uses regular spinlocks).
Fix this by removing the spurious additional flag.
This change is required for my Snapdragon 7cx Gen2 tablet to boot-to-GUI with PREEMPT_RT enabled.
Signed-off-by: Daniel Thompson daniel.thompson@linaro.org Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Link: https://lore.kernel.org/r/20220201174734.196718-2-daniel.thompson@linaro.org Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/dsi/dsi_host.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/msm/dsi/dsi_host.c b/drivers/gpu/drm/msm/dsi/dsi_host.c index 6b3ced4aaaf5..3a3f53f0c8ae 100644 --- a/drivers/gpu/drm/msm/dsi/dsi_host.c +++ b/drivers/gpu/drm/msm/dsi/dsi_host.c @@ -1877,7 +1877,7 @@ int msm_dsi_host_init(struct msm_dsi *msm_dsi)
/* do not autoenable, will be enabled later */ ret = devm_request_irq(&pdev->dev, msm_host->irq, dsi_host_irq, - IRQF_TRIGGER_HIGH | IRQF_ONESHOT | IRQF_NO_AUTOEN, + IRQF_TRIGGER_HIGH | IRQF_NO_AUTOEN, "dsi_isr", msm_host); if (ret < 0) { dev_err(&pdev->dev, "failed to request IRQ%u: %d\n",
From: Jue Wang juew@google.com
[ Upstream commit 8ca97812c3c830573f965a07bbd84223e8c5f5bd ]
A rare kernel panic scenario can happen when the following conditions are met due to an erratum on fast string copy instructions:
1) An uncorrected error. 2) That error must be in first cache line of a page. 3) Kernel must execute page_copy from the page immediately before that page.
The fast string copy instructions ("REP; MOVS*") could consume an uncorrectable memory error in the cache line _right after_ the desired region to copy and raise an MCE.
Bit 0 of MSR_IA32_MISC_ENABLE can be cleared to disable fast string copy and will avoid such spurious machine checks. However, that is less preferable due to the permanent performance impact. Considering memory poison is rare, it's desirable to keep fast string copy enabled until an MCE is seen.
Intel has confirmed the following: 1. The CPU erratum of fast string copy only applies to Skylake, Cascade Lake and Cooper Lake generations.
Directly return from the MCE handler: 2. Will result in complete execution of the "REP; MOVS*" with no data loss or corruption. 3. Will not result in another MCE firing on the next poisoned cache line due to "REP; MOVS*". 4. Will resume execution from a correct point in code. 5. Will result in the same instruction that triggered the MCE firing a second MCE immediately for any other software recoverable data fetch errors. 6. Is not safe without disabling the fast string copy, as the next fast string copy of the same buffer on the same CPU would result in a PANIC MCE.
This should mitigate the erratum completely with the only caveat that the fast string copy is disabled on the affected hyper thread thus performance degradation.
This is still better than the OS crashing on MCEs raised on an irrelevant process due to "REP; MOVS*' accesses in a kernel context, e.g., copy_page.
Tested:
Injected errors on 1st cache line of 8 anonymous pages of process 'proc1' and observed MCE consumption from 'proc2' with no panic (directly returned).
Without the fix, the host panicked within a few minutes on a random 'proc2' process due to kernel access from copy_page.
[ bp: Fix comment style + touch ups, zap an unlikely(), improve the quirk function's readability. ]
Signed-off-by: Jue Wang juew@google.com Signed-off-by: Borislav Petkov bp@suse.de Reviewed-by: Tony Luck tony.luck@intel.com Link: https://lore.kernel.org/r/20220218013209.2436006-1-juew@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kernel/cpu/mce/core.c | 64 ++++++++++++++++++++++++++++++ arch/x86/kernel/cpu/mce/internal.h | 5 ++- 2 files changed, 68 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index 5818b837fd4d..2d719e0d2e40 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -834,6 +834,59 @@ static void quirk_sandybridge_ifu(int bank, struct mce *m, struct pt_regs *regs) m->cs = regs->cs; }
+/* + * Disable fast string copy and return from the MCE handler upon the first SRAR + * MCE on bank 1 due to a CPU erratum on Intel Skylake/Cascade Lake/Cooper Lake + * CPUs. + * The fast string copy instructions ("REP; MOVS*") could consume an + * uncorrectable memory error in the cache line _right after_ the desired region + * to copy and raise an MCE with RIP pointing to the instruction _after_ the + * "REP; MOVS*". + * This mitigation addresses the issue completely with the caveat of performance + * degradation on the CPU affected. This is still better than the OS crashing on + * MCEs raised on an irrelevant process due to "REP; MOVS*" accesses from a + * kernel context (e.g., copy_page). + * + * Returns true when fast string copy on CPU has been disabled. + */ +static noinstr bool quirk_skylake_repmov(void) +{ + u64 mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS); + u64 misc_enable = mce_rdmsrl(MSR_IA32_MISC_ENABLE); + u64 mc1_status; + + /* + * Apply the quirk only to local machine checks, i.e., no broadcast + * sync is needed. + */ + if (!(mcgstatus & MCG_STATUS_LMCES) || + !(misc_enable & MSR_IA32_MISC_ENABLE_FAST_STRING)) + return false; + + mc1_status = mce_rdmsrl(MSR_IA32_MCx_STATUS(1)); + + /* Check for a software-recoverable data fetch error. */ + if ((mc1_status & + (MCI_STATUS_VAL | MCI_STATUS_OVER | MCI_STATUS_UC | MCI_STATUS_EN | + MCI_STATUS_ADDRV | MCI_STATUS_MISCV | MCI_STATUS_PCC | + MCI_STATUS_AR | MCI_STATUS_S)) == + (MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN | + MCI_STATUS_ADDRV | MCI_STATUS_MISCV | + MCI_STATUS_AR | MCI_STATUS_S)) { + misc_enable &= ~MSR_IA32_MISC_ENABLE_FAST_STRING; + mce_wrmsrl(MSR_IA32_MISC_ENABLE, misc_enable); + mce_wrmsrl(MSR_IA32_MCx_STATUS(1), 0); + + instrumentation_begin(); + pr_err_once("Erratum detected, disable fast string copy instructions.\n"); + instrumentation_end(); + + return true; + } + + return false; +} + /* * Do a quick check if any of the events requires a panic. * This decides if we keep the events around or clear them. @@ -1403,6 +1456,9 @@ noinstr void do_machine_check(struct pt_regs *regs) else if (unlikely(!mca_cfg.initialized)) return unexpected_machine_check(regs);
+ if (mce_flags.skx_repmov_quirk && quirk_skylake_repmov()) + goto clear; + /* * Establish sequential order between the CPUs entering the machine * check handler. @@ -1545,6 +1601,7 @@ noinstr void do_machine_check(struct pt_regs *regs) out: instrumentation_end();
+clear: mce_wrmsrl(MSR_IA32_MCG_STATUS, 0); } EXPORT_SYMBOL_GPL(do_machine_check); @@ -1858,6 +1915,13 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
if (c->x86 == 6 && c->x86_model == 45) mce_flags.snb_ifu_quirk = 1; + + /* + * Skylake, Cascacde Lake and Cooper Lake require a quirk on + * rep movs. + */ + if (c->x86 == 6 && c->x86_model == INTEL_FAM6_SKYLAKE_X) + mce_flags.skx_repmov_quirk = 1; }
if (c->x86_vendor == X86_VENDOR_ZHAOXIN) { diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h index 52c633950b38..24d099e2d2a2 100644 --- a/arch/x86/kernel/cpu/mce/internal.h +++ b/arch/x86/kernel/cpu/mce/internal.h @@ -170,7 +170,10 @@ struct mce_vendor_flags { /* SandyBridge IFU quirk */ snb_ifu_quirk : 1,
- __reserved_0 : 57; + /* Skylake, Cascade Lake, Cooper Lake REP;MOVS* quirk */ + skx_repmov_quirk : 1, + + __reserved_0 : 56; };
extern struct mce_vendor_flags mce_flags;
From: Jiri Kosina jkosina@suse.cz
[ Upstream commit f3d825a35920714fb7f73e4d4f36ea2328860660 ]
ieee80211_tx_h_select_key() is performing a series of RCU dereferences, but rtw89_core_txq_push() is calling it (via ieee80211_tx_dequeue_ni()) without RCU read-side lock held; fix that.
This addresses the splat below.
============================= WARNING: suspicious RCU usage 5.17.0-rc4-00003-gccad664b7f14 #3 Tainted: G E ----------------------------- net/mac80211/tx.c:593 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1 2 locks held by kworker/u33:0/184: #0: ffff9c0b14811d38 ((wq_completion)rtw89_tx_wq){+.+.}-{0:0}, at: process_one_work+0x258/0x660 #1: ffffb97380cf3e78 ((work_completion)(&rtwdev->txq_work)){+.+.}-{0:0}, at: process_one_work+0x258/0x660
stack backtrace: CPU: 8 PID: 184 Comm: kworker/u33:0 Tainted: G E 5.17.0-rc4-00003-gccad664b7f14 #3 473b49ab0e7c2d6af2900c756bfd04efd7a9de13 Hardware name: LENOVO 20UJS2B905/20UJS2B905, BIOS R1CET63W(1.32 ) 04/09/2021 Workqueue: rtw89_tx_wq rtw89_core_txq_work [rtw89_core] Call Trace: <TASK> dump_stack_lvl+0x58/0x71 ieee80211_tx_h_select_key+0x2c0/0x530 [mac80211 911c23e2351c0ae60b597a67b1204a5ea955e365] ieee80211_tx_dequeue+0x1a7/0x1260 [mac80211 911c23e2351c0ae60b597a67b1204a5ea955e365] rtw89_core_txq_work+0x1a6/0x420 [rtw89_core b39ba493f2e517ad75e0f8187ecc24edf58bbbea] process_one_work+0x2d8/0x660 worker_thread+0x39/0x3e0 ? process_one_work+0x660/0x660 kthread+0xe5/0x110 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 </TASK>
============================= WARNING: suspicious RCU usage 5.17.0-rc4-00003-gccad664b7f14 #3 Tainted: G E ----------------------------- net/mac80211/tx.c:607 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1 2 locks held by kworker/u33:0/184: #0: ffff9c0b14811d38 ((wq_completion)rtw89_tx_wq){+.+.}-{0:0}, at: process_one_work+0x258/0x660 #1: ffffb97380cf3e78 ((work_completion)(&rtwdev->txq_work)){+.+.}-{0:0}, at: process_one_work+0x258/0x660
stack backtrace: CPU: 8 PID: 184 Comm: kworker/u33:0 Tainted: G E 5.17.0-rc4-00003-gccad664b7f14 #3 473b49ab0e7c2d6af2900c756bfd04efd7a9de13 Hardware name: LENOVO 20UJS2B905/20UJS2B905, BIOS R1CET63W(1.32 ) 04/09/2021 Workqueue: rtw89_tx_wq rtw89_core_txq_work [rtw89_core] Call Trace: <TASK> dump_stack_lvl+0x58/0x71 ieee80211_tx_h_select_key+0x464/0x530 [mac80211 911c23e2351c0ae60b597a67b1204a5ea955e365] ieee80211_tx_dequeue+0x1a7/0x1260 [mac80211 911c23e2351c0ae60b597a67b1204a5ea955e365] rtw89_core_txq_work+0x1a6/0x420 [rtw89_core b39ba493f2e517ad75e0f8187ecc24edf58bbbea] process_one_work+0x2d8/0x660 worker_thread+0x39/0x3e0 ? process_one_work+0x660/0x660 kthread+0xe5/0x110 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 </TASK>
Signed-off-by: Jiri Kosina jkosina@suse.cz Acked-by: Ping-Ke Shih pkshih@realtek.com Signed-off-by: Kalle Valo kvalo@kernel.org Link: https://lore.kernel.org/r/nycvar.YFH.7.76.2202152037000.11721@cbobk.fhfr.pm Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/realtek/rtw89/core.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/wireless/realtek/rtw89/core.c b/drivers/net/wireless/realtek/rtw89/core.c index a0737eea9f81..9632e7f218dd 100644 --- a/drivers/net/wireless/realtek/rtw89/core.c +++ b/drivers/net/wireless/realtek/rtw89/core.c @@ -1509,11 +1509,12 @@ static void rtw89_core_txq_push(struct rtw89_dev *rtwdev, unsigned long i; int ret;
+ rcu_read_lock(); for (i = 0; i < frame_cnt; i++) { skb = ieee80211_tx_dequeue_ni(rtwdev->hw, txq); if (!skb) { rtw89_debug(rtwdev, RTW89_DBG_TXRX, "dequeue a NULL skb\n"); - return; + goto out; } rtw89_core_txq_check_agg(rtwdev, rtwtxq, skb); ret = rtw89_core_tx_write(rtwdev, vif, sta, skb, NULL); @@ -1523,6 +1524,8 @@ static void rtw89_core_txq_push(struct rtw89_dev *rtwdev, break; } } +out: + rcu_read_unlock(); }
static u32 rtw89_check_and_reclaim_tx_resource(struct rtw89_dev *rtwdev, u8 tid)
From: Baochen Qiang quic_bqiang@quicinc.com
[ Upstream commit 261b07519518bd14cb168b287b17e1d195f8d0c8 ]
We are seeing below warnings:
kernel: [25393.301506] ath11k_pci 0000:01:00.0: failed to flush mgmt transmit queue 0 kernel: [25398.421509] ath11k_pci 0000:01:00.0: failed to flush mgmt transmit queue 0 kernel: [25398.421831] ath11k_pci 0000:01:00.0: dropping mgmt frame for vdev 0, is_started 0
this means ath11k fails to flush mgmt. frames because wmi_mgmt_tx_work has no chance to run in 5 seconds.
By setting /proc/sys/kernel/hung_task_timeout_secs to 20 and increasing ATH11K_FLUSH_TIMEOUT to 50 we get below warnings:
kernel: [ 120.763160] INFO: task wpa_supplicant:924 blocked for more than 20 seconds. kernel: [ 120.763169] Not tainted 5.10.90 #12 kernel: [ 120.763177] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: [ 120.763186] task:wpa_supplicant state:D stack: 0 pid: 924 ppid: 1 flags:0x000043a0 kernel: [ 120.763201] Call Trace: kernel: [ 120.763214] __schedule+0x785/0x12fa kernel: [ 120.763224] ? lockdep_hardirqs_on_prepare+0xe2/0x1bb kernel: [ 120.763242] schedule+0x7e/0xa1 kernel: [ 120.763253] schedule_timeout+0x98/0xfe kernel: [ 120.763266] ? run_local_timers+0x4a/0x4a kernel: [ 120.763291] ath11k_mac_flush_tx_complete+0x197/0x2b1 [ath11k 13c3a9bf37790f4ac8103b3decf7ab4008ac314a] kernel: [ 120.763306] ? init_wait_entry+0x2e/0x2e kernel: [ 120.763343] __ieee80211_flush_queues+0x167/0x21f [mac80211 335da900954f1c5ea7f1613d92088ce83342042c] kernel: [ 120.763378] __ieee80211_recalc_idle+0x105/0x125 [mac80211 335da900954f1c5ea7f1613d92088ce83342042c] kernel: [ 120.763411] ieee80211_recalc_idle+0x14/0x27 [mac80211 335da900954f1c5ea7f1613d92088ce83342042c] kernel: [ 120.763441] ieee80211_free_chanctx+0x77/0xa2 [mac80211 335da900954f1c5ea7f1613d92088ce83342042c] kernel: [ 120.763473] __ieee80211_vif_release_channel+0x100/0x131 [mac80211 335da900954f1c5ea7f1613d92088ce83342042c] kernel: [ 120.763540] ieee80211_vif_release_channel+0x66/0x81 [mac80211 335da900954f1c5ea7f1613d92088ce83342042c] kernel: [ 120.763572] ieee80211_destroy_auth_data+0xa3/0xe6 [mac80211 335da900954f1c5ea7f1613d92088ce83342042c] kernel: [ 120.763612] ieee80211_mgd_deauth+0x178/0x29b [mac80211 335da900954f1c5ea7f1613d92088ce83342042c] kernel: [ 120.763654] cfg80211_mlme_deauth+0x1a8/0x22c [cfg80211 8945aa5bc2af5f6972336665d8ad6f9c191ad5be] kernel: [ 120.763697] nl80211_deauthenticate+0xfa/0x123 [cfg80211 8945aa5bc2af5f6972336665d8ad6f9c191ad5be] kernel: [ 120.763715] genl_rcv_msg+0x392/0x3c2 kernel: [ 120.763750] ? nl80211_associate+0x432/0x432 [cfg80211 8945aa5bc2af5f6972336665d8ad6f9c191ad5be] kernel: [ 120.763782] ? nl80211_associate+0x432/0x432 [cfg80211 8945aa5bc2af5f6972336665d8ad6f9c191ad5be] kernel: [ 120.763802] ? genl_rcv+0x36/0x36 kernel: [ 120.763814] netlink_rcv_skb+0x89/0xf7 kernel: [ 120.763829] genl_rcv+0x28/0x36 kernel: [ 120.763840] netlink_unicast+0x179/0x24b kernel: [ 120.763854] netlink_sendmsg+0x393/0x401 kernel: [ 120.763872] sock_sendmsg+0x72/0x76 kernel: [ 120.763886] ____sys_sendmsg+0x170/0x1e6 kernel: [ 120.763897] ? copy_msghdr_from_user+0x7a/0xa2 kernel: [ 120.763914] ___sys_sendmsg+0x95/0xd1 kernel: [ 120.763940] __sys_sendmsg+0x85/0xbf kernel: [ 120.763956] do_syscall_64+0x43/0x55 kernel: [ 120.763966] entry_SYSCALL_64_after_hwframe+0x44/0xa9 kernel: [ 120.763977] RIP: 0033:0x79089f3fcc83 kernel: [ 120.763986] RSP: 002b:00007ffe604f0508 EFLAGS: 00000246 ORIG_RAX: 000000000000002e kernel: [ 120.763997] RAX: ffffffffffffffda RBX: 000059b40e987690 RCX: 000079089f3fcc83 kernel: [ 120.764006] RDX: 0000000000000000 RSI: 00007ffe604f0558 RDI: 0000000000000009 kernel: [ 120.764014] RBP: 00007ffe604f0540 R08: 0000000000000004 R09: 0000000000400000 kernel: [ 120.764023] R10: 00007ffe604f0638 R11: 0000000000000246 R12: 000059b40ea04980 kernel: [ 120.764032] R13: 00007ffe604f0638 R14: 000059b40e98c360 R15: 00007ffe604f0558 ... kernel: [ 120.765230] INFO: task kworker/u32:26:4239 blocked for more than 20 seconds. kernel: [ 120.765238] Not tainted 5.10.90 #12 kernel: [ 120.765245] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: [ 120.765253] task:kworker/u32:26 state:D stack: 0 pid: 4239 ppid: 2 flags:0x00004080 kernel: [ 120.765284] Workqueue: phy0 ieee80211_iface_work [mac80211] kernel: [ 120.765295] Call Trace: kernel: [ 120.765306] __schedule+0x785/0x12fa kernel: [ 120.765316] ? find_held_lock+0x3d/0xb2 kernel: [ 120.765331] schedule+0x7e/0xa1 kernel: [ 120.765340] schedule_preempt_disabled+0x15/0x1e kernel: [ 120.765349] __mutex_lock_common+0x561/0xc0d kernel: [ 120.765375] ? ieee80211_sta_work+0x3e/0x1232 [mac80211 335da900954f1c5ea7f1613d92088ce83342042c] kernel: [ 120.765390] mutex_lock_nested+0x20/0x26 kernel: [ 120.765416] ieee80211_sta_work+0x3e/0x1232 [mac80211 335da900954f1c5ea7f1613d92088ce83342042c] kernel: [ 120.765430] ? skb_dequeue+0x54/0x5e kernel: [ 120.765456] ? ieee80211_iface_work+0x7b/0x339 [mac80211 335da900954f1c5ea7f1613d92088ce83342042c] kernel: [ 120.765485] process_one_work+0x270/0x504 kernel: [ 120.765501] worker_thread+0x215/0x376 kernel: [ 120.765514] kthread+0x159/0x168 kernel: [ 120.765526] ? pr_cont_work+0x5b/0x5b kernel: [ 120.765536] ? kthread_blkcg+0x31/0x31 kernel: [ 120.765550] ret_from_fork+0x22/0x30 ... kernel: [ 120.765867] Showing all locks held in the system: ... kernel: [ 120.766164] 5 locks held by wpa_supplicant/924: kernel: [ 120.766172] #0: ffffffffb1e63eb0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x36 kernel: [ 120.766197] #1: ffffffffb1e5b1c8 (rtnl_mutex){+.+.}-{3:3}, at: nl80211_pre_doit+0x2a/0x15c [cfg80211] kernel: [ 120.766238] #2: ffff99f08347cd08 (&wdev->mtx){+.+.}-{3:3}, at: nl80211_deauthenticate+0xde/0x123 [cfg80211] kernel: [ 120.766279] #3: ffff99f09df12a48 (&local->mtx){+.+.}-{3:3}, at: ieee80211_destroy_auth_data+0x9b/0xe6 [mac80211] kernel: [ 120.766321] #4: ffff99f09df12ce0 (&local->chanctx_mtx){+.+.}-{3:3}, at: ieee80211_vif_release_channel+0x5e/0x81 [mac80211] ... kernel: [ 120.766585] 3 locks held by kworker/u32:26/4239: kernel: [ 120.766593] #0: ffff99f04458f948 ((wq_completion)phy0){+.+.}-{0:0}, at: process_one_work+0x19a/0x504 kernel: [ 120.766621] #1: ffffbad54b3cfe50 ((work_completion)(&sdata->work)){+.+.}-{0:0}, at: process_one_work+0x1c0/0x504 kernel: [ 120.766649] #2: ffff99f08347cd08 (&wdev->mtx){+.+.}-{3:3}, at: ieee80211_sta_work+0x3e/0x1232 [mac80211]
With above info the issue is clear: First wmi_mgmt_tx_work is inserted to local->workqueue after sdata->work inserted, then wpa_supplicant acquires wdev->mtx in nl80211_deauthenticate and finally calls ath11k_mac_op_flush where it waits all mgmt. frames to be sent out by wmi_mgmt_tx_work. Meanwhile, sdata->work is blocked by wdev->mtx in ieee80211_sta_work, as a result wmi_mgmt_tx_work has no chance to run.
Change to use ab->workqueue instead of local->workqueue to fix this issue.
Signed-off-by: Baochen Qiang quic_bqiang@quicinc.com Signed-off-by: Kalle Valo quic_kvalo@quicinc.com Link: https://lore.kernel.org/r/20220217084545.18844-1-quic_bqiang@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/ath/ath11k/mac.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/ath/ath11k/mac.c b/drivers/net/wireless/ath/ath11k/mac.c index 08e33778f63b..28de877ad6c4 100644 --- a/drivers/net/wireless/ath/ath11k/mac.c +++ b/drivers/net/wireless/ath/ath11k/mac.c @@ -5574,7 +5574,7 @@ static int ath11k_mac_mgmt_tx(struct ath11k *ar, struct sk_buff *skb,
skb_queue_tail(q, skb); atomic_inc(&ar->num_pending_mgmt_tx); - ieee80211_queue_work(ar->hw, &ar->wmi_mgmt_tx_work); + queue_work(ar->ab->workqueue, &ar->wmi_mgmt_tx_work);
return 0; }
From: Ido Schimmel idosch@nvidia.com
[ Upstream commit 0c51e12e218f20b7d976158fdc18019627326f7a ]
In case user space sends a packet destined to a broadcast address when a matching broadcast route is not configured, the kernel will create a unicast neighbour entry that will never be resolved [1].
When the broadcast route is configured, the unicast neighbour entry will not be invalidated and continue to linger, resulting in packets being dropped.
Solve this by invalidating unresolved neighbour entries for broadcast addresses after routes for these addresses are internally configured by the kernel. This allows the kernel to create a broadcast neighbour entry following the next route lookup.
Another possible solution that is more generic but also more complex is to have the ARP code register a listener to the FIB notification chain and invalidate matching neighbour entries upon the addition of broadcast routes.
It is also possible to wave off the issue as a user space problem, but it seems a bit excessive to expect user space to be that intimately familiar with the inner workings of the FIB/neighbour kernel code.
[1] https://lore.kernel.org/netdev/55a04a8f-56f3-f73c-2aea-2195923f09d1@huawei.c...
Reported-by: Wang Hai wanghai38@huawei.com Signed-off-by: Ido Schimmel idosch@nvidia.com Tested-by: Wang Hai wanghai38@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/arp.h | 1 + net/ipv4/arp.c | 9 +++++++-- net/ipv4/fib_frontend.c | 5 ++++- 3 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/include/net/arp.h b/include/net/arp.h index 031374ac2f22..d7ef4ec71dfe 100644 --- a/include/net/arp.h +++ b/include/net/arp.h @@ -65,6 +65,7 @@ void arp_send(int type, int ptype, __be32 dest_ip, const unsigned char *src_hw, const unsigned char *th); int arp_mc_map(__be32 addr, u8 *haddr, struct net_device *dev, int dir); void arp_ifdown(struct net_device *dev); +int arp_invalidate(struct net_device *dev, __be32 ip, bool force);
struct sk_buff *arp_create(int type, int ptype, __be32 dest_ip, struct net_device *dev, __be32 src_ip, diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c index 4db0325f6e1a..dc28f0588e54 100644 --- a/net/ipv4/arp.c +++ b/net/ipv4/arp.c @@ -1116,13 +1116,18 @@ static int arp_req_get(struct arpreq *r, struct net_device *dev) return err; }
-static int arp_invalidate(struct net_device *dev, __be32 ip) +int arp_invalidate(struct net_device *dev, __be32 ip, bool force) { struct neighbour *neigh = neigh_lookup(&arp_tbl, &ip, dev); int err = -ENXIO; struct neigh_table *tbl = &arp_tbl;
if (neigh) { + if ((neigh->nud_state & NUD_VALID) && !force) { + neigh_release(neigh); + return 0; + } + if (neigh->nud_state & ~NUD_NOARP) err = neigh_update(neigh, NULL, NUD_FAILED, NEIGH_UPDATE_F_OVERRIDE| @@ -1169,7 +1174,7 @@ static int arp_req_delete(struct net *net, struct arpreq *r, if (!dev) return -EINVAL; } - return arp_invalidate(dev, ip); + return arp_invalidate(dev, ip, true); }
/* diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index 85117b45216d..89a5a4875595 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -1115,9 +1115,11 @@ void fib_add_ifaddr(struct in_ifaddr *ifa) return;
/* Add broadcast address, if it is explicitly assigned. */ - if (ifa->ifa_broadcast && ifa->ifa_broadcast != htonl(0xFFFFFFFF)) + if (ifa->ifa_broadcast && ifa->ifa_broadcast != htonl(0xFFFFFFFF)) { fib_magic(RTM_NEWROUTE, RTN_BROADCAST, ifa->ifa_broadcast, 32, prim, 0); + arp_invalidate(dev, ifa->ifa_broadcast, false); + }
if (!ipv4_is_zeronet(prefix) && !(ifa->ifa_flags & IFA_F_SECONDARY) && (prefix != addr || ifa->ifa_prefixlen < 32)) { @@ -1131,6 +1133,7 @@ void fib_add_ifaddr(struct in_ifaddr *ifa) if (ifa->ifa_prefixlen < 31) { fib_magic(RTM_NEWROUTE, RTN_BROADCAST, prefix | ~mask, 32, prim, 0); + arp_invalidate(dev, prefix | ~mask, false); } } }
From: Ping-Ke Shih pkshih@realtek.com
[ Upstream commit a0061be4e54b52e5e4ff179c3f817107ddbb2830 ]
Larry reported funny log entries [1] when he used rtl8821ce. These messages are not harmless, but not useful for users, so change them to rtw_dbg() level. By the way, I review all rtw_info() and change others to rtw_warn().
[1] https://lore.kernel.org/linux-wireless/c356d5ae-a7b3-3065-1121-64c446e70333@...
Reported-by: Larry Finger Larry.Finger@lwfinger.net Signed-off-by: Ping-Ke Shih pkshih@realtek.com Signed-off-by: Kalle Valo kvalo@kernel.org Link: https://lore.kernel.org/r/20220218035527.9835-1-pkshih@realtek.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/realtek/rtw88/debug.c | 2 +- drivers/net/wireless/realtek/rtw88/debug.h | 1 + drivers/net/wireless/realtek/rtw88/fw.c | 2 +- drivers/net/wireless/realtek/rtw88/mac80211.c | 8 ++++---- drivers/net/wireless/realtek/rtw88/main.c | 8 ++++---- drivers/net/wireless/realtek/rtw88/rtw8821c.c | 2 +- drivers/net/wireless/realtek/rtw88/rtw8822b.c | 4 ++-- drivers/net/wireless/realtek/rtw88/rtw8822c.c | 4 ++-- drivers/net/wireless/realtek/rtw88/sar.c | 8 ++++---- 9 files changed, 20 insertions(+), 19 deletions(-)
diff --git a/drivers/net/wireless/realtek/rtw88/debug.c b/drivers/net/wireless/realtek/rtw88/debug.c index e429428232c1..e7e9f17df96a 100644 --- a/drivers/net/wireless/realtek/rtw88/debug.c +++ b/drivers/net/wireless/realtek/rtw88/debug.c @@ -390,7 +390,7 @@ static ssize_t rtw_debugfs_set_h2c(struct file *filp, ¶m[0], ¶m[1], ¶m[2], ¶m[3], ¶m[4], ¶m[5], ¶m[6], ¶m[7]); if (num != 8) { - rtw_info(rtwdev, "invalid H2C command format for debug\n"); + rtw_warn(rtwdev, "invalid H2C command format for debug\n"); return -EINVAL; }
diff --git a/drivers/net/wireless/realtek/rtw88/debug.h b/drivers/net/wireless/realtek/rtw88/debug.h index 61f8369fe2d6..066792dd96af 100644 --- a/drivers/net/wireless/realtek/rtw88/debug.h +++ b/drivers/net/wireless/realtek/rtw88/debug.h @@ -23,6 +23,7 @@ enum rtw_debug_mask { RTW_DBG_PATH_DIV = 0x00004000, RTW_DBG_ADAPTIVITY = 0x00008000, RTW_DBG_HW_SCAN = 0x00010000, + RTW_DBG_STATE = 0x00020000,
RTW_DBG_ALL = 0xffffffff }; diff --git a/drivers/net/wireless/realtek/rtw88/fw.c b/drivers/net/wireless/realtek/rtw88/fw.c index 4c8e5ea5d069..db90d75a8633 100644 --- a/drivers/net/wireless/realtek/rtw88/fw.c +++ b/drivers/net/wireless/realtek/rtw88/fw.c @@ -2131,7 +2131,7 @@ void rtw_hw_scan_status_report(struct rtw_dev *rtwdev, struct sk_buff *skb) rtw_hw_scan_complete(rtwdev, vif, aborted);
if (aborted) - rtw_info(rtwdev, "HW scan aborted with code: %d\n", rc); + rtw_dbg(rtwdev, RTW_DBG_HW_SCAN, "HW scan aborted with code: %d\n", rc); }
void rtw_store_op_chan(struct rtw_dev *rtwdev) diff --git a/drivers/net/wireless/realtek/rtw88/mac80211.c b/drivers/net/wireless/realtek/rtw88/mac80211.c index 647d2662955b..5cdc54c9a9aa 100644 --- a/drivers/net/wireless/realtek/rtw88/mac80211.c +++ b/drivers/net/wireless/realtek/rtw88/mac80211.c @@ -208,7 +208,7 @@ static int rtw_ops_add_interface(struct ieee80211_hw *hw,
mutex_unlock(&rtwdev->mutex);
- rtw_info(rtwdev, "start vif %pM on port %d\n", vif->addr, rtwvif->port); + rtw_dbg(rtwdev, RTW_DBG_STATE, "start vif %pM on port %d\n", vif->addr, rtwvif->port); return 0; }
@@ -219,7 +219,7 @@ static void rtw_ops_remove_interface(struct ieee80211_hw *hw, struct rtw_vif *rtwvif = (struct rtw_vif *)vif->drv_priv; u32 config = 0;
- rtw_info(rtwdev, "stop vif %pM on port %d\n", vif->addr, rtwvif->port); + rtw_dbg(rtwdev, RTW_DBG_STATE, "stop vif %pM on port %d\n", vif->addr, rtwvif->port);
mutex_lock(&rtwdev->mutex);
@@ -245,8 +245,8 @@ static int rtw_ops_change_interface(struct ieee80211_hw *hw, { struct rtw_dev *rtwdev = hw->priv;
- rtw_info(rtwdev, "change vif %pM (%d)->(%d), p2p (%d)->(%d)\n", - vif->addr, vif->type, type, vif->p2p, p2p); + rtw_dbg(rtwdev, RTW_DBG_STATE, "change vif %pM (%d)->(%d), p2p (%d)->(%d)\n", + vif->addr, vif->type, type, vif->p2p, p2p);
rtw_ops_remove_interface(hw, vif);
diff --git a/drivers/net/wireless/realtek/rtw88/main.c b/drivers/net/wireless/realtek/rtw88/main.c index 39c223a2e3e2..b00200f81db7 100644 --- a/drivers/net/wireless/realtek/rtw88/main.c +++ b/drivers/net/wireless/realtek/rtw88/main.c @@ -314,8 +314,8 @@ int rtw_sta_add(struct rtw_dev *rtwdev, struct ieee80211_sta *sta,
rtwdev->sta_cnt++; rtwdev->beacon_loss = false; - rtw_info(rtwdev, "sta %pM joined with macid %d\n", - sta->addr, si->mac_id); + rtw_dbg(rtwdev, RTW_DBG_STATE, "sta %pM joined with macid %d\n", + sta->addr, si->mac_id);
return 0; } @@ -336,8 +336,8 @@ void rtw_sta_remove(struct rtw_dev *rtwdev, struct ieee80211_sta *sta, kfree(si->mask);
rtwdev->sta_cnt--; - rtw_info(rtwdev, "sta %pM with macid %d left\n", - sta->addr, si->mac_id); + rtw_dbg(rtwdev, RTW_DBG_STATE, "sta %pM with macid %d left\n", + sta->addr, si->mac_id); }
struct rtw_fwcd_hdr { diff --git a/drivers/net/wireless/realtek/rtw88/rtw8821c.c b/drivers/net/wireless/realtek/rtw88/rtw8821c.c index db078df63f85..80d4761796b1 100644 --- a/drivers/net/wireless/realtek/rtw88/rtw8821c.c +++ b/drivers/net/wireless/realtek/rtw88/rtw8821c.c @@ -499,7 +499,7 @@ static s8 get_cck_rx_pwr(struct rtw_dev *rtwdev, u8 lna_idx, u8 vga_idx) }
if (lna_idx >= lna_gain_table_size) { - rtw_info(rtwdev, "incorrect lna index (%d)\n", lna_idx); + rtw_warn(rtwdev, "incorrect lna index (%d)\n", lna_idx); return -120; }
diff --git a/drivers/net/wireless/realtek/rtw88/rtw8822b.c b/drivers/net/wireless/realtek/rtw88/rtw8822b.c index dd4fbb82750d..a23806b69b0f 100644 --- a/drivers/net/wireless/realtek/rtw88/rtw8822b.c +++ b/drivers/net/wireless/realtek/rtw88/rtw8822b.c @@ -1012,12 +1012,12 @@ static int rtw8822b_set_antenna(struct rtw_dev *rtwdev, antenna_tx, antenna_rx);
if (!rtw8822b_check_rf_path(antenna_tx)) { - rtw_info(rtwdev, "unsupported tx path 0x%x\n", antenna_tx); + rtw_warn(rtwdev, "unsupported tx path 0x%x\n", antenna_tx); return -EINVAL; }
if (!rtw8822b_check_rf_path(antenna_rx)) { - rtw_info(rtwdev, "unsupported rx path 0x%x\n", antenna_rx); + rtw_warn(rtwdev, "unsupported rx path 0x%x\n", antenna_rx); return -EINVAL; }
diff --git a/drivers/net/wireless/realtek/rtw88/rtw8822c.c b/drivers/net/wireless/realtek/rtw88/rtw8822c.c index 35c46e5209de..ddf4d1a23e60 100644 --- a/drivers/net/wireless/realtek/rtw88/rtw8822c.c +++ b/drivers/net/wireless/realtek/rtw88/rtw8822c.c @@ -2798,7 +2798,7 @@ static int rtw8822c_set_antenna(struct rtw_dev *rtwdev, case BB_PATH_AB: break; default: - rtw_info(rtwdev, "unsupported tx path 0x%x\n", antenna_tx); + rtw_warn(rtwdev, "unsupported tx path 0x%x\n", antenna_tx); return -EINVAL; }
@@ -2808,7 +2808,7 @@ static int rtw8822c_set_antenna(struct rtw_dev *rtwdev, case BB_PATH_AB: break; default: - rtw_info(rtwdev, "unsupported rx path 0x%x\n", antenna_rx); + rtw_warn(rtwdev, "unsupported rx path 0x%x\n", antenna_rx); return -EINVAL; }
diff --git a/drivers/net/wireless/realtek/rtw88/sar.c b/drivers/net/wireless/realtek/rtw88/sar.c index 3383726c4d90..c472f1502b82 100644 --- a/drivers/net/wireless/realtek/rtw88/sar.c +++ b/drivers/net/wireless/realtek/rtw88/sar.c @@ -91,10 +91,10 @@ int rtw_set_sar_specs(struct rtw_dev *rtwdev, return -EINVAL;
power = sar->sub_specs[i].power; - rtw_info(rtwdev, "On freq %u to %u, set SAR %d in 1/%lu dBm\n", - rtw_common_sar_freq_ranges[idx].start_freq, - rtw_common_sar_freq_ranges[idx].end_freq, - power, BIT(RTW_COMMON_SAR_FCT)); + rtw_dbg(rtwdev, RTW_DBG_REGD, "On freq %u to %u, set SAR %d in 1/%lu dBm\n", + rtw_common_sar_freq_ranges[idx].start_freq, + rtw_common_sar_freq_ranges[idx].end_freq, + power, BIT(RTW_COMMON_SAR_FCT));
for (j = 0; j < RTW_RF_PATH_MAX; j++) { for (k = 0; k < RTW_RATE_SECTION_MAX; k++) {
From: Jordy Zomer jordy@jordyzomer.github.io
[ Upstream commit cd9c88da171a62c4b0f1c70e50c75845969fbc18 ]
It appears like cmd could be a Spectre v1 gadget as it's supplied by a user and used as an array index. Prevent the contents of kernel memory from being leaked to userspace via speculative execution by using array_index_nospec.
Signed-off-by: Jordy Zomer jordy@pwning.systems Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/md/dm-ioctl.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c index 21fe8652b095..901abd6dea41 100644 --- a/drivers/md/dm-ioctl.c +++ b/drivers/md/dm-ioctl.c @@ -18,6 +18,7 @@ #include <linux/dm-ioctl.h> #include <linux/hdreg.h> #include <linux/compat.h> +#include <linux/nospec.h>
#include <linux/uaccess.h> #include <linux/ima.h> @@ -1788,6 +1789,7 @@ static ioctl_fn lookup_ioctl(unsigned int cmd, int *ioctl_flags) if (unlikely(cmd >= ARRAY_SIZE(_ioctls))) return NULL;
+ cmd = array_index_nospec(cmd, ARRAY_SIZE(_ioctls)); *ioctl_flags = _ioctls[cmd].flags; return _ioctls[cmd].fn; }
From: Mike Snitzer snitzer@redhat.com
[ Upstream commit fa247089de9936a46e290d4724cb5f0b845600f5 ]
Update both bio-based and request-based DM to requeue IO if the mapping table not available.
This race of IO being submitted before the DM device ready is so narrow, yet possible for initial table load given that the DM device's request_queue is created prior, that it best to requeue IO to handle this unlikely case.
Reported-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/md/dm-rq.c | 7 ++++++- drivers/md/dm.c | 11 +++-------- 2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c index 579ab6183d4d..dffeb47a9efb 100644 --- a/drivers/md/dm-rq.c +++ b/drivers/md/dm-rq.c @@ -499,8 +499,13 @@ static blk_status_t dm_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
if (unlikely(!ti)) { int srcu_idx; - struct dm_table *map = dm_get_live_table(md, &srcu_idx); + struct dm_table *map;
+ map = dm_get_live_table(md, &srcu_idx); + if (unlikely(!map)) { + dm_put_live_table(md, srcu_idx); + return BLK_STS_RESOURCE; + } ti = dm_table_find_target(map, 0); dm_put_live_table(md, srcu_idx); } diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 394778d8bf54..dcb8d8fc7877 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1519,15 +1519,10 @@ static void dm_submit_bio(struct bio *bio) struct dm_table *map;
map = dm_get_live_table(md, &srcu_idx); - if (unlikely(!map)) { - DMERR_LIMIT("%s: mapping table unavailable, erroring io", - dm_device_name(md)); - bio_io_error(bio); - goto out; - }
- /* If suspended, queue this IO for later */ - if (unlikely(test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags))) { + /* If suspended, or map not yet available, queue this IO for later */ + if (unlikely(test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) || + unlikely(!map)) { if (bio->bi_opf & REQ_NOWAIT) bio_wouldblock_error(bio); else if (bio->bi_opf & REQ_RAHEAD)
From: Alex Deucher alexander.deucher@amd.com
[ Upstream commit 9dff13f9edf755a15f6507874185a3290c1ae8bb ]
The driver has a fallback so make the message informational rather than a warning. The driver has a fallback if the Component Resource Association Table (CRAT) is missing, so make this informational now.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1906 Reviewed-by: Felix Kuehling Felix.Kuehling@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c index 9624bbe8b501..281def1c6c08 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c @@ -1567,7 +1567,7 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t *size) /* Fetch the CRAT table from ACPI */ status = acpi_get_table(CRAT_SIGNATURE, 0, &crat_table); if (status == AE_NOT_FOUND) { - pr_warn("CRAT table not found\n"); + pr_info("CRAT table not found\n"); return -ENODATA; } else if (ACPI_FAILURE(status)) { const char *err = acpi_format_exception(status);
From: Alex Williamson alex.williamson@redhat.com
[ Upstream commit 6e031ec0e5a2dda53e12e0d2a7e9b15b47a3c502 ]
Resolve build errors reported against UML build for undefined ioport_map() and ioport_unmap() functions. Without this config option a device cannot have vfio_pci_core_device.has_vga set, so the existing function would always return -EINVAL anyway.
Reported-by: Geert Uytterhoeven geert@linux-m68k.org Link: https://lore.kernel.org/r/20220123125737.2658758-1-geert@linux-m68k.org Link: https://lore.kernel.org/r/164306582968.3758255.15192949639574660648.stgit@om... Signed-off-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vfio/pci/vfio_pci_rdwr.c | 2 ++ include/linux/vfio_pci_core.h | 9 +++++++++ 2 files changed, 11 insertions(+)
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c index 57d3b2cbbd8e..82ac1569deb0 100644 --- a/drivers/vfio/pci/vfio_pci_rdwr.c +++ b/drivers/vfio/pci/vfio_pci_rdwr.c @@ -288,6 +288,7 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf, return done; }
+#ifdef CONFIG_VFIO_PCI_VGA ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, char __user *buf, size_t count, loff_t *ppos, bool iswrite) { @@ -355,6 +356,7 @@ ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, char __user *buf,
return done; } +#endif
static void vfio_pci_ioeventfd_do_write(struct vfio_pci_ioeventfd *ioeventfd, bool test_mem) diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h index ef9a44b6cf5d..ae6f4838ab75 100644 --- a/include/linux/vfio_pci_core.h +++ b/include/linux/vfio_pci_core.h @@ -159,8 +159,17 @@ extern ssize_t vfio_pci_config_rw(struct vfio_pci_core_device *vdev, extern ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf, size_t count, loff_t *ppos, bool iswrite);
+#ifdef CONFIG_VFIO_PCI_VGA extern ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, char __user *buf, size_t count, loff_t *ppos, bool iswrite); +#else +static inline ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, + char __user *buf, size_t count, + loff_t *ppos, bool iswrite) +{ + return -EINVAL; +} +#endif
extern long vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset, uint64_t data, int count, int fd);
From: Damien Le Moal damien.lemoal@opensource.wdc.com
[ Upstream commit 3762d8f6edcdb03994c919f9487fd6d336c06561 ]
The declaration of the local variable destination1 in pm80xx_pci_mem_copy() as a pointer to a u32 results in the sparse warning:
warning: incorrect type in assignment (different base types) expected unsigned int [usertype] got restricted __le32 [usertype]
Furthermore, the destination" argument of pm80xx_pci_mem_copy() is wrongly declared with the const attribute.
Fix both problems by changing the type of the "destination" argument to "__le32 *" and use this argument directly inside the pm80xx_pci_mem_copy() function, thus removing the need for the destination1 local variable.
Link: https://lore.kernel.org/r/20220220031810.738362-6-damien.lemoal@opensource.w... Reviewed-by: Jack Wang jinpu.wang@ionos.com Signed-off-by: Damien Le Moal damien.lemoal@opensource.wdc.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/pm8001/pm80xx_hwi.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/drivers/scsi/pm8001/pm80xx_hwi.c b/drivers/scsi/pm8001/pm80xx_hwi.c index 908dbac20b48..9a0d65ac0174 100644 --- a/drivers/scsi/pm8001/pm80xx_hwi.c +++ b/drivers/scsi/pm8001/pm80xx_hwi.c @@ -67,18 +67,16 @@ int pm80xx_bar4_shift(struct pm8001_hba_info *pm8001_ha, u32 shift_value) }
static void pm80xx_pci_mem_copy(struct pm8001_hba_info *pm8001_ha, u32 soffset, - const void *destination, + __le32 *destination, u32 dw_count, u32 bus_base_number) { u32 index, value, offset; - u32 *destination1; - destination1 = (u32 *)destination;
- for (index = 0; index < dw_count; index += 4, destination1++) { + for (index = 0; index < dw_count; index += 4, destination++) { offset = (soffset + index); if (offset < (64 * 1024)) { value = pm8001_cr32(pm8001_ha, bus_base_number, offset); - *destination1 = cpu_to_le32(value); + *destination = cpu_to_le32(value); } } return;
From: Damien Le Moal damien.lemoal@opensource.wdc.com
[ Upstream commit 7e6b7e740addcea450041b5be8e42f0a4ceece0f ]
The call to pm8001_ccb_task_free() at the end of pm8001_mpi_task_abort_resp() already frees the ccb tag. So when the device NCQ_ABORT_ALL_FLAG is set, the tag should not be freed again. Also change the hardcoded 0xBFFFFFFF value to ~NCQ_ABORT_ALL_FLAG as it ought to be.
Link: https://lore.kernel.org/r/20220220031810.738362-19-damien.lemoal@opensource.... Reviewed-by: Jack Wang jinpu.wang@ionos.com Signed-off-by: Damien Le Moal damien.lemoal@opensource.wdc.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/pm8001/pm8001_hwi.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/scsi/pm8001/pm8001_hwi.c b/drivers/scsi/pm8001/pm8001_hwi.c index d853e8d0195a..4f4a9dcb6a1e 100644 --- a/drivers/scsi/pm8001/pm8001_hwi.c +++ b/drivers/scsi/pm8001/pm8001_hwi.c @@ -3706,12 +3706,11 @@ int pm8001_mpi_task_abort_resp(struct pm8001_hba_info *pm8001_ha, void *piomb) mb();
if (pm8001_dev->id & NCQ_ABORT_ALL_FLAG) { - pm8001_tag_free(pm8001_ha, tag); sas_free_task(t); - /* clear the flag */ - pm8001_dev->id &= 0xBFFFFFFF; - } else + pm8001_dev->id &= ~NCQ_ABORT_ALL_FLAG; + } else { t->task_done(t); + }
return 0; }
From: Damien Le Moal damien.lemoal@opensource.wdc.com
[ Upstream commit 7fb23a785ba38dea907323a039123f231195b297 ]
The function pm8001_tag_alloc() determines free tags using the function find_first_zero_bit() which can return 0 when the first bit of the bitmap being inspected is 0. As such, tag 0 is a valid tag value that should not be dismissed as invalid. Fix the functions pm8001_work_fn(), mpi_sata_completion(), pm8001_mpi_task_abort_resp() and pm8001_open_reject_retry() to not dismiss 0 tags as invalid.
The value 0xffffffff is used for invalid tags for unused ccb information structures. Add the macro definition PM8001_INVALID_TAG to define this value.
Link: https://lore.kernel.org/r/20220220031810.738362-20-damien.lemoal@opensource.... Reviewed-by: Jack Wang jinpu.wang@ionos.com Signed-off-by: Damien Le Moal damien.lemoal@opensource.wdc.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/pm8001/pm8001_hwi.c | 52 +++++++++++-------------------- drivers/scsi/pm8001/pm8001_init.c | 3 +- drivers/scsi/pm8001/pm8001_sas.c | 13 ++++---- drivers/scsi/pm8001/pm8001_sas.h | 2 ++ drivers/scsi/pm8001/pm80xx_hwi.c | 5 --- 5 files changed, 28 insertions(+), 47 deletions(-)
diff --git a/drivers/scsi/pm8001/pm8001_hwi.c b/drivers/scsi/pm8001/pm8001_hwi.c index 4f4a9dcb6a1e..d244231e62f9 100644 --- a/drivers/scsi/pm8001/pm8001_hwi.c +++ b/drivers/scsi/pm8001/pm8001_hwi.c @@ -1522,7 +1522,6 @@ void pm8001_work_fn(struct work_struct *work) case IO_XFER_ERROR_BREAK: { /* This one stashes the sas_task instead */ struct sas_task *t = (struct sas_task *)pm8001_dev; - u32 tag; struct pm8001_ccb_info *ccb; struct pm8001_hba_info *pm8001_ha = pw->pm8001_ha; unsigned long flags, flags1; @@ -1544,8 +1543,8 @@ void pm8001_work_fn(struct work_struct *work) /* Search for a possible ccb that matches the task */ for (i = 0; ccb = NULL, i < PM8001_MAX_CCB; i++) { ccb = &pm8001_ha->ccb_info[i]; - tag = ccb->ccb_tag; - if ((tag != 0xFFFFFFFF) && (ccb->task == t)) + if ((ccb->ccb_tag != PM8001_INVALID_TAG) && + (ccb->task == t)) break; } if (!ccb) { @@ -1567,11 +1566,11 @@ void pm8001_work_fn(struct work_struct *work) spin_unlock_irqrestore(&t->task_state_lock, flags1); pm8001_dbg(pm8001_ha, FAIL, "task 0x%p done with event 0x%x resp 0x%x stat 0x%x but aborted by upper layer!\n", t, pw->handler, ts->resp, ts->stat); - pm8001_ccb_task_free(pm8001_ha, t, ccb, tag); + pm8001_ccb_task_free(pm8001_ha, t, ccb, ccb->ccb_tag); spin_unlock_irqrestore(&pm8001_ha->lock, flags); } else { spin_unlock_irqrestore(&t->task_state_lock, flags1); - pm8001_ccb_task_free(pm8001_ha, t, ccb, tag); + pm8001_ccb_task_free(pm8001_ha, t, ccb, ccb->ccb_tag); mb();/* in order to force CPU ordering */ spin_unlock_irqrestore(&pm8001_ha->lock, flags); t->task_done(t); @@ -1580,7 +1579,6 @@ void pm8001_work_fn(struct work_struct *work) case IO_XFER_OPEN_RETRY_TIMEOUT: { /* This one stashes the sas_task instead */ struct sas_task *t = (struct sas_task *)pm8001_dev; - u32 tag; struct pm8001_ccb_info *ccb; struct pm8001_hba_info *pm8001_ha = pw->pm8001_ha; unsigned long flags, flags1; @@ -1614,8 +1612,8 @@ void pm8001_work_fn(struct work_struct *work) /* Search for a possible ccb that matches the task */ for (i = 0; ccb = NULL, i < PM8001_MAX_CCB; i++) { ccb = &pm8001_ha->ccb_info[i]; - tag = ccb->ccb_tag; - if ((tag != 0xFFFFFFFF) && (ccb->task == t)) + if ((ccb->ccb_tag != PM8001_INVALID_TAG) && + (ccb->task == t)) break; } if (!ccb) { @@ -1686,19 +1684,13 @@ void pm8001_work_fn(struct work_struct *work) struct task_status_struct *ts; struct sas_task *task; int i; - u32 tag, device_id; + u32 device_id;
for (i = 0; ccb = NULL, i < PM8001_MAX_CCB; i++) { ccb = &pm8001_ha->ccb_info[i]; task = ccb->task; ts = &task->task_status; - tag = ccb->ccb_tag; - /* check if tag is NULL */ - if (!tag) { - pm8001_dbg(pm8001_ha, FAIL, - "tag Null\n"); - continue; - } + if (task != NULL) { dev = task->dev; if (!dev) { @@ -1707,10 +1699,11 @@ void pm8001_work_fn(struct work_struct *work) continue; } /*complete sas task and update to top layer */ - pm8001_ccb_task_free(pm8001_ha, task, ccb, tag); + pm8001_ccb_task_free(pm8001_ha, task, ccb, + ccb->ccb_tag); ts->resp = SAS_TASK_COMPLETE; task->task_done(task); - } else if (tag != 0xFFFFFFFF) { + } else if (ccb->ccb_tag != PM8001_INVALID_TAG) { /* complete the internal commands/non-sas task */ pm8001_dev = ccb->device; if (pm8001_dev->dcompletion) { @@ -1718,7 +1711,7 @@ void pm8001_work_fn(struct work_struct *work) pm8001_dev->dcompletion = NULL; } complete(pm8001_ha->nvmd_completion); - pm8001_tag_free(pm8001_ha, tag); + pm8001_tag_free(pm8001_ha, ccb->ccb_tag); } } /* Deregister all the device ids */ @@ -2316,11 +2309,6 @@ mpi_sata_completion(struct pm8001_hba_info *pm8001_ha, void *piomb) param = le32_to_cpu(psataPayload->param); tag = le32_to_cpu(psataPayload->tag);
- if (!tag) { - pm8001_dbg(pm8001_ha, FAIL, "tag null\n"); - return; - } - ccb = &pm8001_ha->ccb_info[tag]; t = ccb->task; pm8001_dev = ccb->device; @@ -3056,7 +3044,7 @@ void pm8001_mpi_set_dev_state_resp(struct pm8001_hba_info *pm8001_ha, device_id, pds, nds, status); complete(pm8001_dev->setds_completion); ccb->task = NULL; - ccb->ccb_tag = 0xFFFFFFFF; + ccb->ccb_tag = PM8001_INVALID_TAG; pm8001_tag_free(pm8001_ha, tag); }
@@ -3074,7 +3062,7 @@ void pm8001_mpi_set_nvmd_resp(struct pm8001_hba_info *pm8001_ha, void *piomb) dlen_status); } ccb->task = NULL; - ccb->ccb_tag = 0xFFFFFFFF; + ccb->ccb_tag = PM8001_INVALID_TAG; pm8001_tag_free(pm8001_ha, tag); }
@@ -3101,7 +3089,7 @@ pm8001_mpi_get_nvmd_resp(struct pm8001_hba_info *pm8001_ha, void *piomb) * freed by requesting path anywhere. */ ccb->task = NULL; - ccb->ccb_tag = 0xFFFFFFFF; + ccb->ccb_tag = PM8001_INVALID_TAG; pm8001_tag_free(pm8001_ha, tag); return; } @@ -3147,7 +3135,7 @@ pm8001_mpi_get_nvmd_resp(struct pm8001_hba_info *pm8001_ha, void *piomb) complete(pm8001_ha->nvmd_completion); pm8001_dbg(pm8001_ha, MSG, "Get nvmd data complete!\n"); ccb->task = NULL; - ccb->ccb_tag = 0xFFFFFFFF; + ccb->ccb_tag = PM8001_INVALID_TAG; pm8001_tag_free(pm8001_ha, tag); }
@@ -3560,7 +3548,7 @@ int pm8001_mpi_reg_resp(struct pm8001_hba_info *pm8001_ha, void *piomb) } complete(pm8001_dev->dcompletion); ccb->task = NULL; - ccb->ccb_tag = 0xFFFFFFFF; + ccb->ccb_tag = PM8001_INVALID_TAG; pm8001_tag_free(pm8001_ha, htag); return 0; } @@ -3632,7 +3620,7 @@ int pm8001_mpi_fw_flash_update_resp(struct pm8001_hba_info *pm8001_ha, } kfree(ccb->fw_control_context); ccb->task = NULL; - ccb->ccb_tag = 0xFFFFFFFF; + ccb->ccb_tag = PM8001_INVALID_TAG; pm8001_tag_free(pm8001_ha, tag); complete(pm8001_ha->nvmd_completion); return 0; @@ -3668,10 +3656,6 @@ int pm8001_mpi_task_abort_resp(struct pm8001_hba_info *pm8001_ha, void *piomb)
status = le32_to_cpu(pPayload->status); tag = le32_to_cpu(pPayload->tag); - if (!tag) { - pm8001_dbg(pm8001_ha, FAIL, " TAG NULL. RETURNING !!!\n"); - return -1; - }
scp = le32_to_cpu(pPayload->scp); ccb = &pm8001_ha->ccb_info[tag]; diff --git a/drivers/scsi/pm8001/pm8001_init.c b/drivers/scsi/pm8001/pm8001_init.c index d8a2121cb8d9..d2a2593f669d 100644 --- a/drivers/scsi/pm8001/pm8001_init.c +++ b/drivers/scsi/pm8001/pm8001_init.c @@ -1216,10 +1216,11 @@ pm8001_init_ccb_tag(struct pm8001_hba_info *pm8001_ha, struct Scsi_Host *shost, goto err_out; } pm8001_ha->ccb_info[i].task = NULL; - pm8001_ha->ccb_info[i].ccb_tag = 0xffffffff; + pm8001_ha->ccb_info[i].ccb_tag = PM8001_INVALID_TAG; pm8001_ha->ccb_info[i].device = NULL; ++pm8001_ha->tags_num; } + return 0;
err_out_noccb: diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c index 32edda3e55c6..c1f871561b32 100644 --- a/drivers/scsi/pm8001/pm8001_sas.c +++ b/drivers/scsi/pm8001/pm8001_sas.c @@ -567,7 +567,7 @@ void pm8001_ccb_task_free(struct pm8001_hba_info *pm8001_ha,
task->lldd_task = NULL; ccb->task = NULL; - ccb->ccb_tag = 0xFFFFFFFF; + ccb->ccb_tag = PM8001_INVALID_TAG; ccb->open_retry = 0; pm8001_tag_free(pm8001_ha, ccb_idx); } @@ -952,9 +952,11 @@ void pm8001_open_reject_retry( struct task_status_struct *ts; struct pm8001_device *pm8001_dev; unsigned long flags1; - u32 tag; struct pm8001_ccb_info *ccb = &pm8001_ha->ccb_info[i];
+ if (ccb->ccb_tag == PM8001_INVALID_TAG) + continue; + pm8001_dev = ccb->device; if (!pm8001_dev || (pm8001_dev->dev_type == SAS_PHY_UNUSED)) continue; @@ -966,9 +968,6 @@ void pm8001_open_reject_retry( continue; } else if (pm8001_dev != device_to_close) continue; - tag = ccb->ccb_tag; - if (!tag || (tag == 0xFFFFFFFF)) - continue; task = ccb->task; if (!task || !task->task_done) continue; @@ -989,11 +988,11 @@ void pm8001_open_reject_retry( & SAS_TASK_STATE_ABORTED))) { spin_unlock_irqrestore(&task->task_state_lock, flags1); - pm8001_ccb_task_free(pm8001_ha, task, ccb, tag); + pm8001_ccb_task_free(pm8001_ha, task, ccb, ccb->ccb_tag); } else { spin_unlock_irqrestore(&task->task_state_lock, flags1); - pm8001_ccb_task_free(pm8001_ha, task, ccb, tag); + pm8001_ccb_task_free(pm8001_ha, task, ccb, ccb->ccb_tag); mb();/* in order to force CPU ordering */ spin_unlock_irqrestore(&pm8001_ha->lock, flags); task->task_done(task); diff --git a/drivers/scsi/pm8001/pm8001_sas.h b/drivers/scsi/pm8001/pm8001_sas.h index a17da1cebce1..1791cdf30276 100644 --- a/drivers/scsi/pm8001/pm8001_sas.h +++ b/drivers/scsi/pm8001/pm8001_sas.h @@ -738,6 +738,8 @@ void pm8001_free_dev(struct pm8001_device *pm8001_dev); /* ctl shared API */ extern const struct attribute_group *pm8001_host_groups[];
+#define PM8001_INVALID_TAG ((u32)-1) + static inline void pm8001_ccb_task_free_done(struct pm8001_hba_info *pm8001_ha, struct sas_task *task, struct pm8001_ccb_info *ccb, diff --git a/drivers/scsi/pm8001/pm80xx_hwi.c b/drivers/scsi/pm8001/pm80xx_hwi.c index 9a0d65ac0174..728190b26924 100644 --- a/drivers/scsi/pm8001/pm80xx_hwi.c +++ b/drivers/scsi/pm8001/pm80xx_hwi.c @@ -2404,11 +2404,6 @@ mpi_sata_completion(struct pm8001_hba_info *pm8001_ha, param = le32_to_cpu(psataPayload->param); tag = le32_to_cpu(psataPayload->tag);
- if (!tag) { - pm8001_dbg(pm8001_ha, FAIL, "tag null\n"); - return; - } - ccb = &pm8001_ha->ccb_info[tag]; t = ccb->task; pm8001_dev = ccb->device;
From: Damien Le Moal damien.lemoal@opensource.wdc.com
[ Upstream commit f90a74892f3acf0cdec5844e90fc8686ca13e7d7 ]
In pm8001_send_abort_all(), make sure to free the allocated sas task if pm8001_tag_alloc() or pm8001_mpi_build_cmd() fail.
Link: https://lore.kernel.org/r/20220220031810.738362-21-damien.lemoal@opensource.... Reviewed-by: John Garry john.garry@huawei.com Signed-off-by: Damien Le Moal damien.lemoal@opensource.wdc.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/pm8001/pm8001_hwi.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/scsi/pm8001/pm8001_hwi.c b/drivers/scsi/pm8001/pm8001_hwi.c index d244231e62f9..5ec429cf1e20 100644 --- a/drivers/scsi/pm8001/pm8001_hwi.c +++ b/drivers/scsi/pm8001/pm8001_hwi.c @@ -1765,7 +1765,6 @@ static void pm8001_send_abort_all(struct pm8001_hba_info *pm8001_ha, }
task = sas_alloc_slow_task(GFP_ATOMIC); - if (!task) { pm8001_dbg(pm8001_ha, FAIL, "cannot allocate task\n"); return; @@ -1774,8 +1773,10 @@ static void pm8001_send_abort_all(struct pm8001_hba_info *pm8001_ha, task->task_done = pm8001_task_done;
res = pm8001_tag_alloc(pm8001_ha, &ccb_tag); - if (res) + if (res) { + sas_free_task(task); return; + }
ccb = &pm8001_ha->ccb_info[ccb_tag]; ccb->device = pm8001_ha_dev; @@ -1792,8 +1793,10 @@ static void pm8001_send_abort_all(struct pm8001_hba_info *pm8001_ha,
ret = pm8001_mpi_build_cmd(pm8001_ha, circularQ, opc, &task_abort, sizeof(task_abort), 0); - if (ret) + if (ret) { + sas_free_task(task); pm8001_tag_free(pm8001_ha, ccb_tag); + }
}
From: Damien Le Moal damien.lemoal@opensource.wdc.com
[ Upstream commit 4c8f04b1905cd4b776d0b720463c091545478ef7 ]
In pm8001_chip_set_dev_state_req(), pm8001_chip_fw_flash_update_req(), pm80xx_chip_phy_ctl_req() and pm8001_chip_reg_dev_req() add missing calls to pm8001_tag_free() to free the allocated tag when pm8001_mpi_build_cmd() fails.
Similarly, in pm8001_exec_internal_task_abort(), if the chip ->task_abort method fails, the tag allocated for the abort request task must be freed. Add the missing call to pm8001_tag_free().
Link: https://lore.kernel.org/r/20220220031810.738362-22-damien.lemoal@opensource.... Reviewed-by: John Garry john.garry@huawei.com Signed-off-by: Damien Le Moal damien.lemoal@opensource.wdc.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/pm8001/pm8001_hwi.c | 9 +++++++++ drivers/scsi/pm8001/pm8001_sas.c | 2 +- drivers/scsi/pm8001/pm80xx_hwi.c | 9 +++++++-- 3 files changed, 17 insertions(+), 3 deletions(-)
diff --git a/drivers/scsi/pm8001/pm8001_hwi.c b/drivers/scsi/pm8001/pm8001_hwi.c index 5ec429cf1e20..ccc7f53ddbd6 100644 --- a/drivers/scsi/pm8001/pm8001_hwi.c +++ b/drivers/scsi/pm8001/pm8001_hwi.c @@ -4465,6 +4465,9 @@ static int pm8001_chip_reg_dev_req(struct pm8001_hba_info *pm8001_ha, SAS_ADDR_SIZE); rc = pm8001_mpi_build_cmd(pm8001_ha, circularQ, opc, &payload, sizeof(payload), 0); + if (rc) + pm8001_tag_free(pm8001_ha, tag); + return rc; }
@@ -4877,6 +4880,9 @@ pm8001_chip_fw_flash_update_req(struct pm8001_hba_info *pm8001_ha, ccb->ccb_tag = tag; rc = pm8001_chip_fw_flash_update_build(pm8001_ha, &flash_update_info, tag); + if (rc) + pm8001_tag_free(pm8001_ha, tag); + return rc; }
@@ -4981,6 +4987,9 @@ pm8001_chip_set_dev_state_req(struct pm8001_hba_info *pm8001_ha, payload.nds = cpu_to_le32(state); rc = pm8001_mpi_build_cmd(pm8001_ha, circularQ, opc, &payload, sizeof(payload), 0); + if (rc) + pm8001_tag_free(pm8001_ha, tag); + return rc;
} diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c index c1f871561b32..b68c8400ca15 100644 --- a/drivers/scsi/pm8001/pm8001_sas.c +++ b/drivers/scsi/pm8001/pm8001_sas.c @@ -847,10 +847,10 @@ pm8001_exec_internal_task_abort(struct pm8001_hba_info *pm8001_ha,
res = PM8001_CHIP_DISP->task_abort(pm8001_ha, pm8001_dev, flag, task_tag, ccb_tag); - if (res) { del_timer(&task->slow_task->timer); pm8001_dbg(pm8001_ha, FAIL, "Executing internal task failed\n"); + pm8001_tag_free(pm8001_ha, ccb_tag); goto ex_err; } wait_for_completion(&task->slow_task->completion); diff --git a/drivers/scsi/pm8001/pm80xx_hwi.c b/drivers/scsi/pm8001/pm80xx_hwi.c index 728190b26924..55163469030d 100644 --- a/drivers/scsi/pm8001/pm80xx_hwi.c +++ b/drivers/scsi/pm8001/pm80xx_hwi.c @@ -4920,8 +4920,13 @@ static int pm80xx_chip_phy_ctl_req(struct pm8001_hba_info *pm8001_ha, payload.tag = cpu_to_le32(tag); payload.phyop_phyid = cpu_to_le32(((phy_op & 0xFF) << 8) | (phyId & 0xFF)); - return pm8001_mpi_build_cmd(pm8001_ha, circularQ, opc, &payload, - sizeof(payload), 0); + + rc = pm8001_mpi_build_cmd(pm8001_ha, circularQ, opc, &payload, + sizeof(payload), 0); + if (rc) + pm8001_tag_free(pm8001_ha, tag); + + return rc; }
static u32 pm80xx_chip_is_our_interrupt(struct pm8001_hba_info *pm8001_ha)
From: Damien Le Moal damien.lemoal@opensource.wdc.com
[ Upstream commit f792a3629f4c4aa4c3703d66b43ce1edcc3ec09a ]
In pm8001_chip_fw_flash_update_build(), if pm8001_chip_fw_flash_update_build() fails, the struct fw_control_ex allocated must be freed.
Link: https://lore.kernel.org/r/20220220031810.738362-23-damien.lemoal@opensource.... Reviewed-by: Jack Wang jinpu.wang@ionos.com Signed-off-by: Damien Le Moal damien.lemoal@opensource.wdc.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/pm8001/pm8001_hwi.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/pm8001/pm8001_hwi.c b/drivers/scsi/pm8001/pm8001_hwi.c index ccc7f53ddbd6..27ead825c2bb 100644 --- a/drivers/scsi/pm8001/pm8001_hwi.c +++ b/drivers/scsi/pm8001/pm8001_hwi.c @@ -4880,8 +4880,10 @@ pm8001_chip_fw_flash_update_req(struct pm8001_hba_info *pm8001_ha, ccb->ccb_tag = tag; rc = pm8001_chip_fw_flash_update_build(pm8001_ha, &flash_update_info, tag); - if (rc) + if (rc) { + kfree(fw_control_context); pm8001_tag_free(pm8001_ha, tag); + }
return rc; }
From: Johan Almbladh johan.almbladh@anyfinetworks.com
[ Upstream commit 28225a6ef80ebf46c46e5fbd5b1ee231a0b2b5b7 ]
Before, the hardware would be allowed to transmit injected 802.11 MPDUs as A-MSDU. This resulted in corrupted frames being transmitted. Now, injected MPDUs are transmitted as-is, without A-MSDU.
The fix was verified with frame injection on MT7915 hardware, both with and without the injected frame being encrypted.
If the hardware cannot do A-MSDU aggregation on MPDUs, this problem would also be present in the TX path where mac80211 does the 802.11 encapsulation. However, I have not observed any such problem when disabling IEEE80211_HW_SUPPORTS_TX_ENCAP_OFFLOAD to force that mode. Therefore this fix is isolated to injected frames only.
The same A-MSDU logic is also present in the mt7921 driver, so it is likely that this fix should be applied there too. I do not have access to mt7921 hardware so I have not been able to test that.
Signed-off-by: Johan Almbladh johan.almbladh@anyfinetworks.com Signed-off-by: Felix Fietkau nbd@nbd.name Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/mediatek/mt76/mt7915/mac.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/wireless/mediatek/mt76/mt7915/mac.c b/drivers/net/wireless/mediatek/mt76/mt7915/mac.c index db267642924d..e4c300aa1526 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7915/mac.c +++ b/drivers/net/wireless/mediatek/mt76/mt7915/mac.c @@ -1085,6 +1085,7 @@ mt7915_mac_write_txwi_80211(struct mt7915_dev *dev, __le32 *txwi, val = MT_TXD3_SN_VALID | FIELD_PREP(MT_TXD3_SEQ, IEEE80211_SEQ_TO_SN(seqno)); txwi[3] |= cpu_to_le32(val); + txwi[7] &= ~cpu_to_le32(MT_TXD7_HW_AMSDU); }
val = FIELD_PREP(MT_TXD7_TYPE, fc_type) |
From: Matt Johnston matt@codeconstruct.com.au
[ Upstream commit dc121c0084910db985cf1c8ba6fce5d8c307cc02 ]
Previously there was a race that could allow the mctp_dev refcount to hit zero:
rcu_read_lock(); mdev = __mctp_dev_get(dev); // mctp_unregister() happens here, mdev->refs hits zero mctp_dev_hold(dev); rcu_read_unlock();
Now we make __mctp_dev_get() take the hold itself. It is safe to test against the zero refcount because __mctp_dev_get() is called holding rcu_read_lock and mctp_dev uses kfree_rcu().
Reported-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Matt Johnston matt@codeconstruct.com.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/mctp/device.c | 21 ++++++++++++++++++--- net/mctp/route.c | 5 ++++- net/mctp/test/utils.c | 1 - 3 files changed, 22 insertions(+), 5 deletions(-)
diff --git a/net/mctp/device.c b/net/mctp/device.c index ef2755f82f87..f86ef6d751bd 100644 --- a/net/mctp/device.c +++ b/net/mctp/device.c @@ -24,12 +24,25 @@ struct mctp_dump_cb { size_t a_idx; };
-/* unlocked: caller must hold rcu_read_lock */ +/* unlocked: caller must hold rcu_read_lock. + * Returned mctp_dev has its refcount incremented, or NULL if unset. + */ struct mctp_dev *__mctp_dev_get(const struct net_device *dev) { - return rcu_dereference(dev->mctp_ptr); + struct mctp_dev *mdev = rcu_dereference(dev->mctp_ptr); + + /* RCU guarantees that any mdev is still live. + * Zero refcount implies a pending free, return NULL. + */ + if (mdev) + if (!refcount_inc_not_zero(&mdev->refs)) + return NULL; + return mdev; }
+/* Returned mctp_dev does not have refcount incremented. The returned pointer + * remains live while rtnl_lock is held, as that prevents mctp_unregister() + */ struct mctp_dev *mctp_dev_get_rtnl(const struct net_device *dev) { return rtnl_dereference(dev->mctp_ptr); @@ -123,6 +136,7 @@ static int mctp_dump_addrinfo(struct sk_buff *skb, struct netlink_callback *cb) if (mdev) { rc = mctp_dump_dev_addrinfo(mdev, skb, cb); + mctp_dev_put(mdev); // Error indicates full buffer, this // callback will get retried. if (rc < 0) @@ -297,7 +311,7 @@ void mctp_dev_hold(struct mctp_dev *mdev)
void mctp_dev_put(struct mctp_dev *mdev) { - if (refcount_dec_and_test(&mdev->refs)) { + if (mdev && refcount_dec_and_test(&mdev->refs)) { dev_put(mdev->dev); kfree_rcu(mdev, rcu); } @@ -369,6 +383,7 @@ static size_t mctp_get_link_af_size(const struct net_device *dev, if (!mdev) return 0; ret = nla_total_size(4); /* IFLA_MCTP_NET */ + mctp_dev_put(mdev); return ret; }
diff --git a/net/mctp/route.c b/net/mctp/route.c index e52cef750500..05fbd318eb98 100644 --- a/net/mctp/route.c +++ b/net/mctp/route.c @@ -786,7 +786,7 @@ int mctp_local_output(struct sock *sk, struct mctp_route *rt, { struct mctp_sock *msk = container_of(sk, struct mctp_sock, sk); struct mctp_skb_cb *cb = mctp_cb(skb); - struct mctp_route tmp_rt; + struct mctp_route tmp_rt = {0}; struct mctp_sk_key *key; struct net_device *dev; struct mctp_hdr *hdr; @@ -892,6 +892,7 @@ int mctp_local_output(struct sock *sk, struct mctp_route *rt, mctp_route_release(rt);
dev_put(dev); + mctp_dev_put(tmp_rt.dev);
return rc;
@@ -1057,11 +1058,13 @@ static int mctp_pkttype_receive(struct sk_buff *skb, struct net_device *dev,
rt->output(rt, skb); mctp_route_release(rt); + mctp_dev_put(mdev);
return NET_RX_SUCCESS;
err_drop: kfree_skb(skb); + mctp_dev_put(mdev); return NET_RX_DROP; }
diff --git a/net/mctp/test/utils.c b/net/mctp/test/utils.c index 7b7918702592..e03ba66bbe18 100644 --- a/net/mctp/test/utils.c +++ b/net/mctp/test/utils.c @@ -54,7 +54,6 @@ struct mctp_test_dev *mctp_test_create_dev(void)
rcu_read_lock(); dev->mdev = __mctp_dev_get(ndev); - mctp_dev_hold(dev->mdev); rcu_read_unlock();
return dev;
From: Nicholas Piggin npiggin@gmail.com
[ Upstream commit 8b91cee5eadd2021f55e6775f2d50bd56d00c217 ]
Hash faults are not resoved in NMI context, instead causing the access to fail. This is done because perf interrupts can get backtraces including walking the user stack, and taking a hash fault on those could deadlock on the HPTE lock if the perf interrupt hits while the same HPTE lock is being held by the hash fault code. The user-access for the stack walking will notice the access failed and deal with that in the perf code.
The reason to allow perf interrupts in is to better profile hash faults.
The problem with this is any hash fault on a kernel access that happens in NMI context will crash, because kernel accesses must not fail.
Hard lockups, system reset, machine checks that access vmalloc space including modules and including stack backtracing and symbol lookup in modules, per-cpu data, etc could all run into this problem.
Fix this by disallowing perf interrupts in the hash fault code (the direct hash fault is covered by MSR[EE]=0 so the PMI disable just needs to extend to the preload case). This simplifies the tricky logic in hash faults and perf, at the cost of reduced profiling of hash faults.
perf can still latch addresses when interrupts are disabled, it just won't get the stack trace at that point, so it would still find hot spots, just sometimes with confusing stack chains.
An alternative could be to allow perf interrupts here but always do the slowpath stack walk if we are in nmi context, but that slows down all perf interrupt stack walking on hash though and it does not remove as much tricky code.
Reported-by: Laurent Dufour ldufour@linux.ibm.com Signed-off-by: Nicholas Piggin npiggin@gmail.com Tested-by: Laurent Dufour ldufour@linux.ibm.com Reviewed-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20220204035348.545435-1-npiggin@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/include/asm/interrupt.h | 2 +- arch/powerpc/mm/book3s64/hash_utils.c | 54 ++++----------------------- arch/powerpc/perf/callchain.h | 9 +---- arch/powerpc/perf/callchain_64.c | 27 -------------- 4 files changed, 10 insertions(+), 82 deletions(-)
diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h index fc28f46d2f9d..5404f7abbcf8 100644 --- a/arch/powerpc/include/asm/interrupt.h +++ b/arch/powerpc/include/asm/interrupt.h @@ -612,7 +612,7 @@ DECLARE_INTERRUPT_HANDLER_RAW(do_slb_fault); DECLARE_INTERRUPT_HANDLER(do_bad_segment_interrupt);
/* hash_utils.c */ -DECLARE_INTERRUPT_HANDLER_RAW(do_hash_fault); +DECLARE_INTERRUPT_HANDLER(do_hash_fault);
/* fault.c */ DECLARE_INTERRUPT_HANDLER(do_page_fault); diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c index 7abf82a698d3..985cabdd7f67 100644 --- a/arch/powerpc/mm/book3s64/hash_utils.c +++ b/arch/powerpc/mm/book3s64/hash_utils.c @@ -1621,8 +1621,7 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap, } EXPORT_SYMBOL_GPL(hash_page);
-DECLARE_INTERRUPT_HANDLER(__do_hash_fault); -DEFINE_INTERRUPT_HANDLER(__do_hash_fault) +DEFINE_INTERRUPT_HANDLER(do_hash_fault) { unsigned long ea = regs->dar; unsigned long dsisr = regs->dsisr; @@ -1681,35 +1680,6 @@ DEFINE_INTERRUPT_HANDLER(__do_hash_fault) } }
-/* - * The _RAW interrupt entry checks for the in_nmi() case before - * running the full handler. - */ -DEFINE_INTERRUPT_HANDLER_RAW(do_hash_fault) -{ - /* - * If we are in an "NMI" (e.g., an interrupt when soft-disabled), then - * don't call hash_page, just fail the fault. This is required to - * prevent re-entrancy problems in the hash code, namely perf - * interrupts hitting while something holds H_PAGE_BUSY, and taking a - * hash fault. See the comment in hash_preload(). - * - * We come here as a result of a DSI at a point where we don't want - * to call hash_page, such as when we are accessing memory (possibly - * user memory) inside a PMU interrupt that occurred while interrupts - * were soft-disabled. We want to invoke the exception handler for - * the access, or panic if there isn't a handler. - */ - if (unlikely(in_nmi())) { - do_bad_page_fault_segv(regs); - return 0; - } - - __do_hash_fault(regs); - - return 0; -} - #ifdef CONFIG_PPC_MM_SLICES static bool should_hash_preload(struct mm_struct *mm, unsigned long ea) { @@ -1776,26 +1746,18 @@ static void hash_preload(struct mm_struct *mm, pte_t *ptep, unsigned long ea, #endif /* CONFIG_PPC_64K_PAGES */
/* - * __hash_page_* must run with interrupts off, as it sets the - * H_PAGE_BUSY bit. It's possible for perf interrupts to hit at any - * time and may take a hash fault reading the user stack, see - * read_user_stack_slow() in the powerpc/perf code. - * - * If that takes a hash fault on the same page as we lock here, it - * will bail out when seeing H_PAGE_BUSY set, and retry the access - * leading to an infinite loop. + * __hash_page_* must run with interrupts off, including PMI interrupts + * off, as it sets the H_PAGE_BUSY bit. * - * Disabling interrupts here does not prevent perf interrupts, but it - * will prevent them taking hash faults (see the NMI test in - * do_hash_page), then read_user_stack's copy_from_user_nofault will - * fail and perf will fall back to read_user_stack_slow(), which - * walks the Linux page tables. + * It's otherwise possible for perf interrupts to hit at any time and + * may take a hash fault reading the user stack, which could take a + * hash miss and deadlock on the same H_PAGE_BUSY bit. * * Interrupts must also be off for the duration of the * mm_is_thread_local test and update, to prevent preempt running the * mm on another CPU (XXX: this may be racy vs kthread_use_mm). */ - local_irq_save(flags); + powerpc_local_irq_pmu_save(flags);
/* Is that local to this CPU ? */ if (mm_is_thread_local(mm)) @@ -1820,7 +1782,7 @@ static void hash_preload(struct mm_struct *mm, pte_t *ptep, unsigned long ea, mm_ctx_user_psize(&mm->context), pte_val(*ptep));
- local_irq_restore(flags); + powerpc_local_irq_pmu_restore(flags); }
/* diff --git a/arch/powerpc/perf/callchain.h b/arch/powerpc/perf/callchain.h index d6fa6e25234f..19a8d051ddf1 100644 --- a/arch/powerpc/perf/callchain.h +++ b/arch/powerpc/perf/callchain.h @@ -2,7 +2,6 @@ #ifndef _POWERPC_PERF_CALLCHAIN_H #define _POWERPC_PERF_CALLCHAIN_H
-int read_user_stack_slow(const void __user *ptr, void *buf, int nb); void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs); void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry, @@ -26,17 +25,11 @@ static inline int __read_user_stack(const void __user *ptr, void *ret, size_t size) { unsigned long addr = (unsigned long)ptr; - int rc;
if (addr > TASK_SIZE - size || (addr & (size - 1))) return -EFAULT;
- rc = copy_from_user_nofault(ret, ptr, size); - - if (IS_ENABLED(CONFIG_PPC64) && !radix_enabled() && rc) - return read_user_stack_slow(ptr, ret, size); - - return rc; + return copy_from_user_nofault(ret, ptr, size); }
#endif /* _POWERPC_PERF_CALLCHAIN_H */ diff --git a/arch/powerpc/perf/callchain_64.c b/arch/powerpc/perf/callchain_64.c index 8d0df4226328..488e8a21a11e 100644 --- a/arch/powerpc/perf/callchain_64.c +++ b/arch/powerpc/perf/callchain_64.c @@ -18,33 +18,6 @@
#include "callchain.h"
-/* - * On 64-bit we don't want to invoke hash_page on user addresses from - * interrupt context, so if the access faults, we read the page tables - * to find which page (if any) is mapped and access it directly. Radix - * has no need for this so it doesn't use read_user_stack_slow. - */ -int read_user_stack_slow(const void __user *ptr, void *buf, int nb) -{ - - unsigned long addr = (unsigned long) ptr; - unsigned long offset; - struct page *page; - void *kaddr; - - if (get_user_page_fast_only(addr, FOLL_WRITE, &page)) { - kaddr = page_address(page); - - /* align address to page boundary */ - offset = addr & ~PAGE_MASK; - - memcpy(buf, kaddr + offset, nb); - put_page(page); - return 0; - } - return -EFAULT; -} - static int read_user_stack_64(const unsigned long __user *ptr, unsigned long *ret) { return __read_user_stack(ptr, ret, sizeof(*ret));
From: Yang Li yang.lee@linux.alibaba.com
[ Upstream commit 9273ffcc9a11942bd586bb42584337ef3962b692 ]
Smatch reports the following: drivers/net/wireless/mediatek/mt76/mt7615/mac.c:1865 mt7615_mac_adjust_sensitivity() warn: assigning (-110) to unsigned variable 'def_th' drivers/net/wireless/mediatek/mt76/mt7615/mac.c:1865 mt7615_mac_adjust_sensitivity() warn: assigning (-98) to unsigned variable 'def_th'
Reported-by: Abaci Robot abaci@linux.alibaba.com Signed-off-by: Yang Li yang.lee@linux.alibaba.com Signed-off-by: Felix Fietkau nbd@nbd.name Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/mediatek/mt76/mt7615/mac.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/mediatek/mt76/mt7615/mac.c b/drivers/net/wireless/mediatek/mt76/mt7615/mac.c index ba31bb7caaf9..5d69e77814c9 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7615/mac.c +++ b/drivers/net/wireless/mediatek/mt76/mt7615/mac.c @@ -1841,7 +1841,7 @@ mt7615_mac_adjust_sensitivity(struct mt7615_phy *phy, struct mt7615_dev *dev = phy->dev; int false_cca = ofdm ? phy->false_cca_ofdm : phy->false_cca_cck; bool ext_phy = phy != &dev->phy; - u16 def_th = ofdm ? -98 : -110; + s16 def_th = ofdm ? -98 : -110; bool update = false; s8 *sensitivity; int signal;
From: Hans de Goede hdegoede@redhat.com
[ Upstream commit 00d0566614b7bb7b226cb5a6895b0180ffe6915a ]
Normally the native AXP288 fg/charger drivers are preferred but one some devices the ACPI drivers should be used instead.
The ACPI battery/ac drivers use the acpi_quirk_skip_acpi_ac_and_battery() helper to determine if they should skip loading because native fuel-gauge/ charger drivers like the AXP288 drivers will be used.
The new acpi_quirk_skip_acpi_ac_and_battery() helper includes a list of exceptions for boards where the ACPI drivers should be used instead.
Use this new helper to avoid loading on such boards. Note this requires adding a Kconfig dependency on ACPI, this is not a problem because ACPI should be enabled on all boards with an AXP288 PMIC anyways.
Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Sebastian Reichel sebastian.reichel@collabora.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/power/supply/Kconfig | 2 +- drivers/power/supply/axp288_charger.c | 7 +++++++ 2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/power/supply/Kconfig b/drivers/power/supply/Kconfig index b366e2fd8e97..d7534f12e9ef 100644 --- a/drivers/power/supply/Kconfig +++ b/drivers/power/supply/Kconfig @@ -351,7 +351,7 @@ config AXP20X_POWER
config AXP288_CHARGER tristate "X-Powers AXP288 Charger" - depends on MFD_AXP20X && EXTCON_AXP288 && IOSF_MBI + depends on MFD_AXP20X && EXTCON_AXP288 && IOSF_MBI && ACPI help Say yes here to have support X-Power AXP288 power management IC (PMIC) integrated charger. diff --git a/drivers/power/supply/axp288_charger.c b/drivers/power/supply/axp288_charger.c index c498e62ab4e2..19746e658a6a 100644 --- a/drivers/power/supply/axp288_charger.c +++ b/drivers/power/supply/axp288_charger.c @@ -838,6 +838,13 @@ static int axp288_charger_probe(struct platform_device *pdev) struct power_supply_config charger_cfg = {}; unsigned int val;
+ /* + * Normally the native AXP288 fg/charger drivers are preferred but + * on some devices the ACPI drivers should be used instead. + */ + if (!acpi_quirk_skip_acpi_ac_and_battery()) + return -ENODEV; + /* * On some devices the fuelgauge and charger parts of the axp288 are * not used, check that the fuelgauge is enabled (CC_CTRL != 0).
From: Hans de Goede hdegoede@redhat.com
[ Upstream commit da365db704d290fb4dc4cdbd41f60b0ecec1cc03 ]
Normally the native AXP288 fg/charger drivers are preferred but one some devices the ACPI drivers should be used instead.
The ACPI battery/ac drivers use the acpi_quirk_skip_acpi_ac_and_battery() helper to determine if they should skip loading because native fuel-gauge/ charger drivers like the AXP288 drivers will be used.
The new acpi_quirk_skip_acpi_ac_and_battery() helper includes a list of exceptions for boards where the ACPI drivers should be used instead.
Use this new helper to avoid loading on such boards. Note this requires adding a Kconfig dependency on ACPI, this is not a problem because ACPI should be enabled on all boards with an AXP288 PMIC anyways.
Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Sebastian Reichel sebastian.reichel@collabora.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/power/supply/Kconfig | 2 +- drivers/power/supply/axp288_fuel_gauge.c | 14 ++++++++------ 2 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/drivers/power/supply/Kconfig b/drivers/power/supply/Kconfig index d7534f12e9ef..5e4a69352811 100644 --- a/drivers/power/supply/Kconfig +++ b/drivers/power/supply/Kconfig @@ -358,7 +358,7 @@ config AXP288_CHARGER
config AXP288_FUEL_GAUGE tristate "X-Powers AXP288 Fuel Gauge" - depends on MFD_AXP20X && IIO && IOSF_MBI + depends on MFD_AXP20X && IIO && IOSF_MBI && ACPI help Say yes here to have support for X-Power power management IC (PMIC) Fuel Gauge. The device provides battery statistics and status diff --git a/drivers/power/supply/axp288_fuel_gauge.c b/drivers/power/supply/axp288_fuel_gauge.c index c1da217fdb0e..ce8ffd0a41b5 100644 --- a/drivers/power/supply/axp288_fuel_gauge.c +++ b/drivers/power/supply/axp288_fuel_gauge.c @@ -9,6 +9,7 @@ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */
+#include <linux/acpi.h> #include <linux/dmi.h> #include <linux/module.h> #include <linux/kernel.h> @@ -560,12 +561,6 @@ static const struct dmi_system_id axp288_no_battery_list[] = { DMI_EXACT_MATCH(DMI_BIOS_VERSION, "1.000"), }, }, - { - /* ECS EF20EA */ - .matches = { - DMI_MATCH(DMI_PRODUCT_NAME, "EF20EA"), - }, - }, { /* Intel Cherry Trail Compute Stick, Windows version */ .matches = { @@ -624,6 +619,13 @@ static int axp288_fuel_gauge_probe(struct platform_device *pdev) }; unsigned int val;
+ /* + * Normally the native AXP288 fg/charger drivers are preferred but + * on some devices the ACPI drivers should be used instead. + */ + if (!acpi_quirk_skip_acpi_ac_and_battery()) + return -ENODEV; + if (dmi_check_system(axp288_no_battery_list)) return -ENODEV;
From: Randy Dunlap rdunlap@infradead.org
[ Upstream commit cc8294ec4738d25e2bb2d71f7d82a9bf7f4a157b ]
__setup() handlers should return 1 if the command line option is handled and 0 if not (or maybe never return 0; doing so just pollutes init's environment with strings that are not init arguments/parameters).
Return 1 from aha152x_setup() to indicate that the boot option has been handled.
Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru Link: https://lore.kernel.org/r/20220223000623.5920-1-rdunlap@infradead.org Cc: "Juergen E. Fischer" fischer@norbit.de Cc: "James E.J. Bottomley" jejb@linux.ibm.com Cc: "Martin K. Petersen" martin.petersen@oracle.com Reported-by: Igor Zhbanov i.zhbanov@omprussia.ru Signed-off-by: Randy Dunlap rdunlap@infradead.org Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/aha152x.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/scsi/aha152x.c b/drivers/scsi/aha152x.c index d17880b57d17..2449b4215b32 100644 --- a/drivers/scsi/aha152x.c +++ b/drivers/scsi/aha152x.c @@ -3375,13 +3375,11 @@ static int __init aha152x_setup(char *str) setup[setup_count].synchronous = ints[0] >= 6 ? ints[6] : 1; setup[setup_count].delay = ints[0] >= 7 ? ints[7] : DELAY_DEFAULT; setup[setup_count].ext_trans = ints[0] >= 8 ? ints[8] : 0; - if (ints[0] > 8) { /*}*/ + if (ints[0] > 8) printk(KERN_NOTICE "aha152x: usage: aha152x=<IOBASE>[,<IRQ>[,<SCSI ID>" "[,<RECONNECT>[,<PARITY>[,<SYNCHRONOUS>[,<DELAY>[,<EXT_TRANS>]]]]]]]\n"); - } else { + else setup_count++; - return 0; - }
return 1; }
From: Qi Liu liuqi115@huawei.com
[ Upstream commit 554fb72ee34f4732c7f694f56c3c6e67790352a0 ]
If the driver probe fails to request the channel IRQ or fatal IRQ, the driver will free the IRQ vectors before freeing the IRQs in free_irq(), and this will cause a kernel BUG like this:
------------[ cut here ]------------ kernel BUG at drivers/pci/msi.c:369! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP Call trace: free_msi_irqs+0x118/0x13c pci_disable_msi+0xfc/0x120 pci_free_irq_vectors+0x24/0x3c hisi_sas_v3_probe+0x360/0x9d0 [hisi_sas_v3_hw] local_pci_probe+0x44/0xb0 work_for_cpu_fn+0x20/0x34 process_one_work+0x1d0/0x340 worker_thread+0x2e0/0x460 kthread+0x180/0x190 ret_from_fork+0x10/0x20 ---[ end trace b88990335b610c11 ]---
So we use devm_add_action() to control the order in which we free the vectors.
Link: https://lore.kernel.org/r/1645703489-87194-4-git-send-email-john.garry@huawe... Signed-off-by: Qi Liu liuqi115@huawei.com Signed-off-by: John Garry john.garry@huawei.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c index 70173389f6eb..104ef772b512 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c @@ -2398,17 +2398,25 @@ static irqreturn_t cq_interrupt_v3_hw(int irq_no, void *p) return IRQ_WAKE_THREAD; }
+static void hisi_sas_v3_free_vectors(void *data) +{ + struct pci_dev *pdev = data; + + pci_free_irq_vectors(pdev); +} + static int interrupt_preinit_v3_hw(struct hisi_hba *hisi_hba) { int vectors; int max_msi = HISI_SAS_MSI_COUNT_V3_HW, min_msi; struct Scsi_Host *shost = hisi_hba->shost; + struct pci_dev *pdev = hisi_hba->pci_dev; struct irq_affinity desc = { .pre_vectors = BASE_VECTORS_V3_HW, };
min_msi = MIN_AFFINE_VECTORS_V3_HW; - vectors = pci_alloc_irq_vectors_affinity(hisi_hba->pci_dev, + vectors = pci_alloc_irq_vectors_affinity(pdev, min_msi, max_msi, PCI_IRQ_MSI | PCI_IRQ_AFFINITY, @@ -2420,6 +2428,7 @@ static int interrupt_preinit_v3_hw(struct hisi_hba *hisi_hba) hisi_hba->cq_nvecs = vectors - BASE_VECTORS_V3_HW; shost->nr_hw_queues = hisi_hba->cq_nvecs;
+ devm_add_action(&pdev->dev, hisi_sas_v3_free_vectors, pdev); return 0; }
@@ -4769,7 +4778,7 @@ hisi_sas_v3_probe(struct pci_dev *pdev, const struct pci_device_id *id)
rc = scsi_add_host(shost, dev); if (rc) - goto err_out_free_irq_vectors; + goto err_out_debugfs;
rc = sas_register_ha(sha); if (rc) @@ -4800,8 +4809,6 @@ hisi_sas_v3_probe(struct pci_dev *pdev, const struct pci_device_id *id) sas_unregister_ha(sha); err_out_register_ha: scsi_remove_host(shost); -err_out_free_irq_vectors: - pci_free_irq_vectors(pdev); err_out_debugfs: debugfs_exit_v3_hw(hisi_hba); err_out_ha: @@ -4825,7 +4832,6 @@ hisi_sas_v3_destroy_irqs(struct pci_dev *pdev, struct hisi_hba *hisi_hba)
devm_free_irq(&pdev->dev, pci_irq_vector(pdev, nr), cq); } - pci_free_irq_vectors(pdev); }
static void hisi_sas_v3_remove(struct pci_dev *pdev)
From: Xiang Chen chenxiang66@hisilicon.com
[ Upstream commit 286ce4c65fbdf5eb9d4d5f4e4997c4e32bf1b073 ]
Add a file operation for "cnt" file under bist directory, so users can only read "cnt" or clear "cnt" to zero, but cannot randomly modify.
Link: https://lore.kernel.org/r/1645703489-87194-6-git-send-email-john.garry@huawe... Signed-off-by: Xiang Chen chenxiang66@hisilicon.com Signed-off-by: Qi Liu liuqi115@huawei.com Signed-off-by: John Garry john.garry@huawei.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 52 +++++++++++++++++++++++++- 1 file changed, 50 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c index 104ef772b512..52089538e9de 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c @@ -3976,6 +3976,54 @@ static const struct file_operations debugfs_bist_phy_v3_hw_fops = { .owner = THIS_MODULE, };
+static ssize_t debugfs_bist_cnt_v3_hw_write(struct file *filp, + const char __user *buf, + size_t count, loff_t *ppos) +{ + struct seq_file *m = filp->private_data; + struct hisi_hba *hisi_hba = m->private; + unsigned int cnt; + int val; + + if (hisi_hba->debugfs_bist_enable) + return -EPERM; + + val = kstrtouint_from_user(buf, count, 0, &cnt); + if (val) + return val; + + if (cnt) + return -EINVAL; + + hisi_hba->debugfs_bist_cnt = 0; + return count; +} + +static int debugfs_bist_cnt_v3_hw_show(struct seq_file *s, void *p) +{ + struct hisi_hba *hisi_hba = s->private; + + seq_printf(s, "%u\n", hisi_hba->debugfs_bist_cnt); + + return 0; +} + +static int debugfs_bist_cnt_v3_hw_open(struct inode *inode, + struct file *filp) +{ + return single_open(filp, debugfs_bist_cnt_v3_hw_show, + inode->i_private); +} + +static const struct file_operations debugfs_bist_cnt_v3_hw_ops = { + .open = debugfs_bist_cnt_v3_hw_open, + .read = seq_read, + .write = debugfs_bist_cnt_v3_hw_write, + .llseek = seq_lseek, + .release = single_release, + .owner = THIS_MODULE, +}; + static const struct { int value; char *name; @@ -4613,8 +4661,8 @@ static void debugfs_bist_init_v3_hw(struct hisi_hba *hisi_hba) debugfs_create_file("phy_id", 0600, hisi_hba->debugfs_bist_dentry, hisi_hba, &debugfs_bist_phy_v3_hw_fops);
- debugfs_create_u32("cnt", 0600, hisi_hba->debugfs_bist_dentry, - &hisi_hba->debugfs_bist_cnt); + debugfs_create_file("cnt", 0600, hisi_hba->debugfs_bist_dentry, + hisi_hba, &debugfs_bist_cnt_v3_hw_ops);
debugfs_create_file("loopback_mode", 0600, hisi_hba->debugfs_bist_dentry,
From: Dust Li dust.li@linux.alibaba.com
[ Upstream commit 6bf536eb5c8ca011d1ff57b5c5f7c57ceac06a37 ]
rmbe_update_limit is used to limit announcing receive window updating too frequently. RFC7609 request a minimal increase in the window size of 10% of the receive buffer space. But current implementation used:
min_t(int, rmbe_size / 10, SOCK_MIN_SNDBUF / 2)
and SOCK_MIN_SNDBUF / 2 == 2304 Bytes, which is almost always less then 10% of the receive buffer space.
This causes the receiver always sending CDC message to update its consumer cursor when it consumes more then 2K of data. And as a result, we may encounter something like "TCP silly window syndrome" when sending 2.5~8K message.
This patch fixes this using max(rmbe_size / 10, SOCK_MIN_SNDBUF / 2).
With this patch and SMC autocorking enabled, qperf 2K/4K/8K tcp_bw test shows 45%/75%/40% increase in throughput respectively.
Signed-off-by: Dust Li dust.li@linux.alibaba.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/smc/smc_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c index be7d704976ff..f40f6ed0fbdb 100644 --- a/net/smc/smc_core.c +++ b/net/smc/smc_core.c @@ -1989,7 +1989,7 @@ static struct smc_buf_desc *smc_buf_get_slot(int compressed_bufsize, */ static inline int smc_rmb_wnd_update_limit(int rmbe_size) { - return min_t(int, rmbe_size / 10, SOCK_MIN_SNDBUF / 2); + return max_t(int, rmbe_size / 10, SOCK_MIN_SNDBUF / 2); }
/* map an rmb buf to a link */
From: Hangyu Hua hbh25y@gmail.com
[ Upstream commit 4a0a1436053b17e50b7c88858fb0824326641793 ]
of_node_put(np) needs to be called when pdev == NULL.
Signed-off-by: Hangyu Hua hbh25y@gmail.com Signed-off-by: Thomas Bogendoerfer tsbogend@alpha.franken.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/mips/ralink/ill_acc.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/mips/ralink/ill_acc.c b/arch/mips/ralink/ill_acc.c index 115a69fc20ca..f395ae218470 100644 --- a/arch/mips/ralink/ill_acc.c +++ b/arch/mips/ralink/ill_acc.c @@ -61,6 +61,7 @@ static int __init ill_acc_of_setup(void) pdev = of_find_device_by_node(np); if (!pdev) { pr_err("%pOFn: failed to lookup pdev\n", np); + of_node_put(np); return -EINVAL; }
From: Mateusz Palczewski mateusz.palczewski@intel.com
[ Upstream commit bae569d01a1f4929ce28093be80bbbbacbf1b127 ]
Several functions in the iAVF core files take status values of the enum iavf_status and convert them into integer values. This leads to confusion as functions return both Linux errno values and status codes intermixed. Reporting status codes as if they were "errno" values can lead to confusion when reviewing error logs. Additionally, it can lead to unexpected behavior if a return value is not interpreted properly.
Fix this by introducing iavf_status_to_errno, a switch that explicitly converts from the status codes into an appropriate error value. Also introduce a virtchnl_status_to_errno function for the one case where we were returning both virtchnl status codes and iavf_status codes in the same function.
Signed-off-by: Jacob Keller jacob.e.keller@intel.com Signed-off-by: Mateusz Palczewski mateusz.palczewski@intel.com Tested-by: Konrad Jankowski konrad0.jankowski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/iavf/iavf.h | 5 +- drivers/net/ethernet/intel/iavf/iavf_main.c | 173 +++++++++++++++--- .../net/ethernet/intel/iavf/iavf_virtchnl.c | 18 +- 3 files changed, 157 insertions(+), 39 deletions(-)
diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h index 4babe4705a55..358a9b3031d5 100644 --- a/drivers/net/ethernet/intel/iavf/iavf.h +++ b/drivers/net/ethernet/intel/iavf/iavf.h @@ -44,6 +44,9 @@ #define DEFAULT_DEBUG_LEVEL_SHIFT 3 #define PFX "iavf: "
+int iavf_status_to_errno(enum iavf_status status); +int virtchnl_status_to_errno(enum virtchnl_status_code v_status); + /* VSI state flags shared with common code */ enum iavf_vsi_state_t { __IAVF_VSI_DOWN, @@ -515,7 +518,7 @@ void iavf_add_vlans(struct iavf_adapter *adapter); void iavf_del_vlans(struct iavf_adapter *adapter); void iavf_set_promiscuous(struct iavf_adapter *adapter, int flags); void iavf_request_stats(struct iavf_adapter *adapter); -void iavf_request_reset(struct iavf_adapter *adapter); +int iavf_request_reset(struct iavf_adapter *adapter); void iavf_get_hena(struct iavf_adapter *adapter); void iavf_set_hena(struct iavf_adapter *adapter); void iavf_set_rss_key(struct iavf_adapter *adapter); diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c index 0e178a0a59c5..d10e9a8e8011 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_main.c +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c @@ -51,6 +51,113 @@ MODULE_LICENSE("GPL v2"); static const struct net_device_ops iavf_netdev_ops; struct workqueue_struct *iavf_wq;
+int iavf_status_to_errno(enum iavf_status status) +{ + switch (status) { + case IAVF_SUCCESS: + return 0; + case IAVF_ERR_PARAM: + case IAVF_ERR_MAC_TYPE: + case IAVF_ERR_INVALID_MAC_ADDR: + case IAVF_ERR_INVALID_LINK_SETTINGS: + case IAVF_ERR_INVALID_PD_ID: + case IAVF_ERR_INVALID_QP_ID: + case IAVF_ERR_INVALID_CQ_ID: + case IAVF_ERR_INVALID_CEQ_ID: + case IAVF_ERR_INVALID_AEQ_ID: + case IAVF_ERR_INVALID_SIZE: + case IAVF_ERR_INVALID_ARP_INDEX: + case IAVF_ERR_INVALID_FPM_FUNC_ID: + case IAVF_ERR_QP_INVALID_MSG_SIZE: + case IAVF_ERR_INVALID_FRAG_COUNT: + case IAVF_ERR_INVALID_ALIGNMENT: + case IAVF_ERR_INVALID_PUSH_PAGE_INDEX: + case IAVF_ERR_INVALID_IMM_DATA_SIZE: + case IAVF_ERR_INVALID_VF_ID: + case IAVF_ERR_INVALID_HMCFN_ID: + case IAVF_ERR_INVALID_PBLE_INDEX: + case IAVF_ERR_INVALID_SD_INDEX: + case IAVF_ERR_INVALID_PAGE_DESC_INDEX: + case IAVF_ERR_INVALID_SD_TYPE: + case IAVF_ERR_INVALID_HMC_OBJ_INDEX: + case IAVF_ERR_INVALID_HMC_OBJ_COUNT: + case IAVF_ERR_INVALID_SRQ_ARM_LIMIT: + return -EINVAL; + case IAVF_ERR_NVM: + case IAVF_ERR_NVM_CHECKSUM: + case IAVF_ERR_PHY: + case IAVF_ERR_CONFIG: + case IAVF_ERR_UNKNOWN_PHY: + case IAVF_ERR_LINK_SETUP: + case IAVF_ERR_ADAPTER_STOPPED: + case IAVF_ERR_MASTER_REQUESTS_PENDING: + case IAVF_ERR_AUTONEG_NOT_COMPLETE: + case IAVF_ERR_RESET_FAILED: + case IAVF_ERR_BAD_PTR: + case IAVF_ERR_SWFW_SYNC: + case IAVF_ERR_QP_TOOMANY_WRS_POSTED: + case IAVF_ERR_QUEUE_EMPTY: + case IAVF_ERR_FLUSHED_QUEUE: + case IAVF_ERR_OPCODE_MISMATCH: + case IAVF_ERR_CQP_COMPL_ERROR: + case IAVF_ERR_BACKING_PAGE_ERROR: + case IAVF_ERR_NO_PBLCHUNKS_AVAILABLE: + case IAVF_ERR_MEMCPY_FAILED: + case IAVF_ERR_SRQ_ENABLED: + case IAVF_ERR_ADMIN_QUEUE_ERROR: + case IAVF_ERR_ADMIN_QUEUE_FULL: + case IAVF_ERR_BAD_IWARP_CQE: + case IAVF_ERR_NVM_BLANK_MODE: + case IAVF_ERR_PE_DOORBELL_NOT_ENABLED: + case IAVF_ERR_DIAG_TEST_FAILED: + case IAVF_ERR_FIRMWARE_API_VERSION: + case IAVF_ERR_ADMIN_QUEUE_CRITICAL_ERROR: + return -EIO; + case IAVF_ERR_DEVICE_NOT_SUPPORTED: + return -ENODEV; + case IAVF_ERR_NO_AVAILABLE_VSI: + case IAVF_ERR_RING_FULL: + return -ENOSPC; + case IAVF_ERR_NO_MEMORY: + return -ENOMEM; + case IAVF_ERR_TIMEOUT: + case IAVF_ERR_ADMIN_QUEUE_TIMEOUT: + return -ETIMEDOUT; + case IAVF_ERR_NOT_IMPLEMENTED: + case IAVF_NOT_SUPPORTED: + return -EOPNOTSUPP; + case IAVF_ERR_ADMIN_QUEUE_NO_WORK: + return -EALREADY; + case IAVF_ERR_NOT_READY: + return -EBUSY; + case IAVF_ERR_BUF_TOO_SHORT: + return -EMSGSIZE; + } + + return -EIO; +} + +int virtchnl_status_to_errno(enum virtchnl_status_code v_status) +{ + switch (v_status) { + case VIRTCHNL_STATUS_SUCCESS: + return 0; + case VIRTCHNL_STATUS_ERR_PARAM: + case VIRTCHNL_STATUS_ERR_INVALID_VF_ID: + return -EINVAL; + case VIRTCHNL_STATUS_ERR_NO_MEMORY: + return -ENOMEM; + case VIRTCHNL_STATUS_ERR_OPCODE_MISMATCH: + case VIRTCHNL_STATUS_ERR_CQP_COMPL_ERROR: + case VIRTCHNL_STATUS_ERR_ADMIN_QUEUE_ERROR: + return -EIO; + case VIRTCHNL_STATUS_ERR_NOT_SUPPORTED: + return -EOPNOTSUPP; + } + + return -EIO; +} + /** * iavf_pdev_to_adapter - go from pci_dev to adapter * @pdev: pci_dev pointer @@ -1421,7 +1528,7 @@ static int iavf_config_rss_aq(struct iavf_adapter *adapter) struct iavf_aqc_get_set_rss_key_data *rss_key = (struct iavf_aqc_get_set_rss_key_data *)adapter->rss_key; struct iavf_hw *hw = &adapter->hw; - int ret = 0; + enum iavf_status status;
if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) { /* bail because we already have a command pending */ @@ -1430,24 +1537,25 @@ static int iavf_config_rss_aq(struct iavf_adapter *adapter) return -EBUSY; }
- ret = iavf_aq_set_rss_key(hw, adapter->vsi.id, rss_key); - if (ret) { + status = iavf_aq_set_rss_key(hw, adapter->vsi.id, rss_key); + if (status) { dev_err(&adapter->pdev->dev, "Cannot set RSS key, err %s aq_err %s\n", - iavf_stat_str(hw, ret), + iavf_stat_str(hw, status), iavf_aq_str(hw, hw->aq.asq_last_status)); - return ret; + return iavf_status_to_errno(status);
}
- ret = iavf_aq_set_rss_lut(hw, adapter->vsi.id, false, - adapter->rss_lut, adapter->rss_lut_size); - if (ret) { + status = iavf_aq_set_rss_lut(hw, adapter->vsi.id, false, + adapter->rss_lut, adapter->rss_lut_size); + if (status) { dev_err(&adapter->pdev->dev, "Cannot set RSS lut, err %s aq_err %s\n", - iavf_stat_str(hw, ret), + iavf_stat_str(hw, status), iavf_aq_str(hw, hw->aq.asq_last_status)); + return iavf_status_to_errno(status); }
- return ret; + return 0;
}
@@ -2003,23 +2111,24 @@ static void iavf_startup(struct iavf_adapter *adapter) { struct pci_dev *pdev = adapter->pdev; struct iavf_hw *hw = &adapter->hw; - int err; + enum iavf_status status; + int ret;
WARN_ON(adapter->state != __IAVF_STARTUP);
/* driver loaded, probe complete */ adapter->flags &= ~IAVF_FLAG_PF_COMMS_FAILED; adapter->flags &= ~IAVF_FLAG_RESET_PENDING; - err = iavf_set_mac_type(hw); - if (err) { - dev_err(&pdev->dev, "Failed to set MAC type (%d)\n", err); + status = iavf_set_mac_type(hw); + if (status) { + dev_err(&pdev->dev, "Failed to set MAC type (%d)\n", status); goto err; }
- err = iavf_check_reset_complete(hw); - if (err) { + ret = iavf_check_reset_complete(hw); + if (ret) { dev_info(&pdev->dev, "Device is still in reset (%d), retrying\n", - err); + ret); goto err; } hw->aq.num_arq_entries = IAVF_AQ_LEN; @@ -2027,14 +2136,15 @@ static void iavf_startup(struct iavf_adapter *adapter) hw->aq.arq_buf_size = IAVF_MAX_AQ_BUF_SIZE; hw->aq.asq_buf_size = IAVF_MAX_AQ_BUF_SIZE;
- err = iavf_init_adminq(hw); - if (err) { - dev_err(&pdev->dev, "Failed to init Admin Queue (%d)\n", err); + status = iavf_init_adminq(hw); + if (status) { + dev_err(&pdev->dev, "Failed to init Admin Queue (%d)\n", + status); goto err; } - err = iavf_send_api_ver(adapter); - if (err) { - dev_err(&pdev->dev, "Unable to send to PF (%d)\n", err); + ret = iavf_send_api_ver(adapter); + if (ret) { + dev_err(&pdev->dev, "Unable to send to PF (%d)\n", ret); iavf_shutdown_adminq(hw); goto err; } @@ -2070,7 +2180,7 @@ static void iavf_init_version_check(struct iavf_adapter *adapter) /* aq msg sent, awaiting reply */ err = iavf_verify_api_ver(adapter); if (err) { - if (err == IAVF_ERR_ADMIN_QUEUE_NO_WORK) + if (err == -EALREADY) err = iavf_send_api_ver(adapter); else dev_err(&pdev->dev, "Unsupported PF API version %d.%d, expected %d.%d\n", @@ -2171,11 +2281,11 @@ static void iavf_init_get_resources(struct iavf_adapter *adapter) } } err = iavf_get_vf_config(adapter); - if (err == IAVF_ERR_ADMIN_QUEUE_NO_WORK) { + if (err == -EALREADY) { err = iavf_send_vf_config_msg(adapter); goto err_alloc; - } else if (err == IAVF_ERR_PARAM) { - /* We only get ERR_PARAM if the device is in a very bad + } else if (err == -EINVAL) { + /* We only get -EINVAL if the device is in a very bad * state or if we've been disabled for previous bad * behavior. Either way, we're done now. */ @@ -2626,6 +2736,7 @@ static void iavf_reset_task(struct work_struct *work) struct iavf_hw *hw = &adapter->hw; struct iavf_mac_filter *f, *ftmp; struct iavf_cloud_filter *cf; + enum iavf_status status; u32 reg_val; int i = 0, err; bool running; @@ -2727,10 +2838,12 @@ static void iavf_reset_task(struct work_struct *work) /* kill and reinit the admin queue */ iavf_shutdown_adminq(hw); adapter->current_op = VIRTCHNL_OP_UNKNOWN; - err = iavf_init_adminq(hw); - if (err) + status = iavf_init_adminq(hw); + if (status) { dev_info(&adapter->pdev->dev, "Failed to init adminq: %d\n", - err); + status); + goto reset_err; + } adapter->aq_required = 0;
if ((adapter->flags & IAVF_FLAG_REINIT_MSIX_NEEDED) || diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c index 5263cefe46f5..b8c5837f8b50 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c +++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c @@ -22,17 +22,17 @@ static int iavf_send_pf_msg(struct iavf_adapter *adapter, enum virtchnl_ops op, u8 *msg, u16 len) { struct iavf_hw *hw = &adapter->hw; - enum iavf_status err; + enum iavf_status status;
if (adapter->flags & IAVF_FLAG_PF_COMMS_FAILED) return 0; /* nothing to see here, move along */
- err = iavf_aq_send_msg_to_pf(hw, op, 0, msg, len, NULL); - if (err) - dev_dbg(&adapter->pdev->dev, "Unable to send opcode %d to PF, err %s, aq_err %s\n", - op, iavf_stat_str(hw, err), + status = iavf_aq_send_msg_to_pf(hw, op, 0, msg, len, NULL); + if (status) + dev_dbg(&adapter->pdev->dev, "Unable to send opcode %d to PF, status %s, aq_err %s\n", + op, iavf_stat_str(hw, status), iavf_aq_str(hw, hw->aq.asq_last_status)); - return err; + return iavf_status_to_errno(status); }
/** @@ -1827,11 +1827,13 @@ void iavf_del_adv_rss_cfg(struct iavf_adapter *adapter) * * Request that the PF reset this VF. No response is expected. **/ -void iavf_request_reset(struct iavf_adapter *adapter) +int iavf_request_reset(struct iavf_adapter *adapter) { + int err; /* Don't check CURRENT_OP - this is always higher priority */ - iavf_send_pf_msg(adapter, VIRTCHNL_OP_RESET_VF, NULL, 0); + err = iavf_send_pf_msg(adapter, VIRTCHNL_OP_RESET_VF, NULL, 0); adapter->current_op = VIRTCHNL_OP_UNKNOWN; + return err; }
/**
From: Sven Eckelmann sven@narfation.org
[ Upstream commit a02192151b7dbf855084c38dca380d77c7658353 ]
Assign rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is added to rtnetlink messages. This fixes iproute2 which otherwise resolved the link interface to an interface in the wrong namespace.
Test commands:
ip netns add nst ip link add dummy0 type dummy ip link add link macvtap0 link dummy0 type macvtap ip link set macvtap0 netns nst ip -netns nst link show macvtap0
Before:
10: macvtap0@gre0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500 link/ether 5e:8f:ae:1d:60:50 brd ff:ff:ff:ff:ff:ff
After:
10: macvtap0@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500 link/ether 5e:8f:ae:1d:60:50 brd ff:ff:ff:ff:ff:ff link-netnsid 0
Reported-by: Leonardo Mörlein freifunk@irrelefant.net Signed-off-by: Sven Eckelmann sven@narfation.org Link: https://lore.kernel.org/r/20220228003240.1337426-1-sven@narfation.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/macvtap.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c index 6b12902a803f..cecf8c63096c 100644 --- a/drivers/net/macvtap.c +++ b/drivers/net/macvtap.c @@ -133,11 +133,17 @@ static void macvtap_setup(struct net_device *dev) dev->tx_queue_len = TUN_READQ_SIZE; }
+static struct net *macvtap_link_net(const struct net_device *dev) +{ + return dev_net(macvlan_dev_real_dev(dev)); +} + static struct rtnl_link_ops macvtap_link_ops __read_mostly = { .kind = "macvtap", .setup = macvtap_setup, .newlink = macvtap_newlink, .dellink = macvtap_dellink, + .get_link_net = macvtap_link_net, .priv_size = sizeof(struct macvtap_dev), };
From: Mark Pearson markpearson@lenovo.com
[ Upstream commit bf779aaf56ea23864e39e9862b3b3a8436236e07 ]
Instead of having quirks for systems that have a second fan it would be nice to detect this setup. Unfortunately, confirmed by the Lenovo FW team, there is no way to retrieve this information from the EC or BIOS. Recommendation was to attempt to read the fan and if successful then assume a 2nd fan is present.
The fans are also supposed to spin up on boot for some time, so in theory we could check for a speed > 0. In testing this seems to hold true but as I couldn't test on all platforms I've avoided implementing this. It also breaks for the corner case where you load the module once the fans are idle.
Tested on P1G4, P1G3, X1C9 and T14 (no fans) and it works correctly. For the platforms with dual fans where it was confirmed to work I have removed the quirks. Potentially this could be done for all platforms but I've left untested platforms in for now. On these platforms the fans will be enabled and then detected - so no impact.
Signed-off-by: Mark Pearson markpearson@lenovo.com Link: https://lore.kernel.org/r/20220222185137.4325-1-markpearson@lenovo.com Reviewed-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/platform/x86/thinkpad_acpi.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/drivers/platform/x86/thinkpad_acpi.c b/drivers/platform/x86/thinkpad_acpi.c index 3424b080db77..3fb8cda31eb9 100644 --- a/drivers/platform/x86/thinkpad_acpi.c +++ b/drivers/platform/x86/thinkpad_acpi.c @@ -8699,10 +8699,7 @@ static const struct tpacpi_quirk fan_quirk_table[] __initconst = { TPACPI_Q_LNV3('N', '2', 'N', TPACPI_FAN_2CTL), /* P53 / P73 */ TPACPI_Q_LNV3('N', '2', 'E', TPACPI_FAN_2CTL), /* P1 / X1 Extreme (1st gen) */ TPACPI_Q_LNV3('N', '2', 'O', TPACPI_FAN_2CTL), /* P1 / X1 Extreme (2nd gen) */ - TPACPI_Q_LNV3('N', '2', 'V', TPACPI_FAN_2CTL), /* P1 / X1 Extreme (3nd gen) */ - TPACPI_Q_LNV3('N', '4', '0', TPACPI_FAN_2CTL), /* P1 / X1 Extreme (4nd gen) */ TPACPI_Q_LNV3('N', '3', '0', TPACPI_FAN_2CTL), /* P15 (1st gen) / P15v (1st gen) */ - TPACPI_Q_LNV3('N', '3', '2', TPACPI_FAN_2CTL), /* X1 Carbon (9th gen) */ TPACPI_Q_LNV3('N', '3', '7', TPACPI_FAN_2CTL), /* T15g (2nd gen) */ TPACPI_Q_LNV3('N', '1', 'O', TPACPI_FAN_NOFAN), /* X1 Tablet (2nd gen) */ }; @@ -8746,6 +8743,9 @@ static int __init fan_init(struct ibm_init_struct *iibm) * ThinkPad ECs supports the fan control register */ if (likely(acpi_ec_read(fan_status_offset, &fan_control_initial_status))) { + int res; + unsigned int speed; + fan_status_access_mode = TPACPI_FAN_RD_TPEC; if (quirks & TPACPI_FAN_Q1) fan_quirk1_setup(); @@ -8758,6 +8758,15 @@ static int __init fan_init(struct ibm_init_struct *iibm) tp_features.second_fan_ctl = 1; pr_info("secondary fan control enabled\n"); } + /* Try and probe the 2nd fan */ + res = fan2_get_speed(&speed); + if (res >= 0) { + /* It responded - so let's assume it's there */ + tp_features.second_fan = 1; + tp_features.second_fan_ctl = 1; + pr_info("secondary fan control detected & enabled\n"); + } + } else { pr_err("ThinkPad ACPI EC access misbehaving, fan status and control unavailable\n"); return -ENODEV;
From: Harold Huang baymaxhuang@gmail.com
[ Upstream commit 74a335a07a17d131b9263bfdbdcb5e40673ca9ca ]
In patch [1], tun_msg_ctl was added to allow pass batched xdp buffers to tun_sendmsg. Although we donot use msg_controllen in this path, we should check msg_controllen to make sure the caller pass a valid msg_ctl.
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
Reported-by: Eric Dumazet eric.dumazet@gmail.com Suggested-by: Jason Wang jasowang@redhat.com Signed-off-by: Harold Huang baymaxhuang@gmail.com Acked-by: Jason Wang jasowang@redhat.com Link: https://lore.kernel.org/r/20220303022441.383865-1-baymaxhuang@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/tap.c | 3 ++- drivers/net/tun.c | 3 ++- drivers/vhost/net.c | 1 + 3 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/net/tap.c b/drivers/net/tap.c index 8e3a28ba6b28..ba2ef5437e16 100644 --- a/drivers/net/tap.c +++ b/drivers/net/tap.c @@ -1198,7 +1198,8 @@ static int tap_sendmsg(struct socket *sock, struct msghdr *m, struct xdp_buff *xdp; int i;
- if (ctl && (ctl->type == TUN_MSG_PTR)) { + if (m->msg_controllen == sizeof(struct tun_msg_ctl) && + ctl && ctl->type == TUN_MSG_PTR) { for (i = 0; i < ctl->num; i++) { xdp = &((struct xdp_buff *)ctl->ptr)[i]; tap_get_user_xdp(q, xdp); diff --git a/drivers/net/tun.c b/drivers/net/tun.c index fed85447701a..de999e0fedbc 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -2489,7 +2489,8 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len) if (!tun) return -EBADFD;
- if (ctl && (ctl->type == TUN_MSG_PTR)) { + if (m->msg_controllen == sizeof(struct tun_msg_ctl) && + ctl && ctl->type == TUN_MSG_PTR) { struct tun_page tpage; int n = ctl->num; int flush = 0; diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 28ef323882fb..792ab5f23647 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -473,6 +473,7 @@ static void vhost_tx_batch(struct vhost_net *net, goto signal_used;
msghdr->msg_control = &ctl; + msghdr->msg_controllen = sizeof(ctl); err = sock->ops->sendmsg(sock, msghdr, 0); if (unlikely(err < 0)) { vq_err(&nvq->vq, "Fail to batch sending packets\n");
From: Sean Wang sean.wang@mediatek.com
[ Upstream commit e4412654e260842e1a94ffe0d4026e8a6fd34246 ]
There is a conflict between MediaTek wmt event and msft vendor extension logic in the core layer since 145373cb1b1f ("Bluetooth: Add framework for Microsoft vendor extension") was introduced because we changed the type of mediatek wmt event to the type of msft vendor event in the driver.
But the purpose we reported mediatek event to the core layer is for the diagnostic purpose with that we are able to see the full packet trace via monitoring socket with btmon. Thus, it is harmless we keep the original type of mediatek vendor event here to avoid breaking the msft extension function especially they can be supported by Mediatek chipset like MT7921 , MT7922 devices and future devices.
Signed-off-by: Sean Wang sean.wang@mediatek.com Signed-off-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/bluetooth/btmtk.h | 1 + drivers/bluetooth/btmtksdio.c | 9 +-------- drivers/bluetooth/btusb.c | 8 -------- 3 files changed, 2 insertions(+), 16 deletions(-)
diff --git a/drivers/bluetooth/btmtk.h b/drivers/bluetooth/btmtk.h index 6e7b0c7567c0..0defa68bc2ce 100644 --- a/drivers/bluetooth/btmtk.h +++ b/drivers/bluetooth/btmtk.h @@ -5,6 +5,7 @@ #define FIRMWARE_MT7668 "mediatek/mt7668pr2h.bin" #define FIRMWARE_MT7961 "mediatek/BT_RAM_CODE_MT7961_1_2_hdr.bin"
+#define HCI_EV_WMT 0xe4 #define HCI_WMT_MAX_EVENT_SIZE 64
#define BTMTK_WMT_REG_READ 0x2 diff --git a/drivers/bluetooth/btmtksdio.c b/drivers/bluetooth/btmtksdio.c index 9b868f187316..ecf29cfa7d79 100644 --- a/drivers/bluetooth/btmtksdio.c +++ b/drivers/bluetooth/btmtksdio.c @@ -370,13 +370,6 @@ static int btmtksdio_recv_event(struct hci_dev *hdev, struct sk_buff *skb) struct hci_event_hdr *hdr = (void *)skb->data; int err;
- /* Fix up the vendor event id with 0xff for vendor specific instead - * of 0xe4 so that event send via monitoring socket can be parsed - * properly. - */ - if (hdr->evt == 0xe4) - hdr->evt = HCI_EV_VENDOR; - /* When someone waits for the WMT event, the skb is being cloned * and being processed the events from there then. */ @@ -392,7 +385,7 @@ static int btmtksdio_recv_event(struct hci_dev *hdev, struct sk_buff *skb) if (err < 0) goto err_free_skb;
- if (hdr->evt == HCI_EV_VENDOR) { + if (hdr->evt == HCI_EV_WMT) { if (test_and_clear_bit(BTMTKSDIO_TX_WAIT_VND_EVT, &bdev->tx_state)) { /* Barrier to sync with other CPUs */ diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c index 2afbd87d77c9..42234d5f602d 100644 --- a/drivers/bluetooth/btusb.c +++ b/drivers/bluetooth/btusb.c @@ -2254,7 +2254,6 @@ static void btusb_mtk_wmt_recv(struct urb *urb) { struct hci_dev *hdev = urb->context; struct btusb_data *data = hci_get_drvdata(hdev); - struct hci_event_hdr *hdr; struct sk_buff *skb; int err;
@@ -2274,13 +2273,6 @@ static void btusb_mtk_wmt_recv(struct urb *urb) hci_skb_pkt_type(skb) = HCI_EVENT_PKT; skb_put_data(skb, urb->transfer_buffer, urb->actual_length);
- hdr = (void *)skb->data; - /* Fix up the vendor event id with 0xff for vendor specific - * instead of 0xe4 so that event send via monitoring socket can - * be parsed properly. - */ - hdr->evt = 0xff; - /* When someone waits for the WMT event, the skb is being cloned * and being processed the events from there then. */
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit 9b392e0e0b6d026da5a62bb79a08f32e27af858e ]
This fixes attemting to print hdev->name directly which causes them to print an error:
kernel: read_version:367: (efault): sock 000000006a3008f2
Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/bluetooth/bluetooth.h | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/include/net/bluetooth/bluetooth.h b/include/net/bluetooth/bluetooth.h index a647e5fabdbd..2aa5e95808a5 100644 --- a/include/net/bluetooth/bluetooth.h +++ b/include/net/bluetooth/bluetooth.h @@ -204,19 +204,21 @@ void bt_err_ratelimited(const char *fmt, ...); #define BT_DBG(fmt, ...) pr_debug(fmt "\n", ##__VA_ARGS__) #endif
+#define bt_dev_name(hdev) ((hdev) ? (hdev)->name : "null") + #define bt_dev_info(hdev, fmt, ...) \ - BT_INFO("%s: " fmt, (hdev)->name, ##__VA_ARGS__) + BT_INFO("%s: " fmt, bt_dev_name(hdev), ##__VA_ARGS__) #define bt_dev_warn(hdev, fmt, ...) \ - BT_WARN("%s: " fmt, (hdev)->name, ##__VA_ARGS__) + BT_WARN("%s: " fmt, bt_dev_name(hdev), ##__VA_ARGS__) #define bt_dev_err(hdev, fmt, ...) \ - BT_ERR("%s: " fmt, (hdev)->name, ##__VA_ARGS__) + BT_ERR("%s: " fmt, bt_dev_name(hdev), ##__VA_ARGS__) #define bt_dev_dbg(hdev, fmt, ...) \ - BT_DBG("%s: " fmt, (hdev)->name, ##__VA_ARGS__) + BT_DBG("%s: " fmt, bt_dev_name(hdev), ##__VA_ARGS__)
#define bt_dev_warn_ratelimited(hdev, fmt, ...) \ - bt_warn_ratelimited("%s: " fmt, (hdev)->name, ##__VA_ARGS__) + bt_warn_ratelimited("%s: " fmt, bt_dev_name(hdev), ##__VA_ARGS__) #define bt_dev_err_ratelimited(hdev, fmt, ...) \ - bt_err_ratelimited("%s: " fmt, (hdev)->name, ##__VA_ARGS__) + bt_err_ratelimited("%s: " fmt, bt_dev_name(hdev), ##__VA_ARGS__)
/* Connection and socket states */ enum {
From: Minghao Chi (CGEL ZTE) chi.minghao@zte.com.cn
[ Upstream commit d3715b2333e9a21692ba16ef8645eda584a9515d ]
Use memset to initialize structs to prevent memory leaks in l2cap_ecred_connect
Reported-by: Zeal Robot zealci@zte.com.cn Signed-off-by: Minghao Chi (CGEL ZTE) chi.minghao@zte.com.cn Signed-off-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/l2cap_core.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index e817ff0607a0..8df99c07f272 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -1436,6 +1436,7 @@ static void l2cap_ecred_connect(struct l2cap_chan *chan)
l2cap_ecred_init(chan, 0);
+ memset(&data, 0, sizeof(data)); data.pdu.req.psm = chan->psm; data.pdu.req.mtu = cpu_to_le16(chan->imtu); data.pdu.req.mps = cpu_to_le16(chan->mps);
From: Michael Chan michael.chan@broadcom.com
[ Upstream commit 7c492a2530c1f05441da541307c2534230dfd59b ]
If the flow control settings have been changed, a subsequent FW reset may cause the ethernet link to toggle unnecessarily. This link toggle will increase the down time by a few seconds.
The problem is caused by bnxt_update_phy_setting() detecting a false mismatch in the flow control settings between the stored software settings and the current FW settings after the FW reset. This mismatch is caused by the AUTONEG bit added to link_info->req_flow_ctrl in an inconsistent way in bnxt_set_pauseparam() in autoneg mode. The AUTONEG bit should not be added to link_info->req_flow_ctrl.
Reviewed-by: Colin Winegarden colin.winegarden@broadcom.com Reviewed-by: Pavan Chebbi pavan.chebbi@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c index 8aaa2335f848..f09b04556c32 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c @@ -2101,9 +2101,7 @@ static int bnxt_set_pauseparam(struct net_device *dev, }
link_info->autoneg |= BNXT_AUTONEG_FLOW_CTRL; - if (bp->hwrm_spec_code >= 0x10201) - link_info->req_flow_ctrl = - PORT_PHY_CFG_REQ_AUTO_PAUSE_AUTONEG_PAUSE; + link_info->req_flow_ctrl = 0; } else { /* when transition from auto pause to force pause, * force a link change
From: Li Chen lchen@ambarella.com
[ Upstream commit bf8d87c076f55b8b4dfdb6bc6c6b6dc0c2ccb487 ]
Fix a misused goto label jump since that can result in a memory leak.
Link: https://lore.kernel.org/r/17e7b9b9ee6.c6d9c6a02564.4545388417402742326@zohom... Signed-off-by: Li Chen lchen@ambarella.com Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Acked-by: Kishon Vijay Abraham I kishon@ti.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pci/endpoint/functions/pci-epf-test.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pci/endpoint/functions/pci-epf-test.c b/drivers/pci/endpoint/functions/pci-epf-test.c index c7e45633beaf..5b833f00e980 100644 --- a/drivers/pci/endpoint/functions/pci-epf-test.c +++ b/drivers/pci/endpoint/functions/pci-epf-test.c @@ -451,7 +451,7 @@ static int pci_epf_test_write(struct pci_epf_test *epf_test) if (!epf_test->dma_supported) { dev_err(dev, "Cannot transfer data using DMA\n"); ret = -EINVAL; - goto err_map_addr; + goto err_dma_map; }
src_phys_addr = dma_map_single(dma_dev, buf, reg->size,
From: Alexander Lobakin alobakin@pm.me
[ Upstream commit d17b66417308996e7e64b270a3c7f3c1fbd4cfc8 ]
With KCFLAGS="-O3", I was able to trigger a fortify-source memcpy() overflow panic on set_vi_srs_handler(). Although O3 level is not supported in the mainline, under some conditions that may've happened with any optimization settings, it's just a matter of inlining luck. The panic itself is correct, more precisely, 50/50 false-positive and not at the same time.
From the one side, no real overflow happens. Exception handler
defined in asm just gets copied to some reserved places in the memory. But the reason behind is that C code refers to that exception handler declares it as `char`, i.e. something of 1 byte length. It's obvious that the asm function itself is way more than 1 byte, so fortify logics thought we are going to past the symbol declared. The standard way to refer to asm symbols from C code which is not supposed to be called from C is to declare them as `extern const u8[]`. This is fully correct from any point of view, as any code itself is just a bunch of bytes (including 0 as it is for syms like _stext/_etext/etc.), and the exact size is not known at the moment of compilation. Adjust the type of the except_vec_vi_*() and related variables. Make set_handler() take `const` as a second argument to avoid cast-away warnings and give a little more room for optimization.
Signed-off-by: Alexander Lobakin alobakin@pm.me Signed-off-by: Thomas Bogendoerfer tsbogend@alpha.franken.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/mips/include/asm/setup.h | 2 +- arch/mips/kernel/traps.c | 22 +++++++++++----------- 2 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/arch/mips/include/asm/setup.h b/arch/mips/include/asm/setup.h index bb36a400203d..8c56b862fd9c 100644 --- a/arch/mips/include/asm/setup.h +++ b/arch/mips/include/asm/setup.h @@ -16,7 +16,7 @@ static inline void setup_8250_early_printk_port(unsigned long base, unsigned int reg_shift, unsigned int timeout) {} #endif
-extern void set_handler(unsigned long offset, void *addr, unsigned long len); +void set_handler(unsigned long offset, const void *addr, unsigned long len); extern void set_uncached_handler(unsigned long offset, void *addr, unsigned long len);
typedef void (*vi_handler_t)(void); diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c index a486486b2355..246c6a6b0261 100644 --- a/arch/mips/kernel/traps.c +++ b/arch/mips/kernel/traps.c @@ -2091,19 +2091,19 @@ static void *set_vi_srs_handler(int n, vi_handler_t addr, int srs) * If no shadow set is selected then use the default handler * that does normal register saving and standard interrupt exit */ - extern char except_vec_vi, except_vec_vi_lui; - extern char except_vec_vi_ori, except_vec_vi_end; - extern char rollback_except_vec_vi; - char *vec_start = using_rollback_handler() ? - &rollback_except_vec_vi : &except_vec_vi; + extern const u8 except_vec_vi[], except_vec_vi_lui[]; + extern const u8 except_vec_vi_ori[], except_vec_vi_end[]; + extern const u8 rollback_except_vec_vi[]; + const u8 *vec_start = using_rollback_handler() ? + rollback_except_vec_vi : except_vec_vi; #if defined(CONFIG_CPU_MICROMIPS) || defined(CONFIG_CPU_BIG_ENDIAN) - const int lui_offset = &except_vec_vi_lui - vec_start + 2; - const int ori_offset = &except_vec_vi_ori - vec_start + 2; + const int lui_offset = except_vec_vi_lui - vec_start + 2; + const int ori_offset = except_vec_vi_ori - vec_start + 2; #else - const int lui_offset = &except_vec_vi_lui - vec_start; - const int ori_offset = &except_vec_vi_ori - vec_start; + const int lui_offset = except_vec_vi_lui - vec_start; + const int ori_offset = except_vec_vi_ori - vec_start; #endif - const int handler_len = &except_vec_vi_end - vec_start; + const int handler_len = except_vec_vi_end - vec_start;
if (handler_len > VECTORSPACING) { /* @@ -2311,7 +2311,7 @@ void per_cpu_trap_init(bool is_boot_cpu) }
/* Install CPU exception handler */ -void set_handler(unsigned long offset, void *addr, unsigned long size) +void set_handler(unsigned long offset, const void *addr, unsigned long size) { #ifdef CONFIG_CPU_MICROMIPS memcpy((void *)(ebase + offset), ((unsigned char *)addr - 1), size);
From: Michael Ellerman mpe@ellerman.id.au
[ Upstream commit 591b4b268435f00d2f0b81f786c2c7bd5ef66416 ]
Paul reported a warning with DEBUG_ATOMIC_SLEEP=y:
BUG: sleeping function called from invalid context at include/linux/sched/mm.h:256 in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0 preempt_count: 0, expected: 0 ... Call Trace: dump_stack_lvl+0xa0/0xec (unreliable) __might_resched+0x2f4/0x310 kmem_cache_alloc+0x220/0x4b0 __pud_alloc+0x74/0x1d0 hash__map_kernel_page+0x2cc/0x390 do_patch_instruction+0x134/0x4a0 arch_jump_label_transform+0x64/0x78 __jump_label_update+0x148/0x180 static_key_enable_cpuslocked+0xd0/0x120 static_key_enable+0x30/0x50 check_kvm_guest+0x60/0x88 pSeries_smp_probe+0x54/0xb0 smp_prepare_cpus+0x3e0/0x430 kernel_init_freeable+0x20c/0x43c kernel_init+0x30/0x1a0 ret_from_kernel_thread+0x5c/0x64
Peter pointed out that this is because do_patch_instruction() has disabled interrupts, but then map_patch_area() calls map_kernel_page() then hash__map_kernel_page() which does a sleeping memory allocation.
We only see the warning in KVM guests with SMT enabled, which is not particularly common, or on other platforms if CONFIG_KPROBES is disabled, also not common. The reason we don't see it in most configurations is that another path that happens to have interrupts enabled has allocated the required page tables for us, eg. there's a path in kprobes init that does that. That's just pure luck though.
As Christophe suggested, the simplest solution is to do a dummy map/unmap when we initialise the patching, so that any required page table levels are pre-allocated before the first call to do_patch_instruction(). This works because the unmap doesn't free any page tables that were allocated by the map, it just clears the PTE, leaving the page table levels there for the next map.
Reported-by: Paul Menzel pmenzel@molgen.mpg.de Debugged-by: Peter Zijlstra peterz@infradead.org Suggested-by: Christophe Leroy christophe.leroy@csgroup.eu Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20220223015821.473097-1-mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/lib/code-patching.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index 906d43463366..00c68e7fb11e 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -43,9 +43,14 @@ int raw_patch_instruction(u32 *addr, ppc_inst_t instr) #ifdef CONFIG_STRICT_KERNEL_RWX static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
+static int map_patch_area(void *addr, unsigned long text_poke_addr); +static void unmap_patch_area(unsigned long addr); + static int text_area_cpu_up(unsigned int cpu) { struct vm_struct *area; + unsigned long addr; + int err;
area = get_vm_area(PAGE_SIZE, VM_ALLOC); if (!area) { @@ -53,6 +58,15 @@ static int text_area_cpu_up(unsigned int cpu) cpu); return -1; } + + // Map/unmap the area to ensure all page tables are pre-allocated + addr = (unsigned long)area->addr; + err = map_patch_area(empty_zero_page, addr); + if (err) + return err; + + unmap_patch_area(addr); + this_cpu_write(text_poke_area, area);
return 0;
From: Michael Ellerman mpe@ellerman.id.au
[ Upstream commit 1a76e520ee1831a81dabf8a9a58c6453f700026e ]
Since the IBM A2 CPU support was removed, see commit fb5a515704d7 ("powerpc: Remove platforms/wsp and associated pieces"), the only 64-bit Book3E CPUs we support are Freescale (NXP) ones.
However our Kconfig still allows configurating a kernel that has 64-bit Book3E support, but no Freescale CPU support enabled. Such a kernel would never boot, it doesn't know about any CPUs.
It also causes build errors, as reported by lkp, because PPC_BARRIER_NOSPEC is not enabled in such a configuration:
powerpc64-linux-ld: arch/powerpc/net/bpf_jit_comp64.o:(.toc+0x0): undefined reference to `powerpc_security_features'
To fix this, force PPC_FSL_BOOK3E to be selected whenever we are building a 64-bit Book3E kernel.
Reported-by: kernel test robot lkp@intel.com Reported-by: Naveen N. Rao naveen.n.rao@linux.vnet.ibm.com Suggested-by: Christophe Leroy christophe.leroy@csgroup.eu Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20220304061222.2478720-1-mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/platforms/Kconfig.cputype | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype index 87bc1929ee5a..e2e1fec91c6e 100644 --- a/arch/powerpc/platforms/Kconfig.cputype +++ b/arch/powerpc/platforms/Kconfig.cputype @@ -107,6 +107,7 @@ config PPC_BOOK3S_64
config PPC_BOOK3E_64 bool "Embedded processors" + select PPC_FSL_BOOK3E select PPC_FPU # Make it a choice ? select PPC_SMP_MUXED_IPI select PPC_DOORBELL @@ -295,7 +296,7 @@ config FSL_BOOKE config PPC_FSL_BOOK3E bool select ARCH_SUPPORTS_HUGETLBFS if PHYS_64BIT || PPC64 - select FSL_EMB_PERFMON + imply FSL_EMB_PERFMON select PPC_SMP_MUXED_IPI select PPC_DOORBELL select PPC_KUEP
From: Hangyu Hua hbh25y@gmail.com
[ Upstream commit d601fd24e6964967f115f036a840f4f28488f63f ]
Refcount leak will happen when format_show returns failure in multiple cases. Unified management of of_node_put can fix this problem.
Signed-off-by: Hangyu Hua hbh25y@gmail.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20220302021959.10959-1-hbh25y@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kernel/secvar-sysfs.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/kernel/secvar-sysfs.c b/arch/powerpc/kernel/secvar-sysfs.c index a0a78aba2083..1ee4640a2641 100644 --- a/arch/powerpc/kernel/secvar-sysfs.c +++ b/arch/powerpc/kernel/secvar-sysfs.c @@ -26,15 +26,18 @@ static ssize_t format_show(struct kobject *kobj, struct kobj_attribute *attr, const char *format;
node = of_find_compatible_node(NULL, NULL, "ibm,secvar-backend"); - if (!of_device_is_available(node)) - return -ENODEV; + if (!of_device_is_available(node)) { + rc = -ENODEV; + goto out; + }
rc = of_property_read_string(node, "format", &format); if (rc) - return rc; + goto out;
rc = sprintf(buf, "%s\n", format);
+out: of_node_put(node);
return rc;
From: Jianglei Nie niejianglei2021@163.com
[ Upstream commit 271add11994ba1a334859069367e04d2be2ebdd4 ]
fc_exch_release(ep) will decrease the ep's reference count. When the reference count reaches zero, it is freed. But ep is still used in the following code, which will lead to a use after free.
Return after the fc_exch_release() call to avoid use after free.
Link: https://lore.kernel.org/r/20220303015115.459778-1-niejianglei2021@163.com Reviewed-by: Hannes Reinecke hare@suse.de Signed-off-by: Jianglei Nie niejianglei2021@163.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/libfc/fc_exch.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/scsi/libfc/fc_exch.c b/drivers/scsi/libfc/fc_exch.c index 841000445b9a..aa223db4cf53 100644 --- a/drivers/scsi/libfc/fc_exch.c +++ b/drivers/scsi/libfc/fc_exch.c @@ -1701,6 +1701,7 @@ static void fc_exch_abts_resp(struct fc_exch *ep, struct fc_frame *fp) if (cancel_delayed_work_sync(&ep->timeout_work)) { FC_EXCH_DBG(ep, "Exchange timer canceled due to ABTS response\n"); fc_exch_release(ep); /* release from pending timer hold */ + return; }
spin_lock_bh(&ep->ex_lock);
From: Hans de Goede hdegoede@redhat.com
[ Upstream commit 1e8aa2aa1274953e8e595f0630436744597d0d64 ]
The recently added support for Lenovo Yoga Tablet 2 tablets uses symbols from EFI and SPI add "depends on EFI && SPI" to the X86_ANDROID_TABLETS Kconfig entry.
Reported-by: Randy Dunlap rdunlap@infradead.org Signed-off-by: Hans de Goede hdegoede@redhat.com Acked-by: Randy Dunlap rdunlap@infradead.org Tested-by: Randy Dunlap rdunlap@infradead.org Link: https://lore.kernel.org/r/20220308152942.262130-1-hdegoede@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/platform/x86/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig index 24deeeb29af2..53abd553b842 100644 --- a/drivers/platform/x86/Kconfig +++ b/drivers/platform/x86/Kconfig @@ -1027,7 +1027,7 @@ config TOUCHSCREEN_DMI
config X86_ANDROID_TABLETS tristate "X86 Android tablet support" - depends on I2C && SERIAL_DEV_BUS && ACPI && GPIOLIB + depends on I2C && SPI && SERIAL_DEV_BUS && ACPI && EFI && GPIOLIB help X86 tablets which ship with Android as (part of) the factory image typically have various problems with their DSDTs. The factory kernels
From: Oliver Hartkopp socketcan@hartkopp.net
[ Upstream commit 530e0d46c61314c59ecfdb8d3bcb87edbc0f85d3 ]
The N_As value describes the time a CAN frame needs on the wire when transmitted by the CAN controller. Even very short CAN FD frames need arround 100 usecs (bitrate 1Mbit/s, data bitrate 8Mbit/s).
Having N_As to be zero (the former default) leads to 'no CAN frame separation' when STmin is set to zero by the receiving node. This 'burst mode' should not be enabled by default as it could potentially dump a high number of CAN frames into the netdev queue from the soft hrtimer context. This does not affect the system stability but is just not nice and cooperative.
With this N_As/frame_txtime value the 'burst mode' is disabled by default.
As user space applications usually do not set the frame_txtime element of struct can_isotp_options the new in-kernel default is very likely overwritten with zero when the sockopt() CAN_ISOTP_OPTS is invoked. To make sure that a N_As value of zero is only set intentional the value '0' is now interpreted as 'do not change the current value'. When a frame_txtime of zero is required for testing purposes this CAN_ISOTP_FRAME_TXTIME_ZERO u32 value has to be set in frame_txtime.
Link: https://lore.kernel.org/all/20220309120416.83514-2-socketcan@hartkopp.net Signed-off-by: Oliver Hartkopp socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- include/uapi/linux/can/isotp.h | 28 ++++++++++++++++++++++------ net/can/isotp.c | 12 +++++++++++- 2 files changed, 33 insertions(+), 7 deletions(-)
diff --git a/include/uapi/linux/can/isotp.h b/include/uapi/linux/can/isotp.h index c55935b64ccc..590f8aea2b6d 100644 --- a/include/uapi/linux/can/isotp.h +++ b/include/uapi/linux/can/isotp.h @@ -137,20 +137,16 @@ struct can_isotp_ll_options { #define CAN_ISOTP_WAIT_TX_DONE 0x400 /* wait for tx completion */ #define CAN_ISOTP_SF_BROADCAST 0x800 /* 1-to-N functional addressing */
-/* default values */ +/* protocol machine default values */
#define CAN_ISOTP_DEFAULT_FLAGS 0 #define CAN_ISOTP_DEFAULT_EXT_ADDRESS 0x00 #define CAN_ISOTP_DEFAULT_PAD_CONTENT 0xCC /* prevent bit-stuffing */ -#define CAN_ISOTP_DEFAULT_FRAME_TXTIME 0 +#define CAN_ISOTP_DEFAULT_FRAME_TXTIME 50000 /* 50 micro seconds */ #define CAN_ISOTP_DEFAULT_RECV_BS 0 #define CAN_ISOTP_DEFAULT_RECV_STMIN 0x00 #define CAN_ISOTP_DEFAULT_RECV_WFTMAX 0
-#define CAN_ISOTP_DEFAULT_LL_MTU CAN_MTU -#define CAN_ISOTP_DEFAULT_LL_TX_DL CAN_MAX_DLEN -#define CAN_ISOTP_DEFAULT_LL_TX_FLAGS 0 - /* * Remark on CAN_ISOTP_DEFAULT_RECV_* values: * @@ -162,4 +158,24 @@ struct can_isotp_ll_options { * consistency and copied directly into the flow control (FC) frame. */
+/* link layer default values => make use of Classical CAN frames */ + +#define CAN_ISOTP_DEFAULT_LL_MTU CAN_MTU +#define CAN_ISOTP_DEFAULT_LL_TX_DL CAN_MAX_DLEN +#define CAN_ISOTP_DEFAULT_LL_TX_FLAGS 0 + +/* + * The CAN_ISOTP_DEFAULT_FRAME_TXTIME has become a non-zero value as + * it only makes sense for isotp implementation tests to run without + * a N_As value. As user space applications usually do not set the + * frame_txtime element of struct can_isotp_options the new in-kernel + * default is very likely overwritten with zero when the sockopt() + * CAN_ISOTP_OPTS is invoked. + * To make sure that a N_As value of zero is only set intentional the + * value '0' is now interpreted as 'do not change the current value'. + * When a frame_txtime of zero is required for testing purposes this + * CAN_ISOTP_FRAME_TXTIME_ZERO u32 value has to be set in frame_txtime. + */ +#define CAN_ISOTP_FRAME_TXTIME_ZERO 0xFFFFFFFF + #endif /* !_UAPI_CAN_ISOTP_H */ diff --git a/net/can/isotp.c b/net/can/isotp.c index a95d171b3a64..5bce7c66c121 100644 --- a/net/can/isotp.c +++ b/net/can/isotp.c @@ -141,6 +141,7 @@ struct isotp_sock { struct can_isotp_options opt; struct can_isotp_fc_options rxfc, txfc; struct can_isotp_ll_options ll; + u32 frame_txtime; u32 force_tx_stmin; u32 force_rx_stmin; struct tpcon rx, tx; @@ -360,7 +361,7 @@ static int isotp_rcv_fc(struct isotp_sock *so, struct canfd_frame *cf, int ae)
so->tx_gap = ktime_set(0, 0); /* add transmission time for CAN frame N_As */ - so->tx_gap = ktime_add_ns(so->tx_gap, so->opt.frame_txtime); + so->tx_gap = ktime_add_ns(so->tx_gap, so->frame_txtime); /* add waiting time for consecutive frames N_Cs */ if (so->opt.flags & CAN_ISOTP_FORCE_TXSTMIN) so->tx_gap = ktime_add_ns(so->tx_gap, @@ -1247,6 +1248,14 @@ static int isotp_setsockopt_locked(struct socket *sock, int level, int optname, /* no separate rx_ext_address is given => use ext_address */ if (!(so->opt.flags & CAN_ISOTP_RX_EXT_ADDR)) so->opt.rx_ext_address = so->opt.ext_address; + + /* check for frame_txtime changes (0 => no changes) */ + if (so->opt.frame_txtime) { + if (so->opt.frame_txtime == CAN_ISOTP_FRAME_TXTIME_ZERO) + so->frame_txtime = 0; + else + so->frame_txtime = so->opt.frame_txtime; + } break;
case CAN_ISOTP_RECV_FC: @@ -1448,6 +1457,7 @@ static int isotp_init(struct sock *sk) so->opt.rxpad_content = CAN_ISOTP_DEFAULT_PAD_CONTENT; so->opt.txpad_content = CAN_ISOTP_DEFAULT_PAD_CONTENT; so->opt.frame_txtime = CAN_ISOTP_DEFAULT_FRAME_TXTIME; + so->frame_txtime = CAN_ISOTP_DEFAULT_FRAME_TXTIME; so->rxfc.bs = CAN_ISOTP_DEFAULT_RECV_BS; so->rxfc.stmin = CAN_ISOTP_DEFAULT_RECV_STMIN; so->rxfc.wftmax = CAN_ISOTP_DEFAULT_RECV_WFTMAX;
From: Vincent Mailhol mailhol.vincent@wanadoo.fr
[ Upstream commit 7a8cd7c0ee823a1cc893ab3feaa23e4b602bfb9a ]
Function es58x_fd_rx_event() invokes the es58x_check_msg_len() macro:
| ret = es58x_check_msg_len(es58x_dev->dev, *rx_event_msg, msg_len);
While doing so, it dereferences an uninitialized variable: *rx_event_msg.
This is actually harmless because es58x_check_msg_len() only uses preprocessor macros (sizeof() and __stringify()) on *rx_event_msg. c.f. [1].
Nonetheless, this pattern is confusing so the lines are reordered to make sure that rx_event_msg is correctly initialized.
This patch also fixes a false positive warning reported by cppcheck:
| cppcheck possible warnings: (new ones prefixed by >>, may not be real problems) | | In file included from drivers/net/can/usb/etas_es58x/es58x_fd.c: | >> drivers/net/can/usb/etas_es58x/es58x_fd.c:174:8: warning: Uninitialized variable: rx_event_msg [uninitvar] | ret = es58x_check_msg_len(es58x_dev->dev, *rx_event_msg, msg_len); | ^
[1] https://elixir.bootlin.com/linux/v5.16/source/drivers/net/can/usb/etas_es58x...
Link: https://lore.kernel.org/all/20220306101302.708783-1-mailhol.vincent@wanadoo.... Signed-off-by: Vincent Mailhol mailhol.vincent@wanadoo.fr Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/can/usb/etas_es58x/es58x_fd.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/can/usb/etas_es58x/es58x_fd.c b/drivers/net/can/usb/etas_es58x/es58x_fd.c index ec87126e1a7d..8ccda748fd08 100644 --- a/drivers/net/can/usb/etas_es58x/es58x_fd.c +++ b/drivers/net/can/usb/etas_es58x/es58x_fd.c @@ -172,12 +172,11 @@ static int es58x_fd_rx_event_msg(struct net_device *netdev, const struct es58x_fd_rx_event_msg *rx_event_msg; int ret;
+ rx_event_msg = &es58x_fd_urb_cmd->rx_event_msg; ret = es58x_check_msg_len(es58x_dev->dev, *rx_event_msg, msg_len); if (ret) return ret;
- rx_event_msg = &es58x_fd_urb_cmd->rx_event_msg; - return es58x_rx_err_msg(netdev, rx_event_msg->error_code, rx_event_msg->event_code, get_unaligned_le64(&rx_event_msg->timestamp));
From: Michael T. Kloos michael@michaelkloos.com
[ Upstream commit 9d1f0ec9f71780e69ceb9d91697600c747d6e02e ]
Rewrote the RISC-V memmove() assembly implementation. The previous implementation did not check memory alignment and it compared 2 pointers with a signed comparison. The misaligned memory access would cause the kernel to crash on systems that did not emulate it in firmware and did not support it in hardware. Firmware emulation is slow and may not exist. The RISC-V spec does not guarantee that support for misaligned memory accesses will exist. It should not be depended on.
This patch now checks for XLEN granularity of co-alignment between the pointers. Failing that, copying is done by loading from the 2 contiguous and naturally aligned XLEN memory locations containing the overlapping XLEN sized data to be copied. The data is shifted into the correct place and binary or'ed together on each iteration. The result is then stored into the corresponding naturally aligned XLEN sized location in the destination. For unaligned data at the terminations of the regions to be copied or for copies less than (2 * XLEN) in size, byte copy is used.
This patch also now uses unsigned comparison for the pointers and migrates to the newer assembler annotations from the now deprecated ones.
Signed-off-by: Michael T. Kloos michael@michaelkloos.com Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/riscv/lib/memmove.S | 368 +++++++++++++++++++++++++++++++++------ 1 file changed, 310 insertions(+), 58 deletions(-)
diff --git a/arch/riscv/lib/memmove.S b/arch/riscv/lib/memmove.S index 07d1d2152ba5..e0609e1f0864 100644 --- a/arch/riscv/lib/memmove.S +++ b/arch/riscv/lib/memmove.S @@ -1,64 +1,316 @@ -/* SPDX-License-Identifier: GPL-2.0 */ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2022 Michael T. Kloos michael@michaelkloos.com + */
#include <linux/linkage.h> #include <asm/asm.h>
-ENTRY(__memmove) -WEAK(memmove) - move t0, a0 - move t1, a1 - - beq a0, a1, exit_memcpy - beqz a2, exit_memcpy - srli t2, a2, 0x2 - - slt t3, a0, a1 - beqz t3, do_reverse - - andi a2, a2, 0x3 - li t4, 1 - beqz t2, byte_copy - -word_copy: - lw t3, 0(a1) - addi t2, t2, -1 - addi a1, a1, 4 - sw t3, 0(a0) - addi a0, a0, 4 - bnez t2, word_copy - beqz a2, exit_memcpy - j byte_copy - -do_reverse: - add a0, a0, a2 - add a1, a1, a2 - andi a2, a2, 0x3 - li t4, -1 - beqz t2, reverse_byte_copy - -reverse_word_copy: - addi a1, a1, -4 - addi t2, t2, -1 - lw t3, 0(a1) - addi a0, a0, -4 - sw t3, 0(a0) - bnez t2, reverse_word_copy - beqz a2, exit_memcpy - -reverse_byte_copy: - addi a0, a0, -1 - addi a1, a1, -1 +SYM_FUNC_START(__memmove) +SYM_FUNC_START_WEAK(memmove) + /* + * Returns + * a0 - dest + * + * Parameters + * a0 - Inclusive first byte of dest + * a1 - Inclusive first byte of src + * a2 - Length of copy n + * + * Because the return matches the parameter register a0, + * we will not clobber or modify that register. + * + * Note: This currently only works on little-endian. + * To port to big-endian, reverse the direction of shifts + * in the 2 misaligned fixup copy loops. + */
+ /* Return if nothing to do */ + beq a0, a1, return_from_memmove + beqz a2, return_from_memmove + + /* + * Register Uses + * Forward Copy: a1 - Index counter of src + * Reverse Copy: a4 - Index counter of src + * Forward Copy: t3 - Index counter of dest + * Reverse Copy: t4 - Index counter of dest + * Both Copy Modes: t5 - Inclusive first multibyte/aligned of dest + * Both Copy Modes: t6 - Non-Inclusive last multibyte/aligned of dest + * Both Copy Modes: t0 - Link / Temporary for load-store + * Both Copy Modes: t1 - Temporary for load-store + * Both Copy Modes: t2 - Temporary for load-store + * Both Copy Modes: a5 - dest to src alignment offset + * Both Copy Modes: a6 - Shift ammount + * Both Copy Modes: a7 - Inverse Shift ammount + * Both Copy Modes: a2 - Alternate breakpoint for unrolled loops + */ + + /* + * Solve for some register values now. + * Byte copy does not need t5 or t6. + */ + mv t3, a0 + add t4, a0, a2 + add a4, a1, a2 + + /* + * Byte copy if copying less than (2 * SZREG) bytes. This can + * cause problems with the bulk copy implementation and is + * small enough not to bother. + */ + andi t0, a2, -(2 * SZREG) + beqz t0, byte_copy + + /* + * Now solve for t5 and t6. + */ + andi t5, t3, -SZREG + andi t6, t4, -SZREG + /* + * If dest(Register t3) rounded down to the nearest naturally + * aligned SZREG address, does not equal dest, then add SZREG + * to find the low-bound of SZREG alignment in the dest memory + * region. Note that this could overshoot the dest memory + * region if n is less than SZREG. This is one reason why + * we always byte copy if n is less than SZREG. + * Otherwise, dest is already naturally aligned to SZREG. + */ + beq t5, t3, 1f + addi t5, t5, SZREG + 1: + + /* + * If the dest and src are co-aligned to SZREG, then there is + * no need for the full rigmarole of a full misaligned fixup copy. + * Instead, do a simpler co-aligned copy. + */ + xor t0, a0, a1 + andi t1, t0, (SZREG - 1) + beqz t1, coaligned_copy + /* Fall through to misaligned fixup copy */ + +misaligned_fixup_copy: + bltu a1, a0, misaligned_fixup_copy_reverse + +misaligned_fixup_copy_forward: + jal t0, byte_copy_until_aligned_forward + + andi a5, a1, (SZREG - 1) /* Find the alignment offset of src (a1) */ + slli a6, a5, 3 /* Multiply by 8 to convert that to bits to shift */ + sub a5, a1, t3 /* Find the difference between src and dest */ + andi a1, a1, -SZREG /* Align the src pointer */ + addi a2, t6, SZREG /* The other breakpoint for the unrolled loop*/ + + /* + * Compute The Inverse Shift + * a7 = XLEN - a6 = XLEN + -a6 + * 2s complement negation to find the negative: -a6 = ~a6 + 1 + * Add that to XLEN. XLEN = SZREG * 8. + */ + not a7, a6 + addi a7, a7, (SZREG * 8 + 1) + + /* + * Fix Misalignment Copy Loop - Forward + * load_val0 = load_ptr[0]; + * do { + * load_val1 = load_ptr[1]; + * store_ptr += 2; + * store_ptr[0 - 2] = (load_val0 >> {a6}) | (load_val1 << {a7}); + * + * if (store_ptr == {a2}) + * break; + * + * load_val0 = load_ptr[2]; + * load_ptr += 2; + * store_ptr[1 - 2] = (load_val1 >> {a6}) | (load_val0 << {a7}); + * + * } while (store_ptr != store_ptr_end); + * store_ptr = store_ptr_end; + */ + + REG_L t0, (0 * SZREG)(a1) + 1: + REG_L t1, (1 * SZREG)(a1) + addi t3, t3, (2 * SZREG) + srl t0, t0, a6 + sll t2, t1, a7 + or t2, t0, t2 + REG_S t2, ((0 * SZREG) - (2 * SZREG))(t3) + + beq t3, a2, 2f + + REG_L t0, (2 * SZREG)(a1) + addi a1, a1, (2 * SZREG) + srl t1, t1, a6 + sll t2, t0, a7 + or t2, t1, t2 + REG_S t2, ((1 * SZREG) - (2 * SZREG))(t3) + + bne t3, t6, 1b + 2: + mv t3, t6 /* Fix the dest pointer in case the loop was broken */ + + add a1, t3, a5 /* Restore the src pointer */ + j byte_copy_forward /* Copy any remaining bytes */ + +misaligned_fixup_copy_reverse: + jal t0, byte_copy_until_aligned_reverse + + andi a5, a4, (SZREG - 1) /* Find the alignment offset of src (a4) */ + slli a6, a5, 3 /* Multiply by 8 to convert that to bits to shift */ + sub a5, a4, t4 /* Find the difference between src and dest */ + andi a4, a4, -SZREG /* Align the src pointer */ + addi a2, t5, -SZREG /* The other breakpoint for the unrolled loop*/ + + /* + * Compute The Inverse Shift + * a7 = XLEN - a6 = XLEN + -a6 + * 2s complement negation to find the negative: -a6 = ~a6 + 1 + * Add that to XLEN. XLEN = SZREG * 8. + */ + not a7, a6 + addi a7, a7, (SZREG * 8 + 1) + + /* + * Fix Misalignment Copy Loop - Reverse + * load_val1 = load_ptr[0]; + * do { + * load_val0 = load_ptr[-1]; + * store_ptr -= 2; + * store_ptr[1] = (load_val0 >> {a6}) | (load_val1 << {a7}); + * + * if (store_ptr == {a2}) + * break; + * + * load_val1 = load_ptr[-2]; + * load_ptr -= 2; + * store_ptr[0] = (load_val1 >> {a6}) | (load_val0 << {a7}); + * + * } while (store_ptr != store_ptr_end); + * store_ptr = store_ptr_end; + */ + + REG_L t1, ( 0 * SZREG)(a4) + 1: + REG_L t0, (-1 * SZREG)(a4) + addi t4, t4, (-2 * SZREG) + sll t1, t1, a7 + srl t2, t0, a6 + or t2, t1, t2 + REG_S t2, ( 1 * SZREG)(t4) + + beq t4, a2, 2f + + REG_L t1, (-2 * SZREG)(a4) + addi a4, a4, (-2 * SZREG) + sll t0, t0, a7 + srl t2, t1, a6 + or t2, t0, t2 + REG_S t2, ( 0 * SZREG)(t4) + + bne t4, t5, 1b + 2: + mv t4, t5 /* Fix the dest pointer in case the loop was broken */ + + add a4, t4, a5 /* Restore the src pointer */ + j byte_copy_reverse /* Copy any remaining bytes */ + +/* + * Simple copy loops for SZREG co-aligned memory locations. + * These also make calls to do byte copies for any unaligned + * data at their terminations. + */ +coaligned_copy: + bltu a1, a0, coaligned_copy_reverse + +coaligned_copy_forward: + jal t0, byte_copy_until_aligned_forward + + 1: + REG_L t1, ( 0 * SZREG)(a1) + addi a1, a1, SZREG + addi t3, t3, SZREG + REG_S t1, (-1 * SZREG)(t3) + bne t3, t6, 1b + + j byte_copy_forward /* Copy any remaining bytes */ + +coaligned_copy_reverse: + jal t0, byte_copy_until_aligned_reverse + + 1: + REG_L t1, (-1 * SZREG)(a4) + addi a4, a4, -SZREG + addi t4, t4, -SZREG + REG_S t1, ( 0 * SZREG)(t4) + bne t4, t5, 1b + + j byte_copy_reverse /* Copy any remaining bytes */ + +/* + * These are basically sub-functions within the function. They + * are used to byte copy until the dest pointer is in alignment. + * At which point, a bulk copy method can be used by the + * calling code. These work on the same registers as the bulk + * copy loops. Therefore, the register values can be picked + * up from where they were left and we avoid code duplication + * without any overhead except the call in and return jumps. + */ +byte_copy_until_aligned_forward: + beq t3, t5, 2f + 1: + lb t1, 0(a1) + addi a1, a1, 1 + addi t3, t3, 1 + sb t1, -1(t3) + bne t3, t5, 1b + 2: + jalr zero, 0x0(t0) /* Return to multibyte copy loop */ + +byte_copy_until_aligned_reverse: + beq t4, t6, 2f + 1: + lb t1, -1(a4) + addi a4, a4, -1 + addi t4, t4, -1 + sb t1, 0(t4) + bne t4, t6, 1b + 2: + jalr zero, 0x0(t0) /* Return to multibyte copy loop */ + +/* + * Simple byte copy loops. + * These will byte copy until they reach the end of data to copy. + * At that point, they will call to return from memmove. + */ byte_copy: - lb t3, 0(a1) - addi a2, a2, -1 - sb t3, 0(a0) - add a1, a1, t4 - add a0, a0, t4 - bnez a2, byte_copy - -exit_memcpy: - move a0, t0 - move a1, t1 - ret -END(__memmove) + bltu a1, a0, byte_copy_reverse + +byte_copy_forward: + beq t3, t4, 2f + 1: + lb t1, 0(a1) + addi a1, a1, 1 + addi t3, t3, 1 + sb t1, -1(t3) + bne t3, t4, 1b + 2: + ret + +byte_copy_reverse: + beq t4, t3, 2f + 1: + lb t1, -1(a4) + addi a4, a4, -1 + addi t4, t4, -1 + sb t1, 0(t4) + bne t4, t3, 1b + 2: + +return_from_memmove: + ret + +SYM_FUNC_END(memmove) +SYM_FUNC_END(__memmove)
Backporting to 5.17 looks good to me.
Michael
On Tue, 2022-04-12 at 08:29 +0200, Greg Kroah-Hartman wrote:
From: Michael T. Kloos michael@michaelkloos.com
[ Upstream commit 9d1f0ec9f71780e69ceb9d91697600c747d6e02e ]
Rewrote the RISC-V memmove() assembly implementation. The previous implementation did not check memory alignment and it compared 2 pointers with a signed comparison. The misaligned memory access would cause the kernel to crash on systems that did not emulate it in firmware and did not support it in hardware. Firmware emulation is slow and may not exist. The RISC-V spec does not guarantee that support for misaligned memory accesses will exist. It should not be depended on.
This patch now checks for XLEN granularity of co-alignment between the pointers. Failing that, copying is done by loading from the 2 contiguous and naturally aligned XLEN memory locations containing the overlapping XLEN sized data to be copied. The data is shifted into the correct place and binary or'ed together on each iteration. The result is then stored into the corresponding naturally aligned XLEN sized location in the destination. For unaligned data at the terminations of the regions to be copied or for copies less than (2 * XLEN) in size, byte copy is used.
This patch also now uses unsigned comparison for the pointers and migrates to the newer assembler annotations from the now deprecated ones.
Signed-off-by: Michael T. Kloos michael@michaelkloos.com Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Sasha Levin sashal@kernel.org
arch/riscv/lib/memmove.S | 368 +++++++++++++++++++++++++++++++++------ 1 file changed, 310 insertions(+), 58 deletions(-)
diff --git a/arch/riscv/lib/memmove.S b/arch/riscv/lib/memmove.S index 07d1d2152ba5..e0609e1f0864 100644 --- a/arch/riscv/lib/memmove.S +++ b/arch/riscv/lib/memmove.S @@ -1,64 +1,316 @@ -/* SPDX-License-Identifier: GPL-2.0 */ +/* SPDX-License-Identifier: GPL-2.0-only */ +/*
- Copyright (C) 2022 Michael T. Kloos michael@michaelkloos.com
- */
#include <linux/linkage.h> #include <asm/asm.h> -ENTRY(__memmove) -WEAK(memmove)
move t0, a0
move t1, a1
beq a0, a1, exit_memcpy
beqz a2, exit_memcpy
srli t2, a2, 0x2
slt t3, a0, a1
beqz t3, do_reverse
andi a2, a2, 0x3
li t4, 1
beqz t2, byte_copy
-word_copy:
lw t3, 0(a1)
addi t2, t2, -1
addi a1, a1, 4
sw t3, 0(a0)
addi a0, a0, 4
bnez t2, word_copy
beqz a2, exit_memcpy
j byte_copy
-do_reverse:
add a0, a0, a2
add a1, a1, a2
andi a2, a2, 0x3
li t4, -1
beqz t2, reverse_byte_copy
-reverse_word_copy:
addi a1, a1, -4
addi t2, t2, -1
lw t3, 0(a1)
addi a0, a0, -4
sw t3, 0(a0)
bnez t2, reverse_word_copy
beqz a2, exit_memcpy
-reverse_byte_copy:
addi a0, a0, -1
addi a1, a1, -1
+SYM_FUNC_START(__memmove) +SYM_FUNC_START_WEAK(memmove)
- /*
* Returns
* a0 - dest
*
* Parameters
* a0 - Inclusive first byte of dest
* a1 - Inclusive first byte of src
* a2 - Length of copy n
*
* Because the return matches the parameter register a0,
* we will not clobber or modify that register.
*
* Note: This currently only works on little-endian.
* To port to big-endian, reverse the direction of shifts
* in the 2 misaligned fixup copy loops.
*/
- /* Return if nothing to do */
- beq a0, a1, return_from_memmove
- beqz a2, return_from_memmove
- /*
* Register Uses
* Forward Copy: a1 - Index counter of src
* Reverse Copy: a4 - Index counter of src
* Forward Copy: t3 - Index counter of dest
* Reverse Copy: t4 - Index counter of dest
* Both Copy Modes: t5 - Inclusive first multibyte/aligned of dest
* Both Copy Modes: t6 - Non-Inclusive last multibyte/aligned of dest
* Both Copy Modes: t0 - Link / Temporary for load-store
* Both Copy Modes: t1 - Temporary for load-store
* Both Copy Modes: t2 - Temporary for load-store
* Both Copy Modes: a5 - dest to src alignment offset
* Both Copy Modes: a6 - Shift ammount
* Both Copy Modes: a7 - Inverse Shift ammount
* Both Copy Modes: a2 - Alternate breakpoint for unrolled loops
*/
- /*
* Solve for some register values now.
* Byte copy does not need t5 or t6.
*/
- mv t3, a0
- add t4, a0, a2
- add a4, a1, a2
- /*
* Byte copy if copying less than (2 * SZREG) bytes. This can
* cause problems with the bulk copy implementation and is
* small enough not to bother.
*/
- andi t0, a2, -(2 * SZREG)
- beqz t0, byte_copy
- /*
* Now solve for t5 and t6.
*/
- andi t5, t3, -SZREG
- andi t6, t4, -SZREG
- /*
* If dest(Register t3) rounded down to the nearest naturally
* aligned SZREG address, does not equal dest, then add SZREG
* to find the low-bound of SZREG alignment in the dest memory
* region. Note that this could overshoot the dest memory
* region if n is less than SZREG. This is one reason why
* we always byte copy if n is less than SZREG.
* Otherwise, dest is already naturally aligned to SZREG.
*/
- beq t5, t3, 1f
addi t5, t5, SZREG
- 1:
- /*
* If the dest and src are co-aligned to SZREG, then there is
* no need for the full rigmarole of a full misaligned fixup copy.
* Instead, do a simpler co-aligned copy.
*/
- xor t0, a0, a1
- andi t1, t0, (SZREG - 1)
- beqz t1, coaligned_copy
- /* Fall through to misaligned fixup copy */
+misaligned_fixup_copy:
- bltu a1, a0, misaligned_fixup_copy_reverse
+misaligned_fixup_copy_forward:
- jal t0, byte_copy_until_aligned_forward
- andi a5, a1, (SZREG - 1) /* Find the alignment offset of src (a1) */
- slli a6, a5, 3 /* Multiply by 8 to convert that to bits to shift */
- sub a5, a1, t3 /* Find the difference between src and dest */
- andi a1, a1, -SZREG /* Align the src pointer */
- addi a2, t6, SZREG /* The other breakpoint for the unrolled loop*/
- /*
* Compute The Inverse Shift
* a7 = XLEN - a6 = XLEN + -a6
* 2s complement negation to find the negative: -a6 = ~a6 + 1
* Add that to XLEN. XLEN = SZREG * 8.
*/
- not a7, a6
- addi a7, a7, (SZREG * 8 + 1)
- /*
* Fix Misalignment Copy Loop - Forward
* load_val0 = load_ptr[0];
* do {
* load_val1 = load_ptr[1];
* store_ptr += 2;
* store_ptr[0 - 2] = (load_val0 >> {a6}) | (load_val1 << {a7});
*
* if (store_ptr == {a2})
* break;
*
* load_val0 = load_ptr[2];
* load_ptr += 2;
* store_ptr[1 - 2] = (load_val1 >> {a6}) | (load_val0 << {a7});
*
* } while (store_ptr != store_ptr_end);
* store_ptr = store_ptr_end;
*/
- REG_L t0, (0 * SZREG)(a1)
- 1:
- REG_L t1, (1 * SZREG)(a1)
- addi t3, t3, (2 * SZREG)
- srl t0, t0, a6
- sll t2, t1, a7
- or t2, t0, t2
- REG_S t2, ((0 * SZREG) - (2 * SZREG))(t3)
- beq t3, a2, 2f
- REG_L t0, (2 * SZREG)(a1)
- addi a1, a1, (2 * SZREG)
- srl t1, t1, a6
- sll t2, t0, a7
- or t2, t1, t2
- REG_S t2, ((1 * SZREG) - (2 * SZREG))(t3)
- bne t3, t6, 1b
- 2:
- mv t3, t6 /* Fix the dest pointer in case the loop was broken */
- add a1, t3, a5 /* Restore the src pointer */
- j byte_copy_forward /* Copy any remaining bytes */
+misaligned_fixup_copy_reverse:
- jal t0, byte_copy_until_aligned_reverse
- andi a5, a4, (SZREG - 1) /* Find the alignment offset of src (a4) */
- slli a6, a5, 3 /* Multiply by 8 to convert that to bits to shift */
- sub a5, a4, t4 /* Find the difference between src and dest */
- andi a4, a4, -SZREG /* Align the src pointer */
- addi a2, t5, -SZREG /* The other breakpoint for the unrolled loop*/
- /*
* Compute The Inverse Shift
* a7 = XLEN - a6 = XLEN + -a6
* 2s complement negation to find the negative: -a6 = ~a6 + 1
* Add that to XLEN. XLEN = SZREG * 8.
*/
- not a7, a6
- addi a7, a7, (SZREG * 8 + 1)
- /*
* Fix Misalignment Copy Loop - Reverse
* load_val1 = load_ptr[0];
* do {
* load_val0 = load_ptr[-1];
* store_ptr -= 2;
* store_ptr[1] = (load_val0 >> {a6}) | (load_val1 << {a7});
*
* if (store_ptr == {a2})
* break;
*
* load_val1 = load_ptr[-2];
* load_ptr -= 2;
* store_ptr[0] = (load_val1 >> {a6}) | (load_val0 << {a7});
*
* } while (store_ptr != store_ptr_end);
* store_ptr = store_ptr_end;
*/
- REG_L t1, ( 0 * SZREG)(a4)
- 1:
- REG_L t0, (-1 * SZREG)(a4)
- addi t4, t4, (-2 * SZREG)
- sll t1, t1, a7
- srl t2, t0, a6
- or t2, t1, t2
- REG_S t2, ( 1 * SZREG)(t4)
- beq t4, a2, 2f
- REG_L t1, (-2 * SZREG)(a4)
- addi a4, a4, (-2 * SZREG)
- sll t0, t0, a7
- srl t2, t1, a6
- or t2, t0, t2
- REG_S t2, ( 0 * SZREG)(t4)
- bne t4, t5, 1b
- 2:
- mv t4, t5 /* Fix the dest pointer in case the loop was broken */
- add a4, t4, a5 /* Restore the src pointer */
- j byte_copy_reverse /* Copy any remaining bytes */
+/*
- Simple copy loops for SZREG co-aligned memory locations.
- These also make calls to do byte copies for any unaligned
- data at their terminations.
- */
+coaligned_copy:
- bltu a1, a0, coaligned_copy_reverse
+coaligned_copy_forward:
- jal t0, byte_copy_until_aligned_forward
- 1:
- REG_L t1, ( 0 * SZREG)(a1)
- addi a1, a1, SZREG
- addi t3, t3, SZREG
- REG_S t1, (-1 * SZREG)(t3)
- bne t3, t6, 1b
- j byte_copy_forward /* Copy any remaining bytes */
+coaligned_copy_reverse:
- jal t0, byte_copy_until_aligned_reverse
- 1:
- REG_L t1, (-1 * SZREG)(a4)
- addi a4, a4, -SZREG
- addi t4, t4, -SZREG
- REG_S t1, ( 0 * SZREG)(t4)
- bne t4, t5, 1b
- j byte_copy_reverse /* Copy any remaining bytes */
+/*
- These are basically sub-functions within the function. They
- are used to byte copy until the dest pointer is in alignment.
- At which point, a bulk copy method can be used by the
- calling code. These work on the same registers as the bulk
- copy loops. Therefore, the register values can be picked
- up from where they were left and we avoid code duplication
- without any overhead except the call in and return jumps.
- */
+byte_copy_until_aligned_forward:
- beq t3, t5, 2f
- 1:
- lb t1, 0(a1)
- addi a1, a1, 1
- addi t3, t3, 1
- sb t1, -1(t3)
- bne t3, t5, 1b
- 2:
- jalr zero, 0x0(t0) /* Return to multibyte copy loop */
+byte_copy_until_aligned_reverse:
- beq t4, t6, 2f
- 1:
- lb t1, -1(a4)
- addi a4, a4, -1
- addi t4, t4, -1
- sb t1, 0(t4)
- bne t4, t6, 1b
- 2:
- jalr zero, 0x0(t0) /* Return to multibyte copy loop */
+/*
- Simple byte copy loops.
- These will byte copy until they reach the end of data to copy.
- At that point, they will call to return from memmove.
- */
byte_copy:
lb t3, 0(a1)
addi a2, a2, -1
sb t3, 0(a0)
add a1, a1, t4
add a0, a0, t4
bnez a2, byte_copy
-exit_memcpy:
move a0, t0
move a1, t1
ret
-END(__memmove)
- bltu a1, a0, byte_copy_reverse
+byte_copy_forward:
- beq t3, t4, 2f
- 1:
- lb t1, 0(a1)
- addi a1, a1, 1
- addi t3, t3, 1
- sb t1, -1(t3)
- bne t3, t4, 1b
- 2:
- ret
+byte_copy_reverse:
- beq t4, t3, 2f
- 1:
- lb t1, -1(a4)
- addi a4, a4, -1
- addi t4, t4, -1
- sb t1, 0(t4)
- bne t4, t3, 1b
- 2:
+return_from_memmove:
- ret
+SYM_FUNC_END(memmove) +SYM_FUNC_END(__memmove)
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 5d26cff5bdbebdf98ba48217c078ff102536f134 ]
George reports that altnames can eat up kernel memory. We should charge that memory appropriately.
Reported-by: George Shuklin george.shuklin@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/rtnetlink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 2fb8eb6791e8..9c9ad3d4b766 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -3658,7 +3658,7 @@ static int rtnl_alt_ifname(int cmd, struct net_device *dev, struct nlattr *attr, if (err) return err;
- alt_ifname = nla_strdup(attr, GFP_KERNEL); + alt_ifname = nla_strdup(attr, GFP_KERNEL_ACCOUNT); if (!alt_ifname) return -ENOMEM;
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 155fb43b70b5fce341347a77d1af2765d1e8fbb8 ]
Property list (altname is a link "property") is wrapped in a nlattr. nlattrs length is 16bit so practically speaking the list of properties can't be longer than that, otherwise user space would have to interpret broken netlink messages.
Prevent the problem from occurring by checking the length of the property list before adding new entries.
Reported-by: George Shuklin george.shuklin@gmail.com Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/rtnetlink.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 9c9ad3d4b766..43b995e935cd 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -3652,12 +3652,23 @@ static int rtnl_alt_ifname(int cmd, struct net_device *dev, struct nlattr *attr, bool *changed, struct netlink_ext_ack *extack) { char *alt_ifname; + size_t size; int err;
err = nla_validate(attr, attr->nla_len, IFLA_MAX, ifla_policy, extack); if (err) return err;
+ if (cmd == RTM_NEWLINKPROP) { + size = rtnl_prop_list_size(dev); + size += nla_total_size(ALTIFNAMSIZ); + if (size >= U16_MAX) { + NL_SET_ERR_MSG(extack, + "effective property list too long"); + return -EINVAL; + } + } + alt_ifname = nla_strdup(attr, GFP_KERNEL_ACCOUNT); if (!alt_ifname) return -ENOMEM;
From: Gal Pressman gal@nvidia.com
[ Upstream commit 970adfb76095fa719778d70a6b86030d2feb88dd ]
Unlike the legacy EEPROM callbacks, when using the netlink EEPROM query (get_module_eeprom_by_page) the driver should not try to validate the query parameters, but just perform the read requested by the userspace.
Recent discussion in the mailing list: https://lore.kernel.org/netdev/20220120093051.70845141@kicinski-fedora-PC1C0...
Signed-off-by: Gal Pressman gal@nvidia.com Reviewed-by: Ido Schimmel idosch@nvidia.com Reviewed-by: Maxim Mikityanskiy maximmi@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/mellanox/mlx5/core/port.c | 23 ------------------- 1 file changed, 23 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c b/drivers/net/ethernet/mellanox/mlx5/core/port.c index 7b16a1188aab..fd79860de723 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/port.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c @@ -433,35 +433,12 @@ int mlx5_query_module_eeprom_by_page(struct mlx5_core_dev *dev, struct mlx5_module_eeprom_query_params *params, u8 *data) { - u8 module_id; int err;
err = mlx5_query_module_num(dev, ¶ms->module_number); if (err) return err;
- err = mlx5_query_module_id(dev, params->module_number, &module_id); - if (err) - return err; - - switch (module_id) { - case MLX5_MODULE_ID_SFP: - if (params->page > 0) - return -EINVAL; - break; - case MLX5_MODULE_ID_QSFP: - case MLX5_MODULE_ID_QSFP28: - case MLX5_MODULE_ID_QSFP_PLUS: - if (params->page > 3) - return -EINVAL; - break; - case MLX5_MODULE_ID_DSFP: - break; - default: - mlx5_core_err(dev, "Module ID not recognized: 0x%x\n", module_id); - return -EINVAL; - } - if (params->i2c_address != MLX5_I2C_ADDR_HIGH && params->i2c_address != MLX5_I2C_ADDR_LOW) { mlx5_core_err(dev, "I2C address not recognized: 0x%x\n", params->i2c_address);
From: Jorge Lopez jorge.lopez2@hp.com
[ Upstream commit 520ee4ea1cc60251a6e3c911cf0336278aa52634 ]
The purpose of this patch is to introduce a fix and removal of the current hack when determining tablet mode status.
Determining the tablet mode status requires reading Byte 0 bit 2 as reported by HPWMI_HARDWARE_QUERY. The investigation identified the failure was rooted in two areas: HPWMI_HARDWARE_QUERY failure (0x05) and reading Byte 0, bit 2 only to determine the table mode status. HPWMI_HARDWARE_QUERY WMI failure also rendered the dock state value invalid.
The latest changes use SMBIOS Type 3 (chassis type) and WMI Command 0x40 (device_mode_status) information to determine if the device is in tablet mode or not.
hp_wmi_hw_state function was split into two functions; hp_wmi_get_dock_state and hp_wmi_get_tablet_mode. The new functions separate how dock_state and tablet_mode is handled in a cleaner manner.
All changes were validated on a HP ZBook Workstation notebook, HP EliteBook x360, and HP EliteBook 850 G8.
Signed-off-by: Jorge Lopez jorge.lopez2@hp.com Link: https://lore.kernel.org/r/20220310210853.28367-3-jorge.lopez2@hp.com Reviewed-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/platform/x86/hp-wmi.c | 71 +++++++++++++++++++++++++---------- 1 file changed, 52 insertions(+), 19 deletions(-)
diff --git a/drivers/platform/x86/hp-wmi.c b/drivers/platform/x86/hp-wmi.c index 48a46466f086..f822ef6eb93c 100644 --- a/drivers/platform/x86/hp-wmi.c +++ b/drivers/platform/x86/hp-wmi.c @@ -35,10 +35,6 @@ MODULE_LICENSE("GPL"); MODULE_ALIAS("wmi:95F24279-4D7B-4334-9387-ACCDC67EF61C"); MODULE_ALIAS("wmi:5FB7F034-2C63-45e9-BE91-3D44E2C707E4");
-static int enable_tablet_mode_sw = -1; -module_param(enable_tablet_mode_sw, int, 0444); -MODULE_PARM_DESC(enable_tablet_mode_sw, "Enable SW_TABLET_MODE reporting (-1=auto, 0=no, 1=yes)"); - #define HPWMI_EVENT_GUID "95F24279-4D7B-4334-9387-ACCDC67EF61C" #define HPWMI_BIOS_GUID "5FB7F034-2C63-45e9-BE91-3D44E2C707E4" #define HP_OMEN_EC_THERMAL_PROFILE_OFFSET 0x95 @@ -107,6 +103,7 @@ enum hp_wmi_commandtype { HPWMI_FEATURE2_QUERY = 0x0d, HPWMI_WIRELESS2_QUERY = 0x1b, HPWMI_POSTCODEERROR_QUERY = 0x2a, + HPWMI_SYSTEM_DEVICE_MODE = 0x40, HPWMI_THERMAL_PROFILE_QUERY = 0x4c, };
@@ -217,6 +214,19 @@ struct rfkill2_device { static int rfkill2_count; static struct rfkill2_device rfkill2[HPWMI_MAX_RFKILL2_DEVICES];
+/* + * Chassis Types values were obtained from SMBIOS reference + * specification version 3.00. A complete list of system enclosures + * and chassis types is available on Table 17. + */ +static const char * const tablet_chassis_types[] = { + "30", /* Tablet*/ + "31", /* Convertible */ + "32" /* Detachable */ +}; + +#define DEVICE_MODE_TABLET 0x06 + /* map output size to the corresponding WMI method id */ static inline int encode_outsize_for_pvsz(int outsize) { @@ -345,14 +355,39 @@ static int hp_wmi_read_int(int query) return val; }
-static int hp_wmi_hw_state(int mask) +static int hp_wmi_get_dock_state(void) { int state = hp_wmi_read_int(HPWMI_HARDWARE_QUERY);
if (state < 0) return state;
- return !!(state & mask); + return !!(state & HPWMI_DOCK_MASK); +} + +static int hp_wmi_get_tablet_mode(void) +{ + char system_device_mode[4] = { 0 }; + const char *chassis_type; + bool tablet_found; + int ret; + + chassis_type = dmi_get_system_info(DMI_CHASSIS_TYPE); + if (!chassis_type) + return -ENODEV; + + tablet_found = match_string(tablet_chassis_types, + ARRAY_SIZE(tablet_chassis_types), + chassis_type) >= 0; + if (!tablet_found) + return -ENODEV; + + ret = hp_wmi_perform_query(HPWMI_SYSTEM_DEVICE_MODE, HPWMI_READ, + system_device_mode, 0, sizeof(system_device_mode)); + if (ret < 0) + return ret; + + return system_device_mode[0] == DEVICE_MODE_TABLET; }
static int omen_thermal_profile_set(int mode) @@ -568,7 +603,7 @@ static ssize_t als_show(struct device *dev, struct device_attribute *attr, static ssize_t dock_show(struct device *dev, struct device_attribute *attr, char *buf) { - int value = hp_wmi_hw_state(HPWMI_DOCK_MASK); + int value = hp_wmi_get_dock_state(); if (value < 0) return value; return sprintf(buf, "%d\n", value); @@ -577,7 +612,7 @@ static ssize_t dock_show(struct device *dev, struct device_attribute *attr, static ssize_t tablet_show(struct device *dev, struct device_attribute *attr, char *buf) { - int value = hp_wmi_hw_state(HPWMI_TABLET_MASK); + int value = hp_wmi_get_tablet_mode(); if (value < 0) return value; return sprintf(buf, "%d\n", value); @@ -699,10 +734,10 @@ static void hp_wmi_notify(u32 value, void *context) case HPWMI_DOCK_EVENT: if (test_bit(SW_DOCK, hp_wmi_input_dev->swbit)) input_report_switch(hp_wmi_input_dev, SW_DOCK, - hp_wmi_hw_state(HPWMI_DOCK_MASK)); + hp_wmi_get_dock_state()); if (test_bit(SW_TABLET_MODE, hp_wmi_input_dev->swbit)) input_report_switch(hp_wmi_input_dev, SW_TABLET_MODE, - hp_wmi_hw_state(HPWMI_TABLET_MASK)); + hp_wmi_get_tablet_mode()); input_sync(hp_wmi_input_dev); break; case HPWMI_PARK_HDD: @@ -780,19 +815,17 @@ static int __init hp_wmi_input_setup(void) __set_bit(EV_SW, hp_wmi_input_dev->evbit);
/* Dock */ - val = hp_wmi_hw_state(HPWMI_DOCK_MASK); + val = hp_wmi_get_dock_state(); if (!(val < 0)) { __set_bit(SW_DOCK, hp_wmi_input_dev->swbit); input_report_switch(hp_wmi_input_dev, SW_DOCK, val); }
/* Tablet mode */ - if (enable_tablet_mode_sw > 0) { - val = hp_wmi_hw_state(HPWMI_TABLET_MASK); - if (val >= 0) { - __set_bit(SW_TABLET_MODE, hp_wmi_input_dev->swbit); - input_report_switch(hp_wmi_input_dev, SW_TABLET_MODE, val); - } + val = hp_wmi_get_tablet_mode(); + if (!(val < 0)) { + __set_bit(SW_TABLET_MODE, hp_wmi_input_dev->swbit); + input_report_switch(hp_wmi_input_dev, SW_TABLET_MODE, val); }
err = sparse_keymap_setup(hp_wmi_input_dev, hp_wmi_keymap, NULL); @@ -1227,10 +1260,10 @@ static int hp_wmi_resume_handler(struct device *device) if (hp_wmi_input_dev) { if (test_bit(SW_DOCK, hp_wmi_input_dev->swbit)) input_report_switch(hp_wmi_input_dev, SW_DOCK, - hp_wmi_hw_state(HPWMI_DOCK_MASK)); + hp_wmi_get_dock_state()); if (test_bit(SW_TABLET_MODE, hp_wmi_input_dev->swbit)) input_report_switch(hp_wmi_input_dev, SW_TABLET_MODE, - hp_wmi_hw_state(HPWMI_TABLET_MASK)); + hp_wmi_get_tablet_mode()); input_sync(hp_wmi_input_dev); }
From: Jorge Lopez jorge.lopez2@hp.com
[ Upstream commit be9d73e64957bbd31ee9a0d11adc0f720974c558 ]
Several WMI queries leverage hp_wmi_read_int function to read their data. hp_wmi_read_int function was corrected in a previous patch. Now, this function invokes hp_wmi_perform_query with input parameter of size zero and the output buffer of size 4.
WMI commands calling hp_wmi_perform_query with input buffer size value of zero are listed below.
HPWMI_DISPLAY_QUERY HPWMI_HDDTEMP_QUERY HPWMI_ALS_QUERY HPWMI_HARDWARE_QUERY HPWMI_WIRELESS_QUERY HPWMI_BIOS_QUERY HPWMI_FEATURE_QUERY HPWMI_HOTKEY_QUERY HPWMI_FEATURE2_QUERY HPWMI_WIRELESS2_QUERY HPWMI_POSTCODEERROR_QUERY HPWMI_THERMAL_PROFILE_QUERY HPWMI_FAN_SPEED_MAX_GET_QUERY
Invoking those WMI commands with an input buffer size greater than zero will cause error 0x05 to be returned.
All WMI commands executed by the driver were reviewed and changes were made to ensure the expected input and output buffer size match the WMI specification.
Changes were validated on a HP ZBook Workstation notebook, HP EliteBook x360, and HP EliteBook 850 G8. Additional validation was included in the test process to ensure no other commands were incorrectly handled.
Signed-off-by: Jorge Lopez jorge.lopez2@hp.com Link: https://lore.kernel.org/r/20220310210853.28367-4-jorge.lopez2@hp.com Reviewed-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/platform/x86/hp-wmi.c | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/drivers/platform/x86/hp-wmi.c b/drivers/platform/x86/hp-wmi.c index f822ef6eb93c..88f0bfd6ecf1 100644 --- a/drivers/platform/x86/hp-wmi.c +++ b/drivers/platform/x86/hp-wmi.c @@ -330,7 +330,7 @@ static int hp_wmi_get_fan_speed(int fan) char fan_data[4] = { fan, 0, 0, 0 };
int ret = hp_wmi_perform_query(HPWMI_FAN_SPEED_GET_QUERY, HPWMI_GM, - &fan_data, sizeof(fan_data), + &fan_data, sizeof(char), sizeof(fan_data));
if (ret != 0) @@ -399,7 +399,7 @@ static int omen_thermal_profile_set(int mode) return -EINVAL;
ret = hp_wmi_perform_query(HPWMI_SET_PERFORMANCE_MODE, HPWMI_GM, - &buffer, sizeof(buffer), sizeof(buffer)); + &buffer, sizeof(buffer), 0);
if (ret) return ret < 0 ? ret : -EINVAL; @@ -436,7 +436,7 @@ static int hp_wmi_fan_speed_max_set(int enabled) int ret;
ret = hp_wmi_perform_query(HPWMI_FAN_SPEED_MAX_SET_QUERY, HPWMI_GM, - &enabled, sizeof(enabled), sizeof(enabled)); + &enabled, sizeof(enabled), 0);
if (ret) return ret < 0 ? ret : -EINVAL; @@ -449,7 +449,7 @@ static int hp_wmi_fan_speed_max_get(void) int val = 0, ret;
ret = hp_wmi_perform_query(HPWMI_FAN_SPEED_MAX_GET_QUERY, HPWMI_GM, - &val, sizeof(val), sizeof(val)); + &val, 0, sizeof(val));
if (ret) return ret < 0 ? ret : -EINVAL; @@ -461,7 +461,7 @@ static int __init hp_wmi_bios_2008_later(void) { int state = 0; int ret = hp_wmi_perform_query(HPWMI_FEATURE_QUERY, HPWMI_READ, &state, - sizeof(state), sizeof(state)); + 0, sizeof(state)); if (!ret) return 1;
@@ -472,7 +472,7 @@ static int __init hp_wmi_bios_2009_later(void) { u8 state[128]; int ret = hp_wmi_perform_query(HPWMI_FEATURE2_QUERY, HPWMI_READ, &state, - sizeof(state), sizeof(state)); + 0, sizeof(state)); if (!ret) return 1;
@@ -550,7 +550,7 @@ static int hp_wmi_rfkill2_refresh(void) int err, i;
err = hp_wmi_perform_query(HPWMI_WIRELESS2_QUERY, HPWMI_READ, &state, - sizeof(state), sizeof(state)); + 0, sizeof(state)); if (err) return err;
@@ -639,7 +639,7 @@ static ssize_t als_store(struct device *dev, struct device_attribute *attr, return ret;
ret = hp_wmi_perform_query(HPWMI_ALS_QUERY, HPWMI_WRITE, &tmp, - sizeof(tmp), sizeof(tmp)); + sizeof(tmp), 0); if (ret) return ret < 0 ? ret : -EINVAL;
@@ -660,9 +660,9 @@ static ssize_t postcode_store(struct device *dev, struct device_attribute *attr, if (clear == false) return -EINVAL;
- /* Clear the POST error code. It is kept until until cleared. */ + /* Clear the POST error code. It is kept until cleared. */ ret = hp_wmi_perform_query(HPWMI_POSTCODEERROR_QUERY, HPWMI_WRITE, &tmp, - sizeof(tmp), sizeof(tmp)); + sizeof(tmp), 0); if (ret) return ret < 0 ? ret : -EINVAL;
@@ -952,7 +952,7 @@ static int __init hp_wmi_rfkill2_setup(struct platform_device *device) int err, i;
err = hp_wmi_perform_query(HPWMI_WIRELESS2_QUERY, HPWMI_READ, &state, - sizeof(state), sizeof(state)); + 0, sizeof(state)); if (err) return err < 0 ? err : -EINVAL;
From: Michael Walle michael@walle.cc
[ Upstream commit 00eec9fe4f3b9588b4bfa8ef9dd0aae96407d5d7 ]
The Lantech 8330-262D-E module is 2500base-X capable, but it reports the nominal bitrate as 2500MBd instead of 3125MBd. Add a quirk for the module.
The following in an EEPROM dump of such a SFP with the serial number redacted:
00: 03 04 07 00 00 00 01 20 40 0c 05 01 19 00 00 00 ???...? @????... 10: 1e 0f 00 00 4c 61 6e 74 65 63 68 20 20 20 20 20 ??..Lantech 20: 20 20 20 20 00 00 00 00 38 33 33 30 2d 32 36 32 ....8330-262 30: 44 2d 45 20 20 20 20 20 56 31 2e 30 03 52 00 cb D-E V1.0?R.? 40: 00 1a 00 00 46 43 XX XX XX XX XX XX XX XX XX XX .?..FCXXXXXXXXXX 50: 20 20 20 20 32 32 30 32 31 34 20 20 68 b0 01 98 220214 h??? 60: 45 58 54 52 45 4d 45 4c 59 20 43 4f 4d 50 41 54 EXTREMELY COMPAT 70: 49 42 4c 45 20 20 20 20 20 20 20 20 20 20 20 20 IBLE
Signed-off-by: Michael Walle michael@walle.cc Link: https://lore.kernel.org/r/20220312205014.4154907-1-michael@walle.cc Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/phy/sfp-bus.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/net/phy/sfp-bus.c b/drivers/net/phy/sfp-bus.c index c1512c9925a6..15aa5ac1ff49 100644 --- a/drivers/net/phy/sfp-bus.c +++ b/drivers/net/phy/sfp-bus.c @@ -74,6 +74,12 @@ static const struct sfp_quirk sfp_quirks[] = { .vendor = "HUAWEI", .part = "MA5671A", .modes = sfp_quirk_2500basex, + }, { + // Lantech 8330-262D-E can operate at 2500base-X, but + // incorrectly report 2500MBd NRZ in their EEPROM + .vendor = "Lantech", + .part = "8330-262D-E", + .modes = sfp_quirk_2500basex, }, { .vendor = "UBNT", .part = "UF-INSTANT",
From: H. Nikolaus Schaller hns@goldelico.com
[ Upstream commit ac01df343e5a6c6bcead2ed421af1fde30f73e7e ]
Usually, the vbus_regulator (smps10 on omap5evm) boots up disabled.
Hence calling regulator_disable() indirectly through dwc3_omap_set_mailbox() during probe leads to:
[ 10.332764] WARNING: CPU: 0 PID: 1628 at drivers/regulator/core.c:2853 _regulator_disable+0x40/0x164 [ 10.351919] unbalanced disables for smps10_out1 [ 10.361298] Modules linked in: dwc3_omap(+) clk_twl6040 at24 gpio_twl6040 palmas_gpadc palmas_pwrbutton industrialio snd_soc_omap_mcbsp(+) snd_soc_ti_sdma display_connector ti_tpd12s015 drm leds_gpio drm_panel_orientation_quirks ip_tables x_tables ipv6 autofs4 [ 10.387818] CPU: 0 PID: 1628 Comm: systemd-udevd Not tainted 5.17.0-rc1-letux-lpae+ #8139 [ 10.405129] Hardware name: Generic OMAP5 (Flattened Device Tree) [ 10.411455] unwind_backtrace from show_stack+0x10/0x14 [ 10.416970] show_stack from dump_stack_lvl+0x40/0x4c [ 10.422313] dump_stack_lvl from __warn+0xb8/0x170 [ 10.427377] __warn from warn_slowpath_fmt+0x70/0x9c [ 10.432595] warn_slowpath_fmt from _regulator_disable+0x40/0x164 [ 10.439037] _regulator_disable from regulator_disable+0x30/0x64 [ 10.445382] regulator_disable from dwc3_omap_set_mailbox+0x8c/0xf0 [dwc3_omap] [ 10.453116] dwc3_omap_set_mailbox [dwc3_omap] from dwc3_omap_probe+0x2b8/0x394 [dwc3_omap] [ 10.467021] dwc3_omap_probe [dwc3_omap] from platform_probe+0x58/0xa8 [ 10.481762] platform_probe from really_probe+0x168/0x2fc [ 10.481782] really_probe from __driver_probe_device+0xc4/0xd8 [ 10.481782] __driver_probe_device from driver_probe_device+0x24/0xa4 [ 10.503762] driver_probe_device from __driver_attach+0xc4/0xd8 [ 10.510018] __driver_attach from bus_for_each_dev+0x64/0xa0 [ 10.516001] bus_for_each_dev from bus_add_driver+0x148/0x1a4 [ 10.524880] bus_add_driver from driver_register+0xb4/0xf8 [ 10.530678] driver_register from do_one_initcall+0x90/0x1c4 [ 10.536661] do_one_initcall from do_init_module+0x4c/0x200 [ 10.536683] do_init_module from load_module+0x13dc/0x1910 [ 10.551159] load_module from sys_finit_module+0xc8/0xd8 [ 10.561319] sys_finit_module from __sys_trace_return+0x0/0x18 [ 10.561336] Exception stack(0xc344bfa8 to 0xc344bff0) [ 10.561341] bfa0: b6fb5778 b6fab8d8 00000007 b6ecfbb8 00000000 b6ed0398 [ 10.561341] bfc0: b6fb5778 b6fab8d8 855c0500 0000017b 00020000 b6f9a3cc 00000000 b6fb5778 [ 10.595500] bfe0: bede18f8 bede18e8 b6ec9aeb b6dda1c2 [ 10.601345] ---[ end trace 0000000000000000 ]---
Fix this unnecessary warning by checking if the regulator is enabled.
Signed-off-by: H. Nikolaus Schaller hns@goldelico.com Link: https://lore.kernel.org/r/af3b750dc2265d875deaabcf5f80098c9645da45.164674461... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/dwc3/dwc3-omap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/dwc3-omap.c b/drivers/usb/dwc3/dwc3-omap.c index e196673f5c64..efaf0db595f4 100644 --- a/drivers/usb/dwc3/dwc3-omap.c +++ b/drivers/usb/dwc3/dwc3-omap.c @@ -242,7 +242,7 @@ static void dwc3_omap_set_mailbox(struct dwc3_omap *omap, break;
case OMAP_DWC3_ID_FLOAT: - if (omap->vbus_reg) + if (omap->vbus_reg && regulator_is_enabled(omap->vbus_reg)) regulator_disable(omap->vbus_reg); val = dwc3_omap_read_utmi_ctrl(omap); val |= USBOTGSS_UTMI_OTG_CTRL_IDDIG;
From: Juergen Gross jgross@suse.com
[ Upstream commit aff477cb8f94613f501d386d10f20019e294bc35 ]
Make sure a malicious backend can't cause any harm other than wrong I/O data.
Missing are verification of the request id in a response, sanitizing the reported actual I/O length, and protection against interrupt storms from the backend.
Signed-off-by: Juergen Gross jgross@suse.com Link: https://lore.kernel.org/r/20220311103509.12908-1-jgross@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/host/xen-hcd.c | 57 ++++++++++++++++++++++++++++---------- 1 file changed, 43 insertions(+), 14 deletions(-)
diff --git a/drivers/usb/host/xen-hcd.c b/drivers/usb/host/xen-hcd.c index 19b8c7ed74cb..4ed3ee328a4a 100644 --- a/drivers/usb/host/xen-hcd.c +++ b/drivers/usb/host/xen-hcd.c @@ -51,6 +51,7 @@ struct vdevice_status { struct usb_shadow { struct xenusb_urb_request req; struct urb *urb; + bool in_flight; };
struct xenhcd_info { @@ -722,6 +723,12 @@ static void xenhcd_gnttab_done(struct xenhcd_info *info, unsigned int id) int nr_segs = 0; int i;
+ if (!shadow->in_flight) { + xenhcd_set_error(info, "Illegal request id"); + return; + } + shadow->in_flight = false; + nr_segs = shadow->req.nr_buffer_segs;
if (xenusb_pipeisoc(shadow->req.pipe)) @@ -805,6 +812,7 @@ static int xenhcd_do_request(struct xenhcd_info *info, struct urb_priv *urbp)
info->urb_ring.req_prod_pvt++; info->shadow[id].urb = urb; + info->shadow[id].in_flight = true;
RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&info->urb_ring, notify); if (notify) @@ -933,10 +941,27 @@ static int xenhcd_unlink_urb(struct xenhcd_info *info, struct urb_priv *urbp) return ret; }
-static int xenhcd_urb_request_done(struct xenhcd_info *info) +static void xenhcd_res_to_urb(struct xenhcd_info *info, + struct xenusb_urb_response *res, struct urb *urb) +{ + if (unlikely(!urb)) + return; + + if (res->actual_length > urb->transfer_buffer_length) + urb->actual_length = urb->transfer_buffer_length; + else if (res->actual_length < 0) + urb->actual_length = 0; + else + urb->actual_length = res->actual_length; + urb->error_count = res->error_count; + urb->start_frame = res->start_frame; + xenhcd_giveback_urb(info, urb, res->status); +} + +static int xenhcd_urb_request_done(struct xenhcd_info *info, + unsigned int *eoiflag) { struct xenusb_urb_response res; - struct urb *urb; RING_IDX i, rp; __u16 id; int more_to_do = 0; @@ -963,16 +988,12 @@ static int xenhcd_urb_request_done(struct xenhcd_info *info) xenhcd_gnttab_done(info, id); if (info->error) goto err; - urb = info->shadow[id].urb; - if (likely(urb)) { - urb->actual_length = res.actual_length; - urb->error_count = res.error_count; - urb->start_frame = res.start_frame; - xenhcd_giveback_urb(info, urb, res.status); - } + xenhcd_res_to_urb(info, &res, info->shadow[id].urb); }
xenhcd_add_id_to_freelist(info, id); + + *eoiflag = 0; } info->urb_ring.rsp_cons = i;
@@ -990,7 +1011,7 @@ static int xenhcd_urb_request_done(struct xenhcd_info *info) return 0; }
-static int xenhcd_conn_notify(struct xenhcd_info *info) +static int xenhcd_conn_notify(struct xenhcd_info *info, unsigned int *eoiflag) { struct xenusb_conn_response res; struct xenusb_conn_request *req; @@ -1035,6 +1056,8 @@ static int xenhcd_conn_notify(struct xenhcd_info *info) info->conn_ring.req_prod_pvt); req->id = id; info->conn_ring.req_prod_pvt++; + + *eoiflag = 0; }
if (rc != info->conn_ring.req_prod_pvt) @@ -1057,14 +1080,19 @@ static int xenhcd_conn_notify(struct xenhcd_info *info) static irqreturn_t xenhcd_int(int irq, void *dev_id) { struct xenhcd_info *info = (struct xenhcd_info *)dev_id; + unsigned int eoiflag = XEN_EOI_FLAG_SPURIOUS;
- if (unlikely(info->error)) + if (unlikely(info->error)) { + xen_irq_lateeoi(irq, XEN_EOI_FLAG_SPURIOUS); return IRQ_HANDLED; + }
- while (xenhcd_urb_request_done(info) | xenhcd_conn_notify(info)) + while (xenhcd_urb_request_done(info, &eoiflag) | + xenhcd_conn_notify(info, &eoiflag)) /* Yield point for this unbounded loop. */ cond_resched();
+ xen_irq_lateeoi(irq, eoiflag); return IRQ_HANDLED; }
@@ -1141,9 +1169,9 @@ static int xenhcd_setup_rings(struct xenbus_device *dev, goto fail; }
- err = bind_evtchn_to_irq(info->evtchn); + err = bind_evtchn_to_irq_lateeoi(info->evtchn); if (err <= 0) { - xenbus_dev_fatal(dev, err, "bind_evtchn_to_irq"); + xenbus_dev_fatal(dev, err, "bind_evtchn_to_irq_lateeoi"); goto fail; }
@@ -1496,6 +1524,7 @@ static struct usb_hcd *xenhcd_create_hcd(struct xenbus_device *dev) for (i = 0; i < XENUSB_URB_RING_SIZE; i++) { info->shadow[i].req.id = i + 1; info->shadow[i].urb = NULL; + info->shadow[i].in_flight = false; } info->shadow[XENUSB_URB_RING_SIZE - 1].req.id = 0x0fff;
From: Deren Wu deren.wu@mediatek.com
[ Upstream commit 123bc712b1de0805f9d683687e17b1ec2aba0b68 ]
mt7921s driver may receive frames with fragment buffers. If there is a CTS packet received in monitor mode, the payload is 10 bytes only and need 6 bytes header padding after RXD buffer. However, only RXD in the first linear buffer, if we pull buffer size RXD-size+6 bytes with skb_pull(), that would trigger "BUG_ON(skb->len < skb->data_len)" in __skb_pull().
To avoid the nonlinear buffer issue, enlarge the RXD size from 128 to 256 to make sure all MCU operation in linear buffer.
[ 52.007562] kernel BUG at include/linux/skbuff.h:2313! [ 52.007578] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [ 52.007987] pc : skb_pull+0x48/0x4c [ 52.008015] lr : mt7921_queue_rx_skb+0x494/0x890 [mt7921_common] [ 52.008361] Call trace: [ 52.008377] skb_pull+0x48/0x4c [ 52.008400] mt76s_net_worker+0x134/0x1b0 [mt76_sdio 35339a92c6eb7d4bbcc806a1d22f56365565135c] [ 52.008431] __mt76_worker_fn+0xe8/0x170 [mt76 ef716597d11a77150bc07e3fdd68eeb0f9b56917] [ 52.008449] kthread+0x148/0x3ac [ 52.008466] ret_from_fork+0x10/0x30
Signed-off-by: Lorenzo Bianconi lorenzo@kernel.org Signed-off-by: Sean Wang sean.wang@mediatek.com Signed-off-by: Deren Wu deren.wu@mediatek.com Signed-off-by: Felix Fietkau nbd@nbd.name Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/mediatek/mt76/mt76.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/mediatek/mt76/mt76.h b/drivers/net/wireless/mediatek/mt76/mt76.h index 1f6f7a44d3f0..5197fcb06649 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76.h +++ b/drivers/net/wireless/mediatek/mt76/mt76.h @@ -19,7 +19,7 @@
#define MT_MCU_RING_SIZE 32 #define MT_RX_BUF_SIZE 2048 -#define MT_SKB_HEAD_LEN 128 +#define MT_SKB_HEAD_LEN 256
#define MT_MAX_NON_AQL_PKT 16 #define MT_TXQ_FREE_THR 32
From: Max Filippov jcmvbkbc@gmail.com
[ Upstream commit e85d29ba4b24f68e7a78cb85c55e754362eeb2de ]
DTC issues the following warnings when building xtfpga device trees:
/soc/flash@00000000/partition@0x0: unit name should not have leading "0x" /soc/flash@00000000/partition@0x6000000: unit name should not have leading "0x" /soc/flash@00000000/partition@0x6800000: unit name should not have leading "0x" /soc/flash@00000000/partition@0x7fe0000: unit name should not have leading "0x"
Drop leading 0x from flash partition unit names.
Signed-off-by: Max Filippov jcmvbkbc@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/xtensa/boot/dts/xtfpga-flash-128m.dtsi | 8 ++++---- arch/xtensa/boot/dts/xtfpga-flash-16m.dtsi | 8 ++++---- arch/xtensa/boot/dts/xtfpga-flash-4m.dtsi | 4 ++-- 3 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/arch/xtensa/boot/dts/xtfpga-flash-128m.dtsi b/arch/xtensa/boot/dts/xtfpga-flash-128m.dtsi index 9bf8bad1dd18..c33932568aa7 100644 --- a/arch/xtensa/boot/dts/xtfpga-flash-128m.dtsi +++ b/arch/xtensa/boot/dts/xtfpga-flash-128m.dtsi @@ -8,19 +8,19 @@ reg = <0x00000000 0x08000000>; bank-width = <2>; device-width = <2>; - partition@0x0 { + partition@0 { label = "data"; reg = <0x00000000 0x06000000>; }; - partition@0x6000000 { + partition@6000000 { label = "boot loader area"; reg = <0x06000000 0x00800000>; }; - partition@0x6800000 { + partition@6800000 { label = "kernel image"; reg = <0x06800000 0x017e0000>; }; - partition@0x7fe0000 { + partition@7fe0000 { label = "boot environment"; reg = <0x07fe0000 0x00020000>; }; diff --git a/arch/xtensa/boot/dts/xtfpga-flash-16m.dtsi b/arch/xtensa/boot/dts/xtfpga-flash-16m.dtsi index 40c2f81f7cb6..7bde2ab2d6fb 100644 --- a/arch/xtensa/boot/dts/xtfpga-flash-16m.dtsi +++ b/arch/xtensa/boot/dts/xtfpga-flash-16m.dtsi @@ -8,19 +8,19 @@ reg = <0x08000000 0x01000000>; bank-width = <2>; device-width = <2>; - partition@0x0 { + partition@0 { label = "boot loader area"; reg = <0x00000000 0x00400000>; }; - partition@0x400000 { + partition@400000 { label = "kernel image"; reg = <0x00400000 0x00600000>; }; - partition@0xa00000 { + partition@a00000 { label = "data"; reg = <0x00a00000 0x005e0000>; }; - partition@0xfe0000 { + partition@fe0000 { label = "boot environment"; reg = <0x00fe0000 0x00020000>; }; diff --git a/arch/xtensa/boot/dts/xtfpga-flash-4m.dtsi b/arch/xtensa/boot/dts/xtfpga-flash-4m.dtsi index fb8d3a9f33c2..0655b868749a 100644 --- a/arch/xtensa/boot/dts/xtfpga-flash-4m.dtsi +++ b/arch/xtensa/boot/dts/xtfpga-flash-4m.dtsi @@ -8,11 +8,11 @@ reg = <0x08000000 0x00400000>; bank-width = <2>; device-width = <2>; - partition@0x0 { + partition@0 { label = "boot loader area"; reg = <0x00000000 0x003f0000>; }; - partition@0x3f0000 { + partition@3f0000 { label = "boot environment"; reg = <0x003f0000 0x00010000>; };
From: Arnd Bergmann arnd@arndb.de
[ Upstream commit 066291bec0c55315e568ead501bebdefcb8453d2 ]
Building iwlmei without CONFIG_CFG80211 causes a link-time warning:
ld.lld: error: undefined symbol: ieee80211_hdrlen
referenced by net.c net/wireless/intel/iwlwifi/mei/net.o:(iwl_mei_tx_copy_to_csme) in archive drivers/built-in.a
Add an explicit dependency to avoid this. In theory it should not be needed here, but it also seems pointless to allow IWLMEI for configurations without CFG80211.
Signed-off-by: Arnd Bergmann arnd@arndb.de Acked-by: Emmanuel Grumbach Emmanuel.grumbach@intel.com Acked-by: Luca Coelho luciano.coelho@intel.com Signed-off-by: Kalle Valo kvalo@kernel.org Link: https://lore.kernel.org/r/20220316183617.1470631-1-arnd@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/intel/iwlwifi/Kconfig | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/wireless/intel/iwlwifi/Kconfig b/drivers/net/wireless/intel/iwlwifi/Kconfig index 85e704283755..a647a406b87b 100644 --- a/drivers/net/wireless/intel/iwlwifi/Kconfig +++ b/drivers/net/wireless/intel/iwlwifi/Kconfig @@ -139,6 +139,7 @@ config IWLMEI tristate "Intel Management Engine communication over WLAN" depends on INTEL_MEI depends on PM + depends on CFG80211 help Enables the iwlmei kernel module.
From: Krzysztof Kozlowski krzysztof.kozlowski@canonical.com
[ Upstream commit 8931ddd8d6a55fcefb20f44a38ba42bb746f0b62 ]
Unit node addresses should not have leading 0x:
Warning (unit_address_format): /nemc@13410000/efuse@d0/eth-mac-addr@0x22: unit name should not have leading "0x"
Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@canonical.com Reviewed-by: Paul Cercueil paul@crapouillou.net Signed-off-by: Thomas Bogendoerfer tsbogend@alpha.franken.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/mips/boot/dts/ingenic/jz4780.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/mips/boot/dts/ingenic/jz4780.dtsi b/arch/mips/boot/dts/ingenic/jz4780.dtsi index 3f9ea47a10cd..b998301f179c 100644 --- a/arch/mips/boot/dts/ingenic/jz4780.dtsi +++ b/arch/mips/boot/dts/ingenic/jz4780.dtsi @@ -510,7 +510,7 @@ #address-cells = <1>; #size-cells = <1>;
- eth0_addr: eth-mac-addr@0x22 { + eth0_addr: eth-mac-addr@22 { reg = <0x22 0x6>; }; };
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit f63d24baff787e13b723d86fe036f84bdbc35045 ]
This fixes the following trace caused by receiving HCI_EV_DISCONN_PHY_LINK_COMPLETE which does call hci_conn_del without first checking if conn->type is in fact AMP_LINK and in case it is do properly cleanup upper layers with hci_disconn_cfm:
================================================================== BUG: KASAN: use-after-free in hci_send_acl+0xaba/0xc50 Read of size 8 at addr ffff88800e404818 by task bluetoothd/142
CPU: 0 PID: 142 Comm: bluetoothd Not tainted 5.17.0-rc5-00006-gda4022eeac1a #7 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x45/0x59 print_address_description.constprop.0+0x1f/0x150 kasan_report.cold+0x7f/0x11b hci_send_acl+0xaba/0xc50 l2cap_do_send+0x23f/0x3d0 l2cap_chan_send+0xc06/0x2cc0 l2cap_sock_sendmsg+0x201/0x2b0 sock_sendmsg+0xdc/0x110 sock_write_iter+0x20f/0x370 do_iter_readv_writev+0x343/0x690 do_iter_write+0x132/0x640 vfs_writev+0x198/0x570 do_writev+0x202/0x280 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RSP: 002b:00007ffce8a099b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000014 Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 14 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 RDX: 0000000000000001 RSI: 00007ffce8a099e0 RDI: 0000000000000015 RAX: ffffffffffffffda RBX: 00007ffce8a099e0 RCX: 00007f788fc3cf77 R10: 00007ffce8af7080 R11: 0000000000000246 R12: 000055e4ccf75580 RBP: 0000000000000015 R08: 0000000000000002 R09: 0000000000000001 </TASK> R13: 000055e4ccf754a0 R14: 000055e4ccf75cd0 R15: 000055e4ccf4a6b0
Allocated by task 45: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x81/0xa0 hci_chan_create+0x9a/0x2f0 l2cap_conn_add.part.0+0x1a/0xdc0 l2cap_connect_cfm+0x236/0x1000 le_conn_complete_evt+0x15a7/0x1db0 hci_le_conn_complete_evt+0x226/0x2c0 hci_le_meta_evt+0x247/0x450 hci_event_packet+0x61b/0xe90 hci_rx_work+0x4d5/0xc50 process_one_work+0x8fb/0x15a0 worker_thread+0x576/0x1240 kthread+0x29d/0x340 ret_from_fork+0x1f/0x30
Freed by task 45: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_set_free_info+0x20/0x30 __kasan_slab_free+0xfb/0x130 kfree+0xac/0x350 hci_conn_cleanup+0x101/0x6a0 hci_conn_del+0x27e/0x6c0 hci_disconn_phylink_complete_evt+0xe0/0x120 hci_event_packet+0x812/0xe90 hci_rx_work+0x4d5/0xc50 process_one_work+0x8fb/0x15a0 worker_thread+0x576/0x1240 kthread+0x29d/0x340 ret_from_fork+0x1f/0x30
The buggy address belongs to the object at ffff88800c0f0500 The buggy address is located 24 bytes inside of which belongs to the cache kmalloc-128 of size 128 The buggy address belongs to the page: 128-byte region [ffff88800c0f0500, ffff88800c0f0580) flags: 0x100000000000200(slab|node=0|zone=1) page:00000000fe45cd86 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xc0f0 raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000 raw: 0100000000000200 ffffea00003a2c80 dead000000000004 ffff8880078418c0 page dumped because: kasan: bad access detected ffff88800c0f0400: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc Memory state around the buggy address: >ffff88800c0f0500: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88800c0f0480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88800c0f0580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ ================================================================== ffff88800c0f0600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Reported-by: Sönke Huster soenke.huster@eknoes.de Tested-by: Sönke Huster soenke.huster@eknoes.de Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/hci_event.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c index 63b925921c87..d984777c9b58 100644 --- a/net/bluetooth/hci_event.c +++ b/net/bluetooth/hci_event.c @@ -5453,8 +5453,9 @@ static void hci_disconn_phylink_complete_evt(struct hci_dev *hdev, void *data, hci_dev_lock(hdev);
hcon = hci_conn_hash_lookup_handle(hdev, ev->phy_handle); - if (hcon) { + if (hcon && hcon->type == AMP_LINK) { hcon->state = BT_CLOSED; + hci_disconn_cfm(hcon, ev->reason); hci_conn_del(hcon); }
From: Florian Westphal fw@strlen.de
[ Upstream commit 2cfadb761d3d0219412fd8150faea60c7e863833 ]
as of commit 4608fdfc07e1 ("netfilter: conntrack: collect all entries in one cycle") conntrack gc was changed to run every 2 minutes.
On systems where conntrack hash table is set to large value, most evictions happen from gc worker rather than the packet path due to hash table distribution.
This causes netlink event overflows when events are collected.
This change collects average expiry of scanned entries and reschedules to the average remaining value, within 1 to 60 second interval.
To avoid event overflows, reschedule after each bucket and add a limit for both run time and number of evictions per run.
If more entries have to be evicted, reschedule and restart 1 jiffy into the future.
Reported-by: Karel Rericha karel@maxtel.cz Cc: Shmulik Ladkani shmulik.ladkani@gmail.com Cc: Eyal Birger eyal.birger@gmail.com Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_conntrack_core.c | 85 ++++++++++++++++++++++++------- 1 file changed, 68 insertions(+), 17 deletions(-)
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index bf1e17c678f1..7552e1e9fd62 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -67,6 +67,8 @@ EXPORT_SYMBOL_GPL(nf_conntrack_hash); struct conntrack_gc_work { struct delayed_work dwork; u32 next_bucket; + u32 avg_timeout; + u32 start_time; bool exiting; bool early_drop; }; @@ -78,8 +80,19 @@ static __read_mostly bool nf_conntrack_locks_all; /* serialize hash resizes and nf_ct_iterate_cleanup */ static DEFINE_MUTEX(nf_conntrack_mutex);
-#define GC_SCAN_INTERVAL (120u * HZ) +#define GC_SCAN_INTERVAL_MAX (60ul * HZ) +#define GC_SCAN_INTERVAL_MIN (1ul * HZ) + +/* clamp timeouts to this value (TCP unacked) */ +#define GC_SCAN_INTERVAL_CLAMP (300ul * HZ) + +/* large initial bias so that we don't scan often just because we have + * three entries with a 1s timeout. + */ +#define GC_SCAN_INTERVAL_INIT INT_MAX + #define GC_SCAN_MAX_DURATION msecs_to_jiffies(10) +#define GC_SCAN_EXPIRED_MAX (64000u / HZ)
#define MIN_CHAINLEN 8u #define MAX_CHAINLEN (32u - MIN_CHAINLEN) @@ -1421,16 +1434,28 @@ static bool gc_worker_can_early_drop(const struct nf_conn *ct)
static void gc_worker(struct work_struct *work) { - unsigned long end_time = jiffies + GC_SCAN_MAX_DURATION; unsigned int i, hashsz, nf_conntrack_max95 = 0; - unsigned long next_run = GC_SCAN_INTERVAL; + u32 end_time, start_time = nfct_time_stamp; struct conntrack_gc_work *gc_work; + unsigned int expired_count = 0; + unsigned long next_run; + s32 delta_time; + gc_work = container_of(work, struct conntrack_gc_work, dwork.work);
i = gc_work->next_bucket; if (gc_work->early_drop) nf_conntrack_max95 = nf_conntrack_max / 100u * 95u;
+ if (i == 0) { + gc_work->avg_timeout = GC_SCAN_INTERVAL_INIT; + gc_work->start_time = start_time; + } + + next_run = gc_work->avg_timeout; + + end_time = start_time + GC_SCAN_MAX_DURATION; + do { struct nf_conntrack_tuple_hash *h; struct hlist_nulls_head *ct_hash; @@ -1447,6 +1472,7 @@ static void gc_worker(struct work_struct *work)
hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[i], hnnode) { struct nf_conntrack_net *cnet; + unsigned long expires; struct net *net;
tmp = nf_ct_tuplehash_to_ctrack(h); @@ -1456,11 +1482,29 @@ static void gc_worker(struct work_struct *work) continue; }
+ if (expired_count > GC_SCAN_EXPIRED_MAX) { + rcu_read_unlock(); + + gc_work->next_bucket = i; + gc_work->avg_timeout = next_run; + + delta_time = nfct_time_stamp - gc_work->start_time; + + /* re-sched immediately if total cycle time is exceeded */ + next_run = delta_time < (s32)GC_SCAN_INTERVAL_MAX; + goto early_exit; + } + if (nf_ct_is_expired(tmp)) { nf_ct_gc_expired(tmp); + expired_count++; continue; }
+ expires = clamp(nf_ct_expires(tmp), GC_SCAN_INTERVAL_MIN, GC_SCAN_INTERVAL_CLAMP); + next_run += expires; + next_run /= 2u; + if (nf_conntrack_max95 == 0 || gc_worker_skip_ct(tmp)) continue;
@@ -1478,8 +1522,10 @@ static void gc_worker(struct work_struct *work) continue; }
- if (gc_worker_can_early_drop(tmp)) + if (gc_worker_can_early_drop(tmp)) { nf_ct_kill(tmp); + expired_count++; + }
nf_ct_put(tmp); } @@ -1492,33 +1538,38 @@ static void gc_worker(struct work_struct *work) cond_resched(); i++;
- if (time_after(jiffies, end_time) && i < hashsz) { + delta_time = nfct_time_stamp - end_time; + if (delta_time > 0 && i < hashsz) { + gc_work->avg_timeout = next_run; gc_work->next_bucket = i; next_run = 0; - break; + goto early_exit; } } while (i < hashsz);
+ gc_work->next_bucket = 0; + + next_run = clamp(next_run, GC_SCAN_INTERVAL_MIN, GC_SCAN_INTERVAL_MAX); + + delta_time = max_t(s32, nfct_time_stamp - gc_work->start_time, 1); + if (next_run > (unsigned long)delta_time) + next_run -= delta_time; + else + next_run = 1; + +early_exit: if (gc_work->exiting) return;
- /* - * Eviction will normally happen from the packet path, and not - * from this gc worker. - * - * This worker is only here to reap expired entries when system went - * idle after a busy period. - */ - if (next_run) { + if (next_run) gc_work->early_drop = false; - gc_work->next_bucket = 0; - } + queue_delayed_work(system_power_efficient_wq, &gc_work->dwork, next_run); }
static void conntrack_gc_work_init(struct conntrack_gc_work *gc_work) { - INIT_DEFERRABLE_WORK(&gc_work->dwork, gc_worker); + INIT_DELAYED_WORK(&gc_work->dwork, gc_worker); gc_work->exiting = false; }
From: Wang Yufen wangyufen@huawei.com
[ Upstream commit f22881de730ebd472e15bcc2c0d1d46e36a87b9c ]
In calipso_map_cat_ntoh(), in the for loop, if the return value of netlbl_bitmap_walk() is equal to (net_clen_bits - 1), when netlbl_bitmap_walk() is called next time, out-of-bounds memory accesses of bitmap[byte_offset] occurs.
The bug was found during fuzzing. The following is the fuzzing report BUG: KASAN: slab-out-of-bounds in netlbl_bitmap_walk+0x3c/0xd0 Read of size 1 at addr ffffff8107bf6f70 by task err_OH/252
CPU: 7 PID: 252 Comm: err_OH Not tainted 5.17.0-rc7+ #17 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x21c/0x230 show_stack+0x1c/0x60 dump_stack_lvl+0x64/0x7c print_address_description.constprop.0+0x70/0x2d0 __kasan_report+0x158/0x16c kasan_report+0x74/0x120 __asan_load1+0x80/0xa0 netlbl_bitmap_walk+0x3c/0xd0 calipso_opt_getattr+0x1a8/0x230 calipso_sock_getattr+0x218/0x340 calipso_sock_getattr+0x44/0x60 netlbl_sock_getattr+0x44/0x80 selinux_netlbl_socket_setsockopt+0x138/0x170 selinux_socket_setsockopt+0x4c/0x60 security_socket_setsockopt+0x4c/0x90 __sys_setsockopt+0xbc/0x2b0 __arm64_sys_setsockopt+0x6c/0x84 invoke_syscall+0x64/0x190 el0_svc_common.constprop.0+0x88/0x200 do_el0_svc+0x88/0xa0 el0_svc+0x128/0x1b0 el0t_64_sync_handler+0x9c/0x120 el0t_64_sync+0x16c/0x170
Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Wang Yufen wangyufen@huawei.com Acked-by: Paul Moore paul@paul-moore.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/netlabel/netlabel_kapi.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/netlabel/netlabel_kapi.c b/net/netlabel/netlabel_kapi.c index beb0e573266d..54c083003947 100644 --- a/net/netlabel/netlabel_kapi.c +++ b/net/netlabel/netlabel_kapi.c @@ -885,6 +885,8 @@ int netlbl_bitmap_walk(const unsigned char *bitmap, u32 bitmap_len, unsigned char bitmask; unsigned char byte;
+ if (offset >= bitmap_len) + return -1; byte_offset = offset / 8; byte = bitmap[byte_offset]; bit_spot = offset;
From: Xiubo Li xiubli@redhat.com
[ Upstream commit 322794d3355c33adcc4feace0045d85a8e4ed813 ]
The ceph_get_inode() will search for or insert a new inode into the hash for the given vino, and return a reference to it. If new is non-NULL, its reference is consumed.
We should release the reference when in error handing cases.
Signed-off-by: Xiubo Li xiubli@redhat.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ceph/inode.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index ef4a980a7bf3..c092dce0485c 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -87,13 +87,13 @@ struct inode *ceph_get_snapdir(struct inode *parent) if (!S_ISDIR(parent->i_mode)) { pr_warn_once("bad snapdir parent type (mode=0%o)\n", parent->i_mode); - return ERR_PTR(-ENOTDIR); + goto err; }
if (!(inode->i_state & I_NEW) && !S_ISDIR(inode->i_mode)) { pr_warn_once("bad snapdir inode type (mode=0%o)\n", inode->i_mode); - return ERR_PTR(-ENOTDIR); + goto err; }
inode->i_mode = parent->i_mode; @@ -113,6 +113,12 @@ struct inode *ceph_get_snapdir(struct inode *parent) }
return inode; +err: + if ((inode->i_state & I_NEW)) + discard_new_inode(inode); + else + iput(inode); + return ERR_PTR(-ENOTDIR); }
const struct inode_operations ceph_file_iops = {
From: Xiubo Li xiubli@redhat.com
[ Upstream commit f639d9867eea647005dc824e0e24f39ffc50d4e4 ]
Reset the last_readdir at the same time, and add a comment explaining why we don't free last_readdir when dir_emit returns false.
Signed-off-by: Xiubo Li xiubli@redhat.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ceph/dir.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c index 133dbd9338e7..d91fa53e12b3 100644 --- a/fs/ceph/dir.c +++ b/fs/ceph/dir.c @@ -478,8 +478,11 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx) 2 : (fpos_off(rde->offset) + 1); err = note_last_dentry(dfi, rde->name, rde->name_len, next_offset); - if (err) + if (err) { + ceph_mdsc_put_request(dfi->last_readdir); + dfi->last_readdir = NULL; return err; + } } else if (req->r_reply_info.dir_end) { dfi->next_offset = 2; /* keep last name */ @@ -520,6 +523,12 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx) if (!dir_emit(ctx, rde->name, rde->name_len, ceph_present_ino(inode->i_sb, le64_to_cpu(rde->inode.in->ino)), le32_to_cpu(rde->inode.in->mode) >> 12)) { + /* + * NOTE: Here no need to put the 'dfi->last_readdir', + * because when dir_emit stops us it's most likely + * doesn't have enough memory, etc. So for next readdir + * it will continue. + */ dout("filldir stopping us...\n"); return 0; }
From: Feng Tang feng.tang@intel.com
[ Upstream commit 1bf18da62106225dbc47aab41efee2aeb99caccd ]
0Day robots reported there is compiling issue for 'csky' ARCH when CONFIG_DEBUG_FORCE_DATA_SECTION_ALIGNED is enabled [1]:
All errors (new ones prefixed by >>):
{standard input}: Assembler messages:
{standard input}:2277: Error: pcrel offset for branch to .LS000B too far (0x3c)
Which was discussed in [2]. And as there is no solution for csky yet, add some dependency for this config to limit it to several ARCHs which have no compiling issue so far.
[1]. https://lore.kernel.org/lkml/202202271612.W32UJAj2-lkp@intel.com/ [2]. https://www.spinics.net/lists/linux-kbuild/msg30298.html
Link: https://lkml.kernel.org/r/20220304021100.GN4548@shbuild999.sh.intel.com Reported-by: kernel test robot lkp@intel.com Signed-off-by: Feng Tang feng.tang@intel.com Cc: Guo Ren guoren@kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- lib/Kconfig.debug | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 14b89aa37c5c..440fd666c16d 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -416,7 +416,8 @@ config SECTION_MISMATCH_WARN_ONLY If unsure, say Y.
config DEBUG_FORCE_FUNCTION_ALIGN_64B - bool "Force all function address 64B aligned" if EXPERT + bool "Force all function address 64B aligned" + depends on EXPERT && (X86_64 || ARM64 || PPC32 || PPC64 || ARC) help There are cases that a commit from one domain changes the function address alignment of other domains, and cause magic performance
From: Randy Dunlap rdunlap@infradead.org
[ Upstream commit f9a40b0890658330c83c95511f9d6b396610defc ]
initcall_blacklist() should return 1 to indicate that it handled its cmdline arguments.
set_debug_rodata() should return 1 to indicate that it handled its cmdline arguments. Print a warning if the option string is invalid.
This prevents these strings from being added to the 'init' program's environment as they are not init arguments/parameters.
Link: https://lkml.kernel.org/r/20220221050901.23985-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap rdunlap@infradead.org Reported-by: Igor Zhbanov i.zhbanov@omprussia.ru Cc: Ingo Molnar mingo@kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- init/main.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/init/main.c b/init/main.c index ada50f5a15e4..9a5097b2251a 100644 --- a/init/main.c +++ b/init/main.c @@ -1192,7 +1192,7 @@ static int __init initcall_blacklist(char *str) } } while (str_entry);
- return 0; + return 1; }
static bool __init_or_module initcall_blacklisted(initcall_t fn) @@ -1454,7 +1454,9 @@ static noinline void __init kernel_init_freeable(void); bool rodata_enabled __ro_after_init = true; static int __init set_debug_rodata(char *str) { - return strtobool(str, &rodata_enabled); + if (strtobool(str, &rodata_enabled)) + pr_warn("Invalid option string for rodata: '%s'\n", str); + return 1; } __setup("rodata=", set_debug_rodata); #endif
From: Qinghua Jin qhjin.dev@gmail.com
[ Upstream commit 9ce3c0d26c42d279b6c378a03cd6a61d828f19ca ]
Testcase: 1. create a minix file system and mount it 2. open a file on the file system with O_RDWR|O_CREAT|O_TRUNC|O_DIRECT 3. open fails with -EINVAL but leaves an empty file behind. All other open() failures don't leave the failed open files behind.
It is hard to check the direct_IO op before creating the inode. Just as ext4 and btrfs do, this patch will resolve the issue by allowing to create the file with O_DIRECT but returning error when writing the file.
Link: https://lkml.kernel.org/r/20220107133626.413379-1-qhjin.dev@gmail.com Signed-off-by: Qinghua Jin qhjin.dev@gmail.com Reported-by: Colin Ian King colin.king@intel.com Reviewed-by: Jan Kara jack@suse.cz Acked-by: Christian Brauner christian.brauner@ubuntu.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/minix/inode.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/minix/inode.c b/fs/minix/inode.c index a71f1cf894b9..d4bd94234ef7 100644 --- a/fs/minix/inode.c +++ b/fs/minix/inode.c @@ -447,7 +447,8 @@ static const struct address_space_operations minix_aops = { .writepage = minix_writepage, .write_begin = minix_write_begin, .write_end = generic_write_end, - .bmap = minix_bmap + .bmap = minix_bmap, + .direct_IO = noop_direct_IO };
static const struct inode_operations minix_symlink_inode_operations = {
From: Adam Wujek dev_public@wujek.eu
[ Upstream commit 2a8b539433e111c4de364237627ef219d2f6350a ]
SI5341_OUT_CFG_RDIV_FORCE2 shall be checked first to distinguish whether a divider for a given output is set to 2 (SI5341_OUT_CFG_RDIV_FORCE2 is set) or the output is disabled (SI5341_OUT_CFG_RDIV_FORCE2 not set, SI5341_OUT_R_REG is set 0). Before the change, divider set to 2 (SI5341_OUT_R_REG set to 0) was interpreted as output is disabled.
Signed-off-by: Adam Wujek dev_public@wujek.eu Link: https://lore.kernel.org/r/20211203141125.2447520-1-dev_public@wujek.eu Reviewed-by: Robert Hancock robert.hancock@calian.com Signed-off-by: Stephen Boyd sboyd@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clk/clk-si5341.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/drivers/clk/clk-si5341.c b/drivers/clk/clk-si5341.c index f7b41366666e..4de098b6b0d4 100644 --- a/drivers/clk/clk-si5341.c +++ b/drivers/clk/clk-si5341.c @@ -798,6 +798,15 @@ static unsigned long si5341_output_clk_recalc_rate(struct clk_hw *hw, u32 r_divider; u8 r[3];
+ err = regmap_read(output->data->regmap, + SI5341_OUT_CONFIG(output), &val); + if (err < 0) + return err; + + /* If SI5341_OUT_CFG_RDIV_FORCE2 is set, r_divider is 2 */ + if (val & SI5341_OUT_CFG_RDIV_FORCE2) + return parent_rate / 2; + err = regmap_bulk_read(output->data->regmap, SI5341_OUT_R_REG(output), r, 3); if (err < 0) @@ -814,13 +823,6 @@ static unsigned long si5341_output_clk_recalc_rate(struct clk_hw *hw, r_divider += 1; r_divider <<= 1;
- err = regmap_read(output->data->regmap, - SI5341_OUT_CONFIG(output), &val); - if (err < 0) - return err; - - if (val & SI5341_OUT_CFG_RDIV_FORCE2) - r_divider = 2;
return parent_rate / r_divider; }
From: José Expósito jose.exposito89@gmail.com
[ Upstream commit 7a688c91d3fd54c53e7a9edd6052cdae98dd99d8 ]
Handle the error branches to free memory where required.
Addresses-Coverity-ID: 1491825 ("Resource leak") Signed-off-by: José Expósito jose.exposito89@gmail.com Reviewed-by: Chen-Yu Tsai wenst@chromium.org Link: https://lore.kernel.org/r/20220115183059.GA10809@elementary Signed-off-by: Stephen Boyd sboyd@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clk/mediatek/clk-mt8192.c | 36 +++++++++++++++++++++++++------ 1 file changed, 30 insertions(+), 6 deletions(-)
diff --git a/drivers/clk/mediatek/clk-mt8192.c b/drivers/clk/mediatek/clk-mt8192.c index cbc7c6dbe0f4..79ddb3cc0b98 100644 --- a/drivers/clk/mediatek/clk-mt8192.c +++ b/drivers/clk/mediatek/clk-mt8192.c @@ -1236,9 +1236,17 @@ static int clk_mt8192_infra_probe(struct platform_device *pdev)
r = mtk_clk_register_gates(node, infra_clks, ARRAY_SIZE(infra_clks), clk_data); if (r) - return r; + goto free_clk_data; + + r = of_clk_add_provider(node, of_clk_src_onecell_get, clk_data); + if (r) + goto free_clk_data; + + return r;
- return of_clk_add_provider(node, of_clk_src_onecell_get, clk_data); +free_clk_data: + mtk_free_clk_data(clk_data); + return r; }
static int clk_mt8192_peri_probe(struct platform_device *pdev) @@ -1253,9 +1261,17 @@ static int clk_mt8192_peri_probe(struct platform_device *pdev)
r = mtk_clk_register_gates(node, peri_clks, ARRAY_SIZE(peri_clks), clk_data); if (r) - return r; + goto free_clk_data; + + r = of_clk_add_provider(node, of_clk_src_onecell_get, clk_data); + if (r) + goto free_clk_data;
- return of_clk_add_provider(node, of_clk_src_onecell_get, clk_data); + return r; + +free_clk_data: + mtk_free_clk_data(clk_data); + return r; }
static int clk_mt8192_apmixed_probe(struct platform_device *pdev) @@ -1271,9 +1287,17 @@ static int clk_mt8192_apmixed_probe(struct platform_device *pdev) mtk_clk_register_plls(node, plls, ARRAY_SIZE(plls), clk_data); r = mtk_clk_register_gates(node, apmixed_clks, ARRAY_SIZE(apmixed_clks), clk_data); if (r) - return r; + goto free_clk_data;
- return of_clk_add_provider(node, of_clk_src_onecell_get, clk_data); + r = of_clk_add_provider(node, of_clk_src_onecell_get, clk_data); + if (r) + goto free_clk_data; + + return r; + +free_clk_data: + mtk_free_clk_data(clk_data); + return r; }
static const struct of_device_id of_match_clk_mt8192[] = {
From: Stefan Wahren stefan.wahren@i2se.com
[ Upstream commit aa899e686d442c63d50f4d369cc02dbbf0941cb0 ]
vchiq_get_state() can return a NULL pointer. So handle this cases and avoid a NULL pointer derefence in vchiq_dump_platform_instances.
Reviewed-by: Nicolas Saenz Julienne nsaenz@kernel.org Signed-off-by: Stefan Wahren stefan.wahren@i2se.com Link: https://lore.kernel.org/r/1642968143-19281-17-git-send-email-stefan.wahren@i... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c index 3a2e4582db8e..a3e3c9f9aa18 100644 --- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c +++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c @@ -1209,6 +1209,9 @@ int vchiq_dump_platform_instances(void *dump_context) int len; int i;
+ if (!state) + return -ENOTCONN; + /* * There is no list of instances, so instead scan all services, * marking those that have been dumped.
From: Stefan Wahren stefan.wahren@i2se.com
[ Upstream commit ca225857faf237234d2fffe5d1919467dfadd822 ]
In case of an invalid handle the function find_servive_by_handle returns NULL. So take care of this and avoid a NULL pointer dereference.
Reviewed-by: Nicolas Saenz Julienne nsaenz@kernel.org Signed-off-by: Stefan Wahren stefan.wahren@i2se.com Link: https://lore.kernel.org/r/1642968143-19281-18-git-send-email-stefan.wahren@i... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../staging/vc04_services/interface/vchiq_arm/vchiq_core.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_core.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_core.c index 7fe20d4b7ba2..b7295236671c 100644 --- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_core.c +++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_core.c @@ -2306,6 +2306,9 @@ void vchiq_msg_queue_push(unsigned int handle, struct vchiq_header *header) struct vchiq_service *service = find_service_by_handle(handle); int pos;
+ if (!service) + return; + while (service->msg_queue_write == service->msg_queue_read + VCHIQ_MAX_SLOTS) { if (wait_for_completion_interruptible(&service->msg_queue_pop)) @@ -2326,6 +2329,9 @@ struct vchiq_header *vchiq_msg_hold(unsigned int handle) struct vchiq_header *header; int pos;
+ if (!service) + return NULL; + if (service->msg_queue_write == service->msg_queue_read) return NULL;
From: Amjad Ouled-Ameur aouledameur@baylibre.com
[ Upstream commit 2f87727130ce17ffefecd0895eeebf22d5a36f6f ]
Use reset_control_rearm() call if an error occurs in case phy_meson_gxl_usb2_init() fails after reset() has been called ; or in case phy_meson_gxl_usb2_exit() is called i.e the resource is no longer used and the reset line may be triggered again by other devices.
reset_control_rearm() keeps use of triggered_count sane in the reset framework. Therefore, use of reset_control_reset() on shared reset line should be balanced with reset_control_rearm().
Signed-off-by: Amjad Ouled-Ameur aouledameur@baylibre.com Reported-by: Jerome Brunet jbrunet@baylibre.com Reviewed-by: Martin Blumenstingl martin.blumenstingl@googlemail.com Reviewed-by: Philipp Zabel p.zabel@pengutronix.de Acked-by: Neil Armstrong narmstrong@baylibre.com Link: https://lore.kernel.org/r/20220111095255.176141-2-aouledameur@baylibre.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/phy/amlogic/phy-meson-gxl-usb2.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/phy/amlogic/phy-meson-gxl-usb2.c b/drivers/phy/amlogic/phy-meson-gxl-usb2.c index 2b3c0d730f20..db17c3448bfe 100644 --- a/drivers/phy/amlogic/phy-meson-gxl-usb2.c +++ b/drivers/phy/amlogic/phy-meson-gxl-usb2.c @@ -114,8 +114,10 @@ static int phy_meson_gxl_usb2_init(struct phy *phy) return ret;
ret = clk_prepare_enable(priv->clk); - if (ret) + if (ret) { + reset_control_rearm(priv->reset); return ret; + }
return 0; } @@ -125,6 +127,7 @@ static int phy_meson_gxl_usb2_exit(struct phy *phy) struct phy_meson_gxl_usb2_priv *priv = phy_get_drvdata(phy);
clk_disable_unprepare(priv->clk); + reset_control_rearm(priv->reset);
return 0; }
From: Amjad Ouled-Ameur aouledameur@baylibre.com
[ Upstream commit 6466ba1898d415b527e1013bd8551a6fdfece94c ]
Use the existing dev_err_probe() helper instead of open-coding the same operation.
Signed-off-by: Amjad Ouled-Ameur aouledameur@baylibre.com Reported-by: Martin Blumenstingl martin.blumenstingl@googlemail.com Reviewed-by: Martin Blumenstingl martin.blumenstingl@googlemail.com Acked-by: Neil Armstrong narmstrong@baylibre.com Link: https://lore.kernel.org/r/20220111095255.176141-3-aouledameur@baylibre.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/phy/amlogic/phy-meson8b-usb2.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/phy/amlogic/phy-meson8b-usb2.c b/drivers/phy/amlogic/phy-meson8b-usb2.c index cf10bed40528..77e7e9b1428c 100644 --- a/drivers/phy/amlogic/phy-meson8b-usb2.c +++ b/drivers/phy/amlogic/phy-meson8b-usb2.c @@ -265,8 +265,9 @@ static int phy_meson8b_usb2_probe(struct platform_device *pdev) return PTR_ERR(priv->clk_usb);
priv->reset = devm_reset_control_get_optional_shared(&pdev->dev, NULL); - if (PTR_ERR(priv->reset) == -EPROBE_DEFER) - return PTR_ERR(priv->reset); + if (IS_ERR(priv->reset)) + return dev_err_probe(&pdev->dev, PTR_ERR(priv->reset), + "Failed to get the reset line");
priv->dr_mode = of_usb_get_dr_mode_by_phy(pdev->dev.of_node, -1); if (priv->dr_mode == USB_DR_MODE_UNKNOWN) {
From: Amjad Ouled-Ameur aouledameur@baylibre.com
[ Upstream commit 6f1dedf089ab1a4f03ea7aadc3c4a99885b4b4a0 ]
Use reset_control_rearm() call if an error occurs in case phy_meson8b_usb2_power_on() fails after reset() has been called, or in case phy_meson8b_usb2_power_off() is called i.e the resource is no longer used and the reset line may be triggered again by other devices.
reset_control_rearm() keeps use of triggered_count sane in the reset framework, use of reset_control_reset() on shared reset line should be balanced with reset_control_rearm().
Signed-off-by: Amjad Ouled-Ameur aouledameur@baylibre.com Reported-by: Jerome Brunet jbrunet@baylibre.com Reviewed-by: Martin Blumenstingl martin.blumenstingl@googlemail.com Acked-by: Neil Armstrong narmstrong@baylibre.com Link: https://lore.kernel.org/r/20220111095255.176141-4-aouledameur@baylibre.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/phy/amlogic/phy-meson8b-usb2.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/phy/amlogic/phy-meson8b-usb2.c b/drivers/phy/amlogic/phy-meson8b-usb2.c index 77e7e9b1428c..dd96763911b8 100644 --- a/drivers/phy/amlogic/phy-meson8b-usb2.c +++ b/drivers/phy/amlogic/phy-meson8b-usb2.c @@ -154,6 +154,7 @@ static int phy_meson8b_usb2_power_on(struct phy *phy) ret = clk_prepare_enable(priv->clk_usb_general); if (ret) { dev_err(&phy->dev, "Failed to enable USB general clock\n"); + reset_control_rearm(priv->reset); return ret; }
@@ -161,6 +162,7 @@ static int phy_meson8b_usb2_power_on(struct phy *phy) if (ret) { dev_err(&phy->dev, "Failed to enable USB DDR clock\n"); clk_disable_unprepare(priv->clk_usb_general); + reset_control_rearm(priv->reset); return ret; }
@@ -199,6 +201,7 @@ static int phy_meson8b_usb2_power_on(struct phy *phy) dev_warn(&phy->dev, "USB ID detect failed!\n"); clk_disable_unprepare(priv->clk_usb); clk_disable_unprepare(priv->clk_usb_general); + reset_control_rearm(priv->reset); return -EINVAL; } } @@ -218,6 +221,7 @@ static int phy_meson8b_usb2_power_off(struct phy *phy)
clk_disable_unprepare(priv->clk_usb); clk_disable_unprepare(priv->clk_usb_general); + reset_control_rearm(priv->reset);
/* power off the PHY by putting it into reset mode */ regmap_update_bits(priv->regmap, REG_CTRL, REG_CTRL_POWER_ON_RESET,
From: Sascha Hauer s.hauer@pengutronix.de
[ Upstream commit ff3187eabb5ce478d15b6ed62eb286756adefac3 ]
The pixel clocks dclk_vop[012] can be clocked from hpll, vpll, gpll or cpll. gpll and cpll also drive many other clocks, so changing the dclk_vop[012] clocks could change these other clocks as well. Drop CLK_SET_RATE_PARENT to fix that. With this change the VOP2 driver can only adjust the pixel clocks with the divider between the PLL and the dclk_vop[012] which means the user may have to adjust the PLL clock to a suitable rate using the assigned-clock-rate device tree property.
Signed-off-by: Sascha Hauer s.hauer@pengutronix.de Link: https://lore.kernel.org/r/20220126145549.617165-25-s.hauer@pengutronix.de Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clk/rockchip/clk-rk3568.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/clk/rockchip/clk-rk3568.c b/drivers/clk/rockchip/clk-rk3568.c index 69a9e8069a48..604a367bc498 100644 --- a/drivers/clk/rockchip/clk-rk3568.c +++ b/drivers/clk/rockchip/clk-rk3568.c @@ -1038,13 +1038,13 @@ static struct rockchip_clk_branch rk3568_clk_branches[] __initdata = { RK3568_CLKGATE_CON(20), 8, GFLAGS), GATE(HCLK_VOP, "hclk_vop", "hclk_vo", 0, RK3568_CLKGATE_CON(20), 9, GFLAGS), - COMPOSITE(DCLK_VOP0, "dclk_vop0", hpll_vpll_gpll_cpll_p, CLK_SET_RATE_PARENT | CLK_SET_RATE_NO_REPARENT, + COMPOSITE(DCLK_VOP0, "dclk_vop0", hpll_vpll_gpll_cpll_p, CLK_SET_RATE_NO_REPARENT, RK3568_CLKSEL_CON(39), 10, 2, MFLAGS, 0, 8, DFLAGS, RK3568_CLKGATE_CON(20), 10, GFLAGS), - COMPOSITE(DCLK_VOP1, "dclk_vop1", hpll_vpll_gpll_cpll_p, CLK_SET_RATE_PARENT | CLK_SET_RATE_NO_REPARENT, + COMPOSITE(DCLK_VOP1, "dclk_vop1", hpll_vpll_gpll_cpll_p, CLK_SET_RATE_NO_REPARENT, RK3568_CLKSEL_CON(40), 10, 2, MFLAGS, 0, 8, DFLAGS, RK3568_CLKGATE_CON(20), 11, GFLAGS), - COMPOSITE(DCLK_VOP2, "dclk_vop2", hpll_vpll_gpll_cpll_p, 0, + COMPOSITE(DCLK_VOP2, "dclk_vop2", hpll_vpll_gpll_cpll_p, CLK_SET_RATE_NO_REPARENT, RK3568_CLKSEL_CON(41), 10, 2, MFLAGS, 0, 8, DFLAGS, RK3568_CLKGATE_CON(20), 12, GFLAGS), GATE(CLK_VOP_PWM, "clk_vop_pwm", "xin24m", 0,
From: Pierre Gondois Pierre.Gondois@arm.com
[ Upstream commit ec1c7ad47664f964c1101fe555b6fde0cb124b38 ]
CPUfreq governors request CPU frequencies using information on current CPU usage. The CPPC driver converts them to performance requests. Frequency targets are computed as: target_freq = (util / cpu_capacity) * max_freq target_freq is then clamped between [policy->min, policy->max].
The CPPC driver converts performance values to frequencies (and vice-versa) using cppc_cpufreq_perf_to_khz() and cppc_cpufreq_khz_to_perf(). These functions both use two different factors depending on the range of the input value. For cppc_cpufreq_khz_to_perf(): - (NOMINAL_PERF / NOMINAL_FREQ) or - (LOWEST_PERF / LOWEST_FREQ) and for cppc_cpufreq_perf_to_khz(): - (NOMINAL_FREQ / NOMINAL_PERF) or - ((NOMINAL_PERF - LOWEST_FREQ) / (NOMINAL_PERF - LOWEST_PERF))
This means: 1- the functions are not inverse for some values: (perf_to_khz(khz_to_perf(x)) != x) 2- cppc_cpufreq_perf_to_khz(LOWEST_PERF) can sometimes give a different value from LOWEST_FREQ due to integer approximation 3- it is implied that performance and frequency are proportional (NOMINAL_FREQ / NOMINAL_PERF) == (LOWEST_PERF / LOWEST_FREQ)
This patch changes the conversion functions to an affine function. This fixes the 3 points above.
Suggested-by: Lukasz Luba lukasz.luba@arm.com Suggested-by: Morten Rasmussen morten.rasmussen@arm.com Signed-off-by: Pierre Gondois Pierre.Gondois@arm.com Signed-off-by: Viresh Kumar viresh.kumar@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/cpufreq/cppc_cpufreq.c | 43 +++++++++++++++++----------------- 1 file changed, 21 insertions(+), 22 deletions(-)
diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c index db17196266e4..82d370ae6a4a 100644 --- a/drivers/cpufreq/cppc_cpufreq.c +++ b/drivers/cpufreq/cppc_cpufreq.c @@ -303,52 +303,48 @@ static u64 cppc_get_dmi_max_khz(void)
/* * If CPPC lowest_freq and nominal_freq registers are exposed then we can - * use them to convert perf to freq and vice versa - * - * If the perf/freq point lies between Nominal and Lowest, we can treat - * (Low perf, Low freq) and (Nom Perf, Nom freq) as 2D co-ordinates of a line - * and extrapolate the rest - * For perf/freq > Nominal, we use the ratio perf:freq at Nominal for conversion + * use them to convert perf to freq and vice versa. The conversion is + * extrapolated as an affine function passing by the 2 points: + * - (Low perf, Low freq) + * - (Nominal perf, Nominal perf) */ static unsigned int cppc_cpufreq_perf_to_khz(struct cppc_cpudata *cpu_data, unsigned int perf) { struct cppc_perf_caps *caps = &cpu_data->perf_caps; + s64 retval, offset = 0; static u64 max_khz; u64 mul, div;
if (caps->lowest_freq && caps->nominal_freq) { - if (perf >= caps->nominal_perf) { - mul = caps->nominal_freq; - div = caps->nominal_perf; - } else { - mul = caps->nominal_freq - caps->lowest_freq; - div = caps->nominal_perf - caps->lowest_perf; - } + mul = caps->nominal_freq - caps->lowest_freq; + div = caps->nominal_perf - caps->lowest_perf; + offset = caps->nominal_freq - div64_u64(caps->nominal_perf * mul, div); } else { if (!max_khz) max_khz = cppc_get_dmi_max_khz(); mul = max_khz; div = caps->highest_perf; } - return (u64)perf * mul / div; + + retval = offset + div64_u64(perf * mul, div); + if (retval >= 0) + return retval; + return 0; }
static unsigned int cppc_cpufreq_khz_to_perf(struct cppc_cpudata *cpu_data, unsigned int freq) { struct cppc_perf_caps *caps = &cpu_data->perf_caps; + s64 retval, offset = 0; static u64 max_khz; u64 mul, div;
if (caps->lowest_freq && caps->nominal_freq) { - if (freq >= caps->nominal_freq) { - mul = caps->nominal_perf; - div = caps->nominal_freq; - } else { - mul = caps->lowest_perf; - div = caps->lowest_freq; - } + mul = caps->nominal_perf - caps->lowest_perf; + div = caps->nominal_freq - caps->lowest_freq; + offset = caps->nominal_perf - div64_u64(caps->nominal_freq * mul, div); } else { if (!max_khz) max_khz = cppc_get_dmi_max_khz(); @@ -356,7 +352,10 @@ static unsigned int cppc_cpufreq_khz_to_perf(struct cppc_cpudata *cpu_data, div = max_khz; }
- return (u64)freq * mul / div; + retval = offset + div64_u64(freq * mul, div); + if (retval >= 0) + return retval; + return 0; }
static int cppc_cpufreq_set_target(struct cpufreq_policy *policy,
From: Viresh Kumar viresh.kumar@linaro.org
[ Upstream commit 021dbecabc93b1610b5db989d52a94e0c6671136 ]
It is difficult to find which OPPs are active at the moment, specially if there are multiple OPPs with same frequency available in the device tree (controlled by supported hardware feature).
Expose name of the DT node to find out the exact OPP.
While at it, also expose level field.
Reported-by: Leo Yan leo.yan@linaro.org Tested-by: Leo Yan leo.yan@linaro.org Signed-off-by: Viresh Kumar viresh.kumar@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/opp/debugfs.c | 5 +++++ drivers/opp/opp.h | 1 + 2 files changed, 6 insertions(+)
diff --git a/drivers/opp/debugfs.c b/drivers/opp/debugfs.c index 596c185b5dda..b5f2f9f39392 100644 --- a/drivers/opp/debugfs.c +++ b/drivers/opp/debugfs.c @@ -10,6 +10,7 @@ #include <linux/debugfs.h> #include <linux/device.h> #include <linux/err.h> +#include <linux/of.h> #include <linux/init.h> #include <linux/limits.h> #include <linux/slab.h> @@ -131,9 +132,13 @@ void opp_debug_create_one(struct dev_pm_opp *opp, struct opp_table *opp_table) debugfs_create_bool("suspend", S_IRUGO, d, &opp->suspend); debugfs_create_u32("performance_state", S_IRUGO, d, &opp->pstate); debugfs_create_ulong("rate_hz", S_IRUGO, d, &opp->rate); + debugfs_create_u32("level", S_IRUGO, d, &opp->level); debugfs_create_ulong("clock_latency_ns", S_IRUGO, d, &opp->clock_latency_ns);
+ opp->of_name = of_node_full_name(opp->np); + debugfs_create_str("of_name", S_IRUGO, d, (char **)&opp->of_name); + opp_debug_create_supplies(opp, opp_table, d); opp_debug_create_bw(opp, opp_table, d);
diff --git a/drivers/opp/opp.h b/drivers/opp/opp.h index 407c3bfe51d9..45e3a55239a1 100644 --- a/drivers/opp/opp.h +++ b/drivers/opp/opp.h @@ -96,6 +96,7 @@ struct dev_pm_opp {
#ifdef CONFIG_DEBUG_FS struct dentry *dentry; + const char *of_name; #endif };
From: Jérôme Pouiller jerome.pouiller@silabs.com
[ Upstream commit 96e0cbca1cb96e9d3deac3051aa816e13082f3fd ]
Until now, the SDIO quirks are applied directly from the driver. However, it is better to apply the quirks before driver probing. So, this patch relocate the quirks in the MMC framework.
Note that the WF200 has no valid SDIO VID/PID. Therefore, we match DT rather than on the SDIO VID/PID.
Reviewed-by: Pali Rohár pali@kernel.org Reviewed-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Jérôme Pouiller jerome.pouiller@silabs.com Link: https://lore.kernel.org/r/20220216093112.92469-3-Jerome.Pouiller@silabs.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mmc/core/quirks.h | 5 +++++ drivers/staging/wfx/bus_sdio.c | 3 --- 2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/mmc/core/quirks.h b/drivers/mmc/core/quirks.h index 20f568727277..f879dc63d936 100644 --- a/drivers/mmc/core/quirks.h +++ b/drivers/mmc/core/quirks.h @@ -149,6 +149,11 @@ static const struct mmc_fixup __maybe_unused sdio_fixup_methods[] = { static const struct mmc_fixup __maybe_unused sdio_card_init_methods[] = { SDIO_FIXUP_COMPATIBLE("ti,wl1251", wl1251_quirk, 0),
+ SDIO_FIXUP_COMPATIBLE("silabs,wf200", add_quirk, + MMC_QUIRK_BROKEN_BYTE_MODE_512 | + MMC_QUIRK_LENIENT_FN0 | + MMC_QUIRK_BLKSZ_FOR_BYTE_MODE), + END_FIXUP };
diff --git a/drivers/staging/wfx/bus_sdio.c b/drivers/staging/wfx/bus_sdio.c index a670176ba06f..0612f8a7c085 100644 --- a/drivers/staging/wfx/bus_sdio.c +++ b/drivers/staging/wfx/bus_sdio.c @@ -207,9 +207,6 @@ static int wfx_sdio_probe(struct sdio_func *func,
bus->func = func; sdio_set_drvdata(func, bus); - func->card->quirks |= MMC_QUIRK_LENIENT_FN0 | - MMC_QUIRK_BLKSZ_FOR_BYTE_MODE | - MMC_QUIRK_BROKEN_BYTE_MODE_512;
sdio_claim_host(func); ret = sdio_enable_func(func);
From: Xiaoke Wang xkernel.wang@foxmail.com
[ Upstream commit 60f1d3c92dc1ef1026e5b917a329a7fa947da036 ]
One error handler of wfx_init_common() return without calling ieee80211_free_hw(hw), which may result in memory leak. And I add one err label to unify the error handler, which is useful for the subsequent changes.
Suggested-by: Jérôme Pouiller jerome.pouiller@silabs.com Reviewed-by: Dan Carpenter dan.carpenter@oracle.com Reviewed-by: Jérôme Pouiller jerome.pouiller@silabs.com Signed-off-by: Xiaoke Wang xkernel.wang@foxmail.com Link: https://lore.kernel.org/r/tencent_24A24A3EFF61206ECCC4B94B1C5C1454E108@qq.co... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/staging/wfx/main.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/staging/wfx/main.c b/drivers/staging/wfx/main.c index 858d778cc589..e3999e95ce85 100644 --- a/drivers/staging/wfx/main.c +++ b/drivers/staging/wfx/main.c @@ -322,7 +322,8 @@ struct wfx_dev *wfx_init_common(struct device *dev, wdev->pdata.gpio_wakeup = devm_gpiod_get_optional(dev, "wakeup", GPIOD_OUT_LOW); if (IS_ERR(wdev->pdata.gpio_wakeup)) - return NULL; + goto err; + if (wdev->pdata.gpio_wakeup) gpiod_set_consumer_name(wdev->pdata.gpio_wakeup, "wfx wakeup");
@@ -341,6 +342,10 @@ struct wfx_dev *wfx_init_common(struct device *dev, return NULL;
return wdev; + +err: + ieee80211_free_hw(hw); + return NULL; }
int wfx_probe(struct wfx_dev *wdev)
From: Lucas Denefle lucas.denefle@converge.io
[ Upstream commit 41a92a89eee819298f805c40187ad8b02bb53426 ]
w1_seq was failing due to several devices responding to the CHAIN_DONE at the same time. Now properly selects the current device in the chain with MATCH_ROM. Also acknowledgment was read twice.
Signed-off-by: Lucas Denefle lucas.denefle@converge.io Link: https://lore.kernel.org/r/20220223113558.232750-1-lucas.denefle@converge.io Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/w1/slaves/w1_therm.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c index 565578002d79..c7b8a8e787e2 100644 --- a/drivers/w1/slaves/w1_therm.c +++ b/drivers/w1/slaves/w1_therm.c @@ -2089,16 +2089,20 @@ static ssize_t w1_seq_show(struct device *device, if (sl->reg_num.id == reg_num->id) seq = i;
+ if (w1_reset_bus(sl->master)) + goto error; + + /* Put the device into chain DONE state */ + w1_write_8(sl->master, W1_MATCH_ROM); + w1_write_block(sl->master, (u8 *)&rn, 8); w1_write_8(sl->master, W1_42_CHAIN); w1_write_8(sl->master, W1_42_CHAIN_DONE); w1_write_8(sl->master, W1_42_CHAIN_DONE_INV); - w1_read_block(sl->master, &ack, sizeof(ack));
/* check for acknowledgment */ ack = w1_read_8(sl->master); if (ack != W1_42_SUCCESS_CONFIRM_BYTE) goto error; - }
/* Exit from CHAIN state */
From: Xin Xiong xiongx18@fudan.edu.cn
[ Upstream commit b7f114edd54326f730a754547e7cfb197b5bc132 ]
[You don't often get email from xiongx18@fudan.edu.cn. Learn why this is important at http://aka.ms/LearnAboutSenderIdentification.]
The reference counting issue happens in two error paths in the function _nfs42_proc_copy_notify(). In both error paths, the function simply returns the error code and forgets to balance the refcount of object `ctx`, bumped by get_nfs_open_context() earlier, which may cause refcount leaks.
Fix it by balancing refcount of the `ctx` object before the function returns in both error paths.
Signed-off-by: Xin Xiong xiongx18@fudan.edu.cn Signed-off-by: Xiyu Yang xiyuyang19@fudan.edu.cn Signed-off-by: Xin Tan tanxin.ctf@gmail.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs42proc.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c index 32129446beca..ca878d021fab 100644 --- a/fs/nfs/nfs42proc.c +++ b/fs/nfs/nfs42proc.c @@ -591,8 +591,10 @@ static int _nfs42_proc_copy_notify(struct file *src, struct file *dst,
ctx = get_nfs_open_context(nfs_file_open_context(src)); l_ctx = nfs_get_lock_context(ctx); - if (IS_ERR(l_ctx)) - return PTR_ERR(l_ctx); + if (IS_ERR(l_ctx)) { + status = PTR_ERR(l_ctx); + goto out; + }
status = nfs4_set_rw_stateid(&args->cna_src_stateid, ctx, l_ctx, FMODE_READ); @@ -600,7 +602,7 @@ static int _nfs42_proc_copy_notify(struct file *src, struct file *dst, if (status) { if (status == -EAGAIN) status = -NFS4ERR_BAD_STATEID; - return status; + goto out; }
status = nfs4_call_sync(src_server->client, src_server, &msg, @@ -609,6 +611,7 @@ static int _nfs42_proc_copy_notify(struct file *src, struct file *dst, if (status == -ENOTSUPP) src_server->caps &= ~NFS_CAP_COPY_NOTIFY;
+out: put_nfs_open_context(nfs_file_open_context(src)); return status; }
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 3e17898aca293a24dae757a440a50aa63ca29671 ]
If memory allocation triggers a direct reclaim from the state recovery thread, then we can deadlock. Use memalloc_nofs_save/restore to ensure that doesn't happen.
Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs4state.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index f5a62c0d999b..0f4818627ef0 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -49,6 +49,7 @@ #include <linux/workqueue.h> #include <linux/bitops.h> #include <linux/jiffies.h> +#include <linux/sched/mm.h>
#include <linux/sunrpc/clnt.h>
@@ -2560,9 +2561,17 @@ static void nfs4_layoutreturn_any_run(struct nfs_client *clp)
static void nfs4_state_manager(struct nfs_client *clp) { + unsigned int memflags; int status = 0; const char *section = "", *section_sep = "";
+ /* + * State recovery can deadlock if the direct reclaim code tries + * start NFS writeback. So ensure memory allocations are all + * GFP_NOFS. + */ + memflags = memalloc_nofs_save(); + /* Ensure exclusive access to NFSv4 state */ do { trace_nfs4_state_mgr(clp); @@ -2657,6 +2666,7 @@ static void nfs4_state_manager(struct nfs_client *clp) clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &clp->cl_state); }
+ memalloc_nofs_restore(memflags); nfs4_end_drain_session(clp); nfs4_clear_state_manager_bit(clp);
@@ -2674,6 +2684,7 @@ static void nfs4_state_manager(struct nfs_client *clp) return; if (test_and_set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state) != 0) return; + memflags = memalloc_nofs_save(); } while (refcount_read(&clp->cl_count) > 1 && !signalled()); goto out_drain;
@@ -2686,6 +2697,7 @@ static void nfs4_state_manager(struct nfs_client *clp) clp->cl_hostname, -status); ssleep(1); out_drain: + memalloc_nofs_restore(memflags); nfs4_end_drain_session(clp); nfs4_clear_state_manager_bit(clp); }
From: Ohad Sharabi osharabi@habana.ai
[ Upstream commit eb85eec858c1a5c11d3a0bff403f6440b05b40dc ]
This patch fixes what seems to be copy paste error.
We will have a memory leak if the host-resident shadow is NULL (which will likely happen as the DR and HR are not dependent).
Signed-off-by: Ohad Sharabi osharabi@habana.ai Reviewed-by: Oded Gabbay ogabbay@kernel.org Signed-off-by: Oded Gabbay ogabbay@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/misc/habanalabs/common/mmu/mmu_v1.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/misc/habanalabs/common/mmu/mmu_v1.c b/drivers/misc/habanalabs/common/mmu/mmu_v1.c index 6134b6ae7615..3cadef97817d 100644 --- a/drivers/misc/habanalabs/common/mmu/mmu_v1.c +++ b/drivers/misc/habanalabs/common/mmu/mmu_v1.c @@ -467,7 +467,7 @@ static void hl_mmu_v1_fini(struct hl_device *hdev) { /* MMU H/W fini was already done in device hw_fini() */
- if (!ZERO_OR_NULL_PTR(hdev->mmu_priv.hr.mmu_shadow_hop0)) { + if (!ZERO_OR_NULL_PTR(hdev->mmu_priv.dr.mmu_shadow_hop0)) { kvfree(hdev->mmu_priv.dr.mmu_shadow_hop0); gen_pool_destroy(hdev->mmu_priv.dr.mmu_pgt_pool);
From: Oded Gabbay ogabbay@kernel.org
[ Upstream commit 9a79e3e4a3637c07352d9723b825490a1b04391f ]
This is not something we can do a workaround. It is clearly an error and we should notify the user that it is an error.
Signed-off-by: Oded Gabbay ogabbay@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/misc/habanalabs/common/memory.c | 30 +++++++++---------------- 1 file changed, 11 insertions(+), 19 deletions(-)
diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c index c1eefaebacb6..bcc6e547e071 100644 --- a/drivers/misc/habanalabs/common/memory.c +++ b/drivers/misc/habanalabs/common/memory.c @@ -1967,16 +1967,15 @@ static int export_dmabuf_from_handle(struct hl_ctx *ctx, u64 handle, int flags, static int mem_ioctl_no_mmu(struct hl_fpriv *hpriv, union hl_mem_args *args) { struct hl_device *hdev = hpriv->hdev; - struct hl_ctx *ctx = hpriv->ctx; u64 block_handle, device_addr = 0; + struct hl_ctx *ctx = hpriv->ctx; u32 handle = 0, block_size; - int rc, dmabuf_fd = -EBADF; + int rc;
switch (args->in.op) { case HL_MEM_OP_ALLOC: if (args->in.alloc.mem_size == 0) { - dev_err(hdev->dev, - "alloc size must be larger than 0\n"); + dev_err(hdev->dev, "alloc size must be larger than 0\n"); rc = -EINVAL; goto out; } @@ -1997,15 +1996,14 @@ static int mem_ioctl_no_mmu(struct hl_fpriv *hpriv, union hl_mem_args *args)
case HL_MEM_OP_MAP: if (args->in.flags & HL_MEM_USERPTR) { - device_addr = args->in.map_host.host_virt_addr; - rc = 0; + dev_err(hdev->dev, "Failed to map host memory when MMU is disabled\n"); + rc = -EPERM; } else { - rc = get_paddr_from_handle(ctx, &args->in, - &device_addr); + rc = get_paddr_from_handle(ctx, &args->in, &device_addr); + memset(args, 0, sizeof(*args)); + args->out.device_virt_addr = device_addr; }
- memset(args, 0, sizeof(*args)); - args->out.device_virt_addr = device_addr; break;
case HL_MEM_OP_UNMAP: @@ -2013,20 +2011,14 @@ static int mem_ioctl_no_mmu(struct hl_fpriv *hpriv, union hl_mem_args *args) break;
case HL_MEM_OP_MAP_BLOCK: - rc = map_block(hdev, args->in.map_block.block_addr, - &block_handle, &block_size); + rc = map_block(hdev, args->in.map_block.block_addr, &block_handle, &block_size); args->out.block_handle = block_handle; args->out.block_size = block_size; break;
case HL_MEM_OP_EXPORT_DMABUF_FD: - rc = export_dmabuf_from_addr(ctx, - args->in.export_dmabuf_fd.handle, - args->in.export_dmabuf_fd.mem_size, - args->in.flags, - &dmabuf_fd); - memset(args, 0, sizeof(*args)); - args->out.fd = dmabuf_fd; + dev_err(hdev->dev, "Failed to export dma-buf object when MMU is disabled\n"); + rc = -EPERM; break;
default:
From: Oded Gabbay ogabbay@kernel.org
[ Upstream commit 26ef1c000bc21a192618c9ec651dd36ba63ca00c ]
Various AXI errors can occur in the NIC engines and are reported to the driver by the f/w. Add code to print the errors and ack them to the f/w.
Signed-off-by: Oded Gabbay ogabbay@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/misc/habanalabs/gaudi/gaudi.c | 48 +++++++++++++++++++++++++++ 1 file changed, 48 insertions(+)
diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c index 013c6da2e3ca..b4dacea80151 100644 --- a/drivers/misc/habanalabs/gaudi/gaudi.c +++ b/drivers/misc/habanalabs/gaudi/gaudi.c @@ -7819,6 +7819,48 @@ static void gaudi_print_fw_alive_info(struct hl_device *hdev, fw_alive->thread_id, fw_alive->uptime_seconds); }
+static void gaudi_print_nic_axi_irq_info(struct hl_device *hdev, u16 event_type, + void *data) +{ + char desc[64] = "", *type; + struct eq_nic_sei_event *eq_nic_sei = data; + u16 nic_id = event_type - GAUDI_EVENT_NIC_SEI_0; + + switch (eq_nic_sei->axi_error_cause) { + case RXB: + type = "RXB"; + break; + case RXE: + type = "RXE"; + break; + case TXS: + type = "TXS"; + break; + case TXE: + type = "TXE"; + break; + case QPC_RESP: + type = "QPC_RESP"; + break; + case NON_AXI_ERR: + type = "NON_AXI_ERR"; + break; + case TMR: + type = "TMR"; + break; + default: + dev_err(hdev->dev, "unknown NIC AXI cause %d\n", + eq_nic_sei->axi_error_cause); + type = "N/A"; + break; + } + + snprintf(desc, sizeof(desc), "NIC%d_%s%d", nic_id, type, + eq_nic_sei->id); + dev_err_ratelimited(hdev->dev, "Received H/W interrupt %d ["%s"]\n", + event_type, desc); +} + static int gaudi_non_hard_reset_late_init(struct hl_device *hdev) { /* GAUDI doesn't support any reset except hard-reset */ @@ -8066,6 +8108,7 @@ static void gaudi_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_entry) { struct gaudi_device *gaudi = hdev->asic_specific; + u64 data = le64_to_cpu(eq_entry->data[0]); u32 ctl = le32_to_cpu(eq_entry->hdr.ctl); u32 fw_fatal_err_flag = 0; u16 event_type = ((ctl & EQ_CTL_EVENT_TYPE_MASK) @@ -8263,6 +8306,11 @@ static void gaudi_handle_eqe(struct hl_device *hdev, hl_fw_unmask_irq(hdev, event_type); break;
+ case GAUDI_EVENT_NIC_SEI_0 ... GAUDI_EVENT_NIC_SEI_4: + gaudi_print_nic_axi_irq_info(hdev, event_type, &data); + hl_fw_unmask_irq(hdev, event_type); + break; + case GAUDI_EVENT_DMA_IF_SEI_0 ... GAUDI_EVENT_DMA_IF_SEI_3: gaudi_print_irq_info(hdev, event_type, false); gaudi_print_sm_sei_info(hdev, event_type,
From: Dongli Zhang dongli.zhang@oracle.com
[ Upstream commit eed05744322da07dd7e419432dcedf3c2e017179 ]
The sched_clock() can be used very early since commit 857baa87b642 ("sched/clock: Enable sched clock early"). In addition, with commit 38669ba205d1 ("x86/xen/time: Output xen sched_clock time from 0"), kdump kernel in Xen HVM guest may panic at very early stage when accessing &__this_cpu_read(xen_vcpu)->time as in below:
setup_arch() -> init_hypervisor_platform() -> x86_init.hyper.init_platform = xen_hvm_guest_init() -> xen_hvm_init_time_ops() -> xen_clocksource_read() -> src = &__this_cpu_read(xen_vcpu)->time;
This is because Xen HVM supports at most MAX_VIRT_CPUS=32 'vcpu_info' embedded inside 'shared_info' during early stage until xen_vcpu_setup() is used to allocate/relocate 'vcpu_info' for boot cpu at arbitrary address.
However, when Xen HVM guest panic on vcpu >= 32, since xen_vcpu_info_reset(0) would set per_cpu(xen_vcpu, cpu) = NULL when vcpu >= 32, xen_clocksource_read() on vcpu >= 32 would panic.
This patch calls xen_hvm_init_time_ops() again later in xen_hvm_smp_prepare_boot_cpu() after the 'vcpu_info' for boot vcpu is registered when the boot vcpu is >= 32.
This issue can be reproduced on purpose via below command at the guest side when kdump/kexec is enabled:
"taskset -c 33 echo c > /proc/sysrq-trigger"
The bugfix for PVM is not implemented due to the lack of testing environment.
[boris: xen_hvm_init_time_ops() returns on errors instead of jumping to end]
Cc: Joe Jin joe.jin@oracle.com Signed-off-by: Dongli Zhang dongli.zhang@oracle.com Reviewed-by: Boris Ostrovsky boris.ostrovsky@oracle.com Link: https://lore.kernel.org/r/20220302164032.14569-3-dongli.zhang@oracle.com Signed-off-by: Boris Ostrovsky boris.ostrovsky@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/xen/smp_hvm.c | 6 ++++++ arch/x86/xen/time.c | 24 +++++++++++++++++++++++- 2 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/arch/x86/xen/smp_hvm.c b/arch/x86/xen/smp_hvm.c index 6ff3c887e0b9..b70afdff419c 100644 --- a/arch/x86/xen/smp_hvm.c +++ b/arch/x86/xen/smp_hvm.c @@ -19,6 +19,12 @@ static void __init xen_hvm_smp_prepare_boot_cpu(void) */ xen_vcpu_setup(0);
+ /* + * Called again in case the kernel boots on vcpu >= MAX_VIRT_CPUS. + * Refer to comments in xen_hvm_init_time_ops(). + */ + xen_hvm_init_time_ops(); + /* * The alternative logic (which patches the unlock/lock) runs before * the smp bootup up code is activated. Hence we need to set this up diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c index d9c945ee1100..9ef0a5cca96e 100644 --- a/arch/x86/xen/time.c +++ b/arch/x86/xen/time.c @@ -558,6 +558,11 @@ static void xen_hvm_setup_cpu_clockevents(void)
void __init xen_hvm_init_time_ops(void) { + static bool hvm_time_initialized; + + if (hvm_time_initialized) + return; + /* * vector callback is needed otherwise we cannot receive interrupts * on cpu > 0 and at this point we don't know how many cpus are @@ -567,7 +572,22 @@ void __init xen_hvm_init_time_ops(void) return;
if (!xen_feature(XENFEAT_hvm_safe_pvclock)) { - pr_info("Xen doesn't support pvclock on HVM, disable pv timer"); + pr_info_once("Xen doesn't support pvclock on HVM, disable pv timer"); + return; + } + + /* + * Only MAX_VIRT_CPUS 'vcpu_info' are embedded inside 'shared_info'. + * The __this_cpu_read(xen_vcpu) is still NULL when Xen HVM guest + * boots on vcpu >= MAX_VIRT_CPUS (e.g., kexec), To access + * __this_cpu_read(xen_vcpu) via xen_clocksource_read() will panic. + * + * The xen_hvm_init_time_ops() should be called again later after + * __this_cpu_read(xen_vcpu) is available. + */ + if (!__this_cpu_read(xen_vcpu)) { + pr_info("Delay xen_init_time_common() as kernel is running on vcpu=%d\n", + xen_vcpu_nr(0)); return; }
@@ -577,6 +597,8 @@ void __init xen_hvm_init_time_ops(void) x86_cpuinit.setup_percpu_clockev = xen_hvm_setup_cpu_clockevents;
x86_platform.set_wallclock = xen_set_wallclock; + + hvm_time_initialized = true; } #endif
From: Tony Lindgren tony@atomide.com
[ Upstream commit 80864594ff2ad002e2755daf97d46ff0c86faf1f ]
In preparation for making use of the clock-output-names, we want to keep node around in ti_dt_clocks_register().
This change should not needed as a fix currently.
Signed-off-by: Tony Lindgren tony@atomide.com Link: https://lore.kernel.org/r/20220204071449.16762-3-tony@atomide.com Signed-off-by: Stephen Boyd sboyd@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clk/ti/clk.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/drivers/clk/ti/clk.c b/drivers/clk/ti/clk.c index 3da33c786d77..29eafab4353e 100644 --- a/drivers/clk/ti/clk.c +++ b/drivers/clk/ti/clk.c @@ -131,7 +131,7 @@ int ti_clk_setup_ll_ops(struct ti_clk_ll_ops *ops) void __init ti_dt_clocks_register(struct ti_dt_clk oclks[]) { struct ti_dt_clk *c; - struct device_node *node, *parent; + struct device_node *node, *parent, *child; struct clk *clk; struct of_phandle_args clkspec; char buf[64]; @@ -171,10 +171,13 @@ void __init ti_dt_clocks_register(struct ti_dt_clk oclks[]) node = of_find_node_by_name(NULL, buf); if (num_args && compat_mode) { parent = node; - node = of_get_child_by_name(parent, "clock"); - if (!node) - node = of_get_child_by_name(parent, "clk"); - of_node_put(parent); + child = of_get_child_by_name(parent, "clock"); + if (!child) + child = of_get_child_by_name(parent, "clk"); + if (child) { + of_node_put(parent); + node = child; + } }
clkspec.np = node;
From: Maxime Ripard maxime@cerno.tech
[ Upstream commit 10c46f2ea914202482d19cf80dcc9c321c9ff59b ]
If we were to have two users of the same clock, doing something like:
clk_set_rate_range(user1, 1000, 2000); clk_set_rate_range(user2, 3000, 4000);
The second call would fail with -EINVAL, preventing from getting in a situation where we end up with impossible limits.
However, this is never explicitly checked against and enforced, and works by relying on an undocumented behaviour of clk_set_rate().
Indeed, on the first clk_set_rate_range will make sure the current clock rate is within the new range, so it will be between 1000 and 2000Hz. On the second clk_set_rate_range(), it will consider (rightfully), that our current clock is outside of the 3000-4000Hz range, and will call clk_core_set_rate_nolock() to set it to 3000Hz.
clk_core_set_rate_nolock() will then call clk_calc_new_rates() that will eventually check that our rate 3000Hz rate is outside the min 3000Hz max 2000Hz range, will bail out, the error will propagate and we'll eventually return -EINVAL.
This solely relies on the fact that clk_calc_new_rates(), and in particular clk_core_determine_round_nolock(), won't modify the new rate allowing the error to be reported. That assumption won't be true for all drivers, and most importantly we'll break that assumption in a later patch.
It can also be argued that we shouldn't even reach the point where we're calling clk_core_set_rate_nolock().
Let's make an explicit check for disjoints range before we're doing anything.
Signed-off-by: Maxime Ripard maxime@cerno.tech Link: https://lore.kernel.org/r/20220225143534.405820-4-maxime@cerno.tech Signed-off-by: Stephen Boyd sboyd@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clk/clk.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+)
diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c index 01b64b962e76..2fdfce116087 100644 --- a/drivers/clk/clk.c +++ b/drivers/clk/clk.c @@ -632,6 +632,24 @@ static void clk_core_get_boundaries(struct clk_core *core, *max_rate = min(*max_rate, clk_user->max_rate); }
+static bool clk_core_check_boundaries(struct clk_core *core, + unsigned long min_rate, + unsigned long max_rate) +{ + struct clk *user; + + lockdep_assert_held(&prepare_lock); + + if (min_rate > core->max_rate || max_rate < core->min_rate) + return false; + + hlist_for_each_entry(user, &core->clks, clks_node) + if (min_rate > user->max_rate || max_rate < user->min_rate) + return false; + + return true; +} + void clk_hw_set_rate_range(struct clk_hw *hw, unsigned long min_rate, unsigned long max_rate) { @@ -2348,6 +2366,11 @@ int clk_set_rate_range(struct clk *clk, unsigned long min, unsigned long max) clk->min_rate = min; clk->max_rate = max;
+ if (!clk_core_check_boundaries(clk->core, min, max)) { + ret = -EINVAL; + goto out; + } + rate = clk_core_get_rate_nolock(clk->core); if (rate < min || rate > max) { /* @@ -2376,6 +2399,7 @@ int clk_set_rate_range(struct clk *clk, unsigned long min, unsigned long max) } }
+out: if (clk->exclusive_count) clk_core_rate_protect(clk->core);
From: NeilBrown neilb@suse.de
[ Upstream commit a721035477fb5fb8abc738fbe410b07c12af3dc5 ]
When memory is short, new worker threads cannot be created and we depend on the minimum one rpciod thread to be able to handle everything. So it must not block waiting for memory.
xprt_dynamic_alloc_slot can block indefinitely. This can tie up all workqueue threads and NFS can deadlock. So when called from a workqueue, set __GFP_NORETRY.
The rdma alloc_slot already does not block. However it sets the error to -EAGAIN suggesting this will trigger a sleep. It does not. As we can see in call_reserveresult(), only -ENOMEM causes a sleep. -EAGAIN causes immediate retry.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sunrpc/xprt.c | 5 ++++- net/sunrpc/xprtrdma/transport.c | 2 +- 2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index 396a74974f60..75acde97d748 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1690,12 +1690,15 @@ static bool xprt_throttle_congested(struct rpc_xprt *xprt, struct rpc_task *task static struct rpc_rqst *xprt_dynamic_alloc_slot(struct rpc_xprt *xprt) { struct rpc_rqst *req = ERR_PTR(-EAGAIN); + gfp_t gfp_mask = GFP_KERNEL;
if (xprt->num_reqs >= xprt->max_reqs) goto out; ++xprt->num_reqs; spin_unlock(&xprt->reserve_lock); - req = kzalloc(sizeof(struct rpc_rqst), GFP_NOFS); + if (current->flags & PF_WQ_WORKER) + gfp_mask |= __GFP_NORETRY | __GFP_NOWARN; + req = kzalloc(sizeof(*req), gfp_mask); spin_lock(&xprt->reserve_lock); if (req != NULL) goto out; diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c index ff78a296fa81..6b7e10e5a141 100644 --- a/net/sunrpc/xprtrdma/transport.c +++ b/net/sunrpc/xprtrdma/transport.c @@ -521,7 +521,7 @@ xprt_rdma_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task) return;
out_sleep: - task->tk_status = -EAGAIN; + task->tk_status = -ENOMEM; xprt_add_backlog(xprt, task); }
From: NeilBrown neilb@suse.de
[ Upstream commit a80a8461868905823609be97f91776a26befe839 ]
Currently, tasks marked as "swapper" tasks get put to the front of non-priority rpc_queues, and are sorted earlier than non-swapper tasks on the transport's ->xmit_queue.
This is pointless as currently *all* tasks for a mount that has swap enabled on *any* file are marked as "swapper" tasks. So the net result is that the non-priority rpc_queues are reverse-ordered (LIFO).
This scheduling boost is not necessary to avoid deadlocks, and hurts fairness, so remove it. If there were a need to expedite some requests, the tk_priority mechanism is a more appropriate tool.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sunrpc/sched.c | 7 ------- net/sunrpc/xprt.c | 11 ----------- 2 files changed, 18 deletions(-)
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c index ae295844ac55..9020cedb7c95 100644 --- a/net/sunrpc/sched.c +++ b/net/sunrpc/sched.c @@ -186,11 +186,6 @@ static void __rpc_add_wait_queue_priority(struct rpc_wait_queue *queue,
/* * Add new request to wait queue. - * - * Swapper tasks always get inserted at the head of the queue. - * This should avoid many nasty memory deadlocks and hopefully - * improve overall performance. - * Everyone else gets appended to the queue to ensure proper FIFO behavior. */ static void __rpc_add_wait_queue(struct rpc_wait_queue *queue, struct rpc_task *task, @@ -199,8 +194,6 @@ static void __rpc_add_wait_queue(struct rpc_wait_queue *queue, INIT_LIST_HEAD(&task->u.tk_wait.timer_list); if (RPC_IS_PRIORITY(queue)) __rpc_add_wait_queue_priority(queue, task, queue_priority); - else if (RPC_IS_SWAPPER(task)) - list_add(&task->u.tk_wait.list, &queue->tasks[0]); else list_add_tail(&task->u.tk_wait.list, &queue->tasks[0]); task->tk_waitqueue = queue; diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index 75acde97d748..b1bb466bbdda 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1354,17 +1354,6 @@ xprt_request_enqueue_transmit(struct rpc_task *task) INIT_LIST_HEAD(&req->rq_xmit2); goto out; } - } else if (RPC_IS_SWAPPER(task)) { - list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) { - if (pos->rq_cong || pos->rq_bytes_sent) - continue; - if (RPC_IS_SWAPPER(pos->rq_task)) - continue; - /* Note: req is added _before_ pos */ - list_add_tail(&req->rq_xmit, &pos->rq_xmit); - INIT_LIST_HEAD(&req->rq_xmit2); - goto out; - } } else if (!req->rq_seqno) { list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) { if (pos->rq_task->tk_owner != task->tk_owner)
From: NeilBrown neilb@suse.de
[ Upstream commit 64158668ac8b31626a8ce48db4cad08496eb8340 ]
1/ Taking the i_rwsem for swap IO triggers lockdep warnings regarding possible deadlocks with "fs_reclaim". These deadlocks could, I believe, eventuate if a buffered read on the swapfile was attempted.
We don't need coherence with the page cache for a swap file, and buffered writes are forbidden anyway. There is no other need for i_rwsem during direct IO. So never take it for swap_rw()
2/ generic_write_checks() explicitly forbids writes to swap, and performs checks that are not needed for swap. So bypass it for swap_rw().
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/direct.c | 42 ++++++++++++++++++++++++++++-------------- fs/nfs/file.c | 4 ++-- include/linux/nfs_fs.h | 8 ++++---- 3 files changed, 34 insertions(+), 20 deletions(-)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index eabfdab543c8..04aaf39a05cb 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -173,8 +173,8 @@ ssize_t nfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) VM_BUG_ON(iov_iter_count(iter) != PAGE_SIZE);
if (iov_iter_rw(iter) == READ) - return nfs_file_direct_read(iocb, iter); - return nfs_file_direct_write(iocb, iter); + return nfs_file_direct_read(iocb, iter, true); + return nfs_file_direct_write(iocb, iter, true); }
static void nfs_direct_release_pages(struct page **pages, unsigned int npages) @@ -425,6 +425,7 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq, * nfs_file_direct_read - file direct read operation for NFS files * @iocb: target I/O control block * @iter: vector of user buffers into which to read data + * @swap: flag indicating this is swap IO, not O_DIRECT IO * * We use this function for direct reads instead of calling * generic_file_aio_read() in order to avoid gfar's check to see if @@ -440,7 +441,8 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq, * client must read the updated atime from the server back into its * cache. */ -ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter) +ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter, + bool swap) { struct file *file = iocb->ki_filp; struct address_space *mapping = file->f_mapping; @@ -482,12 +484,14 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter) if (iter_is_iovec(iter)) dreq->flags = NFS_ODIRECT_SHOULD_DIRTY;
- nfs_start_io_direct(inode); + if (!swap) + nfs_start_io_direct(inode);
NFS_I(inode)->read_io += count; requested = nfs_direct_read_schedule_iovec(dreq, iter, iocb->ki_pos);
- nfs_end_io_direct(inode); + if (!swap) + nfs_end_io_direct(inode);
if (requested > 0) { result = nfs_direct_wait(dreq); @@ -876,6 +880,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, * nfs_file_direct_write - file direct write operation for NFS files * @iocb: target I/O control block * @iter: vector of user buffers from which to write data + * @swap: flag indicating this is swap IO, not O_DIRECT IO * * We use this function for direct writes instead of calling * generic_file_aio_write() in order to avoid taking the inode @@ -892,7 +897,8 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, * Note that O_APPEND is not supported for NFS direct writes, as there * is no atomic O_APPEND write facility in the NFS protocol. */ -ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter) +ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter, + bool swap) { ssize_t result, requested; size_t count; @@ -906,7 +912,11 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter) dfprintk(FILE, "NFS: direct write(%pD2, %zd@%Ld)\n", file, iov_iter_count(iter), (long long) iocb->ki_pos);
- result = generic_write_checks(iocb, iter); + if (swap) + /* bypass generic checks */ + result = iov_iter_count(iter); + else + result = generic_write_checks(iocb, iter); if (result <= 0) return result; count = result; @@ -937,16 +947,20 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter) dreq->iocb = iocb; pnfs_init_ds_commit_info_ops(&dreq->ds_cinfo, inode);
- nfs_start_io_direct(inode); + if (swap) { + requested = nfs_direct_write_schedule_iovec(dreq, iter, pos); + } else { + nfs_start_io_direct(inode);
- requested = nfs_direct_write_schedule_iovec(dreq, iter, pos); + requested = nfs_direct_write_schedule_iovec(dreq, iter, pos);
- if (mapping->nrpages) { - invalidate_inode_pages2_range(mapping, - pos >> PAGE_SHIFT, end); - } + if (mapping->nrpages) { + invalidate_inode_pages2_range(mapping, + pos >> PAGE_SHIFT, end); + }
- nfs_end_io_direct(inode); + nfs_end_io_direct(inode); + }
if (requested > 0) { result = nfs_direct_wait(dreq); diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 76d76acbc594..d8583f57ff99 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -162,7 +162,7 @@ nfs_file_read(struct kiocb *iocb, struct iov_iter *to) ssize_t result;
if (iocb->ki_flags & IOCB_DIRECT) - return nfs_file_direct_read(iocb, to); + return nfs_file_direct_read(iocb, to, false);
dprintk("NFS: read(%pD2, %zu@%lu)\n", iocb->ki_filp, @@ -619,7 +619,7 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from) return result;
if (iocb->ki_flags & IOCB_DIRECT) - return nfs_file_direct_write(iocb, from); + return nfs_file_direct_write(iocb, from, false);
dprintk("NFS: write(%pD2, %zu@%Ld)\n", file, iov_iter_count(from), (long long) iocb->ki_pos); diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 68f81d8d36de..161e4f5ea7a0 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -513,10 +513,10 @@ static inline const struct cred *nfs_file_cred(struct file *file) * linux/fs/nfs/direct.c */ extern ssize_t nfs_direct_IO(struct kiocb *, struct iov_iter *); -extern ssize_t nfs_file_direct_read(struct kiocb *iocb, - struct iov_iter *iter); -extern ssize_t nfs_file_direct_write(struct kiocb *iocb, - struct iov_iter *iter); +ssize_t nfs_file_direct_read(struct kiocb *iocb, + struct iov_iter *iter, bool swap); +ssize_t nfs_file_direct_write(struct kiocb *iocb, + struct iov_iter *iter, bool swap);
/* * linux/fs/nfs/dir.c
From: NeilBrown neilb@suse.de
[ Upstream commit c265de257f558a05c1859ee9e3fed04883b9ec0e ]
The commit handling code is not safe against memory-pressure deadlocks when writing to swap. In particular, nfs_commitdata_alloc() blocks indefinitely waiting for memory, and this can consume all available workqueue threads.
swap-out most likely uses STABLE writes anyway as COND_STABLE indicates that a stable write should be used if the write fits in a single request, and it normally does. However if we ever swap with a small wsize, or gather unusually large numbers of pages for a single write, this might change.
For safety, make it explicit in the code that direct writes used for swap must always use FLUSH_STABLE.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/direct.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index 04aaf39a05cb..11c566d8769f 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -794,7 +794,7 @@ static const struct nfs_pgio_completion_ops nfs_direct_write_completion_ops = { */ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, struct iov_iter *iter, - loff_t pos) + loff_t pos, int ioflags) { struct nfs_pageio_descriptor desc; struct inode *inode = dreq->inode; @@ -802,7 +802,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, size_t requested_bytes = 0; size_t wsize = max_t(size_t, NFS_SERVER(inode)->wsize, PAGE_SIZE);
- nfs_pageio_init_write(&desc, inode, FLUSH_COND_STABLE, false, + nfs_pageio_init_write(&desc, inode, ioflags, false, &nfs_direct_write_completion_ops); desc.pg_dreq = dreq; get_dreq(dreq); @@ -948,11 +948,13 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter, pnfs_init_ds_commit_info_ops(&dreq->ds_cinfo, inode);
if (swap) { - requested = nfs_direct_write_schedule_iovec(dreq, iter, pos); + requested = nfs_direct_write_schedule_iovec(dreq, iter, pos, + FLUSH_STABLE); } else { nfs_start_io_direct(inode);
- requested = nfs_direct_write_schedule_iovec(dreq, iter, pos); + requested = nfs_direct_write_schedule_iovec(dreq, iter, pos, + FLUSH_COND_STABLE);
if (mapping->nrpages) { invalidate_inode_pages2_range(mapping,
From: Peter Zijlstra peterz@infradead.org
[ Upstream commit be0075951fde739f14ee2b659e2fd6e2499c46c0 ]
vmlinux.o: warning: objtool: page_fault_oops()+0x13c: unreachable instruction
0000 000000000005b460 <page_fault_oops>: ... 0128 5b588: 49 89 23 mov %rsp,(%r11) 012b 5b58b: 4c 89 dc mov %r11,%rsp 012e 5b58e: 4c 89 f2 mov %r14,%rdx 0131 5b591: 48 89 ee mov %rbp,%rsi 0134 5b594: 4c 89 e7 mov %r12,%rdi 0137 5b597: e8 00 00 00 00 call 5b59c <page_fault_oops+0x13c> 5b598: R_X86_64_PLT32 handle_stack_overflow-0x4 013c 5b59c: 5c pop %rsp
vmlinux.o: warning: objtool: sysvec_reboot()+0x6d: unreachable instruction
0000 00000000000033f0 <sysvec_reboot>: ... 005d 344d: 4c 89 dc mov %r11,%rsp 0060 3450: e8 00 00 00 00 call 3455 <sysvec_reboot+0x65> 3451: R_X86_64_PLT32 irq_enter_rcu-0x4 0065 3455: 48 89 ef mov %rbp,%rdi 0068 3458: e8 00 00 00 00 call 345d <sysvec_reboot+0x6d> 3459: R_X86_64_PC32 .text+0x47d0c 006d 345d: e8 00 00 00 00 call 3462 <sysvec_reboot+0x72> 345e: R_X86_64_PLT32 irq_exit_rcu-0x4 0072 3462: 5c pop %rsp
Both cases are due to a call_on_stack() calling a __noreturn function. Since that's an inline asm, GCC can't do anything about the instructions after the CALL. Therefore put in an explicit ASM_REACHABLE annotation to make sure objtool and gcc are consistently confused about control flow.
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Acked-by: Josh Poimboeuf jpoimboe@redhat.com Link: https://lore.kernel.org/r/20220308154319.468805622@infradead.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/include/asm/irq_stack.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/irq_stack.h b/arch/x86/include/asm/irq_stack.h index ae9d40f6c706..05af249d6bec 100644 --- a/arch/x86/include/asm/irq_stack.h +++ b/arch/x86/include/asm/irq_stack.h @@ -99,7 +99,8 @@ }
#define ASM_CALL_ARG0 \ - "call %P[__func] \n" + "call %P[__func] \n" \ + ASM_REACHABLE
#define ASM_CALL_ARG1 \ "movq %[arg1], %%rdi \n" \
From: Nathan Chancellor nathan@kernel.org
[ Upstream commit aaeed6ecc1253ce1463fa1aca0b70a4ccbc9fa75 ]
There are two outstanding issues with CONFIG_X86_X32_ABI and llvm-objcopy, with similar root causes:
1. llvm-objcopy does not properly convert .note.gnu.property when going from x86_64 to x86_x32, resulting in a corrupted section when linking:
https://github.com/ClangBuiltLinux/linux/issues/1141
2. llvm-objcopy produces corrupted compressed debug sections when going from x86_64 to x86_x32, also resulting in an error when linking:
https://github.com/ClangBuiltLinux/linux/issues/514
After commit 41c5ef31ad71 ("x86/ibt: Base IBT bits"), the .note.gnu.property section is always generated when CONFIG_X86_KERNEL_IBT is enabled, which causes the first issue to become visible with an allmodconfig build:
ld.lld: error: arch/x86/entry/vdso/vclock_gettime-x32.o:(.note.gnu.property+0x1c): program property is too short
To avoid this error, do not allow CONFIG_X86_X32_ABI to be selected when using llvm-objcopy. If the two issues ever get fixed in llvm-objcopy, this can be turned into a feature check.
Signed-off-by: Nathan Chancellor nathan@kernel.org Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Link: https://lore.kernel.org/r/20220314194842.3452-3-nathan@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/Kconfig | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 9f5bd41bf660..d0ecc4005df3 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -2837,6 +2837,11 @@ config IA32_AOUT config X86_X32 bool "x32 ABI for 64-bit mode" depends on X86_64 + # llvm-objcopy does not convert x86_64 .note.gnu.property or + # compressed debug sections to x86_x32 properly: + # https://github.com/ClangBuiltLinux/linux/issues/514 + # https://github.com/ClangBuiltLinux/linux/issues/1141 + depends on $(success,$(OBJCOPY) --version | head -n1 | grep -qv llvm) help Include code to run binaries for the x32 native 32-bit ABI for 64-bit processors. An x32 process gets access to the
From: Jiri Slaby jslaby@suse.cz
[ Upstream commit 988c7c00691008ea1daaa1235680a0da49dab4e8 ]
The commit c15c3747ee32 (serial: samsung: fix potential soft lockup during uart write) added an unlock of port->lock before uart_write_wakeup() and a lock after it. It was always problematic to write data from tty_ldisc_ops::write_wakeup and it was even documented that way. We fixed the line disciplines to conform to this recently. So if there is still a missed one, we should fix them instead of this workaround.
On the top of that, s3c24xx_serial_tx_dma_complete() in this driver still holds the port->lock while calling uart_write_wakeup().
So revert the wrap added by the commit above.
Cc: Thomas Abraham thomas.abraham@linaro.org Cc: Kyungmin Park kyungmin.park@samsung.com Cc: Hyeonkook Kim hk619.kim@samsung.com Signed-off-by: Jiri Slaby jslaby@suse.cz Link: https://lore.kernel.org/r/20220308115153.4225-1-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/tty/serial/samsung_tty.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/tty/serial/samsung_tty.c b/drivers/tty/serial/samsung_tty.c index d002a4e48ed9..0d94a7cb275e 100644 --- a/drivers/tty/serial/samsung_tty.c +++ b/drivers/tty/serial/samsung_tty.c @@ -921,11 +921,8 @@ static void s3c24xx_serial_tx_chars(struct s3c24xx_uart_port *ourport) return; }
- if (uart_circ_chars_pending(xmit) < WAKEUP_CHARS) { - spin_unlock(&port->lock); + if (uart_circ_chars_pending(xmit) < WAKEUP_CHARS) uart_write_wakeup(port); - spin_lock(&port->lock); - }
if (uart_circ_empty(xmit)) s3c24xx_serial_stop_tx(port);
From: Randy Dunlap rdunlap@infradead.org
[ Upstream commit fefb8a2a941338d871e2d83fbd65fbfa068857bd ]
Eliminate anonymous module_init() and module_exit(), which can lead to confusion or ambiguity when reading System.map, crashes/oops/bugs, or an initcall_debug log.
Give each of these init and exit functions unique driver-specific names to eliminate the anonymous names.
Example 1: (System.map) ffffffff832fc78c t init ffffffff832fc79e t init ffffffff832fc8f8 t init
Example 2: (initcall_debug log) calling init+0x0/0x12 @ 1 initcall init+0x0/0x12 returned 0 after 15 usecs calling init+0x0/0x60 @ 1 initcall init+0x0/0x60 returned 0 after 2 usecs calling init+0x0/0x9a @ 1 initcall init+0x0/0x9a returned 0 after 74 usecs
Signed-off-by: Randy Dunlap rdunlap@infradead.org Reviewed-by: Amit Shah amit@kernel.org Cc: virtualization@lists.linux-foundation.org Cc: Arnd Bergmann arnd@arndb.de Link: https://lore.kernel.org/r/20220316192010.19001-3-rdunlap@infradead.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/char/virtio_console.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c index e3c430539a17..9fa3c76a267f 100644 --- a/drivers/char/virtio_console.c +++ b/drivers/char/virtio_console.c @@ -2245,7 +2245,7 @@ static struct virtio_driver virtio_rproc_serial = { .remove = virtcons_remove, };
-static int __init init(void) +static int __init virtio_console_init(void) { int err;
@@ -2280,7 +2280,7 @@ static int __init init(void) return err; }
-static void __exit fini(void) +static void __exit virtio_console_fini(void) { reclaim_dma_bufs();
@@ -2290,8 +2290,8 @@ static void __exit fini(void) class_destroy(pdrvdata.class); debugfs_remove_recursive(pdrvdata.debugfs_dir); } -module_init(init); -module_exit(fini); +module_init(virtio_console_init); +module_exit(virtio_console_fini);
MODULE_DESCRIPTION("Virtio console driver"); MODULE_LICENSE("GPL");
From: Haimin Zhang tcs_kernel@tencent.com
[ Upstream commit a53046291020ec41e09181396c1e829287b48d47 ]
Add validation check for JFS_IP(ipimap)->i_imap to prevent a NULL deref in diFree since diFree uses it without do any validations. When function jfs_mount calls diMount to initialize fileset inode allocation map, it can fail and JFS_IP(ipimap)->i_imap won't be initialized. Then it calls diFreeSpecial to close fileset inode allocation map inode and it will flow into jfs_evict_inode. Function jfs_evict_inode just validates JFS_SBI(inode->i_sb)->ipimap, then calls diFree. diFree use JFS_IP(ipimap)->i_imap directly, then it will cause a NULL deref.
Reported-by: TCS Robot tcs_robot@tencent.com Signed-off-by: Haimin Zhang tcs_kernel@tencent.com Signed-off-by: Dave Kleikamp dave.kleikamp@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/jfs/inode.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/jfs/inode.c b/fs/jfs/inode.c index 57ab424c05ff..072821b50ab9 100644 --- a/fs/jfs/inode.c +++ b/fs/jfs/inode.c @@ -146,12 +146,13 @@ void jfs_evict_inode(struct inode *inode) dquot_initialize(inode);
if (JFS_IP(inode)->fileset == FILESYSTEM_I) { + struct inode *ipimap = JFS_SBI(inode->i_sb)->ipimap; truncate_inode_pages_final(&inode->i_data);
if (test_cflag(COMMIT_Freewmap, inode)) jfs_free_zero_link(inode);
- if (JFS_SBI(inode->i_sb)->ipimap) + if (ipimap && JFS_IP(ipimap)->i_imap) diFree(inode);
/*
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 7496b59f588dd52886fdbac7633608097543a0a5 ]
The socket layer requires that we use the socket lock to protect changes to the sock->sk_write_pending field and others.
Reported-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sunrpc/xprtsock.c | 54 +++++++++++++++++++++++++++++++------------ 1 file changed, 39 insertions(+), 15 deletions(-)
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index 11eab0f0333b..5eb05c2dd6d6 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -763,12 +763,12 @@ xs_stream_start_connect(struct sock_xprt *transport) /** * xs_nospace - handle transmit was incomplete * @req: pointer to RPC request + * @transport: pointer to struct sock_xprt * */ -static int xs_nospace(struct rpc_rqst *req) +static int xs_nospace(struct rpc_rqst *req, struct sock_xprt *transport) { - struct rpc_xprt *xprt = req->rq_xprt; - struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt); + struct rpc_xprt *xprt = &transport->xprt; struct sock *sk = transport->inet; int ret = -EAGAIN;
@@ -779,25 +779,49 @@ static int xs_nospace(struct rpc_rqst *req)
/* Don't race with disconnect */ if (xprt_connected(xprt)) { + struct socket_wq *wq; + + rcu_read_lock(); + wq = rcu_dereference(sk->sk_wq); + set_bit(SOCKWQ_ASYNC_NOSPACE, &wq->flags); + rcu_read_unlock(); + /* wait for more buffer space */ + set_bit(SOCK_NOSPACE, &sk->sk_socket->flags); sk->sk_write_pending++; xprt_wait_for_buffer_space(xprt); } else ret = -ENOTCONN;
spin_unlock(&xprt->transport_lock); + return ret; +}
- /* Race breaker in case memory is freed before above code is called */ - if (ret == -EAGAIN) { - struct socket_wq *wq; +static int xs_sock_nospace(struct rpc_rqst *req) +{ + struct sock_xprt *transport = + container_of(req->rq_xprt, struct sock_xprt, xprt); + struct sock *sk = transport->inet; + int ret = -EAGAIN;
- rcu_read_lock(); - wq = rcu_dereference(sk->sk_wq); - set_bit(SOCKWQ_ASYNC_NOSPACE, &wq->flags); - rcu_read_unlock(); + lock_sock(sk); + if (!sock_writeable(sk)) + ret = xs_nospace(req, transport); + release_sock(sk); + return ret; +}
- sk->sk_write_space(sk); - } +static int xs_stream_nospace(struct rpc_rqst *req) +{ + struct sock_xprt *transport = + container_of(req->rq_xprt, struct sock_xprt, xprt); + struct sock *sk = transport->inet; + int ret = -EAGAIN; + + lock_sock(sk); + if (!sk_stream_memory_free(sk)) + ret = xs_nospace(req, transport); + release_sock(sk); return ret; }
@@ -887,7 +911,7 @@ static int xs_local_send_request(struct rpc_rqst *req) case -ENOBUFS: break; case -EAGAIN: - status = xs_nospace(req); + status = xs_stream_nospace(req); break; default: dprintk("RPC: sendmsg returned unrecognized error %d\n", @@ -963,7 +987,7 @@ static int xs_udp_send_request(struct rpc_rqst *req) /* Should we call xs_close() here? */ break; case -EAGAIN: - status = xs_nospace(req); + status = xs_sock_nospace(req); break; case -ENETUNREACH: case -ENOBUFS: @@ -1083,7 +1107,7 @@ static int xs_tcp_send_request(struct rpc_rqst *req) /* Should we call xs_close() here? */ break; case -EAGAIN: - status = xs_nospace(req); + status = xs_stream_nospace(req); break; case -ECONNRESET: case -ECONNREFUSED:
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 515dcdcd48736576c6f5c197814da6f81c60a21e ]
The concern is that since nfsiod is sometimes required to kick off a commit, it can get locked up waiting forever in mempool_alloc() instead of failing gracefully and leaving the commit until later.
Try to allocate from the slab first, with GFP_KERNEL | __GFP_NORETRY, then fall back to a non-blocking attempt to allocate from the memory pool.
Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/internal.h | 7 +++++++ fs/nfs/pnfs_nfs.c | 8 ++++++-- fs/nfs/write.c | 24 +++++++++--------------- include/linux/nfs_fs.h | 2 +- 4 files changed, 23 insertions(+), 18 deletions(-)
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h index 2de7c56a1fbe..db9f611e8efd 100644 --- a/fs/nfs/internal.h +++ b/fs/nfs/internal.h @@ -573,6 +573,13 @@ nfs_write_match_verf(const struct nfs_writeverf *verf, !nfs_write_verifier_cmp(&req->wb_verf, &verf->verifier); }
+static inline gfp_t nfs_io_gfp_mask(void) +{ + if (current->flags & PF_WQ_WORKER) + return GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN; + return GFP_KERNEL; +} + /* unlink.c */ extern struct rpc_task * nfs_async_rename(struct inode *old_dir, struct inode *new_dir, diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c index 316f68f96e57..657c242a18ff 100644 --- a/fs/nfs/pnfs_nfs.c +++ b/fs/nfs/pnfs_nfs.c @@ -419,7 +419,7 @@ static struct nfs_commit_data * pnfs_bucket_fetch_commitdata(struct pnfs_commit_bucket *bucket, struct nfs_commit_info *cinfo) { - struct nfs_commit_data *data = nfs_commitdata_alloc(false); + struct nfs_commit_data *data = nfs_commitdata_alloc();
if (!data) return NULL; @@ -515,7 +515,11 @@ pnfs_generic_commit_pagelist(struct inode *inode, struct list_head *mds_pages, unsigned int nreq = 0;
if (!list_empty(mds_pages)) { - data = nfs_commitdata_alloc(true); + data = nfs_commitdata_alloc(); + if (!data) { + nfs_retry_commit(mds_pages, NULL, cinfo, -1); + return -ENOMEM; + } data->ds_commit_index = -1; list_splice_init(mds_pages, &data->pages); list_add_tail(&data->list, &list); diff --git a/fs/nfs/write.c b/fs/nfs/write.c index 60693ab6a032..d0b9083bbfb5 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -70,27 +70,17 @@ static mempool_t *nfs_wdata_mempool; static struct kmem_cache *nfs_cdata_cachep; static mempool_t *nfs_commit_mempool;
-struct nfs_commit_data *nfs_commitdata_alloc(bool never_fail) +struct nfs_commit_data *nfs_commitdata_alloc(void) { struct nfs_commit_data *p;
- if (never_fail) - p = mempool_alloc(nfs_commit_mempool, GFP_NOIO); - else { - /* It is OK to do some reclaim, not no safe to wait - * for anything to be returned to the pool. - * mempool_alloc() cannot handle that particular combination, - * so we need two separate attempts. - */ + p = kmem_cache_zalloc(nfs_cdata_cachep, nfs_io_gfp_mask()); + if (!p) { p = mempool_alloc(nfs_commit_mempool, GFP_NOWAIT); - if (!p) - p = kmem_cache_alloc(nfs_cdata_cachep, GFP_NOIO | - __GFP_NOWARN | __GFP_NORETRY); if (!p) return NULL; + memset(p, 0, sizeof(*p)); } - - memset(p, 0, sizeof(*p)); INIT_LIST_HEAD(&p->pages); return p; } @@ -1826,7 +1816,11 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how, if (list_empty(head)) return 0;
- data = nfs_commitdata_alloc(true); + data = nfs_commitdata_alloc(); + if (!data) { + nfs_retry_commit(head, NULL, cinfo, -1); + return -ENOMEM; + }
/* Set up the argument struct */ nfs_init_commit(data, head, NULL, cinfo); diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 161e4f5ea7a0..c9d3dc79d587 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -585,7 +585,7 @@ extern int nfs_wb_all(struct inode *inode); extern int nfs_wb_page(struct inode *inode, struct page *page); extern int nfs_wb_page_cancel(struct inode *inode, struct page* page); extern int nfs_commit_inode(struct inode *, int); -extern struct nfs_commit_data *nfs_commitdata_alloc(bool never_fail); +extern struct nfs_commit_data *nfs_commitdata_alloc(void); extern void nfs_commit_free(struct nfs_commit_data *data); bool nfs_commit_end(struct nfs_mds_commit_info *cinfo);
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 0bae835b63c53f86cdc524f5962e39409585b22c ]
In a low memory situation, allow the NFS writeback code to fail without getting stuck in infinite loops in mempool_alloc().
Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/pagelist.c | 10 +++++----- fs/nfs/write.c | 10 ++++++++-- 2 files changed, 13 insertions(+), 7 deletions(-)
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c index 815d63080245..9157dd19b8b4 100644 --- a/fs/nfs/pagelist.c +++ b/fs/nfs/pagelist.c @@ -90,10 +90,10 @@ void nfs_set_pgio_error(struct nfs_pgio_header *hdr, int error, loff_t pos) } }
-static inline struct nfs_page * -nfs_page_alloc(void) +static inline struct nfs_page *nfs_page_alloc(void) { - struct nfs_page *p = kmem_cache_zalloc(nfs_page_cachep, GFP_KERNEL); + struct nfs_page *p = + kmem_cache_zalloc(nfs_page_cachep, nfs_io_gfp_mask()); if (p) INIT_LIST_HEAD(&p->wb_list); return p; @@ -892,7 +892,7 @@ int nfs_generic_pgio(struct nfs_pageio_descriptor *desc, struct nfs_commit_info cinfo; struct nfs_page_array *pg_array = &hdr->page_array; unsigned int pagecount, pageused; - gfp_t gfp_flags = GFP_KERNEL; + gfp_t gfp_flags = nfs_io_gfp_mask();
pagecount = nfs_page_array_len(mirror->pg_base, mirror->pg_count); pg_array->npages = pagecount; @@ -979,7 +979,7 @@ nfs_pageio_alloc_mirrors(struct nfs_pageio_descriptor *desc, desc->pg_mirrors_dynamic = NULL; if (mirror_count == 1) return desc->pg_mirrors_static; - ret = kmalloc_array(mirror_count, sizeof(*ret), GFP_KERNEL); + ret = kmalloc_array(mirror_count, sizeof(*ret), nfs_io_gfp_mask()); if (ret != NULL) { for (i = 0; i < mirror_count; i++) nfs_pageio_mirror_init(&ret[i], desc->pg_bsize); diff --git a/fs/nfs/write.c b/fs/nfs/write.c index d0b9083bbfb5..938850303099 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -94,9 +94,15 @@ EXPORT_SYMBOL_GPL(nfs_commit_free);
static struct nfs_pgio_header *nfs_writehdr_alloc(void) { - struct nfs_pgio_header *p = mempool_alloc(nfs_wdata_mempool, GFP_KERNEL); + struct nfs_pgio_header *p;
- memset(p, 0, sizeof(*p)); + p = kmem_cache_zalloc(nfs_wdata_cachep, nfs_io_gfp_mask()); + if (!p) { + p = mempool_alloc(nfs_wdata_mempool, GFP_NOWAIT); + if (!p) + return NULL; + memset(p, 0, sizeof(*p)); + } p->rw_mode = FMODE_WRITE; return p; }
From: Naresh Kamboju naresh.kamboju@linaro.org
[ Upstream commit d9142e1cf3bbdaf21337767114ecab26fe702d47 ]
selftest net tls test cases need TLS=m without this the test hangs. Enabling config TLS solves this problem and runs to complete. - CONFIG_TLS=m
Reported-by: Linux Kernel Functional Testing lkft@linaro.org Signed-off-by: Naresh Kamboju naresh.kamboju@linaro.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/config | 1 + 1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/net/config b/tools/testing/selftests/net/config index ead7963b9bf0..cecb921a0dbf 100644 --- a/tools/testing/selftests/net/config +++ b/tools/testing/selftests/net/config @@ -43,5 +43,6 @@ CONFIG_NET_ACT_TUNNEL_KEY=m CONFIG_NET_ACT_MIRRED=m CONFIG_BAREUDP=m CONFIG_IPV6_IOAM6_LWTUNNEL=y +CONFIG_TLS=m CONFIG_CRYPTO_SM4=y CONFIG_AMT=m
From: Helge Deller deller@gmx.de
[ Upstream commit 939fc856676c266c3bc347c1c1661872a3725c0f ]
Add the missing logic to allow Lasi, WAX and Dino to set the CPU affinity. This fixes IRQ migration to other CPUs when a CPU is shutdown which currently holds the IRQs for one of those chips.
Signed-off-by: Helge Deller deller@gmx.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/parisc/dino.c | 41 +++++++++++++++++++++++++++++++++-------- drivers/parisc/gsc.c | 31 +++++++++++++++++++++++++++++++ drivers/parisc/gsc.h | 1 + drivers/parisc/lasi.c | 7 +++---- drivers/parisc/wax.c | 7 +++---- 5 files changed, 71 insertions(+), 16 deletions(-)
diff --git a/drivers/parisc/dino.c b/drivers/parisc/dino.c index 952a92504df6..e33036281327 100644 --- a/drivers/parisc/dino.c +++ b/drivers/parisc/dino.c @@ -142,9 +142,8 @@ struct dino_device { struct pci_hba_data hba; /* 'C' inheritance - must be first */ spinlock_t dinosaur_pen; - unsigned long txn_addr; /* EIR addr to generate interrupt */ - u32 txn_data; /* EIR data assign to each dino */ u32 imr; /* IRQ's which are enabled */ + struct gsc_irq gsc_irq; int global_irq[DINO_LOCAL_IRQS]; /* map IMR bit to global irq */ #ifdef DINO_DEBUG unsigned int dino_irr0; /* save most recent IRQ line stat */ @@ -339,14 +338,43 @@ static void dino_unmask_irq(struct irq_data *d) if (tmp & DINO_MASK_IRQ(local_irq)) { DBG(KERN_WARNING "%s(): IRQ asserted! (ILR 0x%x)\n", __func__, tmp); - gsc_writel(dino_dev->txn_data, dino_dev->txn_addr); + gsc_writel(dino_dev->gsc_irq.txn_data, dino_dev->gsc_irq.txn_addr); } }
+#ifdef CONFIG_SMP +static int dino_set_affinity_irq(struct irq_data *d, const struct cpumask *dest, + bool force) +{ + struct dino_device *dino_dev = irq_data_get_irq_chip_data(d); + struct cpumask tmask; + int cpu_irq; + u32 eim; + + if (!cpumask_and(&tmask, dest, cpu_online_mask)) + return -EINVAL; + + cpu_irq = cpu_check_affinity(d, &tmask); + if (cpu_irq < 0) + return cpu_irq; + + dino_dev->gsc_irq.txn_addr = txn_affinity_addr(d->irq, cpu_irq); + eim = ((u32) dino_dev->gsc_irq.txn_addr) | dino_dev->gsc_irq.txn_data; + __raw_writel(eim, dino_dev->hba.base_addr+DINO_IAR0); + + irq_data_update_effective_affinity(d, &tmask); + + return IRQ_SET_MASK_OK; +} +#endif + static struct irq_chip dino_interrupt_type = { .name = "GSC-PCI", .irq_unmask = dino_unmask_irq, .irq_mask = dino_mask_irq, +#ifdef CONFIG_SMP + .irq_set_affinity = dino_set_affinity_irq, +#endif };
@@ -806,7 +834,6 @@ static int __init dino_common_init(struct parisc_device *dev, { int status; u32 eim; - struct gsc_irq gsc_irq; struct resource *res;
pcibios_register_hba(&dino_dev->hba); @@ -821,10 +848,8 @@ static int __init dino_common_init(struct parisc_device *dev, ** still only has 11 IRQ input lines - just map some of them ** to a different processor. */ - dev->irq = gsc_alloc_irq(&gsc_irq); - dino_dev->txn_addr = gsc_irq.txn_addr; - dino_dev->txn_data = gsc_irq.txn_data; - eim = ((u32) gsc_irq.txn_addr) | gsc_irq.txn_data; + dev->irq = gsc_alloc_irq(&dino_dev->gsc_irq); + eim = ((u32) dino_dev->gsc_irq.txn_addr) | dino_dev->gsc_irq.txn_data;
/* ** Dino needs a PA "IRQ" to get a processor's attention. diff --git a/drivers/parisc/gsc.c b/drivers/parisc/gsc.c index ed9371acf37e..ec175ae99873 100644 --- a/drivers/parisc/gsc.c +++ b/drivers/parisc/gsc.c @@ -135,10 +135,41 @@ static void gsc_asic_unmask_irq(struct irq_data *d) */ }
+#ifdef CONFIG_SMP +static int gsc_set_affinity_irq(struct irq_data *d, const struct cpumask *dest, + bool force) +{ + struct gsc_asic *gsc_dev = irq_data_get_irq_chip_data(d); + struct cpumask tmask; + int cpu_irq; + + if (!cpumask_and(&tmask, dest, cpu_online_mask)) + return -EINVAL; + + cpu_irq = cpu_check_affinity(d, &tmask); + if (cpu_irq < 0) + return cpu_irq; + + gsc_dev->gsc_irq.txn_addr = txn_affinity_addr(d->irq, cpu_irq); + gsc_dev->eim = ((u32) gsc_dev->gsc_irq.txn_addr) | gsc_dev->gsc_irq.txn_data; + + /* switch IRQ's for devices below LASI/WAX to other CPU */ + gsc_writel(gsc_dev->eim, gsc_dev->hpa + OFFSET_IAR); + + irq_data_update_effective_affinity(d, &tmask); + + return IRQ_SET_MASK_OK; +} +#endif + + static struct irq_chip gsc_asic_interrupt_type = { .name = "GSC-ASIC", .irq_unmask = gsc_asic_unmask_irq, .irq_mask = gsc_asic_mask_irq, +#ifdef CONFIG_SMP + .irq_set_affinity = gsc_set_affinity_irq, +#endif };
int gsc_assign_irq(struct irq_chip *type, void *data) diff --git a/drivers/parisc/gsc.h b/drivers/parisc/gsc.h index 86abad3fa215..73cbd0bb1975 100644 --- a/drivers/parisc/gsc.h +++ b/drivers/parisc/gsc.h @@ -31,6 +31,7 @@ struct gsc_asic { int version; int type; int eim; + struct gsc_irq gsc_irq; int global_irq[32]; };
diff --git a/drivers/parisc/lasi.c b/drivers/parisc/lasi.c index 4e4fd12c2112..6ef621adb63a 100644 --- a/drivers/parisc/lasi.c +++ b/drivers/parisc/lasi.c @@ -163,7 +163,6 @@ static int __init lasi_init_chip(struct parisc_device *dev) { extern void (*chassis_power_off)(void); struct gsc_asic *lasi; - struct gsc_irq gsc_irq; int ret;
lasi = kzalloc(sizeof(*lasi), GFP_KERNEL); @@ -185,7 +184,7 @@ static int __init lasi_init_chip(struct parisc_device *dev) lasi_init_irq(lasi);
/* the IRQ lasi should use */ - dev->irq = gsc_alloc_irq(&gsc_irq); + dev->irq = gsc_alloc_irq(&lasi->gsc_irq); if (dev->irq < 0) { printk(KERN_ERR "%s(): cannot get GSC irq\n", __func__); @@ -193,9 +192,9 @@ static int __init lasi_init_chip(struct parisc_device *dev) return -EBUSY; }
- lasi->eim = ((u32) gsc_irq.txn_addr) | gsc_irq.txn_data; + lasi->eim = ((u32) lasi->gsc_irq.txn_addr) | lasi->gsc_irq.txn_data;
- ret = request_irq(gsc_irq.irq, gsc_asic_intr, 0, "lasi", lasi); + ret = request_irq(lasi->gsc_irq.irq, gsc_asic_intr, 0, "lasi", lasi); if (ret < 0) { kfree(lasi); return ret; diff --git a/drivers/parisc/wax.c b/drivers/parisc/wax.c index 5b6df1516235..73a2b01f8d9c 100644 --- a/drivers/parisc/wax.c +++ b/drivers/parisc/wax.c @@ -68,7 +68,6 @@ static int __init wax_init_chip(struct parisc_device *dev) { struct gsc_asic *wax; struct parisc_device *parent; - struct gsc_irq gsc_irq; int ret;
wax = kzalloc(sizeof(*wax), GFP_KERNEL); @@ -85,7 +84,7 @@ static int __init wax_init_chip(struct parisc_device *dev) wax_init_irq(wax);
/* the IRQ wax should use */ - dev->irq = gsc_claim_irq(&gsc_irq, WAX_GSC_IRQ); + dev->irq = gsc_claim_irq(&wax->gsc_irq, WAX_GSC_IRQ); if (dev->irq < 0) { printk(KERN_ERR "%s(): cannot get GSC irq\n", __func__); @@ -93,9 +92,9 @@ static int __init wax_init_chip(struct parisc_device *dev) return -EBUSY; }
- wax->eim = ((u32) gsc_irq.txn_addr) | gsc_irq.txn_data; + wax->eim = ((u32) wax->gsc_irq.txn_addr) | wax->gsc_irq.txn_data;
- ret = request_irq(gsc_irq.irq, gsc_asic_intr, 0, "wax", wax); + ret = request_irq(wax->gsc_irq.irq, gsc_asic_intr, 0, "wax", wax); if (ret < 0) { kfree(wax); return ret;
From: John David Anglin dave.anglin@bell.net
[ Upstream commit a9fe7fa7d874a536e0540469f314772c054a0323 ]
This change fixes the following:
1) The flags variable is not initialized. Always use raw_spin_lock_irqsave and raw_spin_unlock_irqrestore to serialize patching.
2) flush_kernel_vmap_range is primarily intended for DMA flushes. Since __patch_text_multiple is often called with interrupts disabled, it is better to directly call flush_kernel_dcache_range_asm and flush_kernel_icache_range_asm. This avoids an extra call.
3) The final call to flush_icache_range is unnecessary.
Signed-off-by: John David Anglin dave.anglin@bell.net Signed-off-by: Helge Deller deller@gmx.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/parisc/kernel/patch.c | 25 +++++++++++-------------- 1 file changed, 11 insertions(+), 14 deletions(-)
diff --git a/arch/parisc/kernel/patch.c b/arch/parisc/kernel/patch.c index 80a0ab372802..e59574f65e64 100644 --- a/arch/parisc/kernel/patch.c +++ b/arch/parisc/kernel/patch.c @@ -40,10 +40,7 @@ static void __kprobes *patch_map(void *addr, int fixmap, unsigned long *flags,
*need_unmap = 1; set_fixmap(fixmap, page_to_phys(page)); - if (flags) - raw_spin_lock_irqsave(&patch_lock, *flags); - else - __acquire(&patch_lock); + raw_spin_lock_irqsave(&patch_lock, *flags);
return (void *) (__fix_to_virt(fixmap) + (uintaddr & ~PAGE_MASK)); } @@ -52,10 +49,7 @@ static void __kprobes patch_unmap(int fixmap, unsigned long *flags) { clear_fixmap(fixmap);
- if (flags) - raw_spin_unlock_irqrestore(&patch_lock, *flags); - else - __release(&patch_lock); + raw_spin_unlock_irqrestore(&patch_lock, *flags); }
void __kprobes __patch_text_multiple(void *addr, u32 *insn, unsigned int len) @@ -67,8 +61,9 @@ void __kprobes __patch_text_multiple(void *addr, u32 *insn, unsigned int len) int mapped;
/* Make sure we don't have any aliases in cache */ - flush_kernel_vmap_range(addr, len); - flush_icache_range(start, end); + flush_kernel_dcache_range_asm(start, end); + flush_kernel_icache_range_asm(start, end); + flush_tlb_kernel_range(start, end);
p = fixmap = patch_map(addr, FIX_TEXT_POKE0, &flags, &mapped);
@@ -81,8 +76,10 @@ void __kprobes __patch_text_multiple(void *addr, u32 *insn, unsigned int len) * We're crossing a page boundary, so * need to remap */ - flush_kernel_vmap_range((void *)fixmap, - (p-fixmap) * sizeof(*p)); + flush_kernel_dcache_range_asm((unsigned long)fixmap, + (unsigned long)p); + flush_tlb_kernel_range((unsigned long)fixmap, + (unsigned long)p); if (mapped) patch_unmap(FIX_TEXT_POKE0, &flags); p = fixmap = patch_map(addr, FIX_TEXT_POKE0, &flags, @@ -90,10 +87,10 @@ void __kprobes __patch_text_multiple(void *addr, u32 *insn, unsigned int len) } }
- flush_kernel_vmap_range((void *)fixmap, (p-fixmap) * sizeof(*p)); + flush_kernel_dcache_range_asm((unsigned long)fixmap, (unsigned long)p); + flush_tlb_kernel_range((unsigned long)fixmap, (unsigned long)p); if (mapped) patch_unmap(FIX_TEXT_POKE0, &flags); - flush_icache_range(start, end); }
void __kprobes __patch_text(void *addr, u32 insn)
From: Mauricio Faria de Oliveira mfo@canonical.com
commit 6c8e2a256915a223f6289f651d6b926cd7135c9e upstream.
Problem: =======
Userspace might read the zero-page instead of actual data from a direct IO read on a block device if the buffers have been called madvise(MADV_FREE) on earlier (this is discussed below) due to a race between page reclaim on MADV_FREE and blkdev direct IO read.
- Race condition: ==============
During page reclaim, the MADV_FREE page check in try_to_unmap_one() checks if the page is not dirty, then discards its rmap PTE(s) (vs. remap back if the page is dirty).
However, after try_to_unmap_one() returns to shrink_page_list(), it might keep the page _anyway_ if page_ref_freeze() fails (it expects exactly _one_ page reference, from the isolation for page reclaim).
Well, blkdev_direct_IO() gets references for all pages, and on READ operations it only sets them dirty _later_.
So, if MADV_FREE'd pages (i.e., not dirty) are used as buffers for direct IO read from block devices, and page reclaim happens during __blkdev_direct_IO[_simple]() exactly AFTER bio_iov_iter_get_pages() returns, but BEFORE the pages are set dirty, the situation happens.
The direct IO read eventually completes. Now, when userspace reads the buffers, the PTE is no longer there and the page fault handler do_anonymous_page() services that with the zero-page, NOT the data!
A synthetic reproducer is provided.
- Page faults: ===========
If page reclaim happens BEFORE bio_iov_iter_get_pages() the issue doesn't happen, because that faults-in all pages as writeable, so do_anonymous_page() sets up a new page/rmap/PTE, and that is used by direct IO. The userspace reads don't fault as the PTE is there (thus zero-page is not used/setup).
But if page reclaim happens AFTER it / BEFORE setting pages dirty, the PTE is no longer there; the subsequent page faults can't help:
The data-read from the block device probably won't generate faults due to DMA (no MMU) but even in the case it wouldn't use DMA, that happens on different virtual addresses (not user-mapped addresses) because `struct bio_vec` stores `struct page` to figure addresses out (which are different from user-mapped addresses) for the read.
Thus userspace reads (to user-mapped addresses) still fault, then do_anonymous_page() gets another `struct page` that would address/ map to other memory than the `struct page` used by `struct bio_vec` for the read. (The original `struct page` is not available, since it wasn't freed, as page_ref_freeze() failed due to more page refs. And even if it were available, its data cannot be trusted anymore.)
Solution: ========
One solution is to check for the expected page reference count in try_to_unmap_one().
There should be one reference from the isolation (that is also checked in shrink_page_list() with page_ref_freeze()) plus one or more references from page mapping(s) (put in discard: label). Further references mean that rmap/PTE cannot be unmapped/nuked.
(Note: there might be more than one reference from mapping due to fork()/clone() without CLONE_VM, which use the same `struct page` for references, until the copy-on-write page gets copied.)
So, additional page references (e.g., from direct IO read) now prevent the rmap/PTE from being unmapped/dropped; similarly to the page is not freed per shrink_page_list()/page_ref_freeze()).
- Races and Barriers: ==================
The new check in try_to_unmap_one() should be safe in races with bio_iov_iter_get_pages() in get_user_pages() fast and slow paths, as it's done under the PTE lock.
The fast path doesn't take the lock, but it checks if the PTE has changed and if so, it drops the reference and leaves the page for the slow path (which does take that lock).
The fast path requires synchronization w/ full memory barrier: it writes the page reference count first then it reads the PTE later, while try_to_unmap() writes PTE first then it reads page refcount.
And a second barrier is needed, as the page dirty flag should not be read before the page reference count (as in __remove_mapping()). (This can be a load memory barrier only; no writes are involved.)
Call stack/comments:
- try_to_unmap_one() - page_vma_mapped_walk() - map_pte() # see pte_offset_map_lock(): pte_offset_map() spin_lock()
- ptep_get_and_clear() # write PTE - smp_mb() # (new barrier) GUP fast path - page_ref_count() # (new check) read refcount
- page_vma_mapped_walk_done() # see pte_unmap_unlock(): pte_unmap() spin_unlock()
- bio_iov_iter_get_pages() - __bio_iov_iter_get_pages() - iov_iter_get_pages() - get_user_pages_fast() - internal_get_user_pages_fast()
# fast path - lockless_pages_from_mm() - gup_{pgd,p4d,pud,pmd,pte}_range() ptep = pte_offset_map() # not _lock() pte = ptep_get_lockless(ptep)
page = pte_page(pte) try_grab_compound_head(page) # inc refcount # (RMW/barrier # on success)
if (pte_val(pte) != pte_val(*ptep)) # read PTE put_compound_head(page) # dec refcount # go slow path
# slow path - __gup_longterm_unlocked() - get_user_pages_unlocked() - __get_user_pages_locked() - __get_user_pages() - follow_{page,p4d,pud,pmd}_mask() - follow_page_pte() ptep = pte_offset_map_lock() pte = *ptep page = vm_normal_page(pte) try_grab_page(page) # inc refcount pte_unmap_unlock()
- Huge Pages: ==========
Regarding transparent hugepages, that logic shouldn't change, as MADV_FREE (aka lazyfree) pages are PageAnon() && !PageSwapBacked() (madvise_free_pte_range() -> mark_page_lazyfree() -> lru_lazyfree_fn()) thus should reach shrink_page_list() -> split_huge_page_to_list() before try_to_unmap[_one](), so it deals with normal pages only.
(And in case unlikely/TTU_SPLIT_HUGE_PMD/split_huge_pmd_address() happens, which should not or be rare, the page refcount should be greater than mapcount: the head page is referenced by tail pages. That also prevents checking the head `page` then incorrectly call page_remove_rmap(subpage) for a tail page, that isn't even in the shrink_page_list()'s page_list (an effect of split huge pmd/pmvw), as it might happen today in this unlikely scenario.)
MADV_FREE'd buffers: ===================
So, back to the "if MADV_FREE pages are used as buffers" note. The case is arguable, and subject to multiple interpretations.
The madvise(2) manual page on the MADV_FREE advice value says:
1) 'After a successful MADV_FREE ... data will be lost when the kernel frees the pages.' 2) 'the free operation will be canceled if the caller writes into the page' / 'subsequent writes ... will succeed and then [the] kernel cannot free those dirtied pages' 3) 'If there is no subsequent write, the kernel can free the pages at any time.'
Thoughts, questions, considerations... respectively:
1) Since the kernel didn't actually free the page (page_ref_freeze() failed), should the data not have been lost? (on userspace read.) 2) Should writes performed by the direct IO read be able to cancel the free operation? - Should the direct IO read be considered as 'the caller' too, as it's been requested by 'the caller'? - Should the bio technique to dirty pages on return to userspace (bio_check_pages_dirty() is called/used by __blkdev_direct_IO()) be considered in another/special way here? 3) Should an upcoming write from a previously requested direct IO read be considered as a subsequent write, so the kernel should not free the pages? (as it's known at the time of page reclaim.)
And lastly:
Technically, the last point would seem a reasonable consideration and balance, as the madvise(2) manual page apparently (and fairly) seem to assume that 'writes' are memory access from the userspace process (not explicitly considering writes from the kernel or its corner cases; again, fairly).. plus the kernel fix implementation for the corner case of the largely 'non-atomic write' encompassed by a direct IO read operation, is relatively simple; and it helps.
Reproducer: ==========
@ test.c (simplified, but works)
#define _GNU_SOURCE #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <sys/mman.h>
int main() { int fd, i; char *buf;
fd = open(DEV, O_RDONLY | O_DIRECT);
buf = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
for (i = 0; i < BUF_SIZE; i += PAGE_SIZE) buf[i] = 1; // init to non-zero
madvise(buf, BUF_SIZE, MADV_FREE);
read(fd, buf, BUF_SIZE);
for (i = 0; i < BUF_SIZE; i += PAGE_SIZE) printf("%p: 0x%x\n", &buf[i], buf[i]);
return 0; }
@ block/fops.c (formerly fs/block_dev.c)
+#include <linux/swap.h> ... ... __blkdev_direct_IO[_simple](...) { ... + if (!strcmp(current->comm, "good")) + shrink_all_memory(ULONG_MAX); + ret = bio_iov_iter_get_pages(...); + + if (!strcmp(current->comm, "bad")) + shrink_all_memory(ULONG_MAX); ... }
@ shell
# NUM_PAGES=4 # PAGE_SIZE=$(getconf PAGE_SIZE)
# yes | dd of=test.img bs=${PAGE_SIZE} count=${NUM_PAGES} # DEV=$(losetup -f --show test.img)
# gcc -DDEV="$DEV" \ -DBUF_SIZE=$((PAGE_SIZE * NUM_PAGES)) \ -DPAGE_SIZE=${PAGE_SIZE} \ test.c -o test
# od -tx1 $DEV 0000000 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a * 0040000
# mv test good # ./good 0x7f7c10418000: 0x79 0x7f7c10419000: 0x79 0x7f7c1041a000: 0x79 0x7f7c1041b000: 0x79
# mv good bad # ./bad 0x7fa1b8050000: 0x0 0x7fa1b8051000: 0x0 0x7fa1b8052000: 0x0 0x7fa1b8053000: 0x0
Note: the issue is consistent on v5.17-rc3, but it's intermittent with the support of MADV_FREE on v4.5 (60%-70% error; needs swap). [wrap do_direct_IO() in do_blockdev_direct_IO() @ fs/direct-io.c].
- v5.17-rc3:
# for i in {1..1000}; do ./good; done \ | cut -d: -f2 | sort | uniq -c 4000 0x79
# mv good bad # for i in {1..1000}; do ./bad; done \ | cut -d: -f2 | sort | uniq -c 4000 0x0
# free | grep Swap Swap: 0 0 0
- v4.5:
# for i in {1..1000}; do ./good; done \ | cut -d: -f2 | sort | uniq -c 4000 0x79
# mv good bad # for i in {1..1000}; do ./bad; done \ | cut -d: -f2 | sort | uniq -c 2702 0x0 1298 0x79
# swapoff -av swapoff /swap
# for i in {1..1000}; do ./bad; done \ | cut -d: -f2 | sort | uniq -c 4000 0x79
Ceph/TCMalloc: =============
For documentation purposes, the use case driving the analysis/fix is Ceph on Ubuntu 18.04, as the TCMalloc library there still uses MADV_FREE to release unused memory to the system from the mmap'ed page heap (might be committed back/used again; it's not munmap'ed.) - PageHeap::DecommitSpan() -> TCMalloc_SystemRelease() -> madvise() - PageHeap::CommitSpan() -> TCMalloc_SystemCommit() -> do nothing.
Note: TCMalloc switched back to MADV_DONTNEED a few commits after the release in Ubuntu 18.04 (google-perftools/gperftools 2.5), so the issue just 'disappeared' on Ceph on later Ubuntu releases but is still present in the kernel, and can be hit by other use cases.
The observed issue seems to be the old Ceph bug #22464 [1], where checksum mismatches are observed (and instrumentation with buffer dumps shows zero-pages read from mmap'ed/MADV_FREE'd page ranges).
The issue in Ceph was reasonably deemed a kernel bug (comment #50) and mostly worked around with a retry mechanism, but other parts of Ceph could still hit that (rocksdb). Anyway, it's less likely to be hit again as TCMalloc switched out of MADV_FREE by default.
(Some kernel versions/reports from the Ceph bug, and relation with the MADV_FREE introduction/changes; TCMalloc versions not checked.) - 4.4 good - 4.5 (madv_free: introduction) - 4.9 bad - 4.10 good? maybe a swapless system - 4.12 (madv_free: no longer free instantly on swapless systems) - 4.13 bad
[1] https://tracker.ceph.com/issues/22464
Thanks: ======
Several people contributed to analysis/discussions/tests/reproducers in the first stages when drilling down on ceph/tcmalloc/linux kernel:
- Dan Hill - Dan Streetman - Dongdong Tao - Gavin Guo - Gerald Yang - Heitor Alves de Siqueira - Ioanna Alifieraki - Jay Vosburgh - Matthew Ruffell - Ponnuvel Palaniyappan
Reviews, suggestions, corrections, comments:
- Minchan Kim - Yu Zhao - Huang, Ying - John Hubbard - Christoph Hellwig
[mfo@canonical.com: v4] Link: https://lkml.kernel.org/r/20220209202659.183418-1-mfo@canonical.comLink: https://lkml.kernel.org/r/20220131230255.789059-1-mfo@canonical.com
Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages") Signed-off-by: Mauricio Faria de Oliveira mfo@canonical.com Reviewed-by: "Huang, Ying" ying.huang@intel.com Cc: Minchan Kim minchan@kernel.org Cc: Yu Zhao yuzhao@google.com Cc: Yang Shi shy828301@gmail.com Cc: Miaohe Lin linmiaohe@huawei.com Cc: Dan Hill daniel.hill@canonical.com Cc: Dan Streetman dan.streetman@canonical.com Cc: Dongdong Tao dongdong.tao@canonical.com Cc: Gavin Guo gavin.guo@canonical.com Cc: Gerald Yang gerald.yang@canonical.com Cc: Heitor Alves de Siqueira halves@canonical.com Cc: Ioanna Alifieraki ioanna-maria.alifieraki@canonical.com Cc: Jay Vosburgh jay.vosburgh@canonical.com Cc: Matthew Ruffell matthew.ruffell@canonical.com Cc: Ponnuvel Palaniyappan ponnuvel.palaniyappan@canonical.com Cc: stable@vger.kernel.org Cc: Christoph Hellwig hch@infradead.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org [mfo: backport: replace folio/test_flag with page/flag equivalents; real Fixes: 854e9ed09ded ("mm: support madvise(MADV_FREE)") in v4.] Signed-off-by: Mauricio Faria de Oliveira mfo@canonical.com Signed-off-by: Sasha Levin sashal@kernel.org --- mm/rmap.c | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-)
diff --git a/mm/rmap.c b/mm/rmap.c index 6a1e8c7f6213..9e27f9f038d3 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1599,7 +1599,30 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
/* MADV_FREE page check */ if (!PageSwapBacked(page)) { - if (!PageDirty(page)) { + int ref_count, map_count; + + /* + * Synchronize with gup_pte_range(): + * - clear PTE; barrier; read refcount + * - inc refcount; barrier; read PTE + */ + smp_mb(); + + ref_count = page_ref_count(page); + map_count = page_mapcount(page); + + /* + * Order reads for page refcount and dirty flag + * (see comments in __remove_mapping()). + */ + smp_rmb(); + + /* + * The only page refs must be one from isolation + * plus the rmap(s) (dropped by discard:). + */ + if (ref_count == 1 + map_count && + !PageDirty(page)) { /* Invalidate as we cleared the pte */ mmu_notifier_invalidate_range(mm, address, address + PAGE_SIZE);
From: Dan Carpenter dan.carpenter@oracle.com
[ Upstream commit 1647b54ed55d4d48c7199d439f8834626576cbe9 ]
This post-op should be a pre-op so that we do not pass -1 as the bit number to test_bit(). The current code will loop downwards from 63 to -1. After changing to a pre-op, it loops from 63 to 0.
Fixes: 71c37505e7ea ("drm/amdgpu/gfx: move more common KIQ code to amdgpu_gfx.c") Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c index 1916ec84dd71..e7845df6cad2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c @@ -266,7 +266,7 @@ static int amdgpu_gfx_kiq_acquire(struct amdgpu_device *adev, * adev->gfx.mec.num_pipe_per_mec * adev->gfx.mec.num_queue_per_pipe;
- while (queue_bit-- >= 0) { + while (--queue_bit >= 0) { if (test_bit(queue_bit, adev->gfx.mec.queue_bitmap)) continue;
From: Andrea Parri (Microsoft) parri.andrea@gmail.com
[ Upstream commit 3a5469582c241abca22500f36a9cb8e9331969cf ]
Initialize the device's dma_{mask,parms} pointers and the device's dma_mask value before invoking device_register(). Address the following trace with 5.17-rc7:
[ 49.646839] WARNING: CPU: 0 PID: 189 at include/linux/dma-mapping.h:543 netvsc_probe+0x37a/0x3a0 [hv_netvsc] [ 49.646928] Call Trace: [ 49.646930] <TASK> [ 49.646935] vmbus_probe+0x40/0x60 [hv_vmbus] [ 49.646942] really_probe+0x1ce/0x3b0 [ 49.646948] __driver_probe_device+0x109/0x180 [ 49.646952] driver_probe_device+0x23/0xa0 [ 49.646955] __device_attach_driver+0x76/0xe0 [ 49.646958] ? driver_allows_async_probing+0x50/0x50 [ 49.646961] bus_for_each_drv+0x84/0xd0 [ 49.646964] __device_attach+0xed/0x170 [ 49.646967] device_initial_probe+0x13/0x20 [ 49.646970] bus_probe_device+0x8f/0xa0 [ 49.646973] device_add+0x41a/0x8e0 [ 49.646975] ? hrtimer_init+0x28/0x80 [ 49.646981] device_register+0x1b/0x20 [ 49.646983] vmbus_device_register+0x5e/0xf0 [hv_vmbus] [ 49.646991] vmbus_add_channel_work+0x12d/0x190 [hv_vmbus] [ 49.646999] process_one_work+0x21d/0x3f0 [ 49.647002] worker_thread+0x4a/0x3b0 [ 49.647005] ? process_one_work+0x3f0/0x3f0 [ 49.647007] kthread+0xff/0x130 [ 49.647011] ? kthread_complete_and_exit+0x20/0x20 [ 49.647015] ret_from_fork+0x22/0x30 [ 49.647020] </TASK> [ 49.647021] ---[ end trace 0000000000000000 ]---
Fixes: 743b237c3a7b0 ("scsi: storvsc: Add Isolation VM support for storvsc driver") Signed-off-by: Andrea Parri (Microsoft) parri.andrea@gmail.com Reviewed-by: Michael Kelley mikelley@microsoft.com Link: https://lore.kernel.org/r/20220315141053.3223-1-parri.andrea@gmail.com Signed-off-by: Wei Liu wei.liu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hv/vmbus_drv.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c index 12a2b37e87f3..0a05e10ab36c 100644 --- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -2097,6 +2097,10 @@ int vmbus_device_register(struct hv_device *child_device_obj) child_device_obj->device.parent = &hv_acpi_dev->dev; child_device_obj->device.release = vmbus_device_release;
+ child_device_obj->device.dma_parms = &child_device_obj->dma_parms; + child_device_obj->device.dma_mask = &child_device_obj->dma_mask; + dma_set_mask(&child_device_obj->device, DMA_BIT_MASK(64)); + /* * Register with the LDM. This will kick off the driver/device * binding...which will eventually call vmbus_match() and vmbus_probe() @@ -2122,9 +2126,6 @@ int vmbus_device_register(struct hv_device *child_device_obj) } hv_debug_add_dev_dir(child_device_obj);
- child_device_obj->device.dma_parms = &child_device_obj->dma_parms; - child_device_obj->device.dma_mask = &child_device_obj->dma_mask; - dma_set_mask(&child_device_obj->device, DMA_BIT_MASK(64)); return 0;
err_kset_unregister:
From: Guilherme G. Piccoli gpiccoli@igalia.com
[ Upstream commit 792f232d57ff28bbd5f9c4abe0466b23d5879dc8 ]
The vmbus driver relies on the panic notifier infrastructure to perform some operations when a panic event is detected. Since vmbus can be built as module, it is required that the driver handles both registering and unregistering such panic notifier callback.
After commit 74347a99e73a ("x86/Hyper-V: Unload vmbus channel in hv panic callback") though, the panic notifier registration is done unconditionally in the module initialization routine whereas the unregistering procedure is conditionally guarded and executes only if HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE capability is set.
This patch fixes that by unconditionally unregistering the panic notifier in the module's exit routine as well.
Fixes: 74347a99e73a ("x86/Hyper-V: Unload vmbus channel in hv panic callback") Signed-off-by: Guilherme G. Piccoli gpiccoli@igalia.com Reviewed-by: Michael Kelley mikelley@microsoft.com Link: https://lore.kernel.org/r/20220315203535.682306-1-gpiccoli@igalia.com Signed-off-by: Wei Liu wei.liu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hv/vmbus_drv.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c index 0a05e10ab36c..4bea1dfa41cd 100644 --- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -2781,10 +2781,15 @@ static void __exit vmbus_exit(void) if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE) { kmsg_dump_unregister(&hv_kmsg_dumper); unregister_die_notifier(&hyperv_die_block); - atomic_notifier_chain_unregister(&panic_notifier_list, - &hyperv_panic_block); }
+ /* + * The panic notifier is always registered, hence we should + * also unconditionally unregister it here as well. + */ + atomic_notifier_chain_unregister(&panic_notifier_list, + &hyperv_panic_block); + free_page((unsigned long)hv_panic_page); unregister_sysctl_table(hv_ctl_table_hdr); hv_ctl_table_hdr = NULL;
From: Jeremy Sowden jeremy@azazel.net
[ Upstream commit 31818213170caa51d116eb5dc1167b88523b4fe1 ]
The `nft_bitwise_reduce` and `nft_bitwise_fast_reduce` functions should compare the bitwise operation in `expr` with the tracked operation associated with the destination register of `expr`. However, instead of being called on `expr` and `track->regs[priv->dreg].selector`, `nft_expr_priv` is called on `expr` twice, so both reduce functions return true even when the operations differ.
Fixes: be5650f8f47e ("netfilter: nft_bitwise: track register operations") Signed-off-by: Jeremy Sowden jeremy@azazel.net Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_bitwise.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/nft_bitwise.c b/net/netfilter/nft_bitwise.c index 7b727d3ebf9d..04bd2f89afe8 100644 --- a/net/netfilter/nft_bitwise.c +++ b/net/netfilter/nft_bitwise.c @@ -287,7 +287,7 @@ static bool nft_bitwise_reduce(struct nft_regs_track *track, if (!track->regs[priv->sreg].selector) return false;
- bitwise = nft_expr_priv(expr); + bitwise = nft_expr_priv(track->regs[priv->dreg].selector); if (track->regs[priv->sreg].selector == track->regs[priv->dreg].selector && track->regs[priv->dreg].bitwise && track->regs[priv->dreg].bitwise->ops == expr->ops && @@ -434,7 +434,7 @@ static bool nft_bitwise_fast_reduce(struct nft_regs_track *track, if (!track->regs[priv->sreg].selector) return false;
- bitwise = nft_expr_priv(expr); + bitwise = nft_expr_priv(track->regs[priv->dreg].selector); if (track->regs[priv->sreg].selector == track->regs[priv->dreg].selector && track->regs[priv->dreg].bitwise && track->regs[priv->dreg].bitwise->ops == expr->ops &&
From: ChenXiaoSong chenxiaosong2@huawei.com
[ Upstream commit ab0fc21bc7105b54bafd85bd8b82742f9e68898a ]
This reverts commit 44942b4e457beda00981f616402a1a791e8c616e.
After secondly opening a file with O_ACCMODE|O_DIRECT flags, nfs4_valid_open_stateid() will dereference NULL nfs4_state when lseek().
Reproducer: 1. mount -t nfs -o vers=4.2 $server_ip:/ /mnt/ 2. fd = open("/mnt/file", O_ACCMODE|O_DIRECT|O_CREAT) 3. close(fd) 4. fd = open("/mnt/file", O_ACCMODE|O_DIRECT) 5. lseek(fd)
Reported-by: Lyu Tao tao.lyu@epfl.ch Signed-off-by: ChenXiaoSong chenxiaosong2@huawei.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/inode.c | 1 - fs/nfs/nfs4file.c | 2 +- 2 files changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c index d96baa4450e3..e4fb939a2904 100644 --- a/fs/nfs/inode.c +++ b/fs/nfs/inode.c @@ -1180,7 +1180,6 @@ int nfs_open(struct inode *inode, struct file *filp) nfs_fscache_open_file(inode, filp); return 0; } -EXPORT_SYMBOL_GPL(nfs_open);
/* * This function is called whenever some part of NFS notices that diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c index e79ae4cbc395..c178db86a6e8 100644 --- a/fs/nfs/nfs4file.c +++ b/fs/nfs/nfs4file.c @@ -51,7 +51,7 @@ nfs4_file_open(struct inode *inode, struct file *filp) return err;
if ((openflags & O_ACCMODE) == 3) - return nfs_open(inode, filp); + openflags--;
/* We can't create new files here */ openflags &= ~(O_CREAT|O_EXCL);
From: ChenXiaoSong chenxiaosong2@huawei.com
[ Upstream commit b243874f6f9568b2daf1a00e9222cacdc15e159c ]
open() with O_ACCMODE|O_DIRECT flags secondly will fail.
Reproducer: 1. mount -t nfs -o vers=4.2 $server_ip:/ /mnt/ 2. fd = open("/mnt/file", O_ACCMODE|O_DIRECT|O_CREAT) 3. close(fd) 4. fd = open("/mnt/file", O_ACCMODE|O_DIRECT)
Server nfsd4_decode_share_access() will fail with error nfserr_bad_xdr when client use incorrect share access mode of 0.
Fix this by using NFS4_SHARE_ACCESS_BOTH share access mode in client, just like firstly opening.
Fixes: ce4ef7c0a8a05 ("NFS: Split out NFS v4 file operations") Signed-off-by: ChenXiaoSong chenxiaosong2@huawei.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/dir.c | 10 ---------- fs/nfs/internal.h | 10 ++++++++++ fs/nfs/nfs4file.c | 6 ++++-- 3 files changed, 14 insertions(+), 12 deletions(-)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index 75cb1cbe4cde..911bdb35eb08 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -1853,16 +1853,6 @@ const struct dentry_operations nfs4_dentry_operations = { }; EXPORT_SYMBOL_GPL(nfs4_dentry_operations);
-static fmode_t flags_to_mode(int flags) -{ - fmode_t res = (__force fmode_t)flags & FMODE_EXEC; - if ((flags & O_ACCMODE) != O_WRONLY) - res |= FMODE_READ; - if ((flags & O_ACCMODE) != O_RDONLY) - res |= FMODE_WRITE; - return res; -} - static struct nfs_open_context *create_nfs_open_context(struct dentry *dentry, int open_flags, struct file *filp) { return alloc_nfs_open_context(dentry, flags_to_mode(open_flags), filp); diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h index db9f611e8efd..465e39ff018d 100644 --- a/fs/nfs/internal.h +++ b/fs/nfs/internal.h @@ -42,6 +42,16 @@ static inline bool nfs_lookup_is_soft_revalidate(const struct dentry *dentry) return true; }
+static inline fmode_t flags_to_mode(int flags) +{ + fmode_t res = (__force fmode_t)flags & FMODE_EXEC; + if ((flags & O_ACCMODE) != O_WRONLY) + res |= FMODE_READ; + if ((flags & O_ACCMODE) != O_RDONLY) + res |= FMODE_WRITE; + return res; +} + /* * Note: RFC 1813 doesn't limit the number of auth flavors that * a server can return, so make something up. diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c index c178db86a6e8..e34af48fb4f4 100644 --- a/fs/nfs/nfs4file.c +++ b/fs/nfs/nfs4file.c @@ -32,6 +32,7 @@ nfs4_file_open(struct inode *inode, struct file *filp) struct dentry *parent = NULL; struct inode *dir; unsigned openflags = filp->f_flags; + fmode_t f_mode; struct iattr attr; int err;
@@ -50,8 +51,9 @@ nfs4_file_open(struct inode *inode, struct file *filp) if (err) return err;
+ f_mode = filp->f_mode; if ((openflags & O_ACCMODE) == 3) - openflags--; + f_mode |= flags_to_mode(openflags);
/* We can't create new files here */ openflags &= ~(O_CREAT|O_EXCL); @@ -59,7 +61,7 @@ nfs4_file_open(struct inode *inode, struct file *filp) parent = dget_parent(dentry); dir = d_inode(parent);
- ctx = alloc_nfs_open_context(file_dentry(filp), filp->f_mode, filp); + ctx = alloc_nfs_open_context(file_dentry(filp), f_mode, filp); err = PTR_ERR(ctx); if (IS_ERR(ctx)) goto out;
From: Tomas Henzl thenzl@redhat.com
[ Upstream commit f06aa52cb2723ec67e92df463827b800d6c477d1 ]
The request_queue may be NULL in a request, for example when it comes from scsi_ioctl_reset(). Check it before use.
Fixes: f3fa33acca9f ("block: remove the ->rq_disk field in struct request") Link: https://lore.kernel.org/r/20220324134603.28463-1-thenzl@redhat.com Reported-by: Changhui Zhong czhong@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Tomas Henzl thenzl@redhat.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/scsi_logging.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/scsi_logging.c b/drivers/scsi/scsi_logging.c index 1f8f80b2dbfc..a9f8de5e9639 100644 --- a/drivers/scsi/scsi_logging.c +++ b/drivers/scsi/scsi_logging.c @@ -30,7 +30,7 @@ static inline const char *scmd_name(const struct scsi_cmnd *scmd) { struct request *rq = scsi_cmd_to_rq((struct scsi_cmnd *)scmd);
- if (!rq->q->disk) + if (!rq->q || !rq->q->disk) return NULL; return rq->q->disk->disk_name; }
From: Kevin Groeneveld kgroeneveld@lenbrook.com
[ Upstream commit bc5519c18a32ce855bb51b9f5eceb77a9489d080 ]
Commit 2e27f576abc6 ("scsi: scsi_ioctl: Call scsi_cmd_ioctl() from scsi_ioctl()") seems to have a typo as it is checking ret instead of cmd in the if statement checking for CDROMCLOSETRAY and CDROMEJECT. This changes the behaviour of these ioctls as the cdrom_ioctl handling of these is more restrictive than the scsi_ioctl version.
Link: https://lore.kernel.org/r/20220323002242.21157-1-kgroeneveld@lenbrook.com Fixes: 2e27f576abc6 ("scsi: scsi_ioctl: Call scsi_cmd_ioctl() from scsi_ioctl()") Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Kevin Groeneveld kgroeneveld@lenbrook.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/sr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c index f925b1f1f9ad..a0beb11abdc9 100644 --- a/drivers/scsi/sr.c +++ b/drivers/scsi/sr.c @@ -578,7 +578,7 @@ static int sr_block_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
scsi_autopm_get_device(sdev);
- if (ret != CDROMCLOSETRAY && ret != CDROMEJECT) { + if (cmd != CDROMCLOSETRAY && cmd != CDROMEJECT) { ret = cdrom_ioctl(&cd->cdi, bdev, mode, cmd, arg); if (ret != -ENOSYS) goto put;
From: John Garry john.garry@huawei.com
[ Upstream commit eaba83b5b8506bbc9ee7ca2f10aeab3fff3719e7 ]
In commit edb854a3680b ("scsi: core: Reallocate device's budget map on queue depth change"), the sbitmap for the device budget map may be reallocated after the slave device depth is configured.
When the sbitmap is reallocated we use the result from scsi_device_max_queue_depth() for the sbitmap size, but don't resize to match the actual device queue depth.
Fix by resizing the sbitmap after reallocating the budget sbitmap. We do this instead of init'ing the sbitmap to the device queue depth as the user may want to change the queue depth later via sysfs or other.
Link: https://lore.kernel.org/r/1647423870-143867-1-git-send-email-john.garry@huaw... Fixes: edb854a3680b ("scsi: core: Reallocate device's budget map on queue depth change") Tested-by: Damien Le Moal damien.lemoal@opensource.wdc.com Reviewed-by: Ming Lei ming.lei@redhat.com Reviewed-by: Bart Van Assche bvanassche@acm.org Signed-off-by: John Garry john.garry@huawei.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/scsi_scan.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index f4e6c68ac99e..2ef78083f1ef 100644 --- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c @@ -223,6 +223,8 @@ static int scsi_realloc_sdev_budget_map(struct scsi_device *sdev, int ret; struct sbitmap sb_backup;
+ depth = min_t(unsigned int, depth, scsi_device_max_queue_depth(sdev)); + /* * realloc if new shift is calculated, which is caused by setting * up one new default queue depth after calling ->slave_configure @@ -245,6 +247,9 @@ static int scsi_realloc_sdev_budget_map(struct scsi_device *sdev, scsi_device_max_queue_depth(sdev), new_shift, GFP_KERNEL, sdev->request_queue->node, false, true); + if (!ret) + sbitmap_resize(&sdev->budget_map, depth); + if (need_free) { if (ret) sdev->budget_map = sb_backup;
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
[ Upstream commit 16ed828b872d12ccba8f07bcc446ae89ba662f9c ]
The error handling path of the probe releases a resource that is not freed in the remove function. In some cases, a ioremap() must be undone.
Add the missing iounmap() call in the remove function.
Link: https://lore.kernel.org/r/247066a3104d25f9a05de8b3270fc3c848763bcc.164767326... Fixes: 45804fbb00ee ("[SCSI] 53c700: Amiga Zorro NCR53c710 SCSI") Reviewed-by: Geert Uytterhoeven geert@linux-m68k.org Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/zorro7xx.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/scsi/zorro7xx.c b/drivers/scsi/zorro7xx.c index 27b9e2baab1a..7acf9193a9e8 100644 --- a/drivers/scsi/zorro7xx.c +++ b/drivers/scsi/zorro7xx.c @@ -159,6 +159,8 @@ static void zorro7xx_remove_one(struct zorro_dev *z) scsi_remove_host(host);
NCR_700_release(host); + if (host->base > 0x01000000) + iounmap(hostdata->base); kfree(hostdata); free_irq(host->irq, host); zorro_release_device(z);
From: Jason Wang jasowang@redhat.com
[ Upstream commit 55ebf0d60e3cc6c9e8593399e185842c00e12f36 ]
A userspace triggerable infinite loop could happen in mlx5_cvq_kick_handler() if userspace keeps sending a huge amount of cvq requests.
Fixing this by introducing a quota and re-queue the work if we're out of the budget (currently the implicit budget is one) . While at it, using a per device work struct to avoid on demand memory allocation for cvq.
Fixes: 5262912ef3cfc ("vdpa/mlx5: Add support for control VQ and MAC setting") Signed-off-by: Jason Wang jasowang@redhat.com Link: https://lore.kernel.org/r/20220329042109.4029-1-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Acked-by: Eli Cohen elic@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vdpa/mlx5/net/mlx5_vnet.c | 21 +++++++++------------ 1 file changed, 9 insertions(+), 12 deletions(-)
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index 9fe1071a9644..1b5de3af1a62 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -163,6 +163,7 @@ struct mlx5_vdpa_net { u32 cur_num_vqs; struct notifier_block nb; struct vdpa_callback config_cb; + struct mlx5_vdpa_wq_ent cvq_ent; };
static void free_resources(struct mlx5_vdpa_net *ndev); @@ -1616,10 +1617,10 @@ static void mlx5_cvq_kick_handler(struct work_struct *work) ndev = to_mlx5_vdpa_ndev(mvdev); cvq = &mvdev->cvq; if (!(ndev->mvdev.actual_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ))) - goto out; + return;
if (!cvq->ready) - goto out; + return;
while (true) { err = vringh_getdesc_iotlb(&cvq->vring, &cvq->riov, &cvq->wiov, &cvq->head, @@ -1653,9 +1654,10 @@ static void mlx5_cvq_kick_handler(struct work_struct *work)
if (vringh_need_notify_iotlb(&cvq->vring)) vringh_notify(&cvq->vring); + + queue_work(mvdev->wq, &wqent->work); + break; } -out: - kfree(wqent); }
static void mlx5_vdpa_kick_vq(struct vdpa_device *vdev, u16 idx) @@ -1663,7 +1665,6 @@ static void mlx5_vdpa_kick_vq(struct vdpa_device *vdev, u16 idx) struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev); struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev); struct mlx5_vdpa_virtqueue *mvq; - struct mlx5_vdpa_wq_ent *wqent;
if (!is_index_valid(mvdev, idx)) return; @@ -1672,13 +1673,7 @@ static void mlx5_vdpa_kick_vq(struct vdpa_device *vdev, u16 idx) if (!mvdev->wq || !mvdev->cvq.ready) return;
- wqent = kzalloc(sizeof(*wqent), GFP_ATOMIC); - if (!wqent) - return; - - wqent->mvdev = mvdev; - INIT_WORK(&wqent->work, mlx5_cvq_kick_handler); - queue_work(mvdev->wq, &wqent->work); + queue_work(mvdev->wq, &ndev->cvq_ent.work); return; }
@@ -2668,6 +2663,8 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name, if (err) goto err_mr;
+ ndev->cvq_ent.mvdev = mvdev; + INIT_WORK(&ndev->cvq_ent.work, mlx5_cvq_kick_handler); mvdev->wq = create_singlethread_workqueue("mlx5_vdpa_wq"); if (!mvdev->wq) { err = -ENOMEM;
From: Taehee Yoo ap420073@gmail.com
[ Upstream commit 059a47f1da93811d37533556d67e72f2261b1127 ]
After rx/tx ring buffer size is changed, kernel panic occurs when it acts XDP_TX or XDP_REDIRECT.
When tx/rx ring buffer size is changed(ethtool -G), sfc driver reallocates and reinitializes rx and tx queues and their buffer (tx_queue->buffer). But it misses reinitializing xdp queues(efx->xdp_tx_queues). So, while it is acting XDP_TX or XDP_REDIRECT, it uses the uninitialized tx_queue->buffer.
A new function efx_set_xdp_channels() is separated from efx_set_channels() to handle only xdp queues.
Splat looks like: BUG: kernel NULL pointer dereference, address: 000000000000002a #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#4] PREEMPT SMP NOPTI RIP: 0010:efx_tx_map_chunk+0x54/0x90 [sfc] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G D 5.17.0+ #55 e8beeee8289528f11357029357cf Code: 48 8b 8d a8 01 00 00 48 8d 14 52 4c 8d 2c d0 44 89 e0 48 85 c9 74 0e 44 89 e2 4c 89 f6 48 80 RSP: 0018:ffff92f121e45c60 EFLAGS: 00010297 RIP: 0010:efx_tx_map_chunk+0x54/0x90 [sfc] RAX: 0000000000000040 RBX: ffff92ea506895c0 RCX: ffffffffc0330870 RDX: 0000000000000001 RSI: 00000001139b10ce RDI: ffff92ea506895c0 RBP: ffffffffc0358a80 R08: 00000001139b110d R09: 0000000000000000 R10: 0000000000000001 R11: ffff92ea414c0088 R12: 0000000000000040 R13: 0000000000000018 R14: 00000001139b10ce R15: ffff92ea506895c0 FS: 0000000000000000(0000) GS:ffff92f121ec0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Code: 48 8b 8d a8 01 00 00 48 8d 14 52 4c 8d 2c d0 44 89 e0 48 85 c9 74 0e 44 89 e2 4c 89 f6 48 80 CR2: 000000000000002a CR3: 00000003e6810004 CR4: 00000000007706e0 RSP: 0018:ffff92f121e85c60 EFLAGS: 00010297 PKRU: 55555554 RAX: 0000000000000040 RBX: ffff92ea50689700 RCX: ffffffffc0330870 RDX: 0000000000000001 RSI: 00000001145a90ce RDI: ffff92ea50689700 RBP: ffffffffc0358a80 R08: 00000001145a910d R09: 0000000000000000 R10: 0000000000000001 R11: ffff92ea414c0088 R12: 0000000000000040 R13: 0000000000000018 R14: 00000001145a90ce R15: ffff92ea50689700 FS: 0000000000000000(0000) GS:ffff92f121e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000002a CR3: 00000003e6810005 CR4: 00000000007706e0 PKRU: 55555554 Call Trace: <IRQ> efx_xdp_tx_buffers+0x12b/0x3d0 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5] __efx_rx_packet+0x5c3/0x930 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5] efx_rx_packet+0x28c/0x2e0 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5] efx_ef10_ev_process+0x5f8/0xf40 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5] ? enqueue_task_fair+0x95/0x550 efx_poll+0xc4/0x360 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5]
Fixes: 3990a8fffbda ("sfc: allocate channels for XDP tx queues") Signed-off-by: Taehee Yoo ap420073@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/sfc/efx_channels.c | 146 +++++++++++++----------- 1 file changed, 81 insertions(+), 65 deletions(-)
diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c index ead550ae2709..5e587cb853b9 100644 --- a/drivers/net/ethernet/sfc/efx_channels.c +++ b/drivers/net/ethernet/sfc/efx_channels.c @@ -764,6 +764,85 @@ void efx_remove_channels(struct efx_nic *efx) kfree(efx->xdp_tx_queues); }
+static int efx_set_xdp_tx_queue(struct efx_nic *efx, int xdp_queue_number, + struct efx_tx_queue *tx_queue) +{ + if (xdp_queue_number >= efx->xdp_tx_queue_count) + return -EINVAL; + + netif_dbg(efx, drv, efx->net_dev, + "Channel %u TXQ %u is XDP %u, HW %u\n", + tx_queue->channel->channel, tx_queue->label, + xdp_queue_number, tx_queue->queue); + efx->xdp_tx_queues[xdp_queue_number] = tx_queue; + return 0; +} + +static void efx_set_xdp_channels(struct efx_nic *efx) +{ + struct efx_tx_queue *tx_queue; + struct efx_channel *channel; + unsigned int next_queue = 0; + int xdp_queue_number = 0; + int rc; + + /* We need to mark which channels really have RX and TX + * queues, and adjust the TX queue numbers if we have separate + * RX-only and TX-only channels. + */ + efx_for_each_channel(channel, efx) { + if (channel->channel < efx->tx_channel_offset) + continue; + + if (efx_channel_is_xdp_tx(channel)) { + efx_for_each_channel_tx_queue(tx_queue, channel) { + tx_queue->queue = next_queue++; + rc = efx_set_xdp_tx_queue(efx, xdp_queue_number, + tx_queue); + if (rc == 0) + xdp_queue_number++; + } + } else { + efx_for_each_channel_tx_queue(tx_queue, channel) { + tx_queue->queue = next_queue++; + netif_dbg(efx, drv, efx->net_dev, + "Channel %u TXQ %u is HW %u\n", + channel->channel, tx_queue->label, + tx_queue->queue); + } + + /* If XDP is borrowing queues from net stack, it must + * use the queue with no csum offload, which is the + * first one of the channel + * (note: tx_queue_by_type is not initialized yet) + */ + if (efx->xdp_txq_queues_mode == + EFX_XDP_TX_QUEUES_BORROWED) { + tx_queue = &channel->tx_queue[0]; + rc = efx_set_xdp_tx_queue(efx, xdp_queue_number, + tx_queue); + if (rc == 0) + xdp_queue_number++; + } + } + } + WARN_ON(efx->xdp_txq_queues_mode == EFX_XDP_TX_QUEUES_DEDICATED && + xdp_queue_number != efx->xdp_tx_queue_count); + WARN_ON(efx->xdp_txq_queues_mode != EFX_XDP_TX_QUEUES_DEDICATED && + xdp_queue_number > efx->xdp_tx_queue_count); + + /* If we have more CPUs than assigned XDP TX queues, assign the already + * existing queues to the exceeding CPUs + */ + next_queue = 0; + while (xdp_queue_number < efx->xdp_tx_queue_count) { + tx_queue = efx->xdp_tx_queues[next_queue++]; + rc = efx_set_xdp_tx_queue(efx, xdp_queue_number, tx_queue); + if (rc == 0) + xdp_queue_number++; + } +} + int efx_realloc_channels(struct efx_nic *efx, u32 rxq_entries, u32 txq_entries) { struct efx_channel *other_channel[EFX_MAX_CHANNELS], *channel; @@ -835,6 +914,7 @@ int efx_realloc_channels(struct efx_nic *efx, u32 rxq_entries, u32 txq_entries) efx_init_napi_channel(efx->channel[i]); }
+ efx_set_xdp_channels(efx); out: /* Destroy unused channel structures */ for (i = 0; i < efx->n_channels; i++) { @@ -867,26 +947,9 @@ int efx_realloc_channels(struct efx_nic *efx, u32 rxq_entries, u32 txq_entries) goto out; }
-static inline int -efx_set_xdp_tx_queue(struct efx_nic *efx, int xdp_queue_number, - struct efx_tx_queue *tx_queue) -{ - if (xdp_queue_number >= efx->xdp_tx_queue_count) - return -EINVAL; - - netif_dbg(efx, drv, efx->net_dev, "Channel %u TXQ %u is XDP %u, HW %u\n", - tx_queue->channel->channel, tx_queue->label, - xdp_queue_number, tx_queue->queue); - efx->xdp_tx_queues[xdp_queue_number] = tx_queue; - return 0; -} - int efx_set_channels(struct efx_nic *efx) { - struct efx_tx_queue *tx_queue; struct efx_channel *channel; - unsigned int next_queue = 0; - int xdp_queue_number; int rc;
efx->tx_channel_offset = @@ -904,61 +967,14 @@ int efx_set_channels(struct efx_nic *efx) return -ENOMEM; }
- /* We need to mark which channels really have RX and TX - * queues, and adjust the TX queue numbers if we have separate - * RX-only and TX-only channels. - */ - xdp_queue_number = 0; efx_for_each_channel(channel, efx) { if (channel->channel < efx->n_rx_channels) channel->rx_queue.core_index = channel->channel; else channel->rx_queue.core_index = -1; - - if (channel->channel >= efx->tx_channel_offset) { - if (efx_channel_is_xdp_tx(channel)) { - efx_for_each_channel_tx_queue(tx_queue, channel) { - tx_queue->queue = next_queue++; - rc = efx_set_xdp_tx_queue(efx, xdp_queue_number, tx_queue); - if (rc == 0) - xdp_queue_number++; - } - } else { - efx_for_each_channel_tx_queue(tx_queue, channel) { - tx_queue->queue = next_queue++; - netif_dbg(efx, drv, efx->net_dev, "Channel %u TXQ %u is HW %u\n", - channel->channel, tx_queue->label, - tx_queue->queue); - } - - /* If XDP is borrowing queues from net stack, it must use the queue - * with no csum offload, which is the first one of the channel - * (note: channel->tx_queue_by_type is not initialized yet) - */ - if (efx->xdp_txq_queues_mode == EFX_XDP_TX_QUEUES_BORROWED) { - tx_queue = &channel->tx_queue[0]; - rc = efx_set_xdp_tx_queue(efx, xdp_queue_number, tx_queue); - if (rc == 0) - xdp_queue_number++; - } - } - } } - WARN_ON(efx->xdp_txq_queues_mode == EFX_XDP_TX_QUEUES_DEDICATED && - xdp_queue_number != efx->xdp_tx_queue_count); - WARN_ON(efx->xdp_txq_queues_mode != EFX_XDP_TX_QUEUES_DEDICATED && - xdp_queue_number > efx->xdp_tx_queue_count);
- /* If we have more CPUs than assigned XDP TX queues, assign the already - * existing queues to the exceeding CPUs - */ - next_queue = 0; - while (xdp_queue_number < efx->xdp_tx_queue_count) { - tx_queue = efx->xdp_tx_queues[next_queue++]; - rc = efx_set_xdp_tx_queue(efx, xdp_queue_number, tx_queue); - if (rc == 0) - xdp_queue_number++; - } + efx_set_xdp_channels(efx);
rc = netif_set_real_num_tx_queues(efx->net_dev, efx->n_tx_channels); if (rc)
From: Ziyang Xuan william.xuanziyang@huawei.com
[ Upstream commit 9381fe8c849cfbe50245ac01fc077554f6eaa0e2 ]
The memory size of tls_ctx->rx.iv for AES128-CCM is 12 setting in tls_set_sw_offload(). The return value of crypto_aead_ivsize() for "ccm(aes)" is 16. So memcpy() require 16 bytes from 12 bytes memory space will trigger slab-out-of-bounds bug as following:
================================================================== BUG: KASAN: slab-out-of-bounds in decrypt_internal+0x385/0xc40 [tls] Read of size 16 at addr ffff888114e84e60 by task tls/10911
Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_report.cold+0x5e/0x5db ? decrypt_internal+0x385/0xc40 [tls] kasan_report+0xab/0x120 ? decrypt_internal+0x385/0xc40 [tls] kasan_check_range+0xf9/0x1e0 memcpy+0x20/0x60 decrypt_internal+0x385/0xc40 [tls] ? tls_get_rec+0x2e0/0x2e0 [tls] ? process_rx_list+0x1a5/0x420 [tls] ? tls_setup_from_iter.constprop.0+0x2e0/0x2e0 [tls] decrypt_skb_update+0x9d/0x400 [tls] tls_sw_recvmsg+0x3c8/0xb50 [tls]
Allocated by task 10911: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x81/0xa0 tls_set_sw_offload+0x2eb/0xa20 [tls] tls_setsockopt+0x68c/0x700 [tls] __sys_setsockopt+0xfe/0x1b0
Replace the crypto_aead_ivsize() with prot->iv_size + prot->salt_size when memcpy() iv value in TLS_1_3_VERSION scenario.
Fixes: f295b3ae9f59 ("net/tls: Add support of AES128-CCM based ciphers") Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com Reviewed-by: Jakub Kicinski kuba@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index efc84845bb6b..75a699591383 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1495,7 +1495,7 @@ static int decrypt_internal(struct sock *sk, struct sk_buff *skb, if (prot->version == TLS_1_3_VERSION || prot->cipher_type == TLS_CIPHER_CHACHA20_POLY1305) memcpy(iv + iv_offset, tls_ctx->rx.iv, - crypto_aead_ivsize(ctx->aead_recv)); + prot->iv_size + prot->salt_size); else memcpy(iv + iv_offset, tls_ctx->rx.iv, prot->salt_size);
From: Eyal Birger eyal.birger@gmail.com
[ Upstream commit 012d69fbfcc739f846766c1da56ef8b493b803b5 ]
in commit 048939088220 ("vrf: add mac header for tunneled packets when sniffer is attached") an Ethernet header was cooked for traffic originating from tunnel devices.
However, the header is added based on whether the mac_header is unset and ignores cases where the device doesn't expose a mac header to upper layers, such as in ip tunnels like ipip and gre.
Traffic originating from such devices still appears garbled when capturing on the vrf device.
Fix by observing whether the original device exposes a header to upper layers, similar to the logic done in af_packet.
In addition, skb->mac_len needs to be adjusted after adding the Ethernet header for the skb_push/pull() surrounding dev_queue_xmit_nit() to work on these packets.
Fixes: 048939088220 ("vrf: add mac header for tunneled packets when sniffer is attached") Signed-off-by: Eyal Birger eyal.birger@gmail.com Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/vrf.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c index e0b1ab99a359..f37adcef4bef 100644 --- a/drivers/net/vrf.c +++ b/drivers/net/vrf.c @@ -1266,6 +1266,7 @@ static int vrf_prepare_mac_header(struct sk_buff *skb, eth = (struct ethhdr *)skb->data;
skb_reset_mac_header(skb); + skb_reset_mac_len(skb);
/* we set the ethernet destination and the source addresses to the * address of the VRF device. @@ -1295,9 +1296,9 @@ static int vrf_prepare_mac_header(struct sk_buff *skb, */ static int vrf_add_mac_header_if_unset(struct sk_buff *skb, struct net_device *vrf_dev, - u16 proto) + u16 proto, struct net_device *orig_dev) { - if (skb_mac_header_was_set(skb)) + if (skb_mac_header_was_set(skb) && dev_has_header(orig_dev)) return 0;
return vrf_prepare_mac_header(skb, vrf_dev, proto); @@ -1403,6 +1404,8 @@ static struct sk_buff *vrf_ip6_rcv(struct net_device *vrf_dev,
/* if packet is NDISC then keep the ingress interface */ if (!is_ndisc) { + struct net_device *orig_dev = skb->dev; + vrf_rx_stats(vrf_dev, skb->len); skb->dev = vrf_dev; skb->skb_iif = vrf_dev->ifindex; @@ -1411,7 +1414,8 @@ static struct sk_buff *vrf_ip6_rcv(struct net_device *vrf_dev, int err;
err = vrf_add_mac_header_if_unset(skb, vrf_dev, - ETH_P_IPV6); + ETH_P_IPV6, + orig_dev); if (likely(!err)) { skb_push(skb, skb->mac_len); dev_queue_xmit_nit(skb, vrf_dev); @@ -1441,6 +1445,8 @@ static struct sk_buff *vrf_ip6_rcv(struct net_device *vrf_dev, static struct sk_buff *vrf_ip_rcv(struct net_device *vrf_dev, struct sk_buff *skb) { + struct net_device *orig_dev = skb->dev; + skb->dev = vrf_dev; skb->skb_iif = vrf_dev->ifindex; IPCB(skb)->flags |= IPSKB_L3SLAVE; @@ -1461,7 +1467,8 @@ static struct sk_buff *vrf_ip_rcv(struct net_device *vrf_dev, if (!list_empty(&vrf_dev->ptype_all)) { int err;
- err = vrf_add_mac_header_if_unset(skb, vrf_dev, ETH_P_IP); + err = vrf_add_mac_header_if_unset(skb, vrf_dev, ETH_P_IP, + orig_dev); if (likely(!err)) { skb_push(skb, skb->mac_len); dev_queue_xmit_nit(skb, vrf_dev);
From: Jean-Philippe Brucker jean-philippe@linaro.org
[ Upstream commit 1effe8ca4e34c34cdd9318436a4232dcb582ebf4 ]
Fix a use-after-free when using page_pool with page fragments. We encountered this problem during normal RX in the hns3 driver:
(1) Initially we have three descriptors in the RX queue. The first one allocates PAGE1 through page_pool, and the other two allocate one half of PAGE2 each. Page references look like this:
RX_BD1 _______ PAGE1 RX_BD2 _______ PAGE2 RX_BD3 _________/
(2) Handle RX on the first descriptor. Allocate SKB1, eventually added to the receive queue by tcp_queue_rcv().
(3) Handle RX on the second descriptor. Allocate SKB2 and pass it to netif_receive_skb():
netif_receive_skb(SKB2) ip_rcv(SKB2) SKB3 = skb_clone(SKB2)
SKB2 and SKB3 share a reference to PAGE2 through skb_shinfo()->dataref. The other ref to PAGE2 is still held by RX_BD3:
SKB2 ---+- PAGE2 SKB3 __/ / RX_BD3 _________/
(3b) Now while handling TCP, coalesce SKB3 with SKB1:
tcp_v4_rcv(SKB3) tcp_try_coalesce(to=SKB1, from=SKB3) // succeeds kfree_skb_partial(SKB3) skb_release_data(SKB3) // drops one dataref
SKB1 _____ PAGE1 ____ SKB2 _____ PAGE2 / RX_BD3 _________/
In skb_try_coalesce(), __skb_frag_ref() takes a page reference to PAGE2, where it should instead have increased the page_pool frag reference, pp_frag_count. Without coalescing, when releasing both SKB2 and SKB3, a single reference to PAGE2 would be dropped. Now when releasing SKB1 and SKB2, two references to PAGE2 will be dropped, resulting in underflow.
(3c) Drop SKB2:
af_packet_rcv(SKB2) consume_skb(SKB2) skb_release_data(SKB2) // drops second dataref page_pool_return_skb_page(PAGE2) // drops one pp_frag_count
SKB1 _____ PAGE1 ____ PAGE2 / RX_BD3 _________/
(4) Userspace calls recvmsg() Copies SKB1 and releases it. Since SKB3 was coalesced with SKB1, we release the SKB3 page as well:
tcp_eat_recv_skb(SKB1) skb_release_data(SKB1) page_pool_return_skb_page(PAGE1) page_pool_return_skb_page(PAGE2) // drops second pp_frag_count
(5) PAGE2 is freed, but the third RX descriptor was still using it! In our case this causes IOMMU faults, but it would silently corrupt memory if the IOMMU was disabled.
Change the logic that checks whether pp_recycle SKBs can be coalesced. We still reject differing pp_recycle between 'from' and 'to' SKBs, but in order to avoid the situation described above, we also reject coalescing when both 'from' and 'to' are pp_recycled and 'from' is cloned.
The new logic allows coalescing a cloned pp_recycle SKB into a page refcounted one, because in this case the release (4) will drop the right reference, the one taken by skb_try_coalesce().
Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool") Suggested-by: Alexander Duyck alexanderduyck@fb.com Signed-off-by: Jean-Philippe Brucker jean-philippe@linaro.org Reviewed-by: Yunsheng Lin linyunsheng@huawei.com Reviewed-by: Alexander Duyck alexanderduyck@fb.com Acked-by: Ilias Apalodimas ilias.apalodimas@linaro.org Acked-by: Jesper Dangaard Brouer brouer@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/skbuff.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c index a8a2fb745274..180fa6a26ad4 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -5275,11 +5275,18 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from, if (skb_cloned(to)) return false;
- /* The page pool signature of struct page will eventually figure out - * which pages can be recycled or not but for now let's prohibit slab - * allocated and page_pool allocated SKBs from being coalesced. + /* In general, avoid mixing slab allocated and page_pool allocated + * pages within the same SKB. However when @to is not pp_recycle and + * @from is cloned, we can transition frag pages from page_pool to + * reference counted. + * + * On the other hand, don't allow coalescing two pp_recycle SKBs if + * @from is cloned, in case the SKB is using page_pool fragment + * references (PP_FLAG_PAGE_FRAG). Since we only take full page + * references for cloned SKBs at the moment that would result in + * inconsistent reference counts. */ - if (to->pp_recycle != from->pp_recycle) + if (to->pp_recycle != (from->pp_recycle && !skb_cloned(from))) return false;
if (len <= skb_tailroom(to)) {
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit 066dfc4290406b1b0b014ae3267d4266a344efd1 ]
This reverts commit a1ff94c2973c43bc1e2677ac63ebb15b1d1ff846.
Switch drivers that don't implement ->port_change_mtu() will cause the DSA master to remain with an MTU of 1500, since we've deleted the other code path. In turn, this causes a regression for those systems, where MTU-sized traffic can no longer be terminated.
Revert the change taking into account the fact that rtnl_lock() is now taken top-level from the callers of dsa_master_setup() and dsa_master_teardown(). Also add a comment in order for it to be absolutely clear why it is still needed.
Fixes: a1ff94c2973c ("net: dsa: stop updating master MTU from master.c") Reported-by: Luiz Angelo Daros de Luca luizluca@gmail.com Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Tested-by: Luiz Angelo Daros de Luca luizluca@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/dsa/master.c | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-)
diff --git a/net/dsa/master.c b/net/dsa/master.c index 880f910b23a9..10b51ffbb6f4 100644 --- a/net/dsa/master.c +++ b/net/dsa/master.c @@ -337,11 +337,24 @@ static const struct attribute_group dsa_group = {
static struct lock_class_key dsa_master_addr_list_lock_key;
+static void dsa_master_reset_mtu(struct net_device *dev) +{ + int err; + + err = dev_set_mtu(dev, ETH_DATA_LEN); + if (err) + netdev_dbg(dev, + "Unable to reset MTU to exclude DSA overheads\n"); +} + int dsa_master_setup(struct net_device *dev, struct dsa_port *cpu_dp) { + const struct dsa_device_ops *tag_ops = cpu_dp->tag_ops; struct dsa_switch *ds = cpu_dp->ds; struct device_link *consumer_link; - int ret; + int mtu, ret; + + mtu = ETH_DATA_LEN + dsa_tag_protocol_overhead(tag_ops);
/* The DSA master must use SET_NETDEV_DEV for this to work. */ consumer_link = device_link_add(ds->dev, dev->dev.parent, @@ -351,6 +364,15 @@ int dsa_master_setup(struct net_device *dev, struct dsa_port *cpu_dp) "Failed to create a device link to DSA switch %s\n", dev_name(ds->dev));
+ /* The switch driver may not implement ->port_change_mtu(), case in + * which dsa_slave_change_mtu() will not update the master MTU either, + * so we need to do that here. + */ + ret = dev_set_mtu(dev, mtu); + if (ret) + netdev_warn(dev, "error %d setting MTU to %d to include DSA overhead\n", + ret, mtu); + /* If we use a tagging format that doesn't have an ethertype * field, make sure that all packets from this point on get * sent to the tag format's receive function. @@ -388,6 +410,7 @@ void dsa_master_teardown(struct net_device *dev) sysfs_remove_group(&dev->dev.kobj, &dsa_group); dsa_netdev_ops_set(dev, NULL); dsa_master_ethtool_teardown(dev); + dsa_master_reset_mtu(dev); dsa_master_set_promiscuity(dev, -1);
dev->dsa_ptr = NULL;
From: Ivan Vecera ivecera@redhat.com
[ Upstream commit bd8c624c0cd59de0032752ba3001c107bba97f7b ]
VSI is set as default forwarding one when promisc mode is set for PF interface, when PF is switched to switchdev mode or when VF driver asks to enable allmulticast or promisc mode for the VF interface (when vf-true-promisc-support priv flag is off). The third case is buggy because in that case VSI associated with VF remains as default one after VF removal.
Reproducer: 1. Create VF echo 1 > sys/class/net/ens7f0/device/sriov_numvfs 2. Enable allmulticast or promisc mode on VF ip link set ens7f0v0 allmulticast on ip link set ens7f0v0 promisc on 3. Delete VF echo 0 > sys/class/net/ens7f0/device/sriov_numvfs 4. Try to enable promisc mode on PF ip link set ens7f0 promisc on
Although it looks that promisc mode on PF is enabled the opposite is true because ice_vsi_sync_fltr() responsible for IFF_PROMISC handling first checks if any other VSI is set as default forwarding one and if so the function does not do anything. At this point it is not possible to enable promisc mode on PF without re-probe device.
To resolve the issue this patch clear default forwarding VSI during ice_vsi_release() when the VSI to be released is the default one.
Fixes: 01b5e89aab49 ("ice: Add VF promiscuous support") Signed-off-by: Ivan Vecera ivecera@redhat.com Reviewed-by: Michal Swiatkowski michal.swiatkowski@linux.intel.com Reviewed-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Signed-off-by: Alice Michael alice.michael@intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_lib.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c index 53256aca27c7..20d755822d43 100644 --- a/drivers/net/ethernet/intel/ice/ice_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_lib.c @@ -3147,6 +3147,8 @@ int ice_vsi_release(struct ice_vsi *vsi) } }
+ if (ice_is_vsi_dflt_vsi(pf->first_sw, vsi)) + ice_clear_dflt_vsi(pf->first_sw); ice_fltr_remove_all(vsi); ice_rm_vsi_lan_cfg(vsi->port_info, vsi->idx); err = ice_rm_vsi_rdma_cfg(vsi->port_info, vsi->idx);
From: Ivan Vecera ivecera@redhat.com
[ Upstream commit 2c0069f3f91f125b1b2ce66cc6bea8eb134723c3 ]
Commit 2ccc1c1ccc671b ("ice: Remove excess error variables") merged the usage of 'status' and 'err' variables into single one in function ice_set_mac_address(). Unfortunately this causes a regression when call of ice_fltr_add_mac() returns -EEXIST because this return value does not indicate an error in this case but value of 'err' remains to be -EEXIST till the end of the function and is returned to caller.
Prior mentioned commit this does not happen because return value of ice_fltr_add_mac() was stored to 'status' variable first and if it was -EEXIST then 'err' remains to be zero.
Fix the problem by reset 'err' to zero when ice_fltr_add_mac() returns -EEXIST.
Fixes: 2ccc1c1ccc671b ("ice: Remove excess error variables") Signed-off-by: Ivan Vecera ivecera@redhat.com Reviewed-by: Jacob Keller jacob.e.keller@intel.com Acked-by: Alexander Lobakin alexandr.lobakin@intel.com Signed-off-by: Alice Michael alice.michael@intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_main.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 296f9d5f7408..92e0fe9316b9 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -5432,16 +5432,19 @@ static int ice_set_mac_address(struct net_device *netdev, void *pi)
/* Add filter for new MAC. If filter exists, return success */ err = ice_fltr_add_mac(vsi, mac, ICE_FWD_TO_VSI); - if (err == -EEXIST) + if (err == -EEXIST) { /* Although this MAC filter is already present in hardware it's * possible in some cases (e.g. bonding) that dev_addr was * modified outside of the driver and needs to be restored back * to this value. */ netdev_dbg(netdev, "filter for MAC %pM already exists\n", mac); - else if (err) + + return 0; + } else if (err) { /* error if the new filter addition failed */ err = -EADDRNOTAVAIL; + }
err_update_filters: if (err) {
From: Matt Johnston matt@codeconstruct.com.au
[ Upstream commit 60be976ac45137657b7b505d7e0d44d0e51accb7 ]
dev_hard_header() returns the length of the header, so we need to test for negative errors rather than non-zero.
Fixes: 889b7da23abf ("mctp: Add initial routing framework") Signed-off-by: Matt Johnston matt@codeconstruct.com.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/mctp/route.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/mctp/route.c b/net/mctp/route.c index 05fbd318eb98..d47438f5233d 100644 --- a/net/mctp/route.c +++ b/net/mctp/route.c @@ -507,7 +507,7 @@ static int mctp_route_output(struct mctp_route *route, struct sk_buff *skb)
rc = dev_hard_header(skb, skb->dev, ntohs(skb->protocol), daddr, skb->dev->dev_addr, skb->len); - if (rc) { + if (rc < 0) { kfree_skb(skb); return -EHOSTUNREACH; }
From: Matt Johnston matt@codeconstruct.com.au
[ Upstream commit 4a9dda1c1da65beee994f0977a56a9a21c5db2a7 ]
Previously the skb was allocated with headroom MCTP_HEADER_MAXLEN, but that isn't sufficient if we are using devs that are not MCTP specific.
This also adds a check that the smctp_halen provided to sendmsg for extended addressing is the correct size for the netdev.
Fixes: 833ef3b91de6 ("mctp: Populate socket implementation") Reported-by: Matthew Rinaldi mjrinal@g.clemson.edu Signed-off-by: Matt Johnston matt@codeconstruct.com.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/mctp.h | 2 -- net/mctp/af_mctp.c | 46 +++++++++++++++++++++++++++++++++------------- net/mctp/route.c | 14 +++++++++++--- 3 files changed, 44 insertions(+), 18 deletions(-)
diff --git a/include/net/mctp.h b/include/net/mctp.h index 7e35ec79b909..204ae3aebc0d 100644 --- a/include/net/mctp.h +++ b/include/net/mctp.h @@ -36,8 +36,6 @@ struct mctp_hdr { #define MCTP_HDR_TAG_SHIFT 0 #define MCTP_HDR_TAG_MASK GENMASK(2, 0)
-#define MCTP_HEADER_MAXLEN 4 - #define MCTP_INITIAL_DEFAULT_NET 1
static inline bool mctp_address_ok(mctp_eid_t eid) diff --git a/net/mctp/af_mctp.c b/net/mctp/af_mctp.c index c921de63b494..fc05351d3a82 100644 --- a/net/mctp/af_mctp.c +++ b/net/mctp/af_mctp.c @@ -90,13 +90,13 @@ static int mctp_bind(struct socket *sock, struct sockaddr *addr, int addrlen) static int mctp_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) { DECLARE_SOCKADDR(struct sockaddr_mctp *, addr, msg->msg_name); - const int hlen = MCTP_HEADER_MAXLEN + sizeof(struct mctp_hdr); int rc, addrlen = msg->msg_namelen; struct sock *sk = sock->sk; struct mctp_sock *msk = container_of(sk, struct mctp_sock, sk); struct mctp_skb_cb *cb; struct mctp_route *rt; - struct sk_buff *skb; + struct sk_buff *skb = NULL; + int hlen;
if (addr) { if (addrlen < sizeof(struct sockaddr_mctp)) @@ -119,6 +119,34 @@ static int mctp_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) if (addr->smctp_network == MCTP_NET_ANY) addr->smctp_network = mctp_default_net(sock_net(sk));
+ /* direct addressing */ + if (msk->addr_ext && addrlen >= sizeof(struct sockaddr_mctp_ext)) { + DECLARE_SOCKADDR(struct sockaddr_mctp_ext *, + extaddr, msg->msg_name); + struct net_device *dev; + + rc = -EINVAL; + rcu_read_lock(); + dev = dev_get_by_index_rcu(sock_net(sk), extaddr->smctp_ifindex); + /* check for correct halen */ + if (dev && extaddr->smctp_halen == dev->addr_len) { + hlen = LL_RESERVED_SPACE(dev) + sizeof(struct mctp_hdr); + rc = 0; + } + rcu_read_unlock(); + if (rc) + goto err_free; + rt = NULL; + } else { + rt = mctp_route_lookup(sock_net(sk), addr->smctp_network, + addr->smctp_addr.s_addr); + if (!rt) { + rc = -EHOSTUNREACH; + goto err_free; + } + hlen = LL_RESERVED_SPACE(rt->dev->dev) + sizeof(struct mctp_hdr); + } + skb = sock_alloc_send_skb(sk, hlen + 1 + len, msg->msg_flags & MSG_DONTWAIT, &rc); if (!skb) @@ -137,8 +165,8 @@ static int mctp_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) cb = __mctp_cb(skb); cb->net = addr->smctp_network;
- /* direct addressing */ - if (msk->addr_ext && addrlen >= sizeof(struct sockaddr_mctp_ext)) { + if (!rt) { + /* fill extended address in cb */ DECLARE_SOCKADDR(struct sockaddr_mctp_ext *, extaddr, msg->msg_name);
@@ -149,17 +177,9 @@ static int mctp_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) }
cb->ifindex = extaddr->smctp_ifindex; + /* smctp_halen is checked above */ cb->halen = extaddr->smctp_halen; memcpy(cb->haddr, extaddr->smctp_haddr, cb->halen); - - rt = NULL; - } else { - rt = mctp_route_lookup(sock_net(sk), addr->smctp_network, - addr->smctp_addr.s_addr); - if (!rt) { - rc = -EHOSTUNREACH; - goto err_free; - } }
rc = mctp_local_output(sk, rt, skb, addr->smctp_addr.s_addr, diff --git a/net/mctp/route.c b/net/mctp/route.c index d47438f5233d..1a296e211a50 100644 --- a/net/mctp/route.c +++ b/net/mctp/route.c @@ -498,6 +498,11 @@ static int mctp_route_output(struct mctp_route *route, struct sk_buff *skb)
if (cb->ifindex) { /* direct route; use the hwaddr we stashed in sendmsg */ + if (cb->halen != skb->dev->addr_len) { + /* sanity check, sendmsg should have already caught this */ + kfree_skb(skb); + return -EMSGSIZE; + } daddr = cb->haddr; } else { /* If lookup fails let the device handle daddr==NULL */ @@ -707,7 +712,7 @@ static int mctp_do_fragment_route(struct mctp_route *rt, struct sk_buff *skb, { const unsigned int hlen = sizeof(struct mctp_hdr); struct mctp_hdr *hdr, *hdr2; - unsigned int pos, size; + unsigned int pos, size, headroom; struct sk_buff *skb2; int rc; u8 seq; @@ -721,6 +726,9 @@ static int mctp_do_fragment_route(struct mctp_route *rt, struct sk_buff *skb, return -EMSGSIZE; }
+ /* keep same headroom as the original skb */ + headroom = skb_headroom(skb); + /* we've got the header */ skb_pull(skb, hlen);
@@ -728,7 +736,7 @@ static int mctp_do_fragment_route(struct mctp_route *rt, struct sk_buff *skb, /* size of message payload */ size = min(mtu - hlen, skb->len - pos);
- skb2 = alloc_skb(MCTP_HEADER_MAXLEN + hlen + size, GFP_KERNEL); + skb2 = alloc_skb(headroom + hlen + size, GFP_KERNEL); if (!skb2) { rc = -ENOMEM; break; @@ -744,7 +752,7 @@ static int mctp_do_fragment_route(struct mctp_route *rt, struct sk_buff *skb, skb_set_owner_w(skb2, skb->sk);
/* establish packet */ - skb_reserve(skb2, MCTP_HEADER_MAXLEN); + skb_reserve(skb2, headroom); skb_reset_network_header(skb2); skb_put(skb2, hlen + size); skb2->transport_header = skb2->network_header + hlen;
From: Nikolay Aleksandrov razor@blackwall.org
[ Upstream commit 6bf92d70e690b7ff12b24f4bfff5e5434d019b82 ]
FRR folks have hit a kernel warning[1] while deleting routes[2] which is caused by trying to delete a route pointing to a nexthop id without specifying nhid but matching on an interface. That is, a route is found but we hit a warning while matching it. The warning is from fib_info_nh() in include/net/nexthop.h because we run it on a fib_info with nexthop object. The call chain is: inet_rtm_delroute -> fib_table_delete -> fib_nh_match (called with a nexthop fib_info and also with fc_oif set thus calling fib_info_nh on the fib_info and triggering the warning). The fix is to not do any matching in that branch if the fi has a nexthop object because those are managed separately. I.e. we should match when deleting without nh spec and should fail when deleting a nexthop route with old-style nh spec because nexthop objects are managed separately, e.g.: $ ip r show 1.2.3.4/32 1.2.3.4 nhid 12 via 192.168.11.2 dev dummy0
$ ip r del 1.2.3.4/32 $ ip r del 1.2.3.4/32 nhid 12 <both should work>
$ ip r del 1.2.3.4/32 dev dummy0 <should fail with ESRCH>
[1] [ 523.462226] ------------[ cut here ]------------ [ 523.462230] WARNING: CPU: 14 PID: 22893 at include/net/nexthop.h:468 fib_nh_match+0x210/0x460 [ 523.462236] Modules linked in: dummy rpcsec_gss_krb5 xt_socket nf_socket_ipv4 nf_socket_ipv6 ip6table_raw iptable_raw bpf_preload xt_statistic ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs xt_mark nf_tables xt_nat veth nf_conntrack_netlink nfnetlink xt_addrtype br_netfilter overlay dm_crypt nfsv3 nfs fscache netfs vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack 8021q garp mrp ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bridge stp llc rfcomm snd_seq_dummy snd_hrtimer rpcrdma rdma_cm iw_cm ib_cm ib_core ip6table_filter xt_comment ip6_tables vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) qrtr bnep binfmt_misc xfs vfat fat squashfs loop nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(POE) nvidia(POE) intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi btusb btrtl iwlmvm uvcvideo btbcm snd_hda_intel edac_mce_amd [ 523.462274] videobuf2_vmalloc videobuf2_memops btintel snd_intel_dspcfg videobuf2_v4l2 snd_intel_sdw_acpi bluetooth snd_usb_audio snd_hda_codec mac80211 snd_usbmidi_lib joydev snd_hda_core videobuf2_common kvm_amd snd_rawmidi snd_hwdep snd_seq videodev ccp snd_seq_device libarc4 ecdh_generic mc snd_pcm kvm iwlwifi snd_timer drm_kms_helper snd cfg80211 cec soundcore irqbypass rapl wmi_bmof i2c_piix4 rfkill k10temp pcspkr acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc drm zram ip_tables crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel nvme sp5100_tco r8169 nvme_core wmi ipmi_devintf ipmi_msghandler fuse [ 523.462300] CPU: 14 PID: 22893 Comm: ip Tainted: P OE 5.16.18-200.fc35.x86_64 #1 [ 523.462302] Hardware name: Micro-Star International Co., Ltd. MS-7C37/MPG X570 GAMING EDGE WIFI (MS-7C37), BIOS 1.C0 10/29/2020 [ 523.462303] RIP: 0010:fib_nh_match+0x210/0x460 [ 523.462304] Code: 7c 24 20 48 8b b5 90 00 00 00 e8 bb ee f4 ff 48 8b 7c 24 20 41 89 c4 e8 ee eb f4 ff 45 85 e4 0f 85 2e fe ff ff e9 4c ff ff ff <0f> 0b e9 17 ff ff ff 3c 0a 0f 85 61 fe ff ff 48 8b b5 98 00 00 00 [ 523.462306] RSP: 0018:ffffaa53d4d87928 EFLAGS: 00010286 [ 523.462307] RAX: 0000000000000000 RBX: ffffaa53d4d87a90 RCX: ffffaa53d4d87bb0 [ 523.462308] RDX: ffff9e3d2ee6be80 RSI: ffffaa53d4d87a90 RDI: ffffffff920ed380 [ 523.462309] RBP: ffff9e3d2ee6be80 R08: 0000000000000064 R09: 0000000000000000 [ 523.462310] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000031 [ 523.462310] R13: 0000000000000020 R14: 0000000000000000 R15: ffff9e3d331054e0 [ 523.462311] FS: 00007f245517c1c0(0000) GS:ffff9e492ed80000(0000) knlGS:0000000000000000 [ 523.462313] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 523.462313] CR2: 000055e5dfdd8268 CR3: 00000003ef488000 CR4: 0000000000350ee0 [ 523.462315] Call Trace: [ 523.462316] <TASK> [ 523.462320] fib_table_delete+0x1a9/0x310 [ 523.462323] inet_rtm_delroute+0x93/0x110 [ 523.462325] rtnetlink_rcv_msg+0x133/0x370 [ 523.462327] ? _copy_to_iter+0xb5/0x6f0 [ 523.462330] ? rtnl_calcit.isra.0+0x110/0x110 [ 523.462331] netlink_rcv_skb+0x50/0xf0 [ 523.462334] netlink_unicast+0x211/0x330 [ 523.462336] netlink_sendmsg+0x23f/0x480 [ 523.462338] sock_sendmsg+0x5e/0x60 [ 523.462340] ____sys_sendmsg+0x22c/0x270 [ 523.462341] ? import_iovec+0x17/0x20 [ 523.462343] ? sendmsg_copy_msghdr+0x59/0x90 [ 523.462344] ? __mod_lruvec_page_state+0x85/0x110 [ 523.462348] ___sys_sendmsg+0x81/0xc0 [ 523.462350] ? netlink_seq_start+0x70/0x70 [ 523.462352] ? __dentry_kill+0x13a/0x180 [ 523.462354] ? __fput+0xff/0x250 [ 523.462356] __sys_sendmsg+0x49/0x80 [ 523.462358] do_syscall_64+0x3b/0x90 [ 523.462361] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 523.462364] RIP: 0033:0x7f24552aa337 [ 523.462365] Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 [ 523.462366] RSP: 002b:00007fff7f05a838 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 523.462368] RAX: ffffffffffffffda RBX: 000000006245bf91 RCX: 00007f24552aa337 [ 523.462368] RDX: 0000000000000000 RSI: 00007fff7f05a8a0 RDI: 0000000000000003 [ 523.462369] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 [ 523.462370] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000001 [ 523.462370] R13: 00007fff7f05ce08 R14: 0000000000000000 R15: 000055e5dfdd1040 [ 523.462373] </TASK> [ 523.462374] ---[ end trace ba537bc16f6bf4ed ]---
[2] https://github.com/FRRouting/frr/issues/6412
Fixes: 4c7e8084fd46 ("ipv4: Plumb support for nexthop object in a fib_info") Signed-off-by: Nikolay Aleksandrov razor@blackwall.org Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/fib_semantics.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index 2dd375f7407b..0a0f49770345 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -888,8 +888,13 @@ int fib_nh_match(struct net *net, struct fib_config *cfg, struct fib_info *fi, }
if (cfg->fc_oif || cfg->fc_gw_family) { - struct fib_nh *nh = fib_info_nh(fi, 0); + struct fib_nh *nh; + + /* cannot match on nexthop object attributes */ + if (fi->nh) + return 1;
+ nh = fib_info_nh(fi, 0); if (cfg->fc_encap) { if (fib_encap_match(net, cfg->fc_encap_type, cfg->fc_encap, nh, cfg, extack))
From: Chen-Yu Tsai wens@csie.org
[ Upstream commit c21cabb0fd0b54b8b54235fc1ecfe1195a23bcb2 ]
In commit 9cbadf094d9d ("net: stmmac: support max-speed device tree property"), when DT platforms don't set "max-speed", max_speed is set to -1; for non-DT platforms, it stays the default 0.
Prior to commit eeef2f6b9f6e ("net: stmmac: Start adding phylink support"), the check for a valid max_speed setting was to check if it was greater than zero. This commit got it right, but subsequent patches just checked for non-zero, which is incorrect for DT platforms.
In commit 92c3807b9ac3 ("net: stmmac: convert to phylink_get_linkmodes()") the conversion switched completely to checking for non-zero value as a valid value, which caused 1000base-T to stop getting advertised by default.
Instead of trying to fix all the checks, simply leave max_speed alone if DT property parsing fails.
Fixes: 9cbadf094d9d ("net: stmmac: support max-speed device tree property") Fixes: 92c3807b9ac3 ("net: stmmac: convert to phylink_get_linkmodes()") Signed-off-by: Chen-Yu Tsai wens@csie.org Acked-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Reviewed-by: Srinivas Kandagatla srinivas.kandagatla@linaro.org Link: https://lore.kernel.org/r/20220331184832.16316-1-wens@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c index 5d29f336315b..11e1055e8260 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c @@ -431,8 +431,7 @@ stmmac_probe_config_dt(struct platform_device *pdev, u8 *mac) plat->phylink_node = np;
/* Get max speed of operation from device tree */ - if (of_property_read_u32(np, "max-speed", &plat->max_speed)) - plat->max_speed = -1; + of_property_read_u32(np, "max-speed", &plat->max_speed);
plat->bus_id = of_alias_get_id(np, "ethernet"); if (plat->bus_id < 0)
From: Jiasheng Jiang jiasheng@iscas.ac.cn
[ Upstream commit 8027a9ad9b3568c5eb49c968ad6c97f279d76730 ]
As the possible failure of the allocation, kmemdup() may return NULL pointer. Therefore, it should be better to check the return value of kmemdup() and return error if fails.
Fixes: dc80d7038883 ("drm/imx-ldb: Add support to drm-bridge") Signed-off-by: Jiasheng Jiang jiasheng@iscas.ac.cn Signed-off-by: Philipp Zabel p.zabel@pengutronix.de Link: https://lore.kernel.org/r/20220105074729.2363657-1-jiasheng@iscas.ac.cn Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/imx/imx-ldb.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/imx/imx-ldb.c b/drivers/gpu/drm/imx/imx-ldb.c index e5078d03020d..fb0e951248f6 100644 --- a/drivers/gpu/drm/imx/imx-ldb.c +++ b/drivers/gpu/drm/imx/imx-ldb.c @@ -572,6 +572,8 @@ static int imx_ldb_panel_ddc(struct device *dev, edidp = of_get_property(child, "edid", &edid_len); if (edidp) { channel->edid = kmemdup(edidp, edid_len, GFP_KERNEL); + if (!channel->edid) + return -ENOMEM; } else if (!channel->panel) { /* fallback to display-timings node */ ret = of_get_drm_display_mode(child,
From: José Expósito jose.exposito89@gmail.com
[ Upstream commit bce81feb03a20fca7bbdd1c4af16b4e9d5c0e1d3 ]
Avoid leaking the display mode variable if of_get_drm_display_mode fails.
Fixes: 76ecd9c9fb24 ("drm/imx: parallel-display: check return code from of_get_drm_display_mode()") Addresses-Coverity-ID: 1443943 ("Resource leak") Signed-off-by: José Expósito jose.exposito89@gmail.com Signed-off-by: Philipp Zabel p.zabel@pengutronix.de Link: https://lore.kernel.org/r/20220108165230.44610-1-jose.exposito89@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/imx/parallel-display.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/imx/parallel-display.c b/drivers/gpu/drm/imx/parallel-display.c index 06cb1a59b9bc..63ba2ad84679 100644 --- a/drivers/gpu/drm/imx/parallel-display.c +++ b/drivers/gpu/drm/imx/parallel-display.c @@ -75,8 +75,10 @@ static int imx_pd_connector_get_modes(struct drm_connector *connector) ret = of_get_drm_display_mode(np, &imxpd->mode, &imxpd->bus_flags, OF_USE_NATIVE_MODE); - if (ret) + if (ret) { + drm_mode_destroy(connector->dev, mode); return ret; + }
drm_mode_copy(mode, &imxpd->mode); mode->type |= DRM_MODE_TYPE_DRIVER | DRM_MODE_TYPE_PREFERRED;
From: Liu Ying victor.liu@nxp.com
[ Upstream commit e8083acc3f8cc2097917018e947fd4c857f60454 ]
In dw_hdmi_imx_probe(), if error happens after dw_hdmi_probe() returns successfully, dw_hdmi_remove() should be called where necessary as bailout.
Fixes: c805ec7eb210 ("drm/imx: dw_hdmi-imx: move initialization into probe") Cc: Philipp Zabel p.zabel@pengutronix.de Cc: David Airlie airlied@linux.ie Cc: Daniel Vetter daniel@ffwll.ch Cc: Shawn Guo shawnguo@kernel.org Cc: Sascha Hauer s.hauer@pengutronix.de Cc: Pengutronix Kernel Team kernel@pengutronix.de Cc: Fabio Estevam festevam@gmail.com Cc: NXP Linux Team linux-imx@nxp.com Signed-off-by: Liu Ying victor.liu@nxp.com Signed-off-by: Philipp Zabel p.zabel@pengutronix.de Link: https://lore.kernel.org/r/20220128091944.3831256-1-victor.liu@nxp.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/imx/dw_hdmi-imx.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/imx/dw_hdmi-imx.c b/drivers/gpu/drm/imx/dw_hdmi-imx.c index 87428fb23d9f..a2277a0d6d06 100644 --- a/drivers/gpu/drm/imx/dw_hdmi-imx.c +++ b/drivers/gpu/drm/imx/dw_hdmi-imx.c @@ -222,6 +222,7 @@ static int dw_hdmi_imx_probe(struct platform_device *pdev) struct device_node *np = pdev->dev.of_node; const struct of_device_id *match = of_match_node(dw_hdmi_imx_dt_ids, np); struct imx_hdmi *hdmi; + int ret;
hdmi = devm_kzalloc(&pdev->dev, sizeof(*hdmi), GFP_KERNEL); if (!hdmi) @@ -243,10 +244,15 @@ static int dw_hdmi_imx_probe(struct platform_device *pdev) hdmi->bridge = of_drm_find_bridge(np); if (!hdmi->bridge) { dev_err(hdmi->dev, "Unable to find bridge\n"); + dw_hdmi_remove(hdmi->hdmi); return -ENODEV; }
- return component_add(&pdev->dev, &dw_hdmi_imx_ops); + ret = component_add(&pdev->dev, &dw_hdmi_imx_ops); + if (ret) + dw_hdmi_remove(hdmi->hdmi); + + return ret; }
static int dw_hdmi_imx_remove(struct platform_device *pdev)
From: Axel Lin axel.lin@ingics.com
[ Upstream commit 17049bf9de55a42ee96fd34520aff8a484677675 ]
The active_discharge_on setting was missed, so output discharge resistor is always disabled. Fix it.
Fixes: 0555d41497de ("regulator: rtq2134: Add support for Richtek RTQ2134 SubPMIC") Signed-off-by: Axel Lin axel.lin@ingics.com Link: https://lore.kernel.org/r/20220404022514.449231-1-axel.lin@ingics.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/regulator/rtq2134-regulator.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/regulator/rtq2134-regulator.c b/drivers/regulator/rtq2134-regulator.c index f21e3f8b21f2..8e13dea354a2 100644 --- a/drivers/regulator/rtq2134-regulator.c +++ b/drivers/regulator/rtq2134-regulator.c @@ -285,6 +285,7 @@ static const unsigned int rtq2134_buck_ramp_delay_table[] = { .enable_mask = RTQ2134_VOUTEN_MASK, \ .active_discharge_reg = RTQ2134_REG_BUCK##_id##_CFG0, \ .active_discharge_mask = RTQ2134_ACTDISCHG_MASK, \ + .active_discharge_on = RTQ2134_ACTDISCHG_MASK, \ .ramp_reg = RTQ2134_REG_BUCK##_id##_RSPCFG, \ .ramp_mask = RTQ2134_RSPUP_MASK, \ .ramp_delay_table = rtq2134_buck_ramp_delay_table, \
From: Geert Uytterhoeven geert+renesas@glider.be
[ Upstream commit 2f8cf5f642e80f8b6b0e660a9c86924a1f41cd80 ]
If rpcif_hw_init() fails, Runtime PM is left enabled.
Fixes: b04cc0d912eb80d3 ("memory: renesas-rpc-if: Add support for RZ/G2L") Signed-off-by: Geert Uytterhoeven geert+renesas@glider.be Reviewed-by: Wolfram Sang wsa+renesas@sang-engineering.com Link: https://lore.kernel.org/r/1c78a1f447d019bb66b6e7787f520ae78821e2ae.164856228... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-rpc-if.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/spi/spi-rpc-if.c b/drivers/spi/spi-rpc-if.c index fe82f3575df4..24ec1c83f379 100644 --- a/drivers/spi/spi-rpc-if.c +++ b/drivers/spi/spi-rpc-if.c @@ -158,14 +158,18 @@ static int rpcif_spi_probe(struct platform_device *pdev)
error = rpcif_hw_init(rpc, false); if (error) - return error; + goto out_disable_rpm;
error = spi_register_controller(ctlr); if (error) { dev_err(&pdev->dev, "spi_register_controller failed\n"); - rpcif_disable_rpm(rpc); + goto out_disable_rpm; }
+ return 0; + +out_disable_rpm: + rpcif_disable_rpm(rpc); return error; }
From: Axel Lin axel.lin@ingics.com
[ Upstream commit 2316f0fc0ad2aa87a568ceaf3d76be983ee555c3 ]
Without active_discharge_on setting, the SWITCH1 discharge enable control is always disabled. Fix it.
Fixes: 3b15ccac161a ("regulator: Add regulator driver for ATC260x PMICs") Signed-off-by: Axel Lin axel.lin@ingics.com Link: https://lore.kernel.org/r/20220403132235.123727-1-axel.lin@ingics.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/regulator/atc260x-regulator.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/regulator/atc260x-regulator.c b/drivers/regulator/atc260x-regulator.c index 05147d2c3842..485e58b264c0 100644 --- a/drivers/regulator/atc260x-regulator.c +++ b/drivers/regulator/atc260x-regulator.c @@ -292,6 +292,7 @@ enum atc2603c_reg_ids { .bypass_mask = BIT(5), \ .active_discharge_reg = ATC2603C_PMU_SWITCH_CTL, \ .active_discharge_mask = BIT(1), \ + .active_discharge_on = BIT(1), \ .owner = THIS_MODULE, \ }
From: Phil Auld pauld@redhat.com
[ Upstream commit 5524cbb1bfcdff0cad0aaa9f94e6092002a07259 ]
Arm64 systems rely on store_cpu_topology() to call update_siblings_masks() to transfer the toplogy to the various cpu masks. This needs to be done before the call to notify_cpu_starting() which tells the scheduler about each cpu found, otherwise the core scheduling data structures are setup in a way that does not match the actual topology.
With smt_mask not setup correctly we bail on `cpumask_weight(smt_mask) == 1` for !leaders in:
notify_cpu_starting() cpuhp_invoke_callback_range() sched_cpu_starting() sched_core_cpu_starting()
which leads to rq->core not being correctly set for !leader-rq's.
Without this change stress-ng (which enables core scheduling in its prctl tests in newer versions -- i.e. with PR_SCHED_CORE support) causes a warning and then a crash (trimmed for legibility):
[ 1853.805168] ------------[ cut here ]------------ [ 1853.809784] task_rq(b)->core != rq->core [ 1853.809792] WARNING: CPU: 117 PID: 0 at kernel/sched/fair.c:11102 cfs_prio_less+0x1b4/0x1c4 ... [ 1854.015210] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010 ... [ 1854.231256] Call trace: [ 1854.233689] pick_next_task+0x3dc/0x81c [ 1854.237512] __schedule+0x10c/0x4cc [ 1854.240988] schedule_idle+0x34/0x54
Fixes: 9edeaea1bc45 ("sched: Core-wide rq->lock") Signed-off-by: Phil Auld pauld@redhat.com Reviewed-by: Dietmar Eggemann dietmar.eggemann@arm.com Tested-by: Dietmar Eggemann dietmar.eggemann@arm.com Link: https://lore.kernel.org/r/20220331153926.25742-1-pauld@redhat.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/kernel/smp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index 27df5c1e6baa..3b46041f2b97 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -234,6 +234,7 @@ asmlinkage notrace void secondary_start_kernel(void) * Log the CPU info before it is marked online and might get read. */ cpuinfo_store_cpu(); + store_cpu_topology(cpu);
/* * Enable GIC and timers. @@ -242,7 +243,6 @@ asmlinkage notrace void secondary_start_kernel(void)
ipi_setup(cpu);
- store_cpu_topology(cpu); numa_add_cpu(cpu);
/*
From: Pavan Chebbi pavan.chebbi@broadcom.com
[ Upstream commit 4f81def272de17dc4bbd89ac38f49b2676c9b3d2 ]
If there are more CPUs than the number of TX XDP rings, multiple XDP redirects can select the same TX ring based on the CPU on which XDP redirect is called. Add locking when needed and use static key to decide whether to take the lock.
Fixes: f18c2b77b2e4 ("bnxt_en: optimized XDP_REDIRECT support") Signed-off-by: Pavan Chebbi pavan.chebbi@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 7 +++++++ drivers/net/ethernet/broadcom/bnxt/bnxt.h | 2 ++ drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 8 ++++++++ drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h | 2 ++ 4 files changed, 19 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index b1c98d1408b8..6af0ae1d0c46 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -3224,6 +3224,7 @@ static int bnxt_alloc_tx_rings(struct bnxt *bp) } qidx = bp->tc_to_qidx[j]; ring->queue_id = bp->q_info[qidx].queue_id; + spin_lock_init(&txr->xdp_tx_lock); if (i < bp->tx_nr_rings_xdp) continue; if (i % bp->tx_nr_rings_per_tc == (bp->tx_nr_rings_per_tc - 1)) @@ -10294,6 +10295,12 @@ static int __bnxt_open_nic(struct bnxt *bp, bool irq_re_init, bool link_re_init) if (irq_re_init) udp_tunnel_nic_reset_ntf(bp->dev);
+ if (bp->tx_nr_rings_xdp < num_possible_cpus()) { + if (!static_key_enabled(&bnxt_xdp_locking_key)) + static_branch_enable(&bnxt_xdp_locking_key); + } else if (static_key_enabled(&bnxt_xdp_locking_key)) { + static_branch_disable(&bnxt_xdp_locking_key); + } set_bit(BNXT_STATE_OPEN, &bp->state); bnxt_enable_int(bp); /* Enable TX queues */ diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h index 666fc1e7a7d2..caf66a35d923 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h @@ -800,6 +800,8 @@ struct bnxt_tx_ring_info { u32 dev_state;
struct bnxt_ring_struct tx_ring_struct; + /* Synchronize simultaneous xdp_xmit on same ring */ + spinlock_t xdp_tx_lock; };
#define BNXT_LEGACY_COAL_CMPL_PARAMS \ diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c index 52fad0fdeacf..c0541ff00ac8 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c @@ -20,6 +20,8 @@ #include "bnxt.h" #include "bnxt_xdp.h"
+DEFINE_STATIC_KEY_FALSE(bnxt_xdp_locking_key); + struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp, struct bnxt_tx_ring_info *txr, dma_addr_t mapping, u32 len) @@ -227,6 +229,9 @@ int bnxt_xdp_xmit(struct net_device *dev, int num_frames, ring = smp_processor_id() % bp->tx_nr_rings_xdp; txr = &bp->tx_ring[ring];
+ if (static_branch_unlikely(&bnxt_xdp_locking_key)) + spin_lock(&txr->xdp_tx_lock); + for (i = 0; i < num_frames; i++) { struct xdp_frame *xdp = frames[i];
@@ -250,6 +255,9 @@ int bnxt_xdp_xmit(struct net_device *dev, int num_frames, bnxt_db_write(bp, &txr->tx_db, txr->tx_prod); }
+ if (static_branch_unlikely(&bnxt_xdp_locking_key)) + spin_unlock(&txr->xdp_tx_lock); + return nxmit; }
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h index 0df40c3beb05..067bb5e821f5 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h @@ -10,6 +10,8 @@ #ifndef BNXT_XDP_H #define BNXT_XDP_H
+DECLARE_STATIC_KEY_FALSE(bnxt_xdp_locking_key); + struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp, struct bnxt_tx_ring_info *txr, dma_addr_t mapping, u32 len);
From: Andy Gospodarek gospo@broadcom.com
[ Upstream commit facc173cf700e55b2ad249ecbd3a7537f7315691 ]
Insufficient space was being reserved in the page used for packet reception, so the interface MTU could be set too large to still have room for the contents of the packet when doing XDP redirect. This resulted in the following message when redirecting a packet between 3520 and 3822 bytes with an MTU of 3822:
[311815.561880] XDP_WARN: xdp_update_frame_from_buff(line:200): Driver BUG: missing reserved tailroom
Fixes: f18c2b77b2e4 ("bnxt_en: optimized XDP_REDIRECT support") Reviewed-by: Somnath Kotur somnath.kotur@broadcom.com Reviewed-by: Pavan Chebbi pavan.chebbi@broadcom.com Signed-off-by: Andy Gospodarek gospo@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h index caf66a35d923..d57bff46b587 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h @@ -593,7 +593,8 @@ struct nqe_cn { #define BNXT_MAX_MTU 9500 #define BNXT_MAX_PAGE_MODE_MTU \ ((unsigned int)PAGE_SIZE - VLAN_ETH_HLEN - NET_IP_ALIGN - \ - XDP_PACKET_HEADROOM) + XDP_PACKET_HEADROOM - \ + SKB_DATA_ALIGN((unsigned int)sizeof(struct skb_shared_info)))
#define BNXT_MIN_PKT_SIZE 52
From: Ray Jui ray.jui@broadcom.com
[ Upstream commit 27d4073f8d9af0340362554414f4961643a4f4de ]
Add checks in the XDP redirect callback to prevent XDP from running when the TX ring is undergoing shutdown.
Also remove redundant checks in the XDP redirect callback to validate the txr and the flag that indicates the ring supports XDP. The modulo arithmetic on 'tx_nr_rings_xdp' already guarantees the derived TX ring is an XDP ring. txr is also guaranteed to be valid after checking BNXT_STATE_OPEN and within RCU grace period.
Fixes: f18c2b77b2e4 ("bnxt_en: optimized XDP_REDIRECT support") Reviewed-by: Vladimir Olovyannikov vladimir.olovyannikov@broadcom.com Signed-off-by: Ray Jui ray.jui@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c index c0541ff00ac8..03b1d6c04504 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c @@ -229,14 +229,16 @@ int bnxt_xdp_xmit(struct net_device *dev, int num_frames, ring = smp_processor_id() % bp->tx_nr_rings_xdp; txr = &bp->tx_ring[ring];
+ if (READ_ONCE(txr->dev_state) == BNXT_DEV_STATE_CLOSING) + return -EINVAL; + if (static_branch_unlikely(&bnxt_xdp_locking_key)) spin_lock(&txr->xdp_tx_lock);
for (i = 0; i < num_frames; i++) { struct xdp_frame *xdp = frames[i];
- if (!txr || !bnxt_tx_avail(bp, txr) || - !(bp->bnapi[ring]->flags & BNXT_NAPI_FLAG_XDP)) + if (!bnxt_tx_avail(bp, txr)) break;
mapping = dma_map_single(&pdev->dev, xdp->data, xdp->len,
From: Martin Habets habetsm.xilinx@gmail.com
[ Upstream commit 458f5d92df4807e2a7c803ed928369129996bf96 ]
When the page_ring is not used page_ptr_mask is 0. Do not dereference page_ring[0] in this case.
Fixes: 2768935a4660 ("sfc: reuse pages to avoid DMA mapping/unmapping costs") Reported-by: Taehee Yoo ap420073@gmail.com Signed-off-by: Martin Habets habetsm.xilinx@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/sfc/rx_common.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/sfc/rx_common.c b/drivers/net/ethernet/sfc/rx_common.c index 633ca77a26fd..b925de9b4302 100644 --- a/drivers/net/ethernet/sfc/rx_common.c +++ b/drivers/net/ethernet/sfc/rx_common.c @@ -166,6 +166,9 @@ static void efx_fini_rx_recycle_ring(struct efx_rx_queue *rx_queue) struct efx_nic *efx = rx_queue->efx; int i;
+ if (unlikely(!rx_queue->page_ring)) + return; + /* Unmap and release the pages in the recycle ring. Remove the ring. */ for (i = 0; i <= rx_queue->page_ptr_mask; i++) { struct page *page = rx_queue->page_ring[i];
From: Aharon Landau aharonl@nvidia.com
[ Upstream commit 84c2362fb65d69c721fec0974556378cbb36a62b ]
Don't remove MRs from the cache if need to delay the removal.
Fixes: b9358bdbc713 ("RDMA/mlx5: Fix locking in MR cache work queue") Link: https://lore.kernel.org/r/c3087a90ff362c8796c7eaa2715128743ce36722.164906243... Signed-off-by: Aharon Landau aharonl@nvidia.com Reviewed-by: Shay Drory shayd@nvidia.com Signed-off-by: Leon Romanovsky leonro@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/mlx5/mr.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c index 2910d7833313..d3b2d02a4872 100644 --- a/drivers/infiniband/hw/mlx5/mr.c +++ b/drivers/infiniband/hw/mlx5/mr.c @@ -541,8 +541,10 @@ static void __cache_work_func(struct mlx5_cache_ent *ent) spin_lock_irq(&ent->lock); if (ent->disabled) goto out; - if (need_delay) + if (need_delay) { queue_delayed_work(cache->wq, &ent->dwork, 300 * HZ); + goto out; + } remove_cache_mr_locked(ent); queue_adjust_cache_locked(ent); }
From: Aharon Landau aharonl@nvidia.com
[ Upstream commit 1d735eeee63a0beb65180ca0224f239cc0c9f804 ]
Update cache->last_add when returning an MR to the cache so that the cache work won't remove it.
Fixes: b9358bdbc713 ("RDMA/mlx5: Fix locking in MR cache work queue") Link: https://lore.kernel.org/r/c99f076fce4b44829d434936bbcd3b5fc4c95020.164906243... Signed-off-by: Aharon Landau aharonl@nvidia.com Reviewed-by: Shay Drory shayd@nvidia.com Signed-off-by: Leon Romanovsky leonro@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/mlx5/mr.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c index d3b2d02a4872..d40a1460ef97 100644 --- a/drivers/infiniband/hw/mlx5/mr.c +++ b/drivers/infiniband/hw/mlx5/mr.c @@ -632,6 +632,7 @@ static void mlx5_mr_cache_free(struct mlx5_ib_dev *dev, struct mlx5_ib_mr *mr) { struct mlx5_cache_ent *ent = mr->cache_ent;
+ WRITE_ONCE(dev->cache.last_add, jiffies); spin_lock_irq(&ent->lock); list_add_tail(&mr->list, &ent->head); ent->available_mrs++;
From: Mark Zhang markzhang@nvidia.com
[ Upstream commit 107dd7beba403a363adfeb3ffe3734fe38a05cce ]
On the passive side when the disconnectReq event comes, if the current state is MRA_REP_RCVD, it needs to cancel the MAD before entering the DREQ_RCVD and TIMEWAIT states, otherwise the destroy_id may block until this mad will reach timeout.
Fixes: a977049dacde ("[PATCH] IB: Add the kernel CM implementation") Link: https://lore.kernel.org/r/75261c00c1d82128b1d981af9ff46e994186e621.164906243... Signed-off-by: Mark Zhang markzhang@nvidia.com Reviewed-by: Maor Gottlieb maorg@nvidia.com Signed-off-by: Leon Romanovsky leonro@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/core/cm.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index 35f0d5e7533d..1c107d6d03b9 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -2824,6 +2824,7 @@ static int cm_dreq_handler(struct cm_work *work) switch (cm_id_priv->id.state) { case IB_CM_REP_SENT: case IB_CM_DREQ_SENT: + case IB_CM_MRA_REP_RCVD: ib_cancel_mad(cm_id_priv->msg); break; case IB_CM_ESTABLISHED: @@ -2831,8 +2832,6 @@ static int cm_dreq_handler(struct cm_work *work) cm_id_priv->id.lap_state == IB_CM_MRA_LAP_RCVD) ib_cancel_mad(cm_id_priv->msg); break; - case IB_CM_MRA_REP_RCVD: - break; case IB_CM_TIMEWAIT: atomic_long_inc(&work->port->counters[CM_RECV_DUPLICATES] [CM_DREQ_COUNTER]);
From: Paulo Alcantara pc@cjr.nz
[ Upstream commit 687127c81ad32c8900a3fedbc7ed8f686ca95855 ]
To avoid racing with demultiplex thread while it is handling data on socket, use cifs_signal_cifsd_for_reconnect() helper for marking current server to reconnect and let the demultiplex thread handle the rest.
Fixes: dca65818c80c ("cifs: use a different reconnect helper for non-cifsd threads") Reviewed-by: Enzo Matsumiya ematsumiya@suse.de Reviewed-by: Shyam Prasad N sprasad@microsoft.com Signed-off-by: Paulo Alcantara (SUSE) pc@cjr.nz Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/cifs/connect.c | 2 +- fs/cifs/netmisc.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c index d6f8ccc7bfe2..0270b412f801 100644 --- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -4465,7 +4465,7 @@ static int tree_connect_dfs_target(const unsigned int xid, struct cifs_tcon *tco */ if (rc && server->current_fullpath != server->origin_fullpath) { server->current_fullpath = server->origin_fullpath; - cifs_reconnect(tcon->ses->server, true); + cifs_signal_cifsd_for_reconnect(server, true); }
dfs_cache_free_tgts(tl); diff --git a/fs/cifs/netmisc.c b/fs/cifs/netmisc.c index ebe236b9d9f5..235aa1b395eb 100644 --- a/fs/cifs/netmisc.c +++ b/fs/cifs/netmisc.c @@ -896,7 +896,7 @@ map_and_check_smb_error(struct mid_q_entry *mid, bool logErr) if (class == ERRSRV && code == ERRbaduid) { cifs_dbg(FYI, "Server returned 0x%x, reconnecting session...\n", code); - cifs_reconnect(mid->server, false); + cifs_signal_cifsd_for_reconnect(mid->server, false); } }
From: Niels Dossche dossche.niels@gmail.com
[ Upstream commit 4d809f69695d4e7d1378b3a072fa9aef23123018 ]
The documentation of the function rvt_error_qp says both r_lock and s_lock need to be held when calling that function. It also asserts using lockdep that both of those locks are held. However, the commit I referenced in Fixes accidentally makes the call to rvt_error_qp in rvt_ruc_loopback no longer covered by r_lock. This results in the lockdep assertion failing and also possibly in a race condition.
Fixes: d757c60eca9b ("IB/rdmavt: Fix concurrency panics in QP post_send and modify to error") Link: https://lore.kernel.org/r/20220228165330.41546-1-dossche.niels@gmail.com Signed-off-by: Niels Dossche dossche.niels@gmail.com Acked-by: Dennis Dalessandro dennis.dalessandro@cornelisnetworks.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/sw/rdmavt/qp.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/sw/rdmavt/qp.c b/drivers/infiniband/sw/rdmavt/qp.c index ae50b56e8913..8ef112f883a7 100644 --- a/drivers/infiniband/sw/rdmavt/qp.c +++ b/drivers/infiniband/sw/rdmavt/qp.c @@ -3190,7 +3190,11 @@ void rvt_ruc_loopback(struct rvt_qp *sqp) spin_lock_irqsave(&sqp->s_lock, flags); rvt_send_complete(sqp, wqe, send_status); if (sqp->ibqp.qp_type == IB_QPT_RC) { - int lastwqe = rvt_error_qp(sqp, IB_WC_WR_FLUSH_ERR); + int lastwqe; + + spin_lock(&sqp->r_lock); + lastwqe = rvt_error_qp(sqp, IB_WC_WR_FLUSH_ERR); + spin_unlock(&sqp->r_lock);
sqp->s_flags &= ~RVT_S_BUSY; spin_unlock_irqrestore(&sqp->s_lock, flags);
From: Jamie Bainbridge jamie.bainbridge@gmail.com
[ Upstream commit e3d37210df5c41c51147a2d5d465de1a4d77be7a ]
Singleton chunks (INIT, HEARTBEAT PMTU probes, and SHUTDOWN- COMPLETE) are not counted in SCTP_GET_ASOC_STATS "sas_octrlchunks" counter available to the assoc owner.
These are all control chunks so they should be counted as such.
Add counting of singleton chunks so they are properly accounted for.
Fixes: 196d67593439 ("sctp: Add support to per-association statistics via a new SCTP_GET_ASSOC_STATS call") Signed-off-by: Jamie Bainbridge jamie.bainbridge@gmail.com Acked-by: Marcelo Ricardo Leitner marcelo.leitner@gmail.com Link: https://lore.kernel.org/r/c9ba8785789880cf07923b8a5051e174442ea9ee.164902966... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sctp/outqueue.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c index a18609f608fb..e213aaf45d67 100644 --- a/net/sctp/outqueue.c +++ b/net/sctp/outqueue.c @@ -914,6 +914,7 @@ static void sctp_outq_flush_ctrl(struct sctp_flush_ctx *ctx) ctx->asoc->base.sk->sk_err = -error; return; } + ctx->asoc->stats.octrlchunks++; break;
case SCTP_CID_ABORT: @@ -938,7 +939,10 @@ static void sctp_outq_flush_ctrl(struct sctp_flush_ctx *ctx)
case SCTP_CID_HEARTBEAT: if (chunk->pmtu_probe) { - sctp_packet_singleton(ctx->transport, chunk, ctx->gfp); + error = sctp_packet_singleton(ctx->transport, + chunk, ctx->gfp); + if (!error) + ctx->asoc->stats.octrlchunks++; break; } fallthrough;
From: Miaoqian Lin linmq006@gmail.com
[ Upstream commit 2b04bd4f03bba021959ca339314f6739710f0954 ]
This node pointer is returned by of_find_compatible_node() with refcount incremented. Calling of_node_put() to aovid the refcount leak.
Fixes: d346c9e86d86 ("dpaa2-ptp: reuse ptp_qoriq driver") Signed-off-by: Miaoqian Lin linmq006@gmail.com Link: https://lore.kernel.org/r/20220404125336.13427-1-linmq006@gmail.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/freescale/dpaa2/dpaa2-ptp.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-ptp.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-ptp.c index 5f5f8c53c4a0..c8cb541572ff 100644 --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-ptp.c +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-ptp.c @@ -167,7 +167,7 @@ static int dpaa2_ptp_probe(struct fsl_mc_device *mc_dev) base = of_iomap(node, 0); if (!base) { err = -ENOMEM; - goto err_close; + goto err_put; }
err = fsl_mc_allocate_irqs(mc_dev); @@ -210,6 +210,8 @@ static int dpaa2_ptp_probe(struct fsl_mc_device *mc_dev) fsl_mc_free_irqs(mc_dev); err_unmap: iounmap(base); +err_put: + of_node_put(node); err_close: dprtc_close(mc_dev->mc_io, 0, mc_dev->mc_handle); err_free_mcp:
From: Anatolii Gerasymenko anatolii.gerasymenko@intel.com
[ Upstream commit ccfee1822042b87e5135d33cad8ea353e64612d2 ]
When VF is freshly created, but not brought up, ring->txq_teid value is by default set to 0. But 0 is a valid TEID. On some platforms the Root Node of Tx scheduler has a TEID = 0. This can cause issues as shown below.
The proper way is to set ring->txq_teid to ICE_INVAL_TEID (0xFFFFFFFF).
Testing Hints: echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs ip link set dev ens785f0v0 up ip link set dev ens785f0v0 down
If we have freshly created VF and quickly turn it on and off, so there would be no time to reach VIRTCHNL_OP_CONFIG_VSI_QUEUES stage, then VIRTCHNL_OP_DISABLE_QUEUES stage will fail with error: [ 639.531454] disable queue 89 failed 14 [ 639.532233] Failed to disable LAN Tx queues, error: ICE_ERR_AQ_ERROR [ 639.533107] ice 0000:02:00.0: Failed to stop Tx ring 0 on VSI 5
The reason for the fail is that we are trying to send AQ command to delete queue 89, which has never been created and receive an "invalid argument" error from firmware.
As this queue has never been created, it's teid and ring->txq_teid have default value 0. ice_dis_vsi_txq has a check against non-existent queues:
node = ice_sched_find_node_by_teid(pi->root, q_teids[i]); if (!node) continue;
But on some platforms the Root Node of Tx scheduler has a teid = 0. Hence, ice_sched_find_node_by_teid finds a node with teid = 0 (it is pi->root), and we go further to submit an erroneous request to firmware.
Fixes: 37bb83901286 ("ice: Move common functions out of ice_main.c part 7/7") Signed-off-by: Anatolii Gerasymenko anatolii.gerasymenko@intel.com Reviewed-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Tested-by: Konrad Jankowski konrad0.jankowski@intel.com Signed-off-by: Alice Michael alice.michael@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_lib.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c index 20d755822d43..5fd2bbeab2d1 100644 --- a/drivers/net/ethernet/intel/ice/ice_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_lib.c @@ -1452,6 +1452,7 @@ static int ice_vsi_alloc_rings(struct ice_vsi *vsi) ring->tx_tstamps = &pf->ptp.port.tx; ring->dev = dev; ring->count = vsi->num_tx_desc; + ring->txq_teid = ICE_INVAL_TEID; WRITE_ONCE(vsi->tx_rings[i], ring); }
From: Anatolii Gerasymenko anatolii.gerasymenko@intel.com
[ Upstream commit 05ef6813b234db3196f083b91db3963f040b65bb ]
Disable check for queue being enabled in ice_vc_dis_qs_msg, because there could be a case when queues were created, but were not enabled. We still need to delete those queues.
Normal workflow for VF looks like: Enable path: VIRTCHNL_OP_ADD_ETH_ADDR (opcode 10) VIRTCHNL_OP_CONFIG_VSI_QUEUES (opcode 6) VIRTCHNL_OP_ENABLE_QUEUES (opcode 8)
Disable path: VIRTCHNL_OP_DISABLE_QUEUES (opcode 9) VIRTCHNL_OP_DEL_ETH_ADDR (opcode 11)
The issue appears only in stress conditions when VF is enabled and disabled very fast. Eventually there will be a case, when queues are created by VIRTCHNL_OP_CONFIG_VSI_QUEUES, but are not enabled by VIRTCHNL_OP_ENABLE_QUEUES. In turn, these queues are not deleted by VIRTCHNL_OP_DISABLE_QUEUES, because there is a check whether queues are enabled in ice_vc_dis_qs_msg.
When we bring up the VF again, we will see the "Failed to set LAN Tx queue context" error during VIRTCHNL_OP_CONFIG_VSI_QUEUES step. This happens because old 16 queues were not deleted and VF requests to create 16 more, but ice_sched_get_free_qparent in ice_ena_vsi_txq would fail to find a parent node for first newly requested queue (because all nodes are allocated to 16 old queues).
Testing Hints:
Just enable and disable VF fast enough, so it would be disabled before reaching VIRTCHNL_OP_ENABLE_QUEUES.
while true; do ip link set dev ens785f0v0 up sleep 0.065 # adjust delay value for you machine ip link set dev ens785f0v0 down done
Fixes: 77ca27c41705 ("ice: add support for virtchnl_queue_select.[tx|rx]_queues bitmap") Signed-off-by: Anatolii Gerasymenko anatolii.gerasymenko@intel.com Tested-by: Konrad Jankowski konrad0.jankowski@intel.com Signed-off-by: Alice Michael alice.michael@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c index 1be3cd4b2bef..2bee8f10ad89 100644 --- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c @@ -3351,9 +3351,9 @@ static int ice_vc_dis_qs_msg(struct ice_vf *vf, u8 *msg) goto error_param; }
- /* Skip queue if not enabled */ if (!test_bit(vf_q_id, vf->txq_ena)) - continue; + dev_dbg(ice_pf_to_dev(vsi->back), "Queue %u on VSI %u is not enabled, but stopping it anyway\n", + vf_q_id, vsi->vsi_num);
ice_fill_txq_meta(vsi, ring, &txq_meta);
From: David Ahern dsahern@kernel.org
[ Upstream commit 1158f79f82d437093aeed87d57df0548bdd68146 ]
VRF devices are the loopbacks for VRFs, and a loopback can not be assigned to a VRF. Accordingly, the condition in ip6_pkt_drop should be '||' not '&&'.
Fixes: 1d3fd8a10bed ("vrf: Use orig netdev to count Ip6InNoRoutes and a fresh route lookup when sending dest unreach") Reported-by: Pudak, Filip Filip.Pudak@windriver.com Reported-by: Xiao, Jiguang Jiguang.Xiao@windriver.com Signed-off-by: David Ahern dsahern@kernel.org Link: https://lore.kernel.org/r/20220404150908.2937-1-dsahern@kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/route.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c index ea1cf414a92e..da1bf48e7937 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -4495,7 +4495,7 @@ static int ip6_pkt_drop(struct sk_buff *skb, u8 code, int ipstats_mib_noroutes) struct inet6_dev *idev; int type;
- if (netif_is_l3_master(skb->dev) && + if (netif_is_l3_master(skb->dev) || dst->dev == net->loopback_dev) idev = __in6_dev_get_safely(dev_get_by_index_rcu(net, IP6CB(skb)->iif)); else
From: Maciej Fijalkowski maciej.fijalkowski@intel.com
[ Upstream commit f9124c68f05ffdb87a47e3ea6d5fae9dad7cb6eb ]
Unfortunately, the ice driver doesn't respect the RCU critical section that XSK wakeup is surrounded with. To fix this, add synchronize_rcu() calls to paths that destroy resources that might be in use.
This was addressed in other AF_XDP ZC enabled drivers, for reference see for example commit b3873a5be757 ("net/i40e: Fix concurrency issues between config flow and XSK")
Fixes: efc2214b6047 ("ice: Add support for XDP") Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Tested-by: Shwetha Nagaraju shwetha.nagaraju@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice.h | 2 +- drivers/net/ethernet/intel/ice/ice_main.c | 4 +++- drivers/net/ethernet/intel/ice/ice_xsk.c | 4 +++- 3 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h index 2f60230d332a..9c04a71a9fca 100644 --- a/drivers/net/ethernet/intel/ice/ice.h +++ b/drivers/net/ethernet/intel/ice/ice.h @@ -674,7 +674,7 @@ static inline struct ice_pf *ice_netdev_to_pf(struct net_device *netdev)
static inline bool ice_is_xdp_ena_vsi(struct ice_vsi *vsi) { - return !!vsi->xdp_prog; + return !!READ_ONCE(vsi->xdp_prog); }
static inline void ice_set_ring_xdp(struct ice_tx_ring *ring) diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 92e0fe9316b9..5229bce1a4ab 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -2742,8 +2742,10 @@ int ice_destroy_xdp_rings(struct ice_vsi *vsi)
ice_for_each_xdp_txq(vsi, i) if (vsi->xdp_rings[i]) { - if (vsi->xdp_rings[i]->desc) + if (vsi->xdp_rings[i]->desc) { + synchronize_rcu(); ice_free_tx_ring(vsi->xdp_rings[i]); + } kfree_rcu(vsi->xdp_rings[i], rcu); vsi->xdp_rings[i] = NULL; } diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index feb874bde171..f95560c7387e 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -41,8 +41,10 @@ static void ice_qp_reset_stats(struct ice_vsi *vsi, u16 q_idx) static void ice_qp_clean_rings(struct ice_vsi *vsi, u16 q_idx) { ice_clean_tx_ring(vsi->tx_rings[q_idx]); - if (ice_is_xdp_ena_vsi(vsi)) + if (ice_is_xdp_ena_vsi(vsi)) { + synchronize_rcu(); ice_clean_tx_ring(vsi->xdp_rings[q_idx]); + } ice_clean_rx_ring(vsi->rx_rings[q_idx]); }
From: Maciej Fijalkowski maciej.fijalkowski@intel.com
[ Upstream commit 72b915a2b444e9247c9d424a840e94263db07c27 ]
ICE_DOWN is dedicated for pf->state. Check for ICE_VSI_DOWN being set on vsi->state in ice_xsk_wakeup().
Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Tested-by: Shwetha Nagaraju shwetha.nagaraju@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_xsk.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index f95560c7387e..30620b942fa0 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -765,7 +765,7 @@ ice_xsk_wakeup(struct net_device *netdev, u32 queue_id, struct ice_vsi *vsi = np->vsi; struct ice_tx_ring *ring;
- if (test_bit(ICE_DOWN, vsi->state)) + if (test_bit(ICE_VSI_DOWN, vsi->state)) return -ENETDOWN;
if (!ice_is_xdp_ena_vsi(vsi))
From: Maciej Fijalkowski maciej.fijalkowski@intel.com
[ Upstream commit e19778e6c911691856447c3bf9617f00b3e1347f ]
Currently when XDP rings are created, each descriptor gets its DD bit set, which turns out to be the wrong approach as it can lead to a situation where more descriptors get cleaned than it was supposed to, e.g. when AF_XDP busy poll is run with a large batch size. In this situation, the driver would request for more buffers than it is able to handle.
Fix this by not setting the DD bits in ice_xdp_alloc_setup_rings(). They should be initialized to zero instead.
Fixes: 9610bd988df9 ("ice: optimize XDP_TX workloads") Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Tested-by: Shwetha Nagaraju shwetha.nagaraju@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 5229bce1a4ab..db2e02e673a7 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -2546,7 +2546,7 @@ static int ice_xdp_alloc_setup_rings(struct ice_vsi *vsi) spin_lock_init(&xdp_ring->tx_lock); for (j = 0; j < xdp_ring->count; j++) { tx_desc = ICE_TX_DESC(xdp_ring, j); - tx_desc->cmd_type_offset_bsz = cpu_to_le64(ICE_TX_DESC_DTYPE_DESC_DONE); + tx_desc->cmd_type_offset_bsz = 0; } }
From: Ilya Maximets i.maximets@ovn.org
[ Upstream commit 3f2a3050b4a3e7f32fc0ea3c9b0183090ae00522 ]
'OVS_CLONE_ATTR_EXEC' is an internal attribute that is used for performance optimization inside the kernel. It's added by the kernel while parsing user-provided actions and should not be sent during the flow dump as it's not part of the uAPI.
The issue doesn't cause any significant problems to the ovs-vswitchd process, because reported actions are not really used in the application lifecycle and only supposed to be shown to a human via ovs-dpctl flow dump. However, the action list is still incorrect and causes the following error if the user wants to look at the datapath flows:
# ovs-dpctl add-dp system@ovs-system # ovs-dpctl add-flow "<flow match>" "clone(ct(commit),0)" # ovs-dpctl dump-flows <flow match>, packets:0, bytes:0, used:never, actions:clone(bad length 4, expected -1 for: action0(01 00 00 00), ct(commit),0)
With the fix:
# ovs-dpctl dump-flows <flow match>, packets:0, bytes:0, used:never, actions:clone(ct(commit),0)
Additionally fixed an incorrect attribute name in the comment.
Fixes: b233504033db ("openvswitch: kernel datapath clone action") Signed-off-by: Ilya Maximets i.maximets@ovn.org Acked-by: Aaron Conole aconole@redhat.com Link: https://lore.kernel.org/r/20220404104150.2865736-1-i.maximets@ovn.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/openvswitch/actions.c | 2 +- net/openvswitch/flow_netlink.c | 4 +++- 2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c index 780d9e2246f3..8955f31fa47e 100644 --- a/net/openvswitch/actions.c +++ b/net/openvswitch/actions.c @@ -1051,7 +1051,7 @@ static int clone(struct datapath *dp, struct sk_buff *skb, int rem = nla_len(attr); bool dont_clone_flow_key;
- /* The first action is always 'OVS_CLONE_ATTR_ARG'. */ + /* The first action is always 'OVS_CLONE_ATTR_EXEC'. */ clone_arg = nla_data(attr); dont_clone_flow_key = nla_get_u32(clone_arg); actions = nla_next(clone_arg, &rem); diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c index 0d677c9c2c80..2679007f8aeb 100644 --- a/net/openvswitch/flow_netlink.c +++ b/net/openvswitch/flow_netlink.c @@ -3429,7 +3429,9 @@ static int clone_action_to_attr(const struct nlattr *attr, if (!start) return -EMSGSIZE;
- err = ovs_nla_put_actions(nla_data(attr), rem, skb); + /* Skipping the OVS_CLONE_ATTR_EXEC that is always the first attribute. */ + attr = nla_next(nla_data(attr), &rem); + err = ovs_nla_put_actions(attr, rem, skb);
if (err) nla_nest_cancel(skb, start);
From: Andrew Lunn andrew@lunn.ch
[ Upstream commit 11f8e7c122ce013fa745029fa8c94c6db69c2e54 ]
There is often not a MAC address available in an EEPROM accessible by Linux with Marvell devices. Instead the bootload has the MAC address and directly programs it into the hardware. So don't consider an error from of_get_mac_address() has fatal. However, the check was added for the case where there is a MAC address in an the EEPROM, but the EEPROM has not probed yet, and -EPROBE_DEFER is returned. In that case the error should be returned. So make the check specific to this error code.
Cc: Mauri Sandberg maukka@ext.kapsi.fi Reported-by: Thomas Walther walther-it@gmx.de Fixes: 42404d8f1c01 ("net: mv643xx_eth: process retval from of_get_mac_address") Signed-off-by: Andrew Lunn andrew@lunn.ch Link: https://lore.kernel.org/r/20220405000404.3374734-1-andrew@lunn.ch Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/marvell/mv643xx_eth.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c index 143ca8be5eb5..4008596963be 100644 --- a/drivers/net/ethernet/marvell/mv643xx_eth.c +++ b/drivers/net/ethernet/marvell/mv643xx_eth.c @@ -2751,7 +2751,7 @@ static int mv643xx_eth_shared_of_add_port(struct platform_device *pdev, }
ret = of_get_mac_address(pnp, ppd.mac_addr); - if (ret) + if (ret == -EPROBE_DEFER) return ret;
mv643xx_eth_property(pnp, "tx-queue-size", ppd.tx_queue_size);
From: Ilya Maximets i.maximets@ovn.org
[ Upstream commit 1f30fb9166d4f15a1aa19449b9da871fe0ed4796 ]
While parsing user-provided actions, openvswitch module may dynamically allocate memory and store pointers in the internal copy of the actions. So this memory has to be freed while destroying the actions.
Currently there are only two such actions: ct() and set(). However, there are many actions that can hold nested lists of actions and ovs_nla_free_flow_actions() just jumps over them leaking the memory.
For example, removal of the flow with the following actions will lead to a leak of the memory allocated by nf_ct_tmpl_alloc():
actions:clone(ct(commit),0)
Non-freed set() action may also leak the 'dst' structure for the tunnel info including device references.
Under certain conditions with a high rate of flow rotation that may cause significant memory leak problem (2MB per second in reporter's case). The problem is also hard to mitigate, because the user doesn't have direct control over the datapath flows generated by OVS.
Fix that by iterating over all the nested actions and freeing everything that needs to be freed recursively.
New build time assertion should protect us from this problem if new actions will be added in the future.
Unfortunately, openvswitch module doesn't use NLA_F_NESTED, so all attributes has to be explicitly checked. sample() and clone() actions are mixing extra attributes into the user-provided action list. That prevents some code generalization too.
Fixes: 34ae932a4036 ("openvswitch: Make tunnel set action attach a metadata dst") Link: https://mail.openvswitch.org/pipermail/ovs-dev/2022-March/392922.html Reported-by: Stéphane Graber stgraber@ubuntu.com Signed-off-by: Ilya Maximets i.maximets@ovn.org Acked-by: Aaron Conole aconole@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/openvswitch/flow_netlink.c | 95 ++++++++++++++++++++++++++++++++-- 1 file changed, 90 insertions(+), 5 deletions(-)
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c index 2679007f8aeb..c591b923016a 100644 --- a/net/openvswitch/flow_netlink.c +++ b/net/openvswitch/flow_netlink.c @@ -2288,6 +2288,62 @@ static struct sw_flow_actions *nla_alloc_flow_actions(int size) return sfa; }
+static void ovs_nla_free_nested_actions(const struct nlattr *actions, int len); + +static void ovs_nla_free_check_pkt_len_action(const struct nlattr *action) +{ + const struct nlattr *a; + int rem; + + nla_for_each_nested(a, action, rem) { + switch (nla_type(a)) { + case OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL: + case OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER: + ovs_nla_free_nested_actions(nla_data(a), nla_len(a)); + break; + } + } +} + +static void ovs_nla_free_clone_action(const struct nlattr *action) +{ + const struct nlattr *a = nla_data(action); + int rem = nla_len(action); + + switch (nla_type(a)) { + case OVS_CLONE_ATTR_EXEC: + /* The real list of actions follows this attribute. */ + a = nla_next(a, &rem); + ovs_nla_free_nested_actions(a, rem); + break; + } +} + +static void ovs_nla_free_dec_ttl_action(const struct nlattr *action) +{ + const struct nlattr *a = nla_data(action); + + switch (nla_type(a)) { + case OVS_DEC_TTL_ATTR_ACTION: + ovs_nla_free_nested_actions(nla_data(a), nla_len(a)); + break; + } +} + +static void ovs_nla_free_sample_action(const struct nlattr *action) +{ + const struct nlattr *a = nla_data(action); + int rem = nla_len(action); + + switch (nla_type(a)) { + case OVS_SAMPLE_ATTR_ARG: + /* The real list of actions follows this attribute. */ + a = nla_next(a, &rem); + ovs_nla_free_nested_actions(a, rem); + break; + } +} + static void ovs_nla_free_set_action(const struct nlattr *a) { const struct nlattr *ovs_key = nla_data(a); @@ -2301,25 +2357,54 @@ static void ovs_nla_free_set_action(const struct nlattr *a) } }
-void ovs_nla_free_flow_actions(struct sw_flow_actions *sf_acts) +static void ovs_nla_free_nested_actions(const struct nlattr *actions, int len) { const struct nlattr *a; int rem;
- if (!sf_acts) + /* Whenever new actions are added, the need to update this + * function should be considered. + */ + BUILD_BUG_ON(OVS_ACTION_ATTR_MAX != 23); + + if (!actions) return;
- nla_for_each_attr(a, sf_acts->actions, sf_acts->actions_len, rem) { + nla_for_each_attr(a, actions, len, rem) { switch (nla_type(a)) { - case OVS_ACTION_ATTR_SET: - ovs_nla_free_set_action(a); + case OVS_ACTION_ATTR_CHECK_PKT_LEN: + ovs_nla_free_check_pkt_len_action(a); + break; + + case OVS_ACTION_ATTR_CLONE: + ovs_nla_free_clone_action(a); break; + case OVS_ACTION_ATTR_CT: ovs_ct_free_action(a); break; + + case OVS_ACTION_ATTR_DEC_TTL: + ovs_nla_free_dec_ttl_action(a); + break; + + case OVS_ACTION_ATTR_SAMPLE: + ovs_nla_free_sample_action(a); + break; + + case OVS_ACTION_ATTR_SET: + ovs_nla_free_set_action(a); + break; } } +} + +void ovs_nla_free_flow_actions(struct sw_flow_actions *sf_acts) +{ + if (!sf_acts) + return;
+ ovs_nla_free_nested_actions(sf_acts->actions, sf_acts->actions_len); kfree(sf_acts); }
From: Eric Dumazet edumazet@google.com
[ Upstream commit 1946014ca3b19be9e485e780e862c375c6f98bad ]
Current code can lead to the following race:
CPU0 CPU1
rxrpc_exit_net() rxrpc_peer_keepalive_worker() if (rxnet->live)
rxnet->live = false; del_timer_sync(&rxnet->peer_keepalive_timer);
timer_reduce(&rxnet->peer_keepalive_timer, jiffies + delay);
cancel_work_sync(&rxnet->peer_keepalive_work);
rxrpc_exit_net() exits while peer_keepalive_timer is still armed, leading to use-after-free.
syzbot report was:
ODEBUG: free active (active state 0) object type: timer_list hint: rxrpc_peer_keepalive_timeout+0x0/0xb0 WARNING: CPU: 0 PID: 3660 at lib/debugobjects.c:505 debug_print_object+0x16e/0x250 lib/debugobjects.c:505 Modules linked in: CPU: 0 PID: 3660 Comm: kworker/u4:6 Not tainted 5.17.0-syzkaller-13993-g88e6c0207623 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: netns cleanup_net RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:505 Code: ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 af 00 00 00 48 8b 14 dd 00 1c 26 8a 4c 89 ee 48 c7 c7 00 10 26 8a e8 b1 e7 28 05 <0f> 0b 83 05 15 eb c5 09 01 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e c3 RSP: 0018:ffffc9000353fb00 EFLAGS: 00010082 RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000 RDX: ffff888029196140 RSI: ffffffff815efad8 RDI: fffff520006a7f52 RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff815ea4ae R11: 0000000000000000 R12: ffffffff89ce23e0 R13: ffffffff8a2614e0 R14: ffffffff816628c0 R15: dffffc0000000000 FS: 0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe1f2908924 CR3: 0000000043720000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __debug_check_no_obj_freed lib/debugobjects.c:992 [inline] debug_check_no_obj_freed+0x301/0x420 lib/debugobjects.c:1023 kfree+0xd6/0x310 mm/slab.c:3809 ops_free_list.part.0+0x119/0x370 net/core/net_namespace.c:176 ops_free_list net/core/net_namespace.c:174 [inline] cleanup_net+0x591/0xb00 net/core/net_namespace.c:598 process_one_work+0x996/0x1610 kernel/workqueue.c:2289 worker_thread+0x665/0x1080 kernel/workqueue.c:2436 kthread+0x2e9/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298 </TASK>
Fixes: ace45bec6d77 ("rxrpc: Fix firewall route keepalive") Signed-off-by: Eric Dumazet edumazet@google.com Cc: David Howells dhowells@redhat.com Cc: Marc Dionne marc.dionne@auristor.com Cc: linux-afs@lists.infradead.org Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/rxrpc/net_ns.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/rxrpc/net_ns.c b/net/rxrpc/net_ns.c index 25bbc4cc8b13..f15d6942da45 100644 --- a/net/rxrpc/net_ns.c +++ b/net/rxrpc/net_ns.c @@ -113,8 +113,8 @@ static __net_exit void rxrpc_exit_net(struct net *net) struct rxrpc_net *rxnet = rxrpc_net(net);
rxnet->live = false; - del_timer_sync(&rxnet->peer_keepalive_timer); cancel_work_sync(&rxnet->peer_keepalive_work); + del_timer_sync(&rxnet->peer_keepalive_timer); rxrpc_destroy_all_calls(rxnet); rxrpc_destroy_all_connections(rxnet); rxrpc_destroy_all_peers(rxnet);
From: Taehee Yoo ap420073@gmail.com
[ Upstream commit fb5833d81e4333294add35d3ac7f7f52a7bf107f ]
In some cases, xdp tx_queue can get used before initialization. 1. interface up/down 2. ring buffer size change
When CPU cores are lower than maximum number of channels of sfc driver, it creates new channels only for XDP.
When an interface is up or ring buffer size is changed, all channels are initialized. But xdp channels are always initialized later. So, the below scenario is possible. Packets are received to rx queue of normal channels and it is acted XDP_TX and tx_queue of xdp channels get used. But these tx_queues are not initialized yet. If so, TX DMA or queue error occurs.
In order to avoid this problem. 1. initializes xdp tx_queues earlier than other rx_queue in efx_start_channels(). 2. checks whether tx_queue is initialized or not in efx_xdp_tx_buffers().
Splat looks like: sfc 0000:08:00.1 enp8s0f1np1: TX queue 10 spurious TX completion id 250 sfc 0000:08:00.1 enp8s0f1np1: resetting (RECOVER_OR_ALL) sfc 0000:08:00.1 enp8s0f1np1: MC command 0x80 inlen 100 failed rc=-22 (raw=22) arg=789 sfc 0000:08:00.1 enp8s0f1np1: has been disabled
Fixes: f28100cb9c96 ("sfc: fix lack of XDP TX queues - error XDP TX failed (-22)") Acked-by: Martin Habets habetsm.xilinx@gmail.com Signed-off-by: Taehee Yoo ap420073@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/sfc/efx_channels.c | 2 +- drivers/net/ethernet/sfc/tx.c | 3 +++ drivers/net/ethernet/sfc/tx_common.c | 2 ++ 3 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c index 5e587cb853b9..40bfd0ad7d05 100644 --- a/drivers/net/ethernet/sfc/efx_channels.c +++ b/drivers/net/ethernet/sfc/efx_channels.c @@ -1118,7 +1118,7 @@ void efx_start_channels(struct efx_nic *efx) struct efx_rx_queue *rx_queue; struct efx_channel *channel;
- efx_for_each_channel(channel, efx) { + efx_for_each_channel_rev(channel, efx) { efx_for_each_channel_tx_queue(tx_queue, channel) { efx_init_tx_queue(tx_queue); atomic_inc(&efx->active_queues); diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c index d16e031e95f4..6983799e1c05 100644 --- a/drivers/net/ethernet/sfc/tx.c +++ b/drivers/net/ethernet/sfc/tx.c @@ -443,6 +443,9 @@ int efx_xdp_tx_buffers(struct efx_nic *efx, int n, struct xdp_frame **xdpfs, if (unlikely(!tx_queue)) return -EINVAL;
+ if (!tx_queue->initialised) + return -EINVAL; + if (efx->xdp_txq_queues_mode != EFX_XDP_TX_QUEUES_DEDICATED) HARD_TX_LOCK(efx->net_dev, tx_queue->core_txq, cpu);
diff --git a/drivers/net/ethernet/sfc/tx_common.c b/drivers/net/ethernet/sfc/tx_common.c index d530cde2b864..9bc8281b7f5b 100644 --- a/drivers/net/ethernet/sfc/tx_common.c +++ b/drivers/net/ethernet/sfc/tx_common.c @@ -101,6 +101,8 @@ void efx_fini_tx_queue(struct efx_tx_queue *tx_queue) netif_dbg(tx_queue->efx, drv, tx_queue->efx->net_dev, "shutting down TX queue %d\n", tx_queue->queue);
+ tx_queue->initialised = false; + if (!tx_queue->buffer) return;
From: Michael Walle michael@walle.cc
[ Upstream commit 8d90991e5bf7fdb9f264f5f579d18969913054b7 ]
The driver doesn't support clause 45 register access yet, but doesn't check if the access is a c45 one either. This leads to spurious register reads and writes. Add the check.
Fixes: 542671fe4d86 ("net: phy: mscc-miim: Add MDIO driver") Signed-off-by: Michael Walle michael@walle.cc Reviewed-by: Andrew Lunn andrew@lunn.ch Reviewed-by: Florian Fainelli f.fainelli@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/mdio/mdio-mscc-miim.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/net/mdio/mdio-mscc-miim.c b/drivers/net/mdio/mdio-mscc-miim.c index 64fb76c1e395..08381038810d 100644 --- a/drivers/net/mdio/mdio-mscc-miim.c +++ b/drivers/net/mdio/mdio-mscc-miim.c @@ -93,6 +93,9 @@ static int mscc_miim_read(struct mii_bus *bus, int mii_id, int regnum) u32 val; int ret;
+ if (regnum & MII_ADDR_C45) + return -EOPNOTSUPP; + ret = mscc_miim_wait_pending(bus); if (ret) goto out; @@ -136,6 +139,9 @@ static int mscc_miim_write(struct mii_bus *bus, int mii_id, struct mscc_miim_dev *miim = bus->priv; int ret;
+ if (regnum & MII_ADDR_C45) + return -EOPNOTSUPP; + ret = mscc_miim_wait_pending(bus); if (ret < 0) goto out;
From: Jamie Bainbridge jamie.bainbridge@gmail.com
[ Upstream commit 4e910dbe36508654a896d5735b318c0b88172570 ]
qede_build_skb() assumes build_skb() always works and goes straight to skb_reserve(). However, build_skb() can fail under memory pressure. This results in a kernel panic because the skb to reserve is NULL.
Add a check in case build_skb() failed to allocate and return NULL.
The NULL return is handled correctly in callers to qede_build_skb().
Fixes: 8a8633978b842 ("qede: Add build_skb() support.") Signed-off-by: Jamie Bainbridge jamie.bainbridge@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/qlogic/qede/qede_fp.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/qlogic/qede/qede_fp.c b/drivers/net/ethernet/qlogic/qede/qede_fp.c index b242000a77fd..b7cc36589f59 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_fp.c +++ b/drivers/net/ethernet/qlogic/qede/qede_fp.c @@ -748,6 +748,9 @@ qede_build_skb(struct qede_rx_queue *rxq, buf = page_address(bd->data) + bd->page_offset; skb = build_skb(buf, rxq->rx_buf_seg_size);
+ if (unlikely(!skb)) + return NULL; + skb_reserve(skb, pad); skb_put(skb, len);
From: Kamal Dasu kdasu.kdev@gmail.com
[ Upstream commit 2c7d1b281286c46049cd22b43435cecba560edde ]
This fixes case where MSPI controller is used to access spi-nor flash and BSPI block is not present.
Fixes: 5f195ee7d830 ("spi: bcm-qspi: Implement the spi_mem interface") Signed-off-by: Kamal Dasu kdasu.kdev@gmail.com Acked-by: Florian Fainelli f.fainelli@gmail.com Link: https://lore.kernel.org/r/20220328142442.7553-1-kdasu.kdev@gmail.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-bcm-qspi.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/spi/spi-bcm-qspi.c b/drivers/spi/spi-bcm-qspi.c index 86c76211b3d3..cad2d55dcd3d 100644 --- a/drivers/spi/spi-bcm-qspi.c +++ b/drivers/spi/spi-bcm-qspi.c @@ -1205,7 +1205,7 @@ static int bcm_qspi_exec_mem_op(struct spi_mem *mem, addr = op->addr.val; len = op->data.nbytes;
- if (bcm_qspi_bspi_ver_three(qspi) == true) { + if (has_bspi(qspi) && bcm_qspi_bspi_ver_three(qspi) == true) { /* * The address coming into this function is a raw flash offset. * But for BSPI <= V3, we need to convert it to a remapped BSPI @@ -1224,7 +1224,7 @@ static int bcm_qspi_exec_mem_op(struct spi_mem *mem, len < 4) mspi_read = true;
- if (mspi_read) + if (!has_bspi(qspi) || mspi_read) return bcm_qspi_mspi_exec_mem_op(spi, op);
ret = bcm_qspi_bspi_set_mode(qspi, op, 0);
From: Meenakshikumar Somasundaram meenakshikumar.somasundaram@amd.com
[ Upstream commit ed7208706448953c6f15009cf139135776c15713 ]
[Why] Currently driver enables dmub outbox notification before oubox ISR is registered. During boot scenario, sometimes dmub issues hpd outbox message before driver registers ISR and those messages are missed.
[How] Enable dmub outbox notification after outbox ISR is registered. Also, restructured outbox enable code to call from dm layer and renamed APIs.
Reviewed-by: Jun Lei Jun.Lei@amd.com Acked-by: Jasdeep Dhillon jdhillon@amd.com Signed-off-by: Meenakshikumar Somasundaram meenakshikumar.somasundaram@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/dc/core/dc.c | 66 +++++++++++++++++-- drivers/gpu/drm/amd/display/dc/dc.h | 3 + .../gpu/drm/amd/display/dc/dce/dmub_outbox.c | 25 +++---- .../gpu/drm/amd/display/dc/dce/dmub_outbox.h | 4 +- .../amd/display/dc/dcn10/dcn10_hw_sequencer.c | 4 -- .../drm/amd/display/dc/dcn31/dcn31_hwseq.c | 2 +- 6 files changed, 80 insertions(+), 24 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c index ba1aa994db4b..62bc6ce88753 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c @@ -76,6 +76,8 @@
#include "dc_trace.h"
+#include "dce/dmub_outbox.h" + #define CTX \ dc->ctx
@@ -3707,13 +3709,23 @@ void dc_hardware_release(struct dc *dc) } #endif
-/** - * dc_enable_dmub_notifications - Returns whether dmub notification can be enabled - * @dc: dc structure +/* + ***************************************************************************** + * Function: dc_is_dmub_outbox_supported - + * + * @brief + * Checks whether DMUB FW supports outbox notifications, if supported + * DM should register outbox interrupt prior to actually enabling interrupts + * via dc_enable_dmub_outbox * - * Returns: True to enable dmub notifications, False otherwise + * @param + * [in] dc: dc structure + * + * @return + * True if DMUB FW supports outbox notifications, False otherwise + ***************************************************************************** */ -bool dc_enable_dmub_notifications(struct dc *dc) +bool dc_is_dmub_outbox_supported(struct dc *dc) { #if defined(CONFIG_DRM_AMD_DC_DCN) /* YELLOW_CARP B0 USB4 DPIA needs dmub notifications for interrupts */ @@ -3728,6 +3740,48 @@ bool dc_enable_dmub_notifications(struct dc *dc)
/** * dc_process_dmub_aux_transfer_async - Submits aux command to dmub via inbox message + * Function: dc_enable_dmub_notifications + * + * @brief + * Calls dc_is_dmub_outbox_supported to check if dmub fw supports outbox + * notifications. All DMs shall switch to dc_is_dmub_outbox_supported. + * This API shall be removed after switching. + * + * @param + * [in] dc: dc structure + * + * @return + * True if DMUB FW supports outbox notifications, False otherwise + ***************************************************************************** + */ +bool dc_enable_dmub_notifications(struct dc *dc) +{ + return dc_is_dmub_outbox_supported(dc); +} + +/** + ***************************************************************************** + * Function: dc_enable_dmub_outbox + * + * @brief + * Enables DMUB unsolicited notifications to x86 via outbox + * + * @param + * [in] dc: dc structure + * + * @return + * None + ***************************************************************************** + */ +void dc_enable_dmub_outbox(struct dc *dc) +{ + struct dc_context *dc_ctx = dc->ctx; + + dmub_enable_outbox_notification(dc_ctx->dmub_srv); +} + +/** + ***************************************************************************** * Sets port index appropriately for legacy DDC * @dc: dc structure * @link_index: link index @@ -3829,7 +3883,7 @@ uint8_t get_link_index_from_dpia_port_index(const struct dc *dc, * [in] payload: aux payload * [out] notify: set_config immediate reply * - * @return + * @return * True if successful, False if failure ***************************************************************************** */ diff --git a/drivers/gpu/drm/amd/display/dc/dc.h b/drivers/gpu/drm/amd/display/dc/dc.h index b51864890621..6a8c100a3688 100644 --- a/drivers/gpu/drm/amd/display/dc/dc.h +++ b/drivers/gpu/drm/amd/display/dc/dc.h @@ -1448,8 +1448,11 @@ void dc_z10_restore(const struct dc *dc); void dc_z10_save_init(struct dc *dc); #endif
+bool dc_is_dmub_outbox_supported(struct dc *dc); bool dc_enable_dmub_notifications(struct dc *dc);
+void dc_enable_dmub_outbox(struct dc *dc); + bool dc_process_dmub_aux_transfer_async(struct dc *dc, uint32_t link_index, struct aux_payload *payload); diff --git a/drivers/gpu/drm/amd/display/dc/dce/dmub_outbox.c b/drivers/gpu/drm/amd/display/dc/dce/dmub_outbox.c index faad8555ddbb..fff1d07d865d 100644 --- a/drivers/gpu/drm/amd/display/dc/dce/dmub_outbox.c +++ b/drivers/gpu/drm/amd/display/dc/dce/dmub_outbox.c @@ -22,20 +22,23 @@ * Authors: AMD */
-#include "dmub_outbox.h" +#include "dc.h" #include "dc_dmub_srv.h" +#include "dmub_outbox.h" #include "dmub/inc/dmub_cmd.h"
-/** - * dmub_enable_outbox_notification - Sends inbox cmd to dmub to enable outbox1 - * messages with interrupt. Dmub sends outbox1 - * message and triggers outbox1 interrupt. - * @dc: dc structure +/* + * Function: dmub_enable_outbox_notification + * + * @brief + * Sends inbox cmd to dmub for enabling outbox notifications to x86. + * + * @param + * [in] dmub_srv: dmub_srv structure */ -void dmub_enable_outbox_notification(struct dc *dc) +void dmub_enable_outbox_notification(struct dc_dmub_srv *dmub_srv) { union dmub_rb_cmd cmd; - struct dc_context *dc_ctx = dc->ctx;
memset(&cmd, 0x0, sizeof(cmd)); cmd.outbox1_enable.header.type = DMUB_CMD__OUTBOX1_ENABLE; @@ -45,7 +48,7 @@ void dmub_enable_outbox_notification(struct dc *dc) sizeof(cmd.outbox1_enable.header); cmd.outbox1_enable.enable = true;
- dc_dmub_srv_cmd_queue(dc_ctx->dmub_srv, &cmd); - dc_dmub_srv_cmd_execute(dc_ctx->dmub_srv); - dc_dmub_srv_wait_idle(dc_ctx->dmub_srv); + dc_dmub_srv_cmd_queue(dmub_srv, &cmd); + dc_dmub_srv_cmd_execute(dmub_srv); + dc_dmub_srv_wait_idle(dmub_srv); } diff --git a/drivers/gpu/drm/amd/display/dc/dce/dmub_outbox.h b/drivers/gpu/drm/amd/display/dc/dce/dmub_outbox.h index 4e0aa0d1a2d5..58ceabb9d497 100644 --- a/drivers/gpu/drm/amd/display/dc/dce/dmub_outbox.h +++ b/drivers/gpu/drm/amd/display/dc/dce/dmub_outbox.h @@ -26,8 +26,8 @@ #ifndef _DMUB_OUTBOX_H_ #define _DMUB_OUTBOX_H_
-#include "dc.h" +struct dc_dmub_srv;
-void dmub_enable_outbox_notification(struct dc *dc); +void dmub_enable_outbox_notification(struct dc_dmub_srv *dmub_srv);
#endif /* _DMUB_OUTBOX_H_ */ diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c index 530a72e3eefe..636e2d90ff93 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c +++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c @@ -1500,10 +1500,6 @@ void dcn10_init_hw(struct dc *dc) hws->funcs.dsc_pg_control(hws, res_pool->dscs[i]->inst, false); }
- /* Enable outbox notification feature of dmub */ - if (dc->debug.enable_dmub_aux_for_legacy_ddc) - dmub_enable_outbox_notification(dc); - /* we want to turn off all dp displays before doing detection */ if (dc->config.power_down_display_on_boot) dc_link_blank_all_dp_displays(dc); diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c index 4206ce5bf9a9..1e156f398065 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c @@ -194,7 +194,7 @@ void dcn31_init_hw(struct dc *dc)
/* Enables outbox notifications for usb4 dpia */ if (dc->res_pool->usb4_dpia_count) - dmub_enable_outbox_notification(dc); + dmub_enable_outbox_notification(dc->ctx->dmub_srv);
/* we want to turn off all dp displays before doing detection */ if (dc->config.power_down_display_on_boot)
From: Roman Li Roman.Li@amd.com
[ Upstream commit 95707203407c4cf0b7e520a99d6f46d8aed4b57f ]
[Why] DSC Power down code has been moved from dcn31_init_hw into init_pipes() Need to remove it from dcn10_init_hw() as well to avoid duplicated action on dcn1.x/2.x
[How] Remove DSC power down code from dcn10_init_hw()
Fixes: 8fa6f4c5715c ("drm/amd/display: fixed the DSC power off sequence during Driver PnP")
Reviewed-by: Anthony Koo Anthony.Koo@amd.com Reviewed-by: Eric Yang Eric.Yang2@amd.com Acked-by: Alex Hung alex.hung@amd.com Signed-off-by: Roman Li Roman.Li@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c | 7 ------- 1 file changed, 7 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c index 636e2d90ff93..2cefdd96d0cb 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c +++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c @@ -1493,13 +1493,6 @@ void dcn10_init_hw(struct dc *dc) link->link_status.link_active = true; }
- /* Power gate DSCs */ - if (!is_optimized_init_done) { - for (i = 0; i < res_pool->res_cap->num_dsc; i++) - if (hws->funcs.dsc_pg_control != NULL) - hws->funcs.dsc_pg_control(hws, res_pool->dscs[i]->inst, false); - } - /* we want to turn off all dp displays before doing detection */ if (dc->config.power_down_display_on_boot) dc_link_blank_all_dp_displays(dc);
From: Maxim Mikityanskiy maximmi@nvidia.com
[ Upstream commit 2e8702cc0cfa1080f29fd64003c00a3e24ac38de ]
bpf_tcp_gen_syncookie looks at the IP version in the IP header and validates the address family of the socket. It supports IPv4 packets in AF_INET6 dual-stack sockets.
On the other hand, bpf_tcp_check_syncookie looks only at the address family of the socket, ignoring the real IP version in headers, and validates only the packet size. This implementation has some drawbacks:
1. Packets are not validated properly, allowing a BPF program to trick bpf_tcp_check_syncookie into handling an IPv6 packet on an IPv4 socket.
2. Dual-stack sockets fail the checks on IPv4 packets. IPv4 clients end up receiving a SYNACK with the cookie, but the following ACK gets dropped.
This patch fixes these issues by changing the checks in bpf_tcp_check_syncookie to match the ones in bpf_tcp_gen_syncookie. IP version from the header is taken into account, and it is validated properly with address family.
Fixes: 399040847084 ("bpf: add helper to check for a valid SYN cookie") Signed-off-by: Maxim Mikityanskiy maximmi@nvidia.com Signed-off-by: Alexei Starovoitov ast@kernel.org Reviewed-by: Tariq Toukan tariqt@nvidia.com Acked-by: Arthur Fabre afabre@cloudflare.com Link: https://lore.kernel.org/bpf/20220406124113.2795730-1-maximmi@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/filter.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c index 82fcb7533663..48fc95626597 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -6777,24 +6777,33 @@ BPF_CALL_5(bpf_tcp_check_syncookie, struct sock *, sk, void *, iph, u32, iph_len if (!th->ack || th->rst || th->syn) return -ENOENT;
+ if (unlikely(iph_len < sizeof(struct iphdr))) + return -EINVAL; + if (tcp_synq_no_recent_overflow(sk)) return -ENOENT;
cookie = ntohl(th->ack_seq) - 1;
- switch (sk->sk_family) { - case AF_INET: - if (unlikely(iph_len < sizeof(struct iphdr))) + /* Both struct iphdr and struct ipv6hdr have the version field at the + * same offset so we can cast to the shorter header (struct iphdr). + */ + switch (((struct iphdr *)iph)->version) { + case 4: + if (sk->sk_family == AF_INET6 && ipv6_only_sock(sk)) return -EINVAL;
ret = __cookie_v4_check((struct iphdr *)iph, th, cookie); break;
#if IS_BUILTIN(CONFIG_IPV6) - case AF_INET6: + case 6: if (unlikely(iph_len < sizeof(struct ipv6hdr))) return -EINVAL;
+ if (sk->sk_family != AF_INET6) + return -EINVAL; + ret = __cookie_v6_check((struct ipv6hdr *)iph, th, cookie); break; #endif /* CONFIG_IPV6 */
From: Lv Yunlong lyl2019@mail.ustc.edu.cn
[ Upstream commit aadb22ba2f656581b2f733deb3a467c48cc618f6 ]
In get_initial_state, it calls notify_initial_state_done(skb,..) if cb->args[5]==1. If genlmsg_put() failed in notify_initial_state_done(), the skb will be freed by nlmsg_free(skb). Then get_initial_state will goto out and the freed skb will be used by return value skb->len, which is a uaf bug.
What's worse, the same problem goes even further: skb can also be freed in the notify_*_state_change -> notify_*_state calls below. Thus 4 additional uaf bugs happened.
My patch lets the problem callee functions: notify_initial_state_done and notify_*_state_change return an error code if errors happen. So that the error codes could be propagated and the uaf bugs can be avoid.
v2 reports a compilation warning. This v3 fixed this warning and built successfully in my local environment with no additional warnings. v2: https://lore.kernel.org/patchwork/patch/1435218/
Fixes: a29728463b254 ("drbd: Backport the "events2" command") Signed-off-by: Lv Yunlong lyl2019@mail.ustc.edu.cn Reviewed-by: Christoph Böhmwalder christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/block/drbd/drbd_int.h | 8 ++--- drivers/block/drbd/drbd_nl.c | 41 ++++++++++++++++---------- drivers/block/drbd/drbd_state.c | 18 +++++------ drivers/block/drbd/drbd_state_change.h | 8 ++--- 4 files changed, 42 insertions(+), 33 deletions(-)
diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h index f27d5b0f9a0b..a98bfcf4a5f0 100644 --- a/drivers/block/drbd/drbd_int.h +++ b/drivers/block/drbd/drbd_int.h @@ -1642,22 +1642,22 @@ struct sib_info { }; void drbd_bcast_event(struct drbd_device *device, const struct sib_info *sib);
-extern void notify_resource_state(struct sk_buff *, +extern int notify_resource_state(struct sk_buff *, unsigned int, struct drbd_resource *, struct resource_info *, enum drbd_notification_type); -extern void notify_device_state(struct sk_buff *, +extern int notify_device_state(struct sk_buff *, unsigned int, struct drbd_device *, struct device_info *, enum drbd_notification_type); -extern void notify_connection_state(struct sk_buff *, +extern int notify_connection_state(struct sk_buff *, unsigned int, struct drbd_connection *, struct connection_info *, enum drbd_notification_type); -extern void notify_peer_device_state(struct sk_buff *, +extern int notify_peer_device_state(struct sk_buff *, unsigned int, struct drbd_peer_device *, struct peer_device_info *, diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c index 44ccf8b4f4b2..69184cf17b6a 100644 --- a/drivers/block/drbd/drbd_nl.c +++ b/drivers/block/drbd/drbd_nl.c @@ -4617,7 +4617,7 @@ static int nla_put_notification_header(struct sk_buff *msg, return drbd_notification_header_to_skb(msg, &nh, true); }
-void notify_resource_state(struct sk_buff *skb, +int notify_resource_state(struct sk_buff *skb, unsigned int seq, struct drbd_resource *resource, struct resource_info *resource_info, @@ -4659,16 +4659,17 @@ void notify_resource_state(struct sk_buff *skb, if (err && err != -ESRCH) goto failed; } - return; + return 0;
nla_put_failure: nlmsg_free(skb); failed: drbd_err(resource, "Error %d while broadcasting event. Event seq:%u\n", err, seq); + return err; }
-void notify_device_state(struct sk_buff *skb, +int notify_device_state(struct sk_buff *skb, unsigned int seq, struct drbd_device *device, struct device_info *device_info, @@ -4708,16 +4709,17 @@ void notify_device_state(struct sk_buff *skb, if (err && err != -ESRCH) goto failed; } - return; + return 0;
nla_put_failure: nlmsg_free(skb); failed: drbd_err(device, "Error %d while broadcasting event. Event seq:%u\n", err, seq); + return err; }
-void notify_connection_state(struct sk_buff *skb, +int notify_connection_state(struct sk_buff *skb, unsigned int seq, struct drbd_connection *connection, struct connection_info *connection_info, @@ -4757,16 +4759,17 @@ void notify_connection_state(struct sk_buff *skb, if (err && err != -ESRCH) goto failed; } - return; + return 0;
nla_put_failure: nlmsg_free(skb); failed: drbd_err(connection, "Error %d while broadcasting event. Event seq:%u\n", err, seq); + return err; }
-void notify_peer_device_state(struct sk_buff *skb, +int notify_peer_device_state(struct sk_buff *skb, unsigned int seq, struct drbd_peer_device *peer_device, struct peer_device_info *peer_device_info, @@ -4807,13 +4810,14 @@ void notify_peer_device_state(struct sk_buff *skb, if (err && err != -ESRCH) goto failed; } - return; + return 0;
nla_put_failure: nlmsg_free(skb); failed: drbd_err(peer_device, "Error %d while broadcasting event. Event seq:%u\n", err, seq); + return err; }
void notify_helper(enum drbd_notification_type type, @@ -4864,7 +4868,7 @@ void notify_helper(enum drbd_notification_type type, err, seq); }
-static void notify_initial_state_done(struct sk_buff *skb, unsigned int seq) +static int notify_initial_state_done(struct sk_buff *skb, unsigned int seq) { struct drbd_genlmsghdr *dh; int err; @@ -4878,11 +4882,12 @@ static void notify_initial_state_done(struct sk_buff *skb, unsigned int seq) if (nla_put_notification_header(skb, NOTIFY_EXISTS)) goto nla_put_failure; genlmsg_end(skb, dh); - return; + return 0;
nla_put_failure: nlmsg_free(skb); pr_err("Error %d sending event. Event seq:%u\n", err, seq); + return err; }
static void free_state_changes(struct list_head *list) @@ -4909,6 +4914,7 @@ static int get_initial_state(struct sk_buff *skb, struct netlink_callback *cb) unsigned int seq = cb->args[2]; unsigned int n; enum drbd_notification_type flags = 0; + int err = 0;
/* There is no need for taking notification_mutex here: it doesn't matter if the initial state events mix with later state chage @@ -4917,32 +4923,32 @@ static int get_initial_state(struct sk_buff *skb, struct netlink_callback *cb)
cb->args[5]--; if (cb->args[5] == 1) { - notify_initial_state_done(skb, seq); + err = notify_initial_state_done(skb, seq); goto out; } n = cb->args[4]++; if (cb->args[4] < cb->args[3]) flags |= NOTIFY_CONTINUES; if (n < 1) { - notify_resource_state_change(skb, seq, state_change->resource, + err = notify_resource_state_change(skb, seq, state_change->resource, NOTIFY_EXISTS | flags); goto next; } n--; if (n < state_change->n_connections) { - notify_connection_state_change(skb, seq, &state_change->connections[n], + err = notify_connection_state_change(skb, seq, &state_change->connections[n], NOTIFY_EXISTS | flags); goto next; } n -= state_change->n_connections; if (n < state_change->n_devices) { - notify_device_state_change(skb, seq, &state_change->devices[n], + err = notify_device_state_change(skb, seq, &state_change->devices[n], NOTIFY_EXISTS | flags); goto next; } n -= state_change->n_devices; if (n < state_change->n_devices * state_change->n_connections) { - notify_peer_device_state_change(skb, seq, &state_change->peer_devices[n], + err = notify_peer_device_state_change(skb, seq, &state_change->peer_devices[n], NOTIFY_EXISTS | flags); goto next; } @@ -4957,7 +4963,10 @@ static int get_initial_state(struct sk_buff *skb, struct netlink_callback *cb) cb->args[4] = 0; } out: - return skb->len; + if (err) + return err; + else + return skb->len; }
int drbd_adm_get_initial_state(struct sk_buff *skb, struct netlink_callback *cb) diff --git a/drivers/block/drbd/drbd_state.c b/drivers/block/drbd/drbd_state.c index b8a27818ab3f..4ee11aef6672 100644 --- a/drivers/block/drbd/drbd_state.c +++ b/drivers/block/drbd/drbd_state.c @@ -1537,7 +1537,7 @@ int drbd_bitmap_io_from_worker(struct drbd_device *device, return rv; }
-void notify_resource_state_change(struct sk_buff *skb, +int notify_resource_state_change(struct sk_buff *skb, unsigned int seq, struct drbd_resource_state_change *resource_state_change, enum drbd_notification_type type) @@ -1550,10 +1550,10 @@ void notify_resource_state_change(struct sk_buff *skb, .res_susp_fen = resource_state_change->susp_fen[NEW], };
- notify_resource_state(skb, seq, resource, &resource_info, type); + return notify_resource_state(skb, seq, resource, &resource_info, type); }
-void notify_connection_state_change(struct sk_buff *skb, +int notify_connection_state_change(struct sk_buff *skb, unsigned int seq, struct drbd_connection_state_change *connection_state_change, enum drbd_notification_type type) @@ -1564,10 +1564,10 @@ void notify_connection_state_change(struct sk_buff *skb, .conn_role = connection_state_change->peer_role[NEW], };
- notify_connection_state(skb, seq, connection, &connection_info, type); + return notify_connection_state(skb, seq, connection, &connection_info, type); }
-void notify_device_state_change(struct sk_buff *skb, +int notify_device_state_change(struct sk_buff *skb, unsigned int seq, struct drbd_device_state_change *device_state_change, enum drbd_notification_type type) @@ -1577,10 +1577,10 @@ void notify_device_state_change(struct sk_buff *skb, .dev_disk_state = device_state_change->disk_state[NEW], };
- notify_device_state(skb, seq, device, &device_info, type); + return notify_device_state(skb, seq, device, &device_info, type); }
-void notify_peer_device_state_change(struct sk_buff *skb, +int notify_peer_device_state_change(struct sk_buff *skb, unsigned int seq, struct drbd_peer_device_state_change *p, enum drbd_notification_type type) @@ -1594,7 +1594,7 @@ void notify_peer_device_state_change(struct sk_buff *skb, .peer_resync_susp_dependency = p->resync_susp_dependency[NEW], };
- notify_peer_device_state(skb, seq, peer_device, &peer_device_info, type); + return notify_peer_device_state(skb, seq, peer_device, &peer_device_info, type); }
static void broadcast_state_change(struct drbd_state_change *state_change) @@ -1602,7 +1602,7 @@ static void broadcast_state_change(struct drbd_state_change *state_change) struct drbd_resource_state_change *resource_state_change = &state_change->resource[0]; bool resource_state_has_changed; unsigned int n_device, n_connection, n_peer_device, n_peer_devices; - void (*last_func)(struct sk_buff *, unsigned int, void *, + int (*last_func)(struct sk_buff *, unsigned int, void *, enum drbd_notification_type) = NULL; void *last_arg = NULL;
diff --git a/drivers/block/drbd/drbd_state_change.h b/drivers/block/drbd/drbd_state_change.h index ba80f612d6ab..d5b0479bc9a6 100644 --- a/drivers/block/drbd/drbd_state_change.h +++ b/drivers/block/drbd/drbd_state_change.h @@ -44,19 +44,19 @@ extern struct drbd_state_change *remember_old_state(struct drbd_resource *, gfp_ extern void copy_old_to_new_state_change(struct drbd_state_change *); extern void forget_state_change(struct drbd_state_change *);
-extern void notify_resource_state_change(struct sk_buff *, +extern int notify_resource_state_change(struct sk_buff *, unsigned int, struct drbd_resource_state_change *, enum drbd_notification_type type); -extern void notify_connection_state_change(struct sk_buff *, +extern int notify_connection_state_change(struct sk_buff *, unsigned int, struct drbd_connection_state_change *, enum drbd_notification_type type); -extern void notify_device_state_change(struct sk_buff *, +extern int notify_device_state_change(struct sk_buff *, unsigned int, struct drbd_device_state_change *, enum drbd_notification_type type); -extern void notify_peer_device_state_change(struct sk_buff *, +extern int notify_peer_device_state_change(struct sk_buff *, unsigned int, struct drbd_peer_device_state_change *, enum drbd_notification_type type);
From: Martin K. Petersen martin.petersen@oracle.com
[ Upstream commit 1700714b1ff252b634db21186db4d91e7e006043 ]
As such it should be called inside the scsi_device_supports_vpd() conditional.
Link: https://lore.kernel.org/r/20220302053559.32147-13-martin.petersen@oracle.com Fixes: e815d36548f0 ("scsi: sd: add concurrent positioning ranges support") Cc: Damien Le Moal damien.lemoal@wdc.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Damien Le Moal damien.lemoal@opensource.wdc.com Reviewed-by: Bart Van Assche bvanassche@acm.org Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/sd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 66056806159a..8b5d2a4076c2 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -3320,6 +3320,7 @@ static int sd_revalidate_disk(struct gendisk *disk) sd_read_block_limits(sdkp); sd_read_block_characteristics(sdkp); sd_zbc_read_zones(sdkp, buffer); + sd_read_cpr(sdkp); }
sd_print_capacity(sdkp, old_capacity); @@ -3329,7 +3330,6 @@ static int sd_revalidate_disk(struct gendisk *disk) sd_read_app_tag_own(sdkp, buffer); sd_read_write_same(sdkp, buffer); sd_read_security(sdkp, buffer); - sd_read_cpr(sdkp); }
/*
From: Xiaomeng Tong xiam0nd.tong@gmail.com
[ Upstream commit bfb7789bcbd901caead43861461bc8f334c90d3b ]
The list iterator is always non-NULL so the check 'if (!rgn)' is always false and the dev_err() is never called. Move the check outside the loop and determine if 'victim_rgn' is NULL, to fix this bug.
Link: https://lore.kernel.org/r/20220320150733.21824-1-xiam0nd.tong@gmail.com Fixes: 4b5f49079c52 ("scsi: ufs: ufshpb: L2P map management for HPB read") Reviewed-by: Daejun Park daejun7.park@samsung.com Signed-off-by: Xiaomeng Tong xiam0nd.tong@gmail.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/ufs/ufshpb.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/drivers/scsi/ufs/ufshpb.c b/drivers/scsi/ufs/ufshpb.c index 2d36a0715fca..b34feba1f53d 100644 --- a/drivers/scsi/ufs/ufshpb.c +++ b/drivers/scsi/ufs/ufshpb.c @@ -869,12 +869,6 @@ static struct ufshpb_region *ufshpb_victim_lru_info(struct ufshpb_lu *hpb) struct ufshpb_region *rgn, *victim_rgn = NULL;
list_for_each_entry(rgn, &lru_info->lh_lru_rgn, list_lru_rgn) { - if (!rgn) { - dev_err(&hpb->sdev_ufs_lu->sdev_dev, - "%s: no region allocated\n", - __func__); - return NULL; - } if (ufshpb_check_srgns_issue_state(hpb, rgn)) continue;
@@ -890,6 +884,11 @@ static struct ufshpb_region *ufshpb_victim_lru_info(struct ufshpb_lu *hpb) break; }
+ if (!victim_rgn) + dev_err(&hpb->sdev_ufs_lu->sdev_dev, + "%s: no region allocated\n", + __func__); + return victim_rgn; }
From: Pavel Begunkov asml.silence@gmail.com
[ Upstream commit 34bb77184123ae401100a4d156584f12fa630e5c ]
Don't forget to array_index_nospec() for indexes before updating rsrc tags in __io_sqe_files_update(), just use already safe and precalculated index @i.
Fixes: c3bdad0271834 ("io_uring: add generic rsrc update with tags") Signed-off-by: Pavel Begunkov asml.silence@gmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/io_uring.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c index 5e6788ab188f..a3e82aececd9 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -8700,7 +8700,7 @@ static int __io_sqe_files_update(struct io_ring_ctx *ctx, err = -EBADF; break; } - *io_get_tag_slot(data, up->offset + done) = tag; + *io_get_tag_slot(data, i) = tag; io_fixed_file_set(file_slot, file); err = io_sqe_file_register(ctx, file, i); if (err) {
From: Pavel Begunkov asml.silence@gmail.com
[ Upstream commit a07211e3001435fe8591b992464cd8d5e3c98c5a ]
It's safer to not touch scm_fp_list after we queued an skb to which it was assigned, there might be races lurking if we screw subtle sync guarantees on the io_uring side.
Fixes: 6b06314c47e14 ("io_uring: add file set registration") Signed-off-by: Pavel Begunkov asml.silence@gmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/io_uring.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c index a3e82aececd9..0ee1d8903ffe 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -8237,8 +8237,12 @@ static int __io_sqe_files_scm(struct io_ring_ctx *ctx, int nr, int offset) refcount_add(skb->truesize, &sk->sk_wmem_alloc); skb_queue_head(&sk->sk_receive_queue, skb);
- for (i = 0; i < nr_files; i++) - fput(fpl->fp[i]); + for (i = 0; i < nr; i++) { + struct file *file = io_file_from_index(ctx, i + offset); + + if (file) + fput(file); + } } else { kfree_skb(skb); free_uid(fpl->user);
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit d3c15033b240767d0287f1c4a529cbbe2d5ded8a ]
Both call_transmit() and call_bc_transmit() can now return ENOMEM, so let's make sure that we handle the errors gracefully.
Fixes: 0472e4766049 ("SUNRPC: Convert socket page send code to use iov_iter()") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sunrpc/clnt.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index b36d235d2d6d..bf1fd6caaf92 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -2197,6 +2197,7 @@ call_transmit_status(struct rpc_task *task) * socket just returned a connection error, * then hold onto the transport lock. */ + case -ENOMEM: case -ENOBUFS: rpc_delay(task, HZ>>2); fallthrough; @@ -2280,6 +2281,7 @@ call_bc_transmit_status(struct rpc_task *task) case -ENOTCONN: case -EPIPE: break; + case -ENOMEM: case -ENOBUFS: rpc_delay(task, HZ>>2); fallthrough;
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 9d82819d5b065348ce623f196bf601028e22ed00 ]
We need to handle ENFILE, ENOBUFS, and ENOMEM, because xprt_wake_pending_tasks() can be called with any one of these due to socket creation failures.
Fixes: b61d59fffd3e ("SUNRPC: xs_tcp_connect_worker{4,6}: merge common code") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sunrpc/clnt.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index bf1fd6caaf92..0222ad4523a9 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -2364,6 +2364,11 @@ call_status(struct rpc_task *task) case -EPIPE: case -EAGAIN: break; + case -ENFILE: + case -ENOBUFS: + case -ENOMEM: + rpc_delay(task, HZ>>2); + break; case -EIO: /* shutdown or soft timeout */ goto out_exit;
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit b056fa070814897be32d83b079dbc311375588e7 ]
The allocation is done with GFP_KERNEL, but it could still fail in a low memory situation.
Fixes: 4a85a6a3320b ("SUNRPC: Handle TCP socket sends with kernel_sendpage() again") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sunrpc/svcsock.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index 478f857cdaed..6ea3d87e1147 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -1096,7 +1096,9 @@ static int svc_tcp_sendmsg(struct socket *sock, struct xdr_buf *xdr, int ret;
*sentp = 0; - xdr_alloc_bvec(xdr, GFP_KERNEL); + ret = xdr_alloc_bvec(xdr, GFP_KERNEL); + if (ret < 0) + return ret;
ret = kernel_sendmsg(sock, &msg, &rm, 1, rm.iov_len); if (ret < 0)
From: Tony Lindgren tony@atomide.com
[ Upstream commit 71ff461c3f41f6465434b9e980c01782763e7ad8 ]
Commit 3f6634d997db ("iommu: Use right way to retrieve iommu_ops") started triggering a NULL pointer dereference for some omap variants:
__iommu_probe_device from probe_iommu_group+0x2c/0x38 probe_iommu_group from bus_for_each_dev+0x74/0xbc bus_for_each_dev from bus_iommu_probe+0x34/0x2e8 bus_iommu_probe from bus_set_iommu+0x80/0xc8 bus_set_iommu from omap_iommu_init+0x88/0xcc omap_iommu_init from do_one_initcall+0x44/0x24
This is caused by omap iommu probe returning 0 instead of ERR_PTR(-ENODEV) as noted by Jason Gunthorpe jgg@ziepe.ca.
Looks like the regression already happened with an earlier commit 6785eb9105e3 ("iommu/omap: Convert to probe/release_device() call-backs") that changed the function return type and missed converting one place.
Cc: Drew Fustini dfustini@baylibre.com Cc: Lu Baolu baolu.lu@linux.intel.com Cc: Suman Anna s-anna@ti.com Suggested-by: Jason Gunthorpe jgg@ziepe.ca Fixes: 6785eb9105e3 ("iommu/omap: Convert to probe/release_device() call-backs") Fixes: 3f6634d997db ("iommu: Use right way to retrieve iommu_ops") Signed-off-by: Tony Lindgren tony@atomide.com Tested-by: Drew Fustini dfustini@baylibre.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/20220331062301.24269-1-tony@atomide.com Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iommu/omap-iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c index 980e4af3f06b..d2e82a1b56d8 100644 --- a/drivers/iommu/omap-iommu.c +++ b/drivers/iommu/omap-iommu.c @@ -1661,7 +1661,7 @@ static struct iommu_device *omap_iommu_probe_device(struct device *dev) num_iommus = of_property_count_elems_of_size(dev->of_node, "iommus", sizeof(phandle)); if (num_iommus < 0) - return 0; + return ERR_PTR(-ENODEV);
arch_data = kcalloc(num_iommus + 1, sizeof(*arch_data), GFP_KERNEL); if (!arch_data)
From: James Clark james.clark@arm.com
[ Upstream commit fa7095c5c3240bb2ecbc77f8b69be9b1d9e2cf60 ]
Commit Fixes: b9f6fbb3b2c29736 ("perf arm64: Inject missing frames when using 'perf record --call-graph=fp'") intended to add a 'best effort' DWARF unwind that improved the frame pointer stack in most scenarios.
It's expected that the unwind will fail sometimes, but this shouldn't be reported as an error. It only works when the return address can be determined from the contents of the link register alone.
Fix the error shown when the unwinder requires extra registers by adding a new flag that suppresses error messages. This flag is not set in the normal --call-graph=dwarf unwind mode so that behavior is not changed.
Fixes: b9f6fbb3b2c29736 ("perf arm64: Inject missing frames when using 'perf record --call-graph=fp'") Reported-by: John Garry john.garry@huawei.com Signed-off-by: James Clark james.clark@arm.com Tested-by: John Garry john.garry@huawei.com Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Alexandre Truong alexandre.truong@arm.com Cc: German Gomez german.gomez@arm.com Cc: Jiri Olsa jolsa@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Namhyung Kim namhyung@kernel.org Link: https://lore.kernel.org/r/20220406145651.1392529-1-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/tests/dwarf-unwind.c | 2 +- .../perf/util/arm64-frame-pointer-unwind-support.c | 2 +- tools/perf/util/machine.c | 2 +- tools/perf/util/unwind-libdw.c | 10 +++++++--- tools/perf/util/unwind-libdw.h | 1 + tools/perf/util/unwind-libunwind-local.c | 10 +++++++--- tools/perf/util/unwind-libunwind.c | 6 ++++-- tools/perf/util/unwind.h | 13 ++++++++++--- 8 files changed, 32 insertions(+), 14 deletions(-)
diff --git a/tools/perf/tests/dwarf-unwind.c b/tools/perf/tests/dwarf-unwind.c index 2dab2d262060..afdca7f2959f 100644 --- a/tools/perf/tests/dwarf-unwind.c +++ b/tools/perf/tests/dwarf-unwind.c @@ -122,7 +122,7 @@ NO_TAIL_CALL_ATTRIBUTE noinline int test_dwarf_unwind__thread(struct thread *thr }
err = unwind__get_entries(unwind_entry, &cnt, thread, - &sample, MAX_STACK); + &sample, MAX_STACK, false); if (err) pr_debug("unwind failed\n"); else if (cnt != MAX_STACK) { diff --git a/tools/perf/util/arm64-frame-pointer-unwind-support.c b/tools/perf/util/arm64-frame-pointer-unwind-support.c index 2242a885fbd7..4940be4a0569 100644 --- a/tools/perf/util/arm64-frame-pointer-unwind-support.c +++ b/tools/perf/util/arm64-frame-pointer-unwind-support.c @@ -53,7 +53,7 @@ u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thr sample->user_regs.cache_regs[PERF_REG_ARM64_SP] = 0; }
- ret = unwind__get_entries(add_entry, &entries, thread, sample, 2); + ret = unwind__get_entries(add_entry, &entries, thread, sample, 2, true); sample->user_regs = old_regs;
if (ret || entries.length != 2) diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c index 394550003693..564abe17a0bd 100644 --- a/tools/perf/util/machine.c +++ b/tools/perf/util/machine.c @@ -2983,7 +2983,7 @@ static int thread__resolve_callchain_unwind(struct thread *thread, return 0;
return unwind__get_entries(unwind_entry, cursor, - thread, sample, max_stack); + thread, sample, max_stack, false); }
int thread__resolve_callchain(struct thread *thread, diff --git a/tools/perf/util/unwind-libdw.c b/tools/perf/util/unwind-libdw.c index a74b517f7497..94aa40f6e348 100644 --- a/tools/perf/util/unwind-libdw.c +++ b/tools/perf/util/unwind-libdw.c @@ -200,7 +200,8 @@ frame_callback(Dwfl_Frame *state, void *arg) bool isactivation;
if (!dwfl_frame_pc(state, &pc, NULL)) { - pr_err("%s", dwfl_errmsg(-1)); + if (!ui->best_effort) + pr_err("%s", dwfl_errmsg(-1)); return DWARF_CB_ABORT; }
@@ -208,7 +209,8 @@ frame_callback(Dwfl_Frame *state, void *arg) report_module(pc, ui);
if (!dwfl_frame_pc(state, &pc, &isactivation)) { - pr_err("%s", dwfl_errmsg(-1)); + if (!ui->best_effort) + pr_err("%s", dwfl_errmsg(-1)); return DWARF_CB_ABORT; }
@@ -222,7 +224,8 @@ frame_callback(Dwfl_Frame *state, void *arg) int unwind__get_entries(unwind_entry_cb_t cb, void *arg, struct thread *thread, struct perf_sample *data, - int max_stack) + int max_stack, + bool best_effort) { struct unwind_info *ui, ui_buf = { .sample = data, @@ -231,6 +234,7 @@ int unwind__get_entries(unwind_entry_cb_t cb, void *arg, .cb = cb, .arg = arg, .max_stack = max_stack, + .best_effort = best_effort }; Dwarf_Word ip; int err = -EINVAL, i; diff --git a/tools/perf/util/unwind-libdw.h b/tools/perf/util/unwind-libdw.h index 0cbd2650e280..8c88bc4f2304 100644 --- a/tools/perf/util/unwind-libdw.h +++ b/tools/perf/util/unwind-libdw.h @@ -20,6 +20,7 @@ struct unwind_info { void *arg; int max_stack; int idx; + bool best_effort; struct unwind_entry entries[]; };
diff --git a/tools/perf/util/unwind-libunwind-local.c b/tools/perf/util/unwind-libunwind-local.c index 71a353349181..41e29fc7648a 100644 --- a/tools/perf/util/unwind-libunwind-local.c +++ b/tools/perf/util/unwind-libunwind-local.c @@ -96,6 +96,7 @@ struct unwind_info { struct perf_sample *sample; struct machine *machine; struct thread *thread; + bool best_effort; };
#define dw_read(ptr, type, end) ({ \ @@ -553,7 +554,8 @@ static int access_reg(unw_addr_space_t __maybe_unused as,
ret = perf_reg_value(&val, &ui->sample->user_regs, id); if (ret) { - pr_err("unwind: can't read reg %d\n", regnum); + if (!ui->best_effort) + pr_err("unwind: can't read reg %d\n", regnum); return ret; }
@@ -666,7 +668,7 @@ static int get_entries(struct unwind_info *ui, unwind_entry_cb_t cb, return -1;
ret = unw_init_remote(&c, addr_space, ui); - if (ret) + if (ret && !ui->best_effort) display_error(ret);
while (!ret && (unw_step(&c) > 0) && i < max_stack) { @@ -704,12 +706,14 @@ static int get_entries(struct unwind_info *ui, unwind_entry_cb_t cb,
static int _unwind__get_entries(unwind_entry_cb_t cb, void *arg, struct thread *thread, - struct perf_sample *data, int max_stack) + struct perf_sample *data, int max_stack, + bool best_effort) { struct unwind_info ui = { .sample = data, .thread = thread, .machine = thread->maps->machine, + .best_effort = best_effort };
if (!data->user_regs.regs) diff --git a/tools/perf/util/unwind-libunwind.c b/tools/perf/util/unwind-libunwind.c index e89a5479b361..509c287ee762 100644 --- a/tools/perf/util/unwind-libunwind.c +++ b/tools/perf/util/unwind-libunwind.c @@ -80,9 +80,11 @@ void unwind__finish_access(struct maps *maps)
int unwind__get_entries(unwind_entry_cb_t cb, void *arg, struct thread *thread, - struct perf_sample *data, int max_stack) + struct perf_sample *data, int max_stack, + bool best_effort) { if (thread->maps->unwind_libunwind_ops) - return thread->maps->unwind_libunwind_ops->get_entries(cb, arg, thread, data, max_stack); + return thread->maps->unwind_libunwind_ops->get_entries(cb, arg, thread, data, + max_stack, best_effort); return 0; } diff --git a/tools/perf/util/unwind.h b/tools/perf/util/unwind.h index ab8ad469c8de..b2a03fa5289b 100644 --- a/tools/perf/util/unwind.h +++ b/tools/perf/util/unwind.h @@ -23,13 +23,19 @@ struct unwind_libunwind_ops { void (*finish_access)(struct maps *maps); int (*get_entries)(unwind_entry_cb_t cb, void *arg, struct thread *thread, - struct perf_sample *data, int max_stack); + struct perf_sample *data, int max_stack, bool best_effort); };
#ifdef HAVE_DWARF_UNWIND_SUPPORT +/* + * When best_effort is set, don't report errors and fail silently. This could + * be expanded in the future to be more permissive about things other than + * error messages. + */ int unwind__get_entries(unwind_entry_cb_t cb, void *arg, struct thread *thread, - struct perf_sample *data, int max_stack); + struct perf_sample *data, int max_stack, + bool best_effort); /* libunwind specific */ #ifdef HAVE_LIBUNWIND_SUPPORT #ifndef LIBUNWIND__ARCH_REG_ID @@ -65,7 +71,8 @@ unwind__get_entries(unwind_entry_cb_t cb __maybe_unused, void *arg __maybe_unused, struct thread *thread __maybe_unused, struct perf_sample *data __maybe_unused, - int max_stack __maybe_unused) + int max_stack __maybe_unused, + bool best_effort __maybe_unused) { return 0; }
From: James Clark james.clark@arm.com
[ Upstream commit ffab487052054162b3b6c9c6005777ec6cfcea05 ]
Since commit bb30acae4c4dacfa ("perf report: Bail out --mem-mode if mem info is not available") "perf mem report" and "perf report --mem-mode" don't allow opening the file unless one of the events has PERF_SAMPLE_DATA_SRC set.
SPE doesn't have this set even though synthetic memory data is generated after it is decoded. Fix this issue by setting DATA_SRC on SPE events. This has no effect on the data collected because the SPE driver doesn't do anything with that flag and doesn't generate samples.
Fixes: bb30acae4c4dacfa ("perf report: Bail out --mem-mode if mem info is not available") Signed-off-by: James Clark james.clark@arm.com Tested-by: Leo Yan leo.yan@linaro.org Acked-by: Namhyung Kim namhyung@kernel.org Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: German Gomez german.gomez@arm.com Cc: Jiri Olsa jolsa@kernel.org Cc: John Garry john.garry@huawei.com Cc: Leo Yan leo.yan@linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Poirier mathieu.poirier@linaro.org Cc: Ravi Bangoria ravi.bangoria@linux.ibm.com Cc: Will Deacon will@kernel.org Link: https://lore.kernel.org/r/20220408144056.1955535-1-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/arch/arm64/util/arm-spe.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c index 2100d46ccf5e..bb4ab99afa7f 100644 --- a/tools/perf/arch/arm64/util/arm-spe.c +++ b/tools/perf/arch/arm64/util/arm-spe.c @@ -239,6 +239,12 @@ static int arm_spe_recording_options(struct auxtrace_record *itr, arm_spe_set_timestamp(itr, arm_spe_evsel); }
+ /* + * Set this only so that perf report knows that SPE generates memory info. It has no effect + * on the opening of the event or the SPE data produced. + */ + evsel__set_sample_bit(arm_spe_evsel, DATA_SRC); + /* Add dummy event to keep tracking */ err = parse_events(evlist, "dummy:u", NULL); if (err)
From: Adrian Hunter adrian.hunter@intel.com
[ Upstream commit aeee9dc53ce405d2161f9915f553114e94e5b677 ]
eprintf() does not expect va_list as the type of the 4th parameter.
Use veprintf() because it does.
Signed-off-by: Adrian Hunter adrian.hunter@intel.com Fixes: 428dab813a56ce94 ("libperf: Merge libperf_set_print() into libperf_init()") Cc: Jiri Olsa jolsa@kernel.org Link: https://lore.kernel.org/r/20220408132625.2451452-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/perf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/perf.c b/tools/perf/perf.c index 2f6b67189b42..6aae7b6c376b 100644 --- a/tools/perf/perf.c +++ b/tools/perf/perf.c @@ -434,7 +434,7 @@ void pthread__unblock_sigwinch(void) static int libperf_print(enum libperf_print_level level, const char *fmt, va_list ap) { - return eprintf(level, verbose, fmt, ap); + return veprintf(level, verbose, fmt, ap); }
int main(int argc, const char **argv)
From: Denis Nikitin denik@chromium.org
[ Upstream commit bc21e74d4775f883ae1f542c1f1dc7205b15d925 ]
If a perf event doesn't fit into remaining buffer space return NULL to remap buf and fetch the event again.
Keep the logic to error out on inadequate input from fuzzing.
This fixes perf failing on ChromeOS (with 32b userspace):
$ perf report -v -i perf.data ... prefetch_event: head=0x1fffff8 event->header_size=0x30, mmap_size=0x2000000: fuzzed or compressed perf.data? Error: failed to process sample
Fixes: 57fc032ad643ffd0 ("perf session: Avoid infinite loop when seeing invalid header.size") Reviewed-by: James Clark james.clark@arm.com Signed-off-by: Denis Nikitin denik@chromium.org Acked-by: Jiri Olsa jolsa@kernel.org Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Alexey Budankov alexey.budankov@linux.intel.com Cc: Namhyung Kim namhyung@kernel.org Link: https://lore.kernel.org/r/20220330031130.2152327-1-denik@chromium.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/session.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c index 498b05708db5..245dc70d1882 100644 --- a/tools/perf/util/session.c +++ b/tools/perf/util/session.c @@ -2084,6 +2084,7 @@ prefetch_event(char *buf, u64 head, size_t mmap_size, bool needs_swap, union perf_event *error) { union perf_event *event; + u16 event_size;
/* * Ensure we have enough space remaining to read @@ -2096,15 +2097,23 @@ prefetch_event(char *buf, u64 head, size_t mmap_size, if (needs_swap) perf_event_header__bswap(&event->header);
- if (head + event->header.size <= mmap_size) + event_size = event->header.size; + if (head + event_size <= mmap_size) return event;
/* We're not fetching the event so swap back again */ if (needs_swap) perf_event_header__bswap(&event->header);
- pr_debug("%s: head=%#" PRIx64 " event->header_size=%#x, mmap_size=%#zx:" - " fuzzed or compressed perf.data?\n",__func__, head, event->header.size, mmap_size); + /* Check if the event fits into the next mmapped buf. */ + if (event_size <= mmap_size - head % page_size) { + /* Remap buf and fetch again. */ + return NULL; + } + + /* Invalid input. Event size should never exceed mmap_size. */ + pr_debug("%s: head=%#" PRIx64 " event->header.size=%#x, mmap_size=%#zx:" + " fuzzed or compressed perf.data?\n", __func__, head, event_size, mmap_size);
return error; }
From: Chanho Park chanho61.park@samsung.com
commit 83bea32ac7ed37bbda58733de61fc9369513f9f9 upstream.
Add the MIDR part number info for the Arm Cortex-A78AE[1] and add it to spectre-BHB affected list[2].
[1]: https://developer.arm.com/Processors/Cortex-A78AE [2]: https://developer.arm.com/Arm%20Security%20Center/Spectre-BHB
Cc: Catalin Marinas catalin.marinas@arm.com Cc: Mark Rutland mark.rutland@arm.com Cc: Will Deacon will@kernel.org Cc: James Morse james.morse@arm.com Signed-off-by: Chanho Park chanho61.park@samsung.com Link: https://lore.kernel.org/r/20220407091128.8700-1-chanho61.park@samsung.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/cputype.h | 2 ++ arch/arm64/kernel/proton-pack.c | 1 + 2 files changed, 3 insertions(+)
--- a/arch/arm64/include/asm/cputype.h +++ b/arch/arm64/include/asm/cputype.h @@ -75,6 +75,7 @@ #define ARM_CPU_PART_CORTEX_A77 0xD0D #define ARM_CPU_PART_NEOVERSE_V1 0xD40 #define ARM_CPU_PART_CORTEX_A78 0xD41 +#define ARM_CPU_PART_CORTEX_A78AE 0xD42 #define ARM_CPU_PART_CORTEX_X1 0xD44 #define ARM_CPU_PART_CORTEX_A510 0xD46 #define ARM_CPU_PART_CORTEX_A710 0xD47 @@ -123,6 +124,7 @@ #define MIDR_CORTEX_A77 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A77) #define MIDR_NEOVERSE_V1 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_NEOVERSE_V1) #define MIDR_CORTEX_A78 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A78) +#define MIDR_CORTEX_A78AE MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A78AE) #define MIDR_CORTEX_X1 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_X1) #define MIDR_CORTEX_A510 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A510) #define MIDR_CORTEX_A710 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A710) --- a/arch/arm64/kernel/proton-pack.c +++ b/arch/arm64/kernel/proton-pack.c @@ -853,6 +853,7 @@ u8 spectre_bhb_loop_affected(int scope) if (scope == SCOPE_LOCAL_CPU) { static const struct midr_range spectre_bhb_k32_list[] = { MIDR_ALL_VERSIONS(MIDR_CORTEX_A78), + MIDR_ALL_VERSIONS(MIDR_CORTEX_A78AE), MIDR_ALL_VERSIONS(MIDR_CORTEX_A78C), MIDR_ALL_VERSIONS(MIDR_CORTEX_X1), MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
From: Damien Le Moal damien.lemoal@opensource.wdc.com
commit 87d663d40801dffc99a5ad3b0188ad3e2b4d1557 upstream.
The function mpt3sas_transport_port_remove() called in _scsih_expander_node_remove() frees the port field of the sas_expander structure, leading to the following use-after-free splat from KASAN when the ioc_info() call following that function is executed (e.g. when doing rmmod of the driver module):
[ 3479.371167] ================================================================== [ 3479.378496] BUG: KASAN: use-after-free in _scsih_expander_node_remove+0x710/0x750 [mpt3sas] [ 3479.386936] Read of size 1 at addr ffff8881c037691c by task rmmod/1531 [ 3479.393524] [ 3479.395035] CPU: 18 PID: 1531 Comm: rmmod Not tainted 5.17.0-rc8+ #1436 [ 3479.401712] Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.1 06/02/2021 [ 3479.409263] Call Trace: [ 3479.411743] <TASK> [ 3479.413875] dump_stack_lvl+0x45/0x59 [ 3479.417582] print_address_description.constprop.0+0x1f/0x120 [ 3479.423389] ? _scsih_expander_node_remove+0x710/0x750 [mpt3sas] [ 3479.429469] kasan_report.cold+0x83/0xdf [ 3479.433438] ? _scsih_expander_node_remove+0x710/0x750 [mpt3sas] [ 3479.439514] _scsih_expander_node_remove+0x710/0x750 [mpt3sas] [ 3479.445411] ? _raw_spin_unlock_irqrestore+0x2d/0x40 [ 3479.452032] scsih_remove+0x525/0xc90 [mpt3sas] [ 3479.458212] ? mpt3sas_expander_remove+0x1d0/0x1d0 [mpt3sas] [ 3479.465529] ? down_write+0xde/0x150 [ 3479.470746] ? up_write+0x14d/0x460 [ 3479.475840] ? kernfs_find_ns+0x137/0x310 [ 3479.481438] pci_device_remove+0x65/0x110 [ 3479.487013] __device_release_driver+0x316/0x680 [ 3479.493180] driver_detach+0x1ec/0x2d0 [ 3479.498499] bus_remove_driver+0xe7/0x2d0 [ 3479.504081] pci_unregister_driver+0x26/0x250 [ 3479.510033] _mpt3sas_exit+0x2b/0x6cf [mpt3sas] [ 3479.516144] __x64_sys_delete_module+0x2fd/0x510 [ 3479.522315] ? free_module+0xaa0/0xaa0 [ 3479.527593] ? __cond_resched+0x1c/0x90 [ 3479.532951] ? lockdep_hardirqs_on_prepare+0x273/0x3e0 [ 3479.539607] ? syscall_enter_from_user_mode+0x21/0x70 [ 3479.546161] ? trace_hardirqs_on+0x1c/0x110 [ 3479.551828] do_syscall_64+0x35/0x80 [ 3479.556884] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 3479.563402] RIP: 0033:0x7f1fc482483b ... [ 3479.943087] ==================================================================
Fix this by introducing the local variable port_id to store the port ID value before executing mpt3sas_transport_port_remove(). This local variable is then used in the call to ioc_info() instead of dereferencing the freed port structure.
Link: https://lore.kernel.org/r/20220322055702.95276-1-damien.lemoal@opensource.wd... Fixes: 7d310f241001 ("scsi: mpt3sas: Get device objects using sas_address & portID") Cc: stable@vger.kernel.org Acked-by: Sreekanth Reddy sreekanth.reddy@broadcom.com Signed-off-by: Damien Le Moal damien.lemoal@opensource.wdc.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/mpt3sas/mpt3sas_scsih.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -11035,6 +11035,7 @@ _scsih_expander_node_remove(struct MPT3S { struct _sas_port *mpt3sas_port, *next; unsigned long flags; + int port_id;
/* remove sibling ports attached to this expander */ list_for_each_entry_safe(mpt3sas_port, next, @@ -11055,6 +11056,8 @@ _scsih_expander_node_remove(struct MPT3S mpt3sas_port->hba_port); }
+ port_id = sas_expander->port->port_id; + mpt3sas_transport_port_remove(ioc, sas_expander->sas_address, sas_expander->sas_address_parent, sas_expander->port);
@@ -11062,7 +11065,7 @@ _scsih_expander_node_remove(struct MPT3S "expander_remove: handle(0x%04x), sas_addr(0x%016llx), port:%d\n", sas_expander->handle, (unsigned long long) sas_expander->sas_address, - sas_expander->port->port_id); + port_id);
spin_lock_irqsave(&ioc->sas_node_lock, flags); list_del(&sas_expander->list);
From: Adrian Hunter adrian.hunter@intel.com
commit 4049f7acef3eb37c1ea0df45f3ffc29404f4e708 upstream.
Add PCI ID and callbacks to support Intel Meteor Lake (MTL).
Link: https://lore.kernel.org/r/20220404055038.2208051-1-adrian.hunter@intel.com Cc: stable@vger.kernel.org # v5.15+ Reviewed-by: Avri Altman avri.altman@wdc.com Reviewed-by: Bart Van Assche bvanassche@acm.org Signed-off-by: Adrian Hunter adrian.hunter@intel.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/ufs/ufshcd-pci.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)
--- a/drivers/scsi/ufs/ufshcd-pci.c +++ b/drivers/scsi/ufs/ufshcd-pci.c @@ -428,6 +428,12 @@ static int ufs_intel_adl_init(struct ufs return ufs_intel_common_init(hba); }
+static int ufs_intel_mtl_init(struct ufs_hba *hba) +{ + hba->caps |= UFSHCD_CAP_CRYPTO | UFSHCD_CAP_WB_EN; + return ufs_intel_common_init(hba); +} + static struct ufs_hba_variant_ops ufs_intel_cnl_hba_vops = { .name = "intel-pci", .init = ufs_intel_common_init, @@ -465,6 +471,16 @@ static struct ufs_hba_variant_ops ufs_in .device_reset = ufs_intel_device_reset, };
+static struct ufs_hba_variant_ops ufs_intel_mtl_hba_vops = { + .name = "intel-pci", + .init = ufs_intel_mtl_init, + .exit = ufs_intel_common_exit, + .hce_enable_notify = ufs_intel_hce_enable_notify, + .link_startup_notify = ufs_intel_link_startup_notify, + .resume = ufs_intel_resume, + .device_reset = ufs_intel_device_reset, +}; + #ifdef CONFIG_PM_SLEEP static int ufshcd_pci_restore(struct device *dev) { @@ -579,6 +595,7 @@ static const struct pci_device_id ufshcd { PCI_VDEVICE(INTEL, 0x98FA), (kernel_ulong_t)&ufs_intel_lkf_hba_vops }, { PCI_VDEVICE(INTEL, 0x51FF), (kernel_ulong_t)&ufs_intel_adl_hba_vops }, { PCI_VDEVICE(INTEL, 0x54FF), (kernel_ulong_t)&ufs_intel_adl_hba_vops }, + { PCI_VDEVICE(INTEL, 0x7E47), (kernel_ulong_t)&ufs_intel_mtl_hba_vops }, { } /* terminate list */ };
From: Pali Rohár pali@kernel.org
commit 7e2646ed47542123168d43916b84b954532e5386 upstream.
This reverts commit bb32e1987bc55ce1db400faf47d85891da3c9b9f.
Commit 1a3ed0dc3594 ("mmc: sdhci-xenon: fix 1.8v regulator stabilization") contains proper fix for the issue described in commit bb32e1987bc5 ("mmc: sdhci-xenon: fix annoying 1.8V regulator warning").
Fixes: 8d876bf472db ("mmc: sdhci-xenon: wait 5ms after set 1.8V signal enable") Cc: stable@vger.kernel.org # 1a3ed0dc3594 ("mmc: sdhci-xenon: fix 1.8v regulator stabilization") Signed-off-by: Pali Rohár pali@kernel.org Reviewed-by: Marek Behún kabel@kernel.org Reviewed-by: Marcin Wojtas mw@semihalf.com Link: https://lore.kernel.org/r/20220318141441.32329-1-pali@kernel.org Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/sdhci-xenon.c | 10 ---------- 1 file changed, 10 deletions(-)
--- a/drivers/mmc/host/sdhci-xenon.c +++ b/drivers/mmc/host/sdhci-xenon.c @@ -241,16 +241,6 @@ static void xenon_voltage_switch(struct { /* Wait for 5ms after set 1.8V signal enable bit */ usleep_range(5000, 5500); - - /* - * For some reason the controller's Host Control2 register reports - * the bit representing 1.8V signaling as 0 when read after it was - * written as 1. Subsequent read reports 1. - * - * Since this may cause some issues, do an empty read of the Host - * Control2 register here to circumvent this. - */ - sdhci_readw(host, SDHCI_HOST_CONTROL2); }
static unsigned int xenon_get_max_clock(struct sdhci_host *host)
From: Christian Löhle CLoehle@hyperstone.com
commit 5d435933376962b107bd76970912e7e80247dcc7 upstream.
Introduce a SEND_STATUS check for writes through SPI to not mark an unsuccessful write as successful.
Since SPI SD/MMC does not have states, after a write, the card will just hold the line LOW until it is ready again. The driver marks the write therefore as completed as soon as it reads something other than all zeroes. The driver does not distinguish from a card no longer signalling busy and it being disconnected (and the line being pulled-up by the host). This lead to writes being marked as successful when disconnecting a busy card. Now the card is ensured to be still connected by an additional CMD13, just like non-SPI is ensured to go back to TRAN state.
While at it and since we already poll for the post-write status anyway, we might as well check for SPIs error bits (any of them).
The disconnecting card problem is reproducable for me after continuous write activity and randomly disconnecting, around every 20-50 tries on SPI DS for some card.
Fixes: 7213d175e3b6f ("MMC/SD card driver learns SPI") Cc: stable@vger.kernel.org Signed-off-by: Christian Loehle cloehle@hyperstone.com Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Link: https://lore.kernel.org/r/76f6f5d2b35543bab3dfe438f268609c@hyperstone.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/core/block.c | 34 +++++++++++++++++++++++++++++++++- 1 file changed, 33 insertions(+), 1 deletion(-)
--- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -1880,6 +1880,31 @@ static inline bool mmc_blk_rq_error(stru brq->data.error || brq->cmd.resp[0] & CMD_ERRORS; }
+static int mmc_spi_err_check(struct mmc_card *card) +{ + u32 status = 0; + int err; + + /* + * SPI does not have a TRAN state we have to wait on, instead the + * card is ready again when it no longer holds the line LOW. + * We still have to ensure two things here before we know the write + * was successful: + * 1. The card has not disconnected during busy and we actually read our + * own pull-up, thinking it was still connected, so ensure it + * still responds. + * 2. Check for any error bits, in particular R1_SPI_IDLE to catch a + * just reconnected card after being disconnected during busy. + */ + err = __mmc_send_status(card, &status, 0); + if (err) + return err; + /* All R1 and R2 bits of SPI are errors in our case */ + if (status) + return -EIO; + return 0; +} + static int mmc_blk_busy_cb(void *cb_data, bool *busy) { struct mmc_blk_busy_data *data = cb_data; @@ -1903,9 +1928,16 @@ static int mmc_blk_card_busy(struct mmc_ struct mmc_blk_busy_data cb_data; int err;
- if (mmc_host_is_spi(card->host) || rq_data_dir(req) == READ) + if (rq_data_dir(req) == READ) return 0;
+ if (mmc_host_is_spi(card->host)) { + err = mmc_spi_err_check(card); + if (err) + mqrq->brq.data.bytes_xfered = 0; + return err; + } + cb_data.card = card; cb_data.status = 0; err = __mmc_poll_for_busy(card->host, 0, MMC_BLK_TIMEOUT_MS,
From: Yann Gautier yann.gautier@foss.st.com
commit 0d319dd5a27183b75d984e3dc495248e59f99334 upstream.
Use sg and not data->sg when checking sg list elements. Else only the first element alignment is checked. The last element should be checked the same way, for_each_sg already set sg to sg_next(sg).
Fixes: 46b723dd867d ("mmc: mmci: add stm32 sdmmc variant") Cc: stable@vger.kernel.org Signed-off-by: Yann Gautier yann.gautier@foss.st.com Link: https://lore.kernel.org/r/20220317111944.116148-2-yann.gautier@foss.st.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/mmci_stm32_sdmmc.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/mmc/host/mmci_stm32_sdmmc.c +++ b/drivers/mmc/host/mmci_stm32_sdmmc.c @@ -62,8 +62,8 @@ static int sdmmc_idma_validate_data(stru * excepted the last element which has no constraint on idmasize */ for_each_sg(data->sg, sg, data->sg_len - 1, i) { - if (!IS_ALIGNED(data->sg->offset, sizeof(u32)) || - !IS_ALIGNED(data->sg->length, SDMMC_IDMA_BURST)) { + if (!IS_ALIGNED(sg->offset, sizeof(u32)) || + !IS_ALIGNED(sg->length, SDMMC_IDMA_BURST)) { dev_err(mmc_dev(host->mmc), "unaligned scatterlist: ofst:%x length:%d\n", data->sg->offset, data->sg->length); @@ -71,7 +71,7 @@ static int sdmmc_idma_validate_data(stru } }
- if (!IS_ALIGNED(data->sg->offset, sizeof(u32))) { + if (!IS_ALIGNED(sg->offset, sizeof(u32))) { dev_err(mmc_dev(host->mmc), "unaligned last scatterlist: ofst:%x length:%d\n", data->sg->offset, data->sg->length);
From: Wolfram Sang wsa+renesas@sang-engineering.com
commit 46d4820f949a3030b19ee482c68a50b06dd27590 upstream.
Previous documentation was vague, so we included SDR104 for slow SDnH clock settings. It turns out now, that it is only needed for HS400.
Fixes: bb6d3fa98a41 ("clk: renesas: rcar-gen3: Switch to new SD clock handling") Cc: stable@vger.kernel.org Reported-by: Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com Signed-off-by: Wolfram Sang wsa+renesas@sang-engineering.com Reviewed-by: Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com Link: https://lore.kernel.org/r/20220404100508.3209-1-wsa+renesas@sang-engineering... Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/renesas_sdhi_core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/mmc/host/renesas_sdhi_core.c +++ b/drivers/mmc/host/renesas_sdhi_core.c @@ -144,9 +144,9 @@ static unsigned int renesas_sdhi_clk_upd return clk_get_rate(priv->clk);
if (priv->clkh) { + /* HS400 with 4TAP needs different clock settings */ bool use_4tap = priv->quirks && priv->quirks->hs400_4taps; - bool need_slow_clkh = (host->mmc->ios.timing == MMC_TIMING_UHS_SDR104) || - (host->mmc->ios.timing == MMC_TIMING_MMC_HS400); + bool need_slow_clkh = host->mmc->ios.timing == MMC_TIMING_MMC_HS400; clkh_shift = use_4tap && need_slow_clkh ? 1 : 2; ref_clk = priv->clkh; }
From: Wolfram Sang wsa+renesas@sang-engineering.com
commit 03e59b1e2f56245163b14c69e0a830c24b1a3a47 upstream.
When HS400 tuning is complete and HS400 is going to be activated, we have to keep the current number of TAPs and should not overwrite them with a hardcoded value. This was probably a copy&paste mistake when upporting HS400 support from the BSP.
Fixes: 26eb2607fa28 ("mmc: renesas_sdhi: add eMMC HS400 mode support") Reported-by: Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com Signed-off-by: Wolfram Sang wsa+renesas@sang-engineering.com Reviewed-by: Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220404114902.12175-1-wsa+renesas@sang-engineerin... Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/renesas_sdhi_core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/mmc/host/renesas_sdhi_core.c +++ b/drivers/mmc/host/renesas_sdhi_core.c @@ -396,10 +396,10 @@ static void renesas_sdhi_hs400_complete( SH_MOBILE_SDHI_SCC_TMPPORT2_HS400OSEL) | sd_scc_read32(host, priv, SH_MOBILE_SDHI_SCC_TMPPORT2));
- /* Set the sampling clock selection range of HS400 mode */ sd_scc_write32(host, priv, SH_MOBILE_SDHI_SCC_DTCNTL, SH_MOBILE_SDHI_SCC_DTCNTL_TAPEN | - 0x4 << SH_MOBILE_SDHI_SCC_DTCNTL_TAPNUM_SHIFT); + sd_scc_read32(host, priv, + SH_MOBILE_SDHI_SCC_DTCNTL));
/* Avoid bad TAP */ if (bad_taps & BIT(priv->tap_set)) {
From: Michael Wu michael@allwinnertech.com
commit 08ebf903af57cda6d773f3dd1671b64f73b432b8 upstream.
During the card initialization process, the mmc core checks whether the eMMC/SD card supports an internal writeback-cache and then enables it inside the card.
Unfortunately, this isn't according to what the mmc core reports to the upper block layer. Instead, the writeback-cache support with REQ_FLUSH and REQ_FUA, are being enabled depending on whether the host supports the CMD23 (MMC_CAP_CMD23) and whether an eMMC supports the reliable-write command.
This is wrong and it may also sound awkward. In fact, it's a remnant from when both eMMC/SD cards didn't have dedicated commands/support to control the internal writeback-cache. In other words, it was the best we could do at that point in time.
To fix the problem, but also without breaking backwards compatibility, let's align the REQ_FLUSH support with whether the writeback-cache became successfully enabled - for both eMMC and SD cards.
Cc: stable@kernel.org Fixes: 881d1c25f765 ("mmc: core: Add cache control for eMMC4.5 device") Fixes: 130206a615a9 ("mmc: core: Add support for cache ctrl for SD cards") Depends-on: 97fce126e279 ("mmc: block: Issue a cache flush only when it's enabled") Reviewed-by: Avri Altman Avri.Altman@wdc.com Signed-off-by: Michael Wu michael@allwinnertech.com Link: https://lore.kernel.org/r/20220331073223.106415-1-michael@allwinnertech.com [Ulf: Re-wrote the commit message] Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/core/block.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)
--- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -2382,6 +2382,8 @@ static struct mmc_blk_data *mmc_blk_allo struct mmc_blk_data *md; int devidx, ret; char cap_str[10]; + bool cache_enabled = false; + bool fua_enabled = false;
devidx = ida_simple_get(&mmc_blk_ida, 0, max_devices, GFP_KERNEL); if (devidx < 0) { @@ -2461,13 +2463,17 @@ static struct mmc_blk_data *mmc_blk_allo md->flags |= MMC_BLK_CMD23; }
- if (mmc_card_mmc(card) && - md->flags & MMC_BLK_CMD23 && + if (md->flags & MMC_BLK_CMD23 && ((card->ext_csd.rel_param & EXT_CSD_WR_REL_PARAM_EN) || card->ext_csd.rel_sectors)) { md->flags |= MMC_BLK_REL_WR; - blk_queue_write_cache(md->queue.queue, true, true); + fua_enabled = true; + cache_enabled = true; } + if (mmc_cache_enabled(card->host)) + cache_enabled = true; + + blk_queue_write_cache(md->queue.queue, cache_enabled, fua_enabled);
string_get_size((u64)size, 512, STRING_UNITS_2, cap_str, sizeof(cap_str));
From: Guo Xuenan guoxuenan@huawei.com
commit eafc0a02391b7b36617b36c97c4b5d6832cf5e24 upstream.
When partialDecoding, it is EOF if we've either filled the output buffer or can't proceed with reading an offset for following match.
In some extreme corner cases when compressed data is suitably corrupted, UAF will occur. As reported by KASAN [1], LZ4_decompress_safe_partial may lead to read out of bound problem during decoding. lz4 upstream has fixed it [2] and this issue has been disscussed here [3] before.
current decompression routine was ported from lz4 v1.8.3, bumping lib/lz4 to v1.9.+ is certainly a huge work to be done later, so, we'd better fix it first.
[1] https://lore.kernel.org/all/000000000000830d1205cf7f0477@google.com/ [2] https://github.com/lz4/lz4/commit/c5d6f8a8be3927c0bec91bcc58667a6cfad244ad# [3] https://lore.kernel.org/all/CC666AE8-4CA4-4951-B6FB-A2EFDE3AC03B@fb.com/
Link: https://lkml.kernel.org/r/20211111105048.2006070-1-guoxuenan@huawei.com Reported-by: syzbot+63d688f1d899c588fb71@syzkaller.appspotmail.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Nick Terrell terrelln@fb.com Acked-by: Gao Xiang hsiangkao@linux.alibaba.com Cc: Yann Collet cyan@fb.com Cc: Chengyang Fan cy.fan@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- lib/lz4/lz4_decompress.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/lib/lz4/lz4_decompress.c +++ b/lib/lz4/lz4_decompress.c @@ -271,8 +271,12 @@ static FORCE_INLINE int LZ4_decompress_g ip += length; op += length;
- /* Necessarily EOF, due to parsing restrictions */ - if (!partialDecoding || (cpy == oend)) + /* Necessarily EOF when !partialDecoding. + * When partialDecoding, it is EOF if we've either + * filled the output buffer or + * can't proceed with reading an offset for following match. + */ + if (!partialDecoding || (cpy == oend) || (ip >= (iend - 2))) break; } else { /* may overwrite up to WILDCOPYLENGTH beyond cpy */
From: Max Filippov jcmvbkbc@gmail.com
commit 66f133ceab7456c789f70a242991ed1b27ba1c3d upstream.
When CONFIG_DEBUG_KMAP_LOCAL is enabled __kmap_local_sched_{in,out} check that even slots in the tsk->kmap_ctrl.pteval are unmapped. The slots are initialized with 0 value, but the check is done with pte_none. 0 pte however does not necessarily mean that pte_none will return true. e.g. on xtensa it returns false, resulting in the following runtime warnings:
WARNING: CPU: 0 PID: 101 at mm/highmem.c:627 __kmap_local_sched_out+0x51/0x108 CPU: 0 PID: 101 Comm: touch Not tainted 5.17.0-rc7-00010-gd3a1cdde80d2-dirty #13 Call Trace: dump_stack+0xc/0x40 __warn+0x8f/0x174 warn_slowpath_fmt+0x48/0xac __kmap_local_sched_out+0x51/0x108 __schedule+0x71a/0x9c4 preempt_schedule_irq+0xa0/0xe0 common_exception_return+0x5c/0x93 do_wp_page+0x30e/0x330 handle_mm_fault+0xa70/0xc3c do_page_fault+0x1d8/0x3c4 common_exception+0x7f/0x7f
WARNING: CPU: 0 PID: 101 at mm/highmem.c:664 __kmap_local_sched_in+0x50/0xe0 CPU: 0 PID: 101 Comm: touch Tainted: G W 5.17.0-rc7-00010-gd3a1cdde80d2-dirty #13 Call Trace: dump_stack+0xc/0x40 __warn+0x8f/0x174 warn_slowpath_fmt+0x48/0xac __kmap_local_sched_in+0x50/0xe0 finish_task_switch$isra$0+0x1ce/0x2f8 __schedule+0x86e/0x9c4 preempt_schedule_irq+0xa0/0xe0 common_exception_return+0x5c/0x93 do_wp_page+0x30e/0x330 handle_mm_fault+0xa70/0xc3c do_page_fault+0x1d8/0x3c4 common_exception+0x7f/0x7f
Fix it by replacing !pte_none(pteval) with pte_val(pteval) != 0.
Link: https://lkml.kernel.org/r/20220403235159.3498065-1-jcmvbkbc@gmail.com Fixes: 5fbda3ecd14a ("sched: highmem: Store local kmaps in task struct") Signed-off-by: Max Filippov jcmvbkbc@gmail.com Reviewed-by: Thomas Gleixner tglx@linutronix.de Cc: "Peter Zijlstra (Intel)" peterz@infradead.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/highmem.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/highmem.c +++ b/mm/highmem.c @@ -624,7 +624,7 @@ void __kmap_local_sched_out(void)
/* With debug all even slots are unmapped and act as guard */ if (IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL) && !(i & 0x01)) { - WARN_ON_ONCE(!pte_none(pteval)); + WARN_ON_ONCE(pte_val(pteval) != 0); continue; } if (WARN_ON_ONCE(pte_none(pteval))) @@ -661,7 +661,7 @@ void __kmap_local_sched_in(void)
/* With debug all even slots are unmapped and act as guard */ if (IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL) && !(i & 0x01)) { - WARN_ON_ONCE(!pte_none(pteval)); + WARN_ON_ONCE(pte_val(pteval) != 0); continue; } if (WARN_ON_ONCE(pte_none(pteval)))
From: Paolo Bonzini pbonzini@redhat.com
commit 01e67e04c28170c47700c2c226d732bbfedb1ad0 upstream.
If an mremap() syscall with old_size=0 ends up in move_page_tables(), it will call invalidate_range_start()/invalidate_range_end() unnecessarily, i.e. with an empty range.
This causes a WARN in KVM's mmu_notifier. In the past, empty ranges have been diagnosed to be off-by-one bugs, hence the WARNing. Given the low (so far) number of unique reports, the benefits of detecting more buggy callers seem to outweigh the cost of having to fix cases such as this one, where userspace is doing something silly. In this particular case, an early return from move_page_tables() is enough to fix the issue.
Link: https://lkml.kernel.org/r/20220329173155.172439-1-pbonzini@redhat.com Reported-by: syzbot+6bde52d89cfdf9f61425@syzkaller.appspotmail.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Cc: Sean Christopherson seanjc@google.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/mremap.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/mm/mremap.c +++ b/mm/mremap.c @@ -486,6 +486,9 @@ unsigned long move_page_tables(struct vm pmd_t *old_pmd, *new_pmd; pud_t *old_pud, *new_pud;
+ if (!len) + return 0; + old_end = old_addr + len; flush_cache_range(vma, old_addr, old_end);
From: Miaohe Lin linmiaohe@huawei.com
commit 4ad099559b00ac01c3726e5c95dc3108ef47d03e upstream.
If mpol_new is allocated but not used in restart loop, mpol_new will be freed via mpol_put before returning to the caller. But refcnt is not initialized yet, so mpol_put could not do the right things and might leak the unused mpol_new. This would happen if mempolicy was updated on the shared shmem file while the sp->lock has been dropped during the memory allocation.
This issue could be triggered easily with the below code snippet if there are many processes doing the below work at the same time:
shmid = shmget((key_t)5566, 1024 * PAGE_SIZE, 0666|IPC_CREAT); shm = shmat(shmid, 0, 0); loop many times { mbind(shm, 1024 * PAGE_SIZE, MPOL_LOCAL, mask, maxnode, 0); mbind(shm + 128 * PAGE_SIZE, 128 * PAGE_SIZE, MPOL_DEFAULT, mask, maxnode, 0); }
Link: https://lkml.kernel.org/r/20220329111416.27954-1-linmiaohe@huawei.com Fixes: 42288fe366c4 ("mm: mempolicy: Convert shared_policy mutex to spinlock") Signed-off-by: Miaohe Lin linmiaohe@huawei.com Acked-by: Michal Hocko mhocko@suse.com Cc: KOSAKI Motohiro kosaki.motohiro@jp.fujitsu.com Cc: Mel Gorman mgorman@suse.de Cc: stable@vger.kernel.org [3.8] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/mempolicy.c | 1 + 1 file changed, 1 insertion(+)
--- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -2736,6 +2736,7 @@ alloc_new: mpol_new = kmem_cache_alloc(policy_cache, GFP_KERNEL); if (!mpol_new) goto err_out; + atomic_set(&mpol_new->refcnt, 1); goto restart; }
From: Jens Axboe axboe@kernel.dk
commit ec858afda857e361182ceafc3d2ba2b164b8e889 upstream.
This is a leftover from the really old days where we weren't able to track and error early if we need a file and it wasn't assigned. Kill the check.
Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/io_uring.c | 3 --- 1 file changed, 3 deletions(-)
--- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -4245,9 +4245,6 @@ static int io_fsync_prep(struct io_kiocb { struct io_ring_ctx *ctx = req->ctx;
- if (!req->file) - return -EBADF; - if (unlikely(ctx->flags & IORING_SETUP_IOPOLL)) return -EINVAL; if (unlikely(sqe->addr || sqe->ioprio || sqe->buf_index ||
From: Jens Axboe axboe@kernel.dk
commit a3e4bc23d5470b2beb7cc42a86b6a3e75b704c15 upstream.
In preparation for not using the file at prep time, defer checking if this file refers to a valid io_uring instance until issue time.
This also means we can get rid of the cleanup flag for splice and tee.
Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/io_uring.c | 49 +++++++++++++++++++++---------------------------- 1 file changed, 21 insertions(+), 28 deletions(-)
--- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -621,10 +621,10 @@ struct io_epoll {
struct io_splice { struct file *file_out; - struct file *file_in; loff_t off_out; loff_t off_in; u64 len; + int splice_fd_in; unsigned int flags; };
@@ -1551,14 +1551,6 @@ static void io_prep_async_work(struct io if (def->unbound_nonreg_file) req->work.flags |= IO_WQ_WORK_UNBOUND; } - - switch (req->opcode) { - case IORING_OP_SPLICE: - case IORING_OP_TEE: - if (!S_ISREG(file_inode(req->splice.file_in)->i_mode)) - req->work.flags |= IO_WQ_WORK_UNBOUND; - break; - } }
static void io_prep_async_link(struct io_kiocb *req) @@ -4144,18 +4136,11 @@ static int __io_splice_prep(struct io_ki if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL)) return -EINVAL;
- sp->file_in = NULL; sp->len = READ_ONCE(sqe->len); sp->flags = READ_ONCE(sqe->splice_flags); - if (unlikely(sp->flags & ~valid_flags)) return -EINVAL; - - sp->file_in = io_file_get(req->ctx, req, READ_ONCE(sqe->splice_fd_in), - (sp->flags & SPLICE_F_FD_IN_FIXED)); - if (!sp->file_in) - return -EBADF; - req->flags |= REQ_F_NEED_CLEANUP; + sp->splice_fd_in = READ_ONCE(sqe->splice_fd_in); return 0; }
@@ -4170,20 +4155,27 @@ static int io_tee_prep(struct io_kiocb * static int io_tee(struct io_kiocb *req, unsigned int issue_flags) { struct io_splice *sp = &req->splice; - struct file *in = sp->file_in; struct file *out = sp->file_out; unsigned int flags = sp->flags & ~SPLICE_F_FD_IN_FIXED; + struct file *in; long ret = 0;
if (issue_flags & IO_URING_F_NONBLOCK) return -EAGAIN; + + in = io_file_get(req->ctx, req, sp->splice_fd_in, + (sp->flags & SPLICE_F_FD_IN_FIXED)); + if (!in) { + ret = -EBADF; + goto done; + } + if (sp->len) ret = do_tee(in, out, sp->len, flags);
if (!(sp->flags & SPLICE_F_FD_IN_FIXED)) io_put_file(in); - req->flags &= ~REQ_F_NEED_CLEANUP; - +done: if (ret != sp->len) req_set_fail(req); io_req_complete(req, ret); @@ -4202,15 +4194,22 @@ static int io_splice_prep(struct io_kioc static int io_splice(struct io_kiocb *req, unsigned int issue_flags) { struct io_splice *sp = &req->splice; - struct file *in = sp->file_in; struct file *out = sp->file_out; unsigned int flags = sp->flags & ~SPLICE_F_FD_IN_FIXED; loff_t *poff_in, *poff_out; + struct file *in; long ret = 0;
if (issue_flags & IO_URING_F_NONBLOCK) return -EAGAIN;
+ in = io_file_get(req->ctx, req, sp->splice_fd_in, + (sp->flags & SPLICE_F_FD_IN_FIXED)); + if (!in) { + ret = -EBADF; + goto done; + } + poff_in = (sp->off_in == -1) ? NULL : &sp->off_in; poff_out = (sp->off_out == -1) ? NULL : &sp->off_out;
@@ -4219,8 +4218,7 @@ static int io_splice(struct io_kiocb *re
if (!(sp->flags & SPLICE_F_FD_IN_FIXED)) io_put_file(in); - req->flags &= ~REQ_F_NEED_CLEANUP; - +done: if (ret != sp->len) req_set_fail(req); io_req_complete(req, ret); @@ -6686,11 +6684,6 @@ static void io_clean_op(struct io_kiocb kfree(io->free_iov); break; } - case IORING_OP_SPLICE: - case IORING_OP_TEE: - if (!(req->splice.flags & SPLICE_F_FD_IN_FIXED)) - io_put_file(req->splice.file_in); - break; case IORING_OP_OPENAT: case IORING_OP_OPENAT2: if (req->open.filename)
From: Eugene Syromiatnikov esyr@redhat.com
commit 0f5e4b83b37a96e3643951588ed7176b9b187c0a upstream.
Similarly to the way it is done im mbind syscall.
Cc: stable@vger.kernel.org # 5.14 Fixes: fe76421d1da1dcdb ("io_uring: allow user configurable IO thread CPU affinity") Signed-off-by: Eugene Syromiatnikov esyr@redhat.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/io_uring.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
--- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -10860,7 +10860,15 @@ static __cold int io_register_iowq_aff(s if (len > cpumask_size()) len = cpumask_size();
- if (copy_from_user(new_mask, arg, len)) { + if (in_compat_syscall()) { + ret = compat_get_bitmap(cpumask_bits(new_mask), + (const compat_ulong_t __user *)arg, + len * 8 /* CHAR_BIT */); + } else { + ret = copy_from_user(new_mask, arg, len); + } + + if (ret) { free_cpumask_var(new_mask); return -EFAULT; }
From: Jens Axboe axboe@kernel.dk
commit e677edbcabee849bfdd43f1602bccbecf736a646 upstream.
io_flush_timeouts() assumes the timeout isn't in progress of triggering or being removed/canceled, so it unconditionally removes it from the timeout list and attempts to cancel it.
Leave it on the list and let the normal timeout cancelation take care of it.
Cc: stable@vger.kernel.org # 5.5+ Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/io_uring.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
--- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -1644,12 +1644,11 @@ static __cold void io_flush_timeouts(str __must_hold(&ctx->completion_lock) { u32 seq = ctx->cached_cq_tail - atomic_read(&ctx->cq_timeouts); + struct io_kiocb *req, *tmp;
spin_lock_irq(&ctx->timeout_lock); - while (!list_empty(&ctx->timeout_list)) { + list_for_each_entry_safe(req, tmp, &ctx->timeout_list, timeout.list) { u32 events_needed, events_got; - struct io_kiocb *req = list_first_entry(&ctx->timeout_list, - struct io_kiocb, timeout.list);
if (io_is_timeout_noseq(req)) break; @@ -1666,7 +1665,6 @@ static __cold void io_flush_timeouts(str if (events_got < events_needed) break;
- list_del_init(&req->timeout.list); io_kill_timeout(req, 0); } ctx->cq_last_tm_flush = seq; @@ -6276,6 +6274,7 @@ static int io_timeout_prep(struct io_kio if (data->ts.tv_sec < 0 || data->ts.tv_nsec < 0) return -EINVAL;
+ INIT_LIST_HEAD(&req->timeout.list); data->mode = io_translate_timeout_mode(flags); hrtimer_init(&data->timer, io_timeout_get_clock(data), data->mode);
From: Pawan Gupta pawan.kumar.gupta@linux.intel.com
commit 73924ec4d560257004d5b5116b22a3647661e364 upstream.
The mechanism to save/restore MSRs during S3 suspend/resume checks for the MSR validity during suspend, and only restores the MSR if its a valid MSR. This is not optimal, as an invalid MSR will unnecessarily throw an exception for every suspend cycle. The more invalid MSRs, higher the impact will be.
Check and save the MSR validity at setup. This ensures that only valid MSRs that are guaranteed to not throw an exception will be attempted during suspend.
Fixes: 7a9c2dd08ead ("x86/pm: Introduce quirk framework to save/restore extra MSR registers around suspend/resume") Suggested-by: Dave Hansen dave.hansen@linux.intel.com Signed-off-by: Pawan Gupta pawan.kumar.gupta@linux.intel.com Reviewed-by: Dave Hansen dave.hansen@linux.intel.com Acked-by: Borislav Petkov bp@suse.de Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/power/cpu.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/arch/x86/power/cpu.c +++ b/arch/x86/power/cpu.c @@ -40,7 +40,8 @@ static void msr_save_context(struct save struct saved_msr *end = msr + ctxt->saved_msrs.num;
while (msr < end) { - msr->valid = !rdmsrl_safe(msr->info.msr_no, &msr->info.reg.q); + if (msr->valid) + rdmsrl(msr->info.msr_no, msr->info.reg.q); msr++; } } @@ -424,8 +425,10 @@ static int msr_build_context(const u32 * }
for (i = saved_msrs->num, j = 0; i < total_num; i++, j++) { + u64 dummy; + msr_array[i].info.msr_no = msr_id[j]; - msr_array[i].valid = false; + msr_array[i].valid = !rdmsrl_safe(msr_id[j], &dummy); msr_array[i].info.reg.q = 0; } saved_msrs->num = total_num;
From: Pawan Gupta pawan.kumar.gupta@linux.intel.com
commit e2a1256b17b16f9b9adf1b6fea56819e7b68e463 upstream.
After resuming from suspend-to-RAM, the MSRs that control CPU's speculative execution behavior are not being restored on the boot CPU.
These MSRs are used to mitigate speculative execution vulnerabilities. Not restoring them correctly may leave the CPU vulnerable. Secondary CPU's MSRs are correctly being restored at S3 resume by identify_secondary_cpu().
During S3 resume, restore these MSRs for boot CPU when restoring its processor state.
Fixes: 772439717dbf ("x86/bugs/intel: Set proper CPU features and setup RDS") Reported-by: Neelima Krishnan neelima.krishnan@intel.com Signed-off-by: Pawan Gupta pawan.kumar.gupta@linux.intel.com Tested-by: Neelima Krishnan neelima.krishnan@intel.com Acked-by: Borislav Petkov bp@suse.de Reviewed-by: Dave Hansen dave.hansen@linux.intel.com Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/power/cpu.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
--- a/arch/x86/power/cpu.c +++ b/arch/x86/power/cpu.c @@ -503,10 +503,24 @@ static int pm_cpu_check(const struct x86 return ret; }
+static void pm_save_spec_msr(void) +{ + u32 spec_msr_id[] = { + MSR_IA32_SPEC_CTRL, + MSR_IA32_TSX_CTRL, + MSR_TSX_FORCE_ABORT, + MSR_IA32_MCU_OPT_CTRL, + MSR_AMD64_LS_CFG, + }; + + msr_build_context(spec_msr_id, ARRAY_SIZE(spec_msr_id)); +} + static int pm_check_save_msr(void) { dmi_check_system(msr_save_dmi_table); pm_cpu_check(msr_save_cpu_table); + pm_save_spec_msr();
return 0; }
From: Kan Liang kan.liang@linux.intel.com
commit e590928de7547454469693da9bc7ffd562e54b7e upstream.
On Sapphire Rapids, the FRONTEND_RETIRED.MS_FLOWS event requires the FRONTEND MSR value 0x8. However, the current FRONTEND MSR mask doesn't support it.
Update intel_spr_extra_regs[] to support it.
Fixes: 61b985e3e775 ("perf/x86/intel: Add perf core PMU support for Sapphire Rapids") Signed-off-by: Kan Liang kan.liang@linux.intel.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/1648482543-14923-2-git-send-email-kan.liang@linux.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/events/intel/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -281,7 +281,7 @@ static struct extra_reg intel_spr_extra_ INTEL_UEVENT_EXTRA_REG(0x012a, MSR_OFFCORE_RSP_0, 0x3fffffffffull, RSP_0), INTEL_UEVENT_EXTRA_REG(0x012b, MSR_OFFCORE_RSP_1, 0x3fffffffffull, RSP_1), INTEL_UEVENT_PEBS_LDLAT_EXTRA_REG(0x01cd), - INTEL_UEVENT_EXTRA_REG(0x01c6, MSR_PEBS_FRONTEND, 0x7fff17, FE), + INTEL_UEVENT_EXTRA_REG(0x01c6, MSR_PEBS_FRONTEND, 0x7fff1f, FE), INTEL_UEVENT_EXTRA_REG(0x40ad, MSR_PEBS_FRONTEND, 0x7, FE), INTEL_UEVENT_EXTRA_REG(0x04c2, MSR_PEBS_FRONTEND, 0x8, FE), EVENT_EXTRA_END
From: Ethan Lien ethanlien@synology.com
commit b642b52d0b50f4d398cb4293f64992d0eed2e2ce upstream.
We use extent_changeset->bytes_changed in qgroup_reserve_data() to record how many bytes we set for EXTENT_QGROUP_RESERVED state. Currently the bytes_changed is set as "unsigned int", and it will overflow if we try to fallocate a range larger than 4GiB. The result is we reserve less bytes and eventually break the qgroup limit.
Unlike regular buffered/direct write, which we use one changeset for each ordered extent, which can never be larger than 256M. For fallocate, we use one changeset for the whole range, thus it no longer respects the 256M per extent limit, and caused the problem.
The following example test script reproduces the problem:
$ cat qgroup-overflow.sh #!/bin/bash
DEV=/dev/sdj MNT=/mnt/sdj
mkfs.btrfs -f $DEV mount $DEV $MNT
# Set qgroup limit to 2GiB. btrfs quota enable $MNT btrfs qgroup limit 2G $MNT
# Try to fallocate a 3GiB file. This should fail. echo echo "Try to fallocate a 3GiB file..." fallocate -l 3G $MNT/3G.file
# Try to fallocate a 5GiB file. echo echo "Try to fallocate a 5GiB file..." fallocate -l 5G $MNT/5G.file
# See we break the qgroup limit. echo sync btrfs qgroup show -r $MNT
umount $MNT
When running the test:
$ ./qgroup-overflow.sh (...)
Try to fallocate a 3GiB file... fallocate: fallocate failed: Disk quota exceeded
Try to fallocate a 5GiB file...
qgroupid rfer excl max_rfer -------- ---- ---- -------- 0/5 5.00GiB 5.00GiB 2.00GiB
Since we have no control of how bytes_changed is used, it's better to set it to u64.
CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Ethan Lien ethanlien@synology.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/extent_io.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -118,7 +118,7 @@ struct btrfs_bio_ctrl { */ struct extent_changeset { /* How many bytes are set/cleared in this operation */ - unsigned int bytes_changed; + u64 bytes_changed;
/* Changed ranges */ struct ulist range_changed;
From: Johannes Thumshirn johannes.thumshirn@wdc.com
commit 0b9e66762aa0cda2a9c2d5542d64e04dac528fa6 upstream.
btrfs_can_activate_zone() can be called with the device_list_mutex already held, which will lead to a deadlock:
insert_dev_extents() // Takes device_list_mutex `-> insert_dev_extent() `-> btrfs_insert_empty_item() `-> btrfs_insert_empty_items() `-> btrfs_search_slot() `-> btrfs_cow_block() `-> __btrfs_cow_block() `-> btrfs_alloc_tree_block() `-> btrfs_reserve_extent() `-> find_free_extent() `-> find_free_extent_update_loop() `-> can_allocate_chunk() `-> btrfs_can_activate_zone() // Takes device_list_mutex again
Instead of using the RCU on fs_devices->device_list we can use fs_devices->alloc_list, protected by the chunk_mutex to traverse the list of active devices.
We are in the chunk allocation thread. The newer chunk allocation happens from the devices in the fs_device->alloc_list protected by the chunk_mutex.
btrfs_create_chunk() lockdep_assert_held(&info->chunk_mutex); gather_device_info list_for_each_entry(device, &fs_devices->alloc_list, dev_alloc_list)
Also, a device that reappears after the mount won't join the alloc_list yet and, it will be in the dev_list, which we don't want to consider in the context of the chunk alloc.
[15.166572] WARNING: possible recursive locking detected [15.167117] 5.17.0-rc6-dennis #79 Not tainted [15.167487] -------------------------------------------- [15.167733] kworker/u8:3/146 is trying to acquire lock: [15.167733] ffff888102962ee0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: find_free_extent+0x15a/0x14f0 [btrfs] [15.167733] [15.167733] but task is already holding lock: [15.167733] ffff888102962ee0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: btrfs_create_pending_block_groups+0x20a/0x560 [btrfs] [15.167733] [15.167733] other info that might help us debug this: [15.167733] Possible unsafe locking scenario: [15.167733] [15.171834] CPU0 [15.171834] ---- [15.171834] lock(&fs_devs->device_list_mutex); [15.171834] lock(&fs_devs->device_list_mutex); [15.171834] [15.171834] *** DEADLOCK *** [15.171834] [15.171834] May be due to missing lock nesting notation [15.171834] [15.171834] 5 locks held by kworker/u8:3/146: [15.171834] #0: ffff888100050938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1c3/0x5a0 [15.171834] #1: ffffc9000067be80 ((work_completion)(&fs_info->async_data_reclaim_work)){+.+.}-{0:0}, at: process_one_work+0x1c3/0x5a0 [15.176244] #2: ffff88810521e620 (sb_internal){.+.+}-{0:0}, at: flush_space+0x335/0x600 [btrfs] [15.176244] #3: ffff888102962ee0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: btrfs_create_pending_block_groups+0x20a/0x560 [btrfs] [15.176244] #4: ffff8881152e4b78 (btrfs-dev-00){++++}-{3:3}, at: __btrfs_tree_lock+0x27/0x130 [btrfs] [15.179641] [15.179641] stack backtrace: [15.179641] CPU: 1 PID: 146 Comm: kworker/u8:3 Not tainted 5.17.0-rc6-dennis #79 [15.179641] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014 [15.179641] Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs] [15.179641] Call Trace: [15.179641] <TASK> [15.179641] dump_stack_lvl+0x45/0x59 [15.179641] __lock_acquire.cold+0x217/0x2b2 [15.179641] lock_acquire+0xbf/0x2b0 [15.183838] ? find_free_extent+0x15a/0x14f0 [btrfs] [15.183838] __mutex_lock+0x8e/0x970 [15.183838] ? find_free_extent+0x15a/0x14f0 [btrfs] [15.183838] ? find_free_extent+0x15a/0x14f0 [btrfs] [15.183838] ? lock_is_held_type+0xd7/0x130 [15.183838] ? find_free_extent+0x15a/0x14f0 [btrfs] [15.183838] find_free_extent+0x15a/0x14f0 [btrfs] [15.183838] ? _raw_spin_unlock+0x24/0x40 [15.183838] ? btrfs_get_alloc_profile+0x106/0x230 [btrfs] [15.187601] btrfs_reserve_extent+0x131/0x260 [btrfs] [15.187601] btrfs_alloc_tree_block+0xb5/0x3b0 [btrfs] [15.187601] __btrfs_cow_block+0x138/0x600 [btrfs] [15.187601] btrfs_cow_block+0x10f/0x230 [btrfs] [15.187601] btrfs_search_slot+0x55f/0xbc0 [btrfs] [15.187601] ? lock_is_held_type+0xd7/0x130 [15.187601] btrfs_insert_empty_items+0x2d/0x60 [btrfs] [15.187601] btrfs_create_pending_block_groups+0x2b3/0x560 [btrfs] [15.187601] __btrfs_end_transaction+0x36/0x2a0 [btrfs] [15.192037] flush_space+0x374/0x600 [btrfs] [15.192037] ? find_held_lock+0x2b/0x80 [15.192037] ? btrfs_async_reclaim_data_space+0x49/0x180 [btrfs] [15.192037] ? lock_release+0x131/0x2b0 [15.192037] btrfs_async_reclaim_data_space+0x70/0x180 [btrfs] [15.192037] process_one_work+0x24c/0x5a0 [15.192037] worker_thread+0x4a/0x3d0
Fixes: a85f05e59bc1 ("btrfs: zoned: avoid chunk allocation if active block group has enough space") CC: stable@vger.kernel.org # 5.16+ Reviewed-by: Anand Jain anand.jain@oracle.com Signed-off-by: Johannes Thumshirn johannes.thumshirn@wdc.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/zoned.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
--- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -1927,18 +1927,19 @@ int btrfs_zone_finish(struct btrfs_block
bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, u64 flags) { + struct btrfs_fs_info *fs_info = fs_devices->fs_info; struct btrfs_device *device; bool ret = false;
- if (!btrfs_is_zoned(fs_devices->fs_info)) + if (!btrfs_is_zoned(fs_info)) return true;
/* Non-single profiles are not supported yet */ ASSERT((flags & BTRFS_BLOCK_GROUP_PROFILE_MASK) == 0);
/* Check if there is a device with active zones left */ - mutex_lock(&fs_devices->device_list_mutex); - list_for_each_entry(device, &fs_devices->devices, dev_list) { + mutex_lock(&fs_info->chunk_mutex); + list_for_each_entry(device, &fs_devices->alloc_list, dev_alloc_list) { struct btrfs_zoned_device_info *zinfo = device->zone_info;
if (!device->bdev) @@ -1950,7 +1951,7 @@ bool btrfs_can_activate_zone(struct btrf break; } } - mutex_unlock(&fs_devices->device_list_mutex); + mutex_unlock(&fs_info->chunk_mutex);
return ret; }
From: Qu Wenruo wqu@suse.com
commit bbac58698a55cc0a6f0c0d69a6dcd3f9f3134c11 upstream.
[BUG] There is a report that a btrfs has a bad super block num devices.
This makes btrfs to reject the fs completely.
BTRFS error (device sdd3): super_num_devices 3 mismatch with num_devices 2 found here BTRFS error (device sdd3): failed to read chunk tree: -22 BTRFS error (device sdd3): open_ctree failed
[CAUSE] During btrfs device removal, chunk tree and super block num devs are updated in two different transactions:
btrfs_rm_device() |- btrfs_rm_dev_item(device) | |- trans = btrfs_start_transaction() | | Now we got transaction X | | | |- btrfs_del_item() | | Now device item is removed from chunk tree | | | |- btrfs_commit_transaction() | Transaction X got committed, super num devs untouched, | but device item removed from chunk tree. | (AKA, super num devs is already incorrect) | |- cur_devices->num_devices--; |- cur_devices->total_devices--; |- btrfs_set_super_num_devices() All those operations are not in transaction X, thus it will only be written back to disk in next transaction.
So after the transaction X in btrfs_rm_dev_item() committed, but before transaction X+1 (which can be minutes away), a power loss happen, then we got the super num mismatch.
[FIX] Instead of starting and committing a transaction inside btrfs_rm_dev_item(), start a transaction in side btrfs_rm_device() and pass it to btrfs_rm_dev_item().
And only commit the transaction after everything is done.
Reported-by: Luca Béla Palkovics luca.bela.palkovics@gmail.com Link: https://lore.kernel.org/linux-btrfs/CA+8xDSpvdm_U0QLBAnrH=zqDq_cWCOH5TiV46CK... CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Anand Jain anand.jain@oracle.com Signed-off-by: Qu Wenruo wqu@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/volumes.c | 65 ++++++++++++++++++++++------------------------------- 1 file changed, 28 insertions(+), 37 deletions(-)
--- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1945,23 +1945,18 @@ static void update_dev_time(const char * path_put(&path); }
-static int btrfs_rm_dev_item(struct btrfs_device *device) +static int btrfs_rm_dev_item(struct btrfs_trans_handle *trans, + struct btrfs_device *device) { struct btrfs_root *root = device->fs_info->chunk_root; int ret; struct btrfs_path *path; struct btrfs_key key; - struct btrfs_trans_handle *trans;
path = btrfs_alloc_path(); if (!path) return -ENOMEM;
- trans = btrfs_start_transaction(root, 0); - if (IS_ERR(trans)) { - btrfs_free_path(path); - return PTR_ERR(trans); - } key.objectid = BTRFS_DEV_ITEMS_OBJECTID; key.type = BTRFS_DEV_ITEM_KEY; key.offset = device->devid; @@ -1972,21 +1967,12 @@ static int btrfs_rm_dev_item(struct btrf if (ret) { if (ret > 0) ret = -ENOENT; - btrfs_abort_transaction(trans, ret); - btrfs_end_transaction(trans); goto out; }
ret = btrfs_del_item(trans, root, path); - if (ret) { - btrfs_abort_transaction(trans, ret); - btrfs_end_transaction(trans); - } - out: btrfs_free_path(path); - if (!ret) - ret = btrfs_commit_transaction(trans); return ret; }
@@ -2127,6 +2113,7 @@ int btrfs_rm_device(struct btrfs_fs_info struct btrfs_dev_lookup_args *args, struct block_device **bdev, fmode_t *mode) { + struct btrfs_trans_handle *trans; struct btrfs_device *device; struct btrfs_fs_devices *cur_devices; struct btrfs_fs_devices *fs_devices = fs_info->fs_devices; @@ -2142,7 +2129,7 @@ int btrfs_rm_device(struct btrfs_fs_info
ret = btrfs_check_raid_min_devices(fs_info, num_devices - 1); if (ret) - goto out; + return ret;
device = btrfs_find_device(fs_info->fs_devices, args); if (!device) { @@ -2150,27 +2137,22 @@ int btrfs_rm_device(struct btrfs_fs_info ret = BTRFS_ERROR_DEV_MISSING_NOT_FOUND; else ret = -ENOENT; - goto out; + return ret; }
if (btrfs_pinned_by_swapfile(fs_info, device)) { btrfs_warn_in_rcu(fs_info, "cannot remove device %s (devid %llu) due to active swapfile", rcu_str_deref(device->name), device->devid); - ret = -ETXTBSY; - goto out; + return -ETXTBSY; }
- if (test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state)) { - ret = BTRFS_ERROR_DEV_TGT_REPLACE; - goto out; - } + if (test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state)) + return BTRFS_ERROR_DEV_TGT_REPLACE;
if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state) && - fs_info->fs_devices->rw_devices == 1) { - ret = BTRFS_ERROR_DEV_ONLY_WRITABLE; - goto out; - } + fs_info->fs_devices->rw_devices == 1) + return BTRFS_ERROR_DEV_ONLY_WRITABLE;
if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) { mutex_lock(&fs_info->chunk_mutex); @@ -2183,14 +2165,22 @@ int btrfs_rm_device(struct btrfs_fs_info if (ret) goto error_undo;
- /* - * TODO: the superblock still includes this device in its num_devices - * counter although write_all_supers() is not locked out. This - * could give a filesystem state which requires a degraded mount. - */ - ret = btrfs_rm_dev_item(device); - if (ret) + trans = btrfs_start_transaction(fs_info->chunk_root, 0); + if (IS_ERR(trans)) { + ret = PTR_ERR(trans); goto error_undo; + } + + ret = btrfs_rm_dev_item(trans, device); + if (ret) { + /* Any error in dev item removal is critical */ + btrfs_crit(fs_info, + "failed to remove device item for devid %llu: %d", + device->devid, ret); + btrfs_abort_transaction(trans, ret); + btrfs_end_transaction(trans); + return ret; + }
clear_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state); btrfs_scrub_cancel_dev(device); @@ -2273,7 +2263,8 @@ int btrfs_rm_device(struct btrfs_fs_info free_fs_devices(cur_devices); }
-out: + ret = btrfs_commit_transaction(trans); + return ret;
error_undo: @@ -2284,7 +2275,7 @@ error_undo: device->fs_devices->rw_devices++; mutex_unlock(&fs_info->chunk_mutex); } - goto out; + return ret; }
void btrfs_rm_dev_replace_remove_srcdev(struct btrfs_device *srcdev)
From: Qu Wenruo wqu@suse.com
commit 75a36a7d3ea904cef2e5b56af0c58cc60dcf947a upstream.
[BUG] There is a report that autodefrag is defragging single sector, which is completely waste of IO, and no help for defragging:
btrfs-cleaner-808 defrag_one_locked_range: root=256 ino=651122 start=0 len=4096
[CAUSE] In defrag_collect_targets(), we check if the current range (A) can be merged with next one (B).
If mergeable, we will add range A into target for defrag.
However there is a catch for autodefrag, when checking mergeability against range B, we intentionally pass 0 as @newer_than, hoping to get a higher chance to merge with the next extent.
But in the next iteration, range B will looked up by defrag_lookup_extent(), with non-zero @newer_than.
And if range B is not really newer, it will rejected directly, causing only range A being defragged, while we expect to defrag both range A and B.
[FIX] Since the root cause is the difference in check condition of defrag_check_next_extent() and defrag_collect_targets(), we fix it by:
1. Pass @newer_than to defrag_check_next_extent() 2. Pass @extent_thresh to defrag_check_next_extent()
This makes the check between defrag_collect_targets() and defrag_check_next_extent() more consistent.
While there is still some minor difference, the remaining checks are focus on runtime flags like writeback/delalloc, which are mostly transient and safe to be checked only in defrag_collect_targets().
Link: https://github.com/btrfs/linux/issues/423#issuecomment-1066981856 CC: stable@vger.kernel.org # 5.16+ Reviewed-by: Filipe Manana fdmanana@suse.com Signed-off-by: Qu Wenruo wqu@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/ioctl.c | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-)
--- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -1215,7 +1215,7 @@ static u32 get_extent_max_capacity(const }
static bool defrag_check_next_extent(struct inode *inode, struct extent_map *em, - bool locked) + u32 extent_thresh, u64 newer_than, bool locked) { struct extent_map *next; bool ret = false; @@ -1225,11 +1225,12 @@ static bool defrag_check_next_extent(str return false;
/* - * We want to check if the next extent can be merged with the current - * one, which can be an extent created in a past generation, so we pass - * a minimum generation of 0 to defrag_lookup_extent(). + * Here we need to pass @newer_then when checking the next extent, or + * we will hit a case we mark current extent for defrag, but the next + * one will not be a target. + * This will just cause extra IO without really reducing the fragments. */ - next = defrag_lookup_extent(inode, em->start + em->len, 0, locked); + next = defrag_lookup_extent(inode, em->start + em->len, newer_than, locked); /* No more em or hole */ if (!next || next->block_start >= EXTENT_MAP_LAST_BYTE) goto out; @@ -1241,6 +1242,13 @@ static bool defrag_check_next_extent(str */ if (next->len >= get_extent_max_capacity(em)) goto out; + /* Skip older extent */ + if (next->generation < newer_than) + goto out; + /* Also check extent size */ + if (next->len >= extent_thresh) + goto out; + ret = true; out: free_extent_map(next); @@ -1446,7 +1454,7 @@ static int defrag_collect_targets(struct goto next;
next_mergeable = defrag_check_next_extent(&inode->vfs_inode, em, - locked); + extent_thresh, newer_than, locked); if (!next_mergeable) { struct defrag_target_range *last;
From: Kaiwen Hu kevinhu@synology.com
commit 60021bd754c6ca0addc6817994f20290a321d8d6 upstream.
A subvolume with an active swapfile must not be deleted otherwise it would not be possible to deactivate it.
After the subvolume is deleted, we cannot swapoff the swapfile in this deleted subvolume because the path is unreachable. The swapfile is still active and holding references, the filesystem cannot be unmounted.
The test looks like this:
mkfs.btrfs -f $dev > /dev/null mount $dev $mnt
btrfs sub create $mnt/subvol touch $mnt/subvol/swapfile chmod 600 $mnt/subvol/swapfile chattr +C $mnt/subvol/swapfile dd if=/dev/zero of=$mnt/subvol/swapfile bs=1K count=4096 mkswap $mnt/subvol/swapfile swapon $mnt/subvol/swapfile
btrfs sub delete $mnt/subvol swapoff $mnt/subvol/swapfile # failed: No such file or directory swapoff --all
unmount $mnt # target is busy.
To prevent above issue, we simply check that whether the subvolume contains any active swapfile, and stop the deleting process. This behavior is like snapshot ioctl dealing with a swapfile.
CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Robbie Ko robbieko@synology.com Reviewed-by: Qu Wenruo wqu@suse.com Reviewed-by: Filipe Manana fdmanana@suse.com Signed-off-by: Kaiwen Hu kevinhu@synology.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/inode.c | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-)
--- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4466,6 +4466,13 @@ int btrfs_delete_subvolume(struct inode dest->root_key.objectid); return -EPERM; } + if (atomic_read(&dest->nr_swapfiles)) { + spin_unlock(&dest->root_item_lock); + btrfs_warn(fs_info, + "attempt to delete subvolume %llu with active swapfile", + root->root_key.objectid); + return -EPERM; + } root_flags = btrfs_root_flags(&dest->root_item); btrfs_set_root_flags(&dest->root_item, root_flags | BTRFS_ROOT_SUBVOL_DEAD); @@ -10424,8 +10431,23 @@ static int btrfs_swap_activate(struct sw * set. We use this counter to prevent snapshots. We must increment it * before walking the extents because we don't want a concurrent * snapshot to run after we've already checked the extents. - */ + * + * It is possible that subvolume is marked for deletion but still not + * removed yet. To prevent this race, we check the root status before + * activating the swapfile. + */ + spin_lock(&root->root_item_lock); + if (btrfs_root_dead(root)) { + spin_unlock(&root->root_item_lock); + + btrfs_exclop_finish(fs_info); + btrfs_warn(fs_info, + "cannot activate swapfile because subvolume %llu is being deleted", + root->root_key.objectid); + return -EPERM; + } atomic_inc(&root->nr_swapfiles); + spin_unlock(&root->root_item_lock);
isize = ALIGN_DOWN(inode->i_size, fs_info->sectorsize);
From: Vinod Koul vkoul@kernel.org
commit 409543cec01a84610029d6440c480c3fdd7214fb upstream.
Commit b470e10eb43f ("spi: core: add dma_map_dev for dma device") added dma_map_dev for _spi_map_msg() but missed to add for unmap routine, __spi_unmap_msg(), so add it now.
Fixes: b470e10eb43f ("spi: core: add dma_map_dev for dma device") Cc: stable@vger.kernel.org # v5.14+ Signed-off-by: Vinod Koul vkoul@kernel.org Link: https://lore.kernel.org/r/20220406132238.1029249-1-vkoul@kernel.org Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/spi/spi.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/spi/spi.c +++ b/drivers/spi/spi.c @@ -1149,11 +1149,15 @@ static int __spi_unmap_msg(struct spi_co
if (ctlr->dma_tx) tx_dev = ctlr->dma_tx->device->dev; + else if (ctlr->dma_map_dev) + tx_dev = ctlr->dma_map_dev; else tx_dev = ctlr->dev.parent;
if (ctlr->dma_rx) rx_dev = ctlr->dma_rx->device->dev; + else if (ctlr->dma_map_dev) + rx_dev = ctlr->dma_map_dev; else rx_dev = ctlr->dev.parent;
From: Paulo Alcantara pc@cjr.nz
commit fb39d30e227233498c8debe6a9fe3e7cf575c85f upstream.
Do not reuse existing sessions and tcons in DFS failover as it might connect to different servers and shares.
Signed-off-by: Paulo Alcantara (SUSE) pc@cjr.nz Cc: stable@vger.kernel.org Reviewed-by: Enzo Matsumiya ematsumiya@suse.de Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/cifs/connect.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)
--- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -453,9 +453,7 @@ static int reconnect_target_unlocked(str return rc; }
-static int -reconnect_dfs_server(struct TCP_Server_Info *server, - bool mark_smb_session) +static int reconnect_dfs_server(struct TCP_Server_Info *server) { int rc = 0; const char *refpath = server->current_fullpath + 1; @@ -479,7 +477,12 @@ reconnect_dfs_server(struct TCP_Server_I if (!cifs_tcp_ses_needs_reconnect(server, num_targets)) return 0;
- cifs_mark_tcp_ses_conns_for_reconnect(server, mark_smb_session); + /* + * Unconditionally mark all sessions & tcons for reconnect as we might be connecting to a + * different server or share during failover. It could be improved by adding some logic to + * only do that in case it connects to a different server or share, though. + */ + cifs_mark_tcp_ses_conns_for_reconnect(server, true);
cifs_abort_connection(server);
@@ -537,7 +540,7 @@ int cifs_reconnect(struct TCP_Server_Inf } spin_unlock(&cifs_tcp_ses_lock);
- return reconnect_dfs_server(server, mark_smb_session); + return reconnect_dfs_server(server); } #else int cifs_reconnect(struct TCP_Server_Info *server, bool mark_smb_session)
From: Manish Chopra manishc@marvell.com
commit 20921c0c86092b4082c91bd7c88305da74e5520b upstream.
To fix a coverity complain, commit d5ac07dfbd2b ("qed: Initialize debug string array") removed "sw-platform" (one of the common global parameters) from the dump as this was used in the dump with an uninitialized string, however it did not reduce the number of common global parameters which caused the incorrect (unable to parse) register dump
this patch fixes it with reducing NUM_COMMON_GLOBAL_PARAMS bye one.
Cc: stable@vger.kernel.org Cc: Tim Gardner tim.gardner@canonical.com Cc: "David S. Miller" davem@davemloft.net Fixes: d5ac07dfbd2b ("qed: Initialize debug string array") Signed-off-by: Prabhakar Kushwaha pkushwaha@marvell.com Signed-off-by: Alok Prasad palok@marvell.com Signed-off-by: Ariel Elior aelior@marvell.com Signed-off-by: Manish Chopra manishc@marvell.com Reviewed-by: Tim Gardner tim.gardner@canonical.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/qlogic/qed/qed_debug.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/qlogic/qed/qed_debug.c +++ b/drivers/net/ethernet/qlogic/qed/qed_debug.c @@ -489,7 +489,7 @@ struct split_type_defs {
#define STATIC_DEBUG_LINE_DWORDS 9
-#define NUM_COMMON_GLOBAL_PARAMS 11 +#define NUM_COMMON_GLOBAL_PARAMS 10
#define MAX_RECURSION_DEPTH 10
From: Guo Ren guoren@linux.alibaba.com
commit 31a099dbd91e69fcab55eef4be15ed7a8c984918 upstream.
These patch_text implementations are using stop_machine_cpuslocked infrastructure with atomic cpu_count. The original idea: When the master CPU patch_text, the others should wait for it. But current implementation is using the first CPU as master, which couldn't guarantee the remaining CPUs are waiting. This patch changes the last CPU as the master to solve the potential risk.
Fixes: ae16480785de ("arm64: introduce interfaces to hotpatch kernel and module code") Signed-off-by: Guo Ren guoren@linux.alibaba.com Signed-off-by: Guo Ren guoren@kernel.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Reviewed-by: Masami Hiramatsu mhiramat@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220407073323.743224-2-guoren@kernel.org Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/patching.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm64/kernel/patching.c +++ b/arch/arm64/kernel/patching.c @@ -117,8 +117,8 @@ static int __kprobes aarch64_insn_patch_ int i, ret = 0; struct aarch64_insn_patch *pp = arg;
- /* The first CPU becomes master */ - if (atomic_inc_return(&pp->cpu_count) == 1) { + /* The last CPU becomes master */ + if (atomic_inc_return(&pp->cpu_count) == num_online_cpus()) { for (i = 0; ret == 0 && i < pp->insn_cnt; i++) ret = aarch64_insn_patch_text_nosync(pp->text_addrs[i], pp->new_insns[i]);
From: Douglas Miller doug.miller@cornelisnetworks.com
commit 2bbac98d0930e8161b1957dc0ec99de39ade1b3c upstream.
Under certain conditions, such as MPI_Abort, the hfi1 cleanup code may represent the last reference held on the task mm. hfi1_mmu_rb_unregister() then drops the last reference and the mm is freed before the final use in hfi1_release_user_pages(). A new task may allocate the mm structure while it is still being used, resulting in problems. One manifestation is corruption of the mmap_sem counter leading to a hang in down_write(). Another is corruption of an mm struct that is in use by another task.
Fixes: 3d2a9d642512 ("IB/hfi1: Ensure correct mm is used at all times") Link: https://lore.kernel.org/r/20220408133523.122165.72975.stgit@awfm-01.cornelis... Cc: stable@vger.kernel.org Signed-off-by: Douglas Miller doug.miller@cornelisnetworks.com Signed-off-by: Dennis Dalessandro dennis.dalessandro@cornelisnetworks.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/infiniband/hw/hfi1/mmu_rb.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/infiniband/hw/hfi1/mmu_rb.c +++ b/drivers/infiniband/hw/hfi1/mmu_rb.c @@ -80,6 +80,9 @@ void hfi1_mmu_rb_unregister(struct mmu_r unsigned long flags; struct list_head del_list;
+ /* Prevent freeing of mm until we are completely finished. */ + mmgrab(handler->mn.mm); + /* Unregister first so we don't get any more notifications. */ mmu_notifier_unregister(&handler->mn, handler->mn.mm);
@@ -102,6 +105,9 @@ void hfi1_mmu_rb_unregister(struct mmu_r
do_remove(handler, &del_list);
+ /* Now the mm may be freed. */ + mmdrop(handler->mn.mm); + kfree(handler); }
From: Xiaomeng Tong xiam0nd.tong@gmail.com
commit ae4d37b5df749926891583d42a6801b5da11e3c1 upstream.
The bug is here: idr_remove(&connection->peer_devices, vnr);
If the previous for_each_connection() don't exit early (no goto hit inside the loop), the iterator 'connection' after the loop will be a bogus pointer to an invalid structure object containing the HEAD (&resource->connections). As a result, the use of 'connection' above will lead to a invalid memory access (including a possible invalid free as idr_remove could call free_layer).
The original intention should have been to remove all peer_devices, but the following lines have already done the work. So just remove this line and the unneeded label, to fix this bug.
Cc: stable@vger.kernel.org Fixes: c06ece6ba6f1b ("drbd: Turn connection->volumes into connection->peer_devices") Signed-off-by: Xiaomeng Tong xiam0nd.tong@gmail.com Reviewed-by: Christoph Böhmwalder christoph.boehmwalder@linbit.com Reviewed-by: Lars Ellenberg lars.ellenberg@linbit.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/block/drbd/drbd_main.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
--- a/drivers/block/drbd/drbd_main.c +++ b/drivers/block/drbd/drbd_main.c @@ -2793,12 +2793,12 @@ enum drbd_ret_code drbd_create_device(st
if (init_submitter(device)) { err = ERR_NOMEM; - goto out_idr_remove_vol; + goto out_idr_remove_from_resource; }
err = add_disk(disk); if (err) - goto out_idr_remove_vol; + goto out_idr_remove_from_resource;
/* inherit the connection state */ device->state.conn = first_connection(resource)->cstate; @@ -2812,8 +2812,6 @@ enum drbd_ret_code drbd_create_device(st drbd_debugfs_device_add(device); return NO_ERROR;
-out_idr_remove_vol: - idr_remove(&connection->peer_devices, vnr); out_idr_remove_from_resource: for_each_connection(connection, resource) { peer_device = idr_remove(&connection->peer_devices, vnr);
From: Shreeya Patel shreeya.patel@collabora.com
commit 5467801f1fcbdc46bc7298a84dbf3ca1ff2a7320 upstream.
GPIO chip irq members are exposed before they could be completely initialized and this leads to race conditions.
One such issue was observed for the gc->irq.domain variable which was accessed through the I2C interface in gpiochip_to_irq() before it could be initialized by gpiochip_add_irqchip(). This resulted in Kernel NULL pointer dereference.
Following are the logs for reference :-
kernel: Call Trace: kernel: gpiod_to_irq+0x53/0x70 kernel: acpi_dev_gpio_irq_get_by+0x113/0x1f0 kernel: i2c_acpi_get_irq+0xc0/0xd0 kernel: i2c_device_probe+0x28a/0x2a0 kernel: really_probe+0xf2/0x460 kernel: RIP: 0010:gpiochip_to_irq+0x47/0xc0
To avoid such scenarios, restrict usage of GPIO chip irq members before they are completely initialized.
Signed-off-by: Shreeya Patel shreeya.patel@collabora.com Cc: stable@vger.kernel.org Reviewed-by: Andy Shevchenko andy.shevchenko@gmail.com Reviewed-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Bartosz Golaszewski brgl@bgdev.pl Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpio/gpiolib.c | 19 +++++++++++++++++++ include/linux/gpio/driver.h | 9 +++++++++ 2 files changed, 28 insertions(+)
--- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -1404,6 +1404,16 @@ static int gpiochip_to_irq(struct gpio_c { struct irq_domain *domain = gc->irq.domain;
+#ifdef CONFIG_GPIOLIB_IRQCHIP + /* + * Avoid race condition with other code, which tries to lookup + * an IRQ before the irqchip has been properly registered, + * i.e. while gpiochip is still being brought up. + */ + if (!gc->irq.initialized) + return -EPROBE_DEFER; +#endif + if (!gpiochip_irqchip_irq_valid(gc, offset)) return -ENXIO;
@@ -1593,6 +1603,15 @@ static int gpiochip_add_irqchip(struct g
acpi_gpiochip_request_interrupts(gc);
+ /* + * Using barrier() here to prevent compiler from reordering + * gc->irq.initialized before initialization of above + * GPIO chip irq members. + */ + barrier(); + + gc->irq.initialized = true; + return 0; }
--- a/include/linux/gpio/driver.h +++ b/include/linux/gpio/driver.h @@ -219,6 +219,15 @@ struct gpio_irq_chip { bool per_parent_data;
/** + * @initialized: + * + * Flag to track GPIO chip irq member's initialization. + * This flag will make sure GPIO chip irq members are not used + * before they are initialized. + */ + bool initialized; + + /** * @init_hw: optional routine to initialize hardware before * an IRQ chip will be added. This is quite useful when * a particular driver wants to clear IRQ related registers
From: Reto Buerki reet@codelabs.ch
commit 59b18a1e65b7a2134814106d0860010e10babe18 upstream.
The x86 MSI message data is 32 bits in total and is either in compatibility or remappable format, see Intel Virtualization Technology for Directed I/O, section 5.1.2.
Fixes: 6285aa50736 ("x86/msi: Provide msi message shadow structs") Co-developed-by: Adrian-Ken Rueegsegger ken@codelabs.ch Signed-off-by: Adrian-Ken Rueegsegger ken@codelabs.ch Signed-off-by: Reto Buerki reet@codelabs.ch Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220407110647.67372-1-reet@codelabs.ch Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/msi.h | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-)
--- a/arch/x86/include/asm/msi.h +++ b/arch/x86/include/asm/msi.h @@ -12,14 +12,17 @@ int pci_msi_prepare(struct irq_domain *d /* Structs and defines for the X86 specific MSI message format */
typedef struct x86_msi_data { - u32 vector : 8, - delivery_mode : 3, - dest_mode_logical : 1, - reserved : 2, - active_low : 1, - is_level : 1; - - u32 dmar_subhandle; + union { + struct { + u32 vector : 8, + delivery_mode : 3, + dest_mode_logical : 1, + reserved : 2, + active_low : 1, + is_level : 1; + }; + u32 dmar_subhandle; + }; } __attribute__ ((packed)) arch_msi_msg_data_t; #define arch_msi_msg_data x86_msi_data
From: Dave Hansen dave.hansen@linux.intel.com
commit d39268ad24c0fd0665d0c5cf55a7c1a0ebf94766 upstream.
0day reported a regression on a microbenchmark which is intended to stress the TLB flushing path:
https://lore.kernel.org/all/20220317090415.GE735@xsang-OptiPlex-9020/
It pointed at a commit from Nadav which intended to remove retpoline overhead in the TLB flushing path by taking the 'cond'-ition in on_each_cpu_cond_mask(), pre-calculating it, and incorporating it into 'cpumask'. That allowed the code to use a bunch of earlier direct calls instead of later indirect calls that need a retpoline.
But, in practice, threads can go idle (and into lazy TLB mode where they don't need to flush their TLB) between the early and late calls. It works in this direction and not in the other because TLB-flushing threads tend to hold mmap_lock for write. Contention on that lock causes threads to _go_ idle right in this early/late window.
There was not any performance data in the original commit specific to the retpoline overhead. I did a few tests on a system with retpolines:
https://lore.kernel.org/all/dd8be93c-ded6-b962-50d4-96b1c3afb2b7@intel.com/
which showed a possible small win. But, that small win pales in comparison with the bigger loss induced on non-retpoline systems.
Revert the patch that removed the retpolines. This was not a clean revert, but it was self-contained enough not to be too painful.
Fixes: 6035152d8eeb ("x86/mm/tlb: Open-code on_each_cpu_cond_mask() for tlb_is_not_lazy()") Reported-by: kernel test robot oliver.sang@intel.com Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Signed-off-by: Borislav Petkov bp@suse.de Acked-by: Nadav Amit namit@vmware.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/164874672286.389.7021457716635788197.tip-bot2@tip-... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/mm/tlb.c | 37 +++++-------------------------------- 1 file changed, 5 insertions(+), 32 deletions(-)
--- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -854,13 +854,11 @@ done: nr_invalidate); }
-static bool tlb_is_not_lazy(int cpu) +static bool tlb_is_not_lazy(int cpu, void *data) { return !per_cpu(cpu_tlbstate_shared.is_lazy, cpu); }
-static DEFINE_PER_CPU(cpumask_t, flush_tlb_mask); - DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state_shared, cpu_tlbstate_shared); EXPORT_PER_CPU_SYMBOL(cpu_tlbstate_shared);
@@ -889,36 +887,11 @@ STATIC_NOPV void native_flush_tlb_multi( * up on the new contents of what used to be page tables, while * doing a speculative memory access. */ - if (info->freed_tables) { + if (info->freed_tables) on_each_cpu_mask(cpumask, flush_tlb_func, (void *)info, true); - } else { - /* - * Although we could have used on_each_cpu_cond_mask(), - * open-coding it has performance advantages, as it eliminates - * the need for indirect calls or retpolines. In addition, it - * allows to use a designated cpumask for evaluating the - * condition, instead of allocating one. - * - * This code works under the assumption that there are no nested - * TLB flushes, an assumption that is already made in - * flush_tlb_mm_range(). - * - * cond_cpumask is logically a stack-local variable, but it is - * more efficient to have it off the stack and not to allocate - * it on demand. Preemption is disabled and this code is - * non-reentrant. - */ - struct cpumask *cond_cpumask = this_cpu_ptr(&flush_tlb_mask); - int cpu; - - cpumask_clear(cond_cpumask); - - for_each_cpu(cpu, cpumask) { - if (tlb_is_not_lazy(cpu)) - __cpumask_set_cpu(cpu, cond_cpumask); - } - on_each_cpu_mask(cond_cpumask, flush_tlb_func, (void *)info, true); - } + else + on_each_cpu_cond_mask(tlb_is_not_lazy, flush_tlb_func, + (void *)info, 1, cpumask); }
void flush_tlb_multi(const struct cpumask *cpumask,
From: Kan Liang kan.liang@linux.intel.com
commit 4a263bf331c512849062805ef1b4ac40301a9829 upstream.
The INST_RETIRED.PREC_DIST event (0x0100) doesn't count on SPR. perf stat -e cpu/event=0xc0,umask=0x0/,cpu/event=0x0,umask=0x1/ -C0
Performance counter stats for 'CPU(s) 0':
607,246 cpu/event=0xc0,umask=0x0/ 0 cpu/event=0x0,umask=0x1/
The encoding for INST_RETIRED.PREC_DIST is pseudo-encoding, which doesn't work on the generic counters. However, current perf extends its mask to the generic counters.
The pseudo event-code for a fixed counter must be 0x00. Check and avoid extending the mask for the fixed counter event which using the pseudo-encoding, e.g., ref-cycles and PREC_DIST event.
With the patch, perf stat -e cpu/event=0xc0,umask=0x0/,cpu/event=0x0,umask=0x1/ -C0
Performance counter stats for 'CPU(s) 0':
583,184 cpu/event=0xc0,umask=0x0/ 583,048 cpu/event=0x0,umask=0x1/
Fixes: 2de71ee153ef ("perf/x86/intel: Fix ICL/SPR INST_RETIRED.PREC_DIST encodings") Signed-off-by: Kan Liang kan.liang@linux.intel.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1648482543-14923-1-git-send-email-kan.liang@linux.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/events/intel/core.c | 6 +++++- arch/x86/include/asm/perf_event.h | 5 +++++ 2 files changed, 10 insertions(+), 1 deletion(-)
--- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -5515,7 +5515,11 @@ static void intel_pmu_check_event_constr /* Disabled fixed counters which are not in CPUID */ c->idxmsk64 &= intel_ctrl;
- if (c->idxmsk64 != INTEL_PMC_MSK_FIXED_REF_CYCLES) + /* + * Don't extend the pseudo-encoding to the + * generic counters + */ + if (!use_fixed_pseudo_encoding(c->code)) c->idxmsk64 |= (1ULL << num_counters) - 1; } c->idxmsk64 &= --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -241,6 +241,11 @@ struct x86_pmu_capability { #define INTEL_PMC_IDX_FIXED_SLOTS (INTEL_PMC_IDX_FIXED + 3) #define INTEL_PMC_MSK_FIXED_SLOTS (1ULL << INTEL_PMC_IDX_FIXED_SLOTS)
+static inline bool use_fixed_pseudo_encoding(u64 code) +{ + return !(code & 0xff); +} + /* * We model BTS tracing as another fixed-mode PMC. *
From: Christian Lamparter chunkeey@gmail.com
commit 7aa8104a554713b685db729e66511b93d989dd6a upstream.
the driver uses libata's "tag" values from in various arrays. Since the mentioned patch bumped the ATA_TAG_INTERNAL to 32, the value of the SATA_DWC_QCMD_MAX needs to account for that.
Otherwise ATA_TAG_INTERNAL usage cause similar crashes like this as reported by Tice Rex on the OpenWrt Forum and reproduced (with symbols) here:
| BUG: Kernel NULL pointer dereference at 0x00000000 | Faulting instruction address: 0xc03ed4b8 | Oops: Kernel access of bad area, sig: 11 [#1] | BE PAGE_SIZE=4K PowerPC 44x Platform | CPU: 0 PID: 362 Comm: scsi_eh_1 Not tainted 5.4.163 #0 | NIP: c03ed4b8 LR: c03d27e8 CTR: c03ed36c | REGS: cfa59950 TRAP: 0300 Not tainted (5.4.163) | MSR: 00021000 <CE,ME> CR: 42000222 XER: 00000000 | DEAR: 00000000 ESR: 00000000 | GPR00: c03d27e8 cfa59a08 cfa55fe0 00000000 0fa46bc0 [...] | [..] | NIP [c03ed4b8] sata_dwc_qc_issue+0x14c/0x254 | LR [c03d27e8] ata_qc_issue+0x1c8/0x2dc | Call Trace: | [cfa59a08] [c003f4e0] __cancel_work_timer+0x124/0x194 (unreliable) | [cfa59a78] [c03d27e8] ata_qc_issue+0x1c8/0x2dc | [cfa59a98] [c03d2b3c] ata_exec_internal_sg+0x240/0x524 | [cfa59b08] [c03d2e98] ata_exec_internal+0x78/0xe0 | [cfa59b58] [c03d30fc] ata_read_log_page.part.38+0x1dc/0x204 | [cfa59bc8] [c03d324c] ata_identify_page_supported+0x68/0x130 | [...]
This is because sata_dwc_dma_xfer_complete() NULLs the dma_pending's next neighbour "chan" (a *dma_chan struct) in this '32' case right here (line ~735):
hsdevp->dma_pending[tag] = SATA_DWC_DMA_PENDING_NONE;
Then the next time, a dma gets issued; dma_dwc_xfer_setup() passes the NULL'd hsdevp->chan to the dmaengine_slave_config() which then causes the crash.
With this patch, SATA_DWC_QCMD_MAX is now set to ATA_MAX_QUEUE + 1. This avoids the OOB. But please note, there was a worthwhile discussion on what ATA_TAG_INTERNAL and ATA_MAX_QUEUE is. And why there should not be a "fake" 33 command-long queue size.
Ideally, the dw driver should account for the ATA_TAG_INTERNAL. In Damien Le Moal's words: "... having looked at the driver, it is a bigger change than just faking a 33rd "tag" that is in fact not a command tag at all."
Fixes: 28361c403683c ("libata: add extra internal command") Cc: stable@kernel.org # 4.18+ BugLink: https://github.com/openwrt/openwrt/issues/9505 Signed-off-by: Christian Lamparter chunkeey@gmail.com Signed-off-by: Damien Le Moal damien.lemoal@opensource.wdc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/ata/sata_dwc_460ex.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/ata/sata_dwc_460ex.c +++ b/drivers/ata/sata_dwc_460ex.c @@ -137,7 +137,11 @@ struct sata_dwc_device { #endif };
-#define SATA_DWC_QCMD_MAX 32 +/* + * Allow one extra special slot for commands and DMA management + * to account for libata internal commands. + */ +#define SATA_DWC_QCMD_MAX (ATA_MAX_QUEUE + 1)
struct sata_dwc_device_port { struct sata_dwc_device *hsdev;
From: Xiaomeng Tong xiam0nd.tong@gmail.com
commit 2012a9e279013933885983cbe0a5fe828052563b upstream.
The bug is here: return cluster;
The list iterator value 'cluster' will *always* be set and non-NULL by list_for_each_entry(), so it is incorrect to assume that the iterator value will be NULL if the list is empty or no element is found.
To fix the bug, return 'cluster' when found, otherwise return NULL.
Cc: stable@vger.kernel.org Fixes: 21bdbb7102ed ("perf: add qcom l2 cache perf events driver") Signed-off-by: Xiaomeng Tong xiam0nd.tong@gmail.com Link: https://lore.kernel.org/r/20220327055733.4070-1-xiam0nd.tong@gmail.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/perf/qcom_l2_pmu.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/perf/qcom_l2_pmu.c +++ b/drivers/perf/qcom_l2_pmu.c @@ -736,7 +736,7 @@ static struct cluster_pmu *l2_cache_asso { u64 mpidr; int cpu_cluster_id; - struct cluster_pmu *cluster = NULL; + struct cluster_pmu *cluster;
/* * This assumes that the cluster_id is in MPIDR[aff1] for @@ -758,10 +758,10 @@ static struct cluster_pmu *l2_cache_asso cluster->cluster_id); cpumask_set_cpu(cpu, &cluster->cluster_cpus); *per_cpu_ptr(l2cache_pmu->pmu_cluster, cpu) = cluster; - break; + return cluster; }
- return cluster; + return NULL; }
static int l2cache_pmu_online_cpu(unsigned int cpu, struct hlist_node *node)
From: Namhyung Kim namhyung@kernel.org
commit e3265a4386428d3d157d9565bb520aabff8b4bf0 upstream.
It was reported that some perf event setup can make fork failed on ARM64. It was the case of a group of mixed hw and sw events and it failed in perf_event_init_task() due to armpmu_event_init().
The ARM PMU code checks if all the events in a group belong to the same PMU except for software events. But it didn't set the event_caps of inherited events and no longer identify them as software events. Therefore the test failed in a child process.
A simple reproducer is:
$ perf stat -e '{cycles,cs,instructions}' perf bench sched messaging # Running 'sched/messaging' benchmark: perf: fork(): Invalid argument
The perf stat was fine but the perf bench failed in fork(). Let's inherit the event caps from the parent.
Signed-off-by: Namhyung Kim namhyung@kernel.org Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20220328200112.457740-1-namhyung@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/events/core.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -11640,6 +11640,9 @@ perf_event_alloc(struct perf_event_attr
event->state = PERF_EVENT_STATE_INACTIVE;
+ if (parent_event) + event->event_caps = parent_event->event_caps; + if (event->attr.sigtrap) atomic_set(&event->event_limit, 1);
From: Marc Zyngier maz@kernel.org
commit 0df6664531a12cdd8fc873f0cac0dcb40243d3e9 upstream.
It turns out that our polling of RWP is totally wrong when checking for it in the redistributors, as we test the *distributor* bit index, whereas it is a different bit number in the RDs... Oopsie boo.
This is embarassing. Not only because it is wrong, but also because it took *8 years* to notice the blunder...
Just fix the damn thing.
Fixes: 021f653791ad ("irqchip: gic-v3: Initial support for GICv3") Signed-off-by: Marc Zyngier maz@kernel.org Cc: stable@vger.kernel.org Reviewed-by: Andre Przywara andre.przywara@arm.com Reviewed-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Link: https://lore.kernel.org/r/20220315165034.794482-2-maz@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/irqchip/irq-gic-v3.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -206,11 +206,11 @@ static inline void __iomem *gic_dist_bas } }
-static void gic_do_wait_for_rwp(void __iomem *base) +static void gic_do_wait_for_rwp(void __iomem *base, u32 bit) { u32 count = 1000000; /* 1s! */
- while (readl_relaxed(base + GICD_CTLR) & GICD_CTLR_RWP) { + while (readl_relaxed(base + GICD_CTLR) & bit) { count--; if (!count) { pr_err_ratelimited("RWP timeout, gone fishing\n"); @@ -224,13 +224,13 @@ static void gic_do_wait_for_rwp(void __i /* Wait for completion of a distributor change */ static void gic_dist_wait_for_rwp(void) { - gic_do_wait_for_rwp(gic_data.dist_base); + gic_do_wait_for_rwp(gic_data.dist_base, GICD_CTLR_RWP); }
/* Wait for completion of a redistributor change */ static void gic_redist_wait_for_rwp(void) { - gic_do_wait_for_rwp(gic_data_rdist_rd_base()); + gic_do_wait_for_rwp(gic_data_rdist_rd_base(), GICR_CTLR_RWP); }
#ifdef CONFIG_ARM64
From: Thomas Zimmermann tzimmermann@suse.de
commit 0f525289ff0ddeb380813bd81e0f9bdaaa1c9078 upstream.
OF framebuffers do not have an underlying device in the Linux device hierarchy. Do a regular unregister call instead of hot unplugging such a non-existing device. Fixes a NULL dereference. An example error message on ppc64le is shown below.
BUG: Kernel NULL pointer dereference on read at 0x00000060 Faulting instruction address: 0xc00000000080dfa4 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries [...] CPU: 2 PID: 139 Comm: systemd-udevd Not tainted 5.17.0-ae085d7f9365 #1 NIP: c00000000080dfa4 LR: c00000000080df9c CTR: c000000000797430 REGS: c000000004132fe0 TRAP: 0300 Not tainted (5.17.0-ae085d7f9365) MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 28228282 XER: 20000000 CFAR: c00000000000c80c DAR: 0000000000000060 DSISR: 40000000 IRQMASK: 0 GPR00: c00000000080df9c c000000004133280 c00000000169d200 0000000000000029 GPR04: 00000000ffffefff c000000004132f90 c000000004132f88 0000000000000000 GPR08: c0000000015658f8 c0000000015cd200 c0000000014f57d0 0000000048228283 GPR12: 0000000000000000 c00000003fffe300 0000000020000000 0000000000000000 GPR16: 0000000000000000 0000000113fc4a40 0000000000000005 0000000113fcfb80 GPR20: 000001000f7283b0 0000000000000000 c000000000e4a588 c000000000e4a5b0 GPR24: 0000000000000001 00000000000a0000 c008000000db0168 c0000000021f6ec0 GPR28: c0000000016d65a8 c000000004b36460 0000000000000000 c0000000016d64b0 NIP [c00000000080dfa4] do_remove_conflicting_framebuffers+0x184/0x1d0 [c000000004133280] [c00000000080df9c] do_remove_conflicting_framebuffers+0x17c/0x1d0 (unreliable) [c000000004133350] [c00000000080e4d0] remove_conflicting_framebuffers+0x60/0x150 [c0000000041333a0] [c00000000080e6f4] remove_conflicting_pci_framebuffers+0x134/0x1b0 [c000000004133450] [c008000000e70438] drm_aperture_remove_conflicting_pci_framebuffers+0x90/0x100 [drm] [c000000004133490] [c008000000da0ce4] bochs_pci_probe+0x6c/0xa64 [bochs] [...] [c000000004133db0] [c00000000002aaa0] system_call_exception+0x170/0x2d0 [c000000004133e10] [c00000000000c3cc] system_call_common+0xec/0x250
The bug [1] was introduced by commit 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal"). Most firmware framebuffers have an underlying platform device, which can be hot-unplugged before loading the native graphics driver. OF framebuffers do not (yet) have that device. Fix the code by unregistering the framebuffer as before without a hot unplug.
Tested with 5.17 on qemu ppc64le emulation.
Signed-off-by: Thomas Zimmermann tzimmermann@suse.de Fixes: 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal") Reported-by: Sudip Mukherjee sudipm.mukherjee@gmail.com Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch Reviewed-by: Javier Martinez Canillas javierm@redhat.com Tested-by: Sudip Mukherjee sudip.mukherjee@codethink.co.uk Cc: Zack Rusin zackr@vmware.com Cc: Javier Martinez Canillas javierm@redhat.com Cc: Hans de Goede hdegoede@redhat.com Cc: stable@vger.kernel.org # v5.11+ Cc: Helge Deller deller@gmx.de Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Sam Ravnborg sam@ravnborg.org Cc: Zheyu Ma zheyuma97@gmail.com Cc: Xiyu Yang xiyuyang19@fudan.edu.cn Cc: Zhen Lei thunder.leizhen@huawei.com Cc: Matthew Wilcox willy@infradead.org Cc: Alex Deucher alexander.deucher@amd.com Cc: Tetsuo Handa penguin-kernel@i-love.sakura.ne.jp Cc: Guenter Roeck linux@roeck-us.net Cc: linux-fbdev@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Link: https://lore.kernel.org/all/YkHXO6LGHAN0p1pq@debian/ # [1] Link: https://patchwork.freedesktop.org/patch/msgid/20220404194402.29974-1-tzimmer... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/video/fbdev/core/fbmem.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
--- a/drivers/video/fbdev/core/fbmem.c +++ b/drivers/video/fbdev/core/fbmem.c @@ -1583,7 +1583,14 @@ static void do_remove_conflicting_frameb * If it's not a platform device, at least print a warning. A * fix would add code to remove the device from the system. */ - if (dev_is_platform(device)) { + if (!device) { + /* TODO: Represent each OF framebuffer as its own + * device in the device hierarchy. For now, offb + * doesn't have such a device, so unregister the + * framebuffer as before without warning. + */ + do_unregister_framebuffer(registered_fb[i]); + } else if (dev_is_platform(device)) { registered_fb[i]->forced_out = true; platform_device_unregister(to_platform_device(device)); } else {
From: Shirish S shirish.s@amd.com
commit 4052287a75eb3fc0f487fcc5f768a38bede455c8 upstream.
[Why] comparing pwm bl values (coverted) with user brightness(converted) levels in commit_tail leads to continuous setting of backlight via dmub as they don't to match. This leads overdrive in queuing of commands to DMCU that sometimes lead to depending on load on DMCU fw:
"[drm:dc_dmub_srv_wait_idle] *ERROR* Error waiting for DMUB idle: status=3"
[How] Store last successfully set backlight value and compare with it instead of pwm reads which is not what we should compare with.
Signed-off-by: Shirish S shirish.s@amd.com Reviewed-by: Harry Wentland harry.wentland@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 7 ++++--- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 6 ++++++ 2 files changed, 10 insertions(+), 3 deletions(-)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -3951,7 +3951,7 @@ static u32 convert_brightness_to_user(co max - min); }
-static int amdgpu_dm_backlight_set_level(struct amdgpu_display_manager *dm, +static void amdgpu_dm_backlight_set_level(struct amdgpu_display_manager *dm, int bl_idx, u32 user_brightness) { @@ -3982,7 +3982,8 @@ static int amdgpu_dm_backlight_set_level DRM_DEBUG("DM: Failed to update backlight on eDP[%d]\n", bl_idx); }
- return rc ? 0 : 1; + if (rc) + dm->actual_brightness[bl_idx] = user_brightness; }
static int amdgpu_dm_backlight_update_status(struct backlight_device *bd) @@ -9914,7 +9915,7 @@ static void amdgpu_dm_atomic_commit_tail /* restore the backlight level */ for (i = 0; i < dm->num_of_edps; i++) { if (dm->backlight_dev[i] && - (amdgpu_dm_backlight_get_level(dm, i) != dm->brightness[i])) + (dm->actual_brightness[i] != dm->brightness[i])) amdgpu_dm_backlight_set_level(dm, i, dm->brightness[i]); } #endif --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h @@ -540,6 +540,12 @@ struct amdgpu_display_manager { * cached backlight values. */ u32 brightness[AMDGPU_DM_MAX_NUM_EDP]; + /** + * @actual_brightness: + * + * last successfully applied backlight values. + */ + u32 actual_brightness[AMDGPU_DM_MAX_NUM_EDP]; };
enum dsc_clock_force_state {
From: Daniel Mack daniel@zonque.org
commit d14eb80e27795b7b20060f7b151cdfe39722a813 upstream.
If the optional regulator lookup fails, reset the pointer to NULL. Other functions such as mipi_dbi_poweron_reset_conditional() only do a NULL pointer check and will otherwise dereference the error pointer.
Fixes: 5a04227326b04c15 ("drm/panel: Add ilitek ili9341 panel driver") Signed-off-by: Daniel Mack daniel@zonque.org Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch Link: https://patchwork.freedesktop.org/patch/msgid/20220317225537.826302-1-daniel... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/panel/panel-ilitek-ili9341.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/panel/panel-ilitek-ili9341.c +++ b/drivers/gpu/drm/panel/panel-ilitek-ili9341.c @@ -612,8 +612,10 @@ static int ili9341_dbi_probe(struct spi_ int ret;
vcc = devm_regulator_get_optional(dev, "vcc"); - if (IS_ERR(vcc)) + if (IS_ERR(vcc)) { dev_err(dev, "get optional vcc failed\n"); + vcc = NULL; + }
dbidev = devm_drm_dev_alloc(dev, &ili9341_dbi_driver, struct mipi_dbi_dev, drm);
From: CHANDAN VURDIGERE NATARAJ chandan.vurdigerenataraj@amd.com
commit ca1198849ab0e7af5efb392ef6baf1138f6fc086 upstream.
[Why] Below general protection fault observed when WebGL Aquarium is run for longer duration. If drm debug logs are enabled and set to 0x1f then the issue is observed within 10 minutes of run.
[ 100.717056] general protection fault, probably for non-canonical address 0x2d33302d32323032: 0000 [#1] PREEMPT SMP NOPTI [ 100.727921] CPU: 3 PID: 1906 Comm: DrmThread Tainted: G W 5.15.30 #12 d726c6a2d6ebe5cf9223931cbca6892f916fe18b [ 100.754419] RIP: 0010:CalculateSwathWidth+0x1f7/0x44f [ 100.767109] Code: 00 00 00 f2 42 0f 11 04 f0 48 8b 85 88 00 00 00 f2 42 0f 10 04 f0 48 8b 85 98 00 00 00 f2 42 0f 11 04 f0 48 8b 45 10 0f 57 c0 <f3> 42 0f 2a 04 b0 0f 57 c9 f3 43 0f 2a 0c b4 e8 8c e2 f3 ff 48 8b [ 100.781269] RSP: 0018:ffffa9230079eeb0 EFLAGS: 00010246 [ 100.812528] RAX: 2d33302d32323032 RBX: 0000000000000500 RCX: 0000000000000000 [ 100.819656] RDX: 0000000000000001 RSI: ffff99deb712c49c RDI: 0000000000000000 [ 100.826781] RBP: ffffa9230079ef50 R08: ffff99deb712460c R09: ffff99deb712462c [ 100.833907] R10: ffff99deb7124940 R11: ffff99deb7124d70 R12: ffff99deb712ae44 [ 100.841033] R13: 0000000000000001 R14: 0000000000000000 R15: ffffa9230079f0a0 [ 100.848159] FS: 00007af121212640(0000) GS:ffff99deba780000(0000) knlGS:0000000000000000 [ 100.856240] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 100.861980] CR2: 0000209000fe1000 CR3: 000000011b18c000 CR4: 0000000000350ee0 [ 100.869106] Call Trace: [ 100.871555] <TASK> [ 100.873655] ? asm_sysvec_reschedule_ipi+0x12/0x20 [ 100.878449] CalculateSwathAndDETConfiguration+0x1a3/0x6dd [ 100.883937] dml31_ModeSupportAndSystemConfigurationFull+0x2ce4/0x76da [ 100.890467] ? kallsyms_lookup_buildid+0xc8/0x163 [ 100.895173] ? kallsyms_lookup_buildid+0xc8/0x163 [ 100.899874] ? __sprint_symbol+0x80/0x135 [ 100.903883] ? dm_update_plane_state+0x3f9/0x4d2 [ 100.908500] ? symbol_string+0xb7/0xde [ 100.912250] ? number+0x145/0x29b [ 100.915566] ? vsnprintf+0x341/0x5ff [ 100.919141] ? desc_read_finalized_seq+0x39/0x87 [ 100.923755] ? update_load_avg+0x1b9/0x607 [ 100.927849] ? compute_mst_dsc_configs_for_state+0x7d/0xd5b [ 100.933416] ? fetch_pipe_params+0xa4d/0xd0c [ 100.937686] ? dc_fpu_end+0x3d/0xa8 [ 100.941175] dml_get_voltage_level+0x16b/0x180 [ 100.945619] dcn30_internal_validate_bw+0x10e/0x89b [ 100.950495] ? dcn31_validate_bandwidth+0x68/0x1fc [ 100.955285] ? resource_build_scaling_params+0x98b/0xb8c [ 100.960595] ? dcn31_validate_bandwidth+0x68/0x1fc [ 100.965384] dcn31_validate_bandwidth+0x9a/0x1fc [ 100.970001] dc_validate_global_state+0x238/0x295 [ 100.974703] amdgpu_dm_atomic_check+0x9c1/0xbce [ 100.979235] ? _printk+0x59/0x73 [ 100.982467] drm_atomic_check_only+0x403/0x78b [ 100.986912] drm_mode_atomic_ioctl+0x49b/0x546 [ 100.991358] ? drm_ioctl+0x1c1/0x3b3 [ 100.994936] ? drm_atomic_set_property+0x92a/0x92a [ 100.999725] drm_ioctl_kernel+0xdc/0x149 [ 101.003648] drm_ioctl+0x27f/0x3b3 [ 101.007051] ? drm_atomic_set_property+0x92a/0x92a [ 101.011842] amdgpu_drm_ioctl+0x49/0x7d [ 101.015679] __se_sys_ioctl+0x7c/0xb8 [ 101.015685] do_syscall_64+0x5f/0xb8 [ 101.015690] ? __irq_exit_rcu+0x34/0x96
[How] It calles populate_dml_pipes which uses doubles to initialize. Adding FPU protection avoids context switch and probable loss of vba context as there is potential contention while drm debug logs are enabled.
Signed-off-by: CHANDAN VURDIGERE NATARAJ chandan.vurdigerenataraj@amd.com Reviewed-by: Rodrigo Siqueira Rodrigo.Siqueira@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c +++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c @@ -2025,7 +2025,9 @@ bool dcn31_validate_bandwidth(struct dc
BW_VAL_TRACE_COUNT();
+ DC_FP_START(); out = dcn30_internal_validate_bw(dc, context, pipes, &pipe_cnt, &vlevel, fast_validate); + DC_FP_END();
// Disable fast_validate to set min dcfclk in alculate_wm_and_dlg if (pipe_cnt == 0)
From: Benjamin Marty info@benjaminmarty.ch
commit 879791ad8bf3dc5453061cad74776a617b6e3319 upstream.
Fixes crash on MST Hub disconnect.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1849 Fixes: ee2698cf79cc ("drm/amd/display: Changed pipe split policy to allow for multi-display pipe split") Signed-off-by: Benjamin Marty info@benjaminmarty.ch Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c +++ b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c @@ -873,7 +873,7 @@ static const struct dc_debug_options deb .clock_trace = true, .disable_pplib_clock_request = true, .min_disp_clk_khz = 100000, - .pipe_split_policy = MPC_SPLIT_DYNAMIC, + .pipe_split_policy = MPC_SPLIT_AVOID_MULT_DISP, .force_single_disp_pipe_split = false, .disable_dcc = DCC_ENABLE, .vsr_support = true,
From: Alex Deucher alexander.deucher@amd.com
commit 2f25d8ce09b7ba5d769c132ba3d4eb84a941d2cb upstream.
SMU takes clock limits in Mhz units. socclk and fclk were using 10 khz units in some cases. Switch to Mhz units. Fixes higher than required SoC clocks.
Fixes: 97cf32996c46d9 ("drm/amd/pm: Removed fixed clock in auto mode DPM") Reviewed-by: Paul Menzel pmenzel@molgen.mpg.de Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c +++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c @@ -773,13 +773,13 @@ static int smu10_dpm_force_dpm_level(str smum_send_msg_to_smc_with_parameter(hwmgr, PPSMC_MSG_SetHardMinFclkByFreq, hwmgr->display_config->num_display > 3 ? - data->clock_vol_info.vdd_dep_on_fclk->entries[0].clk : + (data->clock_vol_info.vdd_dep_on_fclk->entries[0].clk / 100) : min_mclk, NULL);
smum_send_msg_to_smc_with_parameter(hwmgr, PPSMC_MSG_SetHardMinSocclkByFreq, - data->clock_vol_info.vdd_dep_on_socclk->entries[0].clk, + data->clock_vol_info.vdd_dep_on_socclk->entries[0].clk / 100, NULL); smum_send_msg_to_smc_with_parameter(hwmgr, PPSMC_MSG_SetHardMinVcn, @@ -792,11 +792,11 @@ static int smu10_dpm_force_dpm_level(str NULL); smum_send_msg_to_smc_with_parameter(hwmgr, PPSMC_MSG_SetSoftMaxFclkByFreq, - data->clock_vol_info.vdd_dep_on_fclk->entries[index_fclk].clk, + data->clock_vol_info.vdd_dep_on_fclk->entries[index_fclk].clk / 100, NULL); smum_send_msg_to_smc_with_parameter(hwmgr, PPSMC_MSG_SetSoftMaxSocclkByFreq, - data->clock_vol_info.vdd_dep_on_socclk->entries[index_socclk].clk, + data->clock_vol_info.vdd_dep_on_socclk->entries[index_socclk].clk / 100, NULL); smum_send_msg_to_smc_with_parameter(hwmgr, PPSMC_MSG_SetSoftMaxVcn,
From: Emily Deng Emily.Deng@amd.com
commit 02fc996d5098f4c3f65bdf6cdb6b28e3f29ba789 upstream.
Correct the code error for setting register UVD_GFX10_ADDR_CONFIG. Need to use inst_idx, or it only will set VCN0.
Signed-off-by: Emily Deng Emily.Deng@amd.com Reviewed-by: James Zhu James.Zhu@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c @@ -569,8 +569,8 @@ static void vcn_v3_0_mc_resume_dpg_mode( AMDGPU_GPU_PAGE_ALIGN(sizeof(struct amdgpu_fw_shared)), 0, indirect);
/* VCN global tiling registers */ - WREG32_SOC15_DPG_MODE(0, SOC15_DPG_MODE_OFFSET( - UVD, 0, mmUVD_GFX10_ADDR_CONFIG), adev->gfx.config.gb_addr_config, 0, indirect); + WREG32_SOC15_DPG_MODE(inst_idx, SOC15_DPG_MODE_OFFSET( + UVD, inst_idx, mmUVD_GFX10_ADDR_CONFIG), adev->gfx.config.gb_addr_config, 0, indirect); }
static void vcn_v3_0_disable_static_power_gating(struct amdgpu_device *adev, int inst)
From: Karol Herbst kherbst@redhat.com
commit 38d4e5cf5b08798f093374e53c2f4609d5382dd5 upstream.
Fixes a crash booting on those platforms with nouveau.
Fixes: 4cdd2450bf73 ("drm/nouveau/pmu/gm200-: use alternate falcon reset sequence") Cc: Ben Skeggs bskeggs@redhat.com Cc: Karol Herbst kherbst@redhat.com Cc: dri-devel@lists.freedesktop.org Cc: nouveau@lists.freedesktop.org Cc: stable@vger.kernel.org # v5.17+ Signed-off-by: Karol Herbst kherbst@redhat.com Reviewed-by: Lyude Paul lyude@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20220322124800.2605463-1-kherb... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gm20b.c | 1 + drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp102.c | 2 +- drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp10b.c | 1 + drivers/gpu/drm/nouveau/nvkm/subdev/pmu/priv.h | 1 + 4 files changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gm20b.c +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gm20b.c @@ -216,6 +216,7 @@ gm20b_pmu = { .intr = gt215_pmu_intr, .recv = gm20b_pmu_recv, .initmsg = gm20b_pmu_initmsg, + .reset = gf100_pmu_reset, };
#if IS_ENABLED(CONFIG_ARCH_TEGRA_210_SOC) --- a/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp102.c +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp102.c @@ -23,7 +23,7 @@ */ #include "priv.h"
-static void +void gp102_pmu_reset(struct nvkm_pmu *pmu) { struct nvkm_device *device = pmu->subdev.device; --- a/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp10b.c +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp10b.c @@ -83,6 +83,7 @@ gp10b_pmu = { .intr = gt215_pmu_intr, .recv = gm20b_pmu_recv, .initmsg = gm20b_pmu_initmsg, + .reset = gp102_pmu_reset, };
#if IS_ENABLED(CONFIG_ARCH_TEGRA_210_SOC) --- a/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/priv.h +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/priv.h @@ -41,6 +41,7 @@ int gt215_pmu_send(struct nvkm_pmu *, u3
bool gf100_pmu_enabled(struct nvkm_pmu *); void gf100_pmu_reset(struct nvkm_pmu *); +void gp102_pmu_reset(struct nvkm_pmu *pmu);
void gk110_pmu_pgob(struct nvkm_pmu *, bool);
From: Lee Jones lee.jones@linaro.org
commit e79a2398e1b2d47060474dca291542368183bc0f upstream.
This ensures userspace cannot prematurely clean-up the client before it is fully initialised which has been proven to cause issues in the past.
Cc: Felix Kuehling Felix.Kuehling@amd.com Cc: Alex Deucher alexander.deucher@amd.com Cc: "Christian König" christian.koenig@amd.com Cc: "Pan, Xinhui" Xinhui.Pan@amd.com Cc: David Airlie airlied@linux.ie Cc: Daniel Vetter daniel@ffwll.ch Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Lee Jones lee.jones@linaro.org Reviewed-by: Felix Kuehling Felix.Kuehling@amd.com Signed-off-by: Felix Kuehling Felix.Kuehling@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 24 +++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-)
--- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c @@ -268,15 +268,6 @@ int kfd_smi_event_open(struct kfd_dev *d return ret; }
- ret = anon_inode_getfd(kfd_smi_name, &kfd_smi_ev_fops, (void *)client, - O_RDWR); - if (ret < 0) { - kfifo_free(&client->fifo); - kfree(client); - return ret; - } - *fd = ret; - init_waitqueue_head(&client->wait_queue); spin_lock_init(&client->lock); client->events = 0; @@ -286,5 +277,20 @@ int kfd_smi_event_open(struct kfd_dev *d list_add_rcu(&client->list, &dev->smi_clients); spin_unlock(&dev->smi_lock);
+ ret = anon_inode_getfd(kfd_smi_name, &kfd_smi_ev_fops, (void *)client, + O_RDWR); + if (ret < 0) { + spin_lock(&dev->smi_lock); + list_del_rcu(&client->list); + spin_unlock(&dev->smi_lock); + + synchronize_rcu(); + + kfifo_free(&client->fifo); + kfree(client); + return ret; + } + *fd = ret; + return 0; }
From: Alex Deucher alexander.deucher@amd.com
commit ebc002e3ee78409c42156e62e4e27ad1d09c5a75 upstream.
Seems to cause a reboots or hangs on some systems.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1924 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1953 Fixes: daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)") Reviewed-by: Lijo Lazar lijo.lazar@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
--- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c +++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c @@ -1045,6 +1045,17 @@ bool amdgpu_dpm_is_baco_supported(struct
if (!pp_funcs || !pp_funcs->get_asic_baco_capability) return false; + /* Don't use baco for reset in S3. + * This is a workaround for some platforms + * where entering BACO during suspend + * seems to cause reboots or hangs. + * This might be related to the fact that BACO controls + * power to the whole GPU including devices like audio and USB. + * Powering down/up everything may adversely affect these other + * devices. Needs more investigation. + */ + if (adev->in_s3) + return false;
if (pp_funcs->get_asic_baco_capability(pp_handle, &baco_cap)) return false;
From: Trond Myklebust trond.myklebust@hammerspace.com
commit f00432063db1a0db484e85193eccc6845435b80e upstream.
We must ensure that all sockets are closed before we call xprt_free() and release the reference to the net namespace. The problem is that calling fput() will defer closing the socket until delayed_fput() gets called. Let's fix the situation by allowing rpciod and the transport teardown code (which runs on the system wq) to call __fput_sync(), and directly close the socket.
Reported-by: Felix Fu foyjog@gmail.com Acked-by: Al Viro viro@zeniv.linux.org.uk Fixes: a73881c96d73 ("SUNRPC: Fix an Oops in udp_poll()") Cc: stable@vger.kernel.org # 5.1.x: 3be232f11a3c: SUNRPC: Prevent immediate close+reconnect Cc: stable@vger.kernel.org # 5.1.x: 89f42494f92f: SUNRPC: Don't call connect() more than once on a TCP socket Cc: stable@vger.kernel.org # 5.1.x Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/file_table.c | 1 + include/trace/events/sunrpc.h | 1 - net/sunrpc/xprt.c | 7 +------ net/sunrpc/xprtsock.c | 16 +++++++++++++--- 4 files changed, 15 insertions(+), 10 deletions(-)
--- a/fs/file_table.c +++ b/fs/file_table.c @@ -412,6 +412,7 @@ void __fput_sync(struct file *file) }
EXPORT_SYMBOL(fput); +EXPORT_SYMBOL(__fput_sync);
void __init files_init(void) { --- a/include/trace/events/sunrpc.h +++ b/include/trace/events/sunrpc.h @@ -1005,7 +1005,6 @@ DEFINE_RPC_XPRT_LIFETIME_EVENT(connect); DEFINE_RPC_XPRT_LIFETIME_EVENT(disconnect_auto); DEFINE_RPC_XPRT_LIFETIME_EVENT(disconnect_done); DEFINE_RPC_XPRT_LIFETIME_EVENT(disconnect_force); -DEFINE_RPC_XPRT_LIFETIME_EVENT(disconnect_cleanup); DEFINE_RPC_XPRT_LIFETIME_EVENT(destroy);
DECLARE_EVENT_CLASS(rpc_xprt_event, --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -929,12 +929,7 @@ void xprt_connect(struct rpc_task *task) if (!xprt_lock_write(xprt, task)) return;
- if (test_and_clear_bit(XPRT_CLOSE_WAIT, &xprt->state)) { - trace_xprt_disconnect_cleanup(xprt); - xprt->ops->close(xprt); - } - - if (!xprt_connected(xprt)) { + if (!xprt_connected(xprt) && !test_bit(XPRT_CLOSE_WAIT, &xprt->state)) { task->tk_rqstp->rq_connect_cookie = xprt->connect_cookie; rpc_sleep_on_timeout(&xprt->pending, task, NULL, xprt_request_timeout(task->tk_rqstp)); --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -880,7 +880,7 @@ static int xs_local_send_request(struct
/* Close the stream if the previous transmission was incomplete */ if (xs_send_request_was_aborted(transport, req)) { - xs_close(xprt); + xprt_force_disconnect(xprt); return -ENOTCONN; }
@@ -918,7 +918,7 @@ static int xs_local_send_request(struct -status); fallthrough; case -EPIPE: - xs_close(xprt); + xprt_force_disconnect(xprt); status = -ENOTCONN; }
@@ -1203,6 +1203,16 @@ static void xs_reset_transport(struct so
if (sk == NULL) return; + /* + * Make sure we're calling this in a context from which it is safe + * to call __fput_sync(). In practice that means rpciod and the + * system workqueue. + */ + if (!(current->flags & PF_WQ_WORKER)) { + WARN_ON_ONCE(1); + set_bit(XPRT_CLOSE_WAIT, &xprt->state); + return; + }
if (atomic_read(&transport->xprt.swapper)) sk_clear_memalloc(sk); @@ -1226,7 +1236,7 @@ static void xs_reset_transport(struct so mutex_unlock(&transport->recv_mutex);
trace_rpc_socket_close(xprt, sock); - fput(filp); + __fput_sync(filp);
xprt_disconnect_done(xprt); }
From: Akihiko Odaki akihiko.odaki@gmail.com
commit dfbba2518aac4204203b0697a894d3b2f80134d3 upstream.
Revert commit 87ebbb8c612b ("ACPI: processor: idle: Only flush cache on entering C3") that broke the assumptions of the acpi_idle_play_dead() callers.
Namely, the CPU cache must always be flushed in acpi_idle_play_dead(), regardless of the target C-state that is going to be requested, because this is likely to be part of a CPU offline procedure or preparation for entering a system-wide sleep state and the lack of synchronization between the CPU cache and RAM may lead to problems going forward, for example when the CPU is brought back online.
In particular, it breaks resume from suspend-to-RAM on Lenovo ThinkPad C13 which fails occasionally until the problematic commit is reverted.
Signed-off-by: Akihiko Odaki akihiko.odaki@gmail.com [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Cc: Ketsui esgwpl@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/acpi/processor_idle.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/drivers/acpi/processor_idle.c +++ b/drivers/acpi/processor_idle.c @@ -570,8 +570,7 @@ static int acpi_idle_play_dead(struct cp { struct acpi_processor_cx *cx = per_cpu(acpi_cstate[index], dev->cpu);
- if (cx->type == ACPI_STATE_C3) - ACPI_FLUSH_CPU_CACHE(); + ACPI_FLUSH_CPU_CACHE();
while (1) {
From: Philip Yang Philip.Yang@amd.com
commit 90c44207cdd18091ac9aa7cab8a3e7b0ef00e847 upstream.
All warnings (new ones prefixed by >>):
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c: In function 'svm_range_deferred_list_work':
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c:2067:22: warning:
variable 'p' set but not used [-Wunused-but-set-variable] 2067 | struct kfd_process *p; |
Fixes: 367c9b0f1b8750 ("drm/amdkfd: Ensure mm remain valid in svm deferred_list work") Reported-by: kernel test robot lkp@intel.com Signed-off-by: Philip Yang Philip.Yang@amd.com Reviewed-By: Harish Kasiviswanathan Harish.Kasiviswanathan@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 3 --- 1 file changed, 3 deletions(-)
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -2066,13 +2066,10 @@ static void svm_range_deferred_list_work struct svm_range_list *svms; struct svm_range *prange; struct mm_struct *mm; - struct kfd_process *p;
svms = container_of(work, struct svm_range_list, deferred_list_work); pr_debug("enter svms 0x%p\n", svms);
- p = container_of(svms, struct kfd_process, svms); - spin_lock(&svms->deferred_list_lock); while (!list_empty(&svms->deferred_range_list)) { prange = list_first_entry(&svms->deferred_range_list,
From: Dust Li dust.li@linux.alibaba.com
commit b70a5cc045197aad9c159042621baf3c015f6cc7 upstream.
In commit ea785a1a573b("net/smc: Send directly when TCP_CORK is cleared"), we don't use delayed work to implement cork.
This patch use the same algorithm, removes the delayed work when setting TCP_NODELAY and send directly in setsockopt(). This also makes the TCP_NODELAY the same as TCP.
Cc: Tony Lu tonylu@linux.alibaba.com Signed-off-by: Dust Li dust.li@linux.alibaba.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/smc/af_smc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -2625,8 +2625,8 @@ static int smc_setsockopt(struct socket sk->sk_state != SMC_CLOSED) { if (val) { SMC_STAT_INC(smc, ndly_cnt); - mod_delayed_work(smc->conn.lgr->tx_wq, - &smc->conn.tx_work, 0); + smc_tx_pending(&smc->conn); + cancel_delayed_work(&smc->conn.tx_work); } } break;
From: Jakub Kicinski kuba@kernel.org
commit 20695e9a9fd39103d1b0669470ae74030b7aa196 upstream.
This reverts commit d9142e1cf3bbdaf21337767114ecab26fe702d47.
The test is supposed to run cleanly with TLS is disabled, to test compatibility with TCP behavior. I can't repro the failure [1], the problem should be debugged rather than papered over.
Link: https://lore.kernel.org/all/20220325161203.7000698c@kicinski-fedora-pc1c0hjn... [1] Fixes: d9142e1cf3bb ("selftests: net: Add tls config dependency for tls selftests") Signed-off-by: Jakub Kicinski kuba@kernel.org Link: https://lore.kernel.org/r/20220328212904.2685395-1-kuba@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/config | 1 - 1 file changed, 1 deletion(-)
--- a/tools/testing/selftests/net/config +++ b/tools/testing/selftests/net/config @@ -43,6 +43,5 @@ CONFIG_NET_ACT_TUNNEL_KEY=m CONFIG_NET_ACT_MIRRED=m CONFIG_BAREUDP=m CONFIG_IPV6_IOAM6_LWTUNNEL=y -CONFIG_TLS=m CONFIG_CRYPTO_SM4=y CONFIG_AMT=m
From: Jakub Sitnicki jakub@cloudflare.com
commit 9a69e2b385f443f244a7e8b8bcafe5ccfb0866b4 upstream.
remote_port is another case of a BPF context field documented as a 32-bit value in network byte order for which the BPF context access converter generates a load of a zero-padded 16-bit integer in network byte order.
First such case was dst_port in bpf_sock which got addressed in commit 4421a582718a ("bpf: Make dst_port field in struct bpf_sock 16-bit wide").
Loading 4-bytes from the remote_port offset and converting the value with bpf_ntohl() leads to surprising results, as the expected value is shifted by 16 bits.
Reduce the confusion by splitting the field in two - a 16-bit field holding a big-endian integer, and a 16-bit zero-padding anonymous field that follows it.
Suggested-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Jakub Sitnicki jakub@cloudflare.com Signed-off-by: Alexei Starovoitov ast@kernel.org Link: https://lore.kernel.org/bpf/20220209184333.654927-2-jakub@cloudflare.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/uapi/linux/bpf.h | 3 ++- net/bpf/test_run.c | 4 ++-- net/core/filter.c | 3 ++- 3 files changed, 6 insertions(+), 4 deletions(-)
--- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -6379,7 +6379,8 @@ struct bpf_sk_lookup { __u32 protocol; /* IP protocol (IPPROTO_TCP, IPPROTO_UDP) */ __u32 remote_ip4; /* Network byte order */ __u32 remote_ip6[4]; /* Network byte order */ - __u32 remote_port; /* Network byte order */ + __be16 remote_port; /* Network byte order */ + __u16 :16; /* Zero padding */ __u32 local_ip4; /* Network byte order */ __u32 local_ip6[4]; /* Network byte order */ __u32 local_port; /* Host byte order */ --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -960,7 +960,7 @@ int bpf_prog_test_run_sk_lookup(struct b if (!range_is_zero(user_ctx, offsetofend(typeof(*user_ctx), local_port), sizeof(*user_ctx))) goto out;
- if (user_ctx->local_port > U16_MAX || user_ctx->remote_port > U16_MAX) { + if (user_ctx->local_port > U16_MAX) { ret = -ERANGE; goto out; } @@ -968,7 +968,7 @@ int bpf_prog_test_run_sk_lookup(struct b ctx.family = (u16)user_ctx->family; ctx.protocol = (u16)user_ctx->protocol; ctx.dport = (u16)user_ctx->local_port; - ctx.sport = (__force __be16)user_ctx->remote_port; + ctx.sport = user_ctx->remote_port;
switch (ctx.family) { case AF_INET: --- a/net/core/filter.c +++ b/net/core/filter.c @@ -10621,7 +10621,8 @@ static bool sk_lookup_is_valid_access(in case bpf_ctx_range(struct bpf_sk_lookup, local_ip4): case bpf_ctx_range_till(struct bpf_sk_lookup, remote_ip6[0], remote_ip6[3]): case bpf_ctx_range_till(struct bpf_sk_lookup, local_ip6[0], local_ip6[3]): - case bpf_ctx_range(struct bpf_sk_lookup, remote_port): + case offsetof(struct bpf_sk_lookup, remote_port) ... + offsetof(struct bpf_sk_lookup, local_ip4) - 1: case bpf_ctx_range(struct bpf_sk_lookup, local_port): case bpf_ctx_range(struct bpf_sk_lookup, ingress_ifindex): bpf_ctx_record_field_size(info, sizeof(__u32));
From: Jakub Sitnicki jakub@cloudflare.com
commit 3c69611b8926f8e74fcf76bd97ae0e5dafbeb26a upstream.
In commit 9a69e2b385f4 ("bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide") ->remote_port field changed from __u32 to __be16.
However, narrow load tests which exercise 1-byte sized loads from offsetof(struct bpf_sk_lookup, remote_port) were not adopted to reflect the change.
As a result, on little-endian we continue testing loads from addresses:
- (__u8 *)&ctx->remote_port + 3 - (__u8 *)&ctx->remote_port + 4
which map to the zero padding following the remote_port field, and don't break the tests because there is no observable change.
While on big-endian, we observe breakage because tests expect to see zeros for values loaded from:
- (__u8 *)&ctx->remote_port - 1 - (__u8 *)&ctx->remote_port - 2
Above addresses map to ->remote_ip6 field, which precedes ->remote_port, and are populated during the bpf_sk_lookup IPv6 tests.
Unsurprisingly, on s390x we observe:
#136/38 sk_lookup/narrow access to ctx v4:OK #136/39 sk_lookup/narrow access to ctx v6:FAIL
Fix it by removing the checks for 1-byte loads from offsets outside of the ->remote_port field.
Fixes: 9a69e2b385f4 ("bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide") Suggested-by: Ilya Leoshkevich iii@linux.ibm.com Signed-off-by: Jakub Sitnicki jakub@cloudflare.com Signed-off-by: Alexei Starovoitov ast@kernel.org Acked-by: Martin KaFai Lau kafai@fb.com Link: https://lore.kernel.org/bpf/20220319183356.233666-3-jakub@cloudflare.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/bpf/progs/test_sk_lookup.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/tools/testing/selftests/bpf/progs/test_sk_lookup.c +++ b/tools/testing/selftests/bpf/progs/test_sk_lookup.c @@ -412,8 +412,7 @@ int ctx_narrow_access(struct bpf_sk_look
/* Narrow loads from remote_port field. Expect SRC_PORT. */ if (LSB(ctx->remote_port, 0) != ((SRC_PORT >> 0) & 0xff) || - LSB(ctx->remote_port, 1) != ((SRC_PORT >> 8) & 0xff) || - LSB(ctx->remote_port, 2) != 0 || LSB(ctx->remote_port, 3) != 0) + LSB(ctx->remote_port, 1) != ((SRC_PORT >> 8) & 0xff)) return SK_DROP; if (LSW(ctx->remote_port, 0) != SRC_PORT) return SK_DROP;
From: Jakub Sitnicki jakub@cloudflare.com
commit 058ec4a7d9cf77238c73ad9f1e1a3ed9a29afcab upstream.
In commit 9a69e2b385f4 ("bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide") the remote_port field has been split up and re-declared from u32 to be16.
However, the accompanying changes to the context access converter have not been well thought through when it comes big-endian platforms.
Today 2-byte wide loads from offsetof(struct bpf_sk_lookup, remote_port) are handled as narrow loads from a 4-byte wide field.
This by itself is not enough to create a problem, but when we combine
1. 32-bit wide access to ->remote_port backed by a 16-wide wide load, with 2. inherent difference between litte- and big-endian in how narrow loads need have to be handled (see bpf_ctx_narrow_access_offset),
we get inconsistent results for a 2-byte loads from &ctx->remote_port on LE and BE architectures. This in turn makes BPF C code for the common case of 2-byte load from ctx->remote_port not portable.
To rectify it, inform the context access converter that remote_port is 2-byte wide field, and only 1-byte loads need to be treated as narrow loads.
At the same time, we special-case the 4-byte load from &ctx->remote_port to continue handling it the same way as do today, in order to keep the existing BPF programs working.
Fixes: 9a69e2b385f4 ("bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide") Signed-off-by: Jakub Sitnicki jakub@cloudflare.com Signed-off-by: Alexei Starovoitov ast@kernel.org Acked-by: Martin KaFai Lau kafai@fb.com Link: https://lore.kernel.org/bpf/20220319183356.233666-2-jakub@cloudflare.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/filter.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-)
--- a/net/core/filter.c +++ b/net/core/filter.c @@ -10621,13 +10621,24 @@ static bool sk_lookup_is_valid_access(in case bpf_ctx_range(struct bpf_sk_lookup, local_ip4): case bpf_ctx_range_till(struct bpf_sk_lookup, remote_ip6[0], remote_ip6[3]): case bpf_ctx_range_till(struct bpf_sk_lookup, local_ip6[0], local_ip6[3]): - case offsetof(struct bpf_sk_lookup, remote_port) ... - offsetof(struct bpf_sk_lookup, local_ip4) - 1: case bpf_ctx_range(struct bpf_sk_lookup, local_port): case bpf_ctx_range(struct bpf_sk_lookup, ingress_ifindex): bpf_ctx_record_field_size(info, sizeof(__u32)); return bpf_ctx_narrow_access_ok(off, size, sizeof(__u32));
+ case bpf_ctx_range(struct bpf_sk_lookup, remote_port): + /* Allow 4-byte access to 2-byte field for backward compatibility */ + if (size == sizeof(__u32)) + return true; + bpf_ctx_record_field_size(info, sizeof(__be16)); + return bpf_ctx_narrow_access_ok(off, size, sizeof(__be16)); + + case offsetofend(struct bpf_sk_lookup, remote_port) ... + offsetof(struct bpf_sk_lookup, local_ip4) - 1: + /* Allow access to zero padding for backward compatibility */ + bpf_ctx_record_field_size(info, sizeof(__u16)); + return bpf_ctx_narrow_access_ok(off, size, sizeof(__u16)); + default: return false; } @@ -10709,6 +10720,11 @@ static u32 sk_lookup_convert_ctx_access( sport, 2, target_size)); break;
+ case offsetofend(struct bpf_sk_lookup, remote_port): + *target_size = 2; + *insn++ = BPF_MOV32_IMM(si->dst_reg, 0); + break; + case offsetof(struct bpf_sk_lookup, local_port): *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg, bpf_target_off(struct bpf_sk_lookup_kern,
From: Arnaldo Carvalho de Melo acme@redhat.com
commit 3a8a0475861a443f02e3a9b57d044fe2a0a99291 upstream.
Using -ffat-lto-objects in the python feature test when building with clang-13 results in:
clang-13: error: optimization flag '-ffat-lto-objects' is not supported [-Werror,-Wignored-optimization-argument] error: command '/usr/sbin/clang' failed with exit code 1 cp: cannot stat '/tmp/build/perf/python_ext_build/lib/perf*.so': No such file or directory make[2]: *** [Makefile.perf:639: /tmp/build/perf/python/perf.so] Error 1
Noticed when building on a docker.io/library/archlinux:base container.
Cc: Adrian Hunter adrian.hunter@intel.com Cc: Fangrui Song maskray@google.com Cc: Florian Fainelli f.fainelli@gmail.com Cc: Ian Rogers irogers@google.com Cc: Jiri Olsa jolsa@kernel.org Cc: John Keeping john@metanate.com Cc: Leo Yan leo.yan@linaro.org Cc: Michael Petlan mpetlan@redhat.com Cc: Namhyung Kim namhyung@kernel.org Cc: Nathan Chancellor nathan@kernel.org Cc: Nick Desaulniers ndesaulniers@google.com Cc: Sedat Dilek sedat.dilek@gmail.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/perf/Makefile.config | 3 +++ tools/perf/util/setup.py | 2 ++ 2 files changed, 5 insertions(+)
--- a/tools/perf/Makefile.config +++ b/tools/perf/Makefile.config @@ -272,6 +272,9 @@ ifdef PYTHON_CONFIG PYTHON_EMBED_LIBADD := $(call grep-libs,$(PYTHON_EMBED_LDOPTS)) -lutil PYTHON_EMBED_CCOPTS := $(shell $(PYTHON_CONFIG_SQ) --includes 2>/dev/null) FLAGS_PYTHON_EMBED := $(PYTHON_EMBED_CCOPTS) $(PYTHON_EMBED_LDOPTS) + ifeq ($(CC_NO_CLANG), 0) + PYTHON_EMBED_CCOPTS := $(filter-out -ffat-lto-objects, $(PYTHON_EMBED_CCOPTS)) + endif endif
FEATURE_CHECK_CFLAGS-libpython := $(PYTHON_EMBED_CCOPTS) --- a/tools/perf/util/setup.py +++ b/tools/perf/util/setup.py @@ -23,6 +23,8 @@ if cc_is_clang: vars[var] = sub("-fstack-protector-strong", "", vars[var]) if not clang_has_option("-fno-semantic-interposition"): vars[var] = sub("-fno-semantic-interposition", "", vars[var]) + if not clang_has_option("-ffat-lto-objects"): + vars[var] = sub("-ffat-lto-objects", "", vars[var])
from distutils.core import setup, Extension
From: Arnaldo Carvalho de Melo acme@redhat.com
commit dd6e1fe91cdd52774ca642d1da75b58a86356b56 upstream.
The clang compiler complains about some options even without a source file being available, while others require one, so use the simple tools/build/feature/test-hello.c file.
Then check for the "is not supported" string in its output, in addition to the "unknown argument" already being looked for.
This was noticed when building with clang-13 where -ffat-lto-objects isn't supported and since we were looking just for "unknown argument" and not providing a source code to clang, was mistakenly assumed as being available and not being filtered to set of command line options provided to clang, leading to a build failure.
Cc: Adrian Hunter adrian.hunter@intel.com Cc: Fangrui Song maskray@google.com Cc: Florian Fainelli f.fainelli@gmail.com Cc: Ian Rogers irogers@google.com Cc: Jiri Olsa jolsa@kernel.org Cc: John Keeping john@metanate.com Cc: Leo Yan leo.yan@linaro.org Cc: Michael Petlan mpetlan@redhat.com Cc: Namhyung Kim namhyung@kernel.org Cc: Nathan Chancellor nathan@kernel.org Cc: Nick Desaulniers ndesaulniers@google.com Cc: Sedat Dilek sedat.dilek@gmail.com Link: http://lore.kernel.org/lkml/ Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/perf/util/setup.py | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/tools/perf/util/setup.py +++ b/tools/perf/util/setup.py @@ -1,12 +1,14 @@ -from os import getenv +from os import getenv, path from subprocess import Popen, PIPE from re import sub
cc = getenv("CC") cc_is_clang = b"clang version" in Popen([cc.split()[0], "-v"], stderr=PIPE).stderr.readline() +src_feature_tests = getenv('srctree') + '/tools/build/feature'
def clang_has_option(option): - return [o for o in Popen([cc, option], stderr=PIPE).stderr.readlines() if b"unknown argument" in o] == [ ] + cc_output = Popen([cc, option, path.join(src_feature_tests, "test-hello.c") ], stderr=PIPE).stderr.readlines() + return [o for o in cc_output if ((b"unknown argument" in o) or (b"is not supported" in o))] == [ ]
if cc_is_clang: from distutils.sysconfig import get_config_vars
From: Arnaldo Carvalho de Melo acme@redhat.com
commit 41caff459a5b956b3e23ba9ca759dd0629ad3dda upstream.
These make the feature check fail when using clang, so remove them just like is done in tools/perf/Makefile.config to build perf itself.
Adding -Wno-compound-token-split-by-macro to tools/perf/Makefile.config when building with clang is also necessary to avoid these warnings turned into errors (-Werror):
CC /tmp/build/perf/util/scripting-engines/trace-event-perl.o In file included from util/scripting-engines/trace-event-perl.c:35: In file included from /usr/lib64/perl5/CORE/perl.h:4085: In file included from /usr/lib64/perl5/CORE/hv.h:659: In file included from /usr/lib64/perl5/CORE/hv_func.h:34: In file included from /usr/lib64/perl5/CORE/sbox32_hash.h:4: /usr/lib64/perl5/CORE/zaphod32_hash.h:150:5: error: '(' and '{' tokens introducing statement expression appear in different macro expansion contexts [-Werror,-Wcompound-token-split-by-macro] ZAPHOD32_SCRAMBLE32(state[0],0x9fade23b); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /usr/lib64/perl5/CORE/zaphod32_hash.h:80:38: note: expanded from macro 'ZAPHOD32_SCRAMBLE32' #define ZAPHOD32_SCRAMBLE32(v,prime) STMT_START { \ ^~~~~~~~~~ /usr/lib64/perl5/CORE/perl.h:737:29: note: expanded from macro 'STMT_START' # define STMT_START (void)( /* gcc supports "({ STATEMENTS; })" */ ^ /usr/lib64/perl5/CORE/zaphod32_hash.h:150:5: note: '{' token is here ZAPHOD32_SCRAMBLE32(state[0],0x9fade23b); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /usr/lib64/perl5/CORE/zaphod32_hash.h:80:49: note: expanded from macro 'ZAPHOD32_SCRAMBLE32' #define ZAPHOD32_SCRAMBLE32(v,prime) STMT_START { \ ^ /usr/lib64/perl5/CORE/zaphod32_hash.h:150:5: error: '}' and ')' tokens terminating statement expression appear in different macro expansion contexts [-Werror,-Wcompound-token-split-by-macro] ZAPHOD32_SCRAMBLE32(state[0],0x9fade23b); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /usr/lib64/perl5/CORE/zaphod32_hash.h:87:41: note: expanded from macro 'ZAPHOD32_SCRAMBLE32' v ^= (v>>23); \ ^ /usr/lib64/perl5/CORE/zaphod32_hash.h:150:5: note: ')' token is here ZAPHOD32_SCRAMBLE32(state[0],0x9fade23b); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /usr/lib64/perl5/CORE/zaphod32_hash.h:88:3: note: expanded from macro 'ZAPHOD32_SCRAMBLE32' } STMT_END ^~~~~~~~ /usr/lib64/perl5/CORE/perl.h:738:21: note: expanded from macro 'STMT_END' # define STMT_END ) ^
Please refer to the discussion on the Link: tag below, where Nathan clarifies the situation:
<quote> acme> And then get to the problems at the end of this message, which seem acme> similar to the problem described here: acme> acme> From Nathan Chancellor <> acme> Subject [PATCH] mwifiex: Remove unnecessary braces from HostCmd_SET_SEQ_NO_BSS_INFO acme> acme> https://lkml.org/lkml/2020/9/1/135 acme> acme> So perhaps in this case its better to disable that acme> -Werror,-Wcompound-token-split-by-macro when building with clang?
Yes, I think that is probably the best solution. As far as I can tell, at least in this file and context, the warning appears harmless, as the "create a GNU C statement expression from two different macros" is very much intentional, based on the presence of PERL_USE_GCC_BRACE_GROUPS. The warning is fixed in upstream Perl by just avoiding creating GNU C statement expressions using STMT_START and STMT_END:
https://github.com/Perl/perl5/issues/18780 https://github.com/Perl/perl5/pull/18984
If I am reading the source code correctly, an alternative to disabling the warning would be specifying -DPERL_GCC_BRACE_GROUPS_FORBIDDEN but it seems like that might end up impacting more than just this site, according to the issue discussion above. </quote>
Based-on-a-patch-by: Sedat Dilek sedat.dilek@gmail.com Tested-by: Sedat Dilek sedat.dilek@gmail.com # Debian/Selfmade LLVM-14 (x86-64) Cc: Adrian Hunter adrian.hunter@intel.com Cc: Fangrui Song maskray@google.com Cc: Florian Fainelli f.fainelli@gmail.com Cc: Ian Rogers irogers@google.com Cc: Jiri Olsa jolsa@kernel.org Cc: John Keeping john@metanate.com Cc: Leo Yan leo.yan@linaro.org Cc: Michael Petlan mpetlan@redhat.com Cc: Namhyung Kim namhyung@kernel.org Cc: Nathan Chancellor nathan@kernel.org Cc: Nick Desaulniers ndesaulniers@google.com Link: http://lore.kernel.org/lkml/YkxWcYzph5pC1EK8@kernel.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/build/feature/Makefile | 7 +++++++ tools/perf/Makefile.config | 3 +++ 2 files changed, 10 insertions(+)
--- a/tools/build/feature/Makefile +++ b/tools/build/feature/Makefile @@ -220,6 +220,13 @@ PERL_EMBED_LIBADD = $(call grep-libs,$(P PERL_EMBED_CCOPTS = `perl -MExtUtils::Embed -e ccopts 2>/dev/null` FLAGS_PERL_EMBED=$(PERL_EMBED_CCOPTS) $(PERL_EMBED_LDOPTS)
+ifeq ($(CC_NO_CLANG), 0) + PERL_EMBED_LDOPTS := $(filter-out -specs=%,$(PERL_EMBED_LDOPTS)) + PERL_EMBED_CCOPTS := $(filter-out -flto=auto -ffat-lto-objects, $(PERL_EMBED_CCOPTS)) + PERL_EMBED_CCOPTS := $(filter-out -specs=%,$(PERL_EMBED_CCOPTS)) + FLAGS_PERL_EMBED += -Wno-compound-token-split-by-macro +endif + $(OUTPUT)test-libperl.bin: $(BUILD) $(FLAGS_PERL_EMBED)
--- a/tools/perf/Makefile.config +++ b/tools/perf/Makefile.config @@ -793,6 +793,9 @@ else LDFLAGS += $(PERL_EMBED_LDFLAGS) EXTLIBS += $(PERL_EMBED_LIBADD) CFLAGS += -DHAVE_LIBPERL_SUPPORT + ifeq ($(CC_NO_CLANG), 0) + CFLAGS += -Wno-compound-token-split-by-macro + endif $(call detected,CONFIG_LIBPERL) endif endif
From: Arnaldo Carvalho de Melo acme@redhat.com
commit 541f695cbcb6932c22638b06e0cbe1d56177e2e9 upstream.
Just like its done for ldopts and for both in tools/perf/Makefile.config.
Using `` to initialize PERL_EMBED_CCOPTS somehow precludes using:
$(filter-out SOMETHING_TO_FILTER,$(PERL_EMBED_CCOPTS))
And we need to do it to allow for building with versions of clang where some gcc options selected by distros are not available.
Tested-by: Sedat Dilek sedat.dilek@gmail.com # Debian/Selfmade LLVM-14 (x86-64) Cc: Adrian Hunter adrian.hunter@intel.com Cc: Fangrui Song maskray@google.com Cc: Florian Fainelli f.fainelli@gmail.com Cc: Ian Rogers irogers@google.com Cc: Jiri Olsa jolsa@kernel.org Cc: John Keeping john@metanate.com Cc: Leo Yan leo.yan@linaro.org Cc: Michael Petlan mpetlan@redhat.com Cc: Namhyung Kim namhyung@kernel.org Cc: Nathan Chancellor nathan@kernel.org Cc: Nick Desaulniers ndesaulniers@google.com Link: http://lore.kernel.org/lkml/YktYX2OnLtyobRYD@kernel.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/build/feature/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/build/feature/Makefile +++ b/tools/build/feature/Makefile @@ -217,7 +217,7 @@ strip-libs = $(filter-out -l%,$(1)) PERL_EMBED_LDOPTS = $(shell perl -MExtUtils::Embed -e ldopts 2>/dev/null) PERL_EMBED_LDFLAGS = $(call strip-libs,$(PERL_EMBED_LDOPTS)) PERL_EMBED_LIBADD = $(call grep-libs,$(PERL_EMBED_LDOPTS)) -PERL_EMBED_CCOPTS = `perl -MExtUtils::Embed -e ccopts 2>/dev/null` +PERL_EMBED_CCOPTS = $(shell perl -MExtUtils::Embed -e ccopts 2>/dev/null) FLAGS_PERL_EMBED=$(PERL_EMBED_CCOPTS) $(PERL_EMBED_LDOPTS)
ifeq ($(CC_NO_CLANG), 0)
From: Vinod Koul vkoul@kernel.org
commit d143f939a95696d38ff800ada14402fa50ebbd6c upstream.
This reverts commit 455896c53d5b ("dmaengine: shdma: Fix runtime PM imbalance on error") as the patch wrongly reduced the count on error and did not bail out. So drop the count by reverting the patch .
Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/sh/shdma-base.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
--- a/drivers/dma/sh/shdma-base.c +++ b/drivers/dma/sh/shdma-base.c @@ -115,10 +115,8 @@ static dma_cookie_t shdma_tx_submit(stru ret = pm_runtime_get(schan->dev);
spin_unlock_irq(&schan->chan_lock); - if (ret < 0) { + if (ret < 0) dev_err(schan->dev, "%s(): GET = %d\n", __func__, ret); - pm_runtime_put(schan->dev); - }
pm_runtime_barrier(schan->dev);
From: Paolo Bonzini pbonzini@redhat.com
commit 5593473a1e6c743764b08e3b6071cb43b5cfa6c4 upstream.
kvm_vcpu_release() will call kvm_dirty_ring_free(), freeing ring->dirty_gfns and setting it to NULL. Afterwards, it calls kvm_arch_vcpu_destroy().
However, if closing the file descriptor races with KVM_RUN in such away that vcpu->arch.st.preempted == 0, the following call stack leads to a NULL pointer dereference in kvm_dirty_run_push():
mark_page_dirty_in_slot+0x192/0x270 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3171 kvm_steal_time_set_preempted arch/x86/kvm/x86.c:4600 [inline] kvm_arch_vcpu_put+0x34e/0x5b0 arch/x86/kvm/x86.c:4618 vcpu_put+0x1b/0x70 arch/x86/kvm/../../../virt/kvm/kvm_main.c:211 vmx_free_vcpu+0xcb/0x130 arch/x86/kvm/vmx/vmx.c:6985 kvm_arch_vcpu_destroy+0x76/0x290 arch/x86/kvm/x86.c:11219 kvm_vcpu_destroy arch/x86/kvm/../../../virt/kvm/kvm_main.c:441 [inline]
The fix is to release the dirty page ring after kvm_arch_vcpu_destroy has run.
Reported-by: Qiuhao Li qiuhao@sysec.org Reported-by: Gaoning Pan pgn@zju.edu.cn Reported-by: Yongkang Jia kangel@zju.edu.cn Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- virt/kvm/kvm_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -439,8 +439,8 @@ static void kvm_vcpu_init(struct kvm_vcp
static void kvm_vcpu_destroy(struct kvm_vcpu *vcpu) { - kvm_dirty_ring_free(&vcpu->dirty_ring); kvm_arch_vcpu_destroy(vcpu); + kvm_dirty_ring_free(&vcpu->dirty_ring);
/* * No need for rcu_read_lock as VCPU_RUN is the only place that changes
From: Andrea Parri (Microsoft) parri.andrea@gmail.com
commit eaa03d34535872d29004cb5cf77dc9dec1ba9a25 upstream.
Following the recommendation in Documentation/memory-barriers.txt for virtual machine guests.
Fixes: 8b6a877c060ed ("Drivers: hv: vmbus: Replace the per-CPU channel lists with a global array of channels") Signed-off-by: Andrea Parri (Microsoft) parri.andrea@gmail.com Link: https://lore.kernel.org/r/20220328154457.100872-1-parri.andrea@gmail.com Signed-off-by: Wei Liu wei.liu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hv/channel_mgmt.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/hv/channel_mgmt.c +++ b/drivers/hv/channel_mgmt.c @@ -380,7 +380,7 @@ void vmbus_channel_map_relid(struct vmbu * execute: * * (a) In the "normal (i.e., not resuming from hibernation)" path, - * the full barrier in smp_store_mb() guarantees that the store + * the full barrier in virt_store_mb() guarantees that the store * is propagated to all CPUs before the add_channel_work work * is queued. In turn, add_channel_work is queued before the * channel's ring buffer is allocated/initialized and the @@ -392,14 +392,14 @@ void vmbus_channel_map_relid(struct vmbu * recv_int_page before retrieving the channel pointer from the * array of channels. * - * (b) In the "resuming from hibernation" path, the smp_store_mb() + * (b) In the "resuming from hibernation" path, the virt_store_mb() * guarantees that the store is propagated to all CPUs before * the VMBus connection is marked as ready for the resume event * (cf. check_ready_for_resume_event()). The interrupt handler * of the VMBus driver and vmbus_chan_sched() can not run before * vmbus_bus_resume() has completed execution (cf. resume_noirq). */ - smp_store_mb( + virt_store_mb( vmbus_connection.channels[channel->offermsg.child_relid], channel); }
From: Kefeng Wang wangkefeng.wang@huawei.com
commit ffa0b64e3be58519ae472ea29a1a1ad681e32f48 upstream.
mpe: On 64-bit Book3E vmalloc space starts at 0x8000000000000000.
Because of the way __pa() works we have: __pa(0x8000000000000000) == 0, and therefore virt_to_pfn(0x8000000000000000) == 0, and therefore virt_addr_valid(0x8000000000000000) == true
Which is wrong, virt_addr_valid() should be false for vmalloc space. In fact all vmalloc addresses that alias with a valid PFN will return true from virt_addr_valid(). That can cause bugs with hardened usercopy as described below by Kefeng Wang:
When running ethtool eth0 on 64-bit Book3E, a BUG occurred:
usercopy: Kernel memory exposure attempt detected from SLUB object not in SLUB page?! (offset 0, size 1048)! kernel BUG at mm/usercopy.c:99 ... usercopy_abort+0x64/0xa0 (unreliable) __check_heap_object+0x168/0x190 __check_object_size+0x1a0/0x200 dev_ethtool+0x2494/0x2b20 dev_ioctl+0x5d0/0x770 sock_do_ioctl+0xf0/0x1d0 sock_ioctl+0x3ec/0x5a0 __se_sys_ioctl+0xf0/0x160 system_call_exception+0xfc/0x1f0 system_call_common+0xf8/0x200
The code shows below,
data = vzalloc(array_size(gstrings.len, ETH_GSTRING_LEN)); copy_to_user(useraddr, data, gstrings.len * ETH_GSTRING_LEN))
The data is alloced by vmalloc(), virt_addr_valid(ptr) will return true on 64-bit Book3E, which leads to the panic.
As commit 4dd7554a6456 ("powerpc/64: Add VIRTUAL_BUG_ON checks for __va and __pa addresses") does, make sure the virt addr above PAGE_OFFSET in the virt_addr_valid() for 64-bit, also add upper limit check to make sure the virt is below high_memory.
Meanwhile, for 32-bit PAGE_OFFSET is the virtual address of the start of lowmem, high_memory is the upper low virtual address, the check is suitable for 32-bit, this will fix the issue mentioned in commit 602946ec2f90 ("powerpc: Set max_mapnr correctly") too.
On 32-bit there is a similar problem with high memory, that was fixed in commit 602946ec2f90 ("powerpc: Set max_mapnr correctly"), but that commit breaks highmem and needs to be reverted.
We can't easily fix __pa(), we have code that relies on its current behaviour. So for now add extra checks to virt_addr_valid().
For 64-bit Book3S the extra checks are not necessary, the combination of virt_to_pfn() and pfn_valid() should yield the correct result, but they are harmless.
Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com Reviewed-by: Christophe Leroy christophe.leroy@csgroup.eu [mpe: Add additional change log detail] Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20220406145802.538416-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/include/asm/page.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/arch/powerpc/include/asm/page.h +++ b/arch/powerpc/include/asm/page.h @@ -132,7 +132,11 @@ static inline bool pfn_valid(unsigned lo #define virt_to_page(kaddr) pfn_to_page(virt_to_pfn(kaddr)) #define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT)
-#define virt_addr_valid(kaddr) pfn_valid(virt_to_pfn(kaddr)) +#define virt_addr_valid(vaddr) ({ \ + unsigned long _addr = (unsigned long)vaddr; \ + _addr >= PAGE_OFFSET && _addr < (unsigned long)high_memory && \ + pfn_valid(virt_to_pfn(_addr)); \ +})
/* * On Book-E parts we need __va to parse the device tree and we can't
From: Kefeng Wang wangkefeng.wang@huawei.com
commit 1ff5c8e8c835e8a81c0868e3050c76563dd56a2c upstream.
This reverts commit 602946ec2f90d5bd965857753880db29d2d9a1e9.
If CONFIG_HIGHMEM is enabled, no highmem will be added with max_mapnr set to max_low_pfn, see mem_init():
for (pfn = highmem_mapnr; pfn < max_mapnr; ++pfn) { ... free_highmem_page(); }
Now that virt_addr_valid() has been fixed in the previous commit, we can revert the change to max_mapnr.
Fixes: 602946ec2f90 ("powerpc: Set max_mapnr correctly") Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com Reviewed-by: Christophe Leroy christophe.leroy@csgroup.eu Reported-by: Erhard F. erhard_f@mailbox.org [mpe: Update change log to reflect series reordering] Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20220406145802.538416-2-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/mm/mem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/powerpc/mm/mem.c +++ b/arch/powerpc/mm/mem.c @@ -255,7 +255,7 @@ void __init mem_init(void) #endif
high_memory = (void *) __va(max_low_pfn * PAGE_SIZE); - set_max_mapnr(max_low_pfn); + set_max_mapnr(max_pfn);
kasan_late_init();
From: Vincent Mailhol mailhol.vincent@wanadoo.fr
commit 9ce02f0fc68326dd1f87a0a3a4c6ae7fdd39e6f6 upstream.
The macro __WARN_FLAGS() uses a local variable named "f". This being a common name, there is a risk of shadowing other variables.
For example, GCC would yield:
| In file included from ./include/linux/bug.h:5, | from ./include/linux/cpumask.h:14, | from ./arch/x86/include/asm/cpumask.h:5, | from ./arch/x86/include/asm/msr.h:11, | from ./arch/x86/include/asm/processor.h:22, | from ./arch/x86/include/asm/timex.h:5, | from ./include/linux/timex.h:65, | from ./include/linux/time32.h:13, | from ./include/linux/time.h:60, | from ./include/linux/stat.h:19, | from ./include/linux/module.h:13, | from virt/lib/irqbypass.mod.c:1: | ./include/linux/rcupdate.h: In function 'rcu_head_after_call_rcu': | ./arch/x86/include/asm/bug.h:80:21: warning: declaration of 'f' shadows a parameter [-Wshadow] | 80 | __auto_type f = BUGFLAG_WARNING|(flags); \ | | ^ | ./include/asm-generic/bug.h:106:17: note: in expansion of macro '__WARN_FLAGS' | 106 | __WARN_FLAGS(BUGFLAG_ONCE | \ | | ^~~~~~~~~~~~ | ./include/linux/rcupdate.h:1007:9: note: in expansion of macro 'WARN_ON_ONCE' | 1007 | WARN_ON_ONCE(func != (rcu_callback_t)~0L); | | ^~~~~~~~~~~~ | In file included from ./include/linux/rbtree.h:24, | from ./include/linux/mm_types.h:11, | from ./include/linux/buildid.h:5, | from ./include/linux/module.h:14, | from virt/lib/irqbypass.mod.c:1: | ./include/linux/rcupdate.h:1001:62: note: shadowed declaration is here | 1001 | rcu_head_after_call_rcu(struct rcu_head *rhp, rcu_callback_t f) | | ~~~~~~~~~~~~~~~^
For reference, sparse also warns about it, c.f. [1].
This patch renames the variable from f to __flags (with two underscore prefixes as suggested in the Linux kernel coding style [2]) in order to prevent collisions.
[1] https://lore.kernel.org/all/CAFGhKbyifH1a+nAMCvWM88TK6fpNPdzFtUXPmRGnnQeePV+...
[2] Linux kernel coding style, section 12) Macros, Enums and RTL, paragraph 5) namespace collisions when defining local variables in macros resembling functions https://www.kernel.org/doc/html/latest/process/coding-style.html#macros-enum...
Fixes: bfb1a7c91fb7 ("x86/bug: Merge annotate_reachable() into_BUG_FLAGS() asm") Signed-off-by: Vincent Mailhol mailhol.vincent@wanadoo.fr Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Nick Desaulniers ndesaulniers@google.com Acked-by: Josh Poimboeuf jpoimboe@redhat.com Link: https://lkml.kernel.org/r/20220324023742.106546-1-mailhol.vincent@wanadoo.fr Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/bug.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/bug.h +++ b/arch/x86/include/asm/bug.h @@ -77,9 +77,9 @@ do { \ */ #define __WARN_FLAGS(flags) \ do { \ - __auto_type f = BUGFLAG_WARNING|(flags); \ + __auto_type __flags = BUGFLAG_WARNING|(flags); \ instrumentation_begin(); \ - _BUG_FLAGS(ASM_UD2, f, ASM_REACHABLE); \ + _BUG_FLAGS(ASM_UD2, __flags, ASM_REACHABLE); \ instrumentation_end(); \ } while (0)
From: Peter Zijlstra peterz@infradead.org
commit 7a53f408902d913cd541b4f8ad7dbcd4961f5b82 upstream.
Since not all compilers have a function attribute to disable KCOV instrumentation, objtool can rewrite KCOV instrumentation in noinstr functions as per commit:
f56dae88a81f ("objtool: Handle __sanitize_cov*() tail calls")
However, this has subtle interaction with the SLS validation from commit:
1cc1e4c8aab4 ("objtool: Add straight-line-speculation validation")
In that when a tail-call instrucion is replaced with a RET an additional INT3 instruction is also written, but is not represented in the decoded instruction stream.
This then leads to false positive missing INT3 objtool warnings in noinstr code.
Instead of adding additional struct instruction objects, mark the RET instruction with retpoline_safe to suppress the warning (since we know there really is an INT3).
Fixes: 1cc1e4c8aab4 ("objtool: Add straight-line-speculation validation") Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Link: https://lkml.kernel.org/r/20220323230712.GA8939@worktop.programming.kicks-as... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/objtool/check.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
--- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -1090,6 +1090,17 @@ static void annotate_call_site(struct ob : arch_nop_insn(insn->len));
insn->type = sibling ? INSN_RETURN : INSN_NOP; + + if (sibling) { + /* + * We've replaced the tail-call JMP insn by two new + * insn: RET; INT3, except we only have a single struct + * insn here. Mark it retpoline_safe to avoid the SLS + * warning, instead of adding another insn. + */ + insn->retpoline_safe = true; + } + return; }
From: Peter Zijlstra peterz@infradead.org
commit 5b6547ed97f4f5dfc23f8e3970af6d11d7b7ed7e upstream.
Steve reported that ChromeOS encounters the forceidle balancer being ran from rt_mutex_setprio()'s balance_callback() invocation and explodes.
Now, the forceidle balancer gets queued every time the idle task gets selected, set_next_task(), which is strictly too often. rt_mutex_setprio() also uses set_next_task() in the 'change' pattern:
queued = task_on_rq_queued(p); /* p->on_rq == TASK_ON_RQ_QUEUED */ running = task_current(rq, p); /* rq->curr == p */
if (queued) dequeue_task(...); if (running) put_prev_task(...);
/* change task properties */
if (queued) enqueue_task(...); if (running) set_next_task(...);
However, rt_mutex_setprio() will explicitly not run this pattern on the idle task (since priority boosting the idle task is quite insane). Most other 'change' pattern users are pidhash based and would also not apply to idle.
Also, the change pattern doesn't contain a __balance_callback() invocation and hence we could have an out-of-band balance-callback, which *should* trigger the WARN in rq_pin_lock() (which guards against this exact anti-pattern).
So while none of that explains how this happens, it does indicate that having it in set_next_task() might not be the most robust option.
Instead, explicitly queue the forceidle balancer from pick_next_task() when it does indeed result in forceidle selection. Having it here, ensures it can only be triggered under the __schedule() rq->lock instance, and hence must be ran from that context.
This also happens to clean up the code a little, so win-win.
Fixes: d2dfa17bc7de ("sched: Trivial forced-newidle balancer") Reported-by: Steven Rostedt rostedt@goodmis.org Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Tested-by: T.J. Alumbaugh talumbau@chromium.org Link: https://lkml.kernel.org/r/20220330160535.GN8939@worktop.programming.kicks-as... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/sched/core.c | 14 ++++++++++---- kernel/sched/idle.c | 1 - kernel/sched/sched.h | 6 ------ 3 files changed, 10 insertions(+), 11 deletions(-)
--- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5675,6 +5675,8 @@ static inline struct task_struct *pick_t
extern void task_vruntime_update(struct rq *rq, struct task_struct *p, bool in_fi);
+static void queue_core_balance(struct rq *rq); + static struct task_struct * pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf) { @@ -5724,7 +5726,7 @@ pick_next_task(struct rq *rq, struct tas }
rq->core_pick = NULL; - return next; + goto out; }
put_prev_task_balance(rq, prev, rf); @@ -5774,7 +5776,7 @@ pick_next_task(struct rq *rq, struct tas */ WARN_ON_ONCE(fi_before); task_vruntime_update(rq, next, false); - goto done; + goto out_set_next; } }
@@ -5893,8 +5895,12 @@ pick_next_task(struct rq *rq, struct tas resched_curr(rq_i); }
-done: +out_set_next: set_next_task(rq, next); +out: + if (rq->core->core_forceidle_count && next == rq->idle) + queue_core_balance(rq); + return next; }
@@ -5989,7 +5995,7 @@ static void sched_core_balance(struct rq
static DEFINE_PER_CPU(struct callback_head, core_balance_head);
-void queue_core_balance(struct rq *rq) +static void queue_core_balance(struct rq *rq) { if (!sched_core_enabled(rq)) return; --- a/kernel/sched/idle.c +++ b/kernel/sched/idle.c @@ -437,7 +437,6 @@ static void set_next_task_idle(struct rq { update_idle_core(rq); schedstat_inc(rq->sched_goidle); - queue_core_balance(rq); }
#ifdef CONFIG_SMP --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1247,8 +1247,6 @@ static inline bool sched_group_cookie_ma return false; }
-extern void queue_core_balance(struct rq *rq); - static inline bool sched_core_enqueued(struct task_struct *p) { return !RB_EMPTY_NODE(&p->core_node); @@ -1282,10 +1280,6 @@ static inline raw_spinlock_t *__rq_lockp return &rq->__lock; }
-static inline void queue_core_balance(struct rq *rq) -{ -} - static inline bool sched_cpu_cookie_match(struct rq *rq, struct task_struct *p) { return true;
From: Sebastian Andrzej Siewior bigeasy@linutronix.de
commit 386ef214c3c6ab111d05e1790e79475363abaa05 upstream.
try_steal_cookie() looks at task_struct::cpus_mask to decide if the task could be moved to `this' CPU. It ignores that the task might be in a migration disabled section while not on the CPU. In this case the task must not be moved otherwise per-CPU assumption are broken.
Use is_cpu_allowed(), as suggested by Peter Zijlstra, to decide if the a task can be moved.
Fixes: d2dfa17bc7de6 ("sched: Trivial forced-newidle balancer") Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Link: https://lkml.kernel.org/r/YjNK9El+3fzGmswf@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/sched/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5929,7 +5929,7 @@ static bool try_steal_cookie(int this, i if (p == src->core_pick || p == src->curr) goto next;
- if (!cpumask_test_cpu(this, &p->cpus_mask)) + if (!is_cpu_allowed(p, this)) goto next;
if (p->core_occupation > dst->idle->core_occupation)
From: Peter Zijlstra peterz@infradead.org
commit 1cd5f059d956e6f614ba6666ecdbcf95db05d5f5 upstream.
Paolo reported that the instruction sequence that is used to replace:
call __static_call_return0
namely:
66 66 48 31 c0 data16 data16 xor %rax,%rax
decodes to something else on i386, namely:
66 66 48 data16 dec %ax 31 c0 xor %eax,%eax
Which is a nonsensical sequence that happens to have the same outcome. *However* an important distinction is that it consists of 2 instructions which is a problem when the thing needs to be overwriten with a regular call instruction again.
As such, replace the instruction with something that decodes the same on both i386 and x86_64.
Fixes: 3f2a8fc4b15d ("static_call/x86: Add __static_call_return0()") Reported-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Link: https://lkml.kernel.org/r/20220318204419.GT8939@worktop.programming.kicks-as... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/static_call.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
--- a/arch/x86/kernel/static_call.c +++ b/arch/x86/kernel/static_call.c @@ -12,10 +12,9 @@ enum insn_type { };
/* - * data16 data16 xorq %rax, %rax - a single 5 byte instruction that clears %rax - * The REX.W cancels the effect of any data16. + * cs cs cs xorl %eax, %eax - a single 5 byte instruction that clears %[er]ax */ -static const u8 xor5rax[] = { 0x66, 0x66, 0x48, 0x31, 0xc0 }; +static const u8 xor5rax[] = { 0x2e, 0x2e, 0x2e, 0x31, 0xc0 };
static const u8 retinsn[] = { RET_INSN_OPCODE, 0xcc, 0xcc, 0xcc, 0xcc };
From: Nick Desaulniers ndesaulniers@google.com
commit 334865b2915c33080624e0d06f1c3e917036472c upstream.
Bernardo reported an error that Nathan bisected down to (x86_64) defconfig+LTO_CLANG_FULL+X86_PMEM_LEGACY.
LTO vmlinux.o ld.lld: error: <instantiation>:1:13: redefinition of 'found' .set found, 0 ^
<inline asm>:29:1: while in macro instantiation extable_type_reg reg=%eax, type=(17 | ((0) << 16)) ^
This appears to be another LTO specific issue similar to what was folded into commit 4b5305decc84 ("x86/extable: Extend extable functionality"), where the `.set found, 0` in DEFINE_EXTABLE_TYPE_REG in arch/x86/include/asm/asm.h conflicts with the symbol for the static function `found` in arch/x86/kernel/pmem.c.
Assembler .set directive declare symbols with global visibility, so the assembler may not rename such symbols in the event of a conflict. LTO could rename static functions if there was a conflict in C sources, but it cannot see into symbols defined in inline asm.
The symbols are also retained in the symbol table, regardless of LTO.
Give the symbols .L prefixes making them locally visible, so that they may be renamed for LTO to avoid conflicts, and to drop them from the symbol table regardless of LTO.
Fixes: 4b5305decc84 ("x86/extable: Extend extable functionality") Reported-by: Bernardo Meurer Costa beme@google.com Debugged-by: Nathan Chancellor nathan@kernel.org Signed-off-by: Nick Desaulniers ndesaulniers@google.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Nathan Chancellor nathan@kernel.org Tested-by: Nathan Chancellor nathan@kernel.org Link: https://lore.kernel.org/r/20220329202148.2379697-1-ndesaulniers@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/asm.h | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-)
--- a/arch/x86/include/asm/asm.h +++ b/arch/x86/include/asm/asm.h @@ -154,24 +154,24 @@
# define DEFINE_EXTABLE_TYPE_REG \ ".macro extable_type_reg type:req reg:req\n" \ - ".set found, 0\n" \ - ".set regnr, 0\n" \ + ".set .Lfound, 0\n" \ + ".set .Lregnr, 0\n" \ ".irp rs,rax,rcx,rdx,rbx,rsp,rbp,rsi,rdi,r8,r9,r10,r11,r12,r13,r14,r15\n" \ ".ifc \reg, %%\rs\n" \ - ".set found, found+1\n" \ - ".long \type + (regnr << 8)\n" \ + ".set .Lfound, .Lfound+1\n" \ + ".long \type + (.Lregnr << 8)\n" \ ".endif\n" \ - ".set regnr, regnr+1\n" \ + ".set .Lregnr, .Lregnr+1\n" \ ".endr\n" \ - ".set regnr, 0\n" \ + ".set .Lregnr, 0\n" \ ".irp rs,eax,ecx,edx,ebx,esp,ebp,esi,edi,r8d,r9d,r10d,r11d,r12d,r13d,r14d,r15d\n" \ ".ifc \reg, %%\rs\n" \ - ".set found, found+1\n" \ - ".long \type + (regnr << 8)\n" \ + ".set .Lfound, .Lfound+1\n" \ + ".long \type + (.Lregnr << 8)\n" \ ".endif\n" \ - ".set regnr, regnr+1\n" \ + ".set .Lregnr, .Lregnr+1\n" \ ".endr\n" \ - ".if (found != 1)\n" \ + ".if (.Lfound != 1)\n" \ ".error "extable_type_reg: bad register argument"\n" \ ".endif\n" \ ".endm\n"
From: Marc Zyngier maz@kernel.org
commit af27e41612ec7e5b4783f589b753a7c31a37aac8 upstream.
The way KVM drives GICv4.{0,1} is as follows: - vcpu_load() makes the VPE resident, instructing the RD to start scanning for interrupts - just before entering the guest, we check that the RD has finished scanning and that we can start running the vcpu - on preemption, we deschedule the VPE by making it invalid on the RD
However, we are preemptible between the first two steps. If it so happens *and* that the RD was still scanning, we nonetheless write to the GICR_VPENDBASER register while Dirty is set, and bad things happen (we're in UNPRED land).
This affects both the 4.0 and 4.1 implementations.
Make sure Dirty is cleared before performing the deschedule, meaning that its_clear_vpend_valid() becomes a sort of full VPE residency barrier.
Reported-by: Jingyi Wang wangjingyi11@huawei.com Tested-by: Nianyao Tang tangnianyao@huawei.com Signed-off-by: Marc Zyngier maz@kernel.org Fixes: 57e3cebd022f ("KVM: arm64: Delay the polling of the GICR_VPENDBASER.Dirty bit") Link: https://lore.kernel.org/r/4aae10ba-b39a-5f84-754b-69c2eb0a2c03@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/irqchip/irq-gic-v3-its.c | 28 +++++++++++++++++++--------- 1 file changed, 19 insertions(+), 9 deletions(-)
--- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -3011,18 +3011,12 @@ static int __init allocate_lpi_tables(vo return 0; }
-static u64 its_clear_vpend_valid(void __iomem *vlpi_base, u64 clr, u64 set) +static u64 read_vpend_dirty_clear(void __iomem *vlpi_base) { u32 count = 1000000; /* 1s! */ bool clean; u64 val;
- val = gicr_read_vpendbaser(vlpi_base + GICR_VPENDBASER); - val &= ~GICR_VPENDBASER_Valid; - val &= ~clr; - val |= set; - gicr_write_vpendbaser(val, vlpi_base + GICR_VPENDBASER); - do { val = gicr_read_vpendbaser(vlpi_base + GICR_VPENDBASER); clean = !(val & GICR_VPENDBASER_Dirty); @@ -3033,10 +3027,26 @@ static u64 its_clear_vpend_valid(void __ } } while (!clean && count);
- if (unlikely(val & GICR_VPENDBASER_Dirty)) { + if (unlikely(!clean)) pr_err_ratelimited("ITS virtual pending table not cleaning\n"); + + return val; +} + +static u64 its_clear_vpend_valid(void __iomem *vlpi_base, u64 clr, u64 set) +{ + u64 val; + + /* Make sure we wait until the RD is done with the initial scan */ + val = read_vpend_dirty_clear(vlpi_base); + val &= ~GICR_VPENDBASER_Valid; + val &= ~clr; + val |= set; + gicr_write_vpendbaser(val, vlpi_base + GICR_VPENDBASER); + + val = read_vpend_dirty_clear(vlpi_base); + if (unlikely(val & GICR_VPENDBASER_Dirty)) val |= GICR_VPENDBASER_PendingLast; - }
return val; }
From: Christophe Leroy christophe.leroy@csgroup.eu
commit af41d2866f7d75bbb38d487f6ec7770425d70e45 upstream.
Using conditional branches between two files is hasardous, they may get linked too far from each other.
arch/powerpc/kvm/book3s_64_entry.o:(.text+0x3ec): relocation truncated to fit: R_PPC64_REL14 (stub) against symbol `system_reset_common' defined in .text section in arch/powerpc/kernel/head_64.o
Reorganise the code to use non conditional branches.
Fixes: 89d35b239101 ("KVM: PPC: Book3S HV P9: Implement the rest of the P9 path in C") Signed-off-by: Christophe Leroy christophe.leroy@csgroup.eu [mpe: Avoid odd-looking bne ., use named local labels] Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/89cf27bf43ee07a0b2879b9e8e2f5cd6386a3645.164836633... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/kvm/book3s_64_entry.S | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
--- a/arch/powerpc/kvm/book3s_64_entry.S +++ b/arch/powerpc/kvm/book3s_64_entry.S @@ -414,10 +414,16 @@ END_FTR_SECTION_IFSET(CPU_FTR_DAWR1) */ ld r10,HSTATE_SCRATCH0(r13) cmpwi r10,BOOK3S_INTERRUPT_MACHINE_CHECK - beq machine_check_common + beq .Lcall_machine_check_common
cmpwi r10,BOOK3S_INTERRUPT_SYSTEM_RESET - beq system_reset_common + beq .Lcall_system_reset_common
b . + +.Lcall_machine_check_common: + b machine_check_common + +.Lcall_system_reset_common: + b system_reset_common #endif
From: Andre Przywara andre.przywara@arm.com
commit 544808f7e21cb9ccdb8f3aa7de594c05b1419061 upstream.
At the moment the GIC IRQ domain translation routine happily converts ACPI table GSI numbers below 16 to GIC SGIs (Software Generated Interrupts aka IPIs). On the Devicetree side we explicitly forbid this translation, actually the function will never return HWIRQs below 16 when using a DT based domain translation.
We expect SGIs to be handled in the first part of the function, and any further occurrence should be treated as a firmware bug, so add a check and print to report this explicitly and avoid lengthy debug sessions.
Fixes: 64b499d8df40 ("irqchip/gic-v3: Configure SGIs as standard interrupts") Signed-off-by: Andre Przywara andre.przywara@arm.com Signed-off-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20220404110842.2882446-1-andre.przywara@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/irqchip/irq-gic-v3.c | 6 ++++++ drivers/irqchip/irq-gic.c | 6 ++++++ 2 files changed, 12 insertions(+)
--- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -1466,6 +1466,12 @@ static int gic_irq_domain_translate(stru if(fwspec->param_count != 2) return -EINVAL;
+ if (fwspec->param[0] < 16) { + pr_err(FW_BUG "Illegal GSI%d translation request\n", + fwspec->param[0]); + return -EINVAL; + } + *hwirq = fwspec->param[0]; *type = fwspec->param[1];
--- a/drivers/irqchip/irq-gic.c +++ b/drivers/irqchip/irq-gic.c @@ -1085,6 +1085,12 @@ static int gic_irq_domain_translate(stru if(fwspec->param_count != 2) return -EINVAL;
+ if (fwspec->param[0] < 16) { + pr_err(FW_BUG "Illegal GSI%d translation request\n", + fwspec->param[0]); + return -EINVAL; + } + *hwirq = fwspec->param[0]; *type = fwspec->param[1];
From: Waiman Long longman@redhat.com
commit a431dbbc540532b7465eae4fc8b56a85a9fc7d17 upstream.
The gcc 12 compiler reports a "'mem_section' will never be NULL" warning on the following code:
static inline struct mem_section *__nr_to_section(unsigned long nr) { #ifdef CONFIG_SPARSEMEM_EXTREME if (!mem_section) return NULL; #endif if (!mem_section[SECTION_NR_TO_ROOT(nr)]) return NULL; :
It happens with CONFIG_SPARSEMEM_EXTREME off. The mem_section definition is
#ifdef CONFIG_SPARSEMEM_EXTREME extern struct mem_section **mem_section; #else extern struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT]; #endif
In the !CONFIG_SPARSEMEM_EXTREME case, mem_section is a static 2-dimensional array and so the check "!mem_section[SECTION_NR_TO_ROOT(nr)]" doesn't make sense.
Fix this warning by moving the "!mem_section[SECTION_NR_TO_ROOT(nr)]" check up inside the CONFIG_SPARSEMEM_EXTREME block and adding an explicit NR_SECTION_ROOTS check to make sure that there is no out-of-bound array access.
Link: https://lkml.kernel.org/r/20220331180246.2746210-1-longman@redhat.com Fixes: 3e347261a80b ("sparsemem extreme implementation") Signed-off-by: Waiman Long longman@redhat.com Reported-by: Justin Forbes jforbes@redhat.com Cc: "Kirill A . Shutemov" kirill.shutemov@linux.intel.com Cc: Ingo Molnar mingo@kernel.org Cc: Rafael Aquini aquini@redhat.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/mmzone.h | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
--- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1389,13 +1389,16 @@ static inline unsigned long *section_to_
static inline struct mem_section *__nr_to_section(unsigned long nr) { + unsigned long root = SECTION_NR_TO_ROOT(nr); + + if (unlikely(root >= NR_SECTION_ROOTS)) + return NULL; + #ifdef CONFIG_SPARSEMEM_EXTREME - if (!mem_section) + if (!mem_section || !mem_section[root]) return NULL; #endif - if (!mem_section[SECTION_NR_TO_ROOT(nr)]) - return NULL; - return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK]; + return &mem_section[root][nr & SECTION_ROOT_MASK]; } extern size_t mem_section_usage_size(void);
From: Christophe Leroy christophe.leroy@csgroup.eu
commit 8fd4ddda2f49a66bf5dd3d0c01966c4b1971308b upstream.
System.map shows that vmlinux contains several instances of __static_call_return0():
c0004fc0 t __static_call_return0 c0011518 t __static_call_return0 c00d8160 t __static_call_return0
arch_static_call_transform() uses the middle one to check whether we are setting a call to __static_call_return0 or not:
c0011520 <arch_static_call_transform>: c0011520: 3d 20 c0 01 lis r9,-16383 <== r9 = 0xc001 << 16 c0011524: 39 29 15 18 addi r9,r9,5400 <== r9 += 0x1518 c0011528: 7c 05 48 00 cmpw r5,r9 <== r9 has value 0xc0011518 here
So if static_call_update() is called with one of the other instances of __static_call_return0(), arch_static_call_transform() won't recognise it.
In order to work properly, global single instance of __static_call_return0() is required.
Fixes: 3f2a8fc4b15d ("static_call/x86: Add __static_call_return0()") Signed-off-by: Christophe Leroy christophe.leroy@csgroup.eu Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Acked-by: Josh Poimboeuf jpoimboe@redhat.com Link: https://lkml.kernel.org/r/30821468a0e7d28251954b578e5051dc09300d04.164725849... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/static_call.h | 5 kernel/Makefile | 3 kernel/static_call.c | 542 ------------------------------------------- kernel/static_call_inline.c | 543 ++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 547 insertions(+), 546 deletions(-) create mode 100644 kernel/static_call_inline.c
--- a/include/linux/static_call.h +++ b/include/linux/static_call.h @@ -248,10 +248,7 @@ static inline int static_call_text_reser return 0; }
-static inline long __static_call_return0(void) -{ - return 0; -} +extern long __static_call_return0(void);
#define EXPORT_STATIC_CALL(name) \ EXPORT_SYMBOL(STATIC_CALL_KEY(name)); \ --- a/kernel/Makefile +++ b/kernel/Makefile @@ -113,7 +113,8 @@ obj-$(CONFIG_CPU_PM) += cpu_pm.o obj-$(CONFIG_BPF) += bpf/ obj-$(CONFIG_KCSAN) += kcsan/ obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o -obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o +obj-$(CONFIG_HAVE_STATIC_CALL) += static_call.o +obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call_inline.o obj-$(CONFIG_CFI_CLANG) += cfi.o
obj-$(CONFIG_PERF_EVENTS) += events/ --- a/kernel/static_call.c +++ b/kernel/static_call.c @@ -1,548 +1,8 @@ // SPDX-License-Identifier: GPL-2.0 -#include <linux/init.h> #include <linux/static_call.h> -#include <linux/bug.h> -#include <linux/smp.h> -#include <linux/sort.h> -#include <linux/slab.h> -#include <linux/module.h> -#include <linux/cpu.h> -#include <linux/processor.h> -#include <asm/sections.h> - -extern struct static_call_site __start_static_call_sites[], - __stop_static_call_sites[]; -extern struct static_call_tramp_key __start_static_call_tramp_key[], - __stop_static_call_tramp_key[]; - -static bool static_call_initialized; - -/* mutex to protect key modules/sites */ -static DEFINE_MUTEX(static_call_mutex); - -static void static_call_lock(void) -{ - mutex_lock(&static_call_mutex); -} - -static void static_call_unlock(void) -{ - mutex_unlock(&static_call_mutex); -} - -static inline void *static_call_addr(struct static_call_site *site) -{ - return (void *)((long)site->addr + (long)&site->addr); -} - -static inline unsigned long __static_call_key(const struct static_call_site *site) -{ - return (long)site->key + (long)&site->key; -} - -static inline struct static_call_key *static_call_key(const struct static_call_site *site) -{ - return (void *)(__static_call_key(site) & ~STATIC_CALL_SITE_FLAGS); -} - -/* These assume the key is word-aligned. */ -static inline bool static_call_is_init(struct static_call_site *site) -{ - return __static_call_key(site) & STATIC_CALL_SITE_INIT; -} - -static inline bool static_call_is_tail(struct static_call_site *site) -{ - return __static_call_key(site) & STATIC_CALL_SITE_TAIL; -} - -static inline void static_call_set_init(struct static_call_site *site) -{ - site->key = (__static_call_key(site) | STATIC_CALL_SITE_INIT) - - (long)&site->key; -} - -static int static_call_site_cmp(const void *_a, const void *_b) -{ - const struct static_call_site *a = _a; - const struct static_call_site *b = _b; - const struct static_call_key *key_a = static_call_key(a); - const struct static_call_key *key_b = static_call_key(b); - - if (key_a < key_b) - return -1; - - if (key_a > key_b) - return 1; - - return 0; -} - -static void static_call_site_swap(void *_a, void *_b, int size) -{ - long delta = (unsigned long)_a - (unsigned long)_b; - struct static_call_site *a = _a; - struct static_call_site *b = _b; - struct static_call_site tmp = *a; - - a->addr = b->addr - delta; - a->key = b->key - delta; - - b->addr = tmp.addr + delta; - b->key = tmp.key + delta; -} - -static inline void static_call_sort_entries(struct static_call_site *start, - struct static_call_site *stop) -{ - sort(start, stop - start, sizeof(struct static_call_site), - static_call_site_cmp, static_call_site_swap); -} - -static inline bool static_call_key_has_mods(struct static_call_key *key) -{ - return !(key->type & 1); -} - -static inline struct static_call_mod *static_call_key_next(struct static_call_key *key) -{ - if (!static_call_key_has_mods(key)) - return NULL; - - return key->mods; -} - -static inline struct static_call_site *static_call_key_sites(struct static_call_key *key) -{ - if (static_call_key_has_mods(key)) - return NULL; - - return (struct static_call_site *)(key->type & ~1); -} - -void __static_call_update(struct static_call_key *key, void *tramp, void *func) -{ - struct static_call_site *site, *stop; - struct static_call_mod *site_mod, first; - - cpus_read_lock(); - static_call_lock(); - - if (key->func == func) - goto done; - - key->func = func; - - arch_static_call_transform(NULL, tramp, func, false); - - /* - * If uninitialized, we'll not update the callsites, but they still - * point to the trampoline and we just patched that. - */ - if (WARN_ON_ONCE(!static_call_initialized)) - goto done; - - first = (struct static_call_mod){ - .next = static_call_key_next(key), - .mod = NULL, - .sites = static_call_key_sites(key), - }; - - for (site_mod = &first; site_mod; site_mod = site_mod->next) { - bool init = system_state < SYSTEM_RUNNING; - struct module *mod = site_mod->mod; - - if (!site_mod->sites) { - /* - * This can happen if the static call key is defined in - * a module which doesn't use it. - * - * It also happens in the has_mods case, where the - * 'first' entry has no sites associated with it. - */ - continue; - } - - stop = __stop_static_call_sites; - - if (mod) { -#ifdef CONFIG_MODULES - stop = mod->static_call_sites + - mod->num_static_call_sites; - init = mod->state == MODULE_STATE_COMING; -#endif - } - - for (site = site_mod->sites; - site < stop && static_call_key(site) == key; site++) { - void *site_addr = static_call_addr(site); - - if (!init && static_call_is_init(site)) - continue; - - if (!kernel_text_address((unsigned long)site_addr)) { - /* - * This skips patching built-in __exit, which - * is part of init_section_contains() but is - * not part of kernel_text_address(). - * - * Skipping built-in __exit is fine since it - * will never be executed. - */ - WARN_ONCE(!static_call_is_init(site), - "can't patch static call site at %pS", - site_addr); - continue; - } - - arch_static_call_transform(site_addr, NULL, func, - static_call_is_tail(site)); - } - } - -done: - static_call_unlock(); - cpus_read_unlock(); -} -EXPORT_SYMBOL_GPL(__static_call_update); - -static int __static_call_init(struct module *mod, - struct static_call_site *start, - struct static_call_site *stop) -{ - struct static_call_site *site; - struct static_call_key *key, *prev_key = NULL; - struct static_call_mod *site_mod; - - if (start == stop) - return 0; - - static_call_sort_entries(start, stop); - - for (site = start; site < stop; site++) { - void *site_addr = static_call_addr(site); - - if ((mod && within_module_init((unsigned long)site_addr, mod)) || - (!mod && init_section_contains(site_addr, 1))) - static_call_set_init(site); - - key = static_call_key(site); - if (key != prev_key) { - prev_key = key; - - /* - * For vmlinux (!mod) avoid the allocation by storing - * the sites pointer in the key itself. Also see - * __static_call_update()'s @first. - * - * This allows architectures (eg. x86) to call - * static_call_init() before memory allocation works. - */ - if (!mod) { - key->sites = site; - key->type |= 1; - goto do_transform; - } - - site_mod = kzalloc(sizeof(*site_mod), GFP_KERNEL); - if (!site_mod) - return -ENOMEM; - - /* - * When the key has a direct sites pointer, extract - * that into an explicit struct static_call_mod, so we - * can have a list of modules. - */ - if (static_call_key_sites(key)) { - site_mod->mod = NULL; - site_mod->next = NULL; - site_mod->sites = static_call_key_sites(key); - - key->mods = site_mod; - - site_mod = kzalloc(sizeof(*site_mod), GFP_KERNEL); - if (!site_mod) - return -ENOMEM; - } - - site_mod->mod = mod; - site_mod->sites = site; - site_mod->next = static_call_key_next(key); - key->mods = site_mod; - } - -do_transform: - arch_static_call_transform(site_addr, NULL, key->func, - static_call_is_tail(site)); - } - - return 0; -} - -static int addr_conflict(struct static_call_site *site, void *start, void *end) -{ - unsigned long addr = (unsigned long)static_call_addr(site); - - if (addr <= (unsigned long)end && - addr + CALL_INSN_SIZE > (unsigned long)start) - return 1; - - return 0; -} - -static int __static_call_text_reserved(struct static_call_site *iter_start, - struct static_call_site *iter_stop, - void *start, void *end, bool init) -{ - struct static_call_site *iter = iter_start; - - while (iter < iter_stop) { - if (init || !static_call_is_init(iter)) { - if (addr_conflict(iter, start, end)) - return 1; - } - iter++; - } - - return 0; -} - -#ifdef CONFIG_MODULES - -static int __static_call_mod_text_reserved(void *start, void *end) -{ - struct module *mod; - int ret; - - preempt_disable(); - mod = __module_text_address((unsigned long)start); - WARN_ON_ONCE(__module_text_address((unsigned long)end) != mod); - if (!try_module_get(mod)) - mod = NULL; - preempt_enable(); - - if (!mod) - return 0; - - ret = __static_call_text_reserved(mod->static_call_sites, - mod->static_call_sites + mod->num_static_call_sites, - start, end, mod->state == MODULE_STATE_COMING); - - module_put(mod); - - return ret; -} - -static unsigned long tramp_key_lookup(unsigned long addr) -{ - struct static_call_tramp_key *start = __start_static_call_tramp_key; - struct static_call_tramp_key *stop = __stop_static_call_tramp_key; - struct static_call_tramp_key *tramp_key; - - for (tramp_key = start; tramp_key != stop; tramp_key++) { - unsigned long tramp; - - tramp = (long)tramp_key->tramp + (long)&tramp_key->tramp; - if (tramp == addr) - return (long)tramp_key->key + (long)&tramp_key->key; - } - - return 0; -} - -static int static_call_add_module(struct module *mod) -{ - struct static_call_site *start = mod->static_call_sites; - struct static_call_site *stop = start + mod->num_static_call_sites; - struct static_call_site *site; - - for (site = start; site != stop; site++) { - unsigned long s_key = __static_call_key(site); - unsigned long addr = s_key & ~STATIC_CALL_SITE_FLAGS; - unsigned long key; - - /* - * Is the key is exported, 'addr' points to the key, which - * means modules are allowed to call static_call_update() on - * it. - * - * Otherwise, the key isn't exported, and 'addr' points to the - * trampoline so we need to lookup the key. - * - * We go through this dance to prevent crazy modules from - * abusing sensitive static calls. - */ - if (!kernel_text_address(addr)) - continue; - - key = tramp_key_lookup(addr); - if (!key) { - pr_warn("Failed to fixup __raw_static_call() usage at: %ps\n", - static_call_addr(site)); - return -EINVAL; - } - - key |= s_key & STATIC_CALL_SITE_FLAGS; - site->key = key - (long)&site->key; - } - - return __static_call_init(mod, start, stop); -} - -static void static_call_del_module(struct module *mod) -{ - struct static_call_site *start = mod->static_call_sites; - struct static_call_site *stop = mod->static_call_sites + - mod->num_static_call_sites; - struct static_call_key *key, *prev_key = NULL; - struct static_call_mod *site_mod, **prev; - struct static_call_site *site; - - for (site = start; site < stop; site++) { - key = static_call_key(site); - if (key == prev_key) - continue; - - prev_key = key; - - for (prev = &key->mods, site_mod = key->mods; - site_mod && site_mod->mod != mod; - prev = &site_mod->next, site_mod = site_mod->next) - ; - - if (!site_mod) - continue; - - *prev = site_mod->next; - kfree(site_mod); - } -} - -static int static_call_module_notify(struct notifier_block *nb, - unsigned long val, void *data) -{ - struct module *mod = data; - int ret = 0; - - cpus_read_lock(); - static_call_lock(); - - switch (val) { - case MODULE_STATE_COMING: - ret = static_call_add_module(mod); - if (ret) { - WARN(1, "Failed to allocate memory for static calls"); - static_call_del_module(mod); - } - break; - case MODULE_STATE_GOING: - static_call_del_module(mod); - break; - } - - static_call_unlock(); - cpus_read_unlock(); - - return notifier_from_errno(ret); -} - -static struct notifier_block static_call_module_nb = { - .notifier_call = static_call_module_notify, -}; - -#else - -static inline int __static_call_mod_text_reserved(void *start, void *end) -{ - return 0; -} - -#endif /* CONFIG_MODULES */ - -int static_call_text_reserved(void *start, void *end) -{ - bool init = system_state < SYSTEM_RUNNING; - int ret = __static_call_text_reserved(__start_static_call_sites, - __stop_static_call_sites, start, end, init); - - if (ret) - return ret; - - return __static_call_mod_text_reserved(start, end); -} - -int __init static_call_init(void) -{ - int ret; - - if (static_call_initialized) - return 0; - - cpus_read_lock(); - static_call_lock(); - ret = __static_call_init(NULL, __start_static_call_sites, - __stop_static_call_sites); - static_call_unlock(); - cpus_read_unlock(); - - if (ret) { - pr_err("Failed to allocate memory for static_call!\n"); - BUG(); - } - - static_call_initialized = true; - -#ifdef CONFIG_MODULES - register_module_notifier(&static_call_module_nb); -#endif - return 0; -} -early_initcall(static_call_init);
long __static_call_return0(void) { return 0; } - -#ifdef CONFIG_STATIC_CALL_SELFTEST - -static int func_a(int x) -{ - return x+1; -} - -static int func_b(int x) -{ - return x+2; -} - -DEFINE_STATIC_CALL(sc_selftest, func_a); - -static struct static_call_data { - int (*func)(int); - int val; - int expect; -} static_call_data [] __initdata = { - { NULL, 2, 3 }, - { func_b, 2, 4 }, - { func_a, 2, 3 } -}; - -static int __init test_static_call_init(void) -{ - int i; - - for (i = 0; i < ARRAY_SIZE(static_call_data); i++ ) { - struct static_call_data *scd = &static_call_data[i]; - - if (scd->func) - static_call_update(sc_selftest, scd->func); - - WARN_ON(static_call(sc_selftest)(scd->val) != scd->expect); - } - - return 0; -} -early_initcall(test_static_call_init); - -#endif /* CONFIG_STATIC_CALL_SELFTEST */ +EXPORT_SYMBOL_GPL(__static_call_return0); --- /dev/null +++ b/kernel/static_call_inline.c @@ -0,0 +1,543 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <linux/init.h> +#include <linux/static_call.h> +#include <linux/bug.h> +#include <linux/smp.h> +#include <linux/sort.h> +#include <linux/slab.h> +#include <linux/module.h> +#include <linux/cpu.h> +#include <linux/processor.h> +#include <asm/sections.h> + +extern struct static_call_site __start_static_call_sites[], + __stop_static_call_sites[]; +extern struct static_call_tramp_key __start_static_call_tramp_key[], + __stop_static_call_tramp_key[]; + +static bool static_call_initialized; + +/* mutex to protect key modules/sites */ +static DEFINE_MUTEX(static_call_mutex); + +static void static_call_lock(void) +{ + mutex_lock(&static_call_mutex); +} + +static void static_call_unlock(void) +{ + mutex_unlock(&static_call_mutex); +} + +static inline void *static_call_addr(struct static_call_site *site) +{ + return (void *)((long)site->addr + (long)&site->addr); +} + +static inline unsigned long __static_call_key(const struct static_call_site *site) +{ + return (long)site->key + (long)&site->key; +} + +static inline struct static_call_key *static_call_key(const struct static_call_site *site) +{ + return (void *)(__static_call_key(site) & ~STATIC_CALL_SITE_FLAGS); +} + +/* These assume the key is word-aligned. */ +static inline bool static_call_is_init(struct static_call_site *site) +{ + return __static_call_key(site) & STATIC_CALL_SITE_INIT; +} + +static inline bool static_call_is_tail(struct static_call_site *site) +{ + return __static_call_key(site) & STATIC_CALL_SITE_TAIL; +} + +static inline void static_call_set_init(struct static_call_site *site) +{ + site->key = (__static_call_key(site) | STATIC_CALL_SITE_INIT) - + (long)&site->key; +} + +static int static_call_site_cmp(const void *_a, const void *_b) +{ + const struct static_call_site *a = _a; + const struct static_call_site *b = _b; + const struct static_call_key *key_a = static_call_key(a); + const struct static_call_key *key_b = static_call_key(b); + + if (key_a < key_b) + return -1; + + if (key_a > key_b) + return 1; + + return 0; +} + +static void static_call_site_swap(void *_a, void *_b, int size) +{ + long delta = (unsigned long)_a - (unsigned long)_b; + struct static_call_site *a = _a; + struct static_call_site *b = _b; + struct static_call_site tmp = *a; + + a->addr = b->addr - delta; + a->key = b->key - delta; + + b->addr = tmp.addr + delta; + b->key = tmp.key + delta; +} + +static inline void static_call_sort_entries(struct static_call_site *start, + struct static_call_site *stop) +{ + sort(start, stop - start, sizeof(struct static_call_site), + static_call_site_cmp, static_call_site_swap); +} + +static inline bool static_call_key_has_mods(struct static_call_key *key) +{ + return !(key->type & 1); +} + +static inline struct static_call_mod *static_call_key_next(struct static_call_key *key) +{ + if (!static_call_key_has_mods(key)) + return NULL; + + return key->mods; +} + +static inline struct static_call_site *static_call_key_sites(struct static_call_key *key) +{ + if (static_call_key_has_mods(key)) + return NULL; + + return (struct static_call_site *)(key->type & ~1); +} + +void __static_call_update(struct static_call_key *key, void *tramp, void *func) +{ + struct static_call_site *site, *stop; + struct static_call_mod *site_mod, first; + + cpus_read_lock(); + static_call_lock(); + + if (key->func == func) + goto done; + + key->func = func; + + arch_static_call_transform(NULL, tramp, func, false); + + /* + * If uninitialized, we'll not update the callsites, but they still + * point to the trampoline and we just patched that. + */ + if (WARN_ON_ONCE(!static_call_initialized)) + goto done; + + first = (struct static_call_mod){ + .next = static_call_key_next(key), + .mod = NULL, + .sites = static_call_key_sites(key), + }; + + for (site_mod = &first; site_mod; site_mod = site_mod->next) { + bool init = system_state < SYSTEM_RUNNING; + struct module *mod = site_mod->mod; + + if (!site_mod->sites) { + /* + * This can happen if the static call key is defined in + * a module which doesn't use it. + * + * It also happens in the has_mods case, where the + * 'first' entry has no sites associated with it. + */ + continue; + } + + stop = __stop_static_call_sites; + + if (mod) { +#ifdef CONFIG_MODULES + stop = mod->static_call_sites + + mod->num_static_call_sites; + init = mod->state == MODULE_STATE_COMING; +#endif + } + + for (site = site_mod->sites; + site < stop && static_call_key(site) == key; site++) { + void *site_addr = static_call_addr(site); + + if (!init && static_call_is_init(site)) + continue; + + if (!kernel_text_address((unsigned long)site_addr)) { + /* + * This skips patching built-in __exit, which + * is part of init_section_contains() but is + * not part of kernel_text_address(). + * + * Skipping built-in __exit is fine since it + * will never be executed. + */ + WARN_ONCE(!static_call_is_init(site), + "can't patch static call site at %pS", + site_addr); + continue; + } + + arch_static_call_transform(site_addr, NULL, func, + static_call_is_tail(site)); + } + } + +done: + static_call_unlock(); + cpus_read_unlock(); +} +EXPORT_SYMBOL_GPL(__static_call_update); + +static int __static_call_init(struct module *mod, + struct static_call_site *start, + struct static_call_site *stop) +{ + struct static_call_site *site; + struct static_call_key *key, *prev_key = NULL; + struct static_call_mod *site_mod; + + if (start == stop) + return 0; + + static_call_sort_entries(start, stop); + + for (site = start; site < stop; site++) { + void *site_addr = static_call_addr(site); + + if ((mod && within_module_init((unsigned long)site_addr, mod)) || + (!mod && init_section_contains(site_addr, 1))) + static_call_set_init(site); + + key = static_call_key(site); + if (key != prev_key) { + prev_key = key; + + /* + * For vmlinux (!mod) avoid the allocation by storing + * the sites pointer in the key itself. Also see + * __static_call_update()'s @first. + * + * This allows architectures (eg. x86) to call + * static_call_init() before memory allocation works. + */ + if (!mod) { + key->sites = site; + key->type |= 1; + goto do_transform; + } + + site_mod = kzalloc(sizeof(*site_mod), GFP_KERNEL); + if (!site_mod) + return -ENOMEM; + + /* + * When the key has a direct sites pointer, extract + * that into an explicit struct static_call_mod, so we + * can have a list of modules. + */ + if (static_call_key_sites(key)) { + site_mod->mod = NULL; + site_mod->next = NULL; + site_mod->sites = static_call_key_sites(key); + + key->mods = site_mod; + + site_mod = kzalloc(sizeof(*site_mod), GFP_KERNEL); + if (!site_mod) + return -ENOMEM; + } + + site_mod->mod = mod; + site_mod->sites = site; + site_mod->next = static_call_key_next(key); + key->mods = site_mod; + } + +do_transform: + arch_static_call_transform(site_addr, NULL, key->func, + static_call_is_tail(site)); + } + + return 0; +} + +static int addr_conflict(struct static_call_site *site, void *start, void *end) +{ + unsigned long addr = (unsigned long)static_call_addr(site); + + if (addr <= (unsigned long)end && + addr + CALL_INSN_SIZE > (unsigned long)start) + return 1; + + return 0; +} + +static int __static_call_text_reserved(struct static_call_site *iter_start, + struct static_call_site *iter_stop, + void *start, void *end, bool init) +{ + struct static_call_site *iter = iter_start; + + while (iter < iter_stop) { + if (init || !static_call_is_init(iter)) { + if (addr_conflict(iter, start, end)) + return 1; + } + iter++; + } + + return 0; +} + +#ifdef CONFIG_MODULES + +static int __static_call_mod_text_reserved(void *start, void *end) +{ + struct module *mod; + int ret; + + preempt_disable(); + mod = __module_text_address((unsigned long)start); + WARN_ON_ONCE(__module_text_address((unsigned long)end) != mod); + if (!try_module_get(mod)) + mod = NULL; + preempt_enable(); + + if (!mod) + return 0; + + ret = __static_call_text_reserved(mod->static_call_sites, + mod->static_call_sites + mod->num_static_call_sites, + start, end, mod->state == MODULE_STATE_COMING); + + module_put(mod); + + return ret; +} + +static unsigned long tramp_key_lookup(unsigned long addr) +{ + struct static_call_tramp_key *start = __start_static_call_tramp_key; + struct static_call_tramp_key *stop = __stop_static_call_tramp_key; + struct static_call_tramp_key *tramp_key; + + for (tramp_key = start; tramp_key != stop; tramp_key++) { + unsigned long tramp; + + tramp = (long)tramp_key->tramp + (long)&tramp_key->tramp; + if (tramp == addr) + return (long)tramp_key->key + (long)&tramp_key->key; + } + + return 0; +} + +static int static_call_add_module(struct module *mod) +{ + struct static_call_site *start = mod->static_call_sites; + struct static_call_site *stop = start + mod->num_static_call_sites; + struct static_call_site *site; + + for (site = start; site != stop; site++) { + unsigned long s_key = __static_call_key(site); + unsigned long addr = s_key & ~STATIC_CALL_SITE_FLAGS; + unsigned long key; + + /* + * Is the key is exported, 'addr' points to the key, which + * means modules are allowed to call static_call_update() on + * it. + * + * Otherwise, the key isn't exported, and 'addr' points to the + * trampoline so we need to lookup the key. + * + * We go through this dance to prevent crazy modules from + * abusing sensitive static calls. + */ + if (!kernel_text_address(addr)) + continue; + + key = tramp_key_lookup(addr); + if (!key) { + pr_warn("Failed to fixup __raw_static_call() usage at: %ps\n", + static_call_addr(site)); + return -EINVAL; + } + + key |= s_key & STATIC_CALL_SITE_FLAGS; + site->key = key - (long)&site->key; + } + + return __static_call_init(mod, start, stop); +} + +static void static_call_del_module(struct module *mod) +{ + struct static_call_site *start = mod->static_call_sites; + struct static_call_site *stop = mod->static_call_sites + + mod->num_static_call_sites; + struct static_call_key *key, *prev_key = NULL; + struct static_call_mod *site_mod, **prev; + struct static_call_site *site; + + for (site = start; site < stop; site++) { + key = static_call_key(site); + if (key == prev_key) + continue; + + prev_key = key; + + for (prev = &key->mods, site_mod = key->mods; + site_mod && site_mod->mod != mod; + prev = &site_mod->next, site_mod = site_mod->next) + ; + + if (!site_mod) + continue; + + *prev = site_mod->next; + kfree(site_mod); + } +} + +static int static_call_module_notify(struct notifier_block *nb, + unsigned long val, void *data) +{ + struct module *mod = data; + int ret = 0; + + cpus_read_lock(); + static_call_lock(); + + switch (val) { + case MODULE_STATE_COMING: + ret = static_call_add_module(mod); + if (ret) { + WARN(1, "Failed to allocate memory for static calls"); + static_call_del_module(mod); + } + break; + case MODULE_STATE_GOING: + static_call_del_module(mod); + break; + } + + static_call_unlock(); + cpus_read_unlock(); + + return notifier_from_errno(ret); +} + +static struct notifier_block static_call_module_nb = { + .notifier_call = static_call_module_notify, +}; + +#else + +static inline int __static_call_mod_text_reserved(void *start, void *end) +{ + return 0; +} + +#endif /* CONFIG_MODULES */ + +int static_call_text_reserved(void *start, void *end) +{ + bool init = system_state < SYSTEM_RUNNING; + int ret = __static_call_text_reserved(__start_static_call_sites, + __stop_static_call_sites, start, end, init); + + if (ret) + return ret; + + return __static_call_mod_text_reserved(start, end); +} + +int __init static_call_init(void) +{ + int ret; + + if (static_call_initialized) + return 0; + + cpus_read_lock(); + static_call_lock(); + ret = __static_call_init(NULL, __start_static_call_sites, + __stop_static_call_sites); + static_call_unlock(); + cpus_read_unlock(); + + if (ret) { + pr_err("Failed to allocate memory for static_call!\n"); + BUG(); + } + + static_call_initialized = true; + +#ifdef CONFIG_MODULES + register_module_notifier(&static_call_module_nb); +#endif + return 0; +} +early_initcall(static_call_init); + +#ifdef CONFIG_STATIC_CALL_SELFTEST + +static int func_a(int x) +{ + return x+1; +} + +static int func_b(int x) +{ + return x+2; +} + +DEFINE_STATIC_CALL(sc_selftest, func_a); + +static struct static_call_data { + int (*func)(int); + int val; + int expect; +} static_call_data [] __initdata = { + { NULL, 2, 3 }, + { func_b, 2, 4 }, + { func_a, 2, 3 } +}; + +static int __init test_static_call_init(void) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(static_call_data); i++ ) { + struct static_call_data *scd = &static_call_data[i]; + + if (scd->func) + static_call_update(sc_selftest, scd->func); + + WARN_ON(static_call(sc_selftest)(scd->val) != scd->expect); + } + + return 0; +} +early_initcall(test_static_call_init); + +#endif /* CONFIG_STATIC_CALL_SELFTEST */
From: Jens Axboe axboe@kernel.dk
commit 584b0180f0f4d67d7145950fe68c625f06c88b10 upstream.
In preparation for not necessarily having a file assigned at prep time, defer any initialization associated with the file to when the opcode handler is run.
Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/io_uring.c | 119 ++++++++++++++++++++++++++++++---------------------------- 1 file changed, 62 insertions(+), 57 deletions(-)
--- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -560,7 +560,8 @@ struct io_rw { /* NOTE: kiocb has the file as the first member, so don't do it here */ struct kiocb kiocb; u64 addr; - u64 len; + u32 len; + u32 flags; };
struct io_connect { @@ -2984,50 +2985,11 @@ static inline bool io_file_supports_nowa
static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe) { - struct io_ring_ctx *ctx = req->ctx; struct kiocb *kiocb = &req->rw.kiocb; - struct file *file = req->file; unsigned ioprio; int ret;
- if (!io_req_ffs_set(req)) - req->flags |= io_file_get_flags(file) << REQ_F_SUPPORT_NOWAIT_BIT; - kiocb->ki_pos = READ_ONCE(sqe->off); - if (kiocb->ki_pos == -1) { - if (!(file->f_mode & FMODE_STREAM)) { - req->flags |= REQ_F_CUR_POS; - kiocb->ki_pos = file->f_pos; - } else { - kiocb->ki_pos = 0; - } - } - kiocb->ki_flags = iocb_flags(file); - ret = kiocb_set_rw_flags(kiocb, READ_ONCE(sqe->rw_flags)); - if (unlikely(ret)) - return ret; - - /* - * If the file is marked O_NONBLOCK, still allow retry for it if it - * supports async. Otherwise it's impossible to use O_NONBLOCK files - * reliably. If not, or it IOCB_NOWAIT is set, don't retry. - */ - if ((kiocb->ki_flags & IOCB_NOWAIT) || - ((file->f_flags & O_NONBLOCK) && !io_file_supports_nowait(req))) - req->flags |= REQ_F_NOWAIT; - - if (ctx->flags & IORING_SETUP_IOPOLL) { - if (!(kiocb->ki_flags & IOCB_DIRECT) || !file->f_op->iopoll) - return -EOPNOTSUPP; - - kiocb->ki_flags |= IOCB_HIPRI | IOCB_ALLOC_CACHE; - kiocb->ki_complete = io_complete_rw_iopoll; - req->iopoll_completed = 0; - } else { - if (kiocb->ki_flags & IOCB_HIPRI) - return -EINVAL; - kiocb->ki_complete = io_complete_rw; - }
ioprio = READ_ONCE(sqe->ioprio); if (ioprio) { @@ -3043,6 +3005,7 @@ static int io_prep_rw(struct io_kiocb *r req->imu = NULL; req->rw.addr = READ_ONCE(sqe->addr); req->rw.len = READ_ONCE(sqe->len); + req->rw.flags = READ_ONCE(sqe->rw_flags); req->buf_index = READ_ONCE(sqe->buf_index); return 0; } @@ -3523,13 +3486,6 @@ static inline int io_rw_prep_async(struc return 0; }
-static int io_read_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) -{ - if (unlikely(!(req->file->f_mode & FMODE_READ))) - return -EBADF; - return io_prep_rw(req, sqe); -} - /* * This is our waitqueue callback handler, registered through __folio_lock_async() * when we initially tried to do the IO with the iocb armed our waitqueue. @@ -3617,6 +3573,58 @@ static bool need_read_all(struct io_kioc S_ISBLK(file_inode(req->file)->i_mode); }
+static int io_rw_init_file(struct io_kiocb *req, fmode_t mode) +{ + struct kiocb *kiocb = &req->rw.kiocb; + struct io_ring_ctx *ctx = req->ctx; + struct file *file = req->file; + int ret; + + if (unlikely(!file || !(file->f_mode & mode))) + return -EBADF; + + if (!io_req_ffs_set(req)) + req->flags |= io_file_get_flags(file) << REQ_F_SUPPORT_NOWAIT_BIT; + + if (kiocb->ki_pos == -1) { + if (!(file->f_mode & FMODE_STREAM)) { + req->flags |= REQ_F_CUR_POS; + kiocb->ki_pos = file->f_pos; + } else { + kiocb->ki_pos = 0; + } + } + + kiocb->ki_flags = iocb_flags(file); + ret = kiocb_set_rw_flags(kiocb, req->rw.flags); + if (unlikely(ret)) + return ret; + + /* + * If the file is marked O_NONBLOCK, still allow retry for it if it + * supports async. Otherwise it's impossible to use O_NONBLOCK files + * reliably. If not, or it IOCB_NOWAIT is set, don't retry. + */ + if ((kiocb->ki_flags & IOCB_NOWAIT) || + ((file->f_flags & O_NONBLOCK) && !io_file_supports_nowait(req))) + req->flags |= REQ_F_NOWAIT; + + if (ctx->flags & IORING_SETUP_IOPOLL) { + if (!(kiocb->ki_flags & IOCB_DIRECT) || !file->f_op->iopoll) + return -EOPNOTSUPP; + + kiocb->ki_flags |= IOCB_HIPRI | IOCB_ALLOC_CACHE; + kiocb->ki_complete = io_complete_rw_iopoll; + req->iopoll_completed = 0; + } else { + if (kiocb->ki_flags & IOCB_HIPRI) + return -EINVAL; + kiocb->ki_complete = io_complete_rw; + } + + return 0; +} + static int io_read(struct io_kiocb *req, unsigned int issue_flags) { struct io_rw_state __s, *s = &__s; @@ -3641,6 +3649,9 @@ static int io_read(struct io_kiocb *req, iov_iter_restore(&s->iter, &s->iter_state); iovec = NULL; } + ret = io_rw_init_file(req, FMODE_READ); + if (unlikely(ret)) + return ret; req->result = iov_iter_count(&s->iter);
if (force_nonblock) { @@ -3739,14 +3750,6 @@ out_free: return 0; }
-static int io_write_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) -{ - if (unlikely(!(req->file->f_mode & FMODE_WRITE))) - return -EBADF; - req->rw.kiocb.ki_hint = ki_hint_validate(file_write_hint(req->file)); - return io_prep_rw(req, sqe); -} - static int io_write(struct io_kiocb *req, unsigned int issue_flags) { struct io_rw_state __s, *s = &__s; @@ -3766,6 +3769,9 @@ static int io_write(struct io_kiocb *req iov_iter_restore(&s->iter, &s->iter_state); iovec = NULL; } + ret = io_rw_init_file(req, FMODE_WRITE); + if (unlikely(ret)) + return ret; req->result = iov_iter_count(&s->iter);
if (force_nonblock) { @@ -6501,11 +6507,10 @@ static int io_req_prep(struct io_kiocb * case IORING_OP_READV: case IORING_OP_READ_FIXED: case IORING_OP_READ: - return io_read_prep(req, sqe); case IORING_OP_WRITEV: case IORING_OP_WRITE_FIXED: case IORING_OP_WRITE: - return io_write_prep(req, sqe); + return io_prep_rw(req, sqe); case IORING_OP_POLL_ADD: return io_poll_add_prep(req, sqe); case IORING_OP_POLL_REMOVE:
From: Jens Axboe axboe@kernel.dk
commit 5106dd6e74ab6c94daac1c357094f11e6934b36f upstream.
We'll need this in a future patch, when we could be assigning the file after the prep stage. While at it, get rid of the io_file_get() helper, it just makes the code harder to read.
Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/io_uring.c | 82 +++++++++++++++++++++++++++++++++------------------------- 1 file changed, 47 insertions(+), 35 deletions(-)
--- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -1128,8 +1128,9 @@ static int __io_register_rsrc_update(str struct io_uring_rsrc_update2 *up, unsigned nr_args); static void io_clean_op(struct io_kiocb *req); -static struct file *io_file_get(struct io_ring_ctx *ctx, - struct io_kiocb *req, int fd, bool fixed); +static inline struct file *io_file_get_fixed(struct io_kiocb *req, int fd, + unsigned issue_flags); +static inline struct file *io_file_get_normal(struct io_kiocb *req, int fd); static void __io_queue_sqe(struct io_kiocb *req); static void io_rsrc_put_work(struct work_struct *work);
@@ -1258,13 +1259,20 @@ static void io_rsrc_refs_refill(struct i }
static inline void io_req_set_rsrc_node(struct io_kiocb *req, - struct io_ring_ctx *ctx) + struct io_ring_ctx *ctx, + unsigned int issue_flags) { if (!req->fixed_rsrc_refs) { req->fixed_rsrc_refs = &ctx->rsrc_node->refs; - ctx->rsrc_cached_refs--; - if (unlikely(ctx->rsrc_cached_refs < 0)) - io_rsrc_refs_refill(ctx); + + if (!(issue_flags & IO_URING_F_UNLOCKED)) { + lockdep_assert_held(&ctx->uring_lock); + ctx->rsrc_cached_refs--; + if (unlikely(ctx->rsrc_cached_refs < 0)) + io_rsrc_refs_refill(ctx); + } else { + percpu_ref_get(req->fixed_rsrc_refs); + } } }
@@ -3122,7 +3130,8 @@ static int __io_import_fixed(struct io_k return 0; }
-static int io_import_fixed(struct io_kiocb *req, int rw, struct iov_iter *iter) +static int io_import_fixed(struct io_kiocb *req, int rw, struct iov_iter *iter, + unsigned int issue_flags) { struct io_mapped_ubuf *imu = req->imu; u16 index, buf_index = req->buf_index; @@ -3132,7 +3141,7 @@ static int io_import_fixed(struct io_kio
if (unlikely(buf_index >= ctx->nr_user_bufs)) return -EFAULT; - io_req_set_rsrc_node(req, ctx); + io_req_set_rsrc_node(req, ctx, issue_flags); index = array_index_nospec(buf_index, ctx->nr_user_bufs); imu = READ_ONCE(ctx->user_bufs[index]); req->imu = imu; @@ -3288,7 +3297,7 @@ static struct iovec *__io_import_iovec(i ssize_t ret;
if (opcode == IORING_OP_READ_FIXED || opcode == IORING_OP_WRITE_FIXED) { - ret = io_import_fixed(req, rw, iter); + ret = io_import_fixed(req, rw, iter, issue_flags); if (ret) return ERR_PTR(ret); return NULL; @@ -4167,8 +4176,10 @@ static int io_tee(struct io_kiocb *req, if (issue_flags & IO_URING_F_NONBLOCK) return -EAGAIN;
- in = io_file_get(req->ctx, req, sp->splice_fd_in, - (sp->flags & SPLICE_F_FD_IN_FIXED)); + if (sp->flags & SPLICE_F_FD_IN_FIXED) + in = io_file_get_fixed(req, sp->splice_fd_in, IO_URING_F_UNLOCKED); + else + in = io_file_get_normal(req, sp->splice_fd_in); if (!in) { ret = -EBADF; goto done; @@ -4207,8 +4218,10 @@ static int io_splice(struct io_kiocb *re if (issue_flags & IO_URING_F_NONBLOCK) return -EAGAIN;
- in = io_file_get(req->ctx, req, sp->splice_fd_in, - (sp->flags & SPLICE_F_FD_IN_FIXED)); + if (sp->flags & SPLICE_F_FD_IN_FIXED) + in = io_file_get_fixed(req, sp->splice_fd_in, IO_URING_F_UNLOCKED); + else + in = io_file_get_normal(req, sp->splice_fd_in); if (!in) { ret = -EBADF; goto done; @@ -5513,7 +5526,7 @@ static void io_poll_remove_entries(struc * either spurious wakeup or multishot CQE is served. 0 when it's done with * the request, then the mask is stored in req->result. */ -static int io_poll_check_events(struct io_kiocb *req) +static int io_poll_check_events(struct io_kiocb *req, bool locked) { struct io_ring_ctx *ctx = req->ctx; struct io_poll_iocb *poll = io_poll_get_single(req); @@ -5569,7 +5582,7 @@ static void io_poll_task_func(struct io_ struct io_ring_ctx *ctx = req->ctx; int ret;
- ret = io_poll_check_events(req); + ret = io_poll_check_events(req, *locked); if (ret > 0) return;
@@ -5594,7 +5607,7 @@ static void io_apoll_task_func(struct io struct io_ring_ctx *ctx = req->ctx; int ret;
- ret = io_poll_check_events(req); + ret = io_poll_check_events(req, *locked); if (ret > 0) return;
@@ -6962,30 +6975,36 @@ static void io_fixed_file_set(struct io_ file_slot->file_ptr = file_ptr; }
-static inline struct file *io_file_get_fixed(struct io_ring_ctx *ctx, - struct io_kiocb *req, int fd) +static inline struct file *io_file_get_fixed(struct io_kiocb *req, int fd, + unsigned int issue_flags) { - struct file *file; + struct io_ring_ctx *ctx = req->ctx; + struct file *file = NULL; unsigned long file_ptr;
+ if (issue_flags & IO_URING_F_UNLOCKED) + mutex_lock(&ctx->uring_lock); + if (unlikely((unsigned int)fd >= ctx->nr_user_files)) - return NULL; + goto out; fd = array_index_nospec(fd, ctx->nr_user_files); file_ptr = io_fixed_file_slot(&ctx->file_table, fd)->file_ptr; file = (struct file *) (file_ptr & FFS_MASK); file_ptr &= ~FFS_MASK; /* mask in overlapping REQ_F and FFS bits */ req->flags |= (file_ptr << REQ_F_SUPPORT_NOWAIT_BIT); - io_req_set_rsrc_node(req, ctx); + io_req_set_rsrc_node(req, ctx, 0); +out: + if (issue_flags & IO_URING_F_UNLOCKED) + mutex_unlock(&ctx->uring_lock); return file; }
-static struct file *io_file_get_normal(struct io_ring_ctx *ctx, - struct io_kiocb *req, int fd) +static struct file *io_file_get_normal(struct io_kiocb *req, int fd) { struct file *file = fget(fd);
- trace_io_uring_file_get(ctx, fd); + trace_io_uring_file_get(req->ctx, fd);
/* we don't allow fixed io_uring files */ if (file && unlikely(file->f_op == &io_uring_fops)) @@ -6993,15 +7012,6 @@ static struct file *io_file_get_normal(s return file; }
-static inline struct file *io_file_get(struct io_ring_ctx *ctx, - struct io_kiocb *req, int fd, bool fixed) -{ - if (fixed) - return io_file_get_fixed(ctx, req, fd); - else - return io_file_get_normal(ctx, req, fd); -} - static void io_req_task_link_timeout(struct io_kiocb *req, bool *locked) { struct io_kiocb *prev = req->timeout.prev; @@ -7249,8 +7259,10 @@ static int io_init_req(struct io_ring_ct blk_start_plug_nr_ios(&state->plug, state->submit_nr); }
- req->file = io_file_get(ctx, req, READ_ONCE(sqe->fd), - (sqe_flags & IOSQE_FIXED_FILE)); + if (req->flags & REQ_F_FIXED_FILE) + req->file = io_file_get_fixed(req, READ_ONCE(sqe->fd), 0); + else + req->file = io_file_get_normal(req, READ_ONCE(sqe->fd)); if (unlikely(!req->file)) return -EBADF; }
From: Jens Axboe axboe@kernel.dk
commit 6bf9c47a398911e0ab920e362115153596c80432 upstream.
If an application uses direct open or accept, it knows in advance what direct descriptor value it will get as it picks it itself. This allows combined requests such as:
sqe = io_uring_get_sqe(ring); io_uring_prep_openat_direct(sqe, ..., file_slot); sqe->flags |= IOSQE_IO_LINK | IOSQE_CQE_SKIP_SUCCESS;
sqe = io_uring_get_sqe(ring); io_uring_prep_read(sqe,file_slot, buf, buf_size, 0); sqe->flags |= IOSQE_FIXED_FILE;
io_uring_submit(ring);
where we prepare both a file open and read, and only get a completion event for the read when both have completed successfully.
Currently links are fully prepared before the head is issued, but that fails if the dependent link needs a file assigned that isn't valid until the head has completed.
Conversely, if the same chain is performed but the fixed file slot is already valid, then we would be unexpectedly returning data from the old file slot rather than the newly opened one. Make sure we're consistent here.
Allow deferral of file setup, which makes this documented case work.
Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/io-wq.h | 1 + fs/io_uring.c | 39 +++++++++++++++++++++++++++++---------- 2 files changed, 30 insertions(+), 10 deletions(-)
--- a/fs/io-wq.h +++ b/fs/io-wq.h @@ -155,6 +155,7 @@ struct io_wq_work_node *wq_stack_extract struct io_wq_work { struct io_wq_work_node list; unsigned flags; + int fd; };
static inline struct io_wq_work *wq_next_work(struct io_wq_work *work) --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -6745,6 +6745,23 @@ static void io_clean_op(struct io_kiocb req->flags &= ~IO_REQ_CLEAN_FLAGS; }
+static bool io_assign_file(struct io_kiocb *req, unsigned int issue_flags) +{ + if (req->file || !io_op_defs[req->opcode].needs_file) + return true; + + if (req->flags & REQ_F_FIXED_FILE) + req->file = io_file_get_fixed(req, req->work.fd, issue_flags); + else + req->file = io_file_get_normal(req, req->work.fd); + if (req->file) + return true; + + req_set_fail(req); + req->result = -EBADF; + return false; +} + static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags) { const struct cred *creds = NULL; @@ -6755,6 +6772,8 @@ static int io_issue_sqe(struct io_kiocb
if (!io_op_defs[req->opcode].audit_skip) audit_uring_entry(req->opcode); + if (unlikely(!io_assign_file(req, issue_flags))) + return -EBADF;
switch (req->opcode) { case IORING_OP_NOP: @@ -6896,10 +6915,11 @@ static struct io_wq_work *io_wq_free_wor static void io_wq_submit_work(struct io_wq_work *work) { struct io_kiocb *req = container_of(work, struct io_kiocb, work); + const struct io_op_def *def = &io_op_defs[req->opcode]; unsigned int issue_flags = IO_URING_F_UNLOCKED; bool needs_poll = false; struct io_kiocb *timeout; - int ret = 0; + int ret = 0, err = -ECANCELED;
/* one will be dropped by ->io_free_work() after returning to io-wq */ if (!(req->flags & REQ_F_REFCOUNT)) @@ -6911,14 +6931,18 @@ static void io_wq_submit_work(struct io_ if (timeout) io_queue_linked_timeout(timeout);
+ if (!io_assign_file(req, issue_flags)) { + err = -EBADF; + work->flags |= IO_WQ_WORK_CANCEL; + } + /* either cancelled or io-wq is dying, so don't touch tctx->iowq */ if (work->flags & IO_WQ_WORK_CANCEL) { - io_req_task_queue_fail(req, -ECANCELED); + io_req_task_queue_fail(req, err); return; }
if (req->flags & REQ_F_FORCE_ASYNC) { - const struct io_op_def *def = &io_op_defs[req->opcode]; bool opcode_poll = def->pollin || def->pollout;
if (opcode_poll && file_can_poll(req->file)) { @@ -7249,6 +7273,8 @@ static int io_init_req(struct io_ring_ct if (io_op_defs[opcode].needs_file) { struct io_submit_state *state = &ctx->submit_state;
+ req->work.fd = READ_ONCE(sqe->fd); + /* * Plug now if we have more than 2 IO left after this, and the * target is potentially a read/write to block based storage. @@ -7258,13 +7284,6 @@ static int io_init_req(struct io_ring_ct state->need_plug = false; blk_start_plug_nr_ios(&state->plug, state->submit_nr); } - - if (req->flags & REQ_F_FIXED_FILE) - req->file = io_file_get_fixed(req, READ_ONCE(sqe->fd), 0); - else - req->file = io_file_get_normal(req, READ_ONCE(sqe->fd)); - if (unlikely(!req->file)) - return -EBADF; }
personality = READ_ONCE(sqe->personality);
From: Jens Axboe axboe@kernel.dk
commit d5361233e9ab920e135819f73dd8466355f1fddd upstream.
io_uring tracks requests that are referencing an io_uring descriptor to be able to cancel without worrying about loops in the references. Since we now assign the file at execution time, the easier approach is to drop a potentially problematic reference before we punt the request. This eliminates the need to special case these types of files beyond just marking them as such, and simplifies cancelation quite a bit.
This also fixes a recent issue where an async punted tee operation would with the io_uring descriptor as the output file would crash when attempting to get a reference to the file from the io-wq worker. We could have worked around that, but this is the much cleaner fix.
Fixes: 6bf9c47a3989 ("io_uring: defer file assignment") Reported-by: syzbot+c4b9303500a21750b250@syzkaller.appspotmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/io_uring.c | 85 ++++++++++++++++++---------------------------------------- 1 file changed, 27 insertions(+), 58 deletions(-)
--- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -112,8 +112,7 @@ IOSQE_IO_DRAIN | IOSQE_CQE_SKIP_SUCCESS)
#define IO_REQ_CLEAN_FLAGS (REQ_F_BUFFER_SELECTED | REQ_F_NEED_CLEANUP | \ - REQ_F_POLLED | REQ_F_INFLIGHT | REQ_F_CREDS | \ - REQ_F_ASYNC_DATA) + REQ_F_POLLED | REQ_F_CREDS | REQ_F_ASYNC_DATA)
#define IO_TCTX_REFS_CACHE_NR (1U << 10)
@@ -469,7 +468,6 @@ struct io_uring_task { const struct io_ring_ctx *last; struct io_wq *io_wq; struct percpu_counter inflight; - atomic_t inflight_tracked; atomic_t in_idle;
spinlock_t task_lock; @@ -1131,6 +1129,8 @@ static void io_clean_op(struct io_kiocb static inline struct file *io_file_get_fixed(struct io_kiocb *req, int fd, unsigned issue_flags); static inline struct file *io_file_get_normal(struct io_kiocb *req, int fd); +static void io_drop_inflight_file(struct io_kiocb *req); +static bool io_assign_file(struct io_kiocb *req, unsigned int issue_flags); static void __io_queue_sqe(struct io_kiocb *req); static void io_rsrc_put_work(struct work_struct *work);
@@ -1312,29 +1312,9 @@ static bool io_match_task(struct io_kioc bool cancel_all) __must_hold(&req->ctx->timeout_lock) { - struct io_kiocb *req; - if (task && head->task != task) return false; - if (cancel_all) - return true; - - io_for_each_link(req, head) { - if (req->flags & REQ_F_INFLIGHT) - return true; - } - return false; -} - -static bool io_match_linked(struct io_kiocb *head) -{ - struct io_kiocb *req; - - io_for_each_link(req, head) { - if (req->flags & REQ_F_INFLIGHT) - return true; - } - return false; + return cancel_all; }
/* @@ -1344,24 +1324,9 @@ static bool io_match_linked(struct io_ki static bool io_match_task_safe(struct io_kiocb *head, struct task_struct *task, bool cancel_all) { - bool matched; - if (task && head->task != task) return false; - if (cancel_all) - return true; - - if (head->flags & REQ_F_LINK_TIMEOUT) { - struct io_ring_ctx *ctx = head->ctx; - - /* protect against races with linked timeouts */ - spin_lock_irq(&ctx->timeout_lock); - matched = io_match_linked(head); - spin_unlock_irq(&ctx->timeout_lock); - } else { - matched = io_match_linked(head); - } - return matched; + return cancel_all; }
static inline bool req_has_async_data(struct io_kiocb *req) @@ -1509,14 +1474,6 @@ static inline bool io_req_ffs_set(struct return req->flags & REQ_F_FIXED_FILE; }
-static inline void io_req_track_inflight(struct io_kiocb *req) -{ - if (!(req->flags & REQ_F_INFLIGHT)) { - req->flags |= REQ_F_INFLIGHT; - atomic_inc(¤t->io_uring->inflight_tracked); - } -} - static struct io_kiocb *__io_prep_linked_timeout(struct io_kiocb *req) { if (WARN_ON_ONCE(!req->link)) @@ -2380,6 +2337,8 @@ static void io_req_task_work_add(struct
WARN_ON_ONCE(!tctx);
+ io_drop_inflight_file(req); + spin_lock_irqsave(&tctx->task_lock, flags); if (priority) wq_list_add_tail(&req->io_task_work.node, &tctx->prior_task_list); @@ -5548,7 +5507,10 @@ static int io_poll_check_events(struct i if (!req->result) { struct poll_table_struct pt = { ._key = poll->events };
- req->result = vfs_poll(req->file, &pt) & poll->events; + if (unlikely(!io_assign_file(req, IO_URING_F_UNLOCKED))) + req->result = -EBADF; + else + req->result = vfs_poll(req->file, &pt) & poll->events; }
/* multishot, just fill an CQE and proceed */ @@ -6731,11 +6693,6 @@ static void io_clean_op(struct io_kiocb kfree(req->apoll); req->apoll = NULL; } - if (req->flags & REQ_F_INFLIGHT) { - struct io_uring_task *tctx = req->task->io_uring; - - atomic_dec(&tctx->inflight_tracked); - } if (req->flags & REQ_F_CREDS) put_cred(req->creds); if (req->flags & REQ_F_ASYNC_DATA) { @@ -7024,6 +6981,19 @@ out: return file; }
+/* + * Drop the file for requeue operations. Only used of req->file is the + * io_uring descriptor itself. + */ +static void io_drop_inflight_file(struct io_kiocb *req) +{ + if (unlikely(req->flags & REQ_F_INFLIGHT)) { + fput(req->file); + req->file = NULL; + req->flags &= ~REQ_F_INFLIGHT; + } +} + static struct file *io_file_get_normal(struct io_kiocb *req, int fd) { struct file *file = fget(fd); @@ -7031,8 +7001,8 @@ static struct file *io_file_get_normal(s trace_io_uring_file_get(req->ctx, fd);
/* we don't allow fixed io_uring files */ - if (file && unlikely(file->f_op == &io_uring_fops)) - io_req_track_inflight(req); + if (file && file->f_op == &io_uring_fops) + req->flags |= REQ_F_INFLIGHT; return file; }
@@ -8804,7 +8774,6 @@ static __cold int io_uring_alloc_task_co xa_init(&tctx->xa); init_waitqueue_head(&tctx->wait); atomic_set(&tctx->in_idle, 0); - atomic_set(&tctx->inflight_tracked, 0); task->io_uring = tctx; spin_lock_init(&tctx->task_lock); INIT_WQ_LIST(&tctx->task_list); @@ -9942,7 +9911,7 @@ static __cold void io_uring_clean_tctx(s static s64 tctx_inflight(struct io_uring_task *tctx, bool tracked) { if (tracked) - return atomic_read(&tctx->inflight_tracked); + return 0; return percpu_counter_sum(&tctx->inflight); }
On 4/11/22 11:26 PM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.17.3 release. There are 343 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 14 Apr 2022 06:28:59 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.17.3-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.17.y and the diffstat can be found below.
thanks,
greg k-h
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos re@w6rz.net
On Tue, Apr 12, 2022 at 08:26:58AM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.17.3 release. There are 343 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 14 Apr 2022 06:28:59 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.17.3-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.17.y and the diffstat can be found below.
thanks,
greg k-h
Tested rc1 against the Fedora build system (aarch64, armv7, ppc64le, s390x, x86_64), and boot tested x86_64. No regressions noted.
Tested-by: Justin M. Forbes jforbes@fedoraproject.org
On 4/12/22 12:26 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.17.3 release. There are 343 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 14 Apr 2022 06:28:59 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.17.3-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.17.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
On Tue, 12 Apr 2022 08:26:58 +0200, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.17.3 release. There are 343 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 14 Apr 2022 06:28:59 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.17.3-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.17.y and the diffstat can be found below.
thanks,
greg k-h
5.17.3-rc1 Successfully Compiled and booted on my Raspberry PI 4b (8g) (bcm2711)
Tested-by: Fox Chen foxhlchen@gmail.com
On Tue, Apr 12, 2022 at 8:49 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.17.3 release. There are 343 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 14 Apr 2022 06:28:59 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.17.3-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.17.y and the diffstat can be found below.
thanks,
greg k-h
Hi Greg,
Compiled and booted on my test system Lenovo P50s: Intel Core i7 No emergency and critical messages in the dmesg
./perf bench sched all # Running sched/messaging benchmark... # 20 sender and receiver processes per group # 10 groups == 400 processes run
Total time: 0.447 [sec]
# Running sched/pipe benchmark... # Executed 1000000 pipe operations between two processes
Total time: 11.259 [sec]
11.259176 usecs/op 88816 ops/sec
Tested-by: Zan Aziz zanaziz313@gmail.com
Thanks -Zan
On Tue, Apr 12, 2022 at 08:26:58AM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.17.3 release. There are 343 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 14 Apr 2022 06:28:59 +0000. Anything received after that time might be too late.
Build results: total: 155 pass: 155 fail: 0 Qemu test results: total: 488 pass: 488 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
On Tue, 12 Apr 2022 at 12:38, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.17.3 release. There are 343 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 14 Apr 2022 06:28:59 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.17.3-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.17.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build * kernel: 5.17.3-rc1 * git: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git * git branch: linux-5.17.y * git commit: 66349d151ef98c411fbfe080a4e9c646dc41eca8 * git describe: v5.17.2-344-g66349d151ef9 * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.17.y/build/v5.17....
## Test Regressions (compared to v5.17.2-340-ge5a51d774e89) No test regressions found.
## Metric Regressions (compared to v5.17.2-340-ge5a51d774e89) No metric regressions found.
## Test Fixes (compared to v5.17.2-340-ge5a51d774e89) No test fixes found.
## Metric Fixes (compared to v5.17.2-340-ge5a51d774e89) No metric fixes found.
## Test result summary total: 100152, pass: 85962, fail: 732, skip: 12550, xfail: 908
## Build Summary * arc: 10 total, 10 passed, 0 failed * arm: 296 total, 293 passed, 3 failed * arm64: 47 total, 46 passed, 1 failed * dragonboard-410c: 1 total, 1 passed, 0 failed * hi6220-hikey: 1 total, 1 passed, 0 failed * i386: 45 total, 41 passed, 4 failed * juno-r2: 1 total, 1 passed, 0 failed * mips: 41 total, 38 passed, 3 failed * parisc: 14 total, 14 passed, 0 failed * powerpc: 65 total, 56 passed, 9 failed * riscv: 32 total, 26 passed, 6 failed * s390: 26 total, 23 passed, 3 failed * sh: 26 total, 24 passed, 2 failed * sparc: 14 total, 14 passed, 0 failed * x15: 1 total, 1 passed, 0 failed * x86: 1 total, 1 passed, 0 failed * x86_64: 47 total, 46 passed, 1 failed
## Test suites summary * fwts * igt-gpu-tools * kselftest- * kselftest-android * kselftest-arm64 * kselftest-bpf * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-drivers * kselftest-efivarfs * kselftest-filesystems * kselftest-firmware * kselftest-fpu * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-kvm * kselftest-lib * kselftest-livepatch * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-vm * kselftest-x86 * kselftest-zram * kunit * kvm-unit-tests * libgpiod * libhugetlbfs * linux-log-parser * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-controllers-tests * ltp-cpuhotplug-tests * ltp-crypto-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-open-posix-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-tracing-tests * network-basic-tests * packetdrill * perf * perf/Zstd-perf.data-compression * rcutorture * ssuite * v4l2-compliance * vdso
-- Linaro LKFT https://lkft.linaro.org
On 12. 04. 22, 8:26, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.17.3 release. There are 343 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 14 Apr 2022 06:28:59 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.17.3-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.17.y and the diffstat can be found below.
openSUSE configs¹⁾ all green.
Tested-by: Jiri Slaby jirislaby@kernel.org
¹⁾ armv6hl armv7hl arm64 i386 ppc64 ppc64le riscv64 s390x x86_64
On Tue, Apr 12, 2022 at 08:26:58AM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.17.3 release. There are 343 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 14 Apr 2022 06:28:59 +0000. Anything received after that time might be too late.
Hi Greg,
5.17.3-rc1 tested.
Run tested on: - Allwinner H6 (Tanix TX6) - Intel Tiger Lake x86_64 (nuc11 i7-1165G7)
In addition - build tested for: - Allwinner A64 - Allwinner H3 - Allwinner H5 - NXP iMX6 - NXP iMX8 - Qualcomm Dragonboard - Rockchip RK3288 - Rockchip RK3328 - Rockchip RK3399pro - Samsung Exynos
Tested-by: Rudi Heitbaum rudi@heitbaum.com -- Rudi
On 4/11/2022 11:26 PM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.17.3 release. There are 343 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 14 Apr 2022 06:28:59 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.17.3-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.17.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels:
Tested-by: Florian Fainelli f.fainelli@gmail.com
linux-stable-mirror@lists.linaro.org