This is the start of the stable review cycle for the 5.15.34 release. There are 277 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 14 Apr 2022 06:28:59 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.34-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.15.34-rc1
Kefeng Wang wangkefeng.wang@huawei.com powerpc: Fix virt_addr_valid() for 64-bit Book3E & 32-bit
Christophe Leroy christophe.leroy@csgroup.eu static_call: Don't make __static_call_return0 static
Waiman Long longman@redhat.com mm/sparsemem: fix 'mem_section' will never be NULL gcc 12 warning
Andre Przywara andre.przywara@arm.com irqchip/gic, gic-v3: Prevent GSI to SGI translations
Christophe Leroy christophe.leroy@csgroup.eu powerpc/64: Fix build failure with allyesconfig in book3s_64_entry.S
Marc Zyngier maz@kernel.org irqchip/gic-v4: Wait for GICR_VPENDBASER.Dirty to clear before descheduling
Peter Zijlstra peterz@infradead.org x86,static_call: Fix __static_call_return0 for i386
Sebastian Andrzej Siewior bigeasy@linutronix.de sched: Teach the forced-newidle balancer about CPU affinity limitation.
Vincent Mailhol mailhol.vincent@wanadoo.fr x86/bug: Prevent shadowing in __WARN_FLAGS
Andrea Parri (Microsoft) parri.andrea@gmail.com Drivers: hv: vmbus: Replace smp_store_mb() with virt_store_mb()
Peter Xu peterx@redhat.com mm: don't skip swap entry even if zap_details specified
Tejun Heo tj@kernel.org selftests: cgroup: Test open-time cgroup namespace usage for migration checks
Tejun Heo tj@kernel.org selftests: cgroup: Test open-time credential usage for migration checks
Tejun Heo tj@kernel.org selftests: cgroup: Make cg_create() use 0755 for permission instead of 0644
Kees Cook keescook@chromium.org ubsan: remove CONFIG_UBSAN_OBJECT_SIZE
dann frazier dann.frazier@canonical.com Revert "net/mlx5: Accept devlink user input after driver initialization complete"
Paolo Bonzini pbonzini@redhat.com KVM: avoid NULL pointer dereference in kvm_dirty_ring_push
Vinod Koul vkoul@kernel.org dmaengine: Revert "dmaengine: shdma: Fix runtime PM imbalance on error"
Arnaldo Carvalho de Melo acme@redhat.com tools build: Use $(shell ) instead of `` to get embedded libperl's ccopts
Arnaldo Carvalho de Melo acme@redhat.com tools build: Filter out options and warnings not supported by clang
Arnaldo Carvalho de Melo acme@redhat.com perf python: Fix probing for some clang command line options
Arnaldo Carvalho de Melo acme@redhat.com perf build: Don't use -ffat-lto-objects in the python feature test when building with clang-13
Jens Axboe axboe@kernel.dk Revert "nbd: fix possible overflow on 'first_minor' in nbd_dev_add()"
Trond Myklebust trond.myklebust@hammerspace.com SUNRPC: Don't call connect() more than once on a TCP socket
Dan Carpenter dan.carpenter@oracle.com rtc: mc146818-lib: fix signedness bug in mc146818_get_time()
Jakub Sitnicki jakub@cloudflare.com selftests/bpf: Fix u8 narrow load checks for bpf_sk_lookup remote_port
Jakub Sitnicki jakub@cloudflare.com bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide
Jakub Kicinski kuba@kernel.org Revert "selftests: net: Add tls config dependency for tls selftests"
Dust Li dust.li@linux.alibaba.com net/smc: send directly on setting TCP_NODELAY
Suravee Suthikulpanit suravee.suthikulpanit@amd.com KVM: SVM: Allow AVIC support on system w/ physical APIC ID > 255
Alex Deucher alexander.deucher@amd.com drm/amdgpu: don't use BACO for reset in S3
Lee Jones lee.jones@linaro.org drm/amdkfd: Create file descriptor after client is added to smi_clients list
Karol Herbst kherbst@redhat.com drm/nouveau/pmu: Add missing callbacks for Tegra devices
Emily Deng Emily.Deng@amd.com drm/amdgpu/vcn: Fix the register setting for vcn1
Alex Deucher alexander.deucher@amd.com drm/amdgpu/smu10: fix SoC/fclk units in auto mode
Benjamin Marty info@benjaminmarty.ch drm/amdgpu/display: change pipe policy for DCN 2.1
Daniel Mack daniel@zonque.org drm/panel: ili9341: fix optional regulator handling
Trond Myklebust trond.myklebust@hammerspace.com SUNRPC: Prevent immediate close+reconnect
Shirish S shirish.s@amd.com amd/display: set backlight only if required
Thomas Zimmermann tzimmermann@suse.de fbdev: Fix unregistering of framebuffers without device
Marc Zyngier maz@kernel.org irqchip/gic-v3: Fix GICR_CTLR.RWP polling
Namhyung Kim namhyung@kernel.org perf/core: Inherit event_caps
Xiaomeng Tong xiam0nd.tong@gmail.com perf: qcom_l2_pmu: fix an incorrect NULL check on list iterator
Christian Lamparter chunkeey@gmail.com ata: sata_dwc_460ex: Fix crash due to OOB write
Kan Liang kan.liang@linux.intel.com perf/x86/intel: Don't extend the pseudo-encoding to GP counters
Dave Hansen dave.hansen@linux.intel.com x86/mm/tlb: Revert retpoline avoidance approach
Reto Buerki reet@codelabs.ch x86/msi: Fix msi message data shadow struct
Shreeya Patel shreeya.patel@collabora.com gpio: Restrict usage of GPIO chip irq members before initialization
Douglas Miller doug.miller@cornelisnetworks.com RDMA/hfi1: Fix use-after-free bug for mm struct
Guo Ren guoren@linux.alibaba.com arm64: patch_text: Fixup last cpu should be master
Vinod Koul vkoul@kernel.org spi: core: add dma_map_dev for __spi_unmap_msg()
Kaiwen Hu kevinhu@synology.com btrfs: prevent subvol with swapfile from being deleted
Ethan Lien ethanlien@synology.com btrfs: fix qgroup reserve overflow the qgroup limit
Kan Liang kan.liang@linux.intel.com perf/x86/intel: Update the FRONTEND MSR mask on Sapphire Rapids
Pawan Gupta pawan.kumar.gupta@linux.intel.com x86/speculation: Restore speculation related MSRs during S3 resume
Pawan Gupta pawan.kumar.gupta@linux.intel.com x86/pm: Save the MSR validity status at context setup
Jens Axboe axboe@kernel.dk io_uring: fix race between timeout flush and removal
Eugene Syromiatnikov esyr@redhat.com io_uring: implement compat handling for IORING_REGISTER_IOWQ_AFF
Jens Axboe axboe@kernel.dk io_uring: defer splice/tee file validity check until command issue
Jens Axboe axboe@kernel.dk io_uring: don't check req->file in io_fsync_prep()
Miaohe Lin linmiaohe@huawei.com mm/mempolicy: fix mpol_new leak in shared_policy_replace
Paolo Bonzini pbonzini@redhat.com mmmremap.c: avoid pointless invalidate_range_start/end on mremap(old_size=0)
Max Filippov jcmvbkbc@gmail.com highmem: fix checks in __kmap_local_sched_{in,out}
Guo Xuenan guoxuenan@huawei.com lz4: fix LZ4_decompress_safe_partial read out of bound
Michael Wu michael@allwinnertech.com mmc: core: Fixup support for writeback-cache for eMMC and SD
Wolfram Sang wsa+renesas@sang-engineering.com mmc: renesas_sdhi: don't overwrite TAP settings when HS400 tuning is complete
Yann Gautier yann.gautier@foss.st.com mmc: mmci: stm32: correctly check all elements of sg list
Christian Löhle CLoehle@hyperstone.com mmc: block: Check for errors after write on SPI
Pali Rohár pali@kernel.org Revert "mmc: sdhci-xenon: fix annoying 1.8V regulator warning"
Adrian Hunter adrian.hunter@intel.com scsi: ufs: ufs-pci: Add support for Intel MTL
Damien Le Moal damien.lemoal@opensource.wdc.com scsi: mpt3sas: Fix use after free in _scsih_expander_node_remove()
Chanho Park chanho61.park@samsung.com arm64: Add part number for Arm Cortex-A78AE
Denis Nikitin denik@chromium.org perf session: Remap buf if there is no space for event
Adrian Hunter adrian.hunter@intel.com perf tools: Fix perf's libperf_print callback
James Clark james.clark@arm.com perf: arm-spe: Fix perf report --mem-mode
Tony Lindgren tony@atomide.com iommu/omap: Fix regression in probe for NULL pointer dereference
Trond Myklebust trond.myklebust@hammerspace.com SUNRPC: svc_tcp_sendmsg() should handle errors from xdr_alloc_bvec()
Trond Myklebust trond.myklebust@hammerspace.com SUNRPC: Handle low memory situations in call_status()
Trond Myklebust trond.myklebust@hammerspace.com SUNRPC: Handle ENOMEM in call_transmit_status()
Pavel Begunkov asml.silence@gmail.com io_uring: don't touch scm_fp_list after queueing skb
Pavel Begunkov asml.silence@gmail.com io_uring: nospec index for tags on files update
Xiaomeng Tong xiam0nd.tong@gmail.com scsi: ufs: ufshpb: Fix a NULL check on list iterator
Lv Yunlong lyl2019@mail.ustc.edu.cn drbd: Fix five use after free bugs in get_initial_state
Maxim Mikityanskiy maximmi@nvidia.com bpf: Support dual-stack sockets in bpf_tcp_check_syncookie
Kamal Dasu kdasu.kdev@gmail.com spi: bcm-qspi: fix MSPI only access with bcm_qspi_exec_mem_op()
Jamie Bainbridge jamie.bainbridge@gmail.com qede: confirm skb is allocated before using
Michael Walle michael@walle.cc net: phy: mscc-miim: reject clause 45 register accesses
Taehee Yoo ap420073@gmail.com net: sfc: fix using uninitialized xdp tx_queue
Eric Dumazet edumazet@google.com rxrpc: fix a race in rxrpc_exit_net()
Ilya Maximets i.maximets@ovn.org net: openvswitch: fix leak of nested actions
Andrew Lunn andrew@lunn.ch net: ethernet: mv643xx: Fix over zealous checking of_get_mac_address()
Ilya Maximets i.maximets@ovn.org net: openvswitch: don't send internal clone attribute to the userspace.
Maciej Fijalkowski maciej.fijalkowski@intel.com ice: xsk: fix VSI state check in ice_xsk_wakeup()
Maciej Fijalkowski maciej.fijalkowski@intel.com ice: synchronize_rcu() when terminating rings
David Ahern dsahern@kernel.org ipv6: Fix stats accounting in ip6_pkt_drop
Anatolii Gerasymenko anatolii.gerasymenko@intel.com ice: Do not skip not enabled queues in ice_vc_dis_qs_msg
Anatolii Gerasymenko anatolii.gerasymenko@intel.com ice: Set txq_teid to ICE_INVAL_TEID on ring creation
Miaoqian Lin linmq006@gmail.com dpaa2-ptp: Fix refcount leak in dpaa2_ptp_probe
Jamie Bainbridge jamie.bainbridge@gmail.com sctp: count singleton chunks in assoc user stats
Niels Dossche dossche.niels@gmail.com IB/rdmavt: add lock to call to rvt_error_qp to prevent a race condition
Mark Zhang markzhang@nvidia.com IB/cm: Cancel mad on the DREQ event when the state is MRA_REP_RCVD
Aharon Landau aharonl@nvidia.com RDMA/mlx5: Add a missing update of cache->last_add
Aharon Landau aharonl@nvidia.com RDMA/mlx5: Don't remove cache MRs when a delay is needed
Martin Habets habetsm.xilinx@gmail.com sfc: Do not free an empty page_ring
Ray Jui ray.jui@broadcom.com bnxt_en: Prevent XDP redirect from running when stopping TX queue
Andy Gospodarek gospo@broadcom.com bnxt_en: reserve space inside receive page for skb_shared_info
Pavan Chebbi pavan.chebbi@broadcom.com bnxt_en: Synchronize tx when xdp redirects happen on same ring
Phil Auld pauld@redhat.com arch/arm64: Fix topology initialization for core scheduling
Axel Lin axel.lin@ingics.com regulator: atc260x: Fix missing active_discharge_on setting
Axel Lin axel.lin@ingics.com regulator: rtq2134: Fix missing active_discharge_on setting
Liu Ying victor.liu@nxp.com drm/imx: dw_hdmi-imx: Fix bailout in error cases of probe
José Expósito jose.exposito89@gmail.com drm/imx: Fix memory leak in imx_pd_connector_get_modes
Jiasheng Jiang jiasheng@iscas.ac.cn drm/imx: imx-ldb: Check for null pointer after calling kmemdup
Chen-Yu Tsai wens@csie.org net: stmmac: Fix unset max_speed difference between DT and non-DT platforms
Nikolay Aleksandrov razor@blackwall.org net: ipv4: fix route with nexthop object delete warning
Matt Johnston matt@codeconstruct.com.au mctp: Fix check for dev_hard_header() result
Ivan Vecera ivecera@redhat.com ice: Clear default forwarding VSI during VSI release
Jean-Philippe Brucker jean-philippe@linaro.org skbuff: fix coalescing for page_pool fragment recycling
Eyal Birger eyal.birger@gmail.com vrf: fix packet sniffing for traffic originating from ip tunnels
Ziyang Xuan william.xuanziyang@huawei.com net/tls: fix slab-out-of-bounds bug in decrypt_internal
Taehee Yoo ap420073@gmail.com net: sfc: add missing xdp queue reinitialization
Jason Wang jasowang@redhat.com vdpa: mlx5: prevent cvq work from hogging CPU
Eli Cohen elic@nvidia.com vdpa/mlx5: Propagate link status from device to vdpa driver
Eli Cohen elic@nvidia.com vdpa/mlx5: Rename control VQ workqueue to vdpa wq
Christophe JAILLET christophe.jaillet@wanadoo.fr scsi: zorro7xx: Fix a resource leak in zorro7xx_remove_one()
John Garry john.garry@huawei.com scsi: core: Fix sbitmap depth in scsi_realloc_sdev_budget_map()
Kevin Groeneveld kgroeneveld@lenbrook.com scsi: sr: Fix typo in CDROM(CLOSETRAY|EJECT) handling
ChenXiaoSong chenxiaosong2@huawei.com NFSv4: fix open failure with O_ACCMODE flag
ChenXiaoSong chenxiaosong2@huawei.com Revert "NFSv4: Handle the special Linux file open access mode"
Guilherme G. Piccoli gpiccoli@igalia.com Drivers: hv: vmbus: Fix potential crash on module unload
Dan Carpenter dan.carpenter@oracle.com drm/amdgpu: fix off by one in amdgpu_gfx_kiq_acquire()
Mateusz Jończyk mat.jonczyk@o2.pl rtc: mc146818-lib: fix RTC presence check
Mateusz Jończyk mat.jonczyk@o2.pl rtc: Check return value from mc146818_get_time()
Mateusz Jończyk mat.jonczyk@o2.pl rtc: mc146818-lib: change return values of mc146818_get_time()
Mauricio Faria de Oliveira mfo@canonical.com mm: fix race between MADV_FREE reclaim and blkdev direct IO read
John David Anglin dave.anglin@bell.net parisc: Fix patch code locking and flushing
Helge Deller deller@gmx.de parisc: Fix CPU affinity for Lasi, WAX and Dino chips
Naresh Kamboju naresh.kamboju@linaro.org selftests: net: Add tls config dependency for tls selftests
Trond Myklebust trond.myklebust@hammerspace.com NFS: Avoid writeback threads getting stuck in mempool_alloc()
Trond Myklebust trond.myklebust@hammerspace.com NFS: nfsiod should not block forever in mempool_alloc()
Trond Myklebust trond.myklebust@hammerspace.com SUNRPC: Fix socket waits for write buffer space
Haimin Zhang tcs_kernel@tencent.com jfs: prevent NULL deref in diFree
Randy Dunlap rdunlap@infradead.org virtio_console: eliminate anonymous module_init & module_exit
Jiri Slaby jirislaby@kernel.org serial: samsung_tty: do not unlock port->lock for uart_write_wakeup()
Nathan Chancellor nathan@kernel.org x86/Kconfig: Do not allow CONFIG_X86_X32_ABI=y with llvm-objcopy
Peter Zijlstra peterz@infradead.org x86: Annotate call_on_stack()
NeilBrown neilb@suse.de NFS: swap-out must always use STABLE writes.
NeilBrown neilb@suse.de NFS: swap IO handling is slightly different for O_DIRECT IO
NeilBrown neilb@suse.de SUNRPC: remove scheduling boost for "SWAPPER" tasks.
NeilBrown neilb@suse.de SUNRPC/xprt: async tasks mustn't block waiting for memory
NeilBrown neilb@suse.de SUNRPC/call_alloc: async tasks mustn't block waiting for memory
Maxime Ripard maxime@cerno.tech clk: Enforce that disjoints limits are invalid
Tony Lindgren tony@atomide.com clk: ti: Preserve node in ti_dt_clocks_register()
Dongli Zhang dongli.zhang@oracle.com xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32
Ohad Sharabi osharabi@habana.ai habanalabs: fix possible memory leak in MMU DR fini
Trond Myklebust trond.myklebust@hammerspace.com NFSv4: Protect the state recovery thread against direct reclaim
Xin Xiong xiongx18@fudan.edu.cn NFSv4.2: fix reference count leaks in _nfs42_proc_copy_notify()
Lucas Denefle lucas.denefle@converge.io w1: w1_therm: fixes w1_seq for ds28ea00 sensors
Xiaoke Wang xkernel.wang@foxmail.com staging: wfx: fix an error handling in wfx_init_common()
Viresh Kumar viresh.kumar@linaro.org opp: Expose of-node's name in debugfs
Pierre Gondois Pierre.Gondois@arm.com cpufreq: CPPC: Fix performance/frequency conversion
Sascha Hauer s.hauer@pengutronix.de clk: rockchip: drop CLK_SET_RATE_PARENT from dclk_vop* on rk3568
Amjad Ouled-Ameur aouledameur@baylibre.com phy: amlogic: meson8b-usb2: fix shared reset control use
Amjad Ouled-Ameur aouledameur@baylibre.com phy: amlogic: meson8b-usb2: Use dev_err_probe()
Amjad Ouled-Ameur aouledameur@baylibre.com phy: amlogic: phy-meson-gxl-usb2: fix shared reset controller use
Stefan Wahren stefan.wahren@i2se.com staging: vchiq_core: handle NULL result of find_service_by_handle
Stefan Wahren stefan.wahren@i2se.com staging: vchiq_arm: Avoid NULL ptr deref in vchiq_dump_platform_instances
Adam Wujek dev_public@wujek.eu clk: si5341: fix reported clk_rate when output divider is 2
Qinghua Jin qhjin.dev@gmail.com minix: fix bug when opening a file with O_DIRECT
Randy Dunlap rdunlap@infradead.org init/main.c: return 1 from handled __setup() functions
Feng Tang feng.tang@intel.com lib/Kconfig.debug: add ARCH dependency for FUNCTION_ALIGN option
Xiubo Li xiubli@redhat.com ceph: fix memory leak in ceph_readdir when note_last_dentry returns error
Xiubo Li xiubli@redhat.com ceph: fix inode reference leakage in ceph_get_snapdir()
Wang Yufen wangyufen@huawei.com netlabel: fix out-of-bounds memory accesses
Florian Westphal fw@strlen.de netfilter: conntrack: revisit gc autotuning
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: Fix use after free in hci_send_acl
Krzysztof Kozlowski krzysztof.kozlowski@canonical.com MIPS: ingenic: correct unit node address
Max Filippov jcmvbkbc@gmail.com xtensa: fix DTC warning unit_address_format
Deren Wu deren.wu@mediatek.com mt76: fix monitor mode crash with sdio driver
H. Nikolaus Schaller hns@goldelico.com usb: dwc3: omap: fix "unbalanced disables for smps10_out1" on omap5evm
Michael Walle michael@walle.cc net: sfp: add 2500base-X quirk for Lantech SFP module
Gal Pressman gal@nvidia.com net/mlx5e: Remove overzealous validations in netlink EEPROM query
Jakub Kicinski kuba@kernel.org net: limit altnames to 64k total
Jakub Kicinski kuba@kernel.org net: account alternate interface name memory
Michael T. Kloos michael@michaelkloos.com riscv: Fixed misaligned memory access. Fixed pointer comparison.
Vincent Mailhol mailhol.vincent@wanadoo.fr can: etas_es58x: es58x_fd_rx_event_msg(): initialize rx_event_msg before calling es58x_check_msg_len()
Oliver Hartkopp socketcan@hartkopp.net can: isotp: set default value for N_As to 50 micro seconds
Jianglei Nie niejianglei2021@163.com scsi: libfc: Fix use after free in fc_exch_abts_resp()
Hangyu Hua hbh25y@gmail.com powerpc/secvar: fix refcount leak in format_show()
Michael Ellerman mpe@ellerman.id.au powerpc/64e: Tie PPC_BOOK3E_64 to PPC_FSL_BOOK3E
Michael Ellerman mpe@ellerman.id.au powerpc/code-patching: Pre-map patch area
Alexander Lobakin alobakin@pm.me MIPS: fix fortify panic when copying asm exception handlers
Li Chen lchen@ambarella.com PCI: endpoint: Fix misused goto label
Michael Chan michael.chan@broadcom.com bnxt_en: Eliminate unintended link toggle during FW reset
Minghao Chi (CGEL ZTE) chi.minghao@zte.com.cn Bluetooth: use memset avoid memory leaks
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: Fix not checking for valid hdev on bt_dev_{info,warn,err,dbg}
Harold Huang baymaxhuang@gmail.com tuntap: add sanity checks about msg_controllen in sendmsg
Sven Eckelmann sven@narfation.org macvtap: advertise link netns via netlink
Hangyu Hua hbh25y@gmail.com mips: ralink: fix a refcount leak in ill_acc_of_setup()
Dust Li dust.li@linux.alibaba.com net/smc: correct settings of RMB window update limit
Xiang Chen chenxiang66@hisilicon.com scsi: hisi_sas: Limit users changing debugfs BIST count value
Qi Liu liuqi115@huawei.com scsi: hisi_sas: Free irq vectors in order for v3 HW
Randy Dunlap rdunlap@infradead.org scsi: aha152x: Fix aha152x_setup() __setup handler return value
Yang Li yang.lee@linux.alibaba.com mt76: mt7615: Fix assigning negative values to unsigned variable
Nicholas Piggin npiggin@gmail.com powerpc/64s/hash: Make hash faults work in NMI context
Johan Almbladh johan.almbladh@anyfinetworks.com mt76: mt7915: fix injected MPDU transmission to not use HW A-MSDU
Damien Le Moal damien.lemoal@opensource.wdc.com scsi: pm8001: Fix memory leak in pm8001_chip_fw_flash_update_req()
Damien Le Moal damien.lemoal@opensource.wdc.com scsi: pm8001: Fix tag leaks on error
Damien Le Moal damien.lemoal@opensource.wdc.com scsi: pm8001: Fix task leak in pm8001_send_abort_all()
Damien Le Moal damien.lemoal@opensource.wdc.com scsi: pm8001: Fix pm8001_mpi_task_abort_resp()
Damien Le Moal damien.lemoal@opensource.wdc.com scsi: pm8001: Fix pm80xx_pci_mem_copy() interface
Alex Williamson alex.williamson@redhat.com vfio/pci: Stub vfio_pci_vga_rw when !CONFIG_VFIO_PCI_VGA
Alex Deucher alexander.deucher@amd.com drm/amdkfd: make CRAT table missing message informational only
Mike Snitzer snitzer@redhat.com dm: requeue IO if mapping table not yet available
Jordy Zomer jordy@jordyzomer.github.io dm ioctl: prevent potential spectre v1 gadget
Ido Schimmel idosch@nvidia.com ipv4: Invalidate neighbour for broadcast address upon address addition
Daniel Thompson daniel.thompson@linaro.org drm/msm/dsi: Remove spurious IRQF_ONESHOT flag
Miri Korenblit miriam.rachel.korenblit@intel.com iwlwifi: mvm: move only to an enabled channel
Ilan Peer ilan.peer@intel.com iwlwifi: mvm: Correctly set fragmented EBS
Hans de Goede hdegoede@redhat.com usb: dwc3: pci: Set the swnode from inside dwc3_pci_quirks()
Maxim Mikityanskiy maximmi@nvidia.com net/mlx5e: Disable TX queues before registering the netdev
Hans de Goede hdegoede@redhat.com power: supply: axp288-charger: Set Vhold to 4.4V
Christophe Leroy christophe.leroy@csgroup.eu powerpc/set_memory: Avoid spinlock recursion in change_page_attr()
Sreekanth Reddy sreekanth.reddy@broadcom.com scsi: mpi3mr: Fix memory leaks
Sreekanth Reddy sreekanth.reddy@broadcom.com scsi: mpi3mr: Fix reporting of actual data transfer size
Manivannan Sadhasivam mani@kernel.org PCI: pciehp: Add Qualcomm quirk for Command Completed erratum
Sebastian Andrzej Siewior bigeasy@linutronix.de tcp: Don't acquire inet_listen_hashbucket::lock with disabled BH.
Hou Zhiqiang Zhiqiang.Hou@nxp.com PCI: endpoint: Fix alignment fault error in copy tests
Neal Liu neal_liu@aspeedtech.com usb: ehci: add pci device support for Aspeed platforms
Zhou Guanghui zhouguanghui1@huawei.com iommu/arm-smmu-v3: fix event handling soft lockup
Pali Rohár pali@kernel.org PCI: aardvark: Fix support for MSI interrupts
Mahesh Rajashekhara mahesh.rajashekhara@microchip.com scsi: smartpqi: Fix kdump issue when controller is locked up
Rajneesh Bhardwaj rajneesh.bhardwaj@amd.com drm/amdgpu: Fix recursive locking warning
Sourabh Jain sourabhjain@linux.ibm.com powerpc: Set crashkernel offset to mid of RMA region
Eric Dumazet edumazet@google.com net: initialize init_net earlier
Eric Dumazet edumazet@google.com ipv6: make mc_forwarding atomic
Yonghong Song yhs@fb.com libbpf: Fix build issue with llvm-readelf
Avraham Stern avraham.stern@intel.com cfg80211: don't add non transmitted BSS to 6GHz scanned channels
Lorenzo Bianconi lorenzo@kernel.org mt76: dma: initialize skip_unmap in mt76_dma_rx_fill
Ben Greear greearb@candelatech.com mt76: mt7921: fix crash when startup fails.
Evgeny Boger boger@wirenboard.com power: supply: axp20x_battery: properly report current when discharging
Yongzhi Liu lyz_cs@pku.edu.cn drm/v3d: fix missing unlock
Yang Guang yang.guang5@zte.com.cn scsi: bfa: Replace snprintf() with sysfs_emit()
Yang Guang yang.guang5@zte.com.cn scsi: mvsas: Replace snprintf() with sysfs_emit()
Jakub Sitnicki jakub@cloudflare.com bpf: Make dst_port field in struct bpf_sock 16-bit wide
Yongzhi Liu lyz_cs@pku.edu.cn drm/bridge: Add missing pm_runtime_put_sync
Tony Lu tonylu@linux.alibaba.com net/smc: Send directly when TCP_CORK is cleared
Kalle Valo quic_kvalo@quicinc.com ath11k: mhi: use mhi_sync_power_up()
Kalle Valo quic_kvalo@quicinc.com ath11k: pci: fix crash on suspend if board file is not found
Venkateswara Naralasetty quic_vnaralas@quicinc.com ath11k: fix kernel panic during unload/load ath11k modules
Maxim Kiselev bigunclemax@gmail.com powerpc: dts: t104xrdb: fix phy type for FMAN 4/5
Philip Yang Philip.Yang@amd.com drm/amdkfd: Don't take process mutex for svm ioctls
Yang Guang yang.guang5@zte.com.cn ptp: replace snprintf with sysfs_emit
Pawel Laszczak pawell@cadence.com usb: cdnsp: fix cdnsp_decode_trb function to properly handle ret value
Wayne Chang waynec@nvidia.com usb: gadget: tegra-xudc: Fix control endpoint's definitions
Wayne Chang waynec@nvidia.com usb: gadget: tegra-xudc: Do not program SPARAM
Nicholas Kazlauskas nicholas.kazlauskas@amd.com drm/amd/display: Use PSR version selected during set_psr_caps
Yongzhi Liu lyz_cs@pku.edu.cn drm/amd/display: Fix memory leak
Xin Xiong xiongx18@fudan.edu.cn drm/amd/amdgpu/amdgpu_cs: fix refcount leak of a dma_fence obj
Dale Zhao dale.zhao@amd.com drm/amd/display: Add signal type check when verify stream backends same
Zekun Shen bruceshenzk@gmail.com ath5k: fix OOB in ath5k_eeprom_read_pcal_info_5111
Anisse Astier anisse@astier.eu drm: Add orientation quirk for GPD Win Max
Hou Wenlong houwenlong.hwl@antgroup.com KVM: x86/emulator: Emulate RDPID only if it is enabled in guest
Like Xu likexu@tencent.com KVM: x86/pmu: Fix and isolate TSX-specific performance event logic
Jim Mattson jmattson@google.com KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs
Peter Gonda pgonda@google.com KVM: SVM: Fix kvm_cache_regs.h inclusions for is_guest_mode()
Jim Mattson jmattson@google.com KVM: x86/pmu: Use different raw event masks for AMD and Intel
Marco Elver elver@google.com kfence: limit currently covered allocations when pool nearly full
Marco Elver elver@google.com kfence: move saving stack trace of allocations into __kfence_alloc()
Marco Elver elver@google.com kfence: count unexpectedly skipped allocations
Zhang Wensheng zhangwensheng5@huawei.com nbd: fix possible overflow on 'first_minor' in nbd_dev_add()
Ye Bin yebin10@huawei.com nbd: Fix hungtask when nbd_config_put
Ye Bin yebin10@huawei.com nbd: Fix incorrect error handle when first_minor is illegal in nbd_dev_add
Luis Chamberlain mcgrof@kernel.org nbd: add error handling support for add_disk()
Jiasheng Jiang jiasheng@iscas.ac.cn rtc: wm8350: Handle error for wm8350_register_irq
Benjamin Beichler benjamin.beichler@uni-rostock.de um: fix and optimize xor select template for CONFIG64 and timetravel mode
Johannes Berg johannes.berg@intel.com lib/logic_iomem: correct fallback config references
-------------
Diffstat:
Makefile | 4 +- arch/alpha/kernel/rtc.c | 7 +- arch/arm64/include/asm/cputype.h | 2 + arch/arm64/kernel/patching.c | 4 +- arch/arm64/kernel/proton-pack.c | 1 + arch/arm64/kernel/smp.c | 2 +- arch/mips/boot/dts/ingenic/jz4780.dtsi | 2 +- arch/mips/include/asm/setup.h | 2 +- arch/mips/kernel/traps.c | 22 +- arch/mips/ralink/ill_acc.c | 1 + arch/parisc/kernel/patch.c | 25 +- arch/powerpc/boot/dts/fsl/t104xrdb.dtsi | 4 +- arch/powerpc/include/asm/interrupt.h | 2 +- arch/powerpc/include/asm/page.h | 6 +- arch/powerpc/kernel/rtas.c | 6 + arch/powerpc/kernel/secvar-sysfs.c | 9 +- arch/powerpc/kexec/core.c | 15 +- arch/powerpc/kvm/book3s_64_entry.S | 10 +- arch/powerpc/lib/code-patching.c | 14 + arch/powerpc/mm/book3s64/hash_utils.c | 54 +- arch/powerpc/mm/pageattr.c | 32 +- arch/powerpc/perf/callchain.h | 9 +- arch/powerpc/perf/callchain_64.c | 27 - arch/powerpc/platforms/Kconfig.cputype | 3 +- arch/riscv/lib/memmove.S | 368 +++++++++++--- arch/um/include/asm/xor.h | 4 +- arch/x86/Kconfig | 5 + arch/x86/events/intel/core.c | 8 +- arch/x86/include/asm/bug.h | 4 +- arch/x86/include/asm/irq_stack.h | 3 +- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/include/asm/msi.h | 19 +- arch/x86/include/asm/perf_event.h | 5 + arch/x86/kernel/hpet.c | 8 +- arch/x86/kernel/static_call.c | 5 +- arch/x86/kvm/emulate.c | 4 +- arch/x86/kvm/kvm_emulate.h | 1 + arch/x86/kvm/pmu.c | 18 +- arch/x86/kvm/svm/avic.c | 7 +- arch/x86/kvm/svm/pmu.c | 9 +- arch/x86/kvm/svm/svm.h | 4 +- arch/x86/kvm/svm/svm_onhyperv.c | 1 - arch/x86/kvm/vmx/pmu_intel.c | 14 +- arch/x86/kvm/x86.c | 6 + arch/x86/mm/tlb.c | 37 +- arch/x86/power/cpu.c | 21 +- arch/x86/xen/smp_hvm.c | 6 + arch/x86/xen/time.c | 24 +- arch/xtensa/boot/dts/xtfpga-flash-128m.dtsi | 8 +- arch/xtensa/boot/dts/xtfpga-flash-16m.dtsi | 8 +- arch/xtensa/boot/dts/xtfpga-flash-4m.dtsi | 4 +- drivers/ata/sata_dwc_460ex.c | 6 +- drivers/base/power/trace.c | 6 +- drivers/block/drbd/drbd_int.h | 8 +- drivers/block/drbd/drbd_nl.c | 41 +- drivers/block/drbd/drbd_state.c | 18 +- drivers/block/drbd/drbd_state_change.h | 8 +- drivers/block/nbd.c | 40 +- drivers/char/virtio_console.c | 8 +- drivers/clk/clk-si5341.c | 16 +- drivers/clk/clk.c | 24 + drivers/clk/rockchip/clk-rk3568.c | 6 +- drivers/clk/ti/clk.c | 13 +- drivers/cpufreq/cppc_cpufreq.c | 43 +- drivers/dma/sh/shdma-base.c | 4 +- drivers/gpio/gpiolib.c | 19 + drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 +- drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 4 +- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 - drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 24 +- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 7 +- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 6 + .../drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c | 80 ++- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c | 6 +- drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 3 + .../gpu/drm/amd/display/dc/dcn21/dcn21_resource.c | 2 +- drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 11 + .../gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c | 8 +- drivers/gpu/drm/bridge/nwl-dsi.c | 18 +- drivers/gpu/drm/drm_panel_orientation_quirks.c | 6 + drivers/gpu/drm/imx/dw_hdmi-imx.c | 8 +- drivers/gpu/drm/imx/imx-ldb.c | 2 + drivers/gpu/drm/imx/parallel-display.c | 4 +- drivers/gpu/drm/msm/dsi/dsi_host.c | 2 +- drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gm20b.c | 1 + drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp102.c | 2 +- drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp10b.c | 1 + drivers/gpu/drm/nouveau/nvkm/subdev/pmu/priv.h | 1 + drivers/gpu/drm/panel/panel-ilitek-ili9341.c | 4 +- drivers/gpu/drm/v3d/v3d_gem.c | 6 +- drivers/hv/channel_mgmt.c | 6 +- drivers/hv/vmbus_drv.c | 9 +- drivers/infiniband/core/cm.c | 3 +- drivers/infiniband/hw/hfi1/mmu_rb.c | 6 + drivers/infiniband/hw/mlx5/mr.c | 5 +- drivers/infiniband/sw/rdmavt/qp.c | 6 +- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 1 + drivers/iommu/omap-iommu.c | 2 +- drivers/irqchip/irq-gic-v3-its.c | 28 +- drivers/irqchip/irq-gic-v3.c | 14 +- drivers/irqchip/irq-gic.c | 6 + drivers/md/dm-ioctl.c | 2 + drivers/md/dm-rq.c | 7 +- drivers/md/dm.c | 11 +- drivers/misc/habanalabs/common/mmu/mmu_v1.c | 2 +- drivers/mmc/core/block.c | 46 +- drivers/mmc/host/mmci_stm32_sdmmc.c | 6 +- drivers/mmc/host/renesas_sdhi_core.c | 4 +- drivers/mmc/host/sdhci-xenon.c | 10 - drivers/net/can/usb/etas_es58x/es58x_fd.c | 3 +- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 7 + drivers/net/ethernet/broadcom/bnxt/bnxt.h | 5 +- drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 4 +- drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 14 +- drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h | 2 + drivers/net/ethernet/freescale/dpaa2/dpaa2-ptp.c | 4 +- drivers/net/ethernet/intel/ice/ice.h | 2 +- drivers/net/ethernet/intel/ice/ice_lib.c | 3 + drivers/net/ethernet/intel/ice/ice_main.c | 4 +- drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c | 4 +- drivers/net/ethernet/intel/ice/ice_xsk.c | 6 +- drivers/net/ethernet/marvell/mv643xx_eth.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/devlink.c | 12 +- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 1 + drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 - drivers/net/ethernet/mellanox/mlx5/core/port.c | 23 - .../ethernet/mellanox/mlx5/core/sf/dev/driver.c | 2 - drivers/net/ethernet/qlogic/qede/qede_fp.c | 3 + drivers/net/ethernet/sfc/efx_channels.c | 148 +++--- drivers/net/ethernet/sfc/rx_common.c | 3 + drivers/net/ethernet/sfc/tx.c | 3 + drivers/net/ethernet/sfc/tx_common.c | 2 + .../net/ethernet/stmicro/stmmac/stmmac_platform.c | 3 +- drivers/net/macvtap.c | 6 + drivers/net/mdio/mdio-mscc-miim.c | 6 + drivers/net/phy/sfp-bus.c | 6 + drivers/net/tap.c | 3 +- drivers/net/tun.c | 3 +- drivers/net/vrf.c | 15 +- drivers/net/wireless/ath/ath11k/ahb.c | 2 + drivers/net/wireless/ath/ath11k/mhi.c | 2 +- drivers/net/wireless/ath/ath11k/pci.c | 10 + drivers/net/wireless/ath/ath5k/eeprom.c | 3 + drivers/net/wireless/intel/iwlwifi/mvm/phy-ctxt.c | 31 +- drivers/net/wireless/intel/iwlwifi/mvm/scan.c | 5 +- drivers/net/wireless/mediatek/mt76/dma.c | 1 + drivers/net/wireless/mediatek/mt76/mt76.h | 2 +- drivers/net/wireless/mediatek/mt76/mt7615/mac.c | 2 +- drivers/net/wireless/mediatek/mt76/mt7915/mac.c | 1 + drivers/net/wireless/mediatek/mt76/mt7921/main.c | 1 + drivers/opp/debugfs.c | 5 + drivers/opp/opp.h | 1 + drivers/parisc/dino.c | 41 +- drivers/parisc/gsc.c | 31 ++ drivers/parisc/gsc.h | 1 + drivers/parisc/lasi.c | 7 +- drivers/parisc/wax.c | 7 +- drivers/pci/controller/pci-aardvark.c | 16 +- drivers/pci/endpoint/functions/pci-epf-test.c | 14 +- drivers/pci/hotplug/pciehp_hpc.c | 2 + drivers/perf/qcom_l2_pmu.c | 6 +- drivers/phy/amlogic/phy-meson-gxl-usb2.c | 5 +- drivers/phy/amlogic/phy-meson8b-usb2.c | 9 +- drivers/power/supply/axp20x_battery.c | 13 +- drivers/power/supply/axp288_charger.c | 14 +- drivers/ptp/ptp_sysfs.c | 4 +- drivers/regulator/atc260x-regulator.c | 1 + drivers/regulator/rtq2134-regulator.c | 1 + drivers/rtc/rtc-cmos.c | 19 +- drivers/rtc/rtc-mc146818-lib.c | 42 +- drivers/rtc/rtc-wm8350.c | 11 +- drivers/scsi/aha152x.c | 6 +- drivers/scsi/bfa/bfad_attr.c | 26 +- drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 68 ++- drivers/scsi/libfc/fc_exch.c | 1 + drivers/scsi/mpi3mr/mpi3mr_fw.c | 2 +- drivers/scsi/mpi3mr/mpi3mr_os.c | 2 + drivers/scsi/mpt3sas/mpt3sas_scsih.c | 5 +- drivers/scsi/mvsas/mv_init.c | 4 +- drivers/scsi/pm8001/pm8001_hwi.c | 27 +- drivers/scsi/pm8001/pm8001_sas.c | 2 +- drivers/scsi/pm8001/pm80xx_hwi.c | 17 +- drivers/scsi/scsi_scan.c | 5 + drivers/scsi/smartpqi/smartpqi_init.c | 39 +- drivers/scsi/sr.c | 2 +- drivers/scsi/ufs/ufshcd-pci.c | 17 + drivers/scsi/ufs/ufshpb.c | 11 +- drivers/scsi/zorro7xx.c | 2 + drivers/spi/spi-bcm-qspi.c | 4 +- drivers/spi/spi.c | 4 + .../vc04_services/interface/vchiq_arm/vchiq_arm.c | 3 + .../vc04_services/interface/vchiq_arm/vchiq_core.c | 6 + drivers/staging/wfx/main.c | 7 +- drivers/tty/serial/samsung_tty.c | 5 +- drivers/usb/cdns3/cdnsp-debug.h | 305 ++++++------ drivers/usb/dwc3/dwc3-omap.c | 2 +- drivers/usb/dwc3/dwc3-pci.c | 11 +- drivers/usb/gadget/udc/tegra-xudc.c | 20 +- drivers/usb/host/ehci-pci.c | 9 + drivers/vdpa/mlx5/core/mlx5_vdpa.h | 2 +- drivers/vdpa/mlx5/net/mlx5_vnet.c | 121 ++++- drivers/vfio/pci/vfio_pci_rdwr.c | 2 + drivers/vhost/net.c | 1 + drivers/video/fbdev/core/fbmem.c | 9 +- drivers/w1/slaves/w1_therm.c | 8 +- fs/btrfs/extent_io.h | 2 +- fs/btrfs/inode.c | 22 + fs/ceph/dir.c | 11 +- fs/ceph/inode.c | 10 +- fs/io_uring.c | 79 +-- fs/jfs/inode.c | 3 +- fs/minix/inode.c | 3 +- fs/nfs/dir.c | 10 - fs/nfs/direct.c | 48 +- fs/nfs/file.c | 4 +- fs/nfs/inode.c | 1 - fs/nfs/internal.h | 17 + fs/nfs/nfs42proc.c | 9 +- fs/nfs/nfs4file.c | 6 +- fs/nfs/nfs4state.c | 12 + fs/nfs/pagelist.c | 10 +- fs/nfs/pnfs_nfs.c | 8 +- fs/nfs/write.c | 34 +- include/linux/gpio/driver.h | 9 + include/linux/ipv6.h | 2 +- include/linux/mc146818rtc.h | 3 +- include/linux/mmzone.h | 11 +- include/linux/nfs_fs.h | 10 +- include/linux/static_call.h | 5 +- include/linux/sunrpc/xprtsock.h | 1 + include/linux/vfio_pci_core.h | 9 + include/net/arp.h | 1 + include/net/bluetooth/bluetooth.h | 14 +- include/net/net_namespace.h | 6 + include/uapi/linux/bpf.h | 6 +- include/uapi/linux/can/isotp.h | 28 +- init/main.c | 8 +- kernel/Makefile | 3 +- kernel/events/core.c | 3 + kernel/sched/core.c | 2 +- kernel/static_call.c | 542 +------------------- kernel/static_call_inline.c | 543 +++++++++++++++++++++ lib/Kconfig.debug | 3 +- lib/Kconfig.ubsan | 13 - lib/logic_iomem.c | 8 +- lib/lz4/lz4_decompress.c | 8 +- lib/test_ubsan.c | 22 - mm/highmem.c | 4 +- mm/kfence/core.c | 156 +++++- mm/kfence/kfence.h | 2 + mm/memory.c | 25 +- mm/mempolicy.c | 1 + mm/mremap.c | 3 + mm/rmap.c | 25 +- net/batman-adv/multicast.c | 2 +- net/bluetooth/hci_event.c | 3 +- net/bluetooth/l2cap_core.c | 1 + net/bpf/test_run.c | 4 +- net/can/isotp.c | 12 +- net/core/dev.c | 3 +- net/core/filter.c | 30 +- net/core/net_namespace.c | 17 +- net/core/rtnetlink.c | 13 +- net/core/skbuff.c | 15 +- net/ipv4/arp.c | 9 +- net/ipv4/fib_frontend.c | 5 +- net/ipv4/fib_semantics.c | 7 +- net/ipv4/inet_hashtables.c | 53 +- net/ipv6/addrconf.c | 4 +- net/ipv6/inet6_hashtables.c | 5 +- net/ipv6/ip6_input.c | 2 +- net/ipv6/ip6mr.c | 8 +- net/ipv6/route.c | 2 +- net/mctp/route.c | 2 +- net/netfilter/nf_conntrack_core.c | 85 +++- net/netlabel/netlabel_kapi.c | 2 + net/openvswitch/actions.c | 2 +- net/openvswitch/flow_netlink.c | 99 +++- net/rxrpc/net_ns.c | 2 +- net/sctp/outqueue.c | 6 +- net/smc/af_smc.c | 8 +- net/smc/smc_core.c | 2 +- net/smc/smc_tx.c | 25 +- net/smc/smc_tx.h | 1 + net/sunrpc/clnt.c | 7 + net/sunrpc/sched.c | 11 +- net/sunrpc/svcsock.c | 4 +- net/sunrpc/xprt.c | 19 +- net/sunrpc/xprtrdma/transport.c | 6 +- net/sunrpc/xprtsock.c | 73 ++- net/tls/tls_sw.c | 2 +- net/wireless/scan.c | 9 +- scripts/Makefile.ubsan | 1 - tools/build/feature/Makefile | 9 +- tools/lib/bpf/Makefile | 4 +- tools/perf/Makefile.config | 6 + tools/perf/arch/arm64/util/arm-spe.c | 6 + tools/perf/perf.c | 2 +- tools/perf/util/session.c | 15 +- tools/perf/util/setup.py | 8 +- tools/testing/selftests/bpf/progs/test_sk_lookup.c | 3 +- tools/testing/selftests/cgroup/cgroup_util.c | 2 +- tools/testing/selftests/cgroup/test_core.c | 165 +++++++ virt/kvm/kvm_main.c | 2 +- 307 files changed, 3548 insertions(+), 1883 deletions(-)
From: Johannes Berg johannes.berg@intel.com
[ Upstream commit 2a6852cb8ff0c8c1363cac648d68489343813212 ]
Due to some renaming, we ended up with the "indirect iomem" naming in Kconfig, following INDIRECT_PIO. However, clearly I missed following through on that in the ifdefs, but so far INDIRECT_IOMEM_FALLBACK isn't used by any architecture.
Reported-by: Lukas Bulwahn lukas.bulwahn@gmail.com Fixes: ca2e334232b6 ("lib: add iomem emulation (logic_iomem)") Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Richard Weinberger richard@nod.at Signed-off-by: Sasha Levin sashal@kernel.org --- lib/logic_iomem.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/lib/logic_iomem.c b/lib/logic_iomem.c index 549b22d4bcde..e7ea9b28d8db 100644 --- a/lib/logic_iomem.c +++ b/lib/logic_iomem.c @@ -68,7 +68,7 @@ int logic_iomem_add_region(struct resource *resource, } EXPORT_SYMBOL(logic_iomem_add_region);
-#ifndef CONFIG_LOGIC_IOMEM_FALLBACK +#ifndef CONFIG_INDIRECT_IOMEM_FALLBACK static void __iomem *real_ioremap(phys_addr_t offset, size_t size) { WARN(1, "invalid ioremap(0x%llx, 0x%zx)\n", @@ -81,7 +81,7 @@ static void real_iounmap(void __iomem *addr) WARN(1, "invalid iounmap for addr 0x%llx\n", (unsigned long long)(uintptr_t __force)addr); } -#endif /* CONFIG_LOGIC_IOMEM_FALLBACK */ +#endif /* CONFIG_INDIRECT_IOMEM_FALLBACK */
void __iomem *ioremap(phys_addr_t offset, size_t size) { @@ -168,7 +168,7 @@ void iounmap(void __iomem *addr) } EXPORT_SYMBOL(iounmap);
-#ifndef CONFIG_LOGIC_IOMEM_FALLBACK +#ifndef CONFIG_INDIRECT_IOMEM_FALLBACK #define MAKE_FALLBACK(op, sz) \ static u##sz real_raw_read ## op(const volatile void __iomem *addr) \ { \ @@ -213,7 +213,7 @@ static void real_memcpy_toio(volatile void __iomem *addr, const void *buffer, WARN(1, "Invalid memcpy_toio at address 0x%llx\n", (unsigned long long)(uintptr_t __force)addr); } -#endif /* CONFIG_LOGIC_IOMEM_FALLBACK */ +#endif /* CONFIG_INDIRECT_IOMEM_FALLBACK */
#define MAKE_OP(op, sz) \ u##sz __raw_read ## op(const volatile void __iomem *addr) \
From: Benjamin Beichler benjamin.beichler@uni-rostock.de
[ Upstream commit e3a33af812c611d99756e2ec61e9d7068d466bdf ]
Due to dropped inclusion of asm-generic/xor.h, xor_block_8regs symbol is missing with CONFIG64 and break compilation, as the asm/xor_64.h also did not include it. The patch recreate the logic from arch/x86, which check whether AVX is available and add fallbacks for 32bit and 64bit config of um.
A very minor additional "fix" is, the return of the macro parameter instead of NULL, as this is the original intent of the macro, but this does not change the actual behavior.
Fixes: c0ecca6604b8 ("um: enable the use of optimized xor routines in UML") Signed-off-by: Benjamin Beichler benjamin.beichler@uni-rostock.de Acked-By: Anton Ivanov anton.ivanov@cambridgegreys.com Signed-off-by: Richard Weinberger richard@nod.at Signed-off-by: Sasha Levin sashal@kernel.org --- arch/um/include/asm/xor.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/um/include/asm/xor.h b/arch/um/include/asm/xor.h index f512704a9ec7..22b39de73c24 100644 --- a/arch/um/include/asm/xor.h +++ b/arch/um/include/asm/xor.h @@ -4,8 +4,10 @@
#ifdef CONFIG_64BIT #undef CONFIG_X86_32 +#define TT_CPU_INF_XOR_DEFAULT (AVX_SELECT(&xor_block_sse_pf64)) #else #define CONFIG_X86_32 1 +#define TT_CPU_INF_XOR_DEFAULT (AVX_SELECT(&xor_block_8regs)) #endif
#include <asm/cpufeature.h> @@ -16,7 +18,7 @@ #undef XOR_SELECT_TEMPLATE /* pick an arbitrary one - measuring isn't possible with inf-cpu */ #define XOR_SELECT_TEMPLATE(x) \ - (time_travel_mode == TT_MODE_INFCPU ? &xor_block_8regs : NULL) + (time_travel_mode == TT_MODE_INFCPU ? TT_CPU_INF_XOR_DEFAULT : x)) #endif
#endif
From: Jiasheng Jiang jiasheng@iscas.ac.cn
[ Upstream commit 43f0269b6b89c1eec4ef83c48035608f4dcdd886 ]
As the potential failure of the wm8350_register_irq(), it should be better to check it and return error if fails. Also, it need not free 'wm_rtc->rtc' since it will be freed automatically.
Fixes: 077eaf5b40ec ("rtc: rtc-wm8350: add support for WM8350 RTC") Signed-off-by: Jiasheng Jiang jiasheng@iscas.ac.cn Acked-by: Charles Keepax ckeepax@opensource.cirrus.com Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Link: https://lore.kernel.org/r/20220303085030.291793-1-jiasheng@iscas.ac.cn Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/rtc/rtc-wm8350.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/rtc/rtc-wm8350.c b/drivers/rtc/rtc-wm8350.c index 2018614f258f..6eaa9321c074 100644 --- a/drivers/rtc/rtc-wm8350.c +++ b/drivers/rtc/rtc-wm8350.c @@ -432,14 +432,21 @@ static int wm8350_rtc_probe(struct platform_device *pdev) return ret; }
- wm8350_register_irq(wm8350, WM8350_IRQ_RTC_SEC, + ret = wm8350_register_irq(wm8350, WM8350_IRQ_RTC_SEC, wm8350_rtc_update_handler, 0, "RTC Seconds", wm8350); + if (ret) + return ret; + wm8350_mask_irq(wm8350, WM8350_IRQ_RTC_SEC);
- wm8350_register_irq(wm8350, WM8350_IRQ_RTC_ALM, + ret = wm8350_register_irq(wm8350, WM8350_IRQ_RTC_ALM, wm8350_rtc_alarm_handler, 0, "RTC Alarm", wm8350); + if (ret) { + wm8350_free_irq(wm8350, WM8350_IRQ_RTC_SEC, wm8350); + return ret; + }
return 0; }
From: Luis Chamberlain mcgrof@kernel.org
[ Upstream commit e1654f413fe08ffbc3292d8d2b8958b2cc5cb5e8 ]
We never checked for errors on add_disk() as this function returned void. Now that this is fixed, use the shiny new error handling.
Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Hannes Reinecke hare@suse.de Signed-off-by: Luis Chamberlain mcgrof@kernel.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/block/nbd.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 577c7dba5d78..946fdc1734f4 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -1762,7 +1762,9 @@ static struct nbd_device *nbd_dev_add(int index, unsigned int refs) disk->fops = &nbd_fops; disk->private_data = nbd; sprintf(disk->disk_name, "nbd%d", index); - add_disk(disk); + err = add_disk(disk); + if (err) + goto out_err_disk;
/* * Now publish the device. @@ -1771,6 +1773,8 @@ static struct nbd_device *nbd_dev_add(int index, unsigned int refs) nbd_total_devices++; return nbd;
+out_err_disk: + blk_cleanup_disk(disk); out_free_idr: mutex_lock(&nbd_index_mutex); idr_remove(&nbd_index_idr, index);
From: Ye Bin yebin10@huawei.com
[ Upstream commit 69beb62ff0d1723a750eebe1c4d01da573d7cd19 ]
If first_minor is illegal will goto out_free_idr label, this will miss cleanup disk.
Fixes: b1a811633f73 ("block: nbd: add sanity check for first_minor") Signed-off-by: Ye Bin yebin10@huawei.com Reviewed-by: Josef Bacik josef@toxicpanda.com Link: https://lore.kernel.org/r/20211102015237.2309763-4-yebin10@huawei.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/block/nbd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 946fdc1734f4..4ae6c221b36d 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -1755,7 +1755,7 @@ static struct nbd_device *nbd_dev_add(int index, unsigned int refs) disk->first_minor = index << part_shift; if (disk->first_minor < index || disk->first_minor > MINORMASK) { err = -EINVAL; - goto out_free_idr; + goto out_err_disk; }
disk->minors = 1 << part_shift;
From: Ye Bin yebin10@huawei.com
[ Upstream commit e2daec488c57069a4a431d5b752f50294c4bf273 ]
I got follow issue: [ 247.381177] INFO: task kworker/u10:0:47 blocked for more than 120 seconds. [ 247.382644] Not tainted 4.19.90-dirty #140 [ 247.383502] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 247.385027] Call Trace: [ 247.388384] schedule+0xb8/0x3c0 [ 247.388966] schedule_timeout+0x2b4/0x380 [ 247.392815] wait_for_completion+0x367/0x510 [ 247.397713] flush_workqueue+0x32b/0x1340 [ 247.402700] drain_workqueue+0xda/0x3c0 [ 247.403442] destroy_workqueue+0x7b/0x690 [ 247.405014] nbd_config_put.cold+0x2f9/0x5b6 [ 247.405823] recv_work+0x1fd/0x2b0 [ 247.406485] process_one_work+0x70b/0x1610 [ 247.407262] worker_thread+0x5a9/0x1060 [ 247.408699] kthread+0x35e/0x430 [ 247.410918] ret_from_fork+0x1f/0x30
We can reproduce issue as follows: 1. Inject memory fault in nbd_start_device -1244,10 +1248,18 @@ static int nbd_start_device(struct nbd_device *nbd) nbd_dev_dbg_init(nbd); for (i = 0; i < num_connections; i++) { struct recv_thread_args *args; - - args = kzalloc(sizeof(*args), GFP_KERNEL); + + if (i == 1) { + args = NULL; + printk("%s: inject malloc error\n", __func__); + } + else + args = kzalloc(sizeof(*args), GFP_KERNEL); 2. Inject delay in recv_work -757,6 +760,8 @@ static void recv_work(struct work_struct *work)
blk_mq_complete_request(blk_mq_rq_from_pdu(cmd)); } + printk("%s: comm=%s pid=%d\n", __func__, current->comm, current->pid); + mdelay(5 * 1000); nbd_config_put(nbd); atomic_dec(&config->recv_threads); wake_up(&config->recv_wq); 3. Create nbd server nbd-server 8000 /tmp/disk 4. Create nbd client nbd-client localhost 8000 /dev/nbd1 Then will trigger above issue.
Reason is when add delay in recv_work, lead to release the last reference of 'nbd->config_refs'. nbd_config_put will call flush_workqueue to make all work finish. Obviously, it will lead to deadloop. To solve this issue, according to Josef's suggestion move 'recv_work' init from start device to nbd_dev_add, then destroy 'recv_work'when nbd device teardown.
Signed-off-by: Ye Bin yebin10@huawei.com Reviewed-by: Josef Bacik josef@toxicpanda.com Link: https://lore.kernel.org/r/20211102015237.2309763-5-yebin10@huawei.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/block/nbd.c | 36 ++++++++++++++++-------------------- 1 file changed, 16 insertions(+), 20 deletions(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 4ae6c221b36d..582b23befb5c 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -254,7 +254,7 @@ static void nbd_dev_remove(struct nbd_device *nbd) mutex_lock(&nbd_index_mutex); idr_remove(&nbd_index_idr, nbd->index); mutex_unlock(&nbd_index_mutex); - + destroy_workqueue(nbd->recv_workq); kfree(nbd); }
@@ -1260,10 +1260,6 @@ static void nbd_config_put(struct nbd_device *nbd) kfree(nbd->config); nbd->config = NULL;
- if (nbd->recv_workq) - destroy_workqueue(nbd->recv_workq); - nbd->recv_workq = NULL; - nbd->tag_set.timeout = 0; nbd->disk->queue->limits.discard_granularity = 0; nbd->disk->queue->limits.discard_alignment = 0; @@ -1292,14 +1288,6 @@ static int nbd_start_device(struct nbd_device *nbd) return -EINVAL; }
- nbd->recv_workq = alloc_workqueue("knbd%d-recv", - WQ_MEM_RECLAIM | WQ_HIGHPRI | - WQ_UNBOUND, 0, nbd->index); - if (!nbd->recv_workq) { - dev_err(disk_to_dev(nbd->disk), "Could not allocate knbd recv work queue.\n"); - return -ENOMEM; - } - blk_mq_update_nr_hw_queues(&nbd->tag_set, config->num_connections); nbd->pid = task_pid_nr(current);
@@ -1725,6 +1713,15 @@ static struct nbd_device *nbd_dev_add(int index, unsigned int refs) } nbd->disk = disk;
+ nbd->recv_workq = alloc_workqueue("nbd%d-recv", + WQ_MEM_RECLAIM | WQ_HIGHPRI | + WQ_UNBOUND, 0, nbd->index); + if (!nbd->recv_workq) { + dev_err(disk_to_dev(nbd->disk), "Could not allocate knbd recv work queue.\n"); + err = -ENOMEM; + goto out_err_disk; + } + /* * Tell the block layer that we are not a rotational device */ @@ -1755,7 +1752,7 @@ static struct nbd_device *nbd_dev_add(int index, unsigned int refs) disk->first_minor = index << part_shift; if (disk->first_minor < index || disk->first_minor > MINORMASK) { err = -EINVAL; - goto out_err_disk; + goto out_free_work; }
disk->minors = 1 << part_shift; @@ -1764,7 +1761,7 @@ static struct nbd_device *nbd_dev_add(int index, unsigned int refs) sprintf(disk->disk_name, "nbd%d", index); err = add_disk(disk); if (err) - goto out_err_disk; + goto out_free_work;
/* * Now publish the device. @@ -1773,6 +1770,8 @@ static struct nbd_device *nbd_dev_add(int index, unsigned int refs) nbd_total_devices++; return nbd;
+out_free_work: + destroy_workqueue(nbd->recv_workq); out_err_disk: blk_cleanup_disk(disk); out_free_idr: @@ -2028,13 +2027,10 @@ static void nbd_disconnect_and_put(struct nbd_device *nbd) nbd_disconnect(nbd); sock_shutdown(nbd); /* - * Make sure recv thread has finished, so it does not drop the last - * config ref and try to destroy the workqueue from inside the work - * queue. And this also ensure that we can safely call nbd_clear_que() + * Make sure recv thread has finished, we can safely call nbd_clear_que() * to cancel the inflight I/Os. */ - if (nbd->recv_workq) - flush_workqueue(nbd->recv_workq); + flush_workqueue(nbd->recv_workq); nbd_clear_que(nbd); nbd->task_setup = NULL; mutex_unlock(&nbd->config_lock);
From: Zhang Wensheng zhangwensheng5@huawei.com
[ Upstream commit 6d35d04a9e18990040e87d2bbf72689252669d54 ]
When 'index' is a big numbers, it may become negative which forced to 'int'. then 'index << part_shift' might overflow to a positive value that is not greater than '0xfffff', then sysfs might complains about duplicate creation. Because of this, move the 'index' judgment to the front will fix it and be better.
Fixes: b0d9111a2d53 ("nbd: use an idr to keep track of nbd devices") Fixes: 940c264984fd ("nbd: fix possible overflow for 'first_minor' in nbd_dev_add()") Signed-off-by: Zhang Wensheng zhangwensheng5@huawei.com Reviewed-by: Josef Bacik josef@toxicpanda.com Link: https://lore.kernel.org/r/20220310093224.4002895-1-zhangwensheng5@huawei.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/block/nbd.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 582b23befb5c..1c62262304be 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -1744,17 +1744,6 @@ static struct nbd_device *nbd_dev_add(int index, unsigned int refs) refcount_set(&nbd->refs, 0); INIT_LIST_HEAD(&nbd->list); disk->major = NBD_MAJOR; - - /* Too big first_minor can cause duplicate creation of - * sysfs files/links, since index << part_shift might overflow, or - * MKDEV() expect that the max bits of first_minor is 20. - */ - disk->first_minor = index << part_shift; - if (disk->first_minor < index || disk->first_minor > MINORMASK) { - err = -EINVAL; - goto out_free_work; - } - disk->minors = 1 << part_shift; disk->fops = &nbd_fops; disk->private_data = nbd; @@ -1859,8 +1848,19 @@ static int nbd_genl_connect(struct sk_buff *skb, struct genl_info *info) if (!netlink_capable(skb, CAP_SYS_ADMIN)) return -EPERM;
- if (info->attrs[NBD_ATTR_INDEX]) + if (info->attrs[NBD_ATTR_INDEX]) { index = nla_get_u32(info->attrs[NBD_ATTR_INDEX]); + + /* + * Too big first_minor can cause duplicate creation of + * sysfs files/links, since index << part_shift might overflow, or + * MKDEV() expect that the max bits of first_minor is 20. + */ + if (index < 0 || index > MINORMASK >> part_shift) { + printk(KERN_ERR "nbd: illegal input index %d\n", index); + return -EINVAL; + } + } if (!info->attrs[NBD_ATTR_SOCKETS]) { printk(KERN_ERR "nbd: must specify at least one socket\n"); return -EINVAL;
From: Marco Elver elver@google.com
[ Upstream commit 9a19aeb5665068c3e2727230588684aae2cab7ef ]
Maintain a counter to count allocations that are skipped due to being incompatible (oversized, incompatible gfp flags) or no capacity.
This is to compute the fraction of allocations that could not be serviced by KFENCE, which we expect to be rare.
Link: https://lkml.kernel.org/r/20210923104803.2620285-2-elver@google.com Signed-off-by: Marco Elver elver@google.com Reviewed-by: Dmitry Vyukov dvyukov@google.com Acked-by: Alexander Potapenko glider@google.com Cc: Aleksandr Nogikh nogikh@google.com Cc: Jann Horn jannh@google.com Cc: Taras Madan tarasmadan@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/kfence/core.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/mm/kfence/core.c b/mm/kfence/core.c index 84555b8233ef..f26f55850ad7 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -113,6 +113,8 @@ enum kfence_counter_id { KFENCE_COUNTER_FREES, KFENCE_COUNTER_ZOMBIES, KFENCE_COUNTER_BUGS, + KFENCE_COUNTER_SKIP_INCOMPAT, + KFENCE_COUNTER_SKIP_CAPACITY, KFENCE_COUNTER_COUNT, }; static atomic_long_t counters[KFENCE_COUNTER_COUNT]; @@ -122,6 +124,8 @@ static const char *const counter_names[] = { [KFENCE_COUNTER_FREES] = "total frees", [KFENCE_COUNTER_ZOMBIES] = "zombie allocations", [KFENCE_COUNTER_BUGS] = "total bugs", + [KFENCE_COUNTER_SKIP_INCOMPAT] = "skipped allocations (incompatible)", + [KFENCE_COUNTER_SKIP_CAPACITY] = "skipped allocations (capacity)", }; static_assert(ARRAY_SIZE(counter_names) == KFENCE_COUNTER_COUNT);
@@ -272,8 +276,10 @@ static void *kfence_guarded_alloc(struct kmem_cache *cache, size_t size, gfp_t g list_del_init(&meta->list); } raw_spin_unlock_irqrestore(&kfence_freelist_lock, flags); - if (!meta) + if (!meta) { + atomic_long_inc(&counters[KFENCE_COUNTER_SKIP_CAPACITY]); return NULL; + }
if (unlikely(!raw_spin_trylock_irqsave(&meta->lock, flags))) { /* @@ -744,8 +750,10 @@ void *__kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags) * Perform size check before switching kfence_allocation_gate, so that * we don't disable KFENCE without making an allocation. */ - if (size > PAGE_SIZE) + if (size > PAGE_SIZE) { + atomic_long_inc(&counters[KFENCE_COUNTER_SKIP_INCOMPAT]); return NULL; + }
/* * Skip allocations from non-default zones, including DMA. We cannot @@ -753,8 +761,10 @@ void *__kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags) * properties (e.g. reside in DMAable memory). */ if ((flags & GFP_ZONEMASK) || - (s->flags & (SLAB_CACHE_DMA | SLAB_CACHE_DMA32))) + (s->flags & (SLAB_CACHE_DMA | SLAB_CACHE_DMA32))) { + atomic_long_inc(&counters[KFENCE_COUNTER_SKIP_INCOMPAT]); return NULL; + }
if (atomic_inc_return(&kfence_allocation_gate) > 1) return NULL;
From: Marco Elver elver@google.com
[ Upstream commit a9ab52bbcb52df49ec4b30e6741e120588989455 ]
Move the saving of the stack trace of allocations into __kfence_alloc(), so that the stack entries array can be used outside of kfence_guarded_alloc() and we avoid potentially unwinding the stack multiple times.
Link: https://lkml.kernel.org/r/20210923104803.2620285-3-elver@google.com Signed-off-by: Marco Elver elver@google.com Reviewed-by: Dmitry Vyukov dvyukov@google.com Acked-by: Alexander Potapenko glider@google.com Cc: Aleksandr Nogikh nogikh@google.com Cc: Jann Horn jannh@google.com Cc: Taras Madan tarasmadan@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/kfence/core.c | 35 ++++++++++++++++++++++++----------- 1 file changed, 24 insertions(+), 11 deletions(-)
diff --git a/mm/kfence/core.c b/mm/kfence/core.c index f26f55850ad7..4eec0c5d32b5 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -188,19 +188,26 @@ static inline unsigned long metadata_to_pageaddr(const struct kfence_metadata *m * Update the object's metadata state, including updating the alloc/free stacks * depending on the state transition. */ -static noinline void metadata_update_state(struct kfence_metadata *meta, - enum kfence_object_state next) +static noinline void +metadata_update_state(struct kfence_metadata *meta, enum kfence_object_state next, + unsigned long *stack_entries, size_t num_stack_entries) { struct kfence_track *track = next == KFENCE_OBJECT_FREED ? &meta->free_track : &meta->alloc_track;
lockdep_assert_held(&meta->lock);
- /* - * Skip over 1 (this) functions; noinline ensures we do not accidentally - * skip over the caller by never inlining. - */ - track->num_stack_entries = stack_trace_save(track->stack_entries, KFENCE_STACK_DEPTH, 1); + if (stack_entries) { + memcpy(track->stack_entries, stack_entries, + num_stack_entries * sizeof(stack_entries[0])); + } else { + /* + * Skip over 1 (this) functions; noinline ensures we do not + * accidentally skip over the caller by never inlining. + */ + num_stack_entries = stack_trace_save(track->stack_entries, KFENCE_STACK_DEPTH, 1); + } + track->num_stack_entries = num_stack_entries; track->pid = task_pid_nr(current); track->cpu = raw_smp_processor_id(); track->ts_nsec = local_clock(); /* Same source as printk timestamps. */ @@ -262,7 +269,8 @@ static __always_inline void for_each_canary(const struct kfence_metadata *meta, } }
-static void *kfence_guarded_alloc(struct kmem_cache *cache, size_t size, gfp_t gfp) +static void *kfence_guarded_alloc(struct kmem_cache *cache, size_t size, gfp_t gfp, + unsigned long *stack_entries, size_t num_stack_entries) { struct kfence_metadata *meta = NULL; unsigned long flags; @@ -321,7 +329,7 @@ static void *kfence_guarded_alloc(struct kmem_cache *cache, size_t size, gfp_t g addr = (void *)meta->addr;
/* Update remaining metadata. */ - metadata_update_state(meta, KFENCE_OBJECT_ALLOCATED); + metadata_update_state(meta, KFENCE_OBJECT_ALLOCATED, stack_entries, num_stack_entries); /* Pairs with READ_ONCE() in kfence_shutdown_cache(). */ WRITE_ONCE(meta->cache, cache); meta->size = size; @@ -401,7 +409,7 @@ static void kfence_guarded_free(void *addr, struct kfence_metadata *meta, bool z memzero_explicit(addr, meta->size);
/* Mark the object as freed. */ - metadata_update_state(meta, KFENCE_OBJECT_FREED); + metadata_update_state(meta, KFENCE_OBJECT_FREED, NULL, 0);
raw_spin_unlock_irqrestore(&meta->lock, flags);
@@ -746,6 +754,9 @@ void kfence_shutdown_cache(struct kmem_cache *s)
void *__kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags) { + unsigned long stack_entries[KFENCE_STACK_DEPTH]; + size_t num_stack_entries; + /* * Perform size check before switching kfence_allocation_gate, so that * we don't disable KFENCE without making an allocation. @@ -785,7 +796,9 @@ void *__kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags) if (!READ_ONCE(kfence_enabled)) return NULL;
- return kfence_guarded_alloc(s, size, flags); + num_stack_entries = stack_trace_save(stack_entries, KFENCE_STACK_DEPTH, 0); + + return kfence_guarded_alloc(s, size, flags, stack_entries, num_stack_entries); }
size_t kfence_ksize(const void *addr)
From: Marco Elver elver@google.com
[ Upstream commit 08f6b10630f284755087f58aa393402e15b92977 ]
One of KFENCE's main design principles is that with increasing uptime, allocation coverage increases sufficiently to detect previously undetected bugs.
We have observed that frequent long-lived allocations of the same source (e.g. pagecache) tend to permanently fill up the KFENCE pool with increasing system uptime, thus breaking the above requirement. The workaround thus far had been increasing the sample interval and/or increasing the KFENCE pool size, but is no reliable solution.
To ensure diverse coverage of allocations, limit currently covered allocations of the same source once pool utilization reaches 75% (configurable via `kfence.skip_covered_thresh`) or above. The effect is retaining reasonable allocation coverage when the pool is close to full.
A side-effect is that this also limits frequent long-lived allocations of the same source filling up the pool permanently.
Uniqueness of an allocation for coverage purposes is based on its (partial) allocation stack trace (the source). A Counting Bloom filter is used to check if an allocation is covered; if the allocation is currently covered, the allocation is skipped by KFENCE.
Testing was done using:
(a) a synthetic workload that performs frequent long-lived allocations (default config values; sample_interval=1; num_objects=63), and
(b) normal desktop workloads on an otherwise idle machine where the problem was first reported after a few days of uptime (default config values).
In both test cases the sampled allocation rate no longer drops to zero at any point. In the case of (b) we observe (after 2 days uptime) 15% unique allocations in the pool, 77% pool utilization, with 20% "skipped allocations (covered)".
[elver@google.com: simplify and just use hash_32(), use more random stack_hash_seed] Link: https://lkml.kernel.org/r/YU3MRGaCaJiYht5g@elver.google.com [elver@google.com: fix 32 bit]
Link: https://lkml.kernel.org/r/20210923104803.2620285-4-elver@google.com Signed-off-by: Marco Elver elver@google.com Reviewed-by: Dmitry Vyukov dvyukov@google.com Acked-by: Alexander Potapenko glider@google.com Cc: Aleksandr Nogikh nogikh@google.com Cc: Jann Horn jannh@google.com Cc: Taras Madan tarasmadan@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/kfence/core.c | 109 ++++++++++++++++++++++++++++++++++++++++++++- mm/kfence/kfence.h | 2 + 2 files changed, 109 insertions(+), 2 deletions(-)
diff --git a/mm/kfence/core.c b/mm/kfence/core.c index 4eec0c5d32b5..51ea9193cecb 100644 --- a/mm/kfence/core.c +++ b/mm/kfence/core.c @@ -10,12 +10,15 @@ #include <linux/atomic.h> #include <linux/bug.h> #include <linux/debugfs.h> +#include <linux/hash.h> #include <linux/irq_work.h> +#include <linux/jhash.h> #include <linux/kcsan-checks.h> #include <linux/kfence.h> #include <linux/kmemleak.h> #include <linux/list.h> #include <linux/lockdep.h> +#include <linux/log2.h> #include <linux/memblock.h> #include <linux/moduleparam.h> #include <linux/random.h> @@ -82,6 +85,10 @@ static const struct kernel_param_ops sample_interval_param_ops = { }; module_param_cb(sample_interval, &sample_interval_param_ops, &kfence_sample_interval, 0600);
+/* Pool usage% threshold when currently covered allocations are skipped. */ +static unsigned long kfence_skip_covered_thresh __read_mostly = 75; +module_param_named(skip_covered_thresh, kfence_skip_covered_thresh, ulong, 0644); + /* The pool of pages used for guard pages and objects. */ char *__kfence_pool __ro_after_init; EXPORT_SYMBOL(__kfence_pool); /* Export for test modules. */ @@ -106,6 +113,32 @@ DEFINE_STATIC_KEY_FALSE(kfence_allocation_key); /* Gates the allocation, ensuring only one succeeds in a given period. */ atomic_t kfence_allocation_gate = ATOMIC_INIT(1);
+/* + * A Counting Bloom filter of allocation coverage: limits currently covered + * allocations of the same source filling up the pool. + * + * Assuming a range of 15%-85% unique allocations in the pool at any point in + * time, the below parameters provide a probablity of 0.02-0.33 for false + * positive hits respectively: + * + * P(alloc_traces) = (1 - e^(-HNUM * (alloc_traces / SIZE)) ^ HNUM + */ +#define ALLOC_COVERED_HNUM 2 +#define ALLOC_COVERED_ORDER (const_ilog2(CONFIG_KFENCE_NUM_OBJECTS) + 2) +#define ALLOC_COVERED_SIZE (1 << ALLOC_COVERED_ORDER) +#define ALLOC_COVERED_HNEXT(h) hash_32(h, ALLOC_COVERED_ORDER) +#define ALLOC_COVERED_MASK (ALLOC_COVERED_SIZE - 1) +static atomic_t alloc_covered[ALLOC_COVERED_SIZE]; + +/* Stack depth used to determine uniqueness of an allocation. */ +#define UNIQUE_ALLOC_STACK_DEPTH ((size_t)8) + +/* + * Randomness for stack hashes, making the same collisions across reboots and + * different machines less likely. + */ +static u32 stack_hash_seed __ro_after_init; + /* Statistics counters for debugfs. */ enum kfence_counter_id { KFENCE_COUNTER_ALLOCATED, @@ -115,6 +148,7 @@ enum kfence_counter_id { KFENCE_COUNTER_BUGS, KFENCE_COUNTER_SKIP_INCOMPAT, KFENCE_COUNTER_SKIP_CAPACITY, + KFENCE_COUNTER_SKIP_COVERED, KFENCE_COUNTER_COUNT, }; static atomic_long_t counters[KFENCE_COUNTER_COUNT]; @@ -126,11 +160,57 @@ static const char *const counter_names[] = { [KFENCE_COUNTER_BUGS] = "total bugs", [KFENCE_COUNTER_SKIP_INCOMPAT] = "skipped allocations (incompatible)", [KFENCE_COUNTER_SKIP_CAPACITY] = "skipped allocations (capacity)", + [KFENCE_COUNTER_SKIP_COVERED] = "skipped allocations (covered)", }; static_assert(ARRAY_SIZE(counter_names) == KFENCE_COUNTER_COUNT);
/* === Internals ============================================================ */
+static inline bool should_skip_covered(void) +{ + unsigned long thresh = (CONFIG_KFENCE_NUM_OBJECTS * kfence_skip_covered_thresh) / 100; + + return atomic_long_read(&counters[KFENCE_COUNTER_ALLOCATED]) > thresh; +} + +static u32 get_alloc_stack_hash(unsigned long *stack_entries, size_t num_entries) +{ + num_entries = min(num_entries, UNIQUE_ALLOC_STACK_DEPTH); + num_entries = filter_irq_stacks(stack_entries, num_entries); + return jhash(stack_entries, num_entries * sizeof(stack_entries[0]), stack_hash_seed); +} + +/* + * Adds (or subtracts) count @val for allocation stack trace hash + * @alloc_stack_hash from Counting Bloom filter. + */ +static void alloc_covered_add(u32 alloc_stack_hash, int val) +{ + int i; + + for (i = 0; i < ALLOC_COVERED_HNUM; i++) { + atomic_add(val, &alloc_covered[alloc_stack_hash & ALLOC_COVERED_MASK]); + alloc_stack_hash = ALLOC_COVERED_HNEXT(alloc_stack_hash); + } +} + +/* + * Returns true if the allocation stack trace hash @alloc_stack_hash is + * currently contained (non-zero count) in Counting Bloom filter. + */ +static bool alloc_covered_contains(u32 alloc_stack_hash) +{ + int i; + + for (i = 0; i < ALLOC_COVERED_HNUM; i++) { + if (!atomic_read(&alloc_covered[alloc_stack_hash & ALLOC_COVERED_MASK])) + return false; + alloc_stack_hash = ALLOC_COVERED_HNEXT(alloc_stack_hash); + } + + return true; +} + static bool kfence_protect(unsigned long addr) { return !KFENCE_WARN_ON(!kfence_protect_page(ALIGN_DOWN(addr, PAGE_SIZE), true)); @@ -270,7 +350,8 @@ static __always_inline void for_each_canary(const struct kfence_metadata *meta, }
static void *kfence_guarded_alloc(struct kmem_cache *cache, size_t size, gfp_t gfp, - unsigned long *stack_entries, size_t num_stack_entries) + unsigned long *stack_entries, size_t num_stack_entries, + u32 alloc_stack_hash) { struct kfence_metadata *meta = NULL; unsigned long flags; @@ -333,6 +414,8 @@ static void *kfence_guarded_alloc(struct kmem_cache *cache, size_t size, gfp_t g /* Pairs with READ_ONCE() in kfence_shutdown_cache(). */ WRITE_ONCE(meta->cache, cache); meta->size = size; + meta->alloc_stack_hash = alloc_stack_hash; + for_each_canary(meta, set_canary_byte);
/* Set required struct page fields. */ @@ -345,6 +428,8 @@ static void *kfence_guarded_alloc(struct kmem_cache *cache, size_t size, gfp_t g
raw_spin_unlock_irqrestore(&meta->lock, flags);
+ alloc_covered_add(alloc_stack_hash, 1); + /* Memory initialization. */
/* @@ -413,6 +498,8 @@ static void kfence_guarded_free(void *addr, struct kfence_metadata *meta, bool z
raw_spin_unlock_irqrestore(&meta->lock, flags);
+ alloc_covered_add(meta->alloc_stack_hash, -1); + /* Protect to detect use-after-frees. */ kfence_protect((unsigned long)addr);
@@ -679,6 +766,7 @@ void __init kfence_init(void) if (!kfence_sample_interval) return;
+ stack_hash_seed = (u32)random_get_entropy(); if (!kfence_init_pool()) { pr_err("%s failed\n", __func__); return; @@ -756,6 +844,7 @@ void *__kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags) { unsigned long stack_entries[KFENCE_STACK_DEPTH]; size_t num_stack_entries; + u32 alloc_stack_hash;
/* * Perform size check before switching kfence_allocation_gate, so that @@ -798,7 +887,23 @@ void *__kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags)
num_stack_entries = stack_trace_save(stack_entries, KFENCE_STACK_DEPTH, 0);
- return kfence_guarded_alloc(s, size, flags, stack_entries, num_stack_entries); + /* + * Do expensive check for coverage of allocation in slow-path after + * allocation_gate has already become non-zero, even though it might + * mean not making any allocation within a given sample interval. + * + * This ensures reasonable allocation coverage when the pool is almost + * full, including avoiding long-lived allocations of the same source + * filling up the pool (e.g. pagecache allocations). + */ + alloc_stack_hash = get_alloc_stack_hash(stack_entries, num_stack_entries); + if (should_skip_covered() && alloc_covered_contains(alloc_stack_hash)) { + atomic_long_inc(&counters[KFENCE_COUNTER_SKIP_COVERED]); + return NULL; + } + + return kfence_guarded_alloc(s, size, flags, stack_entries, num_stack_entries, + alloc_stack_hash); }
size_t kfence_ksize(const void *addr) diff --git a/mm/kfence/kfence.h b/mm/kfence/kfence.h index c1f23c61e5f9..2a2d5de9d379 100644 --- a/mm/kfence/kfence.h +++ b/mm/kfence/kfence.h @@ -87,6 +87,8 @@ struct kfence_metadata { /* Allocation and free stack information. */ struct kfence_track alloc_track; struct kfence_track free_track; + /* For updating alloc_covered on frees. */ + u32 alloc_stack_hash; };
extern struct kfence_metadata kfence_metadata[CONFIG_KFENCE_NUM_OBJECTS];
From: Jim Mattson jmattson@google.com
[ Upstream commit 95b065bf5c431c06c68056a03a5853b660640ecc ]
The third nybble of AMD's event select overlaps with Intel's IN_TX and IN_TXCP bits. Therefore, we can't use AMD64_RAW_EVENT_MASK on Intel platforms that support TSX.
Declare a raw_event_mask in the kvm_pmu structure, initialize it in the vendor-specific pmu_refresh() functions, and use that mask for PERF_TYPE_RAW configurations in reprogram_gp_counter().
Fixes: 710c47651431 ("KVM: x86/pmu: Use AMD64_RAW_EVENT_MASK for PERF_TYPE_RAW") Signed-off-by: Jim Mattson jmattson@google.com Message-Id: 20220308012452.3468611-1-jmattson@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/pmu.c | 3 ++- arch/x86/kvm/svm/pmu.c | 1 + arch/x86/kvm/vmx/pmu_intel.c | 1 + 4 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 01759199d723..d9bb5cdb5db2 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -505,6 +505,7 @@ struct kvm_pmu { u64 global_ctrl_mask; u64 global_ovf_ctrl_mask; u64 reserved_bits; + u64 raw_event_mask; u8 version; struct kvm_pmc gp_counters[INTEL_PMC_MAX_GENERIC]; struct kvm_pmc fixed_counters[INTEL_PMC_MAX_FIXED]; diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c index f256f01056bd..44a5ab91a99d 100644 --- a/arch/x86/kvm/pmu.c +++ b/arch/x86/kvm/pmu.c @@ -178,6 +178,7 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel) struct kvm *kvm = pmc->vcpu->kvm; struct kvm_pmu_event_filter *filter; int i; + struct kvm_pmu *pmu = vcpu_to_pmu(pmc->vcpu); bool allow_event = true;
if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL) @@ -217,7 +218,7 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel) }
if (type == PERF_TYPE_RAW) - config = eventsel & AMD64_RAW_EVENT_MASK; + config = eventsel & pmu->raw_event_mask;
if (pmc->current_config == eventsel && pmc_resume_counter(pmc)) return; diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c index 06f8034f62e4..369164368819 100644 --- a/arch/x86/kvm/svm/pmu.c +++ b/arch/x86/kvm/svm/pmu.c @@ -283,6 +283,7 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << 48) - 1; pmu->reserved_bits = 0xfffffff000280000ull; + pmu->raw_event_mask = AMD64_RAW_EVENT_MASK; pmu->version = 1; /* not applicable to AMD; but clean them to prevent any fall out */ pmu->counter_bitmask[KVM_PMC_FIXED] = 0; diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c index 6427d95de01c..db1b88445acb 100644 --- a/arch/x86/kvm/vmx/pmu_intel.c +++ b/arch/x86/kvm/vmx/pmu_intel.c @@ -478,6 +478,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu) pmu->counter_bitmask[KVM_PMC_FIXED] = 0; pmu->version = 0; pmu->reserved_bits = 0xffffffff00200000ull; + pmu->raw_event_mask = X86_RAW_EVENT_MASK;
entry = kvm_find_cpuid_entry(vcpu, 0xa, 0); if (!entry)
From: Peter Gonda pgonda@google.com
[ Upstream commit 4a9e7b9ea252842bc8b14d495706ac6317fafd5d ]
Include kvm_cache_regs.h to pick up the definition of is_guest_mode(), which is referenced by nested_svm_virtualize_tpr() in svm.h. Remove include from svm_onhpyerv.c which was done only because of lack of include in svm.h.
Fixes: 883b0a91f41ab ("KVM: SVM: Move Nested SVM Implementation to nested.c") Cc: Paolo Bonzini pbonzini@redhat.com Cc: Sean Christopherson seanjc@google.com Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Peter Gonda pgonda@google.com Message-Id: 20220304161032.2270688-1-pgonda@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/svm/svm.h | 2 ++ arch/x86/kvm/svm/svm_onhyperv.c | 1 - 2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h index ff0855c03c91..a4c13e517487 100644 --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -22,6 +22,8 @@ #include <asm/svm.h> #include <asm/sev-common.h>
+#include "kvm_cache_regs.h" + #define __sme_page_pa(x) __sme_set(page_to_pfn(x) << PAGE_SHIFT)
#define IOPM_SIZE PAGE_SIZE * 3 diff --git a/arch/x86/kvm/svm/svm_onhyperv.c b/arch/x86/kvm/svm/svm_onhyperv.c index 98aa981c04ec..8cdc62c74a96 100644 --- a/arch/x86/kvm/svm/svm_onhyperv.c +++ b/arch/x86/kvm/svm/svm_onhyperv.c @@ -4,7 +4,6 @@ */
#include <linux/kvm_host.h> -#include "kvm_cache_regs.h"
#include <asm/mshyperv.h>
From: Jim Mattson jmattson@google.com
[ Upstream commit 9b026073db2f1ad0e4d8b61c83316c8497981037 ]
AMD EPYC CPUs never raise a #GP for a WRMSR to a PerfEvtSeln MSR. Some reserved bits are cleared, and some are not. Specifically, on Zen3/Milan, bits 19 and 42 are not cleared.
When emulating such a WRMSR, KVM should not synthesize a #GP, regardless of which bits are set. However, undocumented bits should not be passed through to the hardware MSR. So, rather than checking for reserved bits and synthesizing a #GP, just clear the reserved bits.
This may seem pedantic, but since KVM currently does not support the "Host/Guest Only" bits (41:40), it is necessary to clear these bits rather than synthesizing #GP, because some popular guests (e.g Linux) will set the "Host Only" bit even on CPUs that don't support EFER.SVME, and they don't expect a #GP.
For example,
root@Ubuntu1804:~# perf stat -e r26 -a sleep 1
Performance counter stats for 'system wide':
0 r26
1.001070977 seconds time elapsed
Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379957] unchecked MSR access error: WRMSR to 0xc0010200 (tried to write 0x0000020000130026) at rIP: 0xffffffff9b276a28 (native_write_msr+0x8/0x30) Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379958] Call Trace: Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379963] amd_pmu_disable_event+0x27/0x90
Fixes: ca724305a2b0 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM") Reported-by: Lotus Fenn lotusf@google.com Signed-off-by: Jim Mattson jmattson@google.com Reviewed-by: Like Xu likexu@tencent.com Reviewed-by: David Dunn daviddunn@google.com Message-Id: 20220226234131.2167175-1-jmattson@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/svm/pmu.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c index 369164368819..3faf1d9c6c91 100644 --- a/arch/x86/kvm/svm/pmu.c +++ b/arch/x86/kvm/svm/pmu.c @@ -261,12 +261,10 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) /* MSR_EVNTSELn */ pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_EVNTSEL); if (pmc) { - if (data == pmc->eventsel) - return 0; - if (!(data & pmu->reserved_bits)) { + data &= ~pmu->reserved_bits; + if (data != pmc->eventsel) reprogram_gp_counter(pmc, data); - return 0; - } + return 0; }
return 1;
From: Like Xu likexu@tencent.com
[ Upstream commit e644896f5106aa3f6d7e8c7adf2e4dc0fce53555 ]
HSW_IN_TX* bits are used in generic code which are not supported on AMD. Worse, these bits overlap with AMD EventSelect[11:8] and hence using HSW_IN_TX* bits unconditionally in generic code is resulting in unintentional pmu behavior on AMD. For example, if EventSelect[11:8] is 0x2, pmc_reprogram_counter() wrongly assumes that HSW_IN_TX_CHECKPOINTED is set and thus forces sampling period to be 0.
Also per the SDM, both bits 32 and 33 "may only be set if the processor supports HLE or RTM" and for "IN_TXCP (bit 33): this bit may only be set for IA32_PERFEVTSEL2."
Opportunistically eliminate code redundancy, because if the HSW_IN_TX* bit is set in pmc->eventsel, it is already set in attr.config.
Reported-by: Ravi Bangoria ravi.bangoria@amd.com Reported-by: Jim Mattson jmattson@google.com Fixes: 103af0a98788 ("perf, kvm: Support the in_tx/in_tx_cp modifiers in KVM arch perfmon emulation v5") Co-developed-by: Ravi Bangoria ravi.bangoria@amd.com Signed-off-by: Ravi Bangoria ravi.bangoria@amd.com Signed-off-by: Like Xu likexu@tencent.com Message-Id: 20220309084257.88931-1-likexu@tencent.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/pmu.c | 15 +++++---------- arch/x86/kvm/vmx/pmu_intel.c | 13 ++++++++++--- 2 files changed, 15 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c index 44a5ab91a99d..62333f9756a3 100644 --- a/arch/x86/kvm/pmu.c +++ b/arch/x86/kvm/pmu.c @@ -96,8 +96,7 @@ static void kvm_perf_overflow_intr(struct perf_event *perf_event,
static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type, u64 config, bool exclude_user, - bool exclude_kernel, bool intr, - bool in_tx, bool in_tx_cp) + bool exclude_kernel, bool intr) { struct perf_event *event; struct perf_event_attr attr = { @@ -113,16 +112,14 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
attr.sample_period = get_sample_period(pmc, pmc->counter);
- if (in_tx) - attr.config |= HSW_IN_TX; - if (in_tx_cp) { + if ((attr.config & HSW_IN_TX_CHECKPOINTED) && + guest_cpuid_is_intel(pmc->vcpu)) { /* * HSW_IN_TX_CHECKPOINTED is not supported with nonzero * period. Just clear the sample period so at least * allocating the counter doesn't fail. */ attr.sample_period = 0; - attr.config |= HSW_IN_TX_CHECKPOINTED; }
event = perf_event_create_kernel_counter(&attr, -1, current, @@ -229,9 +226,7 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel) pmc_reprogram_counter(pmc, type, config, !(eventsel & ARCH_PERFMON_EVENTSEL_USR), !(eventsel & ARCH_PERFMON_EVENTSEL_OS), - eventsel & ARCH_PERFMON_EVENTSEL_INT, - (eventsel & HSW_IN_TX), - (eventsel & HSW_IN_TX_CHECKPOINTED)); + eventsel & ARCH_PERFMON_EVENTSEL_INT); } EXPORT_SYMBOL_GPL(reprogram_gp_counter);
@@ -267,7 +262,7 @@ void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int idx) kvm_x86_ops.pmu_ops->find_fixed_event(idx), !(en_field & 0x2), /* exclude user */ !(en_field & 0x1), /* exclude kernel */ - pmi, false, false); + pmi); } EXPORT_SYMBOL_GPL(reprogram_fixed_counter);
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c index db1b88445acb..7abe77c8b5d0 100644 --- a/arch/x86/kvm/vmx/pmu_intel.c +++ b/arch/x86/kvm/vmx/pmu_intel.c @@ -396,6 +396,7 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) struct kvm_pmc *pmc; u32 msr = msr_info->index; u64 data = msr_info->data; + u64 reserved_bits;
switch (msr) { case MSR_CORE_PERF_FIXED_CTR_CTRL: @@ -451,7 +452,11 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) } else if ((pmc = get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0))) { if (data == pmc->eventsel) return 0; - if (!(data & pmu->reserved_bits)) { + reserved_bits = pmu->reserved_bits; + if ((pmc->idx == 2) && + (pmu->raw_event_mask & HSW_IN_TX_CHECKPOINTED)) + reserved_bits ^= HSW_IN_TX_CHECKPOINTED; + if (!(data & reserved_bits)) { reprogram_gp_counter(pmc, data); return 0; } @@ -525,8 +530,10 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu) entry = kvm_find_cpuid_entry(vcpu, 7, 0); if (entry && (boot_cpu_has(X86_FEATURE_HLE) || boot_cpu_has(X86_FEATURE_RTM)) && - (entry->ebx & (X86_FEATURE_HLE|X86_FEATURE_RTM))) - pmu->reserved_bits ^= HSW_IN_TX|HSW_IN_TX_CHECKPOINTED; + (entry->ebx & (X86_FEATURE_HLE|X86_FEATURE_RTM))) { + pmu->reserved_bits ^= HSW_IN_TX; + pmu->raw_event_mask |= (HSW_IN_TX|HSW_IN_TX_CHECKPOINTED); + }
bitmap_set(pmu->all_valid_pmc_idx, 0, pmu->nr_arch_gp_counters);
From: Hou Wenlong houwenlong.hwl@antgroup.com
[ Upstream commit a836839cbfe60dc434c5476a7429cf2bae36415d ]
When RDTSCP is supported but RDPID is not supported in host, RDPID emulation is available. However, __kvm_get_msr() would only fail when RDTSCP/RDPID both are disabled in guest, so the emulator wouldn't inject a #UD when RDPID is disabled but RDTSCP is enabled in guest.
Fixes: fb6d4d340e05 ("KVM: x86: emulate RDPID") Signed-off-by: Hou Wenlong houwenlong.hwl@antgroup.com Message-Id: 1dfd46ae5b76d3ed87bde3154d51c64ea64c99c1.1646226788.git.houwenlong.hwl@antgroup.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/emulate.c | 4 +++- arch/x86/kvm/kvm_emulate.h | 1 + arch/x86/kvm/x86.c | 6 ++++++ 3 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 4cf0938a876b..3747a754a8e8 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -3514,8 +3514,10 @@ static int em_rdpid(struct x86_emulate_ctxt *ctxt) { u64 tsc_aux = 0;
- if (ctxt->ops->get_msr(ctxt, MSR_TSC_AUX, &tsc_aux)) + if (!ctxt->ops->guest_has_rdpid(ctxt)) return emulate_ud(ctxt); + + ctxt->ops->get_msr(ctxt, MSR_TSC_AUX, &tsc_aux); ctxt->dst.val = tsc_aux; return X86EMUL_CONTINUE; } diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h index 68b420289d7e..fb09cd22cb7f 100644 --- a/arch/x86/kvm/kvm_emulate.h +++ b/arch/x86/kvm/kvm_emulate.h @@ -226,6 +226,7 @@ struct x86_emulate_ops { bool (*guest_has_long_mode)(struct x86_emulate_ctxt *ctxt); bool (*guest_has_movbe)(struct x86_emulate_ctxt *ctxt); bool (*guest_has_fxsr)(struct x86_emulate_ctxt *ctxt); + bool (*guest_has_rdpid)(struct x86_emulate_ctxt *ctxt);
void (*set_nmi_mask)(struct x86_emulate_ctxt *ctxt, bool masked);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 3e606a6940dc..5e2983959f23 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -7393,6 +7393,11 @@ static bool emulator_guest_has_fxsr(struct x86_emulate_ctxt *ctxt) return guest_cpuid_has(emul_to_vcpu(ctxt), X86_FEATURE_FXSR); }
+static bool emulator_guest_has_rdpid(struct x86_emulate_ctxt *ctxt) +{ + return guest_cpuid_has(emul_to_vcpu(ctxt), X86_FEATURE_RDPID); +} + static ulong emulator_read_gpr(struct x86_emulate_ctxt *ctxt, unsigned reg) { return kvm_register_read_raw(emul_to_vcpu(ctxt), reg); @@ -7475,6 +7480,7 @@ static const struct x86_emulate_ops emulate_ops = { .guest_has_long_mode = emulator_guest_has_long_mode, .guest_has_movbe = emulator_guest_has_movbe, .guest_has_fxsr = emulator_guest_has_fxsr, + .guest_has_rdpid = emulator_guest_has_rdpid, .set_nmi_mask = emulator_set_nmi_mask, .get_hflags = emulator_get_hflags, .exiting_smm = emulator_exiting_smm,
From: Anisse Astier anisse@astier.eu
[ Upstream commit 0b464ca3e0dd3cec65f28bc6d396d82f19080f69 ]
Panel is 800x1280, but mounted on a laptop form factor, sideways.
Signed-off-by: Anisse Astier anisse@astier.eu Reviewed-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Jani Nikula jani.nikula@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20211229222200.53128-3-anisse@... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/drm_panel_orientation_quirks.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/drm_panel_orientation_quirks.c b/drivers/gpu/drm/drm_panel_orientation_quirks.c index 448c2f2d803a..f5ab891731d0 100644 --- a/drivers/gpu/drm/drm_panel_orientation_quirks.c +++ b/drivers/gpu/drm/drm_panel_orientation_quirks.c @@ -166,6 +166,12 @@ static const struct dmi_system_id orientation_data[] = { DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "MicroPC"), }, .driver_data = (void *)&lcd720x1280_rightside_up, + }, { /* GPD Win Max */ + .matches = { + DMI_EXACT_MATCH(DMI_SYS_VENDOR, "GPD"), + DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "G1619-01"), + }, + .driver_data = (void *)&lcd800x1280_rightside_up, }, { /* * GPD Pocket, note that the the DMI data is less generic then * it seems, devices with a board-vendor of "AMI Corporation"
From: Zekun Shen bruceshenzk@gmail.com
[ Upstream commit 564d4eceb97eaf381dd6ef6470b06377bb50c95a ]
The bug was found during fuzzing. Stacktrace locates it in ath5k_eeprom_convert_pcal_info_5111. When none of the curve is selected in the loop, idx can go up to AR5K_EEPROM_N_PD_CURVES. The line makes pd out of bound. pd = &chinfo[pier].pd_curves[idx];
There are many OOB writes using pd later in the code. So I added a sanity check for idx. Checks for other loops involving AR5K_EEPROM_N_PD_CURVES are not needed as the loop index is not used outside the loops.
The patch is NOT tested with real device.
The following is the fuzzing report
BUG: KASAN: slab-out-of-bounds in ath5k_eeprom_read_pcal_info_5111+0x126a/0x1390 [ath5k] Write of size 1 at addr ffff8880174a4d60 by task modprobe/214
CPU: 0 PID: 214 Comm: modprobe Not tainted 5.6.0 #1 Call Trace: dump_stack+0x76/0xa0 print_address_description.constprop.0+0x16/0x200 ? ath5k_eeprom_read_pcal_info_5111+0x126a/0x1390 [ath5k] ? ath5k_eeprom_read_pcal_info_5111+0x126a/0x1390 [ath5k] __kasan_report.cold+0x37/0x7c ? ath5k_eeprom_read_pcal_info_5111+0x126a/0x1390 [ath5k] kasan_report+0xe/0x20 ath5k_eeprom_read_pcal_info_5111+0x126a/0x1390 [ath5k] ? apic_timer_interrupt+0xa/0x20 ? ath5k_eeprom_init_11a_pcal_freq+0xbc0/0xbc0 [ath5k] ? ath5k_pci_eeprom_read+0x228/0x3c0 [ath5k] ath5k_eeprom_init+0x2513/0x6290 [ath5k] ? ath5k_eeprom_init_11a_pcal_freq+0xbc0/0xbc0 [ath5k] ? usleep_range+0xb8/0x100 ? apic_timer_interrupt+0xa/0x20 ? ath5k_eeprom_read_pcal_info_2413+0x2f20/0x2f20 [ath5k] ath5k_hw_init+0xb60/0x1970 [ath5k] ath5k_init_ah+0x6fe/0x2530 [ath5k] ? kasprintf+0xa6/0xe0 ? ath5k_stop+0x140/0x140 [ath5k] ? _dev_notice+0xf6/0xf6 ? apic_timer_interrupt+0xa/0x20 ath5k_pci_probe.cold+0x29a/0x3d6 [ath5k] ? ath5k_pci_eeprom_read+0x3c0/0x3c0 [ath5k] ? mutex_lock+0x89/0xd0 ? ath5k_pci_eeprom_read+0x3c0/0x3c0 [ath5k] local_pci_probe+0xd3/0x160 pci_device_probe+0x23f/0x3e0 ? pci_device_remove+0x280/0x280 ? pci_device_remove+0x280/0x280 really_probe+0x209/0x5d0
Reported-by: Brendan Dolan-Gavitt brendandg@nyu.edu Signed-off-by: Zekun Shen bruceshenzk@gmail.com Signed-off-by: Kalle Valo quic_kvalo@quicinc.com Link: https://lore.kernel.org/r/YckvDdj3mtCkDRIt@a-10-27-26-18.dynapool.vpn.nyu.ed... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/ath/ath5k/eeprom.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/net/wireless/ath/ath5k/eeprom.c b/drivers/net/wireless/ath/ath5k/eeprom.c index 1fbc2c19848f..d444b3d70ba2 100644 --- a/drivers/net/wireless/ath/ath5k/eeprom.c +++ b/drivers/net/wireless/ath/ath5k/eeprom.c @@ -746,6 +746,9 @@ ath5k_eeprom_convert_pcal_info_5111(struct ath5k_hw *ah, int mode, } }
+ if (idx == AR5K_EEPROM_N_PD_CURVES) + goto err_out; + ee->ee_pd_gains[mode] = 1;
pd = &chinfo[pier].pd_curves[idx];
From: Dale Zhao dale.zhao@amd.com
[ Upstream commit 047db281c026de5971cedb5bb486aa29bd16a39d ]
[Why] For allow eDP hot-plug feature, the stream signal may change to VIRTUAL when plug-out and back to eDP when plug-in. OS will still setPathMode with same timing for each plugging, but eDP gets no stream update as we don't check signal type changing back as keeping it VIRTUAL. It's also unsafe for future cases that stream signal is switched with same timing.
[How] Check stream signal type change include previous HDMI signal case.
Reviewed-by: Aric Cyr Aric.Cyr@amd.com Acked-by: Wayne Lin wayne.lin@amd.com Signed-off-by: Dale Zhao dale.zhao@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c index 7ae409f7dcf8..108f3854cd2a 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c @@ -1599,6 +1599,9 @@ static bool are_stream_backends_same( if (is_timing_changed(stream_a, stream_b)) return false;
+ if (stream_a->signal != stream_b->signal) + return false; + if (stream_a->dpms_off != stream_b->dpms_off) return false;
From: Xin Xiong xiongx18@fudan.edu.cn
[ Upstream commit dfced44f122c500004a48ecc8db516bb6a295a1b ]
This issue takes place in an error path in amdgpu_cs_fence_to_handle_ioctl(). When `info->in.what` falls into default case, the function simply returns -EINVAL, forgetting to decrement the reference count of a dma_fence obj, which is bumped earlier by amdgpu_cs_get_fence(). This may result in reference count leaks.
Fix it by decreasing the refcount of specific object before returning the error code.
Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Xin Xiong xiongx18@fudan.edu.cn Signed-off-by: Xin Tan tanxin.ctf@gmail.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index 913f9eaa9cd6..aa823f154199 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -1508,6 +1508,7 @@ int amdgpu_cs_fence_to_handle_ioctl(struct drm_device *dev, void *data, return 0;
default: + dma_fence_put(fence); return -EINVAL; } }
From: Yongzhi Liu lyz_cs@pku.edu.cn
[ Upstream commit 5d5c6dba2b43e28845d7d7ed32a36802329a5f52 ]
[why] Resource release is needed on the error handling path to prevent memory leak.
[how] Fix this by adding kfree on the error handling path.
Reviewed-by: Harry Wentland harry.wentland@amd.com Signed-off-by: Yongzhi Liu lyz_cs@pku.edu.cn Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../amd/display/amdgpu_dm/amdgpu_dm_debugfs.c | 80 ++++++++++++++----- 1 file changed, 60 insertions(+), 20 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c index e94ddd5e7b63..5c9f5214bc4e 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c @@ -229,8 +229,10 @@ static ssize_t dp_link_settings_read(struct file *f, char __user *buf, break;
r = put_user(*(rd_buf + result), buf); - if (r) + if (r) { + kfree(rd_buf); return r; /* r = -EFAULT */ + }
buf += 1; size -= 1; @@ -388,8 +390,10 @@ static ssize_t dp_phy_settings_read(struct file *f, char __user *buf, break;
r = put_user((*(rd_buf + result)), buf); - if (r) + if (r) { + kfree(rd_buf); return r; /* r = -EFAULT */ + }
buf += 1; size -= 1; @@ -1316,8 +1320,10 @@ static ssize_t dp_dsc_clock_en_read(struct file *f, char __user *buf, break; }
- if (!pipe_ctx) + if (!pipe_ctx) { + kfree(rd_buf); return -ENXIO; + }
dsc = pipe_ctx->stream_res.dsc; if (dsc) @@ -1333,8 +1339,10 @@ static ssize_t dp_dsc_clock_en_read(struct file *f, char __user *buf, break;
r = put_user(*(rd_buf + result), buf); - if (r) + if (r) { + kfree(rd_buf); return r; /* r = -EFAULT */ + }
buf += 1; size -= 1; @@ -1503,8 +1511,10 @@ static ssize_t dp_dsc_slice_width_read(struct file *f, char __user *buf, break; }
- if (!pipe_ctx) + if (!pipe_ctx) { + kfree(rd_buf); return -ENXIO; + }
dsc = pipe_ctx->stream_res.dsc; if (dsc) @@ -1520,8 +1530,10 @@ static ssize_t dp_dsc_slice_width_read(struct file *f, char __user *buf, break;
r = put_user(*(rd_buf + result), buf); - if (r) + if (r) { + kfree(rd_buf); return r; /* r = -EFAULT */ + }
buf += 1; size -= 1; @@ -1688,8 +1700,10 @@ static ssize_t dp_dsc_slice_height_read(struct file *f, char __user *buf, break; }
- if (!pipe_ctx) + if (!pipe_ctx) { + kfree(rd_buf); return -ENXIO; + }
dsc = pipe_ctx->stream_res.dsc; if (dsc) @@ -1705,8 +1719,10 @@ static ssize_t dp_dsc_slice_height_read(struct file *f, char __user *buf, break;
r = put_user(*(rd_buf + result), buf); - if (r) + if (r) { + kfree(rd_buf); return r; /* r = -EFAULT */ + }
buf += 1; size -= 1; @@ -1869,8 +1885,10 @@ static ssize_t dp_dsc_bits_per_pixel_read(struct file *f, char __user *buf, break; }
- if (!pipe_ctx) + if (!pipe_ctx) { + kfree(rd_buf); return -ENXIO; + }
dsc = pipe_ctx->stream_res.dsc; if (dsc) @@ -1886,8 +1904,10 @@ static ssize_t dp_dsc_bits_per_pixel_read(struct file *f, char __user *buf, break;
r = put_user(*(rd_buf + result), buf); - if (r) + if (r) { + kfree(rd_buf); return r; /* r = -EFAULT */ + }
buf += 1; size -= 1; @@ -2045,8 +2065,10 @@ static ssize_t dp_dsc_pic_width_read(struct file *f, char __user *buf, break; }
- if (!pipe_ctx) + if (!pipe_ctx) { + kfree(rd_buf); return -ENXIO; + }
dsc = pipe_ctx->stream_res.dsc; if (dsc) @@ -2062,8 +2084,10 @@ static ssize_t dp_dsc_pic_width_read(struct file *f, char __user *buf, break;
r = put_user(*(rd_buf + result), buf); - if (r) + if (r) { + kfree(rd_buf); return r; /* r = -EFAULT */ + }
buf += 1; size -= 1; @@ -2102,8 +2126,10 @@ static ssize_t dp_dsc_pic_height_read(struct file *f, char __user *buf, break; }
- if (!pipe_ctx) + if (!pipe_ctx) { + kfree(rd_buf); return -ENXIO; + }
dsc = pipe_ctx->stream_res.dsc; if (dsc) @@ -2119,8 +2145,10 @@ static ssize_t dp_dsc_pic_height_read(struct file *f, char __user *buf, break;
r = put_user(*(rd_buf + result), buf); - if (r) + if (r) { + kfree(rd_buf); return r; /* r = -EFAULT */ + }
buf += 1; size -= 1; @@ -2174,8 +2202,10 @@ static ssize_t dp_dsc_chunk_size_read(struct file *f, char __user *buf, break; }
- if (!pipe_ctx) + if (!pipe_ctx) { + kfree(rd_buf); return -ENXIO; + }
dsc = pipe_ctx->stream_res.dsc; if (dsc) @@ -2191,8 +2221,10 @@ static ssize_t dp_dsc_chunk_size_read(struct file *f, char __user *buf, break;
r = put_user(*(rd_buf + result), buf); - if (r) + if (r) { + kfree(rd_buf); return r; /* r = -EFAULT */ + }
buf += 1; size -= 1; @@ -2246,8 +2278,10 @@ static ssize_t dp_dsc_slice_bpg_offset_read(struct file *f, char __user *buf, break; }
- if (!pipe_ctx) + if (!pipe_ctx) { + kfree(rd_buf); return -ENXIO; + }
dsc = pipe_ctx->stream_res.dsc; if (dsc) @@ -2263,8 +2297,10 @@ static ssize_t dp_dsc_slice_bpg_offset_read(struct file *f, char __user *buf, break;
r = put_user(*(rd_buf + result), buf); - if (r) + if (r) { + kfree(rd_buf); return r; /* r = -EFAULT */ + }
buf += 1; size -= 1; @@ -3254,8 +3290,10 @@ static ssize_t dcc_en_bits_read( dc->hwss.get_dcc_en_bits(dc, dcc_en_bits);
rd_buf = kcalloc(rd_buf_size, sizeof(char), GFP_KERNEL); - if (!rd_buf) + if (!rd_buf) { + kfree(dcc_en_bits); return -ENOMEM; + }
for (i = 0; i < num_pipes; i++) offset += snprintf(rd_buf + offset, rd_buf_size - offset, @@ -3268,8 +3306,10 @@ static ssize_t dcc_en_bits_read( if (*pos >= rd_buf_size) break; r = put_user(*(rd_buf + result), buf); - if (r) + if (r) { + kfree(rd_buf); return r; /* r = -EFAULT */ + } buf += 1; size -= 1; *pos += 1;
From: Nicholas Kazlauskas nicholas.kazlauskas@amd.com
[ Upstream commit b80ddeb29d9df449f875f0b6f5de08d7537c02b8 ]
[Why] If the DPCD caps specifies a PSR version newer than PSR_VERSION_1 then we fallback to using PSR_VERSION_1 in amdgpu_dm_set_psr_caps.
This gets overriden with the raw DPCD value in amdgpu_dm_link_setup_psr, which can result in DMCUB hanging if we pass in an unsupported PSR version number.
[How] Fix the hang by using link->psr_settings.psr_version directly during amdgpu_dm_link_setup_psr.
Tested-by: Daniel Wheeler daniel.wheeler@amd.com Reviewed-by: Anthony Koo Anthony.Koo@amd.com Acked-by: Rodrigo Siqueira Rodrigo.Siqueira@amd.com Signed-off-by: Nicholas Kazlauskas nicholas.kazlauskas@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c index 70a554f1e725..7072fb2ec07f 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c @@ -74,10 +74,8 @@ bool amdgpu_dm_link_setup_psr(struct dc_stream_state *stream)
link = stream->link;
- psr_config.psr_version = link->dpcd_caps.psr_caps.psr_version; - - if (psr_config.psr_version > 0) { - psr_config.psr_exit_link_training_required = 0x1; + if (link->psr_settings.psr_version != DC_PSR_VERSION_UNSUPPORTED) { + psr_config.psr_version = link->psr_settings.psr_version; psr_config.psr_frame_capture_indication_req = 0; psr_config.psr_rfb_setup_time = 0x37; psr_config.psr_sdp_transmit_line_num_deadline = 0x20;
From: Wayne Chang waynec@nvidia.com
[ Upstream commit 62fb61580eb48fc890b7bc9fb5fd263367baeca8 ]
According to the Tegra Technical Reference Manual, SPARAM is a read-only register and should not be programmed in the driver.
The change removes the wrong SPARAM usage.
Signed-off-by: Wayne Chang waynec@nvidia.com Link: https://lore.kernel.org/r/20220107090443.149021-1-waynec@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/gadget/udc/tegra-xudc.c | 8 -------- 1 file changed, 8 deletions(-)
diff --git a/drivers/usb/gadget/udc/tegra-xudc.c b/drivers/usb/gadget/udc/tegra-xudc.c index 43f1b0d461c1..716d9ab2d2ff 100644 --- a/drivers/usb/gadget/udc/tegra-xudc.c +++ b/drivers/usb/gadget/udc/tegra-xudc.c @@ -32,9 +32,6 @@ #include <linux/workqueue.h>
/* XUSB_DEV registers */ -#define SPARAM 0x000 -#define SPARAM_ERSTMAX_MASK GENMASK(20, 16) -#define SPARAM_ERSTMAX(x) (((x) << 16) & SPARAM_ERSTMAX_MASK) #define DB 0x004 #define DB_TARGET_MASK GENMASK(15, 8) #define DB_TARGET(x) (((x) << 8) & DB_TARGET_MASK) @@ -3295,11 +3292,6 @@ static void tegra_xudc_init_event_ring(struct tegra_xudc *xudc) unsigned int i; u32 val;
- val = xudc_readl(xudc, SPARAM); - val &= ~(SPARAM_ERSTMAX_MASK); - val |= SPARAM_ERSTMAX(XUDC_NR_EVENT_RINGS); - xudc_writel(xudc, val, SPARAM); - for (i = 0; i < ARRAY_SIZE(xudc->event_ring); i++) { memset(xudc->event_ring[i], 0, XUDC_EVENT_RING_SIZE * sizeof(*xudc->event_ring[i]));
From: Wayne Chang waynec@nvidia.com
[ Upstream commit 7bd42fb95eb4f98495ccadf467ad15124208ec49 ]
According to the Tegra Technical Reference Manual, the seq_num field of control endpoint is not [31:24] but [31:27]. Bit 24 is reserved and bit 26 is splitxstate.
The change fixes the wrong control endpoint's definitions.
Signed-off-by: Wayne Chang waynec@nvidia.com Link: https://lore.kernel.org/r/20220107091349.149798-1-waynec@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/gadget/udc/tegra-xudc.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/gadget/udc/tegra-xudc.c b/drivers/usb/gadget/udc/tegra-xudc.c index 716d9ab2d2ff..be76f891b9c5 100644 --- a/drivers/usb/gadget/udc/tegra-xudc.c +++ b/drivers/usb/gadget/udc/tegra-xudc.c @@ -272,8 +272,10 @@ BUILD_EP_CONTEXT_RW(deq_hi, deq_hi, 0, 0xffffffff) BUILD_EP_CONTEXT_RW(avg_trb_len, tx_info, 0, 0xffff) BUILD_EP_CONTEXT_RW(max_esit_payload, tx_info, 16, 0xffff) BUILD_EP_CONTEXT_RW(edtla, rsvd[0], 0, 0xffffff) -BUILD_EP_CONTEXT_RW(seq_num, rsvd[0], 24, 0xff) +BUILD_EP_CONTEXT_RW(rsvd, rsvd[0], 24, 0x1) BUILD_EP_CONTEXT_RW(partial_td, rsvd[0], 25, 0x1) +BUILD_EP_CONTEXT_RW(splitxstate, rsvd[0], 26, 0x1) +BUILD_EP_CONTEXT_RW(seq_num, rsvd[0], 27, 0x1f) BUILD_EP_CONTEXT_RW(cerrcnt, rsvd[1], 18, 0x3) BUILD_EP_CONTEXT_RW(data_offset, rsvd[2], 0, 0x1ffff) BUILD_EP_CONTEXT_RW(numtrbs, rsvd[2], 22, 0x1f) @@ -1554,6 +1556,9 @@ static int __tegra_xudc_ep_set_halt(struct tegra_xudc_ep *ep, bool halt) ep_reload(xudc, ep->index);
ep_ctx_write_state(ep->context, EP_STATE_RUNNING); + ep_ctx_write_rsvd(ep->context, 0); + ep_ctx_write_partial_td(ep->context, 0); + ep_ctx_write_splitxstate(ep->context, 0); ep_ctx_write_seq_num(ep->context, 0);
ep_reload(xudc, ep->index); @@ -2809,7 +2814,10 @@ static void tegra_xudc_reset(struct tegra_xudc *xudc) xudc->setup_seq_num = 0; xudc->queued_setup_packet = false;
- ep_ctx_write_seq_num(ep0->context, xudc->setup_seq_num); + ep_ctx_write_rsvd(ep0->context, 0); + ep_ctx_write_partial_td(ep0->context, 0); + ep_ctx_write_splitxstate(ep0->context, 0); + ep_ctx_write_seq_num(ep0->context, 0);
deq_ptr = trb_virt_to_phys(ep0, &ep0->transfer_ring[ep0->deq_ptr]);
From: Pawel Laszczak pawell@cadence.com
[ Upstream commit 03db9289b5ab59437e42a111a34545a7cedb5190 ]
Variable ret in function cdnsp_decode_trb is initialized but not used. To fix this compiler warning patch adds checking whether the data buffer has not been overflowed.
Reported-by: kernel test robot lkp@intel.com Signed-off-by: Pawel Laszczak pawell@cadence.com Link: https://lore.kernel.org/r/20220112053237.14309-1-pawell@gli-login.cadence.co... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/cdns3/cdnsp-debug.h | 305 ++++++++++++++++---------------- 1 file changed, 154 insertions(+), 151 deletions(-)
diff --git a/drivers/usb/cdns3/cdnsp-debug.h b/drivers/usb/cdns3/cdnsp-debug.h index a8776df2d4e0..f0ca865cce2a 100644 --- a/drivers/usb/cdns3/cdnsp-debug.h +++ b/drivers/usb/cdns3/cdnsp-debug.h @@ -182,208 +182,211 @@ static inline const char *cdnsp_decode_trb(char *str, size_t size, u32 field0, int ep_id = TRB_TO_EP_INDEX(field3) - 1; int type = TRB_FIELD_TO_TYPE(field3); unsigned int ep_num; - int ret = 0; + int ret; u32 temp;
ep_num = DIV_ROUND_UP(ep_id, 2);
switch (type) { case TRB_LINK: - ret += snprintf(str, size, - "LINK %08x%08x intr %ld type '%s' flags %c:%c:%c:%c", - field1, field0, GET_INTR_TARGET(field2), - cdnsp_trb_type_string(type), - field3 & TRB_IOC ? 'I' : 'i', - field3 & TRB_CHAIN ? 'C' : 'c', - field3 & TRB_TC ? 'T' : 't', - field3 & TRB_CYCLE ? 'C' : 'c'); + ret = snprintf(str, size, + "LINK %08x%08x intr %ld type '%s' flags %c:%c:%c:%c", + field1, field0, GET_INTR_TARGET(field2), + cdnsp_trb_type_string(type), + field3 & TRB_IOC ? 'I' : 'i', + field3 & TRB_CHAIN ? 'C' : 'c', + field3 & TRB_TC ? 'T' : 't', + field3 & TRB_CYCLE ? 'C' : 'c'); break; case TRB_TRANSFER: case TRB_COMPLETION: case TRB_PORT_STATUS: case TRB_HC_EVENT: - ret += snprintf(str, size, - "ep%d%s(%d) type '%s' TRB %08x%08x status '%s'" - " len %ld slot %ld flags %c:%c", - ep_num, ep_id % 2 ? "out" : "in", - TRB_TO_EP_INDEX(field3), - cdnsp_trb_type_string(type), field1, field0, - cdnsp_trb_comp_code_string(GET_COMP_CODE(field2)), - EVENT_TRB_LEN(field2), TRB_TO_SLOT_ID(field3), - field3 & EVENT_DATA ? 'E' : 'e', - field3 & TRB_CYCLE ? 'C' : 'c'); + ret = snprintf(str, size, + "ep%d%s(%d) type '%s' TRB %08x%08x status '%s'" + " len %ld slot %ld flags %c:%c", + ep_num, ep_id % 2 ? "out" : "in", + TRB_TO_EP_INDEX(field3), + cdnsp_trb_type_string(type), field1, field0, + cdnsp_trb_comp_code_string(GET_COMP_CODE(field2)), + EVENT_TRB_LEN(field2), TRB_TO_SLOT_ID(field3), + field3 & EVENT_DATA ? 'E' : 'e', + field3 & TRB_CYCLE ? 'C' : 'c'); break; case TRB_MFINDEX_WRAP: - ret += snprintf(str, size, "%s: flags %c", - cdnsp_trb_type_string(type), - field3 & TRB_CYCLE ? 'C' : 'c'); + ret = snprintf(str, size, "%s: flags %c", + cdnsp_trb_type_string(type), + field3 & TRB_CYCLE ? 'C' : 'c'); break; case TRB_SETUP: - ret += snprintf(str, size, - "type '%s' bRequestType %02x bRequest %02x " - "wValue %02x%02x wIndex %02x%02x wLength %d " - "length %ld TD size %ld intr %ld Setup ID %ld " - "flags %c:%c:%c", - cdnsp_trb_type_string(type), - field0 & 0xff, - (field0 & 0xff00) >> 8, - (field0 & 0xff000000) >> 24, - (field0 & 0xff0000) >> 16, - (field1 & 0xff00) >> 8, - field1 & 0xff, - (field1 & 0xff000000) >> 16 | - (field1 & 0xff0000) >> 16, - TRB_LEN(field2), GET_TD_SIZE(field2), - GET_INTR_TARGET(field2), - TRB_SETUPID_TO_TYPE(field3), - field3 & TRB_IDT ? 'D' : 'd', - field3 & TRB_IOC ? 'I' : 'i', - field3 & TRB_CYCLE ? 'C' : 'c'); + ret = snprintf(str, size, + "type '%s' bRequestType %02x bRequest %02x " + "wValue %02x%02x wIndex %02x%02x wLength %d " + "length %ld TD size %ld intr %ld Setup ID %ld " + "flags %c:%c:%c", + cdnsp_trb_type_string(type), + field0 & 0xff, + (field0 & 0xff00) >> 8, + (field0 & 0xff000000) >> 24, + (field0 & 0xff0000) >> 16, + (field1 & 0xff00) >> 8, + field1 & 0xff, + (field1 & 0xff000000) >> 16 | + (field1 & 0xff0000) >> 16, + TRB_LEN(field2), GET_TD_SIZE(field2), + GET_INTR_TARGET(field2), + TRB_SETUPID_TO_TYPE(field3), + field3 & TRB_IDT ? 'D' : 'd', + field3 & TRB_IOC ? 'I' : 'i', + field3 & TRB_CYCLE ? 'C' : 'c'); break; case TRB_DATA: - ret += snprintf(str, size, - "type '%s' Buffer %08x%08x length %ld TD size %ld " - "intr %ld flags %c:%c:%c:%c:%c:%c:%c", - cdnsp_trb_type_string(type), - field1, field0, TRB_LEN(field2), - GET_TD_SIZE(field2), - GET_INTR_TARGET(field2), - field3 & TRB_IDT ? 'D' : 'i', - field3 & TRB_IOC ? 'I' : 'i', - field3 & TRB_CHAIN ? 'C' : 'c', - field3 & TRB_NO_SNOOP ? 'S' : 's', - field3 & TRB_ISP ? 'I' : 'i', - field3 & TRB_ENT ? 'E' : 'e', - field3 & TRB_CYCLE ? 'C' : 'c'); + ret = snprintf(str, size, + "type '%s' Buffer %08x%08x length %ld TD size %ld " + "intr %ld flags %c:%c:%c:%c:%c:%c:%c", + cdnsp_trb_type_string(type), + field1, field0, TRB_LEN(field2), + GET_TD_SIZE(field2), + GET_INTR_TARGET(field2), + field3 & TRB_IDT ? 'D' : 'i', + field3 & TRB_IOC ? 'I' : 'i', + field3 & TRB_CHAIN ? 'C' : 'c', + field3 & TRB_NO_SNOOP ? 'S' : 's', + field3 & TRB_ISP ? 'I' : 'i', + field3 & TRB_ENT ? 'E' : 'e', + field3 & TRB_CYCLE ? 'C' : 'c'); break; case TRB_STATUS: - ret += snprintf(str, size, - "Buffer %08x%08x length %ld TD size %ld intr" - "%ld type '%s' flags %c:%c:%c:%c", - field1, field0, TRB_LEN(field2), - GET_TD_SIZE(field2), - GET_INTR_TARGET(field2), - cdnsp_trb_type_string(type), - field3 & TRB_IOC ? 'I' : 'i', - field3 & TRB_CHAIN ? 'C' : 'c', - field3 & TRB_ENT ? 'E' : 'e', - field3 & TRB_CYCLE ? 'C' : 'c'); + ret = snprintf(str, size, + "Buffer %08x%08x length %ld TD size %ld intr" + "%ld type '%s' flags %c:%c:%c:%c", + field1, field0, TRB_LEN(field2), + GET_TD_SIZE(field2), + GET_INTR_TARGET(field2), + cdnsp_trb_type_string(type), + field3 & TRB_IOC ? 'I' : 'i', + field3 & TRB_CHAIN ? 'C' : 'c', + field3 & TRB_ENT ? 'E' : 'e', + field3 & TRB_CYCLE ? 'C' : 'c'); break; case TRB_NORMAL: case TRB_ISOC: case TRB_EVENT_DATA: case TRB_TR_NOOP: - ret += snprintf(str, size, - "type '%s' Buffer %08x%08x length %ld " - "TD size %ld intr %ld " - "flags %c:%c:%c:%c:%c:%c:%c:%c:%c", - cdnsp_trb_type_string(type), - field1, field0, TRB_LEN(field2), - GET_TD_SIZE(field2), - GET_INTR_TARGET(field2), - field3 & TRB_BEI ? 'B' : 'b', - field3 & TRB_IDT ? 'T' : 't', - field3 & TRB_IOC ? 'I' : 'i', - field3 & TRB_CHAIN ? 'C' : 'c', - field3 & TRB_NO_SNOOP ? 'S' : 's', - field3 & TRB_ISP ? 'I' : 'i', - field3 & TRB_ENT ? 'E' : 'e', - field3 & TRB_CYCLE ? 'C' : 'c', - !(field3 & TRB_EVENT_INVALIDATE) ? 'V' : 'v'); + ret = snprintf(str, size, + "type '%s' Buffer %08x%08x length %ld " + "TD size %ld intr %ld " + "flags %c:%c:%c:%c:%c:%c:%c:%c:%c", + cdnsp_trb_type_string(type), + field1, field0, TRB_LEN(field2), + GET_TD_SIZE(field2), + GET_INTR_TARGET(field2), + field3 & TRB_BEI ? 'B' : 'b', + field3 & TRB_IDT ? 'T' : 't', + field3 & TRB_IOC ? 'I' : 'i', + field3 & TRB_CHAIN ? 'C' : 'c', + field3 & TRB_NO_SNOOP ? 'S' : 's', + field3 & TRB_ISP ? 'I' : 'i', + field3 & TRB_ENT ? 'E' : 'e', + field3 & TRB_CYCLE ? 'C' : 'c', + !(field3 & TRB_EVENT_INVALIDATE) ? 'V' : 'v'); break; case TRB_CMD_NOOP: case TRB_ENABLE_SLOT: - ret += snprintf(str, size, "%s: flags %c", - cdnsp_trb_type_string(type), - field3 & TRB_CYCLE ? 'C' : 'c'); + ret = snprintf(str, size, "%s: flags %c", + cdnsp_trb_type_string(type), + field3 & TRB_CYCLE ? 'C' : 'c'); break; case TRB_DISABLE_SLOT: - ret += snprintf(str, size, "%s: slot %ld flags %c", - cdnsp_trb_type_string(type), - TRB_TO_SLOT_ID(field3), - field3 & TRB_CYCLE ? 'C' : 'c'); + ret = snprintf(str, size, "%s: slot %ld flags %c", + cdnsp_trb_type_string(type), + TRB_TO_SLOT_ID(field3), + field3 & TRB_CYCLE ? 'C' : 'c'); break; case TRB_ADDR_DEV: - ret += snprintf(str, size, - "%s: ctx %08x%08x slot %ld flags %c:%c", - cdnsp_trb_type_string(type), field1, field0, - TRB_TO_SLOT_ID(field3), - field3 & TRB_BSR ? 'B' : 'b', - field3 & TRB_CYCLE ? 'C' : 'c'); + ret = snprintf(str, size, + "%s: ctx %08x%08x slot %ld flags %c:%c", + cdnsp_trb_type_string(type), field1, field0, + TRB_TO_SLOT_ID(field3), + field3 & TRB_BSR ? 'B' : 'b', + field3 & TRB_CYCLE ? 'C' : 'c'); break; case TRB_CONFIG_EP: - ret += snprintf(str, size, - "%s: ctx %08x%08x slot %ld flags %c:%c", - cdnsp_trb_type_string(type), field1, field0, - TRB_TO_SLOT_ID(field3), - field3 & TRB_DC ? 'D' : 'd', - field3 & TRB_CYCLE ? 'C' : 'c'); + ret = snprintf(str, size, + "%s: ctx %08x%08x slot %ld flags %c:%c", + cdnsp_trb_type_string(type), field1, field0, + TRB_TO_SLOT_ID(field3), + field3 & TRB_DC ? 'D' : 'd', + field3 & TRB_CYCLE ? 'C' : 'c'); break; case TRB_EVAL_CONTEXT: - ret += snprintf(str, size, - "%s: ctx %08x%08x slot %ld flags %c", - cdnsp_trb_type_string(type), field1, field0, - TRB_TO_SLOT_ID(field3), - field3 & TRB_CYCLE ? 'C' : 'c'); + ret = snprintf(str, size, + "%s: ctx %08x%08x slot %ld flags %c", + cdnsp_trb_type_string(type), field1, field0, + TRB_TO_SLOT_ID(field3), + field3 & TRB_CYCLE ? 'C' : 'c'); break; case TRB_RESET_EP: case TRB_HALT_ENDPOINT: case TRB_FLUSH_ENDPOINT: - ret += snprintf(str, size, - "%s: ep%d%s(%d) ctx %08x%08x slot %ld flags %c", - cdnsp_trb_type_string(type), - ep_num, ep_id % 2 ? "out" : "in", - TRB_TO_EP_INDEX(field3), field1, field0, - TRB_TO_SLOT_ID(field3), - field3 & TRB_CYCLE ? 'C' : 'c'); + ret = snprintf(str, size, + "%s: ep%d%s(%d) ctx %08x%08x slot %ld flags %c", + cdnsp_trb_type_string(type), + ep_num, ep_id % 2 ? "out" : "in", + TRB_TO_EP_INDEX(field3), field1, field0, + TRB_TO_SLOT_ID(field3), + field3 & TRB_CYCLE ? 'C' : 'c'); break; case TRB_STOP_RING: - ret += snprintf(str, size, - "%s: ep%d%s(%d) slot %ld sp %d flags %c", - cdnsp_trb_type_string(type), - ep_num, ep_id % 2 ? "out" : "in", - TRB_TO_EP_INDEX(field3), - TRB_TO_SLOT_ID(field3), - TRB_TO_SUSPEND_PORT(field3), - field3 & TRB_CYCLE ? 'C' : 'c'); + ret = snprintf(str, size, + "%s: ep%d%s(%d) slot %ld sp %d flags %c", + cdnsp_trb_type_string(type), + ep_num, ep_id % 2 ? "out" : "in", + TRB_TO_EP_INDEX(field3), + TRB_TO_SLOT_ID(field3), + TRB_TO_SUSPEND_PORT(field3), + field3 & TRB_CYCLE ? 'C' : 'c'); break; case TRB_SET_DEQ: - ret += snprintf(str, size, - "%s: ep%d%s(%d) deq %08x%08x stream %ld slot %ld flags %c", - cdnsp_trb_type_string(type), - ep_num, ep_id % 2 ? "out" : "in", - TRB_TO_EP_INDEX(field3), field1, field0, - TRB_TO_STREAM_ID(field2), - TRB_TO_SLOT_ID(field3), - field3 & TRB_CYCLE ? 'C' : 'c'); + ret = snprintf(str, size, + "%s: ep%d%s(%d) deq %08x%08x stream %ld slot %ld flags %c", + cdnsp_trb_type_string(type), + ep_num, ep_id % 2 ? "out" : "in", + TRB_TO_EP_INDEX(field3), field1, field0, + TRB_TO_STREAM_ID(field2), + TRB_TO_SLOT_ID(field3), + field3 & TRB_CYCLE ? 'C' : 'c'); break; case TRB_RESET_DEV: - ret += snprintf(str, size, "%s: slot %ld flags %c", - cdnsp_trb_type_string(type), - TRB_TO_SLOT_ID(field3), - field3 & TRB_CYCLE ? 'C' : 'c'); + ret = snprintf(str, size, "%s: slot %ld flags %c", + cdnsp_trb_type_string(type), + TRB_TO_SLOT_ID(field3), + field3 & TRB_CYCLE ? 'C' : 'c'); break; case TRB_ENDPOINT_NRDY: - temp = TRB_TO_HOST_STREAM(field2); - - ret += snprintf(str, size, - "%s: ep%d%s(%d) H_SID %x%s%s D_SID %lx flags %c:%c", - cdnsp_trb_type_string(type), - ep_num, ep_id % 2 ? "out" : "in", - TRB_TO_EP_INDEX(field3), temp, - temp == STREAM_PRIME_ACK ? "(PRIME)" : "", - temp == STREAM_REJECTED ? "(REJECTED)" : "", - TRB_TO_DEV_STREAM(field0), - field3 & TRB_STAT ? 'S' : 's', - field3 & TRB_CYCLE ? 'C' : 'c'); + temp = TRB_TO_HOST_STREAM(field2); + + ret = snprintf(str, size, + "%s: ep%d%s(%d) H_SID %x%s%s D_SID %lx flags %c:%c", + cdnsp_trb_type_string(type), + ep_num, ep_id % 2 ? "out" : "in", + TRB_TO_EP_INDEX(field3), temp, + temp == STREAM_PRIME_ACK ? "(PRIME)" : "", + temp == STREAM_REJECTED ? "(REJECTED)" : "", + TRB_TO_DEV_STREAM(field0), + field3 & TRB_STAT ? 'S' : 's', + field3 & TRB_CYCLE ? 'C' : 'c'); break; default: - ret += snprintf(str, size, - "type '%s' -> raw %08x %08x %08x %08x", - cdnsp_trb_type_string(type), - field0, field1, field2, field3); + ret = snprintf(str, size, + "type '%s' -> raw %08x %08x %08x %08x", + cdnsp_trb_type_string(type), + field0, field1, field2, field3); }
+ if (ret >= size) + pr_info("CDNSP: buffer overflowed.\n"); + return str; }
From: Yang Guang yang.guang5@zte.com.cn
[ Upstream commit e2cf07654efb0fd7bbcb475c6f74be7b5755a8fd ]
coccinelle report: ./drivers/ptp/ptp_sysfs.c:17:8-16: WARNING: use scnprintf or sprintf ./drivers/ptp/ptp_sysfs.c:390:8-16: WARNING: use scnprintf or sprintf
Use sysfs_emit instead of scnprintf or sprintf makes more sense.
Reported-by: Zeal Robot zealci@zte.com.cn Signed-off-by: Yang Guang yang.guang5@zte.com.cn Signed-off-by: David Yang davidcomponentone@gmail.com Acked-by: Richard Cochran richardcochran@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/ptp/ptp_sysfs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/ptp/ptp_sysfs.c b/drivers/ptp/ptp_sysfs.c index 41b92dc2f011..9233bfedeb17 100644 --- a/drivers/ptp/ptp_sysfs.c +++ b/drivers/ptp/ptp_sysfs.c @@ -14,7 +14,7 @@ static ssize_t clock_name_show(struct device *dev, struct device_attribute *attr, char *page) { struct ptp_clock *ptp = dev_get_drvdata(dev); - return snprintf(page, PAGE_SIZE-1, "%s\n", ptp->info->name); + return sysfs_emit(page, "%s\n", ptp->info->name); } static DEVICE_ATTR_RO(clock_name);
@@ -387,7 +387,7 @@ static ssize_t ptp_pin_show(struct device *dev, struct device_attribute *attr,
mutex_unlock(&ptp->pincfg_mux);
- return snprintf(page, PAGE_SIZE, "%u %u\n", func, chan); + return sysfs_emit(page, "%u %u\n", func, chan); }
static ssize_t ptp_pin_store(struct device *dev, struct device_attribute *attr,
From: Philip Yang Philip.Yang@amd.com
[ Upstream commit ac7c48c0cce00d03b3c95fddcccb0a45257e33e3 ]
SVM ioctls take proper svms->lock to handle race conditions, don't need take process mutex to serialize ioctls. This also fixes circular locking warning:
WARNING: possible circular locking dependency detected
Possible unsafe locking scenario:
CPU0 CPU1 ---- ---- lock((work_completion)(&svms->deferred_list_work)); lock(&process->mutex); lock((work_completion)(&svms->deferred_list_work)); lock(&process->mutex);
*** DEADLOCK ***
Signed-off-by: Philip Yang Philip.Yang@amd.com Reviewed-by: Felix Kuehling Felix.Kuehling@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index 86afd37b098d..6688129df240 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c @@ -1807,13 +1807,9 @@ static int kfd_ioctl_svm(struct file *filep, struct kfd_process *p, void *data) if (!args->start_addr || !args->size) return -EINVAL;
- mutex_lock(&p->mutex); - r = svm_ioctl(p, args->op, args->start_addr, args->size, args->nattr, args->attrs);
- mutex_unlock(&p->mutex); - return r; } #else
From: Maxim Kiselev bigunclemax@gmail.com
[ Upstream commit 17846485dff91acce1ad47b508b633dffc32e838 ]
T1040RDB has two RTL8211E-VB phys which requires setting of internal delays for correct work.
Changing the phy-connection-type property to `rgmii-id` will fix this issue.
Signed-off-by: Maxim Kiselev bigunclemax@gmail.com Reviewed-by: Maxim Kochetkov fido_max@inbox.ru Reviewed-by: Vladimir Oltean vladimir.oltean@nxp.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20211230151123.1258321-1-bigunclemax@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/boot/dts/fsl/t104xrdb.dtsi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/boot/dts/fsl/t104xrdb.dtsi b/arch/powerpc/boot/dts/fsl/t104xrdb.dtsi index 099a598c74c0..bfe1ed5be337 100644 --- a/arch/powerpc/boot/dts/fsl/t104xrdb.dtsi +++ b/arch/powerpc/boot/dts/fsl/t104xrdb.dtsi @@ -139,12 +139,12 @@ fman@400000 { ethernet@e6000 { phy-handle = <&phy_rgmii_0>; - phy-connection-type = "rgmii"; + phy-connection-type = "rgmii-id"; };
ethernet@e8000 { phy-handle = <&phy_rgmii_1>; - phy-connection-type = "rgmii"; + phy-connection-type = "rgmii-id"; };
mdio0: mdio@fc000 {
From: Venkateswara Naralasetty quic_vnaralas@quicinc.com
[ Upstream commit 22b59cb965f79ee1accf83172441c9ca0ecb632a ]
Call netif_napi_del() from ath11k_ahb_free_ext_irq() to fix the following kernel panic when unload/load ath11k modules for few iterations.
[ 971.201365] Unable to handle kernel paging request at virtual address 6d97a208 [ 971.204227] pgd = 594c2919 [ 971.211478] [6d97a208] *pgd=00000000 [ 971.214120] Internal error: Oops: 5 [#1] PREEMPT SMP ARM [ 971.412024] CPU: 2 PID: 4435 Comm: insmod Not tainted 5.4.89 #0 [ 971.434256] Hardware name: Generic DT based system [ 971.440165] PC is at napi_by_id+0x10/0x40 [ 971.445019] LR is at netif_napi_add+0x160/0x1dc
[ 971.743127] (napi_by_id) from [<807d89a0>] (netif_napi_add+0x160/0x1dc) [ 971.751295] (netif_napi_add) from [<7f1209ac>] (ath11k_ahb_config_irq+0xf8/0x414 [ath11k_ahb]) [ 971.759164] (ath11k_ahb_config_irq [ath11k_ahb]) from [<7f12135c>] (ath11k_ahb_probe+0x40c/0x51c [ath11k_ahb]) [ 971.768567] (ath11k_ahb_probe [ath11k_ahb]) from [<80666864>] (platform_drv_probe+0x48/0x94) [ 971.779670] (platform_drv_probe) from [<80664718>] (really_probe+0x1c8/0x450) [ 971.789389] (really_probe) from [<80664cc4>] (driver_probe_device+0x15c/0x1b8) [ 971.797547] (driver_probe_device) from [<80664f60>] (device_driver_attach+0x44/0x60) [ 971.805795] (device_driver_attach) from [<806650a0>] (__driver_attach+0x124/0x140) [ 971.814822] (__driver_attach) from [<80662adc>] (bus_for_each_dev+0x58/0xa4) [ 971.823328] (bus_for_each_dev) from [<80663a2c>] (bus_add_driver+0xf0/0x1e8) [ 971.831662] (bus_add_driver) from [<806658a4>] (driver_register+0xa8/0xf0) [ 971.839822] (driver_register) from [<8030269c>] (do_one_initcall+0x78/0x1ac) [ 971.847638] (do_one_initcall) from [<80392524>] (do_init_module+0x54/0x200) [ 971.855968] (do_init_module) from [<803945b0>] (load_module+0x1e30/0x1ffc) [ 971.864126] (load_module) from [<803948b0>] (sys_init_module+0x134/0x17c) [ 971.871852] (sys_init_module) from [<80301000>] (ret_fast_syscall+0x0/0x50)
Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.6.0.1-00760-QCAHKSWPL_SILICONZ-1
Signed-off-by: Venkateswara Naralasetty quic_vnaralas@quicinc.com Signed-off-by: Kalle Valo quic_kvalo@quicinc.com Link: https://lore.kernel.org/r/1642583973-21599-1-git-send-email-quic_vnaralas@qu... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/ath/ath11k/ahb.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/wireless/ath/ath11k/ahb.c b/drivers/net/wireless/ath/ath11k/ahb.c index 3fb0aa000825..24bd0520926b 100644 --- a/drivers/net/wireless/ath/ath11k/ahb.c +++ b/drivers/net/wireless/ath/ath11k/ahb.c @@ -391,6 +391,8 @@ static void ath11k_ahb_free_ext_irq(struct ath11k_base *ab)
for (j = 0; j < irq_grp->num_irq; j++) free_irq(ab->irq_num[irq_grp->irqs[j]], irq_grp); + + netif_napi_del(&irq_grp->napi); } }
From: Kalle Valo quic_kvalo@quicinc.com
[ Upstream commit b4f4c56459a5c744f7f066b9fc2b54ea995030c5 ]
Mario reported that the kernel was crashing on suspend if ath11k was not able to find a board file:
[ 473.693286] PM: Suspending system (s2idle) [ 473.693291] printk: Suspending console(s) (use no_console_suspend to debug) [ 474.407787] BUG: unable to handle page fault for address: 0000000000002070 [ 474.407791] #PF: supervisor read access in kernel mode [ 474.407794] #PF: error_code(0x0000) - not-present page [ 474.407798] PGD 0 P4D 0 [ 474.407801] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 474.407805] CPU: 2 PID: 2350 Comm: kworker/u32:14 Tainted: G W 5.16.0 #248 [...] [ 474.407868] Call Trace: [ 474.407870] <TASK> [ 474.407874] ? _raw_spin_lock_irqsave+0x2a/0x60 [ 474.407882] ? lock_timer_base+0x72/0xa0 [ 474.407889] ? _raw_spin_unlock_irqrestore+0x29/0x3d [ 474.407892] ? try_to_del_timer_sync+0x54/0x80 [ 474.407896] ath11k_dp_rx_pktlog_stop+0x49/0xc0 [ath11k] [ 474.407912] ath11k_core_suspend+0x34/0x130 [ath11k] [ 474.407923] ath11k_pci_pm_suspend+0x1b/0x50 [ath11k_pci] [ 474.407928] pci_pm_suspend+0x7e/0x170 [ 474.407935] ? pci_pm_freeze+0xc0/0xc0 [ 474.407939] dpm_run_callback+0x4e/0x150 [ 474.407947] __device_suspend+0x148/0x4c0 [ 474.407951] async_suspend+0x20/0x90 dmesg-efi-164255130401001: Oops#1 Part1 [ 474.407955] async_run_entry_fn+0x33/0x120 [ 474.407959] process_one_work+0x220/0x3f0 [ 474.407966] worker_thread+0x4a/0x3d0 [ 474.407971] kthread+0x17a/0x1a0 [ 474.407975] ? process_one_work+0x3f0/0x3f0 [ 474.407979] ? set_kthread_struct+0x40/0x40 [ 474.407983] ret_from_fork+0x22/0x30 [ 474.407991] </TASK>
The issue here is that board file loading happens after ath11k_pci_probe() succesfully returns (ath11k initialisation happends asynchronously) and the suspend handler is still enabled, of course failing as ath11k is not properly initialised. Fix this by checking ATH11K_FLAG_QMI_FAIL during both suspend and resume.
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03003-QCAHSPSWPL_V1_V2_SILICONZ_LITE-2
Reported-by: Mario Limonciello mario.limonciello@amd.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=215504 Signed-off-by: Kalle Valo quic_kvalo@quicinc.com Link: https://lore.kernel.org/r/20220127090117.2024-1-kvalo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/ath/ath11k/pci.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/drivers/net/wireless/ath/ath11k/pci.c b/drivers/net/wireless/ath/ath11k/pci.c index 54ce08f1c6e0..353a2d669fcd 100644 --- a/drivers/net/wireless/ath/ath11k/pci.c +++ b/drivers/net/wireless/ath/ath11k/pci.c @@ -1382,6 +1382,11 @@ static __maybe_unused int ath11k_pci_pm_suspend(struct device *dev) struct ath11k_base *ab = dev_get_drvdata(dev); int ret;
+ if (test_bit(ATH11K_FLAG_QMI_FAIL, &ab->dev_flags)) { + ath11k_dbg(ab, ATH11K_DBG_BOOT, "boot skipping pci suspend as qmi is not initialised\n"); + return 0; + } + ret = ath11k_core_suspend(ab); if (ret) ath11k_warn(ab, "failed to suspend core: %d\n", ret); @@ -1394,6 +1399,11 @@ static __maybe_unused int ath11k_pci_pm_resume(struct device *dev) struct ath11k_base *ab = dev_get_drvdata(dev); int ret;
+ if (test_bit(ATH11K_FLAG_QMI_FAIL, &ab->dev_flags)) { + ath11k_dbg(ab, ATH11K_DBG_BOOT, "boot skipping pci resume as qmi is not initialised\n"); + return 0; + } + ret = ath11k_core_resume(ab); if (ret) ath11k_warn(ab, "failed to resume core: %d\n", ret);
From: Kalle Valo quic_kvalo@quicinc.com
[ Upstream commit 3df6d74aedfdca919cca475d15dfdbc8b05c9e5d ]
If amss.bin was missing ath11k would crash during 'rmmod ath11k_pci'. The reason for that was that we were using mhi_async_power_up() which does not check any errors. But mhi_sync_power_up() on the other hand does check for errors so let's use that to fix the crash.
I was not able to find a reason why an async version was used. ath11k_mhi_start() (which enables state ATH11K_MHI_POWER_ON) is called from ath11k_hif_power_up(), which can sleep. So sync version should be safe to use here.
[ 145.569731] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN PTI [ 145.569789] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] [ 145.569843] CPU: 2 PID: 1628 Comm: rmmod Kdump: loaded Tainted: G W 5.16.0-wt-ath+ #567 [ 145.569898] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0067.2021.0528.1339 05/28/2021 [ 145.569956] RIP: 0010:ath11k_hal_srng_access_begin+0xb5/0x2b0 [ath11k] [ 145.570028] Code: df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 ec 01 00 00 48 8b ab a8 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 ea 48 c1 ea 03 <0f> b6 14 02 48 89 e8 83 e0 07 83 c0 03 45 85 ed 75 48 38 d0 7c 08 [ 145.570089] RSP: 0018:ffffc900025d7ac0 EFLAGS: 00010246 [ 145.570144] RAX: dffffc0000000000 RBX: ffff88814fca2dd8 RCX: 1ffffffff50cb455 [ 145.570196] RDX: 0000000000000000 RSI: ffff88814fca2dd8 RDI: ffff88814fca2e80 [ 145.570252] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffffa8659497 [ 145.570329] R10: fffffbfff50cb292 R11: 0000000000000001 R12: ffff88814fca0000 [ 145.570410] R13: 0000000000000000 R14: ffff88814fca2798 R15: ffff88814fca2dd8 [ 145.570465] FS: 00007fa399988540(0000) GS:ffff888233e00000(0000) knlGS:0000000000000000 [ 145.570519] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 145.570571] CR2: 00007fa399b51421 CR3: 0000000137898002 CR4: 00000000003706e0 [ 145.570623] Call Trace: [ 145.570675] <TASK> [ 145.570727] ? ath11k_ce_tx_process_cb+0x34b/0x860 [ath11k] [ 145.570797] ath11k_ce_tx_process_cb+0x356/0x860 [ath11k] [ 145.570864] ? tasklet_init+0x150/0x150 [ 145.570919] ? ath11k_ce_alloc_pipes+0x280/0x280 [ath11k] [ 145.570986] ? tasklet_clear_sched+0x42/0xe0 [ 145.571042] ? tasklet_kill+0xe9/0x1b0 [ 145.571095] ? tasklet_clear_sched+0xe0/0xe0 [ 145.571148] ? irq_has_action+0x120/0x120 [ 145.571202] ath11k_ce_cleanup_pipes+0x45a/0x580 [ath11k] [ 145.571270] ? ath11k_pci_stop+0x10e/0x170 [ath11k_pci] [ 145.571345] ath11k_core_stop+0x8a/0xc0 [ath11k] [ 145.571434] ath11k_core_deinit+0x9e/0x150 [ath11k] [ 145.571499] ath11k_pci_remove+0xd2/0x260 [ath11k_pci] [ 145.571553] pci_device_remove+0x9a/0x1c0 [ 145.571605] __device_release_driver+0x332/0x660 [ 145.571659] driver_detach+0x1e7/0x2c0 [ 145.571712] bus_remove_driver+0xe2/0x2d0 [ 145.571772] pci_unregister_driver+0x21/0x250 [ 145.571826] __do_sys_delete_module+0x30a/0x4b0 [ 145.571879] ? free_module+0xac0/0xac0 [ 145.571933] ? lockdep_hardirqs_on_prepare.part.0+0x18c/0x370 [ 145.571986] ? syscall_enter_from_user_mode+0x1d/0x50 [ 145.572039] ? lockdep_hardirqs_on+0x79/0x100 [ 145.572097] do_syscall_64+0x3b/0x90 [ 145.572153] entry_SYSCALL_64_after_hwframe+0x44/0xae
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03003-QCAHSPSWPL_V1_V2_SILICONZ_LITE-2
Signed-off-by: Kalle Valo quic_kvalo@quicinc.com Link: https://lore.kernel.org/r/20220127090117.2024-2-kvalo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/ath/ath11k/mhi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/ath/ath11k/mhi.c b/drivers/net/wireless/ath/ath11k/mhi.c index 49c0b1ad40a0..f2149241fb13 100644 --- a/drivers/net/wireless/ath/ath11k/mhi.c +++ b/drivers/net/wireless/ath/ath11k/mhi.c @@ -519,7 +519,7 @@ static int ath11k_mhi_set_state(struct ath11k_pci *ab_pci, ret = 0; break; case ATH11K_MHI_POWER_ON: - ret = mhi_async_power_up(ab_pci->mhi_ctrl); + ret = mhi_sync_power_up(ab_pci->mhi_ctrl); break; case ATH11K_MHI_POWER_OFF: mhi_power_down(ab_pci->mhi_ctrl, true);
From: Tony Lu tonylu@linux.alibaba.com
[ Upstream commit ea785a1a573b390a150010b3c5b81e1ccd8c98a8 ]
According to the man page of TCP_CORK [1], if set, don't send out partial frames. All queued partial frames are sent when option is cleared again.
When applications call setsockopt to disable TCP_CORK, this call is protected by lock_sock(), and tries to mod_delayed_work() to 0, in order to send pending data right now. However, the delayed work smc_tx_work is also protected by lock_sock(). There introduces lock contention for sending data.
To fix it, send pending data directly which acts like TCP, without lock_sock() protected in the context of setsockopt (already lock_sock()ed), and cancel unnecessary dealyed work, which is protected by lock.
[1] https://linux.die.net/man/7/tcp
Signed-off-by: Tony Lu tonylu@linux.alibaba.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/smc/af_smc.c | 4 ++-- net/smc/smc_tx.c | 25 +++++++++++++++---------- net/smc/smc_tx.h | 1 + 3 files changed, 18 insertions(+), 12 deletions(-)
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 5c4c0320e822..183b122807b6 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -2430,8 +2430,8 @@ static int smc_setsockopt(struct socket *sock, int level, int optname, sk->sk_state != SMC_CLOSED) { if (!val) { SMC_STAT_INC(smc, cork_cnt); - mod_delayed_work(smc->conn.lgr->tx_wq, - &smc->conn.tx_work, 0); + smc_tx_pending(&smc->conn); + cancel_delayed_work(&smc->conn.tx_work); } } break; diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c index 738a4a99c827..31ee76131a79 100644 --- a/net/smc/smc_tx.c +++ b/net/smc/smc_tx.c @@ -594,27 +594,32 @@ int smc_tx_sndbuf_nonempty(struct smc_connection *conn) return rc; }
-/* Wakeup sndbuf consumers from process context - * since there is more data to transmit - */ -void smc_tx_work(struct work_struct *work) +void smc_tx_pending(struct smc_connection *conn) { - struct smc_connection *conn = container_of(to_delayed_work(work), - struct smc_connection, - tx_work); struct smc_sock *smc = container_of(conn, struct smc_sock, conn); int rc;
- lock_sock(&smc->sk); if (smc->sk.sk_err) - goto out; + return;
rc = smc_tx_sndbuf_nonempty(conn); if (!rc && conn->local_rx_ctrl.prod_flags.write_blocked && !atomic_read(&conn->bytes_to_rcv)) conn->local_rx_ctrl.prod_flags.write_blocked = 0; +} + +/* Wakeup sndbuf consumers from process context + * since there is more data to transmit + */ +void smc_tx_work(struct work_struct *work) +{ + struct smc_connection *conn = container_of(to_delayed_work(work), + struct smc_connection, + tx_work); + struct smc_sock *smc = container_of(conn, struct smc_sock, conn);
-out: + lock_sock(&smc->sk); + smc_tx_pending(conn); release_sock(&smc->sk); }
diff --git a/net/smc/smc_tx.h b/net/smc/smc_tx.h index 07e6ad76224a..a59f370b8b43 100644 --- a/net/smc/smc_tx.h +++ b/net/smc/smc_tx.h @@ -27,6 +27,7 @@ static inline int smc_tx_prepared_sends(struct smc_connection *conn) return smc_curs_diff(conn->sndbuf_desc->len, &sent, &prep); }
+void smc_tx_pending(struct smc_connection *conn); void smc_tx_work(struct work_struct *work); void smc_tx_init(struct smc_sock *smc); int smc_tx_sendmsg(struct smc_sock *smc, struct msghdr *msg, size_t len);
From: Yongzhi Liu lyz_cs@pku.edu.cn
[ Upstream commit 46f47807738441e354873546dde0b000106c068a ]
pm_runtime_get_sync() will increase the rumtime PM counter even when it returns an error. Thus a pairing decrement is needed to prevent refcount leak. Fix this by replacing this API with pm_runtime_resume_and_get(), which will not change the runtime PM counter on error. Besides, a matching decrement is needed on the error handling path to keep the counter balanced.
Signed-off-by: Yongzhi Liu lyz_cs@pku.edu.cn Reviewed-by: Laurent Pinchart laurent.pinchart@ideasonboard.com Signed-off-by: Robert Foss robert.foss@linaro.org Link: https://patchwork.freedesktop.org/patch/msgid/1643008835-73961-1-git-send-em... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/bridge/nwl-dsi.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/bridge/nwl-dsi.c b/drivers/gpu/drm/bridge/nwl-dsi.c index 6e484d836cfe..691039aba87f 100644 --- a/drivers/gpu/drm/bridge/nwl-dsi.c +++ b/drivers/gpu/drm/bridge/nwl-dsi.c @@ -861,18 +861,19 @@ nwl_dsi_bridge_mode_set(struct drm_bridge *bridge, memcpy(&dsi->mode, adjusted_mode, sizeof(dsi->mode)); drm_mode_debug_printmodeline(adjusted_mode);
- pm_runtime_get_sync(dev); + if (pm_runtime_resume_and_get(dev) < 0) + return;
if (clk_prepare_enable(dsi->lcdif_clk) < 0) - return; + goto runtime_put; if (clk_prepare_enable(dsi->core_clk) < 0) - return; + goto runtime_put;
/* Step 1 from DSI reset-out instructions */ ret = reset_control_deassert(dsi->rst_pclk); if (ret < 0) { DRM_DEV_ERROR(dev, "Failed to deassert PCLK: %d\n", ret); - return; + goto runtime_put; }
/* Step 2 from DSI reset-out instructions */ @@ -882,13 +883,18 @@ nwl_dsi_bridge_mode_set(struct drm_bridge *bridge, ret = reset_control_deassert(dsi->rst_esc); if (ret < 0) { DRM_DEV_ERROR(dev, "Failed to deassert ESC: %d\n", ret); - return; + goto runtime_put; } ret = reset_control_deassert(dsi->rst_byte); if (ret < 0) { DRM_DEV_ERROR(dev, "Failed to deassert BYTE: %d\n", ret); - return; + goto runtime_put; } + + return; + +runtime_put: + pm_runtime_put_sync(dev); }
static void
From: Jakub Sitnicki jakub@cloudflare.com
[ Upstream commit 4421a582718ab81608d8486734c18083b822390d ]
Menglong Dong reports that the documentation for the dst_port field in struct bpf_sock is inaccurate and confusing. From the BPF program PoV, the field is a zero-padded 16-bit integer in network byte order. The value appears to the BPF user as if laid out in memory as so:
offsetof(struct bpf_sock, dst_port) + 0 <port MSB> + 8 <port LSB> +16 0x00 +24 0x00
32-, 16-, and 8-bit wide loads from the field are all allowed, but only if the offset into the field is 0.
32-bit wide loads from dst_port are especially confusing. The loaded value, after converting to host byte order with bpf_ntohl(dst_port), contains the port number in the upper 16-bits.
Remove the confusion by splitting the field into two 16-bit fields. For backward compatibility, allow 32-bit wide loads from offsetof(struct bpf_sock, dst_port).
While at it, allow loads 8-bit loads at offset [0] and [1] from dst_port.
Reported-by: Menglong Dong imagedong@tencent.com Signed-off-by: Jakub Sitnicki jakub@cloudflare.com Link: https://lore.kernel.org/r/20220130115518.213259-2-jakub@cloudflare.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/uapi/linux/bpf.h | 3 ++- net/core/filter.c | 10 +++++++++- 2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index e3fb5e520511..2136e45656ab 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -5347,7 +5347,8 @@ struct bpf_sock { __u32 src_ip4; __u32 src_ip6[4]; __u32 src_port; /* host byte order */ - __u32 dst_port; /* network byte order */ + __be16 dst_port; /* network byte order */ + __u16 :16; /* zero padding */ __u32 dst_ip4; __u32 dst_ip6[4]; __u32 state; diff --git a/net/core/filter.c b/net/core/filter.c index 76e406965b6f..a65de7ac60aa 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -7966,6 +7966,7 @@ bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type, struct bpf_insn_access_aux *info) { const int size_default = sizeof(__u32); + int field_size;
if (off < 0 || off >= sizeof(struct bpf_sock)) return false; @@ -7977,7 +7978,6 @@ bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type, case offsetof(struct bpf_sock, family): case offsetof(struct bpf_sock, type): case offsetof(struct bpf_sock, protocol): - case offsetof(struct bpf_sock, dst_port): case offsetof(struct bpf_sock, src_port): case offsetof(struct bpf_sock, rx_queue_mapping): case bpf_ctx_range(struct bpf_sock, src_ip4): @@ -7986,6 +7986,14 @@ bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type, case bpf_ctx_range_till(struct bpf_sock, dst_ip6[0], dst_ip6[3]): bpf_ctx_record_field_size(info, size_default); return bpf_ctx_narrow_access_ok(off, size, size_default); + case bpf_ctx_range(struct bpf_sock, dst_port): + field_size = size == size_default ? + size_default : sizeof_field(struct bpf_sock, dst_port); + bpf_ctx_record_field_size(info, field_size); + return bpf_ctx_narrow_access_ok(off, size, field_size); + case offsetofend(struct bpf_sock, dst_port) ... + offsetof(struct bpf_sock, dst_ip4) - 1: + return false; }
return size == size_default;
From: Yang Guang yang.guang5@zte.com.cn
[ Upstream commit 0ad3867b0f13e45cfee5a1298bfd40eef096116c ]
coccinelle report: ./drivers/scsi/mvsas/mv_init.c:699:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/mvsas/mv_init.c:747:8-16: WARNING: use scnprintf or sprintf
Use sysfs_emit() instead of scnprintf() or sprintf().
Link: https://lore.kernel.org/r/c1711f7cf251730a8ceb5bdfc313bf85662b3395.164318294... Reported-by: Zeal Robot zealci@zte.com.cn Signed-off-by: Yang Guang yang.guang5@zte.com.cn Signed-off-by: David Yang davidcomponentone@gmail.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/mvsas/mv_init.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/mvsas/mv_init.c b/drivers/scsi/mvsas/mv_init.c index f18dd9703595..787cf439ba57 100644 --- a/drivers/scsi/mvsas/mv_init.c +++ b/drivers/scsi/mvsas/mv_init.c @@ -696,7 +696,7 @@ static struct pci_driver mvs_pci_driver = { static ssize_t driver_version_show(struct device *cdev, struct device_attribute *attr, char *buffer) { - return snprintf(buffer, PAGE_SIZE, "%s\n", DRV_VERSION); + return sysfs_emit(buffer, "%s\n", DRV_VERSION); }
static DEVICE_ATTR_RO(driver_version); @@ -744,7 +744,7 @@ static ssize_t interrupt_coalescing_store(struct device *cdev, static ssize_t interrupt_coalescing_show(struct device *cdev, struct device_attribute *attr, char *buffer) { - return snprintf(buffer, PAGE_SIZE, "%d\n", interrupt_coalescing); + return sysfs_emit(buffer, "%d\n", interrupt_coalescing); }
static DEVICE_ATTR_RW(interrupt_coalescing);
From: Yang Guang yang.guang5@zte.com.cn
[ Upstream commit 2245ea91fd3a04cafbe2f54911432a8657528c3b ]
coccinelle report: ./drivers/scsi/bfa/bfad_attr.c:908:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:860:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:888:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:853:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:808:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:728:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:822:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:927:9-17: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:900:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:874:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:714:8-16: WARNING: use scnprintf or sprintf ./drivers/scsi/bfa/bfad_attr.c:839:8-16: WARNING: use scnprintf or sprintf
Use sysfs_emit() instead of scnprintf() or sprintf().
Link: https://lore.kernel.org/r/def83ff75faec64ba592b867a8499b1367bae303.164318146... Reported-by: Zeal Robot zealci@zte.com.cn Signed-off-by: Yang Guang yang.guang5@zte.com.cn Signed-off-by: David Yang davidcomponentone@gmail.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/bfa/bfad_attr.c | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-)
diff --git a/drivers/scsi/bfa/bfad_attr.c b/drivers/scsi/bfa/bfad_attr.c index 5ae1e3f78910..e049cdb3c286 100644 --- a/drivers/scsi/bfa/bfad_attr.c +++ b/drivers/scsi/bfa/bfad_attr.c @@ -711,7 +711,7 @@ bfad_im_serial_num_show(struct device *dev, struct device_attribute *attr, char serial_num[BFA_ADAPTER_SERIAL_NUM_LEN];
bfa_get_adapter_serial_num(&bfad->bfa, serial_num); - return snprintf(buf, PAGE_SIZE, "%s\n", serial_num); + return sysfs_emit(buf, "%s\n", serial_num); }
static ssize_t @@ -725,7 +725,7 @@ bfad_im_model_show(struct device *dev, struct device_attribute *attr, char model[BFA_ADAPTER_MODEL_NAME_LEN];
bfa_get_adapter_model(&bfad->bfa, model); - return snprintf(buf, PAGE_SIZE, "%s\n", model); + return sysfs_emit(buf, "%s\n", model); }
static ssize_t @@ -805,7 +805,7 @@ bfad_im_model_desc_show(struct device *dev, struct device_attribute *attr, snprintf(model_descr, BFA_ADAPTER_MODEL_DESCR_LEN, "Invalid Model");
- return snprintf(buf, PAGE_SIZE, "%s\n", model_descr); + return sysfs_emit(buf, "%s\n", model_descr); }
static ssize_t @@ -819,7 +819,7 @@ bfad_im_node_name_show(struct device *dev, struct device_attribute *attr, u64 nwwn;
nwwn = bfa_fcs_lport_get_nwwn(port->fcs_port); - return snprintf(buf, PAGE_SIZE, "0x%llx\n", cpu_to_be64(nwwn)); + return sysfs_emit(buf, "0x%llx\n", cpu_to_be64(nwwn)); }
static ssize_t @@ -836,7 +836,7 @@ bfad_im_symbolic_name_show(struct device *dev, struct device_attribute *attr, bfa_fcs_lport_get_attr(&bfad->bfa_fcs.fabric.bport, &port_attr); strlcpy(symname, port_attr.port_cfg.sym_name.symname, BFA_SYMNAME_MAXLEN); - return snprintf(buf, PAGE_SIZE, "%s\n", symname); + return sysfs_emit(buf, "%s\n", symname); }
static ssize_t @@ -850,14 +850,14 @@ bfad_im_hw_version_show(struct device *dev, struct device_attribute *attr, char hw_ver[BFA_VERSION_LEN];
bfa_get_pci_chip_rev(&bfad->bfa, hw_ver); - return snprintf(buf, PAGE_SIZE, "%s\n", hw_ver); + return sysfs_emit(buf, "%s\n", hw_ver); }
static ssize_t bfad_im_drv_version_show(struct device *dev, struct device_attribute *attr, char *buf) { - return snprintf(buf, PAGE_SIZE, "%s\n", BFAD_DRIVER_VERSION); + return sysfs_emit(buf, "%s\n", BFAD_DRIVER_VERSION); }
static ssize_t @@ -871,7 +871,7 @@ bfad_im_optionrom_version_show(struct device *dev, char optrom_ver[BFA_VERSION_LEN];
bfa_get_adapter_optrom_ver(&bfad->bfa, optrom_ver); - return snprintf(buf, PAGE_SIZE, "%s\n", optrom_ver); + return sysfs_emit(buf, "%s\n", optrom_ver); }
static ssize_t @@ -885,7 +885,7 @@ bfad_im_fw_version_show(struct device *dev, struct device_attribute *attr, char fw_ver[BFA_VERSION_LEN];
bfa_get_adapter_fw_ver(&bfad->bfa, fw_ver); - return snprintf(buf, PAGE_SIZE, "%s\n", fw_ver); + return sysfs_emit(buf, "%s\n", fw_ver); }
static ssize_t @@ -897,7 +897,7 @@ bfad_im_num_of_ports_show(struct device *dev, struct device_attribute *attr, (struct bfad_im_port_s *) shost->hostdata[0]; struct bfad_s *bfad = im_port->bfad;
- return snprintf(buf, PAGE_SIZE, "%d\n", + return sysfs_emit(buf, "%d\n", bfa_get_nports(&bfad->bfa)); }
@@ -905,7 +905,7 @@ static ssize_t bfad_im_drv_name_show(struct device *dev, struct device_attribute *attr, char *buf) { - return snprintf(buf, PAGE_SIZE, "%s\n", BFAD_DRIVER_NAME); + return sysfs_emit(buf, "%s\n", BFAD_DRIVER_NAME); }
static ssize_t @@ -924,14 +924,14 @@ bfad_im_num_of_discovered_ports_show(struct device *dev, rports = kcalloc(nrports, sizeof(struct bfa_rport_qualifier_s), GFP_ATOMIC); if (rports == NULL) - return snprintf(buf, PAGE_SIZE, "Failed\n"); + return sysfs_emit(buf, "Failed\n");
spin_lock_irqsave(&bfad->bfad_lock, flags); bfa_fcs_lport_get_rport_quals(port->fcs_port, rports, &nrports); spin_unlock_irqrestore(&bfad->bfad_lock, flags); kfree(rports);
- return snprintf(buf, PAGE_SIZE, "%d\n", nrports); + return sysfs_emit(buf, "%d\n", nrports); }
static DEVICE_ATTR(serial_number, S_IRUGO,
From: Yongzhi Liu lyz_cs@pku.edu.cn
[ Upstream commit e57c1a3bd5e8e0c7181f65ae55581f0236a8f284 ]
[why] Unlock is needed on the error handling path to prevent dead lock. v3d_submit_cl_ioctl and v3d_submit_csd_ioctl is missing unlock.
[how] Fix this by changing goto target on the error handling path. So changing the goto to target an error handling path that includes drm_gem_unlock reservations.
Signed-off-by: Yongzhi Liu lyz_cs@pku.edu.cn Reviewed-by: Melissa Wen mwen@igalia.com Signed-off-by: Melissa Wen melissa.srw@gmail.com Link: https://patchwork.freedesktop.org/patch/msgid/1643377262-109975-1-git-send-e... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/v3d/v3d_gem.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c index 772b5831bcc6..805d6f6cba0e 100644 --- a/drivers/gpu/drm/v3d/v3d_gem.c +++ b/drivers/gpu/drm/v3d/v3d_gem.c @@ -625,7 +625,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
if (!render->base.perfmon) { ret = -ENOENT; - goto fail; + goto fail_perfmon; } }
@@ -678,6 +678,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
fail_unreserve: mutex_unlock(&v3d->sched_lock); +fail_perfmon: drm_gem_unlock_reservations(last_job->bo, last_job->bo_count, &acquire_ctx); fail: @@ -854,7 +855,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data, args->perfmon_id); if (!job->base.perfmon) { ret = -ENOENT; - goto fail; + goto fail_perfmon; } }
@@ -886,6 +887,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
fail_unreserve: mutex_unlock(&v3d->sched_lock); +fail_perfmon: drm_gem_unlock_reservations(clean_job->bo, clean_job->bo_count, &acquire_ctx); fail:
From: Evgeny Boger boger@wirenboard.com
[ Upstream commit d4f408cdcd26921c1268cb8dcbe8ffb6faf837f3 ]
As stated in [1], negative current values are used for discharging batteries.
AXP PMICs internally have two different ADC channels for shunt current measurement: one used during charging and one during discharging. The values reported by these ADCs are unsigned. While the driver properly selects ADC channel to get the data from, it doesn't apply negative sign when reporting discharging current.
[1] Documentation/ABI/testing/sysfs-class-power
Signed-off-by: Evgeny Boger boger@wirenboard.com Acked-by: Chen-Yu Tsai wens@csie.org Signed-off-by: Sebastian Reichel sebastian.reichel@collabora.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/power/supply/axp20x_battery.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/drivers/power/supply/axp20x_battery.c b/drivers/power/supply/axp20x_battery.c index 18a9db0df4b1..335e12cc5e2f 100644 --- a/drivers/power/supply/axp20x_battery.c +++ b/drivers/power/supply/axp20x_battery.c @@ -186,7 +186,6 @@ static int axp20x_battery_get_prop(struct power_supply *psy, union power_supply_propval *val) { struct axp20x_batt_ps *axp20x_batt = power_supply_get_drvdata(psy); - struct iio_channel *chan; int ret = 0, reg, val1;
switch (psp) { @@ -266,12 +265,12 @@ static int axp20x_battery_get_prop(struct power_supply *psy, if (ret) return ret;
- if (reg & AXP20X_PWR_STATUS_BAT_CHARGING) - chan = axp20x_batt->batt_chrg_i; - else - chan = axp20x_batt->batt_dischrg_i; - - ret = iio_read_channel_processed(chan, &val->intval); + if (reg & AXP20X_PWR_STATUS_BAT_CHARGING) { + ret = iio_read_channel_processed(axp20x_batt->batt_chrg_i, &val->intval); + } else { + ret = iio_read_channel_processed(axp20x_batt->batt_dischrg_i, &val1); + val->intval = -val1; + } if (ret) return ret;
From: Ben Greear greearb@candelatech.com
[ Upstream commit 827e7799c61b978fbc2cc9dac66cb62401b2b3f0 ]
If the nic fails to start, it is possible that the reset_work has already been scheduled. Ensure the work item is canceled so we do not have use-after-free crash in case cleanup is called before the work item is executed.
This fixes crash on my x86_64 apu2 when mt7921k radio fails to work. Radio still fails, but OS does not crash.
Signed-off-by: Ben Greear greearb@candelatech.com Signed-off-by: Felix Fietkau nbd@nbd.name Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/mediatek/mt76/mt7921/main.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/main.c b/drivers/net/wireless/mediatek/mt76/mt7921/main.c index 9eb90e6f0103..30252f408ddc 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7921/main.c +++ b/drivers/net/wireless/mediatek/mt76/mt7921/main.c @@ -224,6 +224,7 @@ static void mt7921_stop(struct ieee80211_hw *hw)
cancel_delayed_work_sync(&dev->pm.ps_work); cancel_work_sync(&dev->pm.wake_work); + cancel_work_sync(&dev->reset_work); mt76_connac_free_pending_tx_skbs(&dev->pm, NULL);
mt7921_mutex_acquire(dev);
From: Lorenzo Bianconi lorenzo@kernel.org
[ Upstream commit 577298ec55dfc8b9aece54520f0258c3f93a6573 ]
Even if it is only a false-positive since skip_buf0/skip_buf1 are only used in mt76_dma_tx_cleanup_idx routine, initialize skip_unmap in mt76_dma_rx_fill in order to fix the following UBSAN report:
[ 13.924906] UBSAN: invalid-load in linux-5.15.0/drivers/net/wireless/mediatek/mt76/dma.c:162:13 [ 13.924909] load of value 225 is not a valid value for type '_Bool' [ 13.924912] CPU: 9 PID: 672 Comm: systemd-udevd Not tainted 5.15.0-18-generic #18-Ubuntu [ 13.924914] Hardware name: LENOVO 21A0000CMX/21A0000CMX, BIOS R1MET43W (1.13 ) 11/05/2021 [ 13.924915] Call Trace: [ 13.924917] <TASK> [ 13.924920] show_stack+0x52/0x58 [ 13.924925] dump_stack_lvl+0x4a/0x5f [ 13.924931] dump_stack+0x10/0x12 [ 13.924932] ubsan_epilogue+0x9/0x45 [ 13.924934] __ubsan_handle_load_invalid_value.cold+0x44/0x49 [ 13.924935] ? __iommu_dma_map+0x84/0xf0 [ 13.924939] mt76_dma_add_buf.constprop.0.cold+0x23/0x85 [mt76] [ 13.924949] mt76_dma_rx_fill.isra.0+0x102/0x1f0 [mt76] [ 13.924954] mt76_dma_init+0xc9/0x150 [mt76] [ 13.924959] ? mt7921_dma_enable+0x110/0x110 [mt7921e] [ 13.924966] mt7921_dma_init+0x1e3/0x260 [mt7921e] [ 13.924970] mt7921_register_device+0x29d/0x510 [mt7921e] [ 13.924975] mt7921_pci_probe.part.0+0x17f/0x1b0 [mt7921e] [ 13.924980] mt7921_pci_probe+0x43/0x60 [mt7921e] [ 13.924984] local_pci_probe+0x4b/0x90 [ 13.924987] pci_device_probe+0x115/0x1f0 [ 13.924989] really_probe+0x21e/0x420 [ 13.924992] __driver_probe_device+0x115/0x190 [ 13.924994] driver_probe_device+0x23/0xc0 [ 13.924996] __driver_attach+0xbd/0x1d0 [ 13.924998] ? __device_attach_driver+0x110/0x110 [ 13.924999] bus_for_each_dev+0x7e/0xc0 [ 13.925001] driver_attach+0x1e/0x20 [ 13.925003] bus_add_driver+0x135/0x200 [ 13.925005] driver_register+0x95/0xf0 [ 13.925008] ? 0xffffffffc0766000 [ 13.925010] __pci_register_driver+0x68/0x70 [ 13.925011] mt7921_pci_driver_init+0x23/0x1000 [mt7921e] [ 13.925015] do_one_initcall+0x48/0x1d0 [ 13.925019] ? kmem_cache_alloc_trace+0x19e/0x2e0 [ 13.925022] do_init_module+0x62/0x280 [ 13.925025] load_module+0xac9/0xbb0 [ 13.925027] __do_sys_finit_module+0xbf/0x120 [ 13.925029] __x64_sys_finit_module+0x18/0x20 [ 13.925030] do_syscall_64+0x5c/0xc0 [ 13.925033] ? do_syscall_64+0x69/0xc0 [ 13.925034] ? sysvec_reschedule_ipi+0x78/0xe0 [ 13.925036] ? asm_sysvec_reschedule_ipi+0xa/0x20 [ 13.925039] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 13.925040] RIP: 0033:0x7fbf2b90f94d [ 13.925045] RSP: 002b:00007ffe2ec7e5d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 13.925047] RAX: ffffffffffffffda RBX: 000056106b0634e0 RCX: 00007fbf2b90f94d [ 13.925048] RDX: 0000000000000000 RSI: 00007fbf2baa3441 RDI: 0000000000000013 [ 13.925049] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000002 [ 13.925050] R10: 0000000000000013 R11: 0000000000000246 R12: 00007fbf2baa3441 [ 13.925051] R13: 000056106b062620 R14: 000056106b0610c0 R15: 000056106b0640d0 [ 13.925053] </TASK>
Signed-off-by: Lorenzo Bianconi lorenzo@kernel.org Signed-off-by: Felix Fietkau nbd@nbd.name Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/mediatek/mt76/dma.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/wireless/mediatek/mt76/dma.c b/drivers/net/wireless/mediatek/mt76/dma.c index 5e1c1506a4c6..7aecde35cb9a 100644 --- a/drivers/net/wireless/mediatek/mt76/dma.c +++ b/drivers/net/wireless/mediatek/mt76/dma.c @@ -465,6 +465,7 @@ mt76_dma_rx_fill(struct mt76_dev *dev, struct mt76_queue *q)
qbuf.addr = addr + offset; qbuf.len = len - offset; + qbuf.skip_unmap = false; mt76_dma_add_buf(dev, q, &qbuf, 1, 0, buf, NULL); frames++; }
From: Avraham Stern avraham.stern@intel.com
[ Upstream commit 5666ee154f4696c011dfa8544aaf5591b6b87515 ]
When adding 6GHz channels to scan request based on reported co-located APs, don't add channels that have only APs with "non-transmitted" BSSes if they only match the wildcard SSID since they will be found by probing the "transmitted" BSS.
Signed-off-by: Avraham Stern avraham.stern@intel.com Signed-off-by: Luca Coelho luciano.coelho@intel.com Link: https://lore.kernel.org/r/iwlwifi.20220202104617.f6ddf099f934.I231e55885d364... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/wireless/scan.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/net/wireless/scan.c b/net/wireless/scan.c index adc0d14cfd86..8e1e578d64bc 100644 --- a/net/wireless/scan.c +++ b/net/wireless/scan.c @@ -702,8 +702,12 @@ static bool cfg80211_find_ssid_match(struct cfg80211_colocated_ap *ap,
for (i = 0; i < request->n_ssids; i++) { /* wildcard ssid in the scan request */ - if (!request->ssids[i].ssid_len) + if (!request->ssids[i].ssid_len) { + if (ap->multi_bss && !ap->transmitted_bssid) + continue; + return true; + }
if (ap->ssid_len && ap->ssid_len == request->ssids[i].ssid_len) { @@ -829,6 +833,9 @@ static int cfg80211_scan_6ghz(struct cfg80211_registered_device *rdev) !cfg80211_find_ssid_match(ap, request)) continue;
+ if (!request->n_ssids && ap->multi_bss && !ap->transmitted_bssid) + continue; + cfg80211_scan_req_add_chan(request, chan, true); memcpy(scan_6ghz_params->bssid, ap->bssid, ETH_ALEN); scan_6ghz_params->short_ssid = ap->short_ssid;
From: Yonghong Song yhs@fb.com
[ Upstream commit 0908a66ad1124c1634c33847ac662106f7f2c198 ]
There are cases where clang compiler is packaged in a way readelf is a symbolic link to llvm-readelf. In such cases, llvm-readelf will be used instead of default binutils readelf, and the following error will appear during libbpf build:
# Warning: Num of global symbols in # /home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/build/libbpf/sharedobjs/libbpf-in.o (367) # does NOT match with num of versioned symbols in # /home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/build/libbpf/libbpf.so libbpf.map (383). # Please make sure all LIBBPF_API symbols are versioned in libbpf.map. # --- /home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/build/libbpf/libbpf_global_syms.tmp ... # +++ /home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/build/libbpf/libbpf_versioned_syms.tmp ... # @@ -324,6 +324,22 @@ # btf__str_by_offset # btf__type_by_id # btf__type_cnt # +LIBBPF_0.0.1 # +LIBBPF_0.0.2 # +LIBBPF_0.0.3 # +LIBBPF_0.0.4 # +LIBBPF_0.0.5 # +LIBBPF_0.0.6 # +LIBBPF_0.0.7 # +LIBBPF_0.0.8 # +LIBBPF_0.0.9 # +LIBBPF_0.1.0 # +LIBBPF_0.2.0 # +LIBBPF_0.3.0 # +LIBBPF_0.4.0 # +LIBBPF_0.5.0 # +LIBBPF_0.6.0 # +LIBBPF_0.7.0 # libbpf_attach_type_by_name # libbpf_find_kernel_btf # libbpf_find_vmlinux_btf_id # make[2]: *** [Makefile:184: check_abi] Error 1 # make[1]: *** [Makefile:140: all] Error 2
The above failure is due to different printouts for some ABS versioned symbols. For example, with the same libbpf.so, $ /bin/readelf --dyn-syms --wide tools/lib/bpf/libbpf.so | grep "LIBBPF" | grep ABS 134: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LIBBPF_0.5.0 202: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LIBBPF_0.6.0 ... $ /opt/llvm/bin/readelf --dyn-syms --wide tools/lib/bpf/libbpf.so | grep "LIBBPF" | grep ABS 134: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LIBBPF_0.5.0@@LIBBPF_0.5.0 202: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LIBBPF_0.6.0@@LIBBPF_0.6.0 ... The binutils readelf doesn't print out the symbol LIBBPF_* version and llvm-readelf does. Such a difference caused libbpf build failure with llvm-readelf.
The proposed fix filters out all ABS symbols as they are not part of the comparison. This works for both binutils readelf and llvm-readelf.
Reported-by: Delyan Kratunov delyank@fb.com Signed-off-by: Yonghong Song yhs@fb.com Signed-off-by: Andrii Nakryiko andrii@kernel.org Link: https://lore.kernel.org/bpf/20220204214355.502108-1-yhs@fb.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/lib/bpf/Makefile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile index 74c3b73a5fbe..089b73b3cb37 100644 --- a/tools/lib/bpf/Makefile +++ b/tools/lib/bpf/Makefile @@ -126,7 +126,7 @@ GLOBAL_SYM_COUNT = $(shell readelf -s --wide $(BPF_IN_SHARED) | \ sort -u | wc -l) VERSIONED_SYM_COUNT = $(shell readelf --dyn-syms --wide $(OUTPUT)libbpf.so | \ sed 's/[.*]//' | \ - awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {print $$NF}' | \ + awk '/GLOBAL/ && /DEFAULT/ && !/UND|ABS/ {print $$NF}' | \ grep -Eo '[^ ]+@LIBBPF_' | cut -d@ -f1 | sort -u | wc -l)
CMD_TARGETS = $(LIB_TARGET) $(PC_FILE) @@ -195,7 +195,7 @@ check_abi: $(OUTPUT)libbpf.so $(VERSION_SCRIPT) sort -u > $(OUTPUT)libbpf_global_syms.tmp; \ readelf --dyn-syms --wide $(OUTPUT)libbpf.so | \ sed 's/[.*]//' | \ - awk '/GLOBAL/ && /DEFAULT/ && !/UND/ {print $$NF}'| \ + awk '/GLOBAL/ && /DEFAULT/ && !/UND|ABS/ {print $$NF}'| \ grep -Eo '[^ ]+@LIBBPF_' | cut -d@ -f1 | \ sort -u > $(OUTPUT)libbpf_versioned_syms.tmp; \ diff -u $(OUTPUT)libbpf_global_syms.tmp \
From: Eric Dumazet edumazet@google.com
[ Upstream commit 145c7a793838add5e004e7d49a67654dc7eba147 ]
This fixes minor data-races in ip6_mc_input() and batadv_mcast_mla_rtr_flags_softif_get_ipv6()
Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/ipv6.h | 2 +- net/batman-adv/multicast.c | 2 +- net/ipv6/addrconf.c | 4 ++-- net/ipv6/ip6_input.c | 2 +- net/ipv6/ip6mr.c | 8 ++++---- 5 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h index 07cba0b3496d..d1f386430795 100644 --- a/include/linux/ipv6.h +++ b/include/linux/ipv6.h @@ -51,7 +51,7 @@ struct ipv6_devconf { __s32 use_optimistic; #endif #ifdef CONFIG_IPV6_MROUTE - __s32 mc_forwarding; + atomic_t mc_forwarding; #endif __s32 disable_ipv6; __s32 drop_unicast_in_l2_multicast; diff --git a/net/batman-adv/multicast.c b/net/batman-adv/multicast.c index 6e3419beca09..2853634a3979 100644 --- a/net/batman-adv/multicast.c +++ b/net/batman-adv/multicast.c @@ -134,7 +134,7 @@ static u8 batadv_mcast_mla_rtr_flags_softif_get_ipv6(struct net_device *dev) { struct inet6_dev *in6_dev = __in6_dev_get(dev);
- if (in6_dev && in6_dev->cnf.mc_forwarding) + if (in6_dev && atomic_read(&in6_dev->cnf.mc_forwarding)) return BATADV_NO_FLAGS; else return BATADV_MCAST_WANT_NO_RTR6; diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 1fe27807e471..3a8838b79bb6 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -552,7 +552,7 @@ static int inet6_netconf_fill_devconf(struct sk_buff *skb, int ifindex, #ifdef CONFIG_IPV6_MROUTE if ((all || type == NETCONFA_MC_FORWARDING) && nla_put_s32(skb, NETCONFA_MC_FORWARDING, - devconf->mc_forwarding) < 0) + atomic_read(&devconf->mc_forwarding)) < 0) goto nla_put_failure; #endif if ((all || type == NETCONFA_PROXY_NEIGH) && @@ -5537,7 +5537,7 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf, array[DEVCONF_USE_OPTIMISTIC] = cnf->use_optimistic; #endif #ifdef CONFIG_IPV6_MROUTE - array[DEVCONF_MC_FORWARDING] = cnf->mc_forwarding; + array[DEVCONF_MC_FORWARDING] = atomic_read(&cnf->mc_forwarding); #endif array[DEVCONF_DISABLE_IPV6] = cnf->disable_ipv6; array[DEVCONF_ACCEPT_DAD] = cnf->accept_dad; diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c index 80256717868e..d4b1e2c5aa76 100644 --- a/net/ipv6/ip6_input.c +++ b/net/ipv6/ip6_input.c @@ -508,7 +508,7 @@ int ip6_mc_input(struct sk_buff *skb) /* * IPv6 multicast router mode is now supported ;) */ - if (dev_net(skb->dev)->ipv6.devconf_all->mc_forwarding && + if (atomic_read(&dev_net(skb->dev)->ipv6.devconf_all->mc_forwarding) && !(ipv6_addr_type(&hdr->daddr) & (IPV6_ADDR_LOOPBACK|IPV6_ADDR_LINKLOCAL)) && likely(!(IP6CB(skb)->flags & IP6SKB_FORWARDED))) { diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c index 6a4065d81aa9..91f1c5f56d5f 100644 --- a/net/ipv6/ip6mr.c +++ b/net/ipv6/ip6mr.c @@ -739,7 +739,7 @@ static int mif6_delete(struct mr_table *mrt, int vifi, int notify,
in6_dev = __in6_dev_get(dev); if (in6_dev) { - in6_dev->cnf.mc_forwarding--; + atomic_dec(&in6_dev->cnf.mc_forwarding); inet6_netconf_notify_devconf(dev_net(dev), RTM_NEWNETCONF, NETCONFA_MC_FORWARDING, dev->ifindex, &in6_dev->cnf); @@ -907,7 +907,7 @@ static int mif6_add(struct net *net, struct mr_table *mrt,
in6_dev = __in6_dev_get(dev); if (in6_dev) { - in6_dev->cnf.mc_forwarding++; + atomic_inc(&in6_dev->cnf.mc_forwarding); inet6_netconf_notify_devconf(dev_net(dev), RTM_NEWNETCONF, NETCONFA_MC_FORWARDING, dev->ifindex, &in6_dev->cnf); @@ -1557,7 +1557,7 @@ static int ip6mr_sk_init(struct mr_table *mrt, struct sock *sk) } else { rcu_assign_pointer(mrt->mroute_sk, sk); sock_set_flag(sk, SOCK_RCU_FREE); - net->ipv6.devconf_all->mc_forwarding++; + atomic_inc(&net->ipv6.devconf_all->mc_forwarding); } write_unlock_bh(&mrt_lock);
@@ -1590,7 +1590,7 @@ int ip6mr_sk_done(struct sock *sk) * so the RCU grace period before sk freeing * is guaranteed by sk_destruct() */ - net->ipv6.devconf_all->mc_forwarding--; + atomic_dec(&net->ipv6.devconf_all->mc_forwarding); write_unlock_bh(&mrt_lock); inet6_netconf_notify_devconf(net, RTM_NEWNETCONF, NETCONFA_MC_FORWARDING,
From: Eric Dumazet edumazet@google.com
[ Upstream commit 9c1be1935fb68b2413796cdc03d019b8cf35ab51 ]
While testing a patch that will follow later ("net: add netns refcount tracker to struct nsproxy") I found that devtmpfs_init() was called before init_net was initialized.
This is a bug, because devtmpfs_setup() calls ksys_unshare(CLONE_NEWNS);
This has the effect of increasing init_net refcount, which will be later overwritten to 1, as part of setup_net(&init_net)
We had too many prior patches [1] trying to work around the root cause.
Really, make sure init_net is in BSS section, and that net_ns_init() is called earlier at boot time.
Note that another patch ("vfs: add netns refcount tracker to struct fs_context") also will need net_ns_init() being called before vfs_caches_init()
As a bonus, this patch saves around 4KB in .data section.
[1]
f8c46cb39079 ("netns: do not call pernet ops for not yet set up init_net namespace") b5082df8019a ("net: Initialise init_net.count to 1") 734b65417b24 ("net: Statically initialize init_net.dev_base_head")
v2: fixed a build error reported by kernel build bots (CONFIG_NET=n)
Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/net_namespace.h | 6 ++++++ init/main.c | 2 ++ net/core/dev.c | 3 +-- net/core/net_namespace.c | 17 +++++------------ 4 files changed, 14 insertions(+), 14 deletions(-)
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index bb5fa5914032..2ba326f9e004 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -479,4 +479,10 @@ static inline void fnhe_genid_bump(struct net *net) atomic_inc(&net->fnhe_genid); }
+#ifdef CONFIG_NET +void net_ns_init(void); +#else +static inline void net_ns_init(void) {} +#endif + #endif /* __NET_NET_NAMESPACE_H */ diff --git a/init/main.c b/init/main.c index bcd132d4e7bd..b340d990d77c 100644 --- a/init/main.c +++ b/init/main.c @@ -100,6 +100,7 @@ #include <linux/kcsan.h> #include <linux/init_syscalls.h> #include <linux/stackdepot.h> +#include <net/net_namespace.h>
#include <asm/io.h> #include <asm/bugs.h> @@ -1122,6 +1123,7 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void) key_init(); security_init(); dbg_late_init(); + net_ns_init(); vfs_caches_init(); pagecache_init(); signals_init(); diff --git a/net/core/dev.c b/net/core/dev.c index 33dc2a3ff7d7..804aba2228c2 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -11378,8 +11378,7 @@ static int __net_init netdev_init(struct net *net) BUILD_BUG_ON(GRO_HASH_BUCKETS > 8 * sizeof_field(struct napi_struct, gro_bitmask));
- if (net != &init_net) - INIT_LIST_HEAD(&net->dev_base_head); + INIT_LIST_HEAD(&net->dev_base_head);
net->dev_name_head = netdev_create_hash(); if (net->dev_name_head == NULL) diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index 9702d2b0d920..9745cb6fdf51 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -44,13 +44,7 @@ EXPORT_SYMBOL_GPL(net_rwsem); static struct key_tag init_net_key_domain = { .usage = REFCOUNT_INIT(1) }; #endif
-struct net init_net = { - .ns.count = REFCOUNT_INIT(1), - .dev_base_head = LIST_HEAD_INIT(init_net.dev_base_head), -#ifdef CONFIG_KEYS - .key_domain = &init_net_key_domain, -#endif -}; +struct net init_net; EXPORT_SYMBOL(init_net);
static bool init_net_initialized; @@ -1081,7 +1075,7 @@ static void rtnl_net_notifyid(struct net *net, int cmd, int id, u32 portid, rtnl_set_sk_err(net, RTNLGRP_NSID, err); }
-static int __init net_ns_init(void) +void __init net_ns_init(void) { struct net_generic *ng;
@@ -1102,6 +1096,9 @@ static int __init net_ns_init(void)
rcu_assign_pointer(init_net.gen, ng);
+#ifdef CONFIG_KEYS + init_net.key_domain = &init_net_key_domain; +#endif down_write(&pernet_ops_rwsem); if (setup_net(&init_net, &init_user_ns)) panic("Could not setup the initial network namespace"); @@ -1116,12 +1113,8 @@ static int __init net_ns_init(void) RTNL_FLAG_DOIT_UNLOCKED); rtnl_register(PF_UNSPEC, RTM_GETNSID, rtnl_net_getid, rtnl_net_dumpid, RTNL_FLAG_DOIT_UNLOCKED); - - return 0; }
-pure_initcall(net_ns_init); - static void free_exit_list(struct pernet_operations *ops, struct list_head *net_exit_list) { ops_pre_exit_list(ops, net_exit_list);
From: Sourabh Jain sourabhjain@linux.ibm.com
[ Upstream commit 7c5ed82b800d8615cdda00729e7b62e5899f0b13 ]
On large config LPARs (having 192 and more cores), Linux fails to boot due to insufficient memory in the first memblock. It is due to the memory reservation for the crash kernel which starts at 128MB offset of the first memblock. This memory reservation for the crash kernel doesn't leave enough space in the first memblock to accommodate other essential system resources.
The crash kernel start address was set to 128MB offset by default to ensure that the crash kernel get some memory below the RMA region which is used to be of size 256MB. But given that the RMA region size can be 512MB or more, setting the crash kernel offset to mid of RMA size will leave enough space for the kernel to allocate memory for other system resources.
Since the above crash kernel offset change is only applicable to the LPAR platform, the LPAR feature detection is pushed before the crash kernel reservation. The rest of LPAR specific initialization will still be done during pseries_probe_fw_features as usual.
This patch is dependent on changes to paca allocation for boot CPU. It expect boot CPU to discover 1T segment support which is introduced by the patch posted here: https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-January/239175.html
Reported-by: Abdul haleem abdhalee@linux.vnet.ibm.com Signed-off-by: Sourabh Jain sourabhjain@linux.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20220204085601.107257-1-sourabhjain@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kernel/rtas.c | 6 ++++++ arch/powerpc/kexec/core.c | 15 +++++++++++---- 2 files changed, 17 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index ff80bbad22a5..e18a725a8e5d 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -1235,6 +1235,12 @@ int __init early_init_dt_scan_rtas(unsigned long node, entryp = of_get_flat_dt_prop(node, "linux,rtas-entry", NULL); sizep = of_get_flat_dt_prop(node, "rtas-size", NULL);
+#ifdef CONFIG_PPC64 + /* need this feature to decide the crashkernel offset */ + if (of_get_flat_dt_prop(node, "ibm,hypertas-functions", NULL)) + powerpc_firmware_features |= FW_FEATURE_LPAR; +#endif + if (basep && entryp && sizep) { rtas.base = *basep; rtas.entry = *entryp; diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c index 48525e8b5730..71b1bfdadd76 100644 --- a/arch/powerpc/kexec/core.c +++ b/arch/powerpc/kexec/core.c @@ -147,11 +147,18 @@ void __init reserve_crashkernel(void) if (!crashk_res.start) { #ifdef CONFIG_PPC64 /* - * On 64bit we split the RMO in half but cap it at half of - * a small SLB (128MB) since the crash kernel needs to place - * itself and some stacks to be in the first segment. + * On the LPAR platform place the crash kernel to mid of + * RMA size (512MB or more) to ensure the crash kernel + * gets enough space to place itself and some stack to be + * in the first segment. At the same time normal kernel + * also get enough space to allocate memory for essential + * system resource in the first segment. Keep the crash + * kernel starts at 128MB offset on other platforms. */ - crashk_res.start = min(0x8000000ULL, (ppc64_rma_size / 2)); + if (firmware_has_feature(FW_FEATURE_LPAR)) + crashk_res.start = ppc64_rma_size / 2; + else + crashk_res.start = min(0x8000000ULL, (ppc64_rma_size / 2)); #else crashk_res.start = KDUMP_KERNELBASE; #endif
From: Rajneesh Bhardwaj rajneesh.bhardwaj@amd.com
[ Upstream commit 447c7997b62a5115ba4da846dcdee4fc12298a6a ]
Noticed the below warning while running a pytorch workload on vega10 GPUs. Change to trylock to avoid conflicts with already held reservation locks.
[ +0.000003] WARNING: possible recursive locking detected [ +0.000003] 5.13.0-kfd-rajneesh #1030 Not tainted [ +0.000004] -------------------------------------------- [ +0.000002] python/4822 is trying to acquire lock: [ +0.000004] ffff932cd9a259f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000203] but task is already holding lock: [ +0.000003] ffff932cbb7181f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: ttm_eu_reserve_buffers+0x270/0x470 [ttm] [ +0.000017] other info that might help us debug this: [ +0.000002] Possible unsafe locking scenario:
[ +0.000003] CPU0 [ +0.000002] ---- [ +0.000002] lock(reservation_ww_class_mutex); [ +0.000004] lock(reservation_ww_class_mutex); [ +0.000003] *** DEADLOCK ***
[ +0.000002] May be due to missing lock nesting notation
[ +0.000003] 7 locks held by python/4822: [ +0.000003] #0: ffff932c4ac028d0 (&process->mutex){+.+.}-{3:3}, at: kfd_ioctl_map_memory_to_gpu+0x10b/0x320 [amdgpu] [ +0.000232] #1: ffff932c55e830a8 (&info->lock#2){+.+.}-{3:3}, at: amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x64/0xf60 [amdgpu] [ +0.000241] #2: ffff932cc45b5e68 (&(*mem)->lock){+.+.}-{3:3}, at: amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0xdf/0xf60 [amdgpu] [ +0.000236] #3: ffffb2b35606fd28 (reservation_ww_class_acquire){+.+.}-{0:0}, at: amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x232/0xf60 [amdgpu] [ +0.000235] #4: ffff932cbb7181f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: ttm_eu_reserve_buffers+0x270/0x470 [ttm] [ +0.000015] #5: ffffffffc045f700 (*(sspp++)){....}-{0:0}, at: drm_dev_enter+0x5/0xa0 [drm] [ +0.000038] #6: ffff932c52da7078 (&vm->eviction_lock){+.+.}-{3:3}, at: amdgpu_vm_bo_update_mapping+0xd5/0x4f0 [amdgpu] [ +0.000195] stack backtrace: [ +0.000003] CPU: 11 PID: 4822 Comm: python Not tainted 5.13.0-kfd-rajneesh #1030 [ +0.000005] Hardware name: GIGABYTE MZ01-CE0-00/MZ01-CE0-00, BIOS F02 08/29/2018 [ +0.000003] Call Trace: [ +0.000003] dump_stack+0x6d/0x89 [ +0.000010] __lock_acquire+0xb93/0x1a90 [ +0.000009] lock_acquire+0x25d/0x2d0 [ +0.000005] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000184] ? lock_is_held_type+0xa2/0x110 [ +0.000006] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000184] __ww_mutex_lock.constprop.17+0xca/0x1060 [ +0.000007] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000183] ? lock_release+0x13f/0x270 [ +0.000005] ? lock_is_held_type+0xa2/0x110 [ +0.000006] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000183] amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000185] ttm_bo_release+0x4c6/0x580 [ttm] [ +0.000010] amdgpu_bo_unref+0x1a/0x30 [amdgpu] [ +0.000183] amdgpu_vm_free_table+0x76/0xa0 [amdgpu] [ +0.000189] amdgpu_vm_free_pts+0xb8/0xf0 [amdgpu] [ +0.000189] amdgpu_vm_update_ptes+0x411/0x770 [amdgpu] [ +0.000191] amdgpu_vm_bo_update_mapping+0x324/0x4f0 [amdgpu] [ +0.000191] amdgpu_vm_bo_update+0x251/0x610 [amdgpu] [ +0.000191] update_gpuvm_pte+0xcc/0x290 [amdgpu] [ +0.000229] ? amdgpu_vm_bo_map+0xd7/0x130 [amdgpu] [ +0.000190] amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x912/0xf60 [amdgpu] [ +0.000234] kfd_ioctl_map_memory_to_gpu+0x182/0x320 [amdgpu] [ +0.000218] kfd_ioctl+0x2b9/0x600 [amdgpu] [ +0.000216] ? kfd_ioctl_unmap_memory_from_gpu+0x270/0x270 [amdgpu] [ +0.000216] ? lock_release+0x13f/0x270 [ +0.000006] ? __fget_files+0x107/0x1e0 [ +0.000007] __x64_sys_ioctl+0x8b/0xd0 [ +0.000007] do_syscall_64+0x36/0x70 [ +0.000004] entry_SYSCALL_64_after_hwframe+0x44/0xae [ +0.000007] RIP: 0033:0x7fbff90a7317 [ +0.000004] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48 [ +0.000005] RSP: 002b:00007fbe301fe648 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.000006] RAX: ffffffffffffffda RBX: 00007fbcc402d820 RCX: 00007fbff90a7317 [ +0.000003] RDX: 00007fbe301fe690 RSI: 00000000c0184b18 RDI: 0000000000000004 [ +0.000003] RBP: 00007fbe301fe690 R08: 0000000000000000 R09: 00007fbcc402d880 [ +0.000003] R10: 0000000002001000 R11: 0000000000000246 R12: 00000000c0184b18 [ +0.000003] R13: 0000000000000004 R14: 00007fbf689593a0 R15: 00007fbcc402d820
Cc: Christian König christian.koenig@amd.com Cc: Felix Kuehling Felix.Kuehling@amd.com Cc: Alex Deucher Alexander.Deucher@amd.com
Reviewed-by: Christian König christian.koenig@amd.com Reviewed-by: Felix Kuehling Felix.Kuehling@amd.com Signed-off-by: Rajneesh Bhardwaj rajneesh.bhardwaj@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 01a78c786536..d62b770cc9dc 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -1343,7 +1343,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo) !(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE)) return;
- dma_resv_lock(bo->base.resv, NULL); + if (WARN_ON_ONCE(!dma_resv_trylock(bo->base.resv))) + return;
r = amdgpu_fill_buffer(abo, AMDGPU_POISON, bo->base.resv, &fence); if (!WARN_ON(r)) {
From: Mahesh Rajashekhara mahesh.rajashekhara@microchip.com
[ Upstream commit 3ada501d602abf02353445c03bb3258146445d90 ]
Avoid dropping into shell if the controller is in locked up state.
Driver issues SIS soft reset to bring back the controller to SIS mode while OS boots into kdump mode.
If the controller is in lockup state, SIS soft reset does not work.
Since the controller lockup code has not been cleared, driver considers the firmware is no longer up and running. Driver returns back an error code to OS and the kdump fails.
Link: https://lore.kernel.org/r/164375212337.440833.11955356190354940369.stgit@bru... Reviewed-by: Kevin Barnett kevin.barnett@microchip.com Reviewed-by: Scott Benesh scott.benesh@microchip.com Reviewed-by: Scott Teel scott.teel@microchip.com Signed-off-by: Mahesh Rajashekhara mahesh.rajashekhara@microchip.com Signed-off-by: Don Brace don.brace@microchip.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/smartpqi/smartpqi_init.c | 39 ++++++++++++++++----------- 1 file changed, 23 insertions(+), 16 deletions(-)
diff --git a/drivers/scsi/smartpqi/smartpqi_init.c b/drivers/scsi/smartpqi/smartpqi_init.c index a5453f5e87c3..2e690d8a3444 100644 --- a/drivers/scsi/smartpqi/smartpqi_init.c +++ b/drivers/scsi/smartpqi/smartpqi_init.c @@ -7653,6 +7653,21 @@ static int pqi_force_sis_mode(struct pqi_ctrl_info *ctrl_info) return pqi_revert_to_sis_mode(ctrl_info); }
+static void pqi_perform_lockup_action(void) +{ + switch (pqi_lockup_action) { + case PANIC: + panic("FATAL: Smart Family Controller lockup detected"); + break; + case REBOOT: + emergency_restart(); + break; + case NONE: + default: + break; + } +} + static int pqi_ctrl_init(struct pqi_ctrl_info *ctrl_info) { int rc; @@ -7677,8 +7692,15 @@ static int pqi_ctrl_init(struct pqi_ctrl_info *ctrl_info) * commands. */ rc = sis_wait_for_ctrl_ready(ctrl_info); - if (rc) + if (rc) { + if (reset_devices) { + dev_err(&ctrl_info->pci_dev->dev, + "kdump init failed with error %d\n", rc); + pqi_lockup_action = REBOOT; + pqi_perform_lockup_action(); + } return rc; + }
/* * Get the controller properties. This allows us to determine @@ -8402,21 +8424,6 @@ static int pqi_ofa_ctrl_restart(struct pqi_ctrl_info *ctrl_info, unsigned int de return pqi_ctrl_init_resume(ctrl_info); }
-static void pqi_perform_lockup_action(void) -{ - switch (pqi_lockup_action) { - case PANIC: - panic("FATAL: Smart Family Controller lockup detected"); - break; - case REBOOT: - emergency_restart(); - break; - case NONE: - default: - break; - } -} - static struct pqi_raid_error_info pqi_ctrl_offline_raid_error_info = { .data_out_result = PQI_DATA_IN_OUT_HARDWARE_ERROR, .status = SAM_STAT_CHECK_CONDITION,
From: Pali Rohár pali@kernel.org
[ Upstream commit b0b0b8b897f8e12b2368e868bd7cdc5742d5c5a9 ]
Aardvark hardware supports Multi-MSI and MSI_FLAG_MULTI_PCI_MSI is already set for the MSI chip. But when allocating MSI interrupt numbers for Multi-MSI, the numbers need to be properly aligned, otherwise endpoint devices send MSI interrupt with incorrect numbers.
Fix this issue by using function bitmap_find_free_region() instead of bitmap_find_next_zero_area().
To ensure that aligned MSI interrupt numbers are used by endpoint devices, we cannot use Linux virtual irq numbers (as they are random and not properly aligned). Instead we need to use the aligned hwirq numbers.
This change fixes receiving MSI interrupts on Armada 3720 boards and allows using NVMe disks which use Multi-MSI feature with 3 interrupts.
Without this NVMe disks freeze booting as linux nvme-core.c is waiting 60s for an interrupt.
Link: https://lore.kernel.org/r/20220110015018.26359-4-kabel@kernel.org Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pci/controller/pci-aardvark.c | 16 ++++++---------- 1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c index a924564fdbbc..6277b3f3031a 100644 --- a/drivers/pci/controller/pci-aardvark.c +++ b/drivers/pci/controller/pci-aardvark.c @@ -1179,7 +1179,7 @@ static void advk_msi_irq_compose_msi_msg(struct irq_data *data,
msg->address_lo = lower_32_bits(msi_msg); msg->address_hi = upper_32_bits(msi_msg); - msg->data = data->irq; + msg->data = data->hwirq; }
static int advk_msi_set_affinity(struct irq_data *irq_data, @@ -1196,15 +1196,11 @@ static int advk_msi_irq_domain_alloc(struct irq_domain *domain, int hwirq, i;
mutex_lock(&pcie->msi_used_lock); - hwirq = bitmap_find_next_zero_area(pcie->msi_used, MSI_IRQ_NUM, - 0, nr_irqs, 0); - if (hwirq >= MSI_IRQ_NUM) { - mutex_unlock(&pcie->msi_used_lock); - return -ENOSPC; - } - - bitmap_set(pcie->msi_used, hwirq, nr_irqs); + hwirq = bitmap_find_free_region(pcie->msi_used, MSI_IRQ_NUM, + order_base_2(nr_irqs)); mutex_unlock(&pcie->msi_used_lock); + if (hwirq < 0) + return -ENOSPC;
for (i = 0; i < nr_irqs; i++) irq_domain_set_info(domain, virq + i, hwirq + i, @@ -1222,7 +1218,7 @@ static void advk_msi_irq_domain_free(struct irq_domain *domain, struct advk_pcie *pcie = domain->host_data;
mutex_lock(&pcie->msi_used_lock); - bitmap_clear(pcie->msi_used, d->hwirq, nr_irqs); + bitmap_release_region(pcie->msi_used, d->hwirq, order_base_2(nr_irqs)); mutex_unlock(&pcie->msi_used_lock); }
From: Zhou Guanghui zhouguanghui1@huawei.com
[ Upstream commit 30de2b541af98179780054836b48825fcfba4408 ]
During event processing, events are read from the event queue one by one until the queue is empty.If the master device continuously requests address access at the same time and the SMMU generates events, the cyclic processing of the event takes a long time and softlockup warnings may be reported.
arm-smmu-v3 arm-smmu-v3.34.auto: event 0x0a received: arm-smmu-v3 arm-smmu-v3.34.auto: 0x00007f220000280a arm-smmu-v3 arm-smmu-v3.34.auto: 0x000010000000007e arm-smmu-v3 arm-smmu-v3.34.auto: 0x00000000034e8670 watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [irq/268-arm-smm:247] Call trace: _dev_info+0x7c/0xa0 arm_smmu_evtq_thread+0x1c0/0x230 irq_thread_fn+0x30/0x80 irq_thread+0x128/0x210 kthread+0x134/0x138 ret_from_fork+0x10/0x1c Kernel panic - not syncing: softlockup: hung tasks
Fix this by calling cond_resched() after the event information is printed.
Signed-off-by: Zhou Guanghui zhouguanghui1@huawei.com Link: https://lore.kernel.org/r/20220119070754.26528-1-zhouguanghui1@huawei.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index a388e318f86e..430315135cff 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1552,6 +1552,7 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev) dev_info(smmu->dev, "\t0x%016llx\n", (unsigned long long)evt[i]);
+ cond_resched(); }
/*
From: Neal Liu neal_liu@aspeedtech.com
[ Upstream commit c3c9cee592828528fd228b01d312c7526c584a42 ]
Enable Aspeed quirks in commit 7f2d73788d90 ("usb: ehci: handshake CMD_RUN instead of STS_HALT") to support Aspeed ehci-pci device.
Acked-by: Alan Stern stern@rowland.harvard.edu Signed-off-by: Neal Liu neal_liu@aspeedtech.com Link: https://lore.kernel.org/r/20220208101657.76459-1-neal_liu@aspeedtech.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/host/ehci-pci.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/drivers/usb/host/ehci-pci.c b/drivers/usb/host/ehci-pci.c index e87cf3a00fa4..638f03b89739 100644 --- a/drivers/usb/host/ehci-pci.c +++ b/drivers/usb/host/ehci-pci.c @@ -21,6 +21,9 @@ static const char hcd_name[] = "ehci-pci"; /* defined here to avoid adding to pci_ids.h for single instance use */ #define PCI_DEVICE_ID_INTEL_CE4100_USB 0x2e70
+#define PCI_VENDOR_ID_ASPEED 0x1a03 +#define PCI_DEVICE_ID_ASPEED_EHCI 0x2603 + /*-------------------------------------------------------------------------*/ #define PCI_DEVICE_ID_INTEL_QUARK_X1000_SOC 0x0939 static inline bool is_intel_quark_x1000(struct pci_dev *pdev) @@ -222,6 +225,12 @@ static int ehci_pci_setup(struct usb_hcd *hcd) ehci->has_synopsys_hc_bug = 1; } break; + case PCI_VENDOR_ID_ASPEED: + if (pdev->device == PCI_DEVICE_ID_ASPEED_EHCI) { + ehci_info(ehci, "applying Aspeed HC workaround\n"); + ehci->is_aspeed = 1; + } + break; }
/* optional debug port, normally in the first BAR */
From: Hou Zhiqiang Zhiqiang.Hou@nxp.com
[ Upstream commit 829cc0e2ea2d61fb6c54bc3f8a17f86c56e11864 ]
The copy test uses the memcpy() to copy data between IO memory spaces. This can trigger an alignment fault error (pasted the error logs below) because memcpy() may use unaligned accesses on a mapped memory that is just IO, which does not support unaligned memory accesses.
Fix it by using the correct memcpy API to copy from/to IO memory.
Alignment fault error logs: Unable to handle kernel paging request at virtual address ffff8000101cd3c1 Mem abort info: ESR = 0x96000021 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x21: alignment fault Data abort info: ISV = 0, ISS = 0x00000021 CM = 0, WnR = 0 swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000081773000 [ffff8000101cd3c1] pgd=1000000082410003, p4d=1000000082410003, pud=1000000082411003, pmd=1000000082412003, pte=0068004000001f13 Internal error: Oops: 96000021 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 6 Comm: kworker/0:0H Not tainted 5.15.0-rc1-next-20210914-dirty #2 Hardware name: LS1012A RDB Board (DT) Workqueue: kpcitest pci_epf_test_cmd_handler pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : __memcpy+0x168/0x230 lr : pci_epf_test_cmd_handler+0x6f0/0xa68 sp : ffff80001003bce0 x29: ffff80001003bce0 x28: ffff800010135000 x27: ffff8000101e5000 x26: ffff8000101cd000 x25: ffff6cda941cf6c8 x24: 0000000000000000 x23: ffff6cda863f2000 x22: ffff6cda9096c800 x21: ffff800010135000 x20: ffff6cda941cf680 x19: ffffaf39fd999000 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: ffffaf39fd2b6000 x14: 0000000000000000 x13: 15f5c8fa2f984d57 x12: 604d132b60275454 x11: 065cee5e5fb428b6 x10: aae662eb17d0cf3e x9 : 1d97c9a1b4ddef37 x8 : 7541b65edebf928c x7 : e71937c4fc595de0 x6 : b8a0e09562430d1c x5 : ffff8000101e5401 x4 : ffff8000101cd401 x3 : ffff8000101e5380 x2 : fffffffffffffff1 x1 : ffff8000101cd3c0 x0 : ffff8000101e5000 Call trace: __memcpy+0x168/0x230 process_one_work+0x1ec/0x370 worker_thread+0x44/0x478 kthread+0x154/0x160 ret_from_fork+0x10/0x20 Code: a984346c a9c4342c f1010042 54fffee8 (a97c3c8e) ---[ end trace 568c28c7b6336335 ]---
Link: https://lore.kernel.org/r/20211217094708.28678-1-Zhiqiang.Hou@nxp.com Signed-off-by: Hou Zhiqiang Zhiqiang.Hou@nxp.com Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Reviewed-by: Kishon Vijay Abraham I kishon@ti.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pci/endpoint/functions/pci-epf-test.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/endpoint/functions/pci-epf-test.c b/drivers/pci/endpoint/functions/pci-epf-test.c index 90d84d3bc868..c7e45633beaf 100644 --- a/drivers/pci/endpoint/functions/pci-epf-test.c +++ b/drivers/pci/endpoint/functions/pci-epf-test.c @@ -285,7 +285,17 @@ static int pci_epf_test_copy(struct pci_epf_test *epf_test) if (ret) dev_err(dev, "Data transfer failed\n"); } else { - memcpy(dst_addr, src_addr, reg->size); + void *buf; + + buf = kzalloc(reg->size, GFP_KERNEL); + if (!buf) { + ret = -ENOMEM; + goto err_map_addr; + } + + memcpy_fromio(buf, src_addr, reg->size); + memcpy_toio(dst_addr, buf, reg->size); + kfree(buf); } ktime_get_ts64(&end); pci_epf_test_print_rate("COPY", reg->size, &start, &end, use_dma);
From: Sebastian Andrzej Siewior bigeasy@linutronix.de
[ Upstream commit 4f9bf2a2f5aacf988e6d5e56b961ba45c5a25248 ]
Commit 9652dc2eb9e40 ("tcp: relax listening_hash operations")
removed the need to disable bottom half while acquiring listening_hash.lock. There are still two callers left which disable bottom half before the lock is acquired.
On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts as a lock to ensure that resources, that are protected by disabling bottom halves, remain protected. This leads to a circular locking dependency if the lock acquired with disabled bottom halves is also acquired with enabled bottom halves followed by disabling bottom halves. This is the reverse locking order. It has been observed with inet_listen_hashbucket::lock:
local_bh_disable() + spin_lock(&ilb->lock): inet_listen() inet_csk_listen_start() sk->sk_prot->hash() := inet_hash() local_bh_disable() __inet_hash() spin_lock(&ilb->lock); acquire(&ilb->lock);
Reverse order: spin_lock(&ilb2->lock) + local_bh_disable(): tcp_seq_next() listening_get_next() spin_lock(&ilb2->lock); acquire(&ilb2->lock);
tcp4_seq_show() get_tcp4_sock() sock_i_ino() read_lock_bh(&sk->sk_callback_lock); acquire(softirq_ctrl) // <---- whoops acquire(&sk->sk_callback_lock)
Drop local_bh_disable() around __inet_hash() which acquires listening_hash->lock. Split inet_unhash() and acquire the listen_hashbucket lock without disabling bottom halves; the inet_ehash lock with disabled bottom halves.
Reported-by: Mike Galbraith efault@gmx.de Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Link: https://lkml.kernel.org/r/12d6f9879a97cd56c09fb53dee343cbb14f7f1f7.camel@gmx... Link: https://lkml.kernel.org/r/X9CheYjuXWc75Spa@hirez.programming.kicks-ass.net Link: https://lore.kernel.org/r/YgQOebeZ10eNx1W6@linutronix.de Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/inet_hashtables.c | 53 ++++++++++++++++++++++--------------- net/ipv6/inet6_hashtables.c | 5 +--- 2 files changed, 33 insertions(+), 25 deletions(-)
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index 75737267746f..7bd1e10086f0 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -637,7 +637,9 @@ int __inet_hash(struct sock *sk, struct sock *osk) int err = 0;
if (sk->sk_state != TCP_LISTEN) { + local_bh_disable(); inet_ehash_nolisten(sk, osk, NULL); + local_bh_enable(); return 0; } WARN_ON(!sk_unhashed(sk)); @@ -669,45 +671,54 @@ int inet_hash(struct sock *sk) { int err = 0;
- if (sk->sk_state != TCP_CLOSE) { - local_bh_disable(); + if (sk->sk_state != TCP_CLOSE) err = __inet_hash(sk, NULL); - local_bh_enable(); - }
return err; } EXPORT_SYMBOL_GPL(inet_hash);
-void inet_unhash(struct sock *sk) +static void __inet_unhash(struct sock *sk, struct inet_listen_hashbucket *ilb) { - struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo; - struct inet_listen_hashbucket *ilb = NULL; - spinlock_t *lock; - if (sk_unhashed(sk)) return;
- if (sk->sk_state == TCP_LISTEN) { - ilb = &hashinfo->listening_hash[inet_sk_listen_hashfn(sk)]; - lock = &ilb->lock; - } else { - lock = inet_ehash_lockp(hashinfo, sk->sk_hash); - } - spin_lock_bh(lock); - if (sk_unhashed(sk)) - goto unlock; - if (rcu_access_pointer(sk->sk_reuseport_cb)) reuseport_stop_listen_sock(sk); if (ilb) { + struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo; + inet_unhash2(hashinfo, sk); ilb->count--; } __sk_nulls_del_node_init_rcu(sk); sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1); -unlock: - spin_unlock_bh(lock); +} + +void inet_unhash(struct sock *sk) +{ + struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo; + + if (sk_unhashed(sk)) + return; + + if (sk->sk_state == TCP_LISTEN) { + struct inet_listen_hashbucket *ilb; + + ilb = &hashinfo->listening_hash[inet_sk_listen_hashfn(sk)]; + /* Don't disable bottom halves while acquiring the lock to + * avoid circular locking dependency on PREEMPT_RT. + */ + spin_lock(&ilb->lock); + __inet_unhash(sk, ilb); + spin_unlock(&ilb->lock); + } else { + spinlock_t *lock = inet_ehash_lockp(hashinfo, sk->sk_hash); + + spin_lock_bh(lock); + __inet_unhash(sk, NULL); + spin_unlock_bh(lock); + } } EXPORT_SYMBOL_GPL(inet_unhash);
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c index 67c9114835c8..0a2e7f228391 100644 --- a/net/ipv6/inet6_hashtables.c +++ b/net/ipv6/inet6_hashtables.c @@ -333,11 +333,8 @@ int inet6_hash(struct sock *sk) { int err = 0;
- if (sk->sk_state != TCP_CLOSE) { - local_bh_disable(); + if (sk->sk_state != TCP_CLOSE) err = __inet_hash(sk, NULL); - local_bh_enable(); - }
return err; }
From: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org
[ Upstream commit 9f72d4757cbe4d1ed669192f6d23817c9e437c4b ]
The Qualcomm PCI bridge device (Device ID 0x0110) found in chipsets such as SM8450 does not set the Command Completed bit unless writes to the Slot Command register change "Control" bits.
This results in timeouts like below:
pcieport 0001:00:00.0: pciehp: Timeout on hotplug command 0x03c0 (issued 2020 msec ago)
Add the device to the Command Completed quirk to mark commands "completed" immediately unless they change the "Control" bits.
Link: https://lore.kernel.org/r/20220210145003.135907-1-manivannan.sadhasivam@lina... Signed-off-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Signed-off-by: Bjorn Helgaas bhelgaas@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pci/hotplug/pciehp_hpc.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index c0985316649d..8bedbc77fe95 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -1060,6 +1060,8 @@ static void quirk_cmd_compl(struct pci_dev *pdev) } DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, PCI_ANY_ID, PCI_CLASS_BRIDGE_PCI, 8, quirk_cmd_compl); +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_QCOM, 0x0110, + PCI_CLASS_BRIDGE_PCI, 8, quirk_cmd_compl); DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_QCOM, 0x0400, PCI_CLASS_BRIDGE_PCI, 8, quirk_cmd_compl); DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_QCOM, 0x0401,
From: Sreekanth Reddy sreekanth.reddy@broadcom.com
[ Upstream commit 9992246127246a27cc7184f05cce6f62ac48f84e ]
The driver is missing to set the residual size while completing an I/O. Ensure proper data transfer size is reported to the kernel on I/O completion based on the transfer length reported by the firmware.
Link: https://lore.kernel.org/r/20220210095817.22828-7-sreekanth.reddy@broadcom.co... Signed-off-by: Sreekanth Reddy sreekanth.reddy@broadcom.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/mpi3mr/mpi3mr_os.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/scsi/mpi3mr/mpi3mr_os.c b/drivers/scsi/mpi3mr/mpi3mr_os.c index 3cae8803383b..b2c650542bac 100644 --- a/drivers/scsi/mpi3mr/mpi3mr_os.c +++ b/drivers/scsi/mpi3mr/mpi3mr_os.c @@ -2204,6 +2204,8 @@ void mpi3mr_process_op_reply_desc(struct mpi3mr_ioc *mrioc, scmd->result = DID_OK << 16; goto out_success; } + + scsi_set_resid(scmd, scsi_bufflen(scmd) - xfer_count); if (ioc_status == MPI3_IOCSTATUS_SCSI_DATA_UNDERRUN && xfer_count == 0 && (scsi_status == MPI3_SCSI_STATUS_BUSY || scsi_status == MPI3_SCSI_STATUS_RESERVATION_CONFLICT ||
From: Sreekanth Reddy sreekanth.reddy@broadcom.com
[ Upstream commit d44b5fefb22e139408ae12b864da1ecb9ad9d1d2 ]
Fix memory leaks related to operational reply queue's memory segments which are not getting freed while unloading the driver.
Link: https://lore.kernel.org/r/20220210095817.22828-9-sreekanth.reddy@broadcom.co... Signed-off-by: Sreekanth Reddy sreekanth.reddy@broadcom.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/mpi3mr/mpi3mr_fw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/mpi3mr/mpi3mr_fw.c b/drivers/scsi/mpi3mr/mpi3mr_fw.c index 5af36c54cb59..3ef6b6edef46 100644 --- a/drivers/scsi/mpi3mr/mpi3mr_fw.c +++ b/drivers/scsi/mpi3mr/mpi3mr_fw.c @@ -1275,7 +1275,7 @@ static void mpi3mr_free_op_req_q_segments(struct mpi3mr_ioc *mrioc, u16 q_idx) MPI3MR_MAX_SEG_LIST_SIZE, mrioc->req_qinfo[q_idx].q_segment_list, mrioc->req_qinfo[q_idx].q_segment_list_dma); - mrioc->op_reply_qinfo[q_idx].q_segment_list = NULL; + mrioc->req_qinfo[q_idx].q_segment_list = NULL; } } else size = mrioc->req_qinfo[q_idx].segment_qd *
From: Christophe Leroy christophe.leroy@csgroup.eu
[ Upstream commit a4c182ecf33584b9b2d1aa9dad073014a504c01f ]
Commit 1f9ad21c3b38 ("powerpc/mm: Implement set_memory() routines") included a spin_lock() to change_page_attr() in order to safely perform the three step operations. But then commit 9f7853d7609d ("powerpc/mm: Fix set_memory_*() against concurrent accesses") modify it to use pte_update() and do the operation safely against concurrent access.
In the meantime, Maxime reported some spinlock recursion.
[ 15.351649] BUG: spinlock recursion on CPU#0, kworker/0:2/217 [ 15.357540] lock: init_mm+0x3c/0x420, .magic: dead4ead, .owner: kworker/0:2/217, .owner_cpu: 0 [ 15.366563] CPU: 0 PID: 217 Comm: kworker/0:2 Not tainted 5.15.0+ #523 [ 15.373350] Workqueue: events do_free_init [ 15.377615] Call Trace: [ 15.380232] [e4105ac0] [800946a4] do_raw_spin_lock+0xf8/0x120 (unreliable) [ 15.387340] [e4105ae0] [8001f4ec] change_page_attr+0x40/0x1d4 [ 15.393413] [e4105b10] [801424e0] __apply_to_page_range+0x164/0x310 [ 15.400009] [e4105b60] [80169620] free_pcp_prepare+0x1e4/0x4a0 [ 15.406045] [e4105ba0] [8016c5a0] free_unref_page+0x40/0x2b8 [ 15.411979] [e4105be0] [8018724c] kasan_depopulate_vmalloc_pte+0x6c/0x94 [ 15.418989] [e4105c00] [801424e0] __apply_to_page_range+0x164/0x310 [ 15.425451] [e4105c50] [80187834] kasan_release_vmalloc+0xbc/0x134 [ 15.431898] [e4105c70] [8015f7a8] __purge_vmap_area_lazy+0x4e4/0xdd8 [ 15.438560] [e4105d30] [80160d10] _vm_unmap_aliases.part.0+0x17c/0x24c [ 15.445283] [e4105d60] [801642d0] __vunmap+0x2f0/0x5c8 [ 15.450684] [e4105db0] [800e32d0] do_free_init+0x68/0x94 [ 15.456181] [e4105dd0] [8005d094] process_one_work+0x4bc/0x7b8 [ 15.462283] [e4105e90] [8005d614] worker_thread+0x284/0x6e8 [ 15.468227] [e4105f00] [8006aaec] kthread+0x1f0/0x210 [ 15.473489] [e4105f40] [80017148] ret_from_kernel_thread+0x14/0x1c
Remove the read / modify / write sequence to make the operation atomic and remove the spin_lock() in change_page_attr().
To do the operation atomically, we can't use pte modification helpers anymore. Because all platforms have different combination of bits, it is not easy to use those bits directly. But all have the _PAGE_KERNEL_{RO/ROX/RW/RWX} set of flags. All we need it to compare two sets to know which bits are set or cleared.
For instance, by comparing _PAGE_KERNEL_ROX and _PAGE_KERNEL_RO you know which bit gets cleared and which bit get set when changing exec permission.
Reported-by: Maxime Bizon mbizon@freebox.fr Signed-off-by: Christophe Leroy christophe.leroy@csgroup.eu Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/all/20211212112152.GA27070@sakura/ Link: https://lore.kernel.org/r/43c3c76a1175ae6dc1a3d3b5c3f7ecb48f683eea.164034401... Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/mm/pageattr.c | 32 +++++++++++++------------------- 1 file changed, 13 insertions(+), 19 deletions(-)
diff --git a/arch/powerpc/mm/pageattr.c b/arch/powerpc/mm/pageattr.c index 3bb9d168e3b3..85753e32a4de 100644 --- a/arch/powerpc/mm/pageattr.c +++ b/arch/powerpc/mm/pageattr.c @@ -15,12 +15,14 @@ #include <asm/pgtable.h>
+static pte_basic_t pte_update_delta(pte_t *ptep, unsigned long addr, + unsigned long old, unsigned long new) +{ + return pte_update(&init_mm, addr, ptep, old & ~new, new & ~old, 0); +} + /* - * Updates the attributes of a page in three steps: - * - * 1. take the page_table_lock - * 2. install the new entry with the updated attributes - * 3. flush the TLB + * Updates the attributes of a page atomically. * * This sequence is safe against concurrent updates, and also allows updating the * attributes of a page currently being executed or accessed. @@ -28,25 +30,21 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data) { long action = (long)data; - pte_t pte;
- spin_lock(&init_mm.page_table_lock); - - pte = ptep_get(ptep); - - /* modify the PTE bits as desired, then apply */ + /* modify the PTE bits as desired */ switch (action) { case SET_MEMORY_RO: - pte = pte_wrprotect(pte); + /* Don't clear DIRTY bit */ + pte_update_delta(ptep, addr, _PAGE_KERNEL_RW & ~_PAGE_DIRTY, _PAGE_KERNEL_RO); break; case SET_MEMORY_RW: - pte = pte_mkwrite(pte_mkdirty(pte)); + pte_update_delta(ptep, addr, _PAGE_KERNEL_RO, _PAGE_KERNEL_RW); break; case SET_MEMORY_NX: - pte = pte_exprotect(pte); + pte_update_delta(ptep, addr, _PAGE_KERNEL_ROX, _PAGE_KERNEL_RO); break; case SET_MEMORY_X: - pte = pte_mkexec(pte); + pte_update_delta(ptep, addr, _PAGE_KERNEL_RO, _PAGE_KERNEL_ROX); break; case SET_MEMORY_NP: pte_update(&init_mm, addr, ptep, _PAGE_PRESENT, 0, 0); @@ -59,16 +57,12 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data) break; }
- pte_update(&init_mm, addr, ptep, ~0UL, pte_val(pte), 0); - /* See ptesync comment in radix__set_pte_at() */ if (radix_enabled()) asm volatile("ptesync": : :"memory");
flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
- spin_unlock(&init_mm.page_table_lock); - return 0; }
From: Hans de Goede hdegoede@redhat.com
[ Upstream commit 5ac121b81b4051e7fc83d5b3456a5e499d5bd147 ]
The AXP288's recommended and factory default Vhold value (minimum input voltage below which the input current draw will be reduced) is 4.4V. This lines up with other charger IC's such as the TI bq2419x/bq2429x series which use 4.36V or 4.44V.
For some reason some BIOS-es initialize Vhold to 4.6V or even 4.7V which combined with the typical voltage drop over typically low wire gauge micro-USB cables leads to the input-current getting capped below 1A (with a 2A capable dedicated charger) based on Vhold.
This leads to slow charging, or even to the device slowly discharging if the device is in heavy use.
As the Linux AXP288 drivers use the builtin BC1.2 charger detection and send the input-current-limit according to the detected charger there really is no reason not to use the recommended 4.4V Vhold.
Set Vhold to 4.4V to fix the slow charging issue on various devices.
There is one exception, the special-case of the HP X2 2-in-1s which combine this BC1.2 capable PMIC with a Type-C port and a 5V/3A factory provided charger with a Type-C plug which does not do BC1.2. These have their input-current-limit hardcoded to 3A (like under Windows) and use a higher Vhold on purpose to limit the current when used with other chargers. To avoid touching Vhold on these HP X2 laptops the code setting Vhold is added to an else branch of the if checking for these models.
Note this also fixes the sofar unused VBUS_ISPOUT_VHOLD_SET_MASK define, which was wrong.
Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Sebastian Reichel sebastian.reichel@collabora.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/power/supply/axp288_charger.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/power/supply/axp288_charger.c b/drivers/power/supply/axp288_charger.c index b9553be9bed5..fb9db7f43895 100644 --- a/drivers/power/supply/axp288_charger.c +++ b/drivers/power/supply/axp288_charger.c @@ -41,11 +41,11 @@ #define VBUS_ISPOUT_CUR_LIM_1500MA 0x1 /* 1500mA */ #define VBUS_ISPOUT_CUR_LIM_2000MA 0x2 /* 2000mA */ #define VBUS_ISPOUT_CUR_NO_LIM 0x3 /* 2500mA */ -#define VBUS_ISPOUT_VHOLD_SET_MASK 0x31 +#define VBUS_ISPOUT_VHOLD_SET_MASK 0x38 #define VBUS_ISPOUT_VHOLD_SET_BIT_POS 0x3 #define VBUS_ISPOUT_VHOLD_SET_OFFSET 4000 /* 4000mV */ #define VBUS_ISPOUT_VHOLD_SET_LSB_RES 100 /* 100mV */ -#define VBUS_ISPOUT_VHOLD_SET_4300MV 0x3 /* 4300mV */ +#define VBUS_ISPOUT_VHOLD_SET_4400MV 0x4 /* 4400mV */ #define VBUS_ISPOUT_VBUS_PATH_DIS BIT(7)
#define CHRG_CCCV_CC_MASK 0xf /* 4 bits */ @@ -744,6 +744,16 @@ static int charger_init_hw_regs(struct axp288_chrg_info *info) ret = axp288_charger_vbus_path_select(info, true); if (ret < 0) return ret; + } else { + /* Set Vhold to the factory default / recommended 4.4V */ + val = VBUS_ISPOUT_VHOLD_SET_4400MV << VBUS_ISPOUT_VHOLD_SET_BIT_POS; + ret = regmap_update_bits(info->regmap, AXP20X_VBUS_IPSOUT_MGMT, + VBUS_ISPOUT_VHOLD_SET_MASK, val); + if (ret < 0) { + dev_err(&info->pdev->dev, "register(%x) write error(%d)\n", + AXP20X_VBUS_IPSOUT_MGMT, ret); + return ret; + } }
/* Read current charge voltage and current limit */
From: Maxim Mikityanskiy maximmi@nvidia.com
[ Upstream commit d08c6e2a4d0308a7922d7ef3b1b3af45d4096aad ]
Normally, the queues are disabled when the channels are deactivated, and enabled when the channels are activated. However, on register, the channels are not active, but the queues are enabled by default. This change fixes it, preventing mlx5e_xmit from running when the channels are deactivated in the beginning.
Signed-off-by: Maxim Mikityanskiy maximmi@nvidia.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index f075bb8ccd00..01301bee420c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -4914,6 +4914,7 @@ mlx5e_create_netdev(struct mlx5_core_dev *mdev, const struct mlx5e_profile *prof }
netif_carrier_off(netdev); + netif_tx_disable(netdev); dev_net_set(netdev, mlx5_core_net(mdev));
return netdev;
From: Hans de Goede hdegoede@redhat.com
[ Upstream commit e285cb403994419e997749c9a52b9370884ae0c8 ]
The quirk handling may need to set some different properties which means using a different swnode, move the setting of the swnode to inside dwc3_pci_quirks() so that the quirk handling can choose a different swnode.
Signed-off-by: Hans de Goede hdegoede@redhat.com Link: https://lore.kernel.org/r/20220213130524.18748-4-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/dwc3/dwc3-pci.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/drivers/usb/dwc3/dwc3-pci.c b/drivers/usb/dwc3/dwc3-pci.c index 06d0e88ec8af..4d9608cc55f7 100644 --- a/drivers/usb/dwc3/dwc3-pci.c +++ b/drivers/usb/dwc3/dwc3-pci.c @@ -185,7 +185,8 @@ static const struct software_node dwc3_pci_amd_mr_swnode = { .properties = dwc3_pci_mr_properties, };
-static int dwc3_pci_quirks(struct dwc3_pci *dwc) +static int dwc3_pci_quirks(struct dwc3_pci *dwc, + const struct software_node *swnode) { struct pci_dev *pdev = dwc->pci;
@@ -242,7 +243,7 @@ static int dwc3_pci_quirks(struct dwc3_pci *dwc) } }
- return 0; + return device_add_software_node(&dwc->dwc3->dev, swnode); }
#ifdef CONFIG_PM @@ -307,11 +308,7 @@ static int dwc3_pci_probe(struct pci_dev *pci, const struct pci_device_id *id) dwc->dwc3->dev.parent = dev; ACPI_COMPANION_SET(&dwc->dwc3->dev, ACPI_COMPANION(dev));
- ret = device_add_software_node(&dwc->dwc3->dev, (void *)id->driver_data); - if (ret < 0) - goto err; - - ret = dwc3_pci_quirks(dwc); + ret = dwc3_pci_quirks(dwc, (void *)id->driver_data); if (ret) goto err;
From: Ilan Peer ilan.peer@intel.com
[ Upstream commit d8d4dd26b9e0469baf5017f0544d852fd4e3fb6d ]
Currently, fragmented EBS was set for a channel only if the 'hb_type' was set to fragmented or balanced scan. However, 'hb_type' is set only in case of CDB, and thus fragmented EBS is never set for a channel for non-CDB devices. Fix it.
Signed-off-by: Ilan Peer ilan.peer@intel.com Signed-off-by: Luca Coelho luciano.coelho@intel.com Link: https://lore.kernel.org/r/iwlwifi.20220204122220.a6165ac9b9d5.I654eafa62fd64... Signed-off-by: Luca Coelho luciano.coelho@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/intel/iwlwifi/mvm/scan.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c index 5461bf399959..65e382756de6 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c @@ -1890,7 +1890,10 @@ static u8 iwl_mvm_scan_umac_chan_flags_v2(struct iwl_mvm *mvm, IWL_SCAN_CHANNEL_FLAG_CACHE_ADD;
/* set fragmented ebs for fragmented scan on HB channels */ - if (iwl_mvm_is_scan_fragmented(params->hb_type)) + if ((!iwl_mvm_is_cdb_supported(mvm) && + iwl_mvm_is_scan_fragmented(params->type)) || + (iwl_mvm_is_cdb_supported(mvm) && + iwl_mvm_is_scan_fragmented(params->hb_type))) flags |= IWL_SCAN_CHANNEL_FLAG_EBS_FRAG;
return flags;
From: Miri Korenblit miriam.rachel.korenblit@intel.com
[ Upstream commit e04135c07755d001b5cde61048c69a7cc84bb94b ]
During disassociation we're decreasing the phy's ref count. If the ref count becomes 0, we're configuring the phy ctxt to the default channel (the lowest channel which the device can operate on). Currently we're not checking whether the the default channel is enabled or not. Fix it by configuring the phy ctxt to the lowest channel which is enabled.
Signed-off-by: Miri Korenblit miriam.rachel.korenblit@intel.com Signed-off-by: Luca Coelho luciano.coelho@intel.com Link: https://lore.kernel.org/r/iwlwifi.20220210181930.03f281b6a6bc.I5b63d43ec4199... Signed-off-by: Luca Coelho luciano.coelho@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/wireless/intel/iwlwifi/mvm/phy-ctxt.c | 31 +++++++++++++------ 1 file changed, 22 insertions(+), 9 deletions(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/phy-ctxt.c b/drivers/net/wireless/intel/iwlwifi/mvm/phy-ctxt.c index 035336a9e755..6d82725cb87d 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/phy-ctxt.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/phy-ctxt.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause /* - * Copyright (C) 2012-2014, 2018-2021 Intel Corporation + * Copyright (C) 2012-2014, 2018-2022 Intel Corporation * Copyright (C) 2013-2014 Intel Mobile Communications GmbH * Copyright (C) 2017 Intel Deutschland GmbH */ @@ -295,18 +295,31 @@ void iwl_mvm_phy_ctxt_unref(struct iwl_mvm *mvm, struct iwl_mvm_phy_ctxt *ctxt) * otherwise we might not be able to reuse this phy. */ if (ctxt->ref == 0) { - struct ieee80211_channel *chan; + struct ieee80211_channel *chan = NULL; struct cfg80211_chan_def chandef; - struct ieee80211_supported_band *sband = NULL; - enum nl80211_band band = NL80211_BAND_2GHZ; + struct ieee80211_supported_band *sband; + enum nl80211_band band; + int channel;
- while (!sband && band < NUM_NL80211_BANDS) - sband = mvm->hw->wiphy->bands[band++]; + for (band = NL80211_BAND_2GHZ; band < NUM_NL80211_BANDS; band++) { + sband = mvm->hw->wiphy->bands[band];
- if (WARN_ON(!sband)) - return; + if (!sband) + continue; + + for (channel = 0; channel < sband->n_channels; channel++) + if (!(sband->channels[channel].flags & + IEEE80211_CHAN_DISABLED)) { + chan = &sband->channels[channel]; + break; + }
- chan = &sband->channels[0]; + if (chan) + break; + } + + if (WARN_ON(!chan)) + return;
cfg80211_chandef_create(&chandef, chan, NL80211_CHAN_NO_HT); iwl_mvm_phy_ctxt_changed(mvm, ctxt, &chandef, 1, 1);
From: Daniel Thompson daniel.thompson@linaro.org
[ Upstream commit 24b176d8827d167ac3b379317f60c0985f6e95aa ]
Quoting the header comments, IRQF_ONESHOT is "Used by threaded interrupts which need to keep the irq line disabled until the threaded handler has been run.". When applied to an interrupt that doesn't request a threaded irq then IRQF_ONESHOT has a lesser known (undocumented?) side effect, which it to disable the forced threading of irqs (and for "normal" kernels it is a nop). In this case I can find no evidence that suppressing forced threading is intentional. Had it been intentional then a driver must adopt the raw_spinlock API in order to avoid deadlocks on PREEMPT_RT kernels (and avoid calling any kernel API that uses regular spinlocks).
Fix this by removing the spurious additional flag.
This change is required for my Snapdragon 7cx Gen2 tablet to boot-to-GUI with PREEMPT_RT enabled.
Signed-off-by: Daniel Thompson daniel.thompson@linaro.org Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Link: https://lore.kernel.org/r/20220201174734.196718-2-daniel.thompson@linaro.org Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/dsi/dsi_host.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/msm/dsi/dsi_host.c b/drivers/gpu/drm/msm/dsi/dsi_host.c index dc85974c7897..eea679a52e86 100644 --- a/drivers/gpu/drm/msm/dsi/dsi_host.c +++ b/drivers/gpu/drm/msm/dsi/dsi_host.c @@ -1909,7 +1909,7 @@ int msm_dsi_host_init(struct msm_dsi *msm_dsi)
/* do not autoenable, will be enabled later */ ret = devm_request_irq(&pdev->dev, msm_host->irq, dsi_host_irq, - IRQF_TRIGGER_HIGH | IRQF_ONESHOT | IRQF_NO_AUTOEN, + IRQF_TRIGGER_HIGH | IRQF_NO_AUTOEN, "dsi_isr", msm_host); if (ret < 0) { dev_err(&pdev->dev, "failed to request IRQ%u: %d\n",
From: Ido Schimmel idosch@nvidia.com
[ Upstream commit 0c51e12e218f20b7d976158fdc18019627326f7a ]
In case user space sends a packet destined to a broadcast address when a matching broadcast route is not configured, the kernel will create a unicast neighbour entry that will never be resolved [1].
When the broadcast route is configured, the unicast neighbour entry will not be invalidated and continue to linger, resulting in packets being dropped.
Solve this by invalidating unresolved neighbour entries for broadcast addresses after routes for these addresses are internally configured by the kernel. This allows the kernel to create a broadcast neighbour entry following the next route lookup.
Another possible solution that is more generic but also more complex is to have the ARP code register a listener to the FIB notification chain and invalidate matching neighbour entries upon the addition of broadcast routes.
It is also possible to wave off the issue as a user space problem, but it seems a bit excessive to expect user space to be that intimately familiar with the inner workings of the FIB/neighbour kernel code.
[1] https://lore.kernel.org/netdev/55a04a8f-56f3-f73c-2aea-2195923f09d1@huawei.c...
Reported-by: Wang Hai wanghai38@huawei.com Signed-off-by: Ido Schimmel idosch@nvidia.com Tested-by: Wang Hai wanghai38@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/arp.h | 1 + net/ipv4/arp.c | 9 +++++++-- net/ipv4/fib_frontend.c | 5 ++++- 3 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/include/net/arp.h b/include/net/arp.h index 4950191f6b2b..4a23a97195f3 100644 --- a/include/net/arp.h +++ b/include/net/arp.h @@ -71,6 +71,7 @@ void arp_send(int type, int ptype, __be32 dest_ip, const unsigned char *src_hw, const unsigned char *th); int arp_mc_map(__be32 addr, u8 *haddr, struct net_device *dev, int dir); void arp_ifdown(struct net_device *dev); +int arp_invalidate(struct net_device *dev, __be32 ip, bool force);
struct sk_buff *arp_create(int type, int ptype, __be32 dest_ip, struct net_device *dev, __be32 src_ip, diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c index 922dd73e5740..83a47998c4b1 100644 --- a/net/ipv4/arp.c +++ b/net/ipv4/arp.c @@ -1116,13 +1116,18 @@ static int arp_req_get(struct arpreq *r, struct net_device *dev) return err; }
-static int arp_invalidate(struct net_device *dev, __be32 ip) +int arp_invalidate(struct net_device *dev, __be32 ip, bool force) { struct neighbour *neigh = neigh_lookup(&arp_tbl, &ip, dev); int err = -ENXIO; struct neigh_table *tbl = &arp_tbl;
if (neigh) { + if ((neigh->nud_state & NUD_VALID) && !force) { + neigh_release(neigh); + return 0; + } + if (neigh->nud_state & ~NUD_NOARP) err = neigh_update(neigh, NULL, NUD_FAILED, NEIGH_UPDATE_F_OVERRIDE| @@ -1169,7 +1174,7 @@ static int arp_req_delete(struct net *net, struct arpreq *r, if (!dev) return -EINVAL; } - return arp_invalidate(dev, ip); + return arp_invalidate(dev, ip, true); }
/* diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index 4d61ddd8a0ec..1eb7795edb9d 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -1112,9 +1112,11 @@ void fib_add_ifaddr(struct in_ifaddr *ifa) return;
/* Add broadcast address, if it is explicitly assigned. */ - if (ifa->ifa_broadcast && ifa->ifa_broadcast != htonl(0xFFFFFFFF)) + if (ifa->ifa_broadcast && ifa->ifa_broadcast != htonl(0xFFFFFFFF)) { fib_magic(RTM_NEWROUTE, RTN_BROADCAST, ifa->ifa_broadcast, 32, prim, 0); + arp_invalidate(dev, ifa->ifa_broadcast, false); + }
if (!ipv4_is_zeronet(prefix) && !(ifa->ifa_flags & IFA_F_SECONDARY) && (prefix != addr || ifa->ifa_prefixlen < 32)) { @@ -1128,6 +1130,7 @@ void fib_add_ifaddr(struct in_ifaddr *ifa) if (ifa->ifa_prefixlen < 31) { fib_magic(RTM_NEWROUTE, RTN_BROADCAST, prefix | ~mask, 32, prim, 0); + arp_invalidate(dev, prefix | ~mask, false); } } }
From: Jordy Zomer jordy@jordyzomer.github.io
[ Upstream commit cd9c88da171a62c4b0f1c70e50c75845969fbc18 ]
It appears like cmd could be a Spectre v1 gadget as it's supplied by a user and used as an array index. Prevent the contents of kernel memory from being leaked to userspace via speculative execution by using array_index_nospec.
Signed-off-by: Jordy Zomer jordy@pwning.systems Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/md/dm-ioctl.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c index 21fe8652b095..901abd6dea41 100644 --- a/drivers/md/dm-ioctl.c +++ b/drivers/md/dm-ioctl.c @@ -18,6 +18,7 @@ #include <linux/dm-ioctl.h> #include <linux/hdreg.h> #include <linux/compat.h> +#include <linux/nospec.h>
#include <linux/uaccess.h> #include <linux/ima.h> @@ -1788,6 +1789,7 @@ static ioctl_fn lookup_ioctl(unsigned int cmd, int *ioctl_flags) if (unlikely(cmd >= ARRAY_SIZE(_ioctls))) return NULL;
+ cmd = array_index_nospec(cmd, ARRAY_SIZE(_ioctls)); *ioctl_flags = _ioctls[cmd].flags; return _ioctls[cmd].fn; }
From: Mike Snitzer snitzer@redhat.com
[ Upstream commit fa247089de9936a46e290d4724cb5f0b845600f5 ]
Update both bio-based and request-based DM to requeue IO if the mapping table not available.
This race of IO being submitted before the DM device ready is so narrow, yet possible for initial table load given that the DM device's request_queue is created prior, that it best to requeue IO to handle this unlikely case.
Reported-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/md/dm-rq.c | 7 ++++++- drivers/md/dm.c | 11 +++-------- 2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c index a896dea9750e..53a9b16c7b2e 100644 --- a/drivers/md/dm-rq.c +++ b/drivers/md/dm-rq.c @@ -500,8 +500,13 @@ static blk_status_t dm_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
if (unlikely(!ti)) { int srcu_idx; - struct dm_table *map = dm_get_live_table(md, &srcu_idx); + struct dm_table *map;
+ map = dm_get_live_table(md, &srcu_idx); + if (unlikely(!map)) { + dm_put_live_table(md, srcu_idx); + return BLK_STS_RESOURCE; + } ti = dm_table_find_target(map, 0); dm_put_live_table(md, srcu_idx); } diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 443478a08857..36449422e7e0 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1594,15 +1594,10 @@ static blk_qc_t dm_submit_bio(struct bio *bio) struct dm_table *map;
map = dm_get_live_table(md, &srcu_idx); - if (unlikely(!map)) { - DMERR_LIMIT("%s: mapping table unavailable, erroring io", - dm_device_name(md)); - bio_io_error(bio); - goto out; - }
- /* If suspended, queue this IO for later */ - if (unlikely(test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags))) { + /* If suspended, or map not yet available, queue this IO for later */ + if (unlikely(test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) || + unlikely(!map)) { if (bio->bi_opf & REQ_NOWAIT) bio_wouldblock_error(bio); else if (bio->bi_opf & REQ_RAHEAD)
From: Alex Deucher alexander.deucher@amd.com
[ Upstream commit 9dff13f9edf755a15f6507874185a3290c1ae8bb ]
The driver has a fallback so make the message informational rather than a warning. The driver has a fallback if the Component Resource Association Table (CRAT) is missing, so make this informational now.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1906 Reviewed-by: Felix Kuehling Felix.Kuehling@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c index c33d689f29e8..e574aa32a111 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c @@ -1563,7 +1563,7 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t *size) /* Fetch the CRAT table from ACPI */ status = acpi_get_table(CRAT_SIGNATURE, 0, &crat_table); if (status == AE_NOT_FOUND) { - pr_warn("CRAT table not found\n"); + pr_info("CRAT table not found\n"); return -ENODATA; } else if (ACPI_FAILURE(status)) { const char *err = acpi_format_exception(status);
From: Alex Williamson alex.williamson@redhat.com
[ Upstream commit 6e031ec0e5a2dda53e12e0d2a7e9b15b47a3c502 ]
Resolve build errors reported against UML build for undefined ioport_map() and ioport_unmap() functions. Without this config option a device cannot have vfio_pci_core_device.has_vga set, so the existing function would always return -EINVAL anyway.
Reported-by: Geert Uytterhoeven geert@linux-m68k.org Link: https://lore.kernel.org/r/20220123125737.2658758-1-geert@linux-m68k.org Link: https://lore.kernel.org/r/164306582968.3758255.15192949639574660648.stgit@om... Signed-off-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vfio/pci/vfio_pci_rdwr.c | 2 ++ include/linux/vfio_pci_core.h | 9 +++++++++ 2 files changed, 11 insertions(+)
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c index 57d3b2cbbd8e..82ac1569deb0 100644 --- a/drivers/vfio/pci/vfio_pci_rdwr.c +++ b/drivers/vfio/pci/vfio_pci_rdwr.c @@ -288,6 +288,7 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf, return done; }
+#ifdef CONFIG_VFIO_PCI_VGA ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, char __user *buf, size_t count, loff_t *ppos, bool iswrite) { @@ -355,6 +356,7 @@ ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, char __user *buf,
return done; } +#endif
static void vfio_pci_ioeventfd_do_write(struct vfio_pci_ioeventfd *ioeventfd, bool test_mem) diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h index ef9a44b6cf5d..ae6f4838ab75 100644 --- a/include/linux/vfio_pci_core.h +++ b/include/linux/vfio_pci_core.h @@ -159,8 +159,17 @@ extern ssize_t vfio_pci_config_rw(struct vfio_pci_core_device *vdev, extern ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf, size_t count, loff_t *ppos, bool iswrite);
+#ifdef CONFIG_VFIO_PCI_VGA extern ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, char __user *buf, size_t count, loff_t *ppos, bool iswrite); +#else +static inline ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, + char __user *buf, size_t count, + loff_t *ppos, bool iswrite) +{ + return -EINVAL; +} +#endif
extern long vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset, uint64_t data, int count, int fd);
From: Damien Le Moal damien.lemoal@opensource.wdc.com
[ Upstream commit 3762d8f6edcdb03994c919f9487fd6d336c06561 ]
The declaration of the local variable destination1 in pm80xx_pci_mem_copy() as a pointer to a u32 results in the sparse warning:
warning: incorrect type in assignment (different base types) expected unsigned int [usertype] got restricted __le32 [usertype]
Furthermore, the destination" argument of pm80xx_pci_mem_copy() is wrongly declared with the const attribute.
Fix both problems by changing the type of the "destination" argument to "__le32 *" and use this argument directly inside the pm80xx_pci_mem_copy() function, thus removing the need for the destination1 local variable.
Link: https://lore.kernel.org/r/20220220031810.738362-6-damien.lemoal@opensource.w... Reviewed-by: Jack Wang jinpu.wang@ionos.com Signed-off-by: Damien Le Moal damien.lemoal@opensource.wdc.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/pm8001/pm80xx_hwi.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/drivers/scsi/pm8001/pm80xx_hwi.c b/drivers/scsi/pm8001/pm80xx_hwi.c index b0a108e1a3d9..df140eca341f 100644 --- a/drivers/scsi/pm8001/pm80xx_hwi.c +++ b/drivers/scsi/pm8001/pm80xx_hwi.c @@ -66,18 +66,16 @@ int pm80xx_bar4_shift(struct pm8001_hba_info *pm8001_ha, u32 shift_value) }
static void pm80xx_pci_mem_copy(struct pm8001_hba_info *pm8001_ha, u32 soffset, - const void *destination, + __le32 *destination, u32 dw_count, u32 bus_base_number) { u32 index, value, offset; - u32 *destination1; - destination1 = (u32 *)destination;
- for (index = 0; index < dw_count; index += 4, destination1++) { + for (index = 0; index < dw_count; index += 4, destination++) { offset = (soffset + index); if (offset < (64 * 1024)) { value = pm8001_cr32(pm8001_ha, bus_base_number, offset); - *destination1 = cpu_to_le32(value); + *destination = cpu_to_le32(value); } } return;
From: Damien Le Moal damien.lemoal@opensource.wdc.com
[ Upstream commit 7e6b7e740addcea450041b5be8e42f0a4ceece0f ]
The call to pm8001_ccb_task_free() at the end of pm8001_mpi_task_abort_resp() already frees the ccb tag. So when the device NCQ_ABORT_ALL_FLAG is set, the tag should not be freed again. Also change the hardcoded 0xBFFFFFFF value to ~NCQ_ABORT_ALL_FLAG as it ought to be.
Link: https://lore.kernel.org/r/20220220031810.738362-19-damien.lemoal@opensource.... Reviewed-by: Jack Wang jinpu.wang@ionos.com Signed-off-by: Damien Le Moal damien.lemoal@opensource.wdc.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/pm8001/pm8001_hwi.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/scsi/pm8001/pm8001_hwi.c b/drivers/scsi/pm8001/pm8001_hwi.c index bed06ed0f1cb..9e6ffc37d6b0 100644 --- a/drivers/scsi/pm8001/pm8001_hwi.c +++ b/drivers/scsi/pm8001/pm8001_hwi.c @@ -3713,12 +3713,11 @@ int pm8001_mpi_task_abort_resp(struct pm8001_hba_info *pm8001_ha, void *piomb) mb();
if (pm8001_dev->id & NCQ_ABORT_ALL_FLAG) { - pm8001_tag_free(pm8001_ha, tag); sas_free_task(t); - /* clear the flag */ - pm8001_dev->id &= 0xBFFFFFFF; - } else + pm8001_dev->id &= ~NCQ_ABORT_ALL_FLAG; + } else { t->task_done(t); + }
return 0; }
From: Damien Le Moal damien.lemoal@opensource.wdc.com
[ Upstream commit f90a74892f3acf0cdec5844e90fc8686ca13e7d7 ]
In pm8001_send_abort_all(), make sure to free the allocated sas task if pm8001_tag_alloc() or pm8001_mpi_build_cmd() fail.
Link: https://lore.kernel.org/r/20220220031810.738362-21-damien.lemoal@opensource.... Reviewed-by: John Garry john.garry@huawei.com Signed-off-by: Damien Le Moal damien.lemoal@opensource.wdc.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/pm8001/pm8001_hwi.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/scsi/pm8001/pm8001_hwi.c b/drivers/scsi/pm8001/pm8001_hwi.c index 9e6ffc37d6b0..2adf1435a187 100644 --- a/drivers/scsi/pm8001/pm8001_hwi.c +++ b/drivers/scsi/pm8001/pm8001_hwi.c @@ -1767,7 +1767,6 @@ static void pm8001_send_abort_all(struct pm8001_hba_info *pm8001_ha, }
task = sas_alloc_slow_task(GFP_ATOMIC); - if (!task) { pm8001_dbg(pm8001_ha, FAIL, "cannot allocate task\n"); return; @@ -1776,8 +1775,10 @@ static void pm8001_send_abort_all(struct pm8001_hba_info *pm8001_ha, task->task_done = pm8001_task_done;
res = pm8001_tag_alloc(pm8001_ha, &ccb_tag); - if (res) + if (res) { + sas_free_task(task); return; + }
ccb = &pm8001_ha->ccb_info[ccb_tag]; ccb->device = pm8001_ha_dev; @@ -1794,8 +1795,10 @@ static void pm8001_send_abort_all(struct pm8001_hba_info *pm8001_ha,
ret = pm8001_mpi_build_cmd(pm8001_ha, circularQ, opc, &task_abort, sizeof(task_abort), 0); - if (ret) + if (ret) { + sas_free_task(task); pm8001_tag_free(pm8001_ha, ccb_tag); + }
}
From: Damien Le Moal damien.lemoal@opensource.wdc.com
[ Upstream commit 4c8f04b1905cd4b776d0b720463c091545478ef7 ]
In pm8001_chip_set_dev_state_req(), pm8001_chip_fw_flash_update_req(), pm80xx_chip_phy_ctl_req() and pm8001_chip_reg_dev_req() add missing calls to pm8001_tag_free() to free the allocated tag when pm8001_mpi_build_cmd() fails.
Similarly, in pm8001_exec_internal_task_abort(), if the chip ->task_abort method fails, the tag allocated for the abort request task must be freed. Add the missing call to pm8001_tag_free().
Link: https://lore.kernel.org/r/20220220031810.738362-22-damien.lemoal@opensource.... Reviewed-by: John Garry john.garry@huawei.com Signed-off-by: Damien Le Moal damien.lemoal@opensource.wdc.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/pm8001/pm8001_hwi.c | 9 +++++++++ drivers/scsi/pm8001/pm8001_sas.c | 2 +- drivers/scsi/pm8001/pm80xx_hwi.c | 9 +++++++-- 3 files changed, 17 insertions(+), 3 deletions(-)
diff --git a/drivers/scsi/pm8001/pm8001_hwi.c b/drivers/scsi/pm8001/pm8001_hwi.c index 2adf1435a187..619fbcf37933 100644 --- a/drivers/scsi/pm8001/pm8001_hwi.c +++ b/drivers/scsi/pm8001/pm8001_hwi.c @@ -4488,6 +4488,9 @@ static int pm8001_chip_reg_dev_req(struct pm8001_hba_info *pm8001_ha, SAS_ADDR_SIZE); rc = pm8001_mpi_build_cmd(pm8001_ha, circularQ, opc, &payload, sizeof(payload), 0); + if (rc) + pm8001_tag_free(pm8001_ha, tag); + return rc; }
@@ -4900,6 +4903,9 @@ pm8001_chip_fw_flash_update_req(struct pm8001_hba_info *pm8001_ha, ccb->ccb_tag = tag; rc = pm8001_chip_fw_flash_update_build(pm8001_ha, &flash_update_info, tag); + if (rc) + pm8001_tag_free(pm8001_ha, tag); + return rc; }
@@ -5004,6 +5010,9 @@ pm8001_chip_set_dev_state_req(struct pm8001_hba_info *pm8001_ha, payload.nds = cpu_to_le32(state); rc = pm8001_mpi_build_cmd(pm8001_ha, circularQ, opc, &payload, sizeof(payload), 0); + if (rc) + pm8001_tag_free(pm8001_ha, tag); + return rc;
} diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c index 491cecbbe1aa..5fb08acbc0e5 100644 --- a/drivers/scsi/pm8001/pm8001_sas.c +++ b/drivers/scsi/pm8001/pm8001_sas.c @@ -831,10 +831,10 @@ pm8001_exec_internal_task_abort(struct pm8001_hba_info *pm8001_ha,
res = PM8001_CHIP_DISP->task_abort(pm8001_ha, pm8001_dev, flag, task_tag, ccb_tag); - if (res) { del_timer(&task->slow_task->timer); pm8001_dbg(pm8001_ha, FAIL, "Executing internal task failed\n"); + pm8001_tag_free(pm8001_ha, ccb_tag); goto ex_err; } wait_for_completion(&task->slow_task->completion); diff --git a/drivers/scsi/pm8001/pm80xx_hwi.c b/drivers/scsi/pm8001/pm80xx_hwi.c index df140eca341f..5561057109de 100644 --- a/drivers/scsi/pm8001/pm80xx_hwi.c +++ b/drivers/scsi/pm8001/pm80xx_hwi.c @@ -4920,8 +4920,13 @@ static int pm80xx_chip_phy_ctl_req(struct pm8001_hba_info *pm8001_ha, payload.tag = cpu_to_le32(tag); payload.phyop_phyid = cpu_to_le32(((phy_op & 0xFF) << 8) | (phyId & 0xFF)); - return pm8001_mpi_build_cmd(pm8001_ha, circularQ, opc, &payload, - sizeof(payload), 0); + + rc = pm8001_mpi_build_cmd(pm8001_ha, circularQ, opc, &payload, + sizeof(payload), 0); + if (rc) + pm8001_tag_free(pm8001_ha, tag); + + return rc; }
static u32 pm80xx_chip_is_our_interrupt(struct pm8001_hba_info *pm8001_ha)
From: Damien Le Moal damien.lemoal@opensource.wdc.com
[ Upstream commit f792a3629f4c4aa4c3703d66b43ce1edcc3ec09a ]
In pm8001_chip_fw_flash_update_build(), if pm8001_chip_fw_flash_update_build() fails, the struct fw_control_ex allocated must be freed.
Link: https://lore.kernel.org/r/20220220031810.738362-23-damien.lemoal@opensource.... Reviewed-by: Jack Wang jinpu.wang@ionos.com Signed-off-by: Damien Le Moal damien.lemoal@opensource.wdc.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/pm8001/pm8001_hwi.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/pm8001/pm8001_hwi.c b/drivers/scsi/pm8001/pm8001_hwi.c index 619fbcf37933..32fc450bf84b 100644 --- a/drivers/scsi/pm8001/pm8001_hwi.c +++ b/drivers/scsi/pm8001/pm8001_hwi.c @@ -4903,8 +4903,10 @@ pm8001_chip_fw_flash_update_req(struct pm8001_hba_info *pm8001_ha, ccb->ccb_tag = tag; rc = pm8001_chip_fw_flash_update_build(pm8001_ha, &flash_update_info, tag); - if (rc) + if (rc) { + kfree(fw_control_context); pm8001_tag_free(pm8001_ha, tag); + }
return rc; }
From: Johan Almbladh johan.almbladh@anyfinetworks.com
[ Upstream commit 28225a6ef80ebf46c46e5fbd5b1ee231a0b2b5b7 ]
Before, the hardware would be allowed to transmit injected 802.11 MPDUs as A-MSDU. This resulted in corrupted frames being transmitted. Now, injected MPDUs are transmitted as-is, without A-MSDU.
The fix was verified with frame injection on MT7915 hardware, both with and without the injected frame being encrypted.
If the hardware cannot do A-MSDU aggregation on MPDUs, this problem would also be present in the TX path where mac80211 does the 802.11 encapsulation. However, I have not observed any such problem when disabling IEEE80211_HW_SUPPORTS_TX_ENCAP_OFFLOAD to force that mode. Therefore this fix is isolated to injected frames only.
The same A-MSDU logic is also present in the mt7921 driver, so it is likely that this fix should be applied there too. I do not have access to mt7921 hardware so I have not been able to test that.
Signed-off-by: Johan Almbladh johan.almbladh@anyfinetworks.com Signed-off-by: Felix Fietkau nbd@nbd.name Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/mediatek/mt76/mt7915/mac.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/wireless/mediatek/mt76/mt7915/mac.c b/drivers/net/wireless/mediatek/mt76/mt7915/mac.c index ff613d705611..7691292526e0 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7915/mac.c +++ b/drivers/net/wireless/mediatek/mt76/mt7915/mac.c @@ -899,6 +899,7 @@ mt7915_mac_write_txwi_80211(struct mt7915_dev *dev, __le32 *txwi, val = MT_TXD3_SN_VALID | FIELD_PREP(MT_TXD3_SEQ, IEEE80211_SEQ_TO_SN(seqno)); txwi[3] |= cpu_to_le32(val); + txwi[7] &= ~cpu_to_le32(MT_TXD7_HW_AMSDU); }
val = FIELD_PREP(MT_TXD7_TYPE, fc_type) |
From: Nicholas Piggin npiggin@gmail.com
[ Upstream commit 8b91cee5eadd2021f55e6775f2d50bd56d00c217 ]
Hash faults are not resoved in NMI context, instead causing the access to fail. This is done because perf interrupts can get backtraces including walking the user stack, and taking a hash fault on those could deadlock on the HPTE lock if the perf interrupt hits while the same HPTE lock is being held by the hash fault code. The user-access for the stack walking will notice the access failed and deal with that in the perf code.
The reason to allow perf interrupts in is to better profile hash faults.
The problem with this is any hash fault on a kernel access that happens in NMI context will crash, because kernel accesses must not fail.
Hard lockups, system reset, machine checks that access vmalloc space including modules and including stack backtracing and symbol lookup in modules, per-cpu data, etc could all run into this problem.
Fix this by disallowing perf interrupts in the hash fault code (the direct hash fault is covered by MSR[EE]=0 so the PMI disable just needs to extend to the preload case). This simplifies the tricky logic in hash faults and perf, at the cost of reduced profiling of hash faults.
perf can still latch addresses when interrupts are disabled, it just won't get the stack trace at that point, so it would still find hot spots, just sometimes with confusing stack chains.
An alternative could be to allow perf interrupts here but always do the slowpath stack walk if we are in nmi context, but that slows down all perf interrupt stack walking on hash though and it does not remove as much tricky code.
Reported-by: Laurent Dufour ldufour@linux.ibm.com Signed-off-by: Nicholas Piggin npiggin@gmail.com Tested-by: Laurent Dufour ldufour@linux.ibm.com Reviewed-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20220204035348.545435-1-npiggin@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/include/asm/interrupt.h | 2 +- arch/powerpc/mm/book3s64/hash_utils.c | 54 ++++----------------------- arch/powerpc/perf/callchain.h | 9 +---- arch/powerpc/perf/callchain_64.c | 27 -------------- 4 files changed, 10 insertions(+), 82 deletions(-)
diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h index a1d238255f07..a07960066b5f 100644 --- a/arch/powerpc/include/asm/interrupt.h +++ b/arch/powerpc/include/asm/interrupt.h @@ -567,7 +567,7 @@ DECLARE_INTERRUPT_HANDLER_RAW(do_slb_fault); DECLARE_INTERRUPT_HANDLER(do_bad_slb_fault);
/* hash_utils.c */ -DECLARE_INTERRUPT_HANDLER_RAW(do_hash_fault); +DECLARE_INTERRUPT_HANDLER(do_hash_fault);
/* fault.c */ DECLARE_INTERRUPT_HANDLER(do_page_fault); diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c index c145776d3ae5..7bfd88c4b547 100644 --- a/arch/powerpc/mm/book3s64/hash_utils.c +++ b/arch/powerpc/mm/book3s64/hash_utils.c @@ -1522,8 +1522,7 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap, } EXPORT_SYMBOL_GPL(hash_page);
-DECLARE_INTERRUPT_HANDLER(__do_hash_fault); -DEFINE_INTERRUPT_HANDLER(__do_hash_fault) +DEFINE_INTERRUPT_HANDLER(do_hash_fault) { unsigned long ea = regs->dar; unsigned long dsisr = regs->dsisr; @@ -1582,35 +1581,6 @@ DEFINE_INTERRUPT_HANDLER(__do_hash_fault) } }
-/* - * The _RAW interrupt entry checks for the in_nmi() case before - * running the full handler. - */ -DEFINE_INTERRUPT_HANDLER_RAW(do_hash_fault) -{ - /* - * If we are in an "NMI" (e.g., an interrupt when soft-disabled), then - * don't call hash_page, just fail the fault. This is required to - * prevent re-entrancy problems in the hash code, namely perf - * interrupts hitting while something holds H_PAGE_BUSY, and taking a - * hash fault. See the comment in hash_preload(). - * - * We come here as a result of a DSI at a point where we don't want - * to call hash_page, such as when we are accessing memory (possibly - * user memory) inside a PMU interrupt that occurred while interrupts - * were soft-disabled. We want to invoke the exception handler for - * the access, or panic if there isn't a handler. - */ - if (unlikely(in_nmi())) { - do_bad_page_fault_segv(regs); - return 0; - } - - __do_hash_fault(regs); - - return 0; -} - #ifdef CONFIG_PPC_MM_SLICES static bool should_hash_preload(struct mm_struct *mm, unsigned long ea) { @@ -1677,26 +1647,18 @@ static void hash_preload(struct mm_struct *mm, pte_t *ptep, unsigned long ea, #endif /* CONFIG_PPC_64K_PAGES */
/* - * __hash_page_* must run with interrupts off, as it sets the - * H_PAGE_BUSY bit. It's possible for perf interrupts to hit at any - * time and may take a hash fault reading the user stack, see - * read_user_stack_slow() in the powerpc/perf code. - * - * If that takes a hash fault on the same page as we lock here, it - * will bail out when seeing H_PAGE_BUSY set, and retry the access - * leading to an infinite loop. + * __hash_page_* must run with interrupts off, including PMI interrupts + * off, as it sets the H_PAGE_BUSY bit. * - * Disabling interrupts here does not prevent perf interrupts, but it - * will prevent them taking hash faults (see the NMI test in - * do_hash_page), then read_user_stack's copy_from_user_nofault will - * fail and perf will fall back to read_user_stack_slow(), which - * walks the Linux page tables. + * It's otherwise possible for perf interrupts to hit at any time and + * may take a hash fault reading the user stack, which could take a + * hash miss and deadlock on the same H_PAGE_BUSY bit. * * Interrupts must also be off for the duration of the * mm_is_thread_local test and update, to prevent preempt running the * mm on another CPU (XXX: this may be racy vs kthread_use_mm). */ - local_irq_save(flags); + powerpc_local_irq_pmu_save(flags);
/* Is that local to this CPU ? */ if (mm_is_thread_local(mm)) @@ -1721,7 +1683,7 @@ static void hash_preload(struct mm_struct *mm, pte_t *ptep, unsigned long ea, mm_ctx_user_psize(&mm->context), pte_val(*ptep));
- local_irq_restore(flags); + powerpc_local_irq_pmu_restore(flags); }
/* diff --git a/arch/powerpc/perf/callchain.h b/arch/powerpc/perf/callchain.h index d6fa6e25234f..19a8d051ddf1 100644 --- a/arch/powerpc/perf/callchain.h +++ b/arch/powerpc/perf/callchain.h @@ -2,7 +2,6 @@ #ifndef _POWERPC_PERF_CALLCHAIN_H #define _POWERPC_PERF_CALLCHAIN_H
-int read_user_stack_slow(const void __user *ptr, void *buf, int nb); void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs); void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry, @@ -26,17 +25,11 @@ static inline int __read_user_stack(const void __user *ptr, void *ret, size_t size) { unsigned long addr = (unsigned long)ptr; - int rc;
if (addr > TASK_SIZE - size || (addr & (size - 1))) return -EFAULT;
- rc = copy_from_user_nofault(ret, ptr, size); - - if (IS_ENABLED(CONFIG_PPC64) && !radix_enabled() && rc) - return read_user_stack_slow(ptr, ret, size); - - return rc; + return copy_from_user_nofault(ret, ptr, size); }
#endif /* _POWERPC_PERF_CALLCHAIN_H */ diff --git a/arch/powerpc/perf/callchain_64.c b/arch/powerpc/perf/callchain_64.c index 8d0df4226328..488e8a21a11e 100644 --- a/arch/powerpc/perf/callchain_64.c +++ b/arch/powerpc/perf/callchain_64.c @@ -18,33 +18,6 @@
#include "callchain.h"
-/* - * On 64-bit we don't want to invoke hash_page on user addresses from - * interrupt context, so if the access faults, we read the page tables - * to find which page (if any) is mapped and access it directly. Radix - * has no need for this so it doesn't use read_user_stack_slow. - */ -int read_user_stack_slow(const void __user *ptr, void *buf, int nb) -{ - - unsigned long addr = (unsigned long) ptr; - unsigned long offset; - struct page *page; - void *kaddr; - - if (get_user_page_fast_only(addr, FOLL_WRITE, &page)) { - kaddr = page_address(page); - - /* align address to page boundary */ - offset = addr & ~PAGE_MASK; - - memcpy(buf, kaddr + offset, nb); - put_page(page); - return 0; - } - return -EFAULT; -} - static int read_user_stack_64(const unsigned long __user *ptr, unsigned long *ret) { return __read_user_stack(ptr, ret, sizeof(*ret));
From: Yang Li yang.lee@linux.alibaba.com
[ Upstream commit 9273ffcc9a11942bd586bb42584337ef3962b692 ]
Smatch reports the following: drivers/net/wireless/mediatek/mt76/mt7615/mac.c:1865 mt7615_mac_adjust_sensitivity() warn: assigning (-110) to unsigned variable 'def_th' drivers/net/wireless/mediatek/mt76/mt7615/mac.c:1865 mt7615_mac_adjust_sensitivity() warn: assigning (-98) to unsigned variable 'def_th'
Reported-by: Abaci Robot abaci@linux.alibaba.com Signed-off-by: Yang Li yang.lee@linux.alibaba.com Signed-off-by: Felix Fietkau nbd@nbd.name Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/mediatek/mt76/mt7615/mac.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/mediatek/mt76/mt7615/mac.c b/drivers/net/wireless/mediatek/mt76/mt7615/mac.c index eb7bda91f2b3..8f4a5d4929e0 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7615/mac.c +++ b/drivers/net/wireless/mediatek/mt76/mt7615/mac.c @@ -1732,7 +1732,7 @@ mt7615_mac_adjust_sensitivity(struct mt7615_phy *phy, struct mt7615_dev *dev = phy->dev; int false_cca = ofdm ? phy->false_cca_ofdm : phy->false_cca_cck; bool ext_phy = phy != &dev->phy; - u16 def_th = ofdm ? -98 : -110; + s16 def_th = ofdm ? -98 : -110; bool update = false; s8 *sensitivity; int signal;
From: Randy Dunlap rdunlap@infradead.org
[ Upstream commit cc8294ec4738d25e2bb2d71f7d82a9bf7f4a157b ]
__setup() handlers should return 1 if the command line option is handled and 0 if not (or maybe never return 0; doing so just pollutes init's environment with strings that are not init arguments/parameters).
Return 1 from aha152x_setup() to indicate that the boot option has been handled.
Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru Link: https://lore.kernel.org/r/20220223000623.5920-1-rdunlap@infradead.org Cc: "Juergen E. Fischer" fischer@norbit.de Cc: "James E.J. Bottomley" jejb@linux.ibm.com Cc: "Martin K. Petersen" martin.petersen@oracle.com Reported-by: Igor Zhbanov i.zhbanov@omprussia.ru Signed-off-by: Randy Dunlap rdunlap@infradead.org Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/aha152x.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/scsi/aha152x.c b/drivers/scsi/aha152x.c index b13b5c85f3de..75a5a4765f42 100644 --- a/drivers/scsi/aha152x.c +++ b/drivers/scsi/aha152x.c @@ -3370,13 +3370,11 @@ static int __init aha152x_setup(char *str) setup[setup_count].synchronous = ints[0] >= 6 ? ints[6] : 1; setup[setup_count].delay = ints[0] >= 7 ? ints[7] : DELAY_DEFAULT; setup[setup_count].ext_trans = ints[0] >= 8 ? ints[8] : 0; - if (ints[0] > 8) { /*}*/ + if (ints[0] > 8) printk(KERN_NOTICE "aha152x: usage: aha152x=<IOBASE>[,<IRQ>[,<SCSI ID>" "[,<RECONNECT>[,<PARITY>[,<SYNCHRONOUS>[,<DELAY>[,<EXT_TRANS>]]]]]]]\n"); - } else { + else setup_count++; - return 0; - }
return 1; }
From: Qi Liu liuqi115@huawei.com
[ Upstream commit 554fb72ee34f4732c7f694f56c3c6e67790352a0 ]
If the driver probe fails to request the channel IRQ or fatal IRQ, the driver will free the IRQ vectors before freeing the IRQs in free_irq(), and this will cause a kernel BUG like this:
------------[ cut here ]------------ kernel BUG at drivers/pci/msi.c:369! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP Call trace: free_msi_irqs+0x118/0x13c pci_disable_msi+0xfc/0x120 pci_free_irq_vectors+0x24/0x3c hisi_sas_v3_probe+0x360/0x9d0 [hisi_sas_v3_hw] local_pci_probe+0x44/0xb0 work_for_cpu_fn+0x20/0x34 process_one_work+0x1d0/0x340 worker_thread+0x2e0/0x460 kthread+0x180/0x190 ret_from_fork+0x10/0x20 ---[ end trace b88990335b610c11 ]---
So we use devm_add_action() to control the order in which we free the vectors.
Link: https://lore.kernel.org/r/1645703489-87194-4-git-send-email-john.garry@huawe... Signed-off-by: Qi Liu liuqi115@huawei.com Signed-off-by: John Garry john.garry@huawei.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c index 1942970f9eb7..6010acae4cf3 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c @@ -2392,17 +2392,25 @@ static irqreturn_t cq_interrupt_v3_hw(int irq_no, void *p) return IRQ_WAKE_THREAD; }
+static void hisi_sas_v3_free_vectors(void *data) +{ + struct pci_dev *pdev = data; + + pci_free_irq_vectors(pdev); +} + static int interrupt_preinit_v3_hw(struct hisi_hba *hisi_hba) { int vectors; int max_msi = HISI_SAS_MSI_COUNT_V3_HW, min_msi; struct Scsi_Host *shost = hisi_hba->shost; + struct pci_dev *pdev = hisi_hba->pci_dev; struct irq_affinity desc = { .pre_vectors = BASE_VECTORS_V3_HW, };
min_msi = MIN_AFFINE_VECTORS_V3_HW; - vectors = pci_alloc_irq_vectors_affinity(hisi_hba->pci_dev, + vectors = pci_alloc_irq_vectors_affinity(pdev, min_msi, max_msi, PCI_IRQ_MSI | PCI_IRQ_AFFINITY, @@ -2414,6 +2422,7 @@ static int interrupt_preinit_v3_hw(struct hisi_hba *hisi_hba) hisi_hba->cq_nvecs = vectors - BASE_VECTORS_V3_HW; shost->nr_hw_queues = hisi_hba->cq_nvecs;
+ devm_add_action(&pdev->dev, hisi_sas_v3_free_vectors, pdev); return 0; }
@@ -4763,7 +4772,7 @@ hisi_sas_v3_probe(struct pci_dev *pdev, const struct pci_device_id *id) dev_err(dev, "%d hw queues\n", shost->nr_hw_queues); rc = scsi_add_host(shost, dev); if (rc) - goto err_out_free_irq_vectors; + goto err_out_debugfs;
rc = sas_register_ha(sha); if (rc) @@ -4792,8 +4801,6 @@ hisi_sas_v3_probe(struct pci_dev *pdev, const struct pci_device_id *id) sas_unregister_ha(sha); err_out_register_ha: scsi_remove_host(shost); -err_out_free_irq_vectors: - pci_free_irq_vectors(pdev); err_out_debugfs: debugfs_exit_v3_hw(hisi_hba); err_out_ha: @@ -4821,7 +4828,6 @@ hisi_sas_v3_destroy_irqs(struct pci_dev *pdev, struct hisi_hba *hisi_hba)
devm_free_irq(&pdev->dev, pci_irq_vector(pdev, nr), cq); } - pci_free_irq_vectors(pdev); }
static void hisi_sas_v3_remove(struct pci_dev *pdev)
From: Xiang Chen chenxiang66@hisilicon.com
[ Upstream commit 286ce4c65fbdf5eb9d4d5f4e4997c4e32bf1b073 ]
Add a file operation for "cnt" file under bist directory, so users can only read "cnt" or clear "cnt" to zero, but cannot randomly modify.
Link: https://lore.kernel.org/r/1645703489-87194-6-git-send-email-john.garry@huawe... Signed-off-by: Xiang Chen chenxiang66@hisilicon.com Signed-off-by: Qi Liu liuqi115@huawei.com Signed-off-by: John Garry john.garry@huawei.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 52 +++++++++++++++++++++++++- 1 file changed, 50 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c index 6010acae4cf3..1f5e0688c0c8 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c @@ -3968,6 +3968,54 @@ static const struct file_operations debugfs_bist_phy_v3_hw_fops = { .owner = THIS_MODULE, };
+static ssize_t debugfs_bist_cnt_v3_hw_write(struct file *filp, + const char __user *buf, + size_t count, loff_t *ppos) +{ + struct seq_file *m = filp->private_data; + struct hisi_hba *hisi_hba = m->private; + unsigned int cnt; + int val; + + if (hisi_hba->debugfs_bist_enable) + return -EPERM; + + val = kstrtouint_from_user(buf, count, 0, &cnt); + if (val) + return val; + + if (cnt) + return -EINVAL; + + hisi_hba->debugfs_bist_cnt = 0; + return count; +} + +static int debugfs_bist_cnt_v3_hw_show(struct seq_file *s, void *p) +{ + struct hisi_hba *hisi_hba = s->private; + + seq_printf(s, "%u\n", hisi_hba->debugfs_bist_cnt); + + return 0; +} + +static int debugfs_bist_cnt_v3_hw_open(struct inode *inode, + struct file *filp) +{ + return single_open(filp, debugfs_bist_cnt_v3_hw_show, + inode->i_private); +} + +static const struct file_operations debugfs_bist_cnt_v3_hw_ops = { + .open = debugfs_bist_cnt_v3_hw_open, + .read = seq_read, + .write = debugfs_bist_cnt_v3_hw_write, + .llseek = seq_lseek, + .release = single_release, + .owner = THIS_MODULE, +}; + static const struct { int value; char *name; @@ -4605,8 +4653,8 @@ static void debugfs_bist_init_v3_hw(struct hisi_hba *hisi_hba) debugfs_create_file("phy_id", 0600, hisi_hba->debugfs_bist_dentry, hisi_hba, &debugfs_bist_phy_v3_hw_fops);
- debugfs_create_u32("cnt", 0600, hisi_hba->debugfs_bist_dentry, - &hisi_hba->debugfs_bist_cnt); + debugfs_create_file("cnt", 0600, hisi_hba->debugfs_bist_dentry, + hisi_hba, &debugfs_bist_cnt_v3_hw_ops);
debugfs_create_file("loopback_mode", 0600, hisi_hba->debugfs_bist_dentry,
From: Dust Li dust.li@linux.alibaba.com
[ Upstream commit 6bf536eb5c8ca011d1ff57b5c5f7c57ceac06a37 ]
rmbe_update_limit is used to limit announcing receive window updating too frequently. RFC7609 request a minimal increase in the window size of 10% of the receive buffer space. But current implementation used:
min_t(int, rmbe_size / 10, SOCK_MIN_SNDBUF / 2)
and SOCK_MIN_SNDBUF / 2 == 2304 Bytes, which is almost always less then 10% of the receive buffer space.
This causes the receiver always sending CDC message to update its consumer cursor when it consumes more then 2K of data. And as a result, we may encounter something like "TCP silly window syndrome" when sending 2.5~8K message.
This patch fixes this using max(rmbe_size / 10, SOCK_MIN_SNDBUF / 2).
With this patch and SMC autocorking enabled, qperf 2K/4K/8K tcp_bw test shows 45%/75%/40% increase in throughput respectively.
Signed-off-by: Dust Li dust.li@linux.alibaba.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/smc/smc_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c index dee336eef6d2..7401ec67ebcf 100644 --- a/net/smc/smc_core.c +++ b/net/smc/smc_core.c @@ -1822,7 +1822,7 @@ static struct smc_buf_desc *smc_buf_get_slot(int compressed_bufsize, */ static inline int smc_rmb_wnd_update_limit(int rmbe_size) { - return min_t(int, rmbe_size / 10, SOCK_MIN_SNDBUF / 2); + return max_t(int, rmbe_size / 10, SOCK_MIN_SNDBUF / 2); }
/* map an rmb buf to a link */
From: Hangyu Hua hbh25y@gmail.com
[ Upstream commit 4a0a1436053b17e50b7c88858fb0824326641793 ]
of_node_put(np) needs to be called when pdev == NULL.
Signed-off-by: Hangyu Hua hbh25y@gmail.com Signed-off-by: Thomas Bogendoerfer tsbogend@alpha.franken.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/mips/ralink/ill_acc.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/mips/ralink/ill_acc.c b/arch/mips/ralink/ill_acc.c index bdf53807d7c2..bea857c9da8b 100644 --- a/arch/mips/ralink/ill_acc.c +++ b/arch/mips/ralink/ill_acc.c @@ -61,6 +61,7 @@ static int __init ill_acc_of_setup(void) pdev = of_find_device_by_node(np); if (!pdev) { pr_err("%pOFn: failed to lookup pdev\n", np); + of_node_put(np); return -EINVAL; }
From: Sven Eckelmann sven@narfation.org
[ Upstream commit a02192151b7dbf855084c38dca380d77c7658353 ]
Assign rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is added to rtnetlink messages. This fixes iproute2 which otherwise resolved the link interface to an interface in the wrong namespace.
Test commands:
ip netns add nst ip link add dummy0 type dummy ip link add link macvtap0 link dummy0 type macvtap ip link set macvtap0 netns nst ip -netns nst link show macvtap0
Before:
10: macvtap0@gre0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500 link/ether 5e:8f:ae:1d:60:50 brd ff:ff:ff:ff:ff:ff
After:
10: macvtap0@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500 link/ether 5e:8f:ae:1d:60:50 brd ff:ff:ff:ff:ff:ff link-netnsid 0
Reported-by: Leonardo Mörlein freifunk@irrelefant.net Signed-off-by: Sven Eckelmann sven@narfation.org Link: https://lore.kernel.org/r/20220228003240.1337426-1-sven@narfation.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/macvtap.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c index 694e2f5dbbe5..39801c31e507 100644 --- a/drivers/net/macvtap.c +++ b/drivers/net/macvtap.c @@ -133,11 +133,17 @@ static void macvtap_setup(struct net_device *dev) dev->tx_queue_len = TUN_READQ_SIZE; }
+static struct net *macvtap_link_net(const struct net_device *dev) +{ + return dev_net(macvlan_dev_real_dev(dev)); +} + static struct rtnl_link_ops macvtap_link_ops __read_mostly = { .kind = "macvtap", .setup = macvtap_setup, .newlink = macvtap_newlink, .dellink = macvtap_dellink, + .get_link_net = macvtap_link_net, .priv_size = sizeof(struct macvtap_dev), };
From: Harold Huang baymaxhuang@gmail.com
[ Upstream commit 74a335a07a17d131b9263bfdbdcb5e40673ca9ca ]
In patch [1], tun_msg_ctl was added to allow pass batched xdp buffers to tun_sendmsg. Although we donot use msg_controllen in this path, we should check msg_controllen to make sure the caller pass a valid msg_ctl.
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
Reported-by: Eric Dumazet eric.dumazet@gmail.com Suggested-by: Jason Wang jasowang@redhat.com Signed-off-by: Harold Huang baymaxhuang@gmail.com Acked-by: Jason Wang jasowang@redhat.com Link: https://lore.kernel.org/r/20220303022441.383865-1-baymaxhuang@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/tap.c | 3 ++- drivers/net/tun.c | 3 ++- drivers/vhost/net.c | 1 + 3 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/net/tap.c b/drivers/net/tap.c index 8e3a28ba6b28..ba2ef5437e16 100644 --- a/drivers/net/tap.c +++ b/drivers/net/tap.c @@ -1198,7 +1198,8 @@ static int tap_sendmsg(struct socket *sock, struct msghdr *m, struct xdp_buff *xdp; int i;
- if (ctl && (ctl->type == TUN_MSG_PTR)) { + if (m->msg_controllen == sizeof(struct tun_msg_ctl) && + ctl && ctl->type == TUN_MSG_PTR) { for (i = 0; i < ctl->num; i++) { xdp = &((struct xdp_buff *)ctl->ptr)[i]; tap_get_user_xdp(q, xdp); diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 45a67e72a02c..02de8d998bfa 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -2489,7 +2489,8 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len) if (!tun) return -EBADFD;
- if (ctl && (ctl->type == TUN_MSG_PTR)) { + if (m->msg_controllen == sizeof(struct tun_msg_ctl) && + ctl && ctl->type == TUN_MSG_PTR) { struct tun_page tpage; int n = ctl->num; int flush = 0; diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 28ef323882fb..792ab5f23647 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -473,6 +473,7 @@ static void vhost_tx_batch(struct vhost_net *net, goto signal_used;
msghdr->msg_control = &ctl; + msghdr->msg_controllen = sizeof(ctl); err = sock->ops->sendmsg(sock, msghdr, 0); if (unlikely(err < 0)) { vq_err(&nvq->vq, "Fail to batch sending packets\n");
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit 9b392e0e0b6d026da5a62bb79a08f32e27af858e ]
This fixes attemting to print hdev->name directly which causes them to print an error:
kernel: read_version:367: (efault): sock 000000006a3008f2
Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/bluetooth/bluetooth.h | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/include/net/bluetooth/bluetooth.h b/include/net/bluetooth/bluetooth.h index 9125effbf448..3fecc4a411a1 100644 --- a/include/net/bluetooth/bluetooth.h +++ b/include/net/bluetooth/bluetooth.h @@ -180,19 +180,21 @@ void bt_err_ratelimited(const char *fmt, ...); #define BT_DBG(fmt, ...) pr_debug(fmt "\n", ##__VA_ARGS__) #endif
+#define bt_dev_name(hdev) ((hdev) ? (hdev)->name : "null") + #define bt_dev_info(hdev, fmt, ...) \ - BT_INFO("%s: " fmt, (hdev)->name, ##__VA_ARGS__) + BT_INFO("%s: " fmt, bt_dev_name(hdev), ##__VA_ARGS__) #define bt_dev_warn(hdev, fmt, ...) \ - BT_WARN("%s: " fmt, (hdev)->name, ##__VA_ARGS__) + BT_WARN("%s: " fmt, bt_dev_name(hdev), ##__VA_ARGS__) #define bt_dev_err(hdev, fmt, ...) \ - BT_ERR("%s: " fmt, (hdev)->name, ##__VA_ARGS__) + BT_ERR("%s: " fmt, bt_dev_name(hdev), ##__VA_ARGS__) #define bt_dev_dbg(hdev, fmt, ...) \ - BT_DBG("%s: " fmt, (hdev)->name, ##__VA_ARGS__) + BT_DBG("%s: " fmt, bt_dev_name(hdev), ##__VA_ARGS__)
#define bt_dev_warn_ratelimited(hdev, fmt, ...) \ - bt_warn_ratelimited("%s: " fmt, (hdev)->name, ##__VA_ARGS__) + bt_warn_ratelimited("%s: " fmt, bt_dev_name(hdev), ##__VA_ARGS__) #define bt_dev_err_ratelimited(hdev, fmt, ...) \ - bt_err_ratelimited("%s: " fmt, (hdev)->name, ##__VA_ARGS__) + bt_err_ratelimited("%s: " fmt, bt_dev_name(hdev), ##__VA_ARGS__)
/* Connection and socket states */ enum {
From: Minghao Chi (CGEL ZTE) chi.minghao@zte.com.cn
[ Upstream commit d3715b2333e9a21692ba16ef8645eda584a9515d ]
Use memset to initialize structs to prevent memory leaks in l2cap_ecred_connect
Reported-by: Zeal Robot zealci@zte.com.cn Signed-off-by: Minghao Chi (CGEL ZTE) chi.minghao@zte.com.cn Signed-off-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/l2cap_core.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index 77ba68209dbd..c57a45df7a26 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -1436,6 +1436,7 @@ static void l2cap_ecred_connect(struct l2cap_chan *chan)
l2cap_ecred_init(chan, 0);
+ memset(&data, 0, sizeof(data)); data.pdu.req.psm = chan->psm; data.pdu.req.mtu = cpu_to_le16(chan->imtu); data.pdu.req.mps = cpu_to_le16(chan->mps);
From: Michael Chan michael.chan@broadcom.com
[ Upstream commit 7c492a2530c1f05441da541307c2534230dfd59b ]
If the flow control settings have been changed, a subsequent FW reset may cause the ethernet link to toggle unnecessarily. This link toggle will increase the down time by a few seconds.
The problem is caused by bnxt_update_phy_setting() detecting a false mismatch in the flow control settings between the stored software settings and the current FW settings after the FW reset. This mismatch is caused by the AUTONEG bit added to link_info->req_flow_ctrl in an inconsistent way in bnxt_set_pauseparam() in autoneg mode. The AUTONEG bit should not be added to link_info->req_flow_ctrl.
Reviewed-by: Colin Winegarden colin.winegarden@broadcom.com Reviewed-by: Pavan Chebbi pavan.chebbi@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c index af7de9ee66cf..0f276ce2d1eb 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c @@ -2074,9 +2074,7 @@ static int bnxt_set_pauseparam(struct net_device *dev, }
link_info->autoneg |= BNXT_AUTONEG_FLOW_CTRL; - if (bp->hwrm_spec_code >= 0x10201) - link_info->req_flow_ctrl = - PORT_PHY_CFG_REQ_AUTO_PAUSE_AUTONEG_PAUSE; + link_info->req_flow_ctrl = 0; } else { /* when transition from auto pause to force pause, * force a link change
From: Li Chen lchen@ambarella.com
[ Upstream commit bf8d87c076f55b8b4dfdb6bc6c6b6dc0c2ccb487 ]
Fix a misused goto label jump since that can result in a memory leak.
Link: https://lore.kernel.org/r/17e7b9b9ee6.c6d9c6a02564.4545388417402742326@zohom... Signed-off-by: Li Chen lchen@ambarella.com Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Acked-by: Kishon Vijay Abraham I kishon@ti.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pci/endpoint/functions/pci-epf-test.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pci/endpoint/functions/pci-epf-test.c b/drivers/pci/endpoint/functions/pci-epf-test.c index c7e45633beaf..5b833f00e980 100644 --- a/drivers/pci/endpoint/functions/pci-epf-test.c +++ b/drivers/pci/endpoint/functions/pci-epf-test.c @@ -451,7 +451,7 @@ static int pci_epf_test_write(struct pci_epf_test *epf_test) if (!epf_test->dma_supported) { dev_err(dev, "Cannot transfer data using DMA\n"); ret = -EINVAL; - goto err_map_addr; + goto err_dma_map; }
src_phys_addr = dma_map_single(dma_dev, buf, reg->size,
From: Alexander Lobakin alobakin@pm.me
[ Upstream commit d17b66417308996e7e64b270a3c7f3c1fbd4cfc8 ]
With KCFLAGS="-O3", I was able to trigger a fortify-source memcpy() overflow panic on set_vi_srs_handler(). Although O3 level is not supported in the mainline, under some conditions that may've happened with any optimization settings, it's just a matter of inlining luck. The panic itself is correct, more precisely, 50/50 false-positive and not at the same time.
From the one side, no real overflow happens. Exception handler
defined in asm just gets copied to some reserved places in the memory. But the reason behind is that C code refers to that exception handler declares it as `char`, i.e. something of 1 byte length. It's obvious that the asm function itself is way more than 1 byte, so fortify logics thought we are going to past the symbol declared. The standard way to refer to asm symbols from C code which is not supposed to be called from C is to declare them as `extern const u8[]`. This is fully correct from any point of view, as any code itself is just a bunch of bytes (including 0 as it is for syms like _stext/_etext/etc.), and the exact size is not known at the moment of compilation. Adjust the type of the except_vec_vi_*() and related variables. Make set_handler() take `const` as a second argument to avoid cast-away warnings and give a little more room for optimization.
Signed-off-by: Alexander Lobakin alobakin@pm.me Signed-off-by: Thomas Bogendoerfer tsbogend@alpha.franken.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/mips/include/asm/setup.h | 2 +- arch/mips/kernel/traps.c | 22 +++++++++++----------- 2 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/arch/mips/include/asm/setup.h b/arch/mips/include/asm/setup.h index bb36a400203d..8c56b862fd9c 100644 --- a/arch/mips/include/asm/setup.h +++ b/arch/mips/include/asm/setup.h @@ -16,7 +16,7 @@ static inline void setup_8250_early_printk_port(unsigned long base, unsigned int reg_shift, unsigned int timeout) {} #endif
-extern void set_handler(unsigned long offset, void *addr, unsigned long len); +void set_handler(unsigned long offset, const void *addr, unsigned long len); extern void set_uncached_handler(unsigned long offset, void *addr, unsigned long len);
typedef void (*vi_handler_t)(void); diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c index 6f07362de5ce..edd93430b954 100644 --- a/arch/mips/kernel/traps.c +++ b/arch/mips/kernel/traps.c @@ -2085,19 +2085,19 @@ static void *set_vi_srs_handler(int n, vi_handler_t addr, int srs) * If no shadow set is selected then use the default handler * that does normal register saving and standard interrupt exit */ - extern char except_vec_vi, except_vec_vi_lui; - extern char except_vec_vi_ori, except_vec_vi_end; - extern char rollback_except_vec_vi; - char *vec_start = using_rollback_handler() ? - &rollback_except_vec_vi : &except_vec_vi; + extern const u8 except_vec_vi[], except_vec_vi_lui[]; + extern const u8 except_vec_vi_ori[], except_vec_vi_end[]; + extern const u8 rollback_except_vec_vi[]; + const u8 *vec_start = using_rollback_handler() ? + rollback_except_vec_vi : except_vec_vi; #if defined(CONFIG_CPU_MICROMIPS) || defined(CONFIG_CPU_BIG_ENDIAN) - const int lui_offset = &except_vec_vi_lui - vec_start + 2; - const int ori_offset = &except_vec_vi_ori - vec_start + 2; + const int lui_offset = except_vec_vi_lui - vec_start + 2; + const int ori_offset = except_vec_vi_ori - vec_start + 2; #else - const int lui_offset = &except_vec_vi_lui - vec_start; - const int ori_offset = &except_vec_vi_ori - vec_start; + const int lui_offset = except_vec_vi_lui - vec_start; + const int ori_offset = except_vec_vi_ori - vec_start; #endif - const int handler_len = &except_vec_vi_end - vec_start; + const int handler_len = except_vec_vi_end - vec_start;
if (handler_len > VECTORSPACING) { /* @@ -2305,7 +2305,7 @@ void per_cpu_trap_init(bool is_boot_cpu) }
/* Install CPU exception handler */ -void set_handler(unsigned long offset, void *addr, unsigned long size) +void set_handler(unsigned long offset, const void *addr, unsigned long size) { #ifdef CONFIG_CPU_MICROMIPS memcpy((void *)(ebase + offset), ((unsigned char *)addr - 1), size);
From: Michael Ellerman mpe@ellerman.id.au
[ Upstream commit 591b4b268435f00d2f0b81f786c2c7bd5ef66416 ]
Paul reported a warning with DEBUG_ATOMIC_SLEEP=y:
BUG: sleeping function called from invalid context at include/linux/sched/mm.h:256 in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0 preempt_count: 0, expected: 0 ... Call Trace: dump_stack_lvl+0xa0/0xec (unreliable) __might_resched+0x2f4/0x310 kmem_cache_alloc+0x220/0x4b0 __pud_alloc+0x74/0x1d0 hash__map_kernel_page+0x2cc/0x390 do_patch_instruction+0x134/0x4a0 arch_jump_label_transform+0x64/0x78 __jump_label_update+0x148/0x180 static_key_enable_cpuslocked+0xd0/0x120 static_key_enable+0x30/0x50 check_kvm_guest+0x60/0x88 pSeries_smp_probe+0x54/0xb0 smp_prepare_cpus+0x3e0/0x430 kernel_init_freeable+0x20c/0x43c kernel_init+0x30/0x1a0 ret_from_kernel_thread+0x5c/0x64
Peter pointed out that this is because do_patch_instruction() has disabled interrupts, but then map_patch_area() calls map_kernel_page() then hash__map_kernel_page() which does a sleeping memory allocation.
We only see the warning in KVM guests with SMT enabled, which is not particularly common, or on other platforms if CONFIG_KPROBES is disabled, also not common. The reason we don't see it in most configurations is that another path that happens to have interrupts enabled has allocated the required page tables for us, eg. there's a path in kprobes init that does that. That's just pure luck though.
As Christophe suggested, the simplest solution is to do a dummy map/unmap when we initialise the patching, so that any required page table levels are pre-allocated before the first call to do_patch_instruction(). This works because the unmap doesn't free any page tables that were allocated by the map, it just clears the PTE, leaving the page table levels there for the next map.
Reported-by: Paul Menzel pmenzel@molgen.mpg.de Debugged-by: Peter Zijlstra peterz@infradead.org Suggested-by: Christophe Leroy christophe.leroy@csgroup.eu Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20220223015821.473097-1-mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/lib/code-patching.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index c5ed98823835..b76b31196be1 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -47,9 +47,14 @@ int raw_patch_instruction(u32 *addr, struct ppc_inst instr) #ifdef CONFIG_STRICT_KERNEL_RWX static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
+static int map_patch_area(void *addr, unsigned long text_poke_addr); +static void unmap_patch_area(unsigned long addr); + static int text_area_cpu_up(unsigned int cpu) { struct vm_struct *area; + unsigned long addr; + int err;
area = get_vm_area(PAGE_SIZE, VM_ALLOC); if (!area) { @@ -57,6 +62,15 @@ static int text_area_cpu_up(unsigned int cpu) cpu); return -1; } + + // Map/unmap the area to ensure all page tables are pre-allocated + addr = (unsigned long)area->addr; + err = map_patch_area(empty_zero_page, addr); + if (err) + return err; + + unmap_patch_area(addr); + this_cpu_write(text_poke_area, area);
return 0;
From: Michael Ellerman mpe@ellerman.id.au
[ Upstream commit 1a76e520ee1831a81dabf8a9a58c6453f700026e ]
Since the IBM A2 CPU support was removed, see commit fb5a515704d7 ("powerpc: Remove platforms/wsp and associated pieces"), the only 64-bit Book3E CPUs we support are Freescale (NXP) ones.
However our Kconfig still allows configurating a kernel that has 64-bit Book3E support, but no Freescale CPU support enabled. Such a kernel would never boot, it doesn't know about any CPUs.
It also causes build errors, as reported by lkp, because PPC_BARRIER_NOSPEC is not enabled in such a configuration:
powerpc64-linux-ld: arch/powerpc/net/bpf_jit_comp64.o:(.toc+0x0): undefined reference to `powerpc_security_features'
To fix this, force PPC_FSL_BOOK3E to be selected whenever we are building a 64-bit Book3E kernel.
Reported-by: kernel test robot lkp@intel.com Reported-by: Naveen N. Rao naveen.n.rao@linux.vnet.ibm.com Suggested-by: Christophe Leroy christophe.leroy@csgroup.eu Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20220304061222.2478720-1-mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/platforms/Kconfig.cputype | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype index a208997ade88..87a95cbff2f3 100644 --- a/arch/powerpc/platforms/Kconfig.cputype +++ b/arch/powerpc/platforms/Kconfig.cputype @@ -111,6 +111,7 @@ config PPC_BOOK3S_64
config PPC_BOOK3E_64 bool "Embedded processors" + select PPC_FSL_BOOK3E select PPC_FPU # Make it a choice ? select PPC_SMP_MUXED_IPI select PPC_DOORBELL @@ -287,7 +288,7 @@ config FSL_BOOKE config PPC_FSL_BOOK3E bool select ARCH_SUPPORTS_HUGETLBFS if PHYS_64BIT || PPC64 - select FSL_EMB_PERFMON + imply FSL_EMB_PERFMON select PPC_SMP_MUXED_IPI select PPC_DOORBELL default y if FSL_BOOKE
From: Hangyu Hua hbh25y@gmail.com
[ Upstream commit d601fd24e6964967f115f036a840f4f28488f63f ]
Refcount leak will happen when format_show returns failure in multiple cases. Unified management of of_node_put can fix this problem.
Signed-off-by: Hangyu Hua hbh25y@gmail.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20220302021959.10959-1-hbh25y@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kernel/secvar-sysfs.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/kernel/secvar-sysfs.c b/arch/powerpc/kernel/secvar-sysfs.c index a0a78aba2083..1ee4640a2641 100644 --- a/arch/powerpc/kernel/secvar-sysfs.c +++ b/arch/powerpc/kernel/secvar-sysfs.c @@ -26,15 +26,18 @@ static ssize_t format_show(struct kobject *kobj, struct kobj_attribute *attr, const char *format;
node = of_find_compatible_node(NULL, NULL, "ibm,secvar-backend"); - if (!of_device_is_available(node)) - return -ENODEV; + if (!of_device_is_available(node)) { + rc = -ENODEV; + goto out; + }
rc = of_property_read_string(node, "format", &format); if (rc) - return rc; + goto out;
rc = sprintf(buf, "%s\n", format);
+out: of_node_put(node);
return rc;
From: Jianglei Nie niejianglei2021@163.com
[ Upstream commit 271add11994ba1a334859069367e04d2be2ebdd4 ]
fc_exch_release(ep) will decrease the ep's reference count. When the reference count reaches zero, it is freed. But ep is still used in the following code, which will lead to a use after free.
Return after the fc_exch_release() call to avoid use after free.
Link: https://lore.kernel.org/r/20220303015115.459778-1-niejianglei2021@163.com Reviewed-by: Hannes Reinecke hare@suse.de Signed-off-by: Jianglei Nie niejianglei2021@163.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/libfc/fc_exch.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/scsi/libfc/fc_exch.c b/drivers/scsi/libfc/fc_exch.c index 841000445b9a..aa223db4cf53 100644 --- a/drivers/scsi/libfc/fc_exch.c +++ b/drivers/scsi/libfc/fc_exch.c @@ -1701,6 +1701,7 @@ static void fc_exch_abts_resp(struct fc_exch *ep, struct fc_frame *fp) if (cancel_delayed_work_sync(&ep->timeout_work)) { FC_EXCH_DBG(ep, "Exchange timer canceled due to ABTS response\n"); fc_exch_release(ep); /* release from pending timer hold */ + return; }
spin_lock_bh(&ep->ex_lock);
From: Oliver Hartkopp socketcan@hartkopp.net
[ Upstream commit 530e0d46c61314c59ecfdb8d3bcb87edbc0f85d3 ]
The N_As value describes the time a CAN frame needs on the wire when transmitted by the CAN controller. Even very short CAN FD frames need arround 100 usecs (bitrate 1Mbit/s, data bitrate 8Mbit/s).
Having N_As to be zero (the former default) leads to 'no CAN frame separation' when STmin is set to zero by the receiving node. This 'burst mode' should not be enabled by default as it could potentially dump a high number of CAN frames into the netdev queue from the soft hrtimer context. This does not affect the system stability but is just not nice and cooperative.
With this N_As/frame_txtime value the 'burst mode' is disabled by default.
As user space applications usually do not set the frame_txtime element of struct can_isotp_options the new in-kernel default is very likely overwritten with zero when the sockopt() CAN_ISOTP_OPTS is invoked. To make sure that a N_As value of zero is only set intentional the value '0' is now interpreted as 'do not change the current value'. When a frame_txtime of zero is required for testing purposes this CAN_ISOTP_FRAME_TXTIME_ZERO u32 value has to be set in frame_txtime.
Link: https://lore.kernel.org/all/20220309120416.83514-2-socketcan@hartkopp.net Signed-off-by: Oliver Hartkopp socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- include/uapi/linux/can/isotp.h | 28 ++++++++++++++++++++++------ net/can/isotp.c | 12 +++++++++++- 2 files changed, 33 insertions(+), 7 deletions(-)
diff --git a/include/uapi/linux/can/isotp.h b/include/uapi/linux/can/isotp.h index c55935b64ccc..590f8aea2b6d 100644 --- a/include/uapi/linux/can/isotp.h +++ b/include/uapi/linux/can/isotp.h @@ -137,20 +137,16 @@ struct can_isotp_ll_options { #define CAN_ISOTP_WAIT_TX_DONE 0x400 /* wait for tx completion */ #define CAN_ISOTP_SF_BROADCAST 0x800 /* 1-to-N functional addressing */
-/* default values */ +/* protocol machine default values */
#define CAN_ISOTP_DEFAULT_FLAGS 0 #define CAN_ISOTP_DEFAULT_EXT_ADDRESS 0x00 #define CAN_ISOTP_DEFAULT_PAD_CONTENT 0xCC /* prevent bit-stuffing */ -#define CAN_ISOTP_DEFAULT_FRAME_TXTIME 0 +#define CAN_ISOTP_DEFAULT_FRAME_TXTIME 50000 /* 50 micro seconds */ #define CAN_ISOTP_DEFAULT_RECV_BS 0 #define CAN_ISOTP_DEFAULT_RECV_STMIN 0x00 #define CAN_ISOTP_DEFAULT_RECV_WFTMAX 0
-#define CAN_ISOTP_DEFAULT_LL_MTU CAN_MTU -#define CAN_ISOTP_DEFAULT_LL_TX_DL CAN_MAX_DLEN -#define CAN_ISOTP_DEFAULT_LL_TX_FLAGS 0 - /* * Remark on CAN_ISOTP_DEFAULT_RECV_* values: * @@ -162,4 +158,24 @@ struct can_isotp_ll_options { * consistency and copied directly into the flow control (FC) frame. */
+/* link layer default values => make use of Classical CAN frames */ + +#define CAN_ISOTP_DEFAULT_LL_MTU CAN_MTU +#define CAN_ISOTP_DEFAULT_LL_TX_DL CAN_MAX_DLEN +#define CAN_ISOTP_DEFAULT_LL_TX_FLAGS 0 + +/* + * The CAN_ISOTP_DEFAULT_FRAME_TXTIME has become a non-zero value as + * it only makes sense for isotp implementation tests to run without + * a N_As value. As user space applications usually do not set the + * frame_txtime element of struct can_isotp_options the new in-kernel + * default is very likely overwritten with zero when the sockopt() + * CAN_ISOTP_OPTS is invoked. + * To make sure that a N_As value of zero is only set intentional the + * value '0' is now interpreted as 'do not change the current value'. + * When a frame_txtime of zero is required for testing purposes this + * CAN_ISOTP_FRAME_TXTIME_ZERO u32 value has to be set in frame_txtime. + */ +#define CAN_ISOTP_FRAME_TXTIME_ZERO 0xFFFFFFFF + #endif /* !_UAPI_CAN_ISOTP_H */ diff --git a/net/can/isotp.c b/net/can/isotp.c index a95d171b3a64..5bce7c66c121 100644 --- a/net/can/isotp.c +++ b/net/can/isotp.c @@ -141,6 +141,7 @@ struct isotp_sock { struct can_isotp_options opt; struct can_isotp_fc_options rxfc, txfc; struct can_isotp_ll_options ll; + u32 frame_txtime; u32 force_tx_stmin; u32 force_rx_stmin; struct tpcon rx, tx; @@ -360,7 +361,7 @@ static int isotp_rcv_fc(struct isotp_sock *so, struct canfd_frame *cf, int ae)
so->tx_gap = ktime_set(0, 0); /* add transmission time for CAN frame N_As */ - so->tx_gap = ktime_add_ns(so->tx_gap, so->opt.frame_txtime); + so->tx_gap = ktime_add_ns(so->tx_gap, so->frame_txtime); /* add waiting time for consecutive frames N_Cs */ if (so->opt.flags & CAN_ISOTP_FORCE_TXSTMIN) so->tx_gap = ktime_add_ns(so->tx_gap, @@ -1247,6 +1248,14 @@ static int isotp_setsockopt_locked(struct socket *sock, int level, int optname, /* no separate rx_ext_address is given => use ext_address */ if (!(so->opt.flags & CAN_ISOTP_RX_EXT_ADDR)) so->opt.rx_ext_address = so->opt.ext_address; + + /* check for frame_txtime changes (0 => no changes) */ + if (so->opt.frame_txtime) { + if (so->opt.frame_txtime == CAN_ISOTP_FRAME_TXTIME_ZERO) + so->frame_txtime = 0; + else + so->frame_txtime = so->opt.frame_txtime; + } break;
case CAN_ISOTP_RECV_FC: @@ -1448,6 +1457,7 @@ static int isotp_init(struct sock *sk) so->opt.rxpad_content = CAN_ISOTP_DEFAULT_PAD_CONTENT; so->opt.txpad_content = CAN_ISOTP_DEFAULT_PAD_CONTENT; so->opt.frame_txtime = CAN_ISOTP_DEFAULT_FRAME_TXTIME; + so->frame_txtime = CAN_ISOTP_DEFAULT_FRAME_TXTIME; so->rxfc.bs = CAN_ISOTP_DEFAULT_RECV_BS; so->rxfc.stmin = CAN_ISOTP_DEFAULT_RECV_STMIN; so->rxfc.wftmax = CAN_ISOTP_DEFAULT_RECV_WFTMAX;
From: Vincent Mailhol mailhol.vincent@wanadoo.fr
[ Upstream commit 7a8cd7c0ee823a1cc893ab3feaa23e4b602bfb9a ]
Function es58x_fd_rx_event() invokes the es58x_check_msg_len() macro:
| ret = es58x_check_msg_len(es58x_dev->dev, *rx_event_msg, msg_len);
While doing so, it dereferences an uninitialized variable: *rx_event_msg.
This is actually harmless because es58x_check_msg_len() only uses preprocessor macros (sizeof() and __stringify()) on *rx_event_msg. c.f. [1].
Nonetheless, this pattern is confusing so the lines are reordered to make sure that rx_event_msg is correctly initialized.
This patch also fixes a false positive warning reported by cppcheck:
| cppcheck possible warnings: (new ones prefixed by >>, may not be real problems) | | In file included from drivers/net/can/usb/etas_es58x/es58x_fd.c: | >> drivers/net/can/usb/etas_es58x/es58x_fd.c:174:8: warning: Uninitialized variable: rx_event_msg [uninitvar] | ret = es58x_check_msg_len(es58x_dev->dev, *rx_event_msg, msg_len); | ^
[1] https://elixir.bootlin.com/linux/v5.16/source/drivers/net/can/usb/etas_es58x...
Link: https://lore.kernel.org/all/20220306101302.708783-1-mailhol.vincent@wanadoo.... Signed-off-by: Vincent Mailhol mailhol.vincent@wanadoo.fr Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/can/usb/etas_es58x/es58x_fd.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/can/usb/etas_es58x/es58x_fd.c b/drivers/net/can/usb/etas_es58x/es58x_fd.c index af042aa55f59..26bf4775e884 100644 --- a/drivers/net/can/usb/etas_es58x/es58x_fd.c +++ b/drivers/net/can/usb/etas_es58x/es58x_fd.c @@ -171,12 +171,11 @@ static int es58x_fd_rx_event_msg(struct net_device *netdev, const struct es58x_fd_rx_event_msg *rx_event_msg; int ret;
+ rx_event_msg = &es58x_fd_urb_cmd->rx_event_msg; ret = es58x_check_msg_len(es58x_dev->dev, *rx_event_msg, msg_len); if (ret) return ret;
- rx_event_msg = &es58x_fd_urb_cmd->rx_event_msg; - return es58x_rx_err_msg(netdev, rx_event_msg->error_code, rx_event_msg->event_code, get_unaligned_le64(&rx_event_msg->timestamp));
From: Michael T. Kloos michael@michaelkloos.com
[ Upstream commit 9d1f0ec9f71780e69ceb9d91697600c747d6e02e ]
Rewrote the RISC-V memmove() assembly implementation. The previous implementation did not check memory alignment and it compared 2 pointers with a signed comparison. The misaligned memory access would cause the kernel to crash on systems that did not emulate it in firmware and did not support it in hardware. Firmware emulation is slow and may not exist. The RISC-V spec does not guarantee that support for misaligned memory accesses will exist. It should not be depended on.
This patch now checks for XLEN granularity of co-alignment between the pointers. Failing that, copying is done by loading from the 2 contiguous and naturally aligned XLEN memory locations containing the overlapping XLEN sized data to be copied. The data is shifted into the correct place and binary or'ed together on each iteration. The result is then stored into the corresponding naturally aligned XLEN sized location in the destination. For unaligned data at the terminations of the regions to be copied or for copies less than (2 * XLEN) in size, byte copy is used.
This patch also now uses unsigned comparison for the pointers and migrates to the newer assembler annotations from the now deprecated ones.
Signed-off-by: Michael T. Kloos michael@michaelkloos.com Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/riscv/lib/memmove.S | 368 +++++++++++++++++++++++++++++++++------ 1 file changed, 310 insertions(+), 58 deletions(-)
diff --git a/arch/riscv/lib/memmove.S b/arch/riscv/lib/memmove.S index 07d1d2152ba5..e0609e1f0864 100644 --- a/arch/riscv/lib/memmove.S +++ b/arch/riscv/lib/memmove.S @@ -1,64 +1,316 @@ -/* SPDX-License-Identifier: GPL-2.0 */ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2022 Michael T. Kloos michael@michaelkloos.com + */
#include <linux/linkage.h> #include <asm/asm.h>
-ENTRY(__memmove) -WEAK(memmove) - move t0, a0 - move t1, a1 - - beq a0, a1, exit_memcpy - beqz a2, exit_memcpy - srli t2, a2, 0x2 - - slt t3, a0, a1 - beqz t3, do_reverse - - andi a2, a2, 0x3 - li t4, 1 - beqz t2, byte_copy - -word_copy: - lw t3, 0(a1) - addi t2, t2, -1 - addi a1, a1, 4 - sw t3, 0(a0) - addi a0, a0, 4 - bnez t2, word_copy - beqz a2, exit_memcpy - j byte_copy - -do_reverse: - add a0, a0, a2 - add a1, a1, a2 - andi a2, a2, 0x3 - li t4, -1 - beqz t2, reverse_byte_copy - -reverse_word_copy: - addi a1, a1, -4 - addi t2, t2, -1 - lw t3, 0(a1) - addi a0, a0, -4 - sw t3, 0(a0) - bnez t2, reverse_word_copy - beqz a2, exit_memcpy - -reverse_byte_copy: - addi a0, a0, -1 - addi a1, a1, -1 +SYM_FUNC_START(__memmove) +SYM_FUNC_START_WEAK(memmove) + /* + * Returns + * a0 - dest + * + * Parameters + * a0 - Inclusive first byte of dest + * a1 - Inclusive first byte of src + * a2 - Length of copy n + * + * Because the return matches the parameter register a0, + * we will not clobber or modify that register. + * + * Note: This currently only works on little-endian. + * To port to big-endian, reverse the direction of shifts + * in the 2 misaligned fixup copy loops. + */
+ /* Return if nothing to do */ + beq a0, a1, return_from_memmove + beqz a2, return_from_memmove + + /* + * Register Uses + * Forward Copy: a1 - Index counter of src + * Reverse Copy: a4 - Index counter of src + * Forward Copy: t3 - Index counter of dest + * Reverse Copy: t4 - Index counter of dest + * Both Copy Modes: t5 - Inclusive first multibyte/aligned of dest + * Both Copy Modes: t6 - Non-Inclusive last multibyte/aligned of dest + * Both Copy Modes: t0 - Link / Temporary for load-store + * Both Copy Modes: t1 - Temporary for load-store + * Both Copy Modes: t2 - Temporary for load-store + * Both Copy Modes: a5 - dest to src alignment offset + * Both Copy Modes: a6 - Shift ammount + * Both Copy Modes: a7 - Inverse Shift ammount + * Both Copy Modes: a2 - Alternate breakpoint for unrolled loops + */ + + /* + * Solve for some register values now. + * Byte copy does not need t5 or t6. + */ + mv t3, a0 + add t4, a0, a2 + add a4, a1, a2 + + /* + * Byte copy if copying less than (2 * SZREG) bytes. This can + * cause problems with the bulk copy implementation and is + * small enough not to bother. + */ + andi t0, a2, -(2 * SZREG) + beqz t0, byte_copy + + /* + * Now solve for t5 and t6. + */ + andi t5, t3, -SZREG + andi t6, t4, -SZREG + /* + * If dest(Register t3) rounded down to the nearest naturally + * aligned SZREG address, does not equal dest, then add SZREG + * to find the low-bound of SZREG alignment in the dest memory + * region. Note that this could overshoot the dest memory + * region if n is less than SZREG. This is one reason why + * we always byte copy if n is less than SZREG. + * Otherwise, dest is already naturally aligned to SZREG. + */ + beq t5, t3, 1f + addi t5, t5, SZREG + 1: + + /* + * If the dest and src are co-aligned to SZREG, then there is + * no need for the full rigmarole of a full misaligned fixup copy. + * Instead, do a simpler co-aligned copy. + */ + xor t0, a0, a1 + andi t1, t0, (SZREG - 1) + beqz t1, coaligned_copy + /* Fall through to misaligned fixup copy */ + +misaligned_fixup_copy: + bltu a1, a0, misaligned_fixup_copy_reverse + +misaligned_fixup_copy_forward: + jal t0, byte_copy_until_aligned_forward + + andi a5, a1, (SZREG - 1) /* Find the alignment offset of src (a1) */ + slli a6, a5, 3 /* Multiply by 8 to convert that to bits to shift */ + sub a5, a1, t3 /* Find the difference between src and dest */ + andi a1, a1, -SZREG /* Align the src pointer */ + addi a2, t6, SZREG /* The other breakpoint for the unrolled loop*/ + + /* + * Compute The Inverse Shift + * a7 = XLEN - a6 = XLEN + -a6 + * 2s complement negation to find the negative: -a6 = ~a6 + 1 + * Add that to XLEN. XLEN = SZREG * 8. + */ + not a7, a6 + addi a7, a7, (SZREG * 8 + 1) + + /* + * Fix Misalignment Copy Loop - Forward + * load_val0 = load_ptr[0]; + * do { + * load_val1 = load_ptr[1]; + * store_ptr += 2; + * store_ptr[0 - 2] = (load_val0 >> {a6}) | (load_val1 << {a7}); + * + * if (store_ptr == {a2}) + * break; + * + * load_val0 = load_ptr[2]; + * load_ptr += 2; + * store_ptr[1 - 2] = (load_val1 >> {a6}) | (load_val0 << {a7}); + * + * } while (store_ptr != store_ptr_end); + * store_ptr = store_ptr_end; + */ + + REG_L t0, (0 * SZREG)(a1) + 1: + REG_L t1, (1 * SZREG)(a1) + addi t3, t3, (2 * SZREG) + srl t0, t0, a6 + sll t2, t1, a7 + or t2, t0, t2 + REG_S t2, ((0 * SZREG) - (2 * SZREG))(t3) + + beq t3, a2, 2f + + REG_L t0, (2 * SZREG)(a1) + addi a1, a1, (2 * SZREG) + srl t1, t1, a6 + sll t2, t0, a7 + or t2, t1, t2 + REG_S t2, ((1 * SZREG) - (2 * SZREG))(t3) + + bne t3, t6, 1b + 2: + mv t3, t6 /* Fix the dest pointer in case the loop was broken */ + + add a1, t3, a5 /* Restore the src pointer */ + j byte_copy_forward /* Copy any remaining bytes */ + +misaligned_fixup_copy_reverse: + jal t0, byte_copy_until_aligned_reverse + + andi a5, a4, (SZREG - 1) /* Find the alignment offset of src (a4) */ + slli a6, a5, 3 /* Multiply by 8 to convert that to bits to shift */ + sub a5, a4, t4 /* Find the difference between src and dest */ + andi a4, a4, -SZREG /* Align the src pointer */ + addi a2, t5, -SZREG /* The other breakpoint for the unrolled loop*/ + + /* + * Compute The Inverse Shift + * a7 = XLEN - a6 = XLEN + -a6 + * 2s complement negation to find the negative: -a6 = ~a6 + 1 + * Add that to XLEN. XLEN = SZREG * 8. + */ + not a7, a6 + addi a7, a7, (SZREG * 8 + 1) + + /* + * Fix Misalignment Copy Loop - Reverse + * load_val1 = load_ptr[0]; + * do { + * load_val0 = load_ptr[-1]; + * store_ptr -= 2; + * store_ptr[1] = (load_val0 >> {a6}) | (load_val1 << {a7}); + * + * if (store_ptr == {a2}) + * break; + * + * load_val1 = load_ptr[-2]; + * load_ptr -= 2; + * store_ptr[0] = (load_val1 >> {a6}) | (load_val0 << {a7}); + * + * } while (store_ptr != store_ptr_end); + * store_ptr = store_ptr_end; + */ + + REG_L t1, ( 0 * SZREG)(a4) + 1: + REG_L t0, (-1 * SZREG)(a4) + addi t4, t4, (-2 * SZREG) + sll t1, t1, a7 + srl t2, t0, a6 + or t2, t1, t2 + REG_S t2, ( 1 * SZREG)(t4) + + beq t4, a2, 2f + + REG_L t1, (-2 * SZREG)(a4) + addi a4, a4, (-2 * SZREG) + sll t0, t0, a7 + srl t2, t1, a6 + or t2, t0, t2 + REG_S t2, ( 0 * SZREG)(t4) + + bne t4, t5, 1b + 2: + mv t4, t5 /* Fix the dest pointer in case the loop was broken */ + + add a4, t4, a5 /* Restore the src pointer */ + j byte_copy_reverse /* Copy any remaining bytes */ + +/* + * Simple copy loops for SZREG co-aligned memory locations. + * These also make calls to do byte copies for any unaligned + * data at their terminations. + */ +coaligned_copy: + bltu a1, a0, coaligned_copy_reverse + +coaligned_copy_forward: + jal t0, byte_copy_until_aligned_forward + + 1: + REG_L t1, ( 0 * SZREG)(a1) + addi a1, a1, SZREG + addi t3, t3, SZREG + REG_S t1, (-1 * SZREG)(t3) + bne t3, t6, 1b + + j byte_copy_forward /* Copy any remaining bytes */ + +coaligned_copy_reverse: + jal t0, byte_copy_until_aligned_reverse + + 1: + REG_L t1, (-1 * SZREG)(a4) + addi a4, a4, -SZREG + addi t4, t4, -SZREG + REG_S t1, ( 0 * SZREG)(t4) + bne t4, t5, 1b + + j byte_copy_reverse /* Copy any remaining bytes */ + +/* + * These are basically sub-functions within the function. They + * are used to byte copy until the dest pointer is in alignment. + * At which point, a bulk copy method can be used by the + * calling code. These work on the same registers as the bulk + * copy loops. Therefore, the register values can be picked + * up from where they were left and we avoid code duplication + * without any overhead except the call in and return jumps. + */ +byte_copy_until_aligned_forward: + beq t3, t5, 2f + 1: + lb t1, 0(a1) + addi a1, a1, 1 + addi t3, t3, 1 + sb t1, -1(t3) + bne t3, t5, 1b + 2: + jalr zero, 0x0(t0) /* Return to multibyte copy loop */ + +byte_copy_until_aligned_reverse: + beq t4, t6, 2f + 1: + lb t1, -1(a4) + addi a4, a4, -1 + addi t4, t4, -1 + sb t1, 0(t4) + bne t4, t6, 1b + 2: + jalr zero, 0x0(t0) /* Return to multibyte copy loop */ + +/* + * Simple byte copy loops. + * These will byte copy until they reach the end of data to copy. + * At that point, they will call to return from memmove. + */ byte_copy: - lb t3, 0(a1) - addi a2, a2, -1 - sb t3, 0(a0) - add a1, a1, t4 - add a0, a0, t4 - bnez a2, byte_copy - -exit_memcpy: - move a0, t0 - move a1, t1 - ret -END(__memmove) + bltu a1, a0, byte_copy_reverse + +byte_copy_forward: + beq t3, t4, 2f + 1: + lb t1, 0(a1) + addi a1, a1, 1 + addi t3, t3, 1 + sb t1, -1(t3) + bne t3, t4, 1b + 2: + ret + +byte_copy_reverse: + beq t4, t3, 2f + 1: + lb t1, -1(a4) + addi a4, a4, -1 + addi t4, t4, -1 + sb t1, 0(t4) + bne t4, t3, 1b + 2: + +return_from_memmove: + ret + +SYM_FUNC_END(memmove) +SYM_FUNC_END(__memmove)
Backporting to 5.15 looks good to me.
Michael
On Tue, 2022-04-12 at 08:28 +0200, Greg Kroah-Hartman wrote:
From: Michael T. Kloos michael@michaelkloos.com
[ Upstream commit 9d1f0ec9f71780e69ceb9d91697600c747d6e02e ]
Rewrote the RISC-V memmove() assembly implementation. The previous implementation did not check memory alignment and it compared 2 pointers with a signed comparison. The misaligned memory access would cause the kernel to crash on systems that did not emulate it in firmware and did not support it in hardware. Firmware emulation is slow and may not exist. The RISC-V spec does not guarantee that support for misaligned memory accesses will exist. It should not be depended on.
This patch now checks for XLEN granularity of co-alignment between the pointers. Failing that, copying is done by loading from the 2 contiguous and naturally aligned XLEN memory locations containing the overlapping XLEN sized data to be copied. The data is shifted into the correct place and binary or'ed together on each iteration. The result is then stored into the corresponding naturally aligned XLEN sized location in the destination. For unaligned data at the terminations of the regions to be copied or for copies less than (2 * XLEN) in size, byte copy is used.
This patch also now uses unsigned comparison for the pointers and migrates to the newer assembler annotations from the now deprecated ones.
Signed-off-by: Michael T. Kloos michael@michaelkloos.com Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Sasha Levin sashal@kernel.org
arch/riscv/lib/memmove.S | 368 +++++++++++++++++++++++++++++++++------ 1 file changed, 310 insertions(+), 58 deletions(-)
diff --git a/arch/riscv/lib/memmove.S b/arch/riscv/lib/memmove.S index 07d1d2152ba5..e0609e1f0864 100644 --- a/arch/riscv/lib/memmove.S +++ b/arch/riscv/lib/memmove.S @@ -1,64 +1,316 @@ -/* SPDX-License-Identifier: GPL-2.0 */ +/* SPDX-License-Identifier: GPL-2.0-only */ +/*
- Copyright (C) 2022 Michael T. Kloos michael@michaelkloos.com
- */
#include <linux/linkage.h> #include <asm/asm.h> -ENTRY(__memmove) -WEAK(memmove)
move t0, a0
move t1, a1
beq a0, a1, exit_memcpy
beqz a2, exit_memcpy
srli t2, a2, 0x2
slt t3, a0, a1
beqz t3, do_reverse
andi a2, a2, 0x3
li t4, 1
beqz t2, byte_copy
-word_copy:
lw t3, 0(a1)
addi t2, t2, -1
addi a1, a1, 4
sw t3, 0(a0)
addi a0, a0, 4
bnez t2, word_copy
beqz a2, exit_memcpy
j byte_copy
-do_reverse:
add a0, a0, a2
add a1, a1, a2
andi a2, a2, 0x3
li t4, -1
beqz t2, reverse_byte_copy
-reverse_word_copy:
addi a1, a1, -4
addi t2, t2, -1
lw t3, 0(a1)
addi a0, a0, -4
sw t3, 0(a0)
bnez t2, reverse_word_copy
beqz a2, exit_memcpy
-reverse_byte_copy:
addi a0, a0, -1
addi a1, a1, -1
+SYM_FUNC_START(__memmove) +SYM_FUNC_START_WEAK(memmove)
- /*
* Returns
* a0 - dest
*
* Parameters
* a0 - Inclusive first byte of dest
* a1 - Inclusive first byte of src
* a2 - Length of copy n
*
* Because the return matches the parameter register a0,
* we will not clobber or modify that register.
*
* Note: This currently only works on little-endian.
* To port to big-endian, reverse the direction of shifts
* in the 2 misaligned fixup copy loops.
*/
- /* Return if nothing to do */
- beq a0, a1, return_from_memmove
- beqz a2, return_from_memmove
- /*
* Register Uses
* Forward Copy: a1 - Index counter of src
* Reverse Copy: a4 - Index counter of src
* Forward Copy: t3 - Index counter of dest
* Reverse Copy: t4 - Index counter of dest
* Both Copy Modes: t5 - Inclusive first multibyte/aligned of dest
* Both Copy Modes: t6 - Non-Inclusive last multibyte/aligned of dest
* Both Copy Modes: t0 - Link / Temporary for load-store
* Both Copy Modes: t1 - Temporary for load-store
* Both Copy Modes: t2 - Temporary for load-store
* Both Copy Modes: a5 - dest to src alignment offset
* Both Copy Modes: a6 - Shift ammount
* Both Copy Modes: a7 - Inverse Shift ammount
* Both Copy Modes: a2 - Alternate breakpoint for unrolled loops
*/
- /*
* Solve for some register values now.
* Byte copy does not need t5 or t6.
*/
- mv t3, a0
- add t4, a0, a2
- add a4, a1, a2
- /*
* Byte copy if copying less than (2 * SZREG) bytes. This can
* cause problems with the bulk copy implementation and is
* small enough not to bother.
*/
- andi t0, a2, -(2 * SZREG)
- beqz t0, byte_copy
- /*
* Now solve for t5 and t6.
*/
- andi t5, t3, -SZREG
- andi t6, t4, -SZREG
- /*
* If dest(Register t3) rounded down to the nearest naturally
* aligned SZREG address, does not equal dest, then add SZREG
* to find the low-bound of SZREG alignment in the dest memory
* region. Note that this could overshoot the dest memory
* region if n is less than SZREG. This is one reason why
* we always byte copy if n is less than SZREG.
* Otherwise, dest is already naturally aligned to SZREG.
*/
- beq t5, t3, 1f
addi t5, t5, SZREG
- 1:
- /*
* If the dest and src are co-aligned to SZREG, then there is
* no need for the full rigmarole of a full misaligned fixup copy.
* Instead, do a simpler co-aligned copy.
*/
- xor t0, a0, a1
- andi t1, t0, (SZREG - 1)
- beqz t1, coaligned_copy
- /* Fall through to misaligned fixup copy */
+misaligned_fixup_copy:
- bltu a1, a0, misaligned_fixup_copy_reverse
+misaligned_fixup_copy_forward:
- jal t0, byte_copy_until_aligned_forward
- andi a5, a1, (SZREG - 1) /* Find the alignment offset of src (a1) */
- slli a6, a5, 3 /* Multiply by 8 to convert that to bits to shift */
- sub a5, a1, t3 /* Find the difference between src and dest */
- andi a1, a1, -SZREG /* Align the src pointer */
- addi a2, t6, SZREG /* The other breakpoint for the unrolled loop*/
- /*
* Compute The Inverse Shift
* a7 = XLEN - a6 = XLEN + -a6
* 2s complement negation to find the negative: -a6 = ~a6 + 1
* Add that to XLEN. XLEN = SZREG * 8.
*/
- not a7, a6
- addi a7, a7, (SZREG * 8 + 1)
- /*
* Fix Misalignment Copy Loop - Forward
* load_val0 = load_ptr[0];
* do {
* load_val1 = load_ptr[1];
* store_ptr += 2;
* store_ptr[0 - 2] = (load_val0 >> {a6}) | (load_val1 << {a7});
*
* if (store_ptr == {a2})
* break;
*
* load_val0 = load_ptr[2];
* load_ptr += 2;
* store_ptr[1 - 2] = (load_val1 >> {a6}) | (load_val0 << {a7});
*
* } while (store_ptr != store_ptr_end);
* store_ptr = store_ptr_end;
*/
- REG_L t0, (0 * SZREG)(a1)
- 1:
- REG_L t1, (1 * SZREG)(a1)
- addi t3, t3, (2 * SZREG)
- srl t0, t0, a6
- sll t2, t1, a7
- or t2, t0, t2
- REG_S t2, ((0 * SZREG) - (2 * SZREG))(t3)
- beq t3, a2, 2f
- REG_L t0, (2 * SZREG)(a1)
- addi a1, a1, (2 * SZREG)
- srl t1, t1, a6
- sll t2, t0, a7
- or t2, t1, t2
- REG_S t2, ((1 * SZREG) - (2 * SZREG))(t3)
- bne t3, t6, 1b
- 2:
- mv t3, t6 /* Fix the dest pointer in case the loop was broken */
- add a1, t3, a5 /* Restore the src pointer */
- j byte_copy_forward /* Copy any remaining bytes */
+misaligned_fixup_copy_reverse:
- jal t0, byte_copy_until_aligned_reverse
- andi a5, a4, (SZREG - 1) /* Find the alignment offset of src (a4) */
- slli a6, a5, 3 /* Multiply by 8 to convert that to bits to shift */
- sub a5, a4, t4 /* Find the difference between src and dest */
- andi a4, a4, -SZREG /* Align the src pointer */
- addi a2, t5, -SZREG /* The other breakpoint for the unrolled loop*/
- /*
* Compute The Inverse Shift
* a7 = XLEN - a6 = XLEN + -a6
* 2s complement negation to find the negative: -a6 = ~a6 + 1
* Add that to XLEN. XLEN = SZREG * 8.
*/
- not a7, a6
- addi a7, a7, (SZREG * 8 + 1)
- /*
* Fix Misalignment Copy Loop - Reverse
* load_val1 = load_ptr[0];
* do {
* load_val0 = load_ptr[-1];
* store_ptr -= 2;
* store_ptr[1] = (load_val0 >> {a6}) | (load_val1 << {a7});
*
* if (store_ptr == {a2})
* break;
*
* load_val1 = load_ptr[-2];
* load_ptr -= 2;
* store_ptr[0] = (load_val1 >> {a6}) | (load_val0 << {a7});
*
* } while (store_ptr != store_ptr_end);
* store_ptr = store_ptr_end;
*/
- REG_L t1, ( 0 * SZREG)(a4)
- 1:
- REG_L t0, (-1 * SZREG)(a4)
- addi t4, t4, (-2 * SZREG)
- sll t1, t1, a7
- srl t2, t0, a6
- or t2, t1, t2
- REG_S t2, ( 1 * SZREG)(t4)
- beq t4, a2, 2f
- REG_L t1, (-2 * SZREG)(a4)
- addi a4, a4, (-2 * SZREG)
- sll t0, t0, a7
- srl t2, t1, a6
- or t2, t0, t2
- REG_S t2, ( 0 * SZREG)(t4)
- bne t4, t5, 1b
- 2:
- mv t4, t5 /* Fix the dest pointer in case the loop was broken */
- add a4, t4, a5 /* Restore the src pointer */
- j byte_copy_reverse /* Copy any remaining bytes */
+/*
- Simple copy loops for SZREG co-aligned memory locations.
- These also make calls to do byte copies for any unaligned
- data at their terminations.
- */
+coaligned_copy:
- bltu a1, a0, coaligned_copy_reverse
+coaligned_copy_forward:
- jal t0, byte_copy_until_aligned_forward
- 1:
- REG_L t1, ( 0 * SZREG)(a1)
- addi a1, a1, SZREG
- addi t3, t3, SZREG
- REG_S t1, (-1 * SZREG)(t3)
- bne t3, t6, 1b
- j byte_copy_forward /* Copy any remaining bytes */
+coaligned_copy_reverse:
- jal t0, byte_copy_until_aligned_reverse
- 1:
- REG_L t1, (-1 * SZREG)(a4)
- addi a4, a4, -SZREG
- addi t4, t4, -SZREG
- REG_S t1, ( 0 * SZREG)(t4)
- bne t4, t5, 1b
- j byte_copy_reverse /* Copy any remaining bytes */
+/*
- These are basically sub-functions within the function. They
- are used to byte copy until the dest pointer is in alignment.
- At which point, a bulk copy method can be used by the
- calling code. These work on the same registers as the bulk
- copy loops. Therefore, the register values can be picked
- up from where they were left and we avoid code duplication
- without any overhead except the call in and return jumps.
- */
+byte_copy_until_aligned_forward:
- beq t3, t5, 2f
- 1:
- lb t1, 0(a1)
- addi a1, a1, 1
- addi t3, t3, 1
- sb t1, -1(t3)
- bne t3, t5, 1b
- 2:
- jalr zero, 0x0(t0) /* Return to multibyte copy loop */
+byte_copy_until_aligned_reverse:
- beq t4, t6, 2f
- 1:
- lb t1, -1(a4)
- addi a4, a4, -1
- addi t4, t4, -1
- sb t1, 0(t4)
- bne t4, t6, 1b
- 2:
- jalr zero, 0x0(t0) /* Return to multibyte copy loop */
+/*
- Simple byte copy loops.
- These will byte copy until they reach the end of data to copy.
- At that point, they will call to return from memmove.
- */
byte_copy:
lb t3, 0(a1)
addi a2, a2, -1
sb t3, 0(a0)
add a1, a1, t4
add a0, a0, t4
bnez a2, byte_copy
-exit_memcpy:
move a0, t0
move a1, t1
ret
-END(__memmove)
- bltu a1, a0, byte_copy_reverse
+byte_copy_forward:
- beq t3, t4, 2f
- 1:
- lb t1, 0(a1)
- addi a1, a1, 1
- addi t3, t3, 1
- sb t1, -1(t3)
- bne t3, t4, 1b
- 2:
- ret
+byte_copy_reverse:
- beq t4, t3, 2f
- 1:
- lb t1, -1(a4)
- addi a4, a4, -1
- addi t4, t4, -1
- sb t1, 0(t4)
- bne t4, t3, 1b
- 2:
+return_from_memmove:
- ret
+SYM_FUNC_END(memmove) +SYM_FUNC_END(__memmove)
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 5d26cff5bdbebdf98ba48217c078ff102536f134 ]
George reports that altnames can eat up kernel memory. We should charge that memory appropriately.
Reported-by: George Shuklin george.shuklin@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/rtnetlink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 91d7a5a5a08d..a8c319dc224a 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -3637,7 +3637,7 @@ static int rtnl_alt_ifname(int cmd, struct net_device *dev, struct nlattr *attr, if (err) return err;
- alt_ifname = nla_strdup(attr, GFP_KERNEL); + alt_ifname = nla_strdup(attr, GFP_KERNEL_ACCOUNT); if (!alt_ifname) return -ENOMEM;
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 155fb43b70b5fce341347a77d1af2765d1e8fbb8 ]
Property list (altname is a link "property") is wrapped in a nlattr. nlattrs length is 16bit so practically speaking the list of properties can't be longer than that, otherwise user space would have to interpret broken netlink messages.
Prevent the problem from occurring by checking the length of the property list before adding new entries.
Reported-by: George Shuklin george.shuklin@gmail.com Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/rtnetlink.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index a8c319dc224a..9c0e8ccf9bc5 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -3631,12 +3631,23 @@ static int rtnl_alt_ifname(int cmd, struct net_device *dev, struct nlattr *attr, bool *changed, struct netlink_ext_ack *extack) { char *alt_ifname; + size_t size; int err;
err = nla_validate(attr, attr->nla_len, IFLA_MAX, ifla_policy, extack); if (err) return err;
+ if (cmd == RTM_NEWLINKPROP) { + size = rtnl_prop_list_size(dev); + size += nla_total_size(ALTIFNAMSIZ); + if (size >= U16_MAX) { + NL_SET_ERR_MSG(extack, + "effective property list too long"); + return -EINVAL; + } + } + alt_ifname = nla_strdup(attr, GFP_KERNEL_ACCOUNT); if (!alt_ifname) return -ENOMEM;
From: Gal Pressman gal@nvidia.com
[ Upstream commit 970adfb76095fa719778d70a6b86030d2feb88dd ]
Unlike the legacy EEPROM callbacks, when using the netlink EEPROM query (get_module_eeprom_by_page) the driver should not try to validate the query parameters, but just perform the read requested by the userspace.
Recent discussion in the mailing list: https://lore.kernel.org/netdev/20220120093051.70845141@kicinski-fedora-PC1C0...
Signed-off-by: Gal Pressman gal@nvidia.com Reviewed-by: Ido Schimmel idosch@nvidia.com Reviewed-by: Maxim Mikityanskiy maximmi@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/mellanox/mlx5/core/port.c | 23 ------------------- 1 file changed, 23 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c b/drivers/net/ethernet/mellanox/mlx5/core/port.c index 7b16a1188aab..fd79860de723 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/port.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c @@ -433,35 +433,12 @@ int mlx5_query_module_eeprom_by_page(struct mlx5_core_dev *dev, struct mlx5_module_eeprom_query_params *params, u8 *data) { - u8 module_id; int err;
err = mlx5_query_module_num(dev, ¶ms->module_number); if (err) return err;
- err = mlx5_query_module_id(dev, params->module_number, &module_id); - if (err) - return err; - - switch (module_id) { - case MLX5_MODULE_ID_SFP: - if (params->page > 0) - return -EINVAL; - break; - case MLX5_MODULE_ID_QSFP: - case MLX5_MODULE_ID_QSFP28: - case MLX5_MODULE_ID_QSFP_PLUS: - if (params->page > 3) - return -EINVAL; - break; - case MLX5_MODULE_ID_DSFP: - break; - default: - mlx5_core_err(dev, "Module ID not recognized: 0x%x\n", module_id); - return -EINVAL; - } - if (params->i2c_address != MLX5_I2C_ADDR_HIGH && params->i2c_address != MLX5_I2C_ADDR_LOW) { mlx5_core_err(dev, "I2C address not recognized: 0x%x\n", params->i2c_address);
From: Michael Walle michael@walle.cc
[ Upstream commit 00eec9fe4f3b9588b4bfa8ef9dd0aae96407d5d7 ]
The Lantech 8330-262D-E module is 2500base-X capable, but it reports the nominal bitrate as 2500MBd instead of 3125MBd. Add a quirk for the module.
The following in an EEPROM dump of such a SFP with the serial number redacted:
00: 03 04 07 00 00 00 01 20 40 0c 05 01 19 00 00 00 ???...? @????... 10: 1e 0f 00 00 4c 61 6e 74 65 63 68 20 20 20 20 20 ??..Lantech 20: 20 20 20 20 00 00 00 00 38 33 33 30 2d 32 36 32 ....8330-262 30: 44 2d 45 20 20 20 20 20 56 31 2e 30 03 52 00 cb D-E V1.0?R.? 40: 00 1a 00 00 46 43 XX XX XX XX XX XX XX XX XX XX .?..FCXXXXXXXXXX 50: 20 20 20 20 32 32 30 32 31 34 20 20 68 b0 01 98 220214 h??? 60: 45 58 54 52 45 4d 45 4c 59 20 43 4f 4d 50 41 54 EXTREMELY COMPAT 70: 49 42 4c 45 20 20 20 20 20 20 20 20 20 20 20 20 IBLE
Signed-off-by: Michael Walle michael@walle.cc Link: https://lore.kernel.org/r/20220312205014.4154907-1-michael@walle.cc Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/phy/sfp-bus.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/net/phy/sfp-bus.c b/drivers/net/phy/sfp-bus.c index ef2c6a09eb0f..4369d6249e7b 100644 --- a/drivers/net/phy/sfp-bus.c +++ b/drivers/net/phy/sfp-bus.c @@ -74,6 +74,12 @@ static const struct sfp_quirk sfp_quirks[] = { .vendor = "HUAWEI", .part = "MA5671A", .modes = sfp_quirk_2500basex, + }, { + // Lantech 8330-262D-E can operate at 2500base-X, but + // incorrectly report 2500MBd NRZ in their EEPROM + .vendor = "Lantech", + .part = "8330-262D-E", + .modes = sfp_quirk_2500basex, }, { .vendor = "UBNT", .part = "UF-INSTANT",
From: H. Nikolaus Schaller hns@goldelico.com
[ Upstream commit ac01df343e5a6c6bcead2ed421af1fde30f73e7e ]
Usually, the vbus_regulator (smps10 on omap5evm) boots up disabled.
Hence calling regulator_disable() indirectly through dwc3_omap_set_mailbox() during probe leads to:
[ 10.332764] WARNING: CPU: 0 PID: 1628 at drivers/regulator/core.c:2853 _regulator_disable+0x40/0x164 [ 10.351919] unbalanced disables for smps10_out1 [ 10.361298] Modules linked in: dwc3_omap(+) clk_twl6040 at24 gpio_twl6040 palmas_gpadc palmas_pwrbutton industrialio snd_soc_omap_mcbsp(+) snd_soc_ti_sdma display_connector ti_tpd12s015 drm leds_gpio drm_panel_orientation_quirks ip_tables x_tables ipv6 autofs4 [ 10.387818] CPU: 0 PID: 1628 Comm: systemd-udevd Not tainted 5.17.0-rc1-letux-lpae+ #8139 [ 10.405129] Hardware name: Generic OMAP5 (Flattened Device Tree) [ 10.411455] unwind_backtrace from show_stack+0x10/0x14 [ 10.416970] show_stack from dump_stack_lvl+0x40/0x4c [ 10.422313] dump_stack_lvl from __warn+0xb8/0x170 [ 10.427377] __warn from warn_slowpath_fmt+0x70/0x9c [ 10.432595] warn_slowpath_fmt from _regulator_disable+0x40/0x164 [ 10.439037] _regulator_disable from regulator_disable+0x30/0x64 [ 10.445382] regulator_disable from dwc3_omap_set_mailbox+0x8c/0xf0 [dwc3_omap] [ 10.453116] dwc3_omap_set_mailbox [dwc3_omap] from dwc3_omap_probe+0x2b8/0x394 [dwc3_omap] [ 10.467021] dwc3_omap_probe [dwc3_omap] from platform_probe+0x58/0xa8 [ 10.481762] platform_probe from really_probe+0x168/0x2fc [ 10.481782] really_probe from __driver_probe_device+0xc4/0xd8 [ 10.481782] __driver_probe_device from driver_probe_device+0x24/0xa4 [ 10.503762] driver_probe_device from __driver_attach+0xc4/0xd8 [ 10.510018] __driver_attach from bus_for_each_dev+0x64/0xa0 [ 10.516001] bus_for_each_dev from bus_add_driver+0x148/0x1a4 [ 10.524880] bus_add_driver from driver_register+0xb4/0xf8 [ 10.530678] driver_register from do_one_initcall+0x90/0x1c4 [ 10.536661] do_one_initcall from do_init_module+0x4c/0x200 [ 10.536683] do_init_module from load_module+0x13dc/0x1910 [ 10.551159] load_module from sys_finit_module+0xc8/0xd8 [ 10.561319] sys_finit_module from __sys_trace_return+0x0/0x18 [ 10.561336] Exception stack(0xc344bfa8 to 0xc344bff0) [ 10.561341] bfa0: b6fb5778 b6fab8d8 00000007 b6ecfbb8 00000000 b6ed0398 [ 10.561341] bfc0: b6fb5778 b6fab8d8 855c0500 0000017b 00020000 b6f9a3cc 00000000 b6fb5778 [ 10.595500] bfe0: bede18f8 bede18e8 b6ec9aeb b6dda1c2 [ 10.601345] ---[ end trace 0000000000000000 ]---
Fix this unnecessary warning by checking if the regulator is enabled.
Signed-off-by: H. Nikolaus Schaller hns@goldelico.com Link: https://lore.kernel.org/r/af3b750dc2265d875deaabcf5f80098c9645da45.164674461... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/dwc3/dwc3-omap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/dwc3-omap.c b/drivers/usb/dwc3/dwc3-omap.c index e196673f5c64..efaf0db595f4 100644 --- a/drivers/usb/dwc3/dwc3-omap.c +++ b/drivers/usb/dwc3/dwc3-omap.c @@ -242,7 +242,7 @@ static void dwc3_omap_set_mailbox(struct dwc3_omap *omap, break;
case OMAP_DWC3_ID_FLOAT: - if (omap->vbus_reg) + if (omap->vbus_reg && regulator_is_enabled(omap->vbus_reg)) regulator_disable(omap->vbus_reg); val = dwc3_omap_read_utmi_ctrl(omap); val |= USBOTGSS_UTMI_OTG_CTRL_IDDIG;
From: Deren Wu deren.wu@mediatek.com
[ Upstream commit 123bc712b1de0805f9d683687e17b1ec2aba0b68 ]
mt7921s driver may receive frames with fragment buffers. If there is a CTS packet received in monitor mode, the payload is 10 bytes only and need 6 bytes header padding after RXD buffer. However, only RXD in the first linear buffer, if we pull buffer size RXD-size+6 bytes with skb_pull(), that would trigger "BUG_ON(skb->len < skb->data_len)" in __skb_pull().
To avoid the nonlinear buffer issue, enlarge the RXD size from 128 to 256 to make sure all MCU operation in linear buffer.
[ 52.007562] kernel BUG at include/linux/skbuff.h:2313! [ 52.007578] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [ 52.007987] pc : skb_pull+0x48/0x4c [ 52.008015] lr : mt7921_queue_rx_skb+0x494/0x890 [mt7921_common] [ 52.008361] Call trace: [ 52.008377] skb_pull+0x48/0x4c [ 52.008400] mt76s_net_worker+0x134/0x1b0 [mt76_sdio 35339a92c6eb7d4bbcc806a1d22f56365565135c] [ 52.008431] __mt76_worker_fn+0xe8/0x170 [mt76 ef716597d11a77150bc07e3fdd68eeb0f9b56917] [ 52.008449] kthread+0x148/0x3ac [ 52.008466] ret_from_fork+0x10/0x30
Signed-off-by: Lorenzo Bianconi lorenzo@kernel.org Signed-off-by: Sean Wang sean.wang@mediatek.com Signed-off-by: Deren Wu deren.wu@mediatek.com Signed-off-by: Felix Fietkau nbd@nbd.name Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/mediatek/mt76/mt76.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/mediatek/mt76/mt76.h b/drivers/net/wireless/mediatek/mt76/mt76.h index 4d01fd85283d..6e4d69715927 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76.h +++ b/drivers/net/wireless/mediatek/mt76/mt76.h @@ -19,7 +19,7 @@
#define MT_MCU_RING_SIZE 32 #define MT_RX_BUF_SIZE 2048 -#define MT_SKB_HEAD_LEN 128 +#define MT_SKB_HEAD_LEN 256
#define MT_MAX_NON_AQL_PKT 16 #define MT_TXQ_FREE_THR 32
From: Max Filippov jcmvbkbc@gmail.com
[ Upstream commit e85d29ba4b24f68e7a78cb85c55e754362eeb2de ]
DTC issues the following warnings when building xtfpga device trees:
/soc/flash@00000000/partition@0x0: unit name should not have leading "0x" /soc/flash@00000000/partition@0x6000000: unit name should not have leading "0x" /soc/flash@00000000/partition@0x6800000: unit name should not have leading "0x" /soc/flash@00000000/partition@0x7fe0000: unit name should not have leading "0x"
Drop leading 0x from flash partition unit names.
Signed-off-by: Max Filippov jcmvbkbc@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/xtensa/boot/dts/xtfpga-flash-128m.dtsi | 8 ++++---- arch/xtensa/boot/dts/xtfpga-flash-16m.dtsi | 8 ++++---- arch/xtensa/boot/dts/xtfpga-flash-4m.dtsi | 4 ++-- 3 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/arch/xtensa/boot/dts/xtfpga-flash-128m.dtsi b/arch/xtensa/boot/dts/xtfpga-flash-128m.dtsi index 9bf8bad1dd18..c33932568aa7 100644 --- a/arch/xtensa/boot/dts/xtfpga-flash-128m.dtsi +++ b/arch/xtensa/boot/dts/xtfpga-flash-128m.dtsi @@ -8,19 +8,19 @@ reg = <0x00000000 0x08000000>; bank-width = <2>; device-width = <2>; - partition@0x0 { + partition@0 { label = "data"; reg = <0x00000000 0x06000000>; }; - partition@0x6000000 { + partition@6000000 { label = "boot loader area"; reg = <0x06000000 0x00800000>; }; - partition@0x6800000 { + partition@6800000 { label = "kernel image"; reg = <0x06800000 0x017e0000>; }; - partition@0x7fe0000 { + partition@7fe0000 { label = "boot environment"; reg = <0x07fe0000 0x00020000>; }; diff --git a/arch/xtensa/boot/dts/xtfpga-flash-16m.dtsi b/arch/xtensa/boot/dts/xtfpga-flash-16m.dtsi index 40c2f81f7cb6..7bde2ab2d6fb 100644 --- a/arch/xtensa/boot/dts/xtfpga-flash-16m.dtsi +++ b/arch/xtensa/boot/dts/xtfpga-flash-16m.dtsi @@ -8,19 +8,19 @@ reg = <0x08000000 0x01000000>; bank-width = <2>; device-width = <2>; - partition@0x0 { + partition@0 { label = "boot loader area"; reg = <0x00000000 0x00400000>; }; - partition@0x400000 { + partition@400000 { label = "kernel image"; reg = <0x00400000 0x00600000>; }; - partition@0xa00000 { + partition@a00000 { label = "data"; reg = <0x00a00000 0x005e0000>; }; - partition@0xfe0000 { + partition@fe0000 { label = "boot environment"; reg = <0x00fe0000 0x00020000>; }; diff --git a/arch/xtensa/boot/dts/xtfpga-flash-4m.dtsi b/arch/xtensa/boot/dts/xtfpga-flash-4m.dtsi index fb8d3a9f33c2..0655b868749a 100644 --- a/arch/xtensa/boot/dts/xtfpga-flash-4m.dtsi +++ b/arch/xtensa/boot/dts/xtfpga-flash-4m.dtsi @@ -8,11 +8,11 @@ reg = <0x08000000 0x00400000>; bank-width = <2>; device-width = <2>; - partition@0x0 { + partition@0 { label = "boot loader area"; reg = <0x00000000 0x003f0000>; }; - partition@0x3f0000 { + partition@3f0000 { label = "boot environment"; reg = <0x003f0000 0x00010000>; };
From: Krzysztof Kozlowski krzysztof.kozlowski@canonical.com
[ Upstream commit 8931ddd8d6a55fcefb20f44a38ba42bb746f0b62 ]
Unit node addresses should not have leading 0x:
Warning (unit_address_format): /nemc@13410000/efuse@d0/eth-mac-addr@0x22: unit name should not have leading "0x"
Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@canonical.com Reviewed-by: Paul Cercueil paul@crapouillou.net Signed-off-by: Thomas Bogendoerfer tsbogend@alpha.franken.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/mips/boot/dts/ingenic/jz4780.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/mips/boot/dts/ingenic/jz4780.dtsi b/arch/mips/boot/dts/ingenic/jz4780.dtsi index 9e34f433b9b5..efbbddaf0fde 100644 --- a/arch/mips/boot/dts/ingenic/jz4780.dtsi +++ b/arch/mips/boot/dts/ingenic/jz4780.dtsi @@ -450,7 +450,7 @@ #address-cells = <1>; #size-cells = <1>;
- eth0_addr: eth-mac-addr@0x22 { + eth0_addr: eth-mac-addr@22 { reg = <0x22 0x6>; }; };
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit f63d24baff787e13b723d86fe036f84bdbc35045 ]
This fixes the following trace caused by receiving HCI_EV_DISCONN_PHY_LINK_COMPLETE which does call hci_conn_del without first checking if conn->type is in fact AMP_LINK and in case it is do properly cleanup upper layers with hci_disconn_cfm:
================================================================== BUG: KASAN: use-after-free in hci_send_acl+0xaba/0xc50 Read of size 8 at addr ffff88800e404818 by task bluetoothd/142
CPU: 0 PID: 142 Comm: bluetoothd Not tainted 5.17.0-rc5-00006-gda4022eeac1a #7 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x45/0x59 print_address_description.constprop.0+0x1f/0x150 kasan_report.cold+0x7f/0x11b hci_send_acl+0xaba/0xc50 l2cap_do_send+0x23f/0x3d0 l2cap_chan_send+0xc06/0x2cc0 l2cap_sock_sendmsg+0x201/0x2b0 sock_sendmsg+0xdc/0x110 sock_write_iter+0x20f/0x370 do_iter_readv_writev+0x343/0x690 do_iter_write+0x132/0x640 vfs_writev+0x198/0x570 do_writev+0x202/0x280 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RSP: 002b:00007ffce8a099b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000014 Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 14 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 RDX: 0000000000000001 RSI: 00007ffce8a099e0 RDI: 0000000000000015 RAX: ffffffffffffffda RBX: 00007ffce8a099e0 RCX: 00007f788fc3cf77 R10: 00007ffce8af7080 R11: 0000000000000246 R12: 000055e4ccf75580 RBP: 0000000000000015 R08: 0000000000000002 R09: 0000000000000001 </TASK> R13: 000055e4ccf754a0 R14: 000055e4ccf75cd0 R15: 000055e4ccf4a6b0
Allocated by task 45: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x81/0xa0 hci_chan_create+0x9a/0x2f0 l2cap_conn_add.part.0+0x1a/0xdc0 l2cap_connect_cfm+0x236/0x1000 le_conn_complete_evt+0x15a7/0x1db0 hci_le_conn_complete_evt+0x226/0x2c0 hci_le_meta_evt+0x247/0x450 hci_event_packet+0x61b/0xe90 hci_rx_work+0x4d5/0xc50 process_one_work+0x8fb/0x15a0 worker_thread+0x576/0x1240 kthread+0x29d/0x340 ret_from_fork+0x1f/0x30
Freed by task 45: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_set_free_info+0x20/0x30 __kasan_slab_free+0xfb/0x130 kfree+0xac/0x350 hci_conn_cleanup+0x101/0x6a0 hci_conn_del+0x27e/0x6c0 hci_disconn_phylink_complete_evt+0xe0/0x120 hci_event_packet+0x812/0xe90 hci_rx_work+0x4d5/0xc50 process_one_work+0x8fb/0x15a0 worker_thread+0x576/0x1240 kthread+0x29d/0x340 ret_from_fork+0x1f/0x30
The buggy address belongs to the object at ffff88800c0f0500 The buggy address is located 24 bytes inside of which belongs to the cache kmalloc-128 of size 128 The buggy address belongs to the page: 128-byte region [ffff88800c0f0500, ffff88800c0f0580) flags: 0x100000000000200(slab|node=0|zone=1) page:00000000fe45cd86 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xc0f0 raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000 raw: 0100000000000200 ffffea00003a2c80 dead000000000004 ffff8880078418c0 page dumped because: kasan: bad access detected ffff88800c0f0400: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc Memory state around the buggy address: >ffff88800c0f0500: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88800c0f0480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88800c0f0580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ ================================================================== ffff88800c0f0600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Reported-by: Sönke Huster soenke.huster@eknoes.de Tested-by: Sönke Huster soenke.huster@eknoes.de Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/hci_event.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c index 868a22df3285..e984a8b4b914 100644 --- a/net/bluetooth/hci_event.c +++ b/net/bluetooth/hci_event.c @@ -5153,8 +5153,9 @@ static void hci_disconn_phylink_complete_evt(struct hci_dev *hdev, hci_dev_lock(hdev);
hcon = hci_conn_hash_lookup_handle(hdev, ev->phy_handle); - if (hcon) { + if (hcon && hcon->type == AMP_LINK) { hcon->state = BT_CLOSED; + hci_disconn_cfm(hcon, ev->reason); hci_conn_del(hcon); }
From: Florian Westphal fw@strlen.de
[ Upstream commit 2cfadb761d3d0219412fd8150faea60c7e863833 ]
as of commit 4608fdfc07e1 ("netfilter: conntrack: collect all entries in one cycle") conntrack gc was changed to run every 2 minutes.
On systems where conntrack hash table is set to large value, most evictions happen from gc worker rather than the packet path due to hash table distribution.
This causes netlink event overflows when events are collected.
This change collects average expiry of scanned entries and reschedules to the average remaining value, within 1 to 60 second interval.
To avoid event overflows, reschedule after each bucket and add a limit for both run time and number of evictions per run.
If more entries have to be evicted, reschedule and restart 1 jiffy into the future.
Reported-by: Karel Rericha karel@maxtel.cz Cc: Shmulik Ladkani shmulik.ladkani@gmail.com Cc: Eyal Birger eyal.birger@gmail.com Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_conntrack_core.c | 85 ++++++++++++++++++++++++------- 1 file changed, 68 insertions(+), 17 deletions(-)
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index 917e708a4561..3a98a1316307 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -66,6 +66,8 @@ EXPORT_SYMBOL_GPL(nf_conntrack_hash); struct conntrack_gc_work { struct delayed_work dwork; u32 next_bucket; + u32 avg_timeout; + u32 start_time; bool exiting; bool early_drop; }; @@ -77,8 +79,19 @@ static __read_mostly bool nf_conntrack_locks_all; /* serialize hash resizes and nf_ct_iterate_cleanup */ static DEFINE_MUTEX(nf_conntrack_mutex);
-#define GC_SCAN_INTERVAL (120u * HZ) +#define GC_SCAN_INTERVAL_MAX (60ul * HZ) +#define GC_SCAN_INTERVAL_MIN (1ul * HZ) + +/* clamp timeouts to this value (TCP unacked) */ +#define GC_SCAN_INTERVAL_CLAMP (300ul * HZ) + +/* large initial bias so that we don't scan often just because we have + * three entries with a 1s timeout. + */ +#define GC_SCAN_INTERVAL_INIT INT_MAX + #define GC_SCAN_MAX_DURATION msecs_to_jiffies(10) +#define GC_SCAN_EXPIRED_MAX (64000u / HZ)
#define MIN_CHAINLEN 8u #define MAX_CHAINLEN (32u - MIN_CHAINLEN) @@ -1420,16 +1433,28 @@ static bool gc_worker_can_early_drop(const struct nf_conn *ct)
static void gc_worker(struct work_struct *work) { - unsigned long end_time = jiffies + GC_SCAN_MAX_DURATION; unsigned int i, hashsz, nf_conntrack_max95 = 0; - unsigned long next_run = GC_SCAN_INTERVAL; + u32 end_time, start_time = nfct_time_stamp; struct conntrack_gc_work *gc_work; + unsigned int expired_count = 0; + unsigned long next_run; + s32 delta_time; + gc_work = container_of(work, struct conntrack_gc_work, dwork.work);
i = gc_work->next_bucket; if (gc_work->early_drop) nf_conntrack_max95 = nf_conntrack_max / 100u * 95u;
+ if (i == 0) { + gc_work->avg_timeout = GC_SCAN_INTERVAL_INIT; + gc_work->start_time = start_time; + } + + next_run = gc_work->avg_timeout; + + end_time = start_time + GC_SCAN_MAX_DURATION; + do { struct nf_conntrack_tuple_hash *h; struct hlist_nulls_head *ct_hash; @@ -1446,6 +1471,7 @@ static void gc_worker(struct work_struct *work)
hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[i], hnnode) { struct nf_conntrack_net *cnet; + unsigned long expires; struct net *net;
tmp = nf_ct_tuplehash_to_ctrack(h); @@ -1455,11 +1481,29 @@ static void gc_worker(struct work_struct *work) continue; }
+ if (expired_count > GC_SCAN_EXPIRED_MAX) { + rcu_read_unlock(); + + gc_work->next_bucket = i; + gc_work->avg_timeout = next_run; + + delta_time = nfct_time_stamp - gc_work->start_time; + + /* re-sched immediately if total cycle time is exceeded */ + next_run = delta_time < (s32)GC_SCAN_INTERVAL_MAX; + goto early_exit; + } + if (nf_ct_is_expired(tmp)) { nf_ct_gc_expired(tmp); + expired_count++; continue; }
+ expires = clamp(nf_ct_expires(tmp), GC_SCAN_INTERVAL_MIN, GC_SCAN_INTERVAL_CLAMP); + next_run += expires; + next_run /= 2u; + if (nf_conntrack_max95 == 0 || gc_worker_skip_ct(tmp)) continue;
@@ -1477,8 +1521,10 @@ static void gc_worker(struct work_struct *work) continue; }
- if (gc_worker_can_early_drop(tmp)) + if (gc_worker_can_early_drop(tmp)) { nf_ct_kill(tmp); + expired_count++; + }
nf_ct_put(tmp); } @@ -1491,33 +1537,38 @@ static void gc_worker(struct work_struct *work) cond_resched(); i++;
- if (time_after(jiffies, end_time) && i < hashsz) { + delta_time = nfct_time_stamp - end_time; + if (delta_time > 0 && i < hashsz) { + gc_work->avg_timeout = next_run; gc_work->next_bucket = i; next_run = 0; - break; + goto early_exit; } } while (i < hashsz);
+ gc_work->next_bucket = 0; + + next_run = clamp(next_run, GC_SCAN_INTERVAL_MIN, GC_SCAN_INTERVAL_MAX); + + delta_time = max_t(s32, nfct_time_stamp - gc_work->start_time, 1); + if (next_run > (unsigned long)delta_time) + next_run -= delta_time; + else + next_run = 1; + +early_exit: if (gc_work->exiting) return;
- /* - * Eviction will normally happen from the packet path, and not - * from this gc worker. - * - * This worker is only here to reap expired entries when system went - * idle after a busy period. - */ - if (next_run) { + if (next_run) gc_work->early_drop = false; - gc_work->next_bucket = 0; - } + queue_delayed_work(system_power_efficient_wq, &gc_work->dwork, next_run); }
static void conntrack_gc_work_init(struct conntrack_gc_work *gc_work) { - INIT_DEFERRABLE_WORK(&gc_work->dwork, gc_worker); + INIT_DELAYED_WORK(&gc_work->dwork, gc_worker); gc_work->exiting = false; }
From: Wang Yufen wangyufen@huawei.com
[ Upstream commit f22881de730ebd472e15bcc2c0d1d46e36a87b9c ]
In calipso_map_cat_ntoh(), in the for loop, if the return value of netlbl_bitmap_walk() is equal to (net_clen_bits - 1), when netlbl_bitmap_walk() is called next time, out-of-bounds memory accesses of bitmap[byte_offset] occurs.
The bug was found during fuzzing. The following is the fuzzing report BUG: KASAN: slab-out-of-bounds in netlbl_bitmap_walk+0x3c/0xd0 Read of size 1 at addr ffffff8107bf6f70 by task err_OH/252
CPU: 7 PID: 252 Comm: err_OH Not tainted 5.17.0-rc7+ #17 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x21c/0x230 show_stack+0x1c/0x60 dump_stack_lvl+0x64/0x7c print_address_description.constprop.0+0x70/0x2d0 __kasan_report+0x158/0x16c kasan_report+0x74/0x120 __asan_load1+0x80/0xa0 netlbl_bitmap_walk+0x3c/0xd0 calipso_opt_getattr+0x1a8/0x230 calipso_sock_getattr+0x218/0x340 calipso_sock_getattr+0x44/0x60 netlbl_sock_getattr+0x44/0x80 selinux_netlbl_socket_setsockopt+0x138/0x170 selinux_socket_setsockopt+0x4c/0x60 security_socket_setsockopt+0x4c/0x90 __sys_setsockopt+0xbc/0x2b0 __arm64_sys_setsockopt+0x6c/0x84 invoke_syscall+0x64/0x190 el0_svc_common.constprop.0+0x88/0x200 do_el0_svc+0x88/0xa0 el0_svc+0x128/0x1b0 el0t_64_sync_handler+0x9c/0x120 el0t_64_sync+0x16c/0x170
Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Wang Yufen wangyufen@huawei.com Acked-by: Paul Moore paul@paul-moore.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/netlabel/netlabel_kapi.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/netlabel/netlabel_kapi.c b/net/netlabel/netlabel_kapi.c index beb0e573266d..54c083003947 100644 --- a/net/netlabel/netlabel_kapi.c +++ b/net/netlabel/netlabel_kapi.c @@ -885,6 +885,8 @@ int netlbl_bitmap_walk(const unsigned char *bitmap, u32 bitmap_len, unsigned char bitmask; unsigned char byte;
+ if (offset >= bitmap_len) + return -1; byte_offset = offset / 8; byte = bitmap[byte_offset]; bit_spot = offset;
From: Xiubo Li xiubli@redhat.com
[ Upstream commit 322794d3355c33adcc4feace0045d85a8e4ed813 ]
The ceph_get_inode() will search for or insert a new inode into the hash for the given vino, and return a reference to it. If new is non-NULL, its reference is consumed.
We should release the reference when in error handing cases.
Signed-off-by: Xiubo Li xiubli@redhat.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ceph/inode.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index 1c7574105478..42e449d3f18b 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -87,13 +87,13 @@ struct inode *ceph_get_snapdir(struct inode *parent) if (!S_ISDIR(parent->i_mode)) { pr_warn_once("bad snapdir parent type (mode=0%o)\n", parent->i_mode); - return ERR_PTR(-ENOTDIR); + goto err; }
if (!(inode->i_state & I_NEW) && !S_ISDIR(inode->i_mode)) { pr_warn_once("bad snapdir inode type (mode=0%o)\n", inode->i_mode); - return ERR_PTR(-ENOTDIR); + goto err; }
inode->i_mode = parent->i_mode; @@ -113,6 +113,12 @@ struct inode *ceph_get_snapdir(struct inode *parent) }
return inode; +err: + if ((inode->i_state & I_NEW)) + discard_new_inode(inode); + else + iput(inode); + return ERR_PTR(-ENOTDIR); }
const struct inode_operations ceph_file_iops = {
From: Xiubo Li xiubli@redhat.com
[ Upstream commit f639d9867eea647005dc824e0e24f39ffc50d4e4 ]
Reset the last_readdir at the same time, and add a comment explaining why we don't free last_readdir when dir_emit returns false.
Signed-off-by: Xiubo Li xiubli@redhat.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ceph/dir.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c index 133dbd9338e7..d91fa53e12b3 100644 --- a/fs/ceph/dir.c +++ b/fs/ceph/dir.c @@ -478,8 +478,11 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx) 2 : (fpos_off(rde->offset) + 1); err = note_last_dentry(dfi, rde->name, rde->name_len, next_offset); - if (err) + if (err) { + ceph_mdsc_put_request(dfi->last_readdir); + dfi->last_readdir = NULL; return err; + } } else if (req->r_reply_info.dir_end) { dfi->next_offset = 2; /* keep last name */ @@ -520,6 +523,12 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx) if (!dir_emit(ctx, rde->name, rde->name_len, ceph_present_ino(inode->i_sb, le64_to_cpu(rde->inode.in->ino)), le32_to_cpu(rde->inode.in->mode) >> 12)) { + /* + * NOTE: Here no need to put the 'dfi->last_readdir', + * because when dir_emit stops us it's most likely + * doesn't have enough memory, etc. So for next readdir + * it will continue. + */ dout("filldir stopping us...\n"); return 0; }
From: Feng Tang feng.tang@intel.com
[ Upstream commit 1bf18da62106225dbc47aab41efee2aeb99caccd ]
0Day robots reported there is compiling issue for 'csky' ARCH when CONFIG_DEBUG_FORCE_DATA_SECTION_ALIGNED is enabled [1]:
All errors (new ones prefixed by >>):
{standard input}: Assembler messages:
{standard input}:2277: Error: pcrel offset for branch to .LS000B too far (0x3c)
Which was discussed in [2]. And as there is no solution for csky yet, add some dependency for this config to limit it to several ARCHs which have no compiling issue so far.
[1]. https://lore.kernel.org/lkml/202202271612.W32UJAj2-lkp@intel.com/ [2]. https://www.spinics.net/lists/linux-kbuild/msg30298.html
Link: https://lkml.kernel.org/r/20220304021100.GN4548@shbuild999.sh.intel.com Reported-by: kernel test robot lkp@intel.com Signed-off-by: Feng Tang feng.tang@intel.com Cc: Guo Ren guoren@kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- lib/Kconfig.debug | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 2a9b6dcdac4f..55e89b237b6f 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -414,7 +414,8 @@ config SECTION_MISMATCH_WARN_ONLY If unsure, say Y.
config DEBUG_FORCE_FUNCTION_ALIGN_64B - bool "Force all function address 64B aligned" if EXPERT + bool "Force all function address 64B aligned" + depends on EXPERT && (X86_64 || ARM64 || PPC32 || PPC64 || ARC) help There are cases that a commit from one domain changes the function address alignment of other domains, and cause magic performance
From: Randy Dunlap rdunlap@infradead.org
[ Upstream commit f9a40b0890658330c83c95511f9d6b396610defc ]
initcall_blacklist() should return 1 to indicate that it handled its cmdline arguments.
set_debug_rodata() should return 1 to indicate that it handled its cmdline arguments. Print a warning if the option string is invalid.
This prevents these strings from being added to the 'init' program's environment as they are not init arguments/parameters.
Link: https://lkml.kernel.org/r/20220221050901.23985-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap rdunlap@infradead.org Reported-by: Igor Zhbanov i.zhbanov@omprussia.ru Cc: Ingo Molnar mingo@kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- init/main.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/init/main.c b/init/main.c index b340d990d77c..06b98350ebd2 100644 --- a/init/main.c +++ b/init/main.c @@ -1198,7 +1198,7 @@ static int __init initcall_blacklist(char *str) } } while (str_entry);
- return 0; + return 1; }
static bool __init_or_module initcall_blacklisted(initcall_t fn) @@ -1460,7 +1460,9 @@ static noinline void __init kernel_init_freeable(void); bool rodata_enabled __ro_after_init = true; static int __init set_debug_rodata(char *str) { - return strtobool(str, &rodata_enabled); + if (strtobool(str, &rodata_enabled)) + pr_warn("Invalid option string for rodata: '%s'\n", str); + return 1; } __setup("rodata=", set_debug_rodata); #endif
From: Qinghua Jin qhjin.dev@gmail.com
[ Upstream commit 9ce3c0d26c42d279b6c378a03cd6a61d828f19ca ]
Testcase: 1. create a minix file system and mount it 2. open a file on the file system with O_RDWR|O_CREAT|O_TRUNC|O_DIRECT 3. open fails with -EINVAL but leaves an empty file behind. All other open() failures don't leave the failed open files behind.
It is hard to check the direct_IO op before creating the inode. Just as ext4 and btrfs do, this patch will resolve the issue by allowing to create the file with O_DIRECT but returning error when writing the file.
Link: https://lkml.kernel.org/r/20220107133626.413379-1-qhjin.dev@gmail.com Signed-off-by: Qinghua Jin qhjin.dev@gmail.com Reported-by: Colin Ian King colin.king@intel.com Reviewed-by: Jan Kara jack@suse.cz Acked-by: Christian Brauner christian.brauner@ubuntu.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/minix/inode.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/minix/inode.c b/fs/minix/inode.c index a71f1cf894b9..d4bd94234ef7 100644 --- a/fs/minix/inode.c +++ b/fs/minix/inode.c @@ -447,7 +447,8 @@ static const struct address_space_operations minix_aops = { .writepage = minix_writepage, .write_begin = minix_write_begin, .write_end = generic_write_end, - .bmap = minix_bmap + .bmap = minix_bmap, + .direct_IO = noop_direct_IO };
static const struct inode_operations minix_symlink_inode_operations = {
From: Adam Wujek dev_public@wujek.eu
[ Upstream commit 2a8b539433e111c4de364237627ef219d2f6350a ]
SI5341_OUT_CFG_RDIV_FORCE2 shall be checked first to distinguish whether a divider for a given output is set to 2 (SI5341_OUT_CFG_RDIV_FORCE2 is set) or the output is disabled (SI5341_OUT_CFG_RDIV_FORCE2 not set, SI5341_OUT_R_REG is set 0). Before the change, divider set to 2 (SI5341_OUT_R_REG set to 0) was interpreted as output is disabled.
Signed-off-by: Adam Wujek dev_public@wujek.eu Link: https://lore.kernel.org/r/20211203141125.2447520-1-dev_public@wujek.eu Reviewed-by: Robert Hancock robert.hancock@calian.com Signed-off-by: Stephen Boyd sboyd@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clk/clk-si5341.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/drivers/clk/clk-si5341.c b/drivers/clk/clk-si5341.c index f7b41366666e..4de098b6b0d4 100644 --- a/drivers/clk/clk-si5341.c +++ b/drivers/clk/clk-si5341.c @@ -798,6 +798,15 @@ static unsigned long si5341_output_clk_recalc_rate(struct clk_hw *hw, u32 r_divider; u8 r[3];
+ err = regmap_read(output->data->regmap, + SI5341_OUT_CONFIG(output), &val); + if (err < 0) + return err; + + /* If SI5341_OUT_CFG_RDIV_FORCE2 is set, r_divider is 2 */ + if (val & SI5341_OUT_CFG_RDIV_FORCE2) + return parent_rate / 2; + err = regmap_bulk_read(output->data->regmap, SI5341_OUT_R_REG(output), r, 3); if (err < 0) @@ -814,13 +823,6 @@ static unsigned long si5341_output_clk_recalc_rate(struct clk_hw *hw, r_divider += 1; r_divider <<= 1;
- err = regmap_read(output->data->regmap, - SI5341_OUT_CONFIG(output), &val); - if (err < 0) - return err; - - if (val & SI5341_OUT_CFG_RDIV_FORCE2) - r_divider = 2;
return parent_rate / r_divider; }
From: Stefan Wahren stefan.wahren@i2se.com
[ Upstream commit aa899e686d442c63d50f4d369cc02dbbf0941cb0 ]
vchiq_get_state() can return a NULL pointer. So handle this cases and avoid a NULL pointer derefence in vchiq_dump_platform_instances.
Reviewed-by: Nicolas Saenz Julienne nsaenz@kernel.org Signed-off-by: Stefan Wahren stefan.wahren@i2se.com Link: https://lore.kernel.org/r/1642968143-19281-17-git-send-email-stefan.wahren@i... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c index ea9a53bdb417..099359fc0115 100644 --- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c +++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c @@ -1189,6 +1189,9 @@ int vchiq_dump_platform_instances(void *dump_context) int len; int i;
+ if (!state) + return -ENOTCONN; + /* * There is no list of instances, so instead scan all services, * marking those that have been dumped.
From: Stefan Wahren stefan.wahren@i2se.com
[ Upstream commit ca225857faf237234d2fffe5d1919467dfadd822 ]
In case of an invalid handle the function find_servive_by_handle returns NULL. So take care of this and avoid a NULL pointer dereference.
Reviewed-by: Nicolas Saenz Julienne nsaenz@kernel.org Signed-off-by: Stefan Wahren stefan.wahren@i2se.com Link: https://lore.kernel.org/r/1642968143-19281-18-git-send-email-stefan.wahren@i... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../staging/vc04_services/interface/vchiq_arm/vchiq_core.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_core.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_core.c index 9429b8a642fb..630ed0dc24c3 100644 --- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_core.c +++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_core.c @@ -2421,6 +2421,9 @@ void vchiq_msg_queue_push(unsigned int handle, struct vchiq_header *header) struct vchiq_service *service = find_service_by_handle(handle); int pos;
+ if (!service) + return; + while (service->msg_queue_write == service->msg_queue_read + VCHIQ_MAX_SLOTS) { if (wait_for_completion_interruptible(&service->msg_queue_pop)) @@ -2441,6 +2444,9 @@ struct vchiq_header *vchiq_msg_hold(unsigned int handle) struct vchiq_header *header; int pos;
+ if (!service) + return NULL; + if (service->msg_queue_write == service->msg_queue_read) return NULL;
From: Amjad Ouled-Ameur aouledameur@baylibre.com
[ Upstream commit 2f87727130ce17ffefecd0895eeebf22d5a36f6f ]
Use reset_control_rearm() call if an error occurs in case phy_meson_gxl_usb2_init() fails after reset() has been called ; or in case phy_meson_gxl_usb2_exit() is called i.e the resource is no longer used and the reset line may be triggered again by other devices.
reset_control_rearm() keeps use of triggered_count sane in the reset framework. Therefore, use of reset_control_reset() on shared reset line should be balanced with reset_control_rearm().
Signed-off-by: Amjad Ouled-Ameur aouledameur@baylibre.com Reported-by: Jerome Brunet jbrunet@baylibre.com Reviewed-by: Martin Blumenstingl martin.blumenstingl@googlemail.com Reviewed-by: Philipp Zabel p.zabel@pengutronix.de Acked-by: Neil Armstrong narmstrong@baylibre.com Link: https://lore.kernel.org/r/20220111095255.176141-2-aouledameur@baylibre.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/phy/amlogic/phy-meson-gxl-usb2.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/phy/amlogic/phy-meson-gxl-usb2.c b/drivers/phy/amlogic/phy-meson-gxl-usb2.c index 2b3c0d730f20..db17c3448bfe 100644 --- a/drivers/phy/amlogic/phy-meson-gxl-usb2.c +++ b/drivers/phy/amlogic/phy-meson-gxl-usb2.c @@ -114,8 +114,10 @@ static int phy_meson_gxl_usb2_init(struct phy *phy) return ret;
ret = clk_prepare_enable(priv->clk); - if (ret) + if (ret) { + reset_control_rearm(priv->reset); return ret; + }
return 0; } @@ -125,6 +127,7 @@ static int phy_meson_gxl_usb2_exit(struct phy *phy) struct phy_meson_gxl_usb2_priv *priv = phy_get_drvdata(phy);
clk_disable_unprepare(priv->clk); + reset_control_rearm(priv->reset);
return 0; }
From: Amjad Ouled-Ameur aouledameur@baylibre.com
[ Upstream commit 6466ba1898d415b527e1013bd8551a6fdfece94c ]
Use the existing dev_err_probe() helper instead of open-coding the same operation.
Signed-off-by: Amjad Ouled-Ameur aouledameur@baylibre.com Reported-by: Martin Blumenstingl martin.blumenstingl@googlemail.com Reviewed-by: Martin Blumenstingl martin.blumenstingl@googlemail.com Acked-by: Neil Armstrong narmstrong@baylibre.com Link: https://lore.kernel.org/r/20220111095255.176141-3-aouledameur@baylibre.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/phy/amlogic/phy-meson8b-usb2.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/phy/amlogic/phy-meson8b-usb2.c b/drivers/phy/amlogic/phy-meson8b-usb2.c index cf10bed40528..77e7e9b1428c 100644 --- a/drivers/phy/amlogic/phy-meson8b-usb2.c +++ b/drivers/phy/amlogic/phy-meson8b-usb2.c @@ -265,8 +265,9 @@ static int phy_meson8b_usb2_probe(struct platform_device *pdev) return PTR_ERR(priv->clk_usb);
priv->reset = devm_reset_control_get_optional_shared(&pdev->dev, NULL); - if (PTR_ERR(priv->reset) == -EPROBE_DEFER) - return PTR_ERR(priv->reset); + if (IS_ERR(priv->reset)) + return dev_err_probe(&pdev->dev, PTR_ERR(priv->reset), + "Failed to get the reset line");
priv->dr_mode = of_usb_get_dr_mode_by_phy(pdev->dev.of_node, -1); if (priv->dr_mode == USB_DR_MODE_UNKNOWN) {
From: Amjad Ouled-Ameur aouledameur@baylibre.com
[ Upstream commit 6f1dedf089ab1a4f03ea7aadc3c4a99885b4b4a0 ]
Use reset_control_rearm() call if an error occurs in case phy_meson8b_usb2_power_on() fails after reset() has been called, or in case phy_meson8b_usb2_power_off() is called i.e the resource is no longer used and the reset line may be triggered again by other devices.
reset_control_rearm() keeps use of triggered_count sane in the reset framework, use of reset_control_reset() on shared reset line should be balanced with reset_control_rearm().
Signed-off-by: Amjad Ouled-Ameur aouledameur@baylibre.com Reported-by: Jerome Brunet jbrunet@baylibre.com Reviewed-by: Martin Blumenstingl martin.blumenstingl@googlemail.com Acked-by: Neil Armstrong narmstrong@baylibre.com Link: https://lore.kernel.org/r/20220111095255.176141-4-aouledameur@baylibre.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/phy/amlogic/phy-meson8b-usb2.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/phy/amlogic/phy-meson8b-usb2.c b/drivers/phy/amlogic/phy-meson8b-usb2.c index 77e7e9b1428c..dd96763911b8 100644 --- a/drivers/phy/amlogic/phy-meson8b-usb2.c +++ b/drivers/phy/amlogic/phy-meson8b-usb2.c @@ -154,6 +154,7 @@ static int phy_meson8b_usb2_power_on(struct phy *phy) ret = clk_prepare_enable(priv->clk_usb_general); if (ret) { dev_err(&phy->dev, "Failed to enable USB general clock\n"); + reset_control_rearm(priv->reset); return ret; }
@@ -161,6 +162,7 @@ static int phy_meson8b_usb2_power_on(struct phy *phy) if (ret) { dev_err(&phy->dev, "Failed to enable USB DDR clock\n"); clk_disable_unprepare(priv->clk_usb_general); + reset_control_rearm(priv->reset); return ret; }
@@ -199,6 +201,7 @@ static int phy_meson8b_usb2_power_on(struct phy *phy) dev_warn(&phy->dev, "USB ID detect failed!\n"); clk_disable_unprepare(priv->clk_usb); clk_disable_unprepare(priv->clk_usb_general); + reset_control_rearm(priv->reset); return -EINVAL; } } @@ -218,6 +221,7 @@ static int phy_meson8b_usb2_power_off(struct phy *phy)
clk_disable_unprepare(priv->clk_usb); clk_disable_unprepare(priv->clk_usb_general); + reset_control_rearm(priv->reset);
/* power off the PHY by putting it into reset mode */ regmap_update_bits(priv->regmap, REG_CTRL, REG_CTRL_POWER_ON_RESET,
From: Sascha Hauer s.hauer@pengutronix.de
[ Upstream commit ff3187eabb5ce478d15b6ed62eb286756adefac3 ]
The pixel clocks dclk_vop[012] can be clocked from hpll, vpll, gpll or cpll. gpll and cpll also drive many other clocks, so changing the dclk_vop[012] clocks could change these other clocks as well. Drop CLK_SET_RATE_PARENT to fix that. With this change the VOP2 driver can only adjust the pixel clocks with the divider between the PLL and the dclk_vop[012] which means the user may have to adjust the PLL clock to a suitable rate using the assigned-clock-rate device tree property.
Signed-off-by: Sascha Hauer s.hauer@pengutronix.de Link: https://lore.kernel.org/r/20220126145549.617165-25-s.hauer@pengutronix.de Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clk/rockchip/clk-rk3568.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/clk/rockchip/clk-rk3568.c b/drivers/clk/rockchip/clk-rk3568.c index 75ca855e720d..6e5440841d1e 100644 --- a/drivers/clk/rockchip/clk-rk3568.c +++ b/drivers/clk/rockchip/clk-rk3568.c @@ -1038,13 +1038,13 @@ static struct rockchip_clk_branch rk3568_clk_branches[] __initdata = { RK3568_CLKGATE_CON(20), 8, GFLAGS), GATE(HCLK_VOP, "hclk_vop", "hclk_vo", 0, RK3568_CLKGATE_CON(20), 9, GFLAGS), - COMPOSITE(DCLK_VOP0, "dclk_vop0", hpll_vpll_gpll_cpll_p, CLK_SET_RATE_PARENT | CLK_SET_RATE_NO_REPARENT, + COMPOSITE(DCLK_VOP0, "dclk_vop0", hpll_vpll_gpll_cpll_p, CLK_SET_RATE_NO_REPARENT, RK3568_CLKSEL_CON(39), 10, 2, MFLAGS, 0, 8, DFLAGS, RK3568_CLKGATE_CON(20), 10, GFLAGS), - COMPOSITE(DCLK_VOP1, "dclk_vop1", hpll_vpll_gpll_cpll_p, CLK_SET_RATE_PARENT | CLK_SET_RATE_NO_REPARENT, + COMPOSITE(DCLK_VOP1, "dclk_vop1", hpll_vpll_gpll_cpll_p, CLK_SET_RATE_NO_REPARENT, RK3568_CLKSEL_CON(40), 10, 2, MFLAGS, 0, 8, DFLAGS, RK3568_CLKGATE_CON(20), 11, GFLAGS), - COMPOSITE(DCLK_VOP2, "dclk_vop2", hpll_vpll_gpll_cpll_p, 0, + COMPOSITE(DCLK_VOP2, "dclk_vop2", hpll_vpll_gpll_cpll_p, CLK_SET_RATE_NO_REPARENT, RK3568_CLKSEL_CON(41), 10, 2, MFLAGS, 0, 8, DFLAGS, RK3568_CLKGATE_CON(20), 12, GFLAGS), GATE(CLK_VOP_PWM, "clk_vop_pwm", "xin24m", 0,
From: Pierre Gondois Pierre.Gondois@arm.com
[ Upstream commit ec1c7ad47664f964c1101fe555b6fde0cb124b38 ]
CPUfreq governors request CPU frequencies using information on current CPU usage. The CPPC driver converts them to performance requests. Frequency targets are computed as: target_freq = (util / cpu_capacity) * max_freq target_freq is then clamped between [policy->min, policy->max].
The CPPC driver converts performance values to frequencies (and vice-versa) using cppc_cpufreq_perf_to_khz() and cppc_cpufreq_khz_to_perf(). These functions both use two different factors depending on the range of the input value. For cppc_cpufreq_khz_to_perf(): - (NOMINAL_PERF / NOMINAL_FREQ) or - (LOWEST_PERF / LOWEST_FREQ) and for cppc_cpufreq_perf_to_khz(): - (NOMINAL_FREQ / NOMINAL_PERF) or - ((NOMINAL_PERF - LOWEST_FREQ) / (NOMINAL_PERF - LOWEST_PERF))
This means: 1- the functions are not inverse for some values: (perf_to_khz(khz_to_perf(x)) != x) 2- cppc_cpufreq_perf_to_khz(LOWEST_PERF) can sometimes give a different value from LOWEST_FREQ due to integer approximation 3- it is implied that performance and frequency are proportional (NOMINAL_FREQ / NOMINAL_PERF) == (LOWEST_PERF / LOWEST_FREQ)
This patch changes the conversion functions to an affine function. This fixes the 3 points above.
Suggested-by: Lukasz Luba lukasz.luba@arm.com Suggested-by: Morten Rasmussen morten.rasmussen@arm.com Signed-off-by: Pierre Gondois Pierre.Gondois@arm.com Signed-off-by: Viresh Kumar viresh.kumar@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/cpufreq/cppc_cpufreq.c | 43 +++++++++++++++++----------------- 1 file changed, 21 insertions(+), 22 deletions(-)
diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c index d4c27022b9c9..e0ff09d66c96 100644 --- a/drivers/cpufreq/cppc_cpufreq.c +++ b/drivers/cpufreq/cppc_cpufreq.c @@ -303,52 +303,48 @@ static u64 cppc_get_dmi_max_khz(void)
/* * If CPPC lowest_freq and nominal_freq registers are exposed then we can - * use them to convert perf to freq and vice versa - * - * If the perf/freq point lies between Nominal and Lowest, we can treat - * (Low perf, Low freq) and (Nom Perf, Nom freq) as 2D co-ordinates of a line - * and extrapolate the rest - * For perf/freq > Nominal, we use the ratio perf:freq at Nominal for conversion + * use them to convert perf to freq and vice versa. The conversion is + * extrapolated as an affine function passing by the 2 points: + * - (Low perf, Low freq) + * - (Nominal perf, Nominal perf) */ static unsigned int cppc_cpufreq_perf_to_khz(struct cppc_cpudata *cpu_data, unsigned int perf) { struct cppc_perf_caps *caps = &cpu_data->perf_caps; + s64 retval, offset = 0; static u64 max_khz; u64 mul, div;
if (caps->lowest_freq && caps->nominal_freq) { - if (perf >= caps->nominal_perf) { - mul = caps->nominal_freq; - div = caps->nominal_perf; - } else { - mul = caps->nominal_freq - caps->lowest_freq; - div = caps->nominal_perf - caps->lowest_perf; - } + mul = caps->nominal_freq - caps->lowest_freq; + div = caps->nominal_perf - caps->lowest_perf; + offset = caps->nominal_freq - div64_u64(caps->nominal_perf * mul, div); } else { if (!max_khz) max_khz = cppc_get_dmi_max_khz(); mul = max_khz; div = caps->highest_perf; } - return (u64)perf * mul / div; + + retval = offset + div64_u64(perf * mul, div); + if (retval >= 0) + return retval; + return 0; }
static unsigned int cppc_cpufreq_khz_to_perf(struct cppc_cpudata *cpu_data, unsigned int freq) { struct cppc_perf_caps *caps = &cpu_data->perf_caps; + s64 retval, offset = 0; static u64 max_khz; u64 mul, div;
if (caps->lowest_freq && caps->nominal_freq) { - if (freq >= caps->nominal_freq) { - mul = caps->nominal_perf; - div = caps->nominal_freq; - } else { - mul = caps->lowest_perf; - div = caps->lowest_freq; - } + mul = caps->nominal_perf - caps->lowest_perf; + div = caps->nominal_freq - caps->lowest_freq; + offset = caps->nominal_perf - div64_u64(caps->nominal_freq * mul, div); } else { if (!max_khz) max_khz = cppc_get_dmi_max_khz(); @@ -356,7 +352,10 @@ static unsigned int cppc_cpufreq_khz_to_perf(struct cppc_cpudata *cpu_data, div = max_khz; }
- return (u64)freq * mul / div; + retval = offset + div64_u64(freq * mul, div); + if (retval >= 0) + return retval; + return 0; }
static int cppc_cpufreq_set_target(struct cpufreq_policy *policy,
From: Viresh Kumar viresh.kumar@linaro.org
[ Upstream commit 021dbecabc93b1610b5db989d52a94e0c6671136 ]
It is difficult to find which OPPs are active at the moment, specially if there are multiple OPPs with same frequency available in the device tree (controlled by supported hardware feature).
Expose name of the DT node to find out the exact OPP.
While at it, also expose level field.
Reported-by: Leo Yan leo.yan@linaro.org Tested-by: Leo Yan leo.yan@linaro.org Signed-off-by: Viresh Kumar viresh.kumar@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/opp/debugfs.c | 5 +++++ drivers/opp/opp.h | 1 + 2 files changed, 6 insertions(+)
diff --git a/drivers/opp/debugfs.c b/drivers/opp/debugfs.c index 596c185b5dda..b5f2f9f39392 100644 --- a/drivers/opp/debugfs.c +++ b/drivers/opp/debugfs.c @@ -10,6 +10,7 @@ #include <linux/debugfs.h> #include <linux/device.h> #include <linux/err.h> +#include <linux/of.h> #include <linux/init.h> #include <linux/limits.h> #include <linux/slab.h> @@ -131,9 +132,13 @@ void opp_debug_create_one(struct dev_pm_opp *opp, struct opp_table *opp_table) debugfs_create_bool("suspend", S_IRUGO, d, &opp->suspend); debugfs_create_u32("performance_state", S_IRUGO, d, &opp->pstate); debugfs_create_ulong("rate_hz", S_IRUGO, d, &opp->rate); + debugfs_create_u32("level", S_IRUGO, d, &opp->level); debugfs_create_ulong("clock_latency_ns", S_IRUGO, d, &opp->clock_latency_ns);
+ opp->of_name = of_node_full_name(opp->np); + debugfs_create_str("of_name", S_IRUGO, d, (char **)&opp->of_name); + opp_debug_create_supplies(opp, opp_table, d); opp_debug_create_bw(opp, opp_table, d);
diff --git a/drivers/opp/opp.h b/drivers/opp/opp.h index 407c3bfe51d9..45e3a55239a1 100644 --- a/drivers/opp/opp.h +++ b/drivers/opp/opp.h @@ -96,6 +96,7 @@ struct dev_pm_opp {
#ifdef CONFIG_DEBUG_FS struct dentry *dentry; + const char *of_name; #endif };
From: Xiaoke Wang xkernel.wang@foxmail.com
[ Upstream commit 60f1d3c92dc1ef1026e5b917a329a7fa947da036 ]
One error handler of wfx_init_common() return without calling ieee80211_free_hw(hw), which may result in memory leak. And I add one err label to unify the error handler, which is useful for the subsequent changes.
Suggested-by: Jérôme Pouiller jerome.pouiller@silabs.com Reviewed-by: Dan Carpenter dan.carpenter@oracle.com Reviewed-by: Jérôme Pouiller jerome.pouiller@silabs.com Signed-off-by: Xiaoke Wang xkernel.wang@foxmail.com Link: https://lore.kernel.org/r/tencent_24A24A3EFF61206ECCC4B94B1C5C1454E108@qq.co... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/staging/wfx/main.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/staging/wfx/main.c b/drivers/staging/wfx/main.c index 4b9fdf99981b..9ff69c5e0ae9 100644 --- a/drivers/staging/wfx/main.c +++ b/drivers/staging/wfx/main.c @@ -309,7 +309,8 @@ struct wfx_dev *wfx_init_common(struct device *dev, wdev->pdata.gpio_wakeup = devm_gpiod_get_optional(dev, "wakeup", GPIOD_OUT_LOW); if (IS_ERR(wdev->pdata.gpio_wakeup)) - return NULL; + goto err; + if (wdev->pdata.gpio_wakeup) gpiod_set_consumer_name(wdev->pdata.gpio_wakeup, "wfx wakeup");
@@ -328,6 +329,10 @@ struct wfx_dev *wfx_init_common(struct device *dev, return NULL;
return wdev; + +err: + ieee80211_free_hw(hw); + return NULL; }
int wfx_probe(struct wfx_dev *wdev)
From: Lucas Denefle lucas.denefle@converge.io
[ Upstream commit 41a92a89eee819298f805c40187ad8b02bb53426 ]
w1_seq was failing due to several devices responding to the CHAIN_DONE at the same time. Now properly selects the current device in the chain with MATCH_ROM. Also acknowledgment was read twice.
Signed-off-by: Lucas Denefle lucas.denefle@converge.io Link: https://lore.kernel.org/r/20220223113558.232750-1-lucas.denefle@converge.io Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/w1/slaves/w1_therm.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c index ca70c5f03206..9cbeeb4923ec 100644 --- a/drivers/w1/slaves/w1_therm.c +++ b/drivers/w1/slaves/w1_therm.c @@ -2090,16 +2090,20 @@ static ssize_t w1_seq_show(struct device *device, if (sl->reg_num.id == reg_num->id) seq = i;
+ if (w1_reset_bus(sl->master)) + goto error; + + /* Put the device into chain DONE state */ + w1_write_8(sl->master, W1_MATCH_ROM); + w1_write_block(sl->master, (u8 *)&rn, 8); w1_write_8(sl->master, W1_42_CHAIN); w1_write_8(sl->master, W1_42_CHAIN_DONE); w1_write_8(sl->master, W1_42_CHAIN_DONE_INV); - w1_read_block(sl->master, &ack, sizeof(ack));
/* check for acknowledgment */ ack = w1_read_8(sl->master); if (ack != W1_42_SUCCESS_CONFIRM_BYTE) goto error; - }
/* Exit from CHAIN state */
From: Xin Xiong xiongx18@fudan.edu.cn
[ Upstream commit b7f114edd54326f730a754547e7cfb197b5bc132 ]
[You don't often get email from xiongx18@fudan.edu.cn. Learn why this is important at http://aka.ms/LearnAboutSenderIdentification.]
The reference counting issue happens in two error paths in the function _nfs42_proc_copy_notify(). In both error paths, the function simply returns the error code and forgets to balance the refcount of object `ctx`, bumped by get_nfs_open_context() earlier, which may cause refcount leaks.
Fix it by balancing refcount of the `ctx` object before the function returns in both error paths.
Signed-off-by: Xin Xiong xiongx18@fudan.edu.cn Signed-off-by: Xiyu Yang xiyuyang19@fudan.edu.cn Signed-off-by: Xin Tan tanxin.ctf@gmail.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs42proc.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c index 9865b5c37d88..93f4d8257525 100644 --- a/fs/nfs/nfs42proc.c +++ b/fs/nfs/nfs42proc.c @@ -586,8 +586,10 @@ static int _nfs42_proc_copy_notify(struct file *src, struct file *dst,
ctx = get_nfs_open_context(nfs_file_open_context(src)); l_ctx = nfs_get_lock_context(ctx); - if (IS_ERR(l_ctx)) - return PTR_ERR(l_ctx); + if (IS_ERR(l_ctx)) { + status = PTR_ERR(l_ctx); + goto out; + }
status = nfs4_set_rw_stateid(&args->cna_src_stateid, ctx, l_ctx, FMODE_READ); @@ -595,7 +597,7 @@ static int _nfs42_proc_copy_notify(struct file *src, struct file *dst, if (status) { if (status == -EAGAIN) status = -NFS4ERR_BAD_STATEID; - return status; + goto out; }
status = nfs4_call_sync(src_server->client, src_server, &msg, @@ -603,6 +605,7 @@ static int _nfs42_proc_copy_notify(struct file *src, struct file *dst, if (status == -ENOTSUPP) src_server->caps &= ~NFS_CAP_COPY_NOTIFY;
+out: put_nfs_open_context(nfs_file_open_context(src)); return status; }
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 3e17898aca293a24dae757a440a50aa63ca29671 ]
If memory allocation triggers a direct reclaim from the state recovery thread, then we can deadlock. Use memalloc_nofs_save/restore to ensure that doesn't happen.
Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs4state.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index 51f5cb41e87a..57ea63e2cdb4 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -49,6 +49,7 @@ #include <linux/workqueue.h> #include <linux/bitops.h> #include <linux/jiffies.h> +#include <linux/sched/mm.h>
#include <linux/sunrpc/clnt.h>
@@ -2559,9 +2560,17 @@ static void nfs4_layoutreturn_any_run(struct nfs_client *clp)
static void nfs4_state_manager(struct nfs_client *clp) { + unsigned int memflags; int status = 0; const char *section = "", *section_sep = "";
+ /* + * State recovery can deadlock if the direct reclaim code tries + * start NFS writeback. So ensure memory allocations are all + * GFP_NOFS. + */ + memflags = memalloc_nofs_save(); + /* Ensure exclusive access to NFSv4 state */ do { trace_nfs4_state_mgr(clp); @@ -2656,6 +2665,7 @@ static void nfs4_state_manager(struct nfs_client *clp) clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &clp->cl_state); }
+ memalloc_nofs_restore(memflags); nfs4_end_drain_session(clp); nfs4_clear_state_manager_bit(clp);
@@ -2673,6 +2683,7 @@ static void nfs4_state_manager(struct nfs_client *clp) return; if (test_and_set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state) != 0) return; + memflags = memalloc_nofs_save(); } while (refcount_read(&clp->cl_count) > 1 && !signalled()); goto out_drain;
@@ -2685,6 +2696,7 @@ static void nfs4_state_manager(struct nfs_client *clp) clp->cl_hostname, -status); ssleep(1); out_drain: + memalloc_nofs_restore(memflags); nfs4_end_drain_session(clp); nfs4_clear_state_manager_bit(clp); }
From: Ohad Sharabi osharabi@habana.ai
[ Upstream commit eb85eec858c1a5c11d3a0bff403f6440b05b40dc ]
This patch fixes what seems to be copy paste error.
We will have a memory leak if the host-resident shadow is NULL (which will likely happen as the DR and HR are not dependent).
Signed-off-by: Ohad Sharabi osharabi@habana.ai Reviewed-by: Oded Gabbay ogabbay@kernel.org Signed-off-by: Oded Gabbay ogabbay@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/misc/habanalabs/common/mmu/mmu_v1.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/misc/habanalabs/common/mmu/mmu_v1.c b/drivers/misc/habanalabs/common/mmu/mmu_v1.c index 0f536f79dd9c..e68e9f71c546 100644 --- a/drivers/misc/habanalabs/common/mmu/mmu_v1.c +++ b/drivers/misc/habanalabs/common/mmu/mmu_v1.c @@ -467,7 +467,7 @@ static void hl_mmu_v1_fini(struct hl_device *hdev) { /* MMU H/W fini was already done in device hw_fini() */
- if (!ZERO_OR_NULL_PTR(hdev->mmu_priv.hr.mmu_shadow_hop0)) { + if (!ZERO_OR_NULL_PTR(hdev->mmu_priv.dr.mmu_shadow_hop0)) { kvfree(hdev->mmu_priv.dr.mmu_shadow_hop0); gen_pool_destroy(hdev->mmu_priv.dr.mmu_pgt_pool);
From: Dongli Zhang dongli.zhang@oracle.com
[ Upstream commit eed05744322da07dd7e419432dcedf3c2e017179 ]
The sched_clock() can be used very early since commit 857baa87b642 ("sched/clock: Enable sched clock early"). In addition, with commit 38669ba205d1 ("x86/xen/time: Output xen sched_clock time from 0"), kdump kernel in Xen HVM guest may panic at very early stage when accessing &__this_cpu_read(xen_vcpu)->time as in below:
setup_arch() -> init_hypervisor_platform() -> x86_init.hyper.init_platform = xen_hvm_guest_init() -> xen_hvm_init_time_ops() -> xen_clocksource_read() -> src = &__this_cpu_read(xen_vcpu)->time;
This is because Xen HVM supports at most MAX_VIRT_CPUS=32 'vcpu_info' embedded inside 'shared_info' during early stage until xen_vcpu_setup() is used to allocate/relocate 'vcpu_info' for boot cpu at arbitrary address.
However, when Xen HVM guest panic on vcpu >= 32, since xen_vcpu_info_reset(0) would set per_cpu(xen_vcpu, cpu) = NULL when vcpu >= 32, xen_clocksource_read() on vcpu >= 32 would panic.
This patch calls xen_hvm_init_time_ops() again later in xen_hvm_smp_prepare_boot_cpu() after the 'vcpu_info' for boot vcpu is registered when the boot vcpu is >= 32.
This issue can be reproduced on purpose via below command at the guest side when kdump/kexec is enabled:
"taskset -c 33 echo c > /proc/sysrq-trigger"
The bugfix for PVM is not implemented due to the lack of testing environment.
[boris: xen_hvm_init_time_ops() returns on errors instead of jumping to end]
Cc: Joe Jin joe.jin@oracle.com Signed-off-by: Dongli Zhang dongli.zhang@oracle.com Reviewed-by: Boris Ostrovsky boris.ostrovsky@oracle.com Link: https://lore.kernel.org/r/20220302164032.14569-3-dongli.zhang@oracle.com Signed-off-by: Boris Ostrovsky boris.ostrovsky@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/xen/smp_hvm.c | 6 ++++++ arch/x86/xen/time.c | 24 +++++++++++++++++++++++- 2 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/arch/x86/xen/smp_hvm.c b/arch/x86/xen/smp_hvm.c index 6ff3c887e0b9..b70afdff419c 100644 --- a/arch/x86/xen/smp_hvm.c +++ b/arch/x86/xen/smp_hvm.c @@ -19,6 +19,12 @@ static void __init xen_hvm_smp_prepare_boot_cpu(void) */ xen_vcpu_setup(0);
+ /* + * Called again in case the kernel boots on vcpu >= MAX_VIRT_CPUS. + * Refer to comments in xen_hvm_init_time_ops(). + */ + xen_hvm_init_time_ops(); + /* * The alternative logic (which patches the unlock/lock) runs before * the smp bootup up code is activated. Hence we need to set this up diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c index d9c945ee1100..9ef0a5cca96e 100644 --- a/arch/x86/xen/time.c +++ b/arch/x86/xen/time.c @@ -558,6 +558,11 @@ static void xen_hvm_setup_cpu_clockevents(void)
void __init xen_hvm_init_time_ops(void) { + static bool hvm_time_initialized; + + if (hvm_time_initialized) + return; + /* * vector callback is needed otherwise we cannot receive interrupts * on cpu > 0 and at this point we don't know how many cpus are @@ -567,7 +572,22 @@ void __init xen_hvm_init_time_ops(void) return;
if (!xen_feature(XENFEAT_hvm_safe_pvclock)) { - pr_info("Xen doesn't support pvclock on HVM, disable pv timer"); + pr_info_once("Xen doesn't support pvclock on HVM, disable pv timer"); + return; + } + + /* + * Only MAX_VIRT_CPUS 'vcpu_info' are embedded inside 'shared_info'. + * The __this_cpu_read(xen_vcpu) is still NULL when Xen HVM guest + * boots on vcpu >= MAX_VIRT_CPUS (e.g., kexec), To access + * __this_cpu_read(xen_vcpu) via xen_clocksource_read() will panic. + * + * The xen_hvm_init_time_ops() should be called again later after + * __this_cpu_read(xen_vcpu) is available. + */ + if (!__this_cpu_read(xen_vcpu)) { + pr_info("Delay xen_init_time_common() as kernel is running on vcpu=%d\n", + xen_vcpu_nr(0)); return; }
@@ -577,6 +597,8 @@ void __init xen_hvm_init_time_ops(void) x86_cpuinit.setup_percpu_clockev = xen_hvm_setup_cpu_clockevents;
x86_platform.set_wallclock = xen_set_wallclock; + + hvm_time_initialized = true; } #endif
From: Tony Lindgren tony@atomide.com
[ Upstream commit 80864594ff2ad002e2755daf97d46ff0c86faf1f ]
In preparation for making use of the clock-output-names, we want to keep node around in ti_dt_clocks_register().
This change should not needed as a fix currently.
Signed-off-by: Tony Lindgren tony@atomide.com Link: https://lore.kernel.org/r/20220204071449.16762-3-tony@atomide.com Signed-off-by: Stephen Boyd sboyd@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clk/ti/clk.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/drivers/clk/ti/clk.c b/drivers/clk/ti/clk.c index 3da33c786d77..29eafab4353e 100644 --- a/drivers/clk/ti/clk.c +++ b/drivers/clk/ti/clk.c @@ -131,7 +131,7 @@ int ti_clk_setup_ll_ops(struct ti_clk_ll_ops *ops) void __init ti_dt_clocks_register(struct ti_dt_clk oclks[]) { struct ti_dt_clk *c; - struct device_node *node, *parent; + struct device_node *node, *parent, *child; struct clk *clk; struct of_phandle_args clkspec; char buf[64]; @@ -171,10 +171,13 @@ void __init ti_dt_clocks_register(struct ti_dt_clk oclks[]) node = of_find_node_by_name(NULL, buf); if (num_args && compat_mode) { parent = node; - node = of_get_child_by_name(parent, "clock"); - if (!node) - node = of_get_child_by_name(parent, "clk"); - of_node_put(parent); + child = of_get_child_by_name(parent, "clock"); + if (!child) + child = of_get_child_by_name(parent, "clk"); + if (child) { + of_node_put(parent); + node = child; + } }
clkspec.np = node;
From: Maxime Ripard maxime@cerno.tech
[ Upstream commit 10c46f2ea914202482d19cf80dcc9c321c9ff59b ]
If we were to have two users of the same clock, doing something like:
clk_set_rate_range(user1, 1000, 2000); clk_set_rate_range(user2, 3000, 4000);
The second call would fail with -EINVAL, preventing from getting in a situation where we end up with impossible limits.
However, this is never explicitly checked against and enforced, and works by relying on an undocumented behaviour of clk_set_rate().
Indeed, on the first clk_set_rate_range will make sure the current clock rate is within the new range, so it will be between 1000 and 2000Hz. On the second clk_set_rate_range(), it will consider (rightfully), that our current clock is outside of the 3000-4000Hz range, and will call clk_core_set_rate_nolock() to set it to 3000Hz.
clk_core_set_rate_nolock() will then call clk_calc_new_rates() that will eventually check that our rate 3000Hz rate is outside the min 3000Hz max 2000Hz range, will bail out, the error will propagate and we'll eventually return -EINVAL.
This solely relies on the fact that clk_calc_new_rates(), and in particular clk_core_determine_round_nolock(), won't modify the new rate allowing the error to be reported. That assumption won't be true for all drivers, and most importantly we'll break that assumption in a later patch.
It can also be argued that we shouldn't even reach the point where we're calling clk_core_set_rate_nolock().
Let's make an explicit check for disjoints range before we're doing anything.
Signed-off-by: Maxime Ripard maxime@cerno.tech Link: https://lore.kernel.org/r/20220225143534.405820-4-maxime@cerno.tech Signed-off-by: Stephen Boyd sboyd@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clk/clk.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+)
diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c index 5cef73a85901..d6dc58bd07b3 100644 --- a/drivers/clk/clk.c +++ b/drivers/clk/clk.c @@ -631,6 +631,24 @@ static void clk_core_get_boundaries(struct clk_core *core, *max_rate = min(*max_rate, clk_user->max_rate); }
+static bool clk_core_check_boundaries(struct clk_core *core, + unsigned long min_rate, + unsigned long max_rate) +{ + struct clk *user; + + lockdep_assert_held(&prepare_lock); + + if (min_rate > core->max_rate || max_rate < core->min_rate) + return false; + + hlist_for_each_entry(user, &core->clks, clks_node) + if (min_rate > user->max_rate || max_rate < user->min_rate) + return false; + + return true; +} + void clk_hw_set_rate_range(struct clk_hw *hw, unsigned long min_rate, unsigned long max_rate) { @@ -2347,6 +2365,11 @@ int clk_set_rate_range(struct clk *clk, unsigned long min, unsigned long max) clk->min_rate = min; clk->max_rate = max;
+ if (!clk_core_check_boundaries(clk->core, min, max)) { + ret = -EINVAL; + goto out; + } + rate = clk_core_get_rate_nolock(clk->core); if (rate < min || rate > max) { /* @@ -2375,6 +2398,7 @@ int clk_set_rate_range(struct clk *clk, unsigned long min, unsigned long max) } }
+out: if (clk->exclusive_count) clk_core_rate_protect(clk->core);
From: NeilBrown neilb@suse.de
[ Upstream commit c487216bec83b0c5a8803e5c61433d33ad7b104d ]
When memory is short, new worker threads cannot be created and we depend on the minimum one rpciod thread to be able to handle everything. So it must not block waiting for memory.
mempools are particularly a problem as memory can only be released back to the mempool by an async rpc task running. If all available workqueue threads are waiting on the mempool, no thread is available to return anything.
rpc_malloc() can block, and this might cause deadlocks. So check RPC_IS_ASYNC(), rather than RPC_IS_SWAPPER() to determine if blocking is acceptable.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sunrpc/sched.c | 4 +++- net/sunrpc/xprtrdma/transport.c | 4 +++- 2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c index c045f63d11fa..6e4d476c6324 100644 --- a/net/sunrpc/sched.c +++ b/net/sunrpc/sched.c @@ -1012,8 +1012,10 @@ int rpc_malloc(struct rpc_task *task) struct rpc_buffer *buf; gfp_t gfp = GFP_NOFS;
+ if (RPC_IS_ASYNC(task)) + gfp = GFP_NOWAIT | __GFP_NOWARN; if (RPC_IS_SWAPPER(task)) - gfp = __GFP_MEMALLOC | GFP_NOWAIT | __GFP_NOWARN; + gfp |= __GFP_MEMALLOC;
size += sizeof(struct rpc_buffer); if (size <= RPC_BUFFER_MAXSIZE) diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c index 16e5696314a4..a52277115500 100644 --- a/net/sunrpc/xprtrdma/transport.c +++ b/net/sunrpc/xprtrdma/transport.c @@ -574,8 +574,10 @@ xprt_rdma_allocate(struct rpc_task *task) gfp_t flags;
flags = RPCRDMA_DEF_GFP; + if (RPC_IS_ASYNC(task)) + flags = GFP_NOWAIT | __GFP_NOWARN; if (RPC_IS_SWAPPER(task)) - flags = __GFP_MEMALLOC | GFP_NOWAIT | __GFP_NOWARN; + flags |= __GFP_MEMALLOC;
if (!rpcrdma_check_regbuf(r_xprt, req->rl_sendbuf, rqst->rq_callsize, flags))
From: NeilBrown neilb@suse.de
[ Upstream commit a721035477fb5fb8abc738fbe410b07c12af3dc5 ]
When memory is short, new worker threads cannot be created and we depend on the minimum one rpciod thread to be able to handle everything. So it must not block waiting for memory.
xprt_dynamic_alloc_slot can block indefinitely. This can tie up all workqueue threads and NFS can deadlock. So when called from a workqueue, set __GFP_NORETRY.
The rdma alloc_slot already does not block. However it sets the error to -EAGAIN suggesting this will trigger a sleep. It does not. As we can see in call_reserveresult(), only -ENOMEM causes a sleep. -EAGAIN causes immediate retry.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sunrpc/xprt.c | 5 ++++- net/sunrpc/xprtrdma/transport.c | 2 +- 2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index d95426c0bd3a..61603c2664a6 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1686,12 +1686,15 @@ static bool xprt_throttle_congested(struct rpc_xprt *xprt, struct rpc_task *task static struct rpc_rqst *xprt_dynamic_alloc_slot(struct rpc_xprt *xprt) { struct rpc_rqst *req = ERR_PTR(-EAGAIN); + gfp_t gfp_mask = GFP_KERNEL;
if (xprt->num_reqs >= xprt->max_reqs) goto out; ++xprt->num_reqs; spin_unlock(&xprt->reserve_lock); - req = kzalloc(sizeof(struct rpc_rqst), GFP_NOFS); + if (current->flags & PF_WQ_WORKER) + gfp_mask |= __GFP_NORETRY | __GFP_NOWARN; + req = kzalloc(sizeof(*req), gfp_mask); spin_lock(&xprt->reserve_lock); if (req != NULL) goto out; diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c index a52277115500..32df23796747 100644 --- a/net/sunrpc/xprtrdma/transport.c +++ b/net/sunrpc/xprtrdma/transport.c @@ -521,7 +521,7 @@ xprt_rdma_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task) return;
out_sleep: - task->tk_status = -EAGAIN; + task->tk_status = -ENOMEM; xprt_add_backlog(xprt, task); }
From: NeilBrown neilb@suse.de
[ Upstream commit a80a8461868905823609be97f91776a26befe839 ]
Currently, tasks marked as "swapper" tasks get put to the front of non-priority rpc_queues, and are sorted earlier than non-swapper tasks on the transport's ->xmit_queue.
This is pointless as currently *all* tasks for a mount that has swap enabled on *any* file are marked as "swapper" tasks. So the net result is that the non-priority rpc_queues are reverse-ordered (LIFO).
This scheduling boost is not necessary to avoid deadlocks, and hurts fairness, so remove it. If there were a need to expedite some requests, the tk_priority mechanism is a more appropriate tool.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sunrpc/sched.c | 7 ------- net/sunrpc/xprt.c | 11 ----------- 2 files changed, 18 deletions(-)
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c index 6e4d476c6324..f0f55fbd1375 100644 --- a/net/sunrpc/sched.c +++ b/net/sunrpc/sched.c @@ -186,11 +186,6 @@ static void __rpc_add_wait_queue_priority(struct rpc_wait_queue *queue,
/* * Add new request to wait queue. - * - * Swapper tasks always get inserted at the head of the queue. - * This should avoid many nasty memory deadlocks and hopefully - * improve overall performance. - * Everyone else gets appended to the queue to ensure proper FIFO behavior. */ static void __rpc_add_wait_queue(struct rpc_wait_queue *queue, struct rpc_task *task, @@ -199,8 +194,6 @@ static void __rpc_add_wait_queue(struct rpc_wait_queue *queue, INIT_LIST_HEAD(&task->u.tk_wait.timer_list); if (RPC_IS_PRIORITY(queue)) __rpc_add_wait_queue_priority(queue, task, queue_priority); - else if (RPC_IS_SWAPPER(task)) - list_add(&task->u.tk_wait.list, &queue->tasks[0]); else list_add_tail(&task->u.tk_wait.list, &queue->tasks[0]); task->tk_waitqueue = queue; diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index 61603c2664a6..f5dff09154da 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1353,17 +1353,6 @@ xprt_request_enqueue_transmit(struct rpc_task *task) INIT_LIST_HEAD(&req->rq_xmit2); goto out; } - } else if (RPC_IS_SWAPPER(task)) { - list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) { - if (pos->rq_cong || pos->rq_bytes_sent) - continue; - if (RPC_IS_SWAPPER(pos->rq_task)) - continue; - /* Note: req is added _before_ pos */ - list_add_tail(&req->rq_xmit, &pos->rq_xmit); - INIT_LIST_HEAD(&req->rq_xmit2); - goto out; - } } else if (!req->rq_seqno) { list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) { if (pos->rq_task->tk_owner != task->tk_owner)
From: NeilBrown neilb@suse.de
[ Upstream commit 64158668ac8b31626a8ce48db4cad08496eb8340 ]
1/ Taking the i_rwsem for swap IO triggers lockdep warnings regarding possible deadlocks with "fs_reclaim". These deadlocks could, I believe, eventuate if a buffered read on the swapfile was attempted.
We don't need coherence with the page cache for a swap file, and buffered writes are forbidden anyway. There is no other need for i_rwsem during direct IO. So never take it for swap_rw()
2/ generic_write_checks() explicitly forbids writes to swap, and performs checks that are not needed for swap. So bypass it for swap_rw().
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/direct.c | 42 ++++++++++++++++++++++++++++-------------- fs/nfs/file.c | 4 ++-- include/linux/nfs_fs.h | 8 ++++---- 3 files changed, 34 insertions(+), 20 deletions(-)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index 3c0335c15a73..28afc315ec0c 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -172,8 +172,8 @@ ssize_t nfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) VM_BUG_ON(iov_iter_count(iter) != PAGE_SIZE);
if (iov_iter_rw(iter) == READ) - return nfs_file_direct_read(iocb, iter); - return nfs_file_direct_write(iocb, iter); + return nfs_file_direct_read(iocb, iter, true); + return nfs_file_direct_write(iocb, iter, true); }
static void nfs_direct_release_pages(struct page **pages, unsigned int npages) @@ -424,6 +424,7 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq, * nfs_file_direct_read - file direct read operation for NFS files * @iocb: target I/O control block * @iter: vector of user buffers into which to read data + * @swap: flag indicating this is swap IO, not O_DIRECT IO * * We use this function for direct reads instead of calling * generic_file_aio_read() in order to avoid gfar's check to see if @@ -439,7 +440,8 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq, * client must read the updated atime from the server back into its * cache. */ -ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter) +ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter, + bool swap) { struct file *file = iocb->ki_filp; struct address_space *mapping = file->f_mapping; @@ -481,12 +483,14 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter) if (iter_is_iovec(iter)) dreq->flags = NFS_ODIRECT_SHOULD_DIRTY;
- nfs_start_io_direct(inode); + if (!swap) + nfs_start_io_direct(inode);
NFS_I(inode)->read_io += count; requested = nfs_direct_read_schedule_iovec(dreq, iter, iocb->ki_pos);
- nfs_end_io_direct(inode); + if (!swap) + nfs_end_io_direct(inode);
if (requested > 0) { result = nfs_direct_wait(dreq); @@ -875,6 +879,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, * nfs_file_direct_write - file direct write operation for NFS files * @iocb: target I/O control block * @iter: vector of user buffers from which to write data + * @swap: flag indicating this is swap IO, not O_DIRECT IO * * We use this function for direct writes instead of calling * generic_file_aio_write() in order to avoid taking the inode @@ -891,7 +896,8 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, * Note that O_APPEND is not supported for NFS direct writes, as there * is no atomic O_APPEND write facility in the NFS protocol. */ -ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter) +ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter, + bool swap) { ssize_t result, requested; size_t count; @@ -905,7 +911,11 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter) dfprintk(FILE, "NFS: direct write(%pD2, %zd@%Ld)\n", file, iov_iter_count(iter), (long long) iocb->ki_pos);
- result = generic_write_checks(iocb, iter); + if (swap) + /* bypass generic checks */ + result = iov_iter_count(iter); + else + result = generic_write_checks(iocb, iter); if (result <= 0) return result; count = result; @@ -936,16 +946,20 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter) dreq->iocb = iocb; pnfs_init_ds_commit_info_ops(&dreq->ds_cinfo, inode);
- nfs_start_io_direct(inode); + if (swap) { + requested = nfs_direct_write_schedule_iovec(dreq, iter, pos); + } else { + nfs_start_io_direct(inode);
- requested = nfs_direct_write_schedule_iovec(dreq, iter, pos); + requested = nfs_direct_write_schedule_iovec(dreq, iter, pos);
- if (mapping->nrpages) { - invalidate_inode_pages2_range(mapping, - pos >> PAGE_SHIFT, end); - } + if (mapping->nrpages) { + invalidate_inode_pages2_range(mapping, + pos >> PAGE_SHIFT, end); + }
- nfs_end_io_direct(inode); + nfs_end_io_direct(inode); + }
if (requested > 0) { result = nfs_direct_wait(dreq); diff --git a/fs/nfs/file.c b/fs/nfs/file.c index aa353fd58240..42a16993913a 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -161,7 +161,7 @@ nfs_file_read(struct kiocb *iocb, struct iov_iter *to) ssize_t result;
if (iocb->ki_flags & IOCB_DIRECT) - return nfs_file_direct_read(iocb, to); + return nfs_file_direct_read(iocb, to, false);
dprintk("NFS: read(%pD2, %zu@%lu)\n", iocb->ki_filp, @@ -616,7 +616,7 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from) return result;
if (iocb->ki_flags & IOCB_DIRECT) - return nfs_file_direct_write(iocb, from); + return nfs_file_direct_write(iocb, from, false);
dprintk("NFS: write(%pD2, %zu@%Ld)\n", file, iov_iter_count(from), (long long) iocb->ki_pos); diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 4a733f140939..41102e03512f 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -494,10 +494,10 @@ static inline const struct cred *nfs_file_cred(struct file *file) * linux/fs/nfs/direct.c */ extern ssize_t nfs_direct_IO(struct kiocb *, struct iov_iter *); -extern ssize_t nfs_file_direct_read(struct kiocb *iocb, - struct iov_iter *iter); -extern ssize_t nfs_file_direct_write(struct kiocb *iocb, - struct iov_iter *iter); +ssize_t nfs_file_direct_read(struct kiocb *iocb, + struct iov_iter *iter, bool swap); +ssize_t nfs_file_direct_write(struct kiocb *iocb, + struct iov_iter *iter, bool swap);
/* * linux/fs/nfs/dir.c
From: NeilBrown neilb@suse.de
[ Upstream commit c265de257f558a05c1859ee9e3fed04883b9ec0e ]
The commit handling code is not safe against memory-pressure deadlocks when writing to swap. In particular, nfs_commitdata_alloc() blocks indefinitely waiting for memory, and this can consume all available workqueue threads.
swap-out most likely uses STABLE writes anyway as COND_STABLE indicates that a stable write should be used if the write fits in a single request, and it normally does. However if we ever swap with a small wsize, or gather unusually large numbers of pages for a single write, this might change.
For safety, make it explicit in the code that direct writes used for swap must always use FLUSH_STABLE.
Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/direct.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index 28afc315ec0c..c220810c61d1 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -793,7 +793,7 @@ static const struct nfs_pgio_completion_ops nfs_direct_write_completion_ops = { */ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, struct iov_iter *iter, - loff_t pos) + loff_t pos, int ioflags) { struct nfs_pageio_descriptor desc; struct inode *inode = dreq->inode; @@ -801,7 +801,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, size_t requested_bytes = 0; size_t wsize = max_t(size_t, NFS_SERVER(inode)->wsize, PAGE_SIZE);
- nfs_pageio_init_write(&desc, inode, FLUSH_COND_STABLE, false, + nfs_pageio_init_write(&desc, inode, ioflags, false, &nfs_direct_write_completion_ops); desc.pg_dreq = dreq; get_dreq(dreq); @@ -947,11 +947,13 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter, pnfs_init_ds_commit_info_ops(&dreq->ds_cinfo, inode);
if (swap) { - requested = nfs_direct_write_schedule_iovec(dreq, iter, pos); + requested = nfs_direct_write_schedule_iovec(dreq, iter, pos, + FLUSH_STABLE); } else { nfs_start_io_direct(inode);
- requested = nfs_direct_write_schedule_iovec(dreq, iter, pos); + requested = nfs_direct_write_schedule_iovec(dreq, iter, pos, + FLUSH_COND_STABLE);
if (mapping->nrpages) { invalidate_inode_pages2_range(mapping,
From: Peter Zijlstra peterz@infradead.org
[ Upstream commit be0075951fde739f14ee2b659e2fd6e2499c46c0 ]
vmlinux.o: warning: objtool: page_fault_oops()+0x13c: unreachable instruction
0000 000000000005b460 <page_fault_oops>: ... 0128 5b588: 49 89 23 mov %rsp,(%r11) 012b 5b58b: 4c 89 dc mov %r11,%rsp 012e 5b58e: 4c 89 f2 mov %r14,%rdx 0131 5b591: 48 89 ee mov %rbp,%rsi 0134 5b594: 4c 89 e7 mov %r12,%rdi 0137 5b597: e8 00 00 00 00 call 5b59c <page_fault_oops+0x13c> 5b598: R_X86_64_PLT32 handle_stack_overflow-0x4 013c 5b59c: 5c pop %rsp
vmlinux.o: warning: objtool: sysvec_reboot()+0x6d: unreachable instruction
0000 00000000000033f0 <sysvec_reboot>: ... 005d 344d: 4c 89 dc mov %r11,%rsp 0060 3450: e8 00 00 00 00 call 3455 <sysvec_reboot+0x65> 3451: R_X86_64_PLT32 irq_enter_rcu-0x4 0065 3455: 48 89 ef mov %rbp,%rdi 0068 3458: e8 00 00 00 00 call 345d <sysvec_reboot+0x6d> 3459: R_X86_64_PC32 .text+0x47d0c 006d 345d: e8 00 00 00 00 call 3462 <sysvec_reboot+0x72> 345e: R_X86_64_PLT32 irq_exit_rcu-0x4 0072 3462: 5c pop %rsp
Both cases are due to a call_on_stack() calling a __noreturn function. Since that's an inline asm, GCC can't do anything about the instructions after the CALL. Therefore put in an explicit ASM_REACHABLE annotation to make sure objtool and gcc are consistently confused about control flow.
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Acked-by: Josh Poimboeuf jpoimboe@redhat.com Link: https://lore.kernel.org/r/20220308154319.468805622@infradead.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/include/asm/irq_stack.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/irq_stack.h b/arch/x86/include/asm/irq_stack.h index 8d55bd11848c..e087cd7837c3 100644 --- a/arch/x86/include/asm/irq_stack.h +++ b/arch/x86/include/asm/irq_stack.h @@ -99,7 +99,8 @@ }
#define ASM_CALL_ARG0 \ - "call %P[__func] \n" + "call %P[__func] \n" \ + ASM_REACHABLE
#define ASM_CALL_ARG1 \ "movq %[arg1], %%rdi \n" \
From: Nathan Chancellor nathan@kernel.org
[ Upstream commit aaeed6ecc1253ce1463fa1aca0b70a4ccbc9fa75 ]
There are two outstanding issues with CONFIG_X86_X32_ABI and llvm-objcopy, with similar root causes:
1. llvm-objcopy does not properly convert .note.gnu.property when going from x86_64 to x86_x32, resulting in a corrupted section when linking:
https://github.com/ClangBuiltLinux/linux/issues/1141
2. llvm-objcopy produces corrupted compressed debug sections when going from x86_64 to x86_x32, also resulting in an error when linking:
https://github.com/ClangBuiltLinux/linux/issues/514
After commit 41c5ef31ad71 ("x86/ibt: Base IBT bits"), the .note.gnu.property section is always generated when CONFIG_X86_KERNEL_IBT is enabled, which causes the first issue to become visible with an allmodconfig build:
ld.lld: error: arch/x86/entry/vdso/vclock_gettime-x32.o:(.note.gnu.property+0x1c): program property is too short
To avoid this error, do not allow CONFIG_X86_X32_ABI to be selected when using llvm-objcopy. If the two issues ever get fixed in llvm-objcopy, this can be turned into a feature check.
Signed-off-by: Nathan Chancellor nathan@kernel.org Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Link: https://lore.kernel.org/r/20220314194842.3452-3-nathan@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/Kconfig | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 1f96809606ac..819f8c2e2c67 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -2798,6 +2798,11 @@ config IA32_AOUT config X86_X32 bool "x32 ABI for 64-bit mode" depends on X86_64 + # llvm-objcopy does not convert x86_64 .note.gnu.property or + # compressed debug sections to x86_x32 properly: + # https://github.com/ClangBuiltLinux/linux/issues/514 + # https://github.com/ClangBuiltLinux/linux/issues/1141 + depends on $(success,$(OBJCOPY) --version | head -n1 | grep -qv llvm) help Include code to run binaries for the x32 native 32-bit ABI for 64-bit processors. An x32 process gets access to the
From: Jiri Slaby jslaby@suse.cz
[ Upstream commit 988c7c00691008ea1daaa1235680a0da49dab4e8 ]
The commit c15c3747ee32 (serial: samsung: fix potential soft lockup during uart write) added an unlock of port->lock before uart_write_wakeup() and a lock after it. It was always problematic to write data from tty_ldisc_ops::write_wakeup and it was even documented that way. We fixed the line disciplines to conform to this recently. So if there is still a missed one, we should fix them instead of this workaround.
On the top of that, s3c24xx_serial_tx_dma_complete() in this driver still holds the port->lock while calling uart_write_wakeup().
So revert the wrap added by the commit above.
Cc: Thomas Abraham thomas.abraham@linaro.org Cc: Kyungmin Park kyungmin.park@samsung.com Cc: Hyeonkook Kim hk619.kim@samsung.com Signed-off-by: Jiri Slaby jslaby@suse.cz Link: https://lore.kernel.org/r/20220308115153.4225-1-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/tty/serial/samsung_tty.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/tty/serial/samsung_tty.c b/drivers/tty/serial/samsung_tty.c index e2f49863e9c2..319533b3c32a 100644 --- a/drivers/tty/serial/samsung_tty.c +++ b/drivers/tty/serial/samsung_tty.c @@ -922,11 +922,8 @@ static void s3c24xx_serial_tx_chars(struct s3c24xx_uart_port *ourport) return; }
- if (uart_circ_chars_pending(xmit) < WAKEUP_CHARS) { - spin_unlock(&port->lock); + if (uart_circ_chars_pending(xmit) < WAKEUP_CHARS) uart_write_wakeup(port); - spin_lock(&port->lock); - }
if (uart_circ_empty(xmit)) s3c24xx_serial_stop_tx(port);
From: Randy Dunlap rdunlap@infradead.org
[ Upstream commit fefb8a2a941338d871e2d83fbd65fbfa068857bd ]
Eliminate anonymous module_init() and module_exit(), which can lead to confusion or ambiguity when reading System.map, crashes/oops/bugs, or an initcall_debug log.
Give each of these init and exit functions unique driver-specific names to eliminate the anonymous names.
Example 1: (System.map) ffffffff832fc78c t init ffffffff832fc79e t init ffffffff832fc8f8 t init
Example 2: (initcall_debug log) calling init+0x0/0x12 @ 1 initcall init+0x0/0x12 returned 0 after 15 usecs calling init+0x0/0x60 @ 1 initcall init+0x0/0x60 returned 0 after 2 usecs calling init+0x0/0x9a @ 1 initcall init+0x0/0x9a returned 0 after 74 usecs
Signed-off-by: Randy Dunlap rdunlap@infradead.org Reviewed-by: Amit Shah amit@kernel.org Cc: virtualization@lists.linux-foundation.org Cc: Arnd Bergmann arnd@arndb.de Link: https://lore.kernel.org/r/20220316192010.19001-3-rdunlap@infradead.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/char/virtio_console.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c index 3adf04766e98..77bc993d7513 100644 --- a/drivers/char/virtio_console.c +++ b/drivers/char/virtio_console.c @@ -2236,7 +2236,7 @@ static struct virtio_driver virtio_rproc_serial = { .remove = virtcons_remove, };
-static int __init init(void) +static int __init virtio_console_init(void) { int err;
@@ -2271,7 +2271,7 @@ static int __init init(void) return err; }
-static void __exit fini(void) +static void __exit virtio_console_fini(void) { reclaim_dma_bufs();
@@ -2281,8 +2281,8 @@ static void __exit fini(void) class_destroy(pdrvdata.class); debugfs_remove_recursive(pdrvdata.debugfs_dir); } -module_init(init); -module_exit(fini); +module_init(virtio_console_init); +module_exit(virtio_console_fini);
MODULE_DESCRIPTION("Virtio console driver"); MODULE_LICENSE("GPL");
From: Haimin Zhang tcs_kernel@tencent.com
[ Upstream commit a53046291020ec41e09181396c1e829287b48d47 ]
Add validation check for JFS_IP(ipimap)->i_imap to prevent a NULL deref in diFree since diFree uses it without do any validations. When function jfs_mount calls diMount to initialize fileset inode allocation map, it can fail and JFS_IP(ipimap)->i_imap won't be initialized. Then it calls diFreeSpecial to close fileset inode allocation map inode and it will flow into jfs_evict_inode. Function jfs_evict_inode just validates JFS_SBI(inode->i_sb)->ipimap, then calls diFree. diFree use JFS_IP(ipimap)->i_imap directly, then it will cause a NULL deref.
Reported-by: TCS Robot tcs_robot@tencent.com Signed-off-by: Haimin Zhang tcs_kernel@tencent.com Signed-off-by: Dave Kleikamp dave.kleikamp@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/jfs/inode.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/jfs/inode.c b/fs/jfs/inode.c index 57ab424c05ff..072821b50ab9 100644 --- a/fs/jfs/inode.c +++ b/fs/jfs/inode.c @@ -146,12 +146,13 @@ void jfs_evict_inode(struct inode *inode) dquot_initialize(inode);
if (JFS_IP(inode)->fileset == FILESYSTEM_I) { + struct inode *ipimap = JFS_SBI(inode->i_sb)->ipimap; truncate_inode_pages_final(&inode->i_data);
if (test_cflag(COMMIT_Freewmap, inode)) jfs_free_zero_link(inode);
- if (JFS_SBI(inode->i_sb)->ipimap) + if (ipimap && JFS_IP(ipimap)->i_imap) diFree(inode);
/*
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 7496b59f588dd52886fdbac7633608097543a0a5 ]
The socket layer requires that we use the socket lock to protect changes to the sock->sk_write_pending field and others.
Reported-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sunrpc/xprtsock.c | 54 +++++++++++++++++++++++++++++++------------ 1 file changed, 39 insertions(+), 15 deletions(-)
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index 04f1b78bcbca..2096b26adde5 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -763,12 +763,12 @@ xs_stream_start_connect(struct sock_xprt *transport) /** * xs_nospace - handle transmit was incomplete * @req: pointer to RPC request + * @transport: pointer to struct sock_xprt * */ -static int xs_nospace(struct rpc_rqst *req) +static int xs_nospace(struct rpc_rqst *req, struct sock_xprt *transport) { - struct rpc_xprt *xprt = req->rq_xprt; - struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt); + struct rpc_xprt *xprt = &transport->xprt; struct sock *sk = transport->inet; int ret = -EAGAIN;
@@ -779,25 +779,49 @@ static int xs_nospace(struct rpc_rqst *req)
/* Don't race with disconnect */ if (xprt_connected(xprt)) { + struct socket_wq *wq; + + rcu_read_lock(); + wq = rcu_dereference(sk->sk_wq); + set_bit(SOCKWQ_ASYNC_NOSPACE, &wq->flags); + rcu_read_unlock(); + /* wait for more buffer space */ + set_bit(SOCK_NOSPACE, &sk->sk_socket->flags); sk->sk_write_pending++; xprt_wait_for_buffer_space(xprt); } else ret = -ENOTCONN;
spin_unlock(&xprt->transport_lock); + return ret; +}
- /* Race breaker in case memory is freed before above code is called */ - if (ret == -EAGAIN) { - struct socket_wq *wq; +static int xs_sock_nospace(struct rpc_rqst *req) +{ + struct sock_xprt *transport = + container_of(req->rq_xprt, struct sock_xprt, xprt); + struct sock *sk = transport->inet; + int ret = -EAGAIN;
- rcu_read_lock(); - wq = rcu_dereference(sk->sk_wq); - set_bit(SOCKWQ_ASYNC_NOSPACE, &wq->flags); - rcu_read_unlock(); + lock_sock(sk); + if (!sock_writeable(sk)) + ret = xs_nospace(req, transport); + release_sock(sk); + return ret; +}
- sk->sk_write_space(sk); - } +static int xs_stream_nospace(struct rpc_rqst *req) +{ + struct sock_xprt *transport = + container_of(req->rq_xprt, struct sock_xprt, xprt); + struct sock *sk = transport->inet; + int ret = -EAGAIN; + + lock_sock(sk); + if (!sk_stream_memory_free(sk)) + ret = xs_nospace(req, transport); + release_sock(sk); return ret; }
@@ -887,7 +911,7 @@ static int xs_local_send_request(struct rpc_rqst *req) case -ENOBUFS: break; case -EAGAIN: - status = xs_nospace(req); + status = xs_stream_nospace(req); break; default: dprintk("RPC: sendmsg returned unrecognized error %d\n", @@ -963,7 +987,7 @@ static int xs_udp_send_request(struct rpc_rqst *req) /* Should we call xs_close() here? */ break; case -EAGAIN: - status = xs_nospace(req); + status = xs_sock_nospace(req); break; case -ENETUNREACH: case -ENOBUFS: @@ -1083,7 +1107,7 @@ static int xs_tcp_send_request(struct rpc_rqst *req) /* Should we call xs_close() here? */ break; case -EAGAIN: - status = xs_nospace(req); + status = xs_stream_nospace(req); break; case -ECONNRESET: case -ECONNREFUSED:
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 515dcdcd48736576c6f5c197814da6f81c60a21e ]
The concern is that since nfsiod is sometimes required to kick off a commit, it can get locked up waiting forever in mempool_alloc() instead of failing gracefully and leaving the commit until later.
Try to allocate from the slab first, with GFP_KERNEL | __GFP_NORETRY, then fall back to a non-blocking attempt to allocate from the memory pool.
Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/internal.h | 7 +++++++ fs/nfs/pnfs_nfs.c | 8 ++++++-- fs/nfs/write.c | 24 +++++++++--------------- include/linux/nfs_fs.h | 2 +- 4 files changed, 23 insertions(+), 18 deletions(-)
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h index 66fc936834f2..7239118d98a3 100644 --- a/fs/nfs/internal.h +++ b/fs/nfs/internal.h @@ -580,6 +580,13 @@ nfs_write_match_verf(const struct nfs_writeverf *verf, !nfs_write_verifier_cmp(&req->wb_verf, &verf->verifier); }
+static inline gfp_t nfs_io_gfp_mask(void) +{ + if (current->flags & PF_WQ_WORKER) + return GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN; + return GFP_KERNEL; +} + /* unlink.c */ extern struct rpc_task * nfs_async_rename(struct inode *old_dir, struct inode *new_dir, diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c index 316f68f96e57..657c242a18ff 100644 --- a/fs/nfs/pnfs_nfs.c +++ b/fs/nfs/pnfs_nfs.c @@ -419,7 +419,7 @@ static struct nfs_commit_data * pnfs_bucket_fetch_commitdata(struct pnfs_commit_bucket *bucket, struct nfs_commit_info *cinfo) { - struct nfs_commit_data *data = nfs_commitdata_alloc(false); + struct nfs_commit_data *data = nfs_commitdata_alloc();
if (!data) return NULL; @@ -515,7 +515,11 @@ pnfs_generic_commit_pagelist(struct inode *inode, struct list_head *mds_pages, unsigned int nreq = 0;
if (!list_empty(mds_pages)) { - data = nfs_commitdata_alloc(true); + data = nfs_commitdata_alloc(); + if (!data) { + nfs_retry_commit(mds_pages, NULL, cinfo, -1); + return -ENOMEM; + } data->ds_commit_index = -1; list_splice_init(mds_pages, &data->pages); list_add_tail(&data->list, &list); diff --git a/fs/nfs/write.c b/fs/nfs/write.c index 0691b0b02147..a296e504e4aa 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -70,27 +70,17 @@ static mempool_t *nfs_wdata_mempool; static struct kmem_cache *nfs_cdata_cachep; static mempool_t *nfs_commit_mempool;
-struct nfs_commit_data *nfs_commitdata_alloc(bool never_fail) +struct nfs_commit_data *nfs_commitdata_alloc(void) { struct nfs_commit_data *p;
- if (never_fail) - p = mempool_alloc(nfs_commit_mempool, GFP_NOIO); - else { - /* It is OK to do some reclaim, not no safe to wait - * for anything to be returned to the pool. - * mempool_alloc() cannot handle that particular combination, - * so we need two separate attempts. - */ + p = kmem_cache_zalloc(nfs_cdata_cachep, nfs_io_gfp_mask()); + if (!p) { p = mempool_alloc(nfs_commit_mempool, GFP_NOWAIT); - if (!p) - p = kmem_cache_alloc(nfs_cdata_cachep, GFP_NOIO | - __GFP_NOWARN | __GFP_NORETRY); if (!p) return NULL; + memset(p, 0, sizeof(*p)); } - - memset(p, 0, sizeof(*p)); INIT_LIST_HEAD(&p->pages); return p; } @@ -1809,7 +1799,11 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how, if (list_empty(head)) return 0;
- data = nfs_commitdata_alloc(true); + data = nfs_commitdata_alloc(); + if (!data) { + nfs_retry_commit(head, NULL, cinfo, -1); + return -ENOMEM; + }
/* Set up the argument struct */ nfs_init_commit(data, head, NULL, cinfo); diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 41102e03512f..5ffcde9ac413 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -567,7 +567,7 @@ extern int nfs_wb_all(struct inode *inode); extern int nfs_wb_page(struct inode *inode, struct page *page); extern int nfs_wb_page_cancel(struct inode *inode, struct page* page); extern int nfs_commit_inode(struct inode *, int); -extern struct nfs_commit_data *nfs_commitdata_alloc(bool never_fail); +extern struct nfs_commit_data *nfs_commitdata_alloc(void); extern void nfs_commit_free(struct nfs_commit_data *data); bool nfs_commit_end(struct nfs_mds_commit_info *cinfo);
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 0bae835b63c53f86cdc524f5962e39409585b22c ]
In a low memory situation, allow the NFS writeback code to fail without getting stuck in infinite loops in mempool_alloc().
Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/pagelist.c | 10 +++++----- fs/nfs/write.c | 10 ++++++++-- 2 files changed, 13 insertions(+), 7 deletions(-)
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c index b1130dc200d2..f2fe23e6c51f 100644 --- a/fs/nfs/pagelist.c +++ b/fs/nfs/pagelist.c @@ -90,10 +90,10 @@ void nfs_set_pgio_error(struct nfs_pgio_header *hdr, int error, loff_t pos) } }
-static inline struct nfs_page * -nfs_page_alloc(void) +static inline struct nfs_page *nfs_page_alloc(void) { - struct nfs_page *p = kmem_cache_zalloc(nfs_page_cachep, GFP_KERNEL); + struct nfs_page *p = + kmem_cache_zalloc(nfs_page_cachep, nfs_io_gfp_mask()); if (p) INIT_LIST_HEAD(&p->wb_list); return p; @@ -901,7 +901,7 @@ int nfs_generic_pgio(struct nfs_pageio_descriptor *desc, struct nfs_commit_info cinfo; struct nfs_page_array *pg_array = &hdr->page_array; unsigned int pagecount, pageused; - gfp_t gfp_flags = GFP_KERNEL; + gfp_t gfp_flags = nfs_io_gfp_mask();
pagecount = nfs_page_array_len(mirror->pg_base, mirror->pg_count); pg_array->npages = pagecount; @@ -988,7 +988,7 @@ nfs_pageio_alloc_mirrors(struct nfs_pageio_descriptor *desc, desc->pg_mirrors_dynamic = NULL; if (mirror_count == 1) return desc->pg_mirrors_static; - ret = kmalloc_array(mirror_count, sizeof(*ret), GFP_KERNEL); + ret = kmalloc_array(mirror_count, sizeof(*ret), nfs_io_gfp_mask()); if (ret != NULL) { for (i = 0; i < mirror_count; i++) nfs_pageio_mirror_init(&ret[i], desc->pg_bsize); diff --git a/fs/nfs/write.c b/fs/nfs/write.c index a296e504e4aa..d21b25511499 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -94,9 +94,15 @@ EXPORT_SYMBOL_GPL(nfs_commit_free);
static struct nfs_pgio_header *nfs_writehdr_alloc(void) { - struct nfs_pgio_header *p = mempool_alloc(nfs_wdata_mempool, GFP_KERNEL); + struct nfs_pgio_header *p;
- memset(p, 0, sizeof(*p)); + p = kmem_cache_zalloc(nfs_wdata_cachep, nfs_io_gfp_mask()); + if (!p) { + p = mempool_alloc(nfs_wdata_mempool, GFP_NOWAIT); + if (!p) + return NULL; + memset(p, 0, sizeof(*p)); + } p->rw_mode = FMODE_WRITE; return p; }
From: Naresh Kamboju naresh.kamboju@linaro.org
[ Upstream commit d9142e1cf3bbdaf21337767114ecab26fe702d47 ]
selftest net tls test cases need TLS=m without this the test hangs. Enabling config TLS solves this problem and runs to complete. - CONFIG_TLS=m
Reported-by: Linux Kernel Functional Testing lkft@linaro.org Signed-off-by: Naresh Kamboju naresh.kamboju@linaro.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/config | 1 + 1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/net/config b/tools/testing/selftests/net/config index 86ab429fe7f3..29fadd7512f7 100644 --- a/tools/testing/selftests/net/config +++ b/tools/testing/selftests/net/config @@ -43,4 +43,5 @@ CONFIG_NET_ACT_TUNNEL_KEY=m CONFIG_NET_ACT_MIRRED=m CONFIG_BAREUDP=m CONFIG_IPV6_IOAM6_LWTUNNEL=y +CONFIG_TLS=m CONFIG_CRYPTO_SM4=y
From: Helge Deller deller@gmx.de
[ Upstream commit 939fc856676c266c3bc347c1c1661872a3725c0f ]
Add the missing logic to allow Lasi, WAX and Dino to set the CPU affinity. This fixes IRQ migration to other CPUs when a CPU is shutdown which currently holds the IRQs for one of those chips.
Signed-off-by: Helge Deller deller@gmx.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/parisc/dino.c | 41 +++++++++++++++++++++++++++++++++-------- drivers/parisc/gsc.c | 31 +++++++++++++++++++++++++++++++ drivers/parisc/gsc.h | 1 + drivers/parisc/lasi.c | 7 +++---- drivers/parisc/wax.c | 7 +++---- 5 files changed, 71 insertions(+), 16 deletions(-)
diff --git a/drivers/parisc/dino.c b/drivers/parisc/dino.c index 952a92504df6..e33036281327 100644 --- a/drivers/parisc/dino.c +++ b/drivers/parisc/dino.c @@ -142,9 +142,8 @@ struct dino_device { struct pci_hba_data hba; /* 'C' inheritance - must be first */ spinlock_t dinosaur_pen; - unsigned long txn_addr; /* EIR addr to generate interrupt */ - u32 txn_data; /* EIR data assign to each dino */ u32 imr; /* IRQ's which are enabled */ + struct gsc_irq gsc_irq; int global_irq[DINO_LOCAL_IRQS]; /* map IMR bit to global irq */ #ifdef DINO_DEBUG unsigned int dino_irr0; /* save most recent IRQ line stat */ @@ -339,14 +338,43 @@ static void dino_unmask_irq(struct irq_data *d) if (tmp & DINO_MASK_IRQ(local_irq)) { DBG(KERN_WARNING "%s(): IRQ asserted! (ILR 0x%x)\n", __func__, tmp); - gsc_writel(dino_dev->txn_data, dino_dev->txn_addr); + gsc_writel(dino_dev->gsc_irq.txn_data, dino_dev->gsc_irq.txn_addr); } }
+#ifdef CONFIG_SMP +static int dino_set_affinity_irq(struct irq_data *d, const struct cpumask *dest, + bool force) +{ + struct dino_device *dino_dev = irq_data_get_irq_chip_data(d); + struct cpumask tmask; + int cpu_irq; + u32 eim; + + if (!cpumask_and(&tmask, dest, cpu_online_mask)) + return -EINVAL; + + cpu_irq = cpu_check_affinity(d, &tmask); + if (cpu_irq < 0) + return cpu_irq; + + dino_dev->gsc_irq.txn_addr = txn_affinity_addr(d->irq, cpu_irq); + eim = ((u32) dino_dev->gsc_irq.txn_addr) | dino_dev->gsc_irq.txn_data; + __raw_writel(eim, dino_dev->hba.base_addr+DINO_IAR0); + + irq_data_update_effective_affinity(d, &tmask); + + return IRQ_SET_MASK_OK; +} +#endif + static struct irq_chip dino_interrupt_type = { .name = "GSC-PCI", .irq_unmask = dino_unmask_irq, .irq_mask = dino_mask_irq, +#ifdef CONFIG_SMP + .irq_set_affinity = dino_set_affinity_irq, +#endif };
@@ -806,7 +834,6 @@ static int __init dino_common_init(struct parisc_device *dev, { int status; u32 eim; - struct gsc_irq gsc_irq; struct resource *res;
pcibios_register_hba(&dino_dev->hba); @@ -821,10 +848,8 @@ static int __init dino_common_init(struct parisc_device *dev, ** still only has 11 IRQ input lines - just map some of them ** to a different processor. */ - dev->irq = gsc_alloc_irq(&gsc_irq); - dino_dev->txn_addr = gsc_irq.txn_addr; - dino_dev->txn_data = gsc_irq.txn_data; - eim = ((u32) gsc_irq.txn_addr) | gsc_irq.txn_data; + dev->irq = gsc_alloc_irq(&dino_dev->gsc_irq); + eim = ((u32) dino_dev->gsc_irq.txn_addr) | dino_dev->gsc_irq.txn_data;
/* ** Dino needs a PA "IRQ" to get a processor's attention. diff --git a/drivers/parisc/gsc.c b/drivers/parisc/gsc.c index ed9371acf37e..ec175ae99873 100644 --- a/drivers/parisc/gsc.c +++ b/drivers/parisc/gsc.c @@ -135,10 +135,41 @@ static void gsc_asic_unmask_irq(struct irq_data *d) */ }
+#ifdef CONFIG_SMP +static int gsc_set_affinity_irq(struct irq_data *d, const struct cpumask *dest, + bool force) +{ + struct gsc_asic *gsc_dev = irq_data_get_irq_chip_data(d); + struct cpumask tmask; + int cpu_irq; + + if (!cpumask_and(&tmask, dest, cpu_online_mask)) + return -EINVAL; + + cpu_irq = cpu_check_affinity(d, &tmask); + if (cpu_irq < 0) + return cpu_irq; + + gsc_dev->gsc_irq.txn_addr = txn_affinity_addr(d->irq, cpu_irq); + gsc_dev->eim = ((u32) gsc_dev->gsc_irq.txn_addr) | gsc_dev->gsc_irq.txn_data; + + /* switch IRQ's for devices below LASI/WAX to other CPU */ + gsc_writel(gsc_dev->eim, gsc_dev->hpa + OFFSET_IAR); + + irq_data_update_effective_affinity(d, &tmask); + + return IRQ_SET_MASK_OK; +} +#endif + + static struct irq_chip gsc_asic_interrupt_type = { .name = "GSC-ASIC", .irq_unmask = gsc_asic_unmask_irq, .irq_mask = gsc_asic_mask_irq, +#ifdef CONFIG_SMP + .irq_set_affinity = gsc_set_affinity_irq, +#endif };
int gsc_assign_irq(struct irq_chip *type, void *data) diff --git a/drivers/parisc/gsc.h b/drivers/parisc/gsc.h index 86abad3fa215..73cbd0bb1975 100644 --- a/drivers/parisc/gsc.h +++ b/drivers/parisc/gsc.h @@ -31,6 +31,7 @@ struct gsc_asic { int version; int type; int eim; + struct gsc_irq gsc_irq; int global_irq[32]; };
diff --git a/drivers/parisc/lasi.c b/drivers/parisc/lasi.c index 4e4fd12c2112..6ef621adb63a 100644 --- a/drivers/parisc/lasi.c +++ b/drivers/parisc/lasi.c @@ -163,7 +163,6 @@ static int __init lasi_init_chip(struct parisc_device *dev) { extern void (*chassis_power_off)(void); struct gsc_asic *lasi; - struct gsc_irq gsc_irq; int ret;
lasi = kzalloc(sizeof(*lasi), GFP_KERNEL); @@ -185,7 +184,7 @@ static int __init lasi_init_chip(struct parisc_device *dev) lasi_init_irq(lasi);
/* the IRQ lasi should use */ - dev->irq = gsc_alloc_irq(&gsc_irq); + dev->irq = gsc_alloc_irq(&lasi->gsc_irq); if (dev->irq < 0) { printk(KERN_ERR "%s(): cannot get GSC irq\n", __func__); @@ -193,9 +192,9 @@ static int __init lasi_init_chip(struct parisc_device *dev) return -EBUSY; }
- lasi->eim = ((u32) gsc_irq.txn_addr) | gsc_irq.txn_data; + lasi->eim = ((u32) lasi->gsc_irq.txn_addr) | lasi->gsc_irq.txn_data;
- ret = request_irq(gsc_irq.irq, gsc_asic_intr, 0, "lasi", lasi); + ret = request_irq(lasi->gsc_irq.irq, gsc_asic_intr, 0, "lasi", lasi); if (ret < 0) { kfree(lasi); return ret; diff --git a/drivers/parisc/wax.c b/drivers/parisc/wax.c index 5b6df1516235..73a2b01f8d9c 100644 --- a/drivers/parisc/wax.c +++ b/drivers/parisc/wax.c @@ -68,7 +68,6 @@ static int __init wax_init_chip(struct parisc_device *dev) { struct gsc_asic *wax; struct parisc_device *parent; - struct gsc_irq gsc_irq; int ret;
wax = kzalloc(sizeof(*wax), GFP_KERNEL); @@ -85,7 +84,7 @@ static int __init wax_init_chip(struct parisc_device *dev) wax_init_irq(wax);
/* the IRQ wax should use */ - dev->irq = gsc_claim_irq(&gsc_irq, WAX_GSC_IRQ); + dev->irq = gsc_claim_irq(&wax->gsc_irq, WAX_GSC_IRQ); if (dev->irq < 0) { printk(KERN_ERR "%s(): cannot get GSC irq\n", __func__); @@ -93,9 +92,9 @@ static int __init wax_init_chip(struct parisc_device *dev) return -EBUSY; }
- wax->eim = ((u32) gsc_irq.txn_addr) | gsc_irq.txn_data; + wax->eim = ((u32) wax->gsc_irq.txn_addr) | wax->gsc_irq.txn_data;
- ret = request_irq(gsc_irq.irq, gsc_asic_intr, 0, "wax", wax); + ret = request_irq(wax->gsc_irq.irq, gsc_asic_intr, 0, "wax", wax); if (ret < 0) { kfree(wax); return ret;
From: John David Anglin dave.anglin@bell.net
[ Upstream commit a9fe7fa7d874a536e0540469f314772c054a0323 ]
This change fixes the following:
1) The flags variable is not initialized. Always use raw_spin_lock_irqsave and raw_spin_unlock_irqrestore to serialize patching.
2) flush_kernel_vmap_range is primarily intended for DMA flushes. Since __patch_text_multiple is often called with interrupts disabled, it is better to directly call flush_kernel_dcache_range_asm and flush_kernel_icache_range_asm. This avoids an extra call.
3) The final call to flush_icache_range is unnecessary.
Signed-off-by: John David Anglin dave.anglin@bell.net Signed-off-by: Helge Deller deller@gmx.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/parisc/kernel/patch.c | 25 +++++++++++-------------- 1 file changed, 11 insertions(+), 14 deletions(-)
diff --git a/arch/parisc/kernel/patch.c b/arch/parisc/kernel/patch.c index 80a0ab372802..e59574f65e64 100644 --- a/arch/parisc/kernel/patch.c +++ b/arch/parisc/kernel/patch.c @@ -40,10 +40,7 @@ static void __kprobes *patch_map(void *addr, int fixmap, unsigned long *flags,
*need_unmap = 1; set_fixmap(fixmap, page_to_phys(page)); - if (flags) - raw_spin_lock_irqsave(&patch_lock, *flags); - else - __acquire(&patch_lock); + raw_spin_lock_irqsave(&patch_lock, *flags);
return (void *) (__fix_to_virt(fixmap) + (uintaddr & ~PAGE_MASK)); } @@ -52,10 +49,7 @@ static void __kprobes patch_unmap(int fixmap, unsigned long *flags) { clear_fixmap(fixmap);
- if (flags) - raw_spin_unlock_irqrestore(&patch_lock, *flags); - else - __release(&patch_lock); + raw_spin_unlock_irqrestore(&patch_lock, *flags); }
void __kprobes __patch_text_multiple(void *addr, u32 *insn, unsigned int len) @@ -67,8 +61,9 @@ void __kprobes __patch_text_multiple(void *addr, u32 *insn, unsigned int len) int mapped;
/* Make sure we don't have any aliases in cache */ - flush_kernel_vmap_range(addr, len); - flush_icache_range(start, end); + flush_kernel_dcache_range_asm(start, end); + flush_kernel_icache_range_asm(start, end); + flush_tlb_kernel_range(start, end);
p = fixmap = patch_map(addr, FIX_TEXT_POKE0, &flags, &mapped);
@@ -81,8 +76,10 @@ void __kprobes __patch_text_multiple(void *addr, u32 *insn, unsigned int len) * We're crossing a page boundary, so * need to remap */ - flush_kernel_vmap_range((void *)fixmap, - (p-fixmap) * sizeof(*p)); + flush_kernel_dcache_range_asm((unsigned long)fixmap, + (unsigned long)p); + flush_tlb_kernel_range((unsigned long)fixmap, + (unsigned long)p); if (mapped) patch_unmap(FIX_TEXT_POKE0, &flags); p = fixmap = patch_map(addr, FIX_TEXT_POKE0, &flags, @@ -90,10 +87,10 @@ void __kprobes __patch_text_multiple(void *addr, u32 *insn, unsigned int len) } }
- flush_kernel_vmap_range((void *)fixmap, (p-fixmap) * sizeof(*p)); + flush_kernel_dcache_range_asm((unsigned long)fixmap, (unsigned long)p); + flush_tlb_kernel_range((unsigned long)fixmap, (unsigned long)p); if (mapped) patch_unmap(FIX_TEXT_POKE0, &flags); - flush_icache_range(start, end); }
void __kprobes __patch_text(void *addr, u32 insn)
From: Mauricio Faria de Oliveira mfo@canonical.com
commit 6c8e2a256915a223f6289f651d6b926cd7135c9e upstream.
Problem: =======
Userspace might read the zero-page instead of actual data from a direct IO read on a block device if the buffers have been called madvise(MADV_FREE) on earlier (this is discussed below) due to a race between page reclaim on MADV_FREE and blkdev direct IO read.
- Race condition: ==============
During page reclaim, the MADV_FREE page check in try_to_unmap_one() checks if the page is not dirty, then discards its rmap PTE(s) (vs. remap back if the page is dirty).
However, after try_to_unmap_one() returns to shrink_page_list(), it might keep the page _anyway_ if page_ref_freeze() fails (it expects exactly _one_ page reference, from the isolation for page reclaim).
Well, blkdev_direct_IO() gets references for all pages, and on READ operations it only sets them dirty _later_.
So, if MADV_FREE'd pages (i.e., not dirty) are used as buffers for direct IO read from block devices, and page reclaim happens during __blkdev_direct_IO[_simple]() exactly AFTER bio_iov_iter_get_pages() returns, but BEFORE the pages are set dirty, the situation happens.
The direct IO read eventually completes. Now, when userspace reads the buffers, the PTE is no longer there and the page fault handler do_anonymous_page() services that with the zero-page, NOT the data!
A synthetic reproducer is provided.
- Page faults: ===========
If page reclaim happens BEFORE bio_iov_iter_get_pages() the issue doesn't happen, because that faults-in all pages as writeable, so do_anonymous_page() sets up a new page/rmap/PTE, and that is used by direct IO. The userspace reads don't fault as the PTE is there (thus zero-page is not used/setup).
But if page reclaim happens AFTER it / BEFORE setting pages dirty, the PTE is no longer there; the subsequent page faults can't help:
The data-read from the block device probably won't generate faults due to DMA (no MMU) but even in the case it wouldn't use DMA, that happens on different virtual addresses (not user-mapped addresses) because `struct bio_vec` stores `struct page` to figure addresses out (which are different from user-mapped addresses) for the read.
Thus userspace reads (to user-mapped addresses) still fault, then do_anonymous_page() gets another `struct page` that would address/ map to other memory than the `struct page` used by `struct bio_vec` for the read. (The original `struct page` is not available, since it wasn't freed, as page_ref_freeze() failed due to more page refs. And even if it were available, its data cannot be trusted anymore.)
Solution: ========
One solution is to check for the expected page reference count in try_to_unmap_one().
There should be one reference from the isolation (that is also checked in shrink_page_list() with page_ref_freeze()) plus one or more references from page mapping(s) (put in discard: label). Further references mean that rmap/PTE cannot be unmapped/nuked.
(Note: there might be more than one reference from mapping due to fork()/clone() without CLONE_VM, which use the same `struct page` for references, until the copy-on-write page gets copied.)
So, additional page references (e.g., from direct IO read) now prevent the rmap/PTE from being unmapped/dropped; similarly to the page is not freed per shrink_page_list()/page_ref_freeze()).
- Races and Barriers: ==================
The new check in try_to_unmap_one() should be safe in races with bio_iov_iter_get_pages() in get_user_pages() fast and slow paths, as it's done under the PTE lock.
The fast path doesn't take the lock, but it checks if the PTE has changed and if so, it drops the reference and leaves the page for the slow path (which does take that lock).
The fast path requires synchronization w/ full memory barrier: it writes the page reference count first then it reads the PTE later, while try_to_unmap() writes PTE first then it reads page refcount.
And a second barrier is needed, as the page dirty flag should not be read before the page reference count (as in __remove_mapping()). (This can be a load memory barrier only; no writes are involved.)
Call stack/comments:
- try_to_unmap_one() - page_vma_mapped_walk() - map_pte() # see pte_offset_map_lock(): pte_offset_map() spin_lock()
- ptep_get_and_clear() # write PTE - smp_mb() # (new barrier) GUP fast path - page_ref_count() # (new check) read refcount
- page_vma_mapped_walk_done() # see pte_unmap_unlock(): pte_unmap() spin_unlock()
- bio_iov_iter_get_pages() - __bio_iov_iter_get_pages() - iov_iter_get_pages() - get_user_pages_fast() - internal_get_user_pages_fast()
# fast path - lockless_pages_from_mm() - gup_{pgd,p4d,pud,pmd,pte}_range() ptep = pte_offset_map() # not _lock() pte = ptep_get_lockless(ptep)
page = pte_page(pte) try_grab_compound_head(page) # inc refcount # (RMW/barrier # on success)
if (pte_val(pte) != pte_val(*ptep)) # read PTE put_compound_head(page) # dec refcount # go slow path
# slow path - __gup_longterm_unlocked() - get_user_pages_unlocked() - __get_user_pages_locked() - __get_user_pages() - follow_{page,p4d,pud,pmd}_mask() - follow_page_pte() ptep = pte_offset_map_lock() pte = *ptep page = vm_normal_page(pte) try_grab_page(page) # inc refcount pte_unmap_unlock()
- Huge Pages: ==========
Regarding transparent hugepages, that logic shouldn't change, as MADV_FREE (aka lazyfree) pages are PageAnon() && !PageSwapBacked() (madvise_free_pte_range() -> mark_page_lazyfree() -> lru_lazyfree_fn()) thus should reach shrink_page_list() -> split_huge_page_to_list() before try_to_unmap[_one](), so it deals with normal pages only.
(And in case unlikely/TTU_SPLIT_HUGE_PMD/split_huge_pmd_address() happens, which should not or be rare, the page refcount should be greater than mapcount: the head page is referenced by tail pages. That also prevents checking the head `page` then incorrectly call page_remove_rmap(subpage) for a tail page, that isn't even in the shrink_page_list()'s page_list (an effect of split huge pmd/pmvw), as it might happen today in this unlikely scenario.)
MADV_FREE'd buffers: ===================
So, back to the "if MADV_FREE pages are used as buffers" note. The case is arguable, and subject to multiple interpretations.
The madvise(2) manual page on the MADV_FREE advice value says:
1) 'After a successful MADV_FREE ... data will be lost when the kernel frees the pages.' 2) 'the free operation will be canceled if the caller writes into the page' / 'subsequent writes ... will succeed and then [the] kernel cannot free those dirtied pages' 3) 'If there is no subsequent write, the kernel can free the pages at any time.'
Thoughts, questions, considerations... respectively:
1) Since the kernel didn't actually free the page (page_ref_freeze() failed), should the data not have been lost? (on userspace read.) 2) Should writes performed by the direct IO read be able to cancel the free operation? - Should the direct IO read be considered as 'the caller' too, as it's been requested by 'the caller'? - Should the bio technique to dirty pages on return to userspace (bio_check_pages_dirty() is called/used by __blkdev_direct_IO()) be considered in another/special way here? 3) Should an upcoming write from a previously requested direct IO read be considered as a subsequent write, so the kernel should not free the pages? (as it's known at the time of page reclaim.)
And lastly:
Technically, the last point would seem a reasonable consideration and balance, as the madvise(2) manual page apparently (and fairly) seem to assume that 'writes' are memory access from the userspace process (not explicitly considering writes from the kernel or its corner cases; again, fairly).. plus the kernel fix implementation for the corner case of the largely 'non-atomic write' encompassed by a direct IO read operation, is relatively simple; and it helps.
Reproducer: ==========
@ test.c (simplified, but works)
#define _GNU_SOURCE #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <sys/mman.h>
int main() { int fd, i; char *buf;
fd = open(DEV, O_RDONLY | O_DIRECT);
buf = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
for (i = 0; i < BUF_SIZE; i += PAGE_SIZE) buf[i] = 1; // init to non-zero
madvise(buf, BUF_SIZE, MADV_FREE);
read(fd, buf, BUF_SIZE);
for (i = 0; i < BUF_SIZE; i += PAGE_SIZE) printf("%p: 0x%x\n", &buf[i], buf[i]);
return 0; }
@ block/fops.c (formerly fs/block_dev.c)
+#include <linux/swap.h> ... ... __blkdev_direct_IO[_simple](...) { ... + if (!strcmp(current->comm, "good")) + shrink_all_memory(ULONG_MAX); + ret = bio_iov_iter_get_pages(...); + + if (!strcmp(current->comm, "bad")) + shrink_all_memory(ULONG_MAX); ... }
@ shell
# NUM_PAGES=4 # PAGE_SIZE=$(getconf PAGE_SIZE)
# yes | dd of=test.img bs=${PAGE_SIZE} count=${NUM_PAGES} # DEV=$(losetup -f --show test.img)
# gcc -DDEV="$DEV" \ -DBUF_SIZE=$((PAGE_SIZE * NUM_PAGES)) \ -DPAGE_SIZE=${PAGE_SIZE} \ test.c -o test
# od -tx1 $DEV 0000000 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a * 0040000
# mv test good # ./good 0x7f7c10418000: 0x79 0x7f7c10419000: 0x79 0x7f7c1041a000: 0x79 0x7f7c1041b000: 0x79
# mv good bad # ./bad 0x7fa1b8050000: 0x0 0x7fa1b8051000: 0x0 0x7fa1b8052000: 0x0 0x7fa1b8053000: 0x0
Note: the issue is consistent on v5.17-rc3, but it's intermittent with the support of MADV_FREE on v4.5 (60%-70% error; needs swap). [wrap do_direct_IO() in do_blockdev_direct_IO() @ fs/direct-io.c].
- v5.17-rc3:
# for i in {1..1000}; do ./good; done \ | cut -d: -f2 | sort | uniq -c 4000 0x79
# mv good bad # for i in {1..1000}; do ./bad; done \ | cut -d: -f2 | sort | uniq -c 4000 0x0
# free | grep Swap Swap: 0 0 0
- v4.5:
# for i in {1..1000}; do ./good; done \ | cut -d: -f2 | sort | uniq -c 4000 0x79
# mv good bad # for i in {1..1000}; do ./bad; done \ | cut -d: -f2 | sort | uniq -c 2702 0x0 1298 0x79
# swapoff -av swapoff /swap
# for i in {1..1000}; do ./bad; done \ | cut -d: -f2 | sort | uniq -c 4000 0x79
Ceph/TCMalloc: =============
For documentation purposes, the use case driving the analysis/fix is Ceph on Ubuntu 18.04, as the TCMalloc library there still uses MADV_FREE to release unused memory to the system from the mmap'ed page heap (might be committed back/used again; it's not munmap'ed.) - PageHeap::DecommitSpan() -> TCMalloc_SystemRelease() -> madvise() - PageHeap::CommitSpan() -> TCMalloc_SystemCommit() -> do nothing.
Note: TCMalloc switched back to MADV_DONTNEED a few commits after the release in Ubuntu 18.04 (google-perftools/gperftools 2.5), so the issue just 'disappeared' on Ceph on later Ubuntu releases but is still present in the kernel, and can be hit by other use cases.
The observed issue seems to be the old Ceph bug #22464 [1], where checksum mismatches are observed (and instrumentation with buffer dumps shows zero-pages read from mmap'ed/MADV_FREE'd page ranges).
The issue in Ceph was reasonably deemed a kernel bug (comment #50) and mostly worked around with a retry mechanism, but other parts of Ceph could still hit that (rocksdb). Anyway, it's less likely to be hit again as TCMalloc switched out of MADV_FREE by default.
(Some kernel versions/reports from the Ceph bug, and relation with the MADV_FREE introduction/changes; TCMalloc versions not checked.) - 4.4 good - 4.5 (madv_free: introduction) - 4.9 bad - 4.10 good? maybe a swapless system - 4.12 (madv_free: no longer free instantly on swapless systems) - 4.13 bad
[1] https://tracker.ceph.com/issues/22464
Thanks: ======
Several people contributed to analysis/discussions/tests/reproducers in the first stages when drilling down on ceph/tcmalloc/linux kernel:
- Dan Hill - Dan Streetman - Dongdong Tao - Gavin Guo - Gerald Yang - Heitor Alves de Siqueira - Ioanna Alifieraki - Jay Vosburgh - Matthew Ruffell - Ponnuvel Palaniyappan
Reviews, suggestions, corrections, comments:
- Minchan Kim - Yu Zhao - Huang, Ying - John Hubbard - Christoph Hellwig
[mfo@canonical.com: v4] Link: https://lkml.kernel.org/r/20220209202659.183418-1-mfo@canonical.comLink: https://lkml.kernel.org/r/20220131230255.789059-1-mfo@canonical.com
Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages") Signed-off-by: Mauricio Faria de Oliveira mfo@canonical.com Reviewed-by: "Huang, Ying" ying.huang@intel.com Cc: Minchan Kim minchan@kernel.org Cc: Yu Zhao yuzhao@google.com Cc: Yang Shi shy828301@gmail.com Cc: Miaohe Lin linmiaohe@huawei.com Cc: Dan Hill daniel.hill@canonical.com Cc: Dan Streetman dan.streetman@canonical.com Cc: Dongdong Tao dongdong.tao@canonical.com Cc: Gavin Guo gavin.guo@canonical.com Cc: Gerald Yang gerald.yang@canonical.com Cc: Heitor Alves de Siqueira halves@canonical.com Cc: Ioanna Alifieraki ioanna-maria.alifieraki@canonical.com Cc: Jay Vosburgh jay.vosburgh@canonical.com Cc: Matthew Ruffell matthew.ruffell@canonical.com Cc: Ponnuvel Palaniyappan ponnuvel.palaniyappan@canonical.com Cc: stable@vger.kernel.org Cc: Christoph Hellwig hch@infradead.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org [mfo: backport: replace folio/test_flag with page/flag equivalents; real Fixes: 854e9ed09ded ("mm: support madvise(MADV_FREE)") in v4.] Signed-off-by: Mauricio Faria de Oliveira mfo@canonical.com Signed-off-by: Sasha Levin sashal@kernel.org --- mm/rmap.c | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-)
diff --git a/mm/rmap.c b/mm/rmap.c index 6aebd1747251..3e340ee380cb 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1570,7 +1570,30 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
/* MADV_FREE page check */ if (!PageSwapBacked(page)) { - if (!PageDirty(page)) { + int ref_count, map_count; + + /* + * Synchronize with gup_pte_range(): + * - clear PTE; barrier; read refcount + * - inc refcount; barrier; read PTE + */ + smp_mb(); + + ref_count = page_ref_count(page); + map_count = page_mapcount(page); + + /* + * Order reads for page refcount and dirty flag + * (see comments in __remove_mapping()). + */ + smp_rmb(); + + /* + * The only page refs must be one from isolation + * plus the rmap(s) (dropped by discard:). + */ + if (ref_count == 1 + map_count && + !PageDirty(page)) { /* Invalidate as we cleared the pte */ mmu_notifier_invalidate_range(mm, address, address + PAGE_SIZE);
From: Mateusz Jończyk mat.jonczyk@o2.pl
[ Upstream commit d35786b3a28dee20b12962ae2dd365892a99ed1a ]
No function is checking mc146818_get_time() return values yet, so correct them to make them more customary.
Signed-off-by: Mateusz Jończyk mat.jonczyk@o2.pl Cc: Alessandro Zummo a.zummo@towertech.it Cc: Alexandre Belloni alexandre.belloni@bootlin.com Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Link: https://lore.kernel.org/r/20211210200131.153887-3-mat.jonczyk@o2.pl Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/rtc/rtc-mc146818-lib.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/rtc/rtc-mc146818-lib.c b/drivers/rtc/rtc-mc146818-lib.c index 04b05e3b68cb..bd48cee3027e 100644 --- a/drivers/rtc/rtc-mc146818-lib.c +++ b/drivers/rtc/rtc-mc146818-lib.c @@ -25,7 +25,7 @@ unsigned int mc146818_get_time(struct rtc_time *time) if (WARN_ON_ONCE((CMOS_READ(RTC_VALID) & 0x40) != 0)) { spin_unlock_irqrestore(&rtc_lock, flags); memset(time, 0xff, sizeof(*time)); - return 0; + return -EIO; }
/* @@ -116,7 +116,7 @@ unsigned int mc146818_get_time(struct rtc_time *time)
time->tm_mon--;
- return RTC_24H; + return 0; } EXPORT_SYMBOL_GPL(mc146818_get_time);
From: Mateusz Jończyk mat.jonczyk@o2.pl
[ Upstream commit 0dd8d6cb9eddfe637bcd821bbfd40ebd5a0737b9 ]
There are 4 users of mc146818_get_time() and none of them was checking the return value from this function. Change this.
Print the appropriate warnings in callers of mc146818_get_time() instead of in the function mc146818_get_time() itself, in order not to add strings to rtc-mc146818-lib.c, which is kind of a library.
The callers of alpha_rtc_read_time() and cmos_read_time() may use the contents of (struct rtc_time *) even when the functions return a failure code. Therefore, set the contents of (struct rtc_time *) to 0x00, which looks more sensible then 0xff and aligns with the (possibly stale?) comment in cmos_read_time:
/* * If pm_trace abused the RTC for storage, set the timespec to 0, * which tells the caller that this RTC value is unusable. */
For consistency, do this in mc146818_get_time().
Note: hpet_rtc_interrupt() may call mc146818_get_time() many times a second. It is very unlikely, though, that the RTC suddenly stops working and mc146818_get_time() would consistently fail.
Only compile-tested on alpha.
Signed-off-by: Mateusz Jończyk mat.jonczyk@o2.pl Cc: Richard Henderson rth@twiddle.net Cc: Ivan Kokshaysky ink@jurassic.park.msu.ru Cc: Matt Turner mattst88@gmail.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Ingo Molnar mingo@redhat.com Cc: Borislav Petkov bp@alien8.de Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Alessandro Zummo a.zummo@towertech.it Cc: Alexandre Belloni alexandre.belloni@bootlin.com Cc: linux-alpha@vger.kernel.org Cc: x86@kernel.org Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Link: https://lore.kernel.org/r/20211210200131.153887-4-mat.jonczyk@o2.pl Signed-off-by: Sasha Levin sashal@kernel.org --- arch/alpha/kernel/rtc.c | 7 ++++++- arch/x86/kernel/hpet.c | 8 ++++++-- drivers/base/power/trace.c | 6 +++++- drivers/rtc/rtc-cmos.c | 9 ++++++++- drivers/rtc/rtc-mc146818-lib.c | 2 +- 5 files changed, 26 insertions(+), 6 deletions(-)
diff --git a/arch/alpha/kernel/rtc.c b/arch/alpha/kernel/rtc.c index ce3077946e1d..fb3025396ac9 100644 --- a/arch/alpha/kernel/rtc.c +++ b/arch/alpha/kernel/rtc.c @@ -80,7 +80,12 @@ init_rtc_epoch(void) static int alpha_rtc_read_time(struct device *dev, struct rtc_time *tm) { - mc146818_get_time(tm); + int ret = mc146818_get_time(tm); + + if (ret < 0) { + dev_err_ratelimited(dev, "unable to read current time\n"); + return ret; + }
/* Adjust for non-default epochs. It's easier to depend on the generic __get_rtc_time and adjust the epoch here than create diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c index 882213df3713..71f336425e58 100644 --- a/arch/x86/kernel/hpet.c +++ b/arch/x86/kernel/hpet.c @@ -1435,8 +1435,12 @@ irqreturn_t hpet_rtc_interrupt(int irq, void *dev_id) hpet_rtc_timer_reinit(); memset(&curr_time, 0, sizeof(struct rtc_time));
- if (hpet_rtc_flags & (RTC_UIE | RTC_AIE)) - mc146818_get_time(&curr_time); + if (hpet_rtc_flags & (RTC_UIE | RTC_AIE)) { + if (unlikely(mc146818_get_time(&curr_time) < 0)) { + pr_err_ratelimited("unable to read current time from RTC\n"); + return IRQ_HANDLED; + } + }
if (hpet_rtc_flags & RTC_UIE && curr_time.tm_sec != hpet_prev_update_sec) { diff --git a/drivers/base/power/trace.c b/drivers/base/power/trace.c index 94665037f4a3..72b7a92337b1 100644 --- a/drivers/base/power/trace.c +++ b/drivers/base/power/trace.c @@ -120,7 +120,11 @@ static unsigned int read_magic_time(void) struct rtc_time time; unsigned int val;
- mc146818_get_time(&time); + if (mc146818_get_time(&time) < 0) { + pr_err("Unable to read current time from RTC\n"); + return 0; + } + pr_info("RTC time: %ptRt, date: %ptRd\n", &time, &time); val = time.tm_year; /* 100 years */ if (val > 100) diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c index dc3f8b0dde98..d0f58cca5c20 100644 --- a/drivers/rtc/rtc-cmos.c +++ b/drivers/rtc/rtc-cmos.c @@ -222,6 +222,8 @@ static inline void cmos_write_bank2(unsigned char val, unsigned char addr)
static int cmos_read_time(struct device *dev, struct rtc_time *t) { + int ret; + /* * If pm_trace abused the RTC for storage, set the timespec to 0, * which tells the caller that this RTC value is unusable. @@ -229,7 +231,12 @@ static int cmos_read_time(struct device *dev, struct rtc_time *t) if (!pm_trace_rtc_valid()) return -EIO;
- mc146818_get_time(t); + ret = mc146818_get_time(t); + if (ret < 0) { + dev_err_ratelimited(dev, "unable to read current time\n"); + return ret; + } + return 0; }
diff --git a/drivers/rtc/rtc-mc146818-lib.c b/drivers/rtc/rtc-mc146818-lib.c index bd48cee3027e..97e3cebb4da9 100644 --- a/drivers/rtc/rtc-mc146818-lib.c +++ b/drivers/rtc/rtc-mc146818-lib.c @@ -24,7 +24,7 @@ unsigned int mc146818_get_time(struct rtc_time *time) /* Ensure that the RTC is accessible. Bit 6 must be 0! */ if (WARN_ON_ONCE((CMOS_READ(RTC_VALID) & 0x40) != 0)) { spin_unlock_irqrestore(&rtc_lock, flags); - memset(time, 0xff, sizeof(*time)); + memset(time, 0, sizeof(*time)); return -EIO; }
From: Mateusz Jończyk mat.jonczyk@o2.pl
[ Upstream commit ea6fa4961aab8f90a8aa03575a98b4bda368d4b6 ]
To prevent an infinite loop in mc146818_get_time(), commit 211e5db19d15 ("rtc: mc146818: Detect and handle broken RTCs") added a check for RTC availability. Together with a later fix, it checked if bit 6 in register 0x0d is cleared.
This, however, caused a false negative on a motherboard with an AMD SB710 southbridge; according to the specification [1], bit 6 of register 0x0d of this chipset is a scratchbit. This caused a regression in Linux 5.11 - the RTC was determined broken by the kernel and not used by rtc-cmos.c [3]. This problem was also reported in Fedora [4].
As a better alternative, check whether the UIP ("Update-in-progress") bit is set for longer then 10ms. If that is the case, then apparently the RTC is either absent (and all register reads return 0xff) or broken. Also limit the number of loop iterations in mc146818_get_time() to 10 to prevent an infinite loop there.
The functions mc146818_get_time() and mc146818_does_rtc_work() will be refactored later in this patch series, in order to fix a separate problem with reading / setting the RTC alarm time. This is done so to avoid a confusion about what is being fixed when.
In a previous approach to this problem, I implemented a check whether the RTC_HOURS register contains a value <= 24. This, however, sometimes did not work correctly on my Intel Kaby Lake laptop. According to Intel's documentation [2], "the time and date RAM locations (0-9) are disconnected from the external bus" during the update cycle so reading this register without checking the UIP bit is incorrect.
[1] AMD SB700/710/750 Register Reference Guide, page 308, https://developer.amd.com/wordpress/media/2012/10/43009_sb7xx_rrg_pub_1.00.p...
[2] 7th Generation Intel ® Processor Family I/O for U/Y Platforms [...] Datasheet Volume 1 of 2, page 209 Intel's Document Number: 334658-006, https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/7th-...
[3] Functions in arch/x86/kernel/rtc.c apparently were using it.
[4] https://bugzilla.redhat.com/show_bug.cgi?id=1936688
Fixes: 211e5db19d15 ("rtc: mc146818: Detect and handle broken RTCs") Fixes: ebb22a059436 ("rtc: mc146818: Dont test for bit 0-5 in Register D") Signed-off-by: Mateusz Jończyk mat.jonczyk@o2.pl Cc: Thomas Gleixner tglx@linutronix.de Cc: Alessandro Zummo a.zummo@towertech.it Cc: Alexandre Belloni alexandre.belloni@bootlin.com Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Link: https://lore.kernel.org/r/20211210200131.153887-5-mat.jonczyk@o2.pl Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/rtc/rtc-cmos.c | 10 ++++------ drivers/rtc/rtc-mc146818-lib.c | 34 ++++++++++++++++++++++++++++++---- include/linux/mc146818rtc.h | 1 + 3 files changed, 35 insertions(+), 10 deletions(-)
diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c index d0f58cca5c20..b90a603d6b12 100644 --- a/drivers/rtc/rtc-cmos.c +++ b/drivers/rtc/rtc-cmos.c @@ -800,16 +800,14 @@ cmos_do_probe(struct device *dev, struct resource *ports, int rtc_irq)
rename_region(ports, dev_name(&cmos_rtc.rtc->dev));
- spin_lock_irq(&rtc_lock); - - /* Ensure that the RTC is accessible. Bit 6 must be 0! */ - if ((CMOS_READ(RTC_VALID) & 0x40) != 0) { - spin_unlock_irq(&rtc_lock); - dev_warn(dev, "not accessible\n"); + if (!mc146818_does_rtc_work()) { + dev_warn(dev, "broken or not accessible\n"); retval = -ENXIO; goto cleanup1; }
+ spin_lock_irq(&rtc_lock); + if (!(flags & CMOS_RTC_FLAGS_NOFREQ)) { /* force periodic irq to CMOS reset default of 1024Hz; * diff --git a/drivers/rtc/rtc-mc146818-lib.c b/drivers/rtc/rtc-mc146818-lib.c index 97e3cebb4da9..70137566981e 100644 --- a/drivers/rtc/rtc-mc146818-lib.c +++ b/drivers/rtc/rtc-mc146818-lib.c @@ -8,10 +8,36 @@ #include <linux/acpi.h> #endif
+/* + * If the UIP (Update-in-progress) bit of the RTC is set for more then + * 10ms, the RTC is apparently broken or not present. + */ +bool mc146818_does_rtc_work(void) +{ + int i; + unsigned char val; + unsigned long flags; + + for (i = 0; i < 10; i++) { + spin_lock_irqsave(&rtc_lock, flags); + val = CMOS_READ(RTC_FREQ_SELECT); + spin_unlock_irqrestore(&rtc_lock, flags); + + if ((val & RTC_UIP) == 0) + return true; + + mdelay(1); + } + + return false; +} +EXPORT_SYMBOL_GPL(mc146818_does_rtc_work); + unsigned int mc146818_get_time(struct rtc_time *time) { unsigned char ctrl; unsigned long flags; + unsigned int iter_count = 0; unsigned char century = 0; bool retry;
@@ -20,13 +46,13 @@ unsigned int mc146818_get_time(struct rtc_time *time) #endif
again: - spin_lock_irqsave(&rtc_lock, flags); - /* Ensure that the RTC is accessible. Bit 6 must be 0! */ - if (WARN_ON_ONCE((CMOS_READ(RTC_VALID) & 0x40) != 0)) { - spin_unlock_irqrestore(&rtc_lock, flags); + if (iter_count > 10) { memset(time, 0, sizeof(*time)); return -EIO; } + iter_count++; + + spin_lock_irqsave(&rtc_lock, flags);
/* * Check whether there is an update in progress during which the diff --git a/include/linux/mc146818rtc.h b/include/linux/mc146818rtc.h index 0661af17a758..69c80c4325bf 100644 --- a/include/linux/mc146818rtc.h +++ b/include/linux/mc146818rtc.h @@ -123,6 +123,7 @@ struct cmos_rtc_board_info { #define RTC_IO_EXTENT_USED RTC_IO_EXTENT #endif /* ARCH_RTC_LOCATION */
+bool mc146818_does_rtc_work(void); unsigned int mc146818_get_time(struct rtc_time *time); int mc146818_set_time(struct rtc_time *time);
From: Dan Carpenter dan.carpenter@oracle.com
[ Upstream commit 1647b54ed55d4d48c7199d439f8834626576cbe9 ]
This post-op should be a pre-op so that we do not pass -1 as the bit number to test_bit(). The current code will loop downwards from 63 to -1. After changing to a pre-op, it loops from 63 to 0.
Fixes: 71c37505e7ea ("drm/amdgpu/gfx: move more common KIQ code to amdgpu_gfx.c") Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c index 1916ec84dd71..e7845df6cad2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c @@ -266,7 +266,7 @@ static int amdgpu_gfx_kiq_acquire(struct amdgpu_device *adev, * adev->gfx.mec.num_pipe_per_mec * adev->gfx.mec.num_queue_per_pipe;
- while (queue_bit-- >= 0) { + while (--queue_bit >= 0) { if (test_bit(queue_bit, adev->gfx.mec.queue_bitmap)) continue;
From: Guilherme G. Piccoli gpiccoli@igalia.com
[ Upstream commit 792f232d57ff28bbd5f9c4abe0466b23d5879dc8 ]
The vmbus driver relies on the panic notifier infrastructure to perform some operations when a panic event is detected. Since vmbus can be built as module, it is required that the driver handles both registering and unregistering such panic notifier callback.
After commit 74347a99e73a ("x86/Hyper-V: Unload vmbus channel in hv panic callback") though, the panic notifier registration is done unconditionally in the module initialization routine whereas the unregistering procedure is conditionally guarded and executes only if HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE capability is set.
This patch fixes that by unconditionally unregistering the panic notifier in the module's exit routine as well.
Fixes: 74347a99e73a ("x86/Hyper-V: Unload vmbus channel in hv panic callback") Signed-off-by: Guilherme G. Piccoli gpiccoli@igalia.com Reviewed-by: Michael Kelley mikelley@microsoft.com Link: https://lore.kernel.org/r/20220315203535.682306-1-gpiccoli@igalia.com Signed-off-by: Wei Liu wei.liu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hv/vmbus_drv.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c index 44bd0b6ff505..a939ca1a8d54 100644 --- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -2776,10 +2776,15 @@ static void __exit vmbus_exit(void) if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE) { kmsg_dump_unregister(&hv_kmsg_dumper); unregister_die_notifier(&hyperv_die_block); - atomic_notifier_chain_unregister(&panic_notifier_list, - &hyperv_panic_block); }
+ /* + * The panic notifier is always registered, hence we should + * also unconditionally unregister it here as well. + */ + atomic_notifier_chain_unregister(&panic_notifier_list, + &hyperv_panic_block); + free_page((unsigned long)hv_panic_page); unregister_sysctl_table(hv_ctl_table_hdr); hv_ctl_table_hdr = NULL;
From: ChenXiaoSong chenxiaosong2@huawei.com
[ Upstream commit ab0fc21bc7105b54bafd85bd8b82742f9e68898a ]
This reverts commit 44942b4e457beda00981f616402a1a791e8c616e.
After secondly opening a file with O_ACCMODE|O_DIRECT flags, nfs4_valid_open_stateid() will dereference NULL nfs4_state when lseek().
Reproducer: 1. mount -t nfs -o vers=4.2 $server_ip:/ /mnt/ 2. fd = open("/mnt/file", O_ACCMODE|O_DIRECT|O_CREAT) 3. close(fd) 4. fd = open("/mnt/file", O_ACCMODE|O_DIRECT) 5. lseek(fd)
Reported-by: Lyu Tao tao.lyu@epfl.ch Signed-off-by: ChenXiaoSong chenxiaosong2@huawei.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/inode.c | 1 - fs/nfs/nfs4file.c | 2 +- 2 files changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c index 410f87bc48cc..f4f75db7a825 100644 --- a/fs/nfs/inode.c +++ b/fs/nfs/inode.c @@ -1167,7 +1167,6 @@ int nfs_open(struct inode *inode, struct file *filp) nfs_fscache_open_file(inode, filp); return 0; } -EXPORT_SYMBOL_GPL(nfs_open);
/* * This function is called whenever some part of NFS notices that diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c index c91565227ea2..8f35b5e13e93 100644 --- a/fs/nfs/nfs4file.c +++ b/fs/nfs/nfs4file.c @@ -51,7 +51,7 @@ nfs4_file_open(struct inode *inode, struct file *filp) return err;
if ((openflags & O_ACCMODE) == 3) - return nfs_open(inode, filp); + openflags--;
/* We can't create new files here */ openflags &= ~(O_CREAT|O_EXCL);
From: ChenXiaoSong chenxiaosong2@huawei.com
[ Upstream commit b243874f6f9568b2daf1a00e9222cacdc15e159c ]
open() with O_ACCMODE|O_DIRECT flags secondly will fail.
Reproducer: 1. mount -t nfs -o vers=4.2 $server_ip:/ /mnt/ 2. fd = open("/mnt/file", O_ACCMODE|O_DIRECT|O_CREAT) 3. close(fd) 4. fd = open("/mnt/file", O_ACCMODE|O_DIRECT)
Server nfsd4_decode_share_access() will fail with error nfserr_bad_xdr when client use incorrect share access mode of 0.
Fix this by using NFS4_SHARE_ACCESS_BOTH share access mode in client, just like firstly opening.
Fixes: ce4ef7c0a8a05 ("NFS: Split out NFS v4 file operations") Signed-off-by: ChenXiaoSong chenxiaosong2@huawei.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/dir.c | 10 ---------- fs/nfs/internal.h | 10 ++++++++++ fs/nfs/nfs4file.c | 6 ++++-- 3 files changed, 14 insertions(+), 12 deletions(-)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index 9adc6f57a008..78219396788b 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -1835,16 +1835,6 @@ const struct dentry_operations nfs4_dentry_operations = { }; EXPORT_SYMBOL_GPL(nfs4_dentry_operations);
-static fmode_t flags_to_mode(int flags) -{ - fmode_t res = (__force fmode_t)flags & FMODE_EXEC; - if ((flags & O_ACCMODE) != O_WRONLY) - res |= FMODE_READ; - if ((flags & O_ACCMODE) != O_RDONLY) - res |= FMODE_WRITE; - return res; -} - static struct nfs_open_context *create_nfs_open_context(struct dentry *dentry, int open_flags, struct file *filp) { return alloc_nfs_open_context(dentry, flags_to_mode(open_flags), filp); diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h index 7239118d98a3..c8845242d422 100644 --- a/fs/nfs/internal.h +++ b/fs/nfs/internal.h @@ -42,6 +42,16 @@ static inline bool nfs_lookup_is_soft_revalidate(const struct dentry *dentry) return true; }
+static inline fmode_t flags_to_mode(int flags) +{ + fmode_t res = (__force fmode_t)flags & FMODE_EXEC; + if ((flags & O_ACCMODE) != O_WRONLY) + res |= FMODE_READ; + if ((flags & O_ACCMODE) != O_RDONLY) + res |= FMODE_WRITE; + return res; +} + /* * Note: RFC 1813 doesn't limit the number of auth flavors that * a server can return, so make something up. diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c index 8f35b5e13e93..4120e1cb3fee 100644 --- a/fs/nfs/nfs4file.c +++ b/fs/nfs/nfs4file.c @@ -32,6 +32,7 @@ nfs4_file_open(struct inode *inode, struct file *filp) struct dentry *parent = NULL; struct inode *dir; unsigned openflags = filp->f_flags; + fmode_t f_mode; struct iattr attr; int err;
@@ -50,8 +51,9 @@ nfs4_file_open(struct inode *inode, struct file *filp) if (err) return err;
+ f_mode = filp->f_mode; if ((openflags & O_ACCMODE) == 3) - openflags--; + f_mode |= flags_to_mode(openflags);
/* We can't create new files here */ openflags &= ~(O_CREAT|O_EXCL); @@ -59,7 +61,7 @@ nfs4_file_open(struct inode *inode, struct file *filp) parent = dget_parent(dentry); dir = d_inode(parent);
- ctx = alloc_nfs_open_context(file_dentry(filp), filp->f_mode, filp); + ctx = alloc_nfs_open_context(file_dentry(filp), f_mode, filp); err = PTR_ERR(ctx); if (IS_ERR(ctx)) goto out;
From: Kevin Groeneveld kgroeneveld@lenbrook.com
[ Upstream commit bc5519c18a32ce855bb51b9f5eceb77a9489d080 ]
Commit 2e27f576abc6 ("scsi: scsi_ioctl: Call scsi_cmd_ioctl() from scsi_ioctl()") seems to have a typo as it is checking ret instead of cmd in the if statement checking for CDROMCLOSETRAY and CDROMEJECT. This changes the behaviour of these ioctls as the cdrom_ioctl handling of these is more restrictive than the scsi_ioctl version.
Link: https://lore.kernel.org/r/20220323002242.21157-1-kgroeneveld@lenbrook.com Fixes: 2e27f576abc6 ("scsi: scsi_ioctl: Call scsi_cmd_ioctl() from scsi_ioctl()") Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Kevin Groeneveld kgroeneveld@lenbrook.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/sr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c index 973d6e079b02..652cd81d7775 100644 --- a/drivers/scsi/sr.c +++ b/drivers/scsi/sr.c @@ -579,7 +579,7 @@ static int sr_block_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
scsi_autopm_get_device(sdev);
- if (ret != CDROMCLOSETRAY && ret != CDROMEJECT) { + if (cmd != CDROMCLOSETRAY && cmd != CDROMEJECT) { ret = cdrom_ioctl(&cd->cdi, bdev, mode, cmd, arg); if (ret != -ENOSYS) goto put;
From: John Garry john.garry@huawei.com
[ Upstream commit eaba83b5b8506bbc9ee7ca2f10aeab3fff3719e7 ]
In commit edb854a3680b ("scsi: core: Reallocate device's budget map on queue depth change"), the sbitmap for the device budget map may be reallocated after the slave device depth is configured.
When the sbitmap is reallocated we use the result from scsi_device_max_queue_depth() for the sbitmap size, but don't resize to match the actual device queue depth.
Fix by resizing the sbitmap after reallocating the budget sbitmap. We do this instead of init'ing the sbitmap to the device queue depth as the user may want to change the queue depth later via sysfs or other.
Link: https://lore.kernel.org/r/1647423870-143867-1-git-send-email-john.garry@huaw... Fixes: edb854a3680b ("scsi: core: Reallocate device's budget map on queue depth change") Tested-by: Damien Le Moal damien.lemoal@opensource.wdc.com Reviewed-by: Ming Lei ming.lei@redhat.com Reviewed-by: Bart Van Assche bvanassche@acm.org Signed-off-by: John Garry john.garry@huawei.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/scsi_scan.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index 7266880c70c2..9466474ff01b 100644 --- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c @@ -207,6 +207,8 @@ static int scsi_realloc_sdev_budget_map(struct scsi_device *sdev, int ret; struct sbitmap sb_backup;
+ depth = min_t(unsigned int, depth, scsi_device_max_queue_depth(sdev)); + /* * realloc if new shift is calculated, which is caused by setting * up one new default queue depth after calling ->slave_configure @@ -229,6 +231,9 @@ static int scsi_realloc_sdev_budget_map(struct scsi_device *sdev, scsi_device_max_queue_depth(sdev), new_shift, GFP_KERNEL, sdev->request_queue->node, false, true); + if (!ret) + sbitmap_resize(&sdev->budget_map, depth); + if (need_free) { if (ret) sdev->budget_map = sb_backup;
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
[ Upstream commit 16ed828b872d12ccba8f07bcc446ae89ba662f9c ]
The error handling path of the probe releases a resource that is not freed in the remove function. In some cases, a ioremap() must be undone.
Add the missing iounmap() call in the remove function.
Link: https://lore.kernel.org/r/247066a3104d25f9a05de8b3270fc3c848763bcc.164767326... Fixes: 45804fbb00ee ("[SCSI] 53c700: Amiga Zorro NCR53c710 SCSI") Reviewed-by: Geert Uytterhoeven geert@linux-m68k.org Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/zorro7xx.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/scsi/zorro7xx.c b/drivers/scsi/zorro7xx.c index 27b9e2baab1a..7acf9193a9e8 100644 --- a/drivers/scsi/zorro7xx.c +++ b/drivers/scsi/zorro7xx.c @@ -159,6 +159,8 @@ static void zorro7xx_remove_one(struct zorro_dev *z) scsi_remove_host(host);
NCR_700_release(host); + if (host->base > 0x01000000) + iounmap(hostdata->base); kfree(hostdata); free_irq(host->irq, host); zorro_release_device(z);
From: Eli Cohen elic@nvidia.com
[ Upstream commit 218bdd20e56cab41a68481bc10c551ae3e0a24fb ]
A subesequent patch will use the same workqueue for executing other work not related to control VQ. Rename the workqueue and the work queue entry used to convey information to the workqueue.
Signed-off-by: Eli Cohen elic@nvidia.com Link: https://lore.kernel.org/r/20210909123635.30884-3-elic@nvidia.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Acked-by: Jason Wang jasowang@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vdpa/mlx5/core/mlx5_vdpa.h | 2 +- drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 ++++---- 2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h index 01a848adf590..81dc3d88d3dd 100644 --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h @@ -63,7 +63,7 @@ struct mlx5_control_vq { unsigned short head; };
-struct mlx5_ctrl_wq_ent { +struct mlx5_vdpa_wq_ent { struct work_struct work; struct mlx5_vdpa_dev *mvdev; }; diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index f77a611f592f..f769e2dc6d26 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -1573,14 +1573,14 @@ static void mlx5_cvq_kick_handler(struct work_struct *work) { virtio_net_ctrl_ack status = VIRTIO_NET_ERR; struct virtio_net_ctrl_hdr ctrl; - struct mlx5_ctrl_wq_ent *wqent; + struct mlx5_vdpa_wq_ent *wqent; struct mlx5_vdpa_dev *mvdev; struct mlx5_control_vq *cvq; struct mlx5_vdpa_net *ndev; size_t read, write; int err;
- wqent = container_of(work, struct mlx5_ctrl_wq_ent, work); + wqent = container_of(work, struct mlx5_vdpa_wq_ent, work); mvdev = wqent->mvdev; ndev = to_mlx5_vdpa_ndev(mvdev); cvq = &mvdev->cvq; @@ -1632,7 +1632,7 @@ static void mlx5_vdpa_kick_vq(struct vdpa_device *vdev, u16 idx) struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev); struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev); struct mlx5_vdpa_virtqueue *mvq; - struct mlx5_ctrl_wq_ent *wqent; + struct mlx5_vdpa_wq_ent *wqent;
if (!is_index_valid(mvdev, idx)) return; @@ -2502,7 +2502,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name) if (err) goto err_mr;
- mvdev->wq = create_singlethread_workqueue("mlx5_vdpa_ctrl_wq"); + mvdev->wq = create_singlethread_workqueue("mlx5_vdpa_wq"); if (!mvdev->wq) { err = -ENOMEM; goto err_res2;
From: Eli Cohen elic@nvidia.com
[ Upstream commit edf747affc41a18ccc3a616813d4c2b6d38b46ce ]
Add code to register to hardware asynchronous events. Use this mechanism to track link status events coming from the device and update the config struct.
After doing link status change, call the vdpa callback to notify of the link status change.
Signed-off-by: Eli Cohen elic@nvidia.com Link: https://lore.kernel.org/r/20210909123635.30884-4-elic@nvidia.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Acked-by: Jason Wang jasowang@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vdpa/mlx5/net/mlx5_vnet.c | 94 ++++++++++++++++++++++++++++++- 1 file changed, 92 insertions(+), 2 deletions(-)
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index f769e2dc6d26..92a2484c8596 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -161,6 +161,8 @@ struct mlx5_vdpa_net { bool setup; u16 mtu; u32 cur_num_vqs; + struct notifier_block nb; + struct vdpa_callback config_cb; };
static void free_resources(struct mlx5_vdpa_net *ndev); @@ -1868,6 +1870,7 @@ static u64 mlx5_vdpa_get_features(struct vdpa_device *vdev) ndev->mvdev.mlx_features |= BIT_ULL(VIRTIO_NET_F_CTRL_VQ); ndev->mvdev.mlx_features |= BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR); ndev->mvdev.mlx_features |= BIT_ULL(VIRTIO_NET_F_MQ); + ndev->mvdev.mlx_features |= BIT_ULL(VIRTIO_NET_F_STATUS);
print_features(mvdev, ndev->mvdev.mlx_features, false); return ndev->mvdev.mlx_features; @@ -1980,8 +1983,10 @@ static int mlx5_vdpa_set_features(struct vdpa_device *vdev, u64 features)
static void mlx5_vdpa_set_config_cb(struct vdpa_device *vdev, struct vdpa_callback *cb) { - /* not implemented */ - mlx5_vdpa_warn(to_mvdev(vdev), "set config callback not supported\n"); + struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev); + struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev); + + ndev->config_cb = *cb; }
#define MLX5_VDPA_MAX_VQ_ENTRIES 256 @@ -2433,6 +2438,82 @@ struct mlx5_vdpa_mgmtdev { struct mlx5_vdpa_net *ndev; };
+static u8 query_vport_state(struct mlx5_core_dev *mdev, u8 opmod, u16 vport) +{ + u32 out[MLX5_ST_SZ_DW(query_vport_state_out)] = {}; + u32 in[MLX5_ST_SZ_DW(query_vport_state_in)] = {}; + int err; + + MLX5_SET(query_vport_state_in, in, opcode, MLX5_CMD_OP_QUERY_VPORT_STATE); + MLX5_SET(query_vport_state_in, in, op_mod, opmod); + MLX5_SET(query_vport_state_in, in, vport_number, vport); + if (vport) + MLX5_SET(query_vport_state_in, in, other_vport, 1); + + err = mlx5_cmd_exec_inout(mdev, query_vport_state, in, out); + if (err) + return 0; + + return MLX5_GET(query_vport_state_out, out, state); +} + +static bool get_link_state(struct mlx5_vdpa_dev *mvdev) +{ + if (query_vport_state(mvdev->mdev, MLX5_VPORT_STATE_OP_MOD_VNIC_VPORT, 0) == + VPORT_STATE_UP) + return true; + + return false; +} + +static void update_carrier(struct work_struct *work) +{ + struct mlx5_vdpa_wq_ent *wqent; + struct mlx5_vdpa_dev *mvdev; + struct mlx5_vdpa_net *ndev; + + wqent = container_of(work, struct mlx5_vdpa_wq_ent, work); + mvdev = wqent->mvdev; + ndev = to_mlx5_vdpa_ndev(mvdev); + if (get_link_state(mvdev)) + ndev->config.status |= cpu_to_mlx5vdpa16(mvdev, VIRTIO_NET_S_LINK_UP); + else + ndev->config.status &= cpu_to_mlx5vdpa16(mvdev, ~VIRTIO_NET_S_LINK_UP); + + if (ndev->config_cb.callback) + ndev->config_cb.callback(ndev->config_cb.private); + + kfree(wqent); +} + +static int event_handler(struct notifier_block *nb, unsigned long event, void *param) +{ + struct mlx5_vdpa_net *ndev = container_of(nb, struct mlx5_vdpa_net, nb); + struct mlx5_eqe *eqe = param; + int ret = NOTIFY_DONE; + struct mlx5_vdpa_wq_ent *wqent; + + if (event == MLX5_EVENT_TYPE_PORT_CHANGE) { + switch (eqe->sub_type) { + case MLX5_PORT_CHANGE_SUBTYPE_DOWN: + case MLX5_PORT_CHANGE_SUBTYPE_ACTIVE: + wqent = kzalloc(sizeof(*wqent), GFP_ATOMIC); + if (!wqent) + return NOTIFY_DONE; + + wqent->mvdev = &ndev->mvdev; + INIT_WORK(&wqent->work, update_carrier); + queue_work(ndev->mvdev.wq, &wqent->work); + ret = NOTIFY_OK; + break; + default: + return NOTIFY_DONE; + } + return ret; + } + return ret; +} + static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name) { struct mlx5_vdpa_mgmtdev *mgtdev = container_of(v_mdev, struct mlx5_vdpa_mgmtdev, mgtdev); @@ -2477,6 +2558,11 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name) if (err) goto err_mtu;
+ if (get_link_state(mvdev)) + ndev->config.status |= cpu_to_mlx5vdpa16(mvdev, VIRTIO_NET_S_LINK_UP); + else + ndev->config.status &= cpu_to_mlx5vdpa16(mvdev, ~VIRTIO_NET_S_LINK_UP); + if (!is_zero_ether_addr(config->mac)) { pfmdev = pci_get_drvdata(pci_physfn(mdev->pdev)); err = mlx5_mpfs_add_mac(pfmdev, config->mac); @@ -2508,6 +2594,8 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name) goto err_res2; }
+ ndev->nb.notifier_call = event_handler; + mlx5_notifier_register(mdev, &ndev->nb); ndev->cur_num_vqs = 2 * mlx5_vdpa_max_qps(max_vqs); mvdev->vdev.mdev = &mgtdev->mgtdev; err = _vdpa_register_device(&mvdev->vdev, ndev->cur_num_vqs + 1); @@ -2538,7 +2626,9 @@ static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev *v_mdev, struct vdpa_device * { struct mlx5_vdpa_mgmtdev *mgtdev = container_of(v_mdev, struct mlx5_vdpa_mgmtdev, mgtdev); struct mlx5_vdpa_dev *mvdev = to_mvdev(dev); + struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+ mlx5_notifier_unregister(mvdev->mdev, &ndev->nb); destroy_workqueue(mvdev->wq); _vdpa_unregister_device(dev); mgtdev->ndev = NULL;
From: Jason Wang jasowang@redhat.com
[ Upstream commit 55ebf0d60e3cc6c9e8593399e185842c00e12f36 ]
A userspace triggerable infinite loop could happen in mlx5_cvq_kick_handler() if userspace keeps sending a huge amount of cvq requests.
Fixing this by introducing a quota and re-queue the work if we're out of the budget (currently the implicit budget is one) . While at it, using a per device work struct to avoid on demand memory allocation for cvq.
Fixes: 5262912ef3cfc ("vdpa/mlx5: Add support for control VQ and MAC setting") Signed-off-by: Jason Wang jasowang@redhat.com Link: https://lore.kernel.org/r/20220329042109.4029-1-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Acked-by: Eli Cohen elic@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vdpa/mlx5/net/mlx5_vnet.c | 21 +++++++++------------ 1 file changed, 9 insertions(+), 12 deletions(-)
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index 92a2484c8596..e4258f40dcd7 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -163,6 +163,7 @@ struct mlx5_vdpa_net { u32 cur_num_vqs; struct notifier_block nb; struct vdpa_callback config_cb; + struct mlx5_vdpa_wq_ent cvq_ent; };
static void free_resources(struct mlx5_vdpa_net *ndev); @@ -1587,10 +1588,10 @@ static void mlx5_cvq_kick_handler(struct work_struct *work) ndev = to_mlx5_vdpa_ndev(mvdev); cvq = &mvdev->cvq; if (!(ndev->mvdev.actual_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ))) - goto out; + return;
if (!cvq->ready) - goto out; + return;
while (true) { err = vringh_getdesc_iotlb(&cvq->vring, &cvq->riov, &cvq->wiov, &cvq->head, @@ -1624,9 +1625,10 @@ static void mlx5_cvq_kick_handler(struct work_struct *work)
if (vringh_need_notify_iotlb(&cvq->vring)) vringh_notify(&cvq->vring); + + queue_work(mvdev->wq, &wqent->work); + break; } -out: - kfree(wqent); }
static void mlx5_vdpa_kick_vq(struct vdpa_device *vdev, u16 idx) @@ -1634,7 +1636,6 @@ static void mlx5_vdpa_kick_vq(struct vdpa_device *vdev, u16 idx) struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev); struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev); struct mlx5_vdpa_virtqueue *mvq; - struct mlx5_vdpa_wq_ent *wqent;
if (!is_index_valid(mvdev, idx)) return; @@ -1643,13 +1644,7 @@ static void mlx5_vdpa_kick_vq(struct vdpa_device *vdev, u16 idx) if (!mvdev->cvq.ready) return;
- wqent = kzalloc(sizeof(*wqent), GFP_ATOMIC); - if (!wqent) - return; - - wqent->mvdev = mvdev; - INIT_WORK(&wqent->work, mlx5_cvq_kick_handler); - queue_work(mvdev->wq, &wqent->work); + queue_work(mvdev->wq, &ndev->cvq_ent.work); return; }
@@ -2588,6 +2583,8 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name) if (err) goto err_mr;
+ ndev->cvq_ent.mvdev = mvdev; + INIT_WORK(&ndev->cvq_ent.work, mlx5_cvq_kick_handler); mvdev->wq = create_singlethread_workqueue("mlx5_vdpa_wq"); if (!mvdev->wq) { err = -ENOMEM;
From: Taehee Yoo ap420073@gmail.com
[ Upstream commit 059a47f1da93811d37533556d67e72f2261b1127 ]
After rx/tx ring buffer size is changed, kernel panic occurs when it acts XDP_TX or XDP_REDIRECT.
When tx/rx ring buffer size is changed(ethtool -G), sfc driver reallocates and reinitializes rx and tx queues and their buffer (tx_queue->buffer). But it misses reinitializing xdp queues(efx->xdp_tx_queues). So, while it is acting XDP_TX or XDP_REDIRECT, it uses the uninitialized tx_queue->buffer.
A new function efx_set_xdp_channels() is separated from efx_set_channels() to handle only xdp queues.
Splat looks like: BUG: kernel NULL pointer dereference, address: 000000000000002a #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#4] PREEMPT SMP NOPTI RIP: 0010:efx_tx_map_chunk+0x54/0x90 [sfc] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G D 5.17.0+ #55 e8beeee8289528f11357029357cf Code: 48 8b 8d a8 01 00 00 48 8d 14 52 4c 8d 2c d0 44 89 e0 48 85 c9 74 0e 44 89 e2 4c 89 f6 48 80 RSP: 0018:ffff92f121e45c60 EFLAGS: 00010297 RIP: 0010:efx_tx_map_chunk+0x54/0x90 [sfc] RAX: 0000000000000040 RBX: ffff92ea506895c0 RCX: ffffffffc0330870 RDX: 0000000000000001 RSI: 00000001139b10ce RDI: ffff92ea506895c0 RBP: ffffffffc0358a80 R08: 00000001139b110d R09: 0000000000000000 R10: 0000000000000001 R11: ffff92ea414c0088 R12: 0000000000000040 R13: 0000000000000018 R14: 00000001139b10ce R15: ffff92ea506895c0 FS: 0000000000000000(0000) GS:ffff92f121ec0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Code: 48 8b 8d a8 01 00 00 48 8d 14 52 4c 8d 2c d0 44 89 e0 48 85 c9 74 0e 44 89 e2 4c 89 f6 48 80 CR2: 000000000000002a CR3: 00000003e6810004 CR4: 00000000007706e0 RSP: 0018:ffff92f121e85c60 EFLAGS: 00010297 PKRU: 55555554 RAX: 0000000000000040 RBX: ffff92ea50689700 RCX: ffffffffc0330870 RDX: 0000000000000001 RSI: 00000001145a90ce RDI: ffff92ea50689700 RBP: ffffffffc0358a80 R08: 00000001145a910d R09: 0000000000000000 R10: 0000000000000001 R11: ffff92ea414c0088 R12: 0000000000000040 R13: 0000000000000018 R14: 00000001145a90ce R15: ffff92ea50689700 FS: 0000000000000000(0000) GS:ffff92f121e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000002a CR3: 00000003e6810005 CR4: 00000000007706e0 PKRU: 55555554 Call Trace: <IRQ> efx_xdp_tx_buffers+0x12b/0x3d0 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5] __efx_rx_packet+0x5c3/0x930 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5] efx_rx_packet+0x28c/0x2e0 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5] efx_ef10_ev_process+0x5f8/0xf40 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5] ? enqueue_task_fair+0x95/0x550 efx_poll+0xc4/0x360 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5]
Fixes: 3990a8fffbda ("sfc: allocate channels for XDP tx queues") Signed-off-by: Taehee Yoo ap420073@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/sfc/efx_channels.c | 146 +++++++++++++----------- 1 file changed, 81 insertions(+), 65 deletions(-)
diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c index 3dbea028b325..4753c0c5af10 100644 --- a/drivers/net/ethernet/sfc/efx_channels.c +++ b/drivers/net/ethernet/sfc/efx_channels.c @@ -763,6 +763,85 @@ void efx_remove_channels(struct efx_nic *efx) kfree(efx->xdp_tx_queues); }
+static int efx_set_xdp_tx_queue(struct efx_nic *efx, int xdp_queue_number, + struct efx_tx_queue *tx_queue) +{ + if (xdp_queue_number >= efx->xdp_tx_queue_count) + return -EINVAL; + + netif_dbg(efx, drv, efx->net_dev, + "Channel %u TXQ %u is XDP %u, HW %u\n", + tx_queue->channel->channel, tx_queue->label, + xdp_queue_number, tx_queue->queue); + efx->xdp_tx_queues[xdp_queue_number] = tx_queue; + return 0; +} + +static void efx_set_xdp_channels(struct efx_nic *efx) +{ + struct efx_tx_queue *tx_queue; + struct efx_channel *channel; + unsigned int next_queue = 0; + int xdp_queue_number = 0; + int rc; + + /* We need to mark which channels really have RX and TX + * queues, and adjust the TX queue numbers if we have separate + * RX-only and TX-only channels. + */ + efx_for_each_channel(channel, efx) { + if (channel->channel < efx->tx_channel_offset) + continue; + + if (efx_channel_is_xdp_tx(channel)) { + efx_for_each_channel_tx_queue(tx_queue, channel) { + tx_queue->queue = next_queue++; + rc = efx_set_xdp_tx_queue(efx, xdp_queue_number, + tx_queue); + if (rc == 0) + xdp_queue_number++; + } + } else { + efx_for_each_channel_tx_queue(tx_queue, channel) { + tx_queue->queue = next_queue++; + netif_dbg(efx, drv, efx->net_dev, + "Channel %u TXQ %u is HW %u\n", + channel->channel, tx_queue->label, + tx_queue->queue); + } + + /* If XDP is borrowing queues from net stack, it must + * use the queue with no csum offload, which is the + * first one of the channel + * (note: tx_queue_by_type is not initialized yet) + */ + if (efx->xdp_txq_queues_mode == + EFX_XDP_TX_QUEUES_BORROWED) { + tx_queue = &channel->tx_queue[0]; + rc = efx_set_xdp_tx_queue(efx, xdp_queue_number, + tx_queue); + if (rc == 0) + xdp_queue_number++; + } + } + } + WARN_ON(efx->xdp_txq_queues_mode == EFX_XDP_TX_QUEUES_DEDICATED && + xdp_queue_number != efx->xdp_tx_queue_count); + WARN_ON(efx->xdp_txq_queues_mode != EFX_XDP_TX_QUEUES_DEDICATED && + xdp_queue_number > efx->xdp_tx_queue_count); + + /* If we have more CPUs than assigned XDP TX queues, assign the already + * existing queues to the exceeding CPUs + */ + next_queue = 0; + while (xdp_queue_number < efx->xdp_tx_queue_count) { + tx_queue = efx->xdp_tx_queues[next_queue++]; + rc = efx_set_xdp_tx_queue(efx, xdp_queue_number, tx_queue); + if (rc == 0) + xdp_queue_number++; + } +} + int efx_realloc_channels(struct efx_nic *efx, u32 rxq_entries, u32 txq_entries) { struct efx_channel *other_channel[EFX_MAX_CHANNELS], *channel; @@ -837,6 +916,7 @@ int efx_realloc_channels(struct efx_nic *efx, u32 rxq_entries, u32 txq_entries) efx_init_napi_channel(efx->channel[i]); }
+ efx_set_xdp_channels(efx); out: /* Destroy unused channel structures */ for (i = 0; i < efx->n_channels; i++) { @@ -872,26 +952,9 @@ int efx_realloc_channels(struct efx_nic *efx, u32 rxq_entries, u32 txq_entries) goto out; }
-static inline int -efx_set_xdp_tx_queue(struct efx_nic *efx, int xdp_queue_number, - struct efx_tx_queue *tx_queue) -{ - if (xdp_queue_number >= efx->xdp_tx_queue_count) - return -EINVAL; - - netif_dbg(efx, drv, efx->net_dev, "Channel %u TXQ %u is XDP %u, HW %u\n", - tx_queue->channel->channel, tx_queue->label, - xdp_queue_number, tx_queue->queue); - efx->xdp_tx_queues[xdp_queue_number] = tx_queue; - return 0; -} - int efx_set_channels(struct efx_nic *efx) { - struct efx_tx_queue *tx_queue; struct efx_channel *channel; - unsigned int next_queue = 0; - int xdp_queue_number; int rc;
efx->tx_channel_offset = @@ -909,61 +972,14 @@ int efx_set_channels(struct efx_nic *efx) return -ENOMEM; }
- /* We need to mark which channels really have RX and TX - * queues, and adjust the TX queue numbers if we have separate - * RX-only and TX-only channels. - */ - xdp_queue_number = 0; efx_for_each_channel(channel, efx) { if (channel->channel < efx->n_rx_channels) channel->rx_queue.core_index = channel->channel; else channel->rx_queue.core_index = -1; - - if (channel->channel >= efx->tx_channel_offset) { - if (efx_channel_is_xdp_tx(channel)) { - efx_for_each_channel_tx_queue(tx_queue, channel) { - tx_queue->queue = next_queue++; - rc = efx_set_xdp_tx_queue(efx, xdp_queue_number, tx_queue); - if (rc == 0) - xdp_queue_number++; - } - } else { - efx_for_each_channel_tx_queue(tx_queue, channel) { - tx_queue->queue = next_queue++; - netif_dbg(efx, drv, efx->net_dev, "Channel %u TXQ %u is HW %u\n", - channel->channel, tx_queue->label, - tx_queue->queue); - } - - /* If XDP is borrowing queues from net stack, it must use the queue - * with no csum offload, which is the first one of the channel - * (note: channel->tx_queue_by_type is not initialized yet) - */ - if (efx->xdp_txq_queues_mode == EFX_XDP_TX_QUEUES_BORROWED) { - tx_queue = &channel->tx_queue[0]; - rc = efx_set_xdp_tx_queue(efx, xdp_queue_number, tx_queue); - if (rc == 0) - xdp_queue_number++; - } - } - } } - WARN_ON(efx->xdp_txq_queues_mode == EFX_XDP_TX_QUEUES_DEDICATED && - xdp_queue_number != efx->xdp_tx_queue_count); - WARN_ON(efx->xdp_txq_queues_mode != EFX_XDP_TX_QUEUES_DEDICATED && - xdp_queue_number > efx->xdp_tx_queue_count);
- /* If we have more CPUs than assigned XDP TX queues, assign the already - * existing queues to the exceeding CPUs - */ - next_queue = 0; - while (xdp_queue_number < efx->xdp_tx_queue_count) { - tx_queue = efx->xdp_tx_queues[next_queue++]; - rc = efx_set_xdp_tx_queue(efx, xdp_queue_number, tx_queue); - if (rc == 0) - xdp_queue_number++; - } + efx_set_xdp_channels(efx);
rc = netif_set_real_num_tx_queues(efx->net_dev, efx->n_tx_channels); if (rc)
From: Ziyang Xuan william.xuanziyang@huawei.com
[ Upstream commit 9381fe8c849cfbe50245ac01fc077554f6eaa0e2 ]
The memory size of tls_ctx->rx.iv for AES128-CCM is 12 setting in tls_set_sw_offload(). The return value of crypto_aead_ivsize() for "ccm(aes)" is 16. So memcpy() require 16 bytes from 12 bytes memory space will trigger slab-out-of-bounds bug as following:
================================================================== BUG: KASAN: slab-out-of-bounds in decrypt_internal+0x385/0xc40 [tls] Read of size 16 at addr ffff888114e84e60 by task tls/10911
Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_report.cold+0x5e/0x5db ? decrypt_internal+0x385/0xc40 [tls] kasan_report+0xab/0x120 ? decrypt_internal+0x385/0xc40 [tls] kasan_check_range+0xf9/0x1e0 memcpy+0x20/0x60 decrypt_internal+0x385/0xc40 [tls] ? tls_get_rec+0x2e0/0x2e0 [tls] ? process_rx_list+0x1a5/0x420 [tls] ? tls_setup_from_iter.constprop.0+0x2e0/0x2e0 [tls] decrypt_skb_update+0x9d/0x400 [tls] tls_sw_recvmsg+0x3c8/0xb50 [tls]
Allocated by task 10911: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x81/0xa0 tls_set_sw_offload+0x2eb/0xa20 [tls] tls_setsockopt+0x68c/0x700 [tls] __sys_setsockopt+0xfe/0x1b0
Replace the crypto_aead_ivsize() with prot->iv_size + prot->salt_size when memcpy() iv value in TLS_1_3_VERSION scenario.
Fixes: f295b3ae9f59 ("net/tls: Add support of AES128-CCM based ciphers") Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com Reviewed-by: Jakub Kicinski kuba@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index bd96ec26f4f9..794ef3b3d7d4 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1483,7 +1483,7 @@ static int decrypt_internal(struct sock *sk, struct sk_buff *skb, if (prot->version == TLS_1_3_VERSION || prot->cipher_type == TLS_CIPHER_CHACHA20_POLY1305) memcpy(iv + iv_offset, tls_ctx->rx.iv, - crypto_aead_ivsize(ctx->aead_recv)); + prot->iv_size + prot->salt_size); else memcpy(iv + iv_offset, tls_ctx->rx.iv, prot->salt_size);
From: Eyal Birger eyal.birger@gmail.com
[ Upstream commit 012d69fbfcc739f846766c1da56ef8b493b803b5 ]
in commit 048939088220 ("vrf: add mac header for tunneled packets when sniffer is attached") an Ethernet header was cooked for traffic originating from tunnel devices.
However, the header is added based on whether the mac_header is unset and ignores cases where the device doesn't expose a mac header to upper layers, such as in ip tunnels like ipip and gre.
Traffic originating from such devices still appears garbled when capturing on the vrf device.
Fix by observing whether the original device exposes a header to upper layers, similar to the logic done in af_packet.
In addition, skb->mac_len needs to be adjusted after adding the Ethernet header for the skb_push/pull() surrounding dev_queue_xmit_nit() to work on these packets.
Fixes: 048939088220 ("vrf: add mac header for tunneled packets when sniffer is attached") Signed-off-by: Eyal Birger eyal.birger@gmail.com Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/vrf.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c index b2242a082431..091dd7caf10c 100644 --- a/drivers/net/vrf.c +++ b/drivers/net/vrf.c @@ -1265,6 +1265,7 @@ static int vrf_prepare_mac_header(struct sk_buff *skb, eth = (struct ethhdr *)skb->data;
skb_reset_mac_header(skb); + skb_reset_mac_len(skb);
/* we set the ethernet destination and the source addresses to the * address of the VRF device. @@ -1294,9 +1295,9 @@ static int vrf_prepare_mac_header(struct sk_buff *skb, */ static int vrf_add_mac_header_if_unset(struct sk_buff *skb, struct net_device *vrf_dev, - u16 proto) + u16 proto, struct net_device *orig_dev) { - if (skb_mac_header_was_set(skb)) + if (skb_mac_header_was_set(skb) && dev_has_header(orig_dev)) return 0;
return vrf_prepare_mac_header(skb, vrf_dev, proto); @@ -1402,6 +1403,8 @@ static struct sk_buff *vrf_ip6_rcv(struct net_device *vrf_dev,
/* if packet is NDISC then keep the ingress interface */ if (!is_ndisc) { + struct net_device *orig_dev = skb->dev; + vrf_rx_stats(vrf_dev, skb->len); skb->dev = vrf_dev; skb->skb_iif = vrf_dev->ifindex; @@ -1410,7 +1413,8 @@ static struct sk_buff *vrf_ip6_rcv(struct net_device *vrf_dev, int err;
err = vrf_add_mac_header_if_unset(skb, vrf_dev, - ETH_P_IPV6); + ETH_P_IPV6, + orig_dev); if (likely(!err)) { skb_push(skb, skb->mac_len); dev_queue_xmit_nit(skb, vrf_dev); @@ -1440,6 +1444,8 @@ static struct sk_buff *vrf_ip6_rcv(struct net_device *vrf_dev, static struct sk_buff *vrf_ip_rcv(struct net_device *vrf_dev, struct sk_buff *skb) { + struct net_device *orig_dev = skb->dev; + skb->dev = vrf_dev; skb->skb_iif = vrf_dev->ifindex; IPCB(skb)->flags |= IPSKB_L3SLAVE; @@ -1460,7 +1466,8 @@ static struct sk_buff *vrf_ip_rcv(struct net_device *vrf_dev, if (!list_empty(&vrf_dev->ptype_all)) { int err;
- err = vrf_add_mac_header_if_unset(skb, vrf_dev, ETH_P_IP); + err = vrf_add_mac_header_if_unset(skb, vrf_dev, ETH_P_IP, + orig_dev); if (likely(!err)) { skb_push(skb, skb->mac_len); dev_queue_xmit_nit(skb, vrf_dev);
From: Jean-Philippe Brucker jean-philippe@linaro.org
[ Upstream commit 1effe8ca4e34c34cdd9318436a4232dcb582ebf4 ]
Fix a use-after-free when using page_pool with page fragments. We encountered this problem during normal RX in the hns3 driver:
(1) Initially we have three descriptors in the RX queue. The first one allocates PAGE1 through page_pool, and the other two allocate one half of PAGE2 each. Page references look like this:
RX_BD1 _______ PAGE1 RX_BD2 _______ PAGE2 RX_BD3 _________/
(2) Handle RX on the first descriptor. Allocate SKB1, eventually added to the receive queue by tcp_queue_rcv().
(3) Handle RX on the second descriptor. Allocate SKB2 and pass it to netif_receive_skb():
netif_receive_skb(SKB2) ip_rcv(SKB2) SKB3 = skb_clone(SKB2)
SKB2 and SKB3 share a reference to PAGE2 through skb_shinfo()->dataref. The other ref to PAGE2 is still held by RX_BD3:
SKB2 ---+- PAGE2 SKB3 __/ / RX_BD3 _________/
(3b) Now while handling TCP, coalesce SKB3 with SKB1:
tcp_v4_rcv(SKB3) tcp_try_coalesce(to=SKB1, from=SKB3) // succeeds kfree_skb_partial(SKB3) skb_release_data(SKB3) // drops one dataref
SKB1 _____ PAGE1 ____ SKB2 _____ PAGE2 / RX_BD3 _________/
In skb_try_coalesce(), __skb_frag_ref() takes a page reference to PAGE2, where it should instead have increased the page_pool frag reference, pp_frag_count. Without coalescing, when releasing both SKB2 and SKB3, a single reference to PAGE2 would be dropped. Now when releasing SKB1 and SKB2, two references to PAGE2 will be dropped, resulting in underflow.
(3c) Drop SKB2:
af_packet_rcv(SKB2) consume_skb(SKB2) skb_release_data(SKB2) // drops second dataref page_pool_return_skb_page(PAGE2) // drops one pp_frag_count
SKB1 _____ PAGE1 ____ PAGE2 / RX_BD3 _________/
(4) Userspace calls recvmsg() Copies SKB1 and releases it. Since SKB3 was coalesced with SKB1, we release the SKB3 page as well:
tcp_eat_recv_skb(SKB1) skb_release_data(SKB1) page_pool_return_skb_page(PAGE1) page_pool_return_skb_page(PAGE2) // drops second pp_frag_count
(5) PAGE2 is freed, but the third RX descriptor was still using it! In our case this causes IOMMU faults, but it would silently corrupt memory if the IOMMU was disabled.
Change the logic that checks whether pp_recycle SKBs can be coalesced. We still reject differing pp_recycle between 'from' and 'to' SKBs, but in order to avoid the situation described above, we also reject coalescing when both 'from' and 'to' are pp_recycled and 'from' is cloned.
The new logic allows coalescing a cloned pp_recycle SKB into a page refcounted one, because in this case the release (4) will drop the right reference, the one taken by skb_try_coalesce().
Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool") Suggested-by: Alexander Duyck alexanderduyck@fb.com Signed-off-by: Jean-Philippe Brucker jean-philippe@linaro.org Reviewed-by: Yunsheng Lin linyunsheng@huawei.com Reviewed-by: Alexander Duyck alexanderduyck@fb.com Acked-by: Ilias Apalodimas ilias.apalodimas@linaro.org Acked-by: Jesper Dangaard Brouer brouer@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/skbuff.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 5956c84cb274..e4badc189e37 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -5390,11 +5390,18 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from, if (skb_cloned(to)) return false;
- /* The page pool signature of struct page will eventually figure out - * which pages can be recycled or not but for now let's prohibit slab - * allocated and page_pool allocated SKBs from being coalesced. + /* In general, avoid mixing slab allocated and page_pool allocated + * pages within the same SKB. However when @to is not pp_recycle and + * @from is cloned, we can transition frag pages from page_pool to + * reference counted. + * + * On the other hand, don't allow coalescing two pp_recycle SKBs if + * @from is cloned, in case the SKB is using page_pool fragment + * references (PP_FLAG_PAGE_FRAG). Since we only take full page + * references for cloned SKBs at the moment that would result in + * inconsistent reference counts. */ - if (to->pp_recycle != from->pp_recycle) + if (to->pp_recycle != (from->pp_recycle && !skb_cloned(from))) return false;
if (len <= skb_tailroom(to)) {
From: Ivan Vecera ivecera@redhat.com
[ Upstream commit bd8c624c0cd59de0032752ba3001c107bba97f7b ]
VSI is set as default forwarding one when promisc mode is set for PF interface, when PF is switched to switchdev mode or when VF driver asks to enable allmulticast or promisc mode for the VF interface (when vf-true-promisc-support priv flag is off). The third case is buggy because in that case VSI associated with VF remains as default one after VF removal.
Reproducer: 1. Create VF echo 1 > sys/class/net/ens7f0/device/sriov_numvfs 2. Enable allmulticast or promisc mode on VF ip link set ens7f0v0 allmulticast on ip link set ens7f0v0 promisc on 3. Delete VF echo 0 > sys/class/net/ens7f0/device/sriov_numvfs 4. Try to enable promisc mode on PF ip link set ens7f0 promisc on
Although it looks that promisc mode on PF is enabled the opposite is true because ice_vsi_sync_fltr() responsible for IFF_PROMISC handling first checks if any other VSI is set as default forwarding one and if so the function does not do anything. At this point it is not possible to enable promisc mode on PF without re-probe device.
To resolve the issue this patch clear default forwarding VSI during ice_vsi_release() when the VSI to be released is the default one.
Fixes: 01b5e89aab49 ("ice: Add VF promiscuous support") Signed-off-by: Ivan Vecera ivecera@redhat.com Reviewed-by: Michal Swiatkowski michal.swiatkowski@linux.intel.com Reviewed-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Signed-off-by: Alice Michael alice.michael@intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_lib.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c index 8c08997dcef6..a5fd29ffdebe 100644 --- a/drivers/net/ethernet/intel/ice/ice_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_lib.c @@ -2923,6 +2923,8 @@ int ice_vsi_release(struct ice_vsi *vsi) } }
+ if (ice_is_vsi_dflt_vsi(pf->first_sw, vsi)) + ice_clear_dflt_vsi(pf->first_sw); ice_fltr_remove_all(vsi); ice_rm_vsi_lan_cfg(vsi->port_info, vsi->idx); err = ice_rm_vsi_rdma_cfg(vsi->port_info, vsi->idx);
From: Matt Johnston matt@codeconstruct.com.au
[ Upstream commit 60be976ac45137657b7b505d7e0d44d0e51accb7 ]
dev_hard_header() returns the length of the header, so we need to test for negative errors rather than non-zero.
Fixes: 889b7da23abf ("mctp: Add initial routing framework") Signed-off-by: Matt Johnston matt@codeconstruct.com.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/mctp/route.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/mctp/route.c b/net/mctp/route.c index fb1bf4ec8529..bbb13dbc9227 100644 --- a/net/mctp/route.c +++ b/net/mctp/route.c @@ -396,7 +396,7 @@ static int mctp_route_output(struct mctp_route *route, struct sk_buff *skb)
rc = dev_hard_header(skb, skb->dev, ntohs(skb->protocol), daddr, skb->dev->dev_addr, skb->len); - if (rc) { + if (rc < 0) { kfree_skb(skb); return -EHOSTUNREACH; }
From: Nikolay Aleksandrov razor@blackwall.org
[ Upstream commit 6bf92d70e690b7ff12b24f4bfff5e5434d019b82 ]
FRR folks have hit a kernel warning[1] while deleting routes[2] which is caused by trying to delete a route pointing to a nexthop id without specifying nhid but matching on an interface. That is, a route is found but we hit a warning while matching it. The warning is from fib_info_nh() in include/net/nexthop.h because we run it on a fib_info with nexthop object. The call chain is: inet_rtm_delroute -> fib_table_delete -> fib_nh_match (called with a nexthop fib_info and also with fc_oif set thus calling fib_info_nh on the fib_info and triggering the warning). The fix is to not do any matching in that branch if the fi has a nexthop object because those are managed separately. I.e. we should match when deleting without nh spec and should fail when deleting a nexthop route with old-style nh spec because nexthop objects are managed separately, e.g.: $ ip r show 1.2.3.4/32 1.2.3.4 nhid 12 via 192.168.11.2 dev dummy0
$ ip r del 1.2.3.4/32 $ ip r del 1.2.3.4/32 nhid 12 <both should work>
$ ip r del 1.2.3.4/32 dev dummy0 <should fail with ESRCH>
[1] [ 523.462226] ------------[ cut here ]------------ [ 523.462230] WARNING: CPU: 14 PID: 22893 at include/net/nexthop.h:468 fib_nh_match+0x210/0x460 [ 523.462236] Modules linked in: dummy rpcsec_gss_krb5 xt_socket nf_socket_ipv4 nf_socket_ipv6 ip6table_raw iptable_raw bpf_preload xt_statistic ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs xt_mark nf_tables xt_nat veth nf_conntrack_netlink nfnetlink xt_addrtype br_netfilter overlay dm_crypt nfsv3 nfs fscache netfs vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack 8021q garp mrp ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bridge stp llc rfcomm snd_seq_dummy snd_hrtimer rpcrdma rdma_cm iw_cm ib_cm ib_core ip6table_filter xt_comment ip6_tables vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) qrtr bnep binfmt_misc xfs vfat fat squashfs loop nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(POE) nvidia(POE) intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi btusb btrtl iwlmvm uvcvideo btbcm snd_hda_intel edac_mce_amd [ 523.462274] videobuf2_vmalloc videobuf2_memops btintel snd_intel_dspcfg videobuf2_v4l2 snd_intel_sdw_acpi bluetooth snd_usb_audio snd_hda_codec mac80211 snd_usbmidi_lib joydev snd_hda_core videobuf2_common kvm_amd snd_rawmidi snd_hwdep snd_seq videodev ccp snd_seq_device libarc4 ecdh_generic mc snd_pcm kvm iwlwifi snd_timer drm_kms_helper snd cfg80211 cec soundcore irqbypass rapl wmi_bmof i2c_piix4 rfkill k10temp pcspkr acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc drm zram ip_tables crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel nvme sp5100_tco r8169 nvme_core wmi ipmi_devintf ipmi_msghandler fuse [ 523.462300] CPU: 14 PID: 22893 Comm: ip Tainted: P OE 5.16.18-200.fc35.x86_64 #1 [ 523.462302] Hardware name: Micro-Star International Co., Ltd. MS-7C37/MPG X570 GAMING EDGE WIFI (MS-7C37), BIOS 1.C0 10/29/2020 [ 523.462303] RIP: 0010:fib_nh_match+0x210/0x460 [ 523.462304] Code: 7c 24 20 48 8b b5 90 00 00 00 e8 bb ee f4 ff 48 8b 7c 24 20 41 89 c4 e8 ee eb f4 ff 45 85 e4 0f 85 2e fe ff ff e9 4c ff ff ff <0f> 0b e9 17 ff ff ff 3c 0a 0f 85 61 fe ff ff 48 8b b5 98 00 00 00 [ 523.462306] RSP: 0018:ffffaa53d4d87928 EFLAGS: 00010286 [ 523.462307] RAX: 0000000000000000 RBX: ffffaa53d4d87a90 RCX: ffffaa53d4d87bb0 [ 523.462308] RDX: ffff9e3d2ee6be80 RSI: ffffaa53d4d87a90 RDI: ffffffff920ed380 [ 523.462309] RBP: ffff9e3d2ee6be80 R08: 0000000000000064 R09: 0000000000000000 [ 523.462310] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000031 [ 523.462310] R13: 0000000000000020 R14: 0000000000000000 R15: ffff9e3d331054e0 [ 523.462311] FS: 00007f245517c1c0(0000) GS:ffff9e492ed80000(0000) knlGS:0000000000000000 [ 523.462313] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 523.462313] CR2: 000055e5dfdd8268 CR3: 00000003ef488000 CR4: 0000000000350ee0 [ 523.462315] Call Trace: [ 523.462316] <TASK> [ 523.462320] fib_table_delete+0x1a9/0x310 [ 523.462323] inet_rtm_delroute+0x93/0x110 [ 523.462325] rtnetlink_rcv_msg+0x133/0x370 [ 523.462327] ? _copy_to_iter+0xb5/0x6f0 [ 523.462330] ? rtnl_calcit.isra.0+0x110/0x110 [ 523.462331] netlink_rcv_skb+0x50/0xf0 [ 523.462334] netlink_unicast+0x211/0x330 [ 523.462336] netlink_sendmsg+0x23f/0x480 [ 523.462338] sock_sendmsg+0x5e/0x60 [ 523.462340] ____sys_sendmsg+0x22c/0x270 [ 523.462341] ? import_iovec+0x17/0x20 [ 523.462343] ? sendmsg_copy_msghdr+0x59/0x90 [ 523.462344] ? __mod_lruvec_page_state+0x85/0x110 [ 523.462348] ___sys_sendmsg+0x81/0xc0 [ 523.462350] ? netlink_seq_start+0x70/0x70 [ 523.462352] ? __dentry_kill+0x13a/0x180 [ 523.462354] ? __fput+0xff/0x250 [ 523.462356] __sys_sendmsg+0x49/0x80 [ 523.462358] do_syscall_64+0x3b/0x90 [ 523.462361] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 523.462364] RIP: 0033:0x7f24552aa337 [ 523.462365] Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 [ 523.462366] RSP: 002b:00007fff7f05a838 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 523.462368] RAX: ffffffffffffffda RBX: 000000006245bf91 RCX: 00007f24552aa337 [ 523.462368] RDX: 0000000000000000 RSI: 00007fff7f05a8a0 RDI: 0000000000000003 [ 523.462369] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 [ 523.462370] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000001 [ 523.462370] R13: 00007fff7f05ce08 R14: 0000000000000000 R15: 000055e5dfdd1040 [ 523.462373] </TASK> [ 523.462374] ---[ end trace ba537bc16f6bf4ed ]---
[2] https://github.com/FRRouting/frr/issues/6412
Fixes: 4c7e8084fd46 ("ipv4: Plumb support for nexthop object in a fib_info") Signed-off-by: Nikolay Aleksandrov razor@blackwall.org Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/fib_semantics.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index d244c57b7303..b5563f5ff176 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -887,8 +887,13 @@ int fib_nh_match(struct net *net, struct fib_config *cfg, struct fib_info *fi, }
if (cfg->fc_oif || cfg->fc_gw_family) { - struct fib_nh *nh = fib_info_nh(fi, 0); + struct fib_nh *nh; + + /* cannot match on nexthop object attributes */ + if (fi->nh) + return 1;
+ nh = fib_info_nh(fi, 0); if (cfg->fc_encap) { if (fib_encap_match(net, cfg->fc_encap_type, cfg->fc_encap, nh, cfg, extack))
From: Chen-Yu Tsai wens@csie.org
[ Upstream commit c21cabb0fd0b54b8b54235fc1ecfe1195a23bcb2 ]
In commit 9cbadf094d9d ("net: stmmac: support max-speed device tree property"), when DT platforms don't set "max-speed", max_speed is set to -1; for non-DT platforms, it stays the default 0.
Prior to commit eeef2f6b9f6e ("net: stmmac: Start adding phylink support"), the check for a valid max_speed setting was to check if it was greater than zero. This commit got it right, but subsequent patches just checked for non-zero, which is incorrect for DT platforms.
In commit 92c3807b9ac3 ("net: stmmac: convert to phylink_get_linkmodes()") the conversion switched completely to checking for non-zero value as a valid value, which caused 1000base-T to stop getting advertised by default.
Instead of trying to fix all the checks, simply leave max_speed alone if DT property parsing fails.
Fixes: 9cbadf094d9d ("net: stmmac: support max-speed device tree property") Fixes: 92c3807b9ac3 ("net: stmmac: convert to phylink_get_linkmodes()") Signed-off-by: Chen-Yu Tsai wens@csie.org Acked-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Reviewed-by: Srinivas Kandagatla srinivas.kandagatla@linaro.org Link: https://lore.kernel.org/r/20220331184832.16316-1-wens@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c index 5d29f336315b..11e1055e8260 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c @@ -431,8 +431,7 @@ stmmac_probe_config_dt(struct platform_device *pdev, u8 *mac) plat->phylink_node = np;
/* Get max speed of operation from device tree */ - if (of_property_read_u32(np, "max-speed", &plat->max_speed)) - plat->max_speed = -1; + of_property_read_u32(np, "max-speed", &plat->max_speed);
plat->bus_id = of_alias_get_id(np, "ethernet"); if (plat->bus_id < 0)
From: Jiasheng Jiang jiasheng@iscas.ac.cn
[ Upstream commit 8027a9ad9b3568c5eb49c968ad6c97f279d76730 ]
As the possible failure of the allocation, kmemdup() may return NULL pointer. Therefore, it should be better to check the return value of kmemdup() and return error if fails.
Fixes: dc80d7038883 ("drm/imx-ldb: Add support to drm-bridge") Signed-off-by: Jiasheng Jiang jiasheng@iscas.ac.cn Signed-off-by: Philipp Zabel p.zabel@pengutronix.de Link: https://lore.kernel.org/r/20220105074729.2363657-1-jiasheng@iscas.ac.cn Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/imx/imx-ldb.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/imx/imx-ldb.c b/drivers/gpu/drm/imx/imx-ldb.c index e5078d03020d..fb0e951248f6 100644 --- a/drivers/gpu/drm/imx/imx-ldb.c +++ b/drivers/gpu/drm/imx/imx-ldb.c @@ -572,6 +572,8 @@ static int imx_ldb_panel_ddc(struct device *dev, edidp = of_get_property(child, "edid", &edid_len); if (edidp) { channel->edid = kmemdup(edidp, edid_len, GFP_KERNEL); + if (!channel->edid) + return -ENOMEM; } else if (!channel->panel) { /* fallback to display-timings node */ ret = of_get_drm_display_mode(child,
From: José Expósito jose.exposito89@gmail.com
[ Upstream commit bce81feb03a20fca7bbdd1c4af16b4e9d5c0e1d3 ]
Avoid leaking the display mode variable if of_get_drm_display_mode fails.
Fixes: 76ecd9c9fb24 ("drm/imx: parallel-display: check return code from of_get_drm_display_mode()") Addresses-Coverity-ID: 1443943 ("Resource leak") Signed-off-by: José Expósito jose.exposito89@gmail.com Signed-off-by: Philipp Zabel p.zabel@pengutronix.de Link: https://lore.kernel.org/r/20220108165230.44610-1-jose.exposito89@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/imx/parallel-display.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/imx/parallel-display.c b/drivers/gpu/drm/imx/parallel-display.c index 06cb1a59b9bc..63ba2ad84679 100644 --- a/drivers/gpu/drm/imx/parallel-display.c +++ b/drivers/gpu/drm/imx/parallel-display.c @@ -75,8 +75,10 @@ static int imx_pd_connector_get_modes(struct drm_connector *connector) ret = of_get_drm_display_mode(np, &imxpd->mode, &imxpd->bus_flags, OF_USE_NATIVE_MODE); - if (ret) + if (ret) { + drm_mode_destroy(connector->dev, mode); return ret; + }
drm_mode_copy(mode, &imxpd->mode); mode->type |= DRM_MODE_TYPE_DRIVER | DRM_MODE_TYPE_PREFERRED;
From: Liu Ying victor.liu@nxp.com
[ Upstream commit e8083acc3f8cc2097917018e947fd4c857f60454 ]
In dw_hdmi_imx_probe(), if error happens after dw_hdmi_probe() returns successfully, dw_hdmi_remove() should be called where necessary as bailout.
Fixes: c805ec7eb210 ("drm/imx: dw_hdmi-imx: move initialization into probe") Cc: Philipp Zabel p.zabel@pengutronix.de Cc: David Airlie airlied@linux.ie Cc: Daniel Vetter daniel@ffwll.ch Cc: Shawn Guo shawnguo@kernel.org Cc: Sascha Hauer s.hauer@pengutronix.de Cc: Pengutronix Kernel Team kernel@pengutronix.de Cc: Fabio Estevam festevam@gmail.com Cc: NXP Linux Team linux-imx@nxp.com Signed-off-by: Liu Ying victor.liu@nxp.com Signed-off-by: Philipp Zabel p.zabel@pengutronix.de Link: https://lore.kernel.org/r/20220128091944.3831256-1-victor.liu@nxp.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/imx/dw_hdmi-imx.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/imx/dw_hdmi-imx.c b/drivers/gpu/drm/imx/dw_hdmi-imx.c index 87428fb23d9f..a2277a0d6d06 100644 --- a/drivers/gpu/drm/imx/dw_hdmi-imx.c +++ b/drivers/gpu/drm/imx/dw_hdmi-imx.c @@ -222,6 +222,7 @@ static int dw_hdmi_imx_probe(struct platform_device *pdev) struct device_node *np = pdev->dev.of_node; const struct of_device_id *match = of_match_node(dw_hdmi_imx_dt_ids, np); struct imx_hdmi *hdmi; + int ret;
hdmi = devm_kzalloc(&pdev->dev, sizeof(*hdmi), GFP_KERNEL); if (!hdmi) @@ -243,10 +244,15 @@ static int dw_hdmi_imx_probe(struct platform_device *pdev) hdmi->bridge = of_drm_find_bridge(np); if (!hdmi->bridge) { dev_err(hdmi->dev, "Unable to find bridge\n"); + dw_hdmi_remove(hdmi->hdmi); return -ENODEV; }
- return component_add(&pdev->dev, &dw_hdmi_imx_ops); + ret = component_add(&pdev->dev, &dw_hdmi_imx_ops); + if (ret) + dw_hdmi_remove(hdmi->hdmi); + + return ret; }
static int dw_hdmi_imx_remove(struct platform_device *pdev)
From: Axel Lin axel.lin@ingics.com
[ Upstream commit 17049bf9de55a42ee96fd34520aff8a484677675 ]
The active_discharge_on setting was missed, so output discharge resistor is always disabled. Fix it.
Fixes: 0555d41497de ("regulator: rtq2134: Add support for Richtek RTQ2134 SubPMIC") Signed-off-by: Axel Lin axel.lin@ingics.com Link: https://lore.kernel.org/r/20220404022514.449231-1-axel.lin@ingics.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/regulator/rtq2134-regulator.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/regulator/rtq2134-regulator.c b/drivers/regulator/rtq2134-regulator.c index f21e3f8b21f2..8e13dea354a2 100644 --- a/drivers/regulator/rtq2134-regulator.c +++ b/drivers/regulator/rtq2134-regulator.c @@ -285,6 +285,7 @@ static const unsigned int rtq2134_buck_ramp_delay_table[] = { .enable_mask = RTQ2134_VOUTEN_MASK, \ .active_discharge_reg = RTQ2134_REG_BUCK##_id##_CFG0, \ .active_discharge_mask = RTQ2134_ACTDISCHG_MASK, \ + .active_discharge_on = RTQ2134_ACTDISCHG_MASK, \ .ramp_reg = RTQ2134_REG_BUCK##_id##_RSPCFG, \ .ramp_mask = RTQ2134_RSPUP_MASK, \ .ramp_delay_table = rtq2134_buck_ramp_delay_table, \
From: Axel Lin axel.lin@ingics.com
[ Upstream commit 2316f0fc0ad2aa87a568ceaf3d76be983ee555c3 ]
Without active_discharge_on setting, the SWITCH1 discharge enable control is always disabled. Fix it.
Fixes: 3b15ccac161a ("regulator: Add regulator driver for ATC260x PMICs") Signed-off-by: Axel Lin axel.lin@ingics.com Link: https://lore.kernel.org/r/20220403132235.123727-1-axel.lin@ingics.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/regulator/atc260x-regulator.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/regulator/atc260x-regulator.c b/drivers/regulator/atc260x-regulator.c index 05147d2c3842..485e58b264c0 100644 --- a/drivers/regulator/atc260x-regulator.c +++ b/drivers/regulator/atc260x-regulator.c @@ -292,6 +292,7 @@ enum atc2603c_reg_ids { .bypass_mask = BIT(5), \ .active_discharge_reg = ATC2603C_PMU_SWITCH_CTL, \ .active_discharge_mask = BIT(1), \ + .active_discharge_on = BIT(1), \ .owner = THIS_MODULE, \ }
From: Phil Auld pauld@redhat.com
[ Upstream commit 5524cbb1bfcdff0cad0aaa9f94e6092002a07259 ]
Arm64 systems rely on store_cpu_topology() to call update_siblings_masks() to transfer the toplogy to the various cpu masks. This needs to be done before the call to notify_cpu_starting() which tells the scheduler about each cpu found, otherwise the core scheduling data structures are setup in a way that does not match the actual topology.
With smt_mask not setup correctly we bail on `cpumask_weight(smt_mask) == 1` for !leaders in:
notify_cpu_starting() cpuhp_invoke_callback_range() sched_cpu_starting() sched_core_cpu_starting()
which leads to rq->core not being correctly set for !leader-rq's.
Without this change stress-ng (which enables core scheduling in its prctl tests in newer versions -- i.e. with PR_SCHED_CORE support) causes a warning and then a crash (trimmed for legibility):
[ 1853.805168] ------------[ cut here ]------------ [ 1853.809784] task_rq(b)->core != rq->core [ 1853.809792] WARNING: CPU: 117 PID: 0 at kernel/sched/fair.c:11102 cfs_prio_less+0x1b4/0x1c4 ... [ 1854.015210] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010 ... [ 1854.231256] Call trace: [ 1854.233689] pick_next_task+0x3dc/0x81c [ 1854.237512] __schedule+0x10c/0x4cc [ 1854.240988] schedule_idle+0x34/0x54
Fixes: 9edeaea1bc45 ("sched: Core-wide rq->lock") Signed-off-by: Phil Auld pauld@redhat.com Reviewed-by: Dietmar Eggemann dietmar.eggemann@arm.com Tested-by: Dietmar Eggemann dietmar.eggemann@arm.com Link: https://lore.kernel.org/r/20220331153926.25742-1-pauld@redhat.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/kernel/smp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index 6f6ff072acbd..3beaa6640ab3 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -234,6 +234,7 @@ asmlinkage notrace void secondary_start_kernel(void) * Log the CPU info before it is marked online and might get read. */ cpuinfo_store_cpu(); + store_cpu_topology(cpu);
/* * Enable GIC and timers. @@ -242,7 +243,6 @@ asmlinkage notrace void secondary_start_kernel(void)
ipi_setup(cpu);
- store_cpu_topology(cpu); numa_add_cpu(cpu);
/*
From: Pavan Chebbi pavan.chebbi@broadcom.com
[ Upstream commit 4f81def272de17dc4bbd89ac38f49b2676c9b3d2 ]
If there are more CPUs than the number of TX XDP rings, multiple XDP redirects can select the same TX ring based on the CPU on which XDP redirect is called. Add locking when needed and use static key to decide whether to take the lock.
Fixes: f18c2b77b2e4 ("bnxt_en: optimized XDP_REDIRECT support") Signed-off-by: Pavan Chebbi pavan.chebbi@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 7 +++++++ drivers/net/ethernet/broadcom/bnxt/bnxt.h | 2 ++ drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 8 ++++++++ drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h | 2 ++ 4 files changed, 19 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index ce36ee5a250f..8b078c319872 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -3234,6 +3234,7 @@ static int bnxt_alloc_tx_rings(struct bnxt *bp) } qidx = bp->tc_to_qidx[j]; ring->queue_id = bp->q_info[qidx].queue_id; + spin_lock_init(&txr->xdp_tx_lock); if (i < bp->tx_nr_rings_xdp) continue; if (i % bp->tx_nr_rings_per_tc == (bp->tx_nr_rings_per_tc - 1)) @@ -10246,6 +10247,12 @@ static int __bnxt_open_nic(struct bnxt *bp, bool irq_re_init, bool link_re_init) if (irq_re_init) udp_tunnel_nic_reset_ntf(bp->dev);
+ if (bp->tx_nr_rings_xdp < num_possible_cpus()) { + if (!static_key_enabled(&bnxt_xdp_locking_key)) + static_branch_enable(&bnxt_xdp_locking_key); + } else if (static_key_enabled(&bnxt_xdp_locking_key)) { + static_branch_disable(&bnxt_xdp_locking_key); + } set_bit(BNXT_STATE_OPEN, &bp->state); bnxt_enable_int(bp); /* Enable TX queues */ diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h index ca6fdf03e586..0aaaeecd67ea 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h @@ -791,6 +791,8 @@ struct bnxt_tx_ring_info { u32 dev_state;
struct bnxt_ring_struct tx_ring_struct; + /* Synchronize simultaneous xdp_xmit on same ring */ + spinlock_t xdp_tx_lock; };
#define BNXT_LEGACY_COAL_CMPL_PARAMS \ diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c index c8083df5e0ab..c59e46c7a1ca 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c @@ -20,6 +20,8 @@ #include "bnxt.h" #include "bnxt_xdp.h"
+DEFINE_STATIC_KEY_FALSE(bnxt_xdp_locking_key); + struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp, struct bnxt_tx_ring_info *txr, dma_addr_t mapping, u32 len) @@ -227,6 +229,9 @@ int bnxt_xdp_xmit(struct net_device *dev, int num_frames, ring = smp_processor_id() % bp->tx_nr_rings_xdp; txr = &bp->tx_ring[ring];
+ if (static_branch_unlikely(&bnxt_xdp_locking_key)) + spin_lock(&txr->xdp_tx_lock); + for (i = 0; i < num_frames; i++) { struct xdp_frame *xdp = frames[i];
@@ -250,6 +255,9 @@ int bnxt_xdp_xmit(struct net_device *dev, int num_frames, bnxt_db_write(bp, &txr->tx_db, txr->tx_prod); }
+ if (static_branch_unlikely(&bnxt_xdp_locking_key)) + spin_unlock(&txr->xdp_tx_lock); + return nxmit; }
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h index 0df40c3beb05..067bb5e821f5 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h @@ -10,6 +10,8 @@ #ifndef BNXT_XDP_H #define BNXT_XDP_H
+DECLARE_STATIC_KEY_FALSE(bnxt_xdp_locking_key); + struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp, struct bnxt_tx_ring_info *txr, dma_addr_t mapping, u32 len);
From: Andy Gospodarek gospo@broadcom.com
[ Upstream commit facc173cf700e55b2ad249ecbd3a7537f7315691 ]
Insufficient space was being reserved in the page used for packet reception, so the interface MTU could be set too large to still have room for the contents of the packet when doing XDP redirect. This resulted in the following message when redirecting a packet between 3520 and 3822 bytes with an MTU of 3822:
[311815.561880] XDP_WARN: xdp_update_frame_from_buff(line:200): Driver BUG: missing reserved tailroom
Fixes: f18c2b77b2e4 ("bnxt_en: optimized XDP_REDIRECT support") Reviewed-by: Somnath Kotur somnath.kotur@broadcom.com Reviewed-by: Pavan Chebbi pavan.chebbi@broadcom.com Signed-off-by: Andy Gospodarek gospo@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h index 0aaaeecd67ea..e5874c829226 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h @@ -584,7 +584,8 @@ struct nqe_cn { #define BNXT_MAX_MTU 9500 #define BNXT_MAX_PAGE_MODE_MTU \ ((unsigned int)PAGE_SIZE - VLAN_ETH_HLEN - NET_IP_ALIGN - \ - XDP_PACKET_HEADROOM) + XDP_PACKET_HEADROOM - \ + SKB_DATA_ALIGN((unsigned int)sizeof(struct skb_shared_info)))
#define BNXT_MIN_PKT_SIZE 52
From: Ray Jui ray.jui@broadcom.com
[ Upstream commit 27d4073f8d9af0340362554414f4961643a4f4de ]
Add checks in the XDP redirect callback to prevent XDP from running when the TX ring is undergoing shutdown.
Also remove redundant checks in the XDP redirect callback to validate the txr and the flag that indicates the ring supports XDP. The modulo arithmetic on 'tx_nr_rings_xdp' already guarantees the derived TX ring is an XDP ring. txr is also guaranteed to be valid after checking BNXT_STATE_OPEN and within RCU grace period.
Fixes: f18c2b77b2e4 ("bnxt_en: optimized XDP_REDIRECT support") Reviewed-by: Vladimir Olovyannikov vladimir.olovyannikov@broadcom.com Signed-off-by: Ray Jui ray.jui@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c index c59e46c7a1ca..148b58f3468b 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c @@ -229,14 +229,16 @@ int bnxt_xdp_xmit(struct net_device *dev, int num_frames, ring = smp_processor_id() % bp->tx_nr_rings_xdp; txr = &bp->tx_ring[ring];
+ if (READ_ONCE(txr->dev_state) == BNXT_DEV_STATE_CLOSING) + return -EINVAL; + if (static_branch_unlikely(&bnxt_xdp_locking_key)) spin_lock(&txr->xdp_tx_lock);
for (i = 0; i < num_frames; i++) { struct xdp_frame *xdp = frames[i];
- if (!txr || !bnxt_tx_avail(bp, txr) || - !(bp->bnapi[ring]->flags & BNXT_NAPI_FLAG_XDP)) + if (!bnxt_tx_avail(bp, txr)) break;
mapping = dma_map_single(&pdev->dev, xdp->data, xdp->len,
From: Martin Habets habetsm.xilinx@gmail.com
[ Upstream commit 458f5d92df4807e2a7c803ed928369129996bf96 ]
When the page_ring is not used page_ptr_mask is 0. Do not dereference page_ring[0] in this case.
Fixes: 2768935a4660 ("sfc: reuse pages to avoid DMA mapping/unmapping costs") Reported-by: Taehee Yoo ap420073@gmail.com Signed-off-by: Martin Habets habetsm.xilinx@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/sfc/rx_common.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/sfc/rx_common.c b/drivers/net/ethernet/sfc/rx_common.c index 633ca77a26fd..b925de9b4302 100644 --- a/drivers/net/ethernet/sfc/rx_common.c +++ b/drivers/net/ethernet/sfc/rx_common.c @@ -166,6 +166,9 @@ static void efx_fini_rx_recycle_ring(struct efx_rx_queue *rx_queue) struct efx_nic *efx = rx_queue->efx; int i;
+ if (unlikely(!rx_queue->page_ring)) + return; + /* Unmap and release the pages in the recycle ring. Remove the ring. */ for (i = 0; i <= rx_queue->page_ptr_mask; i++) { struct page *page = rx_queue->page_ring[i];
From: Aharon Landau aharonl@nvidia.com
[ Upstream commit 84c2362fb65d69c721fec0974556378cbb36a62b ]
Don't remove MRs from the cache if need to delay the removal.
Fixes: b9358bdbc713 ("RDMA/mlx5: Fix locking in MR cache work queue") Link: https://lore.kernel.org/r/c3087a90ff362c8796c7eaa2715128743ce36722.164906243... Signed-off-by: Aharon Landau aharonl@nvidia.com Reviewed-by: Shay Drory shayd@nvidia.com Signed-off-by: Leon Romanovsky leonro@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/mlx5/mr.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c index 7bb1b9d0941c..85289fddc2ae 100644 --- a/drivers/infiniband/hw/mlx5/mr.c +++ b/drivers/infiniband/hw/mlx5/mr.c @@ -536,8 +536,10 @@ static void __cache_work_func(struct mlx5_cache_ent *ent) spin_lock_irq(&ent->lock); if (ent->disabled) goto out; - if (need_delay) + if (need_delay) { queue_delayed_work(cache->wq, &ent->dwork, 300 * HZ); + goto out; + } remove_cache_mr_locked(ent); queue_adjust_cache_locked(ent); }
From: Aharon Landau aharonl@nvidia.com
[ Upstream commit 1d735eeee63a0beb65180ca0224f239cc0c9f804 ]
Update cache->last_add when returning an MR to the cache so that the cache work won't remove it.
Fixes: b9358bdbc713 ("RDMA/mlx5: Fix locking in MR cache work queue") Link: https://lore.kernel.org/r/c99f076fce4b44829d434936bbcd3b5fc4c95020.164906243... Signed-off-by: Aharon Landau aharonl@nvidia.com Reviewed-by: Shay Drory shayd@nvidia.com Signed-off-by: Leon Romanovsky leonro@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/mlx5/mr.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c index 85289fddc2ae..cf203f879d34 100644 --- a/drivers/infiniband/hw/mlx5/mr.c +++ b/drivers/infiniband/hw/mlx5/mr.c @@ -635,6 +635,7 @@ static void mlx5_mr_cache_free(struct mlx5_ib_dev *dev, struct mlx5_ib_mr *mr) { struct mlx5_cache_ent *ent = mr->cache_ent;
+ WRITE_ONCE(dev->cache.last_add, jiffies); spin_lock_irq(&ent->lock); list_add_tail(&mr->list, &ent->head); ent->available_mrs++;
From: Mark Zhang markzhang@nvidia.com
[ Upstream commit 107dd7beba403a363adfeb3ffe3734fe38a05cce ]
On the passive side when the disconnectReq event comes, if the current state is MRA_REP_RCVD, it needs to cancel the MAD before entering the DREQ_RCVD and TIMEWAIT states, otherwise the destroy_id may block until this mad will reach timeout.
Fixes: a977049dacde ("[PATCH] IB: Add the kernel CM implementation") Link: https://lore.kernel.org/r/75261c00c1d82128b1d981af9ff46e994186e621.164906243... Signed-off-by: Mark Zhang markzhang@nvidia.com Reviewed-by: Maor Gottlieb maorg@nvidia.com Signed-off-by: Leon Romanovsky leonro@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/core/cm.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index 35f0d5e7533d..1c107d6d03b9 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -2824,6 +2824,7 @@ static int cm_dreq_handler(struct cm_work *work) switch (cm_id_priv->id.state) { case IB_CM_REP_SENT: case IB_CM_DREQ_SENT: + case IB_CM_MRA_REP_RCVD: ib_cancel_mad(cm_id_priv->msg); break; case IB_CM_ESTABLISHED: @@ -2831,8 +2832,6 @@ static int cm_dreq_handler(struct cm_work *work) cm_id_priv->id.lap_state == IB_CM_MRA_LAP_RCVD) ib_cancel_mad(cm_id_priv->msg); break; - case IB_CM_MRA_REP_RCVD: - break; case IB_CM_TIMEWAIT: atomic_long_inc(&work->port->counters[CM_RECV_DUPLICATES] [CM_DREQ_COUNTER]);
From: Niels Dossche dossche.niels@gmail.com
[ Upstream commit 4d809f69695d4e7d1378b3a072fa9aef23123018 ]
The documentation of the function rvt_error_qp says both r_lock and s_lock need to be held when calling that function. It also asserts using lockdep that both of those locks are held. However, the commit I referenced in Fixes accidentally makes the call to rvt_error_qp in rvt_ruc_loopback no longer covered by r_lock. This results in the lockdep assertion failing and also possibly in a race condition.
Fixes: d757c60eca9b ("IB/rdmavt: Fix concurrency panics in QP post_send and modify to error") Link: https://lore.kernel.org/r/20220228165330.41546-1-dossche.niels@gmail.com Signed-off-by: Niels Dossche dossche.niels@gmail.com Acked-by: Dennis Dalessandro dennis.dalessandro@cornelisnetworks.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/sw/rdmavt/qp.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/sw/rdmavt/qp.c b/drivers/infiniband/sw/rdmavt/qp.c index ae50b56e8913..8ef112f883a7 100644 --- a/drivers/infiniband/sw/rdmavt/qp.c +++ b/drivers/infiniband/sw/rdmavt/qp.c @@ -3190,7 +3190,11 @@ void rvt_ruc_loopback(struct rvt_qp *sqp) spin_lock_irqsave(&sqp->s_lock, flags); rvt_send_complete(sqp, wqe, send_status); if (sqp->ibqp.qp_type == IB_QPT_RC) { - int lastwqe = rvt_error_qp(sqp, IB_WC_WR_FLUSH_ERR); + int lastwqe; + + spin_lock(&sqp->r_lock); + lastwqe = rvt_error_qp(sqp, IB_WC_WR_FLUSH_ERR); + spin_unlock(&sqp->r_lock);
sqp->s_flags &= ~RVT_S_BUSY; spin_unlock_irqrestore(&sqp->s_lock, flags);
From: Jamie Bainbridge jamie.bainbridge@gmail.com
[ Upstream commit e3d37210df5c41c51147a2d5d465de1a4d77be7a ]
Singleton chunks (INIT, HEARTBEAT PMTU probes, and SHUTDOWN- COMPLETE) are not counted in SCTP_GET_ASOC_STATS "sas_octrlchunks" counter available to the assoc owner.
These are all control chunks so they should be counted as such.
Add counting of singleton chunks so they are properly accounted for.
Fixes: 196d67593439 ("sctp: Add support to per-association statistics via a new SCTP_GET_ASSOC_STATS call") Signed-off-by: Jamie Bainbridge jamie.bainbridge@gmail.com Acked-by: Marcelo Ricardo Leitner marcelo.leitner@gmail.com Link: https://lore.kernel.org/r/c9ba8785789880cf07923b8a5051e174442ea9ee.164902966... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sctp/outqueue.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c index ff47091c385e..b3950963fc8f 100644 --- a/net/sctp/outqueue.c +++ b/net/sctp/outqueue.c @@ -911,6 +911,7 @@ static void sctp_outq_flush_ctrl(struct sctp_flush_ctx *ctx) ctx->asoc->base.sk->sk_err = -error; return; } + ctx->asoc->stats.octrlchunks++; break;
case SCTP_CID_ABORT: @@ -935,7 +936,10 @@ static void sctp_outq_flush_ctrl(struct sctp_flush_ctx *ctx)
case SCTP_CID_HEARTBEAT: if (chunk->pmtu_probe) { - sctp_packet_singleton(ctx->transport, chunk, ctx->gfp); + error = sctp_packet_singleton(ctx->transport, + chunk, ctx->gfp); + if (!error) + ctx->asoc->stats.octrlchunks++; break; } fallthrough;
From: Miaoqian Lin linmq006@gmail.com
[ Upstream commit 2b04bd4f03bba021959ca339314f6739710f0954 ]
This node pointer is returned by of_find_compatible_node() with refcount incremented. Calling of_node_put() to aovid the refcount leak.
Fixes: d346c9e86d86 ("dpaa2-ptp: reuse ptp_qoriq driver") Signed-off-by: Miaoqian Lin linmq006@gmail.com Link: https://lore.kernel.org/r/20220404125336.13427-1-linmq006@gmail.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/freescale/dpaa2/dpaa2-ptp.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-ptp.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-ptp.c index 32b5faa87bb8..208a3459f2e2 100644 --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-ptp.c +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-ptp.c @@ -168,7 +168,7 @@ static int dpaa2_ptp_probe(struct fsl_mc_device *mc_dev) base = of_iomap(node, 0); if (!base) { err = -ENOMEM; - goto err_close; + goto err_put; }
err = fsl_mc_allocate_irqs(mc_dev); @@ -212,6 +212,8 @@ static int dpaa2_ptp_probe(struct fsl_mc_device *mc_dev) fsl_mc_free_irqs(mc_dev); err_unmap: iounmap(base); +err_put: + of_node_put(node); err_close: dprtc_close(mc_dev->mc_io, 0, mc_dev->mc_handle); err_free_mcp:
From: Anatolii Gerasymenko anatolii.gerasymenko@intel.com
[ Upstream commit ccfee1822042b87e5135d33cad8ea353e64612d2 ]
When VF is freshly created, but not brought up, ring->txq_teid value is by default set to 0. But 0 is a valid TEID. On some platforms the Root Node of Tx scheduler has a TEID = 0. This can cause issues as shown below.
The proper way is to set ring->txq_teid to ICE_INVAL_TEID (0xFFFFFFFF).
Testing Hints: echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs ip link set dev ens785f0v0 up ip link set dev ens785f0v0 down
If we have freshly created VF and quickly turn it on and off, so there would be no time to reach VIRTCHNL_OP_CONFIG_VSI_QUEUES stage, then VIRTCHNL_OP_DISABLE_QUEUES stage will fail with error: [ 639.531454] disable queue 89 failed 14 [ 639.532233] Failed to disable LAN Tx queues, error: ICE_ERR_AQ_ERROR [ 639.533107] ice 0000:02:00.0: Failed to stop Tx ring 0 on VSI 5
The reason for the fail is that we are trying to send AQ command to delete queue 89, which has never been created and receive an "invalid argument" error from firmware.
As this queue has never been created, it's teid and ring->txq_teid have default value 0. ice_dis_vsi_txq has a check against non-existent queues:
node = ice_sched_find_node_by_teid(pi->root, q_teids[i]); if (!node) continue;
But on some platforms the Root Node of Tx scheduler has a teid = 0. Hence, ice_sched_find_node_by_teid finds a node with teid = 0 (it is pi->root), and we go further to submit an erroneous request to firmware.
Fixes: 37bb83901286 ("ice: Move common functions out of ice_main.c part 7/7") Signed-off-by: Anatolii Gerasymenko anatolii.gerasymenko@intel.com Reviewed-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Tested-by: Konrad Jankowski konrad0.jankowski@intel.com Signed-off-by: Alice Michael alice.michael@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_lib.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c index a5fd29ffdebe..653996e8fd30 100644 --- a/drivers/net/ethernet/intel/ice/ice_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_lib.c @@ -1306,6 +1306,7 @@ static int ice_vsi_alloc_rings(struct ice_vsi *vsi) ring->tx_tstamps = &pf->ptp.port.tx; ring->dev = dev; ring->count = vsi->num_tx_desc; + ring->txq_teid = ICE_INVAL_TEID; WRITE_ONCE(vsi->tx_rings[i], ring); }
From: Anatolii Gerasymenko anatolii.gerasymenko@intel.com
[ Upstream commit 05ef6813b234db3196f083b91db3963f040b65bb ]
Disable check for queue being enabled in ice_vc_dis_qs_msg, because there could be a case when queues were created, but were not enabled. We still need to delete those queues.
Normal workflow for VF looks like: Enable path: VIRTCHNL_OP_ADD_ETH_ADDR (opcode 10) VIRTCHNL_OP_CONFIG_VSI_QUEUES (opcode 6) VIRTCHNL_OP_ENABLE_QUEUES (opcode 8)
Disable path: VIRTCHNL_OP_DISABLE_QUEUES (opcode 9) VIRTCHNL_OP_DEL_ETH_ADDR (opcode 11)
The issue appears only in stress conditions when VF is enabled and disabled very fast. Eventually there will be a case, when queues are created by VIRTCHNL_OP_CONFIG_VSI_QUEUES, but are not enabled by VIRTCHNL_OP_ENABLE_QUEUES. In turn, these queues are not deleted by VIRTCHNL_OP_DISABLE_QUEUES, because there is a check whether queues are enabled in ice_vc_dis_qs_msg.
When we bring up the VF again, we will see the "Failed to set LAN Tx queue context" error during VIRTCHNL_OP_CONFIG_VSI_QUEUES step. This happens because old 16 queues were not deleted and VF requests to create 16 more, but ice_sched_get_free_qparent in ice_ena_vsi_txq would fail to find a parent node for first newly requested queue (because all nodes are allocated to 16 old queues).
Testing Hints:
Just enable and disable VF fast enough, so it would be disabled before reaching VIRTCHNL_OP_ENABLE_QUEUES.
while true; do ip link set dev ens785f0v0 up sleep 0.065 # adjust delay value for you machine ip link set dev ens785f0v0 down done
Fixes: 77ca27c41705 ("ice: add support for virtchnl_queue_select.[tx|rx]_queues bitmap") Signed-off-by: Anatolii Gerasymenko anatolii.gerasymenko@intel.com Tested-by: Konrad Jankowski konrad0.jankowski@intel.com Signed-off-by: Alice Michael alice.michael@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c index 4338e4ff7e85..9d4d58757e04 100644 --- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c @@ -3335,9 +3335,9 @@ static int ice_vc_dis_qs_msg(struct ice_vf *vf, u8 *msg) goto error_param; }
- /* Skip queue if not enabled */ if (!test_bit(vf_q_id, vf->txq_ena)) - continue; + dev_dbg(ice_pf_to_dev(vsi->back), "Queue %u on VSI %u is not enabled, but stopping it anyway\n", + vf_q_id, vsi->vsi_num);
ice_fill_txq_meta(vsi, ring, &txq_meta);
From: David Ahern dsahern@kernel.org
[ Upstream commit 1158f79f82d437093aeed87d57df0548bdd68146 ]
VRF devices are the loopbacks for VRFs, and a loopback can not be assigned to a VRF. Accordingly, the condition in ip6_pkt_drop should be '||' not '&&'.
Fixes: 1d3fd8a10bed ("vrf: Use orig netdev to count Ip6InNoRoutes and a fresh route lookup when sending dest unreach") Reported-by: Pudak, Filip Filip.Pudak@windriver.com Reported-by: Xiao, Jiguang Jiguang.Xiao@windriver.com Signed-off-by: David Ahern dsahern@kernel.org Link: https://lore.kernel.org/r/20220404150908.2937-1-dsahern@kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/route.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c index e0766bdf20e7..6b269595efaa 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -4509,7 +4509,7 @@ static int ip6_pkt_drop(struct sk_buff *skb, u8 code, int ipstats_mib_noroutes) struct inet6_dev *idev; int type;
- if (netif_is_l3_master(skb->dev) && + if (netif_is_l3_master(skb->dev) || dst->dev == net->loopback_dev) idev = __in6_dev_get_safely(dev_get_by_index_rcu(net, IP6CB(skb)->iif)); else
From: Maciej Fijalkowski maciej.fijalkowski@intel.com
[ Upstream commit f9124c68f05ffdb87a47e3ea6d5fae9dad7cb6eb ]
Unfortunately, the ice driver doesn't respect the RCU critical section that XSK wakeup is surrounded with. To fix this, add synchronize_rcu() calls to paths that destroy resources that might be in use.
This was addressed in other AF_XDP ZC enabled drivers, for reference see for example commit b3873a5be757 ("net/i40e: Fix concurrency issues between config flow and XSK")
Fixes: efc2214b6047 ("ice: Add support for XDP") Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Tested-by: Shwetha Nagaraju shwetha.nagaraju@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice.h | 2 +- drivers/net/ethernet/intel/ice/ice_main.c | 4 +++- drivers/net/ethernet/intel/ice/ice_xsk.c | 4 +++- 3 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h index 7e5daede3a2e..df65bb494695 100644 --- a/drivers/net/ethernet/intel/ice/ice.h +++ b/drivers/net/ethernet/intel/ice/ice.h @@ -552,7 +552,7 @@ static inline struct ice_pf *ice_netdev_to_pf(struct net_device *netdev)
static inline bool ice_is_xdp_ena_vsi(struct ice_vsi *vsi) { - return !!vsi->xdp_prog; + return !!READ_ONCE(vsi->xdp_prog); }
static inline void ice_set_ring_xdp(struct ice_ring *ring) diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 7f68132b8a1f..f330bd0acf9f 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -2612,8 +2612,10 @@ int ice_destroy_xdp_rings(struct ice_vsi *vsi)
for (i = 0; i < vsi->num_xdp_txq; i++) if (vsi->xdp_rings[i]) { - if (vsi->xdp_rings[i]->desc) + if (vsi->xdp_rings[i]->desc) { + synchronize_rcu(); ice_free_tx_ring(vsi->xdp_rings[i]); + } kfree_rcu(vsi->xdp_rings[i], rcu); vsi->xdp_rings[i] = NULL; } diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index 37c7dc6b44a9..03ba08fdcee8 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -36,8 +36,10 @@ static void ice_qp_reset_stats(struct ice_vsi *vsi, u16 q_idx) static void ice_qp_clean_rings(struct ice_vsi *vsi, u16 q_idx) { ice_clean_tx_ring(vsi->tx_rings[q_idx]); - if (ice_is_xdp_ena_vsi(vsi)) + if (ice_is_xdp_ena_vsi(vsi)) { + synchronize_rcu(); ice_clean_tx_ring(vsi->xdp_rings[q_idx]); + } ice_clean_rx_ring(vsi->rx_rings[q_idx]); }
From: Maciej Fijalkowski maciej.fijalkowski@intel.com
[ Upstream commit 72b915a2b444e9247c9d424a840e94263db07c27 ]
ICE_DOWN is dedicated for pf->state. Check for ICE_VSI_DOWN being set on vsi->state in ice_xsk_wakeup().
Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Tested-by: Shwetha Nagaraju shwetha.nagaraju@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_xsk.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index 03ba08fdcee8..2b1873061912 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -761,7 +761,7 @@ ice_xsk_wakeup(struct net_device *netdev, u32 queue_id, struct ice_vsi *vsi = np->vsi; struct ice_ring *ring;
- if (test_bit(ICE_DOWN, vsi->state)) + if (test_bit(ICE_VSI_DOWN, vsi->state)) return -ENETDOWN;
if (!ice_is_xdp_ena_vsi(vsi))
From: Ilya Maximets i.maximets@ovn.org
[ Upstream commit 3f2a3050b4a3e7f32fc0ea3c9b0183090ae00522 ]
'OVS_CLONE_ATTR_EXEC' is an internal attribute that is used for performance optimization inside the kernel. It's added by the kernel while parsing user-provided actions and should not be sent during the flow dump as it's not part of the uAPI.
The issue doesn't cause any significant problems to the ovs-vswitchd process, because reported actions are not really used in the application lifecycle and only supposed to be shown to a human via ovs-dpctl flow dump. However, the action list is still incorrect and causes the following error if the user wants to look at the datapath flows:
# ovs-dpctl add-dp system@ovs-system # ovs-dpctl add-flow "<flow match>" "clone(ct(commit),0)" # ovs-dpctl dump-flows <flow match>, packets:0, bytes:0, used:never, actions:clone(bad length 4, expected -1 for: action0(01 00 00 00), ct(commit),0)
With the fix:
# ovs-dpctl dump-flows <flow match>, packets:0, bytes:0, used:never, actions:clone(ct(commit),0)
Additionally fixed an incorrect attribute name in the comment.
Fixes: b233504033db ("openvswitch: kernel datapath clone action") Signed-off-by: Ilya Maximets i.maximets@ovn.org Acked-by: Aaron Conole aconole@redhat.com Link: https://lore.kernel.org/r/20220404104150.2865736-1-i.maximets@ovn.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/openvswitch/actions.c | 2 +- net/openvswitch/flow_netlink.c | 4 +++- 2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c index 780d9e2246f3..8955f31fa47e 100644 --- a/net/openvswitch/actions.c +++ b/net/openvswitch/actions.c @@ -1051,7 +1051,7 @@ static int clone(struct datapath *dp, struct sk_buff *skb, int rem = nla_len(attr); bool dont_clone_flow_key;
- /* The first action is always 'OVS_CLONE_ATTR_ARG'. */ + /* The first action is always 'OVS_CLONE_ATTR_EXEC'. */ clone_arg = nla_data(attr); dont_clone_flow_key = nla_get_u32(clone_arg); actions = nla_next(clone_arg, &rem); diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c index 0d677c9c2c80..2679007f8aeb 100644 --- a/net/openvswitch/flow_netlink.c +++ b/net/openvswitch/flow_netlink.c @@ -3429,7 +3429,9 @@ static int clone_action_to_attr(const struct nlattr *attr, if (!start) return -EMSGSIZE;
- err = ovs_nla_put_actions(nla_data(attr), rem, skb); + /* Skipping the OVS_CLONE_ATTR_EXEC that is always the first attribute. */ + attr = nla_next(nla_data(attr), &rem); + err = ovs_nla_put_actions(attr, rem, skb);
if (err) nla_nest_cancel(skb, start);
From: Andrew Lunn andrew@lunn.ch
[ Upstream commit 11f8e7c122ce013fa745029fa8c94c6db69c2e54 ]
There is often not a MAC address available in an EEPROM accessible by Linux with Marvell devices. Instead the bootload has the MAC address and directly programs it into the hardware. So don't consider an error from of_get_mac_address() has fatal. However, the check was added for the case where there is a MAC address in an the EEPROM, but the EEPROM has not probed yet, and -EPROBE_DEFER is returned. In that case the error should be returned. So make the check specific to this error code.
Cc: Mauri Sandberg maukka@ext.kapsi.fi Reported-by: Thomas Walther walther-it@gmx.de Fixes: 42404d8f1c01 ("net: mv643xx_eth: process retval from of_get_mac_address") Signed-off-by: Andrew Lunn andrew@lunn.ch Link: https://lore.kernel.org/r/20220405000404.3374734-1-andrew@lunn.ch Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/marvell/mv643xx_eth.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c index 1b61fe2e9b4d..90fd5588e20d 100644 --- a/drivers/net/ethernet/marvell/mv643xx_eth.c +++ b/drivers/net/ethernet/marvell/mv643xx_eth.c @@ -2747,7 +2747,7 @@ static int mv643xx_eth_shared_of_add_port(struct platform_device *pdev, }
ret = of_get_mac_address(pnp, ppd.mac_addr); - if (ret) + if (ret == -EPROBE_DEFER) return ret;
mv643xx_eth_property(pnp, "tx-queue-size", ppd.tx_queue_size);
From: Ilya Maximets i.maximets@ovn.org
[ Upstream commit 1f30fb9166d4f15a1aa19449b9da871fe0ed4796 ]
While parsing user-provided actions, openvswitch module may dynamically allocate memory and store pointers in the internal copy of the actions. So this memory has to be freed while destroying the actions.
Currently there are only two such actions: ct() and set(). However, there are many actions that can hold nested lists of actions and ovs_nla_free_flow_actions() just jumps over them leaking the memory.
For example, removal of the flow with the following actions will lead to a leak of the memory allocated by nf_ct_tmpl_alloc():
actions:clone(ct(commit),0)
Non-freed set() action may also leak the 'dst' structure for the tunnel info including device references.
Under certain conditions with a high rate of flow rotation that may cause significant memory leak problem (2MB per second in reporter's case). The problem is also hard to mitigate, because the user doesn't have direct control over the datapath flows generated by OVS.
Fix that by iterating over all the nested actions and freeing everything that needs to be freed recursively.
New build time assertion should protect us from this problem if new actions will be added in the future.
Unfortunately, openvswitch module doesn't use NLA_F_NESTED, so all attributes has to be explicitly checked. sample() and clone() actions are mixing extra attributes into the user-provided action list. That prevents some code generalization too.
Fixes: 34ae932a4036 ("openvswitch: Make tunnel set action attach a metadata dst") Link: https://mail.openvswitch.org/pipermail/ovs-dev/2022-March/392922.html Reported-by: Stéphane Graber stgraber@ubuntu.com Signed-off-by: Ilya Maximets i.maximets@ovn.org Acked-by: Aaron Conole aconole@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/openvswitch/flow_netlink.c | 95 ++++++++++++++++++++++++++++++++-- 1 file changed, 90 insertions(+), 5 deletions(-)
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c index 2679007f8aeb..c591b923016a 100644 --- a/net/openvswitch/flow_netlink.c +++ b/net/openvswitch/flow_netlink.c @@ -2288,6 +2288,62 @@ static struct sw_flow_actions *nla_alloc_flow_actions(int size) return sfa; }
+static void ovs_nla_free_nested_actions(const struct nlattr *actions, int len); + +static void ovs_nla_free_check_pkt_len_action(const struct nlattr *action) +{ + const struct nlattr *a; + int rem; + + nla_for_each_nested(a, action, rem) { + switch (nla_type(a)) { + case OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL: + case OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER: + ovs_nla_free_nested_actions(nla_data(a), nla_len(a)); + break; + } + } +} + +static void ovs_nla_free_clone_action(const struct nlattr *action) +{ + const struct nlattr *a = nla_data(action); + int rem = nla_len(action); + + switch (nla_type(a)) { + case OVS_CLONE_ATTR_EXEC: + /* The real list of actions follows this attribute. */ + a = nla_next(a, &rem); + ovs_nla_free_nested_actions(a, rem); + break; + } +} + +static void ovs_nla_free_dec_ttl_action(const struct nlattr *action) +{ + const struct nlattr *a = nla_data(action); + + switch (nla_type(a)) { + case OVS_DEC_TTL_ATTR_ACTION: + ovs_nla_free_nested_actions(nla_data(a), nla_len(a)); + break; + } +} + +static void ovs_nla_free_sample_action(const struct nlattr *action) +{ + const struct nlattr *a = nla_data(action); + int rem = nla_len(action); + + switch (nla_type(a)) { + case OVS_SAMPLE_ATTR_ARG: + /* The real list of actions follows this attribute. */ + a = nla_next(a, &rem); + ovs_nla_free_nested_actions(a, rem); + break; + } +} + static void ovs_nla_free_set_action(const struct nlattr *a) { const struct nlattr *ovs_key = nla_data(a); @@ -2301,25 +2357,54 @@ static void ovs_nla_free_set_action(const struct nlattr *a) } }
-void ovs_nla_free_flow_actions(struct sw_flow_actions *sf_acts) +static void ovs_nla_free_nested_actions(const struct nlattr *actions, int len) { const struct nlattr *a; int rem;
- if (!sf_acts) + /* Whenever new actions are added, the need to update this + * function should be considered. + */ + BUILD_BUG_ON(OVS_ACTION_ATTR_MAX != 23); + + if (!actions) return;
- nla_for_each_attr(a, sf_acts->actions, sf_acts->actions_len, rem) { + nla_for_each_attr(a, actions, len, rem) { switch (nla_type(a)) { - case OVS_ACTION_ATTR_SET: - ovs_nla_free_set_action(a); + case OVS_ACTION_ATTR_CHECK_PKT_LEN: + ovs_nla_free_check_pkt_len_action(a); + break; + + case OVS_ACTION_ATTR_CLONE: + ovs_nla_free_clone_action(a); break; + case OVS_ACTION_ATTR_CT: ovs_ct_free_action(a); break; + + case OVS_ACTION_ATTR_DEC_TTL: + ovs_nla_free_dec_ttl_action(a); + break; + + case OVS_ACTION_ATTR_SAMPLE: + ovs_nla_free_sample_action(a); + break; + + case OVS_ACTION_ATTR_SET: + ovs_nla_free_set_action(a); + break; } } +} + +void ovs_nla_free_flow_actions(struct sw_flow_actions *sf_acts) +{ + if (!sf_acts) + return;
+ ovs_nla_free_nested_actions(sf_acts->actions, sf_acts->actions_len); kfree(sf_acts); }
From: Eric Dumazet edumazet@google.com
[ Upstream commit 1946014ca3b19be9e485e780e862c375c6f98bad ]
Current code can lead to the following race:
CPU0 CPU1
rxrpc_exit_net() rxrpc_peer_keepalive_worker() if (rxnet->live)
rxnet->live = false; del_timer_sync(&rxnet->peer_keepalive_timer);
timer_reduce(&rxnet->peer_keepalive_timer, jiffies + delay);
cancel_work_sync(&rxnet->peer_keepalive_work);
rxrpc_exit_net() exits while peer_keepalive_timer is still armed, leading to use-after-free.
syzbot report was:
ODEBUG: free active (active state 0) object type: timer_list hint: rxrpc_peer_keepalive_timeout+0x0/0xb0 WARNING: CPU: 0 PID: 3660 at lib/debugobjects.c:505 debug_print_object+0x16e/0x250 lib/debugobjects.c:505 Modules linked in: CPU: 0 PID: 3660 Comm: kworker/u4:6 Not tainted 5.17.0-syzkaller-13993-g88e6c0207623 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: netns cleanup_net RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:505 Code: ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 af 00 00 00 48 8b 14 dd 00 1c 26 8a 4c 89 ee 48 c7 c7 00 10 26 8a e8 b1 e7 28 05 <0f> 0b 83 05 15 eb c5 09 01 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e c3 RSP: 0018:ffffc9000353fb00 EFLAGS: 00010082 RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000 RDX: ffff888029196140 RSI: ffffffff815efad8 RDI: fffff520006a7f52 RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff815ea4ae R11: 0000000000000000 R12: ffffffff89ce23e0 R13: ffffffff8a2614e0 R14: ffffffff816628c0 R15: dffffc0000000000 FS: 0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe1f2908924 CR3: 0000000043720000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __debug_check_no_obj_freed lib/debugobjects.c:992 [inline] debug_check_no_obj_freed+0x301/0x420 lib/debugobjects.c:1023 kfree+0xd6/0x310 mm/slab.c:3809 ops_free_list.part.0+0x119/0x370 net/core/net_namespace.c:176 ops_free_list net/core/net_namespace.c:174 [inline] cleanup_net+0x591/0xb00 net/core/net_namespace.c:598 process_one_work+0x996/0x1610 kernel/workqueue.c:2289 worker_thread+0x665/0x1080 kernel/workqueue.c:2436 kthread+0x2e9/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298 </TASK>
Fixes: ace45bec6d77 ("rxrpc: Fix firewall route keepalive") Signed-off-by: Eric Dumazet edumazet@google.com Cc: David Howells dhowells@redhat.com Cc: Marc Dionne marc.dionne@auristor.com Cc: linux-afs@lists.infradead.org Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/rxrpc/net_ns.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/rxrpc/net_ns.c b/net/rxrpc/net_ns.c index 25bbc4cc8b13..f15d6942da45 100644 --- a/net/rxrpc/net_ns.c +++ b/net/rxrpc/net_ns.c @@ -113,8 +113,8 @@ static __net_exit void rxrpc_exit_net(struct net *net) struct rxrpc_net *rxnet = rxrpc_net(net);
rxnet->live = false; - del_timer_sync(&rxnet->peer_keepalive_timer); cancel_work_sync(&rxnet->peer_keepalive_work); + del_timer_sync(&rxnet->peer_keepalive_timer); rxrpc_destroy_all_calls(rxnet); rxrpc_destroy_all_connections(rxnet); rxrpc_destroy_all_peers(rxnet);
From: Taehee Yoo ap420073@gmail.com
[ Upstream commit fb5833d81e4333294add35d3ac7f7f52a7bf107f ]
In some cases, xdp tx_queue can get used before initialization. 1. interface up/down 2. ring buffer size change
When CPU cores are lower than maximum number of channels of sfc driver, it creates new channels only for XDP.
When an interface is up or ring buffer size is changed, all channels are initialized. But xdp channels are always initialized later. So, the below scenario is possible. Packets are received to rx queue of normal channels and it is acted XDP_TX and tx_queue of xdp channels get used. But these tx_queues are not initialized yet. If so, TX DMA or queue error occurs.
In order to avoid this problem. 1. initializes xdp tx_queues earlier than other rx_queue in efx_start_channels(). 2. checks whether tx_queue is initialized or not in efx_xdp_tx_buffers().
Splat looks like: sfc 0000:08:00.1 enp8s0f1np1: TX queue 10 spurious TX completion id 250 sfc 0000:08:00.1 enp8s0f1np1: resetting (RECOVER_OR_ALL) sfc 0000:08:00.1 enp8s0f1np1: MC command 0x80 inlen 100 failed rc=-22 (raw=22) arg=789 sfc 0000:08:00.1 enp8s0f1np1: has been disabled
Fixes: f28100cb9c96 ("sfc: fix lack of XDP TX queues - error XDP TX failed (-22)") Acked-by: Martin Habets habetsm.xilinx@gmail.com Signed-off-by: Taehee Yoo ap420073@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/sfc/efx_channels.c | 2 +- drivers/net/ethernet/sfc/tx.c | 3 +++ drivers/net/ethernet/sfc/tx_common.c | 2 ++ 3 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c index 4753c0c5af10..1f8cfd806008 100644 --- a/drivers/net/ethernet/sfc/efx_channels.c +++ b/drivers/net/ethernet/sfc/efx_channels.c @@ -1123,7 +1123,7 @@ void efx_start_channels(struct efx_nic *efx) struct efx_rx_queue *rx_queue; struct efx_channel *channel;
- efx_for_each_channel(channel, efx) { + efx_for_each_channel_rev(channel, efx) { efx_for_each_channel_tx_queue(tx_queue, channel) { efx_init_tx_queue(tx_queue); atomic_inc(&efx->active_queues); diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c index d16e031e95f4..6983799e1c05 100644 --- a/drivers/net/ethernet/sfc/tx.c +++ b/drivers/net/ethernet/sfc/tx.c @@ -443,6 +443,9 @@ int efx_xdp_tx_buffers(struct efx_nic *efx, int n, struct xdp_frame **xdpfs, if (unlikely(!tx_queue)) return -EINVAL;
+ if (!tx_queue->initialised) + return -EINVAL; + if (efx->xdp_txq_queues_mode != EFX_XDP_TX_QUEUES_DEDICATED) HARD_TX_LOCK(efx->net_dev, tx_queue->core_txq, cpu);
diff --git a/drivers/net/ethernet/sfc/tx_common.c b/drivers/net/ethernet/sfc/tx_common.c index d530cde2b864..9bc8281b7f5b 100644 --- a/drivers/net/ethernet/sfc/tx_common.c +++ b/drivers/net/ethernet/sfc/tx_common.c @@ -101,6 +101,8 @@ void efx_fini_tx_queue(struct efx_tx_queue *tx_queue) netif_dbg(tx_queue->efx, drv, tx_queue->efx->net_dev, "shutting down TX queue %d\n", tx_queue->queue);
+ tx_queue->initialised = false; + if (!tx_queue->buffer) return;
From: Michael Walle michael@walle.cc
[ Upstream commit 8d90991e5bf7fdb9f264f5f579d18969913054b7 ]
The driver doesn't support clause 45 register access yet, but doesn't check if the access is a c45 one either. This leads to spurious register reads and writes. Add the check.
Fixes: 542671fe4d86 ("net: phy: mscc-miim: Add MDIO driver") Signed-off-by: Michael Walle michael@walle.cc Reviewed-by: Andrew Lunn andrew@lunn.ch Reviewed-by: Florian Fainelli f.fainelli@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/mdio/mdio-mscc-miim.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/net/mdio/mdio-mscc-miim.c b/drivers/net/mdio/mdio-mscc-miim.c index 17f98f609ec8..5070ca2f2637 100644 --- a/drivers/net/mdio/mdio-mscc-miim.c +++ b/drivers/net/mdio/mdio-mscc-miim.c @@ -76,6 +76,9 @@ static int mscc_miim_read(struct mii_bus *bus, int mii_id, int regnum) u32 val; int ret;
+ if (regnum & MII_ADDR_C45) + return -EOPNOTSUPP; + ret = mscc_miim_wait_pending(bus); if (ret) goto out; @@ -105,6 +108,9 @@ static int mscc_miim_write(struct mii_bus *bus, int mii_id, struct mscc_miim_dev *miim = bus->priv; int ret;
+ if (regnum & MII_ADDR_C45) + return -EOPNOTSUPP; + ret = mscc_miim_wait_pending(bus); if (ret < 0) goto out;
From: Jamie Bainbridge jamie.bainbridge@gmail.com
[ Upstream commit 4e910dbe36508654a896d5735b318c0b88172570 ]
qede_build_skb() assumes build_skb() always works and goes straight to skb_reserve(). However, build_skb() can fail under memory pressure. This results in a kernel panic because the skb to reserve is NULL.
Add a check in case build_skb() failed to allocate and return NULL.
The NULL return is handled correctly in callers to qede_build_skb().
Fixes: 8a8633978b842 ("qede: Add build_skb() support.") Signed-off-by: Jamie Bainbridge jamie.bainbridge@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/qlogic/qede/qede_fp.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/qlogic/qede/qede_fp.c b/drivers/net/ethernet/qlogic/qede/qede_fp.c index 999abcfe3310..17f895250e04 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_fp.c +++ b/drivers/net/ethernet/qlogic/qede/qede_fp.c @@ -747,6 +747,9 @@ qede_build_skb(struct qede_rx_queue *rxq, buf = page_address(bd->data) + bd->page_offset; skb = build_skb(buf, rxq->rx_buf_seg_size);
+ if (unlikely(!skb)) + return NULL; + skb_reserve(skb, pad); skb_put(skb, len);
From: Kamal Dasu kdasu.kdev@gmail.com
[ Upstream commit 2c7d1b281286c46049cd22b43435cecba560edde ]
This fixes case where MSPI controller is used to access spi-nor flash and BSPI block is not present.
Fixes: 5f195ee7d830 ("spi: bcm-qspi: Implement the spi_mem interface") Signed-off-by: Kamal Dasu kdasu.kdev@gmail.com Acked-by: Florian Fainelli f.fainelli@gmail.com Link: https://lore.kernel.org/r/20220328142442.7553-1-kdasu.kdev@gmail.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-bcm-qspi.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/spi/spi-bcm-qspi.c b/drivers/spi/spi-bcm-qspi.c index ae8c86be7786..bd7c7fc73961 100644 --- a/drivers/spi/spi-bcm-qspi.c +++ b/drivers/spi/spi-bcm-qspi.c @@ -1033,7 +1033,7 @@ static int bcm_qspi_exec_mem_op(struct spi_mem *mem, addr = op->addr.val; len = op->data.nbytes;
- if (bcm_qspi_bspi_ver_three(qspi) == true) { + if (has_bspi(qspi) && bcm_qspi_bspi_ver_three(qspi) == true) { /* * The address coming into this function is a raw flash offset. * But for BSPI <= V3, we need to convert it to a remapped BSPI @@ -1052,7 +1052,7 @@ static int bcm_qspi_exec_mem_op(struct spi_mem *mem, len < 4) mspi_read = true;
- if (mspi_read) + if (!has_bspi(qspi) || mspi_read) return bcm_qspi_mspi_exec_mem_op(spi, op);
ret = bcm_qspi_bspi_set_mode(qspi, op, 0);
From: Maxim Mikityanskiy maximmi@nvidia.com
[ Upstream commit 2e8702cc0cfa1080f29fd64003c00a3e24ac38de ]
bpf_tcp_gen_syncookie looks at the IP version in the IP header and validates the address family of the socket. It supports IPv4 packets in AF_INET6 dual-stack sockets.
On the other hand, bpf_tcp_check_syncookie looks only at the address family of the socket, ignoring the real IP version in headers, and validates only the packet size. This implementation has some drawbacks:
1. Packets are not validated properly, allowing a BPF program to trick bpf_tcp_check_syncookie into handling an IPv6 packet on an IPv4 socket.
2. Dual-stack sockets fail the checks on IPv4 packets. IPv4 clients end up receiving a SYNACK with the cookie, but the following ACK gets dropped.
This patch fixes these issues by changing the checks in bpf_tcp_check_syncookie to match the ones in bpf_tcp_gen_syncookie. IP version from the header is taken into account, and it is validated properly with address family.
Fixes: 399040847084 ("bpf: add helper to check for a valid SYN cookie") Signed-off-by: Maxim Mikityanskiy maximmi@nvidia.com Signed-off-by: Alexei Starovoitov ast@kernel.org Reviewed-by: Tariq Toukan tariqt@nvidia.com Acked-by: Arthur Fabre afabre@cloudflare.com Link: https://lore.kernel.org/bpf/20220406124113.2795730-1-maximmi@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/filter.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c index a65de7ac60aa..fbde862e3e82 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -6719,24 +6719,33 @@ BPF_CALL_5(bpf_tcp_check_syncookie, struct sock *, sk, void *, iph, u32, iph_len if (!th->ack || th->rst || th->syn) return -ENOENT;
+ if (unlikely(iph_len < sizeof(struct iphdr))) + return -EINVAL; + if (tcp_synq_no_recent_overflow(sk)) return -ENOENT;
cookie = ntohl(th->ack_seq) - 1;
- switch (sk->sk_family) { - case AF_INET: - if (unlikely(iph_len < sizeof(struct iphdr))) + /* Both struct iphdr and struct ipv6hdr have the version field at the + * same offset so we can cast to the shorter header (struct iphdr). + */ + switch (((struct iphdr *)iph)->version) { + case 4: + if (sk->sk_family == AF_INET6 && ipv6_only_sock(sk)) return -EINVAL;
ret = __cookie_v4_check((struct iphdr *)iph, th, cookie); break;
#if IS_BUILTIN(CONFIG_IPV6) - case AF_INET6: + case 6: if (unlikely(iph_len < sizeof(struct ipv6hdr))) return -EINVAL;
+ if (sk->sk_family != AF_INET6) + return -EINVAL; + ret = __cookie_v6_check((struct ipv6hdr *)iph, th, cookie); break; #endif /* CONFIG_IPV6 */
From: Lv Yunlong lyl2019@mail.ustc.edu.cn
[ Upstream commit aadb22ba2f656581b2f733deb3a467c48cc618f6 ]
In get_initial_state, it calls notify_initial_state_done(skb,..) if cb->args[5]==1. If genlmsg_put() failed in notify_initial_state_done(), the skb will be freed by nlmsg_free(skb). Then get_initial_state will goto out and the freed skb will be used by return value skb->len, which is a uaf bug.
What's worse, the same problem goes even further: skb can also be freed in the notify_*_state_change -> notify_*_state calls below. Thus 4 additional uaf bugs happened.
My patch lets the problem callee functions: notify_initial_state_done and notify_*_state_change return an error code if errors happen. So that the error codes could be propagated and the uaf bugs can be avoid.
v2 reports a compilation warning. This v3 fixed this warning and built successfully in my local environment with no additional warnings. v2: https://lore.kernel.org/patchwork/patch/1435218/
Fixes: a29728463b254 ("drbd: Backport the "events2" command") Signed-off-by: Lv Yunlong lyl2019@mail.ustc.edu.cn Reviewed-by: Christoph Böhmwalder christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/block/drbd/drbd_int.h | 8 ++--- drivers/block/drbd/drbd_nl.c | 41 ++++++++++++++++---------- drivers/block/drbd/drbd_state.c | 18 +++++------ drivers/block/drbd/drbd_state_change.h | 8 ++--- 4 files changed, 42 insertions(+), 33 deletions(-)
diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h index 5d9181382ce1..0a5766a2f161 100644 --- a/drivers/block/drbd/drbd_int.h +++ b/drivers/block/drbd/drbd_int.h @@ -1642,22 +1642,22 @@ struct sib_info { }; void drbd_bcast_event(struct drbd_device *device, const struct sib_info *sib);
-extern void notify_resource_state(struct sk_buff *, +extern int notify_resource_state(struct sk_buff *, unsigned int, struct drbd_resource *, struct resource_info *, enum drbd_notification_type); -extern void notify_device_state(struct sk_buff *, +extern int notify_device_state(struct sk_buff *, unsigned int, struct drbd_device *, struct device_info *, enum drbd_notification_type); -extern void notify_connection_state(struct sk_buff *, +extern int notify_connection_state(struct sk_buff *, unsigned int, struct drbd_connection *, struct connection_info *, enum drbd_notification_type); -extern void notify_peer_device_state(struct sk_buff *, +extern int notify_peer_device_state(struct sk_buff *, unsigned int, struct drbd_peer_device *, struct peer_device_info *, diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c index 44ccf8b4f4b2..69184cf17b6a 100644 --- a/drivers/block/drbd/drbd_nl.c +++ b/drivers/block/drbd/drbd_nl.c @@ -4617,7 +4617,7 @@ static int nla_put_notification_header(struct sk_buff *msg, return drbd_notification_header_to_skb(msg, &nh, true); }
-void notify_resource_state(struct sk_buff *skb, +int notify_resource_state(struct sk_buff *skb, unsigned int seq, struct drbd_resource *resource, struct resource_info *resource_info, @@ -4659,16 +4659,17 @@ void notify_resource_state(struct sk_buff *skb, if (err && err != -ESRCH) goto failed; } - return; + return 0;
nla_put_failure: nlmsg_free(skb); failed: drbd_err(resource, "Error %d while broadcasting event. Event seq:%u\n", err, seq); + return err; }
-void notify_device_state(struct sk_buff *skb, +int notify_device_state(struct sk_buff *skb, unsigned int seq, struct drbd_device *device, struct device_info *device_info, @@ -4708,16 +4709,17 @@ void notify_device_state(struct sk_buff *skb, if (err && err != -ESRCH) goto failed; } - return; + return 0;
nla_put_failure: nlmsg_free(skb); failed: drbd_err(device, "Error %d while broadcasting event. Event seq:%u\n", err, seq); + return err; }
-void notify_connection_state(struct sk_buff *skb, +int notify_connection_state(struct sk_buff *skb, unsigned int seq, struct drbd_connection *connection, struct connection_info *connection_info, @@ -4757,16 +4759,17 @@ void notify_connection_state(struct sk_buff *skb, if (err && err != -ESRCH) goto failed; } - return; + return 0;
nla_put_failure: nlmsg_free(skb); failed: drbd_err(connection, "Error %d while broadcasting event. Event seq:%u\n", err, seq); + return err; }
-void notify_peer_device_state(struct sk_buff *skb, +int notify_peer_device_state(struct sk_buff *skb, unsigned int seq, struct drbd_peer_device *peer_device, struct peer_device_info *peer_device_info, @@ -4807,13 +4810,14 @@ void notify_peer_device_state(struct sk_buff *skb, if (err && err != -ESRCH) goto failed; } - return; + return 0;
nla_put_failure: nlmsg_free(skb); failed: drbd_err(peer_device, "Error %d while broadcasting event. Event seq:%u\n", err, seq); + return err; }
void notify_helper(enum drbd_notification_type type, @@ -4864,7 +4868,7 @@ void notify_helper(enum drbd_notification_type type, err, seq); }
-static void notify_initial_state_done(struct sk_buff *skb, unsigned int seq) +static int notify_initial_state_done(struct sk_buff *skb, unsigned int seq) { struct drbd_genlmsghdr *dh; int err; @@ -4878,11 +4882,12 @@ static void notify_initial_state_done(struct sk_buff *skb, unsigned int seq) if (nla_put_notification_header(skb, NOTIFY_EXISTS)) goto nla_put_failure; genlmsg_end(skb, dh); - return; + return 0;
nla_put_failure: nlmsg_free(skb); pr_err("Error %d sending event. Event seq:%u\n", err, seq); + return err; }
static void free_state_changes(struct list_head *list) @@ -4909,6 +4914,7 @@ static int get_initial_state(struct sk_buff *skb, struct netlink_callback *cb) unsigned int seq = cb->args[2]; unsigned int n; enum drbd_notification_type flags = 0; + int err = 0;
/* There is no need for taking notification_mutex here: it doesn't matter if the initial state events mix with later state chage @@ -4917,32 +4923,32 @@ static int get_initial_state(struct sk_buff *skb, struct netlink_callback *cb)
cb->args[5]--; if (cb->args[5] == 1) { - notify_initial_state_done(skb, seq); + err = notify_initial_state_done(skb, seq); goto out; } n = cb->args[4]++; if (cb->args[4] < cb->args[3]) flags |= NOTIFY_CONTINUES; if (n < 1) { - notify_resource_state_change(skb, seq, state_change->resource, + err = notify_resource_state_change(skb, seq, state_change->resource, NOTIFY_EXISTS | flags); goto next; } n--; if (n < state_change->n_connections) { - notify_connection_state_change(skb, seq, &state_change->connections[n], + err = notify_connection_state_change(skb, seq, &state_change->connections[n], NOTIFY_EXISTS | flags); goto next; } n -= state_change->n_connections; if (n < state_change->n_devices) { - notify_device_state_change(skb, seq, &state_change->devices[n], + err = notify_device_state_change(skb, seq, &state_change->devices[n], NOTIFY_EXISTS | flags); goto next; } n -= state_change->n_devices; if (n < state_change->n_devices * state_change->n_connections) { - notify_peer_device_state_change(skb, seq, &state_change->peer_devices[n], + err = notify_peer_device_state_change(skb, seq, &state_change->peer_devices[n], NOTIFY_EXISTS | flags); goto next; } @@ -4957,7 +4963,10 @@ static int get_initial_state(struct sk_buff *skb, struct netlink_callback *cb) cb->args[4] = 0; } out: - return skb->len; + if (err) + return err; + else + return skb->len; }
int drbd_adm_get_initial_state(struct sk_buff *skb, struct netlink_callback *cb) diff --git a/drivers/block/drbd/drbd_state.c b/drivers/block/drbd/drbd_state.c index b8a27818ab3f..4ee11aef6672 100644 --- a/drivers/block/drbd/drbd_state.c +++ b/drivers/block/drbd/drbd_state.c @@ -1537,7 +1537,7 @@ int drbd_bitmap_io_from_worker(struct drbd_device *device, return rv; }
-void notify_resource_state_change(struct sk_buff *skb, +int notify_resource_state_change(struct sk_buff *skb, unsigned int seq, struct drbd_resource_state_change *resource_state_change, enum drbd_notification_type type) @@ -1550,10 +1550,10 @@ void notify_resource_state_change(struct sk_buff *skb, .res_susp_fen = resource_state_change->susp_fen[NEW], };
- notify_resource_state(skb, seq, resource, &resource_info, type); + return notify_resource_state(skb, seq, resource, &resource_info, type); }
-void notify_connection_state_change(struct sk_buff *skb, +int notify_connection_state_change(struct sk_buff *skb, unsigned int seq, struct drbd_connection_state_change *connection_state_change, enum drbd_notification_type type) @@ -1564,10 +1564,10 @@ void notify_connection_state_change(struct sk_buff *skb, .conn_role = connection_state_change->peer_role[NEW], };
- notify_connection_state(skb, seq, connection, &connection_info, type); + return notify_connection_state(skb, seq, connection, &connection_info, type); }
-void notify_device_state_change(struct sk_buff *skb, +int notify_device_state_change(struct sk_buff *skb, unsigned int seq, struct drbd_device_state_change *device_state_change, enum drbd_notification_type type) @@ -1577,10 +1577,10 @@ void notify_device_state_change(struct sk_buff *skb, .dev_disk_state = device_state_change->disk_state[NEW], };
- notify_device_state(skb, seq, device, &device_info, type); + return notify_device_state(skb, seq, device, &device_info, type); }
-void notify_peer_device_state_change(struct sk_buff *skb, +int notify_peer_device_state_change(struct sk_buff *skb, unsigned int seq, struct drbd_peer_device_state_change *p, enum drbd_notification_type type) @@ -1594,7 +1594,7 @@ void notify_peer_device_state_change(struct sk_buff *skb, .peer_resync_susp_dependency = p->resync_susp_dependency[NEW], };
- notify_peer_device_state(skb, seq, peer_device, &peer_device_info, type); + return notify_peer_device_state(skb, seq, peer_device, &peer_device_info, type); }
static void broadcast_state_change(struct drbd_state_change *state_change) @@ -1602,7 +1602,7 @@ static void broadcast_state_change(struct drbd_state_change *state_change) struct drbd_resource_state_change *resource_state_change = &state_change->resource[0]; bool resource_state_has_changed; unsigned int n_device, n_connection, n_peer_device, n_peer_devices; - void (*last_func)(struct sk_buff *, unsigned int, void *, + int (*last_func)(struct sk_buff *, unsigned int, void *, enum drbd_notification_type) = NULL; void *last_arg = NULL;
diff --git a/drivers/block/drbd/drbd_state_change.h b/drivers/block/drbd/drbd_state_change.h index ba80f612d6ab..d5b0479bc9a6 100644 --- a/drivers/block/drbd/drbd_state_change.h +++ b/drivers/block/drbd/drbd_state_change.h @@ -44,19 +44,19 @@ extern struct drbd_state_change *remember_old_state(struct drbd_resource *, gfp_ extern void copy_old_to_new_state_change(struct drbd_state_change *); extern void forget_state_change(struct drbd_state_change *);
-extern void notify_resource_state_change(struct sk_buff *, +extern int notify_resource_state_change(struct sk_buff *, unsigned int, struct drbd_resource_state_change *, enum drbd_notification_type type); -extern void notify_connection_state_change(struct sk_buff *, +extern int notify_connection_state_change(struct sk_buff *, unsigned int, struct drbd_connection_state_change *, enum drbd_notification_type type); -extern void notify_device_state_change(struct sk_buff *, +extern int notify_device_state_change(struct sk_buff *, unsigned int, struct drbd_device_state_change *, enum drbd_notification_type type); -extern void notify_peer_device_state_change(struct sk_buff *, +extern int notify_peer_device_state_change(struct sk_buff *, unsigned int, struct drbd_peer_device_state_change *, enum drbd_notification_type type);
From: Xiaomeng Tong xiam0nd.tong@gmail.com
[ Upstream commit bfb7789bcbd901caead43861461bc8f334c90d3b ]
The list iterator is always non-NULL so the check 'if (!rgn)' is always false and the dev_err() is never called. Move the check outside the loop and determine if 'victim_rgn' is NULL, to fix this bug.
Link: https://lore.kernel.org/r/20220320150733.21824-1-xiam0nd.tong@gmail.com Fixes: 4b5f49079c52 ("scsi: ufs: ufshpb: L2P map management for HPB read") Reviewed-by: Daejun Park daejun7.park@samsung.com Signed-off-by: Xiaomeng Tong xiam0nd.tong@gmail.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/ufs/ufshpb.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/drivers/scsi/ufs/ufshpb.c b/drivers/scsi/ufs/ufshpb.c index a86d0cc50de2..f7eaf64293a4 100644 --- a/drivers/scsi/ufs/ufshpb.c +++ b/drivers/scsi/ufs/ufshpb.c @@ -870,12 +870,6 @@ static struct ufshpb_region *ufshpb_victim_lru_info(struct ufshpb_lu *hpb) struct ufshpb_region *rgn, *victim_rgn = NULL;
list_for_each_entry(rgn, &lru_info->lh_lru_rgn, list_lru_rgn) { - if (!rgn) { - dev_err(&hpb->sdev_ufs_lu->sdev_dev, - "%s: no region allocated\n", - __func__); - return NULL; - } if (ufshpb_check_srgns_issue_state(hpb, rgn)) continue;
@@ -891,6 +885,11 @@ static struct ufshpb_region *ufshpb_victim_lru_info(struct ufshpb_lu *hpb) break; }
+ if (!victim_rgn) + dev_err(&hpb->sdev_ufs_lu->sdev_dev, + "%s: no region allocated\n", + __func__); + return victim_rgn; }
From: Pavel Begunkov asml.silence@gmail.com
[ Upstream commit 34bb77184123ae401100a4d156584f12fa630e5c ]
Don't forget to array_index_nospec() for indexes before updating rsrc tags in __io_sqe_files_update(), just use already safe and precalculated index @i.
Fixes: c3bdad0271834 ("io_uring: add generic rsrc update with tags") Signed-off-by: Pavel Begunkov asml.silence@gmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/io_uring.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c index 5fc3a62eae72..21823d1e91de 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -8589,7 +8589,7 @@ static int __io_sqe_files_update(struct io_ring_ctx *ctx, err = -EBADF; break; } - *io_get_tag_slot(data, up->offset + done) = tag; + *io_get_tag_slot(data, i) = tag; io_fixed_file_set(file_slot, file); err = io_sqe_file_register(ctx, file, i); if (err) {
From: Pavel Begunkov asml.silence@gmail.com
[ Upstream commit a07211e3001435fe8591b992464cd8d5e3c98c5a ]
It's safer to not touch scm_fp_list after we queued an skb to which it was assigned, there might be races lurking if we screw subtle sync guarantees on the io_uring side.
Fixes: 6b06314c47e14 ("io_uring: add file set registration") Signed-off-by: Pavel Begunkov asml.silence@gmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/io_uring.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c index 21823d1e91de..2faa558778ba 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -8126,8 +8126,12 @@ static int __io_sqe_files_scm(struct io_ring_ctx *ctx, int nr, int offset) refcount_add(skb->truesize, &sk->sk_wmem_alloc); skb_queue_head(&sk->sk_receive_queue, skb);
- for (i = 0; i < nr_files; i++) - fput(fpl->fp[i]); + for (i = 0; i < nr; i++) { + struct file *file = io_file_from_index(ctx, i + offset); + + if (file) + fput(file); + } } else { kfree_skb(skb); free_uid(fpl->user);
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit d3c15033b240767d0287f1c4a529cbbe2d5ded8a ]
Both call_transmit() and call_bc_transmit() can now return ENOMEM, so let's make sure that we handle the errors gracefully.
Fixes: 0472e4766049 ("SUNRPC: Convert socket page send code to use iov_iter()") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sunrpc/clnt.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index 5d5627bf3b18..9a183c254c84 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -2202,6 +2202,7 @@ call_transmit_status(struct rpc_task *task) * socket just returned a connection error, * then hold onto the transport lock. */ + case -ENOMEM: case -ENOBUFS: rpc_delay(task, HZ>>2); fallthrough; @@ -2285,6 +2286,7 @@ call_bc_transmit_status(struct rpc_task *task) case -ENOTCONN: case -EPIPE: break; + case -ENOMEM: case -ENOBUFS: rpc_delay(task, HZ>>2); fallthrough;
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 9d82819d5b065348ce623f196bf601028e22ed00 ]
We need to handle ENFILE, ENOBUFS, and ENOMEM, because xprt_wake_pending_tasks() can be called with any one of these due to socket creation failures.
Fixes: b61d59fffd3e ("SUNRPC: xs_tcp_connect_worker{4,6}: merge common code") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sunrpc/clnt.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index 9a183c254c84..3286add1a958 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -2369,6 +2369,11 @@ call_status(struct rpc_task *task) case -EPIPE: case -EAGAIN: break; + case -ENFILE: + case -ENOBUFS: + case -ENOMEM: + rpc_delay(task, HZ>>2); + break; case -EIO: /* shutdown or soft timeout */ goto out_exit;
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit b056fa070814897be32d83b079dbc311375588e7 ]
The allocation is done with GFP_KERNEL, but it could still fail in a low memory situation.
Fixes: 4a85a6a3320b ("SUNRPC: Handle TCP socket sends with kernel_sendpage() again") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sunrpc/svcsock.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index 478f857cdaed..6ea3d87e1147 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -1096,7 +1096,9 @@ static int svc_tcp_sendmsg(struct socket *sock, struct xdr_buf *xdr, int ret;
*sentp = 0; - xdr_alloc_bvec(xdr, GFP_KERNEL); + ret = xdr_alloc_bvec(xdr, GFP_KERNEL); + if (ret < 0) + return ret;
ret = kernel_sendmsg(sock, &msg, &rm, 1, rm.iov_len); if (ret < 0)
From: Tony Lindgren tony@atomide.com
[ Upstream commit 71ff461c3f41f6465434b9e980c01782763e7ad8 ]
Commit 3f6634d997db ("iommu: Use right way to retrieve iommu_ops") started triggering a NULL pointer dereference for some omap variants:
__iommu_probe_device from probe_iommu_group+0x2c/0x38 probe_iommu_group from bus_for_each_dev+0x74/0xbc bus_for_each_dev from bus_iommu_probe+0x34/0x2e8 bus_iommu_probe from bus_set_iommu+0x80/0xc8 bus_set_iommu from omap_iommu_init+0x88/0xcc omap_iommu_init from do_one_initcall+0x44/0x24
This is caused by omap iommu probe returning 0 instead of ERR_PTR(-ENODEV) as noted by Jason Gunthorpe jgg@ziepe.ca.
Looks like the regression already happened with an earlier commit 6785eb9105e3 ("iommu/omap: Convert to probe/release_device() call-backs") that changed the function return type and missed converting one place.
Cc: Drew Fustini dfustini@baylibre.com Cc: Lu Baolu baolu.lu@linux.intel.com Cc: Suman Anna s-anna@ti.com Suggested-by: Jason Gunthorpe jgg@ziepe.ca Fixes: 6785eb9105e3 ("iommu/omap: Convert to probe/release_device() call-backs") Fixes: 3f6634d997db ("iommu: Use right way to retrieve iommu_ops") Signed-off-by: Tony Lindgren tony@atomide.com Tested-by: Drew Fustini dfustini@baylibre.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/20220331062301.24269-1-tony@atomide.com Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iommu/omap-iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c index 91749654fd49..be60f6f3a265 100644 --- a/drivers/iommu/omap-iommu.c +++ b/drivers/iommu/omap-iommu.c @@ -1661,7 +1661,7 @@ static struct iommu_device *omap_iommu_probe_device(struct device *dev) num_iommus = of_property_count_elems_of_size(dev->of_node, "iommus", sizeof(phandle)); if (num_iommus < 0) - return 0; + return ERR_PTR(-ENODEV);
arch_data = kcalloc(num_iommus + 1, sizeof(*arch_data), GFP_KERNEL); if (!arch_data)
From: James Clark james.clark@arm.com
[ Upstream commit ffab487052054162b3b6c9c6005777ec6cfcea05 ]
Since commit bb30acae4c4dacfa ("perf report: Bail out --mem-mode if mem info is not available") "perf mem report" and "perf report --mem-mode" don't allow opening the file unless one of the events has PERF_SAMPLE_DATA_SRC set.
SPE doesn't have this set even though synthetic memory data is generated after it is decoded. Fix this issue by setting DATA_SRC on SPE events. This has no effect on the data collected because the SPE driver doesn't do anything with that flag and doesn't generate samples.
Fixes: bb30acae4c4dacfa ("perf report: Bail out --mem-mode if mem info is not available") Signed-off-by: James Clark james.clark@arm.com Tested-by: Leo Yan leo.yan@linaro.org Acked-by: Namhyung Kim namhyung@kernel.org Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: German Gomez german.gomez@arm.com Cc: Jiri Olsa jolsa@kernel.org Cc: John Garry john.garry@huawei.com Cc: Leo Yan leo.yan@linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Poirier mathieu.poirier@linaro.org Cc: Ravi Bangoria ravi.bangoria@linux.ibm.com Cc: Will Deacon will@kernel.org Link: https://lore.kernel.org/r/20220408144056.1955535-1-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/arch/arm64/util/arm-spe.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c index a4420d4df503..7d589a705fc8 100644 --- a/tools/perf/arch/arm64/util/arm-spe.c +++ b/tools/perf/arch/arm64/util/arm-spe.c @@ -154,6 +154,12 @@ static int arm_spe_recording_options(struct auxtrace_record *itr, arm_spe_set_timestamp(itr, arm_spe_evsel); }
+ /* + * Set this only so that perf report knows that SPE generates memory info. It has no effect + * on the opening of the event or the SPE data produced. + */ + evsel__set_sample_bit(arm_spe_evsel, DATA_SRC); + /* Add dummy event to keep tracking */ err = parse_events(evlist, "dummy:u", NULL); if (err)
From: Adrian Hunter adrian.hunter@intel.com
[ Upstream commit aeee9dc53ce405d2161f9915f553114e94e5b677 ]
eprintf() does not expect va_list as the type of the 4th parameter.
Use veprintf() because it does.
Signed-off-by: Adrian Hunter adrian.hunter@intel.com Fixes: 428dab813a56ce94 ("libperf: Merge libperf_set_print() into libperf_init()") Cc: Jiri Olsa jolsa@kernel.org Link: https://lore.kernel.org/r/20220408132625.2451452-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/perf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/perf.c b/tools/perf/perf.c index 2f6b67189b42..6aae7b6c376b 100644 --- a/tools/perf/perf.c +++ b/tools/perf/perf.c @@ -434,7 +434,7 @@ void pthread__unblock_sigwinch(void) static int libperf_print(enum libperf_print_level level, const char *fmt, va_list ap) { - return eprintf(level, verbose, fmt, ap); + return veprintf(level, verbose, fmt, ap); }
int main(int argc, const char **argv)
From: Denis Nikitin denik@chromium.org
[ Upstream commit bc21e74d4775f883ae1f542c1f1dc7205b15d925 ]
If a perf event doesn't fit into remaining buffer space return NULL to remap buf and fetch the event again.
Keep the logic to error out on inadequate input from fuzzing.
This fixes perf failing on ChromeOS (with 32b userspace):
$ perf report -v -i perf.data ... prefetch_event: head=0x1fffff8 event->header_size=0x30, mmap_size=0x2000000: fuzzed or compressed perf.data? Error: failed to process sample
Fixes: 57fc032ad643ffd0 ("perf session: Avoid infinite loop when seeing invalid header.size") Reviewed-by: James Clark james.clark@arm.com Signed-off-by: Denis Nikitin denik@chromium.org Acked-by: Jiri Olsa jolsa@kernel.org Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Alexey Budankov alexey.budankov@linux.intel.com Cc: Namhyung Kim namhyung@kernel.org Link: https://lore.kernel.org/r/20220330031130.2152327-1-denik@chromium.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/session.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c index 352f16076e01..562e9b808027 100644 --- a/tools/perf/util/session.c +++ b/tools/perf/util/session.c @@ -2076,6 +2076,7 @@ prefetch_event(char *buf, u64 head, size_t mmap_size, bool needs_swap, union perf_event *error) { union perf_event *event; + u16 event_size;
/* * Ensure we have enough space remaining to read @@ -2088,15 +2089,23 @@ prefetch_event(char *buf, u64 head, size_t mmap_size, if (needs_swap) perf_event_header__bswap(&event->header);
- if (head + event->header.size <= mmap_size) + event_size = event->header.size; + if (head + event_size <= mmap_size) return event;
/* We're not fetching the event so swap back again */ if (needs_swap) perf_event_header__bswap(&event->header);
- pr_debug("%s: head=%#" PRIx64 " event->header_size=%#x, mmap_size=%#zx:" - " fuzzed or compressed perf.data?\n",__func__, head, event->header.size, mmap_size); + /* Check if the event fits into the next mmapped buf. */ + if (event_size <= mmap_size - head % page_size) { + /* Remap buf and fetch again. */ + return NULL; + } + + /* Invalid input. Event size should never exceed mmap_size. */ + pr_debug("%s: head=%#" PRIx64 " event->header.size=%#x, mmap_size=%#zx:" + " fuzzed or compressed perf.data?\n", __func__, head, event_size, mmap_size);
return error; }
From: Chanho Park chanho61.park@samsung.com
commit 83bea32ac7ed37bbda58733de61fc9369513f9f9 upstream.
Add the MIDR part number info for the Arm Cortex-A78AE[1] and add it to spectre-BHB affected list[2].
[1]: https://developer.arm.com/Processors/Cortex-A78AE [2]: https://developer.arm.com/Arm%20Security%20Center/Spectre-BHB
Cc: Catalin Marinas catalin.marinas@arm.com Cc: Mark Rutland mark.rutland@arm.com Cc: Will Deacon will@kernel.org Cc: James Morse james.morse@arm.com Signed-off-by: Chanho Park chanho61.park@samsung.com Link: https://lore.kernel.org/r/20220407091128.8700-1-chanho61.park@samsung.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/cputype.h | 2 ++ arch/arm64/kernel/proton-pack.c | 1 + 2 files changed, 3 insertions(+)
--- a/arch/arm64/include/asm/cputype.h +++ b/arch/arm64/include/asm/cputype.h @@ -75,6 +75,7 @@ #define ARM_CPU_PART_CORTEX_A77 0xD0D #define ARM_CPU_PART_NEOVERSE_V1 0xD40 #define ARM_CPU_PART_CORTEX_A78 0xD41 +#define ARM_CPU_PART_CORTEX_A78AE 0xD42 #define ARM_CPU_PART_CORTEX_X1 0xD44 #define ARM_CPU_PART_CORTEX_A510 0xD46 #define ARM_CPU_PART_CORTEX_A710 0xD47 @@ -123,6 +124,7 @@ #define MIDR_CORTEX_A77 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A77) #define MIDR_NEOVERSE_V1 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_NEOVERSE_V1) #define MIDR_CORTEX_A78 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A78) +#define MIDR_CORTEX_A78AE MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A78AE) #define MIDR_CORTEX_X1 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_X1) #define MIDR_CORTEX_A510 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A510) #define MIDR_CORTEX_A710 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A710) --- a/arch/arm64/kernel/proton-pack.c +++ b/arch/arm64/kernel/proton-pack.c @@ -853,6 +853,7 @@ u8 spectre_bhb_loop_affected(int scope) if (scope == SCOPE_LOCAL_CPU) { static const struct midr_range spectre_bhb_k32_list[] = { MIDR_ALL_VERSIONS(MIDR_CORTEX_A78), + MIDR_ALL_VERSIONS(MIDR_CORTEX_A78AE), MIDR_ALL_VERSIONS(MIDR_CORTEX_A78C), MIDR_ALL_VERSIONS(MIDR_CORTEX_X1), MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
From: Damien Le Moal damien.lemoal@opensource.wdc.com
commit 87d663d40801dffc99a5ad3b0188ad3e2b4d1557 upstream.
The function mpt3sas_transport_port_remove() called in _scsih_expander_node_remove() frees the port field of the sas_expander structure, leading to the following use-after-free splat from KASAN when the ioc_info() call following that function is executed (e.g. when doing rmmod of the driver module):
[ 3479.371167] ================================================================== [ 3479.378496] BUG: KASAN: use-after-free in _scsih_expander_node_remove+0x710/0x750 [mpt3sas] [ 3479.386936] Read of size 1 at addr ffff8881c037691c by task rmmod/1531 [ 3479.393524] [ 3479.395035] CPU: 18 PID: 1531 Comm: rmmod Not tainted 5.17.0-rc8+ #1436 [ 3479.401712] Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.1 06/02/2021 [ 3479.409263] Call Trace: [ 3479.411743] <TASK> [ 3479.413875] dump_stack_lvl+0x45/0x59 [ 3479.417582] print_address_description.constprop.0+0x1f/0x120 [ 3479.423389] ? _scsih_expander_node_remove+0x710/0x750 [mpt3sas] [ 3479.429469] kasan_report.cold+0x83/0xdf [ 3479.433438] ? _scsih_expander_node_remove+0x710/0x750 [mpt3sas] [ 3479.439514] _scsih_expander_node_remove+0x710/0x750 [mpt3sas] [ 3479.445411] ? _raw_spin_unlock_irqrestore+0x2d/0x40 [ 3479.452032] scsih_remove+0x525/0xc90 [mpt3sas] [ 3479.458212] ? mpt3sas_expander_remove+0x1d0/0x1d0 [mpt3sas] [ 3479.465529] ? down_write+0xde/0x150 [ 3479.470746] ? up_write+0x14d/0x460 [ 3479.475840] ? kernfs_find_ns+0x137/0x310 [ 3479.481438] pci_device_remove+0x65/0x110 [ 3479.487013] __device_release_driver+0x316/0x680 [ 3479.493180] driver_detach+0x1ec/0x2d0 [ 3479.498499] bus_remove_driver+0xe7/0x2d0 [ 3479.504081] pci_unregister_driver+0x26/0x250 [ 3479.510033] _mpt3sas_exit+0x2b/0x6cf [mpt3sas] [ 3479.516144] __x64_sys_delete_module+0x2fd/0x510 [ 3479.522315] ? free_module+0xaa0/0xaa0 [ 3479.527593] ? __cond_resched+0x1c/0x90 [ 3479.532951] ? lockdep_hardirqs_on_prepare+0x273/0x3e0 [ 3479.539607] ? syscall_enter_from_user_mode+0x21/0x70 [ 3479.546161] ? trace_hardirqs_on+0x1c/0x110 [ 3479.551828] do_syscall_64+0x35/0x80 [ 3479.556884] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 3479.563402] RIP: 0033:0x7f1fc482483b ... [ 3479.943087] ==================================================================
Fix this by introducing the local variable port_id to store the port ID value before executing mpt3sas_transport_port_remove(). This local variable is then used in the call to ioc_info() instead of dereferencing the freed port structure.
Link: https://lore.kernel.org/r/20220322055702.95276-1-damien.lemoal@opensource.wd... Fixes: 7d310f241001 ("scsi: mpt3sas: Get device objects using sas_address & portID") Cc: stable@vger.kernel.org Acked-by: Sreekanth Reddy sreekanth.reddy@broadcom.com Signed-off-by: Damien Le Moal damien.lemoal@opensource.wdc.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/mpt3sas/mpt3sas_scsih.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -11035,6 +11035,7 @@ _scsih_expander_node_remove(struct MPT3S { struct _sas_port *mpt3sas_port, *next; unsigned long flags; + int port_id;
/* remove sibling ports attached to this expander */ list_for_each_entry_safe(mpt3sas_port, next, @@ -11055,6 +11056,8 @@ _scsih_expander_node_remove(struct MPT3S mpt3sas_port->hba_port); }
+ port_id = sas_expander->port->port_id; + mpt3sas_transport_port_remove(ioc, sas_expander->sas_address, sas_expander->sas_address_parent, sas_expander->port);
@@ -11062,7 +11065,7 @@ _scsih_expander_node_remove(struct MPT3S "expander_remove: handle(0x%04x), sas_addr(0x%016llx), port:%d\n", sas_expander->handle, (unsigned long long) sas_expander->sas_address, - sas_expander->port->port_id); + port_id);
spin_lock_irqsave(&ioc->sas_node_lock, flags); list_del(&sas_expander->list);
From: Adrian Hunter adrian.hunter@intel.com
commit 4049f7acef3eb37c1ea0df45f3ffc29404f4e708 upstream.
Add PCI ID and callbacks to support Intel Meteor Lake (MTL).
Link: https://lore.kernel.org/r/20220404055038.2208051-1-adrian.hunter@intel.com Cc: stable@vger.kernel.org # v5.15+ Reviewed-by: Avri Altman avri.altman@wdc.com Reviewed-by: Bart Van Assche bvanassche@acm.org Signed-off-by: Adrian Hunter adrian.hunter@intel.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/ufs/ufshcd-pci.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)
--- a/drivers/scsi/ufs/ufshcd-pci.c +++ b/drivers/scsi/ufs/ufshcd-pci.c @@ -428,6 +428,12 @@ static int ufs_intel_adl_init(struct ufs return ufs_intel_common_init(hba); }
+static int ufs_intel_mtl_init(struct ufs_hba *hba) +{ + hba->caps |= UFSHCD_CAP_CRYPTO | UFSHCD_CAP_WB_EN; + return ufs_intel_common_init(hba); +} + static struct ufs_hba_variant_ops ufs_intel_cnl_hba_vops = { .name = "intel-pci", .init = ufs_intel_common_init, @@ -465,6 +471,16 @@ static struct ufs_hba_variant_ops ufs_in .device_reset = ufs_intel_device_reset, };
+static struct ufs_hba_variant_ops ufs_intel_mtl_hba_vops = { + .name = "intel-pci", + .init = ufs_intel_mtl_init, + .exit = ufs_intel_common_exit, + .hce_enable_notify = ufs_intel_hce_enable_notify, + .link_startup_notify = ufs_intel_link_startup_notify, + .resume = ufs_intel_resume, + .device_reset = ufs_intel_device_reset, +}; + #ifdef CONFIG_PM_SLEEP static int ufshcd_pci_restore(struct device *dev) { @@ -579,6 +595,7 @@ static const struct pci_device_id ufshcd { PCI_VDEVICE(INTEL, 0x98FA), (kernel_ulong_t)&ufs_intel_lkf_hba_vops }, { PCI_VDEVICE(INTEL, 0x51FF), (kernel_ulong_t)&ufs_intel_adl_hba_vops }, { PCI_VDEVICE(INTEL, 0x54FF), (kernel_ulong_t)&ufs_intel_adl_hba_vops }, + { PCI_VDEVICE(INTEL, 0x7E47), (kernel_ulong_t)&ufs_intel_mtl_hba_vops }, { } /* terminate list */ };
From: Pali Rohár pali@kernel.org
commit 7e2646ed47542123168d43916b84b954532e5386 upstream.
This reverts commit bb32e1987bc55ce1db400faf47d85891da3c9b9f.
Commit 1a3ed0dc3594 ("mmc: sdhci-xenon: fix 1.8v regulator stabilization") contains proper fix for the issue described in commit bb32e1987bc5 ("mmc: sdhci-xenon: fix annoying 1.8V regulator warning").
Fixes: 8d876bf472db ("mmc: sdhci-xenon: wait 5ms after set 1.8V signal enable") Cc: stable@vger.kernel.org # 1a3ed0dc3594 ("mmc: sdhci-xenon: fix 1.8v regulator stabilization") Signed-off-by: Pali Rohár pali@kernel.org Reviewed-by: Marek Behún kabel@kernel.org Reviewed-by: Marcin Wojtas mw@semihalf.com Link: https://lore.kernel.org/r/20220318141441.32329-1-pali@kernel.org Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/sdhci-xenon.c | 10 ---------- 1 file changed, 10 deletions(-)
--- a/drivers/mmc/host/sdhci-xenon.c +++ b/drivers/mmc/host/sdhci-xenon.c @@ -241,16 +241,6 @@ static void xenon_voltage_switch(struct { /* Wait for 5ms after set 1.8V signal enable bit */ usleep_range(5000, 5500); - - /* - * For some reason the controller's Host Control2 register reports - * the bit representing 1.8V signaling as 0 when read after it was - * written as 1. Subsequent read reports 1. - * - * Since this may cause some issues, do an empty read of the Host - * Control2 register here to circumvent this. - */ - sdhci_readw(host, SDHCI_HOST_CONTROL2); }
static unsigned int xenon_get_max_clock(struct sdhci_host *host)
From: Christian Löhle CLoehle@hyperstone.com
commit 5d435933376962b107bd76970912e7e80247dcc7 upstream.
Introduce a SEND_STATUS check for writes through SPI to not mark an unsuccessful write as successful.
Since SPI SD/MMC does not have states, after a write, the card will just hold the line LOW until it is ready again. The driver marks the write therefore as completed as soon as it reads something other than all zeroes. The driver does not distinguish from a card no longer signalling busy and it being disconnected (and the line being pulled-up by the host). This lead to writes being marked as successful when disconnecting a busy card. Now the card is ensured to be still connected by an additional CMD13, just like non-SPI is ensured to go back to TRAN state.
While at it and since we already poll for the post-write status anyway, we might as well check for SPIs error bits (any of them).
The disconnecting card problem is reproducable for me after continuous write activity and randomly disconnecting, around every 20-50 tries on SPI DS for some card.
Fixes: 7213d175e3b6f ("MMC/SD card driver learns SPI") Cc: stable@vger.kernel.org Signed-off-by: Christian Loehle cloehle@hyperstone.com Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Link: https://lore.kernel.org/r/76f6f5d2b35543bab3dfe438f268609c@hyperstone.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/core/block.c | 34 +++++++++++++++++++++++++++++++++- 1 file changed, 33 insertions(+), 1 deletion(-)
--- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -1880,6 +1880,31 @@ static inline bool mmc_blk_rq_error(stru brq->data.error || brq->cmd.resp[0] & CMD_ERRORS; }
+static int mmc_spi_err_check(struct mmc_card *card) +{ + u32 status = 0; + int err; + + /* + * SPI does not have a TRAN state we have to wait on, instead the + * card is ready again when it no longer holds the line LOW. + * We still have to ensure two things here before we know the write + * was successful: + * 1. The card has not disconnected during busy and we actually read our + * own pull-up, thinking it was still connected, so ensure it + * still responds. + * 2. Check for any error bits, in particular R1_SPI_IDLE to catch a + * just reconnected card after being disconnected during busy. + */ + err = __mmc_send_status(card, &status, 0); + if (err) + return err; + /* All R1 and R2 bits of SPI are errors in our case */ + if (status) + return -EIO; + return 0; +} + static int mmc_blk_busy_cb(void *cb_data, bool *busy) { struct mmc_blk_busy_data *data = cb_data; @@ -1903,9 +1928,16 @@ static int mmc_blk_card_busy(struct mmc_ struct mmc_blk_busy_data cb_data; int err;
- if (mmc_host_is_spi(card->host) || rq_data_dir(req) == READ) + if (rq_data_dir(req) == READ) return 0;
+ if (mmc_host_is_spi(card->host)) { + err = mmc_spi_err_check(card); + if (err) + mqrq->brq.data.bytes_xfered = 0; + return err; + } + cb_data.card = card; cb_data.status = 0; err = __mmc_poll_for_busy(card, MMC_BLK_TIMEOUT_MS, &mmc_blk_busy_cb,
From: Yann Gautier yann.gautier@foss.st.com
commit 0d319dd5a27183b75d984e3dc495248e59f99334 upstream.
Use sg and not data->sg when checking sg list elements. Else only the first element alignment is checked. The last element should be checked the same way, for_each_sg already set sg to sg_next(sg).
Fixes: 46b723dd867d ("mmc: mmci: add stm32 sdmmc variant") Cc: stable@vger.kernel.org Signed-off-by: Yann Gautier yann.gautier@foss.st.com Link: https://lore.kernel.org/r/20220317111944.116148-2-yann.gautier@foss.st.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/mmci_stm32_sdmmc.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/mmc/host/mmci_stm32_sdmmc.c +++ b/drivers/mmc/host/mmci_stm32_sdmmc.c @@ -62,8 +62,8 @@ static int sdmmc_idma_validate_data(stru * excepted the last element which has no constraint on idmasize */ for_each_sg(data->sg, sg, data->sg_len - 1, i) { - if (!IS_ALIGNED(data->sg->offset, sizeof(u32)) || - !IS_ALIGNED(data->sg->length, SDMMC_IDMA_BURST)) { + if (!IS_ALIGNED(sg->offset, sizeof(u32)) || + !IS_ALIGNED(sg->length, SDMMC_IDMA_BURST)) { dev_err(mmc_dev(host->mmc), "unaligned scatterlist: ofst:%x length:%d\n", data->sg->offset, data->sg->length); @@ -71,7 +71,7 @@ static int sdmmc_idma_validate_data(stru } }
- if (!IS_ALIGNED(data->sg->offset, sizeof(u32))) { + if (!IS_ALIGNED(sg->offset, sizeof(u32))) { dev_err(mmc_dev(host->mmc), "unaligned last scatterlist: ofst:%x length:%d\n", data->sg->offset, data->sg->length);
From: Wolfram Sang wsa+renesas@sang-engineering.com
commit 03e59b1e2f56245163b14c69e0a830c24b1a3a47 upstream.
When HS400 tuning is complete and HS400 is going to be activated, we have to keep the current number of TAPs and should not overwrite them with a hardcoded value. This was probably a copy&paste mistake when upporting HS400 support from the BSP.
Fixes: 26eb2607fa28 ("mmc: renesas_sdhi: add eMMC HS400 mode support") Reported-by: Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com Signed-off-by: Wolfram Sang wsa+renesas@sang-engineering.com Reviewed-by: Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220404114902.12175-1-wsa+renesas@sang-engineerin... Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/renesas_sdhi_core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/mmc/host/renesas_sdhi_core.c +++ b/drivers/mmc/host/renesas_sdhi_core.c @@ -382,10 +382,10 @@ static void renesas_sdhi_hs400_complete( SH_MOBILE_SDHI_SCC_TMPPORT2_HS400OSEL) | sd_scc_read32(host, priv, SH_MOBILE_SDHI_SCC_TMPPORT2));
- /* Set the sampling clock selection range of HS400 mode */ sd_scc_write32(host, priv, SH_MOBILE_SDHI_SCC_DTCNTL, SH_MOBILE_SDHI_SCC_DTCNTL_TAPEN | - 0x4 << SH_MOBILE_SDHI_SCC_DTCNTL_TAPNUM_SHIFT); + sd_scc_read32(host, priv, + SH_MOBILE_SDHI_SCC_DTCNTL));
/* Avoid bad TAP */ if (bad_taps & BIT(priv->tap_set)) {
From: Michael Wu michael@allwinnertech.com
commit 08ebf903af57cda6d773f3dd1671b64f73b432b8 upstream.
During the card initialization process, the mmc core checks whether the eMMC/SD card supports an internal writeback-cache and then enables it inside the card.
Unfortunately, this isn't according to what the mmc core reports to the upper block layer. Instead, the writeback-cache support with REQ_FLUSH and REQ_FUA, are being enabled depending on whether the host supports the CMD23 (MMC_CAP_CMD23) and whether an eMMC supports the reliable-write command.
This is wrong and it may also sound awkward. In fact, it's a remnant from when both eMMC/SD cards didn't have dedicated commands/support to control the internal writeback-cache. In other words, it was the best we could do at that point in time.
To fix the problem, but also without breaking backwards compatibility, let's align the REQ_FLUSH support with whether the writeback-cache became successfully enabled - for both eMMC and SD cards.
Cc: stable@kernel.org Fixes: 881d1c25f765 ("mmc: core: Add cache control for eMMC4.5 device") Fixes: 130206a615a9 ("mmc: core: Add support for cache ctrl for SD cards") Depends-on: 97fce126e279 ("mmc: block: Issue a cache flush only when it's enabled") Reviewed-by: Avri Altman Avri.Altman@wdc.com Signed-off-by: Michael Wu michael@allwinnertech.com Link: https://lore.kernel.org/r/20220331073223.106415-1-michael@allwinnertech.com [Ulf: Re-wrote the commit message] Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/core/block.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)
--- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -2376,6 +2376,8 @@ static struct mmc_blk_data *mmc_blk_allo struct mmc_blk_data *md; int devidx, ret; char cap_str[10]; + bool cache_enabled = false; + bool fua_enabled = false;
devidx = ida_simple_get(&mmc_blk_ida, 0, max_devices, GFP_KERNEL); if (devidx < 0) { @@ -2457,13 +2459,17 @@ static struct mmc_blk_data *mmc_blk_allo md->flags |= MMC_BLK_CMD23; }
- if (mmc_card_mmc(card) && - md->flags & MMC_BLK_CMD23 && + if (md->flags & MMC_BLK_CMD23 && ((card->ext_csd.rel_param & EXT_CSD_WR_REL_PARAM_EN) || card->ext_csd.rel_sectors)) { md->flags |= MMC_BLK_REL_WR; - blk_queue_write_cache(md->queue.queue, true, true); + fua_enabled = true; + cache_enabled = true; } + if (mmc_cache_enabled(card->host)) + cache_enabled = true; + + blk_queue_write_cache(md->queue.queue, cache_enabled, fua_enabled);
string_get_size((u64)size, 512, STRING_UNITS_2, cap_str, sizeof(cap_str));
From: Guo Xuenan guoxuenan@huawei.com
commit eafc0a02391b7b36617b36c97c4b5d6832cf5e24 upstream.
When partialDecoding, it is EOF if we've either filled the output buffer or can't proceed with reading an offset for following match.
In some extreme corner cases when compressed data is suitably corrupted, UAF will occur. As reported by KASAN [1], LZ4_decompress_safe_partial may lead to read out of bound problem during decoding. lz4 upstream has fixed it [2] and this issue has been disscussed here [3] before.
current decompression routine was ported from lz4 v1.8.3, bumping lib/lz4 to v1.9.+ is certainly a huge work to be done later, so, we'd better fix it first.
[1] https://lore.kernel.org/all/000000000000830d1205cf7f0477@google.com/ [2] https://github.com/lz4/lz4/commit/c5d6f8a8be3927c0bec91bcc58667a6cfad244ad# [3] https://lore.kernel.org/all/CC666AE8-4CA4-4951-B6FB-A2EFDE3AC03B@fb.com/
Link: https://lkml.kernel.org/r/20211111105048.2006070-1-guoxuenan@huawei.com Reported-by: syzbot+63d688f1d899c588fb71@syzkaller.appspotmail.com Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Nick Terrell terrelln@fb.com Acked-by: Gao Xiang hsiangkao@linux.alibaba.com Cc: Yann Collet cyan@fb.com Cc: Chengyang Fan cy.fan@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- lib/lz4/lz4_decompress.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/lib/lz4/lz4_decompress.c +++ b/lib/lz4/lz4_decompress.c @@ -271,8 +271,12 @@ static FORCE_INLINE int LZ4_decompress_g ip += length; op += length;
- /* Necessarily EOF, due to parsing restrictions */ - if (!partialDecoding || (cpy == oend)) + /* Necessarily EOF when !partialDecoding. + * When partialDecoding, it is EOF if we've either + * filled the output buffer or + * can't proceed with reading an offset for following match. + */ + if (!partialDecoding || (cpy == oend) || (ip >= (iend - 2))) break; } else { /* may overwrite up to WILDCOPYLENGTH beyond cpy */
From: Max Filippov jcmvbkbc@gmail.com
commit 66f133ceab7456c789f70a242991ed1b27ba1c3d upstream.
When CONFIG_DEBUG_KMAP_LOCAL is enabled __kmap_local_sched_{in,out} check that even slots in the tsk->kmap_ctrl.pteval are unmapped. The slots are initialized with 0 value, but the check is done with pte_none. 0 pte however does not necessarily mean that pte_none will return true. e.g. on xtensa it returns false, resulting in the following runtime warnings:
WARNING: CPU: 0 PID: 101 at mm/highmem.c:627 __kmap_local_sched_out+0x51/0x108 CPU: 0 PID: 101 Comm: touch Not tainted 5.17.0-rc7-00010-gd3a1cdde80d2-dirty #13 Call Trace: dump_stack+0xc/0x40 __warn+0x8f/0x174 warn_slowpath_fmt+0x48/0xac __kmap_local_sched_out+0x51/0x108 __schedule+0x71a/0x9c4 preempt_schedule_irq+0xa0/0xe0 common_exception_return+0x5c/0x93 do_wp_page+0x30e/0x330 handle_mm_fault+0xa70/0xc3c do_page_fault+0x1d8/0x3c4 common_exception+0x7f/0x7f
WARNING: CPU: 0 PID: 101 at mm/highmem.c:664 __kmap_local_sched_in+0x50/0xe0 CPU: 0 PID: 101 Comm: touch Tainted: G W 5.17.0-rc7-00010-gd3a1cdde80d2-dirty #13 Call Trace: dump_stack+0xc/0x40 __warn+0x8f/0x174 warn_slowpath_fmt+0x48/0xac __kmap_local_sched_in+0x50/0xe0 finish_task_switch$isra$0+0x1ce/0x2f8 __schedule+0x86e/0x9c4 preempt_schedule_irq+0xa0/0xe0 common_exception_return+0x5c/0x93 do_wp_page+0x30e/0x330 handle_mm_fault+0xa70/0xc3c do_page_fault+0x1d8/0x3c4 common_exception+0x7f/0x7f
Fix it by replacing !pte_none(pteval) with pte_val(pteval) != 0.
Link: https://lkml.kernel.org/r/20220403235159.3498065-1-jcmvbkbc@gmail.com Fixes: 5fbda3ecd14a ("sched: highmem: Store local kmaps in task struct") Signed-off-by: Max Filippov jcmvbkbc@gmail.com Reviewed-by: Thomas Gleixner tglx@linutronix.de Cc: "Peter Zijlstra (Intel)" peterz@infradead.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/highmem.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/highmem.c +++ b/mm/highmem.c @@ -627,7 +627,7 @@ void __kmap_local_sched_out(void)
/* With debug all even slots are unmapped and act as guard */ if (IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL) && !(i & 0x01)) { - WARN_ON_ONCE(!pte_none(pteval)); + WARN_ON_ONCE(pte_val(pteval) != 0); continue; } if (WARN_ON_ONCE(pte_none(pteval))) @@ -664,7 +664,7 @@ void __kmap_local_sched_in(void)
/* With debug all even slots are unmapped and act as guard */ if (IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL) && !(i & 0x01)) { - WARN_ON_ONCE(!pte_none(pteval)); + WARN_ON_ONCE(pte_val(pteval) != 0); continue; } if (WARN_ON_ONCE(pte_none(pteval)))
From: Paolo Bonzini pbonzini@redhat.com
commit 01e67e04c28170c47700c2c226d732bbfedb1ad0 upstream.
If an mremap() syscall with old_size=0 ends up in move_page_tables(), it will call invalidate_range_start()/invalidate_range_end() unnecessarily, i.e. with an empty range.
This causes a WARN in KVM's mmu_notifier. In the past, empty ranges have been diagnosed to be off-by-one bugs, hence the WARNing. Given the low (so far) number of unique reports, the benefits of detecting more buggy callers seem to outweigh the cost of having to fix cases such as this one, where userspace is doing something silly. In this particular case, an early return from move_page_tables() is enough to fix the issue.
Link: https://lkml.kernel.org/r/20220329173155.172439-1-pbonzini@redhat.com Reported-by: syzbot+6bde52d89cfdf9f61425@syzkaller.appspotmail.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Cc: Sean Christopherson seanjc@google.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/mremap.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/mm/mremap.c +++ b/mm/mremap.c @@ -486,6 +486,9 @@ unsigned long move_page_tables(struct vm pmd_t *old_pmd, *new_pmd; pud_t *old_pud, *new_pud;
+ if (!len) + return 0; + old_end = old_addr + len; flush_cache_range(vma, old_addr, old_end);
From: Miaohe Lin linmiaohe@huawei.com
commit 4ad099559b00ac01c3726e5c95dc3108ef47d03e upstream.
If mpol_new is allocated but not used in restart loop, mpol_new will be freed via mpol_put before returning to the caller. But refcnt is not initialized yet, so mpol_put could not do the right things and might leak the unused mpol_new. This would happen if mempolicy was updated on the shared shmem file while the sp->lock has been dropped during the memory allocation.
This issue could be triggered easily with the below code snippet if there are many processes doing the below work at the same time:
shmid = shmget((key_t)5566, 1024 * PAGE_SIZE, 0666|IPC_CREAT); shm = shmat(shmid, 0, 0); loop many times { mbind(shm, 1024 * PAGE_SIZE, MPOL_LOCAL, mask, maxnode, 0); mbind(shm + 128 * PAGE_SIZE, 128 * PAGE_SIZE, MPOL_DEFAULT, mask, maxnode, 0); }
Link: https://lkml.kernel.org/r/20220329111416.27954-1-linmiaohe@huawei.com Fixes: 42288fe366c4 ("mm: mempolicy: Convert shared_policy mutex to spinlock") Signed-off-by: Miaohe Lin linmiaohe@huawei.com Acked-by: Michal Hocko mhocko@suse.com Cc: KOSAKI Motohiro kosaki.motohiro@jp.fujitsu.com Cc: Mel Gorman mgorman@suse.de Cc: stable@vger.kernel.org [3.8] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/mempolicy.c | 1 + 1 file changed, 1 insertion(+)
--- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -2561,6 +2561,7 @@ alloc_new: mpol_new = kmem_cache_alloc(policy_cache, GFP_KERNEL); if (!mpol_new) goto err_out; + atomic_set(&mpol_new->refcnt, 1); goto restart; }
From: Jens Axboe axboe@kernel.dk
commit ec858afda857e361182ceafc3d2ba2b164b8e889 upstream.
This is a leftover from the really old days where we weren't able to track and error early if we need a file and it wasn't assigned. Kill the check.
Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/io_uring.c | 3 --- 1 file changed, 3 deletions(-)
--- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -4128,9 +4128,6 @@ static int io_fsync_prep(struct io_kiocb { struct io_ring_ctx *ctx = req->ctx;
- if (!req->file) - return -EBADF; - if (unlikely(ctx->flags & IORING_SETUP_IOPOLL)) return -EINVAL; if (unlikely(sqe->addr || sqe->ioprio || sqe->buf_index ||
From: Jens Axboe axboe@kernel.dk
commit a3e4bc23d5470b2beb7cc42a86b6a3e75b704c15 upstream.
In preparation for not using the file at prep time, defer checking if this file refers to a valid io_uring instance until issue time.
This also means we can get rid of the cleanup flag for splice and tee.
Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/io_uring.c | 49 +++++++++++++++++++++---------------------------- 1 file changed, 21 insertions(+), 28 deletions(-)
--- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -623,10 +623,10 @@ struct io_epoll {
struct io_splice { struct file *file_out; - struct file *file_in; loff_t off_out; loff_t off_in; u64 len; + int splice_fd_in; unsigned int flags; };
@@ -1452,14 +1452,6 @@ static void io_prep_async_work(struct io if (def->unbound_nonreg_file) req->work.flags |= IO_WQ_WORK_UNBOUND; } - - switch (req->opcode) { - case IORING_OP_SPLICE: - case IORING_OP_TEE: - if (!S_ISREG(file_inode(req->splice.file_in)->i_mode)) - req->work.flags |= IO_WQ_WORK_UNBOUND; - break; - } }
static void io_prep_async_link(struct io_kiocb *req) @@ -4027,18 +4019,11 @@ static int __io_splice_prep(struct io_ki if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL)) return -EINVAL;
- sp->file_in = NULL; sp->len = READ_ONCE(sqe->len); sp->flags = READ_ONCE(sqe->splice_flags); - if (unlikely(sp->flags & ~valid_flags)) return -EINVAL; - - sp->file_in = io_file_get(req->ctx, req, READ_ONCE(sqe->splice_fd_in), - (sp->flags & SPLICE_F_FD_IN_FIXED)); - if (!sp->file_in) - return -EBADF; - req->flags |= REQ_F_NEED_CLEANUP; + sp->splice_fd_in = READ_ONCE(sqe->splice_fd_in); return 0; }
@@ -4053,20 +4038,27 @@ static int io_tee_prep(struct io_kiocb * static int io_tee(struct io_kiocb *req, unsigned int issue_flags) { struct io_splice *sp = &req->splice; - struct file *in = sp->file_in; struct file *out = sp->file_out; unsigned int flags = sp->flags & ~SPLICE_F_FD_IN_FIXED; + struct file *in; long ret = 0;
if (issue_flags & IO_URING_F_NONBLOCK) return -EAGAIN; + + in = io_file_get(req->ctx, req, sp->splice_fd_in, + (sp->flags & SPLICE_F_FD_IN_FIXED)); + if (!in) { + ret = -EBADF; + goto done; + } + if (sp->len) ret = do_tee(in, out, sp->len, flags);
if (!(sp->flags & SPLICE_F_FD_IN_FIXED)) io_put_file(in); - req->flags &= ~REQ_F_NEED_CLEANUP; - +done: if (ret != sp->len) req_set_fail(req); io_req_complete(req, ret); @@ -4085,15 +4077,22 @@ static int io_splice_prep(struct io_kioc static int io_splice(struct io_kiocb *req, unsigned int issue_flags) { struct io_splice *sp = &req->splice; - struct file *in = sp->file_in; struct file *out = sp->file_out; unsigned int flags = sp->flags & ~SPLICE_F_FD_IN_FIXED; loff_t *poff_in, *poff_out; + struct file *in; long ret = 0;
if (issue_flags & IO_URING_F_NONBLOCK) return -EAGAIN;
+ in = io_file_get(req->ctx, req, sp->splice_fd_in, + (sp->flags & SPLICE_F_FD_IN_FIXED)); + if (!in) { + ret = -EBADF; + goto done; + } + poff_in = (sp->off_in == -1) ? NULL : &sp->off_in; poff_out = (sp->off_out == -1) ? NULL : &sp->off_out;
@@ -4102,8 +4101,7 @@ static int io_splice(struct io_kiocb *re
if (!(sp->flags & SPLICE_F_FD_IN_FIXED)) io_put_file(in); - req->flags &= ~REQ_F_NEED_CLEANUP; - +done: if (ret != sp->len) req_set_fail(req); io_req_complete(req, ret); @@ -6649,11 +6647,6 @@ static void io_clean_op(struct io_kiocb kfree(io->free_iov); break; } - case IORING_OP_SPLICE: - case IORING_OP_TEE: - if (!(req->splice.flags & SPLICE_F_FD_IN_FIXED)) - io_put_file(req->splice.file_in); - break; case IORING_OP_OPENAT: case IORING_OP_OPENAT2: if (req->open.filename)
From: Eugene Syromiatnikov esyr@redhat.com
commit 0f5e4b83b37a96e3643951588ed7176b9b187c0a upstream.
Similarly to the way it is done im mbind syscall.
Cc: stable@vger.kernel.org # 5.14 Fixes: fe76421d1da1dcdb ("io_uring: allow user configurable IO thread CPU affinity") Signed-off-by: Eugene Syromiatnikov esyr@redhat.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/io_uring.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
--- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -10683,7 +10683,15 @@ static int io_register_iowq_aff(struct i if (len > cpumask_size()) len = cpumask_size();
- if (copy_from_user(new_mask, arg, len)) { + if (in_compat_syscall()) { + ret = compat_get_bitmap(cpumask_bits(new_mask), + (const compat_ulong_t __user *)arg, + len * 8 /* CHAR_BIT */); + } else { + ret = copy_from_user(new_mask, arg, len); + } + + if (ret) { free_cpumask_var(new_mask); return -EFAULT; }
From: Jens Axboe axboe@kernel.dk
commit e677edbcabee849bfdd43f1602bccbecf736a646 upstream.
io_flush_timeouts() assumes the timeout isn't in progress of triggering or being removed/canceled, so it unconditionally removes it from the timeout list and attempts to cancel it.
Leave it on the list and let the normal timeout cancelation take care of it.
Cc: stable@vger.kernel.org # 5.5+ Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/io_uring.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
--- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -1538,12 +1538,11 @@ static void io_flush_timeouts(struct io_ __must_hold(&ctx->completion_lock) { u32 seq = ctx->cached_cq_tail - atomic_read(&ctx->cq_timeouts); + struct io_kiocb *req, *tmp;
spin_lock_irq(&ctx->timeout_lock); - while (!list_empty(&ctx->timeout_list)) { + list_for_each_entry_safe(req, tmp, &ctx->timeout_list, timeout.list) { u32 events_needed, events_got; - struct io_kiocb *req = list_first_entry(&ctx->timeout_list, - struct io_kiocb, timeout.list);
if (io_is_timeout_noseq(req)) break; @@ -1560,7 +1559,6 @@ static void io_flush_timeouts(struct io_ if (events_got < events_needed) break;
- list_del_init(&req->timeout.list); io_kill_timeout(req, 0); } ctx->cq_last_tm_flush = seq; @@ -6205,6 +6203,7 @@ static int io_timeout_prep(struct io_kio if (get_timespec64(&data->ts, u64_to_user_ptr(sqe->addr))) return -EFAULT;
+ INIT_LIST_HEAD(&req->timeout.list); data->mode = io_translate_timeout_mode(flags); hrtimer_init(&data->timer, io_timeout_get_clock(data), data->mode);
From: Pawan Gupta pawan.kumar.gupta@linux.intel.com
commit 73924ec4d560257004d5b5116b22a3647661e364 upstream.
The mechanism to save/restore MSRs during S3 suspend/resume checks for the MSR validity during suspend, and only restores the MSR if its a valid MSR. This is not optimal, as an invalid MSR will unnecessarily throw an exception for every suspend cycle. The more invalid MSRs, higher the impact will be.
Check and save the MSR validity at setup. This ensures that only valid MSRs that are guaranteed to not throw an exception will be attempted during suspend.
Fixes: 7a9c2dd08ead ("x86/pm: Introduce quirk framework to save/restore extra MSR registers around suspend/resume") Suggested-by: Dave Hansen dave.hansen@linux.intel.com Signed-off-by: Pawan Gupta pawan.kumar.gupta@linux.intel.com Reviewed-by: Dave Hansen dave.hansen@linux.intel.com Acked-by: Borislav Petkov bp@suse.de Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/power/cpu.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/arch/x86/power/cpu.c +++ b/arch/x86/power/cpu.c @@ -40,7 +40,8 @@ static void msr_save_context(struct save struct saved_msr *end = msr + ctxt->saved_msrs.num;
while (msr < end) { - msr->valid = !rdmsrl_safe(msr->info.msr_no, &msr->info.reg.q); + if (msr->valid) + rdmsrl(msr->info.msr_no, msr->info.reg.q); msr++; } } @@ -424,8 +425,10 @@ static int msr_build_context(const u32 * }
for (i = saved_msrs->num, j = 0; i < total_num; i++, j++) { + u64 dummy; + msr_array[i].info.msr_no = msr_id[j]; - msr_array[i].valid = false; + msr_array[i].valid = !rdmsrl_safe(msr_id[j], &dummy); msr_array[i].info.reg.q = 0; } saved_msrs->num = total_num;
From: Pawan Gupta pawan.kumar.gupta@linux.intel.com
commit e2a1256b17b16f9b9adf1b6fea56819e7b68e463 upstream.
After resuming from suspend-to-RAM, the MSRs that control CPU's speculative execution behavior are not being restored on the boot CPU.
These MSRs are used to mitigate speculative execution vulnerabilities. Not restoring them correctly may leave the CPU vulnerable. Secondary CPU's MSRs are correctly being restored at S3 resume by identify_secondary_cpu().
During S3 resume, restore these MSRs for boot CPU when restoring its processor state.
Fixes: 772439717dbf ("x86/bugs/intel: Set proper CPU features and setup RDS") Reported-by: Neelima Krishnan neelima.krishnan@intel.com Signed-off-by: Pawan Gupta pawan.kumar.gupta@linux.intel.com Tested-by: Neelima Krishnan neelima.krishnan@intel.com Acked-by: Borislav Petkov bp@suse.de Reviewed-by: Dave Hansen dave.hansen@linux.intel.com Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/power/cpu.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
--- a/arch/x86/power/cpu.c +++ b/arch/x86/power/cpu.c @@ -503,10 +503,24 @@ static int pm_cpu_check(const struct x86 return ret; }
+static void pm_save_spec_msr(void) +{ + u32 spec_msr_id[] = { + MSR_IA32_SPEC_CTRL, + MSR_IA32_TSX_CTRL, + MSR_TSX_FORCE_ABORT, + MSR_IA32_MCU_OPT_CTRL, + MSR_AMD64_LS_CFG, + }; + + msr_build_context(spec_msr_id, ARRAY_SIZE(spec_msr_id)); +} + static int pm_check_save_msr(void) { dmi_check_system(msr_save_dmi_table); pm_cpu_check(msr_save_cpu_table); + pm_save_spec_msr();
return 0; }
From: Kan Liang kan.liang@linux.intel.com
commit e590928de7547454469693da9bc7ffd562e54b7e upstream.
On Sapphire Rapids, the FRONTEND_RETIRED.MS_FLOWS event requires the FRONTEND MSR value 0x8. However, the current FRONTEND MSR mask doesn't support it.
Update intel_spr_extra_regs[] to support it.
Fixes: 61b985e3e775 ("perf/x86/intel: Add perf core PMU support for Sapphire Rapids") Signed-off-by: Kan Liang kan.liang@linux.intel.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/1648482543-14923-2-git-send-email-kan.liang@linux.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/events/intel/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -281,7 +281,7 @@ static struct extra_reg intel_spr_extra_ INTEL_UEVENT_EXTRA_REG(0x012a, MSR_OFFCORE_RSP_0, 0x3fffffffffull, RSP_0), INTEL_UEVENT_EXTRA_REG(0x012b, MSR_OFFCORE_RSP_1, 0x3fffffffffull, RSP_1), INTEL_UEVENT_PEBS_LDLAT_EXTRA_REG(0x01cd), - INTEL_UEVENT_EXTRA_REG(0x01c6, MSR_PEBS_FRONTEND, 0x7fff17, FE), + INTEL_UEVENT_EXTRA_REG(0x01c6, MSR_PEBS_FRONTEND, 0x7fff1f, FE), INTEL_UEVENT_EXTRA_REG(0x40ad, MSR_PEBS_FRONTEND, 0x7, FE), INTEL_UEVENT_EXTRA_REG(0x04c2, MSR_PEBS_FRONTEND, 0x8, FE), EVENT_EXTRA_END
From: Ethan Lien ethanlien@synology.com
commit b642b52d0b50f4d398cb4293f64992d0eed2e2ce upstream.
We use extent_changeset->bytes_changed in qgroup_reserve_data() to record how many bytes we set for EXTENT_QGROUP_RESERVED state. Currently the bytes_changed is set as "unsigned int", and it will overflow if we try to fallocate a range larger than 4GiB. The result is we reserve less bytes and eventually break the qgroup limit.
Unlike regular buffered/direct write, which we use one changeset for each ordered extent, which can never be larger than 256M. For fallocate, we use one changeset for the whole range, thus it no longer respects the 256M per extent limit, and caused the problem.
The following example test script reproduces the problem:
$ cat qgroup-overflow.sh #!/bin/bash
DEV=/dev/sdj MNT=/mnt/sdj
mkfs.btrfs -f $DEV mount $DEV $MNT
# Set qgroup limit to 2GiB. btrfs quota enable $MNT btrfs qgroup limit 2G $MNT
# Try to fallocate a 3GiB file. This should fail. echo echo "Try to fallocate a 3GiB file..." fallocate -l 3G $MNT/3G.file
# Try to fallocate a 5GiB file. echo echo "Try to fallocate a 5GiB file..." fallocate -l 5G $MNT/5G.file
# See we break the qgroup limit. echo sync btrfs qgroup show -r $MNT
umount $MNT
When running the test:
$ ./qgroup-overflow.sh (...)
Try to fallocate a 3GiB file... fallocate: fallocate failed: Disk quota exceeded
Try to fallocate a 5GiB file...
qgroupid rfer excl max_rfer -------- ---- ---- -------- 0/5 5.00GiB 5.00GiB 2.00GiB
Since we have no control of how bytes_changed is used, it's better to set it to u64.
CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Ethan Lien ethanlien@synology.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/extent_io.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -117,7 +117,7 @@ struct btrfs_bio_ctrl { */ struct extent_changeset { /* How many bytes are set/cleared in this operation */ - unsigned int bytes_changed; + u64 bytes_changed;
/* Changed ranges */ struct ulist range_changed;
From: Kaiwen Hu kevinhu@synology.com
commit 60021bd754c6ca0addc6817994f20290a321d8d6 upstream.
A subvolume with an active swapfile must not be deleted otherwise it would not be possible to deactivate it.
After the subvolume is deleted, we cannot swapoff the swapfile in this deleted subvolume because the path is unreachable. The swapfile is still active and holding references, the filesystem cannot be unmounted.
The test looks like this:
mkfs.btrfs -f $dev > /dev/null mount $dev $mnt
btrfs sub create $mnt/subvol touch $mnt/subvol/swapfile chmod 600 $mnt/subvol/swapfile chattr +C $mnt/subvol/swapfile dd if=/dev/zero of=$mnt/subvol/swapfile bs=1K count=4096 mkswap $mnt/subvol/swapfile swapon $mnt/subvol/swapfile
btrfs sub delete $mnt/subvol swapoff $mnt/subvol/swapfile # failed: No such file or directory swapoff --all
unmount $mnt # target is busy.
To prevent above issue, we simply check that whether the subvolume contains any active swapfile, and stop the deleting process. This behavior is like snapshot ioctl dealing with a swapfile.
CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Robbie Ko robbieko@synology.com Reviewed-by: Qu Wenruo wqu@suse.com Reviewed-by: Filipe Manana fdmanana@suse.com Signed-off-by: Kaiwen Hu kevinhu@synology.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/inode.c | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-)
--- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4450,6 +4450,13 @@ int btrfs_delete_subvolume(struct inode dest->root_key.objectid); return -EPERM; } + if (atomic_read(&dest->nr_swapfiles)) { + spin_unlock(&dest->root_item_lock); + btrfs_warn(fs_info, + "attempt to delete subvolume %llu with active swapfile", + root->root_key.objectid); + return -EPERM; + } root_flags = btrfs_root_flags(&dest->root_item); btrfs_set_root_flags(&dest->root_item, root_flags | BTRFS_ROOT_SUBVOL_DEAD); @@ -10721,8 +10728,23 @@ static int btrfs_swap_activate(struct sw * set. We use this counter to prevent snapshots. We must increment it * before walking the extents because we don't want a concurrent * snapshot to run after we've already checked the extents. - */ + * + * It is possible that subvolume is marked for deletion but still not + * removed yet. To prevent this race, we check the root status before + * activating the swapfile. + */ + spin_lock(&root->root_item_lock); + if (btrfs_root_dead(root)) { + spin_unlock(&root->root_item_lock); + + btrfs_exclop_finish(fs_info); + btrfs_warn(fs_info, + "cannot activate swapfile because subvolume %llu is being deleted", + root->root_key.objectid); + return -EPERM; + } atomic_inc(&root->nr_swapfiles); + spin_unlock(&root->root_item_lock);
isize = ALIGN_DOWN(inode->i_size, fs_info->sectorsize);
From: Vinod Koul vkoul@kernel.org
commit 409543cec01a84610029d6440c480c3fdd7214fb upstream.
Commit b470e10eb43f ("spi: core: add dma_map_dev for dma device") added dma_map_dev for _spi_map_msg() but missed to add for unmap routine, __spi_unmap_msg(), so add it now.
Fixes: b470e10eb43f ("spi: core: add dma_map_dev for dma device") Cc: stable@vger.kernel.org # v5.14+ Signed-off-by: Vinod Koul vkoul@kernel.org Link: https://lore.kernel.org/r/20220406132238.1029249-1-vkoul@kernel.org Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/spi/spi.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/spi/spi.c +++ b/drivers/spi/spi.c @@ -1072,11 +1072,15 @@ static int __spi_unmap_msg(struct spi_co
if (ctlr->dma_tx) tx_dev = ctlr->dma_tx->device->dev; + else if (ctlr->dma_map_dev) + tx_dev = ctlr->dma_map_dev; else tx_dev = ctlr->dev.parent;
if (ctlr->dma_rx) rx_dev = ctlr->dma_rx->device->dev; + else if (ctlr->dma_map_dev) + rx_dev = ctlr->dma_map_dev; else rx_dev = ctlr->dev.parent;
From: Guo Ren guoren@linux.alibaba.com
commit 31a099dbd91e69fcab55eef4be15ed7a8c984918 upstream.
These patch_text implementations are using stop_machine_cpuslocked infrastructure with atomic cpu_count. The original idea: When the master CPU patch_text, the others should wait for it. But current implementation is using the first CPU as master, which couldn't guarantee the remaining CPUs are waiting. This patch changes the last CPU as the master to solve the potential risk.
Fixes: ae16480785de ("arm64: introduce interfaces to hotpatch kernel and module code") Signed-off-by: Guo Ren guoren@linux.alibaba.com Signed-off-by: Guo Ren guoren@kernel.org Reviewed-by: Catalin Marinas catalin.marinas@arm.com Reviewed-by: Masami Hiramatsu mhiramat@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220407073323.743224-2-guoren@kernel.org Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/patching.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm64/kernel/patching.c +++ b/arch/arm64/kernel/patching.c @@ -117,8 +117,8 @@ static int __kprobes aarch64_insn_patch_ int i, ret = 0; struct aarch64_insn_patch *pp = arg;
- /* The first CPU becomes master */ - if (atomic_inc_return(&pp->cpu_count) == 1) { + /* The last CPU becomes master */ + if (atomic_inc_return(&pp->cpu_count) == num_online_cpus()) { for (i = 0; ret == 0 && i < pp->insn_cnt; i++) ret = aarch64_insn_patch_text_nosync(pp->text_addrs[i], pp->new_insns[i]);
From: Douglas Miller doug.miller@cornelisnetworks.com
commit 2bbac98d0930e8161b1957dc0ec99de39ade1b3c upstream.
Under certain conditions, such as MPI_Abort, the hfi1 cleanup code may represent the last reference held on the task mm. hfi1_mmu_rb_unregister() then drops the last reference and the mm is freed before the final use in hfi1_release_user_pages(). A new task may allocate the mm structure while it is still being used, resulting in problems. One manifestation is corruption of the mmap_sem counter leading to a hang in down_write(). Another is corruption of an mm struct that is in use by another task.
Fixes: 3d2a9d642512 ("IB/hfi1: Ensure correct mm is used at all times") Link: https://lore.kernel.org/r/20220408133523.122165.72975.stgit@awfm-01.cornelis... Cc: stable@vger.kernel.org Signed-off-by: Douglas Miller doug.miller@cornelisnetworks.com Signed-off-by: Dennis Dalessandro dennis.dalessandro@cornelisnetworks.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/infiniband/hw/hfi1/mmu_rb.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/infiniband/hw/hfi1/mmu_rb.c +++ b/drivers/infiniband/hw/hfi1/mmu_rb.c @@ -80,6 +80,9 @@ void hfi1_mmu_rb_unregister(struct mmu_r unsigned long flags; struct list_head del_list;
+ /* Prevent freeing of mm until we are completely finished. */ + mmgrab(handler->mn.mm); + /* Unregister first so we don't get any more notifications. */ mmu_notifier_unregister(&handler->mn, handler->mn.mm);
@@ -102,6 +105,9 @@ void hfi1_mmu_rb_unregister(struct mmu_r
do_remove(handler, &del_list);
+ /* Now the mm may be freed. */ + mmdrop(handler->mn.mm); + kfree(handler); }
From: Shreeya Patel shreeya.patel@collabora.com
commit 5467801f1fcbdc46bc7298a84dbf3ca1ff2a7320 upstream.
GPIO chip irq members are exposed before they could be completely initialized and this leads to race conditions.
One such issue was observed for the gc->irq.domain variable which was accessed through the I2C interface in gpiochip_to_irq() before it could be initialized by gpiochip_add_irqchip(). This resulted in Kernel NULL pointer dereference.
Following are the logs for reference :-
kernel: Call Trace: kernel: gpiod_to_irq+0x53/0x70 kernel: acpi_dev_gpio_irq_get_by+0x113/0x1f0 kernel: i2c_acpi_get_irq+0xc0/0xd0 kernel: i2c_device_probe+0x28a/0x2a0 kernel: really_probe+0xf2/0x460 kernel: RIP: 0010:gpiochip_to_irq+0x47/0xc0
To avoid such scenarios, restrict usage of GPIO chip irq members before they are completely initialized.
Signed-off-by: Shreeya Patel shreeya.patel@collabora.com Cc: stable@vger.kernel.org Reviewed-by: Andy Shevchenko andy.shevchenko@gmail.com Reviewed-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Bartosz Golaszewski brgl@bgdev.pl Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpio/gpiolib.c | 19 +++++++++++++++++++ include/linux/gpio/driver.h | 9 +++++++++ 2 files changed, 28 insertions(+)
--- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -1368,6 +1368,16 @@ static int gpiochip_to_irq(struct gpio_c { struct irq_domain *domain = gc->irq.domain;
+#ifdef CONFIG_GPIOLIB_IRQCHIP + /* + * Avoid race condition with other code, which tries to lookup + * an IRQ before the irqchip has been properly registered, + * i.e. while gpiochip is still being brought up. + */ + if (!gc->irq.initialized) + return -EPROBE_DEFER; +#endif + if (!gpiochip_irqchip_irq_valid(gc, offset)) return -ENXIO;
@@ -1552,6 +1562,15 @@ static int gpiochip_add_irqchip(struct g
acpi_gpiochip_request_interrupts(gc);
+ /* + * Using barrier() here to prevent compiler from reordering + * gc->irq.initialized before initialization of above + * GPIO chip irq members. + */ + barrier(); + + gc->irq.initialized = true; + return 0; }
--- a/include/linux/gpio/driver.h +++ b/include/linux/gpio/driver.h @@ -225,6 +225,15 @@ struct gpio_irq_chip { unsigned int ngpios);
/** + * @initialized: + * + * Flag to track GPIO chip irq member's initialization. + * This flag will make sure GPIO chip irq members are not used + * before they are initialized. + */ + bool initialized; + + /** * @valid_mask: * * If not %NULL, holds bitmask of GPIOs which are valid to be included
From: Reto Buerki reet@codelabs.ch
commit 59b18a1e65b7a2134814106d0860010e10babe18 upstream.
The x86 MSI message data is 32 bits in total and is either in compatibility or remappable format, see Intel Virtualization Technology for Directed I/O, section 5.1.2.
Fixes: 6285aa50736 ("x86/msi: Provide msi message shadow structs") Co-developed-by: Adrian-Ken Rueegsegger ken@codelabs.ch Signed-off-by: Adrian-Ken Rueegsegger ken@codelabs.ch Signed-off-by: Reto Buerki reet@codelabs.ch Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220407110647.67372-1-reet@codelabs.ch Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/msi.h | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-)
--- a/arch/x86/include/asm/msi.h +++ b/arch/x86/include/asm/msi.h @@ -12,14 +12,17 @@ int pci_msi_prepare(struct irq_domain *d /* Structs and defines for the X86 specific MSI message format */
typedef struct x86_msi_data { - u32 vector : 8, - delivery_mode : 3, - dest_mode_logical : 1, - reserved : 2, - active_low : 1, - is_level : 1; - - u32 dmar_subhandle; + union { + struct { + u32 vector : 8, + delivery_mode : 3, + dest_mode_logical : 1, + reserved : 2, + active_low : 1, + is_level : 1; + }; + u32 dmar_subhandle; + }; } __attribute__ ((packed)) arch_msi_msg_data_t; #define arch_msi_msg_data x86_msi_data
From: Dave Hansen dave.hansen@linux.intel.com
commit d39268ad24c0fd0665d0c5cf55a7c1a0ebf94766 upstream.
0day reported a regression on a microbenchmark which is intended to stress the TLB flushing path:
https://lore.kernel.org/all/20220317090415.GE735@xsang-OptiPlex-9020/
It pointed at a commit from Nadav which intended to remove retpoline overhead in the TLB flushing path by taking the 'cond'-ition in on_each_cpu_cond_mask(), pre-calculating it, and incorporating it into 'cpumask'. That allowed the code to use a bunch of earlier direct calls instead of later indirect calls that need a retpoline.
But, in practice, threads can go idle (and into lazy TLB mode where they don't need to flush their TLB) between the early and late calls. It works in this direction and not in the other because TLB-flushing threads tend to hold mmap_lock for write. Contention on that lock causes threads to _go_ idle right in this early/late window.
There was not any performance data in the original commit specific to the retpoline overhead. I did a few tests on a system with retpolines:
https://lore.kernel.org/all/dd8be93c-ded6-b962-50d4-96b1c3afb2b7@intel.com/
which showed a possible small win. But, that small win pales in comparison with the bigger loss induced on non-retpoline systems.
Revert the patch that removed the retpolines. This was not a clean revert, but it was self-contained enough not to be too painful.
Fixes: 6035152d8eeb ("x86/mm/tlb: Open-code on_each_cpu_cond_mask() for tlb_is_not_lazy()") Reported-by: kernel test robot oliver.sang@intel.com Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Signed-off-by: Borislav Petkov bp@suse.de Acked-by: Nadav Amit namit@vmware.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/164874672286.389.7021457716635788197.tip-bot2@tip-... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/mm/tlb.c | 37 +++++-------------------------------- 1 file changed, 5 insertions(+), 32 deletions(-)
--- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -854,13 +854,11 @@ done: nr_invalidate); }
-static bool tlb_is_not_lazy(int cpu) +static bool tlb_is_not_lazy(int cpu, void *data) { return !per_cpu(cpu_tlbstate_shared.is_lazy, cpu); }
-static DEFINE_PER_CPU(cpumask_t, flush_tlb_mask); - DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state_shared, cpu_tlbstate_shared); EXPORT_PER_CPU_SYMBOL(cpu_tlbstate_shared);
@@ -889,36 +887,11 @@ STATIC_NOPV void native_flush_tlb_multi( * up on the new contents of what used to be page tables, while * doing a speculative memory access. */ - if (info->freed_tables) { + if (info->freed_tables) on_each_cpu_mask(cpumask, flush_tlb_func, (void *)info, true); - } else { - /* - * Although we could have used on_each_cpu_cond_mask(), - * open-coding it has performance advantages, as it eliminates - * the need for indirect calls or retpolines. In addition, it - * allows to use a designated cpumask for evaluating the - * condition, instead of allocating one. - * - * This code works under the assumption that there are no nested - * TLB flushes, an assumption that is already made in - * flush_tlb_mm_range(). - * - * cond_cpumask is logically a stack-local variable, but it is - * more efficient to have it off the stack and not to allocate - * it on demand. Preemption is disabled and this code is - * non-reentrant. - */ - struct cpumask *cond_cpumask = this_cpu_ptr(&flush_tlb_mask); - int cpu; - - cpumask_clear(cond_cpumask); - - for_each_cpu(cpu, cpumask) { - if (tlb_is_not_lazy(cpu)) - __cpumask_set_cpu(cpu, cond_cpumask); - } - on_each_cpu_mask(cond_cpumask, flush_tlb_func, (void *)info, true); - } + else + on_each_cpu_cond_mask(tlb_is_not_lazy, flush_tlb_func, + (void *)info, 1, cpumask); }
void flush_tlb_multi(const struct cpumask *cpumask,
From: Kan Liang kan.liang@linux.intel.com
commit 4a263bf331c512849062805ef1b4ac40301a9829 upstream.
The INST_RETIRED.PREC_DIST event (0x0100) doesn't count on SPR. perf stat -e cpu/event=0xc0,umask=0x0/,cpu/event=0x0,umask=0x1/ -C0
Performance counter stats for 'CPU(s) 0':
607,246 cpu/event=0xc0,umask=0x0/ 0 cpu/event=0x0,umask=0x1/
The encoding for INST_RETIRED.PREC_DIST is pseudo-encoding, which doesn't work on the generic counters. However, current perf extends its mask to the generic counters.
The pseudo event-code for a fixed counter must be 0x00. Check and avoid extending the mask for the fixed counter event which using the pseudo-encoding, e.g., ref-cycles and PREC_DIST event.
With the patch, perf stat -e cpu/event=0xc0,umask=0x0/,cpu/event=0x0,umask=0x1/ -C0
Performance counter stats for 'CPU(s) 0':
583,184 cpu/event=0xc0,umask=0x0/ 583,048 cpu/event=0x0,umask=0x1/
Fixes: 2de71ee153ef ("perf/x86/intel: Fix ICL/SPR INST_RETIRED.PREC_DIST encodings") Signed-off-by: Kan Liang kan.liang@linux.intel.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1648482543-14923-1-git-send-email-kan.liang@linux.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/events/intel/core.c | 6 +++++- arch/x86/include/asm/perf_event.h | 5 +++++ 2 files changed, 10 insertions(+), 1 deletion(-)
--- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -5466,7 +5466,11 @@ static void intel_pmu_check_event_constr /* Disabled fixed counters which are not in CPUID */ c->idxmsk64 &= intel_ctrl;
- if (c->idxmsk64 != INTEL_PMC_MSK_FIXED_REF_CYCLES) + /* + * Don't extend the pseudo-encoding to the + * generic counters + */ + if (!use_fixed_pseudo_encoding(c->code)) c->idxmsk64 |= (1ULL << num_counters) - 1; } c->idxmsk64 &= --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -241,6 +241,11 @@ struct x86_pmu_capability { #define INTEL_PMC_IDX_FIXED_SLOTS (INTEL_PMC_IDX_FIXED + 3) #define INTEL_PMC_MSK_FIXED_SLOTS (1ULL << INTEL_PMC_IDX_FIXED_SLOTS)
+static inline bool use_fixed_pseudo_encoding(u64 code) +{ + return !(code & 0xff); +} + /* * We model BTS tracing as another fixed-mode PMC. *
From: Christian Lamparter chunkeey@gmail.com
commit 7aa8104a554713b685db729e66511b93d989dd6a upstream.
the driver uses libata's "tag" values from in various arrays. Since the mentioned patch bumped the ATA_TAG_INTERNAL to 32, the value of the SATA_DWC_QCMD_MAX needs to account for that.
Otherwise ATA_TAG_INTERNAL usage cause similar crashes like this as reported by Tice Rex on the OpenWrt Forum and reproduced (with symbols) here:
| BUG: Kernel NULL pointer dereference at 0x00000000 | Faulting instruction address: 0xc03ed4b8 | Oops: Kernel access of bad area, sig: 11 [#1] | BE PAGE_SIZE=4K PowerPC 44x Platform | CPU: 0 PID: 362 Comm: scsi_eh_1 Not tainted 5.4.163 #0 | NIP: c03ed4b8 LR: c03d27e8 CTR: c03ed36c | REGS: cfa59950 TRAP: 0300 Not tainted (5.4.163) | MSR: 00021000 <CE,ME> CR: 42000222 XER: 00000000 | DEAR: 00000000 ESR: 00000000 | GPR00: c03d27e8 cfa59a08 cfa55fe0 00000000 0fa46bc0 [...] | [..] | NIP [c03ed4b8] sata_dwc_qc_issue+0x14c/0x254 | LR [c03d27e8] ata_qc_issue+0x1c8/0x2dc | Call Trace: | [cfa59a08] [c003f4e0] __cancel_work_timer+0x124/0x194 (unreliable) | [cfa59a78] [c03d27e8] ata_qc_issue+0x1c8/0x2dc | [cfa59a98] [c03d2b3c] ata_exec_internal_sg+0x240/0x524 | [cfa59b08] [c03d2e98] ata_exec_internal+0x78/0xe0 | [cfa59b58] [c03d30fc] ata_read_log_page.part.38+0x1dc/0x204 | [cfa59bc8] [c03d324c] ata_identify_page_supported+0x68/0x130 | [...]
This is because sata_dwc_dma_xfer_complete() NULLs the dma_pending's next neighbour "chan" (a *dma_chan struct) in this '32' case right here (line ~735):
hsdevp->dma_pending[tag] = SATA_DWC_DMA_PENDING_NONE;
Then the next time, a dma gets issued; dma_dwc_xfer_setup() passes the NULL'd hsdevp->chan to the dmaengine_slave_config() which then causes the crash.
With this patch, SATA_DWC_QCMD_MAX is now set to ATA_MAX_QUEUE + 1. This avoids the OOB. But please note, there was a worthwhile discussion on what ATA_TAG_INTERNAL and ATA_MAX_QUEUE is. And why there should not be a "fake" 33 command-long queue size.
Ideally, the dw driver should account for the ATA_TAG_INTERNAL. In Damien Le Moal's words: "... having looked at the driver, it is a bigger change than just faking a 33rd "tag" that is in fact not a command tag at all."
Fixes: 28361c403683c ("libata: add extra internal command") Cc: stable@kernel.org # 4.18+ BugLink: https://github.com/openwrt/openwrt/issues/9505 Signed-off-by: Christian Lamparter chunkeey@gmail.com Signed-off-by: Damien Le Moal damien.lemoal@opensource.wdc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/ata/sata_dwc_460ex.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/ata/sata_dwc_460ex.c +++ b/drivers/ata/sata_dwc_460ex.c @@ -145,7 +145,11 @@ struct sata_dwc_device { #endif };
-#define SATA_DWC_QCMD_MAX 32 +/* + * Allow one extra special slot for commands and DMA management + * to account for libata internal commands. + */ +#define SATA_DWC_QCMD_MAX (ATA_MAX_QUEUE + 1)
struct sata_dwc_device_port { struct sata_dwc_device *hsdev;
From: Xiaomeng Tong xiam0nd.tong@gmail.com
commit 2012a9e279013933885983cbe0a5fe828052563b upstream.
The bug is here: return cluster;
The list iterator value 'cluster' will *always* be set and non-NULL by list_for_each_entry(), so it is incorrect to assume that the iterator value will be NULL if the list is empty or no element is found.
To fix the bug, return 'cluster' when found, otherwise return NULL.
Cc: stable@vger.kernel.org Fixes: 21bdbb7102ed ("perf: add qcom l2 cache perf events driver") Signed-off-by: Xiaomeng Tong xiam0nd.tong@gmail.com Link: https://lore.kernel.org/r/20220327055733.4070-1-xiam0nd.tong@gmail.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/perf/qcom_l2_pmu.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/perf/qcom_l2_pmu.c +++ b/drivers/perf/qcom_l2_pmu.c @@ -736,7 +736,7 @@ static struct cluster_pmu *l2_cache_asso { u64 mpidr; int cpu_cluster_id; - struct cluster_pmu *cluster = NULL; + struct cluster_pmu *cluster;
/* * This assumes that the cluster_id is in MPIDR[aff1] for @@ -758,10 +758,10 @@ static struct cluster_pmu *l2_cache_asso cluster->cluster_id); cpumask_set_cpu(cpu, &cluster->cluster_cpus); *per_cpu_ptr(l2cache_pmu->pmu_cluster, cpu) = cluster; - break; + return cluster; }
- return cluster; + return NULL; }
static int l2cache_pmu_online_cpu(unsigned int cpu, struct hlist_node *node)
From: Namhyung Kim namhyung@kernel.org
commit e3265a4386428d3d157d9565bb520aabff8b4bf0 upstream.
It was reported that some perf event setup can make fork failed on ARM64. It was the case of a group of mixed hw and sw events and it failed in perf_event_init_task() due to armpmu_event_init().
The ARM PMU code checks if all the events in a group belong to the same PMU except for software events. But it didn't set the event_caps of inherited events and no longer identify them as software events. Therefore the test failed in a child process.
A simple reproducer is:
$ perf stat -e '{cycles,cs,instructions}' perf bench sched messaging # Running 'sched/messaging' benchmark: perf: fork(): Invalid argument
The perf stat was fine but the perf bench failed in fork(). Let's inherit the event caps from the parent.
Signed-off-by: Namhyung Kim namhyung@kernel.org Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20220328200112.457740-1-namhyung@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/events/core.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -11596,6 +11596,9 @@ perf_event_alloc(struct perf_event_attr
event->state = PERF_EVENT_STATE_INACTIVE;
+ if (parent_event) + event->event_caps = parent_event->event_caps; + if (event->attr.sigtrap) atomic_set(&event->event_limit, 1);
From: Marc Zyngier maz@kernel.org
commit 0df6664531a12cdd8fc873f0cac0dcb40243d3e9 upstream.
It turns out that our polling of RWP is totally wrong when checking for it in the redistributors, as we test the *distributor* bit index, whereas it is a different bit number in the RDs... Oopsie boo.
This is embarassing. Not only because it is wrong, but also because it took *8 years* to notice the blunder...
Just fix the damn thing.
Fixes: 021f653791ad ("irqchip: gic-v3: Initial support for GICv3") Signed-off-by: Marc Zyngier maz@kernel.org Cc: stable@vger.kernel.org Reviewed-by: Andre Przywara andre.przywara@arm.com Reviewed-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Link: https://lore.kernel.org/r/20220315165034.794482-2-maz@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/irqchip/irq-gic-v3.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -206,11 +206,11 @@ static inline void __iomem *gic_dist_bas } }
-static void gic_do_wait_for_rwp(void __iomem *base) +static void gic_do_wait_for_rwp(void __iomem *base, u32 bit) { u32 count = 1000000; /* 1s! */
- while (readl_relaxed(base + GICD_CTLR) & GICD_CTLR_RWP) { + while (readl_relaxed(base + GICD_CTLR) & bit) { count--; if (!count) { pr_err_ratelimited("RWP timeout, gone fishing\n"); @@ -224,13 +224,13 @@ static void gic_do_wait_for_rwp(void __i /* Wait for completion of a distributor change */ static void gic_dist_wait_for_rwp(void) { - gic_do_wait_for_rwp(gic_data.dist_base); + gic_do_wait_for_rwp(gic_data.dist_base, GICD_CTLR_RWP); }
/* Wait for completion of a redistributor change */ static void gic_redist_wait_for_rwp(void) { - gic_do_wait_for_rwp(gic_data_rdist_rd_base()); + gic_do_wait_for_rwp(gic_data_rdist_rd_base(), GICR_CTLR_RWP); }
#ifdef CONFIG_ARM64
From: Thomas Zimmermann tzimmermann@suse.de
commit 0f525289ff0ddeb380813bd81e0f9bdaaa1c9078 upstream.
OF framebuffers do not have an underlying device in the Linux device hierarchy. Do a regular unregister call instead of hot unplugging such a non-existing device. Fixes a NULL dereference. An example error message on ppc64le is shown below.
BUG: Kernel NULL pointer dereference on read at 0x00000060 Faulting instruction address: 0xc00000000080dfa4 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries [...] CPU: 2 PID: 139 Comm: systemd-udevd Not tainted 5.17.0-ae085d7f9365 #1 NIP: c00000000080dfa4 LR: c00000000080df9c CTR: c000000000797430 REGS: c000000004132fe0 TRAP: 0300 Not tainted (5.17.0-ae085d7f9365) MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 28228282 XER: 20000000 CFAR: c00000000000c80c DAR: 0000000000000060 DSISR: 40000000 IRQMASK: 0 GPR00: c00000000080df9c c000000004133280 c00000000169d200 0000000000000029 GPR04: 00000000ffffefff c000000004132f90 c000000004132f88 0000000000000000 GPR08: c0000000015658f8 c0000000015cd200 c0000000014f57d0 0000000048228283 GPR12: 0000000000000000 c00000003fffe300 0000000020000000 0000000000000000 GPR16: 0000000000000000 0000000113fc4a40 0000000000000005 0000000113fcfb80 GPR20: 000001000f7283b0 0000000000000000 c000000000e4a588 c000000000e4a5b0 GPR24: 0000000000000001 00000000000a0000 c008000000db0168 c0000000021f6ec0 GPR28: c0000000016d65a8 c000000004b36460 0000000000000000 c0000000016d64b0 NIP [c00000000080dfa4] do_remove_conflicting_framebuffers+0x184/0x1d0 [c000000004133280] [c00000000080df9c] do_remove_conflicting_framebuffers+0x17c/0x1d0 (unreliable) [c000000004133350] [c00000000080e4d0] remove_conflicting_framebuffers+0x60/0x150 [c0000000041333a0] [c00000000080e6f4] remove_conflicting_pci_framebuffers+0x134/0x1b0 [c000000004133450] [c008000000e70438] drm_aperture_remove_conflicting_pci_framebuffers+0x90/0x100 [drm] [c000000004133490] [c008000000da0ce4] bochs_pci_probe+0x6c/0xa64 [bochs] [...] [c000000004133db0] [c00000000002aaa0] system_call_exception+0x170/0x2d0 [c000000004133e10] [c00000000000c3cc] system_call_common+0xec/0x250
The bug [1] was introduced by commit 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal"). Most firmware framebuffers have an underlying platform device, which can be hot-unplugged before loading the native graphics driver. OF framebuffers do not (yet) have that device. Fix the code by unregistering the framebuffer as before without a hot unplug.
Tested with 5.17 on qemu ppc64le emulation.
Signed-off-by: Thomas Zimmermann tzimmermann@suse.de Fixes: 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal") Reported-by: Sudip Mukherjee sudipm.mukherjee@gmail.com Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch Reviewed-by: Javier Martinez Canillas javierm@redhat.com Tested-by: Sudip Mukherjee sudip.mukherjee@codethink.co.uk Cc: Zack Rusin zackr@vmware.com Cc: Javier Martinez Canillas javierm@redhat.com Cc: Hans de Goede hdegoede@redhat.com Cc: stable@vger.kernel.org # v5.11+ Cc: Helge Deller deller@gmx.de Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Sam Ravnborg sam@ravnborg.org Cc: Zheyu Ma zheyuma97@gmail.com Cc: Xiyu Yang xiyuyang19@fudan.edu.cn Cc: Zhen Lei thunder.leizhen@huawei.com Cc: Matthew Wilcox willy@infradead.org Cc: Alex Deucher alexander.deucher@amd.com Cc: Tetsuo Handa penguin-kernel@i-love.sakura.ne.jp Cc: Guenter Roeck linux@roeck-us.net Cc: linux-fbdev@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Link: https://lore.kernel.org/all/YkHXO6LGHAN0p1pq@debian/ # [1] Link: https://patchwork.freedesktop.org/patch/msgid/20220404194402.29974-1-tzimmer... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/video/fbdev/core/fbmem.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
--- a/drivers/video/fbdev/core/fbmem.c +++ b/drivers/video/fbdev/core/fbmem.c @@ -1581,7 +1581,14 @@ static void do_remove_conflicting_frameb * If it's not a platform device, at least print a warning. A * fix would add code to remove the device from the system. */ - if (dev_is_platform(device)) { + if (!device) { + /* TODO: Represent each OF framebuffer as its own + * device in the device hierarchy. For now, offb + * doesn't have such a device, so unregister the + * framebuffer as before without warning. + */ + do_unregister_framebuffer(registered_fb[i]); + } else if (dev_is_platform(device)) { registered_fb[i]->forced_out = true; platform_device_unregister(to_platform_device(device)); } else {
From: Shirish S shirish.s@amd.com
commit 4052287a75eb3fc0f487fcc5f768a38bede455c8 upstream.
[Why] comparing pwm bl values (coverted) with user brightness(converted) levels in commit_tail leads to continuous setting of backlight via dmub as they don't to match. This leads overdrive in queuing of commands to DMCU that sometimes lead to depending on load on DMCU fw:
"[drm:dc_dmub_srv_wait_idle] *ERROR* Error waiting for DMUB idle: status=3"
[How] Store last successfully set backlight value and compare with it instead of pwm reads which is not what we should compare with.
Signed-off-by: Shirish S shirish.s@amd.com Reviewed-by: Harry Wentland harry.wentland@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 7 ++++--- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 6 ++++++ 2 files changed, 10 insertions(+), 3 deletions(-)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -3522,7 +3522,7 @@ static u32 convert_brightness_to_user(co max - min); }
-static int amdgpu_dm_backlight_set_level(struct amdgpu_display_manager *dm, +static void amdgpu_dm_backlight_set_level(struct amdgpu_display_manager *dm, int bl_idx, u32 user_brightness) { @@ -3550,7 +3550,8 @@ static int amdgpu_dm_backlight_set_level DRM_DEBUG("DM: Failed to update backlight on eDP[%d]\n", bl_idx); }
- return rc ? 0 : 1; + if (rc) + dm->actual_brightness[bl_idx] = user_brightness; }
static int amdgpu_dm_backlight_update_status(struct backlight_device *bd) @@ -9316,7 +9317,7 @@ static void amdgpu_dm_atomic_commit_tail /* restore the backlight level */ for (i = 0; i < dm->num_of_edps; i++) { if (dm->backlight_dev[i] && - (amdgpu_dm_backlight_get_level(dm, i) != dm->brightness[i])) + (dm->actual_brightness[i] != dm->brightness[i])) amdgpu_dm_backlight_set_level(dm, i, dm->brightness[i]); } #endif --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h @@ -446,6 +446,12 @@ struct amdgpu_display_manager { * cached backlight values. */ u32 brightness[AMDGPU_DM_MAX_NUM_EDP]; + /** + * @actual_brightness: + * + * last successfully applied backlight values. + */ + u32 actual_brightness[AMDGPU_DM_MAX_NUM_EDP]; };
enum dsc_clock_force_state {
From: Trond Myklebust trond.myklebust@hammerspace.com
commit 3be232f11a3cc9b0ef0795e39fa11bdb8e422a06 upstream.
If we have already set up the socket and are waiting for it to connect, then don't immediately close and retry.
Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sunrpc/xprt.c | 3 ++- net/sunrpc/xprtsock.c | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-)
--- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -767,7 +767,8 @@ EXPORT_SYMBOL_GPL(xprt_disconnect_done); */ static void xprt_schedule_autoclose_locked(struct rpc_xprt *xprt) { - set_bit(XPRT_CLOSE_WAIT, &xprt->state); + if (test_and_set_bit(XPRT_CLOSE_WAIT, &xprt->state)) + return; if (test_and_set_bit(XPRT_LOCKED, &xprt->state) == 0) queue_work(xprtiod_workqueue, &xprt->task_cleanup); else if (xprt->snd_task && !test_bit(XPRT_SND_IS_COOKIE, &xprt->state)) --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -2365,7 +2365,7 @@ static void xs_connect(struct rpc_xprt *
WARN_ON_ONCE(!xprt_lock_connect(xprt, task, transport));
- if (transport->sock != NULL) { + if (transport->sock != NULL && !xprt_connecting(xprt)) { dprintk("RPC: xs_connect delayed xprt %p for %lu " "seconds\n", xprt, xprt->reestablish_timeout / HZ);
From: Daniel Mack daniel@zonque.org
commit d14eb80e27795b7b20060f7b151cdfe39722a813 upstream.
If the optional regulator lookup fails, reset the pointer to NULL. Other functions such as mipi_dbi_poweron_reset_conditional() only do a NULL pointer check and will otherwise dereference the error pointer.
Fixes: 5a04227326b04c15 ("drm/panel: Add ilitek ili9341 panel driver") Signed-off-by: Daniel Mack daniel@zonque.org Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch Link: https://patchwork.freedesktop.org/patch/msgid/20220317225537.826302-1-daniel... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/panel/panel-ilitek-ili9341.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/panel/panel-ilitek-ili9341.c +++ b/drivers/gpu/drm/panel/panel-ilitek-ili9341.c @@ -612,8 +612,10 @@ static int ili9341_dbi_probe(struct spi_ int ret;
vcc = devm_regulator_get_optional(dev, "vcc"); - if (IS_ERR(vcc)) + if (IS_ERR(vcc)) { dev_err(dev, "get optional vcc failed\n"); + vcc = NULL; + }
dbidev = devm_drm_dev_alloc(dev, &ili9341_dbi_driver, struct mipi_dbi_dev, drm);
From: Benjamin Marty info@benjaminmarty.ch
commit 879791ad8bf3dc5453061cad74776a617b6e3319 upstream.
Fixes crash on MST Hub disconnect.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1849 Fixes: ee2698cf79cc ("drm/amd/display: Changed pipe split policy to allow for multi-display pipe split") Signed-off-by: Benjamin Marty info@benjaminmarty.ch Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c +++ b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c @@ -874,7 +874,7 @@ static const struct dc_debug_options deb .clock_trace = true, .disable_pplib_clock_request = true, .min_disp_clk_khz = 100000, - .pipe_split_policy = MPC_SPLIT_DYNAMIC, + .pipe_split_policy = MPC_SPLIT_AVOID_MULT_DISP, .force_single_disp_pipe_split = false, .disable_dcc = DCC_ENABLE, .vsr_support = true,
From: Alex Deucher alexander.deucher@amd.com
commit 2f25d8ce09b7ba5d769c132ba3d4eb84a941d2cb upstream.
SMU takes clock limits in Mhz units. socclk and fclk were using 10 khz units in some cases. Switch to Mhz units. Fixes higher than required SoC clocks.
Fixes: 97cf32996c46d9 ("drm/amd/pm: Removed fixed clock in auto mode DPM") Reviewed-by: Paul Menzel pmenzel@molgen.mpg.de Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c +++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c @@ -773,13 +773,13 @@ static int smu10_dpm_force_dpm_level(str smum_send_msg_to_smc_with_parameter(hwmgr, PPSMC_MSG_SetHardMinFclkByFreq, hwmgr->display_config->num_display > 3 ? - data->clock_vol_info.vdd_dep_on_fclk->entries[0].clk : + (data->clock_vol_info.vdd_dep_on_fclk->entries[0].clk / 100) : min_mclk, NULL);
smum_send_msg_to_smc_with_parameter(hwmgr, PPSMC_MSG_SetHardMinSocclkByFreq, - data->clock_vol_info.vdd_dep_on_socclk->entries[0].clk, + data->clock_vol_info.vdd_dep_on_socclk->entries[0].clk / 100, NULL); smum_send_msg_to_smc_with_parameter(hwmgr, PPSMC_MSG_SetHardMinVcn, @@ -792,11 +792,11 @@ static int smu10_dpm_force_dpm_level(str NULL); smum_send_msg_to_smc_with_parameter(hwmgr, PPSMC_MSG_SetSoftMaxFclkByFreq, - data->clock_vol_info.vdd_dep_on_fclk->entries[index_fclk].clk, + data->clock_vol_info.vdd_dep_on_fclk->entries[index_fclk].clk / 100, NULL); smum_send_msg_to_smc_with_parameter(hwmgr, PPSMC_MSG_SetSoftMaxSocclkByFreq, - data->clock_vol_info.vdd_dep_on_socclk->entries[index_socclk].clk, + data->clock_vol_info.vdd_dep_on_socclk->entries[index_socclk].clk / 100, NULL); smum_send_msg_to_smc_with_parameter(hwmgr, PPSMC_MSG_SetSoftMaxVcn,
From: Emily Deng Emily.Deng@amd.com
commit 02fc996d5098f4c3f65bdf6cdb6b28e3f29ba789 upstream.
Correct the code error for setting register UVD_GFX10_ADDR_CONFIG. Need to use inst_idx, or it only will set VCN0.
Signed-off-by: Emily Deng Emily.Deng@amd.com Reviewed-by: James Zhu James.Zhu@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c @@ -601,8 +601,8 @@ static void vcn_v3_0_mc_resume_dpg_mode( AMDGPU_GPU_PAGE_ALIGN(sizeof(struct amdgpu_fw_shared)), 0, indirect);
/* VCN global tiling registers */ - WREG32_SOC15_DPG_MODE(0, SOC15_DPG_MODE_OFFSET( - UVD, 0, mmUVD_GFX10_ADDR_CONFIG), adev->gfx.config.gb_addr_config, 0, indirect); + WREG32_SOC15_DPG_MODE(inst_idx, SOC15_DPG_MODE_OFFSET( + UVD, inst_idx, mmUVD_GFX10_ADDR_CONFIG), adev->gfx.config.gb_addr_config, 0, indirect); }
static void vcn_v3_0_disable_static_power_gating(struct amdgpu_device *adev, int inst)
From: Karol Herbst kherbst@redhat.com
commit 38d4e5cf5b08798f093374e53c2f4609d5382dd5 upstream.
Fixes a crash booting on those platforms with nouveau.
Fixes: 4cdd2450bf73 ("drm/nouveau/pmu/gm200-: use alternate falcon reset sequence") Cc: Ben Skeggs bskeggs@redhat.com Cc: Karol Herbst kherbst@redhat.com Cc: dri-devel@lists.freedesktop.org Cc: nouveau@lists.freedesktop.org Cc: stable@vger.kernel.org # v5.17+ Signed-off-by: Karol Herbst kherbst@redhat.com Reviewed-by: Lyude Paul lyude@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20220322124800.2605463-1-kherb... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gm20b.c | 1 + drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp102.c | 2 +- drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp10b.c | 1 + drivers/gpu/drm/nouveau/nvkm/subdev/pmu/priv.h | 1 + 4 files changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gm20b.c +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gm20b.c @@ -216,6 +216,7 @@ gm20b_pmu = { .intr = gt215_pmu_intr, .recv = gm20b_pmu_recv, .initmsg = gm20b_pmu_initmsg, + .reset = gf100_pmu_reset, };
#if IS_ENABLED(CONFIG_ARCH_TEGRA_210_SOC) --- a/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp102.c +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp102.c @@ -23,7 +23,7 @@ */ #include "priv.h"
-static void +void gp102_pmu_reset(struct nvkm_pmu *pmu) { struct nvkm_device *device = pmu->subdev.device; --- a/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp10b.c +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/gp10b.c @@ -83,6 +83,7 @@ gp10b_pmu = { .intr = gt215_pmu_intr, .recv = gm20b_pmu_recv, .initmsg = gm20b_pmu_initmsg, + .reset = gp102_pmu_reset, };
#if IS_ENABLED(CONFIG_ARCH_TEGRA_210_SOC) --- a/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/priv.h +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/priv.h @@ -41,6 +41,7 @@ int gt215_pmu_send(struct nvkm_pmu *, u3
bool gf100_pmu_enabled(struct nvkm_pmu *); void gf100_pmu_reset(struct nvkm_pmu *); +void gp102_pmu_reset(struct nvkm_pmu *pmu);
void gk110_pmu_pgob(struct nvkm_pmu *, bool);
From: Lee Jones lee.jones@linaro.org
commit e79a2398e1b2d47060474dca291542368183bc0f upstream.
This ensures userspace cannot prematurely clean-up the client before it is fully initialised which has been proven to cause issues in the past.
Cc: Felix Kuehling Felix.Kuehling@amd.com Cc: Alex Deucher alexander.deucher@amd.com Cc: "Christian König" christian.koenig@amd.com Cc: "Pan, Xinhui" Xinhui.Pan@amd.com Cc: David Airlie airlied@linux.ie Cc: Daniel Vetter daniel@ffwll.ch Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Lee Jones lee.jones@linaro.org Reviewed-by: Felix Kuehling Felix.Kuehling@amd.com Signed-off-by: Felix Kuehling Felix.Kuehling@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 24 +++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-)
--- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c @@ -270,15 +270,6 @@ int kfd_smi_event_open(struct kfd_dev *d return ret; }
- ret = anon_inode_getfd(kfd_smi_name, &kfd_smi_ev_fops, (void *)client, - O_RDWR); - if (ret < 0) { - kfifo_free(&client->fifo); - kfree(client); - return ret; - } - *fd = ret; - init_waitqueue_head(&client->wait_queue); spin_lock_init(&client->lock); client->events = 0; @@ -288,5 +279,20 @@ int kfd_smi_event_open(struct kfd_dev *d list_add_rcu(&client->list, &dev->smi_clients); spin_unlock(&dev->smi_lock);
+ ret = anon_inode_getfd(kfd_smi_name, &kfd_smi_ev_fops, (void *)client, + O_RDWR); + if (ret < 0) { + spin_lock(&dev->smi_lock); + list_del_rcu(&client->list); + spin_unlock(&dev->smi_lock); + + synchronize_rcu(); + + kfifo_free(&client->fifo); + kfree(client); + return ret; + } + *fd = ret; + return 0; }
From: Alex Deucher alexander.deucher@amd.com
commit ebc002e3ee78409c42156e62e4e27ad1d09c5a75 upstream.
Seems to cause a reboots or hangs on some systems.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1924 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1953 Fixes: daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)") Reviewed-by: Lijo Lazar lijo.lazar@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
--- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c +++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c @@ -1045,6 +1045,17 @@ bool amdgpu_dpm_is_baco_supported(struct
if (!pp_funcs || !pp_funcs->get_asic_baco_capability) return false; + /* Don't use baco for reset in S3. + * This is a workaround for some platforms + * where entering BACO during suspend + * seems to cause reboots or hangs. + * This might be related to the fact that BACO controls + * power to the whole GPU including devices like audio and USB. + * Powering down/up everything may adversely affect these other + * devices. Needs more investigation. + */ + if (adev->in_s3) + return false;
if (pp_funcs->get_asic_baco_capability(pp_handle, &baco_cap)) return false;
From: Suravee Suthikulpanit suravee.suthikulpanit@amd.com
commit 4a204f7895878363ca8211f50ec610408c8c70aa upstream.
Expand KVM's mask for the AVIC host physical ID to the full 12 bits defined by the architecture. The number of bits consumed by hardware is model specific, e.g. early CPUs ignored bits 11:8, but there is no way for KVM to enumerate the "true" size. So, KVM must allow using all bits, else it risks rejecting completely legal x2APIC IDs on newer CPUs.
This means KVM relies on hardware to not assign x2APIC IDs that exceed the "true" width of the field, but presumably hardware is smart enough to tie the width to the max x2APIC ID. KVM also relies on hardware to support at least 8 bits, as the legacy xAPIC ID is writable by software. But, those assumptions are unavoidable due to the lack of any way to enumerate the "true" width.
Cc: stable@vger.kernel.org Cc: Maxim Levitsky mlevitsk@redhat.com Suggested-by: Sean Christopherson seanjc@google.com Reviewed-by: Sean Christopherson seanjc@google.com Fixes: 44a95dae1d22 ("KVM: x86: Detect and Initialize AVIC support") Signed-off-by: Suravee Suthikulpanit suravee.suthikulpanit@amd.com Message-Id: 20220211000851.185799-1-suravee.suthikulpanit@amd.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com [modified due to the conflict caused by the commit 391503528257 ("KVM: x86: SVM: move avic definitions from AMD's spec to svm.h")] Signed-off-by: Suravee Suthikulpanit suravee.suthikulpanit@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/svm/avic.c | 7 +------ arch/x86/kvm/svm/svm.h | 2 +- 2 files changed, 2 insertions(+), 7 deletions(-)
--- a/arch/x86/kvm/svm/avic.c +++ b/arch/x86/kvm/svm/avic.c @@ -947,15 +947,10 @@ out: void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu) { u64 entry; - /* ID = 0xff (broadcast), ID > 0xff (reserved) */ int h_physical_id = kvm_cpu_get_apicid(cpu); struct vcpu_svm *svm = to_svm(vcpu);
- /* - * Since the host physical APIC id is 8 bits, - * we can support host APIC ID upto 255. - */ - if (WARN_ON(h_physical_id > AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK)) + if (WARN_ON(h_physical_id & ~AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK)) return;
entry = READ_ONCE(*(svm->avic_physical_id_cache)); --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -499,7 +499,7 @@ extern struct kvm_x86_nested_ops svm_nes #define AVIC_LOGICAL_ID_ENTRY_VALID_BIT 31 #define AVIC_LOGICAL_ID_ENTRY_VALID_MASK (1 << 31)
-#define AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK (0xFFULL) +#define AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK GENMASK_ULL(11, 0) #define AVIC_PHYSICAL_ID_ENTRY_BACKING_PAGE_MASK (0xFFFFFFFFFFULL << 12) #define AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK (1ULL << 62) #define AVIC_PHYSICAL_ID_ENTRY_VALID_MASK (1ULL << 63)
From: Dust Li dust.li@linux.alibaba.com
commit b70a5cc045197aad9c159042621baf3c015f6cc7 upstream.
In commit ea785a1a573b("net/smc: Send directly when TCP_CORK is cleared"), we don't use delayed work to implement cork.
This patch use the same algorithm, removes the delayed work when setting TCP_NODELAY and send directly in setsockopt(). This also makes the TCP_NODELAY the same as TCP.
Cc: Tony Lu tonylu@linux.alibaba.com Signed-off-by: Dust Li dust.li@linux.alibaba.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/smc/af_smc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -2419,8 +2419,8 @@ static int smc_setsockopt(struct socket sk->sk_state != SMC_CLOSED) { if (val) { SMC_STAT_INC(smc, ndly_cnt); - mod_delayed_work(smc->conn.lgr->tx_wq, - &smc->conn.tx_work, 0); + smc_tx_pending(&smc->conn); + cancel_delayed_work(&smc->conn.tx_work); } } break;
From: Jakub Kicinski kuba@kernel.org
commit 20695e9a9fd39103d1b0669470ae74030b7aa196 upstream.
This reverts commit d9142e1cf3bbdaf21337767114ecab26fe702d47.
The test is supposed to run cleanly with TLS is disabled, to test compatibility with TCP behavior. I can't repro the failure [1], the problem should be debugged rather than papered over.
Link: https://lore.kernel.org/all/20220325161203.7000698c@kicinski-fedora-pc1c0hjn... [1] Fixes: d9142e1cf3bb ("selftests: net: Add tls config dependency for tls selftests") Signed-off-by: Jakub Kicinski kuba@kernel.org Link: https://lore.kernel.org/r/20220328212904.2685395-1-kuba@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/config | 1 - 1 file changed, 1 deletion(-)
--- a/tools/testing/selftests/net/config +++ b/tools/testing/selftests/net/config @@ -43,5 +43,4 @@ CONFIG_NET_ACT_TUNNEL_KEY=m CONFIG_NET_ACT_MIRRED=m CONFIG_BAREUDP=m CONFIG_IPV6_IOAM6_LWTUNNEL=y -CONFIG_TLS=m CONFIG_CRYPTO_SM4=y
From: Jakub Sitnicki jakub@cloudflare.com
commit 9a69e2b385f443f244a7e8b8bcafe5ccfb0866b4 upstream.
remote_port is another case of a BPF context field documented as a 32-bit value in network byte order for which the BPF context access converter generates a load of a zero-padded 16-bit integer in network byte order.
First such case was dst_port in bpf_sock which got addressed in commit 4421a582718a ("bpf: Make dst_port field in struct bpf_sock 16-bit wide").
Loading 4-bytes from the remote_port offset and converting the value with bpf_ntohl() leads to surprising results, as the expected value is shifted by 16 bits.
Reduce the confusion by splitting the field in two - a 16-bit field holding a big-endian integer, and a 16-bit zero-padding anonymous field that follows it.
Suggested-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Jakub Sitnicki jakub@cloudflare.com Signed-off-by: Alexei Starovoitov ast@kernel.org Link: https://lore.kernel.org/bpf/20220209184333.654927-2-jakub@cloudflare.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/uapi/linux/bpf.h | 3 ++- net/bpf/test_run.c | 4 ++-- net/core/filter.c | 3 ++- 3 files changed, 6 insertions(+), 4 deletions(-)
--- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -6223,7 +6223,8 @@ struct bpf_sk_lookup { __u32 protocol; /* IP protocol (IPPROTO_TCP, IPPROTO_UDP) */ __u32 remote_ip4; /* Network byte order */ __u32 remote_ip6[4]; /* Network byte order */ - __u32 remote_port; /* Network byte order */ + __be16 remote_port; /* Network byte order */ + __u16 :16; /* Zero padding */ __u32 local_ip4; /* Network byte order */ __u32 local_ip6[4]; /* Network byte order */ __u32 local_port; /* Host byte order */ --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -954,7 +954,7 @@ int bpf_prog_test_run_sk_lookup(struct b if (!range_is_zero(user_ctx, offsetofend(typeof(*user_ctx), local_port), sizeof(*user_ctx))) goto out;
- if (user_ctx->local_port > U16_MAX || user_ctx->remote_port > U16_MAX) { + if (user_ctx->local_port > U16_MAX) { ret = -ERANGE; goto out; } @@ -962,7 +962,7 @@ int bpf_prog_test_run_sk_lookup(struct b ctx.family = (u16)user_ctx->family; ctx.protocol = (u16)user_ctx->protocol; ctx.dport = (u16)user_ctx->local_port; - ctx.sport = (__force __be16)user_ctx->remote_port; + ctx.sport = user_ctx->remote_port;
switch (ctx.family) { case AF_INET: --- a/net/core/filter.c +++ b/net/core/filter.c @@ -10540,7 +10540,8 @@ static bool sk_lookup_is_valid_access(in case bpf_ctx_range(struct bpf_sk_lookup, local_ip4): case bpf_ctx_range_till(struct bpf_sk_lookup, remote_ip6[0], remote_ip6[3]): case bpf_ctx_range_till(struct bpf_sk_lookup, local_ip6[0], local_ip6[3]): - case bpf_ctx_range(struct bpf_sk_lookup, remote_port): + case offsetof(struct bpf_sk_lookup, remote_port) ... + offsetof(struct bpf_sk_lookup, local_ip4) - 1: case bpf_ctx_range(struct bpf_sk_lookup, local_port): bpf_ctx_record_field_size(info, sizeof(__u32)); return bpf_ctx_narrow_access_ok(off, size, sizeof(__u32));
From: Jakub Sitnicki jakub@cloudflare.com
commit 3c69611b8926f8e74fcf76bd97ae0e5dafbeb26a upstream.
In commit 9a69e2b385f4 ("bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide") ->remote_port field changed from __u32 to __be16.
However, narrow load tests which exercise 1-byte sized loads from offsetof(struct bpf_sk_lookup, remote_port) were not adopted to reflect the change.
As a result, on little-endian we continue testing loads from addresses:
- (__u8 *)&ctx->remote_port + 3 - (__u8 *)&ctx->remote_port + 4
which map to the zero padding following the remote_port field, and don't break the tests because there is no observable change.
While on big-endian, we observe breakage because tests expect to see zeros for values loaded from:
- (__u8 *)&ctx->remote_port - 1 - (__u8 *)&ctx->remote_port - 2
Above addresses map to ->remote_ip6 field, which precedes ->remote_port, and are populated during the bpf_sk_lookup IPv6 tests.
Unsurprisingly, on s390x we observe:
#136/38 sk_lookup/narrow access to ctx v4:OK #136/39 sk_lookup/narrow access to ctx v6:FAIL
Fix it by removing the checks for 1-byte loads from offsets outside of the ->remote_port field.
Fixes: 9a69e2b385f4 ("bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide") Suggested-by: Ilya Leoshkevich iii@linux.ibm.com Signed-off-by: Jakub Sitnicki jakub@cloudflare.com Signed-off-by: Alexei Starovoitov ast@kernel.org Acked-by: Martin KaFai Lau kafai@fb.com Link: https://lore.kernel.org/bpf/20220319183356.233666-3-jakub@cloudflare.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/bpf/progs/test_sk_lookup.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/tools/testing/selftests/bpf/progs/test_sk_lookup.c +++ b/tools/testing/selftests/bpf/progs/test_sk_lookup.c @@ -404,8 +404,7 @@ int ctx_narrow_access(struct bpf_sk_look
/* Narrow loads from remote_port field. Expect SRC_PORT. */ if (LSB(ctx->remote_port, 0) != ((SRC_PORT >> 0) & 0xff) || - LSB(ctx->remote_port, 1) != ((SRC_PORT >> 8) & 0xff) || - LSB(ctx->remote_port, 2) != 0 || LSB(ctx->remote_port, 3) != 0) + LSB(ctx->remote_port, 1) != ((SRC_PORT >> 8) & 0xff)) return SK_DROP; if (LSW(ctx->remote_port, 0) != SRC_PORT) return SK_DROP;
From: Dan Carpenter dan.carpenter@oracle.com
commit 7372971c1be5b7d4fdd8ad237798bdc1d1d54162 upstream.
The mc146818_get_time() function returns zero on success or negative a error code on failure. It needs to be type int.
Fixes: d35786b3a28d ("rtc: mc146818-lib: change return values of mc146818_get_time()") Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Reviewed-by: Mateusz Jończyk mat.jonczyk@o2.pl Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Link: https://lore.kernel.org/r/20220111071922.GE11243@kili Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/rtc/rtc-mc146818-lib.c | 2 +- include/linux/mc146818rtc.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/rtc/rtc-mc146818-lib.c +++ b/drivers/rtc/rtc-mc146818-lib.c @@ -33,7 +33,7 @@ bool mc146818_does_rtc_work(void) } EXPORT_SYMBOL_GPL(mc146818_does_rtc_work);
-unsigned int mc146818_get_time(struct rtc_time *time) +int mc146818_get_time(struct rtc_time *time) { unsigned char ctrl; unsigned long flags; --- a/include/linux/mc146818rtc.h +++ b/include/linux/mc146818rtc.h @@ -124,7 +124,7 @@ struct cmos_rtc_board_info { #endif /* ARCH_RTC_LOCATION */
bool mc146818_does_rtc_work(void); -unsigned int mc146818_get_time(struct rtc_time *time); +int mc146818_get_time(struct rtc_time *time); int mc146818_set_time(struct rtc_time *time);
#endif /* _MC146818RTC_H */
From: Trond Myklebust trond.myklebust@hammerspace.com
commit 89f42494f92f448747bd8a7ab1ae8b5d5520577d upstream.
Avoid socket state races due to repeated calls to ->connect() using the same socket. If connect() returns 0 due to the connection having completed, but we are in fact in a closing state, then we may leave the XPRT_CONNECTING flag set on the transport.
Reported-by: Enrico Scholz enrico.scholz@sigma-chemnitz.de Fixes: 3be232f11a3c ("SUNRPC: Prevent immediate close+reconnect") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/sunrpc/xprtsock.h | 1 + net/sunrpc/xprtsock.c | 21 +++++++++++---------- 2 files changed, 12 insertions(+), 10 deletions(-)
--- a/include/linux/sunrpc/xprtsock.h +++ b/include/linux/sunrpc/xprtsock.h @@ -89,5 +89,6 @@ struct sock_xprt { #define XPRT_SOCK_WAKE_WRITE (5) #define XPRT_SOCK_WAKE_PENDING (6) #define XPRT_SOCK_WAKE_DISCONNECT (7) +#define XPRT_SOCK_CONNECT_SENT (8)
#endif /* _LINUX_SUNRPC_XPRTSOCK_H */ --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -2257,6 +2257,7 @@ static int xs_tcp_finish_connecting(stru fallthrough; case -EINPROGRESS: /* SYN_SENT! */ + set_bit(XPRT_SOCK_CONNECT_SENT, &transport->sock_state); if (xprt->reestablish_timeout < XS_TCP_INIT_REEST_TO) xprt->reestablish_timeout = XS_TCP_INIT_REEST_TO; break; @@ -2282,10 +2283,14 @@ static void xs_tcp_setup_socket(struct w struct rpc_xprt *xprt = &transport->xprt; int status = -EIO;
- if (!sock) { - sock = xs_create_sock(xprt, transport, - xs_addr(xprt)->sa_family, SOCK_STREAM, - IPPROTO_TCP, true); + if (xprt_connected(xprt)) + goto out; + if (test_and_clear_bit(XPRT_SOCK_CONNECT_SENT, + &transport->sock_state) || + !sock) { + xs_reset_transport(transport); + sock = xs_create_sock(xprt, transport, xs_addr(xprt)->sa_family, + SOCK_STREAM, IPPROTO_TCP, true); if (IS_ERR(sock)) { status = PTR_ERR(sock); goto out; @@ -2365,13 +2370,9 @@ static void xs_connect(struct rpc_xprt *
WARN_ON_ONCE(!xprt_lock_connect(xprt, task, transport));
- if (transport->sock != NULL && !xprt_connecting(xprt)) { + if (transport->sock != NULL) { dprintk("RPC: xs_connect delayed xprt %p for %lu " - "seconds\n", - xprt, xprt->reestablish_timeout / HZ); - - /* Start by resetting any existing state */ - xs_reset_transport(transport); + "seconds\n", xprt, xprt->reestablish_timeout / HZ);
delay = xprt_reconnect_delay(xprt); xprt_reconnect_backoff(xprt, XS_TCP_INIT_REEST_TO);
From: Jens Axboe axboe@kernel.dk
commit 7198bfc2017644c6b92d2ecef9b8b8e0363bb5fd upstream.
This reverts commit 6d35d04a9e18990040e87d2bbf72689252669d54.
Both Gabriel and Borislav report that this commit casues a regression with nbd:
sysfs: cannot create duplicate filename '/dev/block/43:0'
Revert it before 5.18-rc1 and we'll investigage this separately in due time.
Link: https://lore.kernel.org/all/YkiJTnFOt9bTv6A2@zn.tnic/ Reported-by: Gabriel L. Somlo somlo@cmu.edu Reported-by: Borislav Petkov bp@alien8.de Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/block/nbd.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-)
--- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -1744,6 +1744,17 @@ static struct nbd_device *nbd_dev_add(in refcount_set(&nbd->refs, 0); INIT_LIST_HEAD(&nbd->list); disk->major = NBD_MAJOR; + + /* Too big first_minor can cause duplicate creation of + * sysfs files/links, since index << part_shift might overflow, or + * MKDEV() expect that the max bits of first_minor is 20. + */ + disk->first_minor = index << part_shift; + if (disk->first_minor < index || disk->first_minor > MINORMASK) { + err = -EINVAL; + goto out_free_work; + } + disk->minors = 1 << part_shift; disk->fops = &nbd_fops; disk->private_data = nbd; @@ -1848,19 +1859,8 @@ static int nbd_genl_connect(struct sk_bu if (!netlink_capable(skb, CAP_SYS_ADMIN)) return -EPERM;
- if (info->attrs[NBD_ATTR_INDEX]) { + if (info->attrs[NBD_ATTR_INDEX]) index = nla_get_u32(info->attrs[NBD_ATTR_INDEX]); - - /* - * Too big first_minor can cause duplicate creation of - * sysfs files/links, since index << part_shift might overflow, or - * MKDEV() expect that the max bits of first_minor is 20. - */ - if (index < 0 || index > MINORMASK >> part_shift) { - printk(KERN_ERR "nbd: illegal input index %d\n", index); - return -EINVAL; - } - } if (!info->attrs[NBD_ATTR_SOCKETS]) { printk(KERN_ERR "nbd: must specify at least one socket\n"); return -EINVAL;
From: Arnaldo Carvalho de Melo acme@redhat.com
commit 3a8a0475861a443f02e3a9b57d044fe2a0a99291 upstream.
Using -ffat-lto-objects in the python feature test when building with clang-13 results in:
clang-13: error: optimization flag '-ffat-lto-objects' is not supported [-Werror,-Wignored-optimization-argument] error: command '/usr/sbin/clang' failed with exit code 1 cp: cannot stat '/tmp/build/perf/python_ext_build/lib/perf*.so': No such file or directory make[2]: *** [Makefile.perf:639: /tmp/build/perf/python/perf.so] Error 1
Noticed when building on a docker.io/library/archlinux:base container.
Cc: Adrian Hunter adrian.hunter@intel.com Cc: Fangrui Song maskray@google.com Cc: Florian Fainelli f.fainelli@gmail.com Cc: Ian Rogers irogers@google.com Cc: Jiri Olsa jolsa@kernel.org Cc: John Keeping john@metanate.com Cc: Leo Yan leo.yan@linaro.org Cc: Michael Petlan mpetlan@redhat.com Cc: Namhyung Kim namhyung@kernel.org Cc: Nathan Chancellor nathan@kernel.org Cc: Nick Desaulniers ndesaulniers@google.com Cc: Sedat Dilek sedat.dilek@gmail.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/perf/Makefile.config | 3 +++ tools/perf/util/setup.py | 2 ++ 2 files changed, 5 insertions(+)
--- a/tools/perf/Makefile.config +++ b/tools/perf/Makefile.config @@ -270,6 +270,9 @@ ifdef PYTHON_CONFIG PYTHON_EMBED_LIBADD := $(call grep-libs,$(PYTHON_EMBED_LDOPTS)) -lutil PYTHON_EMBED_CCOPTS := $(shell $(PYTHON_CONFIG_SQ) --includes 2>/dev/null) FLAGS_PYTHON_EMBED := $(PYTHON_EMBED_CCOPTS) $(PYTHON_EMBED_LDOPTS) + ifeq ($(CC_NO_CLANG), 0) + PYTHON_EMBED_CCOPTS := $(filter-out -ffat-lto-objects, $(PYTHON_EMBED_CCOPTS)) + endif endif
FEATURE_CHECK_CFLAGS-libpython := $(PYTHON_EMBED_CCOPTS) --- a/tools/perf/util/setup.py +++ b/tools/perf/util/setup.py @@ -23,6 +23,8 @@ if cc_is_clang: vars[var] = sub("-fstack-protector-strong", "", vars[var]) if not clang_has_option("-fno-semantic-interposition"): vars[var] = sub("-fno-semantic-interposition", "", vars[var]) + if not clang_has_option("-ffat-lto-objects"): + vars[var] = sub("-ffat-lto-objects", "", vars[var])
from distutils.core import setup, Extension
From: Arnaldo Carvalho de Melo acme@redhat.com
commit dd6e1fe91cdd52774ca642d1da75b58a86356b56 upstream.
The clang compiler complains about some options even without a source file being available, while others require one, so use the simple tools/build/feature/test-hello.c file.
Then check for the "is not supported" string in its output, in addition to the "unknown argument" already being looked for.
This was noticed when building with clang-13 where -ffat-lto-objects isn't supported and since we were looking just for "unknown argument" and not providing a source code to clang, was mistakenly assumed as being available and not being filtered to set of command line options provided to clang, leading to a build failure.
Cc: Adrian Hunter adrian.hunter@intel.com Cc: Fangrui Song maskray@google.com Cc: Florian Fainelli f.fainelli@gmail.com Cc: Ian Rogers irogers@google.com Cc: Jiri Olsa jolsa@kernel.org Cc: John Keeping john@metanate.com Cc: Leo Yan leo.yan@linaro.org Cc: Michael Petlan mpetlan@redhat.com Cc: Namhyung Kim namhyung@kernel.org Cc: Nathan Chancellor nathan@kernel.org Cc: Nick Desaulniers ndesaulniers@google.com Cc: Sedat Dilek sedat.dilek@gmail.com Link: http://lore.kernel.org/lkml/ Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/perf/util/setup.py | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/tools/perf/util/setup.py +++ b/tools/perf/util/setup.py @@ -1,12 +1,14 @@ -from os import getenv +from os import getenv, path from subprocess import Popen, PIPE from re import sub
cc = getenv("CC") cc_is_clang = b"clang version" in Popen([cc.split()[0], "-v"], stderr=PIPE).stderr.readline() +src_feature_tests = getenv('srctree') + '/tools/build/feature'
def clang_has_option(option): - return [o for o in Popen([cc, option], stderr=PIPE).stderr.readlines() if b"unknown argument" in o] == [ ] + cc_output = Popen([cc, option, path.join(src_feature_tests, "test-hello.c") ], stderr=PIPE).stderr.readlines() + return [o for o in cc_output if ((b"unknown argument" in o) or (b"is not supported" in o))] == [ ]
if cc_is_clang: from distutils.sysconfig import get_config_vars
From: Arnaldo Carvalho de Melo acme@redhat.com
commit 41caff459a5b956b3e23ba9ca759dd0629ad3dda upstream.
These make the feature check fail when using clang, so remove them just like is done in tools/perf/Makefile.config to build perf itself.
Adding -Wno-compound-token-split-by-macro to tools/perf/Makefile.config when building with clang is also necessary to avoid these warnings turned into errors (-Werror):
CC /tmp/build/perf/util/scripting-engines/trace-event-perl.o In file included from util/scripting-engines/trace-event-perl.c:35: In file included from /usr/lib64/perl5/CORE/perl.h:4085: In file included from /usr/lib64/perl5/CORE/hv.h:659: In file included from /usr/lib64/perl5/CORE/hv_func.h:34: In file included from /usr/lib64/perl5/CORE/sbox32_hash.h:4: /usr/lib64/perl5/CORE/zaphod32_hash.h:150:5: error: '(' and '{' tokens introducing statement expression appear in different macro expansion contexts [-Werror,-Wcompound-token-split-by-macro] ZAPHOD32_SCRAMBLE32(state[0],0x9fade23b); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /usr/lib64/perl5/CORE/zaphod32_hash.h:80:38: note: expanded from macro 'ZAPHOD32_SCRAMBLE32' #define ZAPHOD32_SCRAMBLE32(v,prime) STMT_START { \ ^~~~~~~~~~ /usr/lib64/perl5/CORE/perl.h:737:29: note: expanded from macro 'STMT_START' # define STMT_START (void)( /* gcc supports "({ STATEMENTS; })" */ ^ /usr/lib64/perl5/CORE/zaphod32_hash.h:150:5: note: '{' token is here ZAPHOD32_SCRAMBLE32(state[0],0x9fade23b); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /usr/lib64/perl5/CORE/zaphod32_hash.h:80:49: note: expanded from macro 'ZAPHOD32_SCRAMBLE32' #define ZAPHOD32_SCRAMBLE32(v,prime) STMT_START { \ ^ /usr/lib64/perl5/CORE/zaphod32_hash.h:150:5: error: '}' and ')' tokens terminating statement expression appear in different macro expansion contexts [-Werror,-Wcompound-token-split-by-macro] ZAPHOD32_SCRAMBLE32(state[0],0x9fade23b); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /usr/lib64/perl5/CORE/zaphod32_hash.h:87:41: note: expanded from macro 'ZAPHOD32_SCRAMBLE32' v ^= (v>>23); \ ^ /usr/lib64/perl5/CORE/zaphod32_hash.h:150:5: note: ')' token is here ZAPHOD32_SCRAMBLE32(state[0],0x9fade23b); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /usr/lib64/perl5/CORE/zaphod32_hash.h:88:3: note: expanded from macro 'ZAPHOD32_SCRAMBLE32' } STMT_END ^~~~~~~~ /usr/lib64/perl5/CORE/perl.h:738:21: note: expanded from macro 'STMT_END' # define STMT_END ) ^
Please refer to the discussion on the Link: tag below, where Nathan clarifies the situation:
<quote> acme> And then get to the problems at the end of this message, which seem acme> similar to the problem described here: acme> acme> From Nathan Chancellor <> acme> Subject [PATCH] mwifiex: Remove unnecessary braces from HostCmd_SET_SEQ_NO_BSS_INFO acme> acme> https://lkml.org/lkml/2020/9/1/135 acme> acme> So perhaps in this case its better to disable that acme> -Werror,-Wcompound-token-split-by-macro when building with clang?
Yes, I think that is probably the best solution. As far as I can tell, at least in this file and context, the warning appears harmless, as the "create a GNU C statement expression from two different macros" is very much intentional, based on the presence of PERL_USE_GCC_BRACE_GROUPS. The warning is fixed in upstream Perl by just avoiding creating GNU C statement expressions using STMT_START and STMT_END:
https://github.com/Perl/perl5/issues/18780 https://github.com/Perl/perl5/pull/18984
If I am reading the source code correctly, an alternative to disabling the warning would be specifying -DPERL_GCC_BRACE_GROUPS_FORBIDDEN but it seems like that might end up impacting more than just this site, according to the issue discussion above. </quote>
Based-on-a-patch-by: Sedat Dilek sedat.dilek@gmail.com Tested-by: Sedat Dilek sedat.dilek@gmail.com # Debian/Selfmade LLVM-14 (x86-64) Cc: Adrian Hunter adrian.hunter@intel.com Cc: Fangrui Song maskray@google.com Cc: Florian Fainelli f.fainelli@gmail.com Cc: Ian Rogers irogers@google.com Cc: Jiri Olsa jolsa@kernel.org Cc: John Keeping john@metanate.com Cc: Leo Yan leo.yan@linaro.org Cc: Michael Petlan mpetlan@redhat.com Cc: Namhyung Kim namhyung@kernel.org Cc: Nathan Chancellor nathan@kernel.org Cc: Nick Desaulniers ndesaulniers@google.com Link: http://lore.kernel.org/lkml/YkxWcYzph5pC1EK8@kernel.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/build/feature/Makefile | 7 +++++++ tools/perf/Makefile.config | 3 +++ 2 files changed, 10 insertions(+)
--- a/tools/build/feature/Makefile +++ b/tools/build/feature/Makefile @@ -216,6 +216,13 @@ PERL_EMBED_LIBADD = $(call grep-libs,$(P PERL_EMBED_CCOPTS = `perl -MExtUtils::Embed -e ccopts 2>/dev/null` FLAGS_PERL_EMBED=$(PERL_EMBED_CCOPTS) $(PERL_EMBED_LDOPTS)
+ifeq ($(CC_NO_CLANG), 0) + PERL_EMBED_LDOPTS := $(filter-out -specs=%,$(PERL_EMBED_LDOPTS)) + PERL_EMBED_CCOPTS := $(filter-out -flto=auto -ffat-lto-objects, $(PERL_EMBED_CCOPTS)) + PERL_EMBED_CCOPTS := $(filter-out -specs=%,$(PERL_EMBED_CCOPTS)) + FLAGS_PERL_EMBED += -Wno-compound-token-split-by-macro +endif + $(OUTPUT)test-libperl.bin: $(BUILD) $(FLAGS_PERL_EMBED)
--- a/tools/perf/Makefile.config +++ b/tools/perf/Makefile.config @@ -788,6 +788,9 @@ else LDFLAGS += $(PERL_EMBED_LDFLAGS) EXTLIBS += $(PERL_EMBED_LIBADD) CFLAGS += -DHAVE_LIBPERL_SUPPORT + ifeq ($(CC_NO_CLANG), 0) + CFLAGS += -Wno-compound-token-split-by-macro + endif $(call detected,CONFIG_LIBPERL) endif endif
From: Arnaldo Carvalho de Melo acme@redhat.com
commit 541f695cbcb6932c22638b06e0cbe1d56177e2e9 upstream.
Just like its done for ldopts and for both in tools/perf/Makefile.config.
Using `` to initialize PERL_EMBED_CCOPTS somehow precludes using:
$(filter-out SOMETHING_TO_FILTER,$(PERL_EMBED_CCOPTS))
And we need to do it to allow for building with versions of clang where some gcc options selected by distros are not available.
Tested-by: Sedat Dilek sedat.dilek@gmail.com # Debian/Selfmade LLVM-14 (x86-64) Cc: Adrian Hunter adrian.hunter@intel.com Cc: Fangrui Song maskray@google.com Cc: Florian Fainelli f.fainelli@gmail.com Cc: Ian Rogers irogers@google.com Cc: Jiri Olsa jolsa@kernel.org Cc: John Keeping john@metanate.com Cc: Leo Yan leo.yan@linaro.org Cc: Michael Petlan mpetlan@redhat.com Cc: Namhyung Kim namhyung@kernel.org Cc: Nathan Chancellor nathan@kernel.org Cc: Nick Desaulniers ndesaulniers@google.com Link: http://lore.kernel.org/lkml/YktYX2OnLtyobRYD@kernel.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/build/feature/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/build/feature/Makefile +++ b/tools/build/feature/Makefile @@ -213,7 +213,7 @@ strip-libs = $(filter-out -l%,$(1)) PERL_EMBED_LDOPTS = $(shell perl -MExtUtils::Embed -e ldopts 2>/dev/null) PERL_EMBED_LDFLAGS = $(call strip-libs,$(PERL_EMBED_LDOPTS)) PERL_EMBED_LIBADD = $(call grep-libs,$(PERL_EMBED_LDOPTS)) -PERL_EMBED_CCOPTS = `perl -MExtUtils::Embed -e ccopts 2>/dev/null` +PERL_EMBED_CCOPTS = $(shell perl -MExtUtils::Embed -e ccopts 2>/dev/null) FLAGS_PERL_EMBED=$(PERL_EMBED_CCOPTS) $(PERL_EMBED_LDOPTS)
ifeq ($(CC_NO_CLANG), 0)
From: Vinod Koul vkoul@kernel.org
commit d143f939a95696d38ff800ada14402fa50ebbd6c upstream.
This reverts commit 455896c53d5b ("dmaengine: shdma: Fix runtime PM imbalance on error") as the patch wrongly reduced the count on error and did not bail out. So drop the count by reverting the patch .
Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/sh/shdma-base.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
--- a/drivers/dma/sh/shdma-base.c +++ b/drivers/dma/sh/shdma-base.c @@ -115,10 +115,8 @@ static dma_cookie_t shdma_tx_submit(stru ret = pm_runtime_get(schan->dev);
spin_unlock_irq(&schan->chan_lock); - if (ret < 0) { + if (ret < 0) dev_err(schan->dev, "%s(): GET = %d\n", __func__, ret); - pm_runtime_put(schan->dev); - }
pm_runtime_barrier(schan->dev);
From: Paolo Bonzini pbonzini@redhat.com
commit 5593473a1e6c743764b08e3b6071cb43b5cfa6c4 upstream.
kvm_vcpu_release() will call kvm_dirty_ring_free(), freeing ring->dirty_gfns and setting it to NULL. Afterwards, it calls kvm_arch_vcpu_destroy().
However, if closing the file descriptor races with KVM_RUN in such away that vcpu->arch.st.preempted == 0, the following call stack leads to a NULL pointer dereference in kvm_dirty_run_push():
mark_page_dirty_in_slot+0x192/0x270 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3171 kvm_steal_time_set_preempted arch/x86/kvm/x86.c:4600 [inline] kvm_arch_vcpu_put+0x34e/0x5b0 arch/x86/kvm/x86.c:4618 vcpu_put+0x1b/0x70 arch/x86/kvm/../../../virt/kvm/kvm_main.c:211 vmx_free_vcpu+0xcb/0x130 arch/x86/kvm/vmx/vmx.c:6985 kvm_arch_vcpu_destroy+0x76/0x290 arch/x86/kvm/x86.c:11219 kvm_vcpu_destroy arch/x86/kvm/../../../virt/kvm/kvm_main.c:441 [inline]
The fix is to release the dirty page ring after kvm_arch_vcpu_destroy has run.
Reported-by: Qiuhao Li qiuhao@sysec.org Reported-by: Gaoning Pan pgn@zju.edu.cn Reported-by: Yongkang Jia kangel@zju.edu.cn Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- virt/kvm/kvm_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -431,8 +431,8 @@ static void kvm_vcpu_init(struct kvm_vcp
void kvm_vcpu_destroy(struct kvm_vcpu *vcpu) { - kvm_dirty_ring_free(&vcpu->dirty_ring); kvm_arch_vcpu_destroy(vcpu); + kvm_dirty_ring_free(&vcpu->dirty_ring);
/* * No need for rcu_read_lock as VCPU_RUN is the only place that changes
From: dann frazier dann.frazier@canonical.com
This reverts commit 9cc25e8529d567e08da98d11f092b21449763144 which is commit 64ea2d0e7263b67d8efc93fa1baace041ed36d1e upstream.
This patch was shown to introduce a regression:
# devlink dev param show pci/0000:24:00.0 name flow_steering_mode pci/0000:24:00.0: name flow_steering_mode type driver-specific values:
(flow steering mode description is missing beneath "values:")
# devlink dev param set pci/0000:24:00.0 name flow_steering_mode value smfs cmode runtime Segmentation fault (core dumped)
and also with upstream iproute # ./iproute2/devlink/devlink dev param set pci/0000:24:00.0 name flow_steering_mode value smfs cmode runtime Configuration mode not supported
Note: Instead of reverting, we could instead also backport commit cf530217408e ("devlink: Notify users when objects are accessible"). However, that makes changes to core devlink code that I'm not sure are suitable for a stable backport.
Cc: Leon Romanovsky leonro@nvidia.com Cc: David S. Miller davem@davemloft.net Cc: Sasha Levin sashal@kernel.org Signed-off-by: dann frazier dann.frazier@canonical.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx5/core/devlink.c | 12 ++++++++++-- drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 -- drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c | 2 -- 3 files changed, 10 insertions(+), 6 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c @@ -793,11 +793,14 @@ int mlx5_devlink_register(struct devlink { int err;
- err = devlink_params_register(devlink, mlx5_devlink_params, - ARRAY_SIZE(mlx5_devlink_params)); + err = devlink_register(devlink); if (err) return err;
+ err = devlink_params_register(devlink, mlx5_devlink_params, + ARRAY_SIZE(mlx5_devlink_params)); + if (err) + goto params_reg_err; mlx5_devlink_set_params_init_values(devlink);
err = mlx5_devlink_auxdev_params_register(devlink); @@ -808,6 +811,7 @@ int mlx5_devlink_register(struct devlink if (err) goto traps_reg_err;
+ devlink_params_publish(devlink); return 0;
traps_reg_err: @@ -815,13 +819,17 @@ traps_reg_err: auxdev_reg_err: devlink_params_unregister(devlink, mlx5_devlink_params, ARRAY_SIZE(mlx5_devlink_params)); +params_reg_err: + devlink_unregister(devlink); return err; }
void mlx5_devlink_unregister(struct devlink *devlink) { + devlink_params_unpublish(devlink); mlx5_devlink_traps_unregister(devlink); mlx5_devlink_auxdev_params_unregister(devlink); devlink_params_unregister(devlink, mlx5_devlink_params, ARRAY_SIZE(mlx5_devlink_params)); + devlink_unregister(devlink); } --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -1541,7 +1541,6 @@ static int probe_one(struct pci_dev *pde dev_err(&pdev->dev, "mlx5_crdump_enable failed with error code %d\n", err);
pci_save_state(pdev); - devlink_register(devlink); if (!mlx5_core_is_mp_slave(dev)) devlink_reload_enable(devlink); return 0; @@ -1564,7 +1563,6 @@ static void remove_one(struct pci_dev *p struct devlink *devlink = priv_to_devlink(dev);
devlink_reload_disable(devlink); - devlink_unregister(devlink); mlx5_crdump_disable(dev); mlx5_drain_health_wq(dev); mlx5_uninit_one(dev); --- a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c @@ -46,7 +46,6 @@ static int mlx5_sf_dev_probe(struct auxi mlx5_core_warn(mdev, "mlx5_init_one err=%d\n", err); goto init_one_err; } - devlink_register(devlink); devlink_reload_enable(devlink); return 0;
@@ -66,7 +65,6 @@ static void mlx5_sf_dev_remove(struct au
devlink = priv_to_devlink(sf_dev->mdev); devlink_reload_disable(devlink); - devlink_unregister(devlink); mlx5_uninit_one(sf_dev->mdev); iounmap(sf_dev->mdev->iseg); mlx5_mdev_uninit(sf_dev->mdev);
From: Kees Cook keescook@chromium.org
commit 69d0db01e210e07fe915e5da91b54a867cda040f upstream.
The object-size sanitizer is redundant to -Warray-bounds, and inappropriately performs its checks at run-time when all information needed for the evaluation is available at compile-time, making it quite difficult to use:
https://bugzilla.kernel.org/show_bug.cgi?id=214861
With -Warray-bounds almost enabled globally, it doesn't make sense to keep this around.
Link: https://lkml.kernel.org/r/20211203235346.110809-1-keescook@chromium.org Signed-off-by: Kees Cook keescook@chromium.org Reviewed-by: Marco Elver elver@google.com Cc: Masahiro Yamada masahiroy@kernel.org Cc: Michal Marek michal.lkml@markovi.net Cc: Nick Desaulniers ndesaulniers@google.com Cc: Nathan Chancellor nathan@kernel.org Cc: Andrey Ryabinin ryabinin.a.a@gmail.com Cc: "Peter Zijlstra (Intel)" peterz@infradead.org Cc: Stephen Rothwell sfr@canb.auug.org.au Cc: Arnd Bergmann arnd@arndb.de Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Cc: Tadeusz Struk tadeusz.struk@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- lib/Kconfig.ubsan | 13 ------------- lib/test_ubsan.c | 22 ---------------------- scripts/Makefile.ubsan | 1 - 3 files changed, 36 deletions(-)
--- a/lib/Kconfig.ubsan +++ b/lib/Kconfig.ubsan @@ -112,19 +112,6 @@ config UBSAN_UNREACHABLE This option enables -fsanitize=unreachable which checks for control flow reaching an expected-to-be-unreachable position.
-config UBSAN_OBJECT_SIZE - bool "Perform checking for accesses beyond the end of objects" - default UBSAN - # gcc hugely expands stack usage with -fsanitize=object-size - # https://lore.kernel.org/lkml/CAHk-=wjPasyJrDuwDnpHJS2TuQfExwe=px-SzLeN8GFMAQ... - depends on !CC_IS_GCC - depends on $(cc-option,-fsanitize=object-size) - help - This option enables -fsanitize=object-size which checks for accesses - beyond the end of objects where the optimizer can determine both the - object being operated on and its size, usually seen with bad downcasts, - or access to struct members from NULL pointers. - config UBSAN_BOOL bool "Perform checking for non-boolean values used as boolean" default UBSAN --- a/lib/test_ubsan.c +++ b/lib/test_ubsan.c @@ -79,15 +79,6 @@ static void test_ubsan_load_invalid_valu eval2 = eval; }
-static void test_ubsan_null_ptr_deref(void) -{ - volatile int *ptr = NULL; - int val; - - UBSAN_TEST(CONFIG_UBSAN_OBJECT_SIZE); - val = *ptr; -} - static void test_ubsan_misaligned_access(void) { volatile char arr[5] __aligned(4) = {1, 2, 3, 4, 5}; @@ -98,29 +89,16 @@ static void test_ubsan_misaligned_access *ptr = val; }
-static void test_ubsan_object_size_mismatch(void) -{ - /* "((aligned(8)))" helps this not into be misaligned for ptr-access. */ - volatile int val __aligned(8) = 4; - volatile long long *ptr, val2; - - UBSAN_TEST(CONFIG_UBSAN_OBJECT_SIZE); - ptr = (long long *)&val; - val2 = *ptr; -} - static const test_ubsan_fp test_ubsan_array[] = { test_ubsan_shift_out_of_bounds, test_ubsan_out_of_bounds, test_ubsan_load_invalid_value, test_ubsan_misaligned_access, - test_ubsan_object_size_mismatch, };
/* Excluded because they Oops the module. */ static const test_ubsan_fp skip_ubsan_array[] = { test_ubsan_divrem_overflow, - test_ubsan_null_ptr_deref, };
static int __init test_ubsan_init(void) --- a/scripts/Makefile.ubsan +++ b/scripts/Makefile.ubsan @@ -8,7 +8,6 @@ ubsan-cflags-$(CONFIG_UBSAN_LOCAL_BOUNDS ubsan-cflags-$(CONFIG_UBSAN_SHIFT) += -fsanitize=shift ubsan-cflags-$(CONFIG_UBSAN_DIV_ZERO) += -fsanitize=integer-divide-by-zero ubsan-cflags-$(CONFIG_UBSAN_UNREACHABLE) += -fsanitize=unreachable -ubsan-cflags-$(CONFIG_UBSAN_OBJECT_SIZE) += -fsanitize=object-size ubsan-cflags-$(CONFIG_UBSAN_BOOL) += -fsanitize=bool ubsan-cflags-$(CONFIG_UBSAN_ENUM) += -fsanitize=enum ubsan-cflags-$(CONFIG_UBSAN_TRAP) += -fsanitize-undefined-trap-on-error
From: Tejun Heo tj@kernel.org
commit b09c2baa56347ae65795350dfcc633dedb1c2970 upstream.
0644 is an odd perm to create a cgroup which is a directory. Use the regular 0755 instead. This is necessary for euid switching test case.
Reviewed-by: Michal Koutný mkoutny@suse.com Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Ovidiu Panait ovidiu.panait@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/cgroup/cgroup_util.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/testing/selftests/cgroup/cgroup_util.c +++ b/tools/testing/selftests/cgroup/cgroup_util.c @@ -221,7 +221,7 @@ int cg_find_unified_root(char *root, siz
int cg_create(const char *cgroup) { - return mkdir(cgroup, 0644); + return mkdir(cgroup, 0755); }
int cg_wait_for_proc_count(const char *cgroup, int count)
From: Tejun Heo tj@kernel.org
commit 613e040e4dc285367bff0f8f75ea59839bc10947 upstream.
When a task is writing to an fd opened by a different task, the perm check should use the credentials of the latter task. Add a test for it.
Tested-by: Michal Koutný mkoutny@suse.com Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Ovidiu Panait ovidiu.panait@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/cgroup/test_core.c | 68 +++++++++++++++++++++++++++++ 1 file changed, 68 insertions(+)
--- a/tools/testing/selftests/cgroup/test_core.c +++ b/tools/testing/selftests/cgroup/test_core.c @@ -674,6 +674,73 @@ cleanup: return ret; }
+/* + * cgroup migration permission check should be performed based on the + * credentials at the time of open instead of write. + */ +static int test_cgcore_lesser_euid_open(const char *root) +{ + const uid_t test_euid = 65534; /* usually nobody, any !root is fine */ + int ret = KSFT_FAIL; + char *cg_test_a = NULL, *cg_test_b = NULL; + char *cg_test_a_procs = NULL, *cg_test_b_procs = NULL; + int cg_test_b_procs_fd = -1; + uid_t saved_uid; + + cg_test_a = cg_name(root, "cg_test_a"); + cg_test_b = cg_name(root, "cg_test_b"); + + if (!cg_test_a || !cg_test_b) + goto cleanup; + + cg_test_a_procs = cg_name(cg_test_a, "cgroup.procs"); + cg_test_b_procs = cg_name(cg_test_b, "cgroup.procs"); + + if (!cg_test_a_procs || !cg_test_b_procs) + goto cleanup; + + if (cg_create(cg_test_a) || cg_create(cg_test_b)) + goto cleanup; + + if (cg_enter_current(cg_test_a)) + goto cleanup; + + if (chown(cg_test_a_procs, test_euid, -1) || + chown(cg_test_b_procs, test_euid, -1)) + goto cleanup; + + saved_uid = geteuid(); + if (seteuid(test_euid)) + goto cleanup; + + cg_test_b_procs_fd = open(cg_test_b_procs, O_RDWR); + + if (seteuid(saved_uid)) + goto cleanup; + + if (cg_test_b_procs_fd < 0) + goto cleanup; + + if (write(cg_test_b_procs_fd, "0", 1) >= 0 || errno != EACCES) + goto cleanup; + + ret = KSFT_PASS; + +cleanup: + cg_enter_current(root); + if (cg_test_b_procs_fd >= 0) + close(cg_test_b_procs_fd); + if (cg_test_b) + cg_destroy(cg_test_b); + if (cg_test_a) + cg_destroy(cg_test_a); + free(cg_test_b_procs); + free(cg_test_a_procs); + free(cg_test_b); + free(cg_test_a); + return ret; +} + #define T(x) { x, #x } struct corecg_test { int (*fn)(const char *root); @@ -689,6 +756,7 @@ struct corecg_test { T(test_cgcore_proc_migration), T(test_cgcore_thread_migration), T(test_cgcore_destroy), + T(test_cgcore_lesser_euid_open), }; #undef T
From: Tejun Heo tj@kernel.org
commit bf35a7879f1dfb0d050fe779168bcf25c7de66f5 upstream.
When a task is writing to an fd opened by a different task, the perm check should use the cgroup namespace of the latter task. Add a test for it.
Tested-by: Michal Koutný mkoutny@suse.com Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Ovidiu Panait ovidiu.panait@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/cgroup/test_core.c | 97 +++++++++++++++++++++++++++++ 1 file changed, 97 insertions(+)
--- a/tools/testing/selftests/cgroup/test_core.c +++ b/tools/testing/selftests/cgroup/test_core.c @@ -1,11 +1,14 @@ /* SPDX-License-Identifier: GPL-2.0 */
+#define _GNU_SOURCE #include <linux/limits.h> +#include <linux/sched.h> #include <sys/types.h> #include <sys/mman.h> #include <sys/wait.h> #include <unistd.h> #include <fcntl.h> +#include <sched.h> #include <stdio.h> #include <errno.h> #include <signal.h> @@ -741,6 +744,99 @@ cleanup: return ret; }
+struct lesser_ns_open_thread_arg { + const char *path; + int fd; + int err; +}; + +static int lesser_ns_open_thread_fn(void *arg) +{ + struct lesser_ns_open_thread_arg *targ = arg; + + targ->fd = open(targ->path, O_RDWR); + targ->err = errno; + return 0; +} + +/* + * cgroup migration permission check should be performed based on the cgroup + * namespace at the time of open instead of write. + */ +static int test_cgcore_lesser_ns_open(const char *root) +{ + static char stack[65536]; + const uid_t test_euid = 65534; /* usually nobody, any !root is fine */ + int ret = KSFT_FAIL; + char *cg_test_a = NULL, *cg_test_b = NULL; + char *cg_test_a_procs = NULL, *cg_test_b_procs = NULL; + int cg_test_b_procs_fd = -1; + struct lesser_ns_open_thread_arg targ = { .fd = -1 }; + pid_t pid; + int status; + + cg_test_a = cg_name(root, "cg_test_a"); + cg_test_b = cg_name(root, "cg_test_b"); + + if (!cg_test_a || !cg_test_b) + goto cleanup; + + cg_test_a_procs = cg_name(cg_test_a, "cgroup.procs"); + cg_test_b_procs = cg_name(cg_test_b, "cgroup.procs"); + + if (!cg_test_a_procs || !cg_test_b_procs) + goto cleanup; + + if (cg_create(cg_test_a) || cg_create(cg_test_b)) + goto cleanup; + + if (cg_enter_current(cg_test_b)) + goto cleanup; + + if (chown(cg_test_a_procs, test_euid, -1) || + chown(cg_test_b_procs, test_euid, -1)) + goto cleanup; + + targ.path = cg_test_b_procs; + pid = clone(lesser_ns_open_thread_fn, stack + sizeof(stack), + CLONE_NEWCGROUP | CLONE_FILES | CLONE_VM | SIGCHLD, + &targ); + if (pid < 0) + goto cleanup; + + if (waitpid(pid, &status, 0) < 0) + goto cleanup; + + if (!WIFEXITED(status)) + goto cleanup; + + cg_test_b_procs_fd = targ.fd; + if (cg_test_b_procs_fd < 0) + goto cleanup; + + if (cg_enter_current(cg_test_a)) + goto cleanup; + + if ((status = write(cg_test_b_procs_fd, "0", 1)) >= 0 || errno != ENOENT) + goto cleanup; + + ret = KSFT_PASS; + +cleanup: + cg_enter_current(root); + if (cg_test_b_procs_fd >= 0) + close(cg_test_b_procs_fd); + if (cg_test_b) + cg_destroy(cg_test_b); + if (cg_test_a) + cg_destroy(cg_test_a); + free(cg_test_b_procs); + free(cg_test_a_procs); + free(cg_test_b); + free(cg_test_a); + return ret; +} + #define T(x) { x, #x } struct corecg_test { int (*fn)(const char *root); @@ -757,6 +853,7 @@ struct corecg_test { T(test_cgcore_thread_migration), T(test_cgcore_destroy), T(test_cgcore_lesser_euid_open), + T(test_cgcore_lesser_ns_open), }; #undef T
From: Peter Xu peterx@redhat.com
commit 5abfd71d936a8aefd9f9ccd299dea7a164a5d455 upstream.
Patch series "mm: Rework zap ptes on swap entries", v5.
Patch 1 should fix a long standing bug for zap_pte_range() on zap_details usage. The risk is we could have some swap entries skipped while we should have zapped them.
Migration entries are not the major concern because file backed memory always zap in the pattern that "first time without page lock, then re-zap with page lock" hence the 2nd zap will always make sure all migration entries are already recovered.
However there can be issues with real swap entries got skipped errornoously. There's a reproducer provided in commit message of patch 1 for that.
Patch 2-4 are cleanups that are based on patch 1. After the whole patchset applied, we should have a very clean view of zap_pte_range().
Only patch 1 needs to be backported to stable if necessary.
This patch (of 4):
The "details" pointer shouldn't be the token to decide whether we should skip swap entries.
For example, when the callers specified details->zap_mapping==NULL, it means the user wants to zap all the pages (including COWed pages), then we need to look into swap entries because there can be private COWed pages that was swapped out.
Skipping some swap entries when details is non-NULL may lead to wrongly leaving some of the swap entries while we should have zapped them.
A reproducer of the problem:
===8<=== #define _GNU_SOURCE /* See feature_test_macros(7) */ #include <stdio.h> #include <assert.h> #include <unistd.h> #include <sys/mman.h> #include <sys/types.h>
int page_size; int shmem_fd; char *buffer;
void main(void) { int ret; char val;
page_size = getpagesize(); shmem_fd = memfd_create("test", 0); assert(shmem_fd >= 0);
ret = ftruncate(shmem_fd, page_size * 2); assert(ret == 0);
buffer = mmap(NULL, page_size * 2, PROT_READ | PROT_WRITE, MAP_PRIVATE, shmem_fd, 0); assert(buffer != MAP_FAILED);
/* Write private page, swap it out */ buffer[page_size] = 1; madvise(buffer, page_size * 2, MADV_PAGEOUT);
/* This should drop private buffer[page_size] already */ ret = ftruncate(shmem_fd, page_size); assert(ret == 0); /* Recover the size */ ret = ftruncate(shmem_fd, page_size * 2); assert(ret == 0);
/* Re-read the data, it should be all zero */ val = buffer[page_size]; if (val == 0) printf("Good\n"); else printf("BUG\n"); } ===8<===
We don't need to touch up the pmd path, because pmd never had a issue with swap entries. For example, shmem pmd migration will always be split into pte level, and same to swapping on anonymous.
Add another helper should_zap_cows() so that we can also check whether we should zap private mappings when there's no page pointer specified.
This patch drops that trick, so we handle swap ptes coherently. Meanwhile we should do the same check upon migration entry, hwpoison entry and genuine swap entries too.
To be explicit, we should still remember to keep the private entries if even_cows==false, and always zap them when even_cows==true.
The issue seems to exist starting from the initial commit of git.
[peterx@redhat.com: comment tweaks] Link: https://lkml.kernel.org/r/20220217060746.71256-2-peterx@redhat.com
Link: https://lkml.kernel.org/r/20220217060746.71256-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20220216094810.60572-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20220216094810.60572-2-peterx@redhat.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Peter Xu peterx@redhat.com Reviewed-by: John Hubbard jhubbard@nvidia.com Cc: David Hildenbrand david@redhat.com Cc: Hugh Dickins hughd@google.com Cc: Alistair Popple apopple@nvidia.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: "Kirill A . Shutemov" kirill@shutemov.name Cc: Matthew Wilcox willy@infradead.org Cc: Vlastimil Babka vbabka@suse.cz Cc: Yang Shi shy828301@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/memory.c | 25 +++++++++++++++++++------ 1 file changed, 19 insertions(+), 6 deletions(-)
--- a/mm/memory.c +++ b/mm/memory.c @@ -1301,6 +1301,17 @@ copy_page_range(struct vm_area_struct *d return ret; }
+/* Whether we should zap all COWed (private) pages too */ +static inline bool should_zap_cows(struct zap_details *details) +{ + /* By default, zap all pages */ + if (!details) + return true; + + /* Or, we zap COWed pages only if the caller wants to */ + return !details->check_mapping; +} + static unsigned long zap_pte_range(struct mmu_gather *tlb, struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr, unsigned long end, @@ -1396,16 +1407,18 @@ again: continue; }
- /* If details->check_mapping, we leave swap entries. */ - if (unlikely(details)) - continue; - - if (!non_swap_entry(entry)) + if (!non_swap_entry(entry)) { + /* Genuine swap entry, hence a private anon page */ + if (!should_zap_cows(details)) + continue; rss[MM_SWAPENTS]--; - else if (is_migration_entry(entry)) { + } else if (is_migration_entry(entry)) { struct page *page;
page = pfn_swap_entry_to_page(entry); + if (details && details->check_mapping && + details->check_mapping != page_rmapping(page)) + continue; rss[mm_counter(page)]--; } if (unlikely(!free_swap_and_cache(entry)))
From: Andrea Parri (Microsoft) parri.andrea@gmail.com
commit eaa03d34535872d29004cb5cf77dc9dec1ba9a25 upstream.
Following the recommendation in Documentation/memory-barriers.txt for virtual machine guests.
Fixes: 8b6a877c060ed ("Drivers: hv: vmbus: Replace the per-CPU channel lists with a global array of channels") Signed-off-by: Andrea Parri (Microsoft) parri.andrea@gmail.com Link: https://lore.kernel.org/r/20220328154457.100872-1-parri.andrea@gmail.com Signed-off-by: Wei Liu wei.liu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hv/channel_mgmt.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/hv/channel_mgmt.c +++ b/drivers/hv/channel_mgmt.c @@ -380,7 +380,7 @@ void vmbus_channel_map_relid(struct vmbu * execute: * * (a) In the "normal (i.e., not resuming from hibernation)" path, - * the full barrier in smp_store_mb() guarantees that the store + * the full barrier in virt_store_mb() guarantees that the store * is propagated to all CPUs before the add_channel_work work * is queued. In turn, add_channel_work is queued before the * channel's ring buffer is allocated/initialized and the @@ -392,14 +392,14 @@ void vmbus_channel_map_relid(struct vmbu * recv_int_page before retrieving the channel pointer from the * array of channels. * - * (b) In the "resuming from hibernation" path, the smp_store_mb() + * (b) In the "resuming from hibernation" path, the virt_store_mb() * guarantees that the store is propagated to all CPUs before * the VMBus connection is marked as ready for the resume event * (cf. check_ready_for_resume_event()). The interrupt handler * of the VMBus driver and vmbus_chan_sched() can not run before * vmbus_bus_resume() has completed execution (cf. resume_noirq). */ - smp_store_mb( + virt_store_mb( vmbus_connection.channels[channel->offermsg.child_relid], channel); }
From: Vincent Mailhol mailhol.vincent@wanadoo.fr
commit 9ce02f0fc68326dd1f87a0a3a4c6ae7fdd39e6f6 upstream.
The macro __WARN_FLAGS() uses a local variable named "f". This being a common name, there is a risk of shadowing other variables.
For example, GCC would yield:
| In file included from ./include/linux/bug.h:5, | from ./include/linux/cpumask.h:14, | from ./arch/x86/include/asm/cpumask.h:5, | from ./arch/x86/include/asm/msr.h:11, | from ./arch/x86/include/asm/processor.h:22, | from ./arch/x86/include/asm/timex.h:5, | from ./include/linux/timex.h:65, | from ./include/linux/time32.h:13, | from ./include/linux/time.h:60, | from ./include/linux/stat.h:19, | from ./include/linux/module.h:13, | from virt/lib/irqbypass.mod.c:1: | ./include/linux/rcupdate.h: In function 'rcu_head_after_call_rcu': | ./arch/x86/include/asm/bug.h:80:21: warning: declaration of 'f' shadows a parameter [-Wshadow] | 80 | __auto_type f = BUGFLAG_WARNING|(flags); \ | | ^ | ./include/asm-generic/bug.h:106:17: note: in expansion of macro '__WARN_FLAGS' | 106 | __WARN_FLAGS(BUGFLAG_ONCE | \ | | ^~~~~~~~~~~~ | ./include/linux/rcupdate.h:1007:9: note: in expansion of macro 'WARN_ON_ONCE' | 1007 | WARN_ON_ONCE(func != (rcu_callback_t)~0L); | | ^~~~~~~~~~~~ | In file included from ./include/linux/rbtree.h:24, | from ./include/linux/mm_types.h:11, | from ./include/linux/buildid.h:5, | from ./include/linux/module.h:14, | from virt/lib/irqbypass.mod.c:1: | ./include/linux/rcupdate.h:1001:62: note: shadowed declaration is here | 1001 | rcu_head_after_call_rcu(struct rcu_head *rhp, rcu_callback_t f) | | ~~~~~~~~~~~~~~~^
For reference, sparse also warns about it, c.f. [1].
This patch renames the variable from f to __flags (with two underscore prefixes as suggested in the Linux kernel coding style [2]) in order to prevent collisions.
[1] https://lore.kernel.org/all/CAFGhKbyifH1a+nAMCvWM88TK6fpNPdzFtUXPmRGnnQeePV+...
[2] Linux kernel coding style, section 12) Macros, Enums and RTL, paragraph 5) namespace collisions when defining local variables in macros resembling functions https://www.kernel.org/doc/html/latest/process/coding-style.html#macros-enum...
Fixes: bfb1a7c91fb7 ("x86/bug: Merge annotate_reachable() into_BUG_FLAGS() asm") Signed-off-by: Vincent Mailhol mailhol.vincent@wanadoo.fr Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Nick Desaulniers ndesaulniers@google.com Acked-by: Josh Poimboeuf jpoimboe@redhat.com Link: https://lkml.kernel.org/r/20220324023742.106546-1-mailhol.vincent@wanadoo.fr Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/bug.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/bug.h +++ b/arch/x86/include/asm/bug.h @@ -77,9 +77,9 @@ do { \ */ #define __WARN_FLAGS(flags) \ do { \ - __auto_type f = BUGFLAG_WARNING|(flags); \ + __auto_type __flags = BUGFLAG_WARNING|(flags); \ instrumentation_begin(); \ - _BUG_FLAGS(ASM_UD2, f, ASM_REACHABLE); \ + _BUG_FLAGS(ASM_UD2, __flags, ASM_REACHABLE); \ instrumentation_end(); \ } while (0)
From: Sebastian Andrzej Siewior bigeasy@linutronix.de
commit 386ef214c3c6ab111d05e1790e79475363abaa05 upstream.
try_steal_cookie() looks at task_struct::cpus_mask to decide if the task could be moved to `this' CPU. It ignores that the task might be in a migration disabled section while not on the CPU. In this case the task must not be moved otherwise per-CPU assumption are broken.
Use is_cpu_allowed(), as suggested by Peter Zijlstra, to decide if the a task can be moved.
Fixes: d2dfa17bc7de6 ("sched: Trivial forced-newidle balancer") Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Link: https://lkml.kernel.org/r/YjNK9El+3fzGmswf@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/sched/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5927,7 +5927,7 @@ static bool try_steal_cookie(int this, i if (p == src->core_pick || p == src->curr) goto next;
- if (!cpumask_test_cpu(this, &p->cpus_mask)) + if (!is_cpu_allowed(p, this)) goto next;
if (p->core_occupation > dst->idle->core_occupation)
From: Peter Zijlstra peterz@infradead.org
commit 1cd5f059d956e6f614ba6666ecdbcf95db05d5f5 upstream.
Paolo reported that the instruction sequence that is used to replace:
call __static_call_return0
namely:
66 66 48 31 c0 data16 data16 xor %rax,%rax
decodes to something else on i386, namely:
66 66 48 data16 dec %ax 31 c0 xor %eax,%eax
Which is a nonsensical sequence that happens to have the same outcome. *However* an important distinction is that it consists of 2 instructions which is a problem when the thing needs to be overwriten with a regular call instruction again.
As such, replace the instruction with something that decodes the same on both i386 and x86_64.
Fixes: 3f2a8fc4b15d ("static_call/x86: Add __static_call_return0()") Reported-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Link: https://lkml.kernel.org/r/20220318204419.GT8939@worktop.programming.kicks-as... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/static_call.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
--- a/arch/x86/kernel/static_call.c +++ b/arch/x86/kernel/static_call.c @@ -12,10 +12,9 @@ enum insn_type { };
/* - * data16 data16 xorq %rax, %rax - a single 5 byte instruction that clears %rax - * The REX.W cancels the effect of any data16. + * cs cs cs xorl %eax, %eax - a single 5 byte instruction that clears %[er]ax */ -static const u8 xor5rax[] = { 0x66, 0x66, 0x48, 0x31, 0xc0 }; +static const u8 xor5rax[] = { 0x2e, 0x2e, 0x2e, 0x31, 0xc0 };
static void __ref __static_call_transform(void *insn, enum insn_type type, void *func) {
From: Marc Zyngier maz@kernel.org
commit af27e41612ec7e5b4783f589b753a7c31a37aac8 upstream.
The way KVM drives GICv4.{0,1} is as follows: - vcpu_load() makes the VPE resident, instructing the RD to start scanning for interrupts - just before entering the guest, we check that the RD has finished scanning and that we can start running the vcpu - on preemption, we deschedule the VPE by making it invalid on the RD
However, we are preemptible between the first two steps. If it so happens *and* that the RD was still scanning, we nonetheless write to the GICR_VPENDBASER register while Dirty is set, and bad things happen (we're in UNPRED land).
This affects both the 4.0 and 4.1 implementations.
Make sure Dirty is cleared before performing the deschedule, meaning that its_clear_vpend_valid() becomes a sort of full VPE residency barrier.
Reported-by: Jingyi Wang wangjingyi11@huawei.com Tested-by: Nianyao Tang tangnianyao@huawei.com Signed-off-by: Marc Zyngier maz@kernel.org Fixes: 57e3cebd022f ("KVM: arm64: Delay the polling of the GICR_VPENDBASER.Dirty bit") Link: https://lore.kernel.org/r/4aae10ba-b39a-5f84-754b-69c2eb0a2c03@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/irqchip/irq-gic-v3-its.c | 28 +++++++++++++++++++--------- 1 file changed, 19 insertions(+), 9 deletions(-)
--- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -3007,18 +3007,12 @@ static int __init allocate_lpi_tables(vo return 0; }
-static u64 its_clear_vpend_valid(void __iomem *vlpi_base, u64 clr, u64 set) +static u64 read_vpend_dirty_clear(void __iomem *vlpi_base) { u32 count = 1000000; /* 1s! */ bool clean; u64 val;
- val = gicr_read_vpendbaser(vlpi_base + GICR_VPENDBASER); - val &= ~GICR_VPENDBASER_Valid; - val &= ~clr; - val |= set; - gicr_write_vpendbaser(val, vlpi_base + GICR_VPENDBASER); - do { val = gicr_read_vpendbaser(vlpi_base + GICR_VPENDBASER); clean = !(val & GICR_VPENDBASER_Dirty); @@ -3029,10 +3023,26 @@ static u64 its_clear_vpend_valid(void __ } } while (!clean && count);
- if (unlikely(val & GICR_VPENDBASER_Dirty)) { + if (unlikely(!clean)) pr_err_ratelimited("ITS virtual pending table not cleaning\n"); + + return val; +} + +static u64 its_clear_vpend_valid(void __iomem *vlpi_base, u64 clr, u64 set) +{ + u64 val; + + /* Make sure we wait until the RD is done with the initial scan */ + val = read_vpend_dirty_clear(vlpi_base); + val &= ~GICR_VPENDBASER_Valid; + val &= ~clr; + val |= set; + gicr_write_vpendbaser(val, vlpi_base + GICR_VPENDBASER); + + val = read_vpend_dirty_clear(vlpi_base); + if (unlikely(val & GICR_VPENDBASER_Dirty)) val |= GICR_VPENDBASER_PendingLast; - }
return val; }
From: Christophe Leroy christophe.leroy@csgroup.eu
commit af41d2866f7d75bbb38d487f6ec7770425d70e45 upstream.
Using conditional branches between two files is hasardous, they may get linked too far from each other.
arch/powerpc/kvm/book3s_64_entry.o:(.text+0x3ec): relocation truncated to fit: R_PPC64_REL14 (stub) against symbol `system_reset_common' defined in .text section in arch/powerpc/kernel/head_64.o
Reorganise the code to use non conditional branches.
Fixes: 89d35b239101 ("KVM: PPC: Book3S HV P9: Implement the rest of the P9 path in C") Signed-off-by: Christophe Leroy christophe.leroy@csgroup.eu [mpe: Avoid odd-looking bne ., use named local labels] Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/89cf27bf43ee07a0b2879b9e8e2f5cd6386a3645.164836633... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/kvm/book3s_64_entry.S | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
--- a/arch/powerpc/kvm/book3s_64_entry.S +++ b/arch/powerpc/kvm/book3s_64_entry.S @@ -407,10 +407,16 @@ END_FTR_SECTION_IFSET(CPU_FTR_DAWR1) */ ld r10,HSTATE_SCRATCH0(r13) cmpwi r10,BOOK3S_INTERRUPT_MACHINE_CHECK - beq machine_check_common + beq .Lcall_machine_check_common
cmpwi r10,BOOK3S_INTERRUPT_SYSTEM_RESET - beq system_reset_common + beq .Lcall_system_reset_common
b . + +.Lcall_machine_check_common: + b machine_check_common + +.Lcall_system_reset_common: + b system_reset_common #endif
From: Andre Przywara andre.przywara@arm.com
commit 544808f7e21cb9ccdb8f3aa7de594c05b1419061 upstream.
At the moment the GIC IRQ domain translation routine happily converts ACPI table GSI numbers below 16 to GIC SGIs (Software Generated Interrupts aka IPIs). On the Devicetree side we explicitly forbid this translation, actually the function will never return HWIRQs below 16 when using a DT based domain translation.
We expect SGIs to be handled in the first part of the function, and any further occurrence should be treated as a firmware bug, so add a check and print to report this explicitly and avoid lengthy debug sessions.
Fixes: 64b499d8df40 ("irqchip/gic-v3: Configure SGIs as standard interrupts") Signed-off-by: Andre Przywara andre.przywara@arm.com Signed-off-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20220404110842.2882446-1-andre.przywara@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/irqchip/irq-gic-v3.c | 6 ++++++ drivers/irqchip/irq-gic.c | 6 ++++++ 2 files changed, 12 insertions(+)
--- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -1466,6 +1466,12 @@ static int gic_irq_domain_translate(stru if(fwspec->param_count != 2) return -EINVAL;
+ if (fwspec->param[0] < 16) { + pr_err(FW_BUG "Illegal GSI%d translation request\n", + fwspec->param[0]); + return -EINVAL; + } + *hwirq = fwspec->param[0]; *type = fwspec->param[1];
--- a/drivers/irqchip/irq-gic.c +++ b/drivers/irqchip/irq-gic.c @@ -1085,6 +1085,12 @@ static int gic_irq_domain_translate(stru if(fwspec->param_count != 2) return -EINVAL;
+ if (fwspec->param[0] < 16) { + pr_err(FW_BUG "Illegal GSI%d translation request\n", + fwspec->param[0]); + return -EINVAL; + } + *hwirq = fwspec->param[0]; *type = fwspec->param[1];
From: Waiman Long longman@redhat.com
commit a431dbbc540532b7465eae4fc8b56a85a9fc7d17 upstream.
The gcc 12 compiler reports a "'mem_section' will never be NULL" warning on the following code:
static inline struct mem_section *__nr_to_section(unsigned long nr) { #ifdef CONFIG_SPARSEMEM_EXTREME if (!mem_section) return NULL; #endif if (!mem_section[SECTION_NR_TO_ROOT(nr)]) return NULL; :
It happens with CONFIG_SPARSEMEM_EXTREME off. The mem_section definition is
#ifdef CONFIG_SPARSEMEM_EXTREME extern struct mem_section **mem_section; #else extern struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT]; #endif
In the !CONFIG_SPARSEMEM_EXTREME case, mem_section is a static 2-dimensional array and so the check "!mem_section[SECTION_NR_TO_ROOT(nr)]" doesn't make sense.
Fix this warning by moving the "!mem_section[SECTION_NR_TO_ROOT(nr)]" check up inside the CONFIG_SPARSEMEM_EXTREME block and adding an explicit NR_SECTION_ROOTS check to make sure that there is no out-of-bound array access.
Link: https://lkml.kernel.org/r/20220331180246.2746210-1-longman@redhat.com Fixes: 3e347261a80b ("sparsemem extreme implementation") Signed-off-by: Waiman Long longman@redhat.com Reported-by: Justin Forbes jforbes@redhat.com Cc: "Kirill A . Shutemov" kirill.shutemov@linux.intel.com Cc: Ingo Molnar mingo@kernel.org Cc: Rafael Aquini aquini@redhat.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/mmzone.h | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
--- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1351,13 +1351,16 @@ static inline unsigned long *section_to_
static inline struct mem_section *__nr_to_section(unsigned long nr) { + unsigned long root = SECTION_NR_TO_ROOT(nr); + + if (unlikely(root >= NR_SECTION_ROOTS)) + return NULL; + #ifdef CONFIG_SPARSEMEM_EXTREME - if (!mem_section) + if (!mem_section || !mem_section[root]) return NULL; #endif - if (!mem_section[SECTION_NR_TO_ROOT(nr)]) - return NULL; - return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK]; + return &mem_section[root][nr & SECTION_ROOT_MASK]; } extern size_t mem_section_usage_size(void);
From: Christophe Leroy christophe.leroy@csgroup.eu
commit 8fd4ddda2f49a66bf5dd3d0c01966c4b1971308b upstream.
System.map shows that vmlinux contains several instances of __static_call_return0():
c0004fc0 t __static_call_return0 c0011518 t __static_call_return0 c00d8160 t __static_call_return0
arch_static_call_transform() uses the middle one to check whether we are setting a call to __static_call_return0 or not:
c0011520 <arch_static_call_transform>: c0011520: 3d 20 c0 01 lis r9,-16383 <== r9 = 0xc001 << 16 c0011524: 39 29 15 18 addi r9,r9,5400 <== r9 += 0x1518 c0011528: 7c 05 48 00 cmpw r5,r9 <== r9 has value 0xc0011518 here
So if static_call_update() is called with one of the other instances of __static_call_return0(), arch_static_call_transform() won't recognise it.
In order to work properly, global single instance of __static_call_return0() is required.
Fixes: 3f2a8fc4b15d ("static_call/x86: Add __static_call_return0()") Signed-off-by: Christophe Leroy christophe.leroy@csgroup.eu Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Acked-by: Josh Poimboeuf jpoimboe@redhat.com Link: https://lkml.kernel.org/r/30821468a0e7d28251954b578e5051dc09300d04.164725849... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/static_call.h | 5 kernel/Makefile | 3 kernel/static_call.c | 542 ------------------------------------------- kernel/static_call_inline.c | 543 ++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 547 insertions(+), 546 deletions(-) create mode 100644 kernel/static_call_inline.c
--- a/include/linux/static_call.h +++ b/include/linux/static_call.h @@ -248,10 +248,7 @@ static inline int static_call_text_reser return 0; }
-static inline long __static_call_return0(void) -{ - return 0; -} +extern long __static_call_return0(void);
#define EXPORT_STATIC_CALL(name) \ EXPORT_SYMBOL(STATIC_CALL_KEY(name)); \ --- a/kernel/Makefile +++ b/kernel/Makefile @@ -113,7 +113,8 @@ obj-$(CONFIG_CPU_PM) += cpu_pm.o obj-$(CONFIG_BPF) += bpf/ obj-$(CONFIG_KCSAN) += kcsan/ obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o -obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o +obj-$(CONFIG_HAVE_STATIC_CALL) += static_call.o +obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call_inline.o obj-$(CONFIG_CFI_CLANG) += cfi.o
obj-$(CONFIG_PERF_EVENTS) += events/ --- a/kernel/static_call.c +++ b/kernel/static_call.c @@ -1,548 +1,8 @@ // SPDX-License-Identifier: GPL-2.0 -#include <linux/init.h> #include <linux/static_call.h> -#include <linux/bug.h> -#include <linux/smp.h> -#include <linux/sort.h> -#include <linux/slab.h> -#include <linux/module.h> -#include <linux/cpu.h> -#include <linux/processor.h> -#include <asm/sections.h> - -extern struct static_call_site __start_static_call_sites[], - __stop_static_call_sites[]; -extern struct static_call_tramp_key __start_static_call_tramp_key[], - __stop_static_call_tramp_key[]; - -static bool static_call_initialized; - -/* mutex to protect key modules/sites */ -static DEFINE_MUTEX(static_call_mutex); - -static void static_call_lock(void) -{ - mutex_lock(&static_call_mutex); -} - -static void static_call_unlock(void) -{ - mutex_unlock(&static_call_mutex); -} - -static inline void *static_call_addr(struct static_call_site *site) -{ - return (void *)((long)site->addr + (long)&site->addr); -} - -static inline unsigned long __static_call_key(const struct static_call_site *site) -{ - return (long)site->key + (long)&site->key; -} - -static inline struct static_call_key *static_call_key(const struct static_call_site *site) -{ - return (void *)(__static_call_key(site) & ~STATIC_CALL_SITE_FLAGS); -} - -/* These assume the key is word-aligned. */ -static inline bool static_call_is_init(struct static_call_site *site) -{ - return __static_call_key(site) & STATIC_CALL_SITE_INIT; -} - -static inline bool static_call_is_tail(struct static_call_site *site) -{ - return __static_call_key(site) & STATIC_CALL_SITE_TAIL; -} - -static inline void static_call_set_init(struct static_call_site *site) -{ - site->key = (__static_call_key(site) | STATIC_CALL_SITE_INIT) - - (long)&site->key; -} - -static int static_call_site_cmp(const void *_a, const void *_b) -{ - const struct static_call_site *a = _a; - const struct static_call_site *b = _b; - const struct static_call_key *key_a = static_call_key(a); - const struct static_call_key *key_b = static_call_key(b); - - if (key_a < key_b) - return -1; - - if (key_a > key_b) - return 1; - - return 0; -} - -static void static_call_site_swap(void *_a, void *_b, int size) -{ - long delta = (unsigned long)_a - (unsigned long)_b; - struct static_call_site *a = _a; - struct static_call_site *b = _b; - struct static_call_site tmp = *a; - - a->addr = b->addr - delta; - a->key = b->key - delta; - - b->addr = tmp.addr + delta; - b->key = tmp.key + delta; -} - -static inline void static_call_sort_entries(struct static_call_site *start, - struct static_call_site *stop) -{ - sort(start, stop - start, sizeof(struct static_call_site), - static_call_site_cmp, static_call_site_swap); -} - -static inline bool static_call_key_has_mods(struct static_call_key *key) -{ - return !(key->type & 1); -} - -static inline struct static_call_mod *static_call_key_next(struct static_call_key *key) -{ - if (!static_call_key_has_mods(key)) - return NULL; - - return key->mods; -} - -static inline struct static_call_site *static_call_key_sites(struct static_call_key *key) -{ - if (static_call_key_has_mods(key)) - return NULL; - - return (struct static_call_site *)(key->type & ~1); -} - -void __static_call_update(struct static_call_key *key, void *tramp, void *func) -{ - struct static_call_site *site, *stop; - struct static_call_mod *site_mod, first; - - cpus_read_lock(); - static_call_lock(); - - if (key->func == func) - goto done; - - key->func = func; - - arch_static_call_transform(NULL, tramp, func, false); - - /* - * If uninitialized, we'll not update the callsites, but they still - * point to the trampoline and we just patched that. - */ - if (WARN_ON_ONCE(!static_call_initialized)) - goto done; - - first = (struct static_call_mod){ - .next = static_call_key_next(key), - .mod = NULL, - .sites = static_call_key_sites(key), - }; - - for (site_mod = &first; site_mod; site_mod = site_mod->next) { - bool init = system_state < SYSTEM_RUNNING; - struct module *mod = site_mod->mod; - - if (!site_mod->sites) { - /* - * This can happen if the static call key is defined in - * a module which doesn't use it. - * - * It also happens in the has_mods case, where the - * 'first' entry has no sites associated with it. - */ - continue; - } - - stop = __stop_static_call_sites; - - if (mod) { -#ifdef CONFIG_MODULES - stop = mod->static_call_sites + - mod->num_static_call_sites; - init = mod->state == MODULE_STATE_COMING; -#endif - } - - for (site = site_mod->sites; - site < stop && static_call_key(site) == key; site++) { - void *site_addr = static_call_addr(site); - - if (!init && static_call_is_init(site)) - continue; - - if (!kernel_text_address((unsigned long)site_addr)) { - /* - * This skips patching built-in __exit, which - * is part of init_section_contains() but is - * not part of kernel_text_address(). - * - * Skipping built-in __exit is fine since it - * will never be executed. - */ - WARN_ONCE(!static_call_is_init(site), - "can't patch static call site at %pS", - site_addr); - continue; - } - - arch_static_call_transform(site_addr, NULL, func, - static_call_is_tail(site)); - } - } - -done: - static_call_unlock(); - cpus_read_unlock(); -} -EXPORT_SYMBOL_GPL(__static_call_update); - -static int __static_call_init(struct module *mod, - struct static_call_site *start, - struct static_call_site *stop) -{ - struct static_call_site *site; - struct static_call_key *key, *prev_key = NULL; - struct static_call_mod *site_mod; - - if (start == stop) - return 0; - - static_call_sort_entries(start, stop); - - for (site = start; site < stop; site++) { - void *site_addr = static_call_addr(site); - - if ((mod && within_module_init((unsigned long)site_addr, mod)) || - (!mod && init_section_contains(site_addr, 1))) - static_call_set_init(site); - - key = static_call_key(site); - if (key != prev_key) { - prev_key = key; - - /* - * For vmlinux (!mod) avoid the allocation by storing - * the sites pointer in the key itself. Also see - * __static_call_update()'s @first. - * - * This allows architectures (eg. x86) to call - * static_call_init() before memory allocation works. - */ - if (!mod) { - key->sites = site; - key->type |= 1; - goto do_transform; - } - - site_mod = kzalloc(sizeof(*site_mod), GFP_KERNEL); - if (!site_mod) - return -ENOMEM; - - /* - * When the key has a direct sites pointer, extract - * that into an explicit struct static_call_mod, so we - * can have a list of modules. - */ - if (static_call_key_sites(key)) { - site_mod->mod = NULL; - site_mod->next = NULL; - site_mod->sites = static_call_key_sites(key); - - key->mods = site_mod; - - site_mod = kzalloc(sizeof(*site_mod), GFP_KERNEL); - if (!site_mod) - return -ENOMEM; - } - - site_mod->mod = mod; - site_mod->sites = site; - site_mod->next = static_call_key_next(key); - key->mods = site_mod; - } - -do_transform: - arch_static_call_transform(site_addr, NULL, key->func, - static_call_is_tail(site)); - } - - return 0; -} - -static int addr_conflict(struct static_call_site *site, void *start, void *end) -{ - unsigned long addr = (unsigned long)static_call_addr(site); - - if (addr <= (unsigned long)end && - addr + CALL_INSN_SIZE > (unsigned long)start) - return 1; - - return 0; -} - -static int __static_call_text_reserved(struct static_call_site *iter_start, - struct static_call_site *iter_stop, - void *start, void *end, bool init) -{ - struct static_call_site *iter = iter_start; - - while (iter < iter_stop) { - if (init || !static_call_is_init(iter)) { - if (addr_conflict(iter, start, end)) - return 1; - } - iter++; - } - - return 0; -} - -#ifdef CONFIG_MODULES - -static int __static_call_mod_text_reserved(void *start, void *end) -{ - struct module *mod; - int ret; - - preempt_disable(); - mod = __module_text_address((unsigned long)start); - WARN_ON_ONCE(__module_text_address((unsigned long)end) != mod); - if (!try_module_get(mod)) - mod = NULL; - preempt_enable(); - - if (!mod) - return 0; - - ret = __static_call_text_reserved(mod->static_call_sites, - mod->static_call_sites + mod->num_static_call_sites, - start, end, mod->state == MODULE_STATE_COMING); - - module_put(mod); - - return ret; -} - -static unsigned long tramp_key_lookup(unsigned long addr) -{ - struct static_call_tramp_key *start = __start_static_call_tramp_key; - struct static_call_tramp_key *stop = __stop_static_call_tramp_key; - struct static_call_tramp_key *tramp_key; - - for (tramp_key = start; tramp_key != stop; tramp_key++) { - unsigned long tramp; - - tramp = (long)tramp_key->tramp + (long)&tramp_key->tramp; - if (tramp == addr) - return (long)tramp_key->key + (long)&tramp_key->key; - } - - return 0; -} - -static int static_call_add_module(struct module *mod) -{ - struct static_call_site *start = mod->static_call_sites; - struct static_call_site *stop = start + mod->num_static_call_sites; - struct static_call_site *site; - - for (site = start; site != stop; site++) { - unsigned long s_key = __static_call_key(site); - unsigned long addr = s_key & ~STATIC_CALL_SITE_FLAGS; - unsigned long key; - - /* - * Is the key is exported, 'addr' points to the key, which - * means modules are allowed to call static_call_update() on - * it. - * - * Otherwise, the key isn't exported, and 'addr' points to the - * trampoline so we need to lookup the key. - * - * We go through this dance to prevent crazy modules from - * abusing sensitive static calls. - */ - if (!kernel_text_address(addr)) - continue; - - key = tramp_key_lookup(addr); - if (!key) { - pr_warn("Failed to fixup __raw_static_call() usage at: %ps\n", - static_call_addr(site)); - return -EINVAL; - } - - key |= s_key & STATIC_CALL_SITE_FLAGS; - site->key = key - (long)&site->key; - } - - return __static_call_init(mod, start, stop); -} - -static void static_call_del_module(struct module *mod) -{ - struct static_call_site *start = mod->static_call_sites; - struct static_call_site *stop = mod->static_call_sites + - mod->num_static_call_sites; - struct static_call_key *key, *prev_key = NULL; - struct static_call_mod *site_mod, **prev; - struct static_call_site *site; - - for (site = start; site < stop; site++) { - key = static_call_key(site); - if (key == prev_key) - continue; - - prev_key = key; - - for (prev = &key->mods, site_mod = key->mods; - site_mod && site_mod->mod != mod; - prev = &site_mod->next, site_mod = site_mod->next) - ; - - if (!site_mod) - continue; - - *prev = site_mod->next; - kfree(site_mod); - } -} - -static int static_call_module_notify(struct notifier_block *nb, - unsigned long val, void *data) -{ - struct module *mod = data; - int ret = 0; - - cpus_read_lock(); - static_call_lock(); - - switch (val) { - case MODULE_STATE_COMING: - ret = static_call_add_module(mod); - if (ret) { - WARN(1, "Failed to allocate memory for static calls"); - static_call_del_module(mod); - } - break; - case MODULE_STATE_GOING: - static_call_del_module(mod); - break; - } - - static_call_unlock(); - cpus_read_unlock(); - - return notifier_from_errno(ret); -} - -static struct notifier_block static_call_module_nb = { - .notifier_call = static_call_module_notify, -}; - -#else - -static inline int __static_call_mod_text_reserved(void *start, void *end) -{ - return 0; -} - -#endif /* CONFIG_MODULES */ - -int static_call_text_reserved(void *start, void *end) -{ - bool init = system_state < SYSTEM_RUNNING; - int ret = __static_call_text_reserved(__start_static_call_sites, - __stop_static_call_sites, start, end, init); - - if (ret) - return ret; - - return __static_call_mod_text_reserved(start, end); -} - -int __init static_call_init(void) -{ - int ret; - - if (static_call_initialized) - return 0; - - cpus_read_lock(); - static_call_lock(); - ret = __static_call_init(NULL, __start_static_call_sites, - __stop_static_call_sites); - static_call_unlock(); - cpus_read_unlock(); - - if (ret) { - pr_err("Failed to allocate memory for static_call!\n"); - BUG(); - } - - static_call_initialized = true; - -#ifdef CONFIG_MODULES - register_module_notifier(&static_call_module_nb); -#endif - return 0; -} -early_initcall(static_call_init);
long __static_call_return0(void) { return 0; } - -#ifdef CONFIG_STATIC_CALL_SELFTEST - -static int func_a(int x) -{ - return x+1; -} - -static int func_b(int x) -{ - return x+2; -} - -DEFINE_STATIC_CALL(sc_selftest, func_a); - -static struct static_call_data { - int (*func)(int); - int val; - int expect; -} static_call_data [] __initdata = { - { NULL, 2, 3 }, - { func_b, 2, 4 }, - { func_a, 2, 3 } -}; - -static int __init test_static_call_init(void) -{ - int i; - - for (i = 0; i < ARRAY_SIZE(static_call_data); i++ ) { - struct static_call_data *scd = &static_call_data[i]; - - if (scd->func) - static_call_update(sc_selftest, scd->func); - - WARN_ON(static_call(sc_selftest)(scd->val) != scd->expect); - } - - return 0; -} -early_initcall(test_static_call_init); - -#endif /* CONFIG_STATIC_CALL_SELFTEST */ +EXPORT_SYMBOL_GPL(__static_call_return0); --- /dev/null +++ b/kernel/static_call_inline.c @@ -0,0 +1,543 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <linux/init.h> +#include <linux/static_call.h> +#include <linux/bug.h> +#include <linux/smp.h> +#include <linux/sort.h> +#include <linux/slab.h> +#include <linux/module.h> +#include <linux/cpu.h> +#include <linux/processor.h> +#include <asm/sections.h> + +extern struct static_call_site __start_static_call_sites[], + __stop_static_call_sites[]; +extern struct static_call_tramp_key __start_static_call_tramp_key[], + __stop_static_call_tramp_key[]; + +static bool static_call_initialized; + +/* mutex to protect key modules/sites */ +static DEFINE_MUTEX(static_call_mutex); + +static void static_call_lock(void) +{ + mutex_lock(&static_call_mutex); +} + +static void static_call_unlock(void) +{ + mutex_unlock(&static_call_mutex); +} + +static inline void *static_call_addr(struct static_call_site *site) +{ + return (void *)((long)site->addr + (long)&site->addr); +} + +static inline unsigned long __static_call_key(const struct static_call_site *site) +{ + return (long)site->key + (long)&site->key; +} + +static inline struct static_call_key *static_call_key(const struct static_call_site *site) +{ + return (void *)(__static_call_key(site) & ~STATIC_CALL_SITE_FLAGS); +} + +/* These assume the key is word-aligned. */ +static inline bool static_call_is_init(struct static_call_site *site) +{ + return __static_call_key(site) & STATIC_CALL_SITE_INIT; +} + +static inline bool static_call_is_tail(struct static_call_site *site) +{ + return __static_call_key(site) & STATIC_CALL_SITE_TAIL; +} + +static inline void static_call_set_init(struct static_call_site *site) +{ + site->key = (__static_call_key(site) | STATIC_CALL_SITE_INIT) - + (long)&site->key; +} + +static int static_call_site_cmp(const void *_a, const void *_b) +{ + const struct static_call_site *a = _a; + const struct static_call_site *b = _b; + const struct static_call_key *key_a = static_call_key(a); + const struct static_call_key *key_b = static_call_key(b); + + if (key_a < key_b) + return -1; + + if (key_a > key_b) + return 1; + + return 0; +} + +static void static_call_site_swap(void *_a, void *_b, int size) +{ + long delta = (unsigned long)_a - (unsigned long)_b; + struct static_call_site *a = _a; + struct static_call_site *b = _b; + struct static_call_site tmp = *a; + + a->addr = b->addr - delta; + a->key = b->key - delta; + + b->addr = tmp.addr + delta; + b->key = tmp.key + delta; +} + +static inline void static_call_sort_entries(struct static_call_site *start, + struct static_call_site *stop) +{ + sort(start, stop - start, sizeof(struct static_call_site), + static_call_site_cmp, static_call_site_swap); +} + +static inline bool static_call_key_has_mods(struct static_call_key *key) +{ + return !(key->type & 1); +} + +static inline struct static_call_mod *static_call_key_next(struct static_call_key *key) +{ + if (!static_call_key_has_mods(key)) + return NULL; + + return key->mods; +} + +static inline struct static_call_site *static_call_key_sites(struct static_call_key *key) +{ + if (static_call_key_has_mods(key)) + return NULL; + + return (struct static_call_site *)(key->type & ~1); +} + +void __static_call_update(struct static_call_key *key, void *tramp, void *func) +{ + struct static_call_site *site, *stop; + struct static_call_mod *site_mod, first; + + cpus_read_lock(); + static_call_lock(); + + if (key->func == func) + goto done; + + key->func = func; + + arch_static_call_transform(NULL, tramp, func, false); + + /* + * If uninitialized, we'll not update the callsites, but they still + * point to the trampoline and we just patched that. + */ + if (WARN_ON_ONCE(!static_call_initialized)) + goto done; + + first = (struct static_call_mod){ + .next = static_call_key_next(key), + .mod = NULL, + .sites = static_call_key_sites(key), + }; + + for (site_mod = &first; site_mod; site_mod = site_mod->next) { + bool init = system_state < SYSTEM_RUNNING; + struct module *mod = site_mod->mod; + + if (!site_mod->sites) { + /* + * This can happen if the static call key is defined in + * a module which doesn't use it. + * + * It also happens in the has_mods case, where the + * 'first' entry has no sites associated with it. + */ + continue; + } + + stop = __stop_static_call_sites; + + if (mod) { +#ifdef CONFIG_MODULES + stop = mod->static_call_sites + + mod->num_static_call_sites; + init = mod->state == MODULE_STATE_COMING; +#endif + } + + for (site = site_mod->sites; + site < stop && static_call_key(site) == key; site++) { + void *site_addr = static_call_addr(site); + + if (!init && static_call_is_init(site)) + continue; + + if (!kernel_text_address((unsigned long)site_addr)) { + /* + * This skips patching built-in __exit, which + * is part of init_section_contains() but is + * not part of kernel_text_address(). + * + * Skipping built-in __exit is fine since it + * will never be executed. + */ + WARN_ONCE(!static_call_is_init(site), + "can't patch static call site at %pS", + site_addr); + continue; + } + + arch_static_call_transform(site_addr, NULL, func, + static_call_is_tail(site)); + } + } + +done: + static_call_unlock(); + cpus_read_unlock(); +} +EXPORT_SYMBOL_GPL(__static_call_update); + +static int __static_call_init(struct module *mod, + struct static_call_site *start, + struct static_call_site *stop) +{ + struct static_call_site *site; + struct static_call_key *key, *prev_key = NULL; + struct static_call_mod *site_mod; + + if (start == stop) + return 0; + + static_call_sort_entries(start, stop); + + for (site = start; site < stop; site++) { + void *site_addr = static_call_addr(site); + + if ((mod && within_module_init((unsigned long)site_addr, mod)) || + (!mod && init_section_contains(site_addr, 1))) + static_call_set_init(site); + + key = static_call_key(site); + if (key != prev_key) { + prev_key = key; + + /* + * For vmlinux (!mod) avoid the allocation by storing + * the sites pointer in the key itself. Also see + * __static_call_update()'s @first. + * + * This allows architectures (eg. x86) to call + * static_call_init() before memory allocation works. + */ + if (!mod) { + key->sites = site; + key->type |= 1; + goto do_transform; + } + + site_mod = kzalloc(sizeof(*site_mod), GFP_KERNEL); + if (!site_mod) + return -ENOMEM; + + /* + * When the key has a direct sites pointer, extract + * that into an explicit struct static_call_mod, so we + * can have a list of modules. + */ + if (static_call_key_sites(key)) { + site_mod->mod = NULL; + site_mod->next = NULL; + site_mod->sites = static_call_key_sites(key); + + key->mods = site_mod; + + site_mod = kzalloc(sizeof(*site_mod), GFP_KERNEL); + if (!site_mod) + return -ENOMEM; + } + + site_mod->mod = mod; + site_mod->sites = site; + site_mod->next = static_call_key_next(key); + key->mods = site_mod; + } + +do_transform: + arch_static_call_transform(site_addr, NULL, key->func, + static_call_is_tail(site)); + } + + return 0; +} + +static int addr_conflict(struct static_call_site *site, void *start, void *end) +{ + unsigned long addr = (unsigned long)static_call_addr(site); + + if (addr <= (unsigned long)end && + addr + CALL_INSN_SIZE > (unsigned long)start) + return 1; + + return 0; +} + +static int __static_call_text_reserved(struct static_call_site *iter_start, + struct static_call_site *iter_stop, + void *start, void *end, bool init) +{ + struct static_call_site *iter = iter_start; + + while (iter < iter_stop) { + if (init || !static_call_is_init(iter)) { + if (addr_conflict(iter, start, end)) + return 1; + } + iter++; + } + + return 0; +} + +#ifdef CONFIG_MODULES + +static int __static_call_mod_text_reserved(void *start, void *end) +{ + struct module *mod; + int ret; + + preempt_disable(); + mod = __module_text_address((unsigned long)start); + WARN_ON_ONCE(__module_text_address((unsigned long)end) != mod); + if (!try_module_get(mod)) + mod = NULL; + preempt_enable(); + + if (!mod) + return 0; + + ret = __static_call_text_reserved(mod->static_call_sites, + mod->static_call_sites + mod->num_static_call_sites, + start, end, mod->state == MODULE_STATE_COMING); + + module_put(mod); + + return ret; +} + +static unsigned long tramp_key_lookup(unsigned long addr) +{ + struct static_call_tramp_key *start = __start_static_call_tramp_key; + struct static_call_tramp_key *stop = __stop_static_call_tramp_key; + struct static_call_tramp_key *tramp_key; + + for (tramp_key = start; tramp_key != stop; tramp_key++) { + unsigned long tramp; + + tramp = (long)tramp_key->tramp + (long)&tramp_key->tramp; + if (tramp == addr) + return (long)tramp_key->key + (long)&tramp_key->key; + } + + return 0; +} + +static int static_call_add_module(struct module *mod) +{ + struct static_call_site *start = mod->static_call_sites; + struct static_call_site *stop = start + mod->num_static_call_sites; + struct static_call_site *site; + + for (site = start; site != stop; site++) { + unsigned long s_key = __static_call_key(site); + unsigned long addr = s_key & ~STATIC_CALL_SITE_FLAGS; + unsigned long key; + + /* + * Is the key is exported, 'addr' points to the key, which + * means modules are allowed to call static_call_update() on + * it. + * + * Otherwise, the key isn't exported, and 'addr' points to the + * trampoline so we need to lookup the key. + * + * We go through this dance to prevent crazy modules from + * abusing sensitive static calls. + */ + if (!kernel_text_address(addr)) + continue; + + key = tramp_key_lookup(addr); + if (!key) { + pr_warn("Failed to fixup __raw_static_call() usage at: %ps\n", + static_call_addr(site)); + return -EINVAL; + } + + key |= s_key & STATIC_CALL_SITE_FLAGS; + site->key = key - (long)&site->key; + } + + return __static_call_init(mod, start, stop); +} + +static void static_call_del_module(struct module *mod) +{ + struct static_call_site *start = mod->static_call_sites; + struct static_call_site *stop = mod->static_call_sites + + mod->num_static_call_sites; + struct static_call_key *key, *prev_key = NULL; + struct static_call_mod *site_mod, **prev; + struct static_call_site *site; + + for (site = start; site < stop; site++) { + key = static_call_key(site); + if (key == prev_key) + continue; + + prev_key = key; + + for (prev = &key->mods, site_mod = key->mods; + site_mod && site_mod->mod != mod; + prev = &site_mod->next, site_mod = site_mod->next) + ; + + if (!site_mod) + continue; + + *prev = site_mod->next; + kfree(site_mod); + } +} + +static int static_call_module_notify(struct notifier_block *nb, + unsigned long val, void *data) +{ + struct module *mod = data; + int ret = 0; + + cpus_read_lock(); + static_call_lock(); + + switch (val) { + case MODULE_STATE_COMING: + ret = static_call_add_module(mod); + if (ret) { + WARN(1, "Failed to allocate memory for static calls"); + static_call_del_module(mod); + } + break; + case MODULE_STATE_GOING: + static_call_del_module(mod); + break; + } + + static_call_unlock(); + cpus_read_unlock(); + + return notifier_from_errno(ret); +} + +static struct notifier_block static_call_module_nb = { + .notifier_call = static_call_module_notify, +}; + +#else + +static inline int __static_call_mod_text_reserved(void *start, void *end) +{ + return 0; +} + +#endif /* CONFIG_MODULES */ + +int static_call_text_reserved(void *start, void *end) +{ + bool init = system_state < SYSTEM_RUNNING; + int ret = __static_call_text_reserved(__start_static_call_sites, + __stop_static_call_sites, start, end, init); + + if (ret) + return ret; + + return __static_call_mod_text_reserved(start, end); +} + +int __init static_call_init(void) +{ + int ret; + + if (static_call_initialized) + return 0; + + cpus_read_lock(); + static_call_lock(); + ret = __static_call_init(NULL, __start_static_call_sites, + __stop_static_call_sites); + static_call_unlock(); + cpus_read_unlock(); + + if (ret) { + pr_err("Failed to allocate memory for static_call!\n"); + BUG(); + } + + static_call_initialized = true; + +#ifdef CONFIG_MODULES + register_module_notifier(&static_call_module_nb); +#endif + return 0; +} +early_initcall(static_call_init); + +#ifdef CONFIG_STATIC_CALL_SELFTEST + +static int func_a(int x) +{ + return x+1; +} + +static int func_b(int x) +{ + return x+2; +} + +DEFINE_STATIC_CALL(sc_selftest, func_a); + +static struct static_call_data { + int (*func)(int); + int val; + int expect; +} static_call_data [] __initdata = { + { NULL, 2, 3 }, + { func_b, 2, 4 }, + { func_a, 2, 3 } +}; + +static int __init test_static_call_init(void) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(static_call_data); i++ ) { + struct static_call_data *scd = &static_call_data[i]; + + if (scd->func) + static_call_update(sc_selftest, scd->func); + + WARN_ON(static_call(sc_selftest)(scd->val) != scd->expect); + } + + return 0; +} +early_initcall(test_static_call_init); + +#endif /* CONFIG_STATIC_CALL_SELFTEST */
From: Kefeng Wang wangkefeng.wang@huawei.com
commit ffa0b64e3be58519ae472ea29a1a1ad681e32f48 upstream.
mpe: On 64-bit Book3E vmalloc space starts at 0x8000000000000000.
Because of the way __pa() works we have: __pa(0x8000000000000000) == 0, and therefore virt_to_pfn(0x8000000000000000) == 0, and therefore virt_addr_valid(0x8000000000000000) == true
Which is wrong, virt_addr_valid() should be false for vmalloc space. In fact all vmalloc addresses that alias with a valid PFN will return true from virt_addr_valid(). That can cause bugs with hardened usercopy as described below by Kefeng Wang:
When running ethtool eth0 on 64-bit Book3E, a BUG occurred:
usercopy: Kernel memory exposure attempt detected from SLUB object not in SLUB page?! (offset 0, size 1048)! kernel BUG at mm/usercopy.c:99 ... usercopy_abort+0x64/0xa0 (unreliable) __check_heap_object+0x168/0x190 __check_object_size+0x1a0/0x200 dev_ethtool+0x2494/0x2b20 dev_ioctl+0x5d0/0x770 sock_do_ioctl+0xf0/0x1d0 sock_ioctl+0x3ec/0x5a0 __se_sys_ioctl+0xf0/0x160 system_call_exception+0xfc/0x1f0 system_call_common+0xf8/0x200
The code shows below,
data = vzalloc(array_size(gstrings.len, ETH_GSTRING_LEN)); copy_to_user(useraddr, data, gstrings.len * ETH_GSTRING_LEN))
The data is alloced by vmalloc(), virt_addr_valid(ptr) will return true on 64-bit Book3E, which leads to the panic.
As commit 4dd7554a6456 ("powerpc/64: Add VIRTUAL_BUG_ON checks for __va and __pa addresses") does, make sure the virt addr above PAGE_OFFSET in the virt_addr_valid() for 64-bit, also add upper limit check to make sure the virt is below high_memory.
Meanwhile, for 32-bit PAGE_OFFSET is the virtual address of the start of lowmem, high_memory is the upper low virtual address, the check is suitable for 32-bit, this will fix the issue mentioned in commit 602946ec2f90 ("powerpc: Set max_mapnr correctly") too.
On 32-bit there is a similar problem with high memory, that was fixed in commit 602946ec2f90 ("powerpc: Set max_mapnr correctly"), but that commit breaks highmem and needs to be reverted.
We can't easily fix __pa(), we have code that relies on its current behaviour. So for now add extra checks to virt_addr_valid().
For 64-bit Book3S the extra checks are not necessary, the combination of virt_to_pfn() and pfn_valid() should yield the correct result, but they are harmless.
Signed-off-by: Kefeng Wang wangkefeng.wang@huawei.com Reviewed-by: Christophe Leroy christophe.leroy@csgroup.eu [mpe: Add additional change log detail] Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20220406145802.538416-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/include/asm/page.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/arch/powerpc/include/asm/page.h +++ b/arch/powerpc/include/asm/page.h @@ -132,7 +132,11 @@ static inline bool pfn_valid(unsigned lo #define virt_to_page(kaddr) pfn_to_page(virt_to_pfn(kaddr)) #define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT)
-#define virt_addr_valid(kaddr) pfn_valid(virt_to_pfn(kaddr)) +#define virt_addr_valid(vaddr) ({ \ + unsigned long _addr = (unsigned long)vaddr; \ + _addr >= PAGE_OFFSET && _addr < (unsigned long)high_memory && \ + pfn_valid(virt_to_pfn(_addr)); \ +})
/* * On Book-E parts we need __va to parse the device tree and we can't
On Tue, 12 Apr 2022 at 12:11, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.15.34 release. There are 277 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 14 Apr 2022 06:28:59 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.34-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
thanks,
greg k-h
On linux stable-rc 5.15 x86 and i386 builds failed due to below error [1] with config [2].
The finding is when kunit config is enabled the builds pass. CONFIG_KUNIT=y
But with CONFIG_KUNIT not set the builds failed.
x86_64-linux-gnu-ld: mm/kfence/core.o: in function `__kfence_alloc': core.c:(.text+0x901): undefined reference to `filter_irq_stacks' make[1]: *** [/builds/linux/Makefile:1183: vmlinux] Error 1
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
I see these three commits, I will bisect and get back to you
2f222c87ceb4 kfence: limit currently covered allocations when pool nearly full e25487912879 kfence: move saving stack trace of allocations into __kfence_alloc() d99355395380 kfence: count unexpectedly skipped allocations
-- Linaro LKFT https://lkft.linaro.org
[1] https://builds.tuxbuild.com/27h6Ztu4T35pY178Xg8EyAj7gIW/ [2] https://builds.tuxbuild.com/27h6Ztu4T35pY178Xg8EyAj7gIW/config
On Tue, 12 Apr 2022 at 16:16, Naresh Kamboju naresh.kamboju@linaro.org wrote:
On Tue, 12 Apr 2022 at 12:11, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.15.34 release. There are 277 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 14 Apr 2022 06:28:59 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.34-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
thanks,
greg k-h
On linux stable-rc 5.15 x86 and i386 builds failed due to below error [1] with config [2].
The finding is when kunit config is enabled the builds pass. CONFIG_KUNIT=y
But with CONFIG_KUNIT not set the builds failed.
x86_64-linux-gnu-ld: mm/kfence/core.o: in function `__kfence_alloc': core.c:(.text+0x901): undefined reference to `filter_irq_stacks' make[1]: *** [/builds/linux/Makefile:1183: vmlinux] Error 1
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
I see these three commits, I will bisect and get back to you
2f222c87ceb4 kfence: limit currently covered allocations when pool nearly full e25487912879 kfence: move saving stack trace of allocations into __kfence_alloc() d99355395380 kfence: count unexpectedly skipped allocations
My guess is that this commit is missing: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
Thanks, -- Marco
Hi Marco
On Tue, 12 Apr 2022 at 20:32, Marco Elver elver@google.com wrote:
On Tue, 12 Apr 2022 at 16:16, Naresh Kamboju naresh.kamboju@linaro.org wrote:
On Tue, 12 Apr 2022 at 12:11, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.15.34 release. There are 277 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 14 Apr 2022 06:28:59 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.34-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
thanks,
greg k-h
On linux stable-rc 5.15 x86 and i386 builds failed due to below error [1] with config [2].
The finding is when kunit config is enabled the builds pass. CONFIG_KUNIT=y
But with CONFIG_KUNIT not set the builds failed.
x86_64-linux-gnu-ld: mm/kfence/core.o: in function `__kfence_alloc': core.c:(.text+0x901): undefined reference to `filter_irq_stacks' make[1]: *** [/builds/linux/Makefile:1183: vmlinux] Error 1
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
I see these three commits, I will bisect and get back to you
2f222c87ceb4 kfence: limit currently covered allocations when pool nearly full e25487912879 kfence: move saving stack trace of allocations into __kfence_alloc() d99355395380 kfence: count unexpectedly skipped allocations
My guess is that this commit is missing:
This patch is missing Fixes: tag.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
For your information, I have reverted the below commit and build pass.
kfence: limit currently covered allocations when pool nearly full
[ Upstream commit 08f6b10630f284755087f58aa393402e15b92977 ]
- Naresh
Thanks, -- Marco
On Tue, 12 Apr 2022 at 17:44, Naresh Kamboju naresh.kamboju@linaro.org wrote:
Hi Marco
On Tue, 12 Apr 2022 at 20:32, Marco Elver elver@google.com wrote:
On Tue, 12 Apr 2022 at 16:16, Naresh Kamboju naresh.kamboju@linaro.org wrote:
On Tue, 12 Apr 2022 at 12:11, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.15.34 release. There are 277 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 14 Apr 2022 06:28:59 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.34-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
thanks,
greg k-h
On linux stable-rc 5.15 x86 and i386 builds failed due to below error [1] with config [2].
The finding is when kunit config is enabled the builds pass. CONFIG_KUNIT=y
But with CONFIG_KUNIT not set the builds failed.
x86_64-linux-gnu-ld: mm/kfence/core.o: in function `__kfence_alloc': core.c:(.text+0x901): undefined reference to `filter_irq_stacks' make[1]: *** [/builds/linux/Makefile:1183: vmlinux] Error 1
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
I see these three commits, I will bisect and get back to you
2f222c87ceb4 kfence: limit currently covered allocations when pool nearly full e25487912879 kfence: move saving stack trace of allocations into __kfence_alloc() d99355395380 kfence: count unexpectedly skipped allocations
My guess is that this commit is missing:
This patch is missing Fixes: tag.
No it's not - it was patch 1/N in this series: https://lore.kernel.org/all/20210923104803.2620285-1-elver@google.com/
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
On Tue, Apr 12, 2022 at 09:13:59PM +0530, Naresh Kamboju wrote:
Hi Marco
On Tue, 12 Apr 2022 at 20:32, Marco Elver elver@google.com wrote:
On Tue, 12 Apr 2022 at 16:16, Naresh Kamboju naresh.kamboju@linaro.org wrote:
On Tue, 12 Apr 2022 at 12:11, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.15.34 release. There are 277 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 14 Apr 2022 06:28:59 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.34-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
thanks,
greg k-h
On linux stable-rc 5.15 x86 and i386 builds failed due to below error [1] with config [2].
The finding is when kunit config is enabled the builds pass. CONFIG_KUNIT=y
But with CONFIG_KUNIT not set the builds failed.
x86_64-linux-gnu-ld: mm/kfence/core.o: in function `__kfence_alloc': core.c:(.text+0x901): undefined reference to `filter_irq_stacks' make[1]: *** [/builds/linux/Makefile:1183: vmlinux] Error 1
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
I see these three commits, I will bisect and get back to you
2f222c87ceb4 kfence: limit currently covered allocations when pool nearly full e25487912879 kfence: move saving stack trace of allocations into __kfence_alloc() d99355395380 kfence: count unexpectedly skipped allocations
My guess is that this commit is missing:
This patch is missing Fixes: tag.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
For your information, I have reverted the below commit and build pass.
kfence: limit currently covered allocations when pool nearly full
[ Upstream commit 08f6b10630f284755087f58aa393402e15b92977 ]
I've added the above commit, does that fix the issue?
Hm, I can test that here, let me try it...
greg k-h
On Tue, Apr 12, 2022 at 06:43:15PM +0200, Greg Kroah-Hartman wrote:
On Tue, Apr 12, 2022 at 09:13:59PM +0530, Naresh Kamboju wrote:
Hi Marco
On Tue, 12 Apr 2022 at 20:32, Marco Elver elver@google.com wrote:
On Tue, 12 Apr 2022 at 16:16, Naresh Kamboju naresh.kamboju@linaro.org wrote:
On Tue, 12 Apr 2022 at 12:11, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.15.34 release. There are 277 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 14 Apr 2022 06:28:59 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.34-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
thanks,
greg k-h
On linux stable-rc 5.15 x86 and i386 builds failed due to below error [1] with config [2].
The finding is when kunit config is enabled the builds pass. CONFIG_KUNIT=y
But with CONFIG_KUNIT not set the builds failed.
x86_64-linux-gnu-ld: mm/kfence/core.o: in function `__kfence_alloc': core.c:(.text+0x901): undefined reference to `filter_irq_stacks' make[1]: *** [/builds/linux/Makefile:1183: vmlinux] Error 1
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
I see these three commits, I will bisect and get back to you
2f222c87ceb4 kfence: limit currently covered allocations when pool nearly full e25487912879 kfence: move saving stack trace of allocations into __kfence_alloc() d99355395380 kfence: count unexpectedly skipped allocations
My guess is that this commit is missing:
This patch is missing Fixes: tag.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
For your information, I have reverted the below commit and build pass.
kfence: limit currently covered allocations when pool nearly full
[ Upstream commit 08f6b10630f284755087f58aa393402e15b92977 ]
I've added the above commit, does that fix the issue?
Hm, I can test that here, let me try it...
I can't duplicate the failure here with my config, let me try yours...
On Tue, Apr 12, 2022 at 07:23:10PM +0200, Greg Kroah-Hartman wrote:
On Tue, Apr 12, 2022 at 06:43:15PM +0200, Greg Kroah-Hartman wrote:
On Tue, Apr 12, 2022 at 09:13:59PM +0530, Naresh Kamboju wrote:
Hi Marco
On Tue, 12 Apr 2022 at 20:32, Marco Elver elver@google.com wrote:
On Tue, 12 Apr 2022 at 16:16, Naresh Kamboju naresh.kamboju@linaro.org wrote:
On Tue, 12 Apr 2022 at 12:11, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.15.34 release. There are 277 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 14 Apr 2022 06:28:59 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.34-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
thanks,
greg k-h
On linux stable-rc 5.15 x86 and i386 builds failed due to below error [1] with config [2].
The finding is when kunit config is enabled the builds pass. CONFIG_KUNIT=y
But with CONFIG_KUNIT not set the builds failed.
x86_64-linux-gnu-ld: mm/kfence/core.o: in function `__kfence_alloc': core.c:(.text+0x901): undefined reference to `filter_irq_stacks' make[1]: *** [/builds/linux/Makefile:1183: vmlinux] Error 1
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
I see these three commits, I will bisect and get back to you
2f222c87ceb4 kfence: limit currently covered allocations when pool nearly full e25487912879 kfence: move saving stack trace of allocations into __kfence_alloc() d99355395380 kfence: count unexpectedly skipped allocations
My guess is that this commit is missing:
This patch is missing Fixes: tag.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
For your information, I have reverted the below commit and build pass.
kfence: limit currently covered allocations when pool nearly full
[ Upstream commit 08f6b10630f284755087f58aa393402e15b92977 ]
I've added the above commit, does that fix the issue?
Hm, I can test that here, let me try it...
I can't duplicate the failure here with my config, let me try yours...
Yes, with your config before it fails, after I added the commit it works. I'll push out a -rc2 soon with that added.
thanks,
greg k-h
On 4/11/22 23:26, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.15.34 release. There are 277 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 14 Apr 2022 06:28:59 +0000. Anything received after that time might be too late.
Building powerpc:skiroot_defconfig ... failed -------------- Error log: arch/powerpc/lib/code-patching.c:119:19: error: conflicting types for 'unmap_patch_area'; have 'int(long unsigned int)' 119 | static inline int unmap_patch_area(unsigned long addr) | ^~~~~~~~~~~~~~~~ arch/powerpc/lib/code-patching.c:51:13: note: previous declaration of 'unmap_patch_area' with type 'void(long unsigned int)' 51 | static void unmap_patch_area(unsigned long addr); | ^~~~~~~~~~~~~~~~ arch/powerpc/lib/code-patching.c:51:13: error: 'unmap_patch_area' used but never defined [-Werror]
Commit 520c23a20890 ("powerpc/code-patching: Pre-map patch area") is the last patch in a series of patches applied to the file since v5.15. It is not tagged for stable, and it does not include a Fixes: tag. I am not sure if it makes sense to apply it on its own. Copying Michael.
Guenter
On Tue, Apr 12, 2022 at 08:39:59AM -0700, Guenter Roeck wrote:
On 4/11/22 23:26, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.15.34 release. There are 277 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 14 Apr 2022 06:28:59 +0000. Anything received after that time might be too late.
Building powerpc:skiroot_defconfig ... failed
Error log: arch/powerpc/lib/code-patching.c:119:19: error: conflicting types for 'unmap_patch_area'; have 'int(long unsigned int)' 119 | static inline int unmap_patch_area(unsigned long addr) | ^~~~~~~~~~~~~~~~ arch/powerpc/lib/code-patching.c:51:13: note: previous declaration of 'unmap_patch_area' with type 'void(long unsigned int)' 51 | static void unmap_patch_area(unsigned long addr); | ^~~~~~~~~~~~~~~~ arch/powerpc/lib/code-patching.c:51:13: error: 'unmap_patch_area' used but never defined [-Werror]
Commit 520c23a20890 ("powerpc/code-patching: Pre-map patch area") is the last patch in a series of patches applied to the file since v5.15. It is not tagged for stable, and it does not include a Fixes: tag. I am not sure if it makes sense to apply it on its own. Copying Michael.
Yeah, that's not right. I'll drop it from 5.10 and 5.15 (and older), for some reason Sasha skipped 5.16, probably because he saw this build error there.
thanks for the report, I'll push out a -rc2 soon.
greg k-h
linux-stable-mirror@lists.linaro.org