This is the start of the stable review cycle for the 5.11.12 release. There are 152 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 07 Apr 2021 08:50:09 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.11.12-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.11.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.11.12-rc1
David S. Miller davem@davemloft.net Revert "net: bonding: fix error return code of bond_neigh_init()"
Jens Axboe axboe@kernel.dk Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing"
Pavel Begunkov asml.silence@gmail.com io_uring: do ctx sqd ejection in a clear context
Ben Dooks ben.dooks@codethink.co.uk riscv: evaluate put_user() arg before enabling user access
Du Cheng ducheng2@gmail.com drivers: video: fbcon: fix NULL dereference in fbcon_cursor()
Ahmad Fatoum a.fatoum@pengutronix.de driver core: clear deferred probe reason on probe retry
Atul Gopinathan atulgopinathan@gmail.com staging: rtl8192e: Change state information from u16 to u8
Atul Gopinathan atulgopinathan@gmail.com staging: rtl8192e: Fix incorrect source in memcpy()
Roja Rani Yarubandi rojay@codeaurora.org soc: qcom-geni-se: Cleanup the code to remove proxy votes
Wesley Cheng wcheng@codeaurora.org usb: dwc3: gadget: Clear DEP flags after stop transfers in ep disable
Shawn Guo shawn.guo@linaro.org usb: dwc3: qcom: skip interconnect init for ACPI probe
Artur Petrosyan Arthur.Petrosyan@synopsys.com usb: dwc2: Prevent core suspend when port connection flag is 0
Artur Petrosyan Arthur.Petrosyan@synopsys.com usb: dwc2: Fix HPRT0.PrtSusp bit setting for HiKey 960 board.
Tong Zhang ztong0001@gmail.com usb: gadget: udc: amd5536udc_pci fix null-ptr-dereference
Johan Hovold johan@kernel.org USB: cdc-acm: fix use-after-free after probe failure
Johan Hovold johan@kernel.org USB: cdc-acm: fix double free on probe failure
Oliver Neukum oneukum@suse.com USB: cdc-acm: downgrade message to debug
Oliver Neukum oneukum@suse.com USB: cdc-acm: untangle a circular dependency between callback and softint
Oliver Neukum oneukum@suse.com cdc-acm: fix BREAK rx code path adding necessary calls
Chunfeng Yun chunfeng.yun@mediatek.com usb: xhci-mtk: fix broken streams issue on 0.96 xHCI
Tony Lindgren tony@atomide.com usb: musb: Fix suspend with devices connected for a64
Vincent Palatin vpalatin@chromium.org USB: quirks: ignore remote wake-up on Fibocom L850-GL LTE modem
Shuah Khan skhan@linuxfoundation.org usbip: vhci_hcd fix shift out-of-bounds in vhci_hub_control()
Zheyu Ma zheyuma97@gmail.com firewire: nosy: Fix a use-after-free bug in nosy_ioctl()
Aneesh Kumar K.V aneesh.kumar@linux.ibm.com powerpc/mm/book3s64: Use the correct storage key value when calling H_PROTECT
Lv Yunlong lyl2019@mail.ustc.edu.cn video: hyperv_fb: Fix a double free in hvfb_probe
Andy Shevchenko andriy.shevchenko@linux.intel.com usb: dwc3: pci: Enable dis_uX_susphy_quirk for Intel Merrifield
Nathan Lynch nathanl@linux.ibm.com powerpc/pseries/mobility: handle premature return from H_JOIN
Nathan Lynch nathanl@linux.ibm.com powerpc/pseries/mobility: use struct for shared state
Richard Gong richard.gong@intel.com firmware: stratix10-svc: reset COMMAND_RECONFIG_FLAG_PARTIAL to 0
Dinghao Liu dinghao.liu@zju.edu.cn extcon: Fix error handling in extcon_dev_register
Krzysztof Kozlowski krzk@kernel.org extcon: Add stubs for extcon_register_notifier_all() functions
Sean Christopherson seanjc@google.com KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping
Paolo Bonzini pbonzini@redhat.com KVM: x86: compile out TDP MMU on 32-bit systems
Ben Gardon bgardon@google.com KVM: x86/mmu: Use atomic ops to set SPTEs in TDP MMU map
Ben Gardon bgardon@google.com KVM: x86/mmu: Factor out functions to add/remove TDP MMU pages
Ben Gardon bgardon@google.com KVM: x86/mmu: Fix braces in kvm_recover_nx_lpages
Ben Gardon bgardon@google.com KVM: x86/mmu: Don't redundantly clear TDP MMU pt memory
Ben Gardon bgardon@google.com KVM: x86/mmu: Add comment on __tdp_mmu_set_spte
Sean Christopherson seanjc@google.com KVM: x86/mmu: Ensure TLBs are flushed when yielding during GFN range zap
Ben Gardon bgardon@google.com KVM: x86/mmu: Protect TDP MMU page table memory with RCU
Ben Gardon bgardon@google.com KVM: x86/mmu: Factor out handling of removed page tables
Ben Gardon bgardon@google.com KVM: x86/mmu: Add lockdep when setting a TDP MMU SPTE
Ben Gardon bgardon@google.com KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed
Ben Gardon bgardon@google.com KVM: x86/mmu: Ensure forward progress when yielding in TDP MMU iter
Ben Gardon bgardon@google.com KVM: x86/mmu: Rename goal_gfn to next_last_level_gfn
Ben Gardon bgardon@google.com KVM: x86/mmu: Merge flush and non-flush tdp_mmu_iter_cond_resched
Ben Gardon bgardon@google.com KVM: x86/mmu: change TDP MMU yield function returns to match cond_resched
Arnd Bergmann arnd@arndb.de pinctrl: qcom: fix unintentional string concatenation
Jonathan Marek jonathan@marek.ca pinctrl: qcom: lpass lpi: use default pullup/strength values
Rajendra Nayak rnayak@codeaurora.org pinctrl: qcom: sc7280: Fix SDC1_RCLK configurations
Rajendra Nayak rnayak@codeaurora.org pinctrl: qcom: sc7280: Fix SDC_QDSD_PINGROUP and UFS_RESET offsets
Wang Panzhenzhuan randy.wang@rock-chips.com pinctrl: rockchip: fix restore error in resume
Lars Povlsen lars.povlsen@microchip.com pinctrl: microchip-sgpio: Fix wrong register offset for IRQ trigger
Jason Gunthorpe jgg@ziepe.ca vfio/nvlink: Add missing SPAPR_TCE_IOMMU depends
Thierry Reding treding@nvidia.com drm/tegra: sor: Grab runtime PM reference across reset
Thierry Reding treding@nvidia.com drm/tegra: dc: Restore coupling of display controllers
Pan Bian bianpan2016@163.com drm/imx: fix memory leak when fails to init
Tetsuo Handa penguin-kernel@i-love.sakura.ne.jp reiserfs: update reiserfs_xattrs_initialized() condition
Xℹ Ruoyao xry111@mengyan1223.wang drm/amdgpu: check alignment on CPU page for bo map
Huacai Chen chenhuacai@kernel.org drm/amdgpu: Set a suitable dev_info.gart_page_size
Nirmoy Das nirmoy.das@amd.com drm/amdgpu: fix offset calculation in amdgpu_vm_bo_clear_mappings()
Alex Deucher alexander.deucher@amd.com drm/amdgpu/vangogh: don't check for dpm in is_dpm_running when in suspend
Evan Quan evan.quan@amd.com drm/amd/pm: no need to force MCLK to highest when no display connected
Qu Huang jinsdb@126.com drm/amdkfd: dqm fence memory corruption
Ilya Lipnitskiy ilya.lipnitskiy@gmail.com mm: fix race by making init_zero_pfn() early_initcall
Christian König christian.koenig@amd.com drm/ttm: make ttm_bo_unpin more defensive
Heiko Carstens hca@linux.ibm.com s390/vdso: fix tod_steering_delta type
Heiko Carstens hca@linux.ibm.com s390/vdso: copy tod_steering_delta value to vdso_data page
Steven Rostedt (VMware) rostedt@goodmis.org tracing: Fix stack trace event size
Adrian Hunter adrian.hunter@intel.com PM: runtime: Fix ordering in pm_runtime_get_suppliers()
Adrian Hunter adrian.hunter@intel.com PM: runtime: Fix race getting/putting suppliers at probe
Paolo Bonzini pbonzini@redhat.com KVM: SVM: ensure that EFER.SVME is set when running nested guest or on nested vmexit
Paolo Bonzini pbonzini@redhat.com KVM: SVM: load control fields from VMCB12 before checking them
Max Filippov jcmvbkbc@gmail.com xtensa: move coprocessor_flush to the .text section
Max Filippov jcmvbkbc@gmail.com xtensa: fix uaccess-related livelock in do_page_fault
Jeremy Szu jeremy.szu@canonical.com ALSA: hda/realtek: fix mute/micmute LEDs for HP 640 G8
Hui Wang hui.wang@canonical.com ALSA: hda/realtek: call alc_update_headset_mode() in hp_automute_hook
Hui Wang hui.wang@canonical.com ALSA: hda/realtek: fix a determine_headset_type issue for a Dell AIO
Takashi Iwai tiwai@suse.de ALSA: hda: Add missing sanity checks in PM prepare/complete callbacks
Takashi Iwai tiwai@suse.de ALSA: hda: Re-add dropped snd_poewr_change_state() calls
Ikjoon Jang ikjn@chromium.org ALSA: usb-audio: Apply sample rate quirk to Logitech Connect
Hans de Goede hdegoede@redhat.com ACPI: scan: Fix _STA getting called on devices with unmet dependencies
Vitaly Kuznetsov vkuznets@redhat.com ACPI: processor: Fix CPU0 wakeup in acpi_idle_play_dead()
Rafael J. Wysocki rafael.j.wysocki@intel.com ACPI: tables: x86: Reserve memory occupied by ACPI tables
Jesper Dangaard Brouer brouer@redhat.com bpf: Remove MTU check in __bpf_skb_max_len
Jisheng Zhang Jisheng.Zhang@synaptics.com net: 9p: advance iov on empty read
Tong Zhang ztong0001@gmail.com net: wan/lmc: unregister device when no matching device is found
Alex Elder elder@linaro.org net: ipa: fix register write command validation
Alex Elder elder@linaro.org net: ipa: use a separate pointer for adjusted GSI memory
Alex Elder elder@linaro.org net: ipa: remove two unused register definitions
Doug Brown doug@schmorgal.com appletalk: Fix skb allocation size in loopback case
Nathan Rossi nathan.rossi@digi.com net: ethernet: aquantia: Handle error cleanup of start on open
Shuah Khan skhan@linuxfoundation.org ath10k: hold RCU lock when calling ieee80211_find_sta_by_ifaddr()
Johannes Berg johannes.berg@intel.com iwlwifi: pcie: don't disable interrupts for reg_lock
Ido Schimmel idosch@nvidia.com netdevsim: dev: Initialize FIB module after debugfs
Guo-Feng Fan vincent_fann@realtek.com rtw88: coex: 8821c: correct antenna switch function
Wen Gong wgong@codeaurora.org ath11k: add ieee80211_unregister_hw to avoid kernel crash caused by NULL pointer
Luca Pesce luca.pesce@vimar.com brcmfmac: clear EAP/association status bits on linkdown events
Sasha Levin sashal@kernel.org can: tcan4x5x: fix max register value
Dan Carpenter dan.carpenter@oracle.com mptcp: fix bit MPTCP_PUSH_PENDING tests
Jia-Ju Bai baijiaju1990@gmail.com net: bonding: fix error return code of bond_neigh_init()
Paolo Abeni pabeni@redhat.com mptcp: fix race in release_cb
Oleksij Rempel linux@rempel-privat.de net: introduce CAN specific pointer in the struct net_device
Marc Kleine-Budde mkl@pengutronix.de can: dev: move driver related infrastructure into separate subdir
Florian Westphal fw@strlen.de mptcp: provide subflow aware release function
Paolo Abeni pabeni@redhat.com mptcp: fix DATA_FIN processing for orphaned sockets
Davide Caratti dcaratti@redhat.com flow_dissector: fix TTL and TOS dissection on IPv4 fragments
Paolo Abeni pabeni@redhat.com mptcp: add a missing retransmission timer scheduling
Paolo Abeni pabeni@redhat.com mptcp: init mptcp request socket earlier
Paolo Abeni pabeni@redhat.com mptcp: fix poll after shutdown
Paolo Abeni pabeni@redhat.com mptcp: deliver ssk errors to msk
Sasha Levin sashal@kernel.org net: mvpp2: fix interrupt mask/unmask skip condition
Stefan Metzmacher metze@samba.org io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL
zhangyi (F) yi.zhang@huawei.com ext4: do not iput inode under running transaction in ext4_rename()
Peter Zijlstra peterz@infradead.org static_call: Align static_call_is_init() patching condition
Tobias Klausmann tobias.klausmann@freenet.de nouveau: Skip unvailable ttm page entries
Josef Bacik josef@toxicpanda.com Revert "PM: ACPI: reboot: Use S5 for reboot"
Stefan Metzmacher metze@samba.org io_uring: imply MSG_NOSIGNAL for send[msg]()/recv[msg]() calls
Elad Grupi elad.grupi@dell.com nvmet-tcp: fix kmap leak when data digest in use
Waiman Long longman@redhat.com locking/ww_mutex: Fix acquire/release imbalance in ww_acquire_init()/ww_acquire_fini()
Waiman Long longman@redhat.com locking/ww_mutex: Simplify use_ww_ctx & ww_ctx handling
Manaf Meethalavalappu Pallikunhi manafm@codeaurora.org thermal/core: Add NULL pointer check before using cooling device stats
Bard Liao yung-chuan.liao@linux.intel.com ASoC: rt711: add snd_soc_component remove callback
Sameer Pujar spujar@nvidia.com ASoC: rt5659: Update MCLK rate in set_sysclk()
Tong Zhang ztong0001@gmail.com staging: comedi: cb_pcidas64: fix request_irq() warn
Tong Zhang ztong0001@gmail.com staging: comedi: cb_pcidas: fix request_irq() warn
Alexey Dobriyan adobriyan@gmail.com scsi: qla2xxx: Fix broken #endif placement
Lv Yunlong lyl2019@mail.ustc.edu.cn scsi: st: Fix a use after free in st_open()
Pavel Begunkov asml.silence@gmail.com io_uring: halt SQO submission on ctx exit
Pavel Begunkov asml.silence@gmail.com io_uring: fix ->flags races by linked timeouts
Laurent Vivier lvivier@redhat.com vhost: Fix vhost_vq_reset()
Jens Axboe axboe@kernel.dk kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing
Jiaxin Yu jiaxin.yu@mediatek.com ASoC: mediatek: mt8192: fix tdm out data is valid on rising edge
Olga Kornievskaia kolga@netapp.com NFSD: fix error handling in NFSv4.0 callbacks
Lucas Tanure tanureal@opensource.cirrus.com ASoC: cs42l42: Always wait at least 3ms after reset
Lucas Tanure tanureal@opensource.cirrus.com ASoC: cs42l42: Fix mixer volume control
Lucas Tanure tanureal@opensource.cirrus.com ASoC: cs42l42: Fix channel width support
Lucas Tanure tanureal@opensource.cirrus.com ASoC: cs42l42: Fix Bitclock polarity inversion
Jon Hunter jonathanh@nvidia.com ASoC: soc-core: Prevent warning if no DMI table is present
Hans de Goede hdegoede@redhat.com ASoC: es8316: Simplify adc_pga_gain_tlv table
Benjamin Rood benjaminjrood@gmail.com ASoC: sgtl5000: set DAP_AVC_CTRL register to correct default value on probe
Hans de Goede hdegoede@redhat.com ASoC: rt5651: Fix dac- and adc- vol-tlv values being off by a factor of 10
Hans de Goede hdegoede@redhat.com ASoC: rt5640: Fix dac- and adc- vol-tlv values being off by a factor of 10
Jack Yu jack.yu@realtek.com ASoC: rt1015: fix i2c communication error
Ritesh Harjani riteshh@linux.ibm.com iomap: Fix negative assignment to unsigned sis->pages in iomap_swapfile_activate
J. Bruce Fields bfields@redhat.com rpc: fix NULL dereference on kmalloc failure
Julian Braha julianbraha@gmail.com fs: nfsd: fix kconfig dependency warning for NFSD_V4
Zhaolong Zhang zhangzl2013@126.com ext4: fix bh ref count on error paths
Eric Whitney enwlinux@gmail.com ext4: shrink race window in ext4_should_retry_alloc()
Vivek Goyal vgoyal@redhat.com virtiofs: Fail dax mount if device does not support it
Pavel Tatashin pasha.tatashin@soleen.com arm64: mm: correct the inside linear map range during hotplug check
-------------
Diffstat:
Documentation/virt/kvm/locking.rst | 9 +- Makefile | 4 +- arch/arm64/mm/mmu.c | 20 +- arch/powerpc/platforms/pseries/lpar.c | 3 +- arch/powerpc/platforms/pseries/mobility.c | 48 ++- arch/riscv/include/asm/uaccess.h | 7 +- arch/s390/include/asm/vdso/data.h | 2 +- arch/s390/kernel/time.c | 1 + arch/x86/include/asm/kvm_host.h | 15 + arch/x86/include/asm/smp.h | 1 + arch/x86/kernel/acpi/boot.c | 25 +- arch/x86/kernel/setup.c | 8 +- arch/x86/kernel/smpboot.c | 2 +- arch/x86/kvm/Makefile | 3 +- arch/x86/kvm/mmu/mmu.c | 49 +-- arch/x86/kvm/mmu/mmu_internal.h | 5 + arch/x86/kvm/mmu/tdp_iter.c | 46 +-- arch/x86/kvm/mmu/tdp_iter.h | 21 +- arch/x86/kvm/mmu/tdp_mmu.c | 448 +++++++++++++++------ arch/x86/kvm/mmu/tdp_mmu.h | 32 +- arch/x86/kvm/svm/nested.c | 28 +- arch/xtensa/kernel/coprocessor.S | 64 +-- arch/xtensa/mm/fault.c | 5 +- drivers/acpi/processor_idle.c | 7 + drivers/acpi/scan.c | 12 +- drivers/acpi/tables.c | 42 +- drivers/base/dd.c | 3 + drivers/base/power/runtime.c | 10 +- drivers/extcon/extcon.c | 1 + drivers/firewire/nosy.c | 9 +- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 10 +- drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c | 2 +- .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 6 +- .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_vi.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 8 +- .../gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c | 3 +- drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c | 5 + drivers/gpu/drm/imx/imx-drm-core.c | 2 +- drivers/gpu/drm/nouveau/nouveau_bo.c | 8 + drivers/gpu/drm/tegra/dc.c | 20 +- drivers/gpu/drm/tegra/sor.c | 7 + drivers/net/can/Makefile | 7 +- drivers/net/can/dev/Makefile | 7 + drivers/net/can/{ => dev}/dev.c | 4 +- drivers/net/can/{ => dev}/rx-offload.c | 0 drivers/net/can/m_can/tcan4x5x.c | 2 +- drivers/net/can/slcan.c | 4 +- drivers/net/can/vcan.c | 2 +- drivers/net/can/vxcan.c | 6 +- drivers/net/ethernet/aquantia/atlantic/aq_main.c | 4 +- drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 4 +- drivers/net/ipa/gsi.c | 28 +- drivers/net/ipa/gsi.h | 5 +- drivers/net/ipa/gsi_reg.h | 31 +- drivers/net/ipa/ipa_cmd.c | 32 +- drivers/net/netdevsim/dev.c | 40 +- drivers/net/wan/lmc/lmc_main.c | 2 + drivers/net/wireless/ath/ath10k/wmi-tlv.c | 7 +- drivers/net/wireless/ath/ath11k/mac.c | 7 +- .../broadcom/brcm80211/brcmfmac/cfg80211.c | 7 +- drivers/net/wireless/intel/iwlwifi/pcie/trans.c | 11 +- drivers/net/wireless/intel/iwlwifi/pcie/tx-gen2.c | 5 +- drivers/net/wireless/intel/iwlwifi/pcie/tx.c | 22 +- drivers/net/wireless/realtek/rtw88/rtw8821c.c | 16 +- drivers/nvme/target/tcp.c | 2 +- drivers/pinctrl/pinctrl-microchip-sgpio.c | 2 +- drivers/pinctrl/pinctrl-rockchip.c | 13 +- drivers/pinctrl/qcom/pinctrl-lpass-lpi.c | 2 +- drivers/pinctrl/qcom/pinctrl-sc7280.c | 16 +- drivers/pinctrl/qcom/pinctrl-sdx55.c | 2 +- drivers/scsi/qla2xxx/qla_target.h | 2 +- drivers/scsi/st.c | 2 +- drivers/soc/qcom/qcom-geni-se.c | 74 ---- drivers/staging/comedi/drivers/cb_pcidas.c | 2 +- drivers/staging/comedi/drivers/cb_pcidas64.c | 2 +- drivers/staging/rtl8192e/rtllib.h | 2 +- drivers/staging/rtl8192e/rtllib_rx.c | 2 +- drivers/thermal/thermal_sysfs.c | 3 + drivers/tty/serial/qcom_geni_serial.c | 7 - drivers/usb/class/cdc-acm.c | 61 ++- drivers/usb/core/quirks.c | 4 + drivers/usb/dwc2/hcd.c | 5 +- drivers/usb/dwc3/dwc3-pci.c | 2 + drivers/usb/dwc3/dwc3-qcom.c | 3 + drivers/usb/dwc3/gadget.c | 8 +- drivers/usb/gadget/udc/amd5536udc_pci.c | 10 +- drivers/usb/host/xhci-mtk.c | 10 +- drivers/usb/musb/musb_core.c | 12 +- drivers/usb/usbip/vhci_hcd.c | 2 + drivers/vfio/pci/Kconfig | 2 +- drivers/vhost/vhost.c | 2 +- drivers/video/fbdev/core/fbcon.c | 3 + drivers/video/fbdev/hyperv_fb.c | 3 - fs/ext4/balloc.c | 38 +- fs/ext4/ext4.h | 1 + fs/ext4/inode.c | 6 +- fs/ext4/namei.c | 18 +- fs/ext4/super.c | 5 + fs/ext4/sysfs.c | 7 + fs/fuse/virtio_fs.c | 9 +- fs/io_uring.c | 41 +- fs/iomap/swapfile.c | 10 + fs/nfsd/Kconfig | 1 + fs/nfsd/nfs4callback.c | 1 + fs/reiserfs/xattr.h | 2 +- include/drm/ttm/ttm_bo_api.h | 6 +- include/linux/acpi.h | 9 +- include/linux/can/can-ml.h | 12 + include/linux/extcon.h | 23 ++ .../linux/firmware/intel/stratix10-svc-client.h | 2 +- include/linux/netdevice.h | 34 +- include/linux/qcom-geni-se.h | 2 - include/linux/ww_mutex.h | 5 +- kernel/locking/mutex.c | 25 +- kernel/reboot.c | 2 - kernel/static_call.c | 14 +- kernel/trace/trace.c | 3 +- mm/memory.c | 2 +- net/9p/client.c | 4 - net/appletalk/ddp.c | 33 +- net/can/af_can.c | 34 +- net/can/j1939/main.c | 22 +- net/can/j1939/socket.c | 13 +- net/can/proc.c | 19 +- net/core/filter.c | 12 +- net/core/flow_dissector.c | 6 +- net/mptcp/options.c | 3 +- net/mptcp/protocol.c | 111 ++++- net/mptcp/protocol.h | 4 + net/mptcp/subflow.c | 83 ++-- net/sunrpc/auth_gss/svcauth_gss.c | 11 +- sound/pci/hda/hda_intel.c | 8 + sound/pci/hda/patch_realtek.c | 4 +- sound/soc/codecs/cs42l42.c | 74 ++-- sound/soc/codecs/cs42l42.h | 13 +- sound/soc/codecs/es8316.c | 9 +- sound/soc/codecs/rt1015.c | 1 + sound/soc/codecs/rt5640.c | 4 +- sound/soc/codecs/rt5651.c | 4 +- sound/soc/codecs/rt5659.c | 5 + sound/soc/codecs/rt711.c | 8 + sound/soc/codecs/sgtl5000.c | 2 +- sound/soc/mediatek/mt8192/mt8192-dai-tdm.c | 4 +- sound/soc/mediatek/mt8192/mt8192-reg.h | 8 +- sound/soc/soc-core.c | 4 + sound/usb/quirks.c | 1 + .../testing/selftests/net/forwarding/tc_flower.sh | 38 +- 151 files changed, 1525 insertions(+), 821 deletions(-)
From: Pavel Tatashin pasha.tatashin@soleen.com
[ Upstream commit ee7febce051945be28ad86d16a15886f878204de ]
Memory hotplug may fail on systems with CONFIG_RANDOMIZE_BASE because the linear map range is not checked correctly.
The start physical address that linear map covers can be actually at the end of the range because of randomization. Check that and if so reduce it to 0.
This can be verified on QEMU with setting kaslr-seed to ~0ul:
memstart_offset_seed = 0xffff START: __pa(_PAGE_OFFSET(vabits_actual)) = ffff9000c0000000 END: __pa(PAGE_END - 1) = 1000bfffffff
Signed-off-by: Pavel Tatashin pasha.tatashin@soleen.com Fixes: 58284a901b42 ("arm64/mm: Validate hotplug range before creating linear mapping") Tested-by: Tyler Hicks tyhicks@linux.microsoft.com Reviewed-by: Anshuman Khandual anshuman.khandual@arm.com Link: https://lore.kernel.org/r/20210216150351.129018-2-pasha.tatashin@soleen.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/mm/mmu.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 6f0648777d34..ee01f421e1e4 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -1445,14 +1445,30 @@ static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size)
static bool inside_linear_region(u64 start, u64 size) { + u64 start_linear_pa = __pa(_PAGE_OFFSET(vabits_actual)); + u64 end_linear_pa = __pa(PAGE_END - 1); + + if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) { + /* + * Check for a wrap, it is possible because of randomized linear + * mapping the start physical address is actually bigger than + * the end physical address. In this case set start to zero + * because [0, end_linear_pa] range must still be able to cover + * all addressable physical addresses. + */ + if (start_linear_pa > end_linear_pa) + start_linear_pa = 0; + } + + WARN_ON(start_linear_pa > end_linear_pa); + /* * Linear mapping region is the range [PAGE_OFFSET..(PAGE_END - 1)] * accommodating both its ends but excluding PAGE_END. Max physical * range which can be mapped inside this linear mapping range, must * also be derived from its end points. */ - return start >= __pa(_PAGE_OFFSET(vabits_actual)) && - (start + size - 1) <= __pa(PAGE_END - 1); + return start >= start_linear_pa && (start + size - 1) <= end_linear_pa; }
int arch_add_memory(int nid, u64 start, u64 size,
From: Vivek Goyal vgoyal@redhat.com
[ Upstream commit 3f9b9efd82a84f27e95d0414f852caf1fa839e83 ]
Right now "mount -t virtiofs -o dax myfs /mnt/virtiofs" succeeds even if filesystem deivce does not have a cache window and hence DAX can't be supported.
This gives a false sense to user that they are using DAX with virtiofs but fact of the matter is that they are not.
Fix this by returning error if dax can't be supported and user has asked for it.
Signed-off-by: Vivek Goyal vgoyal@redhat.com Reviewed-by: Stefan Hajnoczi stefanha@redhat.com Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/fuse/virtio_fs.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index 8868ac31a3c0..4ee6f734ba83 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -1324,8 +1324,15 @@ static int virtio_fs_fill_super(struct super_block *sb, struct fs_context *fsc)
/* virtiofs allocates and installs its own fuse devices */ ctx->fudptr = NULL; - if (ctx->dax) + if (ctx->dax) { + if (!fs->dax_dev) { + err = -EINVAL; + pr_err("virtio-fs: dax can't be enabled as filesystem" + " device does not support it.\n"); + goto err_free_fuse_devs; + } ctx->dax_dev = fs->dax_dev; + } err = fuse_fill_super_common(sb, ctx); if (err < 0) goto err_free_fuse_devs;
From: Eric Whitney enwlinux@gmail.com
[ Upstream commit efc61345274d6c7a46a0570efbc916fcbe3e927b ]
When generic/371 is run on kvm-xfstests using 5.10 and 5.11 kernels, it fails at significant rates on the two test scenarios that disable delayed allocation (ext3conv and data_journal) and force actual block allocation for the fallocate and pwrite functions in the test. The failure rate on 5.10 for both ext3conv and data_journal on one test system typically runs about 85%. On 5.11, the failure rate on ext3conv sometimes drops to as low as 1% while the rate on data_journal increases to nearly 100%.
The observed failures are largely due to ext4_should_retry_alloc() cutting off block allocation retries when s_mb_free_pending (used to indicate that a transaction in progress will free blocks) is 0. However, free space is usually available when this occurs during runs of generic/371. It appears that a thread attempting to allocate blocks is just missing transaction commits in other threads that increase the free cluster count and reset s_mb_free_pending while the allocating thread isn't running. Explicitly testing for free space availability avoids this race.
The current code uses a post-increment operator in the conditional expression that determines whether the retry limit has been exceeded. This means that the conditional expression uses the value of the retry counter before it's increased, resulting in an extra retry cycle. The current code actually retries twice before hitting its retry limit rather than once.
Increasing the retry limit to 3 from the current actual maximum retry count of 2 in combination with the change described above reduces the observed failure rate to less that 0.1% on both ext3conv and data_journal with what should be limited impact on users sensitive to the overhead caused by retries.
A per filesystem percpu counter exported via sysfs is added to allow users or developers to track the number of times the retry limit is exceeded without resorting to debugging methods. This should provide some insight into worst case retry behavior.
Signed-off-by: Eric Whitney enwlinux@gmail.com Link: https://lore.kernel.org/r/20210218151132.19678-1-enwlinux@gmail.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext4/balloc.c | 38 ++++++++++++++++++++++++++------------ fs/ext4/ext4.h | 1 + fs/ext4/super.c | 5 +++++ fs/ext4/sysfs.c | 7 +++++++ 4 files changed, 39 insertions(+), 12 deletions(-)
diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c index f45f9feebe59..74a5172c2d83 100644 --- a/fs/ext4/balloc.c +++ b/fs/ext4/balloc.c @@ -626,27 +626,41 @@ int ext4_claim_free_clusters(struct ext4_sb_info *sbi,
/** * ext4_should_retry_alloc() - check if a block allocation should be retried - * @sb: super block - * @retries: number of attemps has been made + * @sb: superblock + * @retries: number of retry attempts made so far * - * ext4_should_retry_alloc() is called when ENOSPC is returned, and if - * it is profitable to retry the operation, this function will wait - * for the current or committing transaction to complete, and then - * return TRUE. We will only retry once. + * ext4_should_retry_alloc() is called when ENOSPC is returned while + * attempting to allocate blocks. If there's an indication that a pending + * journal transaction might free some space and allow another attempt to + * succeed, this function will wait for the current or committing transaction + * to complete and then return TRUE. */ int ext4_should_retry_alloc(struct super_block *sb, int *retries) { - if (!ext4_has_free_clusters(EXT4_SB(sb), 1, 0) || - (*retries)++ > 1 || - !EXT4_SB(sb)->s_journal) + struct ext4_sb_info *sbi = EXT4_SB(sb); + + if (!sbi->s_journal) return 0;
- smp_mb(); - if (EXT4_SB(sb)->s_mb_free_pending == 0) + if (++(*retries) > 3) { + percpu_counter_inc(&sbi->s_sra_exceeded_retry_limit); return 0; + }
+ /* + * if there's no indication that blocks are about to be freed it's + * possible we just missed a transaction commit that did so + */ + smp_mb(); + if (sbi->s_mb_free_pending == 0) + return ext4_has_free_clusters(sbi, 1, 0); + + /* + * it's possible we've just missed a transaction commit here, + * so ignore the returned status + */ jbd_debug(1, "%s: retrying operation after ENOSPC\n", sb->s_id); - jbd2_journal_force_commit_nested(EXT4_SB(sb)->s_journal); + (void) jbd2_journal_force_commit_nested(sbi->s_journal); return 1; }
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index e5c81593d972..9ad539ee4196 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1484,6 +1484,7 @@ struct ext4_sb_info { struct percpu_counter s_freeinodes_counter; struct percpu_counter s_dirs_counter; struct percpu_counter s_dirtyclusters_counter; + struct percpu_counter s_sra_exceeded_retry_limit; struct blockgroup_lock *s_blockgroup_lock; struct proc_dir_entry *s_proc; struct kobject s_kobj; diff --git a/fs/ext4/super.c b/fs/ext4/super.c index a1353b0825ea..c8cc8175b376 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1210,6 +1210,7 @@ static void ext4_put_super(struct super_block *sb) percpu_counter_destroy(&sbi->s_freeinodes_counter); percpu_counter_destroy(&sbi->s_dirs_counter); percpu_counter_destroy(&sbi->s_dirtyclusters_counter); + percpu_counter_destroy(&sbi->s_sra_exceeded_retry_limit); percpu_free_rwsem(&sbi->s_writepages_rwsem); #ifdef CONFIG_QUOTA for (i = 0; i < EXT4_MAXQUOTAS; i++) @@ -5011,6 +5012,9 @@ no_journal: if (!err) err = percpu_counter_init(&sbi->s_dirtyclusters_counter, 0, GFP_KERNEL); + if (!err) + err = percpu_counter_init(&sbi->s_sra_exceeded_retry_limit, 0, + GFP_KERNEL); if (!err) err = percpu_init_rwsem(&sbi->s_writepages_rwsem);
@@ -5124,6 +5128,7 @@ failed_mount6: percpu_counter_destroy(&sbi->s_freeinodes_counter); percpu_counter_destroy(&sbi->s_dirs_counter); percpu_counter_destroy(&sbi->s_dirtyclusters_counter); + percpu_counter_destroy(&sbi->s_sra_exceeded_retry_limit); percpu_free_rwsem(&sbi->s_writepages_rwsem); failed_mount5: ext4_ext_release(sb); diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c index 075aa3a19ff5..a3d08276d441 100644 --- a/fs/ext4/sysfs.c +++ b/fs/ext4/sysfs.c @@ -24,6 +24,7 @@ typedef enum { attr_session_write_kbytes, attr_lifetime_write_kbytes, attr_reserved_clusters, + attr_sra_exceeded_retry_limit, attr_inode_readahead, attr_trigger_test_error, attr_first_error_time, @@ -202,6 +203,7 @@ EXT4_ATTR_FUNC(delayed_allocation_blocks, 0444); EXT4_ATTR_FUNC(session_write_kbytes, 0444); EXT4_ATTR_FUNC(lifetime_write_kbytes, 0444); EXT4_ATTR_FUNC(reserved_clusters, 0644); +EXT4_ATTR_FUNC(sra_exceeded_retry_limit, 0444);
EXT4_ATTR_OFFSET(inode_readahead_blks, 0644, inode_readahead, ext4_sb_info, s_inode_readahead_blks); @@ -251,6 +253,7 @@ static struct attribute *ext4_attrs[] = { ATTR_LIST(session_write_kbytes), ATTR_LIST(lifetime_write_kbytes), ATTR_LIST(reserved_clusters), + ATTR_LIST(sra_exceeded_retry_limit), ATTR_LIST(inode_readahead_blks), ATTR_LIST(inode_goal), ATTR_LIST(mb_stats), @@ -374,6 +377,10 @@ static ssize_t ext4_attr_show(struct kobject *kobj, return snprintf(buf, PAGE_SIZE, "%llu\n", (unsigned long long) atomic64_read(&sbi->s_resv_clusters)); + case attr_sra_exceeded_retry_limit: + return snprintf(buf, PAGE_SIZE, "%llu\n", + (unsigned long long) + percpu_counter_sum(&sbi->s_sra_exceeded_retry_limit)); case attr_inode_readahead: case attr_pointer_ui: if (!ptr)
From: Zhaolong Zhang zhangzl2013@126.com
[ Upstream commit c915fb80eaa6194fa9bd0a4487705cd5b0dda2f1 ]
__ext4_journalled_writepage should drop bhs' ref count on error paths
Signed-off-by: Zhaolong Zhang zhangzl2013@126.com Link: https://lore.kernel.org/r/1614678151-70481-1-git-send-email-zhangzl2013@126.... Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext4/inode.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index ed498538a749..3b9f7bf4045b 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1937,13 +1937,13 @@ static int __ext4_journalled_writepage(struct page *page, if (!ret) ret = err;
- if (!ext4_has_inline_data(inode)) - ext4_walk_page_buffers(NULL, page_bufs, 0, len, - NULL, bput_one); ext4_set_inode_state(inode, EXT4_STATE_JDATA); out: unlock_page(page); out_no_pagelock: + if (!inline_data && page_bufs) + ext4_walk_page_buffers(NULL, page_bufs, 0, len, + NULL, bput_one); brelse(inode_bh); return ret; }
From: Julian Braha julianbraha@gmail.com
[ Upstream commit 7005227369079963d25fb2d5d736d0feb2c44cf6 ]
When NFSD_V4 is enabled and CRYPTO is disabled, Kbuild gives the following warning:
WARNING: unmet direct dependencies detected for CRYPTO_SHA256 Depends on [n]: CRYPTO [=n] Selected by [y]: - NFSD_V4 [=y] && NETWORK_FILESYSTEMS [=y] && NFSD [=y] && PROC_FS [=y]
WARNING: unmet direct dependencies detected for CRYPTO_MD5 Depends on [n]: CRYPTO [=n] Selected by [y]: - NFSD_V4 [=y] && NETWORK_FILESYSTEMS [=y] && NFSD [=y] && PROC_FS [=y]
This is because NFSD_V4 selects CRYPTO_MD5 and CRYPTO_SHA256, without depending on or selecting CRYPTO, despite those config options being subordinate to CRYPTO.
Signed-off-by: Julian Braha julianbraha@gmail.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/Kconfig | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig index dbbc583d6273..248f1459c039 100644 --- a/fs/nfsd/Kconfig +++ b/fs/nfsd/Kconfig @@ -73,6 +73,7 @@ config NFSD_V4 select NFSD_V3 select FS_POSIX_ACL select SUNRPC_GSS + select CRYPTO select CRYPTO_MD5 select CRYPTO_SHA256 select GRACE_PERIOD
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit 0ddc942394013f08992fc379ca04cffacbbe3dae ]
I think this is unlikely but possible:
svc_authenticate sets rq_authop and calls svcauth_gss_accept. The kmalloc(sizeof(*svcdata), GFP_KERNEL) fails, leaving rq_auth_data NULL, and returning SVC_DENIED.
This causes svc_process_common to go to err_bad_auth, and eventually call svc_authorise. That calls ->release == svcauth_gss_release, which tries to dereference rq_auth_data.
Signed-off-by: J. Bruce Fields bfields@redhat.com Link: https://lore.kernel.org/linux-nfs/3F1B347F-B809-478F-A1E9-0BE98E22B0F0@oracl... Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sunrpc/auth_gss/svcauth_gss.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c index bd4678db9d76..6dff64374bfe 100644 --- a/net/sunrpc/auth_gss/svcauth_gss.c +++ b/net/sunrpc/auth_gss/svcauth_gss.c @@ -1825,11 +1825,14 @@ static int svcauth_gss_release(struct svc_rqst *rqstp) { struct gss_svc_data *gsd = (struct gss_svc_data *)rqstp->rq_auth_data; - struct rpc_gss_wire_cred *gc = &gsd->clcred; + struct rpc_gss_wire_cred *gc; struct xdr_buf *resbuf = &rqstp->rq_res; int stat = -EINVAL; struct sunrpc_net *sn = net_generic(SVC_NET(rqstp), sunrpc_net_id);
+ if (!gsd) + goto out; + gc = &gsd->clcred; if (gc->gc_proc != RPC_GSS_PROC_DATA) goto out; /* Release can be called twice, but we only wrap once. */ @@ -1870,10 +1873,10 @@ out_err: if (rqstp->rq_cred.cr_group_info) put_group_info(rqstp->rq_cred.cr_group_info); rqstp->rq_cred.cr_group_info = NULL; - if (gsd->rsci) + if (gsd && gsd->rsci) { cache_put(&gsd->rsci->h, sn->rsc_cache); - gsd->rsci = NULL; - + gsd->rsci = NULL; + } return stat; }
From: Ritesh Harjani riteshh@linux.ibm.com
[ Upstream commit 5808fecc572391867fcd929662b29c12e6d08d81 ]
In case if isi.nr_pages is 0, we are making sis->pages (which is unsigned int) a huge value in iomap_swapfile_activate() by assigning -1. This could cause a kernel crash in kernel v4.18 (with below signature). Or could lead to unknown issues on latest kernel if the fake big swap gets used.
Fix this issue by returning -EINVAL in case of nr_pages is 0, since it is anyway a invalid swapfile. Looks like this issue will be hit when we have pagesize < blocksize type of configuration.
I was able to hit the issue in case of a tiny swap file with below test script. https://raw.githubusercontent.com/riteshharjani/LinuxStudy/master/scripts/sw...
kernel crash analysis on v4.18 ============================== On v4.18 kernel, it causes a kernel panic, since sis->pages becomes a huge value and isi.nr_extents is 0. When 0 is returned it is considered as a swapfile over NFS and SWP_FILE is set (sis->flags |= SWP_FILE). Then when swapoff was getting called it was calling a_ops->swap_deactivate() if (sis->flags & SWP_FILE) is true. Since a_ops->swap_deactivate() is NULL in case of XFS, it causes below panic.
Panic signature on v4.18 kernel: ======================================= root@qemu:/home/qemu# [ 8291.723351] XFS (loop2): Unmounting Filesystem [ 8292.123104] XFS (loop2): Mounting V5 Filesystem [ 8292.132451] XFS (loop2): Ending clean mount [ 8292.263362] Adding 4294967232k swap on /mnt1/test/swapfile. Priority:-2 extents:1 across:274877906880k [ 8292.277834] Unable to handle kernel paging request for instruction fetch [ 8292.278677] Faulting instruction address: 0x00000000 cpu 0x19: Vector: 400 (Instruction Access) at [c0000009dd5b7ad0] pc: 0000000000000000 lr: c0000000003eb9dc: destroy_swap_extents+0xfc/0x120 sp: c0000009dd5b7d50 msr: 8000000040009033 current = 0xc0000009b6710080 paca = 0xc00000003ffcb280 irqmask: 0x03 irq_happened: 0x01 pid = 5604, comm = swapoff Linux version 4.18.0 (riteshh@xxxxxxx) (gcc version 8.4.0 (Ubuntu 8.4.0-1ubuntu1~18.04)) #57 SMP Wed Mar 3 01:33:04 CST 2021 enter ? for help [link register ] c0000000003eb9dc destroy_swap_extents+0xfc/0x120 [c0000009dd5b7d50] c0000000025a7058 proc_poll_event+0x0/0x4 (unreliable) [c0000009dd5b7da0] c0000000003f0498 sys_swapoff+0x3f8/0x910 [c0000009dd5b7e30] c00000000000bbe4 system_call+0x5c/0x70 Exception: c01 (System Call) at 00007ffff7d208d8
Signed-off-by: Ritesh Harjani riteshh@linux.ibm.com [djwong: rework the comment to provide more details] Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- fs/iomap/swapfile.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c index a648dbf6991e..a5e478de1417 100644 --- a/fs/iomap/swapfile.c +++ b/fs/iomap/swapfile.c @@ -170,6 +170,16 @@ int iomap_swapfile_activate(struct swap_info_struct *sis, return ret; }
+ /* + * If this swapfile doesn't contain even a single page-aligned + * contiguous range of blocks, reject this useless swapfile to + * prevent confusion later on. + */ + if (isi.nr_pages == 0) { + pr_warn("swapon: Cannot find a single usable page in file.\n"); + return -EINVAL; + } + *pagespan = 1 + isi.highest_ppage - isi.lowest_ppage; sis->max = isi.nr_pages; sis->pages = isi.nr_pages - 1;
From: Jack Yu jack.yu@realtek.com
[ Upstream commit 9e0bdaa9fcb8c64efc1487a7fba07722e7bc515e ]
Remove 0x100 cache re-sync to solve i2c communication error.
Signed-off-by: Jack Yu jack.yu@realtek.com Link: https://lore.kernel.org/r/20210222090057.29532-1-jack.yu@realtek.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/rt1015.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/sound/soc/codecs/rt1015.c b/sound/soc/codecs/rt1015.c index 32e6bcf763d1..4607039a16e7 100644 --- a/sound/soc/codecs/rt1015.c +++ b/sound/soc/codecs/rt1015.c @@ -209,6 +209,7 @@ static bool rt1015_volatile_register(struct device *dev, unsigned int reg) case RT1015_VENDOR_ID: case RT1015_DEVICE_ID: case RT1015_PRO_ALT: + case RT1015_MAN_I2C: case RT1015_DAC3: case RT1015_VBAT_TEST_OUT1: case RT1015_VBAT_TEST_OUT2:
From: Hans de Goede hdegoede@redhat.com
[ Upstream commit cfa26ed1f9f885c2fd8f53ca492989d1e16d0199 ]
The adc_vol_tlv volume-control has a range from -17.625 dB to +30 dB, not -176.25 dB to + 300 dB. This wrong scale is esp. a problem in userspace apps which translate the dB scale to a linear scale. With the logarithmic dB scale being of by a factor of 10 we loose all precision in the lower area of the range when apps translate things to a linear scale.
E.g. the 0 dB default, which corresponds with a value of 47 of the 0 - 127 range for the control, would be shown as 0/100 in alsa-mixer.
Since the centi-dB values used in the TLV struct cannot represent the 0.375 dB step size used by these controls, change the TLV definition for them to specify a min and max value instead of min + stepsize.
Note this mirrors commit 3f31f7d9b540 ("ASoC: rt5670: Fix dac- and adc- vol-tlv values being off by a factor of 10") which made the exact same change to the rt5670 codec driver.
Signed-off-by: Hans de Goede hdegoede@redhat.com Link: https://lore.kernel.org/r/20210226143817.84287-2-hdegoede@redhat.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/rt5640.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/sound/soc/codecs/rt5640.c b/sound/soc/codecs/rt5640.c index 1414ad15d01c..a5674c227b3a 100644 --- a/sound/soc/codecs/rt5640.c +++ b/sound/soc/codecs/rt5640.c @@ -339,9 +339,9 @@ static bool rt5640_readable_register(struct device *dev, unsigned int reg) }
static const DECLARE_TLV_DB_SCALE(out_vol_tlv, -4650, 150, 0); -static const DECLARE_TLV_DB_SCALE(dac_vol_tlv, -65625, 375, 0); +static const DECLARE_TLV_DB_MINMAX(dac_vol_tlv, -6562, 0); static const DECLARE_TLV_DB_SCALE(in_vol_tlv, -3450, 150, 0); -static const DECLARE_TLV_DB_SCALE(adc_vol_tlv, -17625, 375, 0); +static const DECLARE_TLV_DB_MINMAX(adc_vol_tlv, -1762, 3000); static const DECLARE_TLV_DB_SCALE(adc_bst_tlv, 0, 1200, 0);
/* {0, +20, +24, +30, +35, +40, +44, +50, +52} dB */
From: Hans de Goede hdegoede@redhat.com
[ Upstream commit eee51df776bd6cac10a76b2779a9fdee3f622b2b ]
The adc_vol_tlv volume-control has a range from -17.625 dB to +30 dB, not -176.25 dB to + 300 dB. This wrong scale is esp. a problem in userspace apps which translate the dB scale to a linear scale. With the logarithmic dB scale being of by a factor of 10 we loose all precision in the lower area of the range when apps translate things to a linear scale.
E.g. the 0 dB default, which corresponds with a value of 47 of the 0 - 127 range for the control, would be shown as 0/100 in alsa-mixer.
Since the centi-dB values used in the TLV struct cannot represent the 0.375 dB step size used by these controls, change the TLV definition for them to specify a min and max value instead of min + stepsize.
Note this mirrors commit 3f31f7d9b540 ("ASoC: rt5670: Fix dac- and adc- vol-tlv values being off by a factor of 10") which made the exact same change to the rt5670 codec driver.
Signed-off-by: Hans de Goede hdegoede@redhat.com Link: https://lore.kernel.org/r/20210226143817.84287-3-hdegoede@redhat.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/rt5651.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/sound/soc/codecs/rt5651.c b/sound/soc/codecs/rt5651.c index d198e191fb0c..e59fdc81dbd4 100644 --- a/sound/soc/codecs/rt5651.c +++ b/sound/soc/codecs/rt5651.c @@ -285,9 +285,9 @@ static bool rt5651_readable_register(struct device *dev, unsigned int reg) }
static const DECLARE_TLV_DB_SCALE(out_vol_tlv, -4650, 150, 0); -static const DECLARE_TLV_DB_SCALE(dac_vol_tlv, -65625, 375, 0); +static const DECLARE_TLV_DB_MINMAX(dac_vol_tlv, -6562, 0); static const DECLARE_TLV_DB_SCALE(in_vol_tlv, -3450, 150, 0); -static const DECLARE_TLV_DB_SCALE(adc_vol_tlv, -17625, 375, 0); +static const DECLARE_TLV_DB_MINMAX(adc_vol_tlv, -1762, 3000); static const DECLARE_TLV_DB_SCALE(adc_bst_tlv, 0, 1200, 0);
/* {0, +20, +24, +30, +35, +40, +44, +50, +52} dB */
From: Benjamin Rood benjaminjrood@gmail.com
[ Upstream commit f86f58e3594fb0ab1993d833d3b9a2496f3c928c ]
According to the SGTL5000 datasheet [1], the DAP_AVC_CTRL register has the following bit field definitions:
| BITS | FIELD | RW | RESET | DEFINITION | | 15 | RSVD | RO | 0x0 | Reserved | | 14 | RSVD | RW | 0x1 | Reserved | | 13:12 | MAX_GAIN | RW | 0x1 | Max Gain of AVC in expander mode | | 11:10 | RSVD | RO | 0x0 | Reserved | | 9:8 | LBI_RESP | RW | 0x1 | Integrator Response | | 7:6 | RSVD | RO | 0x0 | Reserved | | 5 | HARD_LMT_EN | RW | 0x0 | Enable hard limiter mode | | 4:1 | RSVD | RO | 0x0 | Reserved | | 0 | EN | RW | 0x0 | Enable/Disable AVC |
The original default value written to the DAP_AVC_CTRL register during sgtl5000_i2c_probe() was 0x0510. This would incorrectly write values to bits 4 and 10, which are defined as RESERVED. It would also not set bits 12 and 14 to their correct RESET values of 0x1, and instead set them to 0x0. While the DAP_AVC module is effectively disabled because the EN bit is 0, this default value is still writing invalid values to registers that are marked as read-only and RESERVED as well as not setting bits 12 and 14 to their correct default values as defined by the datasheet.
The correct value that should be written to the DAP_AVC_CTRL register is 0x5100, which configures the register bits to the default values defined by the datasheet, and prevents any writes to bits defined as 'read-only'. Generally speaking, it is best practice to NOT attempt to write values to registers/bits defined as RESERVED, as it generally produces unwanted/undefined behavior, or errors.
Also, all credit for this patch should go to my colleague Dan MacDonald dmacdonald@curbellmedical.com for finding this error in the first place.
[1] https://www.nxp.com/docs/en/data-sheet/SGTL5000.pdf
Signed-off-by: Benjamin Rood benjaminjrood@gmail.com Reviewed-by: Fabio Estevam festevam@gmail.com Link: https://lore.kernel.org/r/20210219183308.GA2117@ubuntu-dev Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/sgtl5000.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/soc/codecs/sgtl5000.c b/sound/soc/codecs/sgtl5000.c index 4d6ff8114622..4c0e87e22b97 100644 --- a/sound/soc/codecs/sgtl5000.c +++ b/sound/soc/codecs/sgtl5000.c @@ -71,7 +71,7 @@ static const struct reg_default sgtl5000_reg_defaults[] = { { SGTL5000_DAP_EQ_BASS_BAND4, 0x002f }, { SGTL5000_DAP_MAIN_CHAN, 0x8000 }, { SGTL5000_DAP_MIX_CHAN, 0x0000 }, - { SGTL5000_DAP_AVC_CTRL, 0x0510 }, + { SGTL5000_DAP_AVC_CTRL, 0x5100 }, { SGTL5000_DAP_AVC_THRESHOLD, 0x1473 }, { SGTL5000_DAP_AVC_ATTACK, 0x0028 }, { SGTL5000_DAP_AVC_DECAY, 0x0050 },
From: Hans de Goede hdegoede@redhat.com
[ Upstream commit bb18c678754ce1514100fb4c0bf6113b5af36c48 ]
Most steps in this table are steps of 3dB (300 centi-dB), so we can simplify the table.
This not only reduces the amount of space it takes inside the kernel, this also makes alsa-lib's mixer code actually accept the table, where as before this change alsa-lib saw the "ADC PGA Gain" control as a control without a dB scale.
Signed-off-by: Hans de Goede hdegoede@redhat.com Link: https://lore.kernel.org/r/20210228160441.241110-1-hdegoede@redhat.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/es8316.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/sound/soc/codecs/es8316.c b/sound/soc/codecs/es8316.c index f9ec5cf82599..ec2f11ff8a84 100644 --- a/sound/soc/codecs/es8316.c +++ b/sound/soc/codecs/es8316.c @@ -63,13 +63,8 @@ static const SNDRV_CTL_TLVD_DECLARE_DB_RANGE(adc_pga_gain_tlv, 1, 1, TLV_DB_SCALE_ITEM(0, 0, 0), 2, 2, TLV_DB_SCALE_ITEM(250, 0, 0), 3, 3, TLV_DB_SCALE_ITEM(450, 0, 0), - 4, 4, TLV_DB_SCALE_ITEM(700, 0, 0), - 5, 5, TLV_DB_SCALE_ITEM(1000, 0, 0), - 6, 6, TLV_DB_SCALE_ITEM(1300, 0, 0), - 7, 7, TLV_DB_SCALE_ITEM(1600, 0, 0), - 8, 8, TLV_DB_SCALE_ITEM(1800, 0, 0), - 9, 9, TLV_DB_SCALE_ITEM(2100, 0, 0), - 10, 10, TLV_DB_SCALE_ITEM(2400, 0, 0), + 4, 7, TLV_DB_SCALE_ITEM(700, 300, 0), + 8, 10, TLV_DB_SCALE_ITEM(1800, 300, 0), );
static const SNDRV_CTL_TLVD_DECLARE_DB_RANGE(hpout_vol_tlv,
From: Jon Hunter jonathanh@nvidia.com
[ Upstream commit 7de14d581dbed57c2b3c6afffa2c3fdc6955a3cd ]
Many systems do not use ACPI and hence do not provide a DMI table. On non-ACPI systems a warning, such as the following, is printed on boot.
WARNING KERN tegra-audio-graph-card sound: ASoC: no DMI vendor name!
The variable 'dmi_available' is not exported and so currently cannot be used by kernel modules without adding an accessor. However, it is possible to use the function is_acpi_device_node() to determine if the sound card is an ACPI device and hence indicate if we expect a DMI table to be present. Therefore, call is_acpi_device_node() to see if we are using ACPI and only parse the DMI table if we are booting with ACPI.
Signed-off-by: Jon Hunter jonathanh@nvidia.com Link: https://lore.kernel.org/r/20210303115526.419458-1-jonathanh@nvidia.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/soc-core.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c index f6d4e99b590c..0cffc9527e28 100644 --- a/sound/soc/soc-core.c +++ b/sound/soc/soc-core.c @@ -31,6 +31,7 @@ #include <linux/of.h> #include <linux/of_graph.h> #include <linux/dmi.h> +#include <linux/acpi.h> #include <sound/core.h> #include <sound/pcm.h> #include <sound/pcm_params.h> @@ -1573,6 +1574,9 @@ int snd_soc_set_dmi_name(struct snd_soc_card *card, const char *flavour) if (card->long_name) return 0; /* long name already set by driver or from DMI */
+ if (!is_acpi_device_node(card->dev->fwnode)) + return 0; + /* make up dmi long name as: vendor-product-version-board */ vendor = dmi_get_system_info(DMI_BOARD_VENDOR); if (!vendor || !is_dmi_valid(vendor)) {
From: Lucas Tanure tanureal@opensource.cirrus.com
[ Upstream commit e793c965519b8b7f2fea51a48398405e2a501729 ]
The driver was setting bit clock polarity opposite to intended polarity. Also simplify the code by grouping ADC and DAC clock configurations into a single field.
Signed-off-by: Lucas Tanure tanureal@opensource.cirrus.com Link: https://lore.kernel.org/r/20210305173442.195740-2-tanureal@opensource.cirrus... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/cs42l42.c | 20 ++++++++------------ sound/soc/codecs/cs42l42.h | 11 ++++++----- 2 files changed, 14 insertions(+), 17 deletions(-)
diff --git a/sound/soc/codecs/cs42l42.c b/sound/soc/codecs/cs42l42.c index 210fcbedf241..df0d5fec0287 100644 --- a/sound/soc/codecs/cs42l42.c +++ b/sound/soc/codecs/cs42l42.c @@ -797,27 +797,23 @@ static int cs42l42_set_dai_fmt(struct snd_soc_dai *codec_dai, unsigned int fmt) /* Bitclock/frame inversion */ switch (fmt & SND_SOC_DAIFMT_INV_MASK) { case SND_SOC_DAIFMT_NB_NF: + asp_cfg_val |= CS42L42_ASP_SCPOL_NOR << CS42L42_ASP_SCPOL_SHIFT; break; case SND_SOC_DAIFMT_NB_IF: - asp_cfg_val |= CS42L42_ASP_POL_INV << - CS42L42_ASP_LCPOL_IN_SHIFT; + asp_cfg_val |= CS42L42_ASP_SCPOL_NOR << CS42L42_ASP_SCPOL_SHIFT; + asp_cfg_val |= CS42L42_ASP_LCPOL_INV << CS42L42_ASP_LCPOL_SHIFT; break; case SND_SOC_DAIFMT_IB_NF: - asp_cfg_val |= CS42L42_ASP_POL_INV << - CS42L42_ASP_SCPOL_IN_DAC_SHIFT; break; case SND_SOC_DAIFMT_IB_IF: - asp_cfg_val |= CS42L42_ASP_POL_INV << - CS42L42_ASP_LCPOL_IN_SHIFT; - asp_cfg_val |= CS42L42_ASP_POL_INV << - CS42L42_ASP_SCPOL_IN_DAC_SHIFT; + asp_cfg_val |= CS42L42_ASP_LCPOL_INV << CS42L42_ASP_LCPOL_SHIFT; break; }
- snd_soc_component_update_bits(component, CS42L42_ASP_CLK_CFG, - CS42L42_ASP_MODE_MASK | - CS42L42_ASP_SCPOL_IN_DAC_MASK | - CS42L42_ASP_LCPOL_IN_MASK, asp_cfg_val); + snd_soc_component_update_bits(component, CS42L42_ASP_CLK_CFG, CS42L42_ASP_MODE_MASK | + CS42L42_ASP_SCPOL_MASK | + CS42L42_ASP_LCPOL_MASK, + asp_cfg_val);
return 0; } diff --git a/sound/soc/codecs/cs42l42.h b/sound/soc/codecs/cs42l42.h index 9e3cc528dcff..1f0d67c95a9a 100644 --- a/sound/soc/codecs/cs42l42.h +++ b/sound/soc/codecs/cs42l42.h @@ -258,11 +258,12 @@ #define CS42L42_ASP_SLAVE_MODE 0x00 #define CS42L42_ASP_MODE_SHIFT 4 #define CS42L42_ASP_MODE_MASK (1 << CS42L42_ASP_MODE_SHIFT) -#define CS42L42_ASP_SCPOL_IN_DAC_SHIFT 2 -#define CS42L42_ASP_SCPOL_IN_DAC_MASK (1 << CS42L42_ASP_SCPOL_IN_DAC_SHIFT) -#define CS42L42_ASP_LCPOL_IN_SHIFT 0 -#define CS42L42_ASP_LCPOL_IN_MASK (1 << CS42L42_ASP_LCPOL_IN_SHIFT) -#define CS42L42_ASP_POL_INV 1 +#define CS42L42_ASP_SCPOL_SHIFT 2 +#define CS42L42_ASP_SCPOL_MASK (3 << CS42L42_ASP_SCPOL_SHIFT) +#define CS42L42_ASP_SCPOL_NOR 3 +#define CS42L42_ASP_LCPOL_SHIFT 0 +#define CS42L42_ASP_LCPOL_MASK (3 << CS42L42_ASP_LCPOL_SHIFT) +#define CS42L42_ASP_LCPOL_INV 3
#define CS42L42_ASP_FRM_CFG (CS42L42_PAGE_12 + 0x08) #define CS42L42_ASP_STP_SHIFT 4
From: Lucas Tanure tanureal@opensource.cirrus.com
[ Upstream commit 2bdc4f5c6838f7c3feb4fe68e4edbeea158ec0a2 ]
Remove the hard coded 32 bits width and replace with the correct width calculated by params_width.
Signed-off-by: Lucas Tanure tanureal@opensource.cirrus.com Link: https://lore.kernel.org/r/20210305173442.195740-3-tanureal@opensource.cirrus... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/cs42l42.c | 47 ++++++++++++++++++-------------------- sound/soc/codecs/cs42l42.h | 1 - 2 files changed, 22 insertions(+), 26 deletions(-)
diff --git a/sound/soc/codecs/cs42l42.c b/sound/soc/codecs/cs42l42.c index df0d5fec0287..4f9ad9547929 100644 --- a/sound/soc/codecs/cs42l42.c +++ b/sound/soc/codecs/cs42l42.c @@ -691,24 +691,6 @@ static int cs42l42_pll_config(struct snd_soc_component *component) CS42L42_CLK_OASRC_SEL_MASK, CS42L42_CLK_OASRC_SEL_12 << CS42L42_CLK_OASRC_SEL_SHIFT); - /* channel 1 on low LRCLK, 32 bit */ - snd_soc_component_update_bits(component, - CS42L42_ASP_RX_DAI0_CH1_AP_RES, - CS42L42_ASP_RX_CH_AP_MASK | - CS42L42_ASP_RX_CH_RES_MASK, - (CS42L42_ASP_RX_CH_AP_LOW << - CS42L42_ASP_RX_CH_AP_SHIFT) | - (CS42L42_ASP_RX_CH_RES_32 << - CS42L42_ASP_RX_CH_RES_SHIFT)); - /* Channel 2 on high LRCLK, 32 bit */ - snd_soc_component_update_bits(component, - CS42L42_ASP_RX_DAI0_CH2_AP_RES, - CS42L42_ASP_RX_CH_AP_MASK | - CS42L42_ASP_RX_CH_RES_MASK, - (CS42L42_ASP_RX_CH_AP_HI << - CS42L42_ASP_RX_CH_AP_SHIFT) | - (CS42L42_ASP_RX_CH_RES_32 << - CS42L42_ASP_RX_CH_RES_SHIFT)); if (pll_ratio_table[i].mclk_src_sel == 0) { /* Pass the clock straight through */ snd_soc_component_update_bits(component, @@ -824,14 +806,29 @@ static int cs42l42_pcm_hw_params(struct snd_pcm_substream *substream, { struct snd_soc_component *component = dai->component; struct cs42l42_private *cs42l42 = snd_soc_component_get_drvdata(component); - int retval; + unsigned int width = (params_width(params) / 8) - 1; + unsigned int val = 0;
cs42l42->srate = params_rate(params); - cs42l42->swidth = params_width(params);
- retval = cs42l42_pll_config(component); + switch(substream->stream) { + case SNDRV_PCM_STREAM_PLAYBACK: + val |= width << CS42L42_ASP_RX_CH_RES_SHIFT; + /* channel 1 on low LRCLK */ + snd_soc_component_update_bits(component, CS42L42_ASP_RX_DAI0_CH1_AP_RES, + CS42L42_ASP_RX_CH_AP_MASK | + CS42L42_ASP_RX_CH_RES_MASK, val); + /* Channel 2 on high LRCLK */ + val |= CS42L42_ASP_RX_CH_AP_HI << CS42L42_ASP_RX_CH_AP_SHIFT; + snd_soc_component_update_bits(component, CS42L42_ASP_RX_DAI0_CH2_AP_RES, + CS42L42_ASP_RX_CH_AP_MASK | + CS42L42_ASP_RX_CH_RES_MASK, val); + break; + default: + break; + }
- return retval; + return cs42l42_pll_config(component); }
static int cs42l42_set_sysclk(struct snd_soc_dai *dai, @@ -896,9 +893,9 @@ static int cs42l42_mute(struct snd_soc_dai *dai, int mute, int direction) return 0; }
-#define CS42L42_FORMATS (SNDRV_PCM_FMTBIT_S16_LE | SNDRV_PCM_FMTBIT_S18_3LE | \ - SNDRV_PCM_FMTBIT_S20_3LE | SNDRV_PCM_FMTBIT_S24_LE | \ - SNDRV_PCM_FMTBIT_S32_LE) +#define CS42L42_FORMATS (SNDRV_PCM_FMTBIT_S16_LE |\ + SNDRV_PCM_FMTBIT_S24_LE |\ + SNDRV_PCM_FMTBIT_S32_LE )
static const struct snd_soc_dai_ops cs42l42_ops = { diff --git a/sound/soc/codecs/cs42l42.h b/sound/soc/codecs/cs42l42.h index 1f0d67c95a9a..9b017b76828a 100644 --- a/sound/soc/codecs/cs42l42.h +++ b/sound/soc/codecs/cs42l42.h @@ -757,7 +757,6 @@ struct cs42l42_private { struct completion pdn_done; u32 sclk; u32 srate; - u32 swidth; u8 plug_state; u8 hs_type; u8 ts_inv;
From: Lucas Tanure tanureal@opensource.cirrus.com
[ Upstream commit 72d904763ae6a8576e7ad034f9da4f0e3c44bf24 ]
The minimum value is 0x3f (-63dB), which also is mute
Signed-off-by: Lucas Tanure tanureal@opensource.cirrus.com Link: https://lore.kernel.org/r/20210305173442.195740-4-tanureal@opensource.cirrus... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/cs42l42.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/sound/soc/codecs/cs42l42.c b/sound/soc/codecs/cs42l42.c index 4f9ad9547929..d5078ce79fad 100644 --- a/sound/soc/codecs/cs42l42.c +++ b/sound/soc/codecs/cs42l42.c @@ -401,7 +401,7 @@ static const struct regmap_config cs42l42_regmap = { };
static DECLARE_TLV_DB_SCALE(adc_tlv, -9600, 100, false); -static DECLARE_TLV_DB_SCALE(mixer_tlv, -6200, 100, false); +static DECLARE_TLV_DB_SCALE(mixer_tlv, -6300, 100, true);
static const char * const cs42l42_hpf_freq_text[] = { "1.86Hz", "120Hz", "235Hz", "466Hz" @@ -458,7 +458,7 @@ static const struct snd_kcontrol_new cs42l42_snd_controls[] = { CS42L42_DAC_HPF_EN_SHIFT, true, false), SOC_DOUBLE_R_TLV("Mixer Volume", CS42L42_MIXER_CHA_VOL, CS42L42_MIXER_CHB_VOL, CS42L42_MIXER_CH_VOL_SHIFT, - 0x3e, 1, mixer_tlv) + 0x3f, 1, mixer_tlv) };
static int cs42l42_hpdrv_evt(struct snd_soc_dapm_widget *w,
From: Lucas Tanure tanureal@opensource.cirrus.com
[ Upstream commit 19325cfea04446bc79b36bffd4978af15f46a00e ]
This delay is part of the power-up sequence defined in the datasheet. A runtime_resume is a power-up so must also include the delay.
Signed-off-by: Lucas Tanure tanureal@opensource.cirrus.com Link: https://lore.kernel.org/r/20210305173442.195740-6-tanureal@opensource.cirrus... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/cs42l42.c | 3 ++- sound/soc/codecs/cs42l42.h | 1 + 2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/sound/soc/codecs/cs42l42.c b/sound/soc/codecs/cs42l42.c index d5078ce79fad..4d82d24c7828 100644 --- a/sound/soc/codecs/cs42l42.c +++ b/sound/soc/codecs/cs42l42.c @@ -1794,7 +1794,7 @@ static int cs42l42_i2c_probe(struct i2c_client *i2c_client, dev_dbg(&i2c_client->dev, "Found reset GPIO\n"); gpiod_set_value_cansleep(cs42l42->reset_gpio, 1); } - mdelay(3); + usleep_range(CS42L42_BOOT_TIME_US, CS42L42_BOOT_TIME_US * 2);
/* Request IRQ */ ret = devm_request_threaded_irq(&i2c_client->dev, @@ -1919,6 +1919,7 @@ static int cs42l42_runtime_resume(struct device *dev) }
gpiod_set_value_cansleep(cs42l42->reset_gpio, 1); + usleep_range(CS42L42_BOOT_TIME_US, CS42L42_BOOT_TIME_US * 2);
regcache_cache_only(cs42l42->regmap, false); regcache_sync(cs42l42->regmap); diff --git a/sound/soc/codecs/cs42l42.h b/sound/soc/codecs/cs42l42.h index 9b017b76828a..866d7c873e3c 100644 --- a/sound/soc/codecs/cs42l42.h +++ b/sound/soc/codecs/cs42l42.h @@ -740,6 +740,7 @@ #define CS42L42_FRAC2_VAL(val) (((val) & 0xff0000) >> 16)
#define CS42L42_NUM_SUPPLIES 5 +#define CS42L42_BOOT_TIME_US 3000
static const char *const cs42l42_supply_names[CS42L42_NUM_SUPPLIES] = { "VA",
From: Olga Kornievskaia kolga@netapp.com
[ Upstream commit b4250dd868d1b42c0a65de11ef3afbee67ba5d2f ]
When the server tries to do a callback and a client fails it due to authentication problems, we need the server to set callback down flag in RENEW so that client can recover.
Suggested-by: Bruce Fields bfields@redhat.com Signed-off-by: Olga Kornievskaia kolga@netapp.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Tested-by: Benjamin Coddington bcodding@redhat.com Link: https://lore.kernel.org/linux-nfs/FB84E90A-1A03-48B3-8BF7-D9D10AC2C9FE@oracl... Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4callback.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c index 052be5bf9ef5..7325592b456e 100644 --- a/fs/nfsd/nfs4callback.c +++ b/fs/nfsd/nfs4callback.c @@ -1189,6 +1189,7 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata) switch (task->tk_status) { case -EIO: case -ETIMEDOUT: + case -EACCES: nfsd4_mark_cb_down(clp, task->tk_status); } break;
From: Jiaxin Yu jiaxin.yu@mediatek.com
[ Upstream commit 8d06b9633a66f41fed520f6eebd163189518ba79 ]
This patch correct tdm out bck inverse register to AUDIO_TOP_CON3[3].
Signed-off-by: Jiaxin Yu jiaxin.yu@mediatek.com Link: https://lore.kernel.org/r/1615516005-781-1-git-send-email-jiaxin.yu@mediatek... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/mediatek/mt8192/mt8192-dai-tdm.c | 4 +++- sound/soc/mediatek/mt8192/mt8192-reg.h | 8 +++++--- 2 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/sound/soc/mediatek/mt8192/mt8192-dai-tdm.c b/sound/soc/mediatek/mt8192/mt8192-dai-tdm.c index 8383536b7ae0..504293de2c0d 100644 --- a/sound/soc/mediatek/mt8192/mt8192-dai-tdm.c +++ b/sound/soc/mediatek/mt8192/mt8192-dai-tdm.c @@ -555,7 +555,9 @@ static int mtk_dai_tdm_hw_params(struct snd_pcm_substream *substream,
/* set tdm */ if (tdm_priv->bck_invert) - tdm_con |= 1 << BCK_INVERSE_SFT; + regmap_update_bits(afe->regmap, AUDIO_TOP_CON3, + BCK_INVERSE_MASK_SFT, + 0x1 << BCK_INVERSE_SFT);
if (tdm_priv->lck_invert) tdm_con |= 1 << LRCK_INVERSE_SFT; diff --git a/sound/soc/mediatek/mt8192/mt8192-reg.h b/sound/soc/mediatek/mt8192/mt8192-reg.h index 562f25c79c34..b9fb80d4afec 100644 --- a/sound/soc/mediatek/mt8192/mt8192-reg.h +++ b/sound/soc/mediatek/mt8192/mt8192-reg.h @@ -21,6 +21,11 @@ enum { /***************************************************************************** * R E G I S T E R D E F I N I T I O N *****************************************************************************/ +/* AUDIO_TOP_CON3 */ +#define BCK_INVERSE_SFT 3 +#define BCK_INVERSE_MASK 0x1 +#define BCK_INVERSE_MASK_SFT (0x1 << 3) + /* AFE_DAC_CON0 */ #define VUL12_ON_SFT 31 #define VUL12_ON_MASK 0x1 @@ -2079,9 +2084,6 @@ enum { #define TDM_EN_SFT 0 #define TDM_EN_MASK 0x1 #define TDM_EN_MASK_SFT (0x1 << 0) -#define BCK_INVERSE_SFT 1 -#define BCK_INVERSE_MASK 0x1 -#define BCK_INVERSE_MASK_SFT (0x1 << 1) #define LRCK_INVERSE_SFT 2 #define LRCK_INVERSE_MASK 0x1 #define LRCK_INVERSE_MASK_SFT (0x1 << 2)
From: Jens Axboe axboe@kernel.dk
[ Upstream commit 15b2219facadec583c24523eed40fa45865f859f ]
Don't send fake signals to PF_IO_WORKER threads, they don't accept signals. Just treat them like kthreads in this regard, all they need is a wakeup as no forced kernel/user transition is needed.
Suggested-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/freezer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/freezer.c b/kernel/freezer.c index dc520f01f99d..1a2d57d1327c 100644 --- a/kernel/freezer.c +++ b/kernel/freezer.c @@ -134,7 +134,7 @@ bool freeze_task(struct task_struct *p) return false; }
- if (!(p->flags & PF_KTHREAD)) + if (!(p->flags & (PF_KTHREAD | PF_IO_WORKER))) fake_signal_wake_up(p); else wake_up_state(p, TASK_INTERRUPTIBLE);
From: Laurent Vivier lvivier@redhat.com
[ Upstream commit beb691e69f4dec7bfe8b81b509848acfd1f0dbf9 ]
vhost_reset_is_le() is vhost_init_is_le(), and in the case of cross-endian legacy, vhost_init_is_le() depends on vq->user_be.
vq->user_be is set by vhost_disable_cross_endian().
But in vhost_vq_reset(), we have:
vhost_reset_is_le(vq); vhost_disable_cross_endian(vq);
And so user_be is used before being set.
To fix that, reverse the lines order as there is no other dependency between them.
Signed-off-by: Laurent Vivier lvivier@redhat.com Link: https://lore.kernel.org/r/20210312140913.788592-1-lvivier@redhat.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vhost/vhost.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index a262e12c6dc2..5ccb0705beae 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -332,8 +332,8 @@ static void vhost_vq_reset(struct vhost_dev *dev, vq->error_ctx = NULL; vq->kick = NULL; vq->log_ctx = NULL; - vhost_reset_is_le(vq); vhost_disable_cross_endian(vq); + vhost_reset_is_le(vq); vq->busyloop_timeout = 0; vq->umem = NULL; vq->iotlb = NULL;
From: Pavel Begunkov asml.silence@gmail.com
[ Upstream commit efe814a471e0e58f28f1efaf430c8784a4f36626 ]
It's racy to modify req->flags from a not owning context, e.g. linked timeout calling req_set_fail_links() for the master request might race with that request setting/clearing flags while being executed concurrently. Just remove req_set_fail_links(prev) from io_link_timeout_fn(), io_async_find_and_cancel() and functions down the line take care of setting the fail bit.
Signed-off-by: Pavel Begunkov asml.silence@gmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/io_uring.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c index 5c4378694d54..381f82ebd282 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -6496,7 +6496,6 @@ static enum hrtimer_restart io_link_timeout_fn(struct hrtimer *timer) spin_unlock_irqrestore(&ctx->completion_lock, flags);
if (prev) { - req_set_fail_links(prev); io_async_find_and_cancel(ctx, req, prev->user_data, -ETIME); io_put_req_deferred(prev, 1); } else {
From: Pavel Begunkov asml.silence@gmail.com
[ Upstream commit f6d54255f4235448d4bbe442362d4caa62da97d5 ]
io_sq_thread_finish() is called in io_ring_ctx_free(), so SQPOLL task is potentially running submitting new requests. It's not a disaster because of using a "try" variant of percpu_ref_get, but is far from nice.
Remove ctx from the sqd ctx list earlier, before cancellation loop, so SQPOLL can't find it and so won't submit new requests.
Signed-off-by: Pavel Begunkov asml.silence@gmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/io_uring.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/fs/io_uring.c b/fs/io_uring.c index 381f82ebd282..aaf9b5d49c17 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -8800,6 +8800,14 @@ static void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx) __io_cqring_overflow_flush(ctx, true, NULL, NULL); mutex_unlock(&ctx->uring_lock);
+ /* prevent SQPOLL from submitting new requests */ + if (ctx->sq_data) { + io_sq_thread_park(ctx->sq_data); + list_del_init(&ctx->sqd_list); + io_sqd_update_thread_idle(ctx->sq_data); + io_sq_thread_unpark(ctx->sq_data); + } + io_kill_timeouts(ctx, NULL, NULL); io_poll_remove_all(ctx, NULL, NULL);
From: Lv Yunlong lyl2019@mail.ustc.edu.cn
[ Upstream commit c8c165dea4c8f5ad67b1240861e4f6c5395fa4ac ]
In st_open(), if STp->in_use is true, STp will be freed by scsi_tape_put(). However, STp is still used by DEBC_printk() after. It is better to DEBC_printk() before scsi_tape_put().
Link: https://lore.kernel.org/r/20210311064636.10522-1-lyl2019@mail.ustc.edu.cn Acked-by: Kai Mäkisara kai.makisara@kolumbus.fi Signed-off-by: Lv Yunlong lyl2019@mail.ustc.edu.cn Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/st.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c index 43f7624508a9..8b10fa4e381a 100644 --- a/drivers/scsi/st.c +++ b/drivers/scsi/st.c @@ -1269,8 +1269,8 @@ static int st_open(struct inode *inode, struct file *filp) spin_lock(&st_use_lock); if (STp->in_use) { spin_unlock(&st_use_lock); - scsi_tape_put(STp); DEBC_printk(STp, "Device already in use.\n"); + scsi_tape_put(STp); return (-EBUSY); }
From: Alexey Dobriyan adobriyan@gmail.com
[ Upstream commit 5999b9e5b1f8a2f5417b755130919b3ac96f5550 ]
Only half of the file is under include guard because terminating #endif is placed too early.
Link: https://lore.kernel.org/r/YE4snvoW1SuwcXAn@localhost.localdomain Reviewed-by: Himanshu Madhani himanshu.madhani@oracle.com Signed-off-by: Alexey Dobriyan adobriyan@gmail.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/qla2xxx/qla_target.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/qla2xxx/qla_target.h b/drivers/scsi/qla2xxx/qla_target.h index 10e5e6c8087d..01620f3eab39 100644 --- a/drivers/scsi/qla2xxx/qla_target.h +++ b/drivers/scsi/qla2xxx/qla_target.h @@ -116,7 +116,6 @@ (min(1270, ((ql) > 0) ? (QLA_TGT_DATASEGS_PER_CMD_24XX + \ QLA_TGT_DATASEGS_PER_CONT_24XX*((ql) - 1)) : 0)) #endif -#endif
#define GET_TARGET_ID(ha, iocb) ((HAS_EXTENDED_IDS(ha)) \ ? le16_to_cpu((iocb)->u.isp2x.target.extended) \ @@ -244,6 +243,7 @@ struct ctio_to_2xxx { #ifndef CTIO_RET_TYPE #define CTIO_RET_TYPE 0x17 /* CTIO return entry */ #define ATIO_TYPE7 0x06 /* Accept target I/O entry for 24xx */ +#endif
struct fcp_hdr { uint8_t r_ctl;
From: Tong Zhang ztong0001@gmail.com
[ Upstream commit 2e5848a3d86f03024ae096478bdb892ab3d79131 ]
request_irq() wont accept a name which contains slash so we need to repalce it with something else -- otherwise it will trigger a warning and the entry in /proc/irq/ will not be created since the .name might be used by userspace and we don't want to break userspace, so we are changing the parameters passed to request_irq()
[ 1.630764] name 'pci-das1602/16' [ 1.630950] WARNING: CPU: 0 PID: 181 at fs/proc/generic.c:180 __xlate_proc_name+0x93/0xb0 [ 1.634009] RIP: 0010:__xlate_proc_name+0x93/0xb0 [ 1.639441] Call Trace: [ 1.639976] proc_mkdir+0x18/0x20 [ 1.641946] request_threaded_irq+0xfe/0x160 [ 1.642186] cb_pcidas_auto_attach+0xf4/0x610 [cb_pcidas]
Suggested-by: Ian Abbott abbotti@mev.co.uk Reviewed-by: Ian Abbott abbotti@mev.co.uk Signed-off-by: Tong Zhang ztong0001@gmail.com Link: https://lore.kernel.org/r/20210315195914.4801-1-ztong0001@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/staging/comedi/drivers/cb_pcidas.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/staging/comedi/drivers/cb_pcidas.c b/drivers/staging/comedi/drivers/cb_pcidas.c index d740c4782775..2f20bd56ec6c 100644 --- a/drivers/staging/comedi/drivers/cb_pcidas.c +++ b/drivers/staging/comedi/drivers/cb_pcidas.c @@ -1281,7 +1281,7 @@ static int cb_pcidas_auto_attach(struct comedi_device *dev, devpriv->amcc + AMCC_OP_REG_INTCSR);
ret = request_irq(pcidev->irq, cb_pcidas_interrupt, IRQF_SHARED, - dev->board_name, dev); + "cb_pcidas", dev); if (ret) { dev_dbg(dev->class_dev, "unable to allocate irq %d\n", pcidev->irq);
From: Tong Zhang ztong0001@gmail.com
[ Upstream commit d2d106fe3badfc3bf0dd3899d1c3f210c7203eab ]
request_irq() wont accept a name which contains slash so we need to repalce it with something else -- otherwise it will trigger a warning and the entry in /proc/irq/ will not be created since the .name might be used by userspace and we don't want to break userspace, so we are changing the parameters passed to request_irq()
[ 1.565966] name 'pci-das6402/16' [ 1.566149] WARNING: CPU: 0 PID: 184 at fs/proc/generic.c:180 __xlate_proc_name+0x93/0xb0 [ 1.568923] RIP: 0010:__xlate_proc_name+0x93/0xb0 [ 1.574200] Call Trace: [ 1.574722] proc_mkdir+0x18/0x20 [ 1.576629] request_threaded_irq+0xfe/0x160 [ 1.576859] auto_attach+0x60a/0xc40 [cb_pcidas64]
Suggested-by: Ian Abbott abbotti@mev.co.uk Reviewed-by: Ian Abbott abbotti@mev.co.uk Signed-off-by: Tong Zhang ztong0001@gmail.com Link: https://lore.kernel.org/r/20210315195814.4692-1-ztong0001@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/staging/comedi/drivers/cb_pcidas64.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/staging/comedi/drivers/cb_pcidas64.c b/drivers/staging/comedi/drivers/cb_pcidas64.c index fa987bb0e7cd..6d3ba399a7f0 100644 --- a/drivers/staging/comedi/drivers/cb_pcidas64.c +++ b/drivers/staging/comedi/drivers/cb_pcidas64.c @@ -4035,7 +4035,7 @@ static int auto_attach(struct comedi_device *dev, init_stc_registers(dev);
retval = request_irq(pcidev->irq, handle_interrupt, IRQF_SHARED, - dev->board_name, dev); + "cb_pcidas64", dev); if (retval) { dev_dbg(dev->class_dev, "unable to allocate irq %u\n", pcidev->irq);
From: Sameer Pujar spujar@nvidia.com
[ Upstream commit dbf54a9534350d6aebbb34f5c1c606b81a4f35dd ]
Simple-card/audio-graph-card drivers do not handle MCLK clock when it is specified in the codec device node. The expectation here is that, the codec should actually own up the MCLK clock and do necessary setup in the driver.
Suggested-by: Mark Brown broonie@kernel.org Suggested-by: Michael Walle michael@walle.cc Signed-off-by: Sameer Pujar spujar@nvidia.com Link: https://lore.kernel.org/r/1615829492-8972-3-git-send-email-spujar@nvidia.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/rt5659.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/sound/soc/codecs/rt5659.c b/sound/soc/codecs/rt5659.c index 41e5917b16a5..91a4ef7f620c 100644 --- a/sound/soc/codecs/rt5659.c +++ b/sound/soc/codecs/rt5659.c @@ -3426,12 +3426,17 @@ static int rt5659_set_component_sysclk(struct snd_soc_component *component, int { struct rt5659_priv *rt5659 = snd_soc_component_get_drvdata(component); unsigned int reg_val = 0; + int ret;
if (freq == rt5659->sysclk && clk_id == rt5659->sysclk_src) return 0;
switch (clk_id) { case RT5659_SCLK_S_MCLK: + ret = clk_set_rate(rt5659->mclk, freq); + if (ret) + return ret; + reg_val |= RT5659_SCLK_SRC_MCLK; break; case RT5659_SCLK_S_PLL1:
From: Bard Liao yung-chuan.liao@linux.intel.com
[ Upstream commit 899b12542b0897f92de9ba30944937c39ebb246d ]
We do some IO operations in the snd_soc_component_set_jack callback function and snd_soc_component_set_jack() will be called when soc component is removed. However, we should not access SoundWire registers when the bus is suspended. So set regcache_cache_only(regmap, true) to avoid accessing in the soc component removal process.
Signed-off-by: Bard Liao yung-chuan.liao@linux.intel.com Reviewed-by: Kai Vehmanen kai.vehmanen@linux.intel.com Reviewed-by: Rander Wang rander.wang@intel.com Link: https://lore.kernel.org/r/20210316005254.29699-1-yung-chuan.liao@linux.intel... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/rt711.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/sound/soc/codecs/rt711.c b/sound/soc/codecs/rt711.c index 85f744184a60..047f4e677d78 100644 --- a/sound/soc/codecs/rt711.c +++ b/sound/soc/codecs/rt711.c @@ -895,6 +895,13 @@ static int rt711_probe(struct snd_soc_component *component) return 0; }
+static void rt711_remove(struct snd_soc_component *component) +{ + struct rt711_priv *rt711 = snd_soc_component_get_drvdata(component); + + regcache_cache_only(rt711->regmap, true); +} + static const struct snd_soc_component_driver soc_codec_dev_rt711 = { .probe = rt711_probe, .set_bias_level = rt711_set_bias_level, @@ -905,6 +912,7 @@ static const struct snd_soc_component_driver soc_codec_dev_rt711 = { .dapm_routes = rt711_audio_map, .num_dapm_routes = ARRAY_SIZE(rt711_audio_map), .set_jack = rt711_set_jack_detect, + .remove = rt711_remove, };
static int rt711_set_sdw_stream(struct snd_soc_dai *dai, void *sdw_stream,
From: Manaf Meethalavalappu Pallikunhi manafm@codeaurora.org
[ Upstream commit 2046a24ae121cd107929655a6aaf3b8c5beea01f ]
There is a possible chance that some cooling device stats buffer allocation fails due to very high cooling device max state value. Later cooling device update sysfs can try to access stats data for the same cooling device. It will lead to NULL pointer dereference issue.
Add a NULL pointer check before accessing thermal cooling device stats data. It fixes the following bug
[ 26.812833] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000004 [ 27.122960] Call trace: [ 27.122963] do_raw_spin_lock+0x18/0xe8 [ 27.122966] _raw_spin_lock+0x24/0x30 [ 27.128157] thermal_cooling_device_stats_update+0x24/0x98 [ 27.128162] cur_state_store+0x88/0xb8 [ 27.128166] dev_attr_store+0x40/0x58 [ 27.128169] sysfs_kf_write+0x50/0x68 [ 27.133358] kernfs_fop_write+0x12c/0x1c8 [ 27.133362] __vfs_write+0x54/0x160 [ 27.152297] vfs_write+0xcc/0x188 [ 27.157132] ksys_write+0x78/0x108 [ 27.162050] ksys_write+0xf8/0x108 [ 27.166968] __arm_smccc_hvc+0x158/0x4b0 [ 27.166973] __arm_smccc_hvc+0x9c/0x4b0 [ 27.186005] el0_svc+0x8/0xc
Signed-off-by: Manaf Meethalavalappu Pallikunhi manafm@codeaurora.org Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org Link: https://lore.kernel.org/r/1607367181-24589-1-git-send-email-manafm@codeauror... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/thermal/thermal_sysfs.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c index 0866e949339b..9b73532464e5 100644 --- a/drivers/thermal/thermal_sysfs.c +++ b/drivers/thermal/thermal_sysfs.c @@ -754,6 +754,9 @@ void thermal_cooling_device_stats_update(struct thermal_cooling_device *cdev, { struct cooling_dev_stats *stats = cdev->stats;
+ if (!stats) + return; + spin_lock(&stats->lock);
if (stats->state == new_state)
From: Waiman Long longman@redhat.com
[ Upstream commit 5de2055d31ea88fd9ae9709ac95c372a505a60fa ]
The use_ww_ctx flag is passed to mutex_optimistic_spin(), but the function doesn't use it. The frequent use of the (use_ww_ctx && ww_ctx) combination is repetitive.
In fact, ww_ctx should not be used at all if !use_ww_ctx. Simplify ww_mutex code by dropping use_ww_ctx from mutex_optimistic_spin() an clear ww_ctx if !use_ww_ctx. In this way, we can replace (use_ww_ctx && ww_ctx) by just (ww_ctx).
Signed-off-by: Waiman Long longman@redhat.com Signed-off-by: Ingo Molnar mingo@kernel.org Acked-by: Davidlohr Bueso dbueso@suse.de Link: https://lore.kernel.org/r/20210316153119.13802-2-longman@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/locking/mutex.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-)
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index 5352ce50a97e..2c25b830203c 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -636,7 +636,7 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock) */ static __always_inline bool mutex_optimistic_spin(struct mutex *lock, struct ww_acquire_ctx *ww_ctx, - const bool use_ww_ctx, struct mutex_waiter *waiter) + struct mutex_waiter *waiter) { if (!waiter) { /* @@ -712,7 +712,7 @@ fail: #else static __always_inline bool mutex_optimistic_spin(struct mutex *lock, struct ww_acquire_ctx *ww_ctx, - const bool use_ww_ctx, struct mutex_waiter *waiter) + struct mutex_waiter *waiter) { return false; } @@ -932,6 +932,9 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass, struct ww_mutex *ww; int ret;
+ if (!use_ww_ctx) + ww_ctx = NULL; + might_sleep();
#ifdef CONFIG_DEBUG_MUTEXES @@ -939,7 +942,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass, #endif
ww = container_of(lock, struct ww_mutex, base); - if (use_ww_ctx && ww_ctx) { + if (ww_ctx) { if (unlikely(ww_ctx == READ_ONCE(ww->ctx))) return -EALREADY;
@@ -956,10 +959,10 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass, mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip);
if (__mutex_trylock(lock) || - mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, NULL)) { + mutex_optimistic_spin(lock, ww_ctx, NULL)) { /* got the lock, yay! */ lock_acquired(&lock->dep_map, ip); - if (use_ww_ctx && ww_ctx) + if (ww_ctx) ww_mutex_set_context_fastpath(ww, ww_ctx); preempt_enable(); return 0; @@ -970,7 +973,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass, * After waiting to acquire the wait_lock, try again. */ if (__mutex_trylock(lock)) { - if (use_ww_ctx && ww_ctx) + if (ww_ctx) __ww_mutex_check_waiters(lock, ww_ctx);
goto skip_wait; @@ -1023,7 +1026,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass, goto err; }
- if (use_ww_ctx && ww_ctx) { + if (ww_ctx) { ret = __ww_mutex_check_kill(lock, &waiter, ww_ctx); if (ret) goto err; @@ -1036,7 +1039,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass, * ww_mutex needs to always recheck its position since its waiter * list is not FIFO ordered. */ - if ((use_ww_ctx && ww_ctx) || !first) { + if (ww_ctx || !first) { first = __mutex_waiter_is_first(lock, &waiter); if (first) __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF); @@ -1049,7 +1052,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass, * or we must see its unlock and acquire. */ if (__mutex_trylock(lock) || - (first && mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, &waiter))) + (first && mutex_optimistic_spin(lock, ww_ctx, &waiter))) break;
spin_lock(&lock->wait_lock); @@ -1058,7 +1061,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass, acquired: __set_current_state(TASK_RUNNING);
- if (use_ww_ctx && ww_ctx) { + if (ww_ctx) { /* * Wound-Wait; we stole the lock (!first_waiter), check the * waiters as anyone might want to wound us. @@ -1078,7 +1081,7 @@ skip_wait: /* got the lock - cleanup and rejoice! */ lock_acquired(&lock->dep_map, ip);
- if (use_ww_ctx && ww_ctx) + if (ww_ctx) ww_mutex_lock_acquired(ww, ww_ctx);
spin_unlock(&lock->wait_lock);
From: Waiman Long longman@redhat.com
[ Upstream commit bee645788e07eea63055d261d2884ea45c2ba857 ]
In ww_acquire_init(), mutex_acquire() is gated by CONFIG_DEBUG_LOCK_ALLOC. The dep_map in the ww_acquire_ctx structure is also gated by the same config. However mutex_release() in ww_acquire_fini() is gated by CONFIG_DEBUG_MUTEXES. It is possible to set CONFIG_DEBUG_MUTEXES without setting CONFIG_DEBUG_LOCK_ALLOC though it is an unlikely configuration. That may cause a compilation error as dep_map isn't defined in this case. Fix this potential problem by enclosing mutex_release() inside CONFIG_DEBUG_LOCK_ALLOC.
Signed-off-by: Waiman Long longman@redhat.com Signed-off-by: Ingo Molnar mingo@kernel.org Link: https://lore.kernel.org/r/20210316153119.13802-3-longman@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/ww_mutex.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/include/linux/ww_mutex.h b/include/linux/ww_mutex.h index 850424e5d030..6ecf2a0220db 100644 --- a/include/linux/ww_mutex.h +++ b/include/linux/ww_mutex.h @@ -173,9 +173,10 @@ static inline void ww_acquire_done(struct ww_acquire_ctx *ctx) */ static inline void ww_acquire_fini(struct ww_acquire_ctx *ctx) { -#ifdef CONFIG_DEBUG_MUTEXES +#ifdef CONFIG_DEBUG_LOCK_ALLOC mutex_release(&ctx->dep_map, _THIS_IP_); - +#endif +#ifdef CONFIG_DEBUG_MUTEXES DEBUG_LOCKS_WARN_ON(ctx->acquired); if (!IS_ENABLED(CONFIG_PROVE_LOCKING)) /*
From: Elad Grupi elad.grupi@dell.com
[ Upstream commit bac04454ef9fada009f0572576837548b190bf94 ]
When data digest is enabled we should unmap pdu iovec before handling the data digest pdu.
Signed-off-by: Elad Grupi elad.grupi@dell.com Reviewed-by: Sagi Grimberg sagi@grimberg.me Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/target/tcp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c index 8b0485ada315..d658c6e8263a 100644 --- a/drivers/nvme/target/tcp.c +++ b/drivers/nvme/target/tcp.c @@ -1098,11 +1098,11 @@ static int nvmet_tcp_try_recv_data(struct nvmet_tcp_queue *queue) cmd->rbytes_done += ret; }
+ nvmet_tcp_unmap_pdu_iovec(cmd); if (queue->data_digest) { nvmet_tcp_prep_recv_ddgst(cmd); return 0; } - nvmet_tcp_unmap_pdu_iovec(cmd);
if (!(cmd->flags & NVMET_TCP_F_INIT_FAILED) && cmd->rbytes_done == cmd->req.transfer_len) {
From: Stefan Metzmacher metze@samba.org
[ Upstream commit 76cd979f4f38a27df22efb5773a0d567181a9392 ]
We never want to generate any SIGPIPE, -EPIPE only is much better.
Signed-off-by: Stefan Metzmacher metze@samba.org Link: https://lore.kernel.org/r/38961085c3ec49fd21550c7788f214d1ff02d2d4.161590847... Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/io_uring.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c index aaf9b5d49c17..26b4af9831da 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -4648,7 +4648,7 @@ static int io_sendmsg(struct io_kiocb *req, bool force_nonblock, kmsg = &iomsg; }
- flags = req->sr_msg.msg_flags; + flags = req->sr_msg.msg_flags | MSG_NOSIGNAL; if (flags & MSG_DONTWAIT) req->flags |= REQ_F_NOWAIT; else if (force_nonblock) @@ -4692,7 +4692,7 @@ static int io_send(struct io_kiocb *req, bool force_nonblock, msg.msg_controllen = 0; msg.msg_namelen = 0;
- flags = req->sr_msg.msg_flags; + flags = req->sr_msg.msg_flags | MSG_NOSIGNAL; if (flags & MSG_DONTWAIT) req->flags |= REQ_F_NOWAIT; else if (force_nonblock) @@ -4886,7 +4886,7 @@ static int io_recvmsg(struct io_kiocb *req, bool force_nonblock, 1, req->sr_msg.len); }
- flags = req->sr_msg.msg_flags; + flags = req->sr_msg.msg_flags | MSG_NOSIGNAL; if (flags & MSG_DONTWAIT) req->flags |= REQ_F_NOWAIT; else if (force_nonblock) @@ -4944,7 +4944,7 @@ static int io_recv(struct io_kiocb *req, bool force_nonblock, msg.msg_iocb = NULL; msg.msg_flags = 0;
- flags = req->sr_msg.msg_flags; + flags = req->sr_msg.msg_flags | MSG_NOSIGNAL; if (flags & MSG_DONTWAIT) req->flags |= REQ_F_NOWAIT; else if (force_nonblock)
From: Josef Bacik josef@toxicpanda.com
[ Upstream commit 9d3fcb28f9b9750b474811a2964ce022df56336e ]
This reverts commit d60cd06331a3566d3305b3c7b566e79edf4e2095.
This patch causes a panic when rebooting my Dell Poweredge r440. I do not have the full panic log as it's lost at that stage of the reboot and I do not have a serial console. Reverting this patch makes my system able to reboot again.
Signed-off-by: Josef Bacik josef@toxicpanda.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/reboot.c | 2 -- 1 file changed, 2 deletions(-)
diff --git a/kernel/reboot.c b/kernel/reboot.c index eb1b15850761..a6ad5eb2fa73 100644 --- a/kernel/reboot.c +++ b/kernel/reboot.c @@ -244,8 +244,6 @@ void migrate_to_reboot_cpu(void) void kernel_restart(char *cmd) { kernel_restart_prepare(cmd); - if (pm_power_off_prepare) - pm_power_off_prepare(); migrate_to_reboot_cpu(); syscore_shutdown(); if (!cmd)
From: Tobias Klausmann tobias.klausmann@freenet.de
[ Upstream commit e94c55b8e0a0bbe9a026250cf31e2fa45957d776 ]
Starting with commit f295c8cfec833c2707ff1512da10d65386dde7af ("drm/nouveau: fix dma syncing warning with debugging on.") the following oops occures:
BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 6 PID: 1013 Comm: Xorg.bin Tainted: G E 5.11.0-desktop-rc0+ #2 Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS V1.11 08/01/2018 RIP: 0010:nouveau_bo_sync_for_device+0x40/0xb0 [nouveau] Call Trace: nouveau_bo_validate+0x5d/0x80 [nouveau] nouveau_gem_ioctl_pushbuf+0x662/0x1120 [nouveau] ? nouveau_gem_ioctl_new+0xf0/0xf0 [nouveau] drm_ioctl_kernel+0xa6/0xf0 [drm] drm_ioctl+0x1f4/0x3a0 [drm] ? nouveau_gem_ioctl_new+0xf0/0xf0 [nouveau] nouveau_drm_ioctl+0x50/0xa0 [nouveau] __x64_sys_ioctl+0x7e/0xb0 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae ---[ end trace ccfb1e7f4064374f ]--- RIP: 0010:nouveau_bo_sync_for_device+0x40/0xb0 [nouveau]
The underlying problem is not introduced by the commit, yet it uncovered the underlying issue. The cited commit relies on valid pages. This is not given for due to some bugs. For now, just warn and work around the issue by just ignoring the bad ttm objects. Below is some debug info gathered while debugging this issue:
nouveau 0000:01:00.0: DRM: ttm_dma->num_pages: 2048 nouveau 0000:01:00.0: DRM: ttm_dma->pages is NULL nouveau 0000:01:00.0: DRM: ttm_dma: 00000000e96058e7 nouveau 0000:01:00.0: DRM: ttm_dma->page_flags: nouveau 0000:01:00.0: DRM: ttm_dma: Populated: 1 nouveau 0000:01:00.0: DRM: ttm_dma: No Retry: 0 nouveau 0000:01:00.0: DRM: ttm_dma: SG: 256 nouveau 0000:01:00.0: DRM: ttm_dma: Zero Alloc: 0 nouveau 0000:01:00.0: DRM: ttm_dma: Swapped: 0
Signed-off-by: Tobias Klausmann tobias.klausmann@freenet.de Signed-off-by: Dave Airlie airlied@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20210313222159.3346-1-tobias.k... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/nouveau/nouveau_bo.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index f1c9a22083be..e05565f284dc 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -551,6 +551,10 @@ nouveau_bo_sync_for_device(struct nouveau_bo *nvbo)
if (!ttm_dma) return; + if (!ttm_dma->pages) { + NV_DEBUG(drm, "ttm_dma 0x%p: pages NULL\n", ttm_dma); + return; + }
/* Don't waste time looping if the object is coherent */ if (nvbo->force_coherent) @@ -583,6 +587,10 @@ nouveau_bo_sync_for_cpu(struct nouveau_bo *nvbo)
if (!ttm_dma) return; + if (!ttm_dma->pages) { + NV_DEBUG(drm, "ttm_dma 0x%p: pages NULL\n", ttm_dma); + return; + }
/* Don't waste time looping if the object is coherent */ if (nvbo->force_coherent)
From: Peter Zijlstra peterz@infradead.org
[ Upstream commit 698bacefe993ad2922c9d3b1380591ad489355e9 ]
The intent is to avoid writing init code after init (because the text might have been freed). The code is needlessly different between jump_label and static_call and not obviously correct.
The existing code relies on the fact that the module loader clears the init layout, such that within_module_init() always fails, while jump_label relies on the module state which is more obvious and matches the kernel logic.
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Acked-by: Jarkko Sakkinen jarkko@kernel.org Tested-by: Sumit Garg sumit.garg@linaro.org Link: https://lkml.kernel.org/r/20210318113610.636651340@infradead.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/static_call.c | 14 ++++---------- 1 file changed, 4 insertions(+), 10 deletions(-)
diff --git a/kernel/static_call.c b/kernel/static_call.c index 49efbdc5b480..f59089a12231 100644 --- a/kernel/static_call.c +++ b/kernel/static_call.c @@ -149,6 +149,7 @@ void __static_call_update(struct static_call_key *key, void *tramp, void *func) };
for (site_mod = &first; site_mod; site_mod = site_mod->next) { + bool init = system_state < SYSTEM_RUNNING; struct module *mod = site_mod->mod;
if (!site_mod->sites) { @@ -168,6 +169,7 @@ void __static_call_update(struct static_call_key *key, void *tramp, void *func) if (mod) { stop = mod->static_call_sites + mod->num_static_call_sites; + init = mod->state == MODULE_STATE_COMING; } #endif
@@ -175,16 +177,8 @@ void __static_call_update(struct static_call_key *key, void *tramp, void *func) site < stop && static_call_key(site) == key; site++) { void *site_addr = static_call_addr(site);
- if (static_call_is_init(site)) { - /* - * Don't write to call sites which were in - * initmem and have since been freed. - */ - if (!mod && system_state >= SYSTEM_RUNNING) - continue; - if (mod && !within_module_init((unsigned long)site_addr, mod)) - continue; - } + if (!init && static_call_is_init(site)) + continue;
if (!kernel_text_address((unsigned long)site_addr)) { /*
From: zhangyi (F) yi.zhang@huawei.com
[ Upstream commit 5dccdc5a1916d4266edd251f20bbbb113a5c495f ]
In ext4_rename(), when RENAME_WHITEOUT failed to add new entry into directory, it ends up dropping new created whiteout inode under the running transaction. After commit <9b88f9fb0d2> ("ext4: Do not iput inode under running transaction"), we follow the assumptions that evict() does not get called from a transaction context but in ext4_rename() it breaks this suggestion. Although it's not a real problem, better to obey it, so this patch add inode to orphan list and stop transaction before final iput().
Signed-off-by: zhangyi (F) yi.zhang@huawei.com Link: https://lore.kernel.org/r/20210303131703.330415-2-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext4/namei.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index 078f26f4b56e..9cc9e6c1d582 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -3785,14 +3785,14 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry, */ retval = -ENOENT; if (!old.bh || le32_to_cpu(old.de->inode) != old.inode->i_ino) - goto end_rename; + goto release_bh;
new.bh = ext4_find_entry(new.dir, &new.dentry->d_name, &new.de, &new.inlined); if (IS_ERR(new.bh)) { retval = PTR_ERR(new.bh); new.bh = NULL; - goto end_rename; + goto release_bh; } if (new.bh) { if (!new.inode) { @@ -3809,15 +3809,13 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry, handle = ext4_journal_start(old.dir, EXT4_HT_DIR, credits); if (IS_ERR(handle)) { retval = PTR_ERR(handle); - handle = NULL; - goto end_rename; + goto release_bh; } } else { whiteout = ext4_whiteout_for_rename(&old, credits, &handle); if (IS_ERR(whiteout)) { retval = PTR_ERR(whiteout); - whiteout = NULL; - goto end_rename; + goto release_bh; } }
@@ -3954,16 +3952,18 @@ end_rename: ext4_resetent(handle, &old, old.inode->i_ino, old_file_type); drop_nlink(whiteout); + ext4_orphan_add(handle, whiteout); } unlock_new_inode(whiteout); + ext4_journal_stop(handle); iput(whiteout); - + } else { + ext4_journal_stop(handle); } +release_bh: brelse(old.dir_bh); brelse(old.bh); brelse(new.bh); - if (handle) - ext4_journal_stop(handle); return retval; }
From: Stefan Metzmacher metze@samba.org
[ Upstream commit 0031275d119efe16711cd93519b595e6f9b4b330 ]
Without that it's not safe to use them in a linked combination with others.
Now combinations like IORING_OP_SENDMSG followed by IORING_OP_SPLICE should be possible.
We already handle short reads and writes for the following opcodes:
- IORING_OP_READV - IORING_OP_READ_FIXED - IORING_OP_READ - IORING_OP_WRITEV - IORING_OP_WRITE_FIXED - IORING_OP_WRITE - IORING_OP_SPLICE - IORING_OP_TEE
Now we have it for these as well:
- IORING_OP_SENDMSG - IORING_OP_SEND - IORING_OP_RECVMSG - IORING_OP_RECV
For IORING_OP_RECVMSG we also check for the MSG_TRUNC and MSG_CTRUNC flags in order to call req_set_fail_links().
There might be applications arround depending on the behavior that even short send[msg]()/recv[msg]() retuns continue an IOSQE_IO_LINK chain.
It's very unlikely that such applications pass in MSG_WAITALL, which is only defined in 'man 2 recvmsg', but not in 'man 2 sendmsg'.
It's expected that the low level sock_sendmsg() call just ignores MSG_WAITALL, as MSG_ZEROCOPY is also ignored without explicitly set SO_ZEROCOPY.
We also expect the caller to know about the implicit truncation to MAX_RW_COUNT, which we don't detect.
cc: netdev@vger.kernel.org Link: https://lore.kernel.org/r/c4e1a4cc0d905314f4d5dc567e65a7b09621aab3.161590847... Signed-off-by: Stefan Metzmacher metze@samba.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/io_uring.c | 24 ++++++++++++++++++++---- 1 file changed, 20 insertions(+), 4 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c index 26b4af9831da..2b0b9c3dda33 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -4628,6 +4628,7 @@ static int io_sendmsg(struct io_kiocb *req, bool force_nonblock, struct io_async_msghdr iomsg, *kmsg; struct socket *sock; unsigned flags; + int min_ret = 0; int ret;
sock = sock_from_file(req->file); @@ -4654,6 +4655,9 @@ static int io_sendmsg(struct io_kiocb *req, bool force_nonblock, else if (force_nonblock) flags |= MSG_DONTWAIT;
+ if (flags & MSG_WAITALL) + min_ret = iov_iter_count(&kmsg->msg.msg_iter); + ret = __sys_sendmsg_sock(sock, &kmsg->msg, flags); if (force_nonblock && ret == -EAGAIN) return io_setup_async_msg(req, kmsg); @@ -4663,7 +4667,7 @@ static int io_sendmsg(struct io_kiocb *req, bool force_nonblock, if (kmsg->iov != kmsg->fast_iov) kfree(kmsg->iov); req->flags &= ~REQ_F_NEED_CLEANUP; - if (ret < 0) + if (ret < min_ret) req_set_fail_links(req); __io_req_complete(req, ret, 0, cs); return 0; @@ -4677,6 +4681,7 @@ static int io_send(struct io_kiocb *req, bool force_nonblock, struct iovec iov; struct socket *sock; unsigned flags; + int min_ret = 0; int ret;
sock = sock_from_file(req->file); @@ -4698,6 +4703,9 @@ static int io_send(struct io_kiocb *req, bool force_nonblock, else if (force_nonblock) flags |= MSG_DONTWAIT;
+ if (flags & MSG_WAITALL) + min_ret = iov_iter_count(&msg.msg_iter); + msg.msg_flags = flags; ret = sock_sendmsg(sock, &msg); if (force_nonblock && ret == -EAGAIN) @@ -4705,7 +4713,7 @@ static int io_send(struct io_kiocb *req, bool force_nonblock, if (ret == -ERESTARTSYS) ret = -EINTR;
- if (ret < 0) + if (ret < min_ret) req_set_fail_links(req); __io_req_complete(req, ret, 0, cs); return 0; @@ -4857,6 +4865,7 @@ static int io_recvmsg(struct io_kiocb *req, bool force_nonblock, struct socket *sock; struct io_buffer *kbuf; unsigned flags; + int min_ret = 0; int ret, cflags = 0;
sock = sock_from_file(req->file); @@ -4892,6 +4901,9 @@ static int io_recvmsg(struct io_kiocb *req, bool force_nonblock, else if (force_nonblock) flags |= MSG_DONTWAIT;
+ if (flags & MSG_WAITALL) + min_ret = iov_iter_count(&kmsg->msg.msg_iter); + ret = __sys_recvmsg_sock(sock, &kmsg->msg, req->sr_msg.umsg, kmsg->uaddr, flags); if (force_nonblock && ret == -EAGAIN) @@ -4904,7 +4916,7 @@ static int io_recvmsg(struct io_kiocb *req, bool force_nonblock, if (kmsg->iov != kmsg->fast_iov) kfree(kmsg->iov); req->flags &= ~REQ_F_NEED_CLEANUP; - if (ret < 0) + if (ret < min_ret || ((flags & MSG_WAITALL) && (kmsg->msg.msg_flags & (MSG_TRUNC | MSG_CTRUNC)))) req_set_fail_links(req); __io_req_complete(req, ret, cflags, cs); return 0; @@ -4920,6 +4932,7 @@ static int io_recv(struct io_kiocb *req, bool force_nonblock, struct socket *sock; struct iovec iov; unsigned flags; + int min_ret = 0; int ret, cflags = 0;
sock = sock_from_file(req->file); @@ -4950,6 +4963,9 @@ static int io_recv(struct io_kiocb *req, bool force_nonblock, else if (force_nonblock) flags |= MSG_DONTWAIT;
+ if (flags & MSG_WAITALL) + min_ret = iov_iter_count(&msg.msg_iter); + ret = sock_recvmsg(sock, &msg, flags); if (force_nonblock && ret == -EAGAIN) return -EAGAIN; @@ -4958,7 +4974,7 @@ static int io_recv(struct io_kiocb *req, bool force_nonblock, out_free: if (req->flags & REQ_F_BUFFER_SELECTED) cflags = io_put_recv_kbuf(req); - if (ret < 0) + if (ret < min_ret || ((flags & MSG_WAITALL) && (msg.msg_flags & (MSG_TRUNC | MSG_CTRUNC)))) req_set_fail_links(req); __io_req_complete(req, ret, cflags, cs); return 0;
[ Upstream commit 7867299cde34e9c2d2c676f2a384a9d5853b914d ]
The condition should be skipped if CPU ID equal to nthreads. The patch doesn't fix any actual issue since nthreads = min_t(unsigned int, num_present_cpus(), MVPP2_MAX_THREADS). On all current Armada platforms, the number of CPU's is less than MVPP2_MAX_THREADS.
Fixes: e531f76757eb ("net: mvpp2: handle cases where more CPUs are available than s/w threads") Reported-by: Russell King rmk+kernel@armlinux.org.uk Signed-off-by: Stefan Chulski stefanc@marvell.com Reviewed-by: Russell King rmk+kernel@armlinux.org.uk Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c index 358119d98358..e6f9b5345b70 100644 --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c @@ -1153,7 +1153,7 @@ static void mvpp2_interrupts_unmask(void *arg) u32 val;
/* If the thread isn't used, don't do anything */ - if (smp_processor_id() > port->priv->nthreads) + if (smp_processor_id() >= port->priv->nthreads) return;
val = MVPP2_CAUSE_MISC_SUM_MASK | @@ -2287,7 +2287,7 @@ static void mvpp2_txq_sent_counter_clear(void *arg) int queue;
/* If the thread isn't used, don't do anything */ - if (smp_processor_id() > port->priv->nthreads) + if (smp_processor_id() >= port->priv->nthreads) return;
for (queue = 0; queue < port->ntxqs; queue++) {
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit 15cc10453398c22f78f6c2b897119ecce5e5dd89 ]
Currently all errors received on msk subflows are ignored. We need to catch at least the errors on connect() and on fallback sockets.
Use a custom sk_error_report callback at subflow level, and do the real action under the msk socket lock - via the usual sock_owned_by_user()/release_callback() schema.
Fixes: 6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks") Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Mat Martineau mathew.j.martineau@linux.intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/mptcp/protocol.c | 7 +++++++ net/mptcp/protocol.h | 4 ++++ net/mptcp/subflow.c | 43 +++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 54 insertions(+)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 7345df40385a..f588332eebb4 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -2958,6 +2958,8 @@ static void mptcp_release_cb(struct sock *sk) mptcp_push_pending(sk, 0); spin_lock_bh(&sk->sk_lock.slock); } + if (test_and_clear_bit(MPTCP_ERROR_REPORT, &mptcp_sk(sk)->flags)) + __mptcp_error_report(sk);
/* clear any wmem reservation and errors */ __mptcp_update_wmem(sk); @@ -3354,6 +3356,11 @@ static __poll_t mptcp_poll(struct file *file, struct socket *sock, if (sk->sk_shutdown & RCV_SHUTDOWN) mask |= EPOLLIN | EPOLLRDNORM | EPOLLRDHUP;
+ /* This barrier is coupled with smp_wmb() in tcp_reset() */ + smp_rmb(); + if (sk->sk_err) + mask |= EPOLLERR; + return mask; }
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index c374345ad134..62288836d053 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -96,6 +96,7 @@ #define MPTCP_WORK_CLOSE_SUBFLOW 5 #define MPTCP_PUSH_PENDING 6 #define MPTCP_CLEAN_UNA 7 +#define MPTCP_ERROR_REPORT 8
static inline bool before64(__u64 seq1, __u64 seq2) { @@ -413,6 +414,7 @@ struct mptcp_subflow_context { void (*tcp_data_ready)(struct sock *sk); void (*tcp_state_change)(struct sock *sk); void (*tcp_write_space)(struct sock *sk); + void (*tcp_error_report)(struct sock *sk);
struct rcu_head rcu; }; @@ -478,6 +480,7 @@ static inline void mptcp_subflow_tcp_fallback(struct sock *sk, sk->sk_data_ready = ctx->tcp_data_ready; sk->sk_state_change = ctx->tcp_state_change; sk->sk_write_space = ctx->tcp_write_space; + sk->sk_error_report = ctx->tcp_error_report;
inet_csk(sk)->icsk_af_ops = ctx->icsk_af_ops; } @@ -505,6 +508,7 @@ bool mptcp_finish_join(struct sock *sk); bool mptcp_schedule_work(struct sock *sk); void __mptcp_check_push(struct sock *sk, struct sock *ssk); void __mptcp_data_acked(struct sock *sk); +void __mptcp_error_report(struct sock *sk); void mptcp_subflow_eof(struct sock *sk); bool mptcp_update_rcv_data_fin(struct mptcp_sock *msk, u64 data_fin_seq, bool use_64bit); void __mptcp_flush_join_list(struct mptcp_sock *msk); diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index 96e040951cd4..6c0205816a5d 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -1054,6 +1054,46 @@ static void subflow_write_space(struct sock *ssk) /* we take action in __mptcp_clean_una() */ }
+void __mptcp_error_report(struct sock *sk) +{ + struct mptcp_subflow_context *subflow; + struct mptcp_sock *msk = mptcp_sk(sk); + + mptcp_for_each_subflow(msk, subflow) { + struct sock *ssk = mptcp_subflow_tcp_sock(subflow); + int err = sock_error(ssk); + + if (!err) + continue; + + /* only propagate errors on fallen-back sockets or + * on MPC connect + */ + if (sk->sk_state != TCP_SYN_SENT && !__mptcp_check_fallback(msk)) + continue; + + inet_sk_state_store(sk, inet_sk_state_load(ssk)); + sk->sk_err = -err; + + /* This barrier is coupled with smp_rmb() in mptcp_poll() */ + smp_wmb(); + sk->sk_error_report(sk); + break; + } +} + +static void subflow_error_report(struct sock *ssk) +{ + struct sock *sk = mptcp_subflow_ctx(ssk)->conn; + + mptcp_data_lock(sk); + if (!sock_owned_by_user(sk)) + __mptcp_error_report(sk); + else + set_bit(MPTCP_ERROR_REPORT, &mptcp_sk(sk)->flags); + mptcp_data_unlock(sk); +} + static struct inet_connection_sock_af_ops * subflow_default_af_ops(struct sock *sk) { @@ -1367,9 +1407,11 @@ static int subflow_ulp_init(struct sock *sk) ctx->tcp_data_ready = sk->sk_data_ready; ctx->tcp_state_change = sk->sk_state_change; ctx->tcp_write_space = sk->sk_write_space; + ctx->tcp_error_report = sk->sk_error_report; sk->sk_data_ready = subflow_data_ready; sk->sk_write_space = subflow_write_space; sk->sk_state_change = subflow_state_change; + sk->sk_error_report = subflow_error_report; out: return err; } @@ -1422,6 +1464,7 @@ static void subflow_ulp_clone(const struct request_sock *req, new_ctx->tcp_data_ready = old_ctx->tcp_data_ready; new_ctx->tcp_state_change = old_ctx->tcp_state_change; new_ctx->tcp_write_space = old_ctx->tcp_write_space; + new_ctx->tcp_error_report = old_ctx->tcp_error_report; new_ctx->rel_write_seq = 1; new_ctx->tcp_sock = newsk;
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit dd913410b0a442a53d41a9817ed2208850858e99 ]
The current mptcp_poll() implementation gives unexpected results after shutdown(SEND_SHUTDOWN) and when the msk status is TCP_CLOSE.
Set the correct mask.
Fixes: 8edf08649eed ("mptcp: rework poll+nospace handling") Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Mat Martineau mathew.j.martineau@linux.intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/mptcp/protocol.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index f588332eebb4..44b8868f0607 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -3320,7 +3320,7 @@ static __poll_t mptcp_check_writeable(struct mptcp_sock *msk) struct sock *sk = (struct sock *)msk;
if (unlikely(sk->sk_shutdown & SEND_SHUTDOWN)) - return 0; + return EPOLLOUT | EPOLLWRNORM;
if (sk_stream_is_writeable(sk)) return EPOLLOUT | EPOLLWRNORM; @@ -3353,6 +3353,8 @@ static __poll_t mptcp_poll(struct file *file, struct socket *sock, mask |= mptcp_check_readable(msk); mask |= mptcp_check_writeable(msk); } + if (sk->sk_shutdown == SHUTDOWN_MASK || state == TCP_CLOSE) + mask |= EPOLLHUP; if (sk->sk_shutdown & RCV_SHUTDOWN) mask |= EPOLLIN | EPOLLRDNORM | EPOLLRDHUP;
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit d8b59efa64060d17b7b61f97d891de2d9f2bd9f0 ]
The mptcp subflow route_req() callback performs the subflow req initialization after the route_req() check. If the latter fails, mptcp-specific bits of the current request sockets are left uninitialized.
The above causes bad things at req socket disposal time, when the mptcp resources are cleared.
This change addresses the issue by splitting subflow_init_req() into the actual initialization and the mptcp-specific checks. The initialization is moved before any possibly failing check.
Reported-by: Christoph Paasch cpaasch@apple.com Fixes: 7ea851d19b23 ("tcp: merge 'init_req' and 'route_req' functions") Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Mat Martineau mathew.j.martineau@linux.intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/mptcp/subflow.c | 40 ++++++++++++++++------------------------ 1 file changed, 16 insertions(+), 24 deletions(-)
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index 6c0205816a5d..f97f29df4505 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -92,7 +92,7 @@ static struct mptcp_sock *subflow_token_join_request(struct request_sock *req, return msk; }
-static int __subflow_init_req(struct request_sock *req, const struct sock *sk_listener) +static void subflow_init_req(struct request_sock *req, const struct sock *sk_listener) { struct mptcp_subflow_request_sock *subflow_req = mptcp_subflow_rsk(req);
@@ -100,16 +100,6 @@ static int __subflow_init_req(struct request_sock *req, const struct sock *sk_li subflow_req->mp_join = 0; subflow_req->msk = NULL; mptcp_token_init_request(req); - -#ifdef CONFIG_TCP_MD5SIG - /* no MPTCP if MD5SIG is enabled on this socket or we may run out of - * TCP option space. - */ - if (rcu_access_pointer(tcp_sk(sk_listener)->md5sig_info)) - return -EINVAL; -#endif - - return 0; }
/* Init mptcp request socket. @@ -117,20 +107,23 @@ static int __subflow_init_req(struct request_sock *req, const struct sock *sk_li * Returns an error code if a JOIN has failed and a TCP reset * should be sent. */ -static int subflow_init_req(struct request_sock *req, - const struct sock *sk_listener, - struct sk_buff *skb) +static int subflow_check_req(struct request_sock *req, + const struct sock *sk_listener, + struct sk_buff *skb) { struct mptcp_subflow_context *listener = mptcp_subflow_ctx(sk_listener); struct mptcp_subflow_request_sock *subflow_req = mptcp_subflow_rsk(req); struct mptcp_options_received mp_opt; - int ret;
pr_debug("subflow_req=%p, listener=%p", subflow_req, listener);
- ret = __subflow_init_req(req, sk_listener); - if (ret) - return 0; +#ifdef CONFIG_TCP_MD5SIG + /* no MPTCP if MD5SIG is enabled on this socket or we may run out of + * TCP option space. + */ + if (rcu_access_pointer(tcp_sk(sk_listener)->md5sig_info)) + return -EINVAL; +#endif
mptcp_get_options(skb, &mp_opt);
@@ -205,10 +198,7 @@ int mptcp_subflow_init_cookie_req(struct request_sock *req, struct mptcp_options_received mp_opt; int err;
- err = __subflow_init_req(req, sk_listener); - if (err) - return err; - + subflow_init_req(req, sk_listener); mptcp_get_options(skb, &mp_opt);
if (mp_opt.mp_capable && mp_opt.mp_join) @@ -248,12 +238,13 @@ static struct dst_entry *subflow_v4_route_req(const struct sock *sk, int err;
tcp_rsk(req)->is_mptcp = 1; + subflow_init_req(req, sk);
dst = tcp_request_sock_ipv4_ops.route_req(sk, skb, fl, req); if (!dst) return NULL;
- err = subflow_init_req(req, sk, skb); + err = subflow_check_req(req, sk, skb); if (err == 0) return dst;
@@ -273,12 +264,13 @@ static struct dst_entry *subflow_v6_route_req(const struct sock *sk, int err;
tcp_rsk(req)->is_mptcp = 1; + subflow_init_req(req, sk);
dst = tcp_request_sock_ipv6_ops.route_req(sk, skb, fl, req); if (!dst) return NULL;
- err = subflow_init_req(req, sk, skb); + err = subflow_check_req(req, sk, skb); if (err == 0) return dst;
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit d09d818ec2ed31bce94fdcfcc4700233e01f8498 ]
Currently we do not schedule the MPTCP retransmission timer after pushing the data when such action happens in the subflow context.
This may cause hang-up on active-backup scenarios, or even when only single subflow msks are involved, if we lost some peer's ack.
Fixes: 6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks") Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Mat Martineau mathew.j.martineau@linux.intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/mptcp/options.c | 3 +-- net/mptcp/protocol.c | 3 +++ 2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/net/mptcp/options.c b/net/mptcp/options.c index 37ef0bf098f6..9e86c601093f 100644 --- a/net/mptcp/options.c +++ b/net/mptcp/options.c @@ -885,8 +885,7 @@ static void ack_update_msk(struct mptcp_sock *msk, msk->wnd_end = new_wnd_end;
/* this assumes mptcp_incoming_options() is invoked after tcp_ack() */ - if (after64(msk->wnd_end, READ_ONCE(msk->snd_nxt)) && - sk_stream_memory_free(ssk)) + if (after64(msk->wnd_end, READ_ONCE(msk->snd_nxt))) __mptcp_check_push(sk, ssk);
if (after64(new_snd_una, old_snd_una)) { diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 44b8868f0607..67483e561b37 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -1568,6 +1568,9 @@ out: mptcp_set_timeout(sk, ssk); tcp_push(ssk, 0, info.mss_now, tcp_sk(ssk)->nonagle, info.size_goal); + if (!mptcp_timer_pending(sk)) + mptcp_reset_timer(sk); + if (msk->snd_data_fin_enable && msk->snd_nxt + 1 == msk->write_seq) mptcp_schedule_work(sk);
From: Davide Caratti dcaratti@redhat.com
[ Upstream commit d2126838050ccd1dadf310ffb78b2204f3b032b9 ]
the following command:
# tc filter add dev $h2 ingress protocol ip pref 1 handle 101 flower \ $tcflags dst_ip 192.0.2.2 ip_ttl 63 action drop
doesn't drop all IPv4 packets that match the configured TTL / destination address. In particular, if "fragment offset" or "more fragments" have non zero value in the IPv4 header, setting of FLOW_DISSECTOR_KEY_IP is simply ignored. Fix this dissecting IPv4 TTL and TOS before fragment info; while at it, add a selftest for tc flower's match on 'ip_ttl' that verifies the correct behavior.
Fixes: 518d8a2e9bad ("net/flow_dissector: add support for dissection of misc ip header fields") Reported-by: Shuang Li shuali@redhat.com Signed-off-by: Davide Caratti dcaratti@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/flow_dissector.c | 6 +-- .../selftests/net/forwarding/tc_flower.sh | 38 ++++++++++++++++++- 2 files changed, 40 insertions(+), 4 deletions(-)
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c index 7a06d4301617..180be5102efc 100644 --- a/net/core/flow_dissector.c +++ b/net/core/flow_dissector.c @@ -1050,6 +1050,9 @@ proto_again: key_control->addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS; }
+ __skb_flow_dissect_ipv4(skb, flow_dissector, + target_container, data, iph); + if (ip_is_fragment(iph)) { key_control->flags |= FLOW_DIS_IS_FRAGMENT;
@@ -1066,9 +1069,6 @@ proto_again: } }
- __skb_flow_dissect_ipv4(skb, flow_dissector, - target_container, data, iph); - break; } case htons(ETH_P_IPV6): { diff --git a/tools/testing/selftests/net/forwarding/tc_flower.sh b/tools/testing/selftests/net/forwarding/tc_flower.sh index 058c746ee300..b11d8e6b5bc1 100755 --- a/tools/testing/selftests/net/forwarding/tc_flower.sh +++ b/tools/testing/selftests/net/forwarding/tc_flower.sh @@ -3,7 +3,7 @@
ALL_TESTS="match_dst_mac_test match_src_mac_test match_dst_ip_test \ match_src_ip_test match_ip_flags_test match_pcp_test match_vlan_test \ - match_ip_tos_test match_indev_test" + match_ip_tos_test match_indev_test match_ip_ttl_test" NUM_NETIFS=2 source tc_common.sh source lib.sh @@ -310,6 +310,42 @@ match_ip_tos_test() log_test "ip_tos match ($tcflags)" }
+match_ip_ttl_test() +{ + RET=0 + + tc filter add dev $h2 ingress protocol ip pref 1 handle 101 flower \ + $tcflags dst_ip 192.0.2.2 ip_ttl 63 action drop + tc filter add dev $h2 ingress protocol ip pref 2 handle 102 flower \ + $tcflags dst_ip 192.0.2.2 action drop + + $MZ $h1 -c 1 -p 64 -a $h1mac -b $h2mac -A 192.0.2.1 -B 192.0.2.2 \ + -t ip "ttl=63" -q + + $MZ $h1 -c 1 -p 64 -a $h1mac -b $h2mac -A 192.0.2.1 -B 192.0.2.2 \ + -t ip "ttl=63,mf,frag=256" -q + + tc_check_packets "dev $h2 ingress" 102 1 + check_fail $? "Matched on the wrong filter (no check on ttl)" + + tc_check_packets "dev $h2 ingress" 101 2 + check_err $? "Did not match on correct filter (ttl=63)" + + $MZ $h1 -c 1 -p 64 -a $h1mac -b $h2mac -A 192.0.2.1 -B 192.0.2.2 \ + -t ip "ttl=255" -q + + tc_check_packets "dev $h2 ingress" 101 3 + check_fail $? "Matched on a wrong filter (ttl=63)" + + tc_check_packets "dev $h2 ingress" 102 1 + check_err $? "Did not match on correct filter (no check on ttl)" + + tc filter del dev $h2 ingress protocol ip pref 2 handle 102 flower + tc filter del dev $h2 ingress protocol ip pref 1 handle 101 flower + + log_test "ip_ttl match ($tcflags)" +} + match_indev_test() { RET=0
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit 341c65242fe18aac8900e4291d472df9f7ba7bc7 ]
Currently we move orphaned msk sockets directly from FIN_WAIT2 state to CLOSE, with the rationale that incoming additional data could be just dropped by the TCP stack/TW sockets.
Anyhow we miss sending MPTCP-level ack on incoming DATA_FIN, and that may hang the peers.
Fixes: e16163b6e2b7 ("mptcp: refactor shutdown and close") Reviewed-by: Mat Martineau mathew.j.martineau@linux.intel.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/mptcp/protocol.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 67483e561b37..88f2d900a347 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -2292,13 +2292,12 @@ static void mptcp_worker(struct work_struct *work) __mptcp_check_send_data_fin(sk); mptcp_check_data_fin(sk);
- /* if the msk data is completely acked, or the socket timedout, - * there is no point in keeping around an orphaned sk + /* There is no point in keeping around an orphaned sk timedout or + * closed, but we need the msk around to reply to incoming DATA_FIN, + * even if it is orphaned and in FIN_WAIT2 state */ if (sock_flag(sk, SOCK_DEAD) && - (mptcp_check_close_timeout(sk) || - (state != sk->sk_state && - ((1 << inet_sk_state_load(sk)) & (TCPF_CLOSE | TCPF_FIN_WAIT2))))) { + (mptcp_check_close_timeout(sk) || sk->sk_state == TCP_CLOSE)) { inet_sk_state_store(sk, TCP_CLOSE); __mptcp_destroy_sock(sk); goto unlock;
From: Florian Westphal fw@strlen.de
[ Upstream commit ad98dd37051e14fa8c785609430d907fcfd518ba ]
mptcp re-used inet(6)_release, so the subflow sockets are ignored. Need to invoke ip(v6)_mc_drop_socket function to ensure mcast join resources get free'd.
Fixes: 717e79c867ca5 ("mptcp: Add setsockopt()/getsockopt() socket operations") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/110 Acked-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/mptcp/protocol.c | 55 ++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 53 insertions(+), 2 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 88f2d900a347..c3299a4568a0 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -11,6 +11,7 @@ #include <linux/netdevice.h> #include <linux/sched/signal.h> #include <linux/atomic.h> +#include <linux/igmp.h> #include <net/sock.h> #include <net/inet_common.h> #include <net/inet_hashtables.h> @@ -19,6 +20,7 @@ #include <net/tcp_states.h> #if IS_ENABLED(CONFIG_MPTCP_IPV6) #include <net/transp_v6.h> +#include <net/addrconf.h> #endif #include <net/mptcp.h> #include <net/xfrm.h> @@ -3368,10 +3370,34 @@ static __poll_t mptcp_poll(struct file *file, struct socket *sock, return mask; }
+static int mptcp_release(struct socket *sock) +{ + struct mptcp_subflow_context *subflow; + struct sock *sk = sock->sk; + struct mptcp_sock *msk; + + if (!sk) + return 0; + + lock_sock(sk); + + msk = mptcp_sk(sk); + + mptcp_for_each_subflow(msk, subflow) { + struct sock *ssk = mptcp_subflow_tcp_sock(subflow); + + ip_mc_drop_socket(ssk); + } + + release_sock(sk); + + return inet_release(sock); +} + static const struct proto_ops mptcp_stream_ops = { .family = PF_INET, .owner = THIS_MODULE, - .release = inet_release, + .release = mptcp_release, .bind = mptcp_bind, .connect = mptcp_stream_connect, .socketpair = sock_no_socketpair, @@ -3418,10 +3444,35 @@ void __init mptcp_proto_init(void) }
#if IS_ENABLED(CONFIG_MPTCP_IPV6) +static int mptcp6_release(struct socket *sock) +{ + struct mptcp_subflow_context *subflow; + struct mptcp_sock *msk; + struct sock *sk = sock->sk; + + if (!sk) + return 0; + + lock_sock(sk); + + msk = mptcp_sk(sk); + + mptcp_for_each_subflow(msk, subflow) { + struct sock *ssk = mptcp_subflow_tcp_sock(subflow); + + ip_mc_drop_socket(ssk); + ipv6_sock_mc_close(ssk); + ipv6_sock_ac_close(ssk); + } + + release_sock(sk); + return inet6_release(sock); +} + static const struct proto_ops mptcp_v6_stream_ops = { .family = PF_INET6, .owner = THIS_MODULE, - .release = inet6_release, + .release = mptcp6_release, .bind = mptcp_bind, .connect = mptcp_stream_connect, .socketpair = sock_no_socketpair,
From: Marc Kleine-Budde mkl@pengutronix.de
[ Upstream commit 3e77f70e734584e0ad1038e459ed3fd2400f873a ]
This patch moves the CAN driver related infrastructure into a separate subdir. It will be split into more files in the coming patches.
Reviewed-by: Vincent Mailhol mailhol.vincent@wanadoo.fr Link: https://lore.kernel.org/r/20210111141930.693847-3-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/can/Makefile | 7 +------ drivers/net/can/dev/Makefile | 7 +++++++ drivers/net/can/{ => dev}/dev.c | 0 drivers/net/can/{ => dev}/rx-offload.c | 0 4 files changed, 8 insertions(+), 6 deletions(-) create mode 100644 drivers/net/can/dev/Makefile rename drivers/net/can/{ => dev}/dev.c (100%) rename drivers/net/can/{ => dev}/rx-offload.c (100%)
diff --git a/drivers/net/can/Makefile b/drivers/net/can/Makefile index 22164300122d..a2b4463d8480 100644 --- a/drivers/net/can/Makefile +++ b/drivers/net/can/Makefile @@ -7,12 +7,7 @@ obj-$(CONFIG_CAN_VCAN) += vcan.o obj-$(CONFIG_CAN_VXCAN) += vxcan.o obj-$(CONFIG_CAN_SLCAN) += slcan.o
-obj-$(CONFIG_CAN_DEV) += can-dev.o -can-dev-y += dev.o -can-dev-y += rx-offload.o - -can-dev-$(CONFIG_CAN_LEDS) += led.o - +obj-y += dev/ obj-y += rcar/ obj-y += spi/ obj-y += usb/ diff --git a/drivers/net/can/dev/Makefile b/drivers/net/can/dev/Makefile new file mode 100644 index 000000000000..cba92e6bcf6f --- /dev/null +++ b/drivers/net/can/dev/Makefile @@ -0,0 +1,7 @@ +# SPDX-License-Identifier: GPL-2.0 + +obj-$(CONFIG_CAN_DEV) += can-dev.o +can-dev-y += dev.o +can-dev-y += rx-offload.o + +can-dev-$(CONFIG_CAN_LEDS) += led.o diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev/dev.c similarity index 100% rename from drivers/net/can/dev.c rename to drivers/net/can/dev/dev.c diff --git a/drivers/net/can/rx-offload.c b/drivers/net/can/dev/rx-offload.c similarity index 100% rename from drivers/net/can/rx-offload.c rename to drivers/net/can/dev/rx-offload.c
From: Oleksij Rempel o.rempel@pengutronix.de
[ Upstream commit 4e096a18867a5a989b510f6999d9c6b6622e8f7b ]
Since 20dd3850bcf8 ("can: Speed up CAN frame receiption by using ml_priv") the CAN framework uses per device specific data in the AF_CAN protocol. For this purpose the struct net_device->ml_priv is used. Later the ml_priv usage in CAN was extended for other users, one of them being CAN_J1939.
Later in the kernel ml_priv was converted to an union, used by other drivers. E.g. the tun driver started storing it's stats pointer.
Since tun devices can claim to be a CAN device, CAN specific protocols will wrongly interpret this pointer, which will cause system crashes. Mostly this issue is visible in the CAN_J1939 stack.
To fix this issue, we request a dedicated CAN pointer within the net_device struct.
Reported-by: syzbot+5138c4dd15a0401bec7b@syzkaller.appspotmail.com Fixes: 20dd3850bcf8 ("can: Speed up CAN frame receiption by using ml_priv") Fixes: ffd956eef69b ("can: introduce CAN midlayer private and allocate it automatically") Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Fixes: 497a5757ce4e ("tun: switch to net core provided statistics counters") Signed-off-by: Oleksij Rempel o.rempel@pengutronix.de Link: https://lore.kernel.org/r/20210223070127.4538-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/can/dev/dev.c | 4 +++- drivers/net/can/slcan.c | 4 +++- drivers/net/can/vcan.c | 2 +- drivers/net/can/vxcan.c | 6 +++++- include/linux/can/can-ml.h | 12 ++++++++++++ include/linux/netdevice.h | 34 +++++++++++++++++++++++++++++++++- net/can/af_can.c | 34 ++-------------------------------- net/can/j1939/main.c | 22 ++++++++-------------- net/can/j1939/socket.c | 13 ++++--------- net/can/proc.c | 19 +++++++++++++------ 10 files changed, 84 insertions(+), 66 deletions(-)
diff --git a/drivers/net/can/dev/dev.c b/drivers/net/can/dev/dev.c index 2a4f12c3c28b..a665afaeccd1 100644 --- a/drivers/net/can/dev/dev.c +++ b/drivers/net/can/dev/dev.c @@ -747,6 +747,7 @@ EXPORT_SYMBOL_GPL(alloc_can_err_skb); struct net_device *alloc_candev_mqs(int sizeof_priv, unsigned int echo_skb_max, unsigned int txqs, unsigned int rxqs) { + struct can_ml_priv *can_ml; struct net_device *dev; struct can_priv *priv; int size; @@ -778,7 +779,8 @@ struct net_device *alloc_candev_mqs(int sizeof_priv, unsigned int echo_skb_max, priv = netdev_priv(dev); priv->dev = dev;
- dev->ml_priv = (void *)priv + ALIGN(sizeof_priv, NETDEV_ALIGN); + can_ml = (void *)priv + ALIGN(sizeof_priv, NETDEV_ALIGN); + can_set_ml_priv(dev, can_ml);
if (echo_skb_max) { priv->echo_skb_max = echo_skb_max; diff --git a/drivers/net/can/slcan.c b/drivers/net/can/slcan.c index a1bd1be09548..30c8d53c9745 100644 --- a/drivers/net/can/slcan.c +++ b/drivers/net/can/slcan.c @@ -516,6 +516,7 @@ static struct slcan *slc_alloc(void) int i; char name[IFNAMSIZ]; struct net_device *dev = NULL; + struct can_ml_priv *can_ml; struct slcan *sl; int size;
@@ -538,7 +539,8 @@ static struct slcan *slc_alloc(void)
dev->base_addr = i; sl = netdev_priv(dev); - dev->ml_priv = (void *)sl + ALIGN(sizeof(*sl), NETDEV_ALIGN); + can_ml = (void *)sl + ALIGN(sizeof(*sl), NETDEV_ALIGN); + can_set_ml_priv(dev, can_ml);
/* Initialize channel control data */ sl->magic = SLCAN_MAGIC; diff --git a/drivers/net/can/vcan.c b/drivers/net/can/vcan.c index 39ca14b0585d..067705e2850b 100644 --- a/drivers/net/can/vcan.c +++ b/drivers/net/can/vcan.c @@ -153,7 +153,7 @@ static void vcan_setup(struct net_device *dev) dev->addr_len = 0; dev->tx_queue_len = 0; dev->flags = IFF_NOARP; - dev->ml_priv = netdev_priv(dev); + can_set_ml_priv(dev, netdev_priv(dev));
/* set flags according to driver capabilities */ if (echo) diff --git a/drivers/net/can/vxcan.c b/drivers/net/can/vxcan.c index f9a524c5f6d6..8861a7d875e7 100644 --- a/drivers/net/can/vxcan.c +++ b/drivers/net/can/vxcan.c @@ -141,6 +141,8 @@ static const struct net_device_ops vxcan_netdev_ops = {
static void vxcan_setup(struct net_device *dev) { + struct can_ml_priv *can_ml; + dev->type = ARPHRD_CAN; dev->mtu = CANFD_MTU; dev->hard_header_len = 0; @@ -149,7 +151,9 @@ static void vxcan_setup(struct net_device *dev) dev->flags = (IFF_NOARP|IFF_ECHO); dev->netdev_ops = &vxcan_netdev_ops; dev->needs_free_netdev = true; - dev->ml_priv = netdev_priv(dev) + ALIGN(sizeof(struct vxcan_priv), NETDEV_ALIGN); + + can_ml = netdev_priv(dev) + ALIGN(sizeof(struct vxcan_priv), NETDEV_ALIGN); + can_set_ml_priv(dev, can_ml); }
/* forward declaration for rtnl_create_link() */ diff --git a/include/linux/can/can-ml.h b/include/linux/can/can-ml.h index 2f5d731ae251..8afa92d15a66 100644 --- a/include/linux/can/can-ml.h +++ b/include/linux/can/can-ml.h @@ -44,6 +44,7 @@
#include <linux/can.h> #include <linux/list.h> +#include <linux/netdevice.h>
#define CAN_SFF_RCV_ARRAY_SZ (1 << CAN_SFF_ID_BITS) #define CAN_EFF_RCV_HASH_BITS 10 @@ -65,4 +66,15 @@ struct can_ml_priv { #endif };
+static inline struct can_ml_priv *can_get_ml_priv(struct net_device *dev) +{ + return netdev_get_ml_priv(dev, ML_PRIV_CAN); +} + +static inline void can_set_ml_priv(struct net_device *dev, + struct can_ml_priv *ml_priv) +{ + netdev_set_ml_priv(dev, ml_priv, ML_PRIV_CAN); +} + #endif /* CAN_ML_H */ diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index fb79ac497794..688c7477ec0a 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1607,6 +1607,12 @@ enum netdev_priv_flags { #define IFF_L3MDEV_RX_HANDLER IFF_L3MDEV_RX_HANDLER #define IFF_LIVE_RENAME_OK IFF_LIVE_RENAME_OK
+/* Specifies the type of the struct net_device::ml_priv pointer */ +enum netdev_ml_priv_type { + ML_PRIV_NONE, + ML_PRIV_CAN, +}; + /** * struct net_device - The DEVICE structure. * @@ -1802,6 +1808,7 @@ enum netdev_priv_flags { * @nd_net: Network namespace this network device is inside * * @ml_priv: Mid-layer private + * @ml_priv_type: Mid-layer private type * @lstats: Loopback statistics * @tstats: Tunnel statistics * @dstats: Dummy statistics @@ -2114,8 +2121,10 @@ struct net_device { possible_net_t nd_net;
/* mid-layer private */ + void *ml_priv; + enum netdev_ml_priv_type ml_priv_type; + union { - void *ml_priv; struct pcpu_lstats __percpu *lstats; struct pcpu_sw_netstats __percpu *tstats; struct pcpu_dstats __percpu *dstats; @@ -2305,6 +2314,29 @@ static inline void netdev_reset_rx_headroom(struct net_device *dev) netdev_set_rx_headroom(dev, -1); }
+static inline void *netdev_get_ml_priv(struct net_device *dev, + enum netdev_ml_priv_type type) +{ + if (dev->ml_priv_type != type) + return NULL; + + return dev->ml_priv; +} + +static inline void netdev_set_ml_priv(struct net_device *dev, + void *ml_priv, + enum netdev_ml_priv_type type) +{ + WARN(dev->ml_priv_type && dev->ml_priv_type != type, + "Overwriting already set ml_priv_type (%u) with different ml_priv_type (%u)!\n", + dev->ml_priv_type, type); + WARN(!dev->ml_priv_type && dev->ml_priv, + "Overwriting already set ml_priv and ml_priv_type is ML_PRIV_NONE!\n"); + + dev->ml_priv = ml_priv; + dev->ml_priv_type = type; +} + /* * Net namespace inlines */ diff --git a/net/can/af_can.c b/net/can/af_can.c index 837bb8af0ec3..cce2af10eb3e 100644 --- a/net/can/af_can.c +++ b/net/can/af_can.c @@ -304,8 +304,8 @@ static struct can_dev_rcv_lists *can_dev_rcv_lists_find(struct net *net, struct net_device *dev) { if (dev) { - struct can_ml_priv *ml_priv = dev->ml_priv; - return &ml_priv->dev_rcv_lists; + struct can_ml_priv *can_ml = can_get_ml_priv(dev); + return &can_ml->dev_rcv_lists; } else { return net->can.rx_alldev_list; } @@ -790,25 +790,6 @@ void can_proto_unregister(const struct can_proto *cp) } EXPORT_SYMBOL(can_proto_unregister);
-/* af_can notifier to create/remove CAN netdevice specific structs */ -static int can_notifier(struct notifier_block *nb, unsigned long msg, - void *ptr) -{ - struct net_device *dev = netdev_notifier_info_to_dev(ptr); - - if (dev->type != ARPHRD_CAN) - return NOTIFY_DONE; - - switch (msg) { - case NETDEV_REGISTER: - WARN(!dev->ml_priv, - "No CAN mid layer private allocated, please fix your driver and use alloc_candev()!\n"); - break; - } - - return NOTIFY_DONE; -} - static int can_pernet_init(struct net *net) { spin_lock_init(&net->can.rcvlists_lock); @@ -876,11 +857,6 @@ static const struct net_proto_family can_family_ops = { .owner = THIS_MODULE, };
-/* notifier block for netdevice event */ -static struct notifier_block can_netdev_notifier __read_mostly = { - .notifier_call = can_notifier, -}; - static struct pernet_operations can_pernet_ops __read_mostly = { .init = can_pernet_init, .exit = can_pernet_exit, @@ -911,17 +887,12 @@ static __init int can_init(void) err = sock_register(&can_family_ops); if (err) goto out_sock; - err = register_netdevice_notifier(&can_netdev_notifier); - if (err) - goto out_notifier;
dev_add_pack(&can_packet); dev_add_pack(&canfd_packet);
return 0;
-out_notifier: - sock_unregister(PF_CAN); out_sock: unregister_pernet_subsys(&can_pernet_ops); out_pernet: @@ -935,7 +906,6 @@ static __exit void can_exit(void) /* protocol unregister */ dev_remove_pack(&canfd_packet); dev_remove_pack(&can_packet); - unregister_netdevice_notifier(&can_netdev_notifier); sock_unregister(PF_CAN);
unregister_pernet_subsys(&can_pernet_ops); diff --git a/net/can/j1939/main.c b/net/can/j1939/main.c index bb914d8b4216..da3a7a7bcff2 100644 --- a/net/can/j1939/main.c +++ b/net/can/j1939/main.c @@ -140,9 +140,9 @@ static struct j1939_priv *j1939_priv_create(struct net_device *ndev) static inline void j1939_priv_set(struct net_device *ndev, struct j1939_priv *priv) { - struct can_ml_priv *can_ml_priv = ndev->ml_priv; + struct can_ml_priv *can_ml = can_get_ml_priv(ndev);
- can_ml_priv->j1939_priv = priv; + can_ml->j1939_priv = priv; }
static void __j1939_priv_release(struct kref *kref) @@ -211,12 +211,9 @@ static void __j1939_rx_release(struct kref *kref) /* get pointer to priv without increasing ref counter */ static inline struct j1939_priv *j1939_ndev_to_priv(struct net_device *ndev) { - struct can_ml_priv *can_ml_priv = ndev->ml_priv; + struct can_ml_priv *can_ml = can_get_ml_priv(ndev);
- if (!can_ml_priv) - return NULL; - - return can_ml_priv->j1939_priv; + return can_ml->j1939_priv; }
static struct j1939_priv *j1939_priv_get_by_ndev_locked(struct net_device *ndev) @@ -225,9 +222,6 @@ static struct j1939_priv *j1939_priv_get_by_ndev_locked(struct net_device *ndev)
lockdep_assert_held(&j1939_netdev_lock);
- if (ndev->type != ARPHRD_CAN) - return NULL; - priv = j1939_ndev_to_priv(ndev); if (priv) j1939_priv_get(priv); @@ -348,15 +342,16 @@ static int j1939_netdev_notify(struct notifier_block *nb, unsigned long msg, void *data) { struct net_device *ndev = netdev_notifier_info_to_dev(data); + struct can_ml_priv *can_ml = can_get_ml_priv(ndev); struct j1939_priv *priv;
+ if (!can_ml) + goto notify_done; + priv = j1939_priv_get_by_ndev(ndev); if (!priv) goto notify_done;
- if (ndev->type != ARPHRD_CAN) - goto notify_put; - switch (msg) { case NETDEV_DOWN: j1939_cancel_active_session(priv, NULL); @@ -365,7 +360,6 @@ static int j1939_netdev_notify(struct notifier_block *nb, break; }
-notify_put: j1939_priv_put(priv);
notify_done: diff --git a/net/can/j1939/socket.c b/net/can/j1939/socket.c index f23966526a88..56aa66147d5a 100644 --- a/net/can/j1939/socket.c +++ b/net/can/j1939/socket.c @@ -12,6 +12,7 @@
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/can/can-ml.h> #include <linux/can/core.h> #include <linux/can/skb.h> #include <linux/errqueue.h> @@ -453,6 +454,7 @@ static int j1939_sk_bind(struct socket *sock, struct sockaddr *uaddr, int len) j1939_jsk_del(priv, jsk); j1939_local_ecu_put(priv, jsk->addr.src_name, jsk->addr.sa); } else { + struct can_ml_priv *can_ml; struct net_device *ndev;
ndev = dev_get_by_index(net, addr->can_ifindex); @@ -461,15 +463,8 @@ static int j1939_sk_bind(struct socket *sock, struct sockaddr *uaddr, int len) goto out_release_sock; }
- if (ndev->type != ARPHRD_CAN) { - dev_put(ndev); - ret = -ENODEV; - goto out_release_sock; - } - - if (!ndev->ml_priv) { - netdev_warn_once(ndev, - "No CAN mid layer private allocated, please fix your driver and use alloc_candev()!\n"); + can_ml = can_get_ml_priv(ndev); + if (!can_ml) { dev_put(ndev); ret = -ENODEV; goto out_release_sock; diff --git a/net/can/proc.c b/net/can/proc.c index 5ea8695f507e..b15760b5c1cc 100644 --- a/net/can/proc.c +++ b/net/can/proc.c @@ -322,8 +322,11 @@ static int can_rcvlist_proc_show(struct seq_file *m, void *v)
/* receive list for registered CAN devices */ for_each_netdev_rcu(net, dev) { - if (dev->type == ARPHRD_CAN && dev->ml_priv) - can_rcvlist_proc_show_one(m, idx, dev, dev->ml_priv); + struct can_ml_priv *can_ml = can_get_ml_priv(dev); + + if (can_ml) + can_rcvlist_proc_show_one(m, idx, dev, + &can_ml->dev_rcv_lists); }
rcu_read_unlock(); @@ -375,8 +378,10 @@ static int can_rcvlist_sff_proc_show(struct seq_file *m, void *v)
/* sff receive list for registered CAN devices */ for_each_netdev_rcu(net, dev) { - if (dev->type == ARPHRD_CAN && dev->ml_priv) { - dev_rcv_lists = dev->ml_priv; + struct can_ml_priv *can_ml = can_get_ml_priv(dev); + + if (can_ml) { + dev_rcv_lists = &can_ml->dev_rcv_lists; can_rcvlist_proc_show_array(m, dev, dev_rcv_lists->rx_sff, ARRAY_SIZE(dev_rcv_lists->rx_sff)); } @@ -406,8 +411,10 @@ static int can_rcvlist_eff_proc_show(struct seq_file *m, void *v)
/* eff receive list for registered CAN devices */ for_each_netdev_rcu(net, dev) { - if (dev->type == ARPHRD_CAN && dev->ml_priv) { - dev_rcv_lists = dev->ml_priv; + struct can_ml_priv *can_ml = can_get_ml_priv(dev); + + if (can_ml) { + dev_rcv_lists = &can_ml->dev_rcv_lists; can_rcvlist_proc_show_array(m, dev, dev_rcv_lists->rx_eff, ARRAY_SIZE(dev_rcv_lists->rx_eff)); }
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit c2e6048fa1cf2228063aec299f93ac6eb256b457 ]
If we receive a MPTCP_PUSH_PENDING even from a subflow when mptcp_release_cb() is serving the previous one, the latter will be delayed up to the next release_sock(msk).
Address the issue implementing a test/serve loop for such event.
Additionally rename the push helper to __mptcp_push_pending() to be more consistent with the existing code.
Fixes: 6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks") Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Mat Martineau mathew.j.martineau@linux.intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/mptcp/protocol.c | 33 +++++++++++++++++++++------------ 1 file changed, 21 insertions(+), 12 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index c3299a4568a0..7cbb544c6d02 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -1442,7 +1442,7 @@ static void mptcp_push_release(struct sock *sk, struct sock *ssk, release_sock(ssk); }
-static void mptcp_push_pending(struct sock *sk, unsigned int flags) +static void __mptcp_push_pending(struct sock *sk, unsigned int flags) { struct sock *prev_ssk = NULL, *ssk = NULL; struct mptcp_sock *msk = mptcp_sk(sk); @@ -1681,14 +1681,14 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
wait_for_memory: set_bit(MPTCP_NOSPACE, &msk->flags); - mptcp_push_pending(sk, msg->msg_flags); + __mptcp_push_pending(sk, msg->msg_flags); ret = sk_stream_wait_memory(sk, &timeo); if (ret) goto out; }
if (copied) - mptcp_push_pending(sk, msg->msg_flags); + __mptcp_push_pending(sk, msg->msg_flags);
out: release_sock(sk); @@ -2944,13 +2944,14 @@ static void mptcp_release_cb(struct sock *sk) { unsigned long flags, nflags;
- /* push_pending may touch wmem_reserved, do it before the later - * cleanup - */ - if (test_and_clear_bit(MPTCP_CLEAN_UNA, &mptcp_sk(sk)->flags)) - __mptcp_clean_una(sk); - if (test_and_clear_bit(MPTCP_PUSH_PENDING, &mptcp_sk(sk)->flags)) { - /* mptcp_push_pending() acquires the subflow socket lock + for (;;) { + flags = 0; + if (test_and_clear_bit(MPTCP_PUSH_PENDING, &mptcp_sk(sk)->flags)) + flags |= MPTCP_PUSH_PENDING; + if (!flags) + break; + + /* the following actions acquire the subflow socket lock * * 1) can't be invoked in atomic scope * 2) must avoid ABBA deadlock with msk socket spinlock: the RX @@ -2959,13 +2960,21 @@ static void mptcp_release_cb(struct sock *sk) */
spin_unlock_bh(&sk->sk_lock.slock); - mptcp_push_pending(sk, 0); + if (flags & MPTCP_PUSH_PENDING) + __mptcp_push_pending(sk, 0); + + cond_resched(); spin_lock_bh(&sk->sk_lock.slock); } + + if (test_and_clear_bit(MPTCP_CLEAN_UNA, &mptcp_sk(sk)->flags)) + __mptcp_clean_una(sk); if (test_and_clear_bit(MPTCP_ERROR_REPORT, &mptcp_sk(sk)->flags)) __mptcp_error_report(sk);
- /* clear any wmem reservation and errors */ + /* push_pending may touch wmem_reserved, ensure we do the cleanup + * later + */ __mptcp_update_wmem(sk); __mptcp_update_rmem(sk);
From: Jia-Ju Bai baijiaju1990@gmail.com
[ Upstream commit 2055a99da8a253a357bdfd359b3338ef3375a26c ]
When slave is NULL or slave_ops->ndo_neigh_setup is NULL, no error return code of bond_neigh_init() is assigned. To fix this bug, ret is assigned with -EINVAL in these cases.
Fixes: 9e99bfefdbce ("bonding: fix bond_neigh_init()") Reported-by: TOTE Robot oslab@tsinghua.edu.cn Signed-off-by: Jia-Ju Bai baijiaju1990@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/bonding/bond_main.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 5fe5232cc3f3..fba6b6d1b430 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -3917,11 +3917,15 @@ static int bond_neigh_init(struct neighbour *n)
rcu_read_lock(); slave = bond_first_slave_rcu(bond); - if (!slave) + if (!slave) { + ret = -EINVAL; goto out; + } slave_ops = slave->dev->netdev_ops; - if (!slave_ops->ndo_neigh_setup) + if (!slave_ops->ndo_neigh_setup) { + ret = -EINVAL; goto out; + }
/* TODO: find another way [1] to implement this. * Passing a zeroed structure is fragile,
From: Dan Carpenter dan.carpenter@oracle.com
[ Upstream commit 2e5de7e0c8d2caa860e133ef71fc94671cb8e0bf ]
The MPTCP_PUSH_PENDING define is 6 and these tests should be testing if BIT(6) is set.
Fixes: c2e6048fa1cf ("mptcp: fix race in release_cb") Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Reviewed-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/mptcp/protocol.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 7cbb544c6d02..5932b0ebecc3 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -2947,7 +2947,7 @@ static void mptcp_release_cb(struct sock *sk) for (;;) { flags = 0; if (test_and_clear_bit(MPTCP_PUSH_PENDING, &mptcp_sk(sk)->flags)) - flags |= MPTCP_PUSH_PENDING; + flags |= BIT(MPTCP_PUSH_PENDING); if (!flags) break;
@@ -2960,7 +2960,7 @@ static void mptcp_release_cb(struct sock *sk) */
spin_unlock_bh(&sk->sk_lock.slock); - if (flags & MPTCP_PUSH_PENDING) + if (flags & BIT(MPTCP_PUSH_PENDING)) __mptcp_push_pending(sk, 0);
cond_resched();
[ Upstream commit 6e1caaf8ed22eb700cc47ec353816eee33186c1c ]
This patch fixes the max register value for the regmap.
Reviewed-by: Dan Murphy dmurphy@ti.com Tested-by: Sean Nyekjaer sean@geanix.com Link: https://lore.kernel.org/r/20201215231746.1132907-12-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/can/m_can/tcan4x5x.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/can/m_can/tcan4x5x.c b/drivers/net/can/m_can/tcan4x5x.c index 4920de09ffb7..aeac3ce7bfc8 100644 --- a/drivers/net/can/m_can/tcan4x5x.c +++ b/drivers/net/can/m_can/tcan4x5x.c @@ -88,7 +88,7 @@
#define TCAN4X5X_MRAM_START 0x8000 #define TCAN4X5X_MCAN_OFFSET 0x1000 -#define TCAN4X5X_MAX_REGISTER 0x8fff +#define TCAN4X5X_MAX_REGISTER 0x8ffc
#define TCAN4X5X_CLEAR_ALL_INT 0xffffffff #define TCAN4X5X_SET_ALL_INT 0xffffffff
From: Luca Pesce luca.pesce@vimar.com
[ Upstream commit e862a3e4088070de352fdafe9bd9e3ae0a95a33c ]
This ensure that previous association attempts do not leave stale statuses on subsequent attempts.
This fixes the WARN_ON(!cr->bss)) from __cfg80211_connect_result() when connecting to an AP after a previous connection failure (e.g. where EAP fails due to incorrect psk but association succeeded). In some scenarios, indeed, brcmf_is_linkup() was reporting a link up event too early due to stale BRCMF_VIF_STATUS_ASSOC_SUCCESS bit, thus reporting to cfg80211 a connection result with a zeroed bssid (vif->profile.bssid is still empty), causing the WARN_ON due to the call to cfg80211_get_bss() with the empty bssid.
Signed-off-by: Luca Pesce luca.pesce@vimar.com Signed-off-by: Kalle Valo kvalo@codeaurora.org Link: https://lore.kernel.org/r/1608807119-21785-1-git-send-email-luca.pesce@vimar... Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c index 0ee421f30aa2..23e6422c2251 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c @@ -5611,7 +5611,8 @@ static bool brcmf_is_linkup(struct brcmf_cfg80211_vif *vif, return false; }
-static bool brcmf_is_linkdown(const struct brcmf_event_msg *e) +static bool brcmf_is_linkdown(struct brcmf_cfg80211_vif *vif, + const struct brcmf_event_msg *e) { u32 event = e->event_code; u16 flags = e->flags; @@ -5620,6 +5621,8 @@ static bool brcmf_is_linkdown(const struct brcmf_event_msg *e) (event == BRCMF_E_DISASSOC_IND) || ((event == BRCMF_E_LINK) && (!(flags & BRCMF_EVENT_MSG_LINK)))) { brcmf_dbg(CONN, "Processing link down\n"); + clear_bit(BRCMF_VIF_STATUS_EAP_SUCCESS, &vif->sme_state); + clear_bit(BRCMF_VIF_STATUS_ASSOC_SUCCESS, &vif->sme_state); return true; } return false; @@ -6067,7 +6070,7 @@ brcmf_notify_connect_status(struct brcmf_if *ifp, } else brcmf_bss_connect_done(cfg, ndev, e, true); brcmf_net_setcarrier(ifp, true); - } else if (brcmf_is_linkdown(e)) { + } else if (brcmf_is_linkdown(ifp->vif, e)) { brcmf_dbg(CONN, "Linkdown\n"); if (!brcmf_is_ibssmode(ifp->vif) && test_bit(BRCMF_VIF_STATUS_CONNECTED,
From: Wen Gong wgong@codeaurora.org
[ Upstream commit 0d96968315d7ffbd70d608b29e9bea084210b96d ]
When function return fail to __ath11k_mac_register after success called ieee80211_register_hw, then it set wiphy->dev.parent to NULL by SET_IEEE80211_DEV(ar->hw, NULL) in end of __ath11k_mac_register, then cfg80211_get_drvinfo will be called by below call stack, but the wiphy->dev.parent is NULL, so kernel crash.
Call stack to cfg80211_get_drvinfo: NetworkManager 826 [001] 6696.731371: probe:cfg80211_get_drvinfo: (ffffffffc107d8f0) ffffffffc107d8f1 cfg80211_get_drvinfo+0x1 (/lib/modules/5.10.0-rc1-wt-ath+/kernel/net/wireless-back/cfg80211.ko) ffffffff9d8fc529 ethtool_get_drvinfo+0x99 (vmlinux) ffffffff9d90080e dev_ethtool+0x1dbe (vmlinux) ffffffff9d8b88f7 dev_ioctl+0xb7 (vmlinux) ffffffff9d8668de sock_do_ioctl+0xae (vmlinux) ffffffff9d866d60 sock_ioctl+0x350 (vmlinux) ffffffff9d2ca30e __x64_sys_ioctl+0x8e (vmlinux) ffffffff9da0dda3 do_syscall_64+0x33 (vmlinux) ffffffff9dc0008c entry_SYSCALL_64_after_hwframe+0x44 (vmlinux) 7feb5f673007 __GI___ioctl+0x7 (/lib/x86_64-linux-gnu/libc-2.23.so) 0 [unknown] ([unknown])
Code of cfg80211_get_drvinfo, the pdev which is wiphy->dev.parent is NULL when kernel crash: void cfg80211_get_drvinfo(struct net_device *dev, struct ethtool_drvinfo *info) { struct wireless_dev *wdev = dev->ieee80211_ptr; struct device *pdev = wiphy_dev(wdev->wiphy);
if (pdev->driver) ....
kernel crash log: [ 973.619550] ath11k_pci 0000:05:00.0: failed to perform regd update : -16 [ 973.619555] ath11k_pci 0000:05:00.0: ath11k regd update failed: -16 [ 973.619566] ath11k_pci 0000:05:00.0: failed register the radio with mac80211: -16 [ 973.619618] ath11k_pci 0000:05:00.0: failed to create pdev core: -16 [ 973.636035] BUG: kernel NULL pointer dereference, address: 0000000000000068 [ 973.636046] #PF: supervisor read access in kernel mode [ 973.636050] #PF: error_code(0x0000) - not-present page [ 973.636054] PGD 800000012452e067 P4D 800000012452e067 PUD 12452d067 PMD 0 [ 973.636064] Oops: 0000 [#1] SMP PTI [ 973.636072] CPU: 3 PID: 848 Comm: NetworkManager Kdump: loaded Tainted: G W OE 5.10.0-rc1-wt-ath+ #24 [ 973.636076] Hardware name: LENOVO 418065C/418065C, BIOS 83ET63WW (1.33 ) 07/29/2011 [ 973.636161] RIP: 0010:cfg80211_get_drvinfo+0x25/0xd0 [cfg80211] [ 973.636169] Code: e9 c9 fe ff ff 66 66 66 66 90 55 53 ba 20 00 00 00 48 8b af 08 03 00 00 48 89 f3 48 8d 7e 04 48 8b 45 00 48 8b 80 90 01 00 00 <48> 8b 40 68 48 85 c0 0f 84 8d 00 00 00 48 8b 30 e8 a6 cc 72 c7 48 [ 973.636174] RSP: 0018:ffffaafb4040bbe0 EFLAGS: 00010286 [ 973.636180] RAX: 0000000000000000 RBX: ffffaafb4040bbfc RCX: 0000000000000000 [ 973.636184] RDX: 0000000000000020 RSI: ffffaafb4040bbfc RDI: ffffaafb4040bc00 [ 973.636188] RBP: ffff8a84c9568950 R08: 722d302e30312e35 R09: 74612d74772d3163 [ 973.636192] R10: 3163722d302e3031 R11: 2b6874612d74772d R12: ffffaafb4040bbfc [ 973.636196] R13: 00007ffe453707c0 R14: ffff8a84c9568000 R15: 0000000000000000 [ 973.636202] FS: 00007fd3d179b940(0000) GS:ffff8a84fa2c0000(0000) knlGS:0000000000000000 [ 973.636206] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 973.636211] CR2: 0000000000000068 CR3: 00000001153b6002 CR4: 00000000000606e0 [ 973.636215] Call Trace: [ 973.636234] ethtool_get_drvinfo+0x99/0x1f0 [ 973.636246] dev_ethtool+0x1dbe/0x2be0 [ 973.636256] ? mntput_no_expire+0x35/0x220 [ 973.636264] ? inet_ioctl+0x1ce/0x200 [ 973.636274] ? tomoyo_path_number_perm+0x68/0x1d0 [ 973.636282] ? kmem_cache_alloc+0x3cb/0x430 [ 973.636290] ? dev_ioctl+0xb7/0x570 [ 973.636295] dev_ioctl+0xb7/0x570 [ 973.636307] sock_do_ioctl+0xae/0x150 [ 973.636315] ? sock_ioctl+0x350/0x3c0 [ 973.636319] sock_ioctl+0x350/0x3c0 [ 973.636332] ? __x64_sys_ioctl+0x8e/0xd0 [ 973.636339] ? dlci_ioctl_set+0x30/0x30 [ 973.636346] __x64_sys_ioctl+0x8e/0xd0 [ 973.636359] do_syscall_64+0x33/0x80 [ 973.636368] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Sequence of function call when wlan load for success case when function __ath11k_mac_register return 0:
kworker/u16:3-e 2922 [001] 6696.729734: probe:ieee80211_register_hw: (ffffffffc116ae60) kworker/u16:3-e 2922 [001] 6696.730210: probe:ieee80211_if_add: (ffffffffc1185cc0) NetworkManager 826 [001] 6696.731345: probe:ethtool_get_drvinfo: (ffffffff9d8fc490) NetworkManager 826 [001] 6696.731371: probe:cfg80211_get_drvinfo: (ffffffffc107d8f0) NetworkManager 826 [001] 6696.731639: probe:ethtool_get_drvinfo: (ffffffff9d8fc490) NetworkManager 826 [001] 6696.731653: probe:cfg80211_get_drvinfo: (ffffffffc107d8f0) NetworkManager 826 [001] 6696.732866: probe:ethtool_get_drvinfo: (ffffffff9d8fc490) NetworkManager 826 [001] 6696.732893: probe:cfg80211_get_drvinfo: (ffffffffc107d8f0) systemd-udevd 3850 [003] 6696.737199: probe:ethtool_get_drvinfo: (ffffffff9d8fc490) systemd-udevd 3850 [003] 6696.737226: probe:cfg80211_get_drvinfo: (ffffffffc107d8f0) NetworkManager 826 [000] 6696.759950: probe:ethtool_get_drvinfo: (ffffffff9d8fc490) NetworkManager 826 [000] 6696.759967: probe:cfg80211_get_drvinfo: (ffffffffc107d8f0) NetworkManager 826 [000] 6696.760057: probe:ethtool_get_drvinfo: (ffffffff9d8fc490) NetworkManager 826 [000] 6696.760062: probe:cfg80211_get_drvinfo: (ffffffffc107d8f0)
After apply this patch, kernel crash gone, and below is the test case's sequence of function call and log when wlan load with fail by function ath11k_regd_update, and __ath11k_mac_register return fail:
kworker/u16:5-e 192 [001] 215.174388: probe:ieee80211_register_hw: (ffffffffc1131e60) kworker/u16:5-e 192 [000] 215.174973: probe:ieee80211_if_add: (ffffffffc114ccc0) NetworkManager 846 [001] 215.175857: probe:ethtool_get_drvinfo: (ffffffff928fc490) kworker/u16:5-e 192 [000] 215.175867: probe:ieee80211_unregister_hw: (ffffffffc1131970) NetworkManager 846 [001] 215.175880: probe:cfg80211_get_drvinfo: (ffffffffc107f8f0) NetworkManager 846 [001] 215.176105: probe:ethtool_get_drvinfo: (ffffffff928fc490) NetworkManager 846 [001] 215.176118: probe:cfg80211_get_drvinfo: (ffffffffc107f8f0) [ 215.175859] ath11k_pci 0000:05:00.0: ath11k regd update failed: -16 NetworkManager 846 [001] 215.196420: probe:ethtool_get_drvinfo: (ffffffff928fc490) NetworkManager 846 [001] 215.196430: probe:cfg80211_get_drvinfo: (ffffffffc107f8f0) [ 215.258598] ath11k_pci 0000:05:00.0: failed register the radio with mac80211: -16 [ 215.258613] ath11k_pci 0000:05:00.0: failed to create pdev core: -16
When ath11k_regd_update or ath11k_debugfs_register return fail, function ieee80211_unregister_hw of mac80211 will be called, then it will wait untill cfg80211_get_drvinfo finished, the wiphy->dev.parent is not NULL at this moment, after that, it set wiphy->dev.parent to NULL by SET_IEEE80211_DEV(ar->hw, NULL) in end of __ath11k_mac_register, so not happen kernel crash.
Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1 Signed-off-by: Wen Gong wgong@codeaurora.org Signed-off-by: Kalle Valo kvalo@codeaurora.org Link: https://lore.kernel.org/r/1608607824-16067-1-git-send-email-wgong@codeaurora... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/ath/ath11k/mac.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/ath/ath11k/mac.c b/drivers/net/wireless/ath/ath11k/mac.c index 54bdef33f3f8..55ecf7f43735 100644 --- a/drivers/net/wireless/ath/ath11k/mac.c +++ b/drivers/net/wireless/ath/ath11k/mac.c @@ -6361,17 +6361,20 @@ static int __ath11k_mac_register(struct ath11k *ar) ret = ath11k_regd_update(ar, true); if (ret) { ath11k_err(ar->ab, "ath11k regd update failed: %d\n", ret); - goto err_free_if_combs; + goto err_unregister_hw; }
ret = ath11k_debugfs_register(ar); if (ret) { ath11k_err(ar->ab, "debugfs registration failed: %d\n", ret); - goto err_free_if_combs; + goto err_unregister_hw; }
return 0;
+err_unregister_hw: + ieee80211_unregister_hw(ar->hw); + err_free_if_combs: kfree(ar->hw->wiphy->iface_combinations[0].limits); kfree(ar->hw->wiphy->iface_combinations);
From: Guo-Feng Fan vincent_fann@realtek.com
[ Upstream commit adba838af159914eb98fcd55bfd3a89c9a7d41a8 ]
This patch fixes a defect that uses incorrect function to access registers. Use 8 and 32 bit access function to access 8 and 32 bit long data respectively.
Signed-off-by: Guo-Feng Fan vincent_fann@realtek.com Signed-off-by: Ping-Ke Shih pkshih@realtek.com Signed-off-by: Kalle Valo kvalo@codeaurora.org Link: https://lore.kernel.org/r/20210202055012.8296-2-pkshih@realtek.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/realtek/rtw88/rtw8821c.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/net/wireless/realtek/rtw88/rtw8821c.c b/drivers/net/wireless/realtek/rtw88/rtw8821c.c index fbfd85439d1f..88fb49486ee0 100644 --- a/drivers/net/wireless/realtek/rtw88/rtw8821c.c +++ b/drivers/net/wireless/realtek/rtw88/rtw8821c.c @@ -719,8 +719,8 @@ static void rtw8821c_coex_cfg_ant_switch(struct rtw_dev *rtwdev, u8 ctrl_type, regval = (!polarity_inverse ? 0x1 : 0x2); }
- rtw_write8_mask(rtwdev, REG_RFE_CTRL8, BIT_MASK_R_RFE_SEL_15, - regval); + rtw_write32_mask(rtwdev, REG_RFE_CTRL8, BIT_MASK_R_RFE_SEL_15, + regval); break; case COEX_SWITCH_CTRL_BY_PTA: rtw_write32_clr(rtwdev, REG_LED_CFG, BIT_DPDT_SEL_EN); @@ -730,8 +730,8 @@ static void rtw8821c_coex_cfg_ant_switch(struct rtw_dev *rtwdev, u8 ctrl_type, PTA_CTRL_PIN);
regval = (!polarity_inverse ? 0x2 : 0x1); - rtw_write8_mask(rtwdev, REG_RFE_CTRL8, BIT_MASK_R_RFE_SEL_15, - regval); + rtw_write32_mask(rtwdev, REG_RFE_CTRL8, BIT_MASK_R_RFE_SEL_15, + regval); break; case COEX_SWITCH_CTRL_BY_ANTDIV: rtw_write32_clr(rtwdev, REG_LED_CFG, BIT_DPDT_SEL_EN); @@ -757,11 +757,11 @@ static void rtw8821c_coex_cfg_ant_switch(struct rtw_dev *rtwdev, u8 ctrl_type, }
if (ctrl_type == COEX_SWITCH_CTRL_BY_BT) { - rtw_write32_clr(rtwdev, REG_CTRL_TYPE, BIT_CTRL_TYPE1); - rtw_write32_clr(rtwdev, REG_CTRL_TYPE, BIT_CTRL_TYPE2); + rtw_write8_clr(rtwdev, REG_CTRL_TYPE, BIT_CTRL_TYPE1); + rtw_write8_clr(rtwdev, REG_CTRL_TYPE, BIT_CTRL_TYPE2); } else { - rtw_write32_set(rtwdev, REG_CTRL_TYPE, BIT_CTRL_TYPE1); - rtw_write32_set(rtwdev, REG_CTRL_TYPE, BIT_CTRL_TYPE2); + rtw_write8_set(rtwdev, REG_CTRL_TYPE, BIT_CTRL_TYPE1); + rtw_write8_set(rtwdev, REG_CTRL_TYPE, BIT_CTRL_TYPE2); } }
From: Ido Schimmel idosch@nvidia.com
[ Upstream commit f57ab5b75f7193e194c83616cd104f41c8350f68 ]
Initialize the dummy FIB offload module after debugfs, so that the FIB module could create its own directory there.
Signed-off-by: Amit Cohen amcohen@nvidia.com Signed-off-by: Ido Schimmel idosch@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/netdevsim/dev.c | 40 +++++++++++++++++++------------------ 1 file changed, 21 insertions(+), 19 deletions(-)
diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c index 816af1f55e2c..dbeb29fa16e8 100644 --- a/drivers/net/netdevsim/dev.c +++ b/drivers/net/netdevsim/dev.c @@ -1012,23 +1012,25 @@ static int nsim_dev_reload_create(struct nsim_dev *nsim_dev, nsim_dev->fw_update_status = true; nsim_dev->fw_update_overwrite_mask = 0;
- nsim_dev->fib_data = nsim_fib_create(devlink, extack); - if (IS_ERR(nsim_dev->fib_data)) - return PTR_ERR(nsim_dev->fib_data); - nsim_devlink_param_load_driverinit_values(devlink);
err = nsim_dev_dummy_region_init(nsim_dev, devlink); if (err) - goto err_fib_destroy; + return err;
err = nsim_dev_traps_init(devlink); if (err) goto err_dummy_region_exit;
+ nsim_dev->fib_data = nsim_fib_create(devlink, extack); + if (IS_ERR(nsim_dev->fib_data)) { + err = PTR_ERR(nsim_dev->fib_data); + goto err_traps_exit; + } + err = nsim_dev_health_init(nsim_dev, devlink); if (err) - goto err_traps_exit; + goto err_fib_destroy;
err = nsim_dev_port_add_all(nsim_dev, nsim_bus_dev->port_count); if (err) @@ -1043,12 +1045,12 @@ static int nsim_dev_reload_create(struct nsim_dev *nsim_dev,
err_health_exit: nsim_dev_health_exit(nsim_dev); +err_fib_destroy: + nsim_fib_destroy(devlink, nsim_dev->fib_data); err_traps_exit: nsim_dev_traps_exit(devlink); err_dummy_region_exit: nsim_dev_dummy_region_exit(nsim_dev); -err_fib_destroy: - nsim_fib_destroy(devlink, nsim_dev->fib_data); return err; }
@@ -1080,15 +1082,9 @@ int nsim_dev_probe(struct nsim_bus_dev *nsim_bus_dev) if (err) goto err_devlink_free;
- nsim_dev->fib_data = nsim_fib_create(devlink, NULL); - if (IS_ERR(nsim_dev->fib_data)) { - err = PTR_ERR(nsim_dev->fib_data); - goto err_resources_unregister; - } - err = devlink_register(devlink, &nsim_bus_dev->dev); if (err) - goto err_fib_destroy; + goto err_resources_unregister;
err = devlink_params_register(devlink, nsim_devlink_params, ARRAY_SIZE(nsim_devlink_params)); @@ -1108,9 +1104,15 @@ int nsim_dev_probe(struct nsim_bus_dev *nsim_bus_dev) if (err) goto err_traps_exit;
+ nsim_dev->fib_data = nsim_fib_create(devlink, NULL); + if (IS_ERR(nsim_dev->fib_data)) { + err = PTR_ERR(nsim_dev->fib_data); + goto err_debugfs_exit; + } + err = nsim_dev_health_init(nsim_dev, devlink); if (err) - goto err_debugfs_exit; + goto err_fib_destroy;
err = nsim_bpf_dev_init(nsim_dev); if (err) @@ -1128,6 +1130,8 @@ err_bpf_dev_exit: nsim_bpf_dev_exit(nsim_dev); err_health_exit: nsim_dev_health_exit(nsim_dev); +err_fib_destroy: + nsim_fib_destroy(devlink, nsim_dev->fib_data); err_debugfs_exit: nsim_dev_debugfs_exit(nsim_dev); err_traps_exit: @@ -1139,8 +1143,6 @@ err_params_unregister: ARRAY_SIZE(nsim_devlink_params)); err_dl_unregister: devlink_unregister(devlink); -err_fib_destroy: - nsim_fib_destroy(devlink, nsim_dev->fib_data); err_resources_unregister: devlink_resources_unregister(devlink, NULL); err_devlink_free: @@ -1157,10 +1159,10 @@ static void nsim_dev_reload_destroy(struct nsim_dev *nsim_dev) debugfs_remove(nsim_dev->take_snapshot); nsim_dev_port_del_all(nsim_dev); nsim_dev_health_exit(nsim_dev); + nsim_fib_destroy(devlink, nsim_dev->fib_data); nsim_dev_traps_exit(devlink); nsim_dev_dummy_region_exit(nsim_dev); mutex_destroy(&nsim_dev->port_list_lock); - nsim_fib_destroy(devlink, nsim_dev->fib_data); }
void nsim_dev_remove(struct nsim_bus_dev *nsim_bus_dev)
From: Johannes Berg johannes.berg@intel.com
[ Upstream commit 874020f8adce535cd318af1768ffe744251b6593 ]
The only thing we do touching the device in hard interrupt context is, at most, writing an interrupt ACK register, which isn't racing in with anything protected by the reg_lock.
Thus, avoid disabling interrupts here for potentially long periods of time, particularly long periods have been observed with dumping of firmware memory (leading to lockup warnings on some devices.)
Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Luca Coelho luciano.coelho@intel.com Link: https://lore.kernel.org/r/iwlwifi.20210210135352.da916ab91298.I064c3e7823b61... Signed-off-by: Luca Coelho luciano.coelho@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/wireless/intel/iwlwifi/pcie/trans.c | 11 +++++----- .../net/wireless/intel/iwlwifi/pcie/tx-gen2.c | 5 ++--- drivers/net/wireless/intel/iwlwifi/pcie/tx.c | 22 ++++++++----------- 3 files changed, 16 insertions(+), 22 deletions(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c index ab93a848a466..e71bc97cb40e 100644 --- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c +++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c @@ -1972,7 +1972,7 @@ static bool iwl_trans_pcie_grab_nic_access(struct iwl_trans *trans, int ret; struct iwl_trans_pcie *trans_pcie = IWL_TRANS_GET_PCIE_TRANS(trans);
- spin_lock_irqsave(&trans_pcie->reg_lock, *flags); + spin_lock_bh(&trans_pcie->reg_lock);
if (trans_pcie->cmd_hold_nic_awake) goto out; @@ -2057,7 +2057,7 @@ static bool iwl_trans_pcie_grab_nic_access(struct iwl_trans *trans, }
err: - spin_unlock_irqrestore(&trans_pcie->reg_lock, *flags); + spin_unlock_bh(&trans_pcie->reg_lock); return false; }
@@ -2095,7 +2095,7 @@ static void iwl_trans_pcie_release_nic_access(struct iwl_trans *trans, * scheduled on different CPUs (after we drop reg_lock). */ out: - spin_unlock_irqrestore(&trans_pcie->reg_lock, *flags); + spin_unlock_bh(&trans_pcie->reg_lock); }
static int iwl_trans_pcie_read_mem(struct iwl_trans *trans, u32 addr, @@ -2296,11 +2296,10 @@ static void iwl_trans_pcie_set_bits_mask(struct iwl_trans *trans, u32 reg, u32 mask, u32 value) { struct iwl_trans_pcie *trans_pcie = IWL_TRANS_GET_PCIE_TRANS(trans); - unsigned long flags;
- spin_lock_irqsave(&trans_pcie->reg_lock, flags); + spin_lock_bh(&trans_pcie->reg_lock); __iwl_trans_pcie_set_bits_mask(trans, reg, mask, value); - spin_unlock_irqrestore(&trans_pcie->reg_lock, flags); + spin_unlock_bh(&trans_pcie->reg_lock); }
static const char *get_csr_string(int cmd) diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/tx-gen2.c b/drivers/net/wireless/intel/iwlwifi/pcie/tx-gen2.c index 8757246a90d5..b9afd9b04042 100644 --- a/drivers/net/wireless/intel/iwlwifi/pcie/tx-gen2.c +++ b/drivers/net/wireless/intel/iwlwifi/pcie/tx-gen2.c @@ -31,7 +31,6 @@ static int iwl_pcie_gen2_enqueue_hcmd(struct iwl_trans *trans, struct iwl_txq *txq = trans->txqs.txq[trans->txqs.cmd.q_id]; struct iwl_device_cmd *out_cmd; struct iwl_cmd_meta *out_meta; - unsigned long flags; void *dup_buf = NULL; dma_addr_t phys_addr; int i, cmd_pos, idx; @@ -244,11 +243,11 @@ static int iwl_pcie_gen2_enqueue_hcmd(struct iwl_trans *trans, if (txq->read_ptr == txq->write_ptr && txq->wd_timeout) mod_timer(&txq->stuck_timer, jiffies + txq->wd_timeout);
- spin_lock_irqsave(&trans_pcie->reg_lock, flags); + spin_lock(&trans_pcie->reg_lock); /* Increment and update queue's write index */ txq->write_ptr = iwl_txq_inc_wrap(trans, txq->write_ptr); iwl_txq_inc_wr_ptr(trans, txq); - spin_unlock_irqrestore(&trans_pcie->reg_lock, flags); + spin_unlock(&trans_pcie->reg_lock);
out: spin_unlock_bh(&txq->lock); diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/tx.c b/drivers/net/wireless/intel/iwlwifi/pcie/tx.c index 83f4964f3cb2..689f51968049 100644 --- a/drivers/net/wireless/intel/iwlwifi/pcie/tx.c +++ b/drivers/net/wireless/intel/iwlwifi/pcie/tx.c @@ -223,12 +223,10 @@ static void iwl_pcie_txq_unmap(struct iwl_trans *trans, int txq_id) txq->read_ptr = iwl_txq_inc_wrap(trans, txq->read_ptr);
if (txq->read_ptr == txq->write_ptr) { - unsigned long flags; - - spin_lock_irqsave(&trans_pcie->reg_lock, flags); + spin_lock(&trans_pcie->reg_lock); if (txq_id == trans->txqs.cmd.q_id) iwl_pcie_clear_cmd_in_flight(trans); - spin_unlock_irqrestore(&trans_pcie->reg_lock, flags); + spin_unlock(&trans_pcie->reg_lock); } }
@@ -679,7 +677,6 @@ static void iwl_pcie_cmdq_reclaim(struct iwl_trans *trans, int txq_id, int idx) { struct iwl_trans_pcie *trans_pcie = IWL_TRANS_GET_PCIE_TRANS(trans); struct iwl_txq *txq = trans->txqs.txq[txq_id]; - unsigned long flags; int nfreed = 0; u16 r;
@@ -710,9 +707,10 @@ static void iwl_pcie_cmdq_reclaim(struct iwl_trans *trans, int txq_id, int idx) }
if (txq->read_ptr == txq->write_ptr) { - spin_lock_irqsave(&trans_pcie->reg_lock, flags); + /* BHs are also disabled due to txq->lock */ + spin_lock(&trans_pcie->reg_lock); iwl_pcie_clear_cmd_in_flight(trans); - spin_unlock_irqrestore(&trans_pcie->reg_lock, flags); + spin_unlock(&trans_pcie->reg_lock); }
iwl_txq_progress(txq); @@ -921,7 +919,6 @@ static int iwl_pcie_enqueue_hcmd(struct iwl_trans *trans, struct iwl_txq *txq = trans->txqs.txq[trans->txqs.cmd.q_id]; struct iwl_device_cmd *out_cmd; struct iwl_cmd_meta *out_meta; - unsigned long flags; void *dup_buf = NULL; dma_addr_t phys_addr; int idx; @@ -1164,20 +1161,19 @@ static int iwl_pcie_enqueue_hcmd(struct iwl_trans *trans, if (txq->read_ptr == txq->write_ptr && txq->wd_timeout) mod_timer(&txq->stuck_timer, jiffies + txq->wd_timeout);
- spin_lock_irqsave(&trans_pcie->reg_lock, flags); + spin_lock(&trans_pcie->reg_lock); ret = iwl_pcie_set_cmd_in_flight(trans, cmd); if (ret < 0) { idx = ret; - spin_unlock_irqrestore(&trans_pcie->reg_lock, flags); - goto out; + goto unlock_reg; }
/* Increment and update queue's write index */ txq->write_ptr = iwl_txq_inc_wrap(trans, txq->write_ptr); iwl_pcie_txq_inc_wr_ptr(trans, txq);
- spin_unlock_irqrestore(&trans_pcie->reg_lock, flags); - + unlock_reg: + spin_unlock(&trans_pcie->reg_lock); out: spin_unlock_bh(&txq->lock); free_dup_buf:
From: Shuah Khan skhan@linuxfoundation.org
[ Upstream commit 09078368d516918666a0122f2533dc73676d3d7e ]
ieee80211_find_sta_by_ifaddr() must be called under the RCU lock and the resulting pointer is only valid under RCU lock as well.
Fix ath10k_wmi_tlv_op_pull_peer_stats_info() to hold RCU lock before it calls ieee80211_find_sta_by_ifaddr() and release it when the resulting pointer is no longer needed.
This problem was found while reviewing code to debug RCU warn from ath10k_wmi_tlv_parse_peer_stats_info().
Link: https://lore.kernel.org/linux-wireless/7230c9e5-2632-b77e-c4f9-10eca557a5bb@... Signed-off-by: Shuah Khan skhan@linuxfoundation.org Signed-off-by: Kalle Valo kvalo@codeaurora.org Link: https://lore.kernel.org/r/20210210212107.40373-1-skhan@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/ath/ath10k/wmi-tlv.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/ath/ath10k/wmi-tlv.c b/drivers/net/wireless/ath/ath10k/wmi-tlv.c index e6135795719a..e7072fc4f487 100644 --- a/drivers/net/wireless/ath/ath10k/wmi-tlv.c +++ b/drivers/net/wireless/ath/ath10k/wmi-tlv.c @@ -576,13 +576,13 @@ static void ath10k_wmi_event_tdls_peer(struct ath10k *ar, struct sk_buff *skb) case WMI_TDLS_TEARDOWN_REASON_TX: case WMI_TDLS_TEARDOWN_REASON_RSSI: case WMI_TDLS_TEARDOWN_REASON_PTR_TIMEOUT: + rcu_read_lock(); station = ieee80211_find_sta_by_ifaddr(ar->hw, ev->peer_macaddr.addr, NULL); if (!station) { ath10k_warn(ar, "did not find station from tdls peer event"); - kfree(tb); - return; + goto exit; } arvif = ath10k_get_arvif(ar, __le32_to_cpu(ev->vdev_id)); ieee80211_tdls_oper_request( @@ -593,6 +593,9 @@ static void ath10k_wmi_event_tdls_peer(struct ath10k *ar, struct sk_buff *skb) ); break; } + +exit: + rcu_read_unlock(); kfree(tb); }
From: Nathan Rossi nathan.rossi@digi.com
[ Upstream commit 8a28af7a3e85ddf358f8c41e401a33002f7a9587 ]
The aq_nic_start function can fail in a variety of cases which leaves the device in broken state.
An example case where the start function fails is the request_threaded_irq which can be interrupted, resulting in a EINTR result. This can be manually triggered by bringing the link up (e.g. ip link set up) and triggering a SIGINT on the initiating process (e.g. Ctrl+C). This would put the device into a half configured state. Subsequently bringing the link up again would cause the napi_enable to BUG.
In order to correctly clean up the failed attempt to start a device call aq_nic_stop.
Signed-off-by: Nathan Rossi nathan.rossi@digi.com Reviewed-by: Igor Russkikh irusskikh@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/aquantia/atlantic/aq_main.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_main.c b/drivers/net/ethernet/aquantia/atlantic/aq_main.c index 8f70a3909929..4af0cd9530de 100644 --- a/drivers/net/ethernet/aquantia/atlantic/aq_main.c +++ b/drivers/net/ethernet/aquantia/atlantic/aq_main.c @@ -71,8 +71,10 @@ static int aq_ndev_open(struct net_device *ndev) goto err_exit;
err = aq_nic_start(aq_nic); - if (err < 0) + if (err < 0) { + aq_nic_stop(aq_nic); goto err_exit; + }
err_exit: if (err < 0)
From: Doug Brown doug@schmorgal.com
[ Upstream commit 39935dccb21c60f9bbf1bb72d22ab6fd14ae7705 ]
If a DDP broadcast packet is sent out to a non-gateway target, it is also looped back. There is a potential for the loopback device to have a longer hardware header length than the original target route's device, which can result in the skb not being created with enough room for the loopback device's hardware header. This patch fixes the issue by determining that a loopback will be necessary prior to allocating the skb, and if so, ensuring the skb has enough room.
This was discovered while testing a new driver that creates a LocalTalk network interface (LTALK_HLEN = 1). It caused an skb_under_panic.
Signed-off-by: Doug Brown doug@schmorgal.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/appletalk/ddp.c | 33 +++++++++++++++++++++------------ 1 file changed, 21 insertions(+), 12 deletions(-)
diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c index ca1a0d07a087..ebda397fa95a 100644 --- a/net/appletalk/ddp.c +++ b/net/appletalk/ddp.c @@ -1577,8 +1577,8 @@ static int atalk_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) struct sk_buff *skb; struct net_device *dev; struct ddpehdr *ddp; - int size; - struct atalk_route *rt; + int size, hard_header_len; + struct atalk_route *rt, *rt_lo = NULL; int err;
if (flags & ~(MSG_DONTWAIT|MSG_CMSG_COMPAT)) @@ -1641,7 +1641,22 @@ static int atalk_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) SOCK_DEBUG(sk, "SK %p: Size needed %d, device %s\n", sk, size, dev->name);
- size += dev->hard_header_len; + hard_header_len = dev->hard_header_len; + /* Leave room for loopback hardware header if necessary */ + if (usat->sat_addr.s_node == ATADDR_BCAST && + (dev->flags & IFF_LOOPBACK || !(rt->flags & RTF_GATEWAY))) { + struct atalk_addr at_lo; + + at_lo.s_node = 0; + at_lo.s_net = 0; + + rt_lo = atrtr_find(&at_lo); + + if (rt_lo && rt_lo->dev->hard_header_len > hard_header_len) + hard_header_len = rt_lo->dev->hard_header_len; + } + + size += hard_header_len; release_sock(sk); skb = sock_alloc_send_skb(sk, size, (flags & MSG_DONTWAIT), &err); lock_sock(sk); @@ -1649,7 +1664,7 @@ static int atalk_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) goto out;
skb_reserve(skb, ddp_dl->header_length); - skb_reserve(skb, dev->hard_header_len); + skb_reserve(skb, hard_header_len); skb->dev = dev;
SOCK_DEBUG(sk, "SK %p: Begin build.\n", sk); @@ -1700,18 +1715,12 @@ static int atalk_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) /* loop back */ skb_orphan(skb); if (ddp->deh_dnode == ATADDR_BCAST) { - struct atalk_addr at_lo; - - at_lo.s_node = 0; - at_lo.s_net = 0; - - rt = atrtr_find(&at_lo); - if (!rt) { + if (!rt_lo) { kfree_skb(skb); err = -ENETUNREACH; goto out; } - dev = rt->dev; + dev = rt_lo->dev; skb->dev = dev; } ddp_dl->request(ddp_dl, skb, dev->dev_addr);
From: Alex Elder elder@linaro.org
[ Upstream commit d5bc5015eb9d64cbd14e467db1a56db1472d0d6c ]
We do not support inter-EE channel or event ring commands. Inter-EE interrupts are disabled (and never re-enabled) for all channels and event rings, so we have no need for the GSI registers that clear those interrupt conditions. So remove their definitions.
Signed-off-by: Alex Elder elder@linaro.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ipa/gsi_reg.h | 10 ---------- 1 file changed, 10 deletions(-)
diff --git a/drivers/net/ipa/gsi_reg.h b/drivers/net/ipa/gsi_reg.h index 0e138bbd8205..299456e70f28 100644 --- a/drivers/net/ipa/gsi_reg.h +++ b/drivers/net/ipa/gsi_reg.h @@ -59,16 +59,6 @@ #define GSI_INTER_EE_N_SRC_EV_CH_IRQ_OFFSET(ee) \ (0x0000c01c + 0x1000 * (ee))
-#define GSI_INTER_EE_SRC_CH_IRQ_CLR_OFFSET \ - GSI_INTER_EE_N_SRC_CH_IRQ_CLR_OFFSET(GSI_EE_AP) -#define GSI_INTER_EE_N_SRC_CH_IRQ_CLR_OFFSET(ee) \ - (0x0000c028 + 0x1000 * (ee)) - -#define GSI_INTER_EE_SRC_EV_CH_IRQ_CLR_OFFSET \ - GSI_INTER_EE_N_SRC_EV_CH_IRQ_CLR_OFFSET(GSI_EE_AP) -#define GSI_INTER_EE_N_SRC_EV_CH_IRQ_CLR_OFFSET(ee) \ - (0x0000c02c + 0x1000 * (ee)) - #define GSI_CH_C_CNTXT_0_OFFSET(ch) \ GSI_EE_N_CH_C_CNTXT_0_OFFSET((ch), GSI_EE_AP) #define GSI_EE_N_CH_C_CNTXT_0_OFFSET(ch, ee) \
From: Alex Elder elder@linaro.org
[ Upstream commit 571b1e7e58ad30b3a842254aea50d2e83b2396e1 ]
This patch actually fixes a bug, though it doesn't affect the two platforms supported currently. The fix implements GSI memory pointers a bit differently.
For IPA version 4.5 and above, the address space for almost all GSI registers is adjusted downward by a fixed amount. This is currently handled by adjusting the I/O virtual address pointer after it has been mapped. The bug is that the pointer is not "de-adjusted" as it should be when it's unmapped.
This patch fixes that error, but it does so by maintaining one "raw" pointer for the mapped memory range. This is assigned when the memory is mapped and used to unmap the memory. This pointer is also used to access the two registers that do *not* sit in the "adjusted" memory space.
Rather than adjusting *that* pointer, we maintain a separate pointer that's an adjusted copy of the "raw" pointer, and that is used for most GSI register accesses.
Signed-off-by: Alex Elder elder@linaro.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ipa/gsi.c | 28 ++++++++++++---------------- drivers/net/ipa/gsi.h | 5 +++-- drivers/net/ipa/gsi_reg.h | 21 +++++++++++++-------- 3 files changed, 28 insertions(+), 26 deletions(-)
diff --git a/drivers/net/ipa/gsi.c b/drivers/net/ipa/gsi.c index b77f5fef7aec..febfac75dd6a 100644 --- a/drivers/net/ipa/gsi.c +++ b/drivers/net/ipa/gsi.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2015-2018, The Linux Foundation. All rights reserved. - * Copyright (C) 2018-2020 Linaro Ltd. + * Copyright (C) 2018-2021 Linaro Ltd. */
#include <linux/types.h> @@ -195,8 +195,6 @@ static void gsi_irq_type_disable(struct gsi *gsi, enum gsi_irq_type_id type_id) /* Turn off all GSI interrupts initially */ static void gsi_irq_setup(struct gsi *gsi) { - u32 adjust; - /* Disable all interrupt types */ gsi_irq_type_update(gsi, 0);
@@ -206,10 +204,9 @@ static void gsi_irq_setup(struct gsi *gsi) iowrite32(0, gsi->virt + GSI_CNTXT_GLOB_IRQ_EN_OFFSET); iowrite32(0, gsi->virt + GSI_CNTXT_SRC_IEOB_IRQ_MSK_OFFSET);
- /* Reverse the offset adjustment for inter-EE register offsets */ - adjust = gsi->version < IPA_VERSION_4_5 ? 0 : GSI_EE_REG_ADJUST; - iowrite32(0, gsi->virt + adjust + GSI_INTER_EE_SRC_CH_IRQ_OFFSET); - iowrite32(0, gsi->virt + adjust + GSI_INTER_EE_SRC_EV_CH_IRQ_OFFSET); + /* The inter-EE registers are in the non-adjusted address range */ + iowrite32(0, gsi->virt_raw + GSI_INTER_EE_SRC_CH_IRQ_OFFSET); + iowrite32(0, gsi->virt_raw + GSI_INTER_EE_SRC_EV_CH_IRQ_OFFSET);
iowrite32(0, gsi->virt + GSI_CNTXT_GSI_IRQ_EN_OFFSET); } @@ -2115,9 +2112,8 @@ int gsi_init(struct gsi *gsi, struct platform_device *pdev, gsi->dev = dev; gsi->version = version;
- /* The GSI layer performs NAPI on all endpoints. NAPI requires a - * network device structure, but the GSI layer does not have one, - * so we must create a dummy network device for this purpose. + /* GSI uses NAPI on all channels. Create a dummy network device + * for the channel NAPI contexts to be associated with. */ init_dummy_netdev(&gsi->dummy_dev);
@@ -2142,13 +2138,13 @@ int gsi_init(struct gsi *gsi, struct platform_device *pdev, return -EINVAL; }
- gsi->virt = ioremap(res->start, size); - if (!gsi->virt) { + gsi->virt_raw = ioremap(res->start, size); + if (!gsi->virt_raw) { dev_err(dev, "unable to remap "gsi" memory\n"); return -ENOMEM; } - /* Adjust register range pointer downward for newer IPA versions */ - gsi->virt -= adjust; + /* Most registers are accessed using an adjusted register range */ + gsi->virt = gsi->virt_raw - adjust;
init_completion(&gsi->completion);
@@ -2167,7 +2163,7 @@ int gsi_init(struct gsi *gsi, struct platform_device *pdev, err_irq_exit: gsi_irq_exit(gsi); err_iounmap: - iounmap(gsi->virt); + iounmap(gsi->virt_raw);
return ret; } @@ -2178,7 +2174,7 @@ void gsi_exit(struct gsi *gsi) mutex_destroy(&gsi->mutex); gsi_channel_exit(gsi); gsi_irq_exit(gsi); - iounmap(gsi->virt); + iounmap(gsi->virt_raw); }
/* The maximum number of outstanding TREs on a channel. This limits diff --git a/drivers/net/ipa/gsi.h b/drivers/net/ipa/gsi.h index 96c9aed397aa..696c9825834a 100644 --- a/drivers/net/ipa/gsi.h +++ b/drivers/net/ipa/gsi.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 */
/* Copyright (c) 2015-2018, The Linux Foundation. All rights reserved. - * Copyright (C) 2018-2020 Linaro Ltd. + * Copyright (C) 2018-2021 Linaro Ltd. */ #ifndef _GSI_H_ #define _GSI_H_ @@ -150,7 +150,8 @@ struct gsi { struct device *dev; /* Same as IPA device */ enum ipa_version version; struct net_device dummy_dev; /* needed for NAPI */ - void __iomem *virt; + void __iomem *virt_raw; /* I/O mapped address range */ + void __iomem *virt; /* Adjusted for most registers */ u32 irq; u32 channel_count; u32 evt_ring_count; diff --git a/drivers/net/ipa/gsi_reg.h b/drivers/net/ipa/gsi_reg.h index 299456e70f28..1622d8cf8dea 100644 --- a/drivers/net/ipa/gsi_reg.h +++ b/drivers/net/ipa/gsi_reg.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 */
/* Copyright (c) 2015-2018, The Linux Foundation. All rights reserved. - * Copyright (C) 2018-2020 Linaro Ltd. + * Copyright (C) 2018-2021 Linaro Ltd. */ #ifndef _GSI_REG_H_ #define _GSI_REG_H_ @@ -38,17 +38,21 @@ * (though the actual limit is hardware-dependent). */
-/* GSI EE registers as a group are shifted downward by a fixed - * constant amount for IPA versions 4.5 and beyond. This applies - * to all GSI registers we use *except* the ones that disable - * inter-EE interrupts for channels and event channels. +/* GSI EE registers as a group are shifted downward by a fixed constant amount + * for IPA versions 4.5 and beyond. This applies to all GSI registers we use + * *except* the ones that disable inter-EE interrupts for channels and event + * channels. * - * We handle this by adjusting the pointer to the mapped GSI memory - * region downward. Then in the one place we use them (gsi_irq_setup()) - * we undo that adjustment for the inter-EE interrupt registers. + * The "raw" (not adjusted) GSI register range is mapped, and a pointer to + * the mapped range is held in gsi->virt_raw. The inter-EE interrupt + * registers are accessed using that pointer. + * + * Most registers are accessed using gsi->virt, which is a copy of the "raw" + * pointer, adjusted downward by the fixed amount. */ #define GSI_EE_REG_ADJUST 0x0000d000 /* IPA v4.5+ */
+/* The two inter-EE IRQ register offsets are relative to gsi->virt_raw */ #define GSI_INTER_EE_SRC_CH_IRQ_OFFSET \ GSI_INTER_EE_N_SRC_CH_IRQ_OFFSET(GSI_EE_AP) #define GSI_INTER_EE_N_SRC_CH_IRQ_OFFSET(ee) \ @@ -59,6 +63,7 @@ #define GSI_INTER_EE_N_SRC_EV_CH_IRQ_OFFSET(ee) \ (0x0000c01c + 0x1000 * (ee))
+/* All other register offsets are relative to gsi->virt */ #define GSI_CH_C_CNTXT_0_OFFSET(ch) \ GSI_EE_N_CH_C_CNTXT_0_OFFSET((ch), GSI_EE_AP) #define GSI_EE_N_CH_C_CNTXT_0_OFFSET(ch, ee) \
From: Alex Elder elder@linaro.org
[ Upstream commit 2d65ed76924bc772d3974b0894d870b1aa63b34a ]
In ipa_cmd_register_write_valid() we verify that values we will supply to a REGISTER_WRITE IPA immediate command will fit in the fields that need to hold them. This patch fixes some issues in that function and ipa_cmd_register_write_offset_valid().
The dev_err() call in ipa_cmd_register_write_offset_valid() has some printf format errors: - The name of the register (corresponding to the string format specifier) was not supplied. - The IPA base offset and offset need to be supplied separately to match the other format specifiers. Also make the ~0 constant used there to compute the maximum supported offset value explicitly unsigned.
There are two other issues in ipa_cmd_register_write_valid(): - There's no need to check the hash flush register for platforms (like IPA v4.2) that do not support hashed tables - The highest possible endpoint number, whose status register offset is computed, is COUNT - 1, not COUNT.
Fix these problems, and add some additional commentary.
Signed-off-by: Alex Elder elder@linaro.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ipa/ipa_cmd.c | 32 ++++++++++++++++++++++++-------- 1 file changed, 24 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ipa/ipa_cmd.c b/drivers/net/ipa/ipa_cmd.c index 002e51448510..eb65a11e33ea 100644 --- a/drivers/net/ipa/ipa_cmd.c +++ b/drivers/net/ipa/ipa_cmd.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2012-2018, The Linux Foundation. All rights reserved. - * Copyright (C) 2019-2020 Linaro Ltd. + * Copyright (C) 2019-2021 Linaro Ltd. */
#include <linux/types.h> @@ -244,11 +244,15 @@ static bool ipa_cmd_register_write_offset_valid(struct ipa *ipa, if (ipa->version != IPA_VERSION_3_5_1) bit_count += hweight32(REGISTER_WRITE_FLAGS_OFFSET_HIGH_FMASK); BUILD_BUG_ON(bit_count > 32); - offset_max = ~0 >> (32 - bit_count); + offset_max = ~0U >> (32 - bit_count);
+ /* Make sure the offset can be represented by the field(s) + * that holds it. Also make sure the offset is not outside + * the overall IPA memory range. + */ if (offset > offset_max || ipa->mem_offset > offset_max - offset) { dev_err(dev, "%s offset too large 0x%04x + 0x%04x > 0x%04x)\n", - ipa->mem_offset + offset, offset_max); + name, ipa->mem_offset, offset, offset_max); return false; }
@@ -261,12 +265,24 @@ static bool ipa_cmd_register_write_valid(struct ipa *ipa) const char *name; u32 offset;
- offset = ipa_reg_filt_rout_hash_flush_offset(ipa->version); - name = "filter/route hash flush"; - if (!ipa_cmd_register_write_offset_valid(ipa, name, offset)) - return false; + /* If hashed tables are supported, ensure the hash flush register + * offset will fit in a register write IPA immediate command. + */ + if (ipa->version != IPA_VERSION_4_2) { + offset = ipa_reg_filt_rout_hash_flush_offset(ipa->version); + name = "filter/route hash flush"; + if (!ipa_cmd_register_write_offset_valid(ipa, name, offset)) + return false; + }
- offset = IPA_REG_ENDP_STATUS_N_OFFSET(IPA_ENDPOINT_COUNT); + /* Each endpoint can have a status endpoint associated with it, + * and this is recorded in an endpoint register. If the modem + * crashes, we reset the status endpoint for all modem endpoints + * using a register write IPA immediate command. Make sure the + * worst case (highest endpoint number) offset of that endpoint + * fits in the register write command field(s) that must hold it. + */ + offset = IPA_REG_ENDP_STATUS_N_OFFSET(IPA_ENDPOINT_COUNT - 1); name = "maximal endpoint status"; if (!ipa_cmd_register_write_offset_valid(ipa, name, offset)) return false;
From: Tong Zhang ztong0001@gmail.com
[ Upstream commit 62e69bc419772638369eff8ff81340bde8aceb61 ]
lmc set sc->lmc_media pointer when there is a matching device. However, when no matching device is found, this pointer is NULL and the following dereference will result in a null-ptr-deref.
To fix this issue, unregister the hdlc device and return an error.
[ 4.569359] BUG: KASAN: null-ptr-deref in lmc_init_one.cold+0x2b6/0x55d [lmc] [ 4.569748] Read of size 8 at addr 0000000000000008 by task modprobe/95 [ 4.570102] [ 4.570187] CPU: 0 PID: 95 Comm: modprobe Not tainted 5.11.0-rc7 #94 [ 4.570527] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-48-gd9c812dda519-preb4 [ 4.571125] Call Trace: [ 4.571261] dump_stack+0x7d/0xa3 [ 4.571445] kasan_report.cold+0x10c/0x10e [ 4.571667] ? lmc_init_one.cold+0x2b6/0x55d [lmc] [ 4.571932] lmc_init_one.cold+0x2b6/0x55d [lmc] [ 4.572186] ? lmc_mii_readreg+0xa0/0xa0 [lmc] [ 4.572432] local_pci_probe+0x6f/0xb0 [ 4.572639] pci_device_probe+0x171/0x240 [ 4.572857] ? pci_device_remove+0xe0/0xe0 [ 4.573080] ? kernfs_create_link+0xb6/0x110 [ 4.573315] ? sysfs_do_create_link_sd.isra.0+0x76/0xe0 [ 4.573598] really_probe+0x161/0x420 [ 4.573799] driver_probe_device+0x6d/0xd0 [ 4.574022] device_driver_attach+0x82/0x90 [ 4.574249] ? device_driver_attach+0x90/0x90 [ 4.574485] __driver_attach+0x60/0x100 [ 4.574694] ? device_driver_attach+0x90/0x90 [ 4.574931] bus_for_each_dev+0xe1/0x140 [ 4.575146] ? subsys_dev_iter_exit+0x10/0x10 [ 4.575387] ? klist_node_init+0x61/0x80 [ 4.575602] bus_add_driver+0x254/0x2a0 [ 4.575812] driver_register+0xd3/0x150 [ 4.576021] ? 0xffffffffc0018000 [ 4.576202] do_one_initcall+0x84/0x250 [ 4.576411] ? trace_event_raw_event_initcall_finish+0x150/0x150 [ 4.576733] ? unpoison_range+0xf/0x30 [ 4.576938] ? ____kasan_kmalloc.constprop.0+0x84/0xa0 [ 4.577219] ? unpoison_range+0xf/0x30 [ 4.577423] ? unpoison_range+0xf/0x30 [ 4.577628] do_init_module+0xf8/0x350 [ 4.577833] load_module+0x3fe6/0x4340 [ 4.578038] ? vm_unmap_ram+0x1d0/0x1d0 [ 4.578247] ? ____kasan_kmalloc.constprop.0+0x84/0xa0 [ 4.578526] ? module_frob_arch_sections+0x20/0x20 [ 4.578787] ? __do_sys_finit_module+0x108/0x170 [ 4.579037] __do_sys_finit_module+0x108/0x170 [ 4.579278] ? __ia32_sys_init_module+0x40/0x40 [ 4.579523] ? file_open_root+0x200/0x200 [ 4.579742] ? do_sys_open+0x85/0xe0 [ 4.579938] ? filp_open+0x50/0x50 [ 4.580125] ? exit_to_user_mode_prepare+0xfc/0x130 [ 4.580390] do_syscall_64+0x33/0x40 [ 4.580586] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 4.580859] RIP: 0033:0x7f1a724c3cf7 [ 4.581054] Code: 48 89 57 30 48 8b 04 24 48 89 47 38 e9 1d a0 02 00 48 89 f8 48 89 f7 48 89 d6 48 891 [ 4.582043] RSP: 002b:00007fff44941c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 4.582447] RAX: ffffffffffffffda RBX: 00000000012ada70 RCX: 00007f1a724c3cf7 [ 4.582827] RDX: 0000000000000000 RSI: 00000000012ac9e0 RDI: 0000000000000003 [ 4.583207] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000001 [ 4.583587] R10: 00007f1a72527300 R11: 0000000000000246 R12: 00000000012ac9e0 [ 4.583968] R13: 0000000000000000 R14: 00000000012acc90 R15: 0000000000000001 [ 4.584349] ==================================================================
Signed-off-by: Tong Zhang ztong0001@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wan/lmc/lmc_main.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/wan/lmc/lmc_main.c b/drivers/net/wan/lmc/lmc_main.c index 93c7e8502845..ebb568f9bc66 100644 --- a/drivers/net/wan/lmc/lmc_main.c +++ b/drivers/net/wan/lmc/lmc_main.c @@ -899,6 +899,8 @@ static int lmc_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) break; default: printk(KERN_WARNING "%s: LMC UNKNOWN CARD!\n", dev->name); + unregister_hdlc_device(dev); + return -EIO; break; }
From: Jisheng Zhang Jisheng.Zhang@synaptics.com
[ Upstream commit d65614a01d24704b016635abf5cc028a54e45a62 ]
I met below warning when cating a small size(about 80bytes) txt file on 9pfs(msize=2097152 is passed to 9p mount option), the reason is we miss iov_iter_advance() if the read count is 0 for zerocopy case, so we didn't truncate the pipe, then iov_iter_pipe() thinks the pipe is full. Fix it by removing the exception for 0 to ensure to call iov_iter_advance() even on empty read for zerocopy case.
[ 8.279568] WARNING: CPU: 0 PID: 39 at lib/iov_iter.c:1203 iov_iter_pipe+0x31/0x40 [ 8.280028] Modules linked in: [ 8.280561] CPU: 0 PID: 39 Comm: cat Not tainted 5.11.0+ #6 [ 8.281260] RIP: 0010:iov_iter_pipe+0x31/0x40 [ 8.281974] Code: 2b 42 54 39 42 5c 76 22 c7 07 20 00 00 00 48 89 57 18 8b 42 50 48 c7 47 08 b [ 8.283169] RSP: 0018:ffff888000cbbd80 EFLAGS: 00000246 [ 8.283512] RAX: 0000000000000010 RBX: ffff888000117d00 RCX: 0000000000000000 [ 8.283876] RDX: ffff88800031d600 RSI: 0000000000000000 RDI: ffff888000cbbd90 [ 8.284244] RBP: ffff888000cbbe38 R08: 0000000000000000 R09: ffff8880008d2058 [ 8.284605] R10: 0000000000000002 R11: ffff888000375510 R12: 0000000000000050 [ 8.284964] R13: ffff888000cbbe80 R14: 0000000000000050 R15: ffff88800031d600 [ 8.285439] FS: 00007f24fd8af600(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000 [ 8.285844] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8.286150] CR2: 00007f24fd7d7b90 CR3: 0000000000c97000 CR4: 00000000000406b0 [ 8.286710] Call Trace: [ 8.288279] generic_file_splice_read+0x31/0x1a0 [ 8.289273] ? do_splice_to+0x2f/0x90 [ 8.289511] splice_direct_to_actor+0xcc/0x220 [ 8.289788] ? pipe_to_sendpage+0xa0/0xa0 [ 8.290052] do_splice_direct+0x8b/0xd0 [ 8.290314] do_sendfile+0x1ad/0x470 [ 8.290576] do_syscall_64+0x2d/0x40 [ 8.290818] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 8.291409] RIP: 0033:0x7f24fd7dca0a [ 8.292511] Code: c3 0f 1f 80 00 00 00 00 4c 89 d2 4c 89 c6 e9 bd fd ff ff 0f 1f 44 00 00 31 8 [ 8.293360] RSP: 002b:00007ffc20932818 EFLAGS: 00000206 ORIG_RAX: 0000000000000028 [ 8.293800] RAX: ffffffffffffffda RBX: 0000000001000000 RCX: 00007f24fd7dca0a [ 8.294153] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000001 [ 8.294504] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000 [ 8.294867] R10: 0000000001000000 R11: 0000000000000206 R12: 0000000000000003 [ 8.295217] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000 [ 8.295782] ---[ end trace 63317af81b3ca24b ]---
Signed-off-by: Jisheng Zhang Jisheng.Zhang@synaptics.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/9p/client.c | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/net/9p/client.c b/net/9p/client.c index 4f62f299da0c..0a9019da18f3 100644 --- a/net/9p/client.c +++ b/net/9p/client.c @@ -1623,10 +1623,6 @@ p9_client_read_once(struct p9_fid *fid, u64 offset, struct iov_iter *to, }
p9_debug(P9_DEBUG_9P, "<<< RREAD count %d\n", count); - if (!count) { - p9_tag_remove(clnt, req); - return 0; - }
if (non_zc) { int n = copy_to_iter(dataptr, count, to);
From: Jesper Dangaard Brouer brouer@redhat.com
commit 6306c1189e77a513bf02720450bb43bd4ba5d8ae upstream.
Multiple BPF-helpers that can manipulate/increase the size of the SKB uses __bpf_skb_max_len() as the max-length. This function limit size against the current net_device MTU (skb->dev->mtu).
When a BPF-prog grow the packet size, then it should not be limited to the MTU. The MTU is a transmit limitation, and software receiving this packet should be allowed to increase the size. Further more, current MTU check in __bpf_skb_max_len uses the MTU from ingress/current net_device, which in case of redirects uses the wrong net_device.
This patch keeps a sanity max limit of SKB_MAX_ALLOC (16KiB). The real limit is elsewhere in the system. Jesper's testing[1] showed it was not possible to exceed 8KiB when expanding the SKB size via BPF-helper. The limiting factor is the define KMALLOC_MAX_CACHE_SIZE which is 8192 for SLUB-allocator (CONFIG_SLUB) in-case PAGE_SIZE is 4096. This define is in-effect due to this being called from softirq context see code __gfp_pfmemalloc_flags() and __do_kmalloc_node(). Jakub's testing showed that frames above 16KiB can cause NICs to reset (but not crash). Keep this sanity limit at this level as memory layer can differ based on kernel config.
[1] https://github.com/xdp-project/bpf-examples/tree/master/MTU-tests
Signed-off-by: Jesper Dangaard Brouer brouer@redhat.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: John Fastabend john.fastabend@gmail.com Link: https://lore.kernel.org/bpf/161287788936.790810.2937823995775097177.stgit@fi... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/filter.c | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-)
--- a/net/core/filter.c +++ b/net/core/filter.c @@ -3552,11 +3552,7 @@ static int bpf_skb_net_shrink(struct sk_ return 0; }
-static u32 __bpf_skb_max_len(const struct sk_buff *skb) -{ - return skb->dev ? skb->dev->mtu + skb->dev->hard_header_len : - SKB_MAX_ALLOC; -} +#define BPF_SKB_MAX_LEN SKB_MAX_ALLOC
BPF_CALL_4(sk_skb_adjust_room, struct sk_buff *, skb, s32, len_diff, u32, mode, u64, flags) @@ -3605,7 +3601,7 @@ BPF_CALL_4(bpf_skb_adjust_room, struct s { u32 len_cur, len_diff_abs = abs(len_diff); u32 len_min = bpf_skb_net_base_len(skb); - u32 len_max = __bpf_skb_max_len(skb); + u32 len_max = BPF_SKB_MAX_LEN; __be16 proto = skb->protocol; bool shrink = len_diff < 0; u32 off; @@ -3688,7 +3684,7 @@ static int bpf_skb_trim_rcsum(struct sk_ static inline int __bpf_skb_change_tail(struct sk_buff *skb, u32 new_len, u64 flags) { - u32 max_len = __bpf_skb_max_len(skb); + u32 max_len = BPF_SKB_MAX_LEN; u32 min_len = __bpf_skb_min_len(skb); int ret;
@@ -3764,7 +3760,7 @@ static const struct bpf_func_proto sk_sk static inline int __bpf_skb_change_head(struct sk_buff *skb, u32 head_room, u64 flags) { - u32 max_len = __bpf_skb_max_len(skb); + u32 max_len = BPF_SKB_MAX_LEN; u32 new_len = skb->len + head_room; int ret;
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
commit 1a1c130ab7575498eed5bcf7220037ae09cd1f8a upstream.
The following problem has been reported by George Kennedy:
Since commit 7fef431be9c9 ("mm/page_alloc: place pages to tail in __free_pages_core()") the following use after free occurs intermittently when ACPI tables are accessed.
BUG: KASAN: use-after-free in ibft_init+0x134/0xc49 Read of size 4 at addr ffff8880be453004 by task swapper/0/1 CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.12.0-rc1-7a7fd0d #1 Call Trace: dump_stack+0xf6/0x158 print_address_description.constprop.9+0x41/0x60 kasan_report.cold.14+0x7b/0xd4 __asan_report_load_n_noabort+0xf/0x20 ibft_init+0x134/0xc49 do_one_initcall+0xc4/0x3e0 kernel_init_freeable+0x5af/0x66b kernel_init+0x16/0x1d0 ret_from_fork+0x22/0x30
ACPI tables mapped via kmap() do not have their mapped pages reserved and the pages can be "stolen" by the buddy allocator.
Apparently, on the affected system, the ACPI table in question is not located in "reserved" memory, like ACPI NVS or ACPI Data, that will not be used by the buddy allocator, so the memory occupied by that table has to be explicitly reserved to prevent the buddy allocator from using it.
In order to address this problem, rearrange the initialization of the ACPI tables on x86 to locate the initial tables earlier and reserve the memory occupied by them.
The other architectures using ACPI should not be affected by this change.
Link: https://lore.kernel.org/linux-acpi/1614802160-29362-1-git-send-email-george.... Reported-by: George Kennedy george.kennedy@oracle.com Tested-by: George Kennedy george.kennedy@oracle.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Reviewed-by: Mike Rapoport rppt@linux.ibm.com Cc: 5.10+ stable@vger.kernel.org # 5.10+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/acpi/boot.c | 25 ++++++++++++------------- arch/x86/kernel/setup.c | 8 +++----- drivers/acpi/tables.c | 42 +++++++++++++++++++++++++++++++++++++++--- include/linux/acpi.h | 9 ++++++++- 4 files changed, 62 insertions(+), 22 deletions(-)
--- a/arch/x86/kernel/acpi/boot.c +++ b/arch/x86/kernel/acpi/boot.c @@ -1554,10 +1554,18 @@ void __init acpi_boot_table_init(void) /* * Initialize the ACPI boot-time table parser. */ - if (acpi_table_init()) { + if (acpi_locate_initial_tables()) disable_acpi(); - return; - } + else + acpi_reserve_initial_tables(); +} + +int __init early_acpi_boot_init(void) +{ + if (acpi_disabled) + return 1; + + acpi_table_init_complete();
acpi_table_parse(ACPI_SIG_BOOT, acpi_parse_sbf);
@@ -1570,18 +1578,9 @@ void __init acpi_boot_table_init(void) } else { printk(KERN_WARNING PREFIX "Disabling ACPI support\n"); disable_acpi(); - return; + return 1; } } -} - -int __init early_acpi_boot_init(void) -{ - /* - * If acpi_disabled, bail out - */ - if (acpi_disabled) - return 1;
/* * Process the Multiple APIC Description Table (MADT), if present --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -1046,6 +1046,9 @@ void __init setup_arch(char **cmdline_p)
cleanup_highmap();
+ /* Look for ACPI tables and reserve memory occupied by them. */ + acpi_boot_table_init(); + memblock_set_current_limit(ISA_END_ADDRESS); e820__memblock_setup();
@@ -1137,11 +1140,6 @@ void __init setup_arch(char **cmdline_p)
early_platform_quirks();
- /* - * Parse the ACPI tables for possible boot-time SMP configuration. - */ - acpi_boot_table_init(); - early_acpi_boot_init();
initmem_init(); --- a/drivers/acpi/tables.c +++ b/drivers/acpi/tables.c @@ -780,7 +780,7 @@ acpi_status acpi_os_table_override(struc }
/* - * acpi_table_init() + * acpi_locate_initial_tables() * * find RSDP, find and checksum SDT/XSDT. * checksum all tables, print SDT/XSDT @@ -788,7 +788,7 @@ acpi_status acpi_os_table_override(struc * result: sdt_entry[] is initialized */
-int __init acpi_table_init(void) +int __init acpi_locate_initial_tables(void) { acpi_status status;
@@ -803,9 +803,45 @@ int __init acpi_table_init(void) status = acpi_initialize_tables(initial_tables, ACPI_MAX_TABLES, 0); if (ACPI_FAILURE(status)) return -EINVAL; - acpi_table_initrd_scan();
+ return 0; +} + +void __init acpi_reserve_initial_tables(void) +{ + int i; + + for (i = 0; i < ACPI_MAX_TABLES; i++) { + struct acpi_table_desc *table_desc = &initial_tables[i]; + u64 start = table_desc->address; + u64 size = table_desc->length; + + if (!start || !size) + break; + + pr_info("Reserving %4s table memory at [mem 0x%llx-0x%llx]\n", + table_desc->signature.ascii, start, start + size - 1); + + memblock_reserve(start, size); + } +} + +void __init acpi_table_init_complete(void) +{ + acpi_table_initrd_scan(); check_multiple_madt(); +} + +int __init acpi_table_init(void) +{ + int ret; + + ret = acpi_locate_initial_tables(); + if (ret) + return ret; + + acpi_table_init_complete(); + return 0; }
--- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -222,10 +222,14 @@ void __iomem *__acpi_map_table(unsigned void __acpi_unmap_table(void __iomem *map, unsigned long size); int early_acpi_boot_init(void); int acpi_boot_init (void); +void acpi_boot_table_prepare (void); void acpi_boot_table_init (void); int acpi_mps_check (void); int acpi_numa_init (void);
+int acpi_locate_initial_tables (void); +void acpi_reserve_initial_tables (void); +void acpi_table_init_complete (void); int acpi_table_init (void); int acpi_table_parse(char *id, acpi_tbl_table_handler handler); int __init acpi_table_parse_entries(char *id, unsigned long table_size, @@ -807,9 +811,12 @@ static inline int acpi_boot_init(void) return 0; }
+static inline void acpi_boot_table_prepare(void) +{ +} + static inline void acpi_boot_table_init(void) { - return; }
static inline int acpi_mps_check(void)
From: Vitaly Kuznetsov vkuznets@redhat.com
commit 8cdddd182bd7befae6af49c5fd612893f55d6ccb upstream.
Commit 496121c02127 ("ACPI: processor: idle: Allow probing on platforms with one ACPI C-state") broke CPU0 hotplug on certain systems, e.g. I'm observing the following on AWS Nitro (e.g r5b.xlarge but other instance types are affected as well):
# echo 0 > /sys/devices/system/cpu/cpu0/online # echo 1 > /sys/devices/system/cpu/cpu0/online <10 seconds delay> -bash: echo: write error: Input/output error
In fact, the above mentioned commit only revealed the problem and did not introduce it. On x86, to wakeup CPU an NMI is being used and hlt_play_dead()/mwait_play_dead() loops are prepared to handle it:
/* * If NMI wants to wake up CPU0, start CPU0. */ if (wakeup_cpu0()) start_cpu0();
cpuidle_play_dead() -> acpi_idle_play_dead() (which is now being called on systems where it wasn't called before the above mentioned commit) serves the same purpose but it doesn't have a path for CPU0. What happens now on wakeup is: - NMI is sent to CPU0 - wakeup_cpu0_nmi() works as expected - we get back to while (1) loop in acpi_idle_play_dead() - safe_halt() puts CPU0 to sleep again.
The straightforward/minimal fix is add the special handling for CPU0 on x86 and that's what the patch is doing.
Fixes: 496121c02127 ("ACPI: processor: idle: Allow probing on platforms with one ACPI C-state") Signed-off-by: Vitaly Kuznetsov vkuznets@redhat.com Cc: 5.10+ stable@vger.kernel.org # 5.10+ Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/smp.h | 1 + arch/x86/kernel/smpboot.c | 2 +- drivers/acpi/processor_idle.c | 7 +++++++ 3 files changed, 9 insertions(+), 1 deletion(-)
--- a/arch/x86/include/asm/smp.h +++ b/arch/x86/include/asm/smp.h @@ -132,6 +132,7 @@ void native_play_dead(void); void play_dead_common(void); void wbinvd_on_cpu(int cpu); int wbinvd_on_all_cpus(void); +bool wakeup_cpu0(void);
void native_smp_send_reschedule(int cpu); void native_send_call_func_ipi(const struct cpumask *mask); --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -1659,7 +1659,7 @@ void play_dead_common(void) local_irq_disable(); }
-static bool wakeup_cpu0(void) +bool wakeup_cpu0(void) { if (smp_processor_id() == 0 && enable_start_cpu0) return true; --- a/drivers/acpi/processor_idle.c +++ b/drivers/acpi/processor_idle.c @@ -29,6 +29,7 @@ */ #ifdef CONFIG_X86 #include <asm/apic.h> +#include <asm/cpu.h> #endif
#define _COMPONENT ACPI_PROCESSOR_COMPONENT @@ -541,6 +542,12 @@ static int acpi_idle_play_dead(struct cp wait_for_freeze(); } else return -ENODEV; + +#if defined(CONFIG_X86) && defined(CONFIG_HOTPLUG_CPU) + /* If NMI wants to wake up CPU0, start CPU0. */ + if (wakeup_cpu0()) + start_cpu0(); +#endif }
/* Never reached */
From: Hans de Goede hdegoede@redhat.com
commit 3e759425cc3cf9a43392309819d34c65a3644c59 upstream.
Commit 71da201f38df ("ACPI: scan: Defer enumeration of devices with _DEP lists") dropped the following 2 lines from acpi_init_device_object():
/* Assume there are unmet deps until acpi_device_dep_initialize() runs */ device->dep_unmet = 1;
Leaving the initial value of dep_unmet at the 0 from the kzalloc(). This causes the acpi_bus_get_status() call in acpi_add_single_object() to actually call _STA, even though there maybe unmet deps, leading to errors like these:
[ 0.123579] ACPI Error: No handler for Region [ECRM] (00000000ba9edc4c) [GenericSerialBus] (20170831/evregion-166) [ 0.123601] ACPI Error: Region GenericSerialBus (ID=9) has no handler (20170831/exfldio-299) [ 0.123618] ACPI Error: Method parse/execution failed _SB.I2C1.BAT1._STA, AE_NOT_EXIST (20170831/psparse-550)
Fix this by re-adding the dep_unmet = 1 initialization to acpi_init_device_object() and modifying acpi_bus_check_add() to make sure that dep_unmet always gets setup there, overriding the initial 1 value.
This re-fixes the issue initially fixed by commit 63347db0affa ("ACPI / scan: Use acpi_bus_get_status() to initialize ACPI_TYPE_DEVICE devs"), which introduced the removed "device->dep_unmet = 1;" statement.
This issue was noticed; and the fix tested on a Dell Venue 10 Pro 5055.
Fixes: 71da201f38df ("ACPI: scan: Defer enumeration of devices with _DEP lists") Suggested-by: Rafael J. Wysocki rafael@kernel.org Signed-off-by: Hans de Goede hdegoede@redhat.com Cc: 5.11+ stable@vger.kernel.org # 5.11+ Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/acpi/scan.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
--- a/drivers/acpi/scan.c +++ b/drivers/acpi/scan.c @@ -1669,6 +1669,8 @@ void acpi_init_device_object(struct acpi device_initialize(&device->dev); dev_set_uevent_suppress(&device->dev, true); acpi_init_coherency(device); + /* Assume there are unmet deps to start with. */ + device->dep_unmet = 1; }
void acpi_device_add_finalize(struct acpi_device *device) @@ -1934,6 +1936,8 @@ static void acpi_scan_dep_init(struct ac { struct acpi_dep_data *dep;
+ adev->dep_unmet = 0; + mutex_lock(&acpi_dep_list_lock);
list_for_each_entry(dep, &acpi_dep_list, node) { @@ -1981,7 +1985,13 @@ static acpi_status acpi_bus_check_add(ac return AE_CTRL_DEPTH;
acpi_scan_init_hotplug(device); - if (!check_dep) + /* + * If check_dep is true at this point, the device has no dependencies, + * or the creation of the device object would have been postponed above. + */ + if (check_dep) + device->dep_unmet = 0; + else acpi_scan_dep_init(device);
out:
From: Ikjoon Jang ikjn@chromium.org
commit 625bd5a616ceda4840cd28f82e957c8ced394b6a upstream.
Logitech ConferenceCam Connect is a compound USB device with UVC and UAC. Not 100% reproducible but sometimes it keeps responding STALL to every control transfer once it receives get_freq request.
This patch adds 046d:0x084c to a snd_usb_get_sample_rate_quirk list.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203419 Signed-off-by: Ikjoon Jang ikjn@chromium.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210324105153.2322881-1-ikjn@chromium.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/usb/quirks.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/usb/quirks.c +++ b/sound/usb/quirks.c @@ -1521,6 +1521,7 @@ bool snd_usb_get_sample_rate_quirk(struc case USB_ID(0x21b4, 0x0081): /* AudioQuest DragonFly */ case USB_ID(0x2912, 0x30c8): /* Audioengine D1 */ case USB_ID(0x413c, 0xa506): /* Dell AE515 sound bar */ + case USB_ID(0x046d, 0x084c): /* Logitech ConferenceCam Connect */ return true; }
From: Takashi Iwai tiwai@suse.de
commit c8f79808cd8eb5bc8d14de129bd6d586d3fce0aa upstream.
The card power state change via snd_power_change_state() at the system suspend/resume seems dropped mistakenly during the PM code rewrite. The card power state doesn't play much role nowadays but it's still referred in a few places such as the HDMI codec driver.
This patch restores them, but in a more appropriate place now in the prepare and complete callbacks.
Fixes: f5dac54d9d93 ("ALSA: hda: Separate runtime and system suspend") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210329113059.25035-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/hda_intel.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -1025,6 +1025,7 @@ static int azx_prepare(struct device *de
chip = card->private_data; chip->pm_prepared = 1; + snd_power_change_state(card, SNDRV_CTL_POWER_D3hot);
flush_work(&azx_bus(chip)->unsol_work);
@@ -1040,6 +1041,7 @@ static void azx_complete(struct device * struct azx *chip;
chip = card->private_data; + snd_power_change_state(card, SNDRV_CTL_POWER_D0); chip->pm_prepared = 0; }
From: Takashi Iwai tiwai@suse.de
commit 66affb7bb0dc0905155a1b2475261aa704d1ddb5 upstream.
The recently added PM prepare and complete callbacks don't have the sanity check whether the card instance has been properly initialized, which may potentially lead to Oops.
This patch adds the azx_is_pm_ready() call in each place appropriately like other PM callbacks.
Fixes: f5dac54d9d93 ("ALSA: hda: Separate runtime and system suspend") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210329113059.25035-2-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/hda_intel.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -1023,6 +1023,9 @@ static int azx_prepare(struct device *de struct snd_card *card = dev_get_drvdata(dev); struct azx *chip;
+ if (!azx_is_pm_ready(card)) + return 0; + chip = card->private_data; chip->pm_prepared = 1; snd_power_change_state(card, SNDRV_CTL_POWER_D3hot); @@ -1040,6 +1043,9 @@ static void azx_complete(struct device * struct snd_card *card = dev_get_drvdata(dev); struct azx *chip;
+ if (!azx_is_pm_ready(card)) + return; + chip = card->private_data; snd_power_change_state(card, SNDRV_CTL_POWER_D0); chip->pm_prepared = 0;
From: Hui Wang hui.wang@canonical.com
commit febf22565549ea7111e7d45e8f2d64373cc66b11 upstream.
We found a recording issue on a Dell AIO, users plug a headset-mic and select headset-mic from UI, but can't record any sound from headset-mic. The root cause is the determine_headset_type() returns a wrong type, e.g. users plug a ctia type headset, but that function returns omtp type.
On this machine, the internal mic is not connected to the codec, the "Input Source" is headset mic by default. And when users plug a headset, the determine_headset_type() will be called immediately, the codec on this AIO is alc274, the delay time for this codec in the determine_headset_type() is only 80ms, the delay is too short to correctly determine the headset type, the fail rate is nearly 99% when users plug the headset with the normal speed.
Other codecs set several hundred ms delay time, so here I change the delay time to 850ms for alc2x4 series, after this change, the fail rate is zero unless users plug the headset slowly on purpose.
Cc: stable@vger.kernel.org Signed-off-by: Hui Wang hui.wang@canonical.com Link: https://lore.kernel.org/r/20210320091542.6748-1-hui.wang@canonical.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -5256,7 +5256,7 @@ static void alc_determine_headset_type(s case 0x10ec0274: case 0x10ec0294: alc_process_coef_fw(codec, coef0274); - msleep(80); + msleep(850); val = alc_read_coef_idx(codec, 0x46); is_ctia = (val & 0x00f0) == 0x00f0; break;
From: Hui Wang hui.wang@canonical.com
commit e54f30befa7990b897189b44a56c1138c6bfdbb5 upstream.
We found the alc_update_headset_mode() is not called on some machines when unplugging the headset, as a result, the mode of the ALC_HEADSET_MODE_UNPLUGGED can't be set, then the current_headset_type is not cleared, if users plug a differnt type of headset next time, the determine_headset_type() will not be called and the audio jack is set to the headset type of previous time.
On the Dell machines which connect the dmic to the PCH, if we open the gnome-sound-setting and unplug the headset, this issue will happen. Those machines disable the auto-mute by ucm and has no internal mic in the input source, so the update_headset_mode() will not be called by cap_sync_hook or automute_hook when unplugging, and because the gnome-sound-setting is opened, the codec will not enter the runtime_suspend state, so the update_headset_mode() will not be called by alc_resume when unplugging. In this case the hp_automute_hook is called when unplugging, so add update_headset_mode() calling to this function.
Cc: stable@vger.kernel.org Signed-off-by: Hui Wang hui.wang@canonical.com Link: https://lore.kernel.org/r/20210320091542.6748-2-hui.wang@canonical.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -5440,6 +5440,7 @@ static void alc_update_headset_jack_cb(s struct hda_jack_callback *jack) { snd_hda_gen_hp_automute(codec, jack); + alc_update_headset_mode(codec); }
static void alc_probe_headset_mode(struct hda_codec *codec)
From: Jeremy Szu jeremy.szu@canonical.com
commit 417eadfdd9e25188465280edf3668ed163fda2d0 upstream.
The HP EliteBook 640 G8 Notebook PC is using ALC236 codec which is using 0x02 to control mute LED and 0x01 to control micmute LED. Therefore, add a quirk to make it works.
Signed-off-by: Jeremy Szu jeremy.szu@canonical.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210330114428.40490-1-jeremy.szu@canonical.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -8058,6 +8058,7 @@ static const struct snd_pci_quirk alc269 ALC285_FIXUP_HP_GPIO_AMP_INIT), SND_PCI_QUIRK(0x103c, 0x87c8, "HP", ALC287_FIXUP_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x87e5, "HP ProBook 440 G8 Notebook PC", ALC236_FIXUP_HP_GPIO_LED), + SND_PCI_QUIRK(0x103c, 0x87f2, "HP ProBook 640 G8 Notebook PC", ALC236_FIXUP_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x87f4, "HP", ALC287_FIXUP_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x87f5, "HP", ALC287_FIXUP_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x87f7, "HP Spectre x360 14", ALC245_FIXUP_HP_X360_AMP),
From: Max Filippov jcmvbkbc@gmail.com
commit 7b9acbb6aad4f54623dcd4bd4b1a60fe0c727b09 upstream.
If a uaccess (e.g. get_user()) triggers a fault and there's a fault signal pending, the handler will return to the uaccess without having performed a uaccess fault fixup, and so the CPU will immediately execute the uaccess instruction again, whereupon it will livelock bouncing between that instruction and the fault handler.
https://lore.kernel.org/lkml/20210121123140.GD48431@C02TD0UTHF1T.local/
Cc: stable@vger.kernel.org Reported-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Max Filippov jcmvbkbc@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/xtensa/mm/fault.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/arch/xtensa/mm/fault.c +++ b/arch/xtensa/mm/fault.c @@ -112,8 +112,11 @@ good_area: */ fault = handle_mm_fault(vma, address, flags, regs);
- if (fault_signal_pending(fault, regs)) + if (fault_signal_pending(fault, regs)) { + if (!user_mode(regs)) + goto bad_page_fault; return; + }
if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM)
From: Max Filippov jcmvbkbc@gmail.com
commit ab5eb336411f18fd449a1fb37d36a55ec422603f upstream.
coprocessor_flush is not a part of fast exception handlers, but it uses parts of fast coprocessor handling code that's why it's in the same source file. It uses call0 opcode to invoke those parts so there are no limitations on their relative location, but the rest of the code calls coprocessor_flush with call8 and that doesn't work when vectors are placed in a different gigabyte-aligned area than the rest of the kernel.
Move coprocessor_flush from the .exception.text section to the .text so that it's reachable from the rest of the kernel with call8.
Cc: stable@vger.kernel.org Signed-off-by: Max Filippov jcmvbkbc@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/xtensa/kernel/coprocessor.S | 64 ++++++++++++++++++++------------------- 1 file changed, 33 insertions(+), 31 deletions(-)
--- a/arch/xtensa/kernel/coprocessor.S +++ b/arch/xtensa/kernel/coprocessor.S @@ -100,37 +100,6 @@ LOAD_CP_REGS_TAB(7)
/* - * coprocessor_flush(struct thread_info*, index) - * a2 a3 - * - * Save coprocessor registers for coprocessor 'index'. - * The register values are saved to or loaded from the coprocessor area - * inside the task_info structure. - * - * Note that this function doesn't update the coprocessor_owner information! - * - */ - -ENTRY(coprocessor_flush) - - /* reserve 4 bytes on stack to save a0 */ - abi_entry(4) - - s32i a0, a1, 0 - movi a0, .Lsave_cp_regs_jump_table - addx8 a3, a3, a0 - l32i a4, a3, 4 - l32i a3, a3, 0 - add a2, a2, a4 - beqz a3, 1f - callx0 a3 -1: l32i a0, a1, 0 - - abi_ret(4) - -ENDPROC(coprocessor_flush) - -/* * Entry condition: * * a0: trashed, original value saved on stack (PT_AREG0) @@ -245,6 +214,39 @@ ENTRY(fast_coprocessor)
ENDPROC(fast_coprocessor)
+ .text + +/* + * coprocessor_flush(struct thread_info*, index) + * a2 a3 + * + * Save coprocessor registers for coprocessor 'index'. + * The register values are saved to or loaded from the coprocessor area + * inside the task_info structure. + * + * Note that this function doesn't update the coprocessor_owner information! + * + */ + +ENTRY(coprocessor_flush) + + /* reserve 4 bytes on stack to save a0 */ + abi_entry(4) + + s32i a0, a1, 0 + movi a0, .Lsave_cp_regs_jump_table + addx8 a3, a3, a0 + l32i a4, a3, 4 + l32i a3, a3, 0 + add a2, a2, a4 + beqz a3, 1f + callx0 a3 +1: l32i a0, a1, 0 + + abi_ret(4) + +ENDPROC(coprocessor_flush) + .data
ENTRY(coprocessor_owner)
From: Paolo Bonzini pbonzini@redhat.com
commit a58d9166a756a0f4a6618e4f593232593d6df134 upstream.
Avoid races between check and use of the nested VMCB controls. This for example ensures that the VMRUN intercept is always reflected to the nested hypervisor, instead of being processed by the host. Without this patch, it is possible to end up with svm->nested.hsave pointing to the MSR permission bitmap for nested guests.
This bug is CVE-2021-29657.
Reported-by: Felix Wilhelm fwilhelm@google.com Cc: stable@vger.kernel.org Fixes: 2fcf4876ada ("KVM: nSVM: implement on demand allocation of the nested state") Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/svm/nested.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
--- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -246,7 +246,7 @@ static bool nested_vmcb_check_controls(s return true; }
-static bool nested_vmcb_checks(struct vcpu_svm *svm, struct vmcb *vmcb12) +static bool nested_vmcb_check_save(struct vcpu_svm *svm, struct vmcb *vmcb12) { struct kvm_vcpu *vcpu = &svm->vcpu; bool vmcb12_lma; @@ -271,7 +271,7 @@ static bool nested_vmcb_checks(struct vc if (!kvm_is_valid_cr4(&svm->vcpu, vmcb12->save.cr4)) return false;
- return nested_vmcb_check_controls(&vmcb12->control); + return true; }
static void load_nested_vmcb_control(struct vcpu_svm *svm, @@ -454,7 +454,6 @@ int enter_svm_guest_mode(struct vcpu_svm int ret;
svm->nested.vmcb12_gpa = vmcb12_gpa; - load_nested_vmcb_control(svm, &vmcb12->control); nested_prepare_vmcb_save(svm, vmcb12); nested_prepare_vmcb_control(svm);
@@ -501,7 +500,10 @@ int nested_svm_vmrun(struct vcpu_svm *sv if (WARN_ON_ONCE(!svm->nested.initialized)) return -EINVAL;
- if (!nested_vmcb_checks(svm, vmcb12)) { + load_nested_vmcb_control(svm, &vmcb12->control); + + if (!nested_vmcb_check_save(svm, vmcb12) || + !nested_vmcb_check_controls(&svm->nested.ctl)) { vmcb12->control.exit_code = SVM_EXIT_ERR; vmcb12->control.exit_code_hi = 0; vmcb12->control.exit_info_1 = 0;
From: Paolo Bonzini pbonzini@redhat.com
commit 3c346c0c60ab06a021d1c0884a0ef494bc4ee3a7 upstream.
Fixing nested_vmcb_check_save to avoid all TOC/TOU races is a bit harder in released kernels, so do the bare minimum by avoiding that EFER.SVME is cleared. This is problematic because svm_set_efer frees the data structures for nested virtualization if EFER.SVME is cleared.
Also check that EFER.SVME remains set after a nested vmexit; clearing it could happen if the bit is zero in the save area that is passed to KVM_SET_NESTED_STATE (the save area of the nested state corresponds to the nested hypervisor's state and is restored on the next nested vmexit).
Cc: stable@vger.kernel.org Fixes: 2fcf4876ada ("KVM: nSVM: implement on demand allocation of the nested state") Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/svm/nested.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-)
--- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -251,6 +251,13 @@ static bool nested_vmcb_check_save(struc struct kvm_vcpu *vcpu = &svm->vcpu; bool vmcb12_lma;
+ /* + * FIXME: these should be done after copying the fields, + * to avoid TOC/TOU races. For these save area checks + * the possible damage is limited since kvm_set_cr0 and + * kvm_set_cr4 handle failure; EFER_SVME is an exception + * so it is force-set later in nested_prepare_vmcb_save. + */ if ((vmcb12->save.efer & EFER_SVME) == 0) return false;
@@ -396,7 +403,14 @@ static void nested_prepare_vmcb_save(str svm->vmcb->save.gdtr = vmcb12->save.gdtr; svm->vmcb->save.idtr = vmcb12->save.idtr; kvm_set_rflags(&svm->vcpu, vmcb12->save.rflags | X86_EFLAGS_FIXED); - svm_set_efer(&svm->vcpu, vmcb12->save.efer); + + /* + * Force-set EFER_SVME even though it is checked earlier on the + * VMCB12, because the guest can flip the bit between the check + * and now. Clearing EFER_SVME would call svm_free_nested. + */ + svm_set_efer(&svm->vcpu, vmcb12->save.efer | EFER_SVME); + svm_set_cr0(&svm->vcpu, vmcb12->save.cr0); svm_set_cr4(&svm->vcpu, vmcb12->save.cr4); svm->vmcb->save.cr2 = svm->vcpu.arch.cr2 = vmcb12->save.cr2; @@ -1209,6 +1223,8 @@ static int svm_set_nested_state(struct k */ if (!(save->cr0 & X86_CR0_PG)) goto out_free; + if (!(save->efer & EFER_SVME)) + goto out_free;
/* * All checks done, we can enter guest mode. L1 control fields
From: Adrian Hunter adrian.hunter@intel.com
commit 9dfacc54a8661bc8be6e08cffee59596ec59f263 upstream.
pm_runtime_put_suppliers() must not decrement rpm_active unless the consumer is suspended. That is because, otherwise, it could suspend suppliers for an active consumer.
That can happen as follows:
static int driver_probe_device(struct device_driver *drv, struct device *dev) { int ret = 0;
if (!device_is_registered(dev)) return -ENODEV;
dev->can_match = true; pr_debug("bus: '%s': %s: matched device %s with driver %s\n", drv->bus->name, __func__, dev_name(dev), drv->name);
pm_runtime_get_suppliers(dev); if (dev->parent) pm_runtime_get_sync(dev->parent);
At this point, dev can runtime suspend so rpm_put_suppliers() can run, rpm_active becomes 1 (the lowest value).
pm_runtime_barrier(dev); if (initcall_debug) ret = really_probe_debug(dev, drv); else ret = really_probe(dev, drv);
Probe callback can have runtime resumed dev, and then runtime put so dev is awaiting autosuspend, but rpm_active is 2.
pm_request_idle(dev);
if (dev->parent) pm_runtime_put(dev->parent);
pm_runtime_put_suppliers(dev);
Now pm_runtime_put_suppliers() will put the supplier i.e. rpm_active 2 -> 1, but consumer can still be active.
return ret; }
Fix by checking the runtime status. For any status other than RPM_SUSPENDED, rpm_active can be considered to be "owned" by rpm_[get/put]_suppliers() and pm_runtime_put_suppliers() need do nothing.
Reported-by: Asutosh Das asutoshd@codeaurora.org Fixes: 4c06c4e6cf63 ("driver core: Fix possible supplier PM-usage counter imbalance") Signed-off-by: Adrian Hunter adrian.hunter@intel.com Cc: 5.1+ stable@vger.kernel.org # 5.1+ Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/base/power/runtime.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
--- a/drivers/base/power/runtime.c +++ b/drivers/base/power/runtime.c @@ -1704,6 +1704,8 @@ void pm_runtime_get_suppliers(struct dev void pm_runtime_put_suppliers(struct device *dev) { struct device_link *link; + unsigned long flags; + bool put; int idx;
idx = device_links_read_lock(); @@ -1712,7 +1714,11 @@ void pm_runtime_put_suppliers(struct dev device_links_read_lock_held()) if (link->supplier_preactivated) { link->supplier_preactivated = false; - if (refcount_dec_not_one(&link->rpm_active)) + spin_lock_irqsave(&dev->power.lock, flags); + put = pm_runtime_status_suspended(dev) && + refcount_dec_not_one(&link->rpm_active); + spin_unlock_irqrestore(&dev->power.lock, flags); + if (put) pm_runtime_put(link->supplier); }
From: Adrian Hunter adrian.hunter@intel.com
commit c0c33442f7203704aef345647e14c2fb86071001 upstream.
rpm_active indicates how many times the supplier usage_count has been incremented. Consequently it must be updated after pm_runtime_get_sync() of the supplier, not before.
Fixes: 4c06c4e6cf63 ("driver core: Fix possible supplier PM-usage counter imbalance") Signed-off-by: Adrian Hunter adrian.hunter@intel.com Cc: 5.1+ stable@vger.kernel.org # 5.1+ Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/base/power/runtime.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/base/power/runtime.c +++ b/drivers/base/power/runtime.c @@ -1690,8 +1690,8 @@ void pm_runtime_get_suppliers(struct dev device_links_read_lock_held()) if (link->flags & DL_FLAG_PM_RUNTIME) { link->supplier_preactivated = true; - refcount_inc(&link->rpm_active); pm_runtime_get_sync(link->supplier); + refcount_inc(&link->rpm_active); }
device_links_read_unlock(idx);
From: Steven Rostedt (VMware) rostedt@goodmis.org
commit 9deb193af69d3fd6dd8e47f292b67c805a787010 upstream.
Commit cbc3b92ce037 fixed an issue to modify the macros of the stack trace event so that user space could parse it properly. Originally the stack trace format to user space showed that the called stack was a dynamic array. But it is not actually a dynamic array, in the way that other dynamic event arrays worked, and this broke user space parsing for it. The update was to make the array look to have 8 entries in it. Helper functions were added to make it parse it correctly, as the stack was dynamic, but was determined by the size of the event stored.
Although this fixed user space on how it read the event, it changed the internal structure used for the stack trace event. It changed the array size from [0] to [8] (added 8 entries). This increased the size of the stack trace event by 8 words. The size reserved on the ring buffer was the size of the stack trace event plus the number of stack entries found in the stack trace. That commit caused the amount to be 8 more than what was needed because it did not expect the caller field to have any size. This produced 8 entries of garbage (and reading random data) from the stack trace event:
<idle>-0 [002] d... 1976396.837549: <stack trace> => trace_event_raw_event_sched_switch => __traceiter_sched_switch => __schedule => schedule_idle => do_idle => cpu_startup_entry => secondary_startup_64_no_verify => 0xc8c5e150ffff93de => 0xffff93de => 0 => 0 => 0xc8c5e17800000000 => 0x1f30affff93de => 0x00000004 => 0x200000000
Instead, subtract the size of the caller field from the size of the event to make sure that only the amount needed to store the stack trace is reserved.
Link: https://lore.kernel.org/lkml/your-ad-here.call-01617191565-ext-9692@work.hou...
Cc: stable@vger.kernel.org Fixes: cbc3b92ce037 ("tracing: Set kernel_stack's caller size properly") Reported-by: Vasily Gorbik gor@linux.ibm.com Tested-by: Vasily Gorbik gor@linux.ibm.com Acked-by: Vasily Gorbik gor@linux.ibm.com Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -2984,7 +2984,8 @@ static void __ftrace_trace_stack(struct
size = nr_entries * sizeof(unsigned long); event = __trace_buffer_lock_reserve(buffer, TRACE_STACK, - sizeof(*entry) + size, flags, pc); + (sizeof(*entry) - sizeof(entry->caller)) + size, + flags, pc); if (!event) goto out; entry = ring_buffer_event_data(event);
From: Heiko Carstens hca@linux.ibm.com
commit 72bbc226ed2ef0a46c165a482861fff00dd6d4e1 upstream.
When converting the vdso assembler code to C it was forgotten to actually copy the tod_steering_delta value to vdso_data page.
Which in turn means that tod clock steering will not work correctly.
Fix this by simply copying the value whenever it is updated.
Fixes: 4bff8cb54502 ("s390: convert to GENERIC_VDSO") Cc: stable@vger.kernel.org # 5.10 Signed-off-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/kernel/time.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/s390/kernel/time.c +++ b/arch/s390/kernel/time.c @@ -398,6 +398,7 @@ static void clock_sync_global(unsigned l tod_steering_delta); tod_steering_end = now + (abs(tod_steering_delta) << 15); vdso_data->arch_data.tod_steering_end = tod_steering_end; + vdso_data->arch_data.tod_steering_delta = tod_steering_delta;
/* Update LPAR offset. */ if (ptff_query(PTFF_QTO) && ptff(&qto, sizeof(qto), PTFF_QTO) == 0)
From: Heiko Carstens hca@linux.ibm.com
commit b24bacd67ffddd9192c4745500fd6f73dbfe565e upstream.
The s390 specific vdso function __arch_get_hw_counter() is supposed to consider tod clock steering.
If a tod clock steering event happens and the tod clock is set to a new value __arch_get_hw_counter() will not return the real tod clock value but slowly drift it from the old delta until the returned value finally matches the real tod clock value again.
Unfortunately the type of tod_steering_delta unsigned while it is supposed to be signed. It depends on if tod_steering_delta is negative or positive in which direction the vdso code drifts the clock value.
Worst case is now that instead of drifting the clock slowly it will jump into the opposite direction by a factor of two.
Fix this by simply making tod_steering_delta signed.
Fixes: 4bff8cb54502 ("s390: convert to GENERIC_VDSO") Cc: stable@vger.kernel.org # 5.10 Signed-off-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/include/asm/vdso/data.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/s390/include/asm/vdso/data.h +++ b/arch/s390/include/asm/vdso/data.h @@ -6,7 +6,7 @@ #include <vdso/datapage.h>
struct arch_vdso_data { - __u64 tod_steering_delta; + __s64 tod_steering_delta; __u64 tod_steering_end; };
From: Christian König christian.koenig@amd.com
commit 6c5403173a13a08ff61dbdafa4c0ed4a9dedbfe0 upstream.
We seem to have some more driver bugs than thought.
Signed-off-by: Christian König christian.koenig@amd.com Fixes: deb0814b43f3 ("drm/ttm: add ttm_bo_pin()/ttm_bo_unpin() v2") Acked-by: Matthew Auld matthew.auld@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20210312093810.2202-1-christia... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/drm/ttm/ttm_bo_api.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/include/drm/ttm/ttm_bo_api.h +++ b/include/drm/ttm/ttm_bo_api.h @@ -612,8 +612,10 @@ static inline void ttm_bo_pin(struct ttm static inline void ttm_bo_unpin(struct ttm_buffer_object *bo) { dma_resv_assert_held(bo->base.resv); - WARN_ON_ONCE(!bo->pin_count); - --bo->pin_count; + if (bo->pin_count) + --bo->pin_count; + else + WARN_ON_ONCE(true); }
int ttm_mem_evict_first(struct ttm_bo_device *bdev,
From: Ilya Lipnitskiy ilya.lipnitskiy@gmail.com
commit e720e7d0e983bf05de80b231bccc39f1487f0f16 upstream.
There are code paths that rely on zero_pfn to be fully initialized before core_initcall. For example, wq_sysfs_init() is a core_initcall function that eventually results in a call to kernel_execve, which causes a page fault with a subsequent mmput. If zero_pfn is not initialized by then it may not get cleaned up properly and result in an error:
BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:1
Here is an analysis of the race as seen on a MIPS device. On this particular MT7621 device (Ubiquiti ER-X), zero_pfn is PFN 0 until initialized, at which point it becomes PFN 5120:
1. wq_sysfs_init calls into kobject_uevent_env at core_initcall: kobject_uevent_env+0x7e4/0x7ec kset_register+0x68/0x88 bus_register+0xdc/0x34c subsys_virtual_register+0x34/0x78 wq_sysfs_init+0x1c/0x4c do_one_initcall+0x50/0x1a8 kernel_init_freeable+0x230/0x2c8 kernel_init+0x10/0x100 ret_from_kernel_thread+0x14/0x1c
2. kobject_uevent_env() calls call_usermodehelper_exec() which executes kernel_execve asynchronously.
3. Memory allocations in kernel_execve cause a page fault, bumping the MM reference counter: add_mm_counter_fast+0xb4/0xc0 handle_mm_fault+0x6e4/0xea0 __get_user_pages.part.78+0x190/0x37c __get_user_pages_remote+0x128/0x360 get_arg_page+0x34/0xa0 copy_string_kernel+0x194/0x2a4 kernel_execve+0x11c/0x298 call_usermodehelper_exec_async+0x114/0x194
4. In case zero_pfn has not been initialized yet, zap_pte_range does not decrement the MM_ANONPAGES RSS counter and the BUG message is triggered shortly afterwards when __mmdrop checks the ref counters: __mmdrop+0x98/0x1d0 free_bprm+0x44/0x118 kernel_execve+0x160/0x1d8 call_usermodehelper_exec_async+0x114/0x194 ret_from_kernel_thread+0x14/0x1c
To avoid races such as described above, initialize init_zero_pfn at early_initcall level. Depending on the architecture, ZERO_PAGE is either constant or gets initialized even earlier, at paging_init, so there is no issue with initializing zero_pfn earlier.
Link: https://lkml.kernel.org/r/CALCv0x2YqOXEAy2Q=hafjhHCtTHVodChv1qpM=niAXOpqEbt7... Signed-off-by: Ilya Lipnitskiy ilya.lipnitskiy@gmail.com Cc: Hugh Dickins hughd@google.com Cc: "Eric W. Biederman" ebiederm@xmission.com Cc: stable@vger.kernel.org Tested-by: 周琰杰 (Zhou Yanjie) zhouyanjie@wanyeetech.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/memory.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/memory.c +++ b/mm/memory.c @@ -154,7 +154,7 @@ static int __init init_zero_pfn(void) zero_pfn = page_to_pfn(ZERO_PAGE(0)); return 0; } -core_initcall(init_zero_pfn); +early_initcall(init_zero_pfn);
void mm_trace_rss_stat(struct mm_struct *mm, int member, long count) {
From: Qu Huang jinsdb@126.com
commit e92049ae4548ba09e53eaa9c8f6964b07ea274c9 upstream.
Amdgpu driver uses 4-byte data type as DQM fence memory, and transmits GPU address of fence memory to microcode through query status PM4 message. However, query status PM4 message definition and microcode processing are all processed according to 8 bytes. Fence memory only allocates 4 bytes of memory, but microcode does write 8 bytes of memory, so there is a memory corruption.
Changes since v1: * Change dqm->fence_addr as a u64 pointer to fix this issue, also fix up query_status and amdkfd_fence_wait_timeout function uses 64 bit fence value to make them consistent.
Signed-off-by: Qu Huang jinsdb@126.com Reviewed-by: Felix Kuehling Felix.Kuehling@amd.com Signed-off-by: Felix Kuehling Felix.Kuehling@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 6 +++--- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_vi.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 8 ++++---- 7 files changed, 12 insertions(+), 12 deletions(-)
--- a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c @@ -155,7 +155,7 @@ static int dbgdev_diq_submit_ib(struct k
/* Wait till CP writes sync code: */ status = amdkfd_fence_wait_timeout( - (unsigned int *) rm_state, + rm_state, QUEUESTATE__ACTIVE, 1500);
kfd_gtt_sa_free(dbgdev->dev, mem_obj); --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -1167,7 +1167,7 @@ static int start_cpsch(struct device_que if (retval) goto fail_allocate_vidmem;
- dqm->fence_addr = dqm->fence_mem->cpu_ptr; + dqm->fence_addr = (uint64_t *)dqm->fence_mem->cpu_ptr; dqm->fence_gpu_addr = dqm->fence_mem->gpu_addr;
init_interrupts(dqm); @@ -1340,8 +1340,8 @@ out: return retval; }
-int amdkfd_fence_wait_timeout(unsigned int *fence_addr, - unsigned int fence_value, +int amdkfd_fence_wait_timeout(uint64_t *fence_addr, + uint64_t fence_value, unsigned int timeout_ms) { unsigned long end_jiffies = msecs_to_jiffies(timeout_ms) + jiffies; --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h @@ -192,7 +192,7 @@ struct device_queue_manager { uint16_t vmid_pasid[VMID_NUM]; uint64_t pipelines_addr; uint64_t fence_gpu_addr; - unsigned int *fence_addr; + uint64_t *fence_addr; struct kfd_mem_obj *fence_mem; bool active_runlist; int sched_policy; --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c @@ -347,7 +347,7 @@ fail_create_runlist_ib: }
int pm_send_query_status(struct packet_manager *pm, uint64_t fence_address, - uint32_t fence_value) + uint64_t fence_value) { uint32_t *buffer, size; int retval = 0; --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c @@ -283,7 +283,7 @@ static int pm_unmap_queues_v9(struct pac }
static int pm_query_status_v9(struct packet_manager *pm, uint32_t *buffer, - uint64_t fence_address, uint32_t fence_value) + uint64_t fence_address, uint64_t fence_value) { struct pm4_mes_query_status *packet;
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_vi.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_vi.c @@ -263,7 +263,7 @@ static int pm_unmap_queues_vi(struct pac }
static int pm_query_status_vi(struct packet_manager *pm, uint32_t *buffer, - uint64_t fence_address, uint32_t fence_value) + uint64_t fence_address, uint64_t fence_value) { struct pm4_mes_query_status *packet;
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h @@ -1003,8 +1003,8 @@ int pqm_get_wave_state(struct process_qu u32 *ctl_stack_used_size, u32 *save_area_used_size);
-int amdkfd_fence_wait_timeout(unsigned int *fence_addr, - unsigned int fence_value, +int amdkfd_fence_wait_timeout(uint64_t *fence_addr, + uint64_t fence_value, unsigned int timeout_ms);
/* Packet Manager */ @@ -1040,7 +1040,7 @@ struct packet_manager_funcs { uint32_t filter_param, bool reset, unsigned int sdma_engine); int (*query_status)(struct packet_manager *pm, uint32_t *buffer, - uint64_t fence_address, uint32_t fence_value); + uint64_t fence_address, uint64_t fence_value); int (*release_mem)(uint64_t gpu_addr, uint32_t *buffer);
/* Packet sizes */ @@ -1062,7 +1062,7 @@ int pm_send_set_resources(struct packet_ struct scheduling_resources *res); int pm_send_runlist(struct packet_manager *pm, struct list_head *dqm_queues); int pm_send_query_status(struct packet_manager *pm, uint64_t fence_address, - uint32_t fence_value); + uint64_t fence_value);
int pm_send_unmap_queue(struct packet_manager *pm, enum kfd_queue_type type, enum kfd_unmap_queues_filter mode,
From: Evan Quan evan.quan@amd.com
commit acc7baafeb0b52a5b91be64c4776f827a163dda1 upstream.
Correct the check for vblank short.
Signed-off-by: Evan Quan evan.quan@amd.com Reviewed-by: Alex Deucher alexander.deucher@amd.com Tested-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c +++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c @@ -3330,7 +3330,8 @@ static int smu7_apply_state_adjust_rules
disable_mclk_switching_for_display = ((1 < hwmgr->display_config->num_display) && !hwmgr->display_config->multi_monitor_in_sync) || - smu7_vblank_too_short(hwmgr, hwmgr->display_config->min_vblank_time); + (hwmgr->display_config->num_display && + smu7_vblank_too_short(hwmgr, hwmgr->display_config->min_vblank_time));
disable_mclk_switching = disable_mclk_switching_for_frame_lock || disable_mclk_switching_for_display;
From: Alex Deucher alexander.deucher@amd.com
commit 6951c3e4a260f65a16433833d2511e8796dc8625 upstream.
Do the same thing we do for Renoir. We can check, but since the sbios has started DPM, it will always return true which causes the driver to skip some of the SMU init when it shouldn't.
Reviewed-by: Zhan Liu zhan.liu@amd.com Acked-by: Evan Quan evan.quan@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c @@ -388,10 +388,15 @@ static int vangogh_get_allowed_feature_m
static bool vangogh_is_dpm_running(struct smu_context *smu) { + struct amdgpu_device *adev = smu->adev; int ret = 0; uint32_t feature_mask[2]; uint64_t feature_enabled;
+ /* we need to re-init after suspend so return false */ + if (adev->in_suspend) + return false; + ret = smu_cmn_get_enabled_32_bits_mask(smu, feature_mask, 2);
if (ret)
From: Nirmoy Das nirmoy.das@amd.com
commit 5e61b84f9d3ddfba73091f9fbc940caae1c9eb22 upstream.
Offset calculation wasn't correct as start addresses are in pfn not in bytes.
CC: stable@vger.kernel.org Signed-off-by: Nirmoy Das nirmoy.das@amd.com Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -2407,7 +2407,7 @@ int amdgpu_vm_bo_clear_mappings(struct a after->start = eaddr + 1; after->last = tmp->last; after->offset = tmp->offset; - after->offset += after->start - tmp->start; + after->offset += (after->start - tmp->start) << PAGE_SHIFT; after->flags = tmp->flags; after->bo_va = tmp->bo_va; list_add(&after->list, &tmp->bo_va->invalids);
From: Huacai Chen chenhc@lemote.com
commit 566c6e25f957ebdb0b6e8073ee291049118f47fb upstream.
In Mesa, dev_info.gart_page_size is used for alignment and it was set to AMDGPU_GPU_PAGE_SIZE(4KB). However, the page table of AMDGPU driver requires an alignment on CPU pages. So, for non-4KB page system, gart_page_size should be max_t(u32, PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE).
Signed-off-by: Rui Wang wangr@lemote.com Signed-off-by: Huacai Chen chenhc@lemote.com Link: https://github.com/loongson-community/linux-stable/commit/caa9c0a1 [Xi: rebased for drm-next, use max_t for checkpatch, and reworded commit message.] Signed-off-by: Xi Ruoyao xry111@mengyan1223.wang BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549 Tested-by: Dan Horák dan@danny.cz Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -780,9 +780,9 @@ int amdgpu_info_ioctl(struct drm_device dev_info->high_va_offset = AMDGPU_GMC_HOLE_END; dev_info->high_va_max = AMDGPU_GMC_HOLE_END | vm_size; } - dev_info->virtual_address_alignment = max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE); + dev_info->virtual_address_alignment = max_t(u32, PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE); dev_info->pte_fragment_size = (1 << adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE; - dev_info->gart_page_size = AMDGPU_GPU_PAGE_SIZE; + dev_info->gart_page_size = max_t(u32, PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE); dev_info->cu_active_number = adev->gfx.cu_info.number; dev_info->cu_ao_mask = adev->gfx.cu_info.ao_cu_mask; dev_info->ce_ram_size = adev->gfx.ce_ram_size;
From: Xℹ Ruoyao xry111@mengyan1223.wang
commit e3512fb67093fabdf27af303066627b921ee9bd8 upstream.
The page table of AMDGPU requires an alignment to CPU page so we should check ioctl parameters for it. Return -EINVAL if some parameter is unaligned to CPU page, instead of corrupt the page table sliently.
Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Xi Ruoyao xry111@mengyan1223.wang Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -2195,8 +2195,8 @@ int amdgpu_vm_bo_map(struct amdgpu_devic uint64_t eaddr;
/* validate the parameters */ - if (saddr & AMDGPU_GPU_PAGE_MASK || offset & AMDGPU_GPU_PAGE_MASK || - size == 0 || size & AMDGPU_GPU_PAGE_MASK) + if (saddr & ~PAGE_MASK || offset & ~PAGE_MASK || + size == 0 || size & ~PAGE_MASK) return -EINVAL;
/* make sure object fit at this offset */ @@ -2261,8 +2261,8 @@ int amdgpu_vm_bo_replace_map(struct amdg int r;
/* validate the parameters */ - if (saddr & AMDGPU_GPU_PAGE_MASK || offset & AMDGPU_GPU_PAGE_MASK || - size == 0 || size & AMDGPU_GPU_PAGE_MASK) + if (saddr & ~PAGE_MASK || offset & ~PAGE_MASK || + size == 0 || size & ~PAGE_MASK) return -EINVAL;
/* make sure object fit at this offset */
From: Tetsuo Handa penguin-kernel@i-love.sakura.ne.jp
commit 5e46d1b78a03d52306f21f77a4e4a144b6d31486 upstream.
syzbot is reporting NULL pointer dereference at reiserfs_security_init() [1], for commit ab17c4f02156c4f7 ("reiserfs: fixup xattr_root caching") is assuming that REISERFS_SB(s)->xattr_root != NULL in reiserfs_xattr_jcreate_nblocks() despite that commit made REISERFS_SB(sb)->priv_root != NULL && REISERFS_SB(s)->xattr_root == NULL case possible.
I guess that commit 6cb4aff0a77cc0e6 ("reiserfs: fix oops while creating privroot with selinux enabled") wanted to check xattr_root != NULL before reiserfs_xattr_jcreate_nblocks(), for the changelog is talking about the xattr root.
The issue is that while creating the privroot during mount reiserfs_security_init calls reiserfs_xattr_jcreate_nblocks which dereferences the xattr root. The xattr root doesn't exist, so we get an oops.
Therefore, update reiserfs_xattrs_initialized() to check both the privroot and the xattr root.
Link: https://syzkaller.appspot.com/bug?id=8abaedbdeb32c861dc5340544284167dd0e46cd... # [1] Reported-and-tested-by: syzbot syzbot+690cb1e51970435f9775@syzkaller.appspotmail.com Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Fixes: 6cb4aff0a77c ("reiserfs: fix oops while creating privroot with selinux enabled") Acked-by: Jeff Mahoney jeffm@suse.com Acked-by: Jan Kara jack@suse.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/reiserfs/xattr.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/reiserfs/xattr.h +++ b/fs/reiserfs/xattr.h @@ -43,7 +43,7 @@ void reiserfs_security_free(struct reise
static inline int reiserfs_xattrs_initialized(struct super_block *sb) { - return REISERFS_SB(sb)->priv_root != NULL; + return REISERFS_SB(sb)->priv_root && REISERFS_SB(sb)->xattr_root; }
#define xattr_size(size) ((size) + sizeof(struct reiserfs_xattr_header))
From: Pan Bian bianpan2016@163.com
commit 69c3ed7282a143439bbc2d03dc00d49c68fcb629 upstream.
Put DRM device on initialization failure path rather than directly return error code.
Fixes: a67d5088ceb8 ("drm/imx: drop explicit drm_mode_config_cleanup") Signed-off-by: Pan Bian bianpan2016@163.com Signed-off-by: Philipp Zabel p.zabel@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/imx/imx-drm-core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/imx/imx-drm-core.c +++ b/drivers/gpu/drm/imx/imx-drm-core.c @@ -215,7 +215,7 @@ static int imx_drm_bind(struct device *d
ret = drmm_mode_config_init(drm); if (ret) - return ret; + goto err_kms;
ret = drm_vblank_init(drm, MAX_CRTC); if (ret)
From: Thierry Reding treding@nvidia.com
commit a31500fe7055451ed9043c8fff938dfa6f70ee37 upstream.
Coupling of display controllers used to rely on runtime PM to take the companion controller out of reset. Commit fd67e9c6ed5a ("drm/tegra: Do not implement runtime PM") accidentally broke this when runtime PM was removed.
Restore this functionality by reusing the hierarchical host1x client suspend/resume infrastructure that's similar to runtime PM and which perfectly fits this use-case.
Fixes: fd67e9c6ed5a ("drm/tegra: Do not implement runtime PM") Reported-by: Dmitry Osipenko digetx@gmail.com Reported-by: Paul Fertser fercerpav@gmail.com Tested-by: Dmitry Osipenko digetx@gmail.com Signed-off-by: Thierry Reding treding@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/tegra/dc.c | 20 ++++++++------------ 1 file changed, 8 insertions(+), 12 deletions(-)
--- a/drivers/gpu/drm/tegra/dc.c +++ b/drivers/gpu/drm/tegra/dc.c @@ -2501,22 +2501,18 @@ static int tegra_dc_couple(struct tegra_ * POWER_CONTROL registers during CRTC enabling. */ if (dc->soc->coupled_pm && dc->pipe == 1) { - u32 flags = DL_FLAG_PM_RUNTIME | DL_FLAG_AUTOREMOVE_CONSUMER; - struct device_link *link; - struct device *partner; + struct device *companion; + struct tegra_dc *parent;
- partner = driver_find_device(dc->dev->driver, NULL, NULL, - tegra_dc_match_by_pipe); - if (!partner) + companion = driver_find_device(dc->dev->driver, NULL, (const void *)0, + tegra_dc_match_by_pipe); + if (!companion) return -EPROBE_DEFER;
- link = device_link_add(dc->dev, partner, flags); - if (!link) { - dev_err(dc->dev, "failed to link controllers\n"); - return -EINVAL; - } + parent = dev_get_drvdata(companion); + dc->client.parent = &parent->client;
- dev_dbg(dc->dev, "coupled to %s\n", dev_name(partner)); + dev_dbg(dc->dev, "coupled to %s\n", dev_name(companion)); }
return 0;
From: Thierry Reding treding@nvidia.com
commit ac097aecfef0bb289ca53d2fe0b73fc7e1612a05 upstream.
The SOR resets are exclusively shared with the SOR power domain. This means that exclusive access can only be granted temporarily and in order for that to work, a rigorous sequence must be observed. To ensure that a single consumer gets exclusive access to a reset, each consumer must implement a rigorous protocol using the reset_control_acquire() and reset_control_release() functions.
However, these functions alone don't provide any guarantees at the system level. Drivers need to ensure that the only a single consumer has access to the reset at the same time. In order for the SOR to be able to exclusively access its reset, it must therefore ensure that the SOR power domain is not powered off by holding on to a runtime PM reference to that power domain across the reset assert/deassert operation.
This used to work fine by accident, but was revealed when recently more devices started to rely on the SOR power domain.
Fixes: 11c632e1cfd3 ("drm/tegra: sor: Implement acquire/release for reset") Reported-by: Jonathan Hunter jonathanh@nvidia.com Signed-off-by: Thierry Reding treding@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/tegra/sor.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/drivers/gpu/drm/tegra/sor.c +++ b/drivers/gpu/drm/tegra/sor.c @@ -3115,6 +3115,12 @@ static int tegra_sor_init(struct host1x_ * kernel is possible. */ if (sor->rst) { + err = pm_runtime_resume_and_get(sor->dev); + if (err < 0) { + dev_err(sor->dev, "failed to get runtime PM: %d\n", err); + return err; + } + err = reset_control_acquire(sor->rst); if (err < 0) { dev_err(sor->dev, "failed to acquire SOR reset: %d\n", @@ -3148,6 +3154,7 @@ static int tegra_sor_init(struct host1x_ }
reset_control_release(sor->rst); + pm_runtime_put(sor->dev); }
err = clk_prepare_enable(sor->clk_safe);
From: Jason Gunthorpe jgg@nvidia.com
commit e0146a108ce4d2c22b9510fd12268e3ee72a0161 upstream.
Compiling the nvlink stuff relies on the SPAPR_TCE_IOMMU otherwise there are compile errors:
drivers/vfio/pci/vfio_pci_nvlink2.c:101:10: error: implicit declaration of function 'mm_iommu_put' [-Werror,-Wimplicit-function-declaration] ret = mm_iommu_put(data->mm, data->mem);
As PPC only defines these functions when the config is set.
Previously this wasn't a problem by chance as SPAPR_TCE_IOMMU was the only IOMMU that could have satisfied IOMMU_API on POWERNV.
Fixes: 179209fa1270 ("vfio: IOMMU_API should be selected") Reported-by: kernel test robot lkp@intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Message-Id: 0-v1-83dba9768fc3+419-vfio_nvlink2_kconfig_jgg@nvidia.com Signed-off-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/vfio/pci/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/vfio/pci/Kconfig +++ b/drivers/vfio/pci/Kconfig @@ -42,7 +42,7 @@ config VFIO_PCI_IGD
config VFIO_PCI_NVLINK2 def_bool y - depends on VFIO_PCI && PPC_POWERNV + depends on VFIO_PCI && PPC_POWERNV && SPAPR_TCE_IOMMU help VFIO PCI support for P9 Witherspoon machine with NVIDIA V100 GPUs
From: Lars Povlsen lars.povlsen@microchip.com
commit 5d5f2919273d1089a00556cad68e7f462f3dd2eb upstream.
This patch fixes using a wrong register offset when configuring an IRQ trigger type.
Fixes: be2dc859abd4 ("pinctrl: pinctrl-microchip-sgpio: Add irq support (for sparx5)") Reported-by: Gustavo A. R. Silva gustavoars@kernel.org Signed-off-by: Lars Povlsen lars.povlsen@microchip.com Reviewed-by: Gustavo A. R. Silva gustavoars@kernel.org Link: https://lore.kernel.org/r/20210203123825.611576-1-lars.povlsen@microchip.com Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pinctrl/pinctrl-microchip-sgpio.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/pinctrl/pinctrl-microchip-sgpio.c +++ b/drivers/pinctrl/pinctrl-microchip-sgpio.c @@ -572,7 +572,7 @@ static void microchip_sgpio_irq_settype( /* Type value spread over 2 registers sets: low, high bit */ sgpio_clrsetbits(bank->priv, REG_INT_TRIGGER, addr.bit, BIT(addr.port), (!!(type & 0x1)) << addr.port); - sgpio_clrsetbits(bank->priv, REG_INT_TRIGGER + SGPIO_MAX_BITS, addr.bit, + sgpio_clrsetbits(bank->priv, REG_INT_TRIGGER, SGPIO_MAX_BITS + addr.bit, BIT(addr.port), (!!(type & 0x2)) << addr.port);
if (type == SGPIO_INT_TRG_LEVEL)
From: Wang Panzhenzhuan randy.wang@rock-chips.com
commit c971af25cda94afe71617790826a86253e88eab0 upstream.
The restore in resume should match to suspend which only set for RK3288 SoCs pinctrl.
Fixes: 8dca933127024 ("pinctrl: rockchip: save and restore gpio6_c6 pinmux in suspend/resume") Reviewed-by: Jianqun Xu jay.xu@rock-chips.com Reviewed-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Wang Panzhenzhuan randy.wang@rock-chips.com Signed-off-by: Jianqun Xu jay.xu@rock-chips.com Link: https://lore.kernel.org/r/20210223100725.269240-1-jay.xu@rock-chips.com Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pinctrl/pinctrl-rockchip.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)
--- a/drivers/pinctrl/pinctrl-rockchip.c +++ b/drivers/pinctrl/pinctrl-rockchip.c @@ -3727,12 +3727,15 @@ static int __maybe_unused rockchip_pinct static int __maybe_unused rockchip_pinctrl_resume(struct device *dev) { struct rockchip_pinctrl *info = dev_get_drvdata(dev); - int ret = regmap_write(info->regmap_base, RK3288_GRF_GPIO6C_IOMUX, - rk3288_grf_gpio6c_iomux | - GPIO6C6_SEL_WRITE_ENABLE); + int ret;
- if (ret) - return ret; + if (info->ctrl->type == RK3288) { + ret = regmap_write(info->regmap_base, RK3288_GRF_GPIO6C_IOMUX, + rk3288_grf_gpio6c_iomux | + GPIO6C6_SEL_WRITE_ENABLE); + if (ret) + return ret; + }
return pinctrl_force_default(info->pctl_dev); }
From: Rajendra Nayak rnayak@codeaurora.org
commit 07abd8db9358751107cc46d1cdbd44a92c76a934 upstream.
The offsets for SDC_QDSD_PINGROUP and UFS_RESET were off by 0x100000 due to an issue in the scripts generating the data.
Fixes: ecb454594c43: ("pinctrl: qcom: Add sc7280 pinctrl driver") Reported-by: Veerabhadrarao Badiganti vbadigan@codeaurora.org Signed-off-by: Rajendra Nayak rnayak@codeaurora.org Reviewed-by: Bjorn Andersson bjorn.andersson@linaro.org Link: https://lore.kernel.org/r/1614662511-26519-1-git-send-email-rnayak@codeauror... Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pinctrl/qcom/pinctrl-sc7280.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
--- a/drivers/pinctrl/qcom/pinctrl-sc7280.c +++ b/drivers/pinctrl/qcom/pinctrl-sc7280.c @@ -1439,14 +1439,14 @@ static const struct msm_pingroup sc7280_ [172] = PINGROUP(172, qdss, _, _, _, _, _, _, _, _), [173] = PINGROUP(173, qdss, _, _, _, _, _, _, _, _), [174] = PINGROUP(174, qdss, _, _, _, _, _, _, _, _), - [175] = UFS_RESET(ufs_reset, 0x1be000), - [176] = SDC_QDSD_PINGROUP(sdc1_rclk, 0x1b3000, 15, 0), - [177] = SDC_QDSD_PINGROUP(sdc1_clk, 0x1b3000, 13, 6), - [178] = SDC_QDSD_PINGROUP(sdc1_cmd, 0x1b3000, 11, 3), - [179] = SDC_QDSD_PINGROUP(sdc1_data, 0x1b3000, 9, 0), - [180] = SDC_QDSD_PINGROUP(sdc2_clk, 0x1b4000, 14, 6), - [181] = SDC_QDSD_PINGROUP(sdc2_cmd, 0x1b4000, 11, 3), - [182] = SDC_QDSD_PINGROUP(sdc2_data, 0x1b4000, 9, 0), + [175] = UFS_RESET(ufs_reset, 0xbe000), + [176] = SDC_QDSD_PINGROUP(sdc1_rclk, 0xb3000, 15, 0), + [177] = SDC_QDSD_PINGROUP(sdc1_clk, 0xb3000, 13, 6), + [178] = SDC_QDSD_PINGROUP(sdc1_cmd, 0xb3000, 11, 3), + [179] = SDC_QDSD_PINGROUP(sdc1_data, 0xb3000, 9, 0), + [180] = SDC_QDSD_PINGROUP(sdc2_clk, 0xb4000, 14, 6), + [181] = SDC_QDSD_PINGROUP(sdc2_cmd, 0xb4000, 11, 3), + [182] = SDC_QDSD_PINGROUP(sdc2_data, 0xb4000, 9, 0), };
static const struct msm_pinctrl_soc_data sc7280_pinctrl = {
From: Rajendra Nayak rnayak@codeaurora.org
commit d0f9f47c07fe52b34e2ff8590cf09e0a9d8d6f99 upstream.
Fix SDC1_RCLK configurations which are in a different register so fix the offset from 0xb3000 to 0xb3004.
Fixes: ecb454594c43: ("pinctrl: qcom: Add sc7280 pinctrl driver") Reported-by: Veerabhadrarao Badiganti vbadigan@codeaurora.org Signed-off-by: Rajendra Nayak rnayak@codeaurora.org Acked-by: Bjorn Andersson bjorn.andersson@linaro.org Link: https://lore.kernel.org/r/1614662511-26519-2-git-send-email-rnayak@codeauror... Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pinctrl/qcom/pinctrl-sc7280.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/pinctrl/qcom/pinctrl-sc7280.c +++ b/drivers/pinctrl/qcom/pinctrl-sc7280.c @@ -1440,7 +1440,7 @@ static const struct msm_pingroup sc7280_ [173] = PINGROUP(173, qdss, _, _, _, _, _, _, _, _), [174] = PINGROUP(174, qdss, _, _, _, _, _, _, _, _), [175] = UFS_RESET(ufs_reset, 0xbe000), - [176] = SDC_QDSD_PINGROUP(sdc1_rclk, 0xb3000, 15, 0), + [176] = SDC_QDSD_PINGROUP(sdc1_rclk, 0xb3004, 0, 6), [177] = SDC_QDSD_PINGROUP(sdc1_clk, 0xb3000, 13, 6), [178] = SDC_QDSD_PINGROUP(sdc1_cmd, 0xb3000, 11, 3), [179] = SDC_QDSD_PINGROUP(sdc1_data, 0xb3000, 9, 0),
From: Jonathan Marek jonathan@marek.ca
commit 2a9be38099e338f597c14d3cb851849b01db05f6 upstream.
If these fields are not set in dts, the driver will use these variables uninitialized to set the fields. Not only will it set garbage values for these fields, but it can overflow into other fields and break those.
In the current sm8250 dts, the dmic01 entries do not have a pullup setting, and might not work without this change.
Reported-by: kernel test robot lkp@intel.com Reported-by: Dan Carpenter dan.carpenter@oracle.com Fixes: 6e261d1090d6 ("pinctrl: qcom: Add sm8250 lpass lpi pinctrl driver") Signed-off-by: Jonathan Marek jonathan@marek.ca Reviewed-by: Bjorn Andersson bjorn.andersson@linaro.org Reviewed-by: Srinivas Kandagatla srinivas.kandagatla@linaro.org Link: https://lore.kernel.org/r/20210304194816.3843-1-jonathan@marek.ca Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pinctrl/qcom/pinctrl-lpass-lpi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/pinctrl/qcom/pinctrl-lpass-lpi.c +++ b/drivers/pinctrl/qcom/pinctrl-lpass-lpi.c @@ -392,7 +392,7 @@ static int lpi_config_set(struct pinctrl unsigned long *configs, unsigned int nconfs) { struct lpi_pinctrl *pctrl = dev_get_drvdata(pctldev->dev); - unsigned int param, arg, pullup, strength; + unsigned int param, arg, pullup = LPI_GPIO_BIAS_DISABLE, strength = 2; bool value, output_enabled = false; const struct lpi_pingroup *g; unsigned long sval;
From: Arnd Bergmann arnd@arndb.de
commit 58b5ada8c465b5f1300bc021ebd3d3b8149124b4 upstream.
clang is clearly correct to point out a typo in a silly array of strings:
drivers/pinctrl/qcom/pinctrl-sdx55.c:426:61: error: suspicious concatenation of string literals in an array initialization; did you mean to separate the elements with a comma? [-Werror,-Wstring-concatenation] "gpio14", "gpio15", "gpio16", "gpio17", "gpio18", "gpio19" "gpio20", "gpio21", "gpio22", ^ Add the missing comma that must have accidentally been removed.
Fixes: ac43c44a7a37 ("pinctrl: qcom: Add SDX55 pincontrol driver") Signed-off-by: Arnd Bergmann arnd@arndb.de Reviewed-by: Bjorn Andersson bjorn.andersson@linaro.org Reviewed-by: Nathan Chancellor nathan@kernel.org Link: https://lore.kernel.org/r/20210323131728.2702789-1-arnd@kernel.org Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pinctrl/qcom/pinctrl-sdx55.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/pinctrl/qcom/pinctrl-sdx55.c +++ b/drivers/pinctrl/qcom/pinctrl-sdx55.c @@ -423,7 +423,7 @@ static const char * const gpio_groups[]
static const char * const qdss_stm_groups[] = { "gpio0", "gpio1", "gpio2", "gpio3", "gpio4", "gpio5", "gpio6", "gpio7", "gpio12", "gpio13", - "gpio14", "gpio15", "gpio16", "gpio17", "gpio18", "gpio19" "gpio20", "gpio21", "gpio22", + "gpio14", "gpio15", "gpio16", "gpio17", "gpio18", "gpio19", "gpio20", "gpio21", "gpio22", "gpio23", "gpio44", "gpio45", "gpio52", "gpio53", "gpio56", "gpio57", "gpio61", "gpio62", "gpio63", "gpio64", "gpio65", "gpio66", };
From: Ben Gardon bgardon@google.com
[ Upstream commit e28a436ca4f65384cceaf3f4da0e00aa74244e6a ]
Currently the TDP MMU yield / cond_resched functions either return nothing or return true if the TLBs were not flushed. These are confusing semantics, especially when making control flow decisions in calling functions.
To clean things up, change both functions to have the same return value semantics as cond_resched: true if the thread yielded, false if it did not. If the function yielded in the _flush_ version, then the TLBs will have been flushed.
Reviewed-by: Peter Feiner pfeiner@google.com Acked-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Ben Gardon bgardon@google.com Message-Id: 20210202185734.1680553-2-bgardon@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/mmu/tdp_mmu.c | 39 ++++++++++++++++++++++++++++---------- 1 file changed, 29 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 17976998bffb..abdd89771b9b 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -413,8 +413,15 @@ static inline void tdp_mmu_set_spte_no_dirty_log(struct kvm *kvm, _mmu->shadow_root_level, _start, _end)
/* - * Flush the TLB if the process should drop kvm->mmu_lock. - * Return whether the caller still needs to flush the tlb. + * Flush the TLB and yield if the MMU lock is contended or this thread needs to + * return control to the scheduler. + * + * If this function yields, it will also reset the tdp_iter's walk over the + * paging structure and the calling function should allow the iterator to + * continue its traversal from the paging structure root. + * + * Return true if this function yielded, the TLBs were flushed, and the + * iterator's traversal was reset. Return false if a yield was not needed. */ static bool tdp_mmu_iter_flush_cond_resched(struct kvm *kvm, struct tdp_iter *iter) { @@ -422,18 +429,32 @@ static bool tdp_mmu_iter_flush_cond_resched(struct kvm *kvm, struct tdp_iter *it kvm_flush_remote_tlbs(kvm); cond_resched_lock(&kvm->mmu_lock); tdp_iter_refresh_walk(iter); - return false; - } else { return true; } + + return false; }
-static void tdp_mmu_iter_cond_resched(struct kvm *kvm, struct tdp_iter *iter) +/* + * Yield if the MMU lock is contended or this thread needs to return control + * to the scheduler. + * + * If this function yields, it will also reset the tdp_iter's walk over the + * paging structure and the calling function should allow the iterator to + * continue its traversal from the paging structure root. + * + * Return true if this function yielded and the iterator's traversal was reset. + * Return false if a yield was not needed. + */ +static bool tdp_mmu_iter_cond_resched(struct kvm *kvm, struct tdp_iter *iter) { if (need_resched() || spin_needbreak(&kvm->mmu_lock)) { cond_resched_lock(&kvm->mmu_lock); tdp_iter_refresh_walk(iter); + return true; } + + return false; }
/* @@ -469,10 +490,8 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
tdp_mmu_set_spte(kvm, &iter, 0);
- if (can_yield) - flush_needed = tdp_mmu_iter_flush_cond_resched(kvm, &iter); - else - flush_needed = true; + flush_needed = !can_yield || + !tdp_mmu_iter_flush_cond_resched(kvm, &iter); } return flush_needed; } @@ -1073,7 +1092,7 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
tdp_mmu_set_spte(kvm, &iter, 0);
- spte_set = tdp_mmu_iter_flush_cond_resched(kvm, &iter); + spte_set = !tdp_mmu_iter_flush_cond_resched(kvm, &iter); }
if (spte_set)
From: Ben Gardon bgardon@google.com
[ Upstream commit e139a34ef9d5627a41e1c02210229082140d1f92 ]
The flushing and non-flushing variants of tdp_mmu_iter_cond_resched have almost identical implementations. Merge the two functions and add a flush parameter.
Signed-off-by: Ben Gardon bgardon@google.com Message-Id: 20210202185734.1680553-12-bgardon@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/mmu/tdp_mmu.c | 42 ++++++++++++-------------------------- 1 file changed, 13 insertions(+), 29 deletions(-)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index abdd89771b9b..0dd27767c770 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -412,33 +412,13 @@ static inline void tdp_mmu_set_spte_no_dirty_log(struct kvm *kvm, for_each_tdp_pte(_iter, __va(_mmu->root_hpa), \ _mmu->shadow_root_level, _start, _end)
-/* - * Flush the TLB and yield if the MMU lock is contended or this thread needs to - * return control to the scheduler. - * - * If this function yields, it will also reset the tdp_iter's walk over the - * paging structure and the calling function should allow the iterator to - * continue its traversal from the paging structure root. - * - * Return true if this function yielded, the TLBs were flushed, and the - * iterator's traversal was reset. Return false if a yield was not needed. - */ -static bool tdp_mmu_iter_flush_cond_resched(struct kvm *kvm, struct tdp_iter *iter) -{ - if (need_resched() || spin_needbreak(&kvm->mmu_lock)) { - kvm_flush_remote_tlbs(kvm); - cond_resched_lock(&kvm->mmu_lock); - tdp_iter_refresh_walk(iter); - return true; - } - - return false; -} - /* * Yield if the MMU lock is contended or this thread needs to return control * to the scheduler. * + * If this function should yield and flush is set, it will perform a remote + * TLB flush before yielding. + * * If this function yields, it will also reset the tdp_iter's walk over the * paging structure and the calling function should allow the iterator to * continue its traversal from the paging structure root. @@ -446,9 +426,13 @@ static bool tdp_mmu_iter_flush_cond_resched(struct kvm *kvm, struct tdp_iter *it * Return true if this function yielded and the iterator's traversal was reset. * Return false if a yield was not needed. */ -static bool tdp_mmu_iter_cond_resched(struct kvm *kvm, struct tdp_iter *iter) +static inline bool tdp_mmu_iter_cond_resched(struct kvm *kvm, + struct tdp_iter *iter, bool flush) { if (need_resched() || spin_needbreak(&kvm->mmu_lock)) { + if (flush) + kvm_flush_remote_tlbs(kvm); + cond_resched_lock(&kvm->mmu_lock); tdp_iter_refresh_walk(iter); return true; @@ -491,7 +475,7 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root, tdp_mmu_set_spte(kvm, &iter, 0);
flush_needed = !can_yield || - !tdp_mmu_iter_flush_cond_resched(kvm, &iter); + !tdp_mmu_iter_cond_resched(kvm, &iter, true); } return flush_needed; } @@ -864,7 +848,7 @@ static bool wrprot_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root, tdp_mmu_set_spte_no_dirty_log(kvm, &iter, new_spte); spte_set = true;
- tdp_mmu_iter_cond_resched(kvm, &iter); + tdp_mmu_iter_cond_resched(kvm, &iter, false); } return spte_set; } @@ -923,7 +907,7 @@ static bool clear_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root, tdp_mmu_set_spte_no_dirty_log(kvm, &iter, new_spte); spte_set = true;
- tdp_mmu_iter_cond_resched(kvm, &iter); + tdp_mmu_iter_cond_resched(kvm, &iter, false); } return spte_set; } @@ -1039,7 +1023,7 @@ static bool set_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root, tdp_mmu_set_spte(kvm, &iter, new_spte); spte_set = true;
- tdp_mmu_iter_cond_resched(kvm, &iter); + tdp_mmu_iter_cond_resched(kvm, &iter, false); }
return spte_set; @@ -1092,7 +1076,7 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
tdp_mmu_set_spte(kvm, &iter, 0);
- spte_set = !tdp_mmu_iter_flush_cond_resched(kvm, &iter); + spte_set = !tdp_mmu_iter_cond_resched(kvm, &iter, true); }
if (spte_set)
From: Ben Gardon bgardon@google.com
[ Upstream commit 74953d3530280dc53256054e1906f58d07bfba44 ]
The goal_gfn field in tdp_iter can be misleading as it implies that it is the iterator's final goal. It is really a target for the lowest gfn mapped by the leaf level SPTE the iterator will traverse towards. Change the field's name to be more precise.
Signed-off-by: Ben Gardon bgardon@google.com Message-Id: 20210202185734.1680553-13-bgardon@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/mmu/tdp_iter.c | 20 ++++++++++---------- arch/x86/kvm/mmu/tdp_iter.h | 4 ++-- 2 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kvm/mmu/tdp_iter.c b/arch/x86/kvm/mmu/tdp_iter.c index 87b7e16911db..9917c55b7d24 100644 --- a/arch/x86/kvm/mmu/tdp_iter.c +++ b/arch/x86/kvm/mmu/tdp_iter.c @@ -22,21 +22,21 @@ static gfn_t round_gfn_for_level(gfn_t gfn, int level)
/* * Sets a TDP iterator to walk a pre-order traversal of the paging structure - * rooted at root_pt, starting with the walk to translate goal_gfn. + * rooted at root_pt, starting with the walk to translate next_last_level_gfn. */ void tdp_iter_start(struct tdp_iter *iter, u64 *root_pt, int root_level, - int min_level, gfn_t goal_gfn) + int min_level, gfn_t next_last_level_gfn) { WARN_ON(root_level < 1); WARN_ON(root_level > PT64_ROOT_MAX_LEVEL);
- iter->goal_gfn = goal_gfn; + iter->next_last_level_gfn = next_last_level_gfn; iter->root_level = root_level; iter->min_level = min_level; iter->level = root_level; iter->pt_path[iter->level - 1] = root_pt;
- iter->gfn = round_gfn_for_level(iter->goal_gfn, iter->level); + iter->gfn = round_gfn_for_level(iter->next_last_level_gfn, iter->level); tdp_iter_refresh_sptep(iter);
iter->valid = true; @@ -82,7 +82,7 @@ static bool try_step_down(struct tdp_iter *iter)
iter->level--; iter->pt_path[iter->level - 1] = child_pt; - iter->gfn = round_gfn_for_level(iter->goal_gfn, iter->level); + iter->gfn = round_gfn_for_level(iter->next_last_level_gfn, iter->level); tdp_iter_refresh_sptep(iter);
return true; @@ -106,7 +106,7 @@ static bool try_step_side(struct tdp_iter *iter) return false;
iter->gfn += KVM_PAGES_PER_HPAGE(iter->level); - iter->goal_gfn = iter->gfn; + iter->next_last_level_gfn = iter->gfn; iter->sptep++; iter->old_spte = READ_ONCE(*iter->sptep);
@@ -166,13 +166,13 @@ void tdp_iter_next(struct tdp_iter *iter) */ void tdp_iter_refresh_walk(struct tdp_iter *iter) { - gfn_t goal_gfn = iter->goal_gfn; + gfn_t next_last_level_gfn = iter->next_last_level_gfn;
- if (iter->gfn > goal_gfn) - goal_gfn = iter->gfn; + if (iter->gfn > next_last_level_gfn) + next_last_level_gfn = iter->gfn;
tdp_iter_start(iter, iter->pt_path[iter->root_level - 1], - iter->root_level, iter->min_level, goal_gfn); + iter->root_level, iter->min_level, next_last_level_gfn); }
u64 *tdp_iter_root_pt(struct tdp_iter *iter) diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h index 47170d0dc98e..b2dd269c631f 100644 --- a/arch/x86/kvm/mmu/tdp_iter.h +++ b/arch/x86/kvm/mmu/tdp_iter.h @@ -15,7 +15,7 @@ struct tdp_iter { * The iterator will traverse the paging structure towards the mapping * for this GFN. */ - gfn_t goal_gfn; + gfn_t next_last_level_gfn; /* Pointers to the page tables traversed to reach the current SPTE */ u64 *pt_path[PT64_ROOT_MAX_LEVEL]; /* A pointer to the current SPTE */ @@ -52,7 +52,7 @@ struct tdp_iter { u64 *spte_to_child_pt(u64 pte, int level);
void tdp_iter_start(struct tdp_iter *iter, u64 *root_pt, int root_level, - int min_level, gfn_t goal_gfn); + int min_level, gfn_t next_last_level_gfn); void tdp_iter_next(struct tdp_iter *iter); void tdp_iter_refresh_walk(struct tdp_iter *iter); u64 *tdp_iter_root_pt(struct tdp_iter *iter);
From: Ben Gardon bgardon@google.com
[ Upstream commit ed5e484b79e8a9b8be714bd85b6fc70bd6dc99a7 ]
In some functions the TDP iter risks not making forward progress if two threads livelock yielding to one another. This is possible if two threads are trying to execute wrprot_gfn_range. Each could write protect an entry and then yield. This would reset the tdp_iter's walk over the paging structure and the loop would end up repeating the same entry over and over, preventing either thread from making forward progress.
Fix this issue by only yielding if the loop has made forward progress since the last yield.
Fixes: a6a0b05da9f3 ("kvm: x86/mmu: Support dirty logging for the TDP MMU") Reviewed-by: Peter Feiner pfeiner@google.com Signed-off-by: Ben Gardon bgardon@google.com
Message-Id: 20210202185734.1680553-14-bgardon@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/mmu/tdp_iter.c | 18 +----------------- arch/x86/kvm/mmu/tdp_iter.h | 7 ++++++- arch/x86/kvm/mmu/tdp_mmu.c | 21 ++++++++++++++++----- 3 files changed, 23 insertions(+), 23 deletions(-)
diff --git a/arch/x86/kvm/mmu/tdp_iter.c b/arch/x86/kvm/mmu/tdp_iter.c index 9917c55b7d24..1a09d212186b 100644 --- a/arch/x86/kvm/mmu/tdp_iter.c +++ b/arch/x86/kvm/mmu/tdp_iter.c @@ -31,6 +31,7 @@ void tdp_iter_start(struct tdp_iter *iter, u64 *root_pt, int root_level, WARN_ON(root_level > PT64_ROOT_MAX_LEVEL);
iter->next_last_level_gfn = next_last_level_gfn; + iter->yielded_gfn = iter->next_last_level_gfn; iter->root_level = root_level; iter->min_level = min_level; iter->level = root_level; @@ -158,23 +159,6 @@ void tdp_iter_next(struct tdp_iter *iter) iter->valid = false; }
-/* - * Restart the walk over the paging structure from the root, starting from the - * highest gfn the iterator had previously reached. Assumes that the entire - * paging structure, except the root page, may have been completely torn down - * and rebuilt. - */ -void tdp_iter_refresh_walk(struct tdp_iter *iter) -{ - gfn_t next_last_level_gfn = iter->next_last_level_gfn; - - if (iter->gfn > next_last_level_gfn) - next_last_level_gfn = iter->gfn; - - tdp_iter_start(iter, iter->pt_path[iter->root_level - 1], - iter->root_level, iter->min_level, next_last_level_gfn); -} - u64 *tdp_iter_root_pt(struct tdp_iter *iter) { return iter->pt_path[iter->root_level - 1]; diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h index b2dd269c631f..d480c540ee27 100644 --- a/arch/x86/kvm/mmu/tdp_iter.h +++ b/arch/x86/kvm/mmu/tdp_iter.h @@ -16,6 +16,12 @@ struct tdp_iter { * for this GFN. */ gfn_t next_last_level_gfn; + /* + * The next_last_level_gfn at the time when the thread last + * yielded. Only yielding when the next_last_level_gfn != + * yielded_gfn helps ensure forward progress. + */ + gfn_t yielded_gfn; /* Pointers to the page tables traversed to reach the current SPTE */ u64 *pt_path[PT64_ROOT_MAX_LEVEL]; /* A pointer to the current SPTE */ @@ -54,7 +60,6 @@ u64 *spte_to_child_pt(u64 pte, int level); void tdp_iter_start(struct tdp_iter *iter, u64 *root_pt, int root_level, int min_level, gfn_t next_last_level_gfn); void tdp_iter_next(struct tdp_iter *iter); -void tdp_iter_refresh_walk(struct tdp_iter *iter); u64 *tdp_iter_root_pt(struct tdp_iter *iter);
#endif /* __KVM_X86_MMU_TDP_ITER_H */ diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 0dd27767c770..a07d37abb63f 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -420,8 +420,9 @@ static inline void tdp_mmu_set_spte_no_dirty_log(struct kvm *kvm, * TLB flush before yielding. * * If this function yields, it will also reset the tdp_iter's walk over the - * paging structure and the calling function should allow the iterator to - * continue its traversal from the paging structure root. + * paging structure and the calling function should skip to the next + * iteration to allow the iterator to continue its traversal from the + * paging structure root. * * Return true if this function yielded and the iterator's traversal was reset. * Return false if a yield was not needed. @@ -429,12 +430,22 @@ static inline void tdp_mmu_set_spte_no_dirty_log(struct kvm *kvm, static inline bool tdp_mmu_iter_cond_resched(struct kvm *kvm, struct tdp_iter *iter, bool flush) { + /* Ensure forward progress has been made before yielding. */ + if (iter->next_last_level_gfn == iter->yielded_gfn) + return false; + if (need_resched() || spin_needbreak(&kvm->mmu_lock)) { if (flush) kvm_flush_remote_tlbs(kvm);
cond_resched_lock(&kvm->mmu_lock); - tdp_iter_refresh_walk(iter); + + WARN_ON(iter->gfn > iter->next_last_level_gfn); + + tdp_iter_start(iter, iter->pt_path[iter->root_level - 1], + iter->root_level, iter->min_level, + iter->next_last_level_gfn); + return true; }
@@ -474,8 +485,8 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
tdp_mmu_set_spte(kvm, &iter, 0);
- flush_needed = !can_yield || - !tdp_mmu_iter_cond_resched(kvm, &iter, true); + flush_needed = !(can_yield && + tdp_mmu_iter_cond_resched(kvm, &iter, true)); } return flush_needed; }
From: Ben Gardon bgardon@google.com
[ Upstream commit 1af4a96025b33587ca953c7ef12a1b20c6e70412 ]
Given certain conditions, some TDP MMU functions may not yield reliably / frequently enough. For example, if a paging structure was very large but had few, if any writable entries, wrprot_gfn_range could traverse many entries before finding a writable entry and yielding because the check for yielding only happens after an SPTE is modified.
Fix this issue by moving the yield to the beginning of the loop.
Fixes: a6a0b05da9f3 ("kvm: x86/mmu: Support dirty logging for the TDP MMU") Reviewed-by: Peter Feiner pfeiner@google.com Signed-off-by: Ben Gardon bgardon@google.com
Message-Id: 20210202185734.1680553-15-bgardon@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/mmu/tdp_mmu.c | 32 ++++++++++++++++++++++---------- 1 file changed, 22 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index a07d37abb63f..0567286fba39 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -470,6 +470,12 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root, bool flush_needed = false;
tdp_root_for_each_pte(iter, root, start, end) { + if (can_yield && + tdp_mmu_iter_cond_resched(kvm, &iter, flush_needed)) { + flush_needed = false; + continue; + } + if (!is_shadow_present_pte(iter.old_spte)) continue;
@@ -484,9 +490,7 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root, continue;
tdp_mmu_set_spte(kvm, &iter, 0); - - flush_needed = !(can_yield && - tdp_mmu_iter_cond_resched(kvm, &iter, true)); + flush_needed = true; } return flush_needed; } @@ -850,6 +854,9 @@ static bool wrprot_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
for_each_tdp_pte_min_level(iter, root->spt, root->role.level, min_level, start, end) { + if (tdp_mmu_iter_cond_resched(kvm, &iter, false)) + continue; + if (!is_shadow_present_pte(iter.old_spte) || !is_last_spte(iter.old_spte, iter.level)) continue; @@ -858,8 +865,6 @@ static bool wrprot_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
tdp_mmu_set_spte_no_dirty_log(kvm, &iter, new_spte); spte_set = true; - - tdp_mmu_iter_cond_resched(kvm, &iter, false); } return spte_set; } @@ -903,6 +908,9 @@ static bool clear_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root, bool spte_set = false;
tdp_root_for_each_leaf_pte(iter, root, start, end) { + if (tdp_mmu_iter_cond_resched(kvm, &iter, false)) + continue; + if (spte_ad_need_write_protect(iter.old_spte)) { if (is_writable_pte(iter.old_spte)) new_spte = iter.old_spte & ~PT_WRITABLE_MASK; @@ -917,8 +925,6 @@ static bool clear_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
tdp_mmu_set_spte_no_dirty_log(kvm, &iter, new_spte); spte_set = true; - - tdp_mmu_iter_cond_resched(kvm, &iter, false); } return spte_set; } @@ -1026,6 +1032,9 @@ static bool set_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root, bool spte_set = false;
tdp_root_for_each_pte(iter, root, start, end) { + if (tdp_mmu_iter_cond_resched(kvm, &iter, false)) + continue; + if (!is_shadow_present_pte(iter.old_spte)) continue;
@@ -1033,8 +1042,6 @@ static bool set_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
tdp_mmu_set_spte(kvm, &iter, new_spte); spte_set = true; - - tdp_mmu_iter_cond_resched(kvm, &iter, false); }
return spte_set; @@ -1075,6 +1082,11 @@ static void zap_collapsible_spte_range(struct kvm *kvm, bool spte_set = false;
tdp_root_for_each_pte(iter, root, start, end) { + if (tdp_mmu_iter_cond_resched(kvm, &iter, spte_set)) { + spte_set = false; + continue; + } + if (!is_shadow_present_pte(iter.old_spte) || !is_last_spte(iter.old_spte, iter.level)) continue; @@ -1087,7 +1099,7 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
tdp_mmu_set_spte(kvm, &iter, 0);
- spte_set = !tdp_mmu_iter_cond_resched(kvm, &iter, true); + spte_set = true; }
if (spte_set)
From: Ben Gardon bgardon@google.com
[ Upstream commit 3a9a4aa5657471a02ffb7f9b7f3b7a468b3f257b ]
Add lockdep to __tdp_mmu_set_spte to ensure that SPTEs are only modified under the MMU lock.
No functional change intended.
Reviewed-by: Peter Feiner pfeiner@google.com Reviewed-by: Sean Christopherson seanjc@google.com Acked-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Ben Gardon bgardon@google.com Message-Id: 20210202185734.1680553-4-bgardon@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/mmu/tdp_mmu.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 0567286fba39..3a8bbc812a28 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -365,6 +365,8 @@ static inline void __tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter, struct kvm_mmu_page *root = sptep_to_sp(root_pt); int as_id = kvm_mmu_page_as_id(root);
+ lockdep_assert_held(&kvm->mmu_lock); + WRITE_ONCE(*iter->sptep, new_spte);
__handle_changed_spte(kvm, as_id, iter->gfn, iter->old_spte, new_spte,
From: Ben Gardon bgardon@google.com
[ Upstream commit a066e61f13cf4b17d043ad8bea0cdde2b1e5ee49 ]
Factor out the code to handle a disconnected subtree of the TDP paging structure from the code to handle the change to an individual SPTE. Future commits will build on this to allow asynchronous page freeing.
No functional change intended.
Reviewed-by: Peter Feiner pfeiner@google.com Acked-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Ben Gardon bgardon@google.com
Message-Id: 20210202185734.1680553-6-bgardon@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/mmu/tdp_mmu.c | 71 ++++++++++++++++++++++---------------- 1 file changed, 42 insertions(+), 29 deletions(-)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 3a8bbc812a28..3efaa8b44e45 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -234,6 +234,45 @@ static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn, } }
+/** + * handle_removed_tdp_mmu_page - handle a pt removed from the TDP structure + * + * @kvm: kvm instance + * @pt: the page removed from the paging structure + * + * Given a page table that has been removed from the TDP paging structure, + * iterates through the page table to clear SPTEs and free child page tables. + */ +static void handle_removed_tdp_mmu_page(struct kvm *kvm, u64 *pt) +{ + struct kvm_mmu_page *sp = sptep_to_sp(pt); + int level = sp->role.level; + gfn_t gfn = sp->gfn; + u64 old_child_spte; + int i; + + trace_kvm_mmu_prepare_zap_page(sp); + + list_del(&sp->link); + + if (sp->lpage_disallowed) + unaccount_huge_nx_page(kvm, sp); + + for (i = 0; i < PT64_ENT_PER_PAGE; i++) { + old_child_spte = READ_ONCE(*(pt + i)); + WRITE_ONCE(*(pt + i), 0); + handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), + gfn + (i * KVM_PAGES_PER_HPAGE(level - 1)), + old_child_spte, 0, level - 1); + } + + kvm_flush_remote_tlbs_with_address(kvm, gfn, + KVM_PAGES_PER_HPAGE(level)); + + free_page((unsigned long)pt); + kmem_cache_free(mmu_page_header_cache, sp); +} + /** * handle_changed_spte - handle bookkeeping associated with an SPTE change * @kvm: kvm instance @@ -254,10 +293,6 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, bool was_leaf = was_present && is_last_spte(old_spte, level); bool is_leaf = is_present && is_last_spte(new_spte, level); bool pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte); - u64 *pt; - struct kvm_mmu_page *sp; - u64 old_child_spte; - int i;
WARN_ON(level > PT64_ROOT_MAX_LEVEL); WARN_ON(level < PG_LEVEL_4K); @@ -321,31 +356,9 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, * Recursively handle child PTs if the change removed a subtree from * the paging structure. */ - if (was_present && !was_leaf && (pfn_changed || !is_present)) { - pt = spte_to_child_pt(old_spte, level); - sp = sptep_to_sp(pt); - - trace_kvm_mmu_prepare_zap_page(sp); - - list_del(&sp->link); - - if (sp->lpage_disallowed) - unaccount_huge_nx_page(kvm, sp); - - for (i = 0; i < PT64_ENT_PER_PAGE; i++) { - old_child_spte = READ_ONCE(*(pt + i)); - WRITE_ONCE(*(pt + i), 0); - handle_changed_spte(kvm, as_id, - gfn + (i * KVM_PAGES_PER_HPAGE(level - 1)), - old_child_spte, 0, level - 1); - } - - kvm_flush_remote_tlbs_with_address(kvm, gfn, - KVM_PAGES_PER_HPAGE(level)); - - free_page((unsigned long)pt); - kmem_cache_free(mmu_page_header_cache, sp); - } + if (was_present && !was_leaf && (pfn_changed || !is_present)) + handle_removed_tdp_mmu_page(kvm, + spte_to_child_pt(old_spte, level)); }
static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
From: Ben Gardon bgardon@google.com
[ Upstream commit 7cca2d0b7e7d9f3cd740d41afdc00051c9b508a0 ]
In order to enable concurrent modifications to the paging structures in the TDP MMU, threads must be able to safely remove pages of page table memory while other threads are traversing the same memory. To ensure threads do not access PT memory after it is freed, protect PT memory with RCU.
Protecting concurrent accesses to page table memory from use-after-free bugs could also have been acomplished using walk_shadow_page_lockless_begin/end() and READING_SHADOW_PAGE_TABLES, coupling with the barriers in a TLB flush. The use of RCU for this case has several distinct advantages over that approach. 1. Disabling interrupts for long running operations is not desirable. Future commits will allow operations besides page faults to operate without the exclusive protection of the MMU lock and those operations are too long to disable iterrupts for their duration. 2. The use of RCU here avoids long blocking / spinning operations in perfromance critical paths. By freeing memory with an asynchronous RCU API we avoid the longer wait times TLB flushes experience when overlapping with a thread in walk_shadow_page_lockless_begin/end(). 3. RCU provides a separation of concerns when removing memory from the paging structure. Because the RCU callback to free memory can be scheduled immediately after a TLB flush, there's no need for the thread to manually free a queue of pages later, as commit_zap_pages does.
Fixes: 95fb5b0258b7 ("kvm: x86/mmu: Support MMIO in the TDP MMU") Reviewed-by: Peter Feiner pfeiner@google.com Suggested-by: Sean Christopherson seanjc@google.com Signed-off-by: Ben Gardon bgardon@google.com
Message-Id: 20210202185734.1680553-18-bgardon@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/mmu/mmu_internal.h | 3 ++ arch/x86/kvm/mmu/tdp_iter.c | 16 +++--- arch/x86/kvm/mmu/tdp_iter.h | 10 ++-- arch/x86/kvm/mmu/tdp_mmu.c | 95 +++++++++++++++++++++++++++++---- 4 files changed, 103 insertions(+), 21 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index cf101b73a360..9e600dc30f08 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -57,6 +57,9 @@ struct kvm_mmu_page { atomic_t write_flooding_count;
bool tdp_mmu_page; + + /* Used for freeing the page asyncronously if it is a TDP MMU page. */ + struct rcu_head rcu_head; };
extern struct kmem_cache *mmu_page_header_cache; diff --git a/arch/x86/kvm/mmu/tdp_iter.c b/arch/x86/kvm/mmu/tdp_iter.c index 1a09d212186b..e5f148106e20 100644 --- a/arch/x86/kvm/mmu/tdp_iter.c +++ b/arch/x86/kvm/mmu/tdp_iter.c @@ -12,7 +12,7 @@ static void tdp_iter_refresh_sptep(struct tdp_iter *iter) { iter->sptep = iter->pt_path[iter->level - 1] + SHADOW_PT_INDEX(iter->gfn << PAGE_SHIFT, iter->level); - iter->old_spte = READ_ONCE(*iter->sptep); + iter->old_spte = READ_ONCE(*rcu_dereference(iter->sptep)); }
static gfn_t round_gfn_for_level(gfn_t gfn, int level) @@ -35,7 +35,7 @@ void tdp_iter_start(struct tdp_iter *iter, u64 *root_pt, int root_level, iter->root_level = root_level; iter->min_level = min_level; iter->level = root_level; - iter->pt_path[iter->level - 1] = root_pt; + iter->pt_path[iter->level - 1] = (tdp_ptep_t)root_pt;
iter->gfn = round_gfn_for_level(iter->next_last_level_gfn, iter->level); tdp_iter_refresh_sptep(iter); @@ -48,7 +48,7 @@ void tdp_iter_start(struct tdp_iter *iter, u64 *root_pt, int root_level, * address of the child page table referenced by the SPTE. Returns null if * there is no such entry. */ -u64 *spte_to_child_pt(u64 spte, int level) +tdp_ptep_t spte_to_child_pt(u64 spte, int level) { /* * There's no child entry if this entry isn't present or is a @@ -57,7 +57,7 @@ u64 *spte_to_child_pt(u64 spte, int level) if (!is_shadow_present_pte(spte) || is_last_spte(spte, level)) return NULL;
- return __va(spte_to_pfn(spte) << PAGE_SHIFT); + return (tdp_ptep_t)__va(spte_to_pfn(spte) << PAGE_SHIFT); }
/* @@ -66,7 +66,7 @@ u64 *spte_to_child_pt(u64 spte, int level) */ static bool try_step_down(struct tdp_iter *iter) { - u64 *child_pt; + tdp_ptep_t child_pt;
if (iter->level == iter->min_level) return false; @@ -75,7 +75,7 @@ static bool try_step_down(struct tdp_iter *iter) * Reread the SPTE before stepping down to avoid traversing into page * tables that are no longer linked from this entry. */ - iter->old_spte = READ_ONCE(*iter->sptep); + iter->old_spte = READ_ONCE(*rcu_dereference(iter->sptep));
child_pt = spte_to_child_pt(iter->old_spte, iter->level); if (!child_pt) @@ -109,7 +109,7 @@ static bool try_step_side(struct tdp_iter *iter) iter->gfn += KVM_PAGES_PER_HPAGE(iter->level); iter->next_last_level_gfn = iter->gfn; iter->sptep++; - iter->old_spte = READ_ONCE(*iter->sptep); + iter->old_spte = READ_ONCE(*rcu_dereference(iter->sptep));
return true; } @@ -159,7 +159,7 @@ void tdp_iter_next(struct tdp_iter *iter) iter->valid = false; }
-u64 *tdp_iter_root_pt(struct tdp_iter *iter) +tdp_ptep_t tdp_iter_root_pt(struct tdp_iter *iter) { return iter->pt_path[iter->root_level - 1]; } diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h index d480c540ee27..4cc177d75c4a 100644 --- a/arch/x86/kvm/mmu/tdp_iter.h +++ b/arch/x86/kvm/mmu/tdp_iter.h @@ -7,6 +7,8 @@
#include "mmu.h"
+typedef u64 __rcu *tdp_ptep_t; + /* * A TDP iterator performs a pre-order walk over a TDP paging structure. */ @@ -23,9 +25,9 @@ struct tdp_iter { */ gfn_t yielded_gfn; /* Pointers to the page tables traversed to reach the current SPTE */ - u64 *pt_path[PT64_ROOT_MAX_LEVEL]; + tdp_ptep_t pt_path[PT64_ROOT_MAX_LEVEL]; /* A pointer to the current SPTE */ - u64 *sptep; + tdp_ptep_t sptep; /* The lowest GFN mapped by the current SPTE */ gfn_t gfn; /* The level of the root page given to the iterator */ @@ -55,11 +57,11 @@ struct tdp_iter { #define for_each_tdp_pte(iter, root, root_level, start, end) \ for_each_tdp_pte_min_level(iter, root, root_level, PG_LEVEL_4K, start, end)
-u64 *spte_to_child_pt(u64 pte, int level); +tdp_ptep_t spte_to_child_pt(u64 pte, int level);
void tdp_iter_start(struct tdp_iter *iter, u64 *root_pt, int root_level, int min_level, gfn_t next_last_level_gfn); void tdp_iter_next(struct tdp_iter *iter); -u64 *tdp_iter_root_pt(struct tdp_iter *iter); +tdp_ptep_t tdp_iter_root_pt(struct tdp_iter *iter);
#endif /* __KVM_X86_MMU_TDP_ITER_H */ diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 3efaa8b44e45..65c9172dcdf9 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -42,6 +42,12 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm) return;
WARN_ON(!list_empty(&kvm->arch.tdp_mmu_roots)); + + /* + * Ensure that all the outstanding RCU callbacks to free shadow pages + * can run before the VM is torn down. + */ + rcu_barrier(); }
static void tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root) @@ -196,6 +202,28 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu) return __pa(root->spt); }
+static void tdp_mmu_free_sp(struct kvm_mmu_page *sp) +{ + free_page((unsigned long)sp->spt); + kmem_cache_free(mmu_page_header_cache, sp); +} + +/* + * This is called through call_rcu in order to free TDP page table memory + * safely with respect to other kernel threads that may be operating on + * the memory. + * By only accessing TDP MMU page table memory in an RCU read critical + * section, and freeing it after a grace period, lockless access to that + * memory won't use it after it is freed. + */ +static void tdp_mmu_free_sp_rcu_callback(struct rcu_head *head) +{ + struct kvm_mmu_page *sp = container_of(head, struct kvm_mmu_page, + rcu_head); + + tdp_mmu_free_sp(sp); +} + static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, u64 old_spte, u64 new_spte, int level);
@@ -269,8 +297,7 @@ static void handle_removed_tdp_mmu_page(struct kvm *kvm, u64 *pt) kvm_flush_remote_tlbs_with_address(kvm, gfn, KVM_PAGES_PER_HPAGE(level));
- free_page((unsigned long)pt); - kmem_cache_free(mmu_page_header_cache, sp); + call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback); }
/** @@ -374,13 +401,13 @@ static inline void __tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter, u64 new_spte, bool record_acc_track, bool record_dirty_log) { - u64 *root_pt = tdp_iter_root_pt(iter); + tdp_ptep_t root_pt = tdp_iter_root_pt(iter); struct kvm_mmu_page *root = sptep_to_sp(root_pt); int as_id = kvm_mmu_page_as_id(root);
lockdep_assert_held(&kvm->mmu_lock);
- WRITE_ONCE(*iter->sptep, new_spte); + WRITE_ONCE(*rcu_dereference(iter->sptep), new_spte);
__handle_changed_spte(kvm, as_id, iter->gfn, iter->old_spte, new_spte, iter->level); @@ -450,10 +477,13 @@ static inline bool tdp_mmu_iter_cond_resched(struct kvm *kvm, return false;
if (need_resched() || spin_needbreak(&kvm->mmu_lock)) { + rcu_read_unlock(); + if (flush) kvm_flush_remote_tlbs(kvm);
cond_resched_lock(&kvm->mmu_lock); + rcu_read_lock();
WARN_ON(iter->gfn > iter->next_last_level_gfn);
@@ -484,6 +514,8 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root, struct tdp_iter iter; bool flush_needed = false;
+ rcu_read_lock(); + tdp_root_for_each_pte(iter, root, start, end) { if (can_yield && tdp_mmu_iter_cond_resched(kvm, &iter, flush_needed)) { @@ -507,6 +539,8 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root, tdp_mmu_set_spte(kvm, &iter, 0); flush_needed = true; } + + rcu_read_unlock(); return flush_needed; }
@@ -552,13 +586,15 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
if (unlikely(is_noslot_pfn(pfn))) { new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL); - trace_mark_mmio_spte(iter->sptep, iter->gfn, new_spte); + trace_mark_mmio_spte(rcu_dereference(iter->sptep), iter->gfn, + new_spte); } else { make_spte_ret = make_spte(vcpu, ACC_ALL, iter->level, iter->gfn, pfn, iter->old_spte, prefault, true, map_writable, !shadow_accessed_mask, &new_spte); - trace_kvm_mmu_set_spte(iter->level, iter->gfn, iter->sptep); + trace_kvm_mmu_set_spte(iter->level, iter->gfn, + rcu_dereference(iter->sptep)); }
if (new_spte == iter->old_spte) @@ -581,7 +617,8 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write, if (unlikely(is_mmio_spte(new_spte))) ret = RET_PF_EMULATE;
- trace_kvm_mmu_set_spte(iter->level, iter->gfn, iter->sptep); + trace_kvm_mmu_set_spte(iter->level, iter->gfn, + rcu_dereference(iter->sptep)); if (!prefault) vcpu->stat.pf_fixed++;
@@ -619,6 +656,9 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, huge_page_disallowed, &req_level);
trace_kvm_mmu_spte_requested(gpa, level, pfn); + + rcu_read_lock(); + tdp_mmu_for_each_pte(iter, mmu, gfn, gfn + 1) { if (nx_huge_page_workaround_enabled) disallowed_hugepage_adjust(iter.old_spte, gfn, @@ -644,7 +684,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, * because the new value informs the !present * path below. */ - iter.old_spte = READ_ONCE(*iter.sptep); + iter.old_spte = READ_ONCE(*rcu_dereference(iter.sptep)); }
if (!is_shadow_present_pte(iter.old_spte)) { @@ -663,11 +703,14 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, } }
- if (WARN_ON(iter.level != level)) + if (WARN_ON(iter.level != level)) { + rcu_read_unlock(); return RET_PF_RETRY; + }
ret = tdp_mmu_map_handle_target_level(vcpu, write, map_writable, &iter, pfn, prefault); + rcu_read_unlock();
return ret; } @@ -738,6 +781,8 @@ static int age_gfn_range(struct kvm *kvm, struct kvm_memory_slot *slot, int young = 0; u64 new_spte = 0;
+ rcu_read_lock(); + tdp_root_for_each_leaf_pte(iter, root, start, end) { /* * If we have a non-accessed entry we don't need to change the @@ -769,6 +814,8 @@ static int age_gfn_range(struct kvm *kvm, struct kvm_memory_slot *slot, trace_kvm_age_page(iter.gfn, iter.level, slot, young); }
+ rcu_read_unlock(); + return young; }
@@ -814,6 +861,8 @@ static int set_tdp_spte(struct kvm *kvm, struct kvm_memory_slot *slot, u64 new_spte; int need_flush = 0;
+ rcu_read_lock(); + WARN_ON(pte_huge(*ptep));
new_pfn = pte_pfn(*ptep); @@ -842,6 +891,8 @@ static int set_tdp_spte(struct kvm *kvm, struct kvm_memory_slot *slot, if (need_flush) kvm_flush_remote_tlbs_with_address(kvm, gfn, 1);
+ rcu_read_unlock(); + return 0; }
@@ -865,6 +916,8 @@ static bool wrprot_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root, u64 new_spte; bool spte_set = false;
+ rcu_read_lock(); + BUG_ON(min_level > KVM_MAX_HUGEPAGE_LEVEL);
for_each_tdp_pte_min_level(iter, root->spt, root->role.level, @@ -881,6 +934,8 @@ static bool wrprot_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root, tdp_mmu_set_spte_no_dirty_log(kvm, &iter, new_spte); spte_set = true; } + + rcu_read_unlock(); return spte_set; }
@@ -922,6 +977,8 @@ static bool clear_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root, u64 new_spte; bool spte_set = false;
+ rcu_read_lock(); + tdp_root_for_each_leaf_pte(iter, root, start, end) { if (tdp_mmu_iter_cond_resched(kvm, &iter, false)) continue; @@ -941,6 +998,8 @@ static bool clear_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root, tdp_mmu_set_spte_no_dirty_log(kvm, &iter, new_spte); spte_set = true; } + + rcu_read_unlock(); return spte_set; }
@@ -982,6 +1041,8 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root, struct tdp_iter iter; u64 new_spte;
+ rcu_read_lock(); + tdp_root_for_each_leaf_pte(iter, root, gfn + __ffs(mask), gfn + BITS_PER_LONG) { if (!mask) @@ -1007,6 +1068,8 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
mask &= ~(1UL << (iter.gfn - gfn)); } + + rcu_read_unlock(); }
/* @@ -1046,6 +1109,8 @@ static bool set_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root, u64 new_spte; bool spte_set = false;
+ rcu_read_lock(); + tdp_root_for_each_pte(iter, root, start, end) { if (tdp_mmu_iter_cond_resched(kvm, &iter, false)) continue; @@ -1059,6 +1124,7 @@ static bool set_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root, spte_set = true; }
+ rcu_read_unlock(); return spte_set; }
@@ -1096,6 +1162,8 @@ static void zap_collapsible_spte_range(struct kvm *kvm, kvm_pfn_t pfn; bool spte_set = false;
+ rcu_read_lock(); + tdp_root_for_each_pte(iter, root, start, end) { if (tdp_mmu_iter_cond_resched(kvm, &iter, spte_set)) { spte_set = false; @@ -1117,6 +1185,7 @@ static void zap_collapsible_spte_range(struct kvm *kvm, spte_set = true; }
+ rcu_read_unlock(); if (spte_set) kvm_flush_remote_tlbs(kvm); } @@ -1153,6 +1222,8 @@ static bool write_protect_gfn(struct kvm *kvm, struct kvm_mmu_page *root, u64 new_spte; bool spte_set = false;
+ rcu_read_lock(); + tdp_root_for_each_leaf_pte(iter, root, gfn, gfn + 1) { if (!is_writable_pte(iter.old_spte)) break; @@ -1164,6 +1235,8 @@ static bool write_protect_gfn(struct kvm *kvm, struct kvm_mmu_page *root, spte_set = true; }
+ rcu_read_unlock(); + return spte_set; }
@@ -1204,10 +1277,14 @@ int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
*root_level = vcpu->arch.mmu->shadow_root_level;
+ rcu_read_lock(); + tdp_mmu_for_each_pte(iter, mmu, gfn, gfn + 1) { leaf = iter.level; sptes[leaf] = iter.old_spte; }
+ rcu_read_unlock(); + return leaf; }
From: Sean Christopherson seanjc@google.com
[ Upstream commit a835429cda91621fca915d80672a157b47738afb ]
When flushing a range of GFNs across multiple roots, ensure any pending flush from a previous root is honored before yielding while walking the tables of the current root.
Note, kvm_tdp_mmu_zap_gfn_range() now intentionally overwrites its local "flush" with the result to avoid redundant flushes. zap_gfn_range() preserves and return the incoming "flush", unless of course the flush was performed prior to yielding and no new flush was triggered.
Fixes: 1af4a96025b3 ("KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed") Cc: stable@vger.kernel.org Reviewed-by: Ben Gardon bgardon@google.com Signed-off-by: Sean Christopherson seanjc@google.com Message-Id: 20210325200119.1359384-2-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/mmu/tdp_mmu.c | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 65c9172dcdf9..50c088a41dee 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -111,7 +111,7 @@ bool is_tdp_mmu_root(struct kvm *kvm, hpa_t hpa) }
static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root, - gfn_t start, gfn_t end, bool can_yield); + gfn_t start, gfn_t end, bool can_yield, bool flush);
void kvm_tdp_mmu_free_root(struct kvm *kvm, struct kvm_mmu_page *root) { @@ -124,7 +124,7 @@ void kvm_tdp_mmu_free_root(struct kvm *kvm, struct kvm_mmu_page *root)
list_del(&root->link);
- zap_gfn_range(kvm, root, 0, max_gfn, false); + zap_gfn_range(kvm, root, 0, max_gfn, false, false);
free_page((unsigned long)root->spt); kmem_cache_free(mmu_page_header_cache, root); @@ -506,20 +506,21 @@ static inline bool tdp_mmu_iter_cond_resched(struct kvm *kvm, * scheduler needs the CPU or there is contention on the MMU lock. If this * function cannot yield, it will not release the MMU lock or reschedule and * the caller must ensure it does not supply too large a GFN range, or the - * operation can cause a soft lockup. + * operation can cause a soft lockup. Note, in some use cases a flush may be + * required by prior actions. Ensure the pending flush is performed prior to + * yielding. */ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root, - gfn_t start, gfn_t end, bool can_yield) + gfn_t start, gfn_t end, bool can_yield, bool flush) { struct tdp_iter iter; - bool flush_needed = false;
rcu_read_lock();
tdp_root_for_each_pte(iter, root, start, end) { if (can_yield && - tdp_mmu_iter_cond_resched(kvm, &iter, flush_needed)) { - flush_needed = false; + tdp_mmu_iter_cond_resched(kvm, &iter, flush)) { + flush = false; continue; }
@@ -537,11 +538,11 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root, continue;
tdp_mmu_set_spte(kvm, &iter, 0); - flush_needed = true; + flush = true; }
rcu_read_unlock(); - return flush_needed; + return flush; }
/* @@ -556,7 +557,7 @@ bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, gfn_t start, gfn_t end) bool flush = false;
for_each_tdp_mmu_root_yield_safe(kvm, root) - flush |= zap_gfn_range(kvm, root, start, end, true); + flush = zap_gfn_range(kvm, root, start, end, true, flush);
return flush; } @@ -759,7 +760,7 @@ static int zap_gfn_range_hva_wrapper(struct kvm *kvm, struct kvm_mmu_page *root, gfn_t start, gfn_t end, unsigned long unused) { - return zap_gfn_range(kvm, root, start, end, false); + return zap_gfn_range(kvm, root, start, end, false, false); }
int kvm_tdp_mmu_zap_hva_range(struct kvm *kvm, unsigned long start,
From: Ben Gardon bgardon@google.com
[ Upstream commit fe43fa2f407b9d513f7bcf18142e14e1bf1508d6 ]
__tdp_mmu_set_spte is a very important function in the TDP MMU which already accepts several arguments and will take more in future commits. To offset this complexity, add a comment to the function describing each of the arguemnts.
No functional change intended.
Reviewed-by: Peter Feiner pfeiner@google.com Acked-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Ben Gardon bgardon@google.com Message-Id: 20210202185734.1680553-3-bgardon@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/mmu/tdp_mmu.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 50c088a41dee..6bd86bb4c089 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -397,6 +397,22 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, new_spte, level); }
+/* + * __tdp_mmu_set_spte - Set a TDP MMU SPTE and handle the associated bookkeeping + * @kvm: kvm instance + * @iter: a tdp_iter instance currently on the SPTE that should be set + * @new_spte: The value the SPTE should be set to + * @record_acc_track: Notify the MM subsystem of changes to the accessed state + * of the page. Should be set unless handling an MMU + * notifier for access tracking. Leaving record_acc_track + * unset in that case prevents page accesses from being + * double counted. + * @record_dirty_log: Record the page as dirty in the dirty bitmap if + * appropriate for the change being made. Should be set + * unless performing certain dirty logging operations. + * Leaving record_dirty_log unset in that case prevents page + * writes from being double counted. + */ static inline void __tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter, u64 new_spte, bool record_acc_track, bool record_dirty_log)
From: Ben Gardon bgardon@google.com
[ Upstream commit 734e45b329d626d2c14e2bcf8be3d069a33c3316 ]
The KVM MMU caches already guarantee that shadow page table memory will be zeroed, so there is no reason to re-zero the page in the TDP MMU page fault handler.
No functional change intended.
Reviewed-by: Peter Feiner pfeiner@google.com Reviewed-by: Sean Christopherson seanjc@google.com Acked-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Ben Gardon bgardon@google.com Message-Id: 20210202185734.1680553-5-bgardon@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/mmu/tdp_mmu.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 6bd86bb4c089..4a2b8844f00f 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -708,7 +708,6 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, sp = alloc_tdp_mmu_page(vcpu, iter.gfn, iter.level); list_add(&sp->link, &vcpu->kvm->arch.tdp_mmu_pages); child_pt = sp->spt; - clear_page(child_pt); new_spte = make_nonleaf_spte(child_pt, !shadow_accessed_mask);
From: Ben Gardon bgardon@google.com
[ Upstream commit 8d1a182ea791f0111b0258c8f3eb8d77af0a8386 ]
No functional change intended.
Fixes: 29cf0f5007a2 ("kvm: x86/mmu: NX largepage recovery for TDP MMU") Signed-off-by: Ben Gardon bgardon@google.com Message-Id: 20210202185734.1680553-10-bgardon@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/mmu/mmu.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index ed861245ecf0..5771102a840c 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -6005,10 +6005,10 @@ static void kvm_recover_nx_lpages(struct kvm *kvm) struct kvm_mmu_page, lpage_disallowed_link); WARN_ON_ONCE(!sp->lpage_disallowed); - if (sp->tdp_mmu_page) + if (sp->tdp_mmu_page) { kvm_tdp_mmu_zap_gfn_range(kvm, sp->gfn, sp->gfn + KVM_PAGES_PER_HPAGE(sp->role.level)); - else { + } else { kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list); WARN_ON_ONCE(sp->lpage_disallowed); }
From: Ben Gardon bgardon@google.com
[ Upstream commit a9442f594147f95307f691cfba0c31e25dc79b9d ]
Move the work of adding and removing TDP MMU pages to/from "secondary" data structures to helper functions. These functions will be built on in future commits to enable MMU operations to proceed (mostly) in parallel.
No functional change expected.
Signed-off-by: Ben Gardon bgardon@google.com Message-Id: 20210202185734.1680553-20-bgardon@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/mmu/tdp_mmu.c | 47 +++++++++++++++++++++++++++++++------- 1 file changed, 39 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 4a2b8844f00f..bc49a5b90086 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -262,6 +262,39 @@ static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn, } }
+/** + * tdp_mmu_link_page - Add a new page to the list of pages used by the TDP MMU + * + * @kvm: kvm instance + * @sp: the new page + * @account_nx: This page replaces a NX large page and should be marked for + * eventual reclaim. + */ +static void tdp_mmu_link_page(struct kvm *kvm, struct kvm_mmu_page *sp, + bool account_nx) +{ + lockdep_assert_held_write(&kvm->mmu_lock); + + list_add(&sp->link, &kvm->arch.tdp_mmu_pages); + if (account_nx) + account_huge_nx_page(kvm, sp); +} + +/** + * tdp_mmu_unlink_page - Remove page from the list of pages used by the TDP MMU + * + * @kvm: kvm instance + * @sp: the page to be removed + */ +static void tdp_mmu_unlink_page(struct kvm *kvm, struct kvm_mmu_page *sp) +{ + lockdep_assert_held_write(&kvm->mmu_lock); + + list_del(&sp->link); + if (sp->lpage_disallowed) + unaccount_huge_nx_page(kvm, sp); +} + /** * handle_removed_tdp_mmu_page - handle a pt removed from the TDP structure * @@ -281,10 +314,7 @@ static void handle_removed_tdp_mmu_page(struct kvm *kvm, u64 *pt)
trace_kvm_mmu_prepare_zap_page(sp);
- list_del(&sp->link); - - if (sp->lpage_disallowed) - unaccount_huge_nx_page(kvm, sp); + tdp_mmu_unlink_page(kvm, sp);
for (i = 0; i < PT64_ENT_PER_PAGE; i++) { old_child_spte = READ_ONCE(*(pt + i)); @@ -706,15 +736,16 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
if (!is_shadow_present_pte(iter.old_spte)) { sp = alloc_tdp_mmu_page(vcpu, iter.gfn, iter.level); - list_add(&sp->link, &vcpu->kvm->arch.tdp_mmu_pages); child_pt = sp->spt; + + tdp_mmu_link_page(vcpu->kvm, sp, + huge_page_disallowed && + req_level >= iter.level); + new_spte = make_nonleaf_spte(child_pt, !shadow_accessed_mask);
trace_kvm_mmu_get_page(sp, true); - if (huge_page_disallowed && req_level >= iter.level) - account_huge_nx_page(vcpu->kvm, sp); - tdp_mmu_set_spte(vcpu->kvm, &iter, new_spte); } }
From: Ben Gardon bgardon@google.com
[ Upstream commit 9a77daacc87dee9fd63e31243f21894132ed8407 ]
To prepare for handling page faults in parallel, change the TDP MMU page fault handler to use atomic operations to set SPTEs so that changes are not lost if multiple threads attempt to modify the same SPTE.
Reviewed-by: Peter Feiner pfeiner@google.com Signed-off-by: Ben Gardon bgardon@google.com
Message-Id: 20210202185734.1680553-21-bgardon@google.com [Document new locking rules. - Paolo] Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/virt/kvm/locking.rst | 9 +- arch/x86/include/asm/kvm_host.h | 13 +++ arch/x86/kvm/mmu/tdp_mmu.c | 142 ++++++++++++++++++++++------- 3 files changed, 130 insertions(+), 34 deletions(-)
diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst index b21a34c34a21..0aa4817b466d 100644 --- a/Documentation/virt/kvm/locking.rst +++ b/Documentation/virt/kvm/locking.rst @@ -16,7 +16,14 @@ The acquisition orders for mutexes are as follows: - kvm->slots_lock is taken outside kvm->irq_lock, though acquiring them together is quite rare.
-On x86, vcpu->mutex is taken outside kvm->arch.hyperv.hv_lock. +On x86: + +- vcpu->mutex is taken outside kvm->arch.hyperv.hv_lock + +- kvm->arch.mmu_lock is an rwlock. kvm->arch.tdp_mmu_pages_lock is + taken inside kvm->arch.mmu_lock, and cannot be taken without already + holding kvm->arch.mmu_lock (typically with ``read_lock``, otherwise + there's no need to take kvm->arch.tdp_mmu_pages_lock at all).
Everything else is a leaf: no other lock is taken inside the critical sections. diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index e0cfd620b293..42fca28d6189 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1030,6 +1030,19 @@ struct kvm_arch { * tdp_mmu_page set and a root_count of 0. */ struct list_head tdp_mmu_pages; + + /* + * Protects accesses to the following fields when the MMU lock + * is held in read mode: + * - tdp_mmu_pages (above) + * - the link field of struct kvm_mmu_pages used by the TDP MMU + * - lpage_disallowed_mmu_pages + * - the lpage_disallowed_link field of struct kvm_mmu_pages used + * by the TDP MMU + * It is acceptable, but not necessary, to acquire this lock when + * the thread holds the MMU lock in write mode. + */ + spinlock_t tdp_mmu_pages_lock; };
struct kvm_vm_stat { diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index bc49a5b90086..bb6faa9193b4 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -7,6 +7,7 @@ #include "tdp_mmu.h" #include "spte.h"
+#include <asm/cmpxchg.h> #include <trace/events/kvm.h>
#ifdef CONFIG_X86_64 @@ -33,6 +34,7 @@ void kvm_mmu_init_tdp_mmu(struct kvm *kvm) kvm->arch.tdp_mmu_enabled = true;
INIT_LIST_HEAD(&kvm->arch.tdp_mmu_roots); + spin_lock_init(&kvm->arch.tdp_mmu_pages_lock); INIT_LIST_HEAD(&kvm->arch.tdp_mmu_pages); }
@@ -225,7 +227,8 @@ static void tdp_mmu_free_sp_rcu_callback(struct rcu_head *head) }
static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, - u64 old_spte, u64 new_spte, int level); + u64 old_spte, u64 new_spte, int level, + bool shared);
static int kvm_mmu_page_as_id(struct kvm_mmu_page *sp) { @@ -267,17 +270,26 @@ static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn, * * @kvm: kvm instance * @sp: the new page + * @shared: This operation may not be running under the exclusive use of + * the MMU lock and the operation must synchronize with other + * threads that might be adding or removing pages. * @account_nx: This page replaces a NX large page and should be marked for * eventual reclaim. */ static void tdp_mmu_link_page(struct kvm *kvm, struct kvm_mmu_page *sp, - bool account_nx) + bool shared, bool account_nx) { - lockdep_assert_held_write(&kvm->mmu_lock); + if (shared) + spin_lock(&kvm->arch.tdp_mmu_pages_lock); + else + lockdep_assert_held_write(&kvm->mmu_lock);
list_add(&sp->link, &kvm->arch.tdp_mmu_pages); if (account_nx) account_huge_nx_page(kvm, sp); + + if (shared) + spin_unlock(&kvm->arch.tdp_mmu_pages_lock); }
/** @@ -285,14 +297,24 @@ static void tdp_mmu_link_page(struct kvm *kvm, struct kvm_mmu_page *sp, * * @kvm: kvm instance * @sp: the page to be removed + * @shared: This operation may not be running under the exclusive use of + * the MMU lock and the operation must synchronize with other + * threads that might be adding or removing pages. */ -static void tdp_mmu_unlink_page(struct kvm *kvm, struct kvm_mmu_page *sp) +static void tdp_mmu_unlink_page(struct kvm *kvm, struct kvm_mmu_page *sp, + bool shared) { - lockdep_assert_held_write(&kvm->mmu_lock); + if (shared) + spin_lock(&kvm->arch.tdp_mmu_pages_lock); + else + lockdep_assert_held_write(&kvm->mmu_lock);
list_del(&sp->link); if (sp->lpage_disallowed) unaccount_huge_nx_page(kvm, sp); + + if (shared) + spin_unlock(&kvm->arch.tdp_mmu_pages_lock); }
/** @@ -300,28 +322,39 @@ static void tdp_mmu_unlink_page(struct kvm *kvm, struct kvm_mmu_page *sp) * * @kvm: kvm instance * @pt: the page removed from the paging structure + * @shared: This operation may not be running under the exclusive use + * of the MMU lock and the operation must synchronize with other + * threads that might be modifying SPTEs. * * Given a page table that has been removed from the TDP paging structure, * iterates through the page table to clear SPTEs and free child page tables. */ -static void handle_removed_tdp_mmu_page(struct kvm *kvm, u64 *pt) +static void handle_removed_tdp_mmu_page(struct kvm *kvm, u64 *pt, + bool shared) { struct kvm_mmu_page *sp = sptep_to_sp(pt); int level = sp->role.level; gfn_t gfn = sp->gfn; u64 old_child_spte; + u64 *sptep; int i;
trace_kvm_mmu_prepare_zap_page(sp);
- tdp_mmu_unlink_page(kvm, sp); + tdp_mmu_unlink_page(kvm, sp, shared);
for (i = 0; i < PT64_ENT_PER_PAGE; i++) { - old_child_spte = READ_ONCE(*(pt + i)); - WRITE_ONCE(*(pt + i), 0); + sptep = pt + i; + + if (shared) { + old_child_spte = xchg(sptep, 0); + } else { + old_child_spte = READ_ONCE(*sptep); + WRITE_ONCE(*sptep, 0); + } handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn + (i * KVM_PAGES_PER_HPAGE(level - 1)), - old_child_spte, 0, level - 1); + old_child_spte, 0, level - 1, shared); }
kvm_flush_remote_tlbs_with_address(kvm, gfn, @@ -338,12 +371,16 @@ static void handle_removed_tdp_mmu_page(struct kvm *kvm, u64 *pt) * @old_spte: The value of the SPTE before the change * @new_spte: The value of the SPTE after the change * @level: the level of the PT the SPTE is part of in the paging structure + * @shared: This operation may not be running under the exclusive use of + * the MMU lock and the operation must synchronize with other + * threads that might be modifying SPTEs. * * Handle bookkeeping that might result from the modification of a SPTE. * This function must be called for all TDP SPTE modifications. */ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, - u64 old_spte, u64 new_spte, int level) + u64 old_spte, u64 new_spte, int level, + bool shared) { bool was_present = is_shadow_present_pte(old_spte); bool is_present = is_shadow_present_pte(new_spte); @@ -415,18 +452,51 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, */ if (was_present && !was_leaf && (pfn_changed || !is_present)) handle_removed_tdp_mmu_page(kvm, - spte_to_child_pt(old_spte, level)); + spte_to_child_pt(old_spte, level), shared); }
static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, - u64 old_spte, u64 new_spte, int level) + u64 old_spte, u64 new_spte, int level, + bool shared) { - __handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level); + __handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, + shared); handle_changed_spte_acc_track(old_spte, new_spte, level); handle_changed_spte_dirty_log(kvm, as_id, gfn, old_spte, new_spte, level); }
+/* + * tdp_mmu_set_spte_atomic - Set a TDP MMU SPTE atomically and handle the + * associated bookkeeping + * + * @kvm: kvm instance + * @iter: a tdp_iter instance currently on the SPTE that should be set + * @new_spte: The value the SPTE should be set to + * Returns: true if the SPTE was set, false if it was not. If false is returned, + * this function will have no side-effects. + */ +static inline bool tdp_mmu_set_spte_atomic(struct kvm *kvm, + struct tdp_iter *iter, + u64 new_spte) +{ + u64 *root_pt = tdp_iter_root_pt(iter); + struct kvm_mmu_page *root = sptep_to_sp(root_pt); + int as_id = kvm_mmu_page_as_id(root); + + lockdep_assert_held_read(&kvm->mmu_lock); + + if (cmpxchg64(rcu_dereference(iter->sptep), iter->old_spte, + new_spte) != iter->old_spte) + return false; + + handle_changed_spte(kvm, as_id, iter->gfn, iter->old_spte, new_spte, + iter->level, true); + + return true; +} + + /* * __tdp_mmu_set_spte - Set a TDP MMU SPTE and handle the associated bookkeeping * @kvm: kvm instance @@ -456,7 +526,7 @@ static inline void __tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter, WRITE_ONCE(*rcu_dereference(iter->sptep), new_spte);
__handle_changed_spte(kvm, as_id, iter->gfn, iter->old_spte, new_spte, - iter->level); + iter->level, false); if (record_acc_track) handle_changed_spte_acc_track(iter->old_spte, new_spte, iter->level); @@ -631,23 +701,18 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write, int ret = 0; int make_spte_ret = 0;
- if (unlikely(is_noslot_pfn(pfn))) { + if (unlikely(is_noslot_pfn(pfn))) new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL); - trace_mark_mmio_spte(rcu_dereference(iter->sptep), iter->gfn, - new_spte); - } else { + else make_spte_ret = make_spte(vcpu, ACC_ALL, iter->level, iter->gfn, pfn, iter->old_spte, prefault, true, map_writable, !shadow_accessed_mask, &new_spte); - trace_kvm_mmu_set_spte(iter->level, iter->gfn, - rcu_dereference(iter->sptep)); - }
if (new_spte == iter->old_spte) ret = RET_PF_SPURIOUS; - else - tdp_mmu_set_spte(vcpu->kvm, iter, new_spte); + else if (!tdp_mmu_set_spte_atomic(vcpu->kvm, iter, new_spte)) + return RET_PF_RETRY;
/* * If the page fault was caused by a write but the page is write @@ -661,8 +726,13 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write, }
/* If a MMIO SPTE is installed, the MMIO will need to be emulated. */ - if (unlikely(is_mmio_spte(new_spte))) + if (unlikely(is_mmio_spte(new_spte))) { + trace_mark_mmio_spte(rcu_dereference(iter->sptep), iter->gfn, + new_spte); ret = RET_PF_EMULATE; + } else + trace_kvm_mmu_set_spte(iter->level, iter->gfn, + rcu_dereference(iter->sptep));
trace_kvm_mmu_set_spte(iter->level, iter->gfn, rcu_dereference(iter->sptep)); @@ -721,7 +791,8 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, */ if (is_shadow_present_pte(iter.old_spte) && is_large_pte(iter.old_spte)) { - tdp_mmu_set_spte(vcpu->kvm, &iter, 0); + if (!tdp_mmu_set_spte_atomic(vcpu->kvm, &iter, 0)) + break;
kvm_flush_remote_tlbs_with_address(vcpu->kvm, iter.gfn, KVM_PAGES_PER_HPAGE(iter.level)); @@ -738,19 +809,24 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, sp = alloc_tdp_mmu_page(vcpu, iter.gfn, iter.level); child_pt = sp->spt;
- tdp_mmu_link_page(vcpu->kvm, sp, - huge_page_disallowed && - req_level >= iter.level); - new_spte = make_nonleaf_spte(child_pt, !shadow_accessed_mask);
- trace_kvm_mmu_get_page(sp, true); - tdp_mmu_set_spte(vcpu->kvm, &iter, new_spte); + if (tdp_mmu_set_spte_atomic(vcpu->kvm, &iter, + new_spte)) { + tdp_mmu_link_page(vcpu->kvm, sp, true, + huge_page_disallowed && + req_level >= iter.level); + + trace_kvm_mmu_get_page(sp, true); + } else { + tdp_mmu_free_sp(sp); + break; + } } }
- if (WARN_ON(iter.level != level)) { + if (iter.level != level) { rcu_read_unlock(); return RET_PF_RETRY; }
From: Paolo Bonzini pbonzini@redhat.com
[ Upstream commit 897218ff7cf19290ec2d69652ce673d8ed6fedeb ]
The TDP MMU assumes that it can do atomic accesses to 64-bit PTEs. Rather than just disabling it, compile it out completely so that it is possible to use for example 64-bit xchg.
To limit the number of stubs, wrap all accesses to tdp_mmu_enabled or tdp_mmu_page with a function. Calls to all other functions in tdp_mmu.c are eliminated and do not even reach the linker.
Reviewed-by: Sean Christopherson seanjc@google.com Tested-by: Sean Christopherson seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/kvm/Makefile | 3 ++- arch/x86/kvm/mmu/mmu.c | 36 ++++++++++++++++----------------- arch/x86/kvm/mmu/mmu_internal.h | 2 ++ arch/x86/kvm/mmu/tdp_mmu.c | 29 +------------------------- arch/x86/kvm/mmu/tdp_mmu.h | 32 +++++++++++++++++++++++++---- 6 files changed, 53 insertions(+), 51 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 42fca28d6189..0cbb13b83a16 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1005,6 +1005,7 @@ struct kvm_arch { struct kvm_pmu_event_filter *pmu_event_filter; struct task_struct *nx_lpage_recovery_thread;
+#ifdef CONFIG_X86_64 /* * Whether the TDP MMU is enabled for this VM. This contains a * snapshot of the TDP MMU module parameter from when the VM was @@ -1043,6 +1044,7 @@ struct kvm_arch { * the thread holds the MMU lock in write mode. */ spinlock_t tdp_mmu_pages_lock; +#endif /* CONFIG_X86_64 */ };
struct kvm_vm_stat { diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile index 4bd14ab01323..53c54cdcc923 100644 --- a/arch/x86/kvm/Makefile +++ b/arch/x86/kvm/Makefile @@ -17,7 +17,8 @@ kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o kvm-y += x86.o emulate.o i8259.o irq.o lapic.o \ i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \ hyperv.o debugfs.o mmu/mmu.o mmu/page_track.o \ - mmu/spte.o mmu/tdp_iter.o mmu/tdp_mmu.o + mmu/spte.o +kvm-$(CONFIG_X86_64) += mmu/tdp_iter.o mmu/tdp_mmu.o
kvm-intel-y += vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \ vmx/evmcs.o vmx/nested.o vmx/posted_intr.o diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 5771102a840c..d9901836d7aa 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1225,7 +1225,7 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm, { struct kvm_rmap_head *rmap_head;
- if (kvm->arch.tdp_mmu_enabled) + if (is_tdp_mmu_enabled(kvm)) kvm_tdp_mmu_clear_dirty_pt_masked(kvm, slot, slot->base_gfn + gfn_offset, mask, true); while (mask) { @@ -1254,7 +1254,7 @@ void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm, { struct kvm_rmap_head *rmap_head;
- if (kvm->arch.tdp_mmu_enabled) + if (is_tdp_mmu_enabled(kvm)) kvm_tdp_mmu_clear_dirty_pt_masked(kvm, slot, slot->base_gfn + gfn_offset, mask, false); while (mask) { @@ -1309,7 +1309,7 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm, write_protected |= __rmap_write_protect(kvm, rmap_head, true); }
- if (kvm->arch.tdp_mmu_enabled) + if (is_tdp_mmu_enabled(kvm)) write_protected |= kvm_tdp_mmu_write_protect_gfn(kvm, slot, gfn);
@@ -1521,7 +1521,7 @@ int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long end,
r = kvm_handle_hva_range(kvm, start, end, 0, kvm_unmap_rmapp);
- if (kvm->arch.tdp_mmu_enabled) + if (is_tdp_mmu_enabled(kvm)) r |= kvm_tdp_mmu_zap_hva_range(kvm, start, end);
return r; @@ -1533,7 +1533,7 @@ int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
r = kvm_handle_hva(kvm, hva, (unsigned long)&pte, kvm_set_pte_rmapp);
- if (kvm->arch.tdp_mmu_enabled) + if (is_tdp_mmu_enabled(kvm)) r |= kvm_tdp_mmu_set_spte_hva(kvm, hva, &pte);
return r; @@ -1588,7 +1588,7 @@ int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end) int young = false;
young = kvm_handle_hva_range(kvm, start, end, 0, kvm_age_rmapp); - if (kvm->arch.tdp_mmu_enabled) + if (is_tdp_mmu_enabled(kvm)) young |= kvm_tdp_mmu_age_hva_range(kvm, start, end);
return young; @@ -1599,7 +1599,7 @@ int kvm_test_age_hva(struct kvm *kvm, unsigned long hva) int young = false;
young = kvm_handle_hva(kvm, hva, 0, kvm_test_age_rmapp); - if (kvm->arch.tdp_mmu_enabled) + if (is_tdp_mmu_enabled(kvm)) young |= kvm_tdp_mmu_test_age_hva(kvm, hva);
return young; @@ -3161,7 +3161,7 @@ static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa, sp = to_shadow_page(*root_hpa & PT64_BASE_ADDR_MASK);
if (kvm_mmu_put_root(kvm, sp)) { - if (sp->tdp_mmu_page) + if (is_tdp_mmu_page(sp)) kvm_tdp_mmu_free_root(kvm, sp); else if (sp->role.invalid) kvm_mmu_prepare_zap_page(kvm, sp, invalid_list); @@ -3255,7 +3255,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu) hpa_t root; unsigned i;
- if (vcpu->kvm->arch.tdp_mmu_enabled) { + if (is_tdp_mmu_enabled(vcpu->kvm)) { root = kvm_tdp_mmu_get_vcpu_root_hpa(vcpu);
if (!VALID_PAGE(root)) @@ -5447,7 +5447,7 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
kvm_zap_obsolete_pages(kvm);
- if (kvm->arch.tdp_mmu_enabled) + if (is_tdp_mmu_enabled(kvm)) kvm_tdp_mmu_zap_all(kvm);
spin_unlock(&kvm->mmu_lock); @@ -5510,7 +5510,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end) } }
- if (kvm->arch.tdp_mmu_enabled) { + if (is_tdp_mmu_enabled(kvm)) { flush = kvm_tdp_mmu_zap_gfn_range(kvm, gfn_start, gfn_end); if (flush) kvm_flush_remote_tlbs(kvm); @@ -5534,7 +5534,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, spin_lock(&kvm->mmu_lock); flush = slot_handle_level(kvm, memslot, slot_rmap_write_protect, start_level, KVM_MAX_HUGEPAGE_LEVEL, false); - if (kvm->arch.tdp_mmu_enabled) + if (is_tdp_mmu_enabled(kvm)) flush |= kvm_tdp_mmu_wrprot_slot(kvm, memslot, PG_LEVEL_4K); spin_unlock(&kvm->mmu_lock);
@@ -5600,7 +5600,7 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm, slot_handle_leaf(kvm, (struct kvm_memory_slot *)memslot, kvm_mmu_zap_collapsible_spte, true);
- if (kvm->arch.tdp_mmu_enabled) + if (is_tdp_mmu_enabled(kvm)) kvm_tdp_mmu_zap_collapsible_sptes(kvm, memslot); spin_unlock(&kvm->mmu_lock); } @@ -5627,7 +5627,7 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
spin_lock(&kvm->mmu_lock); flush = slot_handle_leaf(kvm, memslot, __rmap_clear_dirty, false); - if (kvm->arch.tdp_mmu_enabled) + if (is_tdp_mmu_enabled(kvm)) flush |= kvm_tdp_mmu_clear_dirty_slot(kvm, memslot); spin_unlock(&kvm->mmu_lock);
@@ -5650,7 +5650,7 @@ void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm, spin_lock(&kvm->mmu_lock); flush = slot_handle_large_level(kvm, memslot, slot_rmap_write_protect, false); - if (kvm->arch.tdp_mmu_enabled) + if (is_tdp_mmu_enabled(kvm)) flush |= kvm_tdp_mmu_wrprot_slot(kvm, memslot, PG_LEVEL_2M); spin_unlock(&kvm->mmu_lock);
@@ -5666,7 +5666,7 @@ void kvm_mmu_slot_set_dirty(struct kvm *kvm,
spin_lock(&kvm->mmu_lock); flush = slot_handle_all_level(kvm, memslot, __rmap_set_dirty, false); - if (kvm->arch.tdp_mmu_enabled) + if (is_tdp_mmu_enabled(kvm)) flush |= kvm_tdp_mmu_slot_set_dirty(kvm, memslot); spin_unlock(&kvm->mmu_lock);
@@ -5694,7 +5694,7 @@ void kvm_mmu_zap_all(struct kvm *kvm)
kvm_mmu_commit_zap_page(kvm, &invalid_list);
- if (kvm->arch.tdp_mmu_enabled) + if (is_tdp_mmu_enabled(kvm)) kvm_tdp_mmu_zap_all(kvm);
spin_unlock(&kvm->mmu_lock); @@ -6005,7 +6005,7 @@ static void kvm_recover_nx_lpages(struct kvm *kvm) struct kvm_mmu_page, lpage_disallowed_link); WARN_ON_ONCE(!sp->lpage_disallowed); - if (sp->tdp_mmu_page) { + if (is_tdp_mmu_page(sp)) { kvm_tdp_mmu_zap_gfn_range(kvm, sp->gfn, sp->gfn + KVM_PAGES_PER_HPAGE(sp->role.level)); } else { diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index 9e600dc30f08..cbac13a2bd45 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -56,10 +56,12 @@ struct kvm_mmu_page { /* Number of writes since the last time traversal visited this page. */ atomic_t write_flooding_count;
+#ifdef CONFIG_X86_64 bool tdp_mmu_page;
/* Used for freeing the page asyncronously if it is a TDP MMU page. */ struct rcu_head rcu_head; +#endif };
extern struct kmem_cache *mmu_page_header_cache; diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index bb6faa9193b4..e2157d0a5712 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -10,24 +10,13 @@ #include <asm/cmpxchg.h> #include <trace/events/kvm.h>
-#ifdef CONFIG_X86_64 static bool __read_mostly tdp_mmu_enabled = false; module_param_named(tdp_mmu, tdp_mmu_enabled, bool, 0644); -#endif - -static bool is_tdp_mmu_enabled(void) -{ -#ifdef CONFIG_X86_64 - return tdp_enabled && READ_ONCE(tdp_mmu_enabled); -#else - return false; -#endif /* CONFIG_X86_64 */ -}
/* Initializes the TDP MMU for the VM, if enabled. */ void kvm_mmu_init_tdp_mmu(struct kvm *kvm) { - if (!is_tdp_mmu_enabled()) + if (!tdp_enabled || !READ_ONCE(tdp_mmu_enabled)) return;
/* This should not be changed for the lifetime of the VM. */ @@ -96,22 +85,6 @@ static inline struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm, #define for_each_tdp_mmu_root(_kvm, _root) \ list_for_each_entry(_root, &_kvm->arch.tdp_mmu_roots, link)
-bool is_tdp_mmu_root(struct kvm *kvm, hpa_t hpa) -{ - struct kvm_mmu_page *sp; - - if (!kvm->arch.tdp_mmu_enabled) - return false; - if (WARN_ON(!VALID_PAGE(hpa))) - return false; - - sp = to_shadow_page(hpa); - if (WARN_ON(!sp)) - return false; - - return sp->tdp_mmu_page && sp->root_count; -} - static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root, gfn_t start, gfn_t end, bool can_yield, bool flush);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index cbbdbadd1526..b4b65e3699b3 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -5,10 +5,6 @@
#include <linux/kvm_host.h>
-void kvm_mmu_init_tdp_mmu(struct kvm *kvm); -void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm); - -bool is_tdp_mmu_root(struct kvm *kvm, hpa_t root); hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu); void kvm_tdp_mmu_free_root(struct kvm *kvm, struct kvm_mmu_page *root);
@@ -47,4 +43,32 @@ bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm, int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, int *root_level);
+#ifdef CONFIG_X86_64 +void kvm_mmu_init_tdp_mmu(struct kvm *kvm); +void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm); +static inline bool is_tdp_mmu_enabled(struct kvm *kvm) { return kvm->arch.tdp_mmu_enabled; } +static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return sp->tdp_mmu_page; } +#else +static inline void kvm_mmu_init_tdp_mmu(struct kvm *kvm) {} +static inline void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm) {} +static inline bool is_tdp_mmu_enabled(struct kvm *kvm) { return false; } +static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return false; } +#endif + +static inline bool is_tdp_mmu_root(struct kvm *kvm, hpa_t hpa) +{ + struct kvm_mmu_page *sp; + + if (!is_tdp_mmu_enabled(kvm)) + return false; + if (WARN_ON(!VALID_PAGE(hpa))) + return false; + + sp = to_shadow_page(hpa); + if (WARN_ON(!sp)) + return false; + + return is_tdp_mmu_page(sp) && sp->root_count; +} + #endif /* __KVM_X86_MMU_TDP_MMU_H */
From: Sean Christopherson seanjc@google.com
[ Upstream commit 048f49809c526348775425420fb5b8e84fd9a133 ]
Honor the "flush needed" return from kvm_tdp_mmu_zap_gfn_range(), which does the flush itself if and only if it yields (which it will never do in this particular scenario), and otherwise expects the caller to do the flush. If pages are zapped from the TDP MMU but not the legacy MMU, then no flush will occur.
Fixes: 29cf0f5007a2 ("kvm: x86/mmu: NX largepage recovery for TDP MMU") Cc: stable@vger.kernel.org Cc: Ben Gardon bgardon@google.com Signed-off-by: Sean Christopherson seanjc@google.com Message-Id: 20210325200119.1359384-3-seanjc@google.com Reviewed-by: Ben Gardon bgardon@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/mmu/mmu.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index d9901836d7aa..8643c766415a 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -5985,6 +5985,8 @@ static void kvm_recover_nx_lpages(struct kvm *kvm) struct kvm_mmu_page *sp; unsigned int ratio; LIST_HEAD(invalid_list); + bool flush = false; + gfn_t gfn_end; ulong to_zap;
rcu_idx = srcu_read_lock(&kvm->srcu); @@ -6006,19 +6008,20 @@ static void kvm_recover_nx_lpages(struct kvm *kvm) lpage_disallowed_link); WARN_ON_ONCE(!sp->lpage_disallowed); if (is_tdp_mmu_page(sp)) { - kvm_tdp_mmu_zap_gfn_range(kvm, sp->gfn, - sp->gfn + KVM_PAGES_PER_HPAGE(sp->role.level)); + gfn_end = sp->gfn + KVM_PAGES_PER_HPAGE(sp->role.level); + flush = kvm_tdp_mmu_zap_gfn_range(kvm, sp->gfn, gfn_end); } else { kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list); WARN_ON_ONCE(sp->lpage_disallowed); }
if (need_resched() || spin_needbreak(&kvm->mmu_lock)) { - kvm_mmu_commit_zap_page(kvm, &invalid_list); + kvm_mmu_remote_flush_or_zap(kvm, &invalid_list, flush); cond_resched_lock(&kvm->mmu_lock); + flush = false; } } - kvm_mmu_commit_zap_page(kvm, &invalid_list); + kvm_mmu_remote_flush_or_zap(kvm, &invalid_list, flush);
spin_unlock(&kvm->mmu_lock); srcu_read_unlock(&kvm->srcu, rcu_idx);
From: Krzysztof Kozlowski krzk@kernel.org
[ Upstream commit c9570d4a5efd04479b3cd09c39b571eb031d94f4 ]
Add stubs for extcon_register_notifier_all() function for !CONFIG_EXTCON case. This is useful for compile testing and for drivers which use EXTCON but do not require it (therefore do not depend on CONFIG_EXTCON).
Fixes: 815429b39d94 ("extcon: Add new extcon_register_notifier_all() to monitor all external connectors") Reported-by: kernel test robot lkp@intel.com Signed-off-by: Krzysztof Kozlowski krzk@kernel.org Signed-off-by: Chanwoo Choi cw00.choi@samsung.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/extcon.h | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+)
diff --git a/include/linux/extcon.h b/include/linux/extcon.h index fd183fb9c20f..0c19010da77f 100644 --- a/include/linux/extcon.h +++ b/include/linux/extcon.h @@ -271,6 +271,29 @@ static inline void devm_extcon_unregister_notifier(struct device *dev, struct extcon_dev *edev, unsigned int id, struct notifier_block *nb) { }
+static inline int extcon_register_notifier_all(struct extcon_dev *edev, + struct notifier_block *nb) +{ + return 0; +} + +static inline int extcon_unregister_notifier_all(struct extcon_dev *edev, + struct notifier_block *nb) +{ + return 0; +} + +static inline int devm_extcon_register_notifier_all(struct device *dev, + struct extcon_dev *edev, + struct notifier_block *nb) +{ + return 0; +} + +static inline void devm_extcon_unregister_notifier_all(struct device *dev, + struct extcon_dev *edev, + struct notifier_block *nb) { } + static inline struct extcon_dev *extcon_get_extcon_dev(const char *extcon_name) { return ERR_PTR(-ENODEV);
From: Dinghao Liu dinghao.liu@zju.edu.cn
[ Upstream commit d3bdd1c3140724967ca4136755538fa7c05c2b4e ]
When devm_kcalloc() fails, we should execute device_unregister() to unregister edev->dev from system.
Fixes: 046050f6e623e ("extcon: Update the prototype of extcon_register_notifier() with enum extcon") Signed-off-by: Dinghao Liu dinghao.liu@zju.edu.cn Signed-off-by: Chanwoo Choi cw00.choi@samsung.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/extcon/extcon.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/extcon/extcon.c b/drivers/extcon/extcon.c index 0a6438cbb3f3..e7a9561a826d 100644 --- a/drivers/extcon/extcon.c +++ b/drivers/extcon/extcon.c @@ -1241,6 +1241,7 @@ int extcon_dev_register(struct extcon_dev *edev) sizeof(*edev->nh), GFP_KERNEL); if (!edev->nh) { ret = -ENOMEM; + device_unregister(&edev->dev); goto err_dev; }
From: Richard Gong richard.gong@intel.com
[ Upstream commit 2e8496f31d0be8f43849b2980b069f3a9805d047 ]
Clean up COMMAND_RECONFIG_FLAG_PARTIAL flag by resetting it to 0, which aligns with the firmware settings.
Fixes: 36847f9e3e56 ("firmware: stratix10-svc: correct reconfig flag and timeout values") Signed-off-by: Richard Gong richard.gong@intel.com Reviewed-by: Tom Rix trix@redhat.com Signed-off-by: Moritz Fischer mdf@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/firmware/intel/stratix10-svc-client.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/firmware/intel/stratix10-svc-client.h b/include/linux/firmware/intel/stratix10-svc-client.h index a93d85932eb9..f843c6a10cf3 100644 --- a/include/linux/firmware/intel/stratix10-svc-client.h +++ b/include/linux/firmware/intel/stratix10-svc-client.h @@ -56,7 +56,7 @@ * COMMAND_RECONFIG_FLAG_PARTIAL: * Set to FPGA configuration type (full or partial). */ -#define COMMAND_RECONFIG_FLAG_PARTIAL 1 +#define COMMAND_RECONFIG_FLAG_PARTIAL 0
/** * Timeout settings for service clients:
From: Nathan Lynch nathanl@linux.ibm.com
[ Upstream commit e834df6cfc71d8e5ce2c27a0184145ea125c3f0f ]
The atomic_t counter is the only shared state for the join/suspend sequence so far, but that will change. Contain it in a struct (pseries_suspend_info), and document its intended use. No functional change.
Signed-off-by: Nathan Lynch nathanl@linux.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20210315080045.460331-2-nathanl@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/platforms/pseries/mobility.c | 22 +++++++++++++++++++--- 1 file changed, 19 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c index ea4d6a660e0d..a6739ce9feac 100644 --- a/arch/powerpc/platforms/pseries/mobility.c +++ b/arch/powerpc/platforms/pseries/mobility.c @@ -452,9 +452,21 @@ static int do_suspend(void) return ret; }
+/** + * struct pseries_suspend_info - State shared between CPUs for join/suspend. + * @counter: Threads are to increment this upon resuming from suspend + * or if an error is received from H_JOIN. The thread which performs + * the first increment (i.e. sets it to 1) is responsible for + * waking the other threads. + */ +struct pseries_suspend_info { + atomic_t counter; +}; + static int do_join(void *arg) { - atomic_t *counter = arg; + struct pseries_suspend_info *info = arg; + atomic_t *counter = &info->counter; long hvrc; int ret;
@@ -535,11 +547,15 @@ static int pseries_suspend(u64 handle) int ret;
while (true) { - atomic_t counter = ATOMIC_INIT(0); + struct pseries_suspend_info info; unsigned long vasi_state; int vasi_err;
- ret = stop_machine(do_join, &counter, cpu_online_mask); + info = (struct pseries_suspend_info) { + .counter = ATOMIC_INIT(0), + }; + + ret = stop_machine(do_join, &info, cpu_online_mask); if (ret == 0) break; /*
From: Nathan Lynch nathanl@linux.ibm.com
[ Upstream commit 274cb1ca2e7ce02cab56f5f4c61a74aeb566f931 ]
The pseries join/suspend sequence in its current form was written with the assumption that it was the only user of H_PROD and that it needn't handle spurious successful returns from H_JOIN. That's wrong; powerpc's paravirt spinlock code uses H_PROD, and CPUs entering do_join() can be woken prematurely from H_JOIN with a status of H_SUCCESS as a result. This causes all CPUs to exit the sequence early, preventing suspend from occurring at all.
Add a 'done' boolean flag to the pseries_suspend_info struct, and have the waking thread set it before waking the other threads. Threads which receive H_SUCCESS from H_JOIN retry if the 'done' flag is still unset.
Fixes: 9327dc0aeef3 ("powerpc/pseries/mobility: use stop_machine for join/suspend") Signed-off-by: Nathan Lynch nathanl@linux.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20210315080045.460331-3-nathanl@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/platforms/pseries/mobility.c | 26 ++++++++++++++++++++++- 1 file changed, 25 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c index a6739ce9feac..e83e0891272d 100644 --- a/arch/powerpc/platforms/pseries/mobility.c +++ b/arch/powerpc/platforms/pseries/mobility.c @@ -458,9 +458,12 @@ static int do_suspend(void) * or if an error is received from H_JOIN. The thread which performs * the first increment (i.e. sets it to 1) is responsible for * waking the other threads. + * @done: False if join/suspend is in progress. True if the operation is + * complete (successful or not). */ struct pseries_suspend_info { atomic_t counter; + bool done; };
static int do_join(void *arg) @@ -470,6 +473,7 @@ static int do_join(void *arg) long hvrc; int ret;
+retry: /* Must ensure MSR.EE off for H_JOIN. */ hard_irq_disable(); hvrc = plpar_hcall_norets(H_JOIN); @@ -485,8 +489,20 @@ static int do_join(void *arg) case H_SUCCESS: /* * The suspend is complete and this cpu has received a - * prod. + * prod, or we've received a stray prod from unrelated + * code (e.g. paravirt spinlocks) and we need to join + * again. + * + * This barrier orders the return from H_JOIN above vs + * the load of info->done. It pairs with the barrier + * in the wakeup/prod path below. */ + smp_mb(); + if (READ_ONCE(info->done) == false) { + pr_info_ratelimited("premature return from H_JOIN on CPU %i, retrying", + smp_processor_id()); + goto retry; + } ret = 0; break; case H_BAD_MODE: @@ -500,6 +516,13 @@ static int do_join(void *arg)
if (atomic_inc_return(counter) == 1) { pr_info("CPU %u waking all threads\n", smp_processor_id()); + WRITE_ONCE(info->done, true); + /* + * This barrier orders the store to info->done vs subsequent + * H_PRODs to wake the other CPUs. It pairs with the barrier + * in the H_SUCCESS case above. + */ + smp_mb(); prod_others(); } /* @@ -553,6 +576,7 @@ static int pseries_suspend(u64 handle)
info = (struct pseries_suspend_info) { .counter = ATOMIC_INIT(0), + .done = false, };
ret = stop_machine(do_join, &info, cpu_online_mask);
From: Andy Shevchenko andriy.shevchenko@linux.intel.com
[ Upstream commit b522f830d35189e0283fa4d5b4b3ef8d7a78cfcb ]
It seems that on Intel Merrifield platform the USB PHY shouldn't be suspended. Otherwise it can't be enabled by simply change the cable in the connector.
Enable corresponding quirk for the platform in question.
Fixes: e5f4ca3fce90 ("usb: dwc3: ulpi: Fix USB2.0 HS/FS/LS PHY suspend regression") Suggested-by: Serge Semin fancer.lancer@gmail.com Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Link: https://lore.kernel.org/r/20210322125244.79407-1-andriy.shevchenko@linux.int... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/dwc3/dwc3-pci.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/usb/dwc3/dwc3-pci.c b/drivers/usb/dwc3/dwc3-pci.c index bae6a70664c8..598daed8086f 100644 --- a/drivers/usb/dwc3/dwc3-pci.c +++ b/drivers/usb/dwc3/dwc3-pci.c @@ -118,6 +118,8 @@ static const struct property_entry dwc3_pci_intel_properties[] = { static const struct property_entry dwc3_pci_mrfld_properties[] = { PROPERTY_ENTRY_STRING("dr_mode", "otg"), PROPERTY_ENTRY_STRING("linux,extcon-name", "mrfld_bcove_pwrsrc"), + PROPERTY_ENTRY_BOOL("snps,dis_u3_susphy_quirk"), + PROPERTY_ENTRY_BOOL("snps,dis_u2_susphy_quirk"), PROPERTY_ENTRY_BOOL("linux,sysdev_is_parent"), {} };
From: Lv Yunlong lyl2019@mail.ustc.edu.cn
[ Upstream commit 37df9f3fedb6aeaff5564145e8162aab912c9284 ]
Function hvfb_probe() calls hvfb_getmem(), expecting upon return that info->apertures is either NULL or points to memory that should be freed by framebuffer_release(). But hvfb_getmem() is freeing the memory and leaving the pointer non-NULL, resulting in a double free if an error occurs or later if hvfb_remove() is called.
Fix this by removing all kfree(info->apertures) calls in hvfb_getmem(). This will allow framebuffer_release() to free the memory, which follows the pattern of other fbdev drivers.
Fixes: 3a6fb6c4255c ("video: hyperv: hyperv_fb: Use physical memory for fb on HyperV Gen 1 VMs.") Signed-off-by: Lv Yunlong lyl2019@mail.ustc.edu.cn Reviewed-by: Michael Kelley mikelley@microsoft.com Link: https://lore.kernel.org/r/20210324103724.4189-1-lyl2019@mail.ustc.edu.cn Signed-off-by: Wei Liu wei.liu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/video/fbdev/hyperv_fb.c | 3 --- 1 file changed, 3 deletions(-)
diff --git a/drivers/video/fbdev/hyperv_fb.c b/drivers/video/fbdev/hyperv_fb.c index c8b0ae676809..4dc9077dd2ac 100644 --- a/drivers/video/fbdev/hyperv_fb.c +++ b/drivers/video/fbdev/hyperv_fb.c @@ -1031,7 +1031,6 @@ static int hvfb_getmem(struct hv_device *hdev, struct fb_info *info) PCI_DEVICE_ID_HYPERV_VIDEO, NULL); if (!pdev) { pr_err("Unable to find PCI Hyper-V video\n"); - kfree(info->apertures); return -ENODEV; }
@@ -1129,7 +1128,6 @@ getmem_done: } else { pci_dev_put(pdev); } - kfree(info->apertures);
return 0;
@@ -1141,7 +1139,6 @@ err2: err1: if (!gen2vm) pci_dev_put(pdev); - kfree(info->apertures);
return -ENOMEM; }
From: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
[ Upstream commit 53f1d31708f6240e4615b0927df31f182e389e2f ]
H_PROTECT expects the flag value to include flags: AVPN, pp0, pp1, pp2, key0-key4, Noexec, CMO Option flags
This patch updates hpte_updatepp() to fetch the storage key value from the linux page table and use the same in H_PROTECT hcall.
native_hpte_updatepp() is not updated because the kernel doesn't clear the existing storage key value there. The kernel also doesn't use hpte_updatepp() callback for updating storage keys.
This fixes the below kernel crash observed with KUAP enabled.
BUG: Unable to handle kernel data access on write at 0xc009fffffc440000 Faulting instruction address: 0xc0000000000b7030 Key fault AMR: 0xfcffffffffffffff IAMR: 0xc0000077bc498100 Found HPTE: v = 0x40070adbb6fffc05 r = 0x1ffffffffff1194 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries ... CFAR: c000000000010100 DAR: c009fffffc440000 DSISR: 02200000 IRQMASK: 0 ... NIP memset+0x68/0x104 LR pcpu_alloc+0x54c/0xb50 Call Trace: pcpu_alloc+0x55c/0xb50 (unreliable) blk_stat_alloc_callback+0x94/0x150 blk_mq_init_allocated_queue+0x64/0x560 blk_mq_init_queue+0x54/0xb0 scsi_mq_alloc_queue+0x30/0xa0 scsi_alloc_sdev+0x1cc/0x300 scsi_probe_and_add_lun+0xb50/0x1020 __scsi_scan_target+0x17c/0x790 scsi_scan_channel+0x90/0xe0 scsi_scan_host_selected+0x148/0x1f0 do_scan_async+0x2c/0x2a0 async_run_entry_fn+0x78/0x220 process_one_work+0x264/0x540 worker_thread+0xa8/0x600 kthread+0x190/0x1a0 ret_from_kernel_thread+0x5c/0x6c
With KUAP enabled the kernel uses storage key 3 for all its translations. But as shown by the debug print, in this specific case we have the hash page table entry created with key value 0.
Found HPTE: v = 0x40070adbb6fffc05 r = 0x1ffffffffff1194
and DSISR indicates a key fault.
This can happen due to parallel fault on the same EA by different CPUs:
CPU 0 CPU 1 fault on X
H_PAGE_BUSY set fault on X
finish fault handling and clear H_PAGE_BUSY check for H_PAGE_BUSY continue with fault handling.
This implies CPU1 will end up calling hpte_updatepp for address X and the kernel updated the hash pte entry with key 0
Fixes: d94b827e89dc ("powerpc/book3s64/kuap: Use Key 3 for kernel mapping with hash translation") Reported-by: Murilo Opsfelder Araujo muriloo@linux.ibm.com Signed-off-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Debugged-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20210326070755.304625-1-aneesh.kumar@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/platforms/pseries/lpar.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c index 764170fdb0f7..3805519a6469 100644 --- a/arch/powerpc/platforms/pseries/lpar.c +++ b/arch/powerpc/platforms/pseries/lpar.c @@ -887,7 +887,8 @@ static long pSeries_lpar_hpte_updatepp(unsigned long slot,
want_v = hpte_encode_avpn(vpn, psize, ssize);
- flags = (newpp & 7) | H_AVPN; + flags = (newpp & (HPTE_R_PP | HPTE_R_N | HPTE_R_KEY_LO)) | H_AVPN; + flags |= (newpp & HPTE_R_KEY_HI) >> 48; if (mmu_has_feature(MMU_FTR_KERNEL_RO)) /* Move pp0 into bit 8 (IBM 55) */ flags |= (newpp & HPTE_R_PP0) >> 55;
From: Zheyu Ma zheyuma97@gmail.com
[ Upstream commit 829933ef05a951c8ff140e814656d73e74915faf ]
For each device, the nosy driver allocates a pcilynx structure. A use-after-free might happen in the following scenario:
1. Open nosy device for the first time and call ioctl with command NOSY_IOC_START, then a new client A will be malloced and added to doubly linked list. 2. Open nosy device for the second time and call ioctl with command NOSY_IOC_START, then a new client B will be malloced and added to doubly linked list. 3. Call ioctl with command NOSY_IOC_START for client A, then client A will be readded to the doubly linked list. Now the doubly linked list is messed up. 4. Close the first nosy device and nosy_release will be called. In nosy_release, client A will be unlinked and freed. 5. Close the second nosy device, and client A will be referenced, resulting in UAF.
The root cause of this bug is that the element in the doubly linked list is reentered into the list.
Fix this bug by adding a check before inserting a client. If a client is already in the linked list, don't insert it.
The following KASAN report reveals it:
BUG: KASAN: use-after-free in nosy_release+0x1ea/0x210 Write of size 8 at addr ffff888102ad7360 by task poc CPU: 3 PID: 337 Comm: poc Not tainted 5.12.0-rc5+ #6 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Call Trace: nosy_release+0x1ea/0x210 __fput+0x1e2/0x840 task_work_run+0xe8/0x180 exit_to_user_mode_prepare+0x114/0x120 syscall_exit_to_user_mode+0x1d/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae
Allocated by task 337: nosy_open+0x154/0x4d0 misc_open+0x2ec/0x410 chrdev_open+0x20d/0x5a0 do_dentry_open+0x40f/0xe80 path_openat+0x1cf9/0x37b0 do_filp_open+0x16d/0x390 do_sys_openat2+0x11d/0x360 __x64_sys_open+0xfd/0x1a0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae
Freed by task 337: kfree+0x8f/0x210 nosy_release+0x158/0x210 __fput+0x1e2/0x840 task_work_run+0xe8/0x180 exit_to_user_mode_prepare+0x114/0x120 syscall_exit_to_user_mode+0x1d/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae
The buggy address belongs to the object at ffff888102ad7300 which belongs to the cache kmalloc-128 of size 128 The buggy address is located 96 bytes inside of 128-byte region [ffff888102ad7300, ffff888102ad7380)
[ Modified to use 'list_empty()' inside proper lock - Linus ]
Link: https://lore.kernel.org/lkml/1617433116-5930-1-git-send-email-zheyuma97@gmai... Reported-and-tested-by: 马哲宇 (Zheyu Ma) zheyuma97@gmail.com Signed-off-by: Zheyu Ma zheyuma97@gmail.com Cc: Greg Kroah-Hartman greg@kroah.com Cc: Stefan Richter stefanr@s5r6.in-berlin.de Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firewire/nosy.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/firewire/nosy.c b/drivers/firewire/nosy.c index 5fd6a60b6741..88ed971e32c0 100644 --- a/drivers/firewire/nosy.c +++ b/drivers/firewire/nosy.c @@ -346,6 +346,7 @@ nosy_ioctl(struct file *file, unsigned int cmd, unsigned long arg) struct client *client = file->private_data; spinlock_t *client_list_lock = &client->lynx->client_list_lock; struct nosy_stats stats; + int ret;
switch (cmd) { case NOSY_IOC_GET_STATS: @@ -360,11 +361,15 @@ nosy_ioctl(struct file *file, unsigned int cmd, unsigned long arg) return 0;
case NOSY_IOC_START: + ret = -EBUSY; spin_lock_irq(client_list_lock); - list_add_tail(&client->link, &client->lynx->client_list); + if (list_empty(&client->link)) { + list_add_tail(&client->link, &client->lynx->client_list); + ret = 0; + } spin_unlock_irq(client_list_lock);
- return 0; + return ret;
case NOSY_IOC_STOP: spin_lock_irq(client_list_lock);
From: Shuah Khan skhan@linuxfoundation.org
commit 1cc5ed25bdade86de2650a82b2730108a76de20c upstream.
Fix shift out-of-bounds in vhci_hub_control() SetPortFeature handling.
UBSAN: shift-out-of-bounds in drivers/usb/usbip/vhci_hcd.c:605:42 shift exponent 768 is too large for 32-bit type 'int'
Reported-by: syzbot+3dea30b047f41084de66@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Signed-off-by: Shuah Khan skhan@linuxfoundation.org Link: https://lore.kernel.org/r/20210324230654.34798-1-skhan@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/usbip/vhci_hcd.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/usb/usbip/vhci_hcd.c +++ b/drivers/usb/usbip/vhci_hcd.c @@ -594,6 +594,8 @@ static int vhci_hub_control(struct usb_h pr_err("invalid port number %d\n", wIndex); goto error; } + if (wValue >= 32) + goto error; if (hcd->speed == HCD_USB3) { if ((vhci_hcd->port_status[rhport] & USB_SS_PORT_STAT_POWER) != 0) {
From: Vincent Palatin vpalatin@chromium.org
commit 0bd860493f81eb2a46173f6f5e44cc38331c8dbd upstream.
This LTE modem (M.2 card) has a bug in its power management: there is some kind of race condition for U3 wake-up between the host and the device. The modem firmware sometimes crashes/locks when both events happen at the same time and the modem fully drops off the USB bus (and sometimes re-enumerates, sometimes just gets stuck until the next reboot).
Tested with the modem wired to the XHCI controller on an AMD 3015Ce platform. Without the patch, the modem dropped of the USB bus 5 times in 3 days. With the quirk, it stayed connected for a week while the 'runtime_suspended_time' counter incremented as excepted.
Signed-off-by: Vincent Palatin vpalatin@chromium.org Link: https://lore.kernel.org/r/20210319124802.2315195-1-vpalatin@chromium.org Cc: stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/core/quirks.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/usb/core/quirks.c +++ b/drivers/usb/core/quirks.c @@ -498,6 +498,10 @@ static const struct usb_device_id usb_qu /* DJI CineSSD */ { USB_DEVICE(0x2ca3, 0x0031), .driver_info = USB_QUIRK_NO_LPM },
+ /* Fibocom L850-GL LTE Modem */ + { USB_DEVICE(0x2cb7, 0x0007), .driver_info = + USB_QUIRK_IGNORE_REMOTE_WAKEUP }, + /* INTEL VALUE SSD */ { USB_DEVICE(0x8086, 0xf1a5), .driver_info = USB_QUIRK_RESET_RESUME },
From: Tony Lindgren tony@atomide.com
commit 92af4fc6ec331228aca322ca37c8aea7b150a151 upstream.
Pinephone running on Allwinner A64 fails to suspend with USB devices connected as reported by Bhushan Shah bshah@kde.org. Reverting commit 5fbf7a253470 ("usb: musb: fix idling for suspend after disconnect interrupt") fixes the issue.
Let's add suspend checks also for suspend after disconnect interrupt quirk handling like we already do elsewhere.
Fixes: 5fbf7a253470 ("usb: musb: fix idling for suspend after disconnect interrupt") Reported-by: Bhushan Shah bshah@kde.org Tested-by: Bhushan Shah bshah@kde.org Signed-off-by: Tony Lindgren tony@atomide.com Link: https://lore.kernel.org/r/20210324071142.42264-1-tony@atomide.com Cc: stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/musb/musb_core.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
--- a/drivers/usb/musb/musb_core.c +++ b/drivers/usb/musb/musb_core.c @@ -2004,10 +2004,14 @@ static void musb_pm_runtime_check_sessio MUSB_DEVCTL_HR; switch (devctl & ~s) { case MUSB_QUIRK_B_DISCONNECT_99: - musb_dbg(musb, "Poll devctl in case of suspend after disconnect\n"); - schedule_delayed_work(&musb->irq_work, - msecs_to_jiffies(1000)); - break; + if (musb->quirk_retries && !musb->flush_irq_work) { + musb_dbg(musb, "Poll devctl in case of suspend after disconnect\n"); + schedule_delayed_work(&musb->irq_work, + msecs_to_jiffies(1000)); + musb->quirk_retries--; + break; + } + fallthrough; case MUSB_QUIRK_B_INVALID_VBUS_91: if (musb->quirk_retries && !musb->flush_irq_work) { musb_dbg(musb,
From: Chunfeng Yun chunfeng.yun@mediatek.com
commit 6f978a30c9bb12dab1302d0f06951ee290f5e600 upstream.
The MediaTek 0.96 xHCI controller on some platforms does not support bulk stream even HCCPARAMS says supporting, due to MaxPSASize is set a default value 1 by mistake, here use XHCI_BROKEN_STREAMS quirk to fix it.
Fixes: 94a631d91ad3 ("usb: xhci-mtk: check hcc_params after adding primary hcd") Cc: stable stable@vger.kernel.org Signed-off-by: Chunfeng Yun chunfeng.yun@mediatek.com Link: https://lore.kernel.org/r/1616482975-17841-4-git-send-email-chunfeng.yun@med... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/host/xhci-mtk.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
--- a/drivers/usb/host/xhci-mtk.c +++ b/drivers/usb/host/xhci-mtk.c @@ -397,6 +397,13 @@ static void xhci_mtk_quirks(struct devic xhci->quirks |= XHCI_SPURIOUS_SUCCESS; if (mtk->lpm_support) xhci->quirks |= XHCI_LPM_SUPPORT; + + /* + * MTK xHCI 0.96: PSA is 1 by default even if doesn't support stream, + * and it's 3 when support it. + */ + if (xhci->hci_version < 0x100 && HCC_MAX_PSA(xhci->hcc_params) == 4) + xhci->quirks |= XHCI_BROKEN_STREAMS; }
/* called during probe() after chip reset completes */ @@ -548,7 +555,8 @@ static int xhci_mtk_probe(struct platfor if (ret) goto put_usb3_hcd;
- if (HCC_MAX_PSA(xhci->hcc_params) >= 4) + if (HCC_MAX_PSA(xhci->hcc_params) >= 4 && + !(xhci->quirks & XHCI_BROKEN_STREAMS)) xhci->shared_hcd->can_do_streams = 1;
ret = usb_add_hcd(xhci->shared_hcd, irq, IRQF_SHARED);
From: Oliver Neukum oneukum@suse.com
commit 08dff274edda54310d6f1cf27b62fddf0f8d146e upstream.
Counting break events is nice but we should actually report them to the tty layer.
Fixes: 5a6a62bdb9257 ("cdc-acm: add TIOCMIWAIT") Signed-off-by: Oliver Neukum oneukum@suse.com Link: https://lore.kernel.org/r/20210311133714.31881-1-oneukum@suse.com Cc: stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/class/cdc-acm.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/usb/class/cdc-acm.c +++ b/drivers/usb/class/cdc-acm.c @@ -313,8 +313,10 @@ static void acm_process_notification(str acm->iocount.dsr++; if (difference & ACM_CTRL_DCD) acm->iocount.dcd++; - if (newctrl & ACM_CTRL_BRK) + if (newctrl & ACM_CTRL_BRK) { acm->iocount.brk++; + tty_insert_flip_char(&acm->port, 0, TTY_BREAK); + } if (newctrl & ACM_CTRL_RI) acm->iocount.rng++; if (newctrl & ACM_CTRL_FRAMING)
From: Oliver Neukum oneukum@suse.com
commit 6069e3e927c8fb3a1947b07d1a561644ea960248 upstream.
We have a cycle of callbacks scheduling works which submit URBs with thos callbacks. This needs to be blocked, stopped and unblocked to untangle the circle.
The issue leads to faults like:
[ 55.068392] Unable to handle kernel paging request at virtual address 6b6b6c03 [ 55.075624] pgd = be866494 [ 55.078335] [6b6b6c03] *pgd=00000000 [ 55.081924] Internal error: Oops: 5 [#1] PREEMPT SMP ARM [ 55.087238] Modules linked in: ppp_async crc_ccitt ppp_generic slhc xt_TCPMSS xt_tcpmss xt_hl nf_log_ipv6 nf_log_ipv4 nf_log_common xt_policy xt_limit xt_conntrack xt_tcpudp xt_pkttype ip6table_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle ip6table_filter ip6_tables iptable_filter ip_tables des_generic md5 sch_fq_codel cdc_mbim cdc_wdm cdc_ncm usbnet mii cdc_acm usb_storage ip_tunnel xfrm_user xfrm6_tunnel tunnel6 xfrm4_tunnel tunnel4 esp6 esp4 ah6 ah4 xfrm_algo xt_LOG xt_LED xt_comment x_tables ipv6 [ 55.134954] CPU: 0 PID: 82 Comm: kworker/0:2 Tainted: G T 5.8.17 #1 [ 55.142526] Hardware name: Freescale i.MX7 Dual (Device Tree) [ 55.148304] Workqueue: events acm_softint [cdc_acm] [ 55.153196] PC is at kobject_get+0x10/0xa4 [ 55.157302] LR is at usb_get_dev+0x14/0x1c [ 55.161402] pc : [<8047c06c>] lr : [<80560448>] psr: 20000193 [ 55.167671] sp : bca39ea8 ip : 00007374 fp : bf6cbd80 [ 55.172899] r10: 00000000 r9 : bdd92284 r8 : bdd92008 [ 55.178128] r7 : 6b6b6b6b r6 : fffffffe r5 : 60000113 r4 : 6b6b6be3 [ 55.184658] r3 : 6b6b6b6b r2 : 00000111 r1 : 00000000 r0 : 6b6b6be3 [ 55.191191] Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none [ 55.198417] Control: 10c5387d Table: bcf0c06a DAC: 00000051 [ 55.204168] Process kworker/0:2 (pid: 82, stack limit = 0x9bdd2a89) [ 55.210439] Stack: (0xbca39ea8 to 0xbca3a000) [ 55.214805] 9ea0: bf6cbd80 80769a50 6b6b6b6b 80560448 bdeb0500 8056bfe8 [ 55.222991] 9ec0: 00000002 b76da000 00000000 bdeb0500 bdd92448 bca38000 bdeb0510 8056d69c [ 55.231177] 9ee0: bca38000 00000000 80c050fc 00000000 bca39f44 09d42015 00000000 00000001 [ 55.239363] 9f00: bdd92448 bdd92438 bdd92000 7f1158c4 bdd92448 bca2ee00 bf6cbd80 bf6cef00 [ 55.247549] 9f20: 00000000 00000000 00000000 801412d8 bf6cbd98 80c03d00 bca2ee00 bf6cbd80 [ 55.255735] 9f40: bca2ee14 bf6cbd98 80c03d00 00000008 bca38000 80141568 00000000 80c446ae [ 55.263921] 9f60: 00000000 bc9ed880 bc9f0700 bca38000 bc117eb4 80141524 bca2ee00 bc9ed8a4 [ 55.272107] 9f80: 00000000 80147cc8 00000000 bc9f0700 80147b84 00000000 00000000 00000000 [ 55.280292] 9fa0: 00000000 00000000 00000000 80100148 00000000 00000000 00000000 00000000 [ 55.288477] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 55.296662] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000 [ 55.304860] [<8047c06c>] (kobject_get) from [<80560448>] (usb_get_dev+0x14/0x1c) [ 55.312271] [<80560448>] (usb_get_dev) from [<8056bfe8>] (usb_hcd_unlink_urb+0x50/0xd8) [ 55.320286] [<8056bfe8>] (usb_hcd_unlink_urb) from [<8056d69c>] (usb_kill_urb.part.0+0x44/0xd0) [ 55.329004] [<8056d69c>] (usb_kill_urb.part.0) from [<7f1158c4>] (acm_softint+0x4c/0x10c [cdc_acm]) [ 55.338082] [<7f1158c4>] (acm_softint [cdc_acm]) from [<801412d8>] (process_one_work+0x19c/0x3e8) [ 55.346969] [<801412d8>] (process_one_work) from [<80141568>] (worker_thread+0x44/0x4dc) [ 55.355072] [<80141568>] (worker_thread) from [<80147cc8>] (kthread+0x144/0x180) [ 55.362481] [<80147cc8>] (kthread) from [<80100148>] (ret_from_fork+0x14/0x2c) [ 55.369706] Exception stack(0xbca39fb0 to 0xbca39ff8)
Tested-by: Bruno Thomsen bruno.thomsen@gmail.com Signed-off-by: Oliver Neukum oneukum@suse.com Cc: stable stable@vger.kernel.org Link: https://lore.kernel.org/r/20210311130126.15972-1-oneukum@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/class/cdc-acm.c | 48 +++++++++++++++++++++++++++++--------------- 1 file changed, 32 insertions(+), 16 deletions(-)
--- a/drivers/usb/class/cdc-acm.c +++ b/drivers/usb/class/cdc-acm.c @@ -147,17 +147,29 @@ static inline int acm_set_control(struct #define acm_send_break(acm, ms) \ acm_ctrl_msg(acm, USB_CDC_REQ_SEND_BREAK, ms, NULL, 0)
-static void acm_kill_urbs(struct acm *acm) +static void acm_poison_urbs(struct acm *acm) { int i;
- usb_kill_urb(acm->ctrlurb); + usb_poison_urb(acm->ctrlurb); for (i = 0; i < ACM_NW; i++) - usb_kill_urb(acm->wb[i].urb); + usb_poison_urb(acm->wb[i].urb); for (i = 0; i < acm->rx_buflimit; i++) - usb_kill_urb(acm->read_urbs[i]); + usb_poison_urb(acm->read_urbs[i]); +} + +static void acm_unpoison_urbs(struct acm *acm) +{ + int i; + + for (i = 0; i < acm->rx_buflimit; i++) + usb_unpoison_urb(acm->read_urbs[i]); + for (i = 0; i < ACM_NW; i++) + usb_unpoison_urb(acm->wb[i].urb); + usb_unpoison_urb(acm->ctrlurb); }
+ /* * Write buffer management. * All of these assume proper locks taken by the caller. @@ -226,9 +238,10 @@ static int acm_start_wb(struct acm *acm,
rc = usb_submit_urb(wb->urb, GFP_ATOMIC); if (rc < 0) { - dev_err(&acm->data->dev, - "%s - usb_submit_urb(write bulk) failed: %d\n", - __func__, rc); + if (rc != -EPERM) + dev_err(&acm->data->dev, + "%s - usb_submit_urb(write bulk) failed: %d\n", + __func__, rc); acm_write_done(acm, wb); } return rc; @@ -482,11 +495,6 @@ static void acm_read_bulk_callback(struc dev_vdbg(&acm->data->dev, "got urb %d, len %d, status %d\n", rb->index, urb->actual_length, status);
- if (!acm->dev) { - dev_dbg(&acm->data->dev, "%s - disconnected\n", __func__); - return; - } - switch (status) { case 0: usb_mark_last_busy(acm->dev); @@ -733,6 +741,7 @@ static void acm_port_shutdown(struct tty * Need to grab write_lock to prevent race with resume, but no need to * hold it due to the tty-port initialised flag. */ + acm_poison_urbs(acm); spin_lock_irq(&acm->write_lock); spin_unlock_irq(&acm->write_lock);
@@ -749,7 +758,8 @@ static void acm_port_shutdown(struct tty usb_autopm_put_interface_async(acm->control); }
- acm_kill_urbs(acm); + acm_unpoison_urbs(acm); + }
static void acm_tty_cleanup(struct tty_struct *tty) @@ -1542,8 +1552,14 @@ static void acm_disconnect(struct usb_in if (!acm) return;
- mutex_lock(&acm->mutex); acm->disconnected = true; + /* + * there is a circular dependency. acm_softint() can resubmit + * the URBs in error handling so we need to block any + * submission right away + */ + acm_poison_urbs(acm); + mutex_lock(&acm->mutex); if (acm->country_codes) { device_remove_file(&acm->control->dev, &dev_attr_wCountryCodes); @@ -1562,7 +1578,6 @@ static void acm_disconnect(struct usb_in tty_kref_put(tty); }
- acm_kill_urbs(acm); cancel_delayed_work_sync(&acm->dwork);
tty_unregister_device(acm_tty_driver, acm->minor); @@ -1604,7 +1619,7 @@ static int acm_suspend(struct usb_interf if (cnt) return 0;
- acm_kill_urbs(acm); + acm_poison_urbs(acm); cancel_delayed_work_sync(&acm->dwork); acm->urbs_in_error_delay = 0;
@@ -1617,6 +1632,7 @@ static int acm_resume(struct usb_interfa struct urb *urb; int rv = 0;
+ acm_unpoison_urbs(acm); spin_lock_irq(&acm->write_lock);
if (--acm->susp_count)
From: Oliver Neukum oneukum@suse.com
commit e4c77070ad45fc940af1d7fb1e637c349e848951 upstream.
This failure is so common that logging an error here amounts to spamming log files.
Reviewed-by: Bruno Thomsen bruno.thomsen@gmail.com Signed-off-by: Oliver Neukum oneukum@suse.com Cc: stable stable@vger.kernel.org Link: https://lore.kernel.org/r/20210311130126.15972-2-oneukum@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/class/cdc-acm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/usb/class/cdc-acm.c +++ b/drivers/usb/class/cdc-acm.c @@ -659,7 +659,8 @@ static void acm_port_dtr_rts(struct tty_
res = acm_set_control(acm, val); if (res && (acm->ctrl_caps & USB_CDC_CAP_LINE)) - dev_err(&acm->control->dev, "failed to set dtr/rts\n"); + /* This is broken in too many devices to spam the logs */ + dev_dbg(&acm->control->dev, "failed to set dtr/rts\n"); }
static int acm_port_activate(struct tty_port *port, struct tty_struct *tty)
From: Johan Hovold johan@kernel.org
commit 7180495cb3d0e2a2860d282a468b4146c21da78f upstream.
If tty-device registration fails the driver copy of any Country Selection functional descriptor would end up being freed twice; first explicitly in the error path and then again in the tty-port destructor.
Drop the first erroneous free that was left when fixing a tty-port resource leak.
Fixes: cae2bc768d17 ("usb: cdc-acm: Decrement tty port's refcount if probe() fail") Cc: stable@vger.kernel.org # 4.19 Cc: Jaejoong Kim climbbb.kim@gmail.com Acked-by: Oliver Neukum oneukum@suse.com Signed-off-by: Johan Hovold johan@kernel.org Link: https://lore.kernel.org/r/20210322155318.9837-2-johan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/class/cdc-acm.c | 1 - 1 file changed, 1 deletion(-)
--- a/drivers/usb/class/cdc-acm.c +++ b/drivers/usb/class/cdc-acm.c @@ -1521,7 +1521,6 @@ alloc_fail6: &dev_attr_wCountryCodes); device_remove_file(&acm->control->dev, &dev_attr_iCountryCodeRelDate); - kfree(acm->country_codes); } device_remove_file(&acm->control->dev, &dev_attr_bmCapabilities); alloc_fail5:
From: Johan Hovold johan@kernel.org
commit 4e49bf376c0451ad2eae2592e093659cde12be9a upstream.
If tty-device registration fails the driver would fail to release the data interface. When the device is later disconnected, the disconnect callback would still be called for the data interface and would go about releasing already freed resources.
Fixes: c93d81955005 ("usb: cdc-acm: fix error handling in acm_probe()") Cc: stable@vger.kernel.org # 3.9 Cc: Alexey Khoroshilov khoroshilov@ispras.ru Acked-by: Oliver Neukum oneukum@suse.com Signed-off-by: Johan Hovold johan@kernel.org Link: https://lore.kernel.org/r/20210322155318.9837-3-johan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/class/cdc-acm.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/drivers/usb/class/cdc-acm.c +++ b/drivers/usb/class/cdc-acm.c @@ -1516,6 +1516,11 @@ skip_countries:
return 0; alloc_fail6: + if (!acm->combined_interfaces) { + /* Clear driver data so that disconnect() returns early. */ + usb_set_intfdata(data_interface, NULL); + usb_driver_release_interface(&acm_driver, data_interface); + } if (acm->country_codes) { device_remove_file(&acm->control->dev, &dev_attr_wCountryCodes);
From: Tong Zhang ztong0001@gmail.com
commit 72035f4954f0bca2d8c47cf31b3629c42116f5b7 upstream.
init_dma_pools() calls dma_pool_create(...dev->dev) to create dma pool. however, dev->dev is actually set after calling init_dma_pools(), which effectively makes dma_pool_create(..NULL) and cause crash. To fix this issue, init dma only after dev->dev is set.
[ 1.317993] RIP: 0010:dma_pool_create+0x83/0x290 [ 1.323257] Call Trace: [ 1.323390] ? pci_write_config_word+0x27/0x30 [ 1.323626] init_dma_pools+0x41/0x1a0 [snps_udc_core] [ 1.323899] udc_pci_probe+0x202/0x2b1 [amd5536udc_pci]
Fixes: 7c51247a1f62 (usb: gadget: udc: Provide correct arguments for 'dma_pool_create') Cc: stable stable@vger.kernel.org Signed-off-by: Tong Zhang ztong0001@gmail.com Link: https://lore.kernel.org/r/20210317230400.357756-1-ztong0001@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/gadget/udc/amd5536udc_pci.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/drivers/usb/gadget/udc/amd5536udc_pci.c +++ b/drivers/usb/gadget/udc/amd5536udc_pci.c @@ -153,6 +153,11 @@ static int udc_pci_probe( pci_set_master(pdev); pci_try_set_mwi(pdev);
+ dev->phys_addr = resource; + dev->irq = pdev->irq; + dev->pdev = pdev; + dev->dev = &pdev->dev; + /* init dma pools */ if (use_dma) { retval = init_dma_pools(dev); @@ -160,11 +165,6 @@ static int udc_pci_probe( goto err_dma; }
- dev->phys_addr = resource; - dev->irq = pdev->irq; - dev->pdev = pdev; - dev->dev = &pdev->dev; - /* general probing */ if (udc_probe(dev)) { retval = -ENODEV;
From: Artur Petrosyan Arthur.Petrosyan@synopsys.com
commit 5e3bbae8ee3d677a0aa2919dc62b5c60ea01ba61 upstream.
Increased the waiting timeout for HPRT0.PrtSusp register field to be set, because on HiKey 960 board HPRT0.PrtSusp wasn't generated with the existing timeout.
Cc: stable@vger.kernel.org # 4.18 Fixes: 22bb5cfdf13a ("usb: dwc2: Fix host exit from hibernation flow.") Signed-off-by: Artur Petrosyan Arthur.Petrosyan@synopsys.com Acked-by: Minas Harutyunyan Minas.Harutyunyan@synopsys.com Link: https://lore.kernel.org/r/20210326102447.8F7FEA005D@mailhost.synopsys.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/dwc2/hcd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/usb/dwc2/hcd.c +++ b/drivers/usb/dwc2/hcd.c @@ -5398,7 +5398,7 @@ int dwc2_host_enter_hibernation(struct d dwc2_writel(hsotg, hprt0, HPRT0);
/* Wait for the HPRT0.PrtSusp register field to be set */ - if (dwc2_hsotg_wait_bit_set(hsotg, HPRT0, HPRT0_SUSP, 3000)) + if (dwc2_hsotg_wait_bit_set(hsotg, HPRT0, HPRT0_SUSP, 5000)) dev_warn(hsotg->dev, "Suspend wasn't generated\n");
/*
From: Artur Petrosyan Arthur.Petrosyan@synopsys.com
commit 93f672804bf2d7a49ef3fd96827ea6290ca1841e upstream.
In host mode port connection status flag is "0" when loading the driver. After loading the driver system asserts suspend which is handled by "_dwc2_hcd_suspend()" function. Before the system suspend the port connection status is "0". As result need to check the "port_connect_status" if it is "0", then skipping entering to suspend.
Cc: stable@vger.kernel.org # 5.2 Fixes: 6f6d70597c15 ("usb: dwc2: bus suspend/resume for hosts with DWC2_POWER_DOWN_PARAM_NONE") Signed-off-by: Artur Petrosyan Arthur.Petrosyan@synopsys.com Link: https://lore.kernel.org/r/20210326102510.BDEDEA005D@mailhost.synopsys.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/dwc2/hcd.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/usb/dwc2/hcd.c +++ b/drivers/usb/dwc2/hcd.c @@ -4322,7 +4322,8 @@ static int _dwc2_hcd_suspend(struct usb_ if (hsotg->op_state == OTG_STATE_B_PERIPHERAL) goto unlock;
- if (hsotg->params.power_down > DWC2_POWER_DOWN_PARAM_PARTIAL) + if (hsotg->params.power_down != DWC2_POWER_DOWN_PARAM_PARTIAL || + hsotg->flags.b.port_connect_status == 0) goto skip_power_saving;
/*
From: Shawn Guo shawn.guo@linaro.org
commit 5e4010e36a58978e42b2ee13739ff9b50209c830 upstream.
The ACPI probe starts failing since commit bea46b981515 ("usb: dwc3: qcom: Add interconnect support in dwc3 driver"), because there is no interconnect support for ACPI, and of_icc_get() call in dwc3_qcom_interconnect_init() will just return -EINVAL.
Fix the problem by skipping interconnect init for ACPI probe, and then the NULL icc_path_ddr will simply just scheild all ICC calls.
Fixes: bea46b981515 ("usb: dwc3: qcom: Add interconnect support in dwc3 driver") Signed-off-by: Shawn Guo shawn.guo@linaro.org Cc: stable stable@vger.kernel.org Link: https://lore.kernel.org/r/20210311060318.25418-1-shawn.guo@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/dwc3/dwc3-qcom.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/usb/dwc3/dwc3-qcom.c +++ b/drivers/usb/dwc3/dwc3-qcom.c @@ -244,6 +244,9 @@ static int dwc3_qcom_interconnect_init(s struct device *dev = qcom->dev; int ret;
+ if (has_acpi_companion(dev)) + return 0; + qcom->icc_path_ddr = of_icc_get(dev, "usb-ddr"); if (IS_ERR(qcom->icc_path_ddr)) { dev_err(dev, "failed to get usb-ddr path: %ld\n",
From: Wesley Cheng wcheng@codeaurora.org
commit 5aef629704ad4d983ecf5c8a25840f16e45b6d59 upstream.
Ensure that dep->flags are cleared until after stop active transfers is completed. Otherwise, the ENDXFER command will not be executed during ep disable.
Fixes: f09ddcfcb8c5 ("usb: dwc3: gadget: Prevent EP queuing while stopping transfers") Cc: stable stable@vger.kernel.org Reported-and-tested-by: Andy Shevchenko andy.shevchenko@gmail.com Tested-by: Marek Szyprowski m.szyprowski@samsung.com Signed-off-by: Wesley Cheng wcheng@codeaurora.org Link: https://lore.kernel.org/r/1616610664-16495-1-git-send-email-wcheng@codeauror... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/dwc3/gadget.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -791,10 +791,6 @@ static int __dwc3_gadget_ep_disable(stru reg &= ~DWC3_DALEPENA_EP(dep->number); dwc3_writel(dwc->regs, DWC3_DALEPENA, reg);
- dep->stream_capable = false; - dep->type = 0; - dep->flags = 0; - /* Clear out the ep descriptors for non-ep0 */ if (dep->number > 1) { dep->endpoint.comp_desc = NULL; @@ -803,6 +799,10 @@ static int __dwc3_gadget_ep_disable(stru
dwc3_remove_requests(dwc, dep);
+ dep->stream_capable = false; + dep->type = 0; + dep->flags = 0; + return 0; }
From: Roja Rani Yarubandi rojay@codeaurora.org
commit 29d96eb261345c8d888e248ae79484e681be2faa upstream.
This reverts commit 048eb908a1f2 ("soc: qcom-geni-se: Add interconnect support to fix earlycon crash")
ICC core and platforms drivers supports sync_state feature, which ensures that the default ICC BW votes from the bootloader is not removed until all it's consumers are probes.
The proxy votes were needed in case other QUP child drivers I2C, SPI probes before UART, they can turn off the QUP-CORE clock which is shared resources for all QUP driver, this causes unclocked access to HW from earlycon.
Given above support from ICC there is no longer need to maintain proxy votes on QUP-CORE ICC node from QUP wrapper driver for early console usecase, the default votes won't be removed until real console is probed.
Cc: stable@vger.kernel.org Fixes: 266cd33b5913 ("interconnect: qcom: Ensure that the floor bandwidth value is enforced") Fixes: 7d3b0b0d8184 ("interconnect: qcom: Use icc_sync_state") Signed-off-by: Roja Rani Yarubandi rojay@codeaurora.org Signed-off-by: Akash Asthana akashast@codeaurora.org Reviewed-by: Matthias Kaehlcke mka@chromium.org Link: https://lore.kernel.org/r/20210324101836.25272-2-rojay@codeaurora.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/soc/qcom/qcom-geni-se.c | 74 ---------------------------------- drivers/tty/serial/qcom_geni_serial.c | 7 --- include/linux/qcom-geni-se.h | 2 3 files changed, 83 deletions(-)
--- a/drivers/soc/qcom/qcom-geni-se.c +++ b/drivers/soc/qcom/qcom-geni-se.c @@ -3,7 +3,6 @@
#include <linux/acpi.h> #include <linux/clk.h> -#include <linux/console.h> #include <linux/slab.h> #include <linux/dma-mapping.h> #include <linux/io.h> @@ -92,14 +91,11 @@ struct geni_wrapper { struct device *dev; void __iomem *base; struct clk_bulk_data ahb_clks[NUM_AHB_CLKS]; - struct geni_icc_path to_core; };
static const char * const icc_path_names[] = {"qup-core", "qup-config", "qup-memory"};
-static struct geni_wrapper *earlycon_wrapper; - #define QUP_HW_VER_REG 0x4
/* Common SE registers */ @@ -843,44 +839,11 @@ int geni_icc_disable(struct geni_se *se) } EXPORT_SYMBOL(geni_icc_disable);
-void geni_remove_earlycon_icc_vote(void) -{ - struct platform_device *pdev; - struct geni_wrapper *wrapper; - struct device_node *parent; - struct device_node *child; - - if (!earlycon_wrapper) - return; - - wrapper = earlycon_wrapper; - parent = of_get_next_parent(wrapper->dev->of_node); - for_each_child_of_node(parent, child) { - if (!of_device_is_compatible(child, "qcom,geni-se-qup")) - continue; - - pdev = of_find_device_by_node(child); - if (!pdev) - continue; - - wrapper = platform_get_drvdata(pdev); - icc_put(wrapper->to_core.path); - wrapper->to_core.path = NULL; - - } - of_node_put(parent); - - earlycon_wrapper = NULL; -} -EXPORT_SYMBOL(geni_remove_earlycon_icc_vote); - static int geni_se_probe(struct platform_device *pdev) { struct device *dev = &pdev->dev; struct resource *res; struct geni_wrapper *wrapper; - struct console __maybe_unused *bcon; - bool __maybe_unused has_earlycon = false; int ret;
wrapper = devm_kzalloc(dev, sizeof(*wrapper), GFP_KERNEL); @@ -903,43 +866,6 @@ static int geni_se_probe(struct platform } }
-#ifdef CONFIG_SERIAL_EARLYCON - for_each_console(bcon) { - if (!strcmp(bcon->name, "qcom_geni")) { - has_earlycon = true; - break; - } - } - if (!has_earlycon) - goto exit; - - wrapper->to_core.path = devm_of_icc_get(dev, "qup-core"); - if (IS_ERR(wrapper->to_core.path)) - return PTR_ERR(wrapper->to_core.path); - /* - * Put minmal BW request on core clocks on behalf of early console. - * The vote will be removed earlycon exit function. - * - * Note: We are putting vote on each QUP wrapper instead only to which - * earlycon is connected because QUP core clock of different wrapper - * share same voltage domain. If core1 is put to 0, then core2 will - * also run at 0, if not voted. Default ICC vote will be removed ASA - * we touch any of the core clock. - * core1 = core2 = max(core1, core2) - */ - ret = icc_set_bw(wrapper->to_core.path, GENI_DEFAULT_BW, - GENI_DEFAULT_BW); - if (ret) { - dev_err(&pdev->dev, "%s: ICC BW voting failed for core: %d\n", - __func__, ret); - return ret; - } - - if (of_get_compatible_child(pdev->dev.of_node, "qcom,geni-debug-uart")) - earlycon_wrapper = wrapper; - of_node_put(pdev->dev.of_node); -exit: -#endif dev_set_drvdata(dev, wrapper); dev_dbg(dev, "GENI SE Driver probed\n"); return devm_of_platform_populate(dev); --- a/drivers/tty/serial/qcom_geni_serial.c +++ b/drivers/tty/serial/qcom_geni_serial.c @@ -1177,12 +1177,6 @@ static inline void qcom_geni_serial_enab struct console *con) { } #endif
-static int qcom_geni_serial_earlycon_exit(struct console *con) -{ - geni_remove_earlycon_icc_vote(); - return 0; -} - static struct qcom_geni_private_data earlycon_private_data;
static int __init qcom_geni_serial_earlycon_setup(struct earlycon_device *dev, @@ -1233,7 +1227,6 @@ static int __init qcom_geni_serial_early writel(stop_bit_len, uport->membase + SE_UART_TX_STOP_BIT_LEN);
dev->con->write = qcom_geni_serial_earlycon_write; - dev->con->exit = qcom_geni_serial_earlycon_exit; dev->con->setup = NULL; qcom_geni_serial_enable_early_read(&se, dev->con);
--- a/include/linux/qcom-geni-se.h +++ b/include/linux/qcom-geni-se.h @@ -460,7 +460,5 @@ void geni_icc_set_tag(struct geni_se *se int geni_icc_enable(struct geni_se *se);
int geni_icc_disable(struct geni_se *se); - -void geni_remove_earlycon_icc_vote(void); #endif #endif
From: Atul Gopinathan atulgopinathan@gmail.com
commit 72ad25fbbb78930f892b191637359ab5b94b3190 upstream.
The variable "info_element" is of the following type:
struct rtllib_info_element *info_element
defined in drivers/staging/rtl8192e/rtllib.h:
struct rtllib_info_element { u8 id; u8 len; u8 data[]; } __packed;
The "len" field defines the size of the "data[]" array. The code is supposed to check if "info_element->len" is greater than 4 and later equal to 6. If this is satisfied then, the last two bytes (the 4th and 5th element of u8 "data[]" array) are copied into "network->CcxRmState".
Right now the code uses "memcpy()" with the source as "&info_element[4]" which would copy in wrong and unintended information. The struct "rtllib_info_element" has a size of 2 bytes for "id" and "len", therefore indexing will be done in interval of 2 bytes. So, "info_element[4]" would point to data which is beyond the memory allocated for this pointer (that is, at x+8, while "info_element" has been allocated only from x to x+7 (2 + 6 => 8 bytes)).
This patch rectifies this error by using "&info_element->data[4]" which correctly copies the last two bytes of "data[]".
NOTE: The faulty line of code came from the following commit:
commit ecdfa44610fa ("Staging: add Realtek 8192 PCI wireless driver")
The above commit created the file `rtl8192e/ieee80211/ieee80211_rx.c` which had the faulty line of code. This file has been deleted (or possibly renamed) with the contents copied in to a new file `rtl8192e/rtllib_rx.c` along with additional code in the commit 94a799425eee (tagged in Fixes).
Fixes: 94a799425eee ("From: wlanfae wlanfae@realtek.com [PATCH 1/8] rtl8192e: Import new version of driver from realtek") Cc: stable@vger.kernel.org Signed-off-by: Atul Gopinathan atulgopinathan@gmail.com Link: https://lore.kernel.org/r/20210323113413.29179-1-atulgopinathan@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/staging/rtl8192e/rtllib_rx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/staging/rtl8192e/rtllib_rx.c +++ b/drivers/staging/rtl8192e/rtllib_rx.c @@ -1968,7 +1968,7 @@ static void rtllib_parse_mife_generic(st info_element->data[2] == 0x96 && info_element->data[3] == 0x01) { if (info_element->len == 6) { - memcpy(network->CcxRmState, &info_element[4], 2); + memcpy(network->CcxRmState, &info_element->data[4], 2); if (network->CcxRmState[0] != 0) network->bCcxRmEnable = true; else
From: Atul Gopinathan atulgopinathan@gmail.com
commit e78836ae76d20f38eed8c8c67f21db97529949da upstream.
The "u16 CcxRmState[2];" array field in struct "rtllib_network" has 4 bytes in total while the operations performed on this array through-out the code base are only 2 bytes.
The "CcxRmState" field is fed only 2 bytes of data using memcpy():
(In rtllib_rx.c:1972) memcpy(network->CcxRmState, &info_element->data[4], 2)
With "info_element->data[]" being a u8 array, if 2 bytes are written into "CcxRmState" (whose one element is u16 size), then the 2 u8 elements from "data[]" gets squashed and written into the first element ("CcxRmState[0]") while the second element ("CcxRmState[1]") is never fed with any data.
Same in file rtllib_rx.c:2522: memcpy(dst->CcxRmState, src->CcxRmState, 2);
The above line duplicates "src" data to "dst" but only writes 2 bytes (and not 4, which is the actual size). Again, only 1st element gets the value while the 2nd element remains uninitialized.
This later makes operations done with CcxRmState unpredictable in the following lines as the 1st element is having a squashed number while the 2nd element is having an uninitialized random number.
rtllib_rx.c:1973: if (network->CcxRmState[0] != 0) rtllib_rx.c:1977: network->MBssidMask = network->CcxRmState[1] & 0x07;
network->MBssidMask is also of type u8 and not u16.
Fix this by changing the type of "CcxRmState" from u16 to u8 so that the data written into this array and read from it make sense and are not random values.
NOTE: The wrong initialization of "CcxRmState" can be seen in the following commit:
commit ecdfa44610fa ("Staging: add Realtek 8192 PCI wireless driver")
The above commit created a file `rtl8192e/ieee80211.h` which used to have the faulty line. The file has been deleted (or possibly renamed) with the contents copied in to a new file `rtl8192e/rtllib.h` along with additional code in the commit 94a799425eee (tagged in Fixes).
Fixes: 94a799425eee ("From: wlanfae wlanfae@realtek.com [PATCH 1/8] rtl8192e: Import new version of driver from realtek") Cc: stable@vger.kernel.org Signed-off-by: Atul Gopinathan atulgopinathan@gmail.com Link: https://lore.kernel.org/r/20210323113413.29179-2-atulgopinathan@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/staging/rtl8192e/rtllib.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/staging/rtl8192e/rtllib.h +++ b/drivers/staging/rtl8192e/rtllib.h @@ -1105,7 +1105,7 @@ struct rtllib_network { bool bWithAironetIE; bool bCkipSupported; bool bCcxRmEnable; - u16 CcxRmState[2]; + u8 CcxRmState[2]; bool bMBssidValid; u8 MBssidMask; u8 MBssid[ETH_ALEN];
From: Ahmad Fatoum a.fatoum@pengutronix.de
commit f0acf637d60ffcef3ccb6e279f743e587b3c7359 upstream.
When retrying a deferred probe, any old defer reason string should be discarded. Otherwise, if the probe is deferred again at a different spot, but without setting a message, the now incorrect probe reason will remain.
This was observed with the i.MX I2C driver, which ultimately failed to probe due to lack of the GPIO driver. The probe defer for GPIO doesn't record a message, but a previous probe defer to clock_get did. This had the effect that /sys/kernel/debug/devices_deferred listed a misleading probe deferral reason.
Cc: stable stable@vger.kernel.org Fixes: d090b70ede02 ("driver core: add deferring probe reason to devices_deferred property") Reviewed-by: Andy Shevchenko andy.shevchenko@gmail.com Reviewed-by: Andrzej Hajda a.hajda@samsung.com Signed-off-by: Ahmad Fatoum a.fatoum@pengutronix.de Link: https://lore.kernel.org/r/20210319110459.19966-1-a.fatoum@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/base/dd.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/base/dd.c +++ b/drivers/base/dd.c @@ -97,6 +97,9 @@ static void deferred_probe_work_func(str
get_device(dev);
+ kfree(dev->p->deferred_probe_reason); + dev->p->deferred_probe_reason = NULL; + /* * Drop the mutex while probing each device; the probe path may * manipulate the deferred list
From: Du Cheng ducheng2@gmail.com
commit 01faae5193d6190b7b3aa93dae43f514e866d652 upstream.
add null-check on function pointer before dereference on ops->cursor
Reported-by: syzbot+b67aaae8d3a927f68d20@syzkaller.appspotmail.com Cc: stable stable@vger.kernel.org Signed-off-by: Du Cheng ducheng2@gmail.com Link: https://lore.kernel.org/r/20210312081421.452405-1-ducheng2@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/video/fbdev/core/fbcon.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/video/fbdev/core/fbcon.c +++ b/drivers/video/fbdev/core/fbcon.c @@ -1341,6 +1341,9 @@ static void fbcon_cursor(struct vc_data
ops->cursor_flash = (mode == CM_ERASE) ? 0 : 1;
+ if (!ops->cursor) + return; + ops->cursor(vc, info, mode, get_color(vc, info, c, 1), get_color(vc, info, c, 0)); }
From: Ben Dooks ben.dooks@codethink.co.uk
commit 285a76bb2cf51b0c74c634f2aaccdb93e1f2a359 upstream.
The <asm/uaccess.h> header has a problem with put_user(a, ptr) if the 'a' is not a simple variable, such as a function. This can lead to the compiler producing code as so:
1: enable_user_access() 2: evaluate 'a' into register 'r' 3: put 'r' to 'ptr' 4: disable_user_acess()
The issue is that 'a' is now being evaluated with the user memory protections disabled. So we try and force the evaulation by assigning 'x' to __val at the start, and hoping the compiler barriers in enable_user_access() do the job of ordering step 2 before step 1.
This has shown up in a bug where 'a' sleeps and thus schedules out and loses the SR_SUM flag. This isn't sufficient to fully fix, but should reduce the window of opportunity. The first instance of this we found is in scheudle_tail() where the code does:
$ less -N kernel/sched/core.c
4263 if (current->set_child_tid) 4264 put_user(task_pid_vnr(current), current->set_child_tid);
Here, the task_pid_vnr(current) is called within the block that has enabled the user memory access. This can be made worse with KASAN which makes task_pid_vnr() a rather large call with plenty of opportunity to sleep.
Signed-off-by: Ben Dooks ben.dooks@codethink.co.uk Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com Suggested-by: Arnd Bergman arnd@arndb.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
-- Changes since v1: - fixed formatting and updated the patch description with more info
Changes since v2: - fixed commenting on __put_user() (schwab@linux-m68k.org)
Change since v3: - fixed RFC in patch title. Should be ready to merge.
Signed-off-by: Palmer Dabbelt palmerdabbelt@google.com --- arch/riscv/include/asm/uaccess.h | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/arch/riscv/include/asm/uaccess.h +++ b/arch/riscv/include/asm/uaccess.h @@ -306,7 +306,9 @@ do { \ * data types like structures or arrays. * * @ptr must have pointer-to-simple-variable type, and @x must be assignable - * to the result of dereferencing @ptr. + * to the result of dereferencing @ptr. The value of @x is copied to avoid + * re-ordering where @x is evaluated inside the block that enables user-space + * access (thus bypassing user space protection if @x is a function). * * Caller must check the pointer with access_ok() before calling this * function. @@ -316,12 +318,13 @@ do { \ #define __put_user(x, ptr) \ ({ \ __typeof__(*(ptr)) __user *__gu_ptr = (ptr); \ + __typeof__(*__gu_ptr) __val = (x); \ long __pu_err = 0; \ \ __chk_user_ptr(__gu_ptr); \ \ __enable_user_access(); \ - __put_user_nocheck(x, __gu_ptr, __pu_err); \ + __put_user_nocheck(__val, __gu_ptr, __pu_err); \ __disable_user_access(); \ \ __pu_err; \
From: Pavel Begunkov asml.silence@gmail.com
commit a185f1db59f13de73aa470559030e90e50b34d93 upstream.
WARNING: CPU: 1 PID: 27907 at fs/io_uring.c:7147 io_sq_thread_park+0xb5/0xd0 fs/io_uring.c:7147 CPU: 1 PID: 27907 Comm: iou-sqp-27905 Not tainted 5.12.0-rc4-syzkaller #0 RIP: 0010:io_sq_thread_park+0xb5/0xd0 fs/io_uring.c:7147 Call Trace: io_ring_ctx_wait_and_kill+0x214/0x700 fs/io_uring.c:8619 io_uring_release+0x3e/0x50 fs/io_uring.c:8646 __fput+0x288/0x920 fs/file_table.c:280 task_work_run+0xdd/0x1a0 kernel/task_work.c:140 io_run_task_work fs/io_uring.c:2238 [inline] io_run_task_work fs/io_uring.c:2228 [inline] io_uring_try_cancel_requests+0x8ec/0xc60 fs/io_uring.c:8770 io_uring_cancel_sqpoll+0x1cf/0x290 fs/io_uring.c:8974 io_sqpoll_cancel_cb+0x87/0xb0 fs/io_uring.c:8907 io_run_task_work_head+0x58/0xb0 fs/io_uring.c:1961 io_sq_thread+0x3e2/0x18d0 fs/io_uring.c:6763 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
May happen that last ctx ref is killed in io_uring_cancel_sqpoll(), so fput callback (i.e. io_uring_release()) is enqueued through task_work, and run by same cancellation. As it's deeply nested we can't do parking or taking sqd->lock there, because its state is unclear. So avoid ctx ejection from sqd list from io_ring_ctx_wait_and_kill() and do it in a clear context in io_ring_exit_work().
Fixes: f6d54255f423 ("io_uring: halt SQO submission on ctx exit") Reported-by: syzbot+e3a3f84f5cecf61f0583@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov asml.silence@gmail.com Link: https://lore.kernel.org/r/e90df88b8ff2cabb14a7534601d35d62ab4cb8c7.161649670... Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/io_uring.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
--- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -8738,6 +8738,14 @@ static __poll_t io_uring_poll(struct fil if (!io_sqring_full(ctx)) mask |= EPOLLOUT | EPOLLWRNORM;
+ /* prevent SQPOLL from submitting new requests */ + if (ctx->sq_data) { + io_sq_thread_park(ctx->sq_data); + list_del_init(&ctx->sqd_list); + io_sqd_update_thread_idle(ctx->sq_data); + io_sq_thread_unpark(ctx->sq_data); + } + /* * Don't flush cqring overflow list here, just do a simple check. * Otherwise there could possible be ABBA deadlock: @@ -8816,14 +8824,6 @@ static void io_ring_ctx_wait_and_kill(st __io_cqring_overflow_flush(ctx, true, NULL, NULL); mutex_unlock(&ctx->uring_lock);
- /* prevent SQPOLL from submitting new requests */ - if (ctx->sq_data) { - io_sq_thread_park(ctx->sq_data); - list_del_init(&ctx->sqd_list); - io_sqd_update_thread_idle(ctx->sq_data); - io_sq_thread_unpark(ctx->sq_data); - } - io_kill_timeouts(ctx, NULL, NULL); io_poll_remove_all(ctx, NULL, NULL);
From: Jens Axboe axboe@kernel.dk
commit d3dc04cd81e0eaf50b2d09ab051a13300e587439 upstream.
This reverts commit 15b2219facadec583c24523eed40fa45865f859f.
Before IO threads accepted signals, the freezer using take signals to wake up an IO thread would cause them to loop without any way to clear the pending signal. That is no longer the case, so stop special casing PF_IO_WORKER in the freezer.
Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/freezer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/freezer.c +++ b/kernel/freezer.c @@ -134,7 +134,7 @@ bool freeze_task(struct task_struct *p) return false; }
- if (!(p->flags & (PF_KTHREAD | PF_IO_WORKER))) + if (!(p->flags & PF_KTHREAD)) fake_signal_wake_up(p); else wake_up_state(p, TASK_INTERRUPTIBLE);
From: David S. Miller davem@davemloft.net
commit 080bfa1e6d928a5d1f185cc44e5f3c251df06df5 upstream.
This reverts commit 2055a99da8a253a357bdfd359b3338ef3375a26c.
This change rejects legitimate configurations.
A slave doesn't need to exist nor implement ndo_slave_setup.
Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/bonding/bond_main.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-)
--- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -3917,15 +3917,11 @@ static int bond_neigh_init(struct neighb
rcu_read_lock(); slave = bond_first_slave_rcu(bond); - if (!slave) { - ret = -EINVAL; + if (!slave) goto out; - } slave_ops = slave->dev->netdev_ops; - if (!slave_ops->ndo_neigh_setup) { - ret = -EINVAL; + if (!slave_ops->ndo_neigh_setup) goto out; - }
/* TODO: find another way [1] to implement this. * Passing a zeroed structure is fragile,
On Mon, 5 Apr 2021 at 14:43, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.11.12 release. There are 152 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 07 Apr 2021 08:50:09 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.11.12-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.11.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build * kernel: 5.11.12-rc1 * git: ['https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git', 'https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc'] * git branch: linux-5.11.y * git commit: 74f1df3016246d321c3f58de40c0d64f5d5861a1 * git describe: v5.11.11-153-g74f1df301624 * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.11.y/build/v5.11....
## No regressions (compared to v5.11.11-105-g79c43dab0491)
## No fixes (compared to v5.11.11-105-g79c43dab0491)
## Test result summary total: 76190, pass: 64219, fail: 1734, skip: 9997, xfail: 240,
## Build Summary * arc: 10 total, 10 passed, 0 failed * arm: 192 total, 192 passed, 0 failed * arm64: 26 total, 26 passed, 0 failed * dragonboard-410c: 1 total, 1 passed, 0 failed * hi6220-hikey: 1 total, 1 passed, 0 failed * i386: 25 total, 25 passed, 0 failed * juno-r2: 1 total, 1 passed, 0 failed * mips: 45 total, 45 passed, 0 failed * parisc: 9 total, 9 passed, 0 failed * powerpc: 27 total, 27 passed, 0 failed * riscv: 42 total, 38 passed, 4 failed * s390: 27 total, 24 passed, 3 failed * sh: 18 total, 18 passed, 0 failed * sparc: 9 total, 9 passed, 0 failed * x15: 1 total, 0 passed, 1 failed * x86: 1 total, 1 passed, 0 failed * x86_64: 35 total, 34 passed, 1 failed
## Test suites summary * fwts * igt-gpu-tools * install-android-platform-tools-r2600 * kselftest- * kselftest-android * kselftest-bpf * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-efivarfs * kselftest-filesystems * kselftest-firmware * kselftest-fpu * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-kvm * kselftest-lib * kselftest-livepatch * kselftest-lkdtm * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-vm * kselftest-vsyscall-mode-native- * kselftest-vsyscall-mode-none- * kselftest-x86 * kselftest-zram * kunit * kvm-unit-tests * libhugetlbfs * linux-log-parser * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-controllers-tests * ltp-cpuhotplug-tests * ltp-crypto-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-open-posix-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-tracing-tests * network-basic-tests * perf * rcutorture * ssuite * v4l2-compliance
-- Linaro LKFT https://lkft.linaro.org
On Mon, Apr 05, 2021 at 10:52:29AM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.11.12 release. There are 152 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 07 Apr 2021 08:50:09 +0000. Anything received after that time might be too late.
Build results: total: 155 pass: 155 fail: 0 Qemu test results: total: 460 pass: 460 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
On 4/5/21 2:52 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.11.12 release. There are 152 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 07 Apr 2021 08:50:09 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.11.12-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.11.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
On Mon, 05 Apr 2021 10:52:29 +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.11.12 release. There are 152 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 07 Apr 2021 08:50:09 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.11.12-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.11.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v5.11: 12 builds: 12 pass, 0 fail 28 boots: 28 pass, 0 fail 70 tests: 70 pass, 0 fail
Linux version: 5.11.12-rc1-g74f1df301624 Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra194-p3509-0000+p3668-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
linux-stable-mirror@lists.linaro.org