This is the start of the stable review cycle for the 5.16.5 release. There are 200 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 02 Feb 2022 10:51:59 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.16.5-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.16.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.16.5-rc1
OGAWA Hirofumi hirofumi@mail.parknet.co.jp block: Fix wrong offset in bio_truncate()
Vitaly Kuznetsov vkuznets@redhat.com KVM: nVMX: Allow VMREAD when Enlightened VMCS is in use
Vitaly Kuznetsov vkuznets@redhat.com KVM: nVMX: Implement evmcs_field_offset() suitable for handle_vmread()
Vitaly Kuznetsov vkuznets@redhat.com KVM: nVMX: Rename vmcs_to_field_offset{,_table}
Maor Gottlieb maorg@nvidia.com tools/testing/scatterlist: add missing defines
Dmitry V. Levin ldv@altlinux.org usr/include/Makefile: add linux/nfc.h to the compile-test coverage
Robert Hancock robert.hancock@calian.com usb: dwc3: xilinx: fix uninitialized return value
Suren Baghdasaryan surenb@google.com psi: fix "defined but not used" warnings when CONFIG_PROC_FS=n
Suren Baghdasaryan surenb@google.com psi: fix "no previous prototype" warnings when CONFIG_CGROUPS=n
Namhyung Kim namhyung@kernel.org perf/core: Fix cgroup event list management
Sergio Paracuellos sergio.paracuellos@gmail.com PCI: mt7621: Remove unused function pcie_rmw()
Marc Kleine-Budde mkl@pengutronix.de dt-bindings: can: tcan4x5x: fix mram-cfg RX FIFO config
Sander Vanheule sander@svanheule.net irqchip/realtek-rtl: Fix off-by-one in routing
Sander Vanheule sander@svanheule.net irqchip/realtek-rtl: Map control data to virq
Tim Yi tim.yi@pica8.com net: bridge: vlan: fix memory leak in __allowed_ingress
Eric Dumazet edumazet@google.com ipv4: remove sparse error in ip_neigh_gw4()
Eric Dumazet edumazet@google.com ipv4: tcp: send zero IPID in SYNACK messages
Eric Dumazet edumazet@google.com ipv4: raw: lock the socket in raw_bind()
Nikolay Aleksandrov nikolay@nvidia.com net: bridge: vlan: fix single net device option dumping
Guillaume Nault gnault@redhat.com Revert "ipv6: Honor all IPv6 PIO Valid Lifetime values"
Catherine Sullivan csully@google.com gve: Fix GFP flags when allocing pages
Xiubo Li xiubli@redhat.com ceph: put the requests/sessions when it fails to alloc memory
Sean Christopherson seanjc@google.com KVM: selftests: Don't skip L2's VMCALL in SMM test for SVM guest
Dave Airlie airlied@redhat.com Revert "drm/ast: Support 1600x900 with 108MHz PCLK"
Maxim Mikityanskiy maximmi@nvidia.com sch_htb: Fail on unsupported parameters when offload is requested
David Matlack dmatlack@google.com KVM: selftests: Re-enable access_tracking_perf_test
Yufeng Mo moyufeng@huawei.com net: hns3: handle empty unknown interrupt for VF
Toke Høiland-Jørgensen toke@redhat.com net: cpsw: Properly initialise struct page_pool_params
Hangyu Hua hbh25y@gmail.com yam: fix a memory leak in yam_siocdevprivate()
Rob Clark robdclark@chromium.org drm/msm/a6xx: Add missing suspend_count increment
José Expósito jose.exposito89@gmail.com drm/msm/dpu: invalid parameter check in dpu_setup_dspp_pcc
Miaoqian Lin linmq006@gmail.com drm/msm/hdmi: Fix missing put_device() call in msm_hdmi_get_phy
Guenter Roeck linux@roeck-us.net hwmon: (nct6775) Fix crash in clear_caseopen
Marc Kleine-Budde mkl@pengutronix.de can: tcan4x5x: regmap: fix max register value
Michael Kelley mikelley@microsoft.com video: hyperv_fb: Fix validation of screen resolution
Wen Gu guwen@linux.alibaba.com net/smc: Transitional solution for clcsock race issue
Sukadev Bhattiprolu sukadev@linux.ibm.com ibmvnic: don't spin in tasklet
Sukadev Bhattiprolu sukadev@linux.ibm.com ibmvnic: init ->running_cap_crqs early
Sukadev Bhattiprolu sukadev@linux.ibm.com ibmvnic: Allow extra failures before disabling
Jakub Kicinski kuba@kernel.org ipv4: fix ip option filtering for locally generated fragments
Athira Rajeev atrajeev@linux.vnet.ibm.com powerpc/perf: Fix power_pmu_disable to call clear_pmi_irq_pending only if PMI is pending
Dan Carpenter dan.carpenter@oracle.com hwmon: (adt7470) Prevent divide by zero in adt7470_fan_write()
Guenter Roeck linux@roeck-us.net hwmon: (lm90) Fix sysfs and udev notifications
Guenter Roeck linux@roeck-us.net hwmon: (lm90) Mark alert as broken for MAX6654
Guenter Roeck linux@roeck-us.net hwmon: (lm90) Re-enable interrupts after alert clears
Yanming Liu yanminglr@gmail.com Drivers: hv: balloon: account for vmbus packet header in max_pkt_size
Miaoqian Lin linmq006@gmail.com block: fix memory leak in disk_register_independent_access_ranges
Dylan Yudaken dylany@fb.com io_uring: fix bug in slow unregistering of nodes
Mihai Carabas mihai.carabas@oracle.com efi/libstub: arm64: Fix image check alignment at entry
David Howells dhowells@redhat.com rxrpc: Adjust retransmission backoff
Kiran Kumar K kirankumark@marvell.com octeontx2-af: Add KPU changes to parse NGIO as separate layer
Subbaraya Sundeep sbhatta@marvell.com octeontx2-pf: Forward error codes to VF
Geetha sowjanya gakula@marvell.com octeontx2-af: cn10k: Do not enable RPM loopback for LPC interfaces
Geetha sowjanya gakula@marvell.com octeontx2-af: Increase link credit restore polling timeout
Geetha sowjanya gakula@marvell.com octeontx2-pf: cn10k: Ensure valid pointers are freed to aura
Geetha sowjanya gakula@marvell.com octeontx2-af: cn10k: Use appropriate register for LMAC enable
Geetha sowjanya gakula@marvell.com octeontx2-af: Retry until RVU block reset complete
Sunil Goutham sgoutham@marvell.com octeontx2-af: Fix LBK backpressure id count
Subbaraya Sundeep sbhatta@marvell.com octeontx2-af: Do not fixup all VF action entries
Paolo Abeni pabeni@redhat.com selftests: mptcp: fix ipv6 routing setup
Geliang Tang geliang.tang@suse.com mptcp: fix removing ids bitmap setting
Paolo Abeni pabeni@redhat.com mptcp: fix msk traversal in mptcp_nl_cmd_set_flags()
Paolo Abeni pabeni@redhat.com mptcp: keep track of local endpoint still available for each msk
Jean Sacren sakiwit@gmail.com mptcp: clean up harmless false expressions
Davide Caratti dcaratti@redhat.com mptcp: allow changing the "backup" bit by endpoint id
Marek Behún kabel@kernel.org phylib: fix potential use-after-free
Yuji Ishikawa yuji2.ishikawa@toshiba.co.jp net: stmmac: dwmac-visconti: Fix clock configuration for RMII mode
Yuji Ishikawa yuji2.ishikawa@toshiba.co.jp net: stmmac: dwmac-visconti: Fix bit definitions for ETHER_CLK_SEL
Moshe Tal moshet@nvidia.com ethtool: Fix link extended state for big endian
Robert Hancock robert.hancock@calian.com net: phy: broadcom: hook up soft_reset for BCM54616S
Vincent Guittot vincent.guittot@linaro.org sched/pelt: Relax the sync of util_sum with util_avg
Peter Zijlstra peterz@infradead.org perf: Fix perf_event_read_local() time
Nicholas Piggin npiggin@gmail.com powerpc/64s: Mask SRR0 before checking against the masked NIP
Randy Dunlap rdunlap@infradead.org remoteproc: qcom: q6v5: fix service routines build errors
Florian Westphal fw@strlen.de netfilter: conntrack: don't increment invalid counter on NF_REPEAT
Naveen N. Rao naveen.n.rao@linux.vnet.ibm.com powerpc64/bpf: Limit 'ldbrx' to processors compliant with ISA v2.06
Chuck Lever chuck.lever@oracle.com SUNRPC: Don't dereference xprt->snd_task if it's a cookie
Marc Zyngier maz@kernel.org KVM: arm64: pkvm: Use the mm_ops indirection for cache maintenance
Trond Myklebust trond.myklebust@hammerspace.com NFS: Ensure the server has an up to date ctime before renaming
Trond Myklebust trond.myklebust@hammerspace.com NFS: Ensure the server has an up to date ctime before hardlinking
Eric Dumazet edumazet@google.com ipv6: annotate accesses to fn->fn_sernum
José Expósito jose.exposito89@gmail.com drm/msm/dsi: invalid parameter check in msm_dsi_phy_enable
Miaoqian Lin linmq006@gmail.com drm/msm/dsi: Fix missing put_device() call in dsi_get_phy
Xianting Tian xianting.tian@linux.alibaba.com drm/msm: Fix wrong size calculation
Jianguo Wu wujianguo@chinatelecom.cn net-procfs: show net devices bound packet types
Trond Myklebust trond.myklebust@hammerspace.com NFSv4: nfs_atomic_open() can race when looking up a non-regular file
Trond Myklebust trond.myklebust@hammerspace.com NFSv4: Handle case where the lookup of a directory fails
Guenter Roeck linux@roeck-us.net hwmon: (lm90) Reduce maximum conversion rate for G781
Eric Dumazet edumazet@google.com ipv4: avoid using shared IP generator for connected sockets
Xin Long lucien.xin@gmail.com ping: fix the sk_bound_dev_if match in ping_lookup
Guenter Roeck linux@roeck-us.net hwmon: (lm90) Mark alert as broken for MAX6680
Guenter Roeck linux@roeck-us.net hwmon: (lm90) Mark alert as broken for MAX6646/6647/6649
Congyu Liu liu3101@purdue.edu net: fix information leakage in /proc/net/ptype
sparkhuang huangshaobo6@huawei.com ARM: 9170/1: fix panic when kasan and kprobe are enabled
Ido Schimmel idosch@nvidia.com ipv6_tunnel: Rate limit warning messages
John Meneghini jmeneghi@redhat.com scsi: bnx2fc: Flush destroy_work queue before calling bnx2fc_interface_put()
Yang Yingliang yangyingliang@huawei.com scsi: elx: efct: Don't use GFP_KERNEL under spin lock
Matthias Kaehlcke mka@chromium.org rpmsg: char: Fix race between the release of rpmsg_eptdev and cdev
Sujit Kautkar sujitka@chromium.org rpmsg: char: Fix race between the release of rpmsg_ctrldev and cdev
Linyu Yuan quic_linyyuan@quicinc.com usb: roles: fix include/linux/usb/role.h compile issue
Joe Damato jdamato@fastly.com i40e: fix unsigned stat widths
Karen Sornek karen.sornek@intel.com i40e: Fix for failed to init adminq while VF reset
Sylwester Dziedziuch sylwesterx.dziedziuch@intel.com i40e: Fix queues reservation for XDP
Jedrzej Jagielski jedrzej.jagielski@intel.com i40e: Fix issue when maximum queues is exceeded
Jedrzej Jagielski jedrzej.jagielski@intel.com i40e: Increase delay to 1 s after global EMP reset
Christophe Leroy christophe.leroy@csgroup.eu powerpc/32: Fix boot failure with GCC latent entropy plugin
Christophe Leroy christophe.leroy@csgroup.eu powerpc/32s: Fix kasan_init_region() for KASAN
Christophe Leroy christophe.leroy@csgroup.eu powerpc/32s: Allocate one 256k IBAT instead of two consecutives 128k IBATs
Tony Luck tony.luck@intel.com x86/cpu: Add Xeon Icelake-D to list of CPUs that support PPIN
Yazen Ghannam yazen.ghannam@amd.com x86/MCE/AMD: Allow thresholding interface updates after init
Bjorn Helgaas bhelgaas@google.com PCI/sysfs: Find shadow ROM before static attribute initialization
Mathieu Desnoyers mathieu.desnoyers@efficios.com sched/membarrier: Fix membarrier-rseq fence command missing from query bitmask
Joseph Qi joseph.qi@linux.alibaba.com ocfs2: fix a deadlock when commit trans
Joseph Qi joseph.qi@linux.alibaba.com jbd2: export jbd2_journal_[grab|put]_journal_head
Peter Collingbourne pcc@google.com mm, kasan: use compare-exchange operation to set KASAN page tag
Lorenzo Bianconi lorenzo@kernel.org mt76: connac: introduce MCU_CE_CMD macro
Sing-Han Chen singhanc@nvidia.com ucsi_ccg: Check DEV_INT bit only when starting CCG4
Badhri Jagan Sridharan badhri@google.com usb: typec: tcpm: Do not disconnect when receiving VSAFE0V
Badhri Jagan Sridharan badhri@google.com usb: typec: tcpm: Do not disconnect while receiving VBUS off
Xu Yang xu.yang_2@nxp.com usb: typec: tcpci: don't touch CC line if it's Vconn source
Alan Stern stern@rowland.harvard.edu USB: core: Fix hang in usb_kill_urb by adding memory barriers
Robert Hancock robert.hancock@calian.com usb: dwc3: xilinx: Fix error handling when getting USB3 PHY
Robert Hancock robert.hancock@calian.com usb: dwc3: xilinx: Skip resets and USB3 register settings for USB2.0 mode
Pawel Laszczak pawell@cadence.com usb: cdnsp: Fix segmentation fault in cdns_lost_power function
Pavankumar Kondeti quic_pkondeti@quicinc.com usb: gadget: f_sourcesink: Fix isoc transfer for USB_SPEED_SUPER_PLUS
Jon Hunter jonathanh@nvidia.com usb: common: ulpi: Fix crash in ulpi_match()
Frank Li Frank.Li@nxp.com usb: xhci-plat: fix crash when suspend if remote wake enable
Alan Stern stern@rowland.harvard.edu usb-storage: Add unusual-devs entry for VL817 USB-SATA bridge
Greg Kroah-Hartman gregkh@linuxfoundation.org kbuild: remove include/linux/cyclades.h from header file check
Cameron Williams cang1@live.co.uk tty: Add support for Brainboxes UC cards.
Maciej W. Rozycki macro@embecosm.com tty: Partially revert the removal of the Cyclades public API
daniel.starke@siemens.com daniel.starke@siemens.com tty: n_gsm: fix SW flow control encoding/handling
Arnaud Pouliquen arnaud.pouliquen@foss.st.com tty: rpmsg: Fix race condition releasing tty port
Valentin Caron valentin.caron@foss.st.com serial: stm32: fix software flow control transfer
Robert Hancock robert.hancock@calian.com serial: 8250: of: Fix mapped region size when using reg-offset property
Jochen Mades jochen@mades.net serial: pl011: Fix incorrect rs485 RTS polarity on set_mctrl
Mike Snitzer snitzer@redhat.com dm: properly fix redundant bio-based IO accounting
Mike Snitzer snitzer@redhat.com block: add bio_start_io_acct_time() to control start_time
Mike Snitzer snitzer@redhat.com dm: revert partial fix for redundant bio-based IO accounting
Evgenii Stepanov eugenis@google.com arm64: extable: fix load_unaligned_zeropad() reg indices
Vivek Goyal vgoyal@redhat.com security, lsm: dentry_init_security() Handle multi LSM registration
Nicholas Piggin npiggin@gmail.com KVM: PPC: Book3S HV Nested: Fix nested HFSCR being clobbered with multiple vCPUs
Like Xu likexu@tencent.com KVM: x86: Sync the states size with the XCR0/IA32_XSS at, any time
Like Xu likexu@tencent.com KVM: x86: Update vCPU's runtime CPUID on write to MSR_IA32_XSS
Xiaoyao Li xiaoyao.li@intel.com KVM: x86: Keep MSR_IA32_XSS unchanged for INIT
Vitaly Kuznetsov vkuznets@redhat.com KVM: x86: Check .flags in kvm_cpuid_check_equal() too
Sean Christopherson seanjc@google.com KVM: x86: Forcibly leave nested virt when SMM state is toggled
Sean Christopherson seanjc@google.com KVM: x86: Free kvm_cpuid_entry2 array on post-KVM_RUN KVM_SET_CPUID{,2}
Vitaly Kuznetsov vkuznets@redhat.com KVM: x86: Move CPUID.(EAX=0x12,ECX=1) mangling to __kvm_update_cpuid_runtime()
Denis Valeev lemniscattaden@gmail.com KVM: x86: nSVM: skip eax alignment check for non-SVM instructions
Sean Christopherson seanjc@google.com KVM: SVM: Don't intercept #GP for SEV guests
Sean Christopherson seanjc@google.com KVM: SVM: Never reject emulation due to SMAP errata for !SEV guests
Wanpeng Li wanpengli@tencent.com KVM: LAPIC: Also cancel preemption timer during SET_LAPIC
Bas Nieuwenhuizen bas@basnieuwenhuizen.nl drm/amd/display: Wrap dcn301_calculate_wm_and_dlg for FPU.
Bas Nieuwenhuizen bas@basnieuwenhuizen.nl drm/amd/display: Fix FP start/end for dcn30_internal_validate_bw.
Bas Nieuwenhuizen bas@basnieuwenhuizen.nl drm/amdgpu/display: Remove t_srx_delay_us.
Alex Deucher alexander.deucher@amd.com drm/amdgpu: filter out radeon secondary ids as well
Manasi Navare manasi.d.navare@intel.com drm/atomic: Add the crtc to affected crtc only if uapi.enable = true
Lucas Stach l.stach@pengutronix.de drm/etnaviv: relax submit size limits
Kan Liang kan.liang@linux.intel.com perf/x86/intel: Add a quirk for the calculation of the number of counters on Alder Lake
Zhengjun Xing zhengjun.xing@linux.intel.com perf/x86/intel/uncore: Fix CAS_COUNT_WRITE issue for ICX
Christophe Leroy christophe.leroy@csgroup.eu powerpc/audit: Fix syscall_get_arch()
Suren Baghdasaryan surenb@google.com psi: Fix uaf issue when psi trigger is destroyed while being polled
Sean Christopherson seanjc@google.com Revert "KVM: SVM: avoid infinite loop on NPF from bad address"
Amir Goldstein amir73il@gmail.com fsnotify: fix fsnotify hooks in pseudo filesystems
Amir Goldstein amir73il@gmail.com fsnotify: invalidate dcache before IN_DELETE event
Jeff Layton jlayton@kernel.org ceph: set pool_ns in new inode layout for async creates
Jeff Layton jlayton@kernel.org ceph: properly put ceph_string reference after async create attempt
Tom Zanussi zanussi@kernel.org tracing: Don't inc err_log entry count if entry allocation fails
Tom Zanussi zanussi@kernel.org tracing: Propagate is_signed to expression
Xiaoke Wang xkernel.wang@foxmail.com tracing/histogram: Fix a potential memory leak for kstrdup()
Greg Kroah-Hartman gregkh@linuxfoundation.org PM: wakeup: simplify the output logic of pm_show_wakelocks()
Ard Biesheuvel ardb@kernel.org efi: runtime: avoid EFIv2 runtime services on Apple x86 machines
Jan Kara jack@suse.cz udf: Fix NULL ptr deref when converting from inline format
Jan Kara jack@suse.cz udf: Restore i_lenAlloc when inode expansion fails
Steffen Maier maier@linux.ibm.com scsi: zfcp: Fix failed recovery on gone remote port with non-NPIV FCP devices
Eric W. Biederman ebiederm@xmission.com ucount: Make get_ucount a safe get_user replacement
Naveen N. Rao naveen.n.rao@linux.vnet.ibm.com powerpc/bpf: Update ldimm64 instructions during extra pass
Naveen N. Rao naveen.n.rao@linux.vnet.ibm.com powerpc32/bpf: Fix codegen for bpf-to-bpf calls
Naveen N. Rao naveen.n.rao@linux.vnet.ibm.com bpf: Guard against accessing NULL pt_regs in bpf_get_task_stack()
Christian Borntraeger borntraeger@linux.ibm.com s390/nmi: handle vector validity failures for KVM guests
Christian Borntraeger borntraeger@linux.ibm.com s390/nmi: handle guarded storage validity failures for KVM guests
Vasily Gorbik gor@linux.ibm.com s390/hypfs: include z/VM guests with access control group set
Ilya Leoshkevich iii@linux.ibm.com s390/module: fix loading modules with a lot of relocations
Marc Zyngier maz@kernel.org KVM: arm64: vgic-v3: Restrict SEIS workaround to known broken systems
Marc Zyngier maz@kernel.org KVM: arm64: Use shadow SPSR_EL1 when injecting exceptions on !VHE
Ard Biesheuvel ardb@kernel.org ARM: 9180/1: Thumb2: align ALT_UP() sections in modules sufficiently
Ard Biesheuvel ardb@kernel.org ARM: 9179/1: uaccess: avoid alignment faults in copy_[from|to]_kernel_nofault
Mohammad Athari Bin Ismail mohammad.athari.ismail@intel.com net: stmmac: skip only stmmac_ptp_register when resume from suspend
Mohammad Athari Bin Ismail mohammad.athari.ismail@intel.com net: stmmac: configure PTP clock source prior to PTP initialization
Marek Behún kabel@kernel.org net: sfp: ignore disabled SFP node
Marc Kleine-Budde mkl@pengutronix.de can: m_can: m_can_fifo_{read,write}: don't read or write from/to FIFO if length is 0
Filipe Manana fdmanana@suse.com btrfs: update writeback index when starting defrag
Filipe Manana fdmanana@suse.com btrfs: add back missing dirty page rate limiting to defrag
Filipe Manana fdmanana@suse.com btrfs: fix deadlock when reserving space during defrag
Qu Wenruo wqu@suse.com btrfs: defrag: properly update range->start for autodefrag
Qu Wenruo wqu@suse.com btrfs: defrag: fix wrong number of defragged sectors
Filipe Manana fdmanana@suse.com btrfs: allow defrag to be interruptible
Filipe Manana fdmanana@suse.com btrfs: fix too long loop when defragging a 1 byte file
Brian Gix brian.gix@intel.com Bluetooth: refactor malicious adv data check
-------------
Diffstat:
Documentation/accounting/psi.rst | 3 +- .../devicetree/bindings/net/can/tcan4x5x.txt | 2 +- Makefile | 4 +- arch/arm/include/asm/assembler.h | 2 + arch/arm/include/asm/processor.h | 1 + arch/arm/include/asm/uaccess.h | 10 +- arch/arm/probes/kprobes/Makefile | 3 + arch/arm64/kvm/hyp/exception.c | 5 +- arch/arm64/kvm/hyp/pgtable.c | 18 +- arch/arm64/kvm/hyp/vgic-v3-sr.c | 3 + arch/arm64/kvm/vgic/vgic-v3.c | 17 +- arch/arm64/mm/extable.c | 4 +- arch/ia64/pci/fixup.c | 4 +- arch/mips/loongson64/vbios_quirk.c | 9 +- arch/powerpc/include/asm/book3s/32/mmu-hash.h | 2 + arch/powerpc/include/asm/kvm_book3s_64.h | 1 - arch/powerpc/include/asm/kvm_host.h | 1 + arch/powerpc/include/asm/ppc-opcode.h | 1 + arch/powerpc/include/asm/syscall.h | 4 +- arch/powerpc/include/asm/thread_info.h | 2 + arch/powerpc/kernel/Makefile | 1 + arch/powerpc/kernel/interrupt_64.S | 2 + arch/powerpc/kvm/book3s_hv.c | 3 +- arch/powerpc/kvm/book3s_hv_nested.c | 2 +- arch/powerpc/lib/Makefile | 3 + arch/powerpc/mm/book3s32/mmu.c | 15 +- arch/powerpc/mm/kasan/book3s_32.c | 59 ++--- arch/powerpc/net/bpf_jit_comp.c | 29 ++- arch/powerpc/net/bpf_jit_comp32.c | 9 + arch/powerpc/net/bpf_jit_comp64.c | 29 ++- arch/powerpc/perf/core-book3s.c | 17 +- arch/s390/hypfs/hypfs_vm.c | 6 +- arch/s390/kernel/module.c | 37 ++- arch/s390/kernel/nmi.c | 27 ++- arch/x86/events/intel/core.c | 13 ++ arch/x86/events/intel/uncore_snbep.c | 2 +- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kernel/cpu/mce/amd.c | 2 +- arch/x86/kernel/cpu/mce/intel.c | 1 + arch/x86/kvm/cpuid.c | 65 ++++-- arch/x86/kvm/lapic.c | 2 +- arch/x86/kvm/svm/nested.c | 9 +- arch/x86/kvm/svm/svm.c | 41 ++-- arch/x86/kvm/svm/svm.h | 2 +- arch/x86/kvm/vmx/evmcs.c | 3 +- arch/x86/kvm/vmx/evmcs.h | 44 +++- arch/x86/kvm/vmx/nested.c | 60 +++-- arch/x86/kvm/vmx/vmcs12.c | 4 +- arch/x86/kvm/vmx/vmcs12.h | 6 +- arch/x86/kvm/x86.c | 10 +- arch/x86/pci/fixup.c | 4 +- block/bio.c | 3 +- block/blk-core.c | 25 +- block/blk-ia-ranges.c | 2 +- drivers/firmware/efi/efi.c | 7 + drivers/firmware/efi/libstub/arm64-stub.c | 6 +- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 81 +++++++ drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c | 1 - .../gpu/drm/amd/display/dc/dcn30/dcn30_resource.c | 4 +- .../drm/amd/display/dc/dcn301/dcn301_resource.c | 11 + .../display/dc/dml/dcn20/display_rq_dlg_calc_20.c | 2 - .../dc/dml/dcn20/display_rq_dlg_calc_20v2.c | 2 - .../display/dc/dml/dcn21/display_rq_dlg_calc_21.c | 2 - .../display/dc/dml/dcn30/display_rq_dlg_calc_30.c | 2 - .../gpu/drm/amd/display/dc/dml/dcn301/dcn301_fpu.c | 2 +- .../gpu/drm/amd/display/dc/dml/dcn301/dcn301_fpu.h | 2 +- .../drm/amd/display/dc/dml/display_mode_structs.h | 1 - .../amd/display/dc/dml/display_rq_dlg_helpers.c | 3 - .../amd/display/dc/dml/dml1_display_rq_dlg_calc.c | 4 - drivers/gpu/drm/ast/ast_tables.h | 2 - drivers/gpu/drm/drm_atomic.c | 12 +- drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 4 +- drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 2 + drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dspp.c | 11 +- drivers/gpu/drm/msm/dsi/dsi.c | 7 +- drivers/gpu/drm/msm/dsi/phy/dsi_phy.c | 4 +- drivers/gpu/drm/msm/hdmi/hdmi.c | 7 +- drivers/gpu/drm/msm/msm_drv.c | 2 +- drivers/hv/hv_balloon.c | 7 + drivers/hwmon/adt7470.c | 3 + drivers/hwmon/lm90.c | 21 +- drivers/hwmon/nct6775.c | 6 +- drivers/irqchip/irq-realtek-rtl.c | 10 +- drivers/md/dm.c | 20 +- drivers/net/can/m_can/m_can.c | 6 + drivers/net/can/m_can/tcan4x5x-regmap.c | 2 +- drivers/net/ethernet/google/gve/gve.h | 2 +- drivers/net/ethernet/google/gve/gve_main.c | 6 +- drivers/net/ethernet/google/gve/gve_rx.c | 3 +- drivers/net/ethernet/google/gve/gve_rx_dqo.c | 2 +- .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 3 +- drivers/net/ethernet/ibm/ibmvnic.c | 133 +++++++---- drivers/net/ethernet/intel/i40e/i40e.h | 9 +- drivers/net/ethernet/intel/i40e/i40e_debugfs.c | 2 +- drivers/net/ethernet/intel/i40e/i40e_main.c | 44 ++-- drivers/net/ethernet/intel/i40e/i40e_register.h | 3 + drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 103 ++++++++- drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h | 1 + drivers/net/ethernet/marvell/octeontx2/af/cgx.c | 2 + .../ethernet/marvell/octeontx2/af/lmac_common.h | 3 + drivers/net/ethernet/marvell/octeontx2/af/mbox.h | 1 + .../ethernet/marvell/octeontx2/af/npc_profile.h | 70 +++--- drivers/net/ethernet/marvell/octeontx2/af/rpm.c | 66 ++++-- drivers/net/ethernet/marvell/octeontx2/af/rpm.h | 4 + drivers/net/ethernet/marvell/octeontx2/af/rvu.c | 7 +- drivers/net/ethernet/marvell/octeontx2/af/rvu.h | 1 + .../net/ethernet/marvell/octeontx2/af/rvu_cgx.c | 14 +- .../ethernet/marvell/octeontx2/af/rvu_debugfs.c | 2 + .../net/ethernet/marvell/octeontx2/af/rvu_nix.c | 20 +- .../net/ethernet/marvell/octeontx2/af/rvu_npc.c | 22 +- .../net/ethernet/marvell/octeontx2/af/rvu_npc_fs.c | 20 +- .../ethernet/marvell/octeontx2/nic/otx2_common.h | 1 + .../net/ethernet/marvell/octeontx2/nic/otx2_pf.c | 7 +- .../net/ethernet/stmicro/stmmac/dwmac-visconti.c | 42 ++-- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 23 +- drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c | 3 - drivers/net/ethernet/ti/cpsw_priv.c | 2 +- drivers/net/hamradio/yam.c | 4 +- drivers/net/phy/broadcom.c | 1 + drivers/net/phy/phy_device.c | 6 +- drivers/net/phy/sfp-bus.c | 5 + drivers/net/wireless/mediatek/mt76/mt7615/mcu.c | 16 +- .../net/wireless/mediatek/mt76/mt76_connac_mcu.c | 47 ++-- .../net/wireless/mediatek/mt76/mt76_connac_mcu.h | 48 ++-- drivers/net/wireless/mediatek/mt76/mt7921/mcu.c | 24 +- .../net/wireless/mediatek/mt76/mt7921/testmode.c | 4 +- drivers/pci/controller/pcie-mt7621.c | 9 - drivers/remoteproc/Kconfig | 4 + drivers/remoteproc/qcom_q6v5.c | 1 + drivers/rpmsg/rpmsg_char.c | 22 +- drivers/s390/scsi/zfcp_fc.c | 13 +- drivers/scsi/bnx2fc/bnx2fc_fcoe.c | 20 +- drivers/scsi/elx/libefc/efc_els.c | 8 +- drivers/tty/n_gsm.c | 4 +- drivers/tty/rpmsg_tty.c | 40 ++-- drivers/tty/serial/8250/8250_of.c | 11 +- drivers/tty/serial/8250/8250_pci.c | 100 +++++++- drivers/tty/serial/amba-pl011.c | 8 +- drivers/tty/serial/stm32-usart.c | 2 +- drivers/usb/cdns3/drd.c | 6 +- drivers/usb/common/ulpi.c | 7 +- drivers/usb/core/hcd.c | 14 ++ drivers/usb/core/urb.c | 12 + drivers/usb/dwc3/dwc3-xilinx.c | 25 +- drivers/usb/gadget/function/f_sourcesink.c | 1 + drivers/usb/host/xhci-plat.c | 3 + drivers/usb/storage/unusual_devs.h | 10 + drivers/usb/typec/tcpm/tcpci.c | 26 +++ drivers/usb/typec/tcpm/tcpci.h | 1 + drivers/usb/typec/tcpm/tcpm.c | 7 +- drivers/usb/typec/ucsi/ucsi_ccg.c | 2 +- drivers/video/fbdev/hyperv_fb.c | 16 +- fs/btrfs/ioctl.c | 90 ++++++-- fs/ceph/caps.c | 55 +++-- fs/ceph/file.c | 9 + fs/configfs/dir.c | 6 +- fs/devpts/inode.c | 2 +- fs/io_uring.c | 7 +- fs/jbd2/journal.c | 2 + fs/namei.c | 10 +- fs/nfs/dir.c | 22 ++ fs/nfsd/nfsctl.c | 5 +- fs/ocfs2/suballoc.c | 25 +- fs/udf/inode.c | 9 +- include/linux/blkdev.h | 1 + include/linux/ethtool.h | 2 +- include/linux/fsnotify.h | 49 +++- include/linux/lsm_hook_defs.h | 2 +- include/linux/mm.h | 17 +- include/linux/netdevice.h | 1 + include/linux/perf_event.h | 15 +- include/linux/psi.h | 13 +- include/linux/psi_types.h | 3 - include/linux/usb/role.h | 6 + include/net/addrconf.h | 2 + include/net/ip.h | 21 +- include/net/ip6_fib.h | 2 +- include/net/route.h | 2 +- include/trace/events/sunrpc.h | 18 +- include/uapi/linux/cyclades.h | 35 +++ kernel/bpf/stackmap.c | 5 +- kernel/cgroup/cgroup.c | 11 +- kernel/events/core.c | 257 +++++++++++++-------- kernel/power/wakelock.c | 11 +- kernel/sched/fair.c | 16 +- kernel/sched/membarrier.c | 9 +- kernel/sched/pelt.h | 4 +- kernel/sched/psi.c | 145 ++++++------ kernel/trace/trace.c | 3 +- kernel/trace/trace_events_hist.c | 4 + kernel/ucount.c | 2 + net/bluetooth/hci_event.c | 10 +- net/bridge/br_vlan.c | 9 +- net/core/net-procfs.c | 38 ++- net/ipv4/ip_output.c | 26 ++- net/ipv4/ping.c | 3 +- net/ipv4/raw.c | 5 +- net/ipv6/addrconf.c | 27 ++- net/ipv6/ip6_fib.c | 23 +- net/ipv6/ip6_tunnel.c | 8 +- net/ipv6/route.c | 2 +- net/mptcp/pm.c | 1 + net/mptcp/pm_netlink.c | 176 ++++++++------ net/mptcp/protocol.c | 3 +- net/mptcp/protocol.h | 12 +- net/netfilter/nf_conntrack_core.c | 8 +- net/packet/af_packet.c | 2 + net/rxrpc/call_event.c | 8 +- net/rxrpc/output.c | 2 +- net/sched/sch_htb.c | 20 ++ net/smc/af_smc.c | 63 ++++- net/sunrpc/rpc_pipe.c | 4 +- security/security.c | 15 +- tools/testing/scatterlist/linux/mm.h | 3 +- tools/testing/selftests/kvm/Makefile | 1 + tools/testing/selftests/kvm/x86_64/smm_test.c | 1 - tools/testing/selftests/net/mptcp/mptcp_join.sh | 10 +- usr/include/Makefile | 2 +- virt/kvm/kvm_main.c | 1 - 219 files changed, 2317 insertions(+), 1102 deletions(-)
From: Brian Gix brian.gix@intel.com
commit 899663be5e75dc0174dc8bda0b5e6826edf0b29a upstream.
Check for out-of-bound read was being performed at the end of while num_reports loop, and would fill journal with false positives. Added check to beginning of loop processing so that it doesn't get checked after ptr has been advanced.
Signed-off-by: Brian Gix brian.gix@intel.com Signed-off-by: Marcel Holtmann marcel@holtmann.org Cc: syphyr syphyr@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/bluetooth/hci_event.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/net/bluetooth/hci_event.c +++ b/net/bluetooth/hci_event.c @@ -5822,6 +5822,11 @@ static void hci_le_adv_report_evt(struct struct hci_ev_le_advertising_info *ev = ptr; s8 rssi;
+ if (ptr > (void *)skb_tail_pointer(skb) - sizeof(*ev)) { + bt_dev_err(hdev, "Malicious advertising data."); + break; + } + if (ev->length <= HCI_MAX_AD_LENGTH && ev->data + ev->length <= skb_tail_pointer(skb)) { rssi = ev->data[ev->length]; @@ -5833,11 +5838,6 @@ static void hci_le_adv_report_evt(struct }
ptr += sizeof(*ev) + ev->length + 1; - - if (ptr > (void *) skb_tail_pointer(skb) - sizeof(*ev)) { - bt_dev_err(hdev, "Malicious advertising data. Stopping processing"); - break; - } }
hci_dev_unlock(hdev);
From: Filipe Manana fdmanana@suse.com
commit 6b34cd8e175bfbf4f3f01b6d19eae18245e1a8cc upstream.
When attempting to defrag a file with a single byte, we can end up in a too long loop, which is nearly infinite because at btrfs_defrag_file() we end up with the variable last_byte assigned with a value of 18446744073709551615 (which is (u64)-1). The problem comes from the fact we end up doing:
last_byte = round_up(last_byte, fs_info->sectorsize) - 1;
So if last_byte was assigned 0, which is i_size - 1, we underflow and end up with the value 18446744073709551615.
This is trivial to reproduce and the following script triggers it:
$ cat test.sh #!/bin/bash
DEV=/dev/sdj MNT=/mnt/sdj
mkfs.btrfs -f $DEV mount $DEV $MNT
echo -n "X" > $MNT/foobar
btrfs filesystem defragment $MNT/foobar
umount $MNT
So fix this by not decrementing last_byte by 1 before doing the sector size round up. Also, to make it easier to follow, make the round up right after computing last_byte.
Reported-by: Anthony Ruhier aruhier@mailbox.org Fixes: 7b508037d4cac3 ("btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()") Link: https://lore.kernel.org/linux-btrfs/0a269612-e43f-da22-c5bc-b34b1b56ebe8@mai... CC: stable@vger.kernel.org # 5.16 Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/ioctl.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
--- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -1492,12 +1492,16 @@ int btrfs_defrag_file(struct inode *inod
if (range->start + range->len > range->start) { /* Got a specific range */ - last_byte = min(isize, range->start + range->len) - 1; + last_byte = min(isize, range->start + range->len); } else { /* Defrag until file end */ - last_byte = isize - 1; + last_byte = isize; }
+ /* Align the range */ + cur = round_down(range->start, fs_info->sectorsize); + last_byte = round_up(last_byte, fs_info->sectorsize) - 1; + /* * If we were not given a ra, allocate a readahead context. As * readahead is just an optimization, defrag will work without it so @@ -1510,10 +1514,6 @@ int btrfs_defrag_file(struct inode *inod file_ra_state_init(ra, inode->i_mapping); }
- /* Align the range */ - cur = round_down(range->start, fs_info->sectorsize); - last_byte = round_up(last_byte, fs_info->sectorsize) - 1; - while (cur < last_byte) { u64 cluster_end;
From: Filipe Manana fdmanana@suse.com
commit b767c2fc787e992daeadfff40d61c05f66c82da0 upstream.
During defrag, at btrfs_defrag_file(), we have this loop that iterates over a file range in steps no larger than 256K subranges. If the range is too long, there's no way to interrupt it. So make the loop check in each iteration if there's signal pending, and if there is, break and return -AGAIN to userspace.
Before kernel 5.16, we used to allow defrag to be cancelled through a signal, but that was lost with commit 7b508037d4cac3 ("btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()").
This change adds back the possibility to cancel a defrag with a signal and keeps the same semantics, returning -EAGAIN to user space (and not the usually more expected -EINTR).
This is also motivated by a recent bug on 5.16 where defragging a 1 byte file resulted in iterating from file range 0 to (u64)-1, as hitting the bug triggered a too long loop, basically requiring one to reboot the machine, as it was not possible to cancel defrag.
Fixes: 7b508037d4cac3 ("btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()") CC: stable@vger.kernel.org # 5.16 Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/ioctl.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -1520,6 +1520,11 @@ int btrfs_defrag_file(struct inode *inod /* The cluster size 256K should always be page aligned */ BUILD_BUG_ON(!IS_ALIGNED(CLUSTER_SIZE, PAGE_SIZE));
+ if (btrfs_defrag_cancelled(fs_info)) { + ret = -EAGAIN; + break; + } + /* We want the cluster end at page boundary when possible */ cluster_end = (((cur >> PAGE_SHIFT) + (SZ_256K >> PAGE_SHIFT)) << PAGE_SHIFT) - 1;
From: Qu Wenruo wqu@suse.com
commit 484167da77739a8d0e225008c48e697fd3f781ae upstream.
[BUG] There are users using autodefrag mount option reporting obvious increase in IO:
If I compare the write average (in total, I don't have it per process) when taking idle periods on the same machine: Linux 5.16: without autodefrag: ~ 10KiB/s with autodefrag: between 1 and 2MiB/s.
Linux 5.15: with autodefrag:~ 10KiB/s (around the same as without
autodefrag on 5.16)
[CAUSE] When autodefrag mount option is enabled, btrfs_defrag_file() will be called with @max_sectors = BTRFS_DEFRAG_BATCH (1024) to limit how many sectors we can defrag in one try.
And then use the number of sectors defragged to determine if we need to re-defrag.
But commit b18c3ab2343d ("btrfs: defrag: introduce helper to defrag one cluster") uses wrong unit to increase @sectors_defragged, which should be in unit of sector, not byte.
This means, if we have defragged any sector, then @sectors_defragged will be >= sectorsize (normally 4096), which is larger than BTRFS_DEFRAG_BATCH.
This makes the @max_sectors check in defrag_one_cluster() to underflow, rendering the whole @max_sectors check useless.
Thus causing way more IO for autodefrag mount options, as now there is no limit on how many sectors can really be defragged.
[FIX] Fix the problems by:
- Use sector as unit when increasing @sectors_defragged
- Include @sectors_defragged > @max_sectors case to break the loop
- Add extra comment on the return value of btrfs_defrag_file()
Reported-by: Anthony Ruhier aruhier@mailbox.org Fixes: b18c3ab2343d ("btrfs: defrag: introduce helper to defrag one cluster") Link: https://lore.kernel.org/linux-btrfs/0a269612-e43f-da22-c5bc-b34b1b56ebe8@mai... CC: stable@vger.kernel.org # 5.16 Reviewed-by: Filipe Manana fdmanana@suse.com Signed-off-by: Qu Wenruo wqu@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/ioctl.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
--- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -1416,8 +1416,8 @@ static int defrag_one_cluster(struct btr list_for_each_entry(entry, &target_list, list) { u32 range_len = entry->len;
- /* Reached the limit */ - if (max_sectors && max_sectors == *sectors_defragged) + /* Reached or beyond the limit */ + if (max_sectors && *sectors_defragged >= max_sectors) break;
if (max_sectors) @@ -1439,7 +1439,8 @@ static int defrag_one_cluster(struct btr extent_thresh, newer_than, do_compress); if (ret < 0) break; - *sectors_defragged += range_len; + *sectors_defragged += range_len >> + inode->root->fs_info->sectorsize_bits; } out: list_for_each_entry_safe(entry, tmp, &target_list, list) { @@ -1458,6 +1459,9 @@ out: * @newer_than: minimum transid to defrag * @max_to_defrag: max number of sectors to be defragged, if 0, the whole inode * will be defragged. + * + * Return <0 for error. + * Return >=0 for the number of sectors defragged. */ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra, struct btrfs_ioctl_defrag_range_args *range,
From: Qu Wenruo wqu@suse.com
commit c080b4144b9dd3b7af838a194ffad3204ca15166 upstream.
[BUG] After commit 7b508037d4ca ("btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()") autodefrag no longer properly re-defrag the file from previously finished location.
[CAUSE] The recent refactoring of defrag only focuses on defrag ioctl subpage support, doesn't take autodefrag into consideration.
There are two problems involved which prevents autodefrag to restart its scan:
- No range.start update Previously when one defrag target is found, range->start will be updated to indicate where next search should start from.
But now btrfs_defrag_file() doesn't update it anymore, making all autodefrag to rescan from file offset 0.
This would also make autodefrag to mark the same range dirty again and again, causing extra IO.
- No proper quick exit for defrag_one_cluster() Currently if we reached or exceed @max_sectors limit, we just exit defrag_one_cluster(), and let next defrag_one_cluster() call to do a quick exit. This makes @cur increase, thus no way to properly know which range is defragged and which range is skipped.
[FIX] The fix involves two modifications:
- Update range->start to next cluster start This is a little different from the old behavior. Previously range->start is updated to the next defrag target.
But in the end, the behavior should still be pretty much the same, as now we skip to next defrag target inside btrfs_defrag_file().
Thus if auto-defrag determines to re-scan, then we still do the skip, just at a different timing.
- Make defrag_one_cluster() to return >0 to indicate a quick exit So that btrfs_defrag_file() can also do a quick exit, without increasing @cur to the range end, and re-use @cur to update @range->start.
- Add comment for btrfs_defrag_file() to mention the range->start update Currently only autodefrag utilize this behavior, as defrag ioctl won't set @max_to_defrag parameter, thus unless interrupted it will always try to defrag the whole range.
Reported-by: Filipe Manana fdmanana@suse.com Fixes: 7b508037d4ca ("btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()") Link: https://lore.kernel.org/linux-btrfs/0a269612-e43f-da22-c5bc-b34b1b56ebe8@mai... CC: stable@vger.kernel.org # 5.16 Reviewed-by: Filipe Manana fdmanana@suse.com Signed-off-by: Qu Wenruo wqu@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/ioctl.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-)
--- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -1417,8 +1417,10 @@ static int defrag_one_cluster(struct btr u32 range_len = entry->len;
/* Reached or beyond the limit */ - if (max_sectors && *sectors_defragged >= max_sectors) + if (max_sectors && *sectors_defragged >= max_sectors) { + ret = 1; break; + }
if (max_sectors) range_len = min_t(u32, range_len, @@ -1461,7 +1463,10 @@ out: * will be defragged. * * Return <0 for error. - * Return >=0 for the number of sectors defragged. + * Return >=0 for the number of sectors defragged, and range->start will be updated + * to indicate the file offset where next defrag should be started at. + * (Mostly for autodefrag, which sets @max_to_defrag thus we may exit early without + * defragging all the range). */ int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra, struct btrfs_ioctl_defrag_range_args *range, @@ -1554,10 +1559,19 @@ int btrfs_defrag_file(struct inode *inod if (ret < 0) break; cur = cluster_end + 1; + if (ret > 0) { + ret = 0; + break; + } }
if (ra_allocated) kfree(ra); + /* + * Update range.start for autodefrag, this will indicate where to start + * in next run. + */ + range->start = cur; if (sectors_defragged) { /* * We have defragged some sectors, for compression case they
From: Filipe Manana fdmanana@suse.com
commit 0cb5950f3f3b51a4e8657d106f897f2b913e0586 upstream.
When defragging we can end up collecting a range for defrag that has already pages under delalloc (dirty), as long as the respective extent map for their range is not mapped to a hole, a prealloc extent or the extent map is from an old generation.
Most of the time that is harmless from a functional perspective at least, however it can result in a deadlock:
1) At defrag_collect_targets() we find an extent map that meets all requirements but there's delalloc for the range it covers, and we add its range to list of ranges to defrag;
2) The defrag_collect_targets() function is called at defrag_one_range(), after it locked a range that overlaps the range of the extent map;
3) At defrag_one_range(), while the range is still locked, we call defrag_one_locked_target() for the range associated to the extent map we collected at step 1);
4) Then finally at defrag_one_locked_target() we do a call to btrfs_delalloc_reserve_space(), which will reserve data and metadata space. If the space reservations can not be satisfied right away, the flusher might be kicked in and start flushing delalloc and wait for the respective ordered extents to complete. If this happens we will deadlock, because both flushing delalloc and finishing an ordered extent, requires locking the range in the inode's io tree, which was already locked at defrag_collect_targets().
So fix this by skipping extent maps for which there's already delalloc.
Fixes: eb793cf857828d ("btrfs: defrag: introduce helper to collect target file extents") CC: stable@vger.kernel.org # 5.16 Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/ioctl.c | 31 ++++++++++++++++++++++++++++++- 1 file changed, 30 insertions(+), 1 deletion(-)
--- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -1188,6 +1188,35 @@ static int defrag_collect_targets(struct goto next;
/* + * Our start offset might be in the middle of an existing extent + * map, so take that into account. + */ + range_len = em->len - (cur - em->start); + /* + * If this range of the extent map is already flagged for delalloc, + * skip it, because: + * + * 1) We could deadlock later, when trying to reserve space for + * delalloc, because in case we can't immediately reserve space + * the flusher can start delalloc and wait for the respective + * ordered extents to complete. The deadlock would happen + * because we do the space reservation while holding the range + * locked, and starting writeback, or finishing an ordered + * extent, requires locking the range; + * + * 2) If there's delalloc there, it means there's dirty pages for + * which writeback has not started yet (we clean the delalloc + * flag when starting writeback and after creating an ordered + * extent). If we mark pages in an adjacent range for defrag, + * then we will have a larger contiguous range for delalloc, + * very likely resulting in a larger extent after writeback is + * triggered (except in a case of free space fragmentation). + */ + if (test_range_bit(&inode->io_tree, cur, cur + range_len - 1, + EXTENT_DELALLOC, 0, NULL)) + goto next; + + /* * For do_compress case, we want to compress all valid file * extents, thus no @extent_thresh or mergeable check. */ @@ -1195,7 +1224,7 @@ static int defrag_collect_targets(struct goto add;
/* Skip too large extent */ - if (em->len >= extent_thresh) + if (range_len >= extent_thresh) goto next;
next_mergeable = defrag_check_next_extent(&inode->vfs_inode, em,
From: Filipe Manana fdmanana@suse.com
commit 3c9d31c715948aaff0ee6d322a91a2dec07770bf upstream.
A defrag operation can dirty a lot of pages, specially if operating on the entire file or a large file range. Any task dirtying pages should periodically call balance_dirty_pages_ratelimited(), as stated in that function's comments, otherwise they can leave too many dirty pages in the system. This is what we did before the refactoring in 5.16, and it should have remained, just like in the buffered write path and relocation. So restore that behaviour.
Fixes: 7b508037d4cac3 ("btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()") CC: stable@vger.kernel.org # 5.16 Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/ioctl.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -1553,6 +1553,7 @@ int btrfs_defrag_file(struct inode *inod }
while (cur < last_byte) { + const unsigned long prev_sectors_defragged = sectors_defragged; u64 cluster_end;
/* The cluster size 256K should always be page aligned */ @@ -1584,6 +1585,10 @@ int btrfs_defrag_file(struct inode *inod cluster_end + 1 - cur, extent_thresh, newer_than, do_compress, §ors_defragged, max_to_defrag); + + if (sectors_defragged > prev_sectors_defragged) + balance_dirty_pages_ratelimited(inode->i_mapping); + btrfs_inode_unlock(inode, 0); if (ret < 0) break;
From: Filipe Manana fdmanana@suse.com
commit 27cdfde181bcacd226c230b2fd831f6f5b8c215f upstream.
When starting a defrag, we should update the writeback index of the inode's mapping in case it currently has a value beyond the start of the range we are defragging. This can help performance and often result in getting less extents after writeback - for e.g., if the current value of the writeback index sits somewhere in the middle of a range that gets dirty by the defrag, then after writeback we can get two smaller extents instead of a single, larger extent.
We used to have this before the refactoring in 5.16, but it was removed without any reason to do so. Originally it was added in kernel 3.1, by commit 2a0f7f5769992b ("Btrfs: fix recursive auto-defrag"), in order to fix a loop with autodefrag resulting in dirtying and writing pages over and over, but some testing on current code did not show that happening, at least with the test described in that commit.
So add back the behaviour, as at the very least it is a nice to have optimization.
Fixes: 7b508037d4cac3 ("btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()") CC: stable@vger.kernel.org # 5.16 Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/ioctl.c | 9 +++++++++ 1 file changed, 9 insertions(+)
--- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -1511,6 +1511,7 @@ int btrfs_defrag_file(struct inode *inod int compress_type = BTRFS_COMPRESS_ZLIB; int ret = 0; u32 extent_thresh = range->extent_thresh; + pgoff_t start_index;
if (isize == 0) return 0; @@ -1552,6 +1553,14 @@ int btrfs_defrag_file(struct inode *inod file_ra_state_init(ra, inode->i_mapping); }
+ /* + * Make writeback start from the beginning of the range, so that the + * defrag range can be written sequentially. + */ + start_index = cur >> PAGE_SHIFT; + if (start_index < inode->i_mapping->writeback_index) + inode->i_mapping->writeback_index = start_index; + while (cur < last_byte) { const unsigned long prev_sectors_defragged = sectors_defragged; u64 cluster_end;
From: Marc Kleine-Budde mkl@pengutronix.de
commit db72589c49fd260bfc99c7160c079675bc7417af upstream.
In order to optimize FIFO access, especially on m_can cores attached to slow busses like SPI, in patch
| e39381770ec9 ("can: m_can: Disable IRQs on FIFO bus errors")
bulk read/write support has been added to the m_can_fifo_{read,write} functions.
That change leads to the tcan driver to call regmap_bulk_{read,write}() with a length of 0 (for CAN frames with 0 data length). regmap treats this as an error:
| tcan4x5x spi1.0 tcan4x5x0: FIFO write returned -22
This patch fixes the problem by not calling the cdev->ops->{read,write)_fifo() in case of a 0 length read/write.
Fixes: e39381770ec9 ("can: m_can: Disable IRQs on FIFO bus errors") Link: https://lore.kernel.org/all/20220114155751.2651888-1-mkl@pengutronix.de Cc: stable@vger.kernel.org Cc: Matt Kline matt@bitbashing.io Cc: Chandrasekar Ramakrishnan rcsekar@samsung.com Reported-by: Michael Anochin anochin@photo-meter.com Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/can/m_can/m_can.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/net/can/m_can/m_can.c +++ b/drivers/net/can/m_can/m_can.c @@ -336,6 +336,9 @@ m_can_fifo_read(struct m_can_classdev *c u32 addr_offset = cdev->mcfg[MRAM_RXF0].off + fgi * RXF0_ELEMENT_SIZE + offset;
+ if (val_count == 0) + return 0; + return cdev->ops->read_fifo(cdev, addr_offset, val, val_count); }
@@ -346,6 +349,9 @@ m_can_fifo_write(struct m_can_classdev * u32 addr_offset = cdev->mcfg[MRAM_TXB].off + fpi * TXB_ELEMENT_SIZE + offset;
+ if (val_count == 0) + return 0; + return cdev->ops->write_fifo(cdev, addr_offset, val, val_count); }
From: Marek Behún kabel@kernel.org
commit 2148927e6ed43a1667baf7c2ae3e0e05a44b51a0 upstream.
Commit ce0aa27ff3f6 ("sfp: add sfp-bus to bridge between network devices and sfp cages") added code which finds SFP bus DT node even if the node is disabled with status = "disabled". Because of this, when phylink is created, it ends with non-null .sfp_bus member, even though the SFP module is not probed (because the node is disabled).
We need to ignore disabled SFP bus node.
Fixes: ce0aa27ff3f6 ("sfp: add sfp-bus to bridge between network devices and sfp cages") Signed-off-by: Marek Behún kabel@kernel.org Cc: stable@vger.kernel.org # 2203cbf2c8b5 ("net: sfp: move fwnode parsing into sfp-bus layer") Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/phy/sfp-bus.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/drivers/net/phy/sfp-bus.c +++ b/drivers/net/phy/sfp-bus.c @@ -651,6 +651,11 @@ struct sfp_bus *sfp_bus_find_fwnode(stru else if (ret < 0) return ERR_PTR(ret);
+ if (!fwnode_device_is_available(ref.fwnode)) { + fwnode_handle_put(ref.fwnode); + return NULL; + } + bus = sfp_bus_get(ref.fwnode); fwnode_handle_put(ref.fwnode); if (!bus)
From: Mohammad Athari Bin Ismail mohammad.athari.ismail@intel.com
commit 94c82de43e01ef5747a95e4a590880de863fe423 upstream.
For Intel platform, it is required to configure PTP clock source prior PTP initialization in MAC. So, need to move ptp_clk_freq_config execution from stmmac_ptp_register() to stmmac_init_ptp().
Fixes: 76da35dc99af ("stmmac: intel: Add PSE and PCH PTP clock source selection") Cc: stable@vger.kernel.org # 5.15.x Signed-off-by: Mohammad Athari Bin Ismail mohammad.athari.ismail@intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 +++ drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c | 3 --- 2 files changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -899,6 +899,9 @@ static int stmmac_init_ptp(struct stmmac bool xmac = priv->plat->has_gmac4 || priv->plat->has_xgmac; int ret;
+ if (priv->plat->ptp_clk_freq_config) + priv->plat->ptp_clk_freq_config(priv); + ret = stmmac_init_tstamp_counter(priv, STMMAC_HWTS_ACTIVE); if (ret) return ret; --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c @@ -297,9 +297,6 @@ void stmmac_ptp_register(struct stmmac_p { int i;
- if (priv->plat->ptp_clk_freq_config) - priv->plat->ptp_clk_freq_config(priv); - for (i = 0; i < priv->dma_cap.pps_out_num; i++) { if (i >= STMMAC_PPS_MAX) break;
From: Mohammad Athari Bin Ismail mohammad.athari.ismail@intel.com
commit 0735e639f129dff455aeb91da291f5c578cc33db upstream.
When resume from suspend, besides skipping PTP registration, it also skipping PTP HW initialization. This could cause PTP clock not able to operate properly when resume from suspend.
To fix this, only stmmac_ptp_register() is skipped when resume from suspend.
Fixes: fe1319291150 ("stmmac: Don't init ptp again when resume from suspend/hibernation") Cc: stable@vger.kernel.org # 5.15.x Signed-off-by: Mohammad Athari Bin Ismail mohammad.athari.ismail@intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 20 +++++++++----------- 1 file changed, 9 insertions(+), 11 deletions(-)
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -924,8 +924,6 @@ static int stmmac_init_ptp(struct stmmac priv->hwts_tx_en = 0; priv->hwts_rx_en = 0;
- stmmac_ptp_register(priv); - return 0; }
@@ -3248,7 +3246,7 @@ static int stmmac_fpe_start_wq(struct st /** * stmmac_hw_setup - setup mac in a usable state. * @dev : pointer to the device structure. - * @init_ptp: initialize PTP if set + * @ptp_register: register PTP if set * Description: * this is the main function to setup the HW in a usable state because the * dma engine is reset, the core registers are configured (e.g. AXI, @@ -3258,7 +3256,7 @@ static int stmmac_fpe_start_wq(struct st * 0 on success and an appropriate (-)ve integer as defined in errno.h * file on failure. */ -static int stmmac_hw_setup(struct net_device *dev, bool init_ptp) +static int stmmac_hw_setup(struct net_device *dev, bool ptp_register) { struct stmmac_priv *priv = netdev_priv(dev); u32 rx_cnt = priv->plat->rx_queues_to_use; @@ -3315,13 +3313,13 @@ static int stmmac_hw_setup(struct net_de
stmmac_mmc_setup(priv);
- if (init_ptp) { - ret = stmmac_init_ptp(priv); - if (ret == -EOPNOTSUPP) - netdev_warn(priv->dev, "PTP not supported by HW\n"); - else if (ret) - netdev_warn(priv->dev, "PTP init failed\n"); - } + ret = stmmac_init_ptp(priv); + if (ret == -EOPNOTSUPP) + netdev_warn(priv->dev, "PTP not supported by HW\n"); + else if (ret) + netdev_warn(priv->dev, "PTP init failed\n"); + else if (ptp_register) + stmmac_ptp_register(priv);
priv->eee_tw_timer = STMMAC_DEFAULT_TWT_LS;
From: Ard Biesheuvel ardb@kernel.org
commit 15420269b02a63ed8c1841905d8b8b2403246004 upstream.
The helpers that are used to implement copy_from_kernel_nofault() and copy_to_kernel_nofault() cast a void* to a pointer to a wider type, which may result in alignment faults on ARM if the compiler decides to use double-word or multiple-word load/store instructions.
Only configurations that define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y are affected, given that commit 2423de2e6f4d ("ARM: 9115/1: mm/maccess: fix unaligned copy_{from,to}_kernel_nofault") ensures that dst and src are sufficiently aligned otherwise.
So use the unaligned accessors for accessing dst and src in cases where they may be misaligned.
Cc: stable@vger.kernel.org # depends on 2423de2e6f4d Fixes: 2df4c9a741a0 ("ARM: 9112/1: uaccess: add __{get,put}_kernel_nofault") Reviewed-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/include/asm/uaccess.h | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
--- a/arch/arm/include/asm/uaccess.h +++ b/arch/arm/include/asm/uaccess.h @@ -11,6 +11,7 @@ #include <linux/string.h> #include <asm/memory.h> #include <asm/domain.h> +#include <asm/unaligned.h> #include <asm/unified.h> #include <asm/compiler.h>
@@ -497,7 +498,10 @@ do { \ } \ default: __err = __get_user_bad(); break; \ } \ - *(type *)(dst) = __val; \ + if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)) \ + put_unaligned(__val, (type *)(dst)); \ + else \ + *(type *)(dst) = __val; /* aligned by caller */ \ if (__err) \ goto err_label; \ } while (0) @@ -507,7 +511,9 @@ do { \ const type *__pk_ptr = (dst); \ unsigned long __dst = (unsigned long)__pk_ptr; \ int __err = 0; \ - type __val = *(type *)src; \ + type __val = IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) \ + ? get_unaligned((type *)(src)) \ + : *(type *)(src); /* aligned by caller */ \ switch (sizeof(type)) { \ case 1: __put_user_asm_byte(__val, __dst, __err, ""); break; \ case 2: __put_user_asm_half(__val, __dst, __err, ""); break; \
From: Ard Biesheuvel ardb@kernel.org
commit 9f80ccda53b9417236945bc7ece4b519037df74d upstream.
When building for Thumb2, the .alt.smp.init sections that are emitted by the ALT_UP() patching code may not be 32-bit aligned, even though the fixup_smp_on_up() routine expects that. This results in alignment faults at module load time, which need to be fixed up by the fault handler.
So let's align those sections explicitly, and prevent this from occurring.
Cc: stable@vger.kernel.org Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/include/asm/assembler.h | 2 ++ arch/arm/include/asm/processor.h | 1 + 2 files changed, 3 insertions(+)
--- a/arch/arm/include/asm/assembler.h +++ b/arch/arm/include/asm/assembler.h @@ -288,6 +288,7 @@ */ #define ALT_UP(instr...) \ .pushsection ".alt.smp.init", "a" ;\ + .align 2 ;\ .long 9998b - . ;\ 9997: instr ;\ .if . - 9997b == 2 ;\ @@ -299,6 +300,7 @@ .popsection #define ALT_UP_B(label) \ .pushsection ".alt.smp.init", "a" ;\ + .align 2 ;\ .long 9998b - . ;\ W(b) . + (label - 9998b) ;\ .popsection --- a/arch/arm/include/asm/processor.h +++ b/arch/arm/include/asm/processor.h @@ -96,6 +96,7 @@ unsigned long __get_wchan(struct task_st #define __ALT_SMP_ASM(smp, up) \ "9998: " smp "\n" \ " .pushsection ".alt.smp.init", "a"\n" \ + " .align 2\n" \ " .long 9998b - .\n" \ " " up "\n" \ " .popsection\n"
From: Marc Zyngier maz@kernel.org
commit 278583055a237270fac70518275ba877bf9e4013 upstream.
Injecting an exception into a guest with non-VHE is risky business. Instead of writing in the shadow register for the switch code to restore it, we override the CPU register instead. Which gets overriden a few instructions later by said restore code.
The result is that although the guest correctly gets the exception, it will return to the original context in some random state, depending on what was there the first place... Boo.
Fix the issue by writing to the shadow register. The original code is absolutely fine on VHE, as the state is already loaded, and writing to the shadow register in that case would actually be a bug.
Fixes: bb666c472ca2 ("KVM: arm64: Inject AArch64 exceptions from HYP") Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier maz@kernel.org Reviewed-by: Fuad Tabba tabba@google.com Link: https://lore.kernel.org/r/20220121184207.423426-1-maz@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kvm/hyp/exception.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/arch/arm64/kvm/hyp/exception.c +++ b/arch/arm64/kvm/hyp/exception.c @@ -38,7 +38,10 @@ static inline void __vcpu_write_sys_reg(
static void __vcpu_write_spsr(struct kvm_vcpu *vcpu, u64 val) { - write_sysreg_el1(val, SYS_SPSR); + if (has_vhe()) + write_sysreg_el1(val, SYS_SPSR); + else + __vcpu_sys_reg(vcpu, SPSR_EL1) = val; }
static void __vcpu_write_spsr_abt(struct kvm_vcpu *vcpu, u64 val)
From: Marc Zyngier maz@kernel.org
commit d11a327ed95dbec756b99cbfef2a7fd85c9eeb09 upstream.
Contrary to what df652bcf1136 ("KVM: arm64: vgic-v3: Work around GICv3 locally generated SErrors") was asserting, there is at least one other system out there (Cavium ThunderX2) implementing SEIS, and not in an obviously broken way.
So instead of imposing the M1 workaround on an innocent bystander, let's limit it to the two known broken Apple implementations.
Fixes: df652bcf1136 ("KVM: arm64: vgic-v3: Work around GICv3 locally generated SErrors") Reported-by: Ard Biesheuvel ardb@kernel.org Tested-by: Ard Biesheuvel ardb@kernel.org Acked-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Marc Zyngier maz@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220122103912.795026-1-maz@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kvm/hyp/vgic-v3-sr.c | 3 +++ arch/arm64/kvm/vgic/vgic-v3.c | 17 +++++++++++++++-- 2 files changed, 18 insertions(+), 2 deletions(-)
--- a/arch/arm64/kvm/hyp/vgic-v3-sr.c +++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c @@ -983,6 +983,9 @@ static void __vgic_v3_read_ctlr(struct k val = ((vtr >> 29) & 7) << ICC_CTLR_EL1_PRI_BITS_SHIFT; /* IDbits */ val |= ((vtr >> 23) & 7) << ICC_CTLR_EL1_ID_BITS_SHIFT; + /* SEIS */ + if (kvm_vgic_global_state.ich_vtr_el2 & ICH_VTR_SEIS_MASK) + val |= BIT(ICC_CTLR_EL1_SEIS_SHIFT); /* A3V */ val |= ((vtr >> 21) & 1) << ICC_CTLR_EL1_A3V_SHIFT; /* EOImode */ --- a/arch/arm64/kvm/vgic/vgic-v3.c +++ b/arch/arm64/kvm/vgic/vgic-v3.c @@ -609,6 +609,18 @@ static int __init early_gicv4_enable(cha } early_param("kvm-arm.vgic_v4_enable", early_gicv4_enable);
+static const struct midr_range broken_seis[] = { + MIDR_ALL_VERSIONS(MIDR_APPLE_M1_ICESTORM), + MIDR_ALL_VERSIONS(MIDR_APPLE_M1_FIRESTORM), + {}, +}; + +static bool vgic_v3_broken_seis(void) +{ + return ((kvm_vgic_global_state.ich_vtr_el2 & ICH_VTR_SEIS_MASK) && + is_midr_in_range_list(read_cpuid_id(), broken_seis)); +} + /** * vgic_v3_probe - probe for a VGICv3 compatible interrupt controller * @info: pointer to the GIC description @@ -676,9 +688,10 @@ int vgic_v3_probe(const struct gic_kvm_i group1_trap = true; }
- if (kvm_vgic_global_state.ich_vtr_el2 & ICH_VTR_SEIS_MASK) { - kvm_info("GICv3 with locally generated SEI\n"); + if (vgic_v3_broken_seis()) { + kvm_info("GICv3 with broken locally generated SEI\n");
+ kvm_vgic_global_state.ich_vtr_el2 &= ~ICH_VTR_SEIS_MASK; group0_trap = true; group1_trap = true; if (ich_vtr_el2 & ICH_VTR_TDS_MASK)
From: Ilya Leoshkevich iii@linux.ibm.com
commit f3b7e73b2c6619884351a3a0a7468642f852b8a2 upstream.
If the size of the PLT entries generated by apply_rela() exceeds 64KiB, the first ones can no longer reach __jump_r1 with brc. Fix by using brcl. An alternative solution is to add a __jump_r1 copy after every 64KiB, however, the space savings are quite small and do not justify the additional complexity.
Fixes: f19fbd5ed642 ("s390: introduce execute-trampolines for branches") Cc: stable@vger.kernel.org Reported-by: Andrea Righi andrea.righi@canonical.com Signed-off-by: Ilya Leoshkevich iii@linux.ibm.com Reviewed-by: Heiko Carstens hca@linux.ibm.com Cc: Vasily Gorbik gor@linux.ibm.com Cc: Christian Borntraeger borntraeger@linux.ibm.com Signed-off-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/kernel/module.c | 37 ++++++++++++++++++------------------- 1 file changed, 18 insertions(+), 19 deletions(-)
--- a/arch/s390/kernel/module.c +++ b/arch/s390/kernel/module.c @@ -33,7 +33,7 @@ #define DEBUGP(fmt , ...) #endif
-#define PLT_ENTRY_SIZE 20 +#define PLT_ENTRY_SIZE 22
void *module_alloc(unsigned long size) { @@ -341,27 +341,26 @@ static int apply_rela(Elf_Rela *rela, El case R_390_PLTOFF32: /* 32 bit offset from GOT to PLT. */ case R_390_PLTOFF64: /* 16 bit offset from GOT to PLT. */ if (info->plt_initialized == 0) { - unsigned int insn[5]; - unsigned int *ip = me->core_layout.base + - me->arch.plt_offset + - info->plt_offset; - - insn[0] = 0x0d10e310; /* basr 1,0 */ - insn[1] = 0x100a0004; /* lg 1,10(1) */ + unsigned char insn[PLT_ENTRY_SIZE]; + char *plt_base; + char *ip; + + plt_base = me->core_layout.base + me->arch.plt_offset; + ip = plt_base + info->plt_offset; + *(int *)insn = 0x0d10e310; /* basr 1,0 */ + *(int *)&insn[4] = 0x100c0004; /* lg 1,12(1) */ if (IS_ENABLED(CONFIG_EXPOLINE) && !nospec_disable) { - unsigned int *ij; - ij = me->core_layout.base + - me->arch.plt_offset + - me->arch.plt_size - PLT_ENTRY_SIZE; - insn[2] = 0xa7f40000 + /* j __jump_r1 */ - (unsigned int)(u16) - (((unsigned long) ij - 8 - - (unsigned long) ip) / 2); + char *jump_r1; + + jump_r1 = plt_base + me->arch.plt_size - + PLT_ENTRY_SIZE; + /* brcl 0xf,__jump_r1 */ + *(short *)&insn[8] = 0xc0f4; + *(int *)&insn[10] = (jump_r1 - (ip + 8)) / 2; } else { - insn[2] = 0x07f10000; /* br %r1 */ + *(int *)&insn[8] = 0x07f10000; /* br %r1 */ } - insn[3] = (unsigned int) (val >> 32); - insn[4] = (unsigned int) val; + *(long *)&insn[14] = val;
write(ip, insn, sizeof(insn)); info->plt_initialized = 1;
From: Vasily Gorbik gor@linux.ibm.com
commit 663d34c8df98740f1e90241e78e456d00b3c6cad upstream.
Currently if z/VM guest is allowed to retrieve hypervisor performance data globally for all guests (privilege class B) the query is formed in a way to include all guests but the group name is left empty. This leads to that z/VM guests which have access control group set not being included in the results (even local vm).
Change the query group identifier from empty to "any" to retrieve information about all guests from any groups (or without a group set).
Cc: stable@vger.kernel.org Fixes: 31cb4bd31a48 ("[S390] Hypervisor filesystem (s390_hypfs) for z/VM") Reviewed-by: Gerald Schaefer gerald.schaefer@linux.ibm.com Signed-off-by: Vasily Gorbik gor@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/hypfs/hypfs_vm.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/arch/s390/hypfs/hypfs_vm.c +++ b/arch/s390/hypfs/hypfs_vm.c @@ -20,6 +20,7 @@
static char local_guest[] = " "; static char all_guests[] = "* "; +static char *all_groups = all_guests; static char *guest_query;
struct diag2fc_data { @@ -62,10 +63,11 @@ static int diag2fc(int size, char* query
memcpy(parm_list.userid, query, NAME_LEN); ASCEBC(parm_list.userid, NAME_LEN); - parm_list.addr = (unsigned long) addr ; + memcpy(parm_list.aci_grp, all_groups, NAME_LEN); + ASCEBC(parm_list.aci_grp, NAME_LEN); + parm_list.addr = (unsigned long)addr; parm_list.size = size; parm_list.fmt = 0x02; - memset(parm_list.aci_grp, 0x40, NAME_LEN); rc = -1;
diag_stat_inc(DIAG_STAT_X2FC);
From: Christian Borntraeger borntraeger@linux.ibm.com
commit 1ea1d6a847d2b1d17fefd9196664b95f052a0775 upstream.
machine check validity bits reflect the state of the machine check. If a guest does not make use of guarded storage, the validity bit might be off. We can not use the host CR bit to decide if the validity bit must be on. So ignore "invalid" guarded storage controls for KVM guests in the host and rely on the machine check being forwarded to the guest. If no other errors happen from a host perspective everything is fine and no process must be killed and the host can continue to run.
Cc: stable@vger.kernel.org Fixes: c929500d7a5a ("s390/nmi: s390: New low level handling for machine check happening in guest") Reported-by: Carsten Otte cotte@de.ibm.com Signed-off-by: Christian Borntraeger borntraeger@linux.ibm.com Tested-by: Carsten Otte cotte@de.ibm.com Reviewed-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/kernel/nmi.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-)
--- a/arch/s390/kernel/nmi.c +++ b/arch/s390/kernel/nmi.c @@ -316,11 +316,21 @@ static int notrace s390_validate_registe if (cr2.gse) { if (!mci.gs) { /* - * Guarded storage register can't be restored and - * the current processes uses guarded storage. - * It has to be terminated. + * 2 cases: + * - machine check in kernel or userspace + * - machine check while running SIE (KVM guest) + * For kernel or userspace the userspace values of + * guarded storage control can not be recreated, the + * process must be terminated. + * For SIE the guest values of guarded storage can not + * be recreated. This is either due to a bug or due to + * GS being disabled in the guest. The guest will be + * notified by KVM code and the guests machine check + * handling must take care of this. The host values + * are saved by KVM and are not affected. */ - kill_task = 1; + if (!test_cpu_flag(CIF_MCCK_GUEST)) + kill_task = 1; } else { load_gs_cb((struct gs_cb *)mcesa->guarded_storage_save_area); }
From: Christian Borntraeger borntraeger@linux.ibm.com
commit f094a39c6ba168f2df1edfd1731cca377af5f442 upstream.
The machine check validity bit tells about the context. If a KVM guest was running the bit tells about the guest validity and the host state is not affected. As a guest can disable the guest validity this might result in unwanted host errors on machine checks.
Cc: stable@vger.kernel.org Fixes: c929500d7a5a ("s390/nmi: s390: New low level handling for machine check happening in guest") Signed-off-by: Christian Borntraeger borntraeger@linux.ibm.com Reviewed-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/kernel/nmi.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
--- a/arch/s390/kernel/nmi.c +++ b/arch/s390/kernel/nmi.c @@ -273,7 +273,14 @@ static int notrace s390_validate_registe /* Validate vector registers */ union ctlreg0 cr0;
- if (!mci.vr) { + /* + * The vector validity must only be checked if not running a + * KVM guest. For KVM guests the machine check is forwarded by + * KVM and it is the responsibility of the guest to take + * appropriate actions. The host vector or FPU values have been + * saved by KVM and will be restored by KVM. + */ + if (!mci.vr && !test_cpu_flag(CIF_MCCK_GUEST)) { /* * Vector registers can't be restored. If the kernel * currently uses vector registers the system is
From: Naveen N. Rao naveen.n.rao@linux.vnet.ibm.com
commit b992f01e66150fc5e90be4a96f5eb8e634c8249e upstream.
task_pt_regs() can return NULL on powerpc for kernel threads. This is then used in __bpf_get_stack() to check for user mode, resulting in a kernel oops. Guard against this by checking return value of task_pt_regs() before trying to obtain the call chain.
Fixes: fa28dcb82a38f8 ("bpf: Introduce helper bpf_get_task_stack()") Cc: stable@vger.kernel.org # v5.9+ Signed-off-by: Naveen N. Rao naveen.n.rao@linux.vnet.ibm.com Acked-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/d5ef83c361cc255494afd15ff1b4fb02a36e1dcf.164146812... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/stackmap.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/kernel/bpf/stackmap.c +++ b/kernel/bpf/stackmap.c @@ -525,13 +525,14 @@ BPF_CALL_4(bpf_get_task_stack, struct ta u32, size, u64, flags) { struct pt_regs *regs; - long res; + long res = -EINVAL;
if (!try_get_task_stack(task)) return -EFAULT;
regs = task_pt_regs(task); - res = __bpf_get_stack(regs, task, NULL, buf, size, flags); + if (regs) + res = __bpf_get_stack(regs, task, NULL, buf, size, flags); put_task_stack(task);
return res;
From: Naveen N. Rao naveen.n.rao@linux.vnet.ibm.com
commit fab07611fb2e6a15fac05c4583045ca5582fd826 upstream.
Pad instructions emitted for BPF_CALL so that the number of instructions generated does not change for different function addresses. This is especially important for calls to other bpf functions, whose address will only be known during extra pass.
Fixes: 51c66ad849a703 ("powerpc/bpf: Implement extended BPF on PPC32") Cc: stable@vger.kernel.org # v5.13+ Signed-off-by: Naveen N. Rao naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/52d8fe51f7620a6f27f377791564d79d75463576.164146812... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/net/bpf_jit_comp32.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/arch/powerpc/net/bpf_jit_comp32.c +++ b/arch/powerpc/net/bpf_jit_comp32.c @@ -191,6 +191,9 @@ void bpf_jit_emit_func_call_rel(u32 *ima
if (image && rel < 0x2000000 && rel >= -0x2000000) { PPC_BL_ABS(func); + EMIT(PPC_RAW_NOP()); + EMIT(PPC_RAW_NOP()); + EMIT(PPC_RAW_NOP()); } else { /* Load function address into r0 */ EMIT(PPC_RAW_LIS(_R0, IMM_H(func)));
From: Naveen N. Rao naveen.n.rao@linux.vnet.ibm.com
commit f9320c49993ca3c0ec0f9a7026b313735306bb8b upstream.
These instructions are updated after the initial JIT, so redo codegen during the extra pass. Rename bpf_jit_fixup_subprog_calls() to clarify that this is more than just subprog calls.
Fixes: 69c087ba6225b5 ("bpf: Add bpf_for_each_map_elem() helper") Cc: stable@vger.kernel.org # v5.15 Signed-off-by: Naveen N. Rao naveen.n.rao@linux.vnet.ibm.com Tested-by: Jiri Olsa jolsa@redhat.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/7cc162af77ba918eb3ecd26ec9e7824bc44b1fae.164146812... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/net/bpf_jit_comp.c | 29 +++++++++++++++++++++++------ arch/powerpc/net/bpf_jit_comp32.c | 6 ++++++ arch/powerpc/net/bpf_jit_comp64.c | 7 ++++++- 3 files changed, 35 insertions(+), 7 deletions(-)
--- a/arch/powerpc/net/bpf_jit_comp.c +++ b/arch/powerpc/net/bpf_jit_comp.c @@ -23,15 +23,15 @@ static void bpf_jit_fill_ill_insns(void memset32(area, BREAKPOINT_INSTRUCTION, size / 4); }
-/* Fix the branch target addresses for subprog calls */ -static int bpf_jit_fixup_subprog_calls(struct bpf_prog *fp, u32 *image, - struct codegen_context *ctx, u32 *addrs) +/* Fix updated addresses (for subprog calls, ldimm64, et al) during extra pass */ +static int bpf_jit_fixup_addresses(struct bpf_prog *fp, u32 *image, + struct codegen_context *ctx, u32 *addrs) { const struct bpf_insn *insn = fp->insnsi; bool func_addr_fixed; u64 func_addr; u32 tmp_idx; - int i, ret; + int i, j, ret;
for (i = 0; i < fp->len; i++) { /* @@ -66,6 +66,23 @@ static int bpf_jit_fixup_subprog_calls(s * of the JITed sequence remains unchanged. */ ctx->idx = tmp_idx; + } else if (insn[i].code == (BPF_LD | BPF_IMM | BPF_DW)) { + tmp_idx = ctx->idx; + ctx->idx = addrs[i] / 4; +#ifdef CONFIG_PPC32 + PPC_LI32(ctx->b2p[insn[i].dst_reg] - 1, (u32)insn[i + 1].imm); + PPC_LI32(ctx->b2p[insn[i].dst_reg], (u32)insn[i].imm); + for (j = ctx->idx - addrs[i] / 4; j < 4; j++) + EMIT(PPC_RAW_NOP()); +#else + func_addr = ((u64)(u32)insn[i].imm) | (((u64)(u32)insn[i + 1].imm) << 32); + PPC_LI64(b2p[insn[i].dst_reg], func_addr); + /* overwrite rest with nops */ + for (j = ctx->idx - addrs[i] / 4; j < 5; j++) + EMIT(PPC_RAW_NOP()); +#endif + ctx->idx = tmp_idx; + i++; } }
@@ -193,13 +210,13 @@ skip_init_ctx: /* * Do not touch the prologue and epilogue as they will remain * unchanged. Only fix the branch target address for subprog - * calls in the body. + * calls in the body, and ldimm64 instructions. * * This does not change the offsets and lengths of the subprog * call instruction sequences and hence, the size of the JITed * image as well. */ - bpf_jit_fixup_subprog_calls(fp, code_base, &cgctx, addrs); + bpf_jit_fixup_addresses(fp, code_base, &cgctx, addrs);
/* There is no need to perform the usual passes. */ goto skip_codegen_passes; --- a/arch/powerpc/net/bpf_jit_comp32.c +++ b/arch/powerpc/net/bpf_jit_comp32.c @@ -292,6 +292,8 @@ int bpf_jit_build_body(struct bpf_prog * bool func_addr_fixed; u64 func_addr; u32 true_cond; + u32 tmp_idx; + int j;
/* * addrs[] maps a BPF bytecode address into a real offset from @@ -839,8 +841,12 @@ int bpf_jit_build_body(struct bpf_prog * * 16 byte instruction that uses two 'struct bpf_insn' */ case BPF_LD | BPF_IMM | BPF_DW: /* dst = (u64) imm */ + tmp_idx = ctx->idx; PPC_LI32(dst_reg_h, (u32)insn[i + 1].imm); PPC_LI32(dst_reg, (u32)insn[i].imm); + /* padding to allow full 4 instructions for later patching */ + for (j = ctx->idx - tmp_idx; j < 4; j++) + EMIT(PPC_RAW_NOP()); /* Adjust for two bpf instructions */ addrs[++i] = ctx->idx * 4; break; --- a/arch/powerpc/net/bpf_jit_comp64.c +++ b/arch/powerpc/net/bpf_jit_comp64.c @@ -318,6 +318,7 @@ int bpf_jit_build_body(struct bpf_prog * u64 imm64; u32 true_cond; u32 tmp_idx; + int j;
/* * addrs[] maps a BPF bytecode address into a real offset from @@ -806,9 +807,13 @@ emit_clear: case BPF_LD | BPF_IMM | BPF_DW: /* dst = (u64) imm */ imm64 = ((u64)(u32) insn[i].imm) | (((u64)(u32) insn[i+1].imm) << 32); + tmp_idx = ctx->idx; + PPC_LI64(dst_reg, imm64); + /* padding to allow full 5 instructions for later patching */ + for (j = ctx->idx - tmp_idx; j < 5; j++) + EMIT(PPC_RAW_NOP()); /* Adjust for two bpf instructions */ addrs[++i] = ctx->idx * 4; - PPC_LI64(dst_reg, imm64); break;
/*
From: Eric W. Biederman ebiederm@xmission.com
commit f9d87929d451d3e649699d0f1d74f71f77ad38f5 upstream.
When the ucount code was refactored to create get_ucount it was missed that some of the contexts in which a rlimit is kept elevated can be the only reference to the user/ucount in the system.
Ordinary ucount references exist in places that also have a reference to the user namspace, but in POSIX message queues, the SysV shm code, and the SIGPENDING code there is no independent user namespace reference.
Inspection of the the user_namespace show no instance of circular references between struct ucounts and the user_namespace. So hold a reference from struct ucount to i's user_namespace to resolve this problem.
Link: https://lore.kernel.org/lkml/YZV7Z+yXbsx9p3JN@fixkernel.com/ Reported-by: Qian Cai quic_qiancai@quicinc.com Reported-by: Mathias Krause minipli@grsecurity.net Tested-by: Mathias Krause minipli@grsecurity.net Reviewed-by: Mathias Krause minipli@grsecurity.net Reviewed-by: Alexey Gladkov legion@kernel.org Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts") Fixes: 6e52a9f0532f ("Reimplement RLIMIT_MSGQUEUE on top of ucounts") Fixes: d7c9e99aee48 ("Reimplement RLIMIT_MEMLOCK on top of ucounts") Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" ebiederm@xmission.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/ucount.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/kernel/ucount.c +++ b/kernel/ucount.c @@ -190,6 +190,7 @@ struct ucounts *alloc_ucounts(struct use kfree(new); } else { hlist_add_head(&new->node, hashent); + get_user_ns(new->ns); spin_unlock_irq(&ucounts_lock); return new; } @@ -210,6 +211,7 @@ void put_ucounts(struct ucounts *ucounts if (atomic_dec_and_lock_irqsave(&ucounts->count, &ucounts_lock, flags)) { hlist_del_init(&ucounts->node); spin_unlock_irqrestore(&ucounts_lock, flags); + put_user_ns(ucounts->ns); kfree(ucounts); } }
From: Steffen Maier maier@linux.ibm.com
commit 8c9db6679be4348b8aae108e11d4be2f83976e30 upstream.
Suppose we have an environment with a number of non-NPIV FCP devices (virtual HBAs / FCP devices / zfcp "adapter"s) sharing the same physical FCP channel (HBA port) and its I_T nexus. Plus a number of storage target ports zoned to such shared channel. Now one target port logs out of the fabric causing an RSCN. Zfcp reacts with an ADISC ELS and subsequent port recovery depending on the ADISC result. This happens on all such FCP devices (in different Linux images) concurrently as they all receive a copy of this RSCN. In the following we look at one of those FCP devices.
Requests other than FSF_QTCB_FCP_CMND can be slow until they get a response.
Depending on which requests are affected by slow responses, there are different recovery outcomes. Here we want to fix failed recoveries on port or adapter level by avoiding recovery requests that can be slow.
We need the cached N_Port_ID for the remote port "link" test with ADISC. Just before sending the ADISC, we now intentionally forget the old cached N_Port_ID. The idea is that on receiving an RSCN for a port, we have to assume that any cached information about this port is stale. This forces a fresh new GID_PN [FC-GS] nameserver lookup on any subsequent recovery for the same port. Since we typically can still communicate with the nameserver efficiently, we now reach steady state quicker: Either the nameserver still does not know about the port so we stop recovery, or the nameserver already knows the port potentially with a new N_Port_ID and we can successfully and quickly perform open port recovery. For the one case, where ADISC returns successfully, we re-initialize port->d_id because that case does not involve any port recovery.
This also solves a problem if the storage WWPN quickly logs into the fabric again but with a different N_Port_ID. Such as on virtual WWPN takeover during target NPIV failover. [https://www.redbooks.ibm.com/abstracts/redp5477.html] In that case the RSCN from the storage FDISC was ignored by zfcp and we could not successfully recover the failover. On some later failback on the storage, we could have been lucky if the virtual WWPN got the same old N_Port_ID from the SAN switch as we still had cached. Then the related RSCN triggered a successful port reopen recovery. However, there is no guarantee to get the same N_Port_ID on NPIV FDISC.
Even though NPIV-enabled FCP devices are not affected by this problem, this code change optimizes recovery time for gone remote ports as a side effect. The timely drop of cached N_Port_IDs prevents unnecessary slow open port attempts.
While the problem might have been in code before v2.6.32 commit 799b76d09aee ("[SCSI] zfcp: Decouple gid_pn requests from erp") this fix depends on the gid_pn_work introduced with that commit, so we mark it as culprit to satisfy fix dependencies.
Note: Point-to-point remote port is already handled separately and gets its N_Port_ID from the cached peer_d_id. So resetting port->d_id in general does not affect PtP.
Link: https://lore.kernel.org/r/20220118165803.3667947-1-maier@linux.ibm.com Fixes: 799b76d09aee ("[SCSI] zfcp: Decouple gid_pn requests from erp") Cc: stable@vger.kernel.org #2.6.32+ Suggested-by: Benjamin Block bblock@linux.ibm.com Reviewed-by: Benjamin Block bblock@linux.ibm.com Signed-off-by: Steffen Maier maier@linux.ibm.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/s390/scsi/zfcp_fc.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)
--- a/drivers/s390/scsi/zfcp_fc.c +++ b/drivers/s390/scsi/zfcp_fc.c @@ -521,6 +521,8 @@ static void zfcp_fc_adisc_handler(void * goto out; }
+ /* re-init to undo drop from zfcp_fc_adisc() */ + port->d_id = ntoh24(adisc_resp->adisc_port_id); /* port is good, unblock rport without going through erp */ zfcp_scsi_schedule_rport_register(port); out: @@ -534,6 +536,7 @@ static int zfcp_fc_adisc(struct zfcp_por struct zfcp_fc_req *fc_req; struct zfcp_adapter *adapter = port->adapter; struct Scsi_Host *shost = adapter->scsi_host; + u32 d_id; int ret;
fc_req = kmem_cache_zalloc(zfcp_fc_req_cache, GFP_ATOMIC); @@ -558,7 +561,15 @@ static int zfcp_fc_adisc(struct zfcp_por fc_req->u.adisc.req.adisc_cmd = ELS_ADISC; hton24(fc_req->u.adisc.req.adisc_port_id, fc_host_port_id(shost));
- ret = zfcp_fsf_send_els(adapter, port->d_id, &fc_req->ct_els, + d_id = port->d_id; /* remember as destination for send els below */ + /* + * Force fresh GID_PN lookup on next port recovery. + * Must happen after request setup and before sending request, + * to prevent race with port->d_id re-init in zfcp_fc_adisc_handler(). + */ + port->d_id = 0; + + ret = zfcp_fsf_send_els(adapter, d_id, &fc_req->ct_els, ZFCP_FC_CTELS_TMO); if (ret) kmem_cache_free(zfcp_fc_req_cache, fc_req);
From: Jan Kara jack@suse.cz
commit ea8569194b43f0f01f0a84c689388542c7254a1f upstream.
When we fail to expand inode from inline format to a normal format, we restore inode to contain the original inline formatting but we forgot to set i_lenAlloc back. The mismatch between i_lenAlloc and i_size was then causing further problems such as warnings and lost data down the line.
Reported-by: butt3rflyh4ck butterflyhuangxx@gmail.com CC: stable@vger.kernel.org Fixes: 7e49b6f2480c ("udf: Convert UDF to new truncate calling sequence") Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/udf/inode.c | 1 + 1 file changed, 1 insertion(+)
--- a/fs/udf/inode.c +++ b/fs/udf/inode.c @@ -317,6 +317,7 @@ int udf_expand_file_adinicb(struct inode unlock_page(page); iinfo->i_alloc_type = ICBTAG_FLAG_AD_IN_ICB; inode->i_data.a_ops = &udf_adinicb_aops; + iinfo->i_lenAlloc = inode->i_size; up_write(&iinfo->i_data_sem); } put_page(page);
From: Jan Kara jack@suse.cz
commit 7fc3b7c2981bbd1047916ade327beccb90994eee upstream.
udf_expand_file_adinicb() calls directly ->writepage to write data expanded into a page. This however misses to setup inode for writeback properly and so we can crash on inode->i_wb dereference when submitting page for IO like:
BUG: kernel NULL pointer dereference, address: 0000000000000158 #PF: supervisor read access in kernel mode ... <TASK> __folio_start_writeback+0x2ac/0x350 __block_write_full_page+0x37d/0x490 udf_expand_file_adinicb+0x255/0x400 [udf] udf_file_write_iter+0xbe/0x1b0 [udf] new_sync_write+0x125/0x1c0 vfs_write+0x28e/0x400
Fix the problem by marking the page dirty and going through the standard writeback path to write the page. Strictly speaking we would not even have to write the page but we want to catch e.g. ENOSPC errors early.
Reported-by: butt3rflyh4ck butterflyhuangxx@gmail.com CC: stable@vger.kernel.org Fixes: 52ebea749aae ("writeback: make backing_dev_info host cgroup-specific bdi_writebacks") Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/udf/inode.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-)
--- a/fs/udf/inode.c +++ b/fs/udf/inode.c @@ -258,10 +258,6 @@ int udf_expand_file_adinicb(struct inode char *kaddr; struct udf_inode_info *iinfo = UDF_I(inode); int err; - struct writeback_control udf_wbc = { - .sync_mode = WB_SYNC_NONE, - .nr_to_write = 1, - };
WARN_ON_ONCE(!inode_is_locked(inode)); if (!iinfo->i_lenAlloc) { @@ -305,8 +301,10 @@ int udf_expand_file_adinicb(struct inode iinfo->i_alloc_type = ICBTAG_FLAG_AD_LONG; /* from now on we have normal address_space methods */ inode->i_data.a_ops = &udf_aops; + set_page_dirty(page); + unlock_page(page); up_write(&iinfo->i_data_sem); - err = inode->i_data.a_ops->writepage(page, &udf_wbc); + err = filemap_fdatawrite(inode->i_mapping); if (err) { /* Restore everything back so that we don't lose data... */ lock_page(page);
From: Ard Biesheuvel ardb@kernel.org
commit f5390cd0b43c2e54c7cf5506c7da4a37c5cef746 upstream.
Aditya reports [0] that his recent MacbookPro crashes in the firmware when using the variable services at runtime. The culprit appears to be a call to QueryVariableInfo(), which we did not use to call on Apple x86 machines in the past as they only upgraded from EFI v1.10 to EFI v2.40 firmware fairly recently, and QueryVariableInfo() (along with UpdateCapsule() et al) was added in EFI v2.00.
The only runtime service introduced in EFI v2.00 that we actually use in Linux is QueryVariableInfo(), as the capsule based ones are optional, generally not used at runtime (all the LVFS/fwupd firmware update infrastructure uses helper EFI programs that invoke capsule update at boot time, not runtime), and not implemented by Apple machines in the first place. QueryVariableInfo() is used to 'safely' set variables, i.e., only when there is enough space. This prevents machines with buggy firmwares from corrupting their NVRAMs when they run out of space.
Given that Apple machines have been using EFI v1.10 services only for the longest time (the EFI v2.0 spec was released in 2006, and Linux support for the newly introduced runtime services was added in 2011, but the MacbookPro12,1 released in 2015 still claims to be EFI v1.10 only), let's avoid the EFI v2.0 ones on all Apple x86 machines.
[0] https://lore.kernel.org/all/6D757C75-65B1-468B-842D-10410081A8E4@live.com/
Cc: stable@vger.kernel.org Cc: Jeremy Kerr jk@ozlabs.org Cc: Matthew Garrett mjg59@srcf.ucam.org Reported-by: Aditya Garg gargaditya08@live.com Tested-by: Orlando Chamberlain redecorating@protonmail.com Signed-off-by: Ard Biesheuvel ardb@kernel.org Tested-by: Aditya Garg gargaditya08@live.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=215277 Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/firmware/efi/efi.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/drivers/firmware/efi/efi.c +++ b/drivers/firmware/efi/efi.c @@ -722,6 +722,13 @@ void __init efi_systab_report_header(con systab_hdr->revision >> 16, systab_hdr->revision & 0xffff, vendor); + + if (IS_ENABLED(CONFIG_X86_64) && + systab_hdr->revision > EFI_1_10_SYSTEM_TABLE_REVISION && + !strcmp(vendor, "Apple")) { + pr_info("Apple Mac detected, using EFI v1.10 runtime services only\n"); + efi.runtime_version = EFI_1_10_SYSTEM_TABLE_REVISION; + } }
static __initdata char memory_type_name[][13] = {
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
commit c9d967b2ce40d71e968eb839f36c936b8a9cf1ea upstream.
The buffer handling in pm_show_wakelocks() is tricky, and hopefully correct. Ensure it really is correct by using sysfs_emit_at() which handles all of the tricky string handling logic in a PAGE_SIZE buffer for us automatically as this is a sysfs file being read from.
Reviewed-by: Lee Jones lee.jones@linaro.org Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/power/wakelock.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-)
--- a/kernel/power/wakelock.c +++ b/kernel/power/wakelock.c @@ -39,23 +39,20 @@ ssize_t pm_show_wakelocks(char *buf, boo { struct rb_node *node; struct wakelock *wl; - char *str = buf; - char *end = buf + PAGE_SIZE; + int len = 0;
mutex_lock(&wakelocks_lock);
for (node = rb_first(&wakelocks_tree); node; node = rb_next(node)) { wl = rb_entry(node, struct wakelock, node); if (wl->ws->active == show_active) - str += scnprintf(str, end - str, "%s ", wl->name); + len += sysfs_emit_at(buf, len, "%s ", wl->name); } - if (str > buf) - str--;
- str += scnprintf(str, end - str, "\n"); + len += sysfs_emit_at(buf, len, "\n");
mutex_unlock(&wakelocks_lock); - return (str - buf); + return len; }
#if CONFIG_PM_WAKELOCKS_LIMIT > 0
From: Xiaoke Wang xkernel.wang@foxmail.com
commit e629e7b525a179e29d53463d992bdee759c950fb upstream.
kfree() is missing on an error path to free the memory allocated by kstrdup():
p = param = kstrdup(data->params[i], GFP_KERNEL);
So it is better to free it via kfree(p).
Link: https://lkml.kernel.org/r/tencent_C52895FD37802832A3E5B272D05008866F0A@qq.co...
Cc: stable@vger.kernel.org Fixes: d380dcde9a07c ("tracing: Fix now invalid var_ref_vals assumption in trace action") Signed-off-by: Xiaoke Wang xkernel.wang@foxmail.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace_events_hist.c | 1 + 1 file changed, 1 insertion(+)
--- a/kernel/trace/trace_events_hist.c +++ b/kernel/trace/trace_events_hist.c @@ -3919,6 +3919,7 @@ static int trace_action_create(struct hi
var_ref_idx = find_var_ref_idx(hist_data, var_ref); if (WARN_ON(var_ref_idx < 0)) { + kfree(p); ret = var_ref_idx; goto err; }
From: Tom Zanussi zanussi@kernel.org
commit 097f1eefedeab528cecbd35586dfe293853ffb17 upstream.
During expression parsing, a new expression field is created which should inherit the properties of the operands, such as size and is_signed.
is_signed propagation was missing, causing spurious errors with signed operands. Add it in parse_expr() and parse_unary() to fix the problem.
Link: https://lkml.kernel.org/r/f4dac08742fd7a0920bf80a73c6c44042f5eaa40.164331970...
Cc: stable@vger.kernel.org Fixes: 100719dcef447 ("tracing: Add simple expression support to hist triggers") Reported-by: Yordan Karadzhov ykaradzhov@vmware.com BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215513 Signed-off-by: Tom Zanussi zanussi@kernel.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace_events_hist.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/kernel/trace/trace_events_hist.c +++ b/kernel/trace/trace_events_hist.c @@ -2487,6 +2487,8 @@ static struct hist_field *parse_unary(st (HIST_FIELD_FL_TIMESTAMP | HIST_FIELD_FL_TIMESTAMP_USECS); expr->fn = hist_field_unary_minus; expr->operands[0] = operand1; + expr->size = operand1->size; + expr->is_signed = operand1->is_signed; expr->operator = FIELD_OP_UNARY_MINUS; expr->name = expr_str(expr, 0); expr->type = kstrdup_const(operand1->type, GFP_KERNEL); @@ -2703,6 +2705,7 @@ static struct hist_field *parse_expr(str
/* The operand sizes should be the same, so just pick one */ expr->size = operand1->size; + expr->is_signed = operand1->is_signed;
expr->operator = field_op; expr->type = kstrdup_const(operand1->type, GFP_KERNEL);
From: Tom Zanussi zanussi@kernel.org
commit 67ab5eb71b37b55f7c5522d080a1b42823351776 upstream.
tr->n_err_log_entries should only be increased if entry allocation succeeds.
Doing it when it fails won't cause any problems other than wasting an entry, but should be fixed anyway.
Link: https://lkml.kernel.org/r/cad1ab28f75968db0f466925e7cba5970cec6c29.164331970...
Cc: stable@vger.kernel.org Fixes: 2f754e771b1a6 ("tracing: Don't inc err_log entry count if entry allocation fails") Signed-off-by: Tom Zanussi zanussi@kernel.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -7734,7 +7734,8 @@ static struct tracing_log_err *get_traci err = kzalloc(sizeof(*err), GFP_KERNEL); if (!err) err = ERR_PTR(-ENOMEM); - tr->n_err_log_entries++; + else + tr->n_err_log_entries++;
return err; }
From: Jeff Layton jlayton@kernel.org
commit 932a9b5870d38b87ba0a9923c804b1af7d3605b9 upstream.
The reference acquired by try_prep_async_create is currently leaked. Ensure we put it.
Cc: stable@vger.kernel.org Fixes: 9a8d03ca2e2c ("ceph: attempt to do async create when possible") Signed-off-by: Jeff Layton jlayton@kernel.org Reviewed-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ceph/file.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -746,8 +746,10 @@ retry: restore_deleg_ino(dir, req->r_deleg_ino); ceph_mdsc_put_request(req); try_async = false; + ceph_put_string(rcu_dereference_raw(lo.pool_ns)); goto retry; } + ceph_put_string(rcu_dereference_raw(lo.pool_ns)); goto out_req; } }
From: Jeff Layton jlayton@kernel.org
commit 4584a768f22b7669cdebabc911543621ac661341 upstream.
Dan reported that he was unable to write to files that had been asynchronously created when the client's OSD caps are restricted to a particular namespace.
The issue is that the layout for the new inode is only partially being filled. Ensure that we populate the pool_ns_data and pool_ns_len in the iinfo before calling ceph_fill_inode.
Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/54013 Fixes: 9a8d03ca2e2c ("ceph: attempt to do async create when possible") Reported-by: Dan van der Ster dan@vanderster.com Signed-off-by: Jeff Layton jlayton@kernel.org Reviewed-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ceph/file.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -579,6 +579,7 @@ static int ceph_finish_async_create(stru struct ceph_inode_info *ci = ceph_inode(dir); struct inode *inode; struct timespec64 now; + struct ceph_string *pool_ns; struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(dir->i_sb); struct ceph_vino vino = { .ino = req->r_deleg_ino, .snap = CEPH_NOSNAP }; @@ -628,6 +629,12 @@ static int ceph_finish_async_create(stru in.max_size = cpu_to_le64(lo->stripe_unit);
ceph_file_layout_to_legacy(lo, &in.layout); + /* lo is private, so pool_ns can't change */ + pool_ns = rcu_dereference_raw(lo->pool_ns); + if (pool_ns) { + iinfo.pool_ns_len = pool_ns->len; + iinfo.pool_ns_data = pool_ns->str; + }
down_read(&mdsc->snap_rwsem); ret = ceph_fill_inode(inode, NULL, &iinfo, NULL, req->r_session,
From: Amir Goldstein amir73il@gmail.com
commit a37d9a17f099072fe4d3a9048b0321978707a918 upstream.
Apparently, there are some applications that use IN_DELETE event as an invalidation mechanism and expect that if they try to open a file with the name reported with the delete event, that it should not contain the content of the deleted file.
Commit 49246466a989 ("fsnotify: move fsnotify_nameremove() hook out of d_delete()") moved the fsnotify delete hook before d_delete() so fsnotify will have access to a positive dentry.
This allowed a race where opening the deleted file via cached dentry is now possible after receiving the IN_DELETE event.
To fix the regression, create a new hook fsnotify_delete() that takes the unlinked inode as an argument and use a helper d_delete_notify() to pin the inode, so we can pass it to fsnotify_delete() after d_delete().
Backporting hint: this regression is from v5.3. Although patch will apply with only trivial conflicts to v5.4 and v5.10, it won't build, because fsnotify_delete() implementation is different in each of those versions (see fsnotify_link()).
A follow up patch will fix the fsnotify_unlink/rmdir() calls in pseudo filesystem that do not need to call d_delete().
Link: https://lore.kernel.org/r/20220120215305.282577-1-amir73il@gmail.com Reported-by: Ivan Delalande colona@arista.com Link: https://lore.kernel.org/linux-fsdevel/YeNyzoDM5hP5LtGW@visor/ Fixes: 49246466a989 ("fsnotify: move fsnotify_nameremove() hook out of d_delete()") Cc: stable@vger.kernel.org # v5.3+ Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/ioctl.c | 6 +---- fs/namei.c | 10 ++++----- include/linux/fsnotify.h | 49 +++++++++++++++++++++++++++++++++++++++++------ 3 files changed, 50 insertions(+), 15 deletions(-)
--- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -3126,10 +3126,8 @@ static noinline int btrfs_ioctl_snap_des btrfs_inode_lock(inode, 0); err = btrfs_delete_subvolume(dir, dentry); btrfs_inode_unlock(inode, 0); - if (!err) { - fsnotify_rmdir(dir, dentry); - d_delete(dentry); - } + if (!err) + d_delete_notify(dir, dentry);
out_dput: dput(dentry); --- a/fs/namei.c +++ b/fs/namei.c @@ -3973,13 +3973,12 @@ int vfs_rmdir(struct user_namespace *mnt dentry->d_inode->i_flags |= S_DEAD; dont_mount(dentry); detach_mounts(dentry); - fsnotify_rmdir(dir, dentry);
out: inode_unlock(dentry->d_inode); dput(dentry); if (!error) - d_delete(dentry); + d_delete_notify(dir, dentry); return error; } EXPORT_SYMBOL(vfs_rmdir); @@ -4101,7 +4100,6 @@ int vfs_unlink(struct user_namespace *mn if (!error) { dont_mount(dentry); detach_mounts(dentry); - fsnotify_unlink(dir, dentry); } } } @@ -4109,9 +4107,11 @@ out: inode_unlock(target);
/* We don't d_delete() NFS sillyrenamed files--they still exist. */ - if (!error && !(dentry->d_flags & DCACHE_NFSFS_RENAMED)) { + if (!error && dentry->d_flags & DCACHE_NFSFS_RENAMED) { + fsnotify_unlink(dir, dentry); + } else if (!error) { fsnotify_link_count(target); - d_delete(dentry); + d_delete_notify(dir, dentry); }
return error; --- a/include/linux/fsnotify.h +++ b/include/linux/fsnotify.h @@ -222,16 +222,53 @@ static inline void fsnotify_link(struct }
/* + * fsnotify_delete - @dentry was unlinked and unhashed + * + * Caller must make sure that dentry->d_name is stable. + * + * Note: unlike fsnotify_unlink(), we have to pass also the unlinked inode + * as this may be called after d_delete() and old_dentry may be negative. + */ +static inline void fsnotify_delete(struct inode *dir, struct inode *inode, + struct dentry *dentry) +{ + __u32 mask = FS_DELETE; + + if (S_ISDIR(inode->i_mode)) + mask |= FS_ISDIR; + + fsnotify_name(mask, inode, FSNOTIFY_EVENT_INODE, dir, &dentry->d_name, + 0); +} + +/** + * d_delete_notify - delete a dentry and call fsnotify_delete() + * @dentry: The dentry to delete + * + * This helper is used to guaranty that the unlinked inode cannot be found + * by lookup of this name after fsnotify_delete() event has been delivered. + */ +static inline void d_delete_notify(struct inode *dir, struct dentry *dentry) +{ + struct inode *inode = d_inode(dentry); + + ihold(inode); + d_delete(dentry); + fsnotify_delete(dir, inode, dentry); + iput(inode); +} + +/* * fsnotify_unlink - 'name' was unlinked * * Caller must make sure that dentry->d_name is stable. */ static inline void fsnotify_unlink(struct inode *dir, struct dentry *dentry) { - /* Expected to be called before d_delete() */ - WARN_ON_ONCE(d_is_negative(dentry)); + if (WARN_ON_ONCE(d_is_negative(dentry))) + return;
- fsnotify_dirent(dir, dentry, FS_DELETE); + fsnotify_delete(dir, d_inode(dentry), dentry); }
/* @@ -255,10 +292,10 @@ static inline void fsnotify_mkdir(struct */ static inline void fsnotify_rmdir(struct inode *dir, struct dentry *dentry) { - /* Expected to be called before d_delete() */ - WARN_ON_ONCE(d_is_negative(dentry)); + if (WARN_ON_ONCE(d_is_negative(dentry))) + return;
- fsnotify_dirent(dir, dentry, FS_DELETE | FS_ISDIR); + fsnotify_delete(dir, d_inode(dentry), dentry); }
/*
From: Amir Goldstein amir73il@gmail.com
commit 29044dae2e746949ad4b9cbdbfb248994d1dcdb4 upstream.
Commit 49246466a989 ("fsnotify: move fsnotify_nameremove() hook out of d_delete()") moved the fsnotify delete hook before d_delete() so fsnotify will have access to a positive dentry.
This allowed a race where opening the deleted file via cached dentry is now possible after receiving the IN_DELETE event.
To fix the regression in pseudo filesystems, convert d_delete() calls to d_drop() (see commit 46c46f8df9aa ("devpts_pty_kill(): don't bother with d_delete()") and move the fsnotify hook after d_drop().
Add a missing fsnotify_unlink() hook in nfsdfs that was found during the audit of fsnotify hooks in pseudo filesystems.
Note that the fsnotify hooks in simple_recursive_removal() follow d_invalidate(), so they require no change.
Link: https://lore.kernel.org/r/20220120215305.282577-2-amir73il@gmail.com Reported-by: Ivan Delalande colona@arista.com Link: https://lore.kernel.org/linux-fsdevel/YeNyzoDM5hP5LtGW@visor/ Fixes: 49246466a989 ("fsnotify: move fsnotify_nameremove() hook out of d_delete()") Cc: stable@vger.kernel.org # v5.3+ Signed-off-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/configfs/dir.c | 6 +++--- fs/devpts/inode.c | 2 +- fs/nfsd/nfsctl.c | 5 +++-- net/sunrpc/rpc_pipe.c | 4 ++-- 4 files changed, 9 insertions(+), 8 deletions(-)
--- a/fs/configfs/dir.c +++ b/fs/configfs/dir.c @@ -1780,8 +1780,8 @@ void configfs_unregister_group(struct co configfs_detach_group(&group->cg_item); d_inode(dentry)->i_flags |= S_DEAD; dont_mount(dentry); + d_drop(dentry); fsnotify_rmdir(d_inode(parent), dentry); - d_delete(dentry); inode_unlock(d_inode(parent));
dput(dentry); @@ -1922,10 +1922,10 @@ void configfs_unregister_subsystem(struc configfs_detach_group(&group->cg_item); d_inode(dentry)->i_flags |= S_DEAD; dont_mount(dentry); - fsnotify_rmdir(d_inode(root), dentry); inode_unlock(d_inode(dentry));
- d_delete(dentry); + d_drop(dentry); + fsnotify_rmdir(d_inode(root), dentry);
inode_unlock(d_inode(root));
--- a/fs/devpts/inode.c +++ b/fs/devpts/inode.c @@ -621,8 +621,8 @@ void devpts_pty_kill(struct dentry *dent
dentry->d_fsdata = NULL; drop_nlink(dentry->d_inode); - fsnotify_unlink(d_inode(dentry->d_parent), dentry); d_drop(dentry); + fsnotify_unlink(d_inode(dentry->d_parent), dentry); dput(dentry); /* d_alloc_name() in devpts_pty_new() */ }
--- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1249,7 +1249,8 @@ static void nfsdfs_remove_file(struct in clear_ncl(d_inode(dentry)); dget(dentry); ret = simple_unlink(dir, dentry); - d_delete(dentry); + d_drop(dentry); + fsnotify_unlink(dir, dentry); dput(dentry); WARN_ON_ONCE(ret); } @@ -1340,8 +1341,8 @@ void nfsd_client_rmdir(struct dentry *de dget(dentry); ret = simple_rmdir(dir, dentry); WARN_ON_ONCE(ret); + d_drop(dentry); fsnotify_rmdir(dir, dentry); - d_delete(dentry); dput(dentry); inode_unlock(dir); } --- a/net/sunrpc/rpc_pipe.c +++ b/net/sunrpc/rpc_pipe.c @@ -600,9 +600,9 @@ static int __rpc_rmdir(struct inode *dir
dget(dentry); ret = simple_rmdir(dir, dentry); + d_drop(dentry); if (!ret) fsnotify_rmdir(dir, dentry); - d_delete(dentry); dput(dentry); return ret; } @@ -613,9 +613,9 @@ static int __rpc_unlink(struct inode *di
dget(dentry); ret = simple_unlink(dir, dentry); + d_drop(dentry); if (!ret) fsnotify_unlink(dir, dentry); - d_delete(dentry); dput(dentry); return ret; }
From: Sean Christopherson seanjc@google.com
commit 31c25585695abdf03d6160aa6d829e855b256329 upstream.
Revert a completely broken check on an "invalid" RIP in SVM's workaround for the DecodeAssists SMAP errata. kvm_vcpu_gfn_to_memslot() obviously expects a gfn, i.e. operates in the guest physical address space, whereas RIP is a virtual (not even linear) address. The "fix" worked for the problematic KVM selftest because the test identity mapped RIP.
Fully revert the hack instead of trying to translate RIP to a GPA, as the non-SEV case is now handled earlier, and KVM cannot access guest page tables to translate RIP.
This reverts commit e72436bc3a5206f95bb384e741154166ddb3202e.
Fixes: e72436bc3a52 ("KVM: SVM: avoid infinite loop on NPF from bad address") Reported-by: Liam Merwick liam.merwick@oracle.com Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson seanjc@google.com Reviewed-by: Liam Merwick liam.merwick@oracle.com Message-Id: 20220120010719.711476-3-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/svm/svm.c | 7 ------- virt/kvm/kvm_main.c | 1 - 2 files changed, 8 deletions(-)
--- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -4513,13 +4513,6 @@ static bool svm_can_emulate_instruction( if (likely(!insn || insn_len)) return true;
- /* - * If RIP is invalid, go ahead with emulation which will cause an - * internal error exit. - */ - if (!kvm_vcpu_gfn_to_memslot(vcpu, kvm_rip_read(vcpu) >> PAGE_SHIFT)) - return true; - cr4 = kvm_read_cr4(vcpu); smep = cr4 & X86_CR4_SMEP; smap = cr4 & X86_CR4_SMAP; --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2112,7 +2112,6 @@ struct kvm_memory_slot *kvm_vcpu_gfn_to_
return NULL; } -EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_memslot);
bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn) {
From: Suren Baghdasaryan surenb@google.com
commit a06247c6804f1a7c86a2e5398a4c1f1db1471848 upstream.
With write operation on psi files replacing old trigger with a new one, the lifetime of its waitqueue is totally arbitrary. Overwriting an existing trigger causes its waitqueue to be freed and pending poll() will stumble on trigger->event_wait which was destroyed. Fix this by disallowing to redefine an existing psi trigger. If a write operation is used on a file descriptor with an already existing psi trigger, the operation will fail with EBUSY error. Also bypass a check for psi_disabled in the psi_trigger_destroy as the flag can be flipped after the trigger is created, leading to a memory leak.
Fixes: 0e94682b73bf ("psi: introduce psi monitor") Reported-by: syzbot+cdb5dd11c97cc532efad@syzkaller.appspotmail.com Suggested-by: Linus Torvalds torvalds@linux-foundation.org Analyzed-by: Eric Biggers ebiggers@kernel.org Signed-off-by: Suren Baghdasaryan surenb@google.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Eric Biggers ebiggers@google.com Acked-by: Johannes Weiner hannes@cmpxchg.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220111232309.1786347-1-surenb@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/accounting/psi.rst | 3 + include/linux/psi.h | 2 - include/linux/psi_types.h | 3 - kernel/cgroup/cgroup.c | 11 ++++-- kernel/sched/psi.c | 66 +++++++++++++++++---------------------- 5 files changed, 40 insertions(+), 45 deletions(-)
--- a/Documentation/accounting/psi.rst +++ b/Documentation/accounting/psi.rst @@ -92,7 +92,8 @@ Triggers can be set on more than one psi for the same psi metric can be specified. However for each trigger a separate file descriptor is required to be able to poll it separately from others, therefore for each trigger a separate open() syscall should be made even -when opening the same psi interface file. +when opening the same psi interface file. Write operations to a file descriptor +with an already existing psi trigger will fail with EBUSY.
Monitors activate only when system enters stall state for the monitored psi metric and deactivates upon exit from the stall state. While system is --- a/include/linux/psi.h +++ b/include/linux/psi.h @@ -32,7 +32,7 @@ void cgroup_move_task(struct task_struct
struct psi_trigger *psi_trigger_create(struct psi_group *group, char *buf, size_t nbytes, enum psi_res res); -void psi_trigger_replace(void **trigger_ptr, struct psi_trigger *t); +void psi_trigger_destroy(struct psi_trigger *t);
__poll_t psi_trigger_poll(void **trigger_ptr, struct file *file, poll_table *wait); --- a/include/linux/psi_types.h +++ b/include/linux/psi_types.h @@ -140,9 +140,6 @@ struct psi_trigger { * events to one per window */ u64 last_event_time; - - /* Refcounting to prevent premature destruction */ - struct kref refcount; };
struct psi_group { --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -3642,6 +3642,12 @@ static ssize_t cgroup_pressure_write(str cgroup_get(cgrp); cgroup_kn_unlock(of->kn);
+ /* Allow only one trigger per file descriptor */ + if (ctx->psi.trigger) { + cgroup_put(cgrp); + return -EBUSY; + } + psi = cgroup_ino(cgrp) == 1 ? &psi_system : &cgrp->psi; new = psi_trigger_create(psi, buf, nbytes, res); if (IS_ERR(new)) { @@ -3649,8 +3655,7 @@ static ssize_t cgroup_pressure_write(str return PTR_ERR(new); }
- psi_trigger_replace(&ctx->psi.trigger, new); - + smp_store_release(&ctx->psi.trigger, new); cgroup_put(cgrp);
return nbytes; @@ -3689,7 +3694,7 @@ static void cgroup_pressure_release(stru { struct cgroup_file_ctx *ctx = of->priv;
- psi_trigger_replace(&ctx->psi.trigger, NULL); + psi_trigger_destroy(ctx->psi.trigger); }
bool cgroup_psi_enabled(void) --- a/kernel/sched/psi.c +++ b/kernel/sched/psi.c @@ -1162,7 +1162,6 @@ struct psi_trigger *psi_trigger_create(s t->event = 0; t->last_event_time = 0; init_waitqueue_head(&t->event_wait); - kref_init(&t->refcount);
mutex_lock(&group->trigger_lock);
@@ -1191,15 +1190,19 @@ struct psi_trigger *psi_trigger_create(s return t; }
-static void psi_trigger_destroy(struct kref *ref) +void psi_trigger_destroy(struct psi_trigger *t) { - struct psi_trigger *t = container_of(ref, struct psi_trigger, refcount); - struct psi_group *group = t->group; + struct psi_group *group; struct task_struct *task_to_destroy = NULL;
- if (static_branch_likely(&psi_disabled)) + /* + * We do not check psi_disabled since it might have been disabled after + * the trigger got created. + */ + if (!t) return;
+ group = t->group; /* * Wakeup waiters to stop polling. Can happen if cgroup is deleted * from under a polling process. @@ -1235,9 +1238,9 @@ static void psi_trigger_destroy(struct k mutex_unlock(&group->trigger_lock);
/* - * Wait for both *trigger_ptr from psi_trigger_replace and - * poll_task RCUs to complete their read-side critical sections - * before destroying the trigger and optionally the poll_task + * Wait for psi_schedule_poll_work RCU to complete its read-side + * critical section before destroying the trigger and optionally the + * poll_task. */ synchronize_rcu(); /* @@ -1254,18 +1257,6 @@ static void psi_trigger_destroy(struct k kfree(t); }
-void psi_trigger_replace(void **trigger_ptr, struct psi_trigger *new) -{ - struct psi_trigger *old = *trigger_ptr; - - if (static_branch_likely(&psi_disabled)) - return; - - rcu_assign_pointer(*trigger_ptr, new); - if (old) - kref_put(&old->refcount, psi_trigger_destroy); -} - __poll_t psi_trigger_poll(void **trigger_ptr, struct file *file, poll_table *wait) { @@ -1275,24 +1266,15 @@ __poll_t psi_trigger_poll(void **trigger if (static_branch_likely(&psi_disabled)) return DEFAULT_POLLMASK | EPOLLERR | EPOLLPRI;
- rcu_read_lock(); - - t = rcu_dereference(*(void __rcu __force **)trigger_ptr); - if (!t) { - rcu_read_unlock(); + t = smp_load_acquire(trigger_ptr); + if (!t) return DEFAULT_POLLMASK | EPOLLERR | EPOLLPRI; - } - kref_get(&t->refcount); - - rcu_read_unlock();
poll_wait(file, &t->event_wait, wait);
if (cmpxchg(&t->event, 1, 0) == 1) ret |= EPOLLPRI;
- kref_put(&t->refcount, psi_trigger_destroy); - return ret; }
@@ -1316,14 +1298,24 @@ static ssize_t psi_write(struct file *fi
buf[buf_size - 1] = '\0';
- new = psi_trigger_create(&psi_system, buf, nbytes, res); - if (IS_ERR(new)) - return PTR_ERR(new); - seq = file->private_data; + /* Take seq->lock to protect seq->private from concurrent writes */ mutex_lock(&seq->lock); - psi_trigger_replace(&seq->private, new); + + /* Allow only one trigger per file descriptor */ + if (seq->private) { + mutex_unlock(&seq->lock); + return -EBUSY; + } + + new = psi_trigger_create(&psi_system, buf, nbytes, res); + if (IS_ERR(new)) { + mutex_unlock(&seq->lock); + return PTR_ERR(new); + } + + smp_store_release(&seq->private, new); mutex_unlock(&seq->lock);
return nbytes; @@ -1358,7 +1350,7 @@ static int psi_fop_release(struct inode { struct seq_file *seq = file->private_data;
- psi_trigger_replace(&seq->private, NULL); + psi_trigger_destroy(seq->private); return single_release(inode, file); }
From: Christophe Leroy christophe.leroy@csgroup.eu
commit 252745240ba0ae774d2f80c5e185ed59fbc4fb41 upstream.
Commit 770cec16cdc9 ("powerpc/audit: Simplify syscall_get_arch()") and commit 898a1ef06ad4 ("powerpc/audit: Avoid unneccessary #ifdef in syscall_get_arguments()") replaced test_tsk_thread_flag(task, TIF_32BIT)) by is_32bit_task().
But is_32bit_task() applies on current task while be want the test done on task 'task'
So add a new macro is_tsk_32bit_task() to check any task.
Fixes: 770cec16cdc9 ("powerpc/audit: Simplify syscall_get_arch()") Fixes: 898a1ef06ad4 ("powerpc/audit: Avoid unneccessary #ifdef in syscall_get_arguments()") Cc: stable@vger.kernel.org Reported-by: Dmitry V. Levin ldv@altlinux.org Signed-off-by: Christophe Leroy christophe.leroy@csgroup.eu Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/c55cddb8f65713bf5859ed675d75a50cb37d5995.164215957... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/include/asm/syscall.h | 4 ++-- arch/powerpc/include/asm/thread_info.h | 2 ++ 2 files changed, 4 insertions(+), 2 deletions(-)
--- a/arch/powerpc/include/asm/syscall.h +++ b/arch/powerpc/include/asm/syscall.h @@ -90,7 +90,7 @@ static inline void syscall_get_arguments unsigned long val, mask = -1UL; unsigned int n = 6;
- if (is_32bit_task()) + if (is_tsk_32bit_task(task)) mask = 0xffffffff;
while (n--) { @@ -105,7 +105,7 @@ static inline void syscall_get_arguments
static inline int syscall_get_arch(struct task_struct *task) { - if (is_32bit_task()) + if (is_tsk_32bit_task(task)) return AUDIT_ARCH_PPC; else if (IS_ENABLED(CONFIG_CPU_LITTLE_ENDIAN)) return AUDIT_ARCH_PPC64LE; --- a/arch/powerpc/include/asm/thread_info.h +++ b/arch/powerpc/include/asm/thread_info.h @@ -168,8 +168,10 @@ static inline bool test_thread_local_fla
#ifdef CONFIG_COMPAT #define is_32bit_task() (test_thread_flag(TIF_32BIT)) +#define is_tsk_32bit_task(tsk) (test_tsk_thread_flag(tsk, TIF_32BIT)) #else #define is_32bit_task() (IS_ENABLED(CONFIG_PPC32)) +#define is_tsk_32bit_task(tsk) (IS_ENABLED(CONFIG_PPC32)) #endif
#if defined(CONFIG_PPC64)
From: Zhengjun Xing zhengjun.xing@linux.intel.com
commit 96fd2e89fba1aaada6f4b1e5d25a9d9ecbe1943d upstream.
The user recently report a perf issue in the ICX platform, when test by perf event “uncore_imc_x/cas_count_write”,the write bandwidth is always very small (only 0.38MB/s), it is caused by the wrong "umask" for the "cas_count_write" event. When double-checking, find "cas_count_read" also is wrong.
The public document for ICX uncore:
3rd Gen Intel® Xeon® Processor Scalable Family, Codename Ice Lake,Uncore Performance Monitoring Reference Manual, Revision 1.00, May 2021
On 2.4.7, it defines Unit Masks for CAS_COUNT: RD b00001111 WR b00110000
So corrected both "cas_count_read" and "cas_count_write" for ICX.
Old settings: hswep_uncore_imc_events INTEL_UNCORE_EVENT_DESC(cas_count_read, "event=0x04,umask=0x03") INTEL_UNCORE_EVENT_DESC(cas_count_write, "event=0x04,umask=0x0c")
New settings: snr_uncore_imc_events INTEL_UNCORE_EVENT_DESC(cas_count_read, "event=0x04,umask=0x0f") INTEL_UNCORE_EVENT_DESC(cas_count_write, "event=0x04,umask=0x30")
Fixes: 2b3b76b5ec67 ("perf/x86/intel/uncore: Add Ice Lake server uncore support") Signed-off-by: Zhengjun Xing zhengjun.xing@linux.intel.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Adrian Hunter adrian.hunter@intel.com Reviewed-by: Kan Liang kan.liang@linux.intel.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20211223144826.841267-1-zhengjun.xing@linux.intel.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/events/intel/uncore_snbep.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/events/intel/uncore_snbep.c +++ b/arch/x86/events/intel/uncore_snbep.c @@ -5482,7 +5482,7 @@ static struct intel_uncore_type icx_unco .fixed_ctr_bits = 48, .fixed_ctr = SNR_IMC_MMIO_PMON_FIXED_CTR, .fixed_ctl = SNR_IMC_MMIO_PMON_FIXED_CTL, - .event_descs = hswep_uncore_imc_events, + .event_descs = snr_uncore_imc_events, .perf_ctr = SNR_IMC_MMIO_PMON_CTR0, .event_ctl = SNR_IMC_MMIO_PMON_CTL0, .event_mask = SNBEP_PMON_RAW_EVENT_MASK,
From: Kan Liang kan.liang@linux.intel.com
commit 7fa981cad216e9f64f49e22112f610c0bfed91bc upstream.
For some Alder Lake machine with all E-cores disabled in a BIOS, the below warning may be triggered.
[ 2.010766] hw perf events fixed 5 > max(4), clipping!
Current perf code relies on the CPUID leaf 0xA and leaf 7.EDX[15] to calculate the number of the counters and follow the below assumption.
For a hybrid configuration, the leaf 7.EDX[15] (X86_FEATURE_HYBRID_CPU) is set. The leaf 0xA only enumerate the common counters. Linux perf has to manually add the extra GP counters and fixed counters for P-cores. For a non-hybrid configuration, the X86_FEATURE_HYBRID_CPU should not be set. The leaf 0xA enumerates all counters.
However, that's not the case when all E-cores are disabled in a BIOS. Although there are only P-cores in the system, the leaf 7.EDX[15] (X86_FEATURE_HYBRID_CPU) is still set. But the leaf 0xA is updated to enumerate all counters of P-cores. The inconsistency triggers the warning.
Several software ways were considered to handle the inconsistency. - Drop the leaf 0xA and leaf 7.EDX[15] CPUID enumeration support. Hardcode the number of counters. This solution may be a problem for virtualization. A hypervisor cannot control the number of counters in a Linux guest via changing the guest CPUID enumeration anymore. - Find another CPUID bit that is also updated with E-cores disabled. There may be a problem in the virtualization environment too. Because a hypervisor may disable the feature/CPUID bit. - The P-cores have a maximum of 8 GP counters and 4 fixed counters on ADL. The maximum number can be used to detect the case. This solution is implemented in this patch.
Fixes: ee72a94ea4a6 ("perf/x86/intel: Fix fixed counter check warning for some Alder Lake") Reported-by: Damjan Marion (damarion) damarion@cisco.com Reported-by: Chan Edison edison_chan_gz@hotmail.com Signed-off-by: Kan Liang kan.liang@linux.intel.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Tested-by: Damjan Marion (damarion) damarion@cisco.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1641925238-149288-1-git-send-email-kan.liang@linux... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/events/intel/core.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
--- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -6242,6 +6242,19 @@ __init int intel_pmu_init(void) pmu->num_counters = x86_pmu.num_counters; pmu->num_counters_fixed = x86_pmu.num_counters_fixed; } + + /* + * Quirk: For some Alder Lake machine, when all E-cores are disabled in + * a BIOS, the leaf 0xA will enumerate all counters of P-cores. However, + * the X86_FEATURE_HYBRID_CPU is still set. The above codes will + * mistakenly add extra counters for P-cores. Correct the number of + * counters here. + */ + if ((pmu->num_counters > 8) || (pmu->num_counters_fixed > 4)) { + pmu->num_counters = x86_pmu.num_counters; + pmu->num_counters_fixed = x86_pmu.num_counters_fixed; + } + pmu->max_pebs_events = min_t(unsigned, MAX_PEBS_EVENTS, pmu->num_counters); pmu->unconstrained = (struct event_constraint) __EVENT_CONSTRAINT(0, (1ULL << pmu->num_counters) - 1,
From: Lucas Stach l.stach@pengutronix.de
commit e3d26528e083e612314d4dcd713f3d5a26143ddc upstream.
While all userspace tried to limit commandstreams to 64K in size, a bug in the Mesa driver lead to command streams of up to 128K being submitted. Allow those to avoid breaking existing userspace.
Fixes: 6dfa2fab8ddd ("drm/etnaviv: limit submit sizes") Cc: stable@vger.kernel.org Signed-off-by: Lucas Stach l.stach@pengutronix.de Reviewed-by: Christian Gmeiner christian.gmeiner@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c @@ -469,8 +469,8 @@ int etnaviv_ioctl_gem_submit(struct drm_ return -EINVAL; }
- if (args->stream_size > SZ_64K || args->nr_relocs > SZ_64K || - args->nr_bos > SZ_64K || args->nr_pmrs > 128) { + if (args->stream_size > SZ_128K || args->nr_relocs > SZ_128K || + args->nr_bos > SZ_128K || args->nr_pmrs > 128) { DRM_ERROR("submit arguments out of size limits\n"); return -EINVAL; }
From: Manasi Navare manasi.d.navare@intel.com
commit 5ec1cebd59300ddd26dbaa96c17c508764eef911 upstream.
In case of a modeset where a mode gets split across multiple CRTCs in the driver specific implementation (bigjoiner in i915) we wrongly count the affected CRTCs based on the drm_crtc_mask and indicate the stolen CRTC as an affected CRTC in atomic_check_only(). This triggers a warning since affected CRTCs doent match requested CRTC.
To fix this in such bigjoiner configurations, we should only increment affected crtcs if that CRTC is enabled in UAPI not if it is just used internally in the driver to split the mode.
v3: Add the same uapi crtc_state->enable check in requested crtc calc (Ville)
Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Simon Ser contact@emersion.fr Cc: Pekka Paalanen pekka.paalanen@collabora.co.uk Cc: Daniel Stone daniels@collabora.com Cc: Daniel Vetter daniel.vetter@intel.com Cc: dri-devel@lists.freedesktop.org Cc: stable@vger.kernel.org # v5.11+ Fixes: 919c2299a893 ("drm/i915: Enable bigjoiner") Signed-off-by: Manasi Navare manasi.d.navare@intel.com Reviewed-by: Ville Syrjälä ville.syrjala@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20211004115913.23889-1-manasi.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/drm_atomic.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
--- a/drivers/gpu/drm/drm_atomic.c +++ b/drivers/gpu/drm/drm_atomic.c @@ -1310,8 +1310,10 @@ int drm_atomic_check_only(struct drm_ato
DRM_DEBUG_ATOMIC("checking %p\n", state);
- for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) - requested_crtc |= drm_crtc_mask(crtc); + for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) { + if (new_crtc_state->enable) + requested_crtc |= drm_crtc_mask(crtc); + }
for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) { ret = drm_atomic_plane_check(old_plane_state, new_plane_state); @@ -1360,8 +1362,10 @@ int drm_atomic_check_only(struct drm_ato } }
- for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) - affected_crtc |= drm_crtc_mask(crtc); + for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) { + if (new_crtc_state->enable) + affected_crtc |= drm_crtc_mask(crtc); + }
/* * For commits that allow modesets drivers can add other CRTCs to the
From: Alex Deucher alexander.deucher@amd.com
commit 9e5a14bce2402e84251a10269df0235cd7ce9234 upstream.
Older radeon boards (r2xx-r5xx) had secondary PCI functions which we solely there for supporting multi-head on OSs with special requirements. Add them to the unsupported list as well so we don't attempt to bind to them. The driver would fail to bind to them anyway, but this does so in a cleaner way that should not confuse the user.
Cc: stable@vger.kernel.org Acked-by: Christian König christian.koenig@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 81 ++++++++++++++++++++++++++++++++ 1 file changed, 81 insertions(+)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -1523,6 +1523,87 @@ static const u16 amdgpu_unsupported_pcii 0x99A0, 0x99A2, 0x99A4, + /* radeon secondary ids */ + 0x3171, + 0x3e70, + 0x4164, + 0x4165, + 0x4166, + 0x4168, + 0x4170, + 0x4171, + 0x4172, + 0x4173, + 0x496e, + 0x4a69, + 0x4a6a, + 0x4a6b, + 0x4a70, + 0x4a74, + 0x4b69, + 0x4b6b, + 0x4b6c, + 0x4c6e, + 0x4e64, + 0x4e65, + 0x4e66, + 0x4e67, + 0x4e68, + 0x4e69, + 0x4e6a, + 0x4e71, + 0x4f73, + 0x5569, + 0x556b, + 0x556d, + 0x556f, + 0x5571, + 0x5854, + 0x5874, + 0x5940, + 0x5941, + 0x5b72, + 0x5b73, + 0x5b74, + 0x5b75, + 0x5d44, + 0x5d45, + 0x5d6d, + 0x5d6f, + 0x5d72, + 0x5d77, + 0x5e6b, + 0x5e6d, + 0x7120, + 0x7124, + 0x7129, + 0x712e, + 0x712f, + 0x7162, + 0x7163, + 0x7166, + 0x7167, + 0x7172, + 0x7173, + 0x71a0, + 0x71a1, + 0x71a3, + 0x71a7, + 0x71bb, + 0x71e0, + 0x71e1, + 0x71e2, + 0x71e6, + 0x71e7, + 0x71f2, + 0x7269, + 0x726b, + 0x726e, + 0x72a0, + 0x72a8, + 0x72b1, + 0x72b3, + 0x793f, };
static const struct pci_device_id pciidlist[] = {
From: Bas Nieuwenhuizen bas@basnieuwenhuizen.nl
commit 2a807341ed1074ab83638f2fab08dffaa373f6b8 upstream.
Unused. Convert the divisions into asserts on the divisor, to debug why it is zero. The divide by zero is suspected of causing kernel panics.
While I have no idea where the zero is coming from I think this patch is a positive either way.
Cc: stable@vger.kernel.org Reviewed-by: Harry Wentland harry.wentland@amd.com Signed-off-by: Bas Nieuwenhuizen bas@basnieuwenhuizen.nl Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c | 1 - drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c | 2 -- drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.c | 2 -- drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c | 2 -- drivers/gpu/drm/amd/display/dc/dml/dcn30/display_rq_dlg_calc_30.c | 2 -- drivers/gpu/drm/amd/display/dc/dml/display_mode_structs.h | 1 - drivers/gpu/drm/amd/display/dc/dml/display_rq_dlg_helpers.c | 3 --- drivers/gpu/drm/amd/display/dc/dml/dml1_display_rq_dlg_calc.c | 4 ---- 8 files changed, 17 deletions(-)
--- a/drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c +++ b/drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c @@ -503,7 +503,6 @@ static void dcn_bw_calc_rq_dlg_ttu( //input[in_idx].dout.output_standard;
/*todo: soc->sr_enter_plus_exit_time??*/ - dlg_sys_param->t_srx_delay_us = dc->dcn_ip->dcfclk_cstate_latency / v->dcf_clk_deep_sleep;
dml1_rq_dlg_get_rq_params(dml, rq_param, &input->pipe.src); dml1_extract_rq_regs(dml, rq_regs, rq_param); --- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c @@ -1576,8 +1576,6 @@ void dml20_rq_dlg_get_dlg_reg(struct dis dlg_sys_param.total_flip_bytes = get_total_immediate_flip_bytes(mode_lib, e2e_pipe_param, num_pipes); - dlg_sys_param.t_srx_delay_us = mode_lib->ip.dcfclk_cstate_latency - / dlg_sys_param.deepsleep_dcfclk_mhz; // TODO: Deprecated
print__dlg_sys_params_st(mode_lib, &dlg_sys_param);
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.c @@ -1577,8 +1577,6 @@ void dml20v2_rq_dlg_get_dlg_reg(struct d dlg_sys_param.total_flip_bytes = get_total_immediate_flip_bytes(mode_lib, e2e_pipe_param, num_pipes); - dlg_sys_param.t_srx_delay_us = mode_lib->ip.dcfclk_cstate_latency - / dlg_sys_param.deepsleep_dcfclk_mhz; // TODO: Deprecated
print__dlg_sys_params_st(mode_lib, &dlg_sys_param);
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c @@ -1688,8 +1688,6 @@ void dml21_rq_dlg_get_dlg_reg( mode_lib, e2e_pipe_param, num_pipes); - dlg_sys_param.t_srx_delay_us = mode_lib->ip.dcfclk_cstate_latency - / dlg_sys_param.deepsleep_dcfclk_mhz; // TODO: Deprecated
print__dlg_sys_params_st(mode_lib, &dlg_sys_param);
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_rq_dlg_calc_30.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_rq_dlg_calc_30.c @@ -1858,8 +1858,6 @@ void dml30_rq_dlg_get_dlg_reg(struct dis dlg_sys_param.total_flip_bytes = get_total_immediate_flip_bytes(mode_lib, e2e_pipe_param, num_pipes); - dlg_sys_param.t_srx_delay_us = mode_lib->ip.dcfclk_cstate_latency - / dlg_sys_param.deepsleep_dcfclk_mhz; // TODO: Deprecated
print__dlg_sys_params_st(mode_lib, &dlg_sys_param);
--- a/drivers/gpu/drm/amd/display/dc/dml/display_mode_structs.h +++ b/drivers/gpu/drm/amd/display/dc/dml/display_mode_structs.h @@ -546,7 +546,6 @@ struct _vcs_dpi_display_dlg_sys_params_s double t_sr_wm_us; double t_extra_us; double mem_trip_us; - double t_srx_delay_us; double deepsleep_dcfclk_mhz; double total_flip_bw; unsigned int total_flip_bytes; --- a/drivers/gpu/drm/amd/display/dc/dml/display_rq_dlg_helpers.c +++ b/drivers/gpu/drm/amd/display/dc/dml/display_rq_dlg_helpers.c @@ -142,9 +142,6 @@ void print__dlg_sys_params_st(struct dis dml_print("DML_RQ_DLG_CALC: t_sr_wm_us = %3.2f\n", dlg_sys_param->t_sr_wm_us); dml_print("DML_RQ_DLG_CALC: t_extra_us = %3.2f\n", dlg_sys_param->t_extra_us); dml_print( - "DML_RQ_DLG_CALC: t_srx_delay_us = %3.2f\n", - dlg_sys_param->t_srx_delay_us); - dml_print( "DML_RQ_DLG_CALC: deepsleep_dcfclk_mhz = %3.2f\n", dlg_sys_param->deepsleep_dcfclk_mhz); dml_print( --- a/drivers/gpu/drm/amd/display/dc/dml/dml1_display_rq_dlg_calc.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dml1_display_rq_dlg_calc.c @@ -1331,10 +1331,6 @@ void dml1_rq_dlg_get_dlg_params( if (dual_plane) DTRACE("DLG: %s: swath_height_c = %d", __func__, swath_height_c);
- DTRACE( - "DLG: %s: t_srx_delay_us = %3.2f", - __func__, - (double) dlg_sys_param->t_srx_delay_us); DTRACE("DLG: %s: line_time_in_us = %3.2f", __func__, (double) line_time_in_us); DTRACE("DLG: %s: vupdate_offset = %d", __func__, vupdate_offset); DTRACE("DLG: %s: vupdate_width = %d", __func__, vupdate_width);
From: Bas Nieuwenhuizen bas@basnieuwenhuizen.nl
commit 72a8d87b87270bff0c0b2fed4d59c48d0dd840d7 upstream.
It calls populate_dml_pipes which uses doubles to initialize the scale_ratio_depth params. Mirrors the dcn20 logic.
Cc: stable@vger.kernel.org Signed-off-by: Bas Nieuwenhuizen bas@basnieuwenhuizen.nl Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c +++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c @@ -1880,7 +1880,6 @@ noinline bool dcn30_internal_validate_bw dc->res_pool->funcs->update_soc_for_wm_a(dc, context); pipe_cnt = dc->res_pool->funcs->populate_dml_pipes(dc, context, pipes, fast_validate);
- DC_FP_START(); if (!pipe_cnt) { out = true; goto validate_out; @@ -2106,7 +2105,6 @@ validate_fail: out = false;
validate_out: - DC_FP_END(); return out; }
@@ -2308,7 +2306,9 @@ bool dcn30_validate_bandwidth(struct dc
BW_VAL_TRACE_COUNT();
+ DC_FP_START(); out = dcn30_internal_validate_bw(dc, context, pipes, &pipe_cnt, &vlevel, fast_validate); + DC_FP_END();
if (pipe_cnt == 0) goto validate_out;
From: Bas Nieuwenhuizen bas@basnieuwenhuizen.nl
commit 25f1488bdbba63415239ff301fe61a8546140d9f upstream.
Mirrors the logic for dcn30. Cue lots of WARNs and some kernel panics without this fix.
Cc: stable@vger.kernel.org Signed-off-by: Bas Nieuwenhuizen bas@basnieuwenhuizen.nl Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c | 11 +++++++++++ drivers/gpu/drm/amd/display/dc/dml/dcn301/dcn301_fpu.c | 2 +- drivers/gpu/drm/amd/display/dc/dml/dcn301/dcn301_fpu.h | 2 +- 3 files changed, 13 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c +++ b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c @@ -1391,6 +1391,17 @@ static void set_wm_ranges( pp_smu->nv_funcs.set_wm_ranges(&pp_smu->nv_funcs.pp_smu, &ranges); }
+static void dcn301_calculate_wm_and_dlg( + struct dc *dc, struct dc_state *context, + display_e2e_pipe_params_st *pipes, + int pipe_cnt, + int vlevel) +{ + DC_FP_START(); + dcn301_calculate_wm_and_dlg_fp(dc, context, pipes, pipe_cnt, vlevel); + DC_FP_END(); +} + static struct resource_funcs dcn301_res_pool_funcs = { .destroy = dcn301_destroy_resource_pool, .link_enc_create = dcn301_link_encoder_create, --- a/drivers/gpu/drm/amd/display/dc/dml/dcn301/dcn301_fpu.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn301/dcn301_fpu.c @@ -327,7 +327,7 @@ void dcn301_fpu_init_soc_bounding_box(st dcn3_01_soc.sr_exit_time_us = bb_info.dram_sr_exit_latency_100ns * 10; }
-void dcn301_calculate_wm_and_dlg(struct dc *dc, +void dcn301_calculate_wm_and_dlg_fp(struct dc *dc, struct dc_state *context, display_e2e_pipe_params_st *pipes, int pipe_cnt, --- a/drivers/gpu/drm/amd/display/dc/dml/dcn301/dcn301_fpu.h +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn301/dcn301_fpu.h @@ -34,7 +34,7 @@ void dcn301_fpu_set_wm_ranges(int i,
void dcn301_fpu_init_soc_bounding_box(struct bp_soc_bb_info bb_info);
-void dcn301_calculate_wm_and_dlg(struct dc *dc, +void dcn301_calculate_wm_and_dlg_fp(struct dc *dc, struct dc_state *context, display_e2e_pipe_params_st *pipes, int pipe_cnt,
From: Wanpeng Li wanpengli@tencent.com
commit 35fe7cfbab2e81f1afb23fc4212210b1de6d9633 upstream.
The below warning is splatting during guest reboot.
------------[ cut here ]------------ WARNING: CPU: 0 PID: 1931 at arch/x86/kvm/x86.c:10322 kvm_arch_vcpu_ioctl_run+0x874/0x880 [kvm] CPU: 0 PID: 1931 Comm: qemu-system-x86 Tainted: G I 5.17.0-rc1+ #5 RIP: 0010:kvm_arch_vcpu_ioctl_run+0x874/0x880 [kvm] Call Trace: <TASK> kvm_vcpu_ioctl+0x279/0x710 [kvm] __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7fd39797350b
This can be triggered by not exposing tsc-deadline mode and doing a reboot in the guest. The lapic_shutdown() function which is called in sys_reboot path will not disarm the flying timer, it just masks LVTT. lapic_shutdown() clears APIC state w/ LVT_MASKED and timer-mode bit is 0, this can trigger timer-mode switch between tsc-deadline and oneshot/periodic, which can result in preemption timer be cancelled in apic_update_lvtt(). However, We can't depend on this when not exposing tsc-deadline mode and oneshot/periodic modes emulated by preemption timer. Qemu will synchronise states around reset, let's cancel preemption timer under KVM_SET_LAPIC.
Signed-off-by: Wanpeng Li wanpengli@tencent.com Message-Id: 1643102220-35667-1-git-send-email-wanpengli@tencent.com Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/lapic.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -2623,7 +2623,7 @@ int kvm_apic_set_state(struct kvm_vcpu * kvm_apic_set_version(vcpu);
apic_update_ppr(apic); - hrtimer_cancel(&apic->lapic_timer.timer); + cancel_apic_timer(apic); apic->lapic_timer.expired_tscdeadline = 0; apic_update_lvtt(apic); apic_manage_nmi_watchdog(apic, kvm_lapic_get_reg(apic, APIC_LVT0));
From: Sean Christopherson seanjc@google.com
commit 55467fcd55b89c622e62b4afe60ac0eb2fae91f2 upstream.
Always signal that emulation is possible for !SEV guests regardless of whether or not the CPU provided a valid instruction byte stream. KVM can read all guest state (memory and registers) for !SEV guests, i.e. can fetch the code stream from memory even if the CPU failed to do so because of the SMAP errata.
Fixes: 05d5a4863525 ("KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)") Cc: stable@vger.kernel.org Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Brijesh Singh brijesh.singh@amd.com Signed-off-by: Sean Christopherson seanjc@google.com Reviewed-by: Liam Merwick liam.merwick@oracle.com Message-Id: 20220120010719.711476-2-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/svm/svm.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
--- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -4464,8 +4464,13 @@ static bool svm_can_emulate_instruction( bool smep, smap, is_user; unsigned long cr4;
+ /* Emulation is always possible when KVM has access to all guest state. */ + if (!sev_guest(vcpu->kvm)) + return true; + /* - * When the guest is an SEV-ES guest, emulation is not possible. + * Emulation is impossible for SEV-ES guests as KVM doesn't have access + * to guest register state. */ if (sev_es_guest(vcpu->kvm)) return false; @@ -4518,9 +4523,6 @@ static bool svm_can_emulate_instruction( smap = cr4 & X86_CR4_SMAP; is_user = svm_get_cpl(vcpu) == 3; if (smap && (!smep || is_user)) { - if (!sev_guest(vcpu->kvm)) - return true; - pr_err_ratelimited("KVM: SEV Guest triggered AMD Erratum 1096\n"); kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu); }
From: Sean Christopherson seanjc@google.com
commit 0b0be065b7563ac708aaa9f69dd4941c80b3446d upstream.
Never intercept #GP for SEV guests as reading SEV guest private memory will return cyphertext, i.e. emulating on #GP can't work as intended.
Cc: stable@vger.kernel.org Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Brijesh Singh brijesh.singh@amd.com Signed-off-by: Sean Christopherson seanjc@google.com Reviewed-by: Liam Merwick liam.merwick@oracle.com Message-Id: 20220120010719.711476-4-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/svm/svm.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)
--- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -312,7 +312,11 @@ int svm_set_efer(struct kvm_vcpu *vcpu, return ret; }
- if (svm_gp_erratum_intercept) + /* + * Never intercept #GP for SEV guests, KVM can't + * decrypt guest memory to workaround the erratum. + */ + if (svm_gp_erratum_intercept && !sev_guest(vcpu->kvm)) set_exception_intercept(svm, GP_VECTOR); } } @@ -1238,9 +1242,10 @@ static void init_vmcb(struct kvm_vcpu *v * Guest access to VMware backdoor ports could legitimately * trigger #GP because of TSS I/O permission bitmap. * We intercept those #GP and allow access to them anyway - * as VMware does. + * as VMware does. Don't intercept #GP for SEV guests as KVM can't + * decrypt guest memory to decode the faulting instruction. */ - if (enable_vmware_backdoor) + if (enable_vmware_backdoor && !sev_guest(vcpu->kvm)) set_exception_intercept(svm, GP_VECTOR);
svm_set_intercept(svm, INTERCEPT_INTR);
From: Denis Valeev lemniscattaden@gmail.com
commit 47c28d436f409f5b009dc82bd82d4971088aa391 upstream.
The bug occurs on #GP triggered by VMware backdoor when eax value is unaligned. eax alignment check should not be applied to non-SVM instructions because it leads to incorrect omission of the instructions emulation. Apply the alignment check only to SVM instructions to fix.
Fixes: d1cba6c92237 ("KVM: x86: nSVM: test eax for 4K alignment for GP errata workaround") Signed-off-by: Denis Valeev lemniscattaden@gmail.com Message-Id: Yexlhaoe1Fscm59u@q Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/svm/svm.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
--- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -2306,10 +2306,6 @@ static int gp_interception(struct kvm_vc if (error_code) goto reinject;
- /* All SVM instructions expect page aligned RAX */ - if (svm->vmcb->save.rax & ~PAGE_MASK) - goto reinject; - /* Decode the instruction for usage later */ if (x86_decode_emulated_instruction(vcpu, 0, NULL, 0) != EMULATION_OK) goto reinject; @@ -2327,8 +2323,13 @@ static int gp_interception(struct kvm_vc if (!is_guest_mode(vcpu)) return kvm_emulate_instruction(vcpu, EMULTYPE_VMWARE_GP | EMULTYPE_NO_DECODE); - } else + } else { + /* All SVM instructions expect page aligned RAX */ + if (svm->vmcb->save.rax & ~PAGE_MASK) + goto reinject; + return emulate_svm_instr(vcpu, opcode); + }
reinject: kvm_queue_exception_e(vcpu, GP_VECTOR, error_code);
From: Vitaly Kuznetsov vkuznets@redhat.com
commit 5c89be1dd5cfb697614bc13626ba3bd0781aa160 upstream.
Full equality check of CPUID data on update (kvm_cpuid_check_equal()) may fail for SGX enabled CPUs as CPUID.(EAX=0x12,ECX=1) is currently being mangled in kvm_vcpu_after_set_cpuid(). Move it to __kvm_update_cpuid_runtime() and split off cpuid_get_supported_xcr0() helper as 'vcpu->arch.guest_supported_xcr0' update needs (logically) to stay in kvm_vcpu_after_set_cpuid().
Cc: stable@vger.kernel.org Fixes: feb627e8d6f6 ("KVM: x86: Forbid KVM_SET_CPUID{,2} after KVM_RUN") Signed-off-by: Vitaly Kuznetsov vkuznets@redhat.com Message-Id: 20220124103606.2630588-2-vkuznets@redhat.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/cpuid.c | 54 +++++++++++++++++++++++++++++++-------------------- 1 file changed, 33 insertions(+), 21 deletions(-)
--- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -176,10 +176,26 @@ void kvm_update_pv_runtime(struct kvm_vc vcpu->arch.pv_cpuid.features = best->eax; }
+/* + * Calculate guest's supported XCR0 taking into account guest CPUID data and + * supported_xcr0 (comprised of host configuration and KVM_SUPPORTED_XCR0). + */ +static u64 cpuid_get_supported_xcr0(struct kvm_cpuid_entry2 *entries, int nent) +{ + struct kvm_cpuid_entry2 *best; + + best = cpuid_entry2_find(entries, nent, 0xd, 0); + if (!best) + return 0; + + return (best->eax | ((u64)best->edx << 32)) & supported_xcr0; +} + static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *entries, int nent) { struct kvm_cpuid_entry2 *best; + u64 guest_supported_xcr0 = cpuid_get_supported_xcr0(entries, nent);
best = cpuid_entry2_find(entries, nent, 1, 0); if (best) { @@ -218,6 +234,21 @@ static void __kvm_update_cpuid_runtime(s vcpu->arch.ia32_misc_enable_msr & MSR_IA32_MISC_ENABLE_MWAIT); } + + /* + * Bits 127:0 of the allowed SECS.ATTRIBUTES (CPUID.0x12.0x1) enumerate + * the supported XSAVE Feature Request Mask (XFRM), i.e. the enclave's + * requested XCR0 value. The enclave's XFRM must be a subset of XCRO + * at the time of EENTER, thus adjust the allowed XFRM by the guest's + * supported XCR0. Similar to XCR0 handling, FP and SSE are forced to + * '1' even on CPUs that don't support XSAVE. + */ + best = cpuid_entry2_find(entries, nent, 0x12, 0x1); + if (best) { + best->ecx &= guest_supported_xcr0 & 0xffffffff; + best->edx &= guest_supported_xcr0 >> 32; + best->ecx |= XFEATURE_MASK_FPSSE; + } }
void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu) @@ -241,27 +272,8 @@ static void kvm_vcpu_after_set_cpuid(str kvm_apic_set_version(vcpu); }
- best = kvm_find_cpuid_entry(vcpu, 0xD, 0); - if (!best) - vcpu->arch.guest_supported_xcr0 = 0; - else - vcpu->arch.guest_supported_xcr0 = - (best->eax | ((u64)best->edx << 32)) & supported_xcr0; - - /* - * Bits 127:0 of the allowed SECS.ATTRIBUTES (CPUID.0x12.0x1) enumerate - * the supported XSAVE Feature Request Mask (XFRM), i.e. the enclave's - * requested XCR0 value. The enclave's XFRM must be a subset of XCRO - * at the time of EENTER, thus adjust the allowed XFRM by the guest's - * supported XCR0. Similar to XCR0 handling, FP and SSE are forced to - * '1' even on CPUs that don't support XSAVE. - */ - best = kvm_find_cpuid_entry(vcpu, 0x12, 0x1); - if (best) { - best->ecx &= vcpu->arch.guest_supported_xcr0 & 0xffffffff; - best->edx &= vcpu->arch.guest_supported_xcr0 >> 32; - best->ecx |= XFEATURE_MASK_FPSSE; - } + vcpu->arch.guest_supported_xcr0 = + cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
kvm_update_pv_runtime(vcpu);
From: Sean Christopherson seanjc@google.com
commit 811f95ff95270e6048197821434d9301e3d7f07c upstream.
Free the "struct kvm_cpuid_entry2" array on successful post-KVM_RUN KVM_SET_CPUID{,2} to fix a memory leak, the callers of kvm_set_cpuid() free the array only on failure.
BUG: memory leak unreferenced object 0xffff88810963a800 (size 2048): comm "syz-executor025", pid 3610, jiffies 4294944928 (age 8.080s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 0d 00 00 00 ................ 47 65 6e 75 6e 74 65 6c 69 6e 65 49 00 00 00 00 GenuntelineI.... backtrace: [<ffffffff814948ee>] kmalloc_node include/linux/slab.h:604 [inline] [<ffffffff814948ee>] kvmalloc_node+0x3e/0x100 mm/util.c:580 [<ffffffff814950f2>] kvmalloc include/linux/slab.h:732 [inline] [<ffffffff814950f2>] vmemdup_user+0x22/0x100 mm/util.c:199 [<ffffffff8109f5ff>] kvm_vcpu_ioctl_set_cpuid2+0x8f/0xf0 arch/x86/kvm/cpuid.c:423 [<ffffffff810711b9>] kvm_arch_vcpu_ioctl+0xb99/0x1e60 arch/x86/kvm/x86.c:5251 [<ffffffff8103e92d>] kvm_vcpu_ioctl+0x4ad/0x950 arch/x86/kvm/../../../virt/kvm/kvm_main.c:4066 [<ffffffff815afacc>] vfs_ioctl fs/ioctl.c:51 [inline] [<ffffffff815afacc>] __do_sys_ioctl fs/ioctl.c:874 [inline] [<ffffffff815afacc>] __se_sys_ioctl fs/ioctl.c:860 [inline] [<ffffffff815afacc>] __x64_sys_ioctl+0xfc/0x140 fs/ioctl.c:860 [<ffffffff844a3335>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff844a3335>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 [<ffffffff84600068>] entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes: c6617c61e8fe ("KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN") Cc: stable@vger.kernel.org Reported-by: syzbot+be576ad7655690586eec@syzkaller.appspotmail.com Signed-off-by: Sean Christopherson seanjc@google.com Message-Id: 20220125210445.2053429-1-seanjc@google.com Reviewed-by: Vitaly Kuznetsov vkuznets@redhat.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/cpuid.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
--- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -338,8 +338,14 @@ static int kvm_set_cpuid(struct kvm_vcpu * KVM_SET_CPUID{,2} again. To support this legacy behavior, check * whether the supplied CPUID data is equal to what's already set. */ - if (vcpu->arch.last_vmentry_cpu != -1) - return kvm_cpuid_check_equal(vcpu, e2, nent); + if (vcpu->arch.last_vmentry_cpu != -1) { + r = kvm_cpuid_check_equal(vcpu, e2, nent); + if (r) + return r; + + kvfree(e2); + return 0; + }
r = kvm_check_cpuid(e2, nent); if (r)
From: Sean Christopherson seanjc@google.com
commit f7e570780efc5cec9b2ed1e0472a7da14e864fdb upstream.
Forcibly leave nested virtualization operation if userspace toggles SMM state via KVM_SET_VCPU_EVENTS or KVM_SYNC_X86_EVENTS. If userspace forces the vCPU out of SMM while it's post-VMXON and then injects an SMI, vmx_enter_smm() will overwrite vmx->nested.smm.vmxon and end up with both vmxon=false and smm.vmxon=false, but all other nVMX state allocated.
Don't attempt to gracefully handle the transition as (a) most transitions are nonsencial, e.g. forcing SMM while L2 is running, (b) there isn't sufficient information to handle all transitions, e.g. SVM wants access to the SMRAM save state, and (c) KVM_SET_VCPU_EVENTS must precede KVM_SET_NESTED_STATE during state restore as the latter disallows putting the vCPU into L2 if SMM is active, and disallows tagging the vCPU as being post-VMXON in SMM if SMM is not active.
Abuse of KVM_SET_VCPU_EVENTS manifests as a WARN and memory leak in nVMX due to failure to free vmcs01's shadow VMCS, but the bug goes far beyond just a memory leak, e.g. toggling SMM on while L2 is active puts the vCPU in an architecturally impossible state.
WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline] WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656 Modules linked in: CPU: 1 PID: 3606 Comm: syz-executor725 Not tainted 5.17.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline] RIP: 0010:free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656 Code: <0f> 0b eb b3 e8 8f 4d 9f 00 e9 f7 fe ff ff 48 89 df e8 92 4d 9f 00 Call Trace: <TASK> kvm_arch_vcpu_destroy+0x72/0x2f0 arch/x86/kvm/x86.c:11123 kvm_vcpu_destroy arch/x86/kvm/../../../virt/kvm/kvm_main.c:441 [inline] kvm_destroy_vcpus+0x11f/0x290 arch/x86/kvm/../../../virt/kvm/kvm_main.c:460 kvm_free_vcpus arch/x86/kvm/x86.c:11564 [inline] kvm_arch_destroy_vm+0x2e8/0x470 arch/x86/kvm/x86.c:11676 kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1217 [inline] kvm_put_kvm+0x4fa/0xb00 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1250 kvm_vm_release+0x3f/0x50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1273 __fput+0x286/0x9f0 fs/file_table.c:311 task_work_run+0xdd/0x1a0 kernel/task_work.c:164 exit_task_work include/linux/task_work.h:32 [inline] do_exit+0xb29/0x2a30 kernel/exit.c:806 do_group_exit+0xd2/0x2f0 kernel/exit.c:935 get_signal+0x4b0/0x28c0 kernel/signal.c:2862 arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:868 handle_signal_work kernel/entry/common.c:148 [inline] exit_to_user_mode_loop kernel/entry/common.c:172 [inline] exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:207 __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline] syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300 do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK>
Cc: stable@vger.kernel.org Reported-by: syzbot+8112db3ab20e70d50c31@syzkaller.appspotmail.com Signed-off-by: Sean Christopherson seanjc@google.com Message-Id: 20220125220358.2091737-1-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/svm/nested.c | 9 +++++---- arch/x86/kvm/svm/svm.c | 2 +- arch/x86/kvm/svm/svm.h | 2 +- arch/x86/kvm/vmx/nested.c | 1 + arch/x86/kvm/x86.c | 4 +++- 6 files changed, 12 insertions(+), 7 deletions(-)
--- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1497,6 +1497,7 @@ struct kvm_x86_ops { };
struct kvm_x86_nested_ops { + void (*leave_nested)(struct kvm_vcpu *vcpu); int (*check_events)(struct kvm_vcpu *vcpu); bool (*hv_timer_pending)(struct kvm_vcpu *vcpu); void (*triple_fault)(struct kvm_vcpu *vcpu); --- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -964,9 +964,9 @@ void svm_free_nested(struct vcpu_svm *sv /* * Forcibly leave nested mode in order to be able to reset the VCPU later on. */ -void svm_leave_nested(struct vcpu_svm *svm) +void svm_leave_nested(struct kvm_vcpu *vcpu) { - struct kvm_vcpu *vcpu = &svm->vcpu; + struct vcpu_svm *svm = to_svm(vcpu);
if (is_guest_mode(vcpu)) { svm->nested.nested_run_pending = 0; @@ -1345,7 +1345,7 @@ static int svm_set_nested_state(struct k return -EINVAL;
if (!(kvm_state->flags & KVM_STATE_NESTED_GUEST_MODE)) { - svm_leave_nested(svm); + svm_leave_nested(vcpu); svm_set_gif(svm, !!(kvm_state->flags & KVM_STATE_NESTED_GIF_SET)); return 0; } @@ -1410,7 +1410,7 @@ static int svm_set_nested_state(struct k */
if (is_guest_mode(vcpu)) - svm_leave_nested(svm); + svm_leave_nested(vcpu); else svm->nested.vmcb02.ptr->save = svm->vmcb01.ptr->save;
@@ -1464,6 +1464,7 @@ static bool svm_get_nested_state_pages(s }
struct kvm_x86_nested_ops svm_nested_ops = { + .leave_nested = svm_leave_nested, .check_events = svm_check_nested_events, .triple_fault = nested_svm_triple_fault, .get_nested_state_pages = svm_get_nested_state_pages, --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -290,7 +290,7 @@ int svm_set_efer(struct kvm_vcpu *vcpu,
if ((old_efer & EFER_SVME) != (efer & EFER_SVME)) { if (!(efer & EFER_SVME)) { - svm_leave_nested(svm); + svm_leave_nested(vcpu); svm_set_gif(svm, true); /* #GP intercept is still needed for vmware backdoor */ if (!enable_vmware_backdoor) --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -470,7 +470,7 @@ static inline bool nested_exit_on_nmi(st
int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb_gpa, struct vmcb *vmcb12, bool from_vmrun); -void svm_leave_nested(struct vcpu_svm *svm); +void svm_leave_nested(struct kvm_vcpu *vcpu); void svm_free_nested(struct vcpu_svm *svm); int svm_allocate_nested(struct vcpu_svm *svm); int nested_svm_vmrun(struct kvm_vcpu *vcpu); --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -6744,6 +6744,7 @@ __init int nested_vmx_hardware_setup(int }
struct kvm_x86_nested_ops vmx_nested_ops = { + .leave_nested = vmx_leave_nested, .check_events = vmx_check_nested_events, .hv_timer_pending = nested_vmx_preemption_timer_pending, .triple_fault = nested_vmx_triple_fault, --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4784,8 +4784,10 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_e vcpu->arch.apic->sipi_vector = events->sipi_vector;
if (events->flags & KVM_VCPUEVENT_VALID_SMM) { - if (!!(vcpu->arch.hflags & HF_SMM_MASK) != events->smi.smm) + if (!!(vcpu->arch.hflags & HF_SMM_MASK) != events->smi.smm) { + kvm_x86_ops.nested_ops->leave_nested(vcpu); kvm_smm_changed(vcpu, events->smi.smm); + }
vcpu->arch.smi_pending = events->smi.pending;
From: Vitaly Kuznetsov vkuznets@redhat.com
commit 033a3ea59a19df63edb4db6bfdbb357cd028258a upstream.
kvm_cpuid_check_equal() checks for the (full) equality of the supplied CPUID data so .flags need to be checked too.
Reported-by: Sean Christopherson seanjc@google.com Fixes: c6617c61e8fe ("KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN") Signed-off-by: Vitaly Kuznetsov vkuznets@redhat.com Message-Id: 20220126131804.2839410-1-vkuznets@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/cpuid.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -113,6 +113,7 @@ static int kvm_cpuid_check_equal(struct orig = &vcpu->arch.cpuid_entries[i]; if (e2[i].function != orig->function || e2[i].index != orig->index || + e2[i].flags != orig->flags || e2[i].eax != orig->eax || e2[i].ebx != orig->ebx || e2[i].ecx != orig->ecx || e2[i].edx != orig->edx) return -EINVAL;
From: Xiaoyao Li xiaoyao.li@intel.com
commit be4f3b3f82271c3193ce200a996dc70682c8e622 upstream.
It has been corrected from SDM version 075 that MSR_IA32_XSS is reset to zero on Power up and Reset but keeps unchanged on INIT.
Fixes: a554d207dc46 ("KVM: X86: Processor States following Reset or INIT") Cc: stable@vger.kernel.org Signed-off-by: Xiaoyao Li xiaoyao.li@intel.com Signed-off-by: Sean Christopherson seanjc@google.com Message-Id: 20220126172226.2298529-2-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/x86.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -11065,6 +11065,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcp vcpu->arch.msr_misc_features_enables = 0;
vcpu->arch.xcr0 = XFEATURE_MASK_FP; + vcpu->arch.ia32_xss = 0; }
/* All GPRs except RDX (handled below) are zeroed on RESET/INIT. */ @@ -11081,8 +11082,6 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcp cpuid_0x1 = kvm_find_cpuid_entry(vcpu, 1, 0); kvm_rdx_write(vcpu, cpuid_0x1 ? cpuid_0x1->eax : 0x600);
- vcpu->arch.ia32_xss = 0; - static_call(kvm_x86_vcpu_reset)(vcpu, init_event);
kvm_set_rflags(vcpu, X86_EFLAGS_FIXED);
From: Like Xu likexu@tencent.com
commit 4c282e51e4450b94680d6ca3b10f830483b1f243 upstream.
Do a runtime CPUID update for a vCPU if MSR_IA32_XSS is written, as the size in bytes of the XSAVE area is affected by the states enabled in XSS.
Fixes: 203000993de5 ("kvm: vmx: add MSR logic for XSAVES") Cc: stable@vger.kernel.org Signed-off-by: Like Xu likexu@tencent.com [sean: split out as a separate patch, adjust Fixes tag] Signed-off-by: Sean Christopherson seanjc@google.com Message-Id: 20220126172226.2298529-3-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/x86.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3508,6 +3508,7 @@ int kvm_set_msr_common(struct kvm_vcpu * if (data & ~supported_xss) return 1; vcpu->arch.ia32_xss = data; + kvm_update_cpuid_runtime(vcpu); break; case MSR_SMI_COUNT: if (!msr_info->host_initiated)
From: Like Xu likexu@tencent.com
commit 05a9e065059e566f218f8778c4d17ee75db56c55 upstream.
XCR0 is reset to 1 by RESET but not INIT and IA32_XSS is zeroed by both RESET and INIT. The kvm_set_msr_common()'s handling of MSR_IA32_XSS also needs to update kvm_update_cpuid_runtime(). In the above cases, the size in bytes of the XSAVE area containing all states enabled by XCR0 or (XCRO | IA32_XSS) needs to be updated.
For simplicity and consistency, existing helpers are used to write values and call kvm_update_cpuid_runtime(), and it's not exactly a fast path.
Fixes: a554d207dc46 ("KVM: X86: Processor States following Reset or INIT") Cc: stable@vger.kernel.org Signed-off-by: Like Xu likexu@tencent.com Signed-off-by: Sean Christopherson seanjc@google.com Message-Id: 20220126172226.2298529-4-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/x86.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -11065,8 +11065,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcp
vcpu->arch.msr_misc_features_enables = 0;
- vcpu->arch.xcr0 = XFEATURE_MASK_FP; - vcpu->arch.ia32_xss = 0; + __kvm_set_xcr(vcpu, 0, XFEATURE_MASK_FP); + __kvm_set_msr(vcpu, MSR_IA32_XSS, 0, true); }
/* All GPRs except RDX (handled below) are zeroed on RESET/INIT. */
From: Nicholas Piggin npiggin@gmail.com
commit 22f7ff0dea9491e90b6fe808ed40c30bd791e5c2 upstream.
The L0 is storing HFSCR requested by the L1 for the L2 in struct kvm_nested_guest when the L1 requests a vCPU enter L2. kvm_nested_guest is not a per-vCPU structure. Hilarity ensues.
Fix it by moving the nested hfscr into the vCPU structure together with the other per-vCPU nested fields.
Fixes: 8b210a880b35 ("KVM: PPC: Book3S HV Nested: Make nested HFSCR state accessible") Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Nicholas Piggin npiggin@gmail.com Reviewed-by: Fabiano Rosas farosas@linux.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20220122105530.3477250-1-npiggin@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/include/asm/kvm_book3s_64.h | 1 - arch/powerpc/include/asm/kvm_host.h | 1 + arch/powerpc/kvm/book3s_hv.c | 3 +-- arch/powerpc/kvm/book3s_hv_nested.c | 2 +- 4 files changed, 3 insertions(+), 4 deletions(-)
--- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -39,7 +39,6 @@ struct kvm_nested_guest { pgd_t *shadow_pgtable; /* our page table for this guest */ u64 l1_gr_to_hr; /* L1's addr of part'n-scoped table */ u64 process_table; /* process table entry for this guest */ - u64 hfscr; /* HFSCR that the L1 requested for this nested guest */ long refcnt; /* number of pointers to this struct */ struct mutex tlb_lock; /* serialize page faults and tlbies */ struct kvm_nested_guest *next; --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -814,6 +814,7 @@ struct kvm_vcpu_arch {
/* For support of nested guests */ struct kvm_nested_guest *nested; + u64 nested_hfscr; /* HFSCR that the L1 requested for the nested guest */ u32 nested_vcpu_id; gpa_t nested_io_gpr; #endif --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -1731,7 +1731,6 @@ static int kvmppc_handle_exit_hv(struct
static int kvmppc_handle_nested_exit(struct kvm_vcpu *vcpu) { - struct kvm_nested_guest *nested = vcpu->arch.nested; int r; int srcu_idx;
@@ -1831,7 +1830,7 @@ static int kvmppc_handle_nested_exit(str * it into a HEAI. */ if (!(vcpu->arch.hfscr_permitted & (1UL << cause)) || - (nested->hfscr & (1UL << cause))) { + (vcpu->arch.nested_hfscr & (1UL << cause))) { vcpu->arch.trap = BOOK3S_INTERRUPT_H_EMUL_ASSIST;
/* --- a/arch/powerpc/kvm/book3s_hv_nested.c +++ b/arch/powerpc/kvm/book3s_hv_nested.c @@ -362,7 +362,7 @@ long kvmhv_enter_nested_guest(struct kvm /* set L1 state to L2 state */ vcpu->arch.nested = l2; vcpu->arch.nested_vcpu_id = l2_hv.vcpu_token; - l2->hfscr = l2_hv.hfscr; + vcpu->arch.nested_hfscr = l2_hv.hfscr; vcpu->arch.regs = l2_regs;
/* Guest must always run with ME enabled, HV disabled. */
From: Vivek Goyal vgoyal@redhat.com
commit 7f5056b9e7b71149bf11073f00a57fa1ac2921a9 upstream.
A ceph user has reported that ceph is crashing with kernel NULL pointer dereference. Following is the backtrace.
/proc/version: Linux version 5.16.2-arch1-1 (linux@archlinux) (gcc (GCC) 11.1.0, GNU ld (GNU Binutils) 2.36.1) #1 SMP PREEMPT Thu, 20 Jan 2022 16:18:29 +0000 distro / arch: Arch Linux / x86_64 SELinux is not enabled ceph cluster version: 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503)
relevant dmesg output: [ 30.947129] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 30.947206] #PF: supervisor read access in kernel mode [ 30.947258] #PF: error_code(0x0000) - not-present page [ 30.947310] PGD 0 P4D 0 [ 30.947342] Oops: 0000 [#1] PREEMPT SMP PTI [ 30.947388] CPU: 5 PID: 778 Comm: touch Not tainted 5.16.2-arch1-1 #1 86fbf2c313cc37a553d65deb81d98e9dcc2a3659 [ 30.947486] Hardware name: Gigabyte Technology Co., Ltd. B365M DS3H/B365M DS3H, BIOS F5 08/13/2019 [ 30.947569] RIP: 0010:strlen+0x0/0x20 [ 30.947616] Code: b6 07 38 d0 74 16 48 83 c7 01 84 c0 74 05 48 39 f7 75 ec 31 c0 31 d2 89 d6 89 d7 c3 48 89 f8 31 d2 89 d6 89 d7 c3 0 f 1f 40 00 <80> 3f 00 74 12 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 31 ff [ 30.947782] RSP: 0018:ffffa4ed80ffbbb8 EFLAGS: 00010246 [ 30.947836] RAX: 0000000000000000 RBX: ffffa4ed80ffbc60 RCX: 0000000000000000 [ 30.947904] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 30.947971] RBP: ffff94b0d15c0ae0 R08: 0000000000000000 R09: 0000000000000000 [ 30.948040] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ 30.948106] R13: 0000000000000001 R14: ffffa4ed80ffbc60 R15: 0000000000000000 [ 30.948174] FS: 00007fc7520f0740(0000) GS:ffff94b7ced40000(0000) knlGS:0000000000000000 [ 30.948252] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 30.948308] CR2: 0000000000000000 CR3: 0000000104a40001 CR4: 00000000003706e0 [ 30.948376] Call Trace: [ 30.948404] <TASK> [ 30.948431] ceph_security_init_secctx+0x7b/0x240 [ceph 49f9c4b9bf5be8760f19f1747e26da33920bce4b] [ 30.948582] ceph_atomic_open+0x51e/0x8a0 [ceph 49f9c4b9bf5be8760f19f1747e26da33920bce4b] [ 30.948708] ? get_cached_acl+0x4d/0xa0 [ 30.948759] path_openat+0x60d/0x1030 [ 30.948809] do_filp_open+0xa5/0x150 [ 30.948859] do_sys_openat2+0xc4/0x190 [ 30.948904] __x64_sys_openat+0x53/0xa0 [ 30.948948] do_syscall_64+0x5c/0x90 [ 30.948989] ? exc_page_fault+0x72/0x180 [ 30.949034] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 30.949091] RIP: 0033:0x7fc7521e25bb [ 30.950849] Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 0 0 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 48 8b 54 24 28 64 48 2b 14 25
Core of the problem is that ceph checks for return code from security_dentry_init_security() and if return code is 0, it assumes everything is fine and continues to call strlen(name), which crashes.
Typically SELinux LSM returns 0 and sets name to "security.selinux" and it is not a problem. Or if selinux is not compiled in or disabled, it returns -EOPNOTSUP and ceph deals with it.
But somehow in this configuration, 0 is being returned and "name" is not being initialized and that's creating the problem.
Our suspicion is that BPF LSM is registering a hook for dentry_init_security() and returns hook default of 0.
LSM_HOOK(int, 0, dentry_init_security, struct dentry *dentry,...)
I have not been able to reproduce it just by doing CONFIG_BPF_LSM=y. Stephen has tested the patch though and confirms it solves the problem for him.
dentry_init_security() is written in such a way that it expects only one LSM to register the hook. Atleast that's the expectation with current code.
If another LSM returns a hook and returns default, it will simply return 0 as of now and that will break ceph.
Hence, suggestion is that change semantics of this hook a bit. If there are no LSMs or no LSM is taking ownership and initializing security context, then return -EOPNOTSUP. Also allow at max one LSM to initialize security context. This hook can't deal with multiple LSMs trying to init security context. This patch implements this new behavior.
Reported-by: Stephen Muth smuth4@gmail.com Tested-by: Stephen Muth smuth4@gmail.com Suggested-by: Casey Schaufler casey@schaufler-ca.com Acked-by: Casey Schaufler casey@schaufler-ca.com Reviewed-by: Serge Hallyn serge@hallyn.com Cc: Jeff Layton jlayton@kernel.org Cc: Christian Brauner brauner@kernel.org Cc: Paul Moore paul@paul-moore.com Cc: stable@vger.kernel.org # 5.16.0 Signed-off-by: Vivek Goyal vgoyal@redhat.com Reviewed-by: Jeff Layton jlayton@kernel.org Acked-by: Paul Moore paul@paul-moore.com Acked-by: Christian Brauner brauner@kernel.org Signed-off-by: James Morris jmorris@namei.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/lsm_hook_defs.h | 2 +- security/security.c | 15 +++++++++++++-- 2 files changed, 14 insertions(+), 3 deletions(-)
--- a/include/linux/lsm_hook_defs.h +++ b/include/linux/lsm_hook_defs.h @@ -82,7 +82,7 @@ LSM_HOOK(int, 0, sb_add_mnt_opt, const c int len, void **mnt_opts) LSM_HOOK(int, 0, move_mount, const struct path *from_path, const struct path *to_path) -LSM_HOOK(int, 0, dentry_init_security, struct dentry *dentry, +LSM_HOOK(int, -EOPNOTSUPP, dentry_init_security, struct dentry *dentry, int mode, const struct qstr *name, const char **xattr_name, void **ctx, u32 *ctxlen) LSM_HOOK(int, 0, dentry_create_files_as, struct dentry *dentry, int mode, --- a/security/security.c +++ b/security/security.c @@ -1056,8 +1056,19 @@ int security_dentry_init_security(struct const char **xattr_name, void **ctx, u32 *ctxlen) { - return call_int_hook(dentry_init_security, -EOPNOTSUPP, dentry, mode, - name, xattr_name, ctx, ctxlen); + struct security_hook_list *hp; + int rc; + + /* + * Only one module will provide a security context. + */ + hlist_for_each_entry(hp, &security_hook_heads.dentry_init_security, list) { + rc = hp->hook.dentry_init_security(dentry, mode, name, + xattr_name, ctx, ctxlen); + if (rc != LSM_RET_DEFAULT(dentry_init_security)) + return rc; + } + return LSM_RET_DEFAULT(dentry_init_security); } EXPORT_SYMBOL(security_dentry_init_security);
From: Evgenii Stepanov eugenis@google.com
commit 3758a6c74e08bdc15ccccd6872a6ad37d165239a upstream.
In ex_handler_load_unaligned_zeropad() we erroneously extract the data and addr register indices from ex->type rather than ex->data. As ex->type will contain EX_TYPE_LOAD_UNALIGNED_ZEROPAD (i.e. 4): * We'll always treat X0 as the address register, since EX_DATA_REG_ADDR is extracted from bits [9:5]. Thus, we may attempt to dereference an arbitrary address as X0 may hold an arbitrary value. * We'll always treat X4 as the data register, since EX_DATA_REG_DATA is extracted from bits [4:0]. Thus we will corrupt X4 and cause arbitrary behaviour within load_unaligned_zeropad() and its caller.
Fix this by extracting both values from ex->data as originally intended.
On an MTE-enabled QEMU image we are hitting the following crash: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Call trace: fixup_exception+0xc4/0x108 __do_kernel_fault+0x3c/0x268 do_tag_check_fault+0x3c/0x104 do_mem_abort+0x44/0xf4 el1_abort+0x40/0x64 el1h_64_sync_handler+0x60/0xa0 el1h_64_sync+0x7c/0x80 link_path_walk+0x150/0x344 path_openat+0xa0/0x7dc do_filp_open+0xb8/0x168 do_sys_openat2+0x88/0x17c __arm64_sys_openat+0x74/0xa0 invoke_syscall+0x48/0x148 el0_svc_common+0xb8/0xf8 do_el0_svc+0x28/0x88 el0_svc+0x24/0x84 el0t_64_sync_handler+0x88/0xec el0t_64_sync+0x1b4/0x1b8 Code: f8695a69 71007d1f 540000e0 927df12a (f940014a)
Fixes: 753b32368705 ("arm64: extable: add load_unaligned_zeropad() handler") Cc: stable@vger.kernel.org # 5.16.x Reviewed-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Evgenii Stepanov eugenis@google.com Link: https://lore.kernel.org/r/20220125182217.2605202-1-eugenis@google.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/mm/extable.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm64/mm/extable.c +++ b/arch/arm64/mm/extable.c @@ -43,8 +43,8 @@ static bool ex_handler_load_unaligned_zeropad(const struct exception_table_entry *ex, struct pt_regs *regs) { - int reg_data = FIELD_GET(EX_DATA_REG_DATA, ex->type); - int reg_addr = FIELD_GET(EX_DATA_REG_ADDR, ex->type); + int reg_data = FIELD_GET(EX_DATA_REG_DATA, ex->data); + int reg_addr = FIELD_GET(EX_DATA_REG_ADDR, ex->data); unsigned long data, addr, offset;
addr = pt_regs_read_reg(regs, reg_addr);
From: Mike Snitzer snitzer@redhat.com
commit f524d9c95fab54783d0038f7a3e8c014d5b56857 upstream.
Reverts a1e1cb72d9649 ("dm: fix redundant IO accounting for bios that need splitting") because it was too narrow in scope (only addressed redundant 'sectors[]' accounting and not ios, nsecs[], etc).
Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer snitzer@redhat.com Link: https://lore.kernel.org/r/20220128155841.39644-3-snitzer@redhat.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/dm.c | 15 --------------- 1 file changed, 15 deletions(-)
--- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1510,9 +1510,6 @@ static void init_clone_info(struct clone ci->sector = bio->bi_iter.bi_sector; }
-#define __dm_part_stat_sub(part, field, subnd) \ - (part_stat_get(part, field) -= (subnd)) - /* * Entry point to split a bio into clones and submit them to the targets. */ @@ -1548,18 +1545,6 @@ static void __split_and_process_bio(stru GFP_NOIO, &md->queue->bio_split); ci.io->orig_bio = b;
- /* - * Adjust IO stats for each split, otherwise upon queue - * reentry there will be redundant IO accounting. - * NOTE: this is a stop-gap fix, a proper fix involves - * significant refactoring of DM core's bio splitting - * (by eliminating DM's splitting and just using bio_split) - */ - part_stat_lock(); - __dm_part_stat_sub(dm_disk(md)->part0, - sectors[op_stat_group(bio_op(bio))], ci.sector_count); - part_stat_unlock(); - bio_chain(b, bio); trace_block_split(b, bio->bi_iter.bi_sector); submit_bio_noacct(bio);
From: Mike Snitzer snitzer@redhat.com
commit e45c47d1f94e0cc7b6b079fdb4bcce2995e2adc4 upstream.
bio_start_io_acct_time() interface is like bio_start_io_acct() that allows start_time to be passed in. This gives drivers the ability to defer starting accounting until after IO is issued (but possibily not entirely due to bio splitting).
Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Mike Snitzer snitzer@redhat.com Link: https://lore.kernel.org/r/20220128155841.39644-2-snitzer@redhat.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- block/blk-core.c | 25 +++++++++++++++++++------ include/linux/blkdev.h | 1 + 2 files changed, 20 insertions(+), 6 deletions(-)
--- a/block/blk-core.c +++ b/block/blk-core.c @@ -1258,22 +1258,34 @@ void __blk_account_io_start(struct reque }
static unsigned long __part_start_io_acct(struct block_device *part, - unsigned int sectors, unsigned int op) + unsigned int sectors, unsigned int op, + unsigned long start_time) { const int sgrp = op_stat_group(op); - unsigned long now = READ_ONCE(jiffies);
part_stat_lock(); - update_io_ticks(part, now, false); + update_io_ticks(part, start_time, false); part_stat_inc(part, ios[sgrp]); part_stat_add(part, sectors[sgrp], sectors); part_stat_local_inc(part, in_flight[op_is_write(op)]); part_stat_unlock();
- return now; + return start_time; }
/** + * bio_start_io_acct_time - start I/O accounting for bio based drivers + * @bio: bio to start account for + * @start_time: start time that should be passed back to bio_end_io_acct(). + */ +void bio_start_io_acct_time(struct bio *bio, unsigned long start_time) +{ + __part_start_io_acct(bio->bi_bdev, bio_sectors(bio), + bio_op(bio), start_time); +} +EXPORT_SYMBOL_GPL(bio_start_io_acct_time); + +/** * bio_start_io_acct - start I/O accounting for bio based drivers * @bio: bio to start account for * @@ -1281,14 +1293,15 @@ static unsigned long __part_start_io_acc */ unsigned long bio_start_io_acct(struct bio *bio) { - return __part_start_io_acct(bio->bi_bdev, bio_sectors(bio), bio_op(bio)); + return __part_start_io_acct(bio->bi_bdev, bio_sectors(bio), + bio_op(bio), jiffies); } EXPORT_SYMBOL_GPL(bio_start_io_acct);
unsigned long disk_start_io_acct(struct gendisk *disk, unsigned int sectors, unsigned int op) { - return __part_start_io_acct(disk->part0, sectors, op); + return __part_start_io_acct(disk->part0, sectors, op, jiffies); } EXPORT_SYMBOL(disk_start_io_acct);
--- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1254,6 +1254,7 @@ unsigned long disk_start_io_acct(struct void disk_end_io_acct(struct gendisk *disk, unsigned int op, unsigned long start_time);
+void bio_start_io_acct_time(struct bio *bio, unsigned long start_time); unsigned long bio_start_io_acct(struct bio *bio); void bio_end_io_acct_remapped(struct bio *bio, unsigned long start_time, struct block_device *orig_bdev);
From: Mike Snitzer snitzer@redhat.com
commit b879f915bc48a18d4f4462729192435bb0f17052 upstream.
Record the start_time for a bio but defer the starting block core's IO accounting until after IO is submitted using bio_start_io_acct_time().
This approach avoids the need to mess around with any of the individual IO stats in response to a bio_split() that follows bio submission.
Reported-by: Bud Brown bubrown@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Cc: stable@vger.kernel.org Depends-on: e45c47d1f94e ("block: add bio_start_io_acct_time() to control start_time") Signed-off-by: Mike Snitzer snitzer@redhat.com Link: https://lore.kernel.org/r/20220128155841.39644-4-snitzer@redhat.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/dm.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -489,7 +489,7 @@ static void start_io_acct(struct dm_io * struct mapped_device *md = io->md; struct bio *bio = io->orig_bio;
- io->start_time = bio_start_io_acct(bio); + bio_start_io_acct_time(bio, io->start_time); if (unlikely(dm_stats_used(&md->stats))) dm_stats_account_io(&md->stats, bio_data_dir(bio), bio->bi_iter.bi_sector, bio_sectors(bio), @@ -535,7 +535,7 @@ static struct dm_io *alloc_io(struct map io->md = md; spin_lock_init(&io->endio_lock);
- start_io_acct(io); + io->start_time = jiffies;
return io; } @@ -1550,6 +1550,7 @@ static void __split_and_process_bio(stru submit_bio_noacct(bio); } } + start_io_acct(ci.io);
/* drop the extra reference count */ dm_io_dec_pending(ci.io, errno_to_blk_status(error));
From: Jochen Mades jochen@mades.net
commit 62f676ff7898f6c1bd26ce014564773a3dc00601 upstream.
Commit 8d479237727c ("serial: amba-pl011: add RS485 support") sought to keep RTS deasserted on set_mctrl if rs485 is enabled. However it did so only if deasserted RTS polarity is high. Fix it in case it's low.
Fixes: 8d479237727c ("serial: amba-pl011: add RS485 support") Cc: stable@vger.kernel.org # v5.15+ Cc: Lino Sanfilippo LinoSanfilippo@gmx.de Signed-off-by: Jochen Mades jochen@mades.net [lukas: copyedit commit message, add stable designation] Signed-off-by: Lukas Wunner lukas@wunner.de Link: https://lore.kernel.org/r/85fa3323ba8c307943969b7343e23f34c3e652ba.164290928... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/amba-pl011.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/drivers/tty/serial/amba-pl011.c +++ b/drivers/tty/serial/amba-pl011.c @@ -1615,8 +1615,12 @@ static void pl011_set_mctrl(struct uart_ container_of(port, struct uart_amba_port, port); unsigned int cr;
- if (port->rs485.flags & SER_RS485_ENABLED) - mctrl &= ~TIOCM_RTS; + if (port->rs485.flags & SER_RS485_ENABLED) { + if (port->rs485.flags & SER_RS485_RTS_AFTER_SEND) + mctrl &= ~TIOCM_RTS; + else + mctrl |= TIOCM_RTS; + }
cr = pl011_read(uap, REG_CR);
From: Robert Hancock robert.hancock@calian.com
commit d06b1cf28297e27127d3da54753a3a01a2fa2f28 upstream.
8250_of supports a reg-offset property which is intended to handle cases where the device registers start at an offset inside the region of memory allocated to the device. The Xilinx 16550 UART, for which this support was initially added, requires this. However, the code did not adjust the overall size of the mapped region accordingly, causing the driver to request an area of memory past the end of the device's allocation. For example, if the UART was allocated an address of 0xb0130000, size of 0x10000 and reg-offset of 0x1000 in the device tree, the region of memory reserved was b0131000-b0140fff, which caused the driver for the region starting at b0140000 to fail to probe.
Fix this by subtracting reg-offset from the mapped region size.
Fixes: b912b5e2cfb3 ([POWERPC] Xilinx: of_serial support for Xilinx uart 16550.) Cc: stable stable@vger.kernel.org Signed-off-by: Robert Hancock robert.hancock@calian.com Link: https://lore.kernel.org/r/20220112194214.881844-1-robert.hancock@calian.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/8250/8250_of.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
--- a/drivers/tty/serial/8250/8250_of.c +++ b/drivers/tty/serial/8250/8250_of.c @@ -83,8 +83,17 @@ static int of_platform_serial_setup(stru port->mapsize = resource_size(&resource);
/* Check for shifted address mapping */ - if (of_property_read_u32(np, "reg-offset", &prop) == 0) + if (of_property_read_u32(np, "reg-offset", &prop) == 0) { + if (prop >= port->mapsize) { + dev_warn(&ofdev->dev, "reg-offset %u exceeds region size %pa\n", + prop, &port->mapsize); + ret = -EINVAL; + goto err_unprepare; + } + port->mapbase += prop; + port->mapsize -= prop; + }
port->iotype = UPIO_MEM; if (of_property_read_u32(np, "reg-io-width", &prop) == 0) {
From: Valentin Caron valentin.caron@foss.st.com
commit 037b91ec7729524107982e36ec4b40f9b174f7a2 upstream.
x_char is ignored by stm32_usart_start_tx() when xmit buffer is empty.
Fix start_tx condition to allow x_char to be sent.
Fixes: 48a6092fb41f ("serial: stm32-usart: Add STM32 USART Driver") Cc: stable stable@vger.kernel.org Signed-off-by: Erwan Le Ray erwan.leray@foss.st.com Signed-off-by: Valentin Caron valentin.caron@foss.st.com Link: https://lore.kernel.org/r/20220111164441.6178-3-valentin.caron@foss.st.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/stm32-usart.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/tty/serial/stm32-usart.c +++ b/drivers/tty/serial/stm32-usart.c @@ -696,7 +696,7 @@ static void stm32_usart_start_tx(struct struct serial_rs485 *rs485conf = &port->rs485; struct circ_buf *xmit = &port->state->xmit;
- if (uart_circ_empty(xmit)) + if (uart_circ_empty(xmit) && !port->x_char) return;
if (rs485conf->flags & SER_RS485_ENABLED) {
From: daniel.starke@siemens.com daniel.starke@siemens.com
commit 8838b2af23caf1ff0610caef2795d6668a013b2d upstream.
n_gsm is based on the 3GPP 07.010 and its newer version is the 3GPP 27.010. See https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.a... The changes from 07.010 to 27.010 are non-functional. Therefore, I refer to the newer 27.010 here. Chapter 5.2.7.3 states that DC1 (XON) and DC3 (XOFF) are the control characters defined in ISO/IEC 646. These shall be quoted if seen in the data stream to avoid interpretation as flow control characters.
ISO/IEC 646 refers to the set of ISO standards described as the ISO 7-bit coded character set for information interchange. Its final version is also known as ITU T.50. See https://www.itu.int/rec/T-REC-T.50-199209-I/en
To abide the standard it is needed to quote DC1 and DC3 correctly if these are seen as data bytes and not as control characters. The current implementation already tries to enforce this but fails to catch all defined cases. 3GPP 27.010 chapter 5.2.7.3 clearly states that the most significant bit shall be ignored for DC1 and DC3 handling. The current implementation handles only the case with the most significant bit set 0. Cases in which DC1 and DC3 have the most significant bit set 1 are left unhandled.
This patch fixes this by masking the data bytes with ISO_IEC_646_MASK (only the 7 least significant bits set 1) before comparing them with XON (a.k.a. DC1) and XOFF (a.k.a. DC3) when testing which byte values need quotation via byte stuffing.
Fixes: e1eaea46bb40 ("tty: n_gsm line discipline") Cc: stable@vger.kernel.org Signed-off-by: Daniel Starke daniel.starke@siemens.com Link: https://lore.kernel.org/r/20220120101857.2509-1-daniel.starke@siemens.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/n_gsm.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/tty/n_gsm.c +++ b/drivers/tty/n_gsm.c @@ -322,6 +322,7 @@ static int addr_cnt; #define GSM1_ESCAPE_BITS 0x20 #define XON 0x11 #define XOFF 0x13 +#define ISO_IEC_646_MASK 0x7F
static const struct tty_port_operations gsm_port_ops;
@@ -531,7 +532,8 @@ static int gsm_stuff_frame(const u8 *inp int olen = 0; while (len--) { if (*input == GSM1_SOF || *input == GSM1_ESCAPE - || *input == XON || *input == XOFF) { + || (*input & ISO_IEC_646_MASK) == XON + || (*input & ISO_IEC_646_MASK) == XOFF) { *output++ = GSM1_ESCAPE; *output++ = *input++ ^ GSM1_ESCAPE_BITS; olen++;
From: Maciej W. Rozycki macro@embecosm.com
commit f23653fe64479d96910bfda2b700b1af17c991ac upstream.
Fix a user API regression introduced with commit f76edd8f7ce0 ("tty: cyclades, remove this orphan"), which removed a part of the API and caused compilation errors for user programs using said part, such as GCC 9 in its libsanitizer component[1]:
.../libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cc:160:10: fatal error: linux/cyclades.h: No such file or directory 160 | #include <linux/cyclades.h> | ^~~~~~~~~~~~~~~~~~ compilation terminated. make[4]: *** [Makefile:664: sanitizer_platform_limits_posix.lo] Error 1
As the absolute minimum required bring `struct cyclades_monitor' and ioctl numbers back then so as to make the library build again. Add a preprocessor warning as to the obsolescence of the features provided.
[1] GCC PR sanitizer/100379, "cyclades.h is removed from linux kernel header files", https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100379
Fixes: f76edd8f7ce0 ("tty: cyclades, remove this orphan") Cc: stable@vger.kernel.org # v5.13+ Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Maciej W. Rozycki macro@embecosm.com Link: https://lore.kernel.org/r/alpine.DEB.2.20.2201260733430.11348@tpp.orcam.me.u... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/uapi/linux/cyclades.h | 35 +++++++++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+) create mode 100644 include/uapi/linux/cyclades.h
--- /dev/null +++ b/include/uapi/linux/cyclades.h @@ -0,0 +1,35 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ + +#ifndef _UAPI_LINUX_CYCLADES_H +#define _UAPI_LINUX_CYCLADES_H + +#warning "Support for features provided by this header has been removed" +#warning "Please consider updating your code" + +struct cyclades_monitor { + unsigned long int_count; + unsigned long char_count; + unsigned long char_max; + unsigned long char_last; +}; + +#define CYGETMON 0x435901 +#define CYGETTHRESH 0x435902 +#define CYSETTHRESH 0x435903 +#define CYGETDEFTHRESH 0x435904 +#define CYSETDEFTHRESH 0x435905 +#define CYGETTIMEOUT 0x435906 +#define CYSETTIMEOUT 0x435907 +#define CYGETDEFTIMEOUT 0x435908 +#define CYSETDEFTIMEOUT 0x435909 +#define CYSETRFLOW 0x43590a +#define CYGETRFLOW 0x43590b +#define CYSETRTSDTR_INV 0x43590c +#define CYGETRTSDTR_INV 0x43590d +#define CYZSETPOLLCYCLE 0x43590e +#define CYZGETPOLLCYCLE 0x43590f +#define CYGETCD1400VER 0x435910 +#define CYSETWAIT 0x435912 +#define CYGETWAIT 0x435913 + +#endif /* _UAPI_LINUX_CYCLADES_H */
From: Cameron Williams cang1@live.co.uk
commit 152d1afa834c84530828ee031cf07a00e0fc0b8c upstream.
This commit adds support for the some of the Brainboxes PCI range of cards, including the UC-101, UC-235/246, UC-257, UC-268, UC-275/279, UC-302, UC-310, UC-313, UC-320/324, UC-346, UC-357, UC-368 and UC-420/431.
Signed-off-by: Cameron Williams cang1@live.co.uk Cc: stable stable@vger.kernel.org Link: https://lore.kernel.org/r/AM5PR0202MB2564688493F7DD9B9C610827C45E9@AM5PR0202... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/8250/8250_pci.c | 100 ++++++++++++++++++++++++++++++++++++- 1 file changed, 98 insertions(+), 2 deletions(-)
--- a/drivers/tty/serial/8250/8250_pci.c +++ b/drivers/tty/serial/8250/8250_pci.c @@ -5174,8 +5174,30 @@ static const struct pci_device_id serial { PCI_VENDOR_ID_INTASHIELD, PCI_DEVICE_ID_INTASHIELD_IS400, PCI_ANY_ID, PCI_ANY_ID, 0, 0, /* 135a.0dc0 */ pbn_b2_4_115200 }, + /* Brainboxes Devices */ /* - * BrainBoxes UC-260 + * Brainboxes UC-101 + */ + { PCI_VENDOR_ID_INTASHIELD, 0x0BA1, + PCI_ANY_ID, PCI_ANY_ID, + 0, 0, + pbn_b2_2_115200 }, + /* + * Brainboxes UC-235/246 + */ + { PCI_VENDOR_ID_INTASHIELD, 0x0AA1, + PCI_ANY_ID, PCI_ANY_ID, + 0, 0, + pbn_b2_1_115200 }, + /* + * Brainboxes UC-257 + */ + { PCI_VENDOR_ID_INTASHIELD, 0x0861, + PCI_ANY_ID, PCI_ANY_ID, + 0, 0, + pbn_b2_2_115200 }, + /* + * Brainboxes UC-260/271/701/756 */ { PCI_VENDOR_ID_INTASHIELD, 0x0D21, PCI_ANY_ID, PCI_ANY_ID, @@ -5183,7 +5205,81 @@ static const struct pci_device_id serial pbn_b2_4_115200 }, { PCI_VENDOR_ID_INTASHIELD, 0x0E34, PCI_ANY_ID, PCI_ANY_ID, - PCI_CLASS_COMMUNICATION_MULTISERIAL << 8, 0xffff00, + PCI_CLASS_COMMUNICATION_MULTISERIAL << 8, 0xffff00, + pbn_b2_4_115200 }, + /* + * Brainboxes UC-268 + */ + { PCI_VENDOR_ID_INTASHIELD, 0x0841, + PCI_ANY_ID, PCI_ANY_ID, + 0, 0, + pbn_b2_4_115200 }, + /* + * Brainboxes UC-275/279 + */ + { PCI_VENDOR_ID_INTASHIELD, 0x0881, + PCI_ANY_ID, PCI_ANY_ID, + 0, 0, + pbn_b2_8_115200 }, + /* + * Brainboxes UC-302 + */ + { PCI_VENDOR_ID_INTASHIELD, 0x08E1, + PCI_ANY_ID, PCI_ANY_ID, + 0, 0, + pbn_b2_2_115200 }, + /* + * Brainboxes UC-310 + */ + { PCI_VENDOR_ID_INTASHIELD, 0x08C1, + PCI_ANY_ID, PCI_ANY_ID, + 0, 0, + pbn_b2_2_115200 }, + /* + * Brainboxes UC-313 + */ + { PCI_VENDOR_ID_INTASHIELD, 0x08A3, + PCI_ANY_ID, PCI_ANY_ID, + 0, 0, + pbn_b2_2_115200 }, + /* + * Brainboxes UC-320/324 + */ + { PCI_VENDOR_ID_INTASHIELD, 0x0A61, + PCI_ANY_ID, PCI_ANY_ID, + 0, 0, + pbn_b2_1_115200 }, + /* + * Brainboxes UC-346 + */ + { PCI_VENDOR_ID_INTASHIELD, 0x0B02, + PCI_ANY_ID, PCI_ANY_ID, + 0, 0, + pbn_b2_4_115200 }, + /* + * Brainboxes UC-357 + */ + { PCI_VENDOR_ID_INTASHIELD, 0x0A81, + PCI_ANY_ID, PCI_ANY_ID, + 0, 0, + pbn_b2_2_115200 }, + { PCI_VENDOR_ID_INTASHIELD, 0x0A83, + PCI_ANY_ID, PCI_ANY_ID, + 0, 0, + pbn_b2_2_115200 }, + /* + * Brainboxes UC-368 + */ + { PCI_VENDOR_ID_INTASHIELD, 0x0C41, + PCI_ANY_ID, PCI_ANY_ID, + 0, 0, + pbn_b2_4_115200 }, + /* + * Brainboxes UC-420/431 + */ + { PCI_VENDOR_ID_INTASHIELD, 0x0921, + PCI_ANY_ID, PCI_ANY_ID, + 0, 0, pbn_b2_4_115200 }, /* * Perle PCI-RAS cards
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
commit d1ad2721b1eb05d54e81393a7ebc332d4a35c68f upstream.
The file now rightfully throws up a big warning that it should never be included, so remove it from the header_check test.
Fixes: f23653fe6447 ("tty: Partially revert the removal of the Cyclades public API") Cc: stable stable@vger.kernel.org Cc: Masahiro Yamada masahiroy@kernel.org Cc: "Maciej W. Rozycki" macro@embecosm.com Reported-by: Stephen Rothwell sfr@canb.auug.org.au Reported-by: kernel test robot lkp@intel.com Link: https://lore.kernel.org/r/20220127073304.42399-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- usr/include/Makefile | 1 + 1 file changed, 1 insertion(+)
--- a/usr/include/Makefile +++ b/usr/include/Makefile @@ -28,6 +28,7 @@ no-header-test += linux/am437x-vpfe.h no-header-test += linux/android/binder.h no-header-test += linux/android/binderfs.h no-header-test += linux/coda.h +no-header-test += linux/cyclades.h no-header-test += linux/errqueue.h no-header-test += linux/fsmap.h no-header-test += linux/hdlc/ioctl.h
From: Alan Stern stern@rowland.harvard.edu
commit 5b67b315037250a61861119683e7fcb509deea25 upstream.
Two people have reported (and mentioned numerous other reports on the web) that VIA's VL817 USB-SATA bridge does not work with the uas driver. Typical log messages are:
[ 3606.232149] sd 14:0:0:0: [sdg] tag#2 uas_zap_pending 0 uas-tag 1 inflight: CMD [ 3606.232154] sd 14:0:0:0: [sdg] tag#2 CDB: Write(16) 8a 00 00 00 00 00 18 0c c9 80 00 00 00 80 00 00 [ 3606.306257] usb 4-4.4: reset SuperSpeed Plus Gen 2x1 USB device number 11 using xhci_hcd [ 3606.328584] scsi host14: uas_eh_device_reset_handler success
Surprisingly, the devices do seem to work okay for some other people. The cause of the differing behaviors is not known.
In the hope of getting the devices to work for the most users, even at the possible cost of degraded performance for some, this patch adds an unusual_devs entry for the VL817 to block it from binding to the uas driver by default. Users will be able to override this entry by means of a module parameter, if they want.
CC: stable@vger.kernel.org Reported-by: DocMAX mail@vacharakis.de Reported-and-tested-by: Thomas Weißschuh linux@weissschuh.net Signed-off-by: Alan Stern stern@rowland.harvard.edu Link: https://lore.kernel.org/r/Ye8IsK2sjlEv1rqU@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/storage/unusual_devs.h | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/drivers/usb/storage/unusual_devs.h +++ b/drivers/usb/storage/unusual_devs.h @@ -2301,6 +2301,16 @@ UNUSUAL_DEV( 0x2027, 0xa001, 0x0000, 0x USB_SC_DEVICE, USB_PR_DEVICE, usb_stor_euscsi_init, US_FL_SCM_MULT_TARG ),
+/* + * Reported by DocMAX mail@vacharakis.de + * and Thomas Weißschuh linux@weissschuh.net + */ +UNUSUAL_DEV( 0x2109, 0x0715, 0x9999, 0x9999, + "VIA Labs, Inc.", + "VL817 SATA Bridge", + USB_SC_DEVICE, USB_PR_DEVICE, NULL, + US_FL_IGNORE_UAS), + UNUSUAL_DEV( 0x2116, 0x0320, 0x0001, 0x0001, "ST", "2A",
From: Frank Li Frank.Li@nxp.com
commit 9df478463d9feb90dae24f183383961cf123a0ec upstream.
Crashed at i.mx8qm platform when suspend if enable remote wakeup
Internal error: synchronous external abort: 96000210 [#1] PREEMPT SMP Modules linked in: CPU: 2 PID: 244 Comm: kworker/u12:6 Not tainted 5.15.5-dirty #12 Hardware name: Freescale i.MX8QM MEK (DT) Workqueue: events_unbound async_run_entry_fn pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : xhci_disable_hub_port_wake.isra.62+0x60/0xf8 lr : xhci_disable_hub_port_wake.isra.62+0x34/0xf8 sp : ffff80001394bbf0 x29: ffff80001394bbf0 x28: 0000000000000000 x27: ffff00081193b578 x26: ffff00081193b570 x25: 0000000000000000 x24: 0000000000000000 x23: ffff00081193a29c x22: 0000000000020001 x21: 0000000000000001 x20: 0000000000000000 x19: ffff800014e90490 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000002 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000960 x9 : ffff80001394baa0 x8 : ffff0008145d1780 x7 : ffff0008f95b8e80 x6 : 000000001853b453 x5 : 0000000000000496 x4 : 0000000000000000 x3 : ffff00081193a29c x2 : 0000000000000001 x1 : 0000000000000000 x0 : ffff000814591620 Call trace: xhci_disable_hub_port_wake.isra.62+0x60/0xf8 xhci_suspend+0x58/0x510 xhci_plat_suspend+0x50/0x78 platform_pm_suspend+0x2c/0x78 dpm_run_callback.isra.25+0x50/0xe8 __device_suspend+0x108/0x3c0
The basic flow: 1. run time suspend call xhci_suspend, xhci parent devices gate the clock. 2. echo mem >/sys/power/state, system _device_suspend call xhci_suspend 3. xhci_suspend call xhci_disable_hub_port_wake, which access register, but clock already gated by run time suspend.
This problem was hidden by power domain driver, which call run time resume before it.
But the below commit remove it and make this issue happen. commit c1df456d0f06e ("PM: domains: Don't runtime resume devices at genpd_prepare()")
This patch call run time resume before suspend to make sure clock is on before access register.
Reviewed-by: Peter Chen peter.chen@kernel.org Cc: stable stable@vger.kernel.org Signed-off-by: Frank Li Frank.Li@nxp.com Testeb-by: Abel Vesa abel.vesa@nxp.com Link: https://lore.kernel.org/r/20220110172738.31686-1-Frank.Li@nxp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/host/xhci-plat.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/usb/host/xhci-plat.c +++ b/drivers/usb/host/xhci-plat.c @@ -437,6 +437,9 @@ static int __maybe_unused xhci_plat_susp struct xhci_hcd *xhci = hcd_to_xhci(hcd); int ret;
+ if (pm_runtime_suspended(dev)) + pm_runtime_resume(dev); + ret = xhci_priv_suspend_quirk(hcd); if (ret) return ret;
From: Jon Hunter jonathanh@nvidia.com
commit 2e3dd4a6246945bf84ea6f478365d116e661554c upstream.
Commit 7495af930835 ("ARM: multi_v7_defconfig: Enable drivers for DragonBoard 410c") enables the CONFIG_PHY_QCOM_USB_HS for the ARM multi_v7_defconfig. Enabling this Kconfig is causing the kernel to crash on the Tegra20 Ventana platform in the ulpi_match() function.
The Qualcomm USB HS PHY driver that is enabled by CONFIG_PHY_QCOM_USB_HS, registers a ulpi_driver but this driver does not provide an 'id_table', so when ulpi_match() is called on the Tegra20 Ventana platform, it crashes when attempting to deference the id_table pointer which is not valid. The Qualcomm USB HS PHY driver uses device-tree for matching the ULPI driver with the device and so fix this crash by using device-tree for matching if the id_table is not valid.
Fixes: ef6a7bcfb01c ("usb: ulpi: Support device discovery via DT") Cc: stable stable@vger.kernel.org Signed-off-by: Jon Hunter jonathanh@nvidia.com Link: https://lore.kernel.org/r/20220117150039.44058-1-jonathanh@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/common/ulpi.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/drivers/usb/common/ulpi.c +++ b/drivers/usb/common/ulpi.c @@ -39,8 +39,11 @@ static int ulpi_match(struct device *dev struct ulpi *ulpi = to_ulpi_dev(dev); const struct ulpi_device_id *id;
- /* Some ULPI devices don't have a vendor id so rely on OF match */ - if (ulpi->id.vendor == 0) + /* + * Some ULPI devices don't have a vendor id + * or provide an id_table so rely on OF match. + */ + if (ulpi->id.vendor == 0 || !drv->id_table) return of_driver_match_device(dev, driver);
for (id = drv->id_table; id->vendor; id++)
From: Pavankumar Kondeti quic_pkondeti@quicinc.com
commit 904edf8aeb459697129be5fde847e2a502f41fd9 upstream.
Currently when gadget enumerates in super speed plus, the isoc endpoint request buffer size is not calculated correctly. Fix this by checking the gadget speed against USB_SPEED_SUPER_PLUS and update the request buffer size.
Fixes: 90c4d05780d4 ("usb: fix various gadgets null ptr deref on 10gbps cabling.") Cc: stable stable@vger.kernel.org Signed-off-by: Pavankumar Kondeti quic_pkondeti@quicinc.com Link: https://lore.kernel.org/r/1642820602-20619-1-git-send-email-quic_pkondeti@qu... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/gadget/function/f_sourcesink.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/usb/gadget/function/f_sourcesink.c +++ b/drivers/usb/gadget/function/f_sourcesink.c @@ -584,6 +584,7 @@ static int source_sink_start_ep(struct f
if (is_iso) { switch (speed) { + case USB_SPEED_SUPER_PLUS: case USB_SPEED_SUPER: size = ss->isoc_maxpacket * (ss->isoc_mult + 1) *
From: Pawel Laszczak pawell@cadence.com
commit 79aa3e19fe8f5be30e846df8a436bfe306e8b1a6 upstream.
CDNSP driver read not initialized cdns->otg_v0_regs which lead to segmentation fault. Patch fixes this issue.
Fixes: 2cf2581cd229 ("usb: cdns3: add power lost support for system resume") cc: stable@vger.kernel.org Signed-off-by: Pawel Laszczak pawell@cadence.com Link: https://lore.kernel.org/r/20220111090737.10345-1-pawell@gli-login.cadence.co... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/cdns3/drd.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/usb/cdns3/drd.c +++ b/drivers/usb/cdns3/drd.c @@ -483,11 +483,11 @@ int cdns_drd_exit(struct cdns *cdns) /* Indicate the cdns3 core was power lost before */ bool cdns_power_is_lost(struct cdns *cdns) { - if (cdns->version == CDNS3_CONTROLLER_V1) { - if (!(readl(&cdns->otg_v1_regs->simulate) & BIT(0))) + if (cdns->version == CDNS3_CONTROLLER_V0) { + if (!(readl(&cdns->otg_v0_regs->simulate) & BIT(0))) return true; } else { - if (!(readl(&cdns->otg_v0_regs->simulate) & BIT(0))) + if (!(readl(&cdns->otg_v1_regs->simulate) & BIT(0))) return true; } return false;
From: Robert Hancock robert.hancock@calian.com
commit 9678f3361afc27a3124cd2824aec0227739986fb upstream.
It appears that the PIPE clock should not be selected when only USB 2.0 is being used in the design and no USB 3.0 reference clock is used. Also, the core resets are not required if a USB3 PHY is not in use, and will break things if USB3 is actually used but the PHY entry is not listed in the device tree.
Skip core resets and register settings that are only required for USB3 mode when no USB3 PHY is specified in the device tree.
Fixes: 84770f028fab ("usb: dwc3: Add driver for Xilinx platforms") Cc: stable stable@vger.kernel.org Signed-off-by: Robert Hancock robert.hancock@calian.com Link: https://lore.kernel.org/r/20220126000253.1586760-2-robert.hancock@calian.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/dwc3/dwc3-xilinx.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
--- a/drivers/usb/dwc3/dwc3-xilinx.c +++ b/drivers/usb/dwc3/dwc3-xilinx.c @@ -110,6 +110,18 @@ static int dwc3_xlnx_init_zynqmp(struct usb3_phy = NULL; }
+ /* + * The following core resets are not required unless a USB3 PHY + * is used, and the subsequent register settings are not required + * unless a core reset is performed (they should be set properly + * by the first-stage boot loader, but may be reverted by a core + * reset). They may also break the configuration if USB3 is actually + * in use but the usb3-phy entry is missing from the device tree. + * Therefore, skip these operations in this case. + */ + if (!usb3_phy) + goto skip_usb3_phy; + crst = devm_reset_control_get_exclusive(dev, "usb_crst"); if (IS_ERR(crst)) { ret = PTR_ERR(crst); @@ -188,6 +200,7 @@ static int dwc3_xlnx_init_zynqmp(struct goto err; }
+skip_usb3_phy: /* * This routes the USB DMA traffic to go through FPD path instead * of reaching DDR directly. This traffic routing is needed to
From: Robert Hancock robert.hancock@calian.com
commit 2cc9b1c93b1c4caa2d971856c0780fb5f7d04692 upstream.
The code that looked up the USB3 PHY was ignoring all errors other than EPROBE_DEFER in an attempt to handle the PHY not being present. Fix and simplify the code by using devm_phy_optional_get and dev_err_probe so that a missing PHY is not treated as an error and unexpected errors are handled properly.
Fixes: 84770f028fab ("usb: dwc3: Add driver for Xilinx platforms") Cc: stable stable@vger.kernel.org Signed-off-by: Robert Hancock robert.hancock@calian.com Link: https://lore.kernel.org/r/20220126000253.1586760-3-robert.hancock@calian.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/dwc3/dwc3-xilinx.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/drivers/usb/dwc3/dwc3-xilinx.c +++ b/drivers/usb/dwc3/dwc3-xilinx.c @@ -102,12 +102,12 @@ static int dwc3_xlnx_init_zynqmp(struct int ret; u32 reg;
- usb3_phy = devm_phy_get(dev, "usb3-phy"); - if (PTR_ERR(usb3_phy) == -EPROBE_DEFER) { - ret = -EPROBE_DEFER; + usb3_phy = devm_phy_optional_get(dev, "usb3-phy"); + if (IS_ERR(usb3_phy)) { + ret = PTR_ERR(usb3_phy); + dev_err_probe(dev, ret, + "failed to get USB3 PHY\n"); goto err; - } else if (IS_ERR(usb3_phy)) { - usb3_phy = NULL; }
/*
From: Alan Stern stern@rowland.harvard.edu
commit 26fbe9772b8c459687930511444ce443011f86bf upstream.
The syzbot fuzzer has identified a bug in which processes hang waiting for usb_kill_urb() to return. It turns out the issue is not unlinking the URB; that works just fine. Rather, the problem arises when the wakeup notification that the URB has completed is not received.
The reason is memory-access ordering on SMP systems. In outline form, usb_kill_urb() and __usb_hcd_giveback_urb() operating concurrently on different CPUs perform the following actions:
CPU 0 CPU 1 ---------------------------- --------------------------------- usb_kill_urb(): __usb_hcd_giveback_urb(): ... ... atomic_inc(&urb->reject); atomic_dec(&urb->use_count); ... ... wait_event(usb_kill_urb_queue, atomic_read(&urb->use_count) == 0); if (atomic_read(&urb->reject)) wake_up(&usb_kill_urb_queue);
Confining your attention to urb->reject and urb->use_count, you can see that the overall pattern of accesses on CPU 0 is:
write urb->reject, then read urb->use_count;
whereas the overall pattern of accesses on CPU 1 is:
write urb->use_count, then read urb->reject.
This pattern is referred to in memory-model circles as SB (for "Store Buffering"), and it is well known that without suitable enforcement of the desired order of accesses -- in the form of memory barriers -- it is entirely possible for one or both CPUs to execute their reads ahead of their writes. The end result will be that sometimes CPU 0 sees the old un-decremented value of urb->use_count while CPU 1 sees the old un-incremented value of urb->reject. Consequently CPU 0 ends up on the wait queue and never gets woken up, leading to the observed hang in usb_kill_urb().
The same pattern of accesses occurs in usb_poison_urb() and the failure pathway of usb_hcd_submit_urb().
The problem is fixed by adding suitable memory barriers. To provide proper memory-access ordering in the SB pattern, a full barrier is required on both CPUs. The atomic_inc() and atomic_dec() accesses themselves don't provide any memory ordering, but since they are present, we can use the optimized smp_mb__after_atomic() memory barrier in the various routines to obtain the desired effect.
This patch adds the necessary memory barriers.
CC: stable@vger.kernel.org Reported-and-tested-by: syzbot+76629376e06e2c2ad626@syzkaller.appspotmail.com Signed-off-by: Alan Stern stern@rowland.harvard.edu Link: https://lore.kernel.org/r/Ye8K0QYee0Q0Nna2@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/core/hcd.c | 14 ++++++++++++++ drivers/usb/core/urb.c | 12 ++++++++++++ 2 files changed, 26 insertions(+)
--- a/drivers/usb/core/hcd.c +++ b/drivers/usb/core/hcd.c @@ -1563,6 +1563,13 @@ int usb_hcd_submit_urb (struct urb *urb, urb->hcpriv = NULL; INIT_LIST_HEAD(&urb->urb_list); atomic_dec(&urb->use_count); + /* + * Order the write of urb->use_count above before the read + * of urb->reject below. Pairs with the memory barriers in + * usb_kill_urb() and usb_poison_urb(). + */ + smp_mb__after_atomic(); + atomic_dec(&urb->dev->urbnum); if (atomic_read(&urb->reject)) wake_up(&usb_kill_urb_queue); @@ -1665,6 +1672,13 @@ static void __usb_hcd_giveback_urb(struc
usb_anchor_resume_wakeups(anchor); atomic_dec(&urb->use_count); + /* + * Order the write of urb->use_count above before the read + * of urb->reject below. Pairs with the memory barriers in + * usb_kill_urb() and usb_poison_urb(). + */ + smp_mb__after_atomic(); + if (unlikely(atomic_read(&urb->reject))) wake_up(&usb_kill_urb_queue); usb_put_urb(urb); --- a/drivers/usb/core/urb.c +++ b/drivers/usb/core/urb.c @@ -715,6 +715,12 @@ void usb_kill_urb(struct urb *urb) if (!(urb && urb->dev && urb->ep)) return; atomic_inc(&urb->reject); + /* + * Order the write of urb->reject above before the read + * of urb->use_count below. Pairs with the barriers in + * __usb_hcd_giveback_urb() and usb_hcd_submit_urb(). + */ + smp_mb__after_atomic();
usb_hcd_unlink_urb(urb, -ENOENT); wait_event(usb_kill_urb_queue, atomic_read(&urb->use_count) == 0); @@ -756,6 +762,12 @@ void usb_poison_urb(struct urb *urb) if (!urb) return; atomic_inc(&urb->reject); + /* + * Order the write of urb->reject above before the read + * of urb->use_count below. Pairs with the barriers in + * __usb_hcd_giveback_urb() and usb_hcd_submit_urb(). + */ + smp_mb__after_atomic();
if (!urb->dev || !urb->ep) return;
From: Xu Yang xu.yang_2@nxp.com
commit 5638b0dfb6921f69943c705383ff40fb64b987f2 upstream.
With the AMS and Collision Avoidance, tcpm often needs to change the CC's termination. When one CC line is sourcing Vconn, if we still change its termination, the voltage of the another CC line is likely to be fluctuant and unstable.
Therefore, we should verify whether a CC line is sourcing Vconn before changing its termination and only change the termination that is not a Vconn line. This can be done by reading the Vconn Present bit of POWER_ STATUS register. To determine the polarity, we can read the Plug Orientation bit of TCPC_CONTROL register. Since Vconn can only be sourced if Plug Orientation is set.
Fixes: 0908c5aca31e ("usb: typec: tcpm: AMS and Collision Avoidance") cc: stable@vger.kernel.org Reviewed-by: Guenter Roeck linux@roeck-us.net Acked-by: Heikki Krogerus heikki.krogerus@linux.intel.com Signed-off-by: Xu Yang xu.yang_2@nxp.com Link: https://lore.kernel.org/r/20220113092943.752372-1-xu.yang_2@nxp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/typec/tcpm/tcpci.c | 26 ++++++++++++++++++++++++++ drivers/usb/typec/tcpm/tcpci.h | 1 + 2 files changed, 27 insertions(+)
--- a/drivers/usb/typec/tcpm/tcpci.c +++ b/drivers/usb/typec/tcpm/tcpci.c @@ -75,9 +75,25 @@ static int tcpci_write16(struct tcpci *t static int tcpci_set_cc(struct tcpc_dev *tcpc, enum typec_cc_status cc) { struct tcpci *tcpci = tcpc_to_tcpci(tcpc); + bool vconn_pres; + enum typec_cc_polarity polarity = TYPEC_POLARITY_CC1; unsigned int reg; int ret;
+ ret = regmap_read(tcpci->regmap, TCPC_POWER_STATUS, ®); + if (ret < 0) + return ret; + + vconn_pres = !!(reg & TCPC_POWER_STATUS_VCONN_PRES); + if (vconn_pres) { + ret = regmap_read(tcpci->regmap, TCPC_TCPC_CTRL, ®); + if (ret < 0) + return ret; + + if (reg & TCPC_TCPC_CTRL_ORIENTATION) + polarity = TYPEC_POLARITY_CC2; + } + switch (cc) { case TYPEC_CC_RA: reg = (TCPC_ROLE_CTRL_CC_RA << TCPC_ROLE_CTRL_CC1_SHIFT) | @@ -112,6 +128,16 @@ static int tcpci_set_cc(struct tcpc_dev break; }
+ if (vconn_pres) { + if (polarity == TYPEC_POLARITY_CC2) { + reg &= ~(TCPC_ROLE_CTRL_CC1_MASK << TCPC_ROLE_CTRL_CC1_SHIFT); + reg |= (TCPC_ROLE_CTRL_CC_OPEN << TCPC_ROLE_CTRL_CC1_SHIFT); + } else { + reg &= ~(TCPC_ROLE_CTRL_CC2_MASK << TCPC_ROLE_CTRL_CC2_SHIFT); + reg |= (TCPC_ROLE_CTRL_CC_OPEN << TCPC_ROLE_CTRL_CC2_SHIFT); + } + } + ret = regmap_write(tcpci->regmap, TCPC_ROLE_CTRL, reg); if (ret < 0) return ret; --- a/drivers/usb/typec/tcpm/tcpci.h +++ b/drivers/usb/typec/tcpm/tcpci.h @@ -98,6 +98,7 @@ #define TCPC_POWER_STATUS_SOURCING_VBUS BIT(4) #define TCPC_POWER_STATUS_VBUS_DET BIT(3) #define TCPC_POWER_STATUS_VBUS_PRES BIT(2) +#define TCPC_POWER_STATUS_VCONN_PRES BIT(1) #define TCPC_POWER_STATUS_SINKING_VBUS BIT(0)
#define TCPC_FAULT_STATUS 0x1f
From: Badhri Jagan Sridharan badhri@google.com
commit 90b8aa9f5b09edae6928c0561f933fec9f7a9987 upstream.
With some chargers, vbus might momentarily raise above VSAFE5V and fall back to 0V before tcpm gets to read port->tcpc->get_vbus. This will will report a VBUS off event causing TCPM to transition to SNK_UNATTACHED where it should be waiting in either SNK_ATTACH_WAIT or SNK_DEBOUNCED state. This patch makes TCPM avoid vbus off events while in SNK_ATTACH_WAIT or SNK_DEBOUNCED state.
Stub from the spec: "4.5.2.2.4.2 Exiting from AttachWait.SNK State A Sink shall transition to Unattached.SNK when the state of both the CC1 and CC2 pins is SNK.Open for at least tPDDebounce. A DRP shall transition to Unattached.SRC when the state of both the CC1 and CC2 pins is SNK.Open for at least tPDDebounce."
[23.194131] CC1: 0 -> 0, CC2: 0 -> 5 [state SNK_UNATTACHED, polarity 0, connected] [23.201777] state change SNK_UNATTACHED -> SNK_ATTACH_WAIT [rev3 NONE_AMS] [23.209949] pending state change SNK_ATTACH_WAIT -> SNK_DEBOUNCED @ 170 ms [rev3 NONE_AMS] [23.300579] VBUS off [23.300668] state change SNK_ATTACH_WAIT -> SNK_UNATTACHED [rev3 NONE_AMS] [23.301014] VBUS VSAFE0V [23.301111] Start toggling
Fixes: f0690a25a140b8 ("staging: typec: USB Type-C Port Manager (tcpm)") Cc: stable@vger.kernel.org Acked-by: Heikki Krogerus heikki.krogerus@linux.intel.com Signed-off-by: Badhri Jagan Sridharan badhri@google.com Link: https://lore.kernel.org/r/20220122015520.332507-1-badhri@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/typec/tcpm/tcpm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/usb/typec/tcpm/tcpm.c +++ b/drivers/usb/typec/tcpm/tcpm.c @@ -5156,7 +5156,8 @@ static void _tcpm_pd_vbus_off(struct tcp case SNK_TRYWAIT_DEBOUNCE: break; case SNK_ATTACH_WAIT: - tcpm_set_state(port, SNK_UNATTACHED, 0); + case SNK_DEBOUNCED: + /* Do nothing, as TCPM is still waiting for vbus to reaach VSAFE5V to connect */ break;
case SNK_NEGOTIATE_CAPABILITIES:
From: Badhri Jagan Sridharan badhri@google.com
commit 746f96e7d6f7a276726860f696671766bfb24cf0 upstream.
With some chargers, vbus might momentarily raise above VSAFE5V and fall back to 0V causing VSAFE0V to be triggered. This will will report a VBUS off event causing TCPM to transition to SNK_UNATTACHED state where it should be waiting in either SNK_ATTACH_WAIT or SNK_DEBOUNCED state. This patch makes TCPM avoid VSAFE0V events while in SNK_ATTACH_WAIT or SNK_DEBOUNCED state.
Stub from the spec: "4.5.2.2.4.2 Exiting from AttachWait.SNK State A Sink shall transition to Unattached.SNK when the state of both the CC1 and CC2 pins is SNK.Open for at least tPDDebounce. A DRP shall transition to Unattached.SRC when the state of both the CC1 and CC2 pins is SNK.Open for at least tPDDebounce."
[23.194131] CC1: 0 -> 0, CC2: 0 -> 5 [state SNK_UNATTACHED, polarity 0, connected] [23.201777] state change SNK_UNATTACHED -> SNK_ATTACH_WAIT [rev3 NONE_AMS] [23.209949] pending state change SNK_ATTACH_WAIT -> SNK_DEBOUNCED @ 170 ms [rev3 NONE_AMS] [23.300579] VBUS off [23.300668] state change SNK_ATTACH_WAIT -> SNK_UNATTACHED [rev3 NONE_AMS] [23.301014] VBUS VSAFE0V [23.301111] Start toggling
Fixes: 28b43d3d746b8 ("usb: typec: tcpm: Introduce vsafe0v for vbus") Cc: stable@vger.kernel.org Acked-by: Heikki Krogerus heikki.krogerus@linux.intel.com Signed-off-by: Badhri Jagan Sridharan badhri@google.com Link: https://lore.kernel.org/r/20220122015520.332507-2-badhri@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/typec/tcpm/tcpm.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/usb/typec/tcpm/tcpm.c +++ b/drivers/usb/typec/tcpm/tcpm.c @@ -5264,6 +5264,10 @@ static void _tcpm_pd_vbus_vsafe0v(struct case PR_SWAP_SNK_SRC_SOURCE_ON: /* Do nothing, vsafe0v is expected during transition */ break; + case SNK_ATTACH_WAIT: + case SNK_DEBOUNCED: + /*Do nothing, still waiting for VSAFE5V for connect */ + break; default: if (port->pwr_role == TYPEC_SINK && port->auto_vbus_discharge_enabled) tcpm_set_state(port, SNK_UNATTACHED, 0);
From: Sing-Han Chen singhanc@nvidia.com
commit 825911492eb15bf8bb7fb94bc0c0421fe7a6327d upstream.
CCGx clears Bit 0:Device Interrupt in the INTR_REG if CCGx is reset successfully. However, there might be a chance that other bits in INTR_REG are not cleared due to internal data queued in PPM. This case misleads the driver that CCGx reset failed.
The commit checks bit 0 in INTR_REG and ignores other bits. The ucsi driver would reset PPM later.
Fixes: 247c554a14aa ("usb: typec: ucsi: add support for Cypress CCGx") Cc: stable@vger.kernel.org Reviewed-by: Heikki Krogerus heikki.krogerus@linux.intel.com Signed-off-by: Sing-Han Chen singhanc@nvidia.com Signed-off-by: Wayne Chang waynec@nvidia.com Link: https://lore.kernel.org/r/20220112094143.628610-1-waynec@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/typec/ucsi/ucsi_ccg.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/usb/typec/ucsi/ucsi_ccg.c +++ b/drivers/usb/typec/ucsi/ucsi_ccg.c @@ -325,7 +325,7 @@ static int ucsi_ccg_init(struct ucsi_ccg if (status < 0) return status;
- if (!data) + if (!(data & DEV_INT)) return 0;
status = ccg_write(uc, CCGX_RAB_INTR_REG, &data, sizeof(data));
From: Lorenzo Bianconi lorenzo@kernel.org
commit 680a2ead741ad9b479a53adf154ed5eee74d2b9a upstream.
Similar to MCU_EXT_CMD, introduce MCU_CE_CMD for CE commands
Stable kernel notes:
Upstream commit 547224024579 (mt76: connac: introduce MCU_UNI_CMD macro, 2021-12-09) introduced a bug by removing MCU_UNI_PREFIX, but not updating MCU_CMD_MASK accordingly, so when commands are compared in mt7921_mcu_parse_response() one has the extra bit __MCU_CMD_FIELD_UNI set and the comparison fails:
if (mcu_cmd != event->cid) if (20001 != 1)
The fix was sneaked by in the next commit 680a2ead741a (mt76: connac: introduce MCU_CE_CMD macro, 2021-12-09):
- int mcu_cmd = cmd & MCU_CMD_MASK; + int mcu_cmd = FIELD_GET(__MCU_CMD_FIELD_ID, cmd);
But it was never merged into linux-stable.
We need either both commits, or none.
Signed-off-by: Lorenzo Bianconi lorenzo@kernel.org Signed-off-by: Felix Fietkau nbd@nbd.name Signed-off-by: Felipe Contreras felipe.contreras@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireless/mediatek/mt76/mt7615/mcu.c | 16 +++--- drivers/net/wireless/mediatek/mt76/mt76_connac_mcu.c | 47 ++++++++++-------- drivers/net/wireless/mediatek/mt76/mt76_connac_mcu.h | 48 +++++++++---------- drivers/net/wireless/mediatek/mt76/mt7921/mcu.c | 24 ++++----- drivers/net/wireless/mediatek/mt76/mt7921/testmode.c | 4 - 5 files changed, 73 insertions(+), 66 deletions(-)
--- a/drivers/net/wireless/mediatek/mt76/mt7615/mcu.c +++ b/drivers/net/wireless/mediatek/mt76/mt7615/mcu.c @@ -145,7 +145,7 @@ void mt7615_mcu_fill_msg(struct mt7615_d mcu_txd->cid = mcu_cmd; mcu_txd->ext_cid = FIELD_GET(__MCU_CMD_FIELD_EXT_ID, cmd);
- if (mcu_txd->ext_cid || (cmd & MCU_CE_PREFIX)) { + if (mcu_txd->ext_cid || (cmd & __MCU_CMD_FIELD_CE)) { if (cmd & __MCU_CMD_FIELD_QUERY) mcu_txd->set_query = MCU_Q_QUERY; else @@ -193,7 +193,7 @@ int mt7615_mcu_parse_response(struct mt7 skb_pull(skb, sizeof(*rxd)); event = (struct mt7615_mcu_uni_event *)skb->data; ret = le32_to_cpu(event->status); - } else if (cmd == MCU_CMD_REG_READ) { + } else if (cmd == MCU_CE_QUERY(REG_READ)) { struct mt7615_mcu_reg_event *event;
skb_pull(skb, sizeof(*rxd)); @@ -2737,13 +2737,13 @@ int mt7615_mcu_set_bss_pm(struct mt7615_ if (vif->type != NL80211_IFTYPE_STATION) return 0;
- err = mt76_mcu_send_msg(&dev->mt76, MCU_CMD_SET_BSS_ABORT, &req_hdr, - sizeof(req_hdr), false); + err = mt76_mcu_send_msg(&dev->mt76, MCU_CE_CMD(SET_BSS_ABORT), + &req_hdr, sizeof(req_hdr), false); if (err < 0 || !enable) return err;
- return mt76_mcu_send_msg(&dev->mt76, MCU_CMD_SET_BSS_CONNECTED, &req, - sizeof(req), false); + return mt76_mcu_send_msg(&dev->mt76, MCU_CE_CMD(SET_BSS_CONNECTED), + &req, sizeof(req), false); }
int mt7615_mcu_set_roc(struct mt7615_phy *phy, struct ieee80211_vif *vif, @@ -2762,6 +2762,6 @@ int mt7615_mcu_set_roc(struct mt7615_phy
phy->roc_grant = false;
- return mt76_mcu_send_msg(&dev->mt76, MCU_CMD_SET_ROC, &req, - sizeof(req), false); + return mt76_mcu_send_msg(&dev->mt76, MCU_CE_CMD(SET_ROC), + &req, sizeof(req), false); } --- a/drivers/net/wireless/mediatek/mt76/mt76_connac_mcu.c +++ b/drivers/net/wireless/mediatek/mt76/mt76_connac_mcu.c @@ -160,7 +160,8 @@ int mt76_connac_mcu_set_channel_domain(s
memcpy(__skb_push(skb, sizeof(hdr)), &hdr, sizeof(hdr));
- return mt76_mcu_skb_send_msg(dev, skb, MCU_CMD_SET_CHAN_DOMAIN, false); + return mt76_mcu_skb_send_msg(dev, skb, MCU_CE_CMD(SET_CHAN_DOMAIN), + false); } EXPORT_SYMBOL_GPL(mt76_connac_mcu_set_channel_domain);
@@ -198,8 +199,8 @@ int mt76_connac_mcu_set_vif_ps(struct mt if (vif->type != NL80211_IFTYPE_STATION) return -EOPNOTSUPP;
- return mt76_mcu_send_msg(dev, MCU_CMD_SET_PS_PROFILE, &req, - sizeof(req), false); + return mt76_mcu_send_msg(dev, MCU_CE_CMD(SET_PS_PROFILE), + &req, sizeof(req), false); } EXPORT_SYMBOL_GPL(mt76_connac_mcu_set_vif_ps);
@@ -1523,7 +1524,8 @@ int mt76_connac_mcu_hw_scan(struct mt76_ req->scan_func |= SCAN_FUNC_RANDOM_MAC; }
- err = mt76_mcu_skb_send_msg(mdev, skb, MCU_CMD_START_HW_SCAN, false); + err = mt76_mcu_skb_send_msg(mdev, skb, MCU_CE_CMD(START_HW_SCAN), + false); if (err < 0) clear_bit(MT76_HW_SCANNING, &phy->state);
@@ -1551,8 +1553,8 @@ int mt76_connac_mcu_cancel_hw_scan(struc ieee80211_scan_completed(phy->hw, &info); }
- return mt76_mcu_send_msg(phy->dev, MCU_CMD_CANCEL_HW_SCAN, &req, - sizeof(req), false); + return mt76_mcu_send_msg(phy->dev, MCU_CE_CMD(CANCEL_HW_SCAN), + &req, sizeof(req), false); } EXPORT_SYMBOL_GPL(mt76_connac_mcu_cancel_hw_scan);
@@ -1638,7 +1640,8 @@ int mt76_connac_mcu_sched_scan_req(struc memcpy(skb_put(skb, sreq->ie_len), sreq->ie, sreq->ie_len); }
- return mt76_mcu_skb_send_msg(mdev, skb, MCU_CMD_SCHED_SCAN_REQ, false); + return mt76_mcu_skb_send_msg(mdev, skb, MCU_CE_CMD(SCHED_SCAN_REQ), + false); } EXPORT_SYMBOL_GPL(mt76_connac_mcu_sched_scan_req);
@@ -1658,8 +1661,8 @@ int mt76_connac_mcu_sched_scan_enable(st else clear_bit(MT76_HW_SCHED_SCANNING, &phy->state);
- return mt76_mcu_send_msg(phy->dev, MCU_CMD_SCHED_SCAN_ENABLE, &req, - sizeof(req), false); + return mt76_mcu_send_msg(phy->dev, MCU_CE_CMD(SCHED_SCAN_ENABLE), + &req, sizeof(req), false); } EXPORT_SYMBOL_GPL(mt76_connac_mcu_sched_scan_enable);
@@ -1671,8 +1674,8 @@ int mt76_connac_mcu_chip_config(struct m
memcpy(req.data, "assert", 7);
- return mt76_mcu_send_msg(dev, MCU_CMD_CHIP_CONFIG, &req, sizeof(req), - false); + return mt76_mcu_send_msg(dev, MCU_CE_CMD(CHIP_CONFIG), + &req, sizeof(req), false); } EXPORT_SYMBOL_GPL(mt76_connac_mcu_chip_config);
@@ -1684,8 +1687,8 @@ int mt76_connac_mcu_set_deep_sleep(struc
snprintf(req.data, sizeof(req.data), "KeepFullPwr %d", !enable);
- return mt76_mcu_send_msg(dev, MCU_CMD_CHIP_CONFIG, &req, sizeof(req), - false); + return mt76_mcu_send_msg(dev, MCU_CE_CMD(CHIP_CONFIG), + &req, sizeof(req), false); } EXPORT_SYMBOL_GPL(mt76_connac_mcu_set_deep_sleep);
@@ -1787,8 +1790,8 @@ int mt76_connac_mcu_get_nic_capability(s struct sk_buff *skb; int ret, i;
- ret = mt76_mcu_send_and_get_msg(phy->dev, MCU_CMD_GET_NIC_CAPAB, NULL, - 0, true, &skb); + ret = mt76_mcu_send_and_get_msg(phy->dev, MCU_CE_CMD(GET_NIC_CAPAB), + NULL, 0, true, &skb); if (ret) return ret;
@@ -2071,7 +2074,8 @@ mt76_connac_mcu_rate_txpower_band(struct memcpy(skb->data, &tx_power_tlv, sizeof(tx_power_tlv));
err = mt76_mcu_skb_send_msg(dev, skb, - MCU_CMD_SET_RATE_TX_POWER, false); + MCU_CE_CMD(SET_RATE_TX_POWER), + false); if (err < 0) return err; } @@ -2163,8 +2167,8 @@ int mt76_connac_mcu_set_p2p_oppps(struct .bss_idx = mvif->idx, };
- return mt76_mcu_send_msg(phy->dev, MCU_CMD_SET_P2P_OPPPS, &req, - sizeof(req), false); + return mt76_mcu_send_msg(phy->dev, MCU_CE_CMD(SET_P2P_OPPPS), + &req, sizeof(req), false); } EXPORT_SYMBOL_GPL(mt76_connac_mcu_set_p2p_oppps);
@@ -2490,8 +2494,8 @@ u32 mt76_connac_mcu_reg_rr(struct mt76_d .addr = cpu_to_le32(offset), };
- return mt76_mcu_send_msg(dev, MCU_CMD_REG_READ, &req, sizeof(req), - true); + return mt76_mcu_send_msg(dev, MCU_CE_QUERY(REG_READ), &req, + sizeof(req), true); } EXPORT_SYMBOL_GPL(mt76_connac_mcu_reg_rr);
@@ -2505,7 +2509,8 @@ void mt76_connac_mcu_reg_wr(struct mt76_ .val = cpu_to_le32(val), };
- mt76_mcu_send_msg(dev, MCU_CMD_REG_WRITE, &req, sizeof(req), false); + mt76_mcu_send_msg(dev, MCU_CE_CMD(REG_WRITE), &req, + sizeof(req), false); } EXPORT_SYMBOL_GPL(mt76_connac_mcu_reg_wr);
--- a/drivers/net/wireless/mediatek/mt76/mt76_connac_mcu.h +++ b/drivers/net/wireless/mediatek/mt76/mt76_connac_mcu.h @@ -496,13 +496,11 @@ enum { #define MCU_CMD_UNI_EXT_ACK (MCU_CMD_ACK | MCU_CMD_UNI | \ MCU_CMD_QUERY)
-#define MCU_CE_PREFIX BIT(29) -#define MCU_CMD_MASK ~(MCU_CE_PREFIX) - #define __MCU_CMD_FIELD_ID GENMASK(7, 0) #define __MCU_CMD_FIELD_EXT_ID GENMASK(15, 8) #define __MCU_CMD_FIELD_QUERY BIT(16) #define __MCU_CMD_FIELD_UNI BIT(17) +#define __MCU_CMD_FIELD_CE BIT(18)
#define MCU_CMD(_t) FIELD_PREP(__MCU_CMD_FIELD_ID, \ MCU_CMD_##_t) @@ -513,6 +511,10 @@ enum { #define MCU_UNI_CMD(_t) (__MCU_CMD_FIELD_UNI | \ FIELD_PREP(__MCU_CMD_FIELD_ID, \ MCU_UNI_CMD_##_t)) +#define MCU_CE_CMD(_t) (__MCU_CMD_FIELD_CE | \ + FIELD_PREP(__MCU_CMD_FIELD_ID, \ + MCU_CE_CMD_##_t)) +#define MCU_CE_QUERY(_t) (MCU_CE_CMD(_t) | __MCU_CMD_FIELD_QUERY)
enum { MCU_EXT_CMD_EFUSE_ACCESS = 0x01, @@ -589,26 +591,26 @@ enum {
/* offload mcu commands */ enum { - MCU_CMD_TEST_CTRL = MCU_CE_PREFIX | 0x01, - MCU_CMD_START_HW_SCAN = MCU_CE_PREFIX | 0x03, - MCU_CMD_SET_PS_PROFILE = MCU_CE_PREFIX | 0x05, - MCU_CMD_SET_CHAN_DOMAIN = MCU_CE_PREFIX | 0x0f, - MCU_CMD_SET_BSS_CONNECTED = MCU_CE_PREFIX | 0x16, - MCU_CMD_SET_BSS_ABORT = MCU_CE_PREFIX | 0x17, - MCU_CMD_CANCEL_HW_SCAN = MCU_CE_PREFIX | 0x1b, - MCU_CMD_SET_ROC = MCU_CE_PREFIX | 0x1d, - MCU_CMD_SET_P2P_OPPPS = MCU_CE_PREFIX | 0x33, - MCU_CMD_SET_RATE_TX_POWER = MCU_CE_PREFIX | 0x5d, - MCU_CMD_SCHED_SCAN_ENABLE = MCU_CE_PREFIX | 0x61, - MCU_CMD_SCHED_SCAN_REQ = MCU_CE_PREFIX | 0x62, - MCU_CMD_GET_NIC_CAPAB = MCU_CE_PREFIX | 0x8a, - MCU_CMD_SET_MU_EDCA_PARMS = MCU_CE_PREFIX | 0xb0, - MCU_CMD_REG_WRITE = MCU_CE_PREFIX | 0xc0, - MCU_CMD_REG_READ = MCU_CE_PREFIX | __MCU_CMD_FIELD_QUERY | 0xc0, - MCU_CMD_CHIP_CONFIG = MCU_CE_PREFIX | 0xca, - MCU_CMD_FWLOG_2_HOST = MCU_CE_PREFIX | 0xc5, - MCU_CMD_GET_WTBL = MCU_CE_PREFIX | 0xcd, - MCU_CMD_GET_TXPWR = MCU_CE_PREFIX | 0xd0, + MCU_CE_CMD_TEST_CTRL = 0x01, + MCU_CE_CMD_START_HW_SCAN = 0x03, + MCU_CE_CMD_SET_PS_PROFILE = 0x05, + MCU_CE_CMD_SET_CHAN_DOMAIN = 0x0f, + MCU_CE_CMD_SET_BSS_CONNECTED = 0x16, + MCU_CE_CMD_SET_BSS_ABORT = 0x17, + MCU_CE_CMD_CANCEL_HW_SCAN = 0x1b, + MCU_CE_CMD_SET_ROC = 0x1d, + MCU_CE_CMD_SET_P2P_OPPPS = 0x33, + MCU_CE_CMD_SET_RATE_TX_POWER = 0x5d, + MCU_CE_CMD_SCHED_SCAN_ENABLE = 0x61, + MCU_CE_CMD_SCHED_SCAN_REQ = 0x62, + MCU_CE_CMD_GET_NIC_CAPAB = 0x8a, + MCU_CE_CMD_SET_MU_EDCA_PARMS = 0xb0, + MCU_CE_CMD_REG_WRITE = 0xc0, + MCU_CE_CMD_REG_READ = 0xc0, + MCU_CE_CMD_CHIP_CONFIG = 0xca, + MCU_CE_CMD_FWLOG_2_HOST = 0xc5, + MCU_CE_CMD_GET_WTBL = 0xcd, + MCU_CE_CMD_GET_TXPWR = 0xd0, };
enum { --- a/drivers/net/wireless/mediatek/mt76/mt7921/mcu.c +++ b/drivers/net/wireless/mediatek/mt76/mt7921/mcu.c @@ -163,8 +163,8 @@ mt7921_mcu_parse_eeprom(struct mt76_dev int mt7921_mcu_parse_response(struct mt76_dev *mdev, int cmd, struct sk_buff *skb, int seq) { + int mcu_cmd = FIELD_GET(__MCU_CMD_FIELD_ID, cmd); struct mt7921_mcu_rxd *rxd; - int mcu_cmd = cmd & MCU_CMD_MASK; int ret = 0;
if (!skb) { @@ -201,7 +201,7 @@ int mt7921_mcu_parse_response(struct mt7 /* skip invalid event */ if (mcu_cmd != event->cid) ret = -EAGAIN; - } else if (cmd == MCU_CMD_REG_READ) { + } else if (cmd == MCU_CE_QUERY(REG_READ)) { struct mt7921_mcu_reg_event *event;
skb_pull(skb, sizeof(*rxd)); @@ -274,7 +274,7 @@ int mt7921_mcu_fill_message(struct mt76_ mcu_txd->s2d_index = MCU_S2D_H2N; mcu_txd->ext_cid = FIELD_GET(__MCU_CMD_FIELD_EXT_ID, cmd);
- if (mcu_txd->ext_cid || (cmd & MCU_CE_PREFIX)) { + if (mcu_txd->ext_cid || (cmd & __MCU_CMD_FIELD_CE)) { if (cmd & __MCU_CMD_FIELD_QUERY) mcu_txd->set_query = MCU_Q_QUERY; else @@ -883,8 +883,8 @@ int mt7921_mcu_fw_log_2_host(struct mt79 .ctrl_val = ctrl };
- return mt76_mcu_send_msg(&dev->mt76, MCU_CMD_FWLOG_2_HOST, &data, - sizeof(data), false); + return mt76_mcu_send_msg(&dev->mt76, MCU_CE_CMD(FWLOG_2_HOST), + &data, sizeof(data), false); }
int mt7921_run_firmware(struct mt7921_dev *dev) @@ -1009,8 +1009,8 @@ int mt7921_mcu_set_tx(struct mt7921_dev e->timer = q->mu_edca_timer; }
- return mt76_mcu_send_msg(&dev->mt76, MCU_CMD_SET_MU_EDCA_PARMS, &req_mu, - sizeof(req_mu), false); + return mt76_mcu_send_msg(&dev->mt76, MCU_CE_CMD(SET_MU_EDCA_PARMS), + &req_mu, sizeof(req_mu), false); }
int mt7921_mcu_set_chan_info(struct mt7921_phy *phy, int cmd) @@ -1214,13 +1214,13 @@ mt7921_mcu_set_bss_pm(struct mt7921_dev if (vif->type != NL80211_IFTYPE_STATION) return 0;
- err = mt76_mcu_send_msg(&dev->mt76, MCU_CMD_SET_BSS_ABORT, &req_hdr, - sizeof(req_hdr), false); + err = mt76_mcu_send_msg(&dev->mt76, MCU_CE_CMD(SET_BSS_ABORT), + &req_hdr, sizeof(req_hdr), false); if (err < 0 || !enable) return err;
- return mt76_mcu_send_msg(&dev->mt76, MCU_CMD_SET_BSS_CONNECTED, &req, - sizeof(req), false); + return mt76_mcu_send_msg(&dev->mt76, MCU_CE_CMD(SET_BSS_CONNECTED), + &req, sizeof(req), false); }
int mt7921_mcu_sta_update(struct mt7921_dev *dev, struct ieee80211_sta *sta, @@ -1330,7 +1330,7 @@ int mt7921_get_txpwr_info(struct mt7921_ struct sk_buff *skb; int ret;
- ret = mt76_mcu_send_and_get_msg(&dev->mt76, MCU_CMD_GET_TXPWR, + ret = mt76_mcu_send_and_get_msg(&dev->mt76, MCU_CE_CMD(GET_TXPWR), &req, sizeof(req), true, &skb); if (ret) return ret; --- a/drivers/net/wireless/mediatek/mt76/mt7921/testmode.c +++ b/drivers/net/wireless/mediatek/mt76/mt7921/testmode.c @@ -66,7 +66,7 @@ mt7921_tm_set(struct mt7921_dev *dev, st if (!mt76_testmode_enabled(phy)) goto out;
- ret = mt76_mcu_send_msg(&dev->mt76, MCU_CMD_TEST_CTRL, &cmd, + ret = mt76_mcu_send_msg(&dev->mt76, MCU_CE_CMD(TEST_CTRL), &cmd, sizeof(cmd), false); if (ret) goto out; @@ -95,7 +95,7 @@ mt7921_tm_query(struct mt7921_dev *dev, struct sk_buff *skb; int ret;
- ret = mt76_mcu_send_and_get_msg(&dev->mt76, MCU_CMD_TEST_CTRL, + ret = mt76_mcu_send_and_get_msg(&dev->mt76, MCU_CE_CMD(TEST_CTRL), &cmd, sizeof(cmd), true, &skb); if (ret) goto out;
From: Peter Collingbourne pcc@google.com
commit 27fe73394a1c6d0b07fa4d95f1bca116d1cc66e9 upstream.
It has been reported that the tag setting operation on newly-allocated pages can cause the page flags to be corrupted when performed concurrently with other flag updates as a result of the use of non-atomic operations.
Fix the problem by using a compare-exchange loop to update the tag.
Link: https://lkml.kernel.org/r/20220120020148.1632253-1-pcc@google.com Link: https://linux-review.googlesource.com/id/I456b24a2b9067d93968d43b4bb3351c0ce... Fixes: 2813b9c02962 ("kasan, mm, arm64: tag non slab memory allocated via pagealloc") Signed-off-by: Peter Collingbourne pcc@google.com Reviewed-by: Andrey Konovalov andreyknvl@gmail.com Cc: Peter Zijlstra peterz@infradead.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/mm.h | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-)
--- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1524,11 +1524,18 @@ static inline u8 page_kasan_tag(const st
static inline void page_kasan_tag_set(struct page *page, u8 tag) { - if (kasan_enabled()) { - tag ^= 0xff; - page->flags &= ~(KASAN_TAG_MASK << KASAN_TAG_PGSHIFT); - page->flags |= (tag & KASAN_TAG_MASK) << KASAN_TAG_PGSHIFT; - } + unsigned long old_flags, flags; + + if (!kasan_enabled()) + return; + + tag ^= 0xff; + old_flags = READ_ONCE(page->flags); + do { + flags = old_flags; + flags &= ~(KASAN_TAG_MASK << KASAN_TAG_PGSHIFT); + flags |= (tag & KASAN_TAG_MASK) << KASAN_TAG_PGSHIFT; + } while (unlikely(!try_cmpxchg(&page->flags, &old_flags, flags))); }
static inline void page_kasan_tag_reset(struct page *page)
From: Joseph Qi joseph.qi@linux.alibaba.com
commit 4cd1103d8c66b2cdb7e64385c274edb0ac5e8887 upstream.
Patch series "ocfs2: fix a deadlock case".
This fixes a deadlock case in ocfs2. We firstly export jbd2 symbols jbd2_journal_[grab|put]_journal_head as preparation and later use them in ocfs2 insread of jbd_[lock|unlock]_bh_journal_head to fix the deadlock.
This patch (of 2):
This exports symbols jbd2_journal_[grab|put]_journal_head, which will be used outside modules, e.g. ocfs2.
Link: https://lkml.kernel.org/r/20220121071205.100648-2-joseph.qi@linux.alibaba.co... Signed-off-by: Joseph Qi joseph.qi@linux.alibaba.com Cc: Mark Fasheh mark@fasheh.com Cc: Joel Becker jlbec@evilplan.org Cc: Junxiao Bi junxiao.bi@oracle.com Cc: Changwei Ge gechangwei@live.cn Cc: Gang He ghe@suse.com Cc: Jun Piao piaojun@huawei.com Cc: Andreas Dilger adilger.kernel@dilger.ca Cc: Gautham Ananthakrishna gautham.ananthakrishna@oracle.com Cc: Saeed Mirzamohammadi saeed.mirzamohammadi@oracle.com Cc: "Theodore Ts'o" tytso@mit.edu Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/jbd2/journal.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -2970,6 +2970,7 @@ struct journal_head *jbd2_journal_grab_j jbd_unlock_bh_journal_head(bh); return jh; } +EXPORT_SYMBOL(jbd2_journal_grab_journal_head);
static void __journal_remove_journal_head(struct buffer_head *bh) { @@ -3022,6 +3023,7 @@ void jbd2_journal_put_journal_head(struc jbd_unlock_bh_journal_head(bh); } } +EXPORT_SYMBOL(jbd2_journal_put_journal_head);
/* * Initialize jbd inode head
From: Joseph Qi joseph.qi@linux.alibaba.com
commit ddf4b773aa40790dfa936bd845c18e735a49c61c upstream.
commit 6f1b228529ae introduces a regression which can deadlock as follows:
Task1: Task2: jbd2_journal_commit_transaction ocfs2_test_bg_bit_allocatable spin_lock(&jh->b_state_lock) jbd_lock_bh_journal_head __jbd2_journal_remove_checkpoint spin_lock(&jh->b_state_lock) jbd2_journal_put_journal_head jbd_lock_bh_journal_head
Task1 and Task2 lock bh->b_state and jh->b_state_lock in different order, which finally result in a deadlock.
So use jbd2_journal_[grab|put]_journal_head instead in ocfs2_test_bg_bit_allocatable() to fix it.
Link: https://lkml.kernel.org/r/20220121071205.100648-3-joseph.qi@linux.alibaba.co... Fixes: 6f1b228529ae ("ocfs2: fix race between searching chunks and release journal_head from buffer_head") Signed-off-by: Joseph Qi joseph.qi@linux.alibaba.com Reported-by: Gautham Ananthakrishna gautham.ananthakrishna@oracle.com Tested-by: Gautham Ananthakrishna gautham.ananthakrishna@oracle.com Reported-by: Saeed Mirzamohammadi saeed.mirzamohammadi@oracle.com Cc: "Theodore Ts'o" tytso@mit.edu Cc: Andreas Dilger adilger.kernel@dilger.ca Cc: Changwei Ge gechangwei@live.cn Cc: Gang He ghe@suse.com Cc: Joel Becker jlbec@evilplan.org Cc: Jun Piao piaojun@huawei.com Cc: Junxiao Bi junxiao.bi@oracle.com Cc: Mark Fasheh mark@fasheh.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ocfs2/suballoc.c | 25 +++++++++++-------------- 1 file changed, 11 insertions(+), 14 deletions(-)
--- a/fs/ocfs2/suballoc.c +++ b/fs/ocfs2/suballoc.c @@ -1251,26 +1251,23 @@ static int ocfs2_test_bg_bit_allocatable { struct ocfs2_group_desc *bg = (struct ocfs2_group_desc *) bg_bh->b_data; struct journal_head *jh; - int ret = 1; + int ret;
if (ocfs2_test_bit(nr, (unsigned long *)bg->bg_bitmap)) return 0;
- if (!buffer_jbd(bg_bh)) + jh = jbd2_journal_grab_journal_head(bg_bh); + if (!jh) return 1;
- jbd_lock_bh_journal_head(bg_bh); - if (buffer_jbd(bg_bh)) { - jh = bh2jh(bg_bh); - spin_lock(&jh->b_state_lock); - bg = (struct ocfs2_group_desc *) jh->b_committed_data; - if (bg) - ret = !ocfs2_test_bit(nr, (unsigned long *)bg->bg_bitmap); - else - ret = 1; - spin_unlock(&jh->b_state_lock); - } - jbd_unlock_bh_journal_head(bg_bh); + spin_lock(&jh->b_state_lock); + bg = (struct ocfs2_group_desc *) jh->b_committed_data; + if (bg) + ret = !ocfs2_test_bit(nr, (unsigned long *)bg->bg_bitmap); + else + ret = 1; + spin_unlock(&jh->b_state_lock); + jbd2_journal_put_journal_head(jh);
return ret; }
From: Mathieu Desnoyers mathieu.desnoyers@efficios.com
commit 809232619f5b15e31fb3563985e705454f32621f upstream.
The membarrier command MEMBARRIER_CMD_QUERY allows querying the available membarrier commands. When the membarrier-rseq fence commands were added, a new MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ_BITMASK was introduced with the intent to expose them with the MEMBARRIER_CMD_QUERY command, the but it was never added to MEMBARRIER_CMD_BITMASK.
The membarrier-rseq fence commands are therefore not wired up with the query command.
Rename MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ_BITMASK to MEMBARRIER_PRIVATE_EXPEDITED_RSEQ_BITMASK (the bitmask is not a command per-se), and change the erroneous MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ_BITMASK (which does not actually exist) to MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ.
Wire up MEMBARRIER_PRIVATE_EXPEDITED_RSEQ_BITMASK in MEMBARRIER_CMD_BITMASK. Fixing this allows discovering availability of the membarrier-rseq fence feature.
Fixes: 2a36ab717e8f ("rseq/membarrier: Add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ") Signed-off-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@vger.kernel.org # 5.10+ Link: https://lkml.kernel.org/r/20220117203010.30129-1-mathieu.desnoyers@efficios.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/sched/membarrier.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
--- a/kernel/sched/membarrier.c +++ b/kernel/sched/membarrier.c @@ -147,11 +147,11 @@ #endif
#ifdef CONFIG_RSEQ -#define MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ_BITMASK \ +#define MEMBARRIER_PRIVATE_EXPEDITED_RSEQ_BITMASK \ (MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ \ - | MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ_BITMASK) + | MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ) #else -#define MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ_BITMASK 0 +#define MEMBARRIER_PRIVATE_EXPEDITED_RSEQ_BITMASK 0 #endif
#define MEMBARRIER_CMD_BITMASK \ @@ -159,7 +159,8 @@ | MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED \ | MEMBARRIER_CMD_PRIVATE_EXPEDITED \ | MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED \ - | MEMBARRIER_PRIVATE_EXPEDITED_SYNC_CORE_BITMASK) + | MEMBARRIER_PRIVATE_EXPEDITED_SYNC_CORE_BITMASK \ + | MEMBARRIER_PRIVATE_EXPEDITED_RSEQ_BITMASK)
static void ipi_mb(void *info) {
From: Bjorn Helgaas bhelgaas@google.com
commit 66d28b21fe6b3da8d1e9f0a7ba38bc61b6c547e1 upstream.
Ville reported that the sysfs "rom" file for VGA devices disappeared after 527139d738d7 ("PCI/sysfs: Convert "rom" to static attribute").
Prior to 527139d738d7, FINAL fixups, including pci_fixup_video() where we find shadow ROMs, were run before pci_create_sysfs_dev_files() created the sysfs "rom" file.
After 527139d738d7, "rom" is a static attribute and is created before FINAL fixups are run, so we didn't create "rom" files for shadow ROMs:
acpi_pci_root_add ... pci_scan_single_device pci_device_add pci_fixup_video # <-- new HEADER fixup device_add ... if (grp->is_visible()) pci_dev_rom_attr_is_visible # after 527139d738d7 pci_bus_add_devices pci_bus_add_device pci_fixup_device(pci_fixup_final) pci_fixup_video # <-- previous FINAL fixup pci_create_sysfs_dev_files if (pci_resource_len(pdev, PCI_ROM_RESOURCE)) sysfs_create_bin_file("rom") # before 527139d738d7
Change pci_fixup_video() to be a HEADER fixup so it runs before sysfs static attributes are initialized.
Rename the Loongson pci_fixup_radeon() to pci_fixup_video() and make its dmesg logging identical to the others since it is doing the same job.
Link: https://lore.kernel.org/r/YbxqIyrkv3GhZVxx@intel.com Fixes: 527139d738d7 ("PCI/sysfs: Convert "rom" to static attribute") Link: https://lore.kernel.org/r/20220126154001.16895-1-helgaas@kernel.org Reported-by: Ville Syrjälä ville.syrjala@linux.intel.com Tested-by: Ville Syrjälä ville.syrjala@linux.intel.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Cc: stable@vger.kernel.org # v5.13+ Cc: Huacai Chen chenhuacai@kernel.org Cc: Jiaxun Yang jiaxun.yang@flygoat.com Cc: Thomas Bogendoerfer tsbogend@alpha.franken.de Cc: Thomas Gleixner tglx@linutronix.de Cc: Ingo Molnar mingo@redhat.com Cc: Borislav Petkov bp@alien8.de Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Krzysztof Wilczyński kw@linux.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/ia64/pci/fixup.c | 4 ++-- arch/mips/loongson64/vbios_quirk.c | 9 ++++----- arch/x86/pci/fixup.c | 4 ++-- 3 files changed, 8 insertions(+), 9 deletions(-)
--- a/arch/ia64/pci/fixup.c +++ b/arch/ia64/pci/fixup.c @@ -76,5 +76,5 @@ static void pci_fixup_video(struct pci_d } } } -DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_ANY_ID, PCI_ANY_ID, - PCI_CLASS_DISPLAY_VGA, 8, pci_fixup_video); +DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_ANY_ID, PCI_ANY_ID, + PCI_CLASS_DISPLAY_VGA, 8, pci_fixup_video); --- a/arch/mips/loongson64/vbios_quirk.c +++ b/arch/mips/loongson64/vbios_quirk.c @@ -3,7 +3,7 @@ #include <linux/pci.h> #include <loongson.h>
-static void pci_fixup_radeon(struct pci_dev *pdev) +static void pci_fixup_video(struct pci_dev *pdev) { struct resource *res = &pdev->resource[PCI_ROM_RESOURCE];
@@ -22,8 +22,7 @@ static void pci_fixup_radeon(struct pci_ res->flags = IORESOURCE_MEM | IORESOURCE_ROM_SHADOW | IORESOURCE_PCI_FIXED;
- dev_info(&pdev->dev, "BAR %d: assigned %pR for Radeon ROM\n", - PCI_ROM_RESOURCE, res); + dev_info(&pdev->dev, "Video device with shadowed ROM at %pR\n", res); } -DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_ATI, 0x9615, - PCI_CLASS_DISPLAY_VGA, 8, pci_fixup_radeon); +DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_ATI, 0x9615, + PCI_CLASS_DISPLAY_VGA, 8, pci_fixup_video); --- a/arch/x86/pci/fixup.c +++ b/arch/x86/pci/fixup.c @@ -353,8 +353,8 @@ static void pci_fixup_video(struct pci_d } } } -DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_ANY_ID, PCI_ANY_ID, - PCI_CLASS_DISPLAY_VGA, 8, pci_fixup_video); +DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_ANY_ID, PCI_ANY_ID, + PCI_CLASS_DISPLAY_VGA, 8, pci_fixup_video);
static const struct dmi_system_id msi_k8t_dmi_table[] = {
From: Yazen Ghannam yazen.ghannam@amd.com
commit 1f52b0aba6fd37653416375cb8a1ca673acf8d5f upstream.
Changes to the AMD Thresholding sysfs code prevents sysfs writes from updating the underlying registers once CPU init is completed, i.e. "threshold_banks" is set.
Allow the registers to be updated if the thresholding interface is already initialized or if in the init path. Use the "set_lvt_off" value to indicate if running in the init path, since this value is only set during init.
Fixes: a037f3ca0ea0 ("x86/mce/amd: Make threshold bank setting hotplug robust") Signed-off-by: Yazen Ghannam yazen.ghannam@amd.com Signed-off-by: Borislav Petkov bp@suse.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220117161328.19148-1-yazen.ghannam@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/mce/amd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kernel/cpu/mce/amd.c +++ b/arch/x86/kernel/cpu/mce/amd.c @@ -401,7 +401,7 @@ static void threshold_restart_bank(void u32 hi, lo;
/* sysfs write might race against an offline operation */ - if (this_cpu_read(threshold_banks)) + if (!this_cpu_read(threshold_banks) && !tr->set_lvt_off) return;
rdmsr(tr->b->address, lo, hi);
From: Tony Luck tony.luck@intel.com
commit e464121f2d40eabc7d11823fb26db807ce945df4 upstream.
Missed adding the Icelake-D CPU to the list. It uses the same MSRs to control and read the inventory number as all the other models.
Fixes: dc6b025de95b ("x86/mce: Add Xeon Icelake to list of CPUs that support PPIN") Reported-by: Ailin Xu ailin.xu@intel.com Signed-off-by: Tony Luck tony.luck@intel.com Signed-off-by: Borislav Petkov bp@suse.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220121174743.1875294-2-tony.luck@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/mce/intel.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/x86/kernel/cpu/mce/intel.c +++ b/arch/x86/kernel/cpu/mce/intel.c @@ -486,6 +486,7 @@ static void intel_ppin_init(struct cpuin case INTEL_FAM6_BROADWELL_X: case INTEL_FAM6_SKYLAKE_X: case INTEL_FAM6_ICELAKE_X: + case INTEL_FAM6_ICELAKE_D: case INTEL_FAM6_SAPPHIRERAPIDS_X: case INTEL_FAM6_XEON_PHI_KNL: case INTEL_FAM6_XEON_PHI_KNM:
From: Christophe Leroy christophe.leroy@csgroup.eu
commit 37eb7ca91b692e8e49e7dd50158349a6c8fb5b09 upstream.
Today we have the following IBATs allocated:
---[ Instruction Block Address Translation ]--- 0: 0xc0000000-0xc03fffff 0x00000000 4M Kernel x m 1: 0xc0400000-0xc05fffff 0x00400000 2M Kernel x m 2: 0xc0600000-0xc06fffff 0x00600000 1M Kernel x m 3: 0xc0700000-0xc077ffff 0x00700000 512K Kernel x m 4: 0xc0780000-0xc079ffff 0x00780000 128K Kernel x m 5: 0xc07a0000-0xc07bffff 0x007a0000 128K Kernel x m 6: - 7: -
The two 128K should be a single 256K instead.
When _etext is not aligned to 128Kbytes, the system will allocate all necessary BATs to the lower 128Kbytes boundary, then allocate an additional 128Kbytes BAT for the remaining block.
Instead, align the top to 128Kbytes so that the function directly allocates a 256Kbytes last block:
---[ Instruction Block Address Translation ]--- 0: 0xc0000000-0xc03fffff 0x00000000 4M Kernel x m 1: 0xc0400000-0xc05fffff 0x00400000 2M Kernel x m 2: 0xc0600000-0xc06fffff 0x00600000 1M Kernel x m 3: 0xc0700000-0xc077ffff 0x00700000 512K Kernel x m 4: 0xc0780000-0xc07bffff 0x00780000 256K Kernel x m 5: - 6: - 7: -
Signed-off-by: Christophe Leroy christophe.leroy@csgroup.eu Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/ab58b296832b0ec650e2203200e060adbcb2677d.163793042... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/mm/book3s32/mmu.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
--- a/arch/powerpc/mm/book3s32/mmu.c +++ b/arch/powerpc/mm/book3s32/mmu.c @@ -196,18 +196,17 @@ void mmu_mark_initmem_nx(void) int nb = mmu_has_feature(MMU_FTR_USE_HIGH_BATS) ? 8 : 4; int i; unsigned long base = (unsigned long)_stext - PAGE_OFFSET; - unsigned long top = (unsigned long)_etext - PAGE_OFFSET; + unsigned long top = ALIGN((unsigned long)_etext - PAGE_OFFSET, SZ_128K); unsigned long border = (unsigned long)__init_begin - PAGE_OFFSET; unsigned long size;
- for (i = 0; i < nb - 1 && base < top && top - base > (128 << 10);) { + for (i = 0; i < nb - 1 && base < top;) { size = block_size(base, top); setibat(i++, PAGE_OFFSET + base, base, size, PAGE_KERNEL_TEXT); base += size; } if (base < top) { size = block_size(base, top); - size = max(size, 128UL << 10); if ((top - base) > size) { size <<= 1; if (strict_kernel_rwx_enabled() && base + size > border)
From: Christophe Leroy christophe.leroy@csgroup.eu
commit d37823c3528e5e0705fc7746bcbc2afffb619259 upstream.
It has been reported some configuration where the kernel doesn't boot with KASAN enabled.
This is due to wrong BAT allocation for the KASAN area:
---[ Data Block Address Translation ]--- 0: 0xc0000000-0xcfffffff 0x00000000 256M Kernel rw m 1: 0xd0000000-0xdfffffff 0x10000000 256M Kernel rw m 2: 0xe0000000-0xefffffff 0x20000000 256M Kernel rw m 3: 0xf8000000-0xf9ffffff 0x2a000000 32M Kernel rw m 4: 0xfa000000-0xfdffffff 0x2c000000 64M Kernel rw m
A BAT must have both virtual and physical addresses alignment matching the size of the BAT. This is not the case for BAT 4 above.
Fix kasan_init_region() by using block_size() function that is in book3s32/mmu.c. To be able to reuse it here, make it non static and change its name to bat_block_size() in order to avoid name conflict with block_size() defined in <linux/blkdev.h>
Also reuse find_free_bat() to avoid an error message from setbat() when no BAT is available.
And allocate memory outside of linear memory mapping to avoid wasting that precious space.
With this change we get correct alignment for BATs and KASAN shadow memory is allocated outside the linear memory space.
---[ Data Block Address Translation ]--- 0: 0xc0000000-0xcfffffff 0x00000000 256M Kernel rw 1: 0xd0000000-0xdfffffff 0x10000000 256M Kernel rw 2: 0xe0000000-0xefffffff 0x20000000 256M Kernel rw 3: 0xf8000000-0xfbffffff 0x7c000000 64M Kernel rw 4: 0xfc000000-0xfdffffff 0x7a000000 32M Kernel rw
Fixes: 7974c4732642 ("powerpc/32s: Implement dedicated kasan_init_region()") Cc: stable@vger.kernel.org Reported-by: Maxime Bizon mbizon@freebox.fr Signed-off-by: Christophe Leroy christophe.leroy@csgroup.eu Tested-by: Maxime Bizon mbizon@freebox.fr Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/7a50ef902494d1325227d47d33dada01e52e5518.164181872... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/include/asm/book3s/32/mmu-hash.h | 2 arch/powerpc/mm/book3s32/mmu.c | 10 ++-- arch/powerpc/mm/kasan/book3s_32.c | 57 +++++++++++++------------- 3 files changed, 37 insertions(+), 32 deletions(-)
--- a/arch/powerpc/include/asm/book3s/32/mmu-hash.h +++ b/arch/powerpc/include/asm/book3s/32/mmu-hash.h @@ -143,6 +143,8 @@ static __always_inline void update_user_ update_user_segment(15, val); }
+int __init find_free_bat(void); +unsigned int bat_block_size(unsigned long base, unsigned long top); #endif /* !__ASSEMBLY__ */
/* We happily ignore the smaller BATs on 601, we don't actually use --- a/arch/powerpc/mm/book3s32/mmu.c +++ b/arch/powerpc/mm/book3s32/mmu.c @@ -76,7 +76,7 @@ unsigned long p_block_mapped(phys_addr_t return 0; }
-static int find_free_bat(void) +int __init find_free_bat(void) { int b; int n = mmu_has_feature(MMU_FTR_USE_HIGH_BATS) ? 8 : 4; @@ -100,7 +100,7 @@ static int find_free_bat(void) * - block size has to be a power of two. This is calculated by finding the * highest bit set to 1. */ -static unsigned int block_size(unsigned long base, unsigned long top) +unsigned int bat_block_size(unsigned long base, unsigned long top) { unsigned int max_size = SZ_256M; unsigned int base_shift = (ffs(base) - 1) & 31; @@ -145,7 +145,7 @@ static unsigned long __init __mmu_mapin_ int idx;
while ((idx = find_free_bat()) != -1 && base != top) { - unsigned int size = block_size(base, top); + unsigned int size = bat_block_size(base, top);
if (size < 128 << 10) break; @@ -201,12 +201,12 @@ void mmu_mark_initmem_nx(void) unsigned long size;
for (i = 0; i < nb - 1 && base < top;) { - size = block_size(base, top); + size = bat_block_size(base, top); setibat(i++, PAGE_OFFSET + base, base, size, PAGE_KERNEL_TEXT); base += size; } if (base < top) { - size = block_size(base, top); + size = bat_block_size(base, top); if ((top - base) > size) { size <<= 1; if (strict_kernel_rwx_enabled() && base + size > border) --- a/arch/powerpc/mm/kasan/book3s_32.c +++ b/arch/powerpc/mm/kasan/book3s_32.c @@ -10,48 +10,51 @@ int __init kasan_init_region(void *start { unsigned long k_start = (unsigned long)kasan_mem_to_shadow(start); unsigned long k_end = (unsigned long)kasan_mem_to_shadow(start + size); - unsigned long k_cur = k_start; - int k_size = k_end - k_start; - int k_size_base = 1 << (ffs(k_size) - 1); + unsigned long k_nobat = k_start; + unsigned long k_cur; + phys_addr_t phys; int ret; - void *block;
- block = memblock_alloc(k_size, k_size_base); + while (k_nobat < k_end) { + unsigned int k_size = bat_block_size(k_nobat, k_end); + int idx = find_free_bat(); + + if (idx == -1) + break; + if (k_size < SZ_128K) + break; + phys = memblock_phys_alloc_range(k_size, k_size, 0, + MEMBLOCK_ALLOC_ANYWHERE); + if (!phys) + break;
- if (block && k_size_base >= SZ_128K && k_start == ALIGN(k_start, k_size_base)) { - int shift = ffs(k_size - k_size_base); - int k_size_more = shift ? 1 << (shift - 1) : 0; - - setbat(-1, k_start, __pa(block), k_size_base, PAGE_KERNEL); - if (k_size_more >= SZ_128K) - setbat(-1, k_start + k_size_base, __pa(block) + k_size_base, - k_size_more, PAGE_KERNEL); - if (v_block_mapped(k_start)) - k_cur = k_start + k_size_base; - if (v_block_mapped(k_start + k_size_base)) - k_cur = k_start + k_size_base + k_size_more; - - update_bats(); + setbat(idx, k_nobat, phys, k_size, PAGE_KERNEL); + k_nobat += k_size; } + if (k_nobat != k_start) + update_bats();
- if (!block) - block = memblock_alloc(k_size, PAGE_SIZE); - if (!block) - return -ENOMEM; + if (k_nobat < k_end) { + phys = memblock_phys_alloc_range(k_end - k_nobat, PAGE_SIZE, 0, + MEMBLOCK_ALLOC_ANYWHERE); + if (!phys) + return -ENOMEM; + }
ret = kasan_init_shadow_page_tables(k_start, k_end); if (ret) return ret;
- kasan_update_early_region(k_start, k_cur, __pte(0)); + kasan_update_early_region(k_start, k_nobat, __pte(0));
- for (; k_cur < k_end; k_cur += PAGE_SIZE) { + for (k_cur = k_nobat; k_cur < k_end; k_cur += PAGE_SIZE) { pmd_t *pmd = pmd_off_k(k_cur); - void *va = block + k_cur - k_start; - pte_t pte = pfn_pte(PHYS_PFN(__pa(va)), PAGE_KERNEL); + pte_t pte = pfn_pte(PHYS_PFN(phys + k_cur - k_nobat), PAGE_KERNEL);
__set_pte_at(&init_mm, k_cur, pte_offset_kernel(pmd, k_cur), pte, 0); } flush_tlb_kernel_range(k_start, k_end); + memset(kasan_mem_to_shadow(start), 0, k_end - k_start); + return 0; }
From: Christophe Leroy christophe.leroy@csgroup.eu
commit bba496656a73fc1d1330b49c7f82843836e9feb1 upstream.
Boot fails with GCC latent entropy plugin enabled.
This is due to early boot functions trying to access 'latent_entropy' global data while the kernel is not relocated at its final destination yet.
As there is no way to tell GCC to use PTRRELOC() to access it, disable latent entropy plugin in early_32.o and feature-fixups.o and code-patching.o
Fixes: 38addce8b600 ("gcc-plugins: Add latent_entropy plugin") Cc: stable@vger.kernel.org # v4.9+ Reported-by: Erhard Furtner erhard_f@mailbox.org Signed-off-by: Christophe Leroy christophe.leroy@csgroup.eu Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://bugzilla.kernel.org/show_bug.cgi?id=215217 Link: https://lore.kernel.org/r/2bac55483b8daf5b1caa163a45fa5f9cdbe18be4.164017842... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/kernel/Makefile | 1 + arch/powerpc/lib/Makefile | 3 +++ 2 files changed, 4 insertions(+)
--- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -11,6 +11,7 @@ CFLAGS_prom_init.o += -fPIC CFLAGS_btext.o += -fPIC endif
+CFLAGS_early_32.o += $(DISABLE_LATENT_ENTROPY_PLUGIN) CFLAGS_cputable.o += $(DISABLE_LATENT_ENTROPY_PLUGIN) CFLAGS_prom_init.o += $(DISABLE_LATENT_ENTROPY_PLUGIN) CFLAGS_btext.o += $(DISABLE_LATENT_ENTROPY_PLUGIN) --- a/arch/powerpc/lib/Makefile +++ b/arch/powerpc/lib/Makefile @@ -19,6 +19,9 @@ CFLAGS_code-patching.o += -DDISABLE_BRAN CFLAGS_feature-fixups.o += -DDISABLE_BRANCH_PROFILING endif
+CFLAGS_code-patching.o += $(DISABLE_LATENT_ENTROPY_PLUGIN) +CFLAGS_feature-fixups.o += $(DISABLE_LATENT_ENTROPY_PLUGIN) + obj-y += alloc.o code-patching.o feature-fixups.o pmem.o test_code-patching.o
ifndef CONFIG_KASAN
From: Jedrzej Jagielski jedrzej.jagielski@intel.com
commit 9b13bd53134c9ddd544a790125199fdbdb505e67 upstream.
Recently simplified i40e_rebuild causes that FW sometimes is not ready after NVM update, the ping does not return.
Increase the delay in case of EMP reset. Old delay of 300 ms was introduced for specific cards for 710 series. Now it works for all the cards and delay was increased.
Fixes: 1fa51a650e1d ("i40e: Add delay after EMP reset for firmware to recover") Signed-off-by: Arkadiusz Kubalewski arkadiusz.kubalewski@intel.com Signed-off-by: Jedrzej Jagielski jedrzej.jagielski@intel.com Tested-by: Gurucharan G gurucharanx.g@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/intel/i40e/i40e_main.c | 12 +++--------- 1 file changed, 3 insertions(+), 9 deletions(-)
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -10574,15 +10574,9 @@ static void i40e_rebuild(struct i40e_pf } i40e_get_oem_version(&pf->hw);
- if (test_bit(__I40E_EMP_RESET_INTR_RECEIVED, pf->state) && - ((hw->aq.fw_maj_ver == 4 && hw->aq.fw_min_ver <= 33) || - hw->aq.fw_maj_ver < 4) && hw->mac.type == I40E_MAC_XL710) { - /* The following delay is necessary for 4.33 firmware and older - * to recover after EMP reset. 200 ms should suffice but we - * put here 300 ms to be sure that FW is ready to operate - * after reset. - */ - mdelay(300); + if (test_and_clear_bit(__I40E_EMP_RESET_INTR_RECEIVED, pf->state)) { + /* The following delay is necessary for firmware update. */ + mdelay(1000); }
/* re-verify the eeprom if we just had an EMP reset */
From: Jedrzej Jagielski jedrzej.jagielski@intel.com
commit d701658a50a471591094b3eb3961b4926cc8f104 upstream.
Before this patch VF interface vanished when maximum queue number was exceeded. Driver tried to add next queues even if there was not enough space. PF sent incorrect number of queues to the VF when there were not enough of them.
Add an additional condition introduced to check available space in 'qp_pile' before proceeding. This condition makes it impossible to add queues if they number is greater than the number resulting from available space. Also add the search for free space in PF queue pair piles.
Without this patch VF interfaces are not seen when available space for queues has been exceeded and following logs appears permanently in dmesg: "Unable to get VF config (-32)". "VF 62 failed opcode 3, retval: -5" "Unable to get VF config due to PF error condition, not retrying"
Fixes: 7daa6bf3294e ("i40e: driver core headers") Fixes: 41c445ff0f48 ("i40e: main driver core") Signed-off-by: Jaroslaw Gawin jaroslawx.gawin@intel.com Signed-off-by: Slawomir Laba slawomirx.laba@intel.com Signed-off-by: Jedrzej Jagielski jedrzej.jagielski@intel.com Tested-by: Konrad Jankowski konrad0.jankowski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/intel/i40e/i40e.h | 1 drivers/net/ethernet/intel/i40e/i40e_main.c | 14 ---- drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 59 +++++++++++++++++++++ 3 files changed, 61 insertions(+), 13 deletions(-)
--- a/drivers/net/ethernet/intel/i40e/i40e.h +++ b/drivers/net/ethernet/intel/i40e/i40e.h @@ -174,7 +174,6 @@ enum i40e_interrupt_policy {
struct i40e_lump_tracking { u16 num_entries; - u16 search_hint; u16 list[0]; #define I40E_PILE_VALID_BIT 0x8000 #define I40E_IWARP_IRQ_PILE_ID (I40E_PILE_VALID_BIT - 2) --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -196,10 +196,6 @@ int i40e_free_virt_mem_d(struct i40e_hw * @id: an owner id to stick on the items assigned * * Returns the base item index of the lump, or negative for error - * - * The search_hint trick and lack of advanced fit-finding only work - * because we're highly likely to have all the same size lump requests. - * Linear search time and any fragmentation should be minimal. **/ static int i40e_get_lump(struct i40e_pf *pf, struct i40e_lump_tracking *pile, u16 needed, u16 id) @@ -214,8 +210,7 @@ static int i40e_get_lump(struct i40e_pf return -EINVAL; }
- /* start the linear search with an imperfect hint */ - i = pile->search_hint; + i = 0; while (i < pile->num_entries) { /* skip already allocated entries */ if (pile->list[i] & I40E_PILE_VALID_BIT) { @@ -234,7 +229,6 @@ static int i40e_get_lump(struct i40e_pf for (j = 0; j < needed; j++) pile->list[i+j] = id | I40E_PILE_VALID_BIT; ret = i; - pile->search_hint = i + j; break; }
@@ -257,7 +251,7 @@ static int i40e_put_lump(struct i40e_lum { int valid_id = (id | I40E_PILE_VALID_BIT); int count = 0; - int i; + u16 i;
if (!pile || index >= pile->num_entries) return -EINVAL; @@ -269,8 +263,6 @@ static int i40e_put_lump(struct i40e_lum count++; }
- if (count && index < pile->search_hint) - pile->search_hint = index;
return count; } @@ -11786,7 +11778,6 @@ static int i40e_init_interrupt_scheme(st return -ENOMEM;
pf->irq_pile->num_entries = vectors; - pf->irq_pile->search_hint = 0;
/* track first vector for misc interrupts, ignore return */ (void)i40e_get_lump(pf, pf->irq_pile, 1, I40E_PILE_VALID_BIT - 1); @@ -12589,7 +12580,6 @@ static int i40e_sw_init(struct i40e_pf * goto sw_init_done; } pf->qp_pile->num_entries = pf->hw.func_caps.num_tx_qp; - pf->qp_pile->search_hint = 0;
pf->tx_timeout_recovery_level = 1;
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c @@ -2618,6 +2618,59 @@ error_param: }
/** + * i40e_check_enough_queue - find big enough queue number + * @vf: pointer to the VF info + * @needed: the number of items needed + * + * Returns the base item index of the queue, or negative for error + **/ +static int i40e_check_enough_queue(struct i40e_vf *vf, u16 needed) +{ + unsigned int i, cur_queues, more, pool_size; + struct i40e_lump_tracking *pile; + struct i40e_pf *pf = vf->pf; + struct i40e_vsi *vsi; + + vsi = pf->vsi[vf->lan_vsi_idx]; + cur_queues = vsi->alloc_queue_pairs; + + /* if current allocated queues are enough for need */ + if (cur_queues >= needed) + return vsi->base_queue; + + pile = pf->qp_pile; + if (cur_queues > 0) { + /* if the allocated queues are not zero + * just check if there are enough queues for more + * behind the allocated queues. + */ + more = needed - cur_queues; + for (i = vsi->base_queue + cur_queues; + i < pile->num_entries; i++) { + if (pile->list[i] & I40E_PILE_VALID_BIT) + break; + + if (more-- == 1) + /* there is enough */ + return vsi->base_queue; + } + } + + pool_size = 0; + for (i = 0; i < pile->num_entries; i++) { + if (pile->list[i] & I40E_PILE_VALID_BIT) { + pool_size = 0; + continue; + } + if (needed <= ++pool_size) + /* there is enough */ + return i; + } + + return -ENOMEM; +} + +/** * i40e_vc_request_queues_msg * @vf: pointer to the VF info * @msg: pointer to the msg buffer @@ -2651,6 +2704,12 @@ static int i40e_vc_request_queues_msg(st req_pairs - cur_pairs, pf->queues_left); vfres->num_queue_pairs = pf->queues_left + cur_pairs; + } else if (i40e_check_enough_queue(vf, req_pairs) < 0) { + dev_warn(&pf->pdev->dev, + "VF %d requested %d more queues, but there is not enough for it.\n", + vf->vf_id, + req_pairs - cur_pairs); + vfres->num_queue_pairs = cur_pairs; } else { /* successful request */ vf->num_req_queues = req_pairs;
From: Sylwester Dziedziuch sylwesterx.dziedziuch@intel.com
commit 92947844b8beee988c0ce17082b705c2f75f0742 upstream.
When XDP was configured on a system with large number of CPUs and X722 NIC there was a call trace with NULL pointer dereference.
i40e 0000:87:00.0: failed to get tracking for 256 queues for VSI 0 err -12 i40e 0000:87:00.0: setup of MAIN VSI failed
BUG: kernel NULL pointer dereference, address: 0000000000000000 RIP: 0010:i40e_xdp+0xea/0x1b0 [i40e] Call Trace: ? i40e_reconfig_rss_queues+0x130/0x130 [i40e] dev_xdp_install+0x61/0xe0 dev_xdp_attach+0x18a/0x4c0 dev_change_xdp_fd+0x1e6/0x220 do_setlink+0x616/0x1030 ? ahci_port_stop+0x80/0x80 ? ata_qc_issue+0x107/0x1e0 ? lock_timer_base+0x61/0x80 ? __mod_timer+0x202/0x380 rtnl_setlink+0xe5/0x170 ? bpf_lsm_binder_transaction+0x10/0x10 ? security_capable+0x36/0x50 rtnetlink_rcv_msg+0x121/0x350 ? rtnl_calcit.isra.0+0x100/0x100 netlink_rcv_skb+0x50/0xf0 netlink_unicast+0x1d3/0x2a0 netlink_sendmsg+0x22a/0x440 sock_sendmsg+0x5e/0x60 __sys_sendto+0xf0/0x160 ? __sys_getsockname+0x7e/0xc0 ? _copy_from_user+0x3c/0x80 ? __sys_setsockopt+0xc8/0x1a0 __x64_sys_sendto+0x20/0x30 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f83fa7a39e0
This was caused by PF queue pile fragmentation due to flow director VSI queue being placed right after main VSI. Because of this main VSI was not able to resize its queue allocation for XDP resulting in no queues allocated for main VSI when XDP was turned on.
Fix this by always allocating last queue in PF queue pile for a flow director VSI.
Fixes: 41c445ff0f48 ("i40e: main driver core") Fixes: 74608d17fe29 ("i40e: add support for XDP_TX action") Signed-off-by: Sylwester Dziedziuch sylwesterx.dziedziuch@intel.com Signed-off-by: Mateusz Palczewski mateusz.palczewski@intel.com Reviewed-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Tested-by: Kiran Bhandare kiranx.bhandare@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/intel/i40e/i40e_main.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -210,6 +210,20 @@ static int i40e_get_lump(struct i40e_pf return -EINVAL; }
+ /* Allocate last queue in the pile for FDIR VSI queue + * so it doesn't fragment the qp_pile + */ + if (pile == pf->qp_pile && pf->vsi[id]->type == I40E_VSI_FDIR) { + if (pile->list[pile->num_entries - 1] & I40E_PILE_VALID_BIT) { + dev_err(&pf->pdev->dev, + "Cannot allocate queue %d for I40E_VSI_FDIR\n", + pile->num_entries - 1); + return -ENOMEM; + } + pile->list[pile->num_entries - 1] = id | I40E_PILE_VALID_BIT; + return pile->num_entries - 1; + } + i = 0; while (i < pile->num_entries) { /* skip already allocated entries */
From: Karen Sornek karen.sornek@intel.com
commit 0f344c8129a5337dae50e31b817dd50a60ff238c upstream.
Fix for failed to init adminq: -53 while VF is resetting via MAC address changing procedure. Added sync module to avoid reading deadbeef value in reinit adminq during software reset. Without this patch it is possible to trigger VF reset procedure during reinit adminq. This resulted in an incorrect reading of value from the AQP registers and generated the -53 error.
Fixes: 5c3c48ac6bf5 ("i40e: implement virtual device interface") Signed-off-by: Grzegorz Szczurek grzegorzx.szczurek@intel.com Signed-off-by: Karen Sornek karen.sornek@intel.com Tested-by: Konrad Jankowski konrad0.jankowski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/intel/i40e/i40e_register.h | 3 + drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 44 ++++++++++++++++++++- drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h | 1 3 files changed, 46 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/intel/i40e/i40e_register.h +++ b/drivers/net/ethernet/intel/i40e/i40e_register.h @@ -413,6 +413,9 @@ #define I40E_VFINT_DYN_CTLN(_INTVF) (0x00024800 + ((_INTVF) * 4)) /* _i=0...511 */ /* Reset: VFR */ #define I40E_VFINT_DYN_CTLN_CLEARPBA_SHIFT 1 #define I40E_VFINT_DYN_CTLN_CLEARPBA_MASK I40E_MASK(0x1, I40E_VFINT_DYN_CTLN_CLEARPBA_SHIFT) +#define I40E_VFINT_ICR0_ADMINQ_SHIFT 30 +#define I40E_VFINT_ICR0_ADMINQ_MASK I40E_MASK(0x1, I40E_VFINT_ICR0_ADMINQ_SHIFT) +#define I40E_VFINT_ICR0_ENA(_VF) (0x0002C000 + ((_VF) * 4)) /* _i=0...127 */ /* Reset: CORER */ #define I40E_VPINT_AEQCTL(_VF) (0x0002B800 + ((_VF) * 4)) /* _i=0...127 */ /* Reset: CORER */ #define I40E_VPINT_AEQCTL_MSIX_INDX_SHIFT 0 #define I40E_VPINT_AEQCTL_ITR_INDX_SHIFT 11 --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c @@ -1377,6 +1377,32 @@ static i40e_status i40e_config_vf_promis }
/** + * i40e_sync_vfr_reset + * @hw: pointer to hw struct + * @vf_id: VF identifier + * + * Before trigger hardware reset, we need to know if no other process has + * reserved the hardware for any reset operations. This check is done by + * examining the status of the RSTAT1 register used to signal the reset. + **/ +static int i40e_sync_vfr_reset(struct i40e_hw *hw, int vf_id) +{ + u32 reg; + int i; + + for (i = 0; i < I40E_VFR_WAIT_COUNT; i++) { + reg = rd32(hw, I40E_VFINT_ICR0_ENA(vf_id)) & + I40E_VFINT_ICR0_ADMINQ_MASK; + if (reg) + return 0; + + usleep_range(100, 200); + } + + return -EAGAIN; +} + +/** * i40e_trigger_vf_reset * @vf: pointer to the VF structure * @flr: VFLR was issued or not @@ -1390,9 +1416,11 @@ static void i40e_trigger_vf_reset(struct struct i40e_pf *pf = vf->pf; struct i40e_hw *hw = &pf->hw; u32 reg, reg_idx, bit_idx; + bool vf_active; + u32 radq;
/* warn the VF */ - clear_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states); + vf_active = test_and_clear_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states);
/* Disable VF's configuration API during reset. The flag is re-enabled * in i40e_alloc_vf_res(), when it's safe again to access VF's VSI. @@ -1406,7 +1434,19 @@ static void i40e_trigger_vf_reset(struct * just need to clean up, so don't hit the VFRTRIG register. */ if (!flr) { - /* reset VF using VPGEN_VFRTRIG reg */ + /* Sync VFR reset before trigger next one */ + radq = rd32(hw, I40E_VFINT_ICR0_ENA(vf->vf_id)) & + I40E_VFINT_ICR0_ADMINQ_MASK; + if (vf_active && !radq) + /* waiting for finish reset by virtual driver */ + if (i40e_sync_vfr_reset(hw, vf->vf_id)) + dev_info(&pf->pdev->dev, + "Reset VF %d never finished\n", + vf->vf_id); + + /* Reset VF using VPGEN_VFRTRIG reg. It is also setting + * in progress state in rstat1 register. + */ reg = rd32(hw, I40E_VPGEN_VFRTRIG(vf->vf_id)); reg |= I40E_VPGEN_VFRTRIG_VFSWR_MASK; wr32(hw, I40E_VPGEN_VFRTRIG(vf->vf_id), reg); --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h @@ -19,6 +19,7 @@ #define I40E_MAX_VF_PROMISC_FLAGS 3
#define I40E_VF_STATE_WAIT_COUNT 20 +#define I40E_VFR_WAIT_COUNT 100
/* Various queue ctrls */ enum i40e_queue_ctrl {
From: Joe Damato jdamato@fastly.com
commit 3b8428b84539c78fdc8006c17ebd25afd4722d51 upstream.
Change i40e_update_vsi_stats and struct i40e_vsi to use u64 fields to match the width of the stats counters in struct i40e_rx_queue_stats.
Update debugfs code to use the correct format specifier for u64.
Fixes: 41c445ff0f48 ("i40e: main driver core") Signed-off-by: Joe Damato jdamato@fastly.com Reported-by: kernel test robot lkp@intel.com Tested-by: Gurucharan G gurucharanx.g@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/intel/i40e/i40e.h | 8 ++++---- drivers/net/ethernet/intel/i40e/i40e_debugfs.c | 2 +- drivers/net/ethernet/intel/i40e/i40e_main.c | 4 ++-- 3 files changed, 7 insertions(+), 7 deletions(-)
--- a/drivers/net/ethernet/intel/i40e/i40e.h +++ b/drivers/net/ethernet/intel/i40e/i40e.h @@ -847,12 +847,12 @@ struct i40e_vsi { struct rtnl_link_stats64 net_stats_offsets; struct i40e_eth_stats eth_stats; struct i40e_eth_stats eth_stats_offsets; - u32 tx_restart; - u32 tx_busy; + u64 tx_restart; + u64 tx_busy; u64 tx_linearize; u64 tx_force_wb; - u32 rx_buf_failed; - u32 rx_page_failed; + u64 rx_buf_failed; + u64 rx_page_failed;
/* These are containers of ring pointers, allocated at run-time */ struct i40e_ring **rx_rings; --- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c +++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c @@ -240,7 +240,7 @@ static void i40e_dbg_dump_vsi_seid(struc (unsigned long int)vsi->net_stats_offsets.rx_compressed, (unsigned long int)vsi->net_stats_offsets.tx_compressed); dev_info(&pf->pdev->dev, - " tx_restart = %d, tx_busy = %d, rx_buf_failed = %d, rx_page_failed = %d\n", + " tx_restart = %llu, tx_busy = %llu, rx_buf_failed = %llu, rx_page_failed = %llu\n", vsi->tx_restart, vsi->tx_busy, vsi->rx_buf_failed, vsi->rx_page_failed); rcu_read_lock(); --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -778,9 +778,9 @@ static void i40e_update_vsi_stats(struct struct rtnl_link_stats64 *ns; /* netdev stats */ struct i40e_eth_stats *oes; struct i40e_eth_stats *es; /* device's eth stats */ - u32 tx_restart, tx_busy; + u64 tx_restart, tx_busy; struct i40e_ring *p; - u32 rx_page, rx_buf; + u64 rx_page, rx_buf; u64 bytes, packets; unsigned int start; u64 tx_linearize;
From: Linyu Yuan quic_linyyuan@quicinc.com
commit 945c37ed564770c78dfe6b9f08bed57a1b4e60ef upstream.
when CONFIG_USB_ROLE_SWITCH is not defined, add usb_role_switch_find_by_fwnode() definition which return NULL.
Fixes: c6919d5e0cd1 ("usb: roles: Add usb_role_switch_find_by_fwnode()") Signed-off-by: Linyu Yuan quic_linyyuan@quicinc.com Link: https://lore.kernel.org/r/1641818608-25039-1-git-send-email-quic_linyyuan@qu... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/usb/role.h | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/include/linux/usb/role.h +++ b/include/linux/usb/role.h @@ -92,6 +92,12 @@ fwnode_usb_role_switch_get(struct fwnode static inline void usb_role_switch_put(struct usb_role_switch *sw) { }
static inline struct usb_role_switch * +usb_role_switch_find_by_fwnode(const struct fwnode_handle *fwnode) +{ + return NULL; +} + +static inline struct usb_role_switch * usb_role_switch_register(struct device *parent, const struct usb_role_switch_desc *desc) {
From: Sujit Kautkar sujitka@chromium.org
commit b7fb2dad571d1e21173c06cef0bced77b323990a upstream.
struct rpmsg_ctrldev contains a struct cdev. The current code frees the rpmsg_ctrldev struct in rpmsg_ctrldev_release_device(), but the cdev is a managed object, therefore its release is not predictable and the rpmsg_ctrldev could be freed before the cdev is entirely released, as in the backtrace below.
[ 93.625603] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x7c [ 93.636115] WARNING: CPU: 0 PID: 12 at lib/debugobjects.c:488 debug_print_object+0x13c/0x1b0 [ 93.644799] Modules linked in: veth xt_cgroup xt_MASQUERADE rfcomm algif_hash algif_skcipher af_alg uinput ip6table_nat fuse uvcvideo videobuf2_vmalloc venus_enc venus_dec videobuf2_dma_contig hci_uart btandroid btqca snd_soc_rt5682_i2c bluetooth qcom_spmi_temp_alarm snd_soc_rt5682v [ 93.715175] CPU: 0 PID: 12 Comm: kworker/0:1 Tainted: G B 5.4.163-lockdep #26 [ 93.723855] Hardware name: Google Lazor (rev3 - 8) with LTE (DT) [ 93.730055] Workqueue: events kobject_delayed_cleanup [ 93.735271] pstate: 60c00009 (nZCv daif +PAN +UAO) [ 93.740216] pc : debug_print_object+0x13c/0x1b0 [ 93.744890] lr : debug_print_object+0x13c/0x1b0 [ 93.749555] sp : ffffffacf5bc7940 [ 93.752978] x29: ffffffacf5bc7940 x28: dfffffd000000000 [ 93.758448] x27: ffffffacdb11a800 x26: dfffffd000000000 [ 93.763916] x25: ffffffd0734f856c x24: dfffffd000000000 [ 93.769389] x23: 0000000000000000 x22: ffffffd0733c35b0 [ 93.774860] x21: ffffffd0751994a0 x20: ffffffd075ec27c0 [ 93.780338] x19: ffffffd075199100 x18: 00000000000276e0 [ 93.785814] x17: 0000000000000000 x16: dfffffd000000000 [ 93.791291] x15: ffffffffffffffff x14: 6e6968207473696c [ 93.796768] x13: 0000000000000000 x12: ffffffd075e2b000 [ 93.802244] x11: 0000000000000001 x10: 0000000000000000 [ 93.807723] x9 : d13400dff1921900 x8 : d13400dff1921900 [ 93.813200] x7 : 0000000000000000 x6 : 0000000000000000 [ 93.818676] x5 : 0000000000000080 x4 : 0000000000000000 [ 93.824152] x3 : ffffffd0732a0fa4 x2 : 0000000000000001 [ 93.829628] x1 : ffffffacf5bc7580 x0 : 0000000000000061 [ 93.835104] Call trace: [ 93.837644] debug_print_object+0x13c/0x1b0 [ 93.841963] __debug_check_no_obj_freed+0x25c/0x3c0 [ 93.846987] debug_check_no_obj_freed+0x18/0x20 [ 93.851669] slab_free_freelist_hook+0xbc/0x1e4 [ 93.856346] kfree+0xfc/0x2f4 [ 93.859416] rpmsg_ctrldev_release_device+0x78/0xb8 [ 93.864445] device_release+0x84/0x168 [ 93.868310] kobject_cleanup+0x12c/0x298 [ 93.872356] kobject_delayed_cleanup+0x10/0x18 [ 93.876948] process_one_work+0x578/0x92c [ 93.881086] worker_thread+0x804/0xcf8 [ 93.884963] kthread+0x2a8/0x314 [ 93.888303] ret_from_fork+0x10/0x18
The cdev_device_add/del() API was created to address this issue (see commit '233ed09d7fda ("chardev: add helper function to register char devs with a struct device")'), use it instead of cdev add/del().
Fixes: c0cdc19f84a4 ("rpmsg: Driver for user space endpoint interface") Signed-off-by: Sujit Kautkar sujitka@chromium.org Signed-off-by: Matthias Kaehlcke mka@chromium.org Reviewed-by: Mathieu Poirier mathieu.poirier@linaro.org Reviewed-by: Bjorn Andersson bjorn.andersson@linaro.org Reviewed-by: Stephen Boyd swboyd@chromium.org Signed-off-by: Bjorn Andersson bjorn.andersson@linaro.org Link: https://lore.kernel.org/r/20220110104706.v6.1.Iaac908f3e3149a89190ce006ba166... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/rpmsg/rpmsg_char.c | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-)
--- a/drivers/rpmsg/rpmsg_char.c +++ b/drivers/rpmsg/rpmsg_char.c @@ -459,7 +459,6 @@ static void rpmsg_ctrldev_release_device
ida_simple_remove(&rpmsg_ctrl_ida, dev->id); ida_simple_remove(&rpmsg_minor_ida, MINOR(dev->devt)); - cdev_del(&ctrldev->cdev); kfree(ctrldev); }
@@ -494,19 +493,13 @@ static int rpmsg_chrdev_probe(struct rpm dev->id = ret; dev_set_name(&ctrldev->dev, "rpmsg_ctrl%d", ret);
- ret = cdev_add(&ctrldev->cdev, dev->devt, 1); + ret = cdev_device_add(&ctrldev->cdev, &ctrldev->dev); if (ret) goto free_ctrl_ida;
/* We can now rely on the release function for cleanup */ dev->release = rpmsg_ctrldev_release_device;
- ret = device_add(dev); - if (ret) { - dev_err(&rpdev->dev, "device_add failed: %d\n", ret); - put_device(dev); - } - dev_set_drvdata(&rpdev->dev, ctrldev);
return ret; @@ -532,7 +525,7 @@ static void rpmsg_chrdev_remove(struct r if (ret) dev_warn(&rpdev->dev, "failed to nuke endpoints: %d\n", ret);
- device_del(&ctrldev->dev); + cdev_device_del(&ctrldev->cdev, &ctrldev->dev); put_device(&ctrldev->dev); }
From: Matthias Kaehlcke mka@chromium.org
commit 7a534ae89e34e9b51acb5a63dd0f88308178b46a upstream.
struct rpmsg_eptdev contains a struct cdev. The current code frees the rpmsg_eptdev struct in rpmsg_eptdev_destroy(), but the cdev is a managed object, therefore its release is not predictable and the rpmsg_eptdev could be freed before the cdev is entirely released.
The cdev_device_add/del() API was created to address this issue (see commit '233ed09d7fda ("chardev: add helper function to register char devs with a struct device")'), use it instead of cdev add/del().
Fixes: c0cdc19f84a4 ("rpmsg: Driver for user space endpoint interface") Suggested-by: Bjorn Andersson bjorn.andersson@linaro.org Signed-off-by: Matthias Kaehlcke mka@chromium.org Reviewed-by: Mathieu Poirier mathieu.poirier@linaro.org Reviewed-by: Stephen Boyd swboyd@chromium.org Reviewed-by: Bjorn Andersson bjorn.andersson@linaro.org Signed-off-by: Bjorn Andersson bjorn.andersson@linaro.org Link: https://lore.kernel.org/r/20220110104706.v6.2.Idde68b05b88d4a2e6e54766c653f3... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/rpmsg/rpmsg_char.c | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-)
--- a/drivers/rpmsg/rpmsg_char.c +++ b/drivers/rpmsg/rpmsg_char.c @@ -90,7 +90,7 @@ static int rpmsg_eptdev_destroy(struct d /* wake up any blocked readers */ wake_up_interruptible(&eptdev->readq);
- device_del(&eptdev->dev); + cdev_device_del(&eptdev->cdev, &eptdev->dev); put_device(&eptdev->dev);
return 0; @@ -333,7 +333,6 @@ static void rpmsg_eptdev_release_device(
ida_simple_remove(&rpmsg_ept_ida, dev->id); ida_simple_remove(&rpmsg_minor_ida, MINOR(eptdev->dev.devt)); - cdev_del(&eptdev->cdev); kfree(eptdev); }
@@ -378,19 +377,13 @@ static int rpmsg_eptdev_create(struct rp dev->id = ret; dev_set_name(dev, "rpmsg%d", ret);
- ret = cdev_add(&eptdev->cdev, dev->devt, 1); + ret = cdev_device_add(&eptdev->cdev, &eptdev->dev); if (ret) goto free_ept_ida;
/* We can now rely on the release function for cleanup */ dev->release = rpmsg_eptdev_release_device;
- ret = device_add(dev); - if (ret) { - dev_err(dev, "device_add failed: %d\n", ret); - put_device(dev); - } - return ret;
free_ept_ida:
From: Yang Yingliang yangyingliang@huawei.com
commit 61263b3a11a2594b4e898f166c31162236182b5c upstream.
GFP_KERNEL/GFP_DMA can't be used under a spin lock. According the comment, els_ios_lock is used to protect els ios list so we can move down the spin lock to avoid using this flag under the lock.
Link: https://lore.kernel.org/r/20220111012441.3232527-1-yangyingliang@huawei.com Fixes: 8f406ef72859 ("scsi: elx: libefc: Extended link Service I/O handling") Reported-by: Hulk Robot hulkci@huawei.com Reviewed-by: James Smart jsmart2021@gmail.com Signed-off-by: Yang Yingliang yangyingliang@huawei.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/elx/libefc/efc_els.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-)
--- a/drivers/scsi/elx/libefc/efc_els.c +++ b/drivers/scsi/elx/libefc/efc_els.c @@ -46,18 +46,14 @@ efc_els_io_alloc_size(struct efc_node *n
efc = node->efc;
- spin_lock_irqsave(&node->els_ios_lock, flags); - if (!node->els_io_enabled) { efc_log_err(efc, "els io alloc disabled\n"); - spin_unlock_irqrestore(&node->els_ios_lock, flags); return NULL; }
els = mempool_alloc(efc->els_io_pool, GFP_ATOMIC); if (!els) { atomic_add_return(1, &efc->els_io_alloc_failed_count); - spin_unlock_irqrestore(&node->els_ios_lock, flags); return NULL; }
@@ -74,7 +70,6 @@ efc_els_io_alloc_size(struct efc_node *n &els->io.req.phys, GFP_DMA); if (!els->io.req.virt) { mempool_free(els, efc->els_io_pool); - spin_unlock_irqrestore(&node->els_ios_lock, flags); return NULL; }
@@ -94,10 +89,11 @@ efc_els_io_alloc_size(struct efc_node *n
/* add els structure to ELS IO list */ INIT_LIST_HEAD(&els->list_entry); + spin_lock_irqsave(&node->els_ios_lock, flags); list_add_tail(&els->list_entry, &node->els_ios_list); + spin_unlock_irqrestore(&node->els_ios_lock, flags); }
- spin_unlock_irqrestore(&node->els_ios_lock, flags); return els; }
From: John Meneghini jmeneghi@redhat.com
commit 847f9ea4c5186fdb7b84297e3eeed9e340e83fce upstream.
The bnx2fc_destroy() functions are removing the interface before calling destroy_work. This results multiple WARNings from sysfs_remove_group() as the controller rport device attributes are removed too early.
Replace the fcoe_port's destroy_work queue. It's not needed.
The problem is easily reproducible with the following steps.
Example:
$ dmesg -w & $ systemctl enable --now fcoe $ fipvlan -s -c ens2f1 $ fcoeadm -d ens2f1.802 [ 583.464488] host2: libfc: Link down on port (7500a1) [ 583.472651] bnx2fc: 7500a1 - rport not created Yet!! [ 583.490468] ------------[ cut here ]------------ [ 583.538725] sysfs group 'power' not found for kobject 'rport-2:0-0' [ 583.568814] WARNING: CPU: 3 PID: 192 at fs/sysfs/group.c:279 sysfs_remove_group+0x6f/0x80 [ 583.607130] Modules linked in: dm_service_time 8021q garp mrp stp llc bnx2fc cnic uio rpcsec_gss_krb5 auth_rpcgss nfsv4 ... [ 583.942994] CPU: 3 PID: 192 Comm: kworker/3:2 Kdump: loaded Not tainted 5.14.0-39.el9.x86_64 #1 [ 583.984105] Hardware name: HP ProLiant DL120 G7, BIOS J01 07/01/2013 [ 584.016535] Workqueue: fc_wq_2 fc_rport_final_delete [scsi_transport_fc] [ 584.050691] RIP: 0010:sysfs_remove_group+0x6f/0x80 [ 584.074725] Code: ff 5b 48 89 ef 5d 41 5c e9 ee c0 ff ff 48 89 ef e8 f6 b8 ff ff eb d1 49 8b 14 24 48 8b 33 48 c7 c7 ... [ 584.162586] RSP: 0018:ffffb567c15afdc0 EFLAGS: 00010282 [ 584.188225] RAX: 0000000000000000 RBX: ffffffff8eec4220 RCX: 0000000000000000 [ 584.221053] RDX: ffff8c1586ce84c0 RSI: ffff8c1586cd7cc0 RDI: ffff8c1586cd7cc0 [ 584.255089] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffb567c15afc00 [ 584.287954] R10: ffffb567c15afbf8 R11: ffffffff8fbe7f28 R12: ffff8c1486326400 [ 584.322356] R13: ffff8c1486326480 R14: ffff8c1483a4a000 R15: 0000000000000004 [ 584.355379] FS: 0000000000000000(0000) GS:ffff8c1586cc0000(0000) knlGS:0000000000000000 [ 584.394419] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 584.421123] CR2: 00007fe95a6f7840 CR3: 0000000107674002 CR4: 00000000000606e0 [ 584.454888] Call Trace: [ 584.466108] device_del+0xb2/0x3e0 [ 584.481701] device_unregister+0x13/0x60 [ 584.501306] bsg_unregister_queue+0x5b/0x80 [ 584.522029] bsg_remove_queue+0x1c/0x40 [ 584.541884] fc_rport_final_delete+0xf3/0x1d0 [scsi_transport_fc] [ 584.573823] process_one_work+0x1e3/0x3b0 [ 584.592396] worker_thread+0x50/0x3b0 [ 584.609256] ? rescuer_thread+0x370/0x370 [ 584.628877] kthread+0x149/0x170 [ 584.643673] ? set_kthread_struct+0x40/0x40 [ 584.662909] ret_from_fork+0x22/0x30 [ 584.680002] ---[ end trace 53575ecefa942ece ]---
Link: https://lore.kernel.org/r/20220115040044.1013475-1-jmeneghi@redhat.com Fixes: 0cbf32e1681d ("[SCSI] bnx2fc: Avoid calling bnx2fc_if_destroy with unnecessary locks") Tested-by: Guangwu Zhang guazhang@redhat.com Co-developed-by: Maurizio Lombardi mlombard@redhat.com Signed-off-by: Maurizio Lombardi mlombard@redhat.com Signed-off-by: John Meneghini jmeneghi@redhat.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/bnx2fc/bnx2fc_fcoe.c | 20 +++++--------------- 1 file changed, 5 insertions(+), 15 deletions(-)
--- a/drivers/scsi/bnx2fc/bnx2fc_fcoe.c +++ b/drivers/scsi/bnx2fc/bnx2fc_fcoe.c @@ -82,7 +82,7 @@ static int bnx2fc_bind_pcidev(struct bnx static void bnx2fc_unbind_pcidev(struct bnx2fc_hba *hba); static struct fc_lport *bnx2fc_if_create(struct bnx2fc_interface *interface, struct device *parent, int npiv); -static void bnx2fc_destroy_work(struct work_struct *work); +static void bnx2fc_port_destroy(struct fcoe_port *port);
static struct bnx2fc_hba *bnx2fc_hba_lookup(struct net_device *phys_dev); static struct bnx2fc_interface *bnx2fc_interface_lookup(struct net_device @@ -907,9 +907,6 @@ static void bnx2fc_indicate_netevent(voi __bnx2fc_destroy(interface); } mutex_unlock(&bnx2fc_dev_lock); - - /* Ensure ALL destroy work has been completed before return */ - flush_workqueue(bnx2fc_wq); return;
default: @@ -1215,8 +1212,8 @@ static int bnx2fc_vport_destroy(struct f mutex_unlock(&n_port->lp_mutex); bnx2fc_free_vport(interface->hba, port->lport); bnx2fc_port_shutdown(port->lport); + bnx2fc_port_destroy(port); bnx2fc_interface_put(interface); - queue_work(bnx2fc_wq, &port->destroy_work); return 0; }
@@ -1525,7 +1522,6 @@ static struct fc_lport *bnx2fc_if_create port->lport = lport; port->priv = interface; port->get_netdev = bnx2fc_netdev; - INIT_WORK(&port->destroy_work, bnx2fc_destroy_work);
/* Configure fcoe_port */ rc = bnx2fc_lport_config(lport); @@ -1653,8 +1649,8 @@ static void __bnx2fc_destroy(struct bnx2 bnx2fc_interface_cleanup(interface); bnx2fc_stop(interface); list_del(&interface->list); + bnx2fc_port_destroy(port); bnx2fc_interface_put(interface); - queue_work(bnx2fc_wq, &port->destroy_work); }
/** @@ -1694,15 +1690,12 @@ netdev_err: return rc; }
-static void bnx2fc_destroy_work(struct work_struct *work) +static void bnx2fc_port_destroy(struct fcoe_port *port) { - struct fcoe_port *port; struct fc_lport *lport;
- port = container_of(work, struct fcoe_port, destroy_work); lport = port->lport; - - BNX2FC_HBA_DBG(lport, "Entered bnx2fc_destroy_work\n"); + BNX2FC_HBA_DBG(lport, "Entered %s, destroying lport %p\n", __func__, lport);
bnx2fc_if_destroy(lport); } @@ -2556,9 +2549,6 @@ static void bnx2fc_ulp_exit(struct cnic_ __bnx2fc_destroy(interface); mutex_unlock(&bnx2fc_dev_lock);
- /* Ensure ALL destroy work has been completed before return */ - flush_workqueue(bnx2fc_wq); - bnx2fc_ulp_stop(hba); /* unregister cnic device */ if (test_and_clear_bit(BNX2FC_CNIC_REGISTERED, &hba->reg_with_cnic))
From: Ido Schimmel idosch@nvidia.com
commit 6cee105e7f2ced596373951d9ea08dacc3883c68 upstream.
The warning messages can be invoked from the data path for every packet transmitted through an ip6gre netdev, leading to high CPU utilization.
Fix that by rate limiting the messages.
Fixes: 09c6bbf090ec ("[IPV6]: Do mandatory IPv6 tunnel endpoint checks in realtime") Reported-by: Maksym Yaremchuk maksymy@nvidia.com Tested-by: Maksym Yaremchuk maksymy@nvidia.com Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: Amit Cohen amcohen@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/ip6_tunnel.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -1036,14 +1036,14 @@ int ip6_tnl_xmit_ctl(struct ip6_tnl *t,
if (unlikely(!ipv6_chk_addr_and_flags(net, laddr, ldev, false, 0, IFA_F_TENTATIVE))) - pr_warn("%s xmit: Local address not yet configured!\n", - p->name); + pr_warn_ratelimited("%s xmit: Local address not yet configured!\n", + p->name); else if (!(p->flags & IP6_TNL_F_ALLOW_LOCAL_REMOTE) && !ipv6_addr_is_multicast(raddr) && unlikely(ipv6_chk_addr_and_flags(net, raddr, ldev, true, 0, IFA_F_TENTATIVE))) - pr_warn("%s xmit: Routing loop! Remote address found on this node!\n", - p->name); + pr_warn_ratelimited("%s xmit: Routing loop! Remote address found on this node!\n", + p->name); else ret = 1; rcu_read_unlock();
From: sparkhuang huangshaobo6@huawei.com
commit 8b59b0a53c840921b625378f137e88adfa87647e upstream.
arm32 uses software to simulate the instruction replaced by kprobe. some instructions may be simulated by constructing assembly functions. therefore, before executing instruction simulation, it is necessary to construct assembly function execution environment in C language through binding registers. after kasan is enabled, the register binding relationship will be destroyed, resulting in instruction simulation errors and causing kernel panic.
the kprobe emulate instruction function is distributed in three files: actions-common.c actions-arm.c actions-thumb.c, so disable KASAN when compiling these files.
for example, use kprobe insert on cap_capable+20 after kasan enabled, the cap_capable assembly code is as follows: <cap_capable>: e92d47f0 push {r4, r5, r6, r7, r8, r9, sl, lr} e1a05000 mov r5, r0 e280006c add r0, r0, #108 ; 0x6c e1a04001 mov r4, r1 e1a06002 mov r6, r2 e59fa090 ldr sl, [pc, #144] ; ebfc7bf8 bl c03aa4b4 <__asan_load4> e595706c ldr r7, [r5, #108] ; 0x6c e2859014 add r9, r5, #20 ...... The emulate_ldr assembly code after enabling kasan is as follows: c06f1384 <emulate_ldr>: e92d47f0 push {r4, r5, r6, r7, r8, r9, sl, lr} e282803c add r8, r2, #60 ; 0x3c e1a05000 mov r5, r0 e7e37855 ubfx r7, r5, #16, #4 e1a00008 mov r0, r8 e1a09001 mov r9, r1 e1a04002 mov r4, r2 ebf35462 bl c03c6530 <__asan_load4> e357000f cmp r7, #15 e7e36655 ubfx r6, r5, #12, #4 e205a00f and sl, r5, #15 0a000001 beq c06f13bc <emulate_ldr+0x38> e0840107 add r0, r4, r7, lsl #2 ebf3545c bl c03c6530 <__asan_load4> e084010a add r0, r4, sl, lsl #2 ebf3545a bl c03c6530 <__asan_load4> e2890010 add r0, r9, #16 ebf35458 bl c03c6530 <__asan_load4> e5990010 ldr r0, [r9, #16] e12fff30 blx r0 e356000f cm r6, #15 1a000014 bne c06f1430 <emulate_ldr+0xac> e1a06000 mov r6, r0 e2840040 add r0, r4, #64 ; 0x40 ......
when running in emulate_ldr to simulate the ldr instruction, panic occurred, and the log is as follows: Unable to handle kernel NULL pointer dereference at virtual address 00000090 pgd = ecb46400 [00000090] *pgd=2e0fa003, *pmd=00000000 Internal error: Oops: 206 [#1] SMP ARM PC is at cap_capable+0x14/0xb0 LR is at emulate_ldr+0x50/0xc0 psr: 600d0293 sp : ecd63af8 ip : 00000004 fp : c0a7c30c r10: 00000000 r9 : c30897f4 r8 : ecd63cd4 r7 : 0000000f r6 : 0000000a r5 : e59fa090 r4 : ecd63c98 r3 : c06ae294 r2 : 00000000 r1 : b7611300 r0 : bf4ec008 Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user Control: 32c5387d Table: 2d546400 DAC: 55555555 Process bash (pid: 1643, stack limit = 0xecd60190) (cap_capable) from (kprobe_handler+0x218/0x340) (kprobe_handler) from (kprobe_trap_handler+0x24/0x48) (kprobe_trap_handler) from (do_undefinstr+0x13c/0x364) (do_undefinstr) from (__und_svc_finish+0x0/0x30) (__und_svc_finish) from (cap_capable+0x18/0xb0) (cap_capable) from (cap_vm_enough_memory+0x38/0x48) (cap_vm_enough_memory) from (security_vm_enough_memory_mm+0x48/0x6c) (security_vm_enough_memory_mm) from (copy_process.constprop.5+0x16b4/0x25c8) (copy_process.constprop.5) from (_do_fork+0xe8/0x55c) (_do_fork) from (SyS_clone+0x1c/0x24) (SyS_clone) from (__sys_trace_return+0x0/0x10) Code: 0050a0e1 6c0080e2 0140a0e1 0260a0e1 (f801f0e7)
Fixes: 35aa1df43283 ("ARM kprobes: instruction single-stepping support") Fixes: 421015713b30 ("ARM: 9017/2: Enable KASan for ARM") Signed-off-by: huangshaobo huangshaobo6@huawei.com Acked-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/probes/kprobes/Makefile | 3 +++ 1 file changed, 3 insertions(+)
--- a/arch/arm/probes/kprobes/Makefile +++ b/arch/arm/probes/kprobes/Makefile @@ -1,4 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 +KASAN_SANITIZE_actions-common.o := n +KASAN_SANITIZE_actions-arm.o := n +KASAN_SANITIZE_actions-thumb.o := n obj-$(CONFIG_KPROBES) += core.o actions-common.o checkers-common.o obj-$(CONFIG_ARM_KPROBES_TEST) += test-kprobes.o test-kprobes-objs := test-core.o
From: Congyu Liu liu3101@purdue.edu
commit 47934e06b65637c88a762d9c98329ae6e3238888 upstream.
In one net namespace, after creating a packet socket without binding it to a device, users in other net namespaces can observe the new `packet_type` added by this packet socket by reading `/proc/net/ptype` file. This is minor information leakage as packet socket is namespace aware.
Add a net pointer in `packet_type` to keep the net namespace of of corresponding packet socket. In `ptype_seq_show`, this net pointer must be checked when it is not NULL.
Fixes: 2feb27dbe00c ("[NETNS]: Minor information leak via /proc/net/ptype file.") Signed-off-by: Congyu Liu liu3101@purdue.edu Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/netdevice.h | 1 + net/core/net-procfs.c | 3 ++- net/packet/af_packet.c | 2 ++ 3 files changed, 5 insertions(+), 1 deletion(-)
--- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2636,6 +2636,7 @@ struct packet_type { struct net_device *); bool (*id_match)(struct packet_type *ptype, struct sock *sk); + struct net *af_packet_net; void *af_packet_priv; struct list_head list; }; --- a/net/core/net-procfs.c +++ b/net/core/net-procfs.c @@ -260,7 +260,8 @@ static int ptype_seq_show(struct seq_fil
if (v == SEQ_START_TOKEN) seq_puts(seq, "Type Device Function\n"); - else if (pt->dev == NULL || dev_net(pt->dev) == seq_file_net(seq)) { + else if ((!pt->af_packet_net || net_eq(pt->af_packet_net, seq_file_net(seq))) && + (!pt->dev || net_eq(dev_net(pt->dev), seq_file_net(seq)))) { if (pt->type == htons(ETH_P_ALL)) seq_puts(seq, "ALL "); else --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -1773,6 +1773,7 @@ static int fanout_add(struct sock *sk, s match->prot_hook.dev = po->prot_hook.dev; match->prot_hook.func = packet_rcv_fanout; match->prot_hook.af_packet_priv = match; + match->prot_hook.af_packet_net = read_pnet(&match->net); match->prot_hook.id_match = match_fanout_group; match->max_num_members = args->max_num_members; list_add(&match->list, &fanout_list); @@ -3358,6 +3359,7 @@ static int packet_create(struct net *net po->prot_hook.func = packet_rcv_spkt;
po->prot_hook.af_packet_priv = sk; + po->prot_hook.af_packet_net = sock_net(sk);
if (proto) { po->prot_hook.type = proto;
From: Guenter Roeck linux@roeck-us.net
commit f614629f9c1080dcc844a8430e3fb4c37ebbf05d upstream.
Experiments with MAX6646 and MAX6648 show that the alert function of those chips is broken, similar to other chips supported by the lm90 driver. Mark it accordingly.
Fixes: 4667bcb8d8fc ("hwmon: (lm90) Introduce chip parameter structure") Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hwmon/lm90.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/hwmon/lm90.c +++ b/drivers/hwmon/lm90.c @@ -394,7 +394,7 @@ static const struct lm90_params lm90_par .max_convrate = 9, }, [max6646] = { - .flags = LM90_HAVE_CRIT, + .flags = LM90_HAVE_CRIT | LM90_HAVE_BROKEN_ALERT, .alert_alarms = 0x7c, .max_convrate = 6, .reg_local_ext = MAX6657_REG_R_LOCAL_TEMPL,
From: Guenter Roeck linux@roeck-us.net
commit 94746b0ba479743355e0d3cc1cb9cfe3011fb8be upstream.
Experiments with MAX6680 and MAX6681 show that the alert function of those chips is broken, similar to other chips supported by the lm90 driver. Mark it accordingly.
Fixes: 4667bcb8d8fc ("hwmon: (lm90) Introduce chip parameter structure") Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hwmon/lm90.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/hwmon/lm90.c +++ b/drivers/hwmon/lm90.c @@ -418,7 +418,7 @@ static const struct lm90_params lm90_par }, [max6680] = { .flags = LM90_HAVE_OFFSET | LM90_HAVE_CRIT - | LM90_HAVE_CRIT_ALRM_SWP, + | LM90_HAVE_CRIT_ALRM_SWP | LM90_HAVE_BROKEN_ALERT, .alert_alarms = 0x7c, .max_convrate = 7, },
From: Xin Long lucien.xin@gmail.com
commit 2afc3b5a31f9edf3ef0f374f5d70610c79c93a42 upstream.
When 'ping' changes to use PING socket instead of RAW socket by:
# sysctl -w net.ipv4.ping_group_range="0 100"
the selftests 'router_broadcast.sh' will fail, as such command
# ip vrf exec vrf-h1 ping -I veth0 198.51.100.255 -b
can't receive the response skb by the PING socket. It's caused by mismatch of sk_bound_dev_if and dif in ping_rcv() when looking up the PING socket, as dif is vrf-h1 if dif's master was set to vrf-h1.
This patch is to fix this regression by also checking the sk_bound_dev_if against sdif so that the packets can stil be received even if the socket is not bound to the vrf device but to the real iif.
Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind") Reported-by: Hangbin Liu liuhangbin@gmail.com Signed-off-by: Xin Long lucien.xin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/ping.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/ipv4/ping.c +++ b/net/ipv4/ping.c @@ -220,7 +220,8 @@ static struct sock *ping_lookup(struct n continue; }
- if (sk->sk_bound_dev_if && sk->sk_bound_dev_if != dif) + if (sk->sk_bound_dev_if && sk->sk_bound_dev_if != dif && + sk->sk_bound_dev_if != inet_sdif(skb)) continue;
sock_hold(sk);
From: Eric Dumazet edumazet@google.com
commit 23f57406b82de51809d5812afd96f210f8b627f3 upstream.
ip_select_ident_segs() has been very conservative about using the connected socket private generator only for packets with IP_DF set, claiming it was needed for some VJ compression implementations.
As mentioned in this referenced document, this can be abused. (Ref: Off-Path TCP Exploits of the Mixed IPID Assignment)
Before switching to pure random IPID generation and possibly hurt some workloads, lets use the private inet socket generator.
Not only this will remove one vulnerability, this will also improve performance of TCP flows using pmtudisc==IP_PMTUDISC_DONT
Fixes: 73f156a6e8c1 ("inetpeer: get rid of ip_id_count") Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: David Ahern dsahern@kernel.org Reported-by: Ray Che xijiache@gmail.com Cc: Willy Tarreau w@1wt.eu Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/ip.h | 21 ++++++++++----------- 1 file changed, 10 insertions(+), 11 deletions(-)
--- a/include/net/ip.h +++ b/include/net/ip.h @@ -525,19 +525,18 @@ static inline void ip_select_ident_segs( { struct iphdr *iph = ip_hdr(skb);
+ /* We had many attacks based on IPID, use the private + * generator as much as we can. + */ + if (sk && inet_sk(sk)->inet_daddr) { + iph->id = htons(inet_sk(sk)->inet_id); + inet_sk(sk)->inet_id += segs; + return; + } if ((iph->frag_off & htons(IP_DF)) && !skb->ignore_df) { - /* This is only to work around buggy Windows95/2000 - * VJ compression implementations. If the ID field - * does not change, they drop every other packet in - * a TCP stream using header compression. - */ - if (sk && inet_sk(sk)->inet_daddr) { - iph->id = htons(inet_sk(sk)->inet_id); - inet_sk(sk)->inet_id += segs; - } else { - iph->id = 0; - } + iph->id = 0; } else { + /* Unfortunately we need the big hammer to get a suitable IPID */ __ip_select_ident(net, iph, segs); } }
From: Guenter Roeck linux@roeck-us.net
[ Upstream commit a66c5ed539277b9f2363bbace0dba88b85b36c26 ]
According to its datasheet, G781 supports a maximum conversion rate value of 8 (62.5 ms). However, chips labeled G781 and G780 were found to only support a maximum conversion rate value of 7 (125 ms). On the other side, chips labeled G781-1 and G784 were found to support a conversion rate value of 8. There is no known means to distinguish G780 from G781 or G784; all chips report the same manufacturer ID and chip revision. Setting the conversion rate register value to 8 on chips not supporting it causes unexpected behavior since the real conversion rate is set to 0 (16 seconds) if a value of 8 is written into the conversion rate register. Limit the conversion rate register value to 7 for all G78x chips to avoid the problem.
Fixes: ae544f64cc7b ("hwmon: (lm90) Add support for GMT G781") Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hwmon/lm90.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/hwmon/lm90.c +++ b/drivers/hwmon/lm90.c @@ -373,7 +373,7 @@ static const struct lm90_params lm90_par .flags = LM90_HAVE_OFFSET | LM90_HAVE_REM_LIMIT_EXT | LM90_HAVE_BROKEN_ALERT | LM90_HAVE_CRIT, .alert_alarms = 0x7c, - .max_convrate = 8, + .max_convrate = 7, }, [lm86] = { .flags = LM90_HAVE_OFFSET | LM90_HAVE_REM_LIMIT_EXT
From: Trond Myklebust trond.myklebust@hammerspace.com
commit ac795161c93699d600db16c1a8cc23a65a1eceaf upstream.
If the application sets the O_DIRECTORY flag, and tries to open a regular file, nfs_atomic_open() will punt to doing a regular lookup. If the server then returns a regular file, we will happily return a file descriptor with uninitialised open state.
The fix is to return the expected ENOTDIR error in these cases.
Reported-by: Lyu Tao tao.lyu@epfl.ch Fixes: 0dd2b474d0b6 ("nfs: implement i_op->atomic_open()") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nfs/dir.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
--- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -1967,6 +1967,19 @@ out:
no_open: res = nfs_lookup(dir, dentry, lookup_flags); + if (!res) { + inode = d_inode(dentry); + if ((lookup_flags & LOOKUP_DIRECTORY) && inode && + !S_ISDIR(inode->i_mode)) + res = ERR_PTR(-ENOTDIR); + } else if (!IS_ERR(res)) { + inode = d_inode(res); + if ((lookup_flags & LOOKUP_DIRECTORY) && inode && + !S_ISDIR(inode->i_mode)) { + dput(res); + res = ERR_PTR(-ENOTDIR); + } + } if (switched) { d_lookup_done(dentry); if (!res)
From: Trond Myklebust trond.myklebust@hammerspace.com
commit 1751fc1db36f6f411709e143d5393f92d12137a9 upstream.
If the file type changes back to being a regular file on the server between the failed OPEN and our LOOKUP, then we need to re-run the OPEN.
Fixes: 0dd2b474d0b6 ("nfs: implement i_op->atomic_open()") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nfs/dir.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -1972,12 +1972,17 @@ no_open: if ((lookup_flags & LOOKUP_DIRECTORY) && inode && !S_ISDIR(inode->i_mode)) res = ERR_PTR(-ENOTDIR); + else if (inode && S_ISREG(inode->i_mode)) + res = ERR_PTR(-EOPENSTALE); } else if (!IS_ERR(res)) { inode = d_inode(res); if ((lookup_flags & LOOKUP_DIRECTORY) && inode && !S_ISDIR(inode->i_mode)) { dput(res); res = ERR_PTR(-ENOTDIR); + } else if (inode && S_ISREG(inode->i_mode)) { + dput(res); + res = ERR_PTR(-EOPENSTALE); } } if (switched) {
From: Jianguo Wu wujianguo@chinatelecom.cn
commit 1d10f8a1f40b965d449e8f2d5ed7b96a7c138b77 upstream.
After commit:7866a621043f ("dev: add per net_device packet type chains"), we can not get packet types that are bound to a specified net device by /proc/net/ptype, this patch fix the regression.
Run "tcpdump -i ens192 udp -nns0" Before and after apply this patch:
Before: [root@localhost ~]# cat /proc/net/ptype Type Device Function 0800 ip_rcv 0806 arp_rcv 86dd ipv6_rcv
After: [root@localhost ~]# cat /proc/net/ptype Type Device Function ALL ens192 tpacket_rcv 0800 ip_rcv 0806 arp_rcv 86dd ipv6_rcv
v1 -> v2: - fix the regression rather than adding new /proc API as suggested by Stephen Hemminger.
Fixes: 7866a621043f ("dev: add per net_device packet type chains") Signed-off-by: Jianguo Wu wujianguo@chinatelecom.cn Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/net-procfs.c | 35 ++++++++++++++++++++++++++++++++--- 1 file changed, 32 insertions(+), 3 deletions(-)
--- a/net/core/net-procfs.c +++ b/net/core/net-procfs.c @@ -190,12 +190,23 @@ static const struct seq_operations softn .show = softnet_seq_show, };
-static void *ptype_get_idx(loff_t pos) +static void *ptype_get_idx(struct seq_file *seq, loff_t pos) { + struct list_head *ptype_list = NULL; struct packet_type *pt = NULL; + struct net_device *dev; loff_t i = 0; int t;
+ for_each_netdev_rcu(seq_file_net(seq), dev) { + ptype_list = &dev->ptype_all; + list_for_each_entry_rcu(pt, ptype_list, list) { + if (i == pos) + return pt; + ++i; + } + } + list_for_each_entry_rcu(pt, &ptype_all, list) { if (i == pos) return pt; @@ -216,22 +227,40 @@ static void *ptype_seq_start(struct seq_ __acquires(RCU) { rcu_read_lock(); - return *pos ? ptype_get_idx(*pos - 1) : SEQ_START_TOKEN; + return *pos ? ptype_get_idx(seq, *pos - 1) : SEQ_START_TOKEN; }
static void *ptype_seq_next(struct seq_file *seq, void *v, loff_t *pos) { + struct net_device *dev; struct packet_type *pt; struct list_head *nxt; int hash;
++*pos; if (v == SEQ_START_TOKEN) - return ptype_get_idx(0); + return ptype_get_idx(seq, 0);
pt = v; nxt = pt->list.next; + if (pt->dev) { + if (nxt != &pt->dev->ptype_all) + goto found; + + dev = pt->dev; + for_each_netdev_continue_rcu(seq_file_net(seq), dev) { + if (!list_empty(&dev->ptype_all)) { + nxt = dev->ptype_all.next; + goto found; + } + } + + nxt = ptype_all.next; + goto ptype_all; + } + if (pt->type == htons(ETH_P_ALL)) { +ptype_all: if (nxt != &ptype_all) goto found; hash = 0;
From: Xianting Tian xianting.tian@linux.alibaba.com
commit 0a727b459ee39bd4c5ced19d6024258ac87b6b2e upstream.
For example, memory-region in .dts as below, reg = <0x0 0x50000000 0x0 0x20000000>
We can get below values, struct resource r; r.start = 0x50000000; r.end = 0x6fffffff;
So the size should be: size = r.end - r.start + 1 = 0x20000000
Signed-off-by: Xianting Tian xianting.tian@linux.alibaba.com Fixes: 072f1f9168ed ("drm/msm: add support for "stolen" mem") Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Link: https://lore.kernel.org/r/20220112123334.749776-1-xianting.tian@linux.alibab... Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/msm_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/msm/msm_drv.c +++ b/drivers/gpu/drm/msm/msm_drv.c @@ -466,7 +466,7 @@ static int msm_init_vram(struct drm_devi of_node_put(node); if (ret) return ret; - size = r.end - r.start; + size = r.end - r.start + 1; DRM_INFO("using VRAM carveout: %lx@%pa\n", size, &r.start);
/* if we have no IOMMU, then we need to use carveout allocator.
From: Miaoqian Lin linmq006@gmail.com
commit c04c3148ca12227d92f91b355b4538cc333c9922 upstream.
If of_find_device_by_node() succeeds, dsi_get_phy() doesn't a corresponding put_device(). Thus add put_device() to fix the exception handling.
Fixes: ec31abf ("drm/msm/dsi: Separate PHY to another platform device") Signed-off-by: Miaoqian Lin linmq006@gmail.com Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Link: https://lore.kernel.org/r/20211230070943.18116-1-linmq006@gmail.com Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/msm/dsi/dsi.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/msm/dsi/dsi.c +++ b/drivers/gpu/drm/msm/dsi/dsi.c @@ -40,7 +40,12 @@ static int dsi_get_phy(struct msm_dsi *m
of_node_put(phy_node);
- if (!phy_pdev || !msm_dsi->phy) { + if (!phy_pdev) { + DRM_DEV_ERROR(&pdev->dev, "%s: phy driver is not ready\n", __func__); + return -EPROBE_DEFER; + } + if (!msm_dsi->phy) { + put_device(&phy_pdev->dev); DRM_DEV_ERROR(&pdev->dev, "%s: phy driver is not ready\n", __func__); return -EPROBE_DEFER; }
From: José Expósito jose.exposito89@gmail.com
commit 5e761a2287234bc402ba7ef07129f5103bcd775c upstream.
The function performs a check on the "phy" input parameter, however, it is used before the check.
Initialize the "dev" variable after the sanity check to avoid a possible NULL pointer dereference.
Fixes: 5c8290284402b ("drm/msm/dsi: Split PHY drivers to separate files") Addresses-Coverity-ID: 1493860 ("Null pointer dereference") Signed-off-by: José Expósito jose.exposito89@gmail.com Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Link: https://lore.kernel.org/r/20220116181844.7400-1-jose.exposito89@gmail.com Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/msm/dsi/phy/dsi_phy.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/msm/dsi/phy/dsi_phy.c +++ b/drivers/gpu/drm/msm/dsi/phy/dsi_phy.c @@ -808,12 +808,14 @@ int msm_dsi_phy_enable(struct msm_dsi_ph struct msm_dsi_phy_clk_request *clk_req, struct msm_dsi_phy_shared_timings *shared_timings) { - struct device *dev = &phy->pdev->dev; + struct device *dev; int ret;
if (!phy || !phy->cfg->ops.enable) return -EINVAL;
+ dev = &phy->pdev->dev; + ret = dsi_phy_enable_resource(phy); if (ret) { DRM_DEV_ERROR(dev, "%s: resource enable failed, %d\n",
From: Eric Dumazet edumazet@google.com
commit aafc2e3285c2d7a79b7ee15221c19fbeca7b1509 upstream.
struct fib6_node's fn_sernum field can be read while other threads change it.
Add READ_ONCE()/WRITE_ONCE() annotations.
Do not change existing smp barriers in fib6_get_cookie_safe() and __fib6_update_sernum_upto_root()
syzbot reported:
BUG: KCSAN: data-race in fib6_clean_node / inet6_csk_route_socket
write to 0xffff88813df62e2c of 4 bytes by task 1920 on cpu 1: fib6_clean_node+0xc2/0x260 net/ipv6/ip6_fib.c:2178 fib6_walk_continue+0x38e/0x430 net/ipv6/ip6_fib.c:2112 fib6_walk net/ipv6/ip6_fib.c:2160 [inline] fib6_clean_tree net/ipv6/ip6_fib.c:2240 [inline] __fib6_clean_all+0x1a9/0x2e0 net/ipv6/ip6_fib.c:2256 fib6_flush_trees+0x6c/0x80 net/ipv6/ip6_fib.c:2281 rt_genid_bump_ipv6 include/net/net_namespace.h:488 [inline] addrconf_dad_completed+0x57f/0x870 net/ipv6/addrconf.c:4230 addrconf_dad_work+0x908/0x1170 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307 worker_thread+0x616/0xa70 kernel/workqueue.c:2454 kthread+0x1bf/0x1e0 kernel/kthread.c:359 ret_from_fork+0x1f/0x30
read to 0xffff88813df62e2c of 4 bytes by task 15701 on cpu 0: fib6_get_cookie_safe include/net/ip6_fib.h:285 [inline] rt6_get_cookie include/net/ip6_fib.h:306 [inline] ip6_dst_store include/net/ip6_route.h:234 [inline] inet6_csk_route_socket+0x352/0x3c0 net/ipv6/inet6_connection_sock.c:109 inet6_csk_xmit+0x91/0x1e0 net/ipv6/inet6_connection_sock.c:121 __tcp_transmit_skb+0x1323/0x1840 net/ipv4/tcp_output.c:1402 tcp_transmit_skb net/ipv4/tcp_output.c:1420 [inline] tcp_write_xmit+0x1450/0x4460 net/ipv4/tcp_output.c:2680 __tcp_push_pending_frames+0x68/0x1c0 net/ipv4/tcp_output.c:2864 tcp_push+0x2d9/0x2f0 net/ipv4/tcp.c:725 mptcp_push_release net/mptcp/protocol.c:1491 [inline] __mptcp_push_pending+0x46c/0x490 net/mptcp/protocol.c:1578 mptcp_sendmsg+0x9ec/0xa50 net/mptcp/protocol.c:1764 inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:643 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg net/socket.c:725 [inline] kernel_sendmsg+0x97/0xd0 net/socket.c:745 sock_no_sendpage+0x84/0xb0 net/core/sock.c:3086 inet_sendpage+0x9d/0xc0 net/ipv4/af_inet.c:834 kernel_sendpage+0x187/0x200 net/socket.c:3492 sock_sendpage+0x5a/0x70 net/socket.c:1007 pipe_to_sendpage+0x128/0x160 fs/splice.c:364 splice_from_pipe_feed fs/splice.c:418 [inline] __splice_from_pipe+0x207/0x500 fs/splice.c:562 splice_from_pipe fs/splice.c:597 [inline] generic_splice_sendpage+0x94/0xd0 fs/splice.c:746 do_splice_from fs/splice.c:767 [inline] direct_splice_actor+0x80/0xa0 fs/splice.c:936 splice_direct_to_actor+0x345/0x650 fs/splice.c:891 do_splice_direct+0x106/0x190 fs/splice.c:979 do_sendfile+0x675/0xc40 fs/read_write.c:1245 __do_sys_sendfile64 fs/read_write.c:1310 [inline] __se_sys_sendfile64 fs/read_write.c:1296 [inline] __x64_sys_sendfile64+0x102/0x140 fs/read_write.c:1296 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0x0000026f -> 0x00000271
Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 15701 Comm: syz-executor.2 Not tainted 5.16.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
The Fixes tag I chose is probably arbitrary, I do not think we need to backport this patch to older kernels.
Fixes: c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Link: https://lore.kernel.org/r/20220120174112.1126644-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/ip6_fib.h | 2 +- net/ipv6/ip6_fib.c | 23 +++++++++++++---------- net/ipv6/route.c | 2 +- 3 files changed, 15 insertions(+), 12 deletions(-)
--- a/include/net/ip6_fib.h +++ b/include/net/ip6_fib.h @@ -281,7 +281,7 @@ static inline bool fib6_get_cookie_safe( fn = rcu_dereference(f6i->fib6_node);
if (fn) { - *cookie = fn->fn_sernum; + *cookie = READ_ONCE(fn->fn_sernum); /* pairs with smp_wmb() in __fib6_update_sernum_upto_root() */ smp_rmb(); status = true; --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -111,7 +111,7 @@ void fib6_update_sernum(struct net *net, fn = rcu_dereference_protected(f6i->fib6_node, lockdep_is_held(&f6i->fib6_table->tb6_lock)); if (fn) - fn->fn_sernum = fib6_new_sernum(net); + WRITE_ONCE(fn->fn_sernum, fib6_new_sernum(net)); }
/* @@ -589,12 +589,13 @@ static int fib6_dump_table(struct fib6_t spin_unlock_bh(&table->tb6_lock); if (res > 0) { cb->args[4] = 1; - cb->args[5] = w->root->fn_sernum; + cb->args[5] = READ_ONCE(w->root->fn_sernum); } } else { - if (cb->args[5] != w->root->fn_sernum) { + int sernum = READ_ONCE(w->root->fn_sernum); + if (cb->args[5] != sernum) { /* Begin at the root if the tree changed */ - cb->args[5] = w->root->fn_sernum; + cb->args[5] = sernum; w->state = FWS_INIT; w->node = w->root; w->skip = w->count; @@ -1344,7 +1345,7 @@ static void __fib6_update_sernum_upto_ro /* paired with smp_rmb() in fib6_get_cookie_safe() */ smp_wmb(); while (fn) { - fn->fn_sernum = sernum; + WRITE_ONCE(fn->fn_sernum, sernum); fn = rcu_dereference_protected(fn->parent, lockdep_is_held(&rt->fib6_table->tb6_lock)); } @@ -2173,8 +2174,8 @@ static int fib6_clean_node(struct fib6_w };
if (c->sernum != FIB6_NO_SERNUM_CHANGE && - w->node->fn_sernum != c->sernum) - w->node->fn_sernum = c->sernum; + READ_ONCE(w->node->fn_sernum) != c->sernum) + WRITE_ONCE(w->node->fn_sernum, c->sernum);
if (!c->func) { WARN_ON_ONCE(c->sernum == FIB6_NO_SERNUM_CHANGE); @@ -2542,7 +2543,7 @@ static void ipv6_route_seq_setup_walk(st iter->w.state = FWS_INIT; iter->w.node = iter->w.root; iter->w.args = iter; - iter->sernum = iter->w.root->fn_sernum; + iter->sernum = READ_ONCE(iter->w.root->fn_sernum); INIT_LIST_HEAD(&iter->w.lh); fib6_walker_link(net, &iter->w); } @@ -2570,8 +2571,10 @@ static struct fib6_table *ipv6_route_seq
static void ipv6_route_check_sernum(struct ipv6_route_iter *iter) { - if (iter->sernum != iter->w.root->fn_sernum) { - iter->sernum = iter->w.root->fn_sernum; + int sernum = READ_ONCE(iter->w.root->fn_sernum); + + if (iter->sernum != sernum) { + iter->sernum = sernum; iter->w.state = FWS_INIT; iter->w.node = iter->w.root; WARN_ON(iter->w.skip); --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -2802,7 +2802,7 @@ static void ip6_link_failure(struct sk_b if (from) { fn = rcu_dereference(from->fib6_node); if (fn && (rt->rt6i_flags & RTF_DEFAULT)) - fn->fn_sernum = -1; + WRITE_ONCE(fn->fn_sernum, -1); } } rcu_read_unlock();
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 204975036b34f55237bc44c8a302a88468ef21b5 ]
Creating a hard link is required by POSIX to update the file ctime, so ensure that the file data is synced to disk so that we don't clobber the updated ctime by writing back after creating the hard link.
Fixes: 9f7682728728 ("NFS: Move the delegation return down into nfs4_proc_link()") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/dir.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -2397,6 +2397,8 @@ nfs_link(struct dentry *old_dentry, stru
trace_nfs_link_enter(inode, dir, dentry); d_drop(dentry); + if (S_ISREG(inode->i_mode)) + nfs_sync_inode(inode); error = NFS_PROTO(dir)->link(inode, dir, &dentry->d_name); if (error == 0) { nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit 6ff9d99bb88faebf134ca668842349d9718e5464 ]
Renaming a file is required by POSIX to update the file ctime, so ensure that the file data is synced to disk so that we don't clobber the updated ctime by writing back after creating the hard link.
Fixes: f2c2c552f119 ("NFS: Move delegation recall into the NFSv4 callback for rename_setup()") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/dir.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -2488,6 +2488,8 @@ int nfs_rename(struct user_namespace *mn } }
+ if (S_ISREG(old_inode->i_mode)) + nfs_sync_inode(old_inode); task = nfs_async_rename(old_dir, new_dir, old_dentry, new_dentry, NULL); if (IS_ERR(task)) { error = PTR_ERR(task);
From: Marc Zyngier maz@kernel.org
[ Upstream commit 094d00f8ca58c5d29b25e23b4daaed1ff1f13b41 ]
CMOs issued from EL2 cannot directly use the kernel helpers, as EL2 doesn't have a mapping of the guest pages. Oops.
Instead, use the mm_ops indirection to use helpers that will perform a mapping at EL2 and allow the CMO to be effective.
Fixes: 25aa28691bb9 ("KVM: arm64: Move guest CMOs to the fault handlers") Reviewed-by: Quentin Perret qperret@google.com Signed-off-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20220114125038.1336965-1-maz@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/kvm/hyp/pgtable.c | 18 ++++++------------ 1 file changed, 6 insertions(+), 12 deletions(-)
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c index f8ceebe4982eb..4c77ff556f0ae 100644 --- a/arch/arm64/kvm/hyp/pgtable.c +++ b/arch/arm64/kvm/hyp/pgtable.c @@ -921,13 +921,9 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, */ stage2_put_pte(ptep, mmu, addr, level, mm_ops);
- if (need_flush) { - kvm_pte_t *pte_follow = kvm_pte_follow(pte, mm_ops); - - dcache_clean_inval_poc((unsigned long)pte_follow, - (unsigned long)pte_follow + - kvm_granule_size(level)); - } + if (need_flush && mm_ops->dcache_clean_inval_poc) + mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops), + kvm_granule_size(level));
if (childp) mm_ops->put_page(childp); @@ -1089,15 +1085,13 @@ static int stage2_flush_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, struct kvm_pgtable *pgt = arg; struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops; kvm_pte_t pte = *ptep; - kvm_pte_t *pte_follow;
if (!kvm_pte_valid(pte) || !stage2_pte_cacheable(pgt, pte)) return 0;
- pte_follow = kvm_pte_follow(pte, mm_ops); - dcache_clean_inval_poc((unsigned long)pte_follow, - (unsigned long)pte_follow + - kvm_granule_size(level)); + if (mm_ops->dcache_clean_inval_poc) + mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops), + kvm_granule_size(level)); return 0; }
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit aed28b7a2d620cb5cd0c554cb889075c02e25e8e ]
Fixes: e26d9972720e ("SUNRPC: Clean up scheduling of autoclose") Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/trace/events/sunrpc.h | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h index 7b5dcff84cf27..54eb1654d9ecc 100644 --- a/include/trace/events/sunrpc.h +++ b/include/trace/events/sunrpc.h @@ -953,7 +953,8 @@ TRACE_EVENT(rpc_socket_nospace, { BIT(XPRT_REMOVE), "REMOVE" }, \ { BIT(XPRT_CONGESTED), "CONGESTED" }, \ { BIT(XPRT_CWND_WAIT), "CWND_WAIT" }, \ - { BIT(XPRT_WRITE_SPACE), "WRITE_SPACE" }) + { BIT(XPRT_WRITE_SPACE), "WRITE_SPACE" }, \ + { BIT(XPRT_SND_IS_COOKIE), "SND_IS_COOKIE" })
DECLARE_EVENT_CLASS(rpc_xprt_lifetime_class, TP_PROTO( @@ -1150,8 +1151,11 @@ DECLARE_EVENT_CLASS(xprt_writelock_event, __entry->task_id = -1; __entry->client_id = -1; } - __entry->snd_task_id = xprt->snd_task ? - xprt->snd_task->tk_pid : -1; + if (xprt->snd_task && + !test_bit(XPRT_SND_IS_COOKIE, &xprt->state)) + __entry->snd_task_id = xprt->snd_task->tk_pid; + else + __entry->snd_task_id = -1; ),
TP_printk(SUNRPC_TRACE_TASK_SPECIFIER @@ -1196,8 +1200,12 @@ DECLARE_EVENT_CLASS(xprt_cong_event, __entry->task_id = -1; __entry->client_id = -1; } - __entry->snd_task_id = xprt->snd_task ? - xprt->snd_task->tk_pid : -1; + if (xprt->snd_task && + !test_bit(XPRT_SND_IS_COOKIE, &xprt->state)) + __entry->snd_task_id = xprt->snd_task->tk_pid; + else + __entry->snd_task_id = -1; + __entry->cong = xprt->cong; __entry->cwnd = xprt->cwnd; __entry->wait = test_bit(XPRT_CWND_WAIT, &xprt->state);
From: Naveen N. Rao naveen.n.rao@linux.vnet.ibm.com
[ Upstream commit 3f5f766d5f7f95a69a630da3544a1a0cee1cdddf ]
Johan reported the below crash with test_bpf on ppc64 e5500:
test_bpf: #296 ALU_END_FROM_LE 64: 0x0123456789abcdef -> 0x67452301 jited:1 Oops: Exception in kernel mode, sig: 4 [#1] BE PAGE_SIZE=4K SMP NR_CPUS=24 QEMU e500 Modules linked in: test_bpf(+) CPU: 0 PID: 76 Comm: insmod Not tainted 5.14.0-03771-g98c2059e008a-dirty #1 NIP: 8000000000061c3c LR: 80000000006dea64 CTR: 8000000000061c18 REGS: c0000000032d3420 TRAP: 0700 Not tainted (5.14.0-03771-g98c2059e008a-dirty) MSR: 0000000080089000 <EE,ME> CR: 88002822 XER: 20000000 IRQMASK: 0 <...> NIP [8000000000061c3c] 0x8000000000061c3c LR [80000000006dea64] .__run_one+0x104/0x17c [test_bpf] Call Trace: .__run_one+0x60/0x17c [test_bpf] (unreliable) .test_bpf_init+0x6a8/0xdc8 [test_bpf] .do_one_initcall+0x6c/0x28c .do_init_module+0x68/0x28c .load_module+0x2460/0x2abc .__do_sys_init_module+0x120/0x18c .system_call_exception+0x110/0x1b8 system_call_common+0xf0/0x210 --- interrupt: c00 at 0x101d0acc <...> ---[ end trace 47b2bf19090bb3d0 ]---
Illegal instruction
The illegal instruction turned out to be 'ldbrx' emitted for BPF_FROM_[L|B]E, which was only introduced in ISA v2.06. Guard use of the same and implement an alternative approach for older processors.
Fixes: 156d0e290e969c ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF") Reported-by: Johan Almbladh johan.almbladh@anyfinetworks.com Signed-off-by: Naveen N. Rao naveen.n.rao@linux.vnet.ibm.com Tested-by: Johan Almbladh johan.almbladh@anyfinetworks.com Acked-by: Johan Almbladh johan.almbladh@anyfinetworks.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/d1e51c6fdf572062cf3009a751c3406bda01b832.164146812... Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/include/asm/ppc-opcode.h | 1 + arch/powerpc/net/bpf_jit_comp64.c | 22 +++++++++++++--------- 2 files changed, 14 insertions(+), 9 deletions(-)
diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h index baea657bc8687..bca31a61e57f8 100644 --- a/arch/powerpc/include/asm/ppc-opcode.h +++ b/arch/powerpc/include/asm/ppc-opcode.h @@ -498,6 +498,7 @@ #define PPC_RAW_LDX(r, base, b) (0x7c00002a | ___PPC_RT(r) | ___PPC_RA(base) | ___PPC_RB(b)) #define PPC_RAW_LHZ(r, base, i) (0xa0000000 | ___PPC_RT(r) | ___PPC_RA(base) | IMM_L(i)) #define PPC_RAW_LHBRX(r, base, b) (0x7c00062c | ___PPC_RT(r) | ___PPC_RA(base) | ___PPC_RB(b)) +#define PPC_RAW_LWBRX(r, base, b) (0x7c00042c | ___PPC_RT(r) | ___PPC_RA(base) | ___PPC_RB(b)) #define PPC_RAW_LDBRX(r, base, b) (0x7c000428 | ___PPC_RT(r) | ___PPC_RA(base) | ___PPC_RB(b)) #define PPC_RAW_STWCX(s, a, b) (0x7c00012d | ___PPC_RS(s) | ___PPC_RA(a) | ___PPC_RB(b)) #define PPC_RAW_CMPWI(a, i) (0x2c000000 | ___PPC_RA(a) | IMM_L(i)) diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c index 6057e900a8790..a26a782e8b78e 100644 --- a/arch/powerpc/net/bpf_jit_comp64.c +++ b/arch/powerpc/net/bpf_jit_comp64.c @@ -633,17 +633,21 @@ bpf_alu32_trunc: EMIT(PPC_RAW_MR(dst_reg, b2p[TMP_REG_1])); break; case 64: - /* - * Way easier and faster(?) to store the value - * into stack and then use ldbrx - * - * ctx->seen will be reliable in pass2, but - * the instructions generated will remain the - * same across all passes - */ + /* Store the value to stack and then use byte-reverse loads */ PPC_BPF_STL(dst_reg, 1, bpf_jit_stack_local(ctx)); EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], 1, bpf_jit_stack_local(ctx))); - EMIT(PPC_RAW_LDBRX(dst_reg, 0, b2p[TMP_REG_1])); + if (cpu_has_feature(CPU_FTR_ARCH_206)) { + EMIT(PPC_RAW_LDBRX(dst_reg, 0, b2p[TMP_REG_1])); + } else { + EMIT(PPC_RAW_LWBRX(dst_reg, 0, b2p[TMP_REG_1])); + if (IS_ENABLED(CONFIG_CPU_LITTLE_ENDIAN)) + EMIT(PPC_RAW_SLDI(dst_reg, dst_reg, 32)); + EMIT(PPC_RAW_LI(b2p[TMP_REG_2], 4)); + EMIT(PPC_RAW_LWBRX(b2p[TMP_REG_2], b2p[TMP_REG_2], b2p[TMP_REG_1])); + if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN)) + EMIT(PPC_RAW_SLDI(b2p[TMP_REG_2], b2p[TMP_REG_2], 32)); + EMIT(PPC_RAW_OR(dst_reg, dst_reg, b2p[TMP_REG_2])); + } break; } break;
From: Florian Westphal fw@strlen.de
[ Upstream commit 830af2eba40327abec64325a5b08b1e85c37a2e0 ]
The packet isn't invalid, REPEAT means we're trying again after cleaning out a stale connection, e.g. via tcp tracker.
This caused increases of invalid stat counter in a test case involving frequent connection reuse, even though no packet is actually invalid.
Fixes: 56a62e2218f5 ("netfilter: conntrack: fix NF_REPEAT handling") Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_conntrack_core.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index 4712a90a1820c..7f79974607643 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -1922,15 +1922,17 @@ repeat: pr_debug("nf_conntrack_in: Can't track with proto module\n"); nf_conntrack_put(&ct->ct_general); skb->_nfct = 0; - NF_CT_STAT_INC_ATOMIC(state->net, invalid); - if (ret == -NF_DROP) - NF_CT_STAT_INC_ATOMIC(state->net, drop); /* Special case: TCP tracker reports an attempt to reopen a * closed/aborted connection. We have to go back and create a * fresh conntrack. */ if (ret == -NF_REPEAT) goto repeat; + + NF_CT_STAT_INC_ATOMIC(state->net, invalid); + if (ret == -NF_DROP) + NF_CT_STAT_INC_ATOMIC(state->net, drop); + ret = -ret; goto out; }
From: Randy Dunlap rdunlap@infradead.org
[ Upstream commit eee412e968f7b950564880bc6a7a9f00f49034da ]
When CONFIG_QCOM_AOSS_QMP=m and CONFIG_QCOM_Q6V5_MSS=y, the builtin driver cannot call into the loadable module's low-level service functions. Trying to build with that config combo causes linker errors.
There are two problems here. First, drivers/remoteproc/qcom_q6v5.c should #include <linux/soc/qcom/qcom_aoss.h> for the definitions of the service functions, depending on whether CONFIG_QCOM_AOSS_QMP is set/enabled or not. Second, the qcom remoteproc drivers should depend on QCOM_AOSS_QMP iff it is enabled (=y or =m) so that the qcom remoteproc drivers can be built properly.
This prevents these build errors:
aarch64-linux-ld: drivers/remoteproc/qcom_q6v5.o: in function `q6v5_load_state_toggle': qcom_q6v5.c:(.text+0xc4): undefined reference to `qmp_send' aarch64-linux-ld: drivers/remoteproc/qcom_q6v5.o: in function `qcom_q6v5_deinit': (.text+0x2e4): undefined reference to `qmp_put' aarch64-linux-ld: drivers/remoteproc/qcom_q6v5.o: in function `qcom_q6v5_init': (.text+0x778): undefined reference to `qmp_get' aarch64-linux-ld: (.text+0x7d8): undefined reference to `qmp_put'
Fixes: c1fe10d238c0 ("remoteproc: qcom: q6v5: Use qmp_send to update co-processor load state") Signed-off-by: Randy Dunlap rdunlap@infradead.org Reported-by: kernel test robot lkp@intel.com Cc: Bjorn Andersson bjorn.andersson@linaro.org Cc: Mathieu Poirier mathieu.poirier@linaro.org Cc: linux-remoteproc@vger.kernel.org Cc: Sibi Sankar sibis@codeaurora.org Cc: Stephen Boyd swboyd@chromium.org Reviewed-by: Stephen Boyd swboyd@chromium.org Reviewed-by: Bjorn Andersson bjorn.andersson@linaro.org Signed-off-by: Bjorn Andersson bjorn.andersson@linaro.org Link: https://lore.kernel.org/r/20220115011338.2973-1-rdunlap@infradead.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/remoteproc/Kconfig | 4 ++++ drivers/remoteproc/qcom_q6v5.c | 1 + 2 files changed, 5 insertions(+)
diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig index f2e961f998ca2..341156e2a29b9 100644 --- a/drivers/remoteproc/Kconfig +++ b/drivers/remoteproc/Kconfig @@ -180,6 +180,7 @@ config QCOM_Q6V5_ADSP depends on RPMSG_QCOM_GLINK_SMEM || RPMSG_QCOM_GLINK_SMEM=n depends on QCOM_SYSMON || QCOM_SYSMON=n depends on RPMSG_QCOM_GLINK || RPMSG_QCOM_GLINK=n + depends on QCOM_AOSS_QMP || QCOM_AOSS_QMP=n select MFD_SYSCON select QCOM_PIL_INFO select QCOM_MDT_LOADER @@ -199,6 +200,7 @@ config QCOM_Q6V5_MSS depends on RPMSG_QCOM_GLINK_SMEM || RPMSG_QCOM_GLINK_SMEM=n depends on QCOM_SYSMON || QCOM_SYSMON=n depends on RPMSG_QCOM_GLINK || RPMSG_QCOM_GLINK=n + depends on QCOM_AOSS_QMP || QCOM_AOSS_QMP=n select MFD_SYSCON select QCOM_MDT_LOADER select QCOM_PIL_INFO @@ -218,6 +220,7 @@ config QCOM_Q6V5_PAS depends on RPMSG_QCOM_GLINK_SMEM || RPMSG_QCOM_GLINK_SMEM=n depends on QCOM_SYSMON || QCOM_SYSMON=n depends on RPMSG_QCOM_GLINK || RPMSG_QCOM_GLINK=n + depends on QCOM_AOSS_QMP || QCOM_AOSS_QMP=n select MFD_SYSCON select QCOM_PIL_INFO select QCOM_MDT_LOADER @@ -239,6 +242,7 @@ config QCOM_Q6V5_WCSS depends on RPMSG_QCOM_GLINK_SMEM || RPMSG_QCOM_GLINK_SMEM=n depends on QCOM_SYSMON || QCOM_SYSMON=n depends on RPMSG_QCOM_GLINK || RPMSG_QCOM_GLINK=n + depends on QCOM_AOSS_QMP || QCOM_AOSS_QMP=n select MFD_SYSCON select QCOM_MDT_LOADER select QCOM_PIL_INFO diff --git a/drivers/remoteproc/qcom_q6v5.c b/drivers/remoteproc/qcom_q6v5.c index eada7e34f3af5..442a388f81028 100644 --- a/drivers/remoteproc/qcom_q6v5.c +++ b/drivers/remoteproc/qcom_q6v5.c @@ -10,6 +10,7 @@ #include <linux/platform_device.h> #include <linux/interrupt.h> #include <linux/module.h> +#include <linux/soc/qcom/qcom_aoss.h> #include <linux/soc/qcom/smem.h> #include <linux/soc/qcom/smem_state.h> #include <linux/remoteproc.h>
From: Nicholas Piggin npiggin@gmail.com
[ Upstream commit aee101d7b95a03078945681dd7f7ea5e4a1e7686 ]
Commit 314f6c23dd8d ("powerpc/64s: Mask NIP before checking against SRR0") masked off the low 2 bits of the NIP value in the interrupt stack frame in case they are non-zero and mis-compare against a SRR0 register value of a CPU which always reads back 0 from the 2 low bits which are reserved.
This now causes the opposite problem that an implementation which does implement those bits in SRR0 will mis-compare against the masked NIP value in which they have been cleared. QEMU is one such implementation, and this is allowed by the architecture.
This can be triggered by sigfuz by setting low bits of PT_NIP in the signal context.
Fix this for now by masking the SRR0 bits as well. Cleaner is probably to sanitise these values before putting them in registers or stack, but this is the quick and backportable fix.
Fixes: 314f6c23dd8d ("powerpc/64s: Mask NIP before checking against SRR0") Signed-off-by: Nicholas Piggin npiggin@gmail.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20220117134403.2995059-1-npiggin@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kernel/interrupt_64.S | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S index 4b1ff94e67eb4..4c6d1a8dcefed 100644 --- a/arch/powerpc/kernel/interrupt_64.S +++ b/arch/powerpc/kernel/interrupt_64.S @@ -30,6 +30,7 @@ COMPAT_SYS_CALL_TABLE: .ifc \srr,srr mfspr r11,SPRN_SRR0 ld r12,_NIP(r1) + clrrdi r11,r11,2 clrrdi r12,r12,2 100: tdne r11,r12 EMIT_WARN_ENTRY 100b,__FILE__,__LINE__,(BUGFLAG_WARNING | BUGFLAG_ONCE) @@ -40,6 +41,7 @@ COMPAT_SYS_CALL_TABLE: .else mfspr r11,SPRN_HSRR0 ld r12,_NIP(r1) + clrrdi r11,r11,2 clrrdi r12,r12,2 100: tdne r11,r12 EMIT_WARN_ENTRY 100b,__FILE__,__LINE__,(BUGFLAG_WARNING | BUGFLAG_ONCE)
From: Peter Zijlstra peterz@infradead.org
[ Upstream commit 09f5e7dc7ad705289e1b1ec065439aa3c42951c4 ]
Time readers that cannot take locks (due to NMI etc..) currently make use of perf_event::shadow_ctx_time, which, for that event gives:
time' = now + (time - timestamp)
or, alternatively arranged:
time' = time + (now - timestamp)
IOW, the progression of time since the last time the shadow_ctx_time was updated.
There's problems with this:
A) the shadow_ctx_time is per-event, even though the ctx_time it reflects is obviously per context. The direct concequence of this is that the context needs to iterate all events all the time to keep the shadow_ctx_time in sync.
B) even with the prior point, the context itself might not be active meaning its time should not advance to begin with.
C) shadow_ctx_time isn't consistently updated when ctx_time is
There are 3 users of this stuff, that suffer differently from this:
- calc_timer_values() - perf_output_read() - perf_event_update_userpage() /* A */
- perf_event_read_local() /* A,B */
In particular, perf_output_read() doesn't suffer at all, because it's sample driven and hence only relevant when the event is actually running.
This same was supposed to be true for perf_event_update_userpage(), after all self-monitoring implies the context is active *HOWEVER*, as per commit f79256532682 ("perf/core: fix userpage->time_enabled of inactive events") this goes wrong when combined with counter overcommit, in that case those events that do not get scheduled when the context becomes active (task events typically) miss out on the EVENT_TIME update and ENABLED time is inflated (for a little while) with the time the context was inactive. Once the event gets rotated in, this gets corrected, leading to a non-monotonic timeflow.
perf_event_read_local() made things even worse, it can request time at any point, suffering all the problems perf_event_update_userpage() does and more. Because while perf_event_update_userpage() is limited by the context being active, perf_event_read_local() users have no such constraint.
Therefore, completely overhaul things and do away with perf_event::shadow_ctx_time. Instead have regular context time updates keep track of this offset directly and provide perf_event_time_now() to complement perf_event_time().
perf_event_time_now() will, in adition to being context wide, also take into account if the context is active. For inactive context, it will not advance time.
This latter property means the cgroup perf_cgroup_info context needs to grow addition state to track this.
Additionally, since all this is strictly per-cpu, we can use barrier() to order context activity vs context time.
Fixes: 7d9285e82db5 ("perf/bpf: Extend the perf_event_read_local() interface, a.k.a. "bpf: perf event change needed for subsequent bpf helpers"") Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Tested-by: Song Liu song@kernel.org Tested-by: Namhyung Kim namhyung@kernel.org Link: https://lkml.kernel.org/r/YcB06DasOBtU0b00@hirez.programming.kicks-ass.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/perf_event.h | 15 +-- kernel/events/core.c | 246 ++++++++++++++++++++++--------------- 2 files changed, 149 insertions(+), 112 deletions(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 318c489b735bc..d7f927f8c335d 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -681,18 +681,6 @@ struct perf_event { u64 total_time_running; u64 tstamp;
- /* - * timestamp shadows the actual context timing but it can - * be safely used in NMI interrupt context. It reflects the - * context time as it was when the event was last scheduled in, - * or when ctx_sched_in failed to schedule the event because we - * run out of PMC. - * - * ctx_time already accounts for ctx->timestamp. Therefore to - * compute ctx_time for a sample, simply add perf_clock(). - */ - u64 shadow_ctx_time; - struct perf_event_attr attr; u16 header_size; u16 id_header_size; @@ -839,6 +827,7 @@ struct perf_event_context { */ u64 time; u64 timestamp; + u64 timeoffset;
/* * These fields let us detect when two contexts have both @@ -921,6 +910,8 @@ struct bpf_perf_event_data_kern { struct perf_cgroup_info { u64 time; u64 timestamp; + u64 timeoffset; + int active; };
struct perf_cgroup { diff --git a/kernel/events/core.c b/kernel/events/core.c index 63f0414666438..a0064dd706538 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -674,6 +674,23 @@ perf_event_set_state(struct perf_event *event, enum perf_event_state state) WRITE_ONCE(event->state, state); }
+/* + * UP store-release, load-acquire + */ + +#define __store_release(ptr, val) \ +do { \ + barrier(); \ + WRITE_ONCE(*(ptr), (val)); \ +} while (0) + +#define __load_acquire(ptr) \ +({ \ + __unqual_scalar_typeof(*(ptr)) ___p = READ_ONCE(*(ptr)); \ + barrier(); \ + ___p; \ +}) + #ifdef CONFIG_CGROUP_PERF
static inline bool @@ -719,34 +736,51 @@ static inline u64 perf_cgroup_event_time(struct perf_event *event) return t->time; }
-static inline void __update_cgrp_time(struct perf_cgroup *cgrp) +static inline u64 perf_cgroup_event_time_now(struct perf_event *event, u64 now) { - struct perf_cgroup_info *info; - u64 now; - - now = perf_clock(); + struct perf_cgroup_info *t;
- info = this_cpu_ptr(cgrp->info); + t = per_cpu_ptr(event->cgrp->info, event->cpu); + if (!__load_acquire(&t->active)) + return t->time; + now += READ_ONCE(t->timeoffset); + return now; +}
- info->time += now - info->timestamp; +static inline void __update_cgrp_time(struct perf_cgroup_info *info, u64 now, bool adv) +{ + if (adv) + info->time += now - info->timestamp; info->timestamp = now; + /* + * see update_context_time() + */ + WRITE_ONCE(info->timeoffset, info->time - info->timestamp); }
-static inline void update_cgrp_time_from_cpuctx(struct perf_cpu_context *cpuctx) +static inline void update_cgrp_time_from_cpuctx(struct perf_cpu_context *cpuctx, bool final) { struct perf_cgroup *cgrp = cpuctx->cgrp; struct cgroup_subsys_state *css; + struct perf_cgroup_info *info;
if (cgrp) { + u64 now = perf_clock(); + for (css = &cgrp->css; css; css = css->parent) { cgrp = container_of(css, struct perf_cgroup, css); - __update_cgrp_time(cgrp); + info = this_cpu_ptr(cgrp->info); + + __update_cgrp_time(info, now, true); + if (final) + __store_release(&info->active, 0); } } }
static inline void update_cgrp_time_from_event(struct perf_event *event) { + struct perf_cgroup_info *info; struct perf_cgroup *cgrp;
/* @@ -760,8 +794,10 @@ static inline void update_cgrp_time_from_event(struct perf_event *event) /* * Do not update time when cgroup is not active */ - if (cgroup_is_descendant(cgrp->css.cgroup, event->cgrp->css.cgroup)) - __update_cgrp_time(event->cgrp); + if (cgroup_is_descendant(cgrp->css.cgroup, event->cgrp->css.cgroup)) { + info = this_cpu_ptr(event->cgrp->info); + __update_cgrp_time(info, perf_clock(), true); + } }
static inline void @@ -785,7 +821,8 @@ perf_cgroup_set_timestamp(struct task_struct *task, for (css = &cgrp->css; css; css = css->parent) { cgrp = container_of(css, struct perf_cgroup, css); info = this_cpu_ptr(cgrp->info); - info->timestamp = ctx->timestamp; + __update_cgrp_time(info, ctx->timestamp, false); + __store_release(&info->active, 1); } }
@@ -981,14 +1018,6 @@ out: return ret; }
-static inline void -perf_cgroup_set_shadow_time(struct perf_event *event, u64 now) -{ - struct perf_cgroup_info *t; - t = per_cpu_ptr(event->cgrp->info, event->cpu); - event->shadow_ctx_time = now - t->timestamp; -} - static inline void perf_cgroup_event_enable(struct perf_event *event, struct perf_event_context *ctx) { @@ -1066,7 +1095,8 @@ static inline void update_cgrp_time_from_event(struct perf_event *event) { }
-static inline void update_cgrp_time_from_cpuctx(struct perf_cpu_context *cpuctx) +static inline void update_cgrp_time_from_cpuctx(struct perf_cpu_context *cpuctx, + bool final) { }
@@ -1098,12 +1128,12 @@ perf_cgroup_switch(struct task_struct *task, struct task_struct *next) { }
-static inline void -perf_cgroup_set_shadow_time(struct perf_event *event, u64 now) +static inline u64 perf_cgroup_event_time(struct perf_event *event) { + return 0; }
-static inline u64 perf_cgroup_event_time(struct perf_event *event) +static inline u64 perf_cgroup_event_time_now(struct perf_event *event, u64 now) { return 0; } @@ -1525,22 +1555,59 @@ static void perf_unpin_context(struct perf_event_context *ctx) /* * Update the record of the current time in a context. */ -static void update_context_time(struct perf_event_context *ctx) +static void __update_context_time(struct perf_event_context *ctx, bool adv) { u64 now = perf_clock();
- ctx->time += now - ctx->timestamp; + if (adv) + ctx->time += now - ctx->timestamp; ctx->timestamp = now; + + /* + * The above: time' = time + (now - timestamp), can be re-arranged + * into: time` = now + (time - timestamp), which gives a single value + * offset to compute future time without locks on. + * + * See perf_event_time_now(), which can be used from NMI context where + * it's (obviously) not possible to acquire ctx->lock in order to read + * both the above values in a consistent manner. + */ + WRITE_ONCE(ctx->timeoffset, ctx->time - ctx->timestamp); +} + +static void update_context_time(struct perf_event_context *ctx) +{ + __update_context_time(ctx, true); }
static u64 perf_event_time(struct perf_event *event) { struct perf_event_context *ctx = event->ctx;
+ if (unlikely(!ctx)) + return 0; + if (is_cgroup_event(event)) return perf_cgroup_event_time(event);
- return ctx ? ctx->time : 0; + return ctx->time; +} + +static u64 perf_event_time_now(struct perf_event *event, u64 now) +{ + struct perf_event_context *ctx = event->ctx; + + if (unlikely(!ctx)) + return 0; + + if (is_cgroup_event(event)) + return perf_cgroup_event_time_now(event, now); + + if (!(__load_acquire(&ctx->is_active) & EVENT_TIME)) + return ctx->time; + + now += READ_ONCE(ctx->timeoffset); + return now; }
static enum event_type_t get_event_type(struct perf_event *event) @@ -2346,7 +2413,7 @@ __perf_remove_from_context(struct perf_event *event,
if (ctx->is_active & EVENT_TIME) { update_context_time(ctx); - update_cgrp_time_from_cpuctx(cpuctx); + update_cgrp_time_from_cpuctx(cpuctx, false); }
event_sched_out(event, cpuctx, ctx); @@ -2357,6 +2424,9 @@ __perf_remove_from_context(struct perf_event *event, list_del_event(event, ctx);
if (!ctx->nr_events && ctx->is_active) { + if (ctx == &cpuctx->ctx) + update_cgrp_time_from_cpuctx(cpuctx, true); + ctx->is_active = 0; ctx->rotate_necessary = 0; if (ctx->task) { @@ -2478,40 +2548,6 @@ void perf_event_disable_inatomic(struct perf_event *event) irq_work_queue(&event->pending); }
-static void perf_set_shadow_time(struct perf_event *event, - struct perf_event_context *ctx) -{ - /* - * use the correct time source for the time snapshot - * - * We could get by without this by leveraging the - * fact that to get to this function, the caller - * has most likely already called update_context_time() - * and update_cgrp_time_xx() and thus both timestamp - * are identical (or very close). Given that tstamp is, - * already adjusted for cgroup, we could say that: - * tstamp - ctx->timestamp - * is equivalent to - * tstamp - cgrp->timestamp. - * - * Then, in perf_output_read(), the calculation would - * work with no changes because: - * - event is guaranteed scheduled in - * - no scheduled out in between - * - thus the timestamp would be the same - * - * But this is a bit hairy. - * - * So instead, we have an explicit cgroup call to remain - * within the time source all along. We believe it - * is cleaner and simpler to understand. - */ - if (is_cgroup_event(event)) - perf_cgroup_set_shadow_time(event, event->tstamp); - else - event->shadow_ctx_time = event->tstamp - ctx->timestamp; -} - #define MAX_INTERRUPTS (~0ULL)
static void perf_log_throttle(struct perf_event *event, int enable); @@ -2552,8 +2588,6 @@ event_sched_in(struct perf_event *event,
perf_pmu_disable(event->pmu);
- perf_set_shadow_time(event, ctx); - perf_log_itrace_start(event);
if (event->pmu->add(event, PERF_EF_START)) { @@ -3247,16 +3281,6 @@ static void ctx_sched_out(struct perf_event_context *ctx, return; }
- ctx->is_active &= ~event_type; - if (!(ctx->is_active & EVENT_ALL)) - ctx->is_active = 0; - - if (ctx->task) { - WARN_ON_ONCE(cpuctx->task_ctx != ctx); - if (!ctx->is_active) - cpuctx->task_ctx = NULL; - } - /* * Always update time if it was set; not only when it changes. * Otherwise we can 'forget' to update time for any but the last @@ -3270,7 +3294,22 @@ static void ctx_sched_out(struct perf_event_context *ctx, if (is_active & EVENT_TIME) { /* update (and stop) ctx time */ update_context_time(ctx); - update_cgrp_time_from_cpuctx(cpuctx); + update_cgrp_time_from_cpuctx(cpuctx, ctx == &cpuctx->ctx); + /* + * CPU-release for the below ->is_active store, + * see __load_acquire() in perf_event_time_now() + */ + barrier(); + } + + ctx->is_active &= ~event_type; + if (!(ctx->is_active & EVENT_ALL)) + ctx->is_active = 0; + + if (ctx->task) { + WARN_ON_ONCE(cpuctx->task_ctx != ctx); + if (!ctx->is_active) + cpuctx->task_ctx = NULL; }
is_active ^= ctx->is_active; /* changed bits */ @@ -3707,13 +3746,19 @@ static noinline int visit_groups_merge(struct perf_cpu_context *cpuctx, return 0; }
+/* + * Because the userpage is strictly per-event (there is no concept of context, + * so there cannot be a context indirection), every userpage must be updated + * when context time starts :-( + * + * IOW, we must not miss EVENT_TIME edges. + */ static inline bool event_update_userpage(struct perf_event *event) { if (likely(!atomic_read(&event->mmap_count))) return false;
perf_event_update_time(event); - perf_set_shadow_time(event, event->ctx); perf_event_update_userpage(event);
return true; @@ -3797,13 +3842,23 @@ ctx_sched_in(struct perf_event_context *ctx, struct task_struct *task) { int is_active = ctx->is_active; - u64 now;
lockdep_assert_held(&ctx->lock);
if (likely(!ctx->nr_events)) return;
+ if (is_active ^ EVENT_TIME) { + /* start ctx time */ + __update_context_time(ctx, false); + perf_cgroup_set_timestamp(task, ctx); + /* + * CPU-release for the below ->is_active store, + * see __load_acquire() in perf_event_time_now() + */ + barrier(); + } + ctx->is_active |= (event_type | EVENT_TIME); if (ctx->task) { if (!is_active) @@ -3814,13 +3869,6 @@ ctx_sched_in(struct perf_event_context *ctx,
is_active ^= ctx->is_active; /* changed bits */
- if (is_active & EVENT_TIME) { - /* start ctx time */ - now = perf_clock(); - ctx->timestamp = now; - perf_cgroup_set_timestamp(task, ctx); - } - /* * First go through the list and put on any pinned groups * in order to give them the best chance of going on. @@ -4414,6 +4462,18 @@ static inline u64 perf_event_count(struct perf_event *event) return local64_read(&event->count) + atomic64_read(&event->child_count); }
+static void calc_timer_values(struct perf_event *event, + u64 *now, + u64 *enabled, + u64 *running) +{ + u64 ctx_time; + + *now = perf_clock(); + ctx_time = perf_event_time_now(event, *now); + __perf_update_times(event, ctx_time, enabled, running); +} + /* * NMI-safe method to read a local event, that is an event that * is: @@ -4473,10 +4533,9 @@ int perf_event_read_local(struct perf_event *event, u64 *value,
*value = local64_read(&event->count); if (enabled || running) { - u64 now = event->shadow_ctx_time + perf_clock(); - u64 __enabled, __running; + u64 __enabled, __running, __now;;
- __perf_update_times(event, now, &__enabled, &__running); + calc_timer_values(event, &__now, &__enabled, &__running); if (enabled) *enabled = __enabled; if (running) @@ -5798,18 +5857,6 @@ static int perf_event_index(struct perf_event *event) return event->pmu->event_idx(event); }
-static void calc_timer_values(struct perf_event *event, - u64 *now, - u64 *enabled, - u64 *running) -{ - u64 ctx_time; - - *now = perf_clock(); - ctx_time = event->shadow_ctx_time + *now; - __perf_update_times(event, ctx_time, enabled, running); -} - static void perf_event_init_userpage(struct perf_event *event) { struct perf_event_mmap_page *userpg; @@ -6349,7 +6396,6 @@ accounting: ring_buffer_attach(event, rb);
perf_event_update_time(event); - perf_set_shadow_time(event, event->ctx); perf_event_init_userpage(event); perf_event_update_userpage(event); } else {
From: Vincent Guittot vincent.guittot@linaro.org
[ Upstream commit 98b0d890220d45418cfbc5157b3382e6da5a12ab ]
Rick reported performance regressions in bugzilla because of cpu frequency being lower than before: https://bugzilla.kernel.org/show_bug.cgi?id=215045
He bisected the problem to: commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
This commit forces util_sum to be synced with the new util_avg after removing the contribution of a task and before the next periodic sync. By doing so util_sum is rounded to its lower bound and might lost up to LOAD_AVG_MAX-1 of accumulated contribution which has not yet been reflected in util_avg.
Instead of always setting util_sum to the low bound of util_avg, which can significantly lower the utilization of root cfs_rq after propagating the change down into the hierarchy, we revert the change of util_sum and propagate the difference.
In addition, we also check that cfs's util_sum always stays above the lower bound for a given util_avg as it has been observed that sched_entity's util_sum is sometimes above cfs one.
Fixes: 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent") Reported-by: Rick Yiu rickyiu@google.com Signed-off-by: Vincent Guittot vincent.guittot@linaro.org Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Dietmar Eggemann dietmar.eggemann@arm.com Tested-by: Sachin Sant sachinp@linux.ibm.com Link: https://lkml.kernel.org/r/20220111134659.24961-2-vincent.guittot@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/sched/fair.c | 16 +++++++++++++--- kernel/sched/pelt.h | 4 +++- 2 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index f2cf047b25e56..069e01772d922 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3382,7 +3382,6 @@ void set_task_rq_fair(struct sched_entity *se, se->avg.last_update_time = n_last_update_time; }
- /* * When on migration a sched_entity joins/leaves the PELT hierarchy, we need to * propagate its contribution. The key to this propagation is the invariant @@ -3450,7 +3449,6 @@ void set_task_rq_fair(struct sched_entity *se, * XXX: only do this for the part of runnable > running ? * */ - static inline void update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq) { @@ -3682,7 +3680,19 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
r = removed_util; sub_positive(&sa->util_avg, r); - sa->util_sum = sa->util_avg * divider; + sub_positive(&sa->util_sum, r * divider); + /* + * Because of rounding, se->util_sum might ends up being +1 more than + * cfs->util_sum. Although this is not a problem by itself, detaching + * a lot of tasks with the rounding problem between 2 updates of + * util_avg (~1ms) can make cfs->util_sum becoming null whereas + * cfs_util_avg is not. + * Check that util_sum is still above its lower bound for the new + * util_avg. Given that period_contrib might have moved since the last + * sync, we are only sure that util_sum must be above or equal to + * util_avg * minimum possible divider + */ + sa->util_sum = max_t(u32, sa->util_sum, sa->util_avg * PELT_MIN_DIVIDER);
r = removed_runnable; sub_positive(&sa->runnable_avg, r); diff --git a/kernel/sched/pelt.h b/kernel/sched/pelt.h index e06071bf3472c..c336f5f481bca 100644 --- a/kernel/sched/pelt.h +++ b/kernel/sched/pelt.h @@ -37,9 +37,11 @@ update_irq_load_avg(struct rq *rq, u64 running) } #endif
+#define PELT_MIN_DIVIDER (LOAD_AVG_MAX - 1024) + static inline u32 get_pelt_divider(struct sched_avg *avg) { - return LOAD_AVG_MAX - 1024 + avg->period_contrib; + return PELT_MIN_DIVIDER + avg->period_contrib; }
static inline void cfs_se_util_change(struct sched_avg *avg)
From: Robert Hancock robert.hancock@calian.com
[ Upstream commit d15c7e875d44367005370e6a82e8f3a382a04f9b ]
A problem was encountered with the Bel-Fuse 1GBT-SFP05 SFP module (which is a 1 Gbps copper module operating in SGMII mode with an internal BCM54616S PHY device) using the Xilinx AXI Ethernet MAC core, where the module would work properly on the initial insertion or boot of the device, but after the device was rebooted, the link would either only come up at 100 Mbps speeds or go up and down erratically.
I found no meaningful changes in the PHY configuration registers between the working and non-working boots, but the status registers seemed to have a lot of error indications set on the SERDES side of the device on the non-working boot. I suspect the problem is that whatever happens on the SGMII link when the device is rebooted and the FPGA logic gets reloaded ends up putting the module's onboard PHY into a bad state.
Since commit 6e2d85ec0559 ("net: phy: Stop with excessive soft reset") the genphy_soft_reset call is not made automatically by the PHY core unless the callback is explicitly specified in the driver structure. For most of these Broadcom devices, there is probably a hardware reset that gets asserted to reset the PHY during boot, however for SFP modules (where the BCM54616S is commonly found) no such reset line exists, so if the board keeps the SFP cage powered up across a reboot, it will end up with no reset occurring during reboots.
Hook up the genphy_soft_reset callback for BCM54616S to ensure that a PHY reset is performed before the device is initialized. This appears to fix the issue with erratic operation after a reboot with this SFP module.
Fixes: 6e2d85ec0559 ("net: phy: Stop with excessive soft reset") Signed-off-by: Robert Hancock robert.hancock@calian.com Reviewed-by: Florian Fainelli f.fainelli@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/phy/broadcom.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/phy/broadcom.c b/drivers/net/phy/broadcom.c index bb5104ae46104..3c683e0e40e9e 100644 --- a/drivers/net/phy/broadcom.c +++ b/drivers/net/phy/broadcom.c @@ -854,6 +854,7 @@ static struct phy_driver broadcom_drivers[] = { .phy_id_mask = 0xfffffff0, .name = "Broadcom BCM54616S", /* PHY_GBIT_FEATURES */ + .soft_reset = genphy_soft_reset, .config_init = bcm54xx_config_init, .config_aneg = bcm54616s_config_aneg, .config_intr = bcm_phy_config_intr,
From: Moshe Tal moshet@nvidia.com
[ Upstream commit e2f08207c558bc0bc8abaa557cdb29bad776ac7b ]
The link extended sub-states are assigned as enum that is an integer size but read from a union as u8, this is working for small values on little endian systems but for big endian this always give 0. Fix the variable in the union to match the enum size.
Fixes: ecc31c60240b ("ethtool: Add link extended state") Signed-off-by: Moshe Tal moshet@nvidia.com Reviewed-by: Ido Schimmel idosch@nvidia.com Tested-by: Ido Schimmel idosch@nvidia.com Reviewed-by: Gal Pressman gal@nvidia.com Reviewed-by: Amit Cohen amcohen@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/ethtool.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h index 845a0ffc16ee8..d8f07baf272ad 100644 --- a/include/linux/ethtool.h +++ b/include/linux/ethtool.h @@ -95,7 +95,7 @@ struct ethtool_link_ext_state_info { enum ethtool_link_ext_substate_bad_signal_integrity bad_signal_integrity; enum ethtool_link_ext_substate_cable_issue cable_issue; enum ethtool_link_ext_substate_module module; - u8 __link_ext_substate; + u32 __link_ext_substate; }; };
From: Yuji Ishikawa yuji2.ishikawa@toshiba.co.jp
[ Upstream commit 1ba1a4a90fa416a6f389206416c5f488cf8b1543 ]
just 0 should be used to represent cleared bits
* ETHER_CLK_SEL_DIV_SEL_20 * ETHER_CLK_SEL_TX_CLK_EXT_SEL_IN * ETHER_CLK_SEL_RX_CLK_EXT_SEL_IN * ETHER_CLK_SEL_TX_CLK_O_TX_I * ETHER_CLK_SEL_RMII_CLK_SEL_IN
Fixes: b38dd98ff8d0 ("net: stmmac: Add Toshiba Visconti SoCs glue driver") Signed-off-by: Yuji Ishikawa yuji2.ishikawa@toshiba.co.jp Reviewed-by: Nobuhiro Iwamatsu nobuhiro1.iwamatsu@toshiba.co.jp Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/stmicro/stmmac/dwmac-visconti.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-visconti.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-visconti.c index e2e0f977875d7..43a446ceadf7a 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-visconti.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-visconti.c @@ -22,21 +22,21 @@ #define ETHER_CLK_SEL_RMII_CLK_EN BIT(2) #define ETHER_CLK_SEL_RMII_CLK_RST BIT(3) #define ETHER_CLK_SEL_DIV_SEL_2 BIT(4) -#define ETHER_CLK_SEL_DIV_SEL_20 BIT(0) +#define ETHER_CLK_SEL_DIV_SEL_20 0 #define ETHER_CLK_SEL_FREQ_SEL_125M (BIT(9) | BIT(8)) #define ETHER_CLK_SEL_FREQ_SEL_50M BIT(9) #define ETHER_CLK_SEL_FREQ_SEL_25M BIT(8) #define ETHER_CLK_SEL_FREQ_SEL_2P5M 0 -#define ETHER_CLK_SEL_TX_CLK_EXT_SEL_IN BIT(0) +#define ETHER_CLK_SEL_TX_CLK_EXT_SEL_IN 0 #define ETHER_CLK_SEL_TX_CLK_EXT_SEL_TXC BIT(10) #define ETHER_CLK_SEL_TX_CLK_EXT_SEL_DIV BIT(11) -#define ETHER_CLK_SEL_RX_CLK_EXT_SEL_IN BIT(0) +#define ETHER_CLK_SEL_RX_CLK_EXT_SEL_IN 0 #define ETHER_CLK_SEL_RX_CLK_EXT_SEL_RXC BIT(12) #define ETHER_CLK_SEL_RX_CLK_EXT_SEL_DIV BIT(13) -#define ETHER_CLK_SEL_TX_CLK_O_TX_I BIT(0) +#define ETHER_CLK_SEL_TX_CLK_O_TX_I 0 #define ETHER_CLK_SEL_TX_CLK_O_RMII_I BIT(14) #define ETHER_CLK_SEL_TX_O_E_N_IN BIT(15) -#define ETHER_CLK_SEL_RMII_CLK_SEL_IN BIT(0) +#define ETHER_CLK_SEL_RMII_CLK_SEL_IN 0 #define ETHER_CLK_SEL_RMII_CLK_SEL_RX_C BIT(16)
#define ETHER_CLK_SEL_RX_TX_CLK_EN (ETHER_CLK_SEL_RX_CLK_EN | ETHER_CLK_SEL_TX_CLK_EN)
From: Yuji Ishikawa yuji2.ishikawa@toshiba.co.jp
[ Upstream commit 0959bc4bd4206433ed101a1332a23e93ad16ec77 ]
Bit pattern of the ETHER_CLOCK_SEL register for RMII/MII mode should be fixed. Also, some control bits should be modified with a specific sequence.
Fixes: b38dd98ff8d0 ("net: stmmac: Add Toshiba Visconti SoCs glue driver") Signed-off-by: Yuji Ishikawa yuji2.ishikawa@toshiba.co.jp Reviewed-by: Nobuhiro Iwamatsu nobuhiro1.iwamatsu@toshiba.co.jp Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- .../ethernet/stmicro/stmmac/dwmac-visconti.c | 32 ++++++++++++------- 1 file changed, 21 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-visconti.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-visconti.c index 43a446ceadf7a..dde5b772a5af7 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-visconti.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-visconti.c @@ -96,31 +96,41 @@ static void visconti_eth_fix_mac_speed(void *priv, unsigned int speed) val |= ETHER_CLK_SEL_TX_O_E_N_IN; writel(val, dwmac->reg + REG_ETHER_CLOCK_SEL);
+ /* Set Clock-Mux, Start clock, Set TX_O direction */ switch (dwmac->phy_intf_sel) { case ETHER_CONFIG_INTF_RGMII: val = clk_sel_val | ETHER_CLK_SEL_RX_CLK_EXT_SEL_RXC; + writel(val, dwmac->reg + REG_ETHER_CLOCK_SEL); + + val |= ETHER_CLK_SEL_RX_TX_CLK_EN; + writel(val, dwmac->reg + REG_ETHER_CLOCK_SEL); + + val &= ~ETHER_CLK_SEL_TX_O_E_N_IN; + writel(val, dwmac->reg + REG_ETHER_CLOCK_SEL); break; case ETHER_CONFIG_INTF_RMII: val = clk_sel_val | ETHER_CLK_SEL_RX_CLK_EXT_SEL_DIV | - ETHER_CLK_SEL_TX_CLK_EXT_SEL_TXC | ETHER_CLK_SEL_TX_O_E_N_IN | + ETHER_CLK_SEL_TX_CLK_EXT_SEL_DIV | ETHER_CLK_SEL_TX_O_E_N_IN | ETHER_CLK_SEL_RMII_CLK_SEL_RX_C; + writel(val, dwmac->reg + REG_ETHER_CLOCK_SEL); + + val |= ETHER_CLK_SEL_RMII_CLK_RST; + writel(val, dwmac->reg + REG_ETHER_CLOCK_SEL); + + val |= ETHER_CLK_SEL_RMII_CLK_EN | ETHER_CLK_SEL_RX_TX_CLK_EN; + writel(val, dwmac->reg + REG_ETHER_CLOCK_SEL); break; case ETHER_CONFIG_INTF_MII: default: val = clk_sel_val | ETHER_CLK_SEL_RX_CLK_EXT_SEL_RXC | - ETHER_CLK_SEL_TX_CLK_EXT_SEL_DIV | ETHER_CLK_SEL_TX_O_E_N_IN | - ETHER_CLK_SEL_RMII_CLK_EN; + ETHER_CLK_SEL_TX_CLK_EXT_SEL_TXC | ETHER_CLK_SEL_TX_O_E_N_IN; + writel(val, dwmac->reg + REG_ETHER_CLOCK_SEL); + + val |= ETHER_CLK_SEL_RX_TX_CLK_EN; + writel(val, dwmac->reg + REG_ETHER_CLOCK_SEL); break; }
- /* Start clock */ - writel(val, dwmac->reg + REG_ETHER_CLOCK_SEL); - val |= ETHER_CLK_SEL_RX_TX_CLK_EN; - writel(val, dwmac->reg + REG_ETHER_CLOCK_SEL); - - val &= ~ETHER_CLK_SEL_TX_O_E_N_IN; - writel(val, dwmac->reg + REG_ETHER_CLOCK_SEL); - spin_unlock_irqrestore(&dwmac->lock, flags); }
From: Marek Behún kabel@kernel.org
[ Upstream commit cbda1b16687580d5beee38273f6241ae3725960c ]
Commit bafbdd527d56 ("phylib: Add device reset GPIO support") added call to phy_device_reset(phydev) after the put_device() call in phy_detach().
The comment before the put_device() call says that the phydev might go away with put_device().
Fix potential use-after-free by calling phy_device_reset() before put_device().
Fixes: bafbdd527d56 ("phylib: Add device reset GPIO support") Signed-off-by: Marek Behún kabel@kernel.org Reviewed-by: Andrew Lunn andrew@lunn.ch Link: https://lore.kernel.org/r/20220119162748.32418-1-kabel@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/phy/phy_device.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index 74d8e1dc125f8..ce0bb5951b81e 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -1746,6 +1746,9 @@ void phy_detach(struct phy_device *phydev) phy_driver_is_genphy_10g(phydev)) device_release_driver(&phydev->mdio.dev);
+ /* Assert the reset signal */ + phy_device_reset(phydev, 1); + /* * The phydev might go away on the put_device() below, so avoid * a use-after-free bug by reading the underlying bus first. @@ -1757,9 +1760,6 @@ void phy_detach(struct phy_device *phydev) ndev_owner = dev->dev.parent->driver->owner; if (ndev_owner != bus->owner) module_put(bus->owner); - - /* Assert the reset signal */ - phy_device_reset(phydev, 1); } EXPORT_SYMBOL(phy_detach);
From: Davide Caratti dcaratti@redhat.com
[ Upstream commit 602837e8479d20d49559b4b97b79d34c0efe7ecb ]
a non-zero 'id' is sufficient to identify MPTCP endpoints: allow changing the value of 'backup' bit by simply specifying the endpoint id.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/158 Signed-off-by: Davide Caratti dcaratti@redhat.com Signed-off-by: Mat Martineau mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/mptcp/pm_netlink.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c index 65764c8171b37..d18b13e3e74c6 100644 --- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -1711,22 +1711,28 @@ next:
static int mptcp_nl_cmd_set_flags(struct sk_buff *skb, struct genl_info *info) { + struct mptcp_pm_addr_entry addr = { .addr = { .family = AF_UNSPEC }, }, *entry; struct nlattr *attr = info->attrs[MPTCP_PM_ATTR_ADDR]; struct pm_nl_pernet *pernet = genl_info_pm_nl(info); - struct mptcp_pm_addr_entry addr, *entry; struct net *net = sock_net(skb->sk); - u8 bkup = 0; + u8 bkup = 0, lookup_by_id = 0; int ret;
- ret = mptcp_pm_parse_addr(attr, info, true, &addr); + ret = mptcp_pm_parse_addr(attr, info, false, &addr); if (ret < 0) return ret;
if (addr.flags & MPTCP_PM_ADDR_FLAG_BACKUP) bkup = 1; + if (addr.addr.family == AF_UNSPEC) { + lookup_by_id = 1; + if (!addr.addr.id) + return -EOPNOTSUPP; + }
list_for_each_entry(entry, &pernet->local_addr_list, list) { - if (addresses_equal(&entry->addr, &addr.addr, true)) { + if ((!lookup_by_id && addresses_equal(&entry->addr, &addr.addr, true)) || + (lookup_by_id && entry->addr.id == addr.addr.id)) { mptcp_nl_addr_backup(net, &entry->addr, bkup);
if (bkup)
On Mon, 31 Jan 2022, Greg Kroah-Hartman wrote:
From: Davide Caratti dcaratti@redhat.com
[ Upstream commit 602837e8479d20d49559b4b97b79d34c0efe7ecb ]
Please drop this from both the 5.15 and 5.16 queues. This patch adds a new feature (doesn't appear to meet the stable rules). It is fairly self contained and probably won't break anything, but it wasn't intended to be backported.
Thanks, Mat
a non-zero 'id' is sufficient to identify MPTCP endpoints: allow changing the value of 'backup' bit by simply specifying the endpoint id.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/158 Signed-off-by: Davide Caratti dcaratti@redhat.com Signed-off-by: Mat Martineau mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org
net/mptcp/pm_netlink.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c index 65764c8171b37..d18b13e3e74c6 100644 --- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -1711,22 +1711,28 @@ next:
static int mptcp_nl_cmd_set_flags(struct sk_buff *skb, struct genl_info *info) {
- struct mptcp_pm_addr_entry addr = { .addr = { .family = AF_UNSPEC }, }, *entry; struct nlattr *attr = info->attrs[MPTCP_PM_ATTR_ADDR]; struct pm_nl_pernet *pernet = genl_info_pm_nl(info);
- struct mptcp_pm_addr_entry addr, *entry; struct net *net = sock_net(skb->sk);
- u8 bkup = 0;
- u8 bkup = 0, lookup_by_id = 0; int ret;
- ret = mptcp_pm_parse_addr(attr, info, true, &addr);
ret = mptcp_pm_parse_addr(attr, info, false, &addr); if (ret < 0) return ret;
if (addr.flags & MPTCP_PM_ADDR_FLAG_BACKUP) bkup = 1;
if (addr.addr.family == AF_UNSPEC) {
lookup_by_id = 1;
if (!addr.addr.id)
return -EOPNOTSUPP;
}
list_for_each_entry(entry, &pernet->local_addr_list, list) {
if (addresses_equal(&entry->addr, &addr.addr, true)) {
if ((!lookup_by_id && addresses_equal(&entry->addr, &addr.addr, true)) ||
(lookup_by_id && entry->addr.id == addr.addr.id)) { mptcp_nl_addr_backup(net, &entry->addr, bkup); if (bkup)
-- 2.34.1
-- Mat Martineau Intel
From: Jean Sacren sakiwit@gmail.com
[ Upstream commit 59060a47ca50bbdb1d863b73667a1065873ecc06 ]
entry->addr.id is u8 with a range from 0 to 255 and MAX_ADDR_ID is 255. We should drop both false expressions of (entry->addr.id > MAX_ADDR_ID).
We should also remove the obsolete parentheses in the first if branch.
Use U8_MAX for MAX_ADDR_ID and add a comment to show the link to mptcp_addr_info.id as suggested by Mr. Matthieu Baerts.
Reviewed-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jean Sacren sakiwit@gmail.com Signed-off-by: Mat Martineau mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/mptcp/pm_netlink.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c index d18b13e3e74c6..27427aeeee0e5 100644 --- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -38,7 +38,8 @@ struct mptcp_pm_add_entry { u8 retrans_times; };
-#define MAX_ADDR_ID 255 +/* max value of mptcp_addr_info.id */ +#define MAX_ADDR_ID U8_MAX #define BITMAP_SZ DIV_ROUND_UP(MAX_ADDR_ID + 1, BITS_PER_LONG)
struct pm_nl_pernet { @@ -831,14 +832,13 @@ find_next: entry->addr.id = find_next_zero_bit(pernet->id_bitmap, MAX_ADDR_ID + 1, pernet->next_id); - if ((!entry->addr.id || entry->addr.id > MAX_ADDR_ID) && - pernet->next_id != 1) { + if (!entry->addr.id && pernet->next_id != 1) { pernet->next_id = 1; goto find_next; } }
- if (!entry->addr.id || entry->addr.id > MAX_ADDR_ID) + if (!entry->addr.id) goto out;
__set_bit(entry->addr.id, pernet->id_bitmap);
On Mon, 31 Jan 2022, Greg Kroah-Hartman wrote:
From: Jean Sacren sakiwit@gmail.com
[ Upstream commit 59060a47ca50bbdb1d863b73667a1065873ecc06 ]
Please drop this from the stable queue for both 5.15 and 5.16. This is a code cleanup (no functional change) patch that was originally merged in net-next and then got selected for stable.
It's pretty harmless to backport this one, but I hope this feedback is useful for tuning your scripts or manual patch review processes. If it's more helpful for me to let something like this slide by, or I'm misunderstanding how this might belong in the stable trees, I am likewise open to feedback!
Thanks, Mat
entry->addr.id is u8 with a range from 0 to 255 and MAX_ADDR_ID is 255. We should drop both false expressions of (entry->addr.id > MAX_ADDR_ID).
We should also remove the obsolete parentheses in the first if branch.
Use U8_MAX for MAX_ADDR_ID and add a comment to show the link to mptcp_addr_info.id as suggested by Mr. Matthieu Baerts.
Reviewed-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Jean Sacren sakiwit@gmail.com Signed-off-by: Mat Martineau mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org
net/mptcp/pm_netlink.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c index d18b13e3e74c6..27427aeeee0e5 100644 --- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -38,7 +38,8 @@ struct mptcp_pm_add_entry { u8 retrans_times; };
-#define MAX_ADDR_ID 255 +/* max value of mptcp_addr_info.id */ +#define MAX_ADDR_ID U8_MAX #define BITMAP_SZ DIV_ROUND_UP(MAX_ADDR_ID + 1, BITS_PER_LONG)
struct pm_nl_pernet { @@ -831,14 +832,13 @@ find_next: entry->addr.id = find_next_zero_bit(pernet->id_bitmap, MAX_ADDR_ID + 1, pernet->next_id);
if ((!entry->addr.id || entry->addr.id > MAX_ADDR_ID) &&
pernet->next_id != 1) {
} }if (!entry->addr.id && pernet->next_id != 1) { pernet->next_id = 1; goto find_next;
- if (!entry->addr.id || entry->addr.id > MAX_ADDR_ID)
if (!entry->addr.id) goto out;
__set_bit(entry->addr.id, pernet->id_bitmap);
-- 2.34.1
-- Mat Martineau Intel
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit 86e39e04482b0aadf3ee3ed5fcf2d63816559d36 ]
Include into the path manager status a bitmap tracking the list of local endpoints still available - not yet used - for the relevant mptcp socket.
Keep such map updated at endpoint creation/deletion time, so that we can easily skip already used endpoint at local address selection time.
The endpoint used by the initial subflow is lazyly accounted at subflow creation time: the usage bitmap is be up2date before endpoint selection and we avoid such unneeded task in some relevant scenarios - e.g. busy servers accepting incoming subflows but not creating any additional ones nor annuncing additional addresses.
Overall this allows for fair local endpoints usage in case of subflow failure.
As a side effect, this patch also enforces that each endpoint is used at most once for each mptcp connection.
Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Mat Martineau mathew.j.martineau@linux.intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/mptcp/pm.c | 1 + net/mptcp/pm_netlink.c | 125 +++++++++++------- net/mptcp/protocol.c | 3 +- net/mptcp/protocol.h | 12 +- .../testing/selftests/net/mptcp/mptcp_join.sh | 5 +- 5 files changed, 91 insertions(+), 55 deletions(-)
diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c index 6ab386ff32944..332ac6eda3ba4 100644 --- a/net/mptcp/pm.c +++ b/net/mptcp/pm.c @@ -370,6 +370,7 @@ void mptcp_pm_data_init(struct mptcp_sock *msk) WRITE_ONCE(msk->pm.accept_subflow, false); WRITE_ONCE(msk->pm.remote_deny_join_id0, false); msk->pm.status = 0; + bitmap_fill(msk->pm.id_avail_bitmap, MPTCP_PM_MAX_ADDR_ID + 1);
spin_lock_init(&msk->pm.lock); INIT_LIST_HEAD(&msk->pm.anno_list); diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c index 27427aeeee0e5..ad3dc9c6c5310 100644 --- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -38,10 +38,6 @@ struct mptcp_pm_add_entry { u8 retrans_times; };
-/* max value of mptcp_addr_info.id */ -#define MAX_ADDR_ID U8_MAX -#define BITMAP_SZ DIV_ROUND_UP(MAX_ADDR_ID + 1, BITS_PER_LONG) - struct pm_nl_pernet { /* protects pernet updates */ spinlock_t lock; @@ -53,14 +49,14 @@ struct pm_nl_pernet { unsigned int local_addr_max; unsigned int subflows_max; unsigned int next_id; - unsigned long id_bitmap[BITMAP_SZ]; + DECLARE_BITMAP(id_bitmap, MPTCP_PM_MAX_ADDR_ID + 1); };
#define MPTCP_PM_ADDR_MAX 8 #define ADD_ADDR_RETRANS_MAX 3
static bool addresses_equal(const struct mptcp_addr_info *a, - struct mptcp_addr_info *b, bool use_port) + const struct mptcp_addr_info *b, bool use_port) { bool addr_equals = false;
@@ -174,6 +170,9 @@ select_local_address(const struct pm_nl_pernet *pernet, if (!(entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW)) continue;
+ if (!test_bit(entry->addr.id, msk->pm.id_avail_bitmap)) + continue; + if (entry->addr.family != sk->sk_family) { #if IS_ENABLED(CONFIG_MPTCP_IPV6) if ((entry->addr.family == AF_INET && @@ -184,23 +183,17 @@ select_local_address(const struct pm_nl_pernet *pernet, continue; }
- /* avoid any address already in use by subflows and - * pending join - */ - if (!lookup_subflow_by_saddr(&msk->conn_list, &entry->addr)) { - ret = entry; - break; - } + ret = entry; + break; } rcu_read_unlock(); return ret; }
static struct mptcp_pm_addr_entry * -select_signal_address(struct pm_nl_pernet *pernet, unsigned int pos) +select_signal_address(struct pm_nl_pernet *pernet, struct mptcp_sock *msk) { struct mptcp_pm_addr_entry *entry, *ret = NULL; - int i = 0;
rcu_read_lock(); /* do not keep any additional per socket state, just signal @@ -209,12 +202,14 @@ select_signal_address(struct pm_nl_pernet *pernet, unsigned int pos) * can lead to additional addresses not being announced. */ list_for_each_entry_rcu(entry, &pernet->local_addr_list, list) { + if (!test_bit(entry->addr.id, msk->pm.id_avail_bitmap)) + continue; + if (!(entry->flags & MPTCP_PM_ADDR_FLAG_SIGNAL)) continue; - if (i++ == pos) { - ret = entry; - break; - } + + ret = entry; + break; } rcu_read_unlock(); return ret; @@ -258,9 +253,11 @@ EXPORT_SYMBOL_GPL(mptcp_pm_get_local_addr_max);
static void check_work_pending(struct mptcp_sock *msk) { - if (msk->pm.add_addr_signaled == mptcp_pm_get_add_addr_signal_max(msk) && - (msk->pm.local_addr_used == mptcp_pm_get_local_addr_max(msk) || - msk->pm.subflows == mptcp_pm_get_subflows_max(msk))) + struct pm_nl_pernet *pernet = net_generic(sock_net((struct sock *)msk), pm_nl_pernet_id); + + if (msk->pm.subflows == mptcp_pm_get_subflows_max(msk) || + (find_next_and_bit(pernet->id_bitmap, msk->pm.id_avail_bitmap, + MPTCP_PM_MAX_ADDR_ID + 1, 0) == MPTCP_PM_MAX_ADDR_ID + 1)) WRITE_ONCE(msk->pm.work_pending, false); }
@@ -460,6 +457,35 @@ static unsigned int fill_remote_addresses_vec(struct mptcp_sock *msk, bool fullm return i; }
+static struct mptcp_pm_addr_entry * +__lookup_addr_by_id(struct pm_nl_pernet *pernet, unsigned int id) +{ + struct mptcp_pm_addr_entry *entry; + + list_for_each_entry(entry, &pernet->local_addr_list, list) { + if (entry->addr.id == id) + return entry; + } + return NULL; +} + +static int +lookup_id_by_addr(struct pm_nl_pernet *pernet, const struct mptcp_addr_info *addr) +{ + struct mptcp_pm_addr_entry *entry; + int ret = -1; + + rcu_read_lock(); + list_for_each_entry(entry, &pernet->local_addr_list, list) { + if (addresses_equal(&entry->addr, addr, entry->addr.port)) { + ret = entry->addr.id; + break; + } + } + rcu_read_unlock(); + return ret; +} + static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk) { struct sock *sk = (struct sock *)msk; @@ -475,6 +501,19 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk) local_addr_max = mptcp_pm_get_local_addr_max(msk); subflows_max = mptcp_pm_get_subflows_max(msk);
+ /* do lazy endpoint usage accounting for the MPC subflows */ + if (unlikely(!(msk->pm.status & BIT(MPTCP_PM_MPC_ENDPOINT_ACCOUNTED))) && msk->first) { + struct mptcp_addr_info local; + int mpc_id; + + local_address((struct sock_common *)msk->first, &local); + mpc_id = lookup_id_by_addr(pernet, &local); + if (mpc_id < 0) + __clear_bit(mpc_id, msk->pm.id_avail_bitmap); + + msk->pm.status |= BIT(MPTCP_PM_MPC_ENDPOINT_ACCOUNTED); + } + pr_debug("local %d:%d signal %d:%d subflows %d:%d\n", msk->pm.local_addr_used, local_addr_max, msk->pm.add_addr_signaled, add_addr_signal_max, @@ -482,21 +521,16 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
/* check first for announce */ if (msk->pm.add_addr_signaled < add_addr_signal_max) { - local = select_signal_address(pernet, - msk->pm.add_addr_signaled); + local = select_signal_address(pernet, msk);
if (local) { if (mptcp_pm_alloc_anno_list(msk, local)) { + __clear_bit(local->addr.id, msk->pm.id_avail_bitmap); msk->pm.add_addr_signaled++; mptcp_pm_announce_addr(msk, &local->addr, false); mptcp_pm_nl_addr_send_ack(msk); } - } else { - /* pick failed, avoid fourther attempts later */ - msk->pm.local_addr_used = add_addr_signal_max; } - - check_work_pending(msk); }
/* check if should create a new subflow */ @@ -510,19 +544,16 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk) int i, nr;
msk->pm.local_addr_used++; - check_work_pending(msk); nr = fill_remote_addresses_vec(msk, fullmesh, addrs); + if (nr) + __clear_bit(local->addr.id, msk->pm.id_avail_bitmap); spin_unlock_bh(&msk->pm.lock); for (i = 0; i < nr; i++) __mptcp_subflow_connect(sk, &local->addr, &addrs[i]); spin_lock_bh(&msk->pm.lock); - return; } - - /* lookup failed, avoid fourther attempts later */ - msk->pm.local_addr_used = local_addr_max; - check_work_pending(msk); } + check_work_pending(msk); }
static void mptcp_pm_nl_fully_established(struct mptcp_sock *msk) @@ -736,6 +767,7 @@ static void mptcp_pm_nl_rm_addr_or_subflow(struct mptcp_sock *msk, msk->pm.subflows--; __MPTCP_INC_STATS(sock_net(sk), rm_type); } + __set_bit(rm_list->ids[1], msk->pm.id_avail_bitmap); if (!removed) continue;
@@ -765,6 +797,9 @@ void mptcp_pm_nl_work(struct mptcp_sock *msk)
msk_owned_by_me(msk);
+ if (!(pm->status & MPTCP_PM_WORK_MASK)) + return; + spin_lock_bh(&msk->pm.lock);
pr_debug("msk=%p status=%x", msk, pm->status); @@ -810,7 +845,7 @@ static int mptcp_pm_nl_append_new_local_addr(struct pm_nl_pernet *pernet, /* to keep the code simple, don't do IDR-like allocation for address ID, * just bail when we exceed limits */ - if (pernet->next_id == MAX_ADDR_ID) + if (pernet->next_id == MPTCP_PM_MAX_ADDR_ID) pernet->next_id = 1; if (pernet->addrs >= MPTCP_PM_ADDR_MAX) goto out; @@ -830,7 +865,7 @@ static int mptcp_pm_nl_append_new_local_addr(struct pm_nl_pernet *pernet, if (!entry->addr.id) { find_next: entry->addr.id = find_next_zero_bit(pernet->id_bitmap, - MAX_ADDR_ID + 1, + MPTCP_PM_MAX_ADDR_ID + 1, pernet->next_id); if (!entry->addr.id && pernet->next_id != 1) { pernet->next_id = 1; @@ -1197,18 +1232,6 @@ static int mptcp_nl_cmd_add_addr(struct sk_buff *skb, struct genl_info *info) return 0; }
-static struct mptcp_pm_addr_entry * -__lookup_addr_by_id(struct pm_nl_pernet *pernet, unsigned int id) -{ - struct mptcp_pm_addr_entry *entry; - - list_for_each_entry(entry, &pernet->local_addr_list, list) { - if (entry->addr.id == id) - return entry; - } - return NULL; -} - int mptcp_pm_get_flags_and_ifindex_by_id(struct net *net, unsigned int id, u8 *flags, int *ifindex) { @@ -1467,7 +1490,7 @@ static int mptcp_nl_cmd_flush_addrs(struct sk_buff *skb, struct genl_info *info) list_splice_init(&pernet->local_addr_list, &free_list); __reset_counters(pernet); pernet->next_id = 1; - bitmap_zero(pernet->id_bitmap, MAX_ADDR_ID + 1); + bitmap_zero(pernet->id_bitmap, MPTCP_PM_MAX_ADDR_ID + 1); spin_unlock_bh(&pernet->lock); mptcp_nl_remove_addrs_list(sock_net(skb->sk), &free_list); synchronize_rcu(); @@ -1577,7 +1600,7 @@ static int mptcp_nl_cmd_dump_addrs(struct sk_buff *msg, pernet = net_generic(net, pm_nl_pernet_id);
spin_lock_bh(&pernet->lock); - for (i = id; i < MAX_ADDR_ID + 1; i++) { + for (i = id; i < MPTCP_PM_MAX_ADDR_ID + 1; i++) { if (test_bit(i, pernet->id_bitmap)) { entry = __lookup_addr_by_id(pernet, i); if (!entry) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 0cd55e4c30fab..70a3cac85f7b8 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -2416,8 +2416,7 @@ static void mptcp_worker(struct work_struct *work)
mptcp_check_fastclose(msk);
- if (msk->pm.status) - mptcp_pm_nl_work(msk); + mptcp_pm_nl_work(msk);
if (test_and_clear_bit(MPTCP_WORK_EOF, &msk->flags)) mptcp_check_for_eof(msk); diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index d87cc040352e3..0387ad9fb43f7 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -174,16 +174,25 @@ enum mptcp_pm_status { MPTCP_PM_ADD_ADDR_SEND_ACK, MPTCP_PM_RM_ADDR_RECEIVED, MPTCP_PM_ESTABLISHED, - MPTCP_PM_ALREADY_ESTABLISHED, /* persistent status, set after ESTABLISHED event */ MPTCP_PM_SUBFLOW_ESTABLISHED, + MPTCP_PM_ALREADY_ESTABLISHED, /* persistent status, set after ESTABLISHED event */ + MPTCP_PM_MPC_ENDPOINT_ACCOUNTED /* persistent status, set after MPC local address is + * accounted int id_avail_bitmap + */ };
+/* Status bits below MPTCP_PM_ALREADY_ESTABLISHED need pm worker actions */ +#define MPTCP_PM_WORK_MASK ((1 << MPTCP_PM_ALREADY_ESTABLISHED) - 1) + enum mptcp_addr_signal_status { MPTCP_ADD_ADDR_SIGNAL, MPTCP_ADD_ADDR_ECHO, MPTCP_RM_ADDR_SIGNAL, };
+/* max value of mptcp_addr_info.id */ +#define MPTCP_PM_MAX_ADDR_ID U8_MAX + struct mptcp_pm_data { struct mptcp_addr_info local; struct mptcp_addr_info remote; @@ -202,6 +211,7 @@ struct mptcp_pm_data { u8 local_addr_used; u8 subflows; u8 status; + DECLARE_BITMAP(id_avail_bitmap, MPTCP_PM_MAX_ADDR_ID + 1); struct mptcp_rm_list rm_list_tx; struct mptcp_rm_list rm_list_rx; }; diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh index 7ef639a9d4a6f..bbafa4cf54538 100755 --- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -1071,7 +1071,10 @@ signal_address_tests() ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags signal ip netns exec $ns2 ./pm_nl_ctl add 10.0.4.2 flags signal run_tests $ns1 $ns2 10.0.1.1 - chk_add_nr 4 4 + + # the server will not signal the address terminating + # the MPC subflow + chk_add_nr 3 3 }
link_failure_tests()
On Mon, 31 Jan 2022, Greg Kroah-Hartman wrote:
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit 86e39e04482b0aadf3ee3ed5fcf2d63816559d36 ]
Hi Greg -
Please drop this from the stable queue for both 5.16 and 5.15. It wasn't intended for backporting and we haven't checked for dependencies with other changes in this part of MPTCP subsystem.
In the mptcp tree we make sure to add Fixes: tags on every patch we think is eligible for the -stable tree. I know you're sifting through a lot of patches from subsystems that end up with important fixes arriving from the "-next" branches, and it seems like the scripts scooped up several MPTCP patches this time around that don't meet the -stable rules.
Thanks,
Mat
Include into the path manager status a bitmap tracking the list of local endpoints still available - not yet used - for the relevant mptcp socket.
Keep such map updated at endpoint creation/deletion time, so that we can easily skip already used endpoint at local address selection time.
The endpoint used by the initial subflow is lazyly accounted at subflow creation time: the usage bitmap is be up2date before endpoint selection and we avoid such unneeded task in some relevant scenarios - e.g. busy servers accepting incoming subflows but not creating any additional ones nor annuncing additional addresses.
Overall this allows for fair local endpoints usage in case of subflow failure.
As a side effect, this patch also enforces that each endpoint is used at most once for each mptcp connection.
Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Mat Martineau mathew.j.martineau@linux.intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org
net/mptcp/pm.c | 1 + net/mptcp/pm_netlink.c | 125 +++++++++++------- net/mptcp/protocol.c | 3 +- net/mptcp/protocol.h | 12 +- .../testing/selftests/net/mptcp/mptcp_join.sh | 5 +- 5 files changed, 91 insertions(+), 55 deletions(-)
diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c index 6ab386ff32944..332ac6eda3ba4 100644 --- a/net/mptcp/pm.c +++ b/net/mptcp/pm.c @@ -370,6 +370,7 @@ void mptcp_pm_data_init(struct mptcp_sock *msk) WRITE_ONCE(msk->pm.accept_subflow, false); WRITE_ONCE(msk->pm.remote_deny_join_id0, false); msk->pm.status = 0;
bitmap_fill(msk->pm.id_avail_bitmap, MPTCP_PM_MAX_ADDR_ID + 1);
spin_lock_init(&msk->pm.lock); INIT_LIST_HEAD(&msk->pm.anno_list);
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c index 27427aeeee0e5..ad3dc9c6c5310 100644 --- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -38,10 +38,6 @@ struct mptcp_pm_add_entry { u8 retrans_times; };
-/* max value of mptcp_addr_info.id */ -#define MAX_ADDR_ID U8_MAX -#define BITMAP_SZ DIV_ROUND_UP(MAX_ADDR_ID + 1, BITS_PER_LONG)
struct pm_nl_pernet { /* protects pernet updates */ spinlock_t lock; @@ -53,14 +49,14 @@ struct pm_nl_pernet { unsigned int local_addr_max; unsigned int subflows_max; unsigned int next_id;
- unsigned long id_bitmap[BITMAP_SZ];
- DECLARE_BITMAP(id_bitmap, MPTCP_PM_MAX_ADDR_ID + 1);
};
#define MPTCP_PM_ADDR_MAX 8 #define ADD_ADDR_RETRANS_MAX 3
static bool addresses_equal(const struct mptcp_addr_info *a,
struct mptcp_addr_info *b, bool use_port)
const struct mptcp_addr_info *b, bool use_port)
{ bool addr_equals = false;
@@ -174,6 +170,9 @@ select_local_address(const struct pm_nl_pernet *pernet, if (!(entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW)) continue;
if (!test_bit(entry->addr.id, msk->pm.id_avail_bitmap))
continue;
- if (entry->addr.family != sk->sk_family) {
#if IS_ENABLED(CONFIG_MPTCP_IPV6) if ((entry->addr.family == AF_INET && @@ -184,23 +183,17 @@ select_local_address(const struct pm_nl_pernet *pernet, continue; }
/* avoid any address already in use by subflows and
* pending join
*/
if (!lookup_subflow_by_saddr(&msk->conn_list, &entry->addr)) {
ret = entry;
break;
}
ret = entry;
} rcu_read_unlock(); return ret;break;
}
static struct mptcp_pm_addr_entry * -select_signal_address(struct pm_nl_pernet *pernet, unsigned int pos) +select_signal_address(struct pm_nl_pernet *pernet, struct mptcp_sock *msk) { struct mptcp_pm_addr_entry *entry, *ret = NULL;
int i = 0;
rcu_read_lock(); /* do not keep any additional per socket state, just signal
@@ -209,12 +202,14 @@ select_signal_address(struct pm_nl_pernet *pernet, unsigned int pos) * can lead to additional addresses not being announced. */ list_for_each_entry_rcu(entry, &pernet->local_addr_list, list) {
if (!test_bit(entry->addr.id, msk->pm.id_avail_bitmap))
continue;
- if (!(entry->flags & MPTCP_PM_ADDR_FLAG_SIGNAL)) continue;
if (i++ == pos) {
ret = entry;
break;
}
ret = entry;
} rcu_read_unlock(); return ret;break;
@@ -258,9 +253,11 @@ EXPORT_SYMBOL_GPL(mptcp_pm_get_local_addr_max);
static void check_work_pending(struct mptcp_sock *msk) {
- if (msk->pm.add_addr_signaled == mptcp_pm_get_add_addr_signal_max(msk) &&
(msk->pm.local_addr_used == mptcp_pm_get_local_addr_max(msk) ||
msk->pm.subflows == mptcp_pm_get_subflows_max(msk)))
- struct pm_nl_pernet *pernet = net_generic(sock_net((struct sock *)msk), pm_nl_pernet_id);
- if (msk->pm.subflows == mptcp_pm_get_subflows_max(msk) ||
(find_next_and_bit(pernet->id_bitmap, msk->pm.id_avail_bitmap,
WRITE_ONCE(msk->pm.work_pending, false);MPTCP_PM_MAX_ADDR_ID + 1, 0) == MPTCP_PM_MAX_ADDR_ID + 1))
}
@@ -460,6 +457,35 @@ static unsigned int fill_remote_addresses_vec(struct mptcp_sock *msk, bool fullm return i; }
+static struct mptcp_pm_addr_entry * +__lookup_addr_by_id(struct pm_nl_pernet *pernet, unsigned int id) +{
- struct mptcp_pm_addr_entry *entry;
- list_for_each_entry(entry, &pernet->local_addr_list, list) {
if (entry->addr.id == id)
return entry;
- }
- return NULL;
+}
+static int +lookup_id_by_addr(struct pm_nl_pernet *pernet, const struct mptcp_addr_info *addr) +{
- struct mptcp_pm_addr_entry *entry;
- int ret = -1;
- rcu_read_lock();
- list_for_each_entry(entry, &pernet->local_addr_list, list) {
if (addresses_equal(&entry->addr, addr, entry->addr.port)) {
ret = entry->addr.id;
break;
}
- }
- rcu_read_unlock();
- return ret;
+}
static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk) { struct sock *sk = (struct sock *)msk; @@ -475,6 +501,19 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk) local_addr_max = mptcp_pm_get_local_addr_max(msk); subflows_max = mptcp_pm_get_subflows_max(msk);
- /* do lazy endpoint usage accounting for the MPC subflows */
- if (unlikely(!(msk->pm.status & BIT(MPTCP_PM_MPC_ENDPOINT_ACCOUNTED))) && msk->first) {
struct mptcp_addr_info local;
int mpc_id;
local_address((struct sock_common *)msk->first, &local);
mpc_id = lookup_id_by_addr(pernet, &local);
if (mpc_id < 0)
__clear_bit(mpc_id, msk->pm.id_avail_bitmap);
msk->pm.status |= BIT(MPTCP_PM_MPC_ENDPOINT_ACCOUNTED);
- }
- pr_debug("local %d:%d signal %d:%d subflows %d:%d\n", msk->pm.local_addr_used, local_addr_max, msk->pm.add_addr_signaled, add_addr_signal_max,
@@ -482,21 +521,16 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
/* check first for announce */ if (msk->pm.add_addr_signaled < add_addr_signal_max) {
local = select_signal_address(pernet,
msk->pm.add_addr_signaled);
local = select_signal_address(pernet, msk);
if (local) { if (mptcp_pm_alloc_anno_list(msk, local)) {
__clear_bit(local->addr.id, msk->pm.id_avail_bitmap); msk->pm.add_addr_signaled++; mptcp_pm_announce_addr(msk, &local->addr, false); mptcp_pm_nl_addr_send_ack(msk); }
} else {
/* pick failed, avoid fourther attempts later */
msk->pm.local_addr_used = add_addr_signal_max;
}
check_work_pending(msk);
}
/* check if should create a new subflow */
@@ -510,19 +544,16 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk) int i, nr;
msk->pm.local_addr_used++;
check_work_pending(msk); nr = fill_remote_addresses_vec(msk, fullmesh, addrs);
if (nr)
__clear_bit(local->addr.id, msk->pm.id_avail_bitmap); spin_unlock_bh(&msk->pm.lock); for (i = 0; i < nr; i++) __mptcp_subflow_connect(sk, &local->addr, &addrs[i]); spin_lock_bh(&msk->pm.lock);
}return;
/* lookup failed, avoid fourther attempts later */
msk->pm.local_addr_used = local_addr_max;
}check_work_pending(msk);
- check_work_pending(msk);
}
static void mptcp_pm_nl_fully_established(struct mptcp_sock *msk) @@ -736,6 +767,7 @@ static void mptcp_pm_nl_rm_addr_or_subflow(struct mptcp_sock *msk, msk->pm.subflows--; __MPTCP_INC_STATS(sock_net(sk), rm_type); }
if (!removed) continue;__set_bit(rm_list->ids[1], msk->pm.id_avail_bitmap);
@@ -765,6 +797,9 @@ void mptcp_pm_nl_work(struct mptcp_sock *msk)
msk_owned_by_me(msk);
if (!(pm->status & MPTCP_PM_WORK_MASK))
return;
spin_lock_bh(&msk->pm.lock);
pr_debug("msk=%p status=%x", msk, pm->status);
@@ -810,7 +845,7 @@ static int mptcp_pm_nl_append_new_local_addr(struct pm_nl_pernet *pernet, /* to keep the code simple, don't do IDR-like allocation for address ID, * just bail when we exceed limits */
- if (pernet->next_id == MAX_ADDR_ID)
- if (pernet->next_id == MPTCP_PM_MAX_ADDR_ID) pernet->next_id = 1; if (pernet->addrs >= MPTCP_PM_ADDR_MAX) goto out;
@@ -830,7 +865,7 @@ static int mptcp_pm_nl_append_new_local_addr(struct pm_nl_pernet *pernet, if (!entry->addr.id) { find_next: entry->addr.id = find_next_zero_bit(pernet->id_bitmap,
MAX_ADDR_ID + 1,
if (!entry->addr.id && pernet->next_id != 1) { pernet->next_id = 1;MPTCP_PM_MAX_ADDR_ID + 1, pernet->next_id);
@@ -1197,18 +1232,6 @@ static int mptcp_nl_cmd_add_addr(struct sk_buff *skb, struct genl_info *info) return 0; }
-static struct mptcp_pm_addr_entry * -__lookup_addr_by_id(struct pm_nl_pernet *pernet, unsigned int id) -{
- struct mptcp_pm_addr_entry *entry;
- list_for_each_entry(entry, &pernet->local_addr_list, list) {
if (entry->addr.id == id)
return entry;
- }
- return NULL;
-}
int mptcp_pm_get_flags_and_ifindex_by_id(struct net *net, unsigned int id, u8 *flags, int *ifindex) { @@ -1467,7 +1490,7 @@ static int mptcp_nl_cmd_flush_addrs(struct sk_buff *skb, struct genl_info *info) list_splice_init(&pernet->local_addr_list, &free_list); __reset_counters(pernet); pernet->next_id = 1;
- bitmap_zero(pernet->id_bitmap, MAX_ADDR_ID + 1);
- bitmap_zero(pernet->id_bitmap, MPTCP_PM_MAX_ADDR_ID + 1); spin_unlock_bh(&pernet->lock); mptcp_nl_remove_addrs_list(sock_net(skb->sk), &free_list); synchronize_rcu();
@@ -1577,7 +1600,7 @@ static int mptcp_nl_cmd_dump_addrs(struct sk_buff *msg, pernet = net_generic(net, pm_nl_pernet_id);
spin_lock_bh(&pernet->lock);
- for (i = id; i < MAX_ADDR_ID + 1; i++) {
- for (i = id; i < MPTCP_PM_MAX_ADDR_ID + 1; i++) { if (test_bit(i, pernet->id_bitmap)) { entry = __lookup_addr_by_id(pernet, i); if (!entry)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 0cd55e4c30fab..70a3cac85f7b8 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -2416,8 +2416,7 @@ static void mptcp_worker(struct work_struct *work)
mptcp_check_fastclose(msk);
- if (msk->pm.status)
mptcp_pm_nl_work(msk);
mptcp_pm_nl_work(msk);
if (test_and_clear_bit(MPTCP_WORK_EOF, &msk->flags)) mptcp_check_for_eof(msk);
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index d87cc040352e3..0387ad9fb43f7 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -174,16 +174,25 @@ enum mptcp_pm_status { MPTCP_PM_ADD_ADDR_SEND_ACK, MPTCP_PM_RM_ADDR_RECEIVED, MPTCP_PM_ESTABLISHED,
- MPTCP_PM_ALREADY_ESTABLISHED, /* persistent status, set after ESTABLISHED event */ MPTCP_PM_SUBFLOW_ESTABLISHED,
- MPTCP_PM_ALREADY_ESTABLISHED, /* persistent status, set after ESTABLISHED event */
- MPTCP_PM_MPC_ENDPOINT_ACCOUNTED /* persistent status, set after MPC local address is
* accounted int id_avail_bitmap
*/
};
+/* Status bits below MPTCP_PM_ALREADY_ESTABLISHED need pm worker actions */ +#define MPTCP_PM_WORK_MASK ((1 << MPTCP_PM_ALREADY_ESTABLISHED) - 1)
enum mptcp_addr_signal_status { MPTCP_ADD_ADDR_SIGNAL, MPTCP_ADD_ADDR_ECHO, MPTCP_RM_ADDR_SIGNAL, };
+/* max value of mptcp_addr_info.id */ +#define MPTCP_PM_MAX_ADDR_ID U8_MAX
struct mptcp_pm_data { struct mptcp_addr_info local; struct mptcp_addr_info remote; @@ -202,6 +211,7 @@ struct mptcp_pm_data { u8 local_addr_used; u8 subflows; u8 status;
- DECLARE_BITMAP(id_avail_bitmap, MPTCP_PM_MAX_ADDR_ID + 1); struct mptcp_rm_list rm_list_tx; struct mptcp_rm_list rm_list_rx;
}; diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh index 7ef639a9d4a6f..bbafa4cf54538 100755 --- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -1071,7 +1071,10 @@ signal_address_tests() ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags signal ip netns exec $ns2 ./pm_nl_ctl add 10.0.4.2 flags signal run_tests $ns1 $ns2 10.0.1.1
- chk_add_nr 4 4
- # the server will not signal the address terminating
- # the MPC subflow
- chk_add_nr 3 3
}
link_failure_tests()
2.34.1
-- Mat Martineau Intel
On Mon, Jan 31, 2022 at 11:36:32AM -0800, Mat Martineau wrote:
On Mon, 31 Jan 2022, Greg Kroah-Hartman wrote:
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit 86e39e04482b0aadf3ee3ed5fcf2d63816559d36 ]
Hi Greg -
Please drop this from the stable queue for both 5.16 and 5.15. It wasn't intended for backporting and we haven't checked for dependencies with other changes in this part of MPTCP subsystem.
In the mptcp tree we make sure to add Fixes: tags on every patch we think is eligible for the -stable tree. I know you're sifting through a lot of patches from subsystems that end up with important fixes arriving from the "-next" branches, and it seems like the scripts scooped up several MPTCP patches this time around that don't meet the -stable rules.
I think these were needed due to 8e9eacad7ec7 ("mptcp: fix msk traversal in mptcp_nl_cmd_set_flags()") which you did tag with a "Fixes:" tag, right? Without these other commits, that one does not apply and it looks like it should go into 5.15.y and 5.16.y.
Note, just putting a "Fixes:" tag does not guarantee if a commit will go into a stable tree. Please use the correct "Cc: stable@vger.kernel.org" tag as the documentation asks for.
I'll go drop all of these mptcp patches, including 8e9eacad7ec7 ("mptcp: fix msk traversal in mptcp_nl_cmd_set_flags()") for now. If you want them added back in, please let us know.
thanks,
greg k-h
On Tue, Feb 01, 2022 at 03:30:02PM +0100, Greg Kroah-Hartman wrote:
On Mon, Jan 31, 2022 at 11:36:32AM -0800, Mat Martineau wrote:
On Mon, 31 Jan 2022, Greg Kroah-Hartman wrote:
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit 86e39e04482b0aadf3ee3ed5fcf2d63816559d36 ]
Hi Greg -
Please drop this from the stable queue for both 5.16 and 5.15. It wasn't intended for backporting and we haven't checked for dependencies with other changes in this part of MPTCP subsystem.
In the mptcp tree we make sure to add Fixes: tags on every patch we think is eligible for the -stable tree. I know you're sifting through a lot of patches from subsystems that end up with important fixes arriving from the "-next" branches, and it seems like the scripts scooped up several MPTCP patches this time around that don't meet the -stable rules.
I think these were needed due to 8e9eacad7ec7 ("mptcp: fix msk traversal in mptcp_nl_cmd_set_flags()") which you did tag with a "Fixes:" tag, right? Without these other commits, that one does not apply and it looks like it should go into 5.15.y and 5.16.y.
Note, just putting a "Fixes:" tag does not guarantee if a commit will go into a stable tree. Please use the correct "Cc: stable@vger.kernel.org" tag as the documentation asks for.
I'll go drop all of these mptcp patches, including 8e9eacad7ec7 ("mptcp: fix msk traversal in mptcp_nl_cmd_set_flags()") for now. If you want them added back in, please let us know.
To be specific, I have dropped the following commits from the queues now, which was more than just these three that you asked:
602837e8479d ("mptcp: allow changing the "backup" bit by endpoint id") 59060a47ca50 ("mptcp: clean up harmless false expressions") 86e39e04482b ("mptcp: keep track of local endpoint still available for each msk") 8e9eacad7ec7 ("mptcp: fix msk traversal in mptcp_nl_cmd_set_flags()") a4c0214fbee9 ("mptcp: fix removing ids bitmap setting") 9846921dba49 ("selftests: mptcp: fix ipv6 routing setup")
If you want them back, please let us know.
thanks,
greg k-h
On Tue, 1 Feb 2022, Greg Kroah-Hartman wrote:
On Tue, Feb 01, 2022 at 03:30:02PM +0100, Greg Kroah-Hartman wrote:
On Mon, Jan 31, 2022 at 11:36:32AM -0800, Mat Martineau wrote:
On Mon, 31 Jan 2022, Greg Kroah-Hartman wrote:
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit 86e39e04482b0aadf3ee3ed5fcf2d63816559d36 ]
Hi Greg -
Please drop this from the stable queue for both 5.16 and 5.15. It wasn't intended for backporting and we haven't checked for dependencies with other changes in this part of MPTCP subsystem.
In the mptcp tree we make sure to add Fixes: tags on every patch we think is eligible for the -stable tree. I know you're sifting through a lot of patches from subsystems that end up with important fixes arriving from the "-next" branches, and it seems like the scripts scooped up several MPTCP patches this time around that don't meet the -stable rules.
I think these were needed due to 8e9eacad7ec7 ("mptcp: fix msk traversal in mptcp_nl_cmd_set_flags()") which you did tag with a "Fixes:" tag, right? Without these other commits, that one does not apply and it looks like it should go into 5.15.y and 5.16.y.
Ok, I see what happened there (there's a new variable present in 5.17 that's not relevant for the earlier kernels). 8e9eacad7ec7 needs manual backporting, I'll work on that and send to the stable list.
Note, just putting a "Fixes:" tag does not guarantee if a commit will go into a stable tree. Please use the correct "Cc:stable@vger.kernel.org" tag as the documentation asks for.
I will make sure to use the Cc:stable@vger.kernel.org tag in the future.
I'll go drop all of these mptcp patches, including 8e9eacad7ec7 ("mptcp: fix msk traversal in mptcp_nl_cmd_set_flags()") for now. If you want them added back in, please let us know.
To be specific, I have dropped the following commits from the queues now, which was more than just these three that you asked:
602837e8479d ("mptcp: allow changing the "backup" bit by endpoint id") 59060a47ca50 ("mptcp: clean up harmless false expressions") 86e39e04482b ("mptcp: keep track of local endpoint still available for each msk") 8e9eacad7ec7 ("mptcp: fix msk traversal in mptcp_nl_cmd_set_flags()") a4c0214fbee9 ("mptcp: fix removing ids bitmap setting") 9846921dba49 ("selftests: mptcp: fix ipv6 routing setup")
If you want them back, please let us know.
Of the above, there's only one suitable for stable as-is:
9846921dba49 ("selftests: mptcp: fix ipv6 routing setup")
Please add that one to the 5.16 and 5.15 queues.
Thanks!
-- Mat Martineau Intel
On Tue, Feb 01, 2022 at 02:23:55PM -0800, Mat Martineau wrote:
On Tue, 1 Feb 2022, Greg Kroah-Hartman wrote:
On Tue, Feb 01, 2022 at 03:30:02PM +0100, Greg Kroah-Hartman wrote:
On Mon, Jan 31, 2022 at 11:36:32AM -0800, Mat Martineau wrote:
On Mon, 31 Jan 2022, Greg Kroah-Hartman wrote:
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit 86e39e04482b0aadf3ee3ed5fcf2d63816559d36 ]
Hi Greg -
Please drop this from the stable queue for both 5.16 and 5.15. It wasn't intended for backporting and we haven't checked for dependencies with other changes in this part of MPTCP subsystem.
In the mptcp tree we make sure to add Fixes: tags on every patch we think is eligible for the -stable tree. I know you're sifting through a lot of patches from subsystems that end up with important fixes arriving from the "-next" branches, and it seems like the scripts scooped up several MPTCP patches this time around that don't meet the -stable rules.
I think these were needed due to 8e9eacad7ec7 ("mptcp: fix msk traversal in mptcp_nl_cmd_set_flags()") which you did tag with a "Fixes:" tag, right? Without these other commits, that one does not apply and it looks like it should go into 5.15.y and 5.16.y.
Ok, I see what happened there (there's a new variable present in 5.17 that's not relevant for the earlier kernels). 8e9eacad7ec7 needs manual backporting, I'll work on that and send to the stable list.
Note, just putting a "Fixes:" tag does not guarantee if a commit will go into a stable tree. Please use the correct "Cc:stable@vger.kernel.org" tag as the documentation asks for.
I will make sure to use the Cc:stable@vger.kernel.org tag in the future.
I'll go drop all of these mptcp patches, including 8e9eacad7ec7 ("mptcp: fix msk traversal in mptcp_nl_cmd_set_flags()") for now. If you want them added back in, please let us know.
To be specific, I have dropped the following commits from the queues now, which was more than just these three that you asked:
602837e8479d ("mptcp: allow changing the "backup" bit by endpoint id") 59060a47ca50 ("mptcp: clean up harmless false expressions") 86e39e04482b ("mptcp: keep track of local endpoint still available for each msk") 8e9eacad7ec7 ("mptcp: fix msk traversal in mptcp_nl_cmd_set_flags()") a4c0214fbee9 ("mptcp: fix removing ids bitmap setting") 9846921dba49 ("selftests: mptcp: fix ipv6 routing setup")
If you want them back, please let us know.
Of the above, there's only one suitable for stable as-is:
9846921dba49 ("selftests: mptcp: fix ipv6 routing setup")
Please add that one to the 5.16 and 5.15 queues.
Now queued up, thanks.
greg k-h
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit 8e9eacad7ec7a9cbf262649ebf1fa6e6f6cc7d82 ]
The MPTCP endpoint list is under RCU protection, guarded by the pernet spinlock. mptcp_nl_cmd_set_flags() traverses the list without acquiring the spin-lock nor under the RCU critical section.
This change addresses the issue performing the lookup and the endpoint update under the pernet spinlock.
Fixes: 0f9f696a502e ("mptcp: add set_flags command in PM netlink") Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Mat Martineau mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/mptcp/pm_netlink.c | 37 +++++++++++++++++++++++++++---------- 1 file changed, 27 insertions(+), 10 deletions(-)
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c index ad3dc9c6c5310..35abcce604b4d 100644 --- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -469,6 +469,20 @@ __lookup_addr_by_id(struct pm_nl_pernet *pernet, unsigned int id) return NULL; }
+static struct mptcp_pm_addr_entry * +__lookup_addr(struct pm_nl_pernet *pernet, const struct mptcp_addr_info *info, + bool lookup_by_id) +{ + struct mptcp_pm_addr_entry *entry; + + list_for_each_entry(entry, &pernet->local_addr_list, list) { + if ((!lookup_by_id && addresses_equal(&entry->addr, info, true)) || + (lookup_by_id && entry->addr.id == info->id)) + return entry; + } + return NULL; +} + static int lookup_id_by_addr(struct pm_nl_pernet *pernet, const struct mptcp_addr_info *addr) { @@ -1753,18 +1767,21 @@ static int mptcp_nl_cmd_set_flags(struct sk_buff *skb, struct genl_info *info) return -EOPNOTSUPP; }
- list_for_each_entry(entry, &pernet->local_addr_list, list) { - if ((!lookup_by_id && addresses_equal(&entry->addr, &addr.addr, true)) || - (lookup_by_id && entry->addr.id == addr.addr.id)) { - mptcp_nl_addr_backup(net, &entry->addr, bkup); - - if (bkup) - entry->flags |= MPTCP_PM_ADDR_FLAG_BACKUP; - else - entry->flags &= ~MPTCP_PM_ADDR_FLAG_BACKUP; - } + spin_lock_bh(&pernet->lock); + entry = __lookup_addr(pernet, &addr.addr, lookup_by_id); + if (!entry) { + spin_unlock_bh(&pernet->lock); + return -EINVAL; }
+ if (bkup) + entry->flags |= MPTCP_PM_ADDR_FLAG_BACKUP; + else + entry->flags &= ~MPTCP_PM_ADDR_FLAG_BACKUP; + addr = *entry; + spin_unlock_bh(&pernet->lock); + + mptcp_nl_addr_backup(net, &addr.addr, bkup); return 0; }
From: Geliang Tang geliang.tang@suse.com
[ Upstream commit a4c0214fbee97c46e3f41fee37931d66c0fc3cb1 ]
In mptcp_pm_nl_rm_addr_or_subflow(), the bit of rm_list->ids[i] in the id_avail_bitmap should be set, not rm_list->ids[1]. This patch fixed it.
Fixes: 86e39e04482b ("mptcp: keep track of local endpoint still available for each msk") Acked-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Geliang Tang geliang.tang@suse.com Signed-off-by: Mat Martineau mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/mptcp/pm_netlink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c index 35abcce604b4d..14e6d6f745186 100644 --- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -781,7 +781,7 @@ static void mptcp_pm_nl_rm_addr_or_subflow(struct mptcp_sock *msk, msk->pm.subflows--; __MPTCP_INC_STATS(sock_net(sk), rm_type); } - __set_bit(rm_list->ids[1], msk->pm.id_avail_bitmap); + __set_bit(rm_list->ids[i], msk->pm.id_avail_bitmap); if (!removed) continue;
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit 9846921dba4936d92f7608315b5d1e0a8ec3a538 ]
MPJ ipv6 selftests currently lack per link route to the server net. Additionally, ipv6 subflows endpoints are created without any interface specified. The end-result is that in ipv6 self-tests subflows are created all on the same link, leading to expected delays and sporadic self-tests failures.
Fix the issue by adding the missing setup bits.
Fixes: 523514ed0a99 ("selftests: mptcp: add ADD_ADDR IPv6 test cases") Reported-and-tested-by: Geliang Tang geliang.tang@suse.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Mat Martineau mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh index bbafa4cf54538..d188ea0a5fc50 100755 --- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -75,6 +75,7 @@ init()
# let $ns2 reach any $ns1 address from any interface ip -net "$ns2" route add default via 10.0.$i.1 dev ns2eth$i metric 10$i + ip -net "$ns2" route add default via dead:beef:$i::1 dev ns2eth$i metric 10$i done }
@@ -1389,7 +1390,7 @@ ipv6_tests() reset ip netns exec $ns1 ./pm_nl_ctl limits 0 1 ip netns exec $ns2 ./pm_nl_ctl limits 0 1 - ip netns exec $ns2 ./pm_nl_ctl add dead:beef:3::2 flags subflow + ip netns exec $ns2 ./pm_nl_ctl add dead:beef:3::2 dev ns2eth3 flags subflow run_tests $ns1 $ns2 dead:beef:1::1 0 0 0 slow chk_join_nr "single subflow IPv6" 1 1 1
@@ -1424,7 +1425,7 @@ ipv6_tests() ip netns exec $ns1 ./pm_nl_ctl limits 0 2 ip netns exec $ns1 ./pm_nl_ctl add dead:beef:2::1 flags signal ip netns exec $ns2 ./pm_nl_ctl limits 1 2 - ip netns exec $ns2 ./pm_nl_ctl add dead:beef:3::2 flags subflow + ip netns exec $ns2 ./pm_nl_ctl add dead:beef:3::2 dev ns2eth3 flags subflow run_tests $ns1 $ns2 dead:beef:1::1 0 -1 -1 slow chk_join_nr "remove subflow and signal IPv6" 2 2 2 chk_add_nr 1 1
From: Subbaraya Sundeep sbhatta@marvell.com
[ Upstream commit d225c449ab2be25273a3674f476c6c0b57c50254 ]
AF modifies all the rules destined for VF to use the action same as default RSS action. This fixup was needed because AF only installs default rules with RSS action. But the action in rules installed by a PF for its VFs should not be changed by this fixup. This is because action can be drop or direct to queue as specified by user(ntuple filters). This patch fixes that problem.
Fixes: 967db3529eca ("octeontx2-af: add support for multicast/promisc packet") Signed-off-by: Subbaraya Sundeep sbhatta@marvell.com Signed-off-by: Naveen Mamindlapalli naveenm@marvell.com Signed-off-by: Sunil Goutham sgoutham@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- .../ethernet/marvell/octeontx2/af/rvu_npc.c | 22 ++++++++++++++++--- .../marvell/octeontx2/af/rvu_npc_fs.c | 20 ++++++++++------- 2 files changed, 31 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c index c0005a1feee69..91f86d77cd41b 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c @@ -402,6 +402,7 @@ static void npc_fixup_vf_rule(struct rvu *rvu, struct npc_mcam *mcam, int blkaddr, int index, struct mcam_entry *entry, bool *enable) { + struct rvu_npc_mcam_rule *rule; u16 owner, target_func; struct rvu_pfvf *pfvf; u64 rx_action; @@ -423,6 +424,12 @@ static void npc_fixup_vf_rule(struct rvu *rvu, struct npc_mcam *mcam, test_bit(NIXLF_INITIALIZED, &pfvf->flags))) *enable = false;
+ /* fix up not needed for the rules added by user(ntuple filters) */ + list_for_each_entry(rule, &mcam->mcam_rules, list) { + if (rule->entry == index) + return; + } + /* copy VF default entry action to the VF mcam entry */ rx_action = npc_get_default_entry_action(rvu, mcam, blkaddr, target_func); @@ -489,8 +496,8 @@ static void npc_config_mcam_entry(struct rvu *rvu, struct npc_mcam *mcam, }
/* PF installing VF rule */ - if (intf == NIX_INTF_RX && actindex < mcam->bmap_entries) - npc_fixup_vf_rule(rvu, mcam, blkaddr, index, entry, &enable); + if (is_npc_intf_rx(intf) && actindex < mcam->bmap_entries) + npc_fixup_vf_rule(rvu, mcam, blkaddr, actindex, entry, &enable);
/* Set 'action' */ rvu_write64(rvu, blkaddr, @@ -916,7 +923,8 @@ static void npc_update_vf_flow_entry(struct rvu *rvu, struct npc_mcam *mcam, int blkaddr, u16 pcifunc, u64 rx_action) { int actindex, index, bank, entry; - bool enable; + struct rvu_npc_mcam_rule *rule; + bool enable, update;
if (!(pcifunc & RVU_PFVF_FUNC_MASK)) return; @@ -924,6 +932,14 @@ static void npc_update_vf_flow_entry(struct rvu *rvu, struct npc_mcam *mcam, mutex_lock(&mcam->lock); for (index = 0; index < mcam->bmap_entries; index++) { if (mcam->entry2target_pffunc[index] == pcifunc) { + update = true; + /* update not needed for the rules added via ntuple filters */ + list_for_each_entry(rule, &mcam->mcam_rules, list) { + if (rule->entry == index) + update = false; + } + if (!update) + continue; bank = npc_get_bank(mcam, index); actindex = index; entry = index & (mcam->banksize - 1); diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc_fs.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc_fs.c index ff2b21999f36f..19c53e591d0da 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc_fs.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc_fs.c @@ -1098,14 +1098,6 @@ find_rule: write_req.cntr = rule->cntr; }
- err = rvu_mbox_handler_npc_mcam_write_entry(rvu, &write_req, - &write_rsp); - if (err) { - rvu_mcam_remove_counter_from_rule(rvu, owner, rule); - if (new) - kfree(rule); - return err; - } /* update rule */ memcpy(&rule->packet, &dummy.packet, sizeof(rule->packet)); memcpy(&rule->mask, &dummy.mask, sizeof(rule->mask)); @@ -1132,6 +1124,18 @@ find_rule: if (req->default_rule) pfvf->def_ucast_rule = rule;
+ /* write to mcam entry registers */ + err = rvu_mbox_handler_npc_mcam_write_entry(rvu, &write_req, + &write_rsp); + if (err) { + rvu_mcam_remove_counter_from_rule(rvu, owner, rule); + if (new) { + list_del(&rule->list); + kfree(rule); + } + return err; + } + /* VF's MAC address is being changed via PF */ if (pf_set_vfs_mac) { ether_addr_copy(pfvf->default_mac, req->packet.dmac);
From: Sunil Goutham sgoutham@marvell.com
[ Upstream commit 00bfe94e388fe12bfd0d4f6361b1b1343374ff5b ]
In rvu_nix_get_bpid() lbk_bpid_cnt is being read from wrong register. Due to this backpressure enable is failing for LBK VF32 onwards. This patch fixes that.
Fixes: fe1939bb2340 ("octeontx2-af: Add SDP interface support") Signed-off-by: Sunil Goutham sgoutham@marvell.com Signed-off-by: Subbaraya Sundeep sgoutham@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c index d8b1948aaa0ae..b74ab0dae10dc 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c @@ -512,11 +512,11 @@ static int rvu_nix_get_bpid(struct rvu *rvu, struct nix_bp_cfg_req *req, cfg = rvu_read64(rvu, blkaddr, NIX_AF_CONST); lmac_chan_cnt = cfg & 0xFF;
- cfg = rvu_read64(rvu, blkaddr, NIX_AF_CONST1); - sdp_chan_cnt = cfg & 0xFFF; - cgx_bpid_cnt = hw->cgx_links * lmac_chan_cnt; lbk_bpid_cnt = hw->lbk_links * ((cfg >> 16) & 0xFF); + + cfg = rvu_read64(rvu, blkaddr, NIX_AF_CONST1); + sdp_chan_cnt = cfg & 0xFFF; sdp_bpid_cnt = hw->sdp_links * sdp_chan_cnt;
pfvf = rvu_get_pfvf(rvu, req->hdr.pcifunc);
From: Geetha sowjanya gakula@marvell.com
[ Upstream commit 03ffbc9914bd1130fba464f0a41c01372e5fc359 ]
Few RVU blocks like SSO require more time for reset on some silicons. Hence retrying the block reset until success.
Fixes: c0fa2cff8822c ("octeontx2-af: Handle return value in block reset") Signed-off-by: Geetha sowjanya gakula@marvell.com Signed-off-by: Subbaraya Sundeep sbhatta@marvell.com Signed-off-by: Sunil Goutham sgoutham@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/marvell/octeontx2/af/rvu.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c index 3ca6b942ebe25..54e1b27a7dfec 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c @@ -520,8 +520,11 @@ static void rvu_block_reset(struct rvu *rvu, int blkaddr, u64 rst_reg)
rvu_write64(rvu, blkaddr, rst_reg, BIT_ULL(0)); err = rvu_poll_reg(rvu, blkaddr, rst_reg, BIT_ULL(63), true); - if (err) - dev_err(rvu->dev, "HW block:%d reset failed\n", blkaddr); + if (err) { + dev_err(rvu->dev, "HW block:%d reset timeout retrying again\n", blkaddr); + while (rvu_poll_reg(rvu, blkaddr, rst_reg, BIT_ULL(63), true) == -EBUSY) + ; + } }
static void rvu_reset_all_blocks(struct rvu *rvu)
From: Geetha sowjanya gakula@marvell.com
[ Upstream commit fae80edeafbbba5ef9a0423aa5e5515518626433 ]
CN10K platforms uses RPM(0..2)_MTI_MAC100(0..3)_COMMAND_CONFIG register for lmac TX/RX enable whereas CN9xxx platforms use CGX_CMRX_CONFIG register. This config change was missed when adding support for CN10K RPM.
Fixes: 91c6945ea1f9 ("octeontx2-af: cn10k: Add RPM MAC support") Signed-off-by: Geetha sowjanya gakula@marvell.com Signed-off-by: Subbaraya Sundeep sbhatta@marvell.com Signed-off-by: Sunil Goutham sgoutham@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/marvell/octeontx2/af/cgx.c | 2 + .../marvell/octeontx2/af/lmac_common.h | 3 ++ .../net/ethernet/marvell/octeontx2/af/rpm.c | 39 +++++++++++++++++++ .../net/ethernet/marvell/octeontx2/af/rpm.h | 4 ++ .../net/ethernet/marvell/octeontx2/af/rvu.h | 1 + .../ethernet/marvell/octeontx2/af/rvu_cgx.c | 14 ++++++- .../ethernet/marvell/octeontx2/af/rvu_nix.c | 10 ++--- 7 files changed, 66 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cgx.c b/drivers/net/ethernet/marvell/octeontx2/af/cgx.c index 186d00a9ab35c..3631d612aaca1 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/cgx.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/cgx.c @@ -1570,6 +1570,8 @@ static struct mac_ops cgx_mac_ops = { .mac_enadis_pause_frm = cgx_lmac_enadis_pause_frm, .mac_pause_frm_config = cgx_lmac_pause_frm_config, .mac_enadis_ptp_config = cgx_lmac_ptp_config, + .mac_rx_tx_enable = cgx_lmac_rx_tx_enable, + .mac_tx_enable = cgx_lmac_tx_enable, };
static int cgx_probe(struct pci_dev *pdev, const struct pci_device_id *id) diff --git a/drivers/net/ethernet/marvell/octeontx2/af/lmac_common.h b/drivers/net/ethernet/marvell/octeontx2/af/lmac_common.h index fc6e7423cbd81..b33e7d1d0851c 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/lmac_common.h +++ b/drivers/net/ethernet/marvell/octeontx2/af/lmac_common.h @@ -107,6 +107,9 @@ struct mac_ops { void (*mac_enadis_ptp_config)(void *cgxd, int lmac_id, bool enable); + + int (*mac_rx_tx_enable)(void *cgxd, int lmac_id, bool enable); + int (*mac_tx_enable)(void *cgxd, int lmac_id, bool enable); };
struct cgx { diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rpm.c b/drivers/net/ethernet/marvell/octeontx2/af/rpm.c index e695fa0e82a94..4cbd91540f999 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rpm.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rpm.c @@ -30,6 +30,8 @@ static struct mac_ops rpm_mac_ops = { .mac_enadis_pause_frm = rpm_lmac_enadis_pause_frm, .mac_pause_frm_config = rpm_lmac_pause_frm_config, .mac_enadis_ptp_config = rpm_lmac_ptp_config, + .mac_rx_tx_enable = rpm_lmac_rx_tx_enable, + .mac_tx_enable = rpm_lmac_tx_enable, };
struct mac_ops *rpm_get_mac_ops(void) @@ -54,6 +56,43 @@ int rpm_get_nr_lmacs(void *rpmd) return hweight8(rpm_read(rpm, 0, CGXX_CMRX_RX_LMACS) & 0xFULL); }
+int rpm_lmac_tx_enable(void *rpmd, int lmac_id, bool enable) +{ + rpm_t *rpm = rpmd; + u64 cfg, last; + + if (!is_lmac_valid(rpm, lmac_id)) + return -ENODEV; + + cfg = rpm_read(rpm, lmac_id, RPMX_MTI_MAC100X_COMMAND_CONFIG); + last = cfg; + if (enable) + cfg |= RPM_TX_EN; + else + cfg &= ~(RPM_TX_EN); + + if (cfg != last) + rpm_write(rpm, lmac_id, RPMX_MTI_MAC100X_COMMAND_CONFIG, cfg); + return !!(last & RPM_TX_EN); +} + +int rpm_lmac_rx_tx_enable(void *rpmd, int lmac_id, bool enable) +{ + rpm_t *rpm = rpmd; + u64 cfg; + + if (!is_lmac_valid(rpm, lmac_id)) + return -ENODEV; + + cfg = rpm_read(rpm, lmac_id, RPMX_MTI_MAC100X_COMMAND_CONFIG); + if (enable) + cfg |= RPM_RX_EN | RPM_TX_EN; + else + cfg &= ~(RPM_RX_EN | RPM_TX_EN); + rpm_write(rpm, lmac_id, RPMX_MTI_MAC100X_COMMAND_CONFIG, cfg); + return 0; +} + void rpm_lmac_enadis_rx_pause_fwding(void *rpmd, int lmac_id, bool enable) { rpm_t *rpm = rpmd; diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rpm.h b/drivers/net/ethernet/marvell/octeontx2/af/rpm.h index 57c8a687b488a..ff580311edd03 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rpm.h +++ b/drivers/net/ethernet/marvell/octeontx2/af/rpm.h @@ -43,6 +43,8 @@ #define RPMX_MTI_STAT_DATA_HI_CDC 0x10038
#define RPM_LMAC_FWI 0xa +#define RPM_TX_EN BIT_ULL(0) +#define RPM_RX_EN BIT_ULL(1)
/* Function Declarations */ int rpm_get_nr_lmacs(void *rpmd); @@ -57,4 +59,6 @@ int rpm_lmac_enadis_pause_frm(void *rpmd, int lmac_id, u8 tx_pause, int rpm_get_tx_stats(void *rpmd, int lmac_id, int idx, u64 *tx_stat); int rpm_get_rx_stats(void *rpmd, int lmac_id, int idx, u64 *rx_stat); void rpm_lmac_ptp_config(void *rpmd, int lmac_id, bool enable); +int rpm_lmac_rx_tx_enable(void *rpmd, int lmac_id, bool enable); +int rpm_lmac_tx_enable(void *rpmd, int lmac_id, bool enable); #endif /* RPM_H */ diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h index 66e45d733824e..5ed94cfb47d2d 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h @@ -806,6 +806,7 @@ bool is_mac_feature_supported(struct rvu *rvu, int pf, int feature); u32 rvu_cgx_get_fifolen(struct rvu *rvu); void *rvu_first_cgx_pdata(struct rvu *rvu); int cgxlmac_to_pf(struct rvu *rvu, int cgx_id, int lmac_id); +int rvu_cgx_config_tx(void *cgxd, int lmac_id, bool enable);
int npc_get_nixlf_mcam_index(struct npc_mcam *mcam, u16 pcifunc, int nixlf, int type); diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c index 2ca182a4ce823..8a7ac5a8b821d 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c @@ -441,16 +441,26 @@ void rvu_cgx_enadis_rx_bp(struct rvu *rvu, int pf, bool enable) int rvu_cgx_config_rxtx(struct rvu *rvu, u16 pcifunc, bool start) { int pf = rvu_get_pf(pcifunc); + struct mac_ops *mac_ops; u8 cgx_id, lmac_id; + void *cgxd;
if (!is_cgx_config_permitted(rvu, pcifunc)) return LMAC_AF_ERR_PERM_DENIED;
rvu_get_cgx_lmac_id(rvu->pf2cgxlmac_map[pf], &cgx_id, &lmac_id); + cgxd = rvu_cgx_pdata(cgx_id, rvu); + mac_ops = get_mac_ops(cgxd); + + return mac_ops->mac_rx_tx_enable(cgxd, lmac_id, start); +}
- cgx_lmac_rx_tx_enable(rvu_cgx_pdata(cgx_id, rvu), lmac_id, start); +int rvu_cgx_config_tx(void *cgxd, int lmac_id, bool enable) +{ + struct mac_ops *mac_ops;
- return 0; + mac_ops = get_mac_ops(cgxd); + return mac_ops->mac_tx_enable(cgxd, lmac_id, enable); }
void rvu_cgx_disable_dmac_entries(struct rvu *rvu, u16 pcifunc) diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c index b74ab0dae10dc..de6e5a1288640 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c @@ -2068,8 +2068,8 @@ static int nix_smq_flush(struct rvu *rvu, int blkaddr, /* enable cgx tx if disabled */ if (is_pf_cgxmapped(rvu, pf)) { rvu_get_cgx_lmac_id(rvu->pf2cgxlmac_map[pf], &cgx_id, &lmac_id); - restore_tx_en = !cgx_lmac_tx_enable(rvu_cgx_pdata(cgx_id, rvu), - lmac_id, true); + restore_tx_en = !rvu_cgx_config_tx(rvu_cgx_pdata(cgx_id, rvu), + lmac_id, true); }
cfg = rvu_read64(rvu, blkaddr, NIX_AF_SMQX_CFG(smq)); @@ -2092,7 +2092,7 @@ static int nix_smq_flush(struct rvu *rvu, int blkaddr, rvu_cgx_enadis_rx_bp(rvu, pf, true); /* restore cgx tx state */ if (restore_tx_en) - cgx_lmac_tx_enable(rvu_cgx_pdata(cgx_id, rvu), lmac_id, false); + rvu_cgx_config_tx(rvu_cgx_pdata(cgx_id, rvu), lmac_id, false); return err; }
@@ -3878,7 +3878,7 @@ nix_config_link_credits(struct rvu *rvu, int blkaddr, int link, /* Enable cgx tx if disabled for credits to be back */ if (is_pf_cgxmapped(rvu, pf)) { rvu_get_cgx_lmac_id(rvu->pf2cgxlmac_map[pf], &cgx_id, &lmac_id); - restore_tx_en = !cgx_lmac_tx_enable(rvu_cgx_pdata(cgx_id, rvu), + restore_tx_en = !rvu_cgx_config_tx(rvu_cgx_pdata(cgx_id, rvu), lmac_id, true); }
@@ -3918,7 +3918,7 @@ exit:
/* Restore state of cgx tx */ if (restore_tx_en) - cgx_lmac_tx_enable(rvu_cgx_pdata(cgx_id, rvu), lmac_id, false); + rvu_cgx_config_tx(rvu_cgx_pdata(cgx_id, rvu), lmac_id, false);
mutex_unlock(&rvu->rsrc_lock); return rc;
From: Geetha sowjanya gakula@marvell.com
[ Upstream commit c5d731c54a17677939bd59ee8be4ed74d7485ba4 ]
While freeing SQB pointers to aura, driver first memcpy to target address and then triggers lmtst operation to free pointer to the aura. We need to ensure(by adding dmb barrier)that memcpy is finished before pointers are freed to the aura. This patch also adds the missing sq context structure entry in debugfs.
Fixes: ef6c8da71eaf ("octeontx2-pf: cn10K: Reserve LMTST lines per core") Signed-off-by: Geetha sowjanya gakula@marvell.com Signed-off-by: Subbaraya Sundeep sbhatta@marvell.com Signed-off-by: Sunil Goutham sgoutham@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c | 2 ++ drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h | 1 + 2 files changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c index a09a507369ac3..d1eddb769a419 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c @@ -1224,6 +1224,8 @@ static void print_nix_cn10k_sq_ctx(struct seq_file *m, seq_printf(m, "W3: head_offset\t\t\t%d\nW3: smenq_next_sqb_vld\t\t%d\n\n", sq_ctx->head_offset, sq_ctx->smenq_next_sqb_vld);
+ seq_printf(m, "W3: smq_next_sq_vld\t\t%d\nW3: smq_pend\t\t\t%d\n", + sq_ctx->smq_next_sq_vld, sq_ctx->smq_pend); seq_printf(m, "W4: next_sqb \t\t\t%llx\n\n", sq_ctx->next_sqb); seq_printf(m, "W5: tail_sqb \t\t\t%llx\n\n", sq_ctx->tail_sqb); seq_printf(m, "W6: smenq_sqb \t\t\t%llx\n\n", sq_ctx->smenq_sqb); diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h index 61e52812983fa..14509fc64cce9 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h @@ -603,6 +603,7 @@ static inline void __cn10k_aura_freeptr(struct otx2_nic *pfvf, u64 aura, size++; tar_addr |= ((size - 1) & 0x7) << 4; } + dma_wmb(); memcpy((u64 *)lmt_info->lmt_addr, ptrs, sizeof(u64) * num_ptrs); /* Perform LMTST flush */ cn10k_lmt_flush(val, tar_addr);
From: Geetha sowjanya gakula@marvell.com
[ Upstream commit 1581d61b42d985cefe7b71eea67ab3bfcbf34d0f ]
It's been observed that sometimes link credit restore takes a lot of time than the current timeout. This patch increases the default timeout value and return the proper error value on failure.
Fixes: 1c74b89171c3 ("octeontx2-af: Wait for TX link idle for credits change") Signed-off-by: Geetha sowjanya gakula@marvell.com Signed-off-by: Subbaraya Sundeep sbhatta@marvell.com Signed-off-by: Sunil Goutham sgoutham@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/marvell/octeontx2/af/mbox.h | 1 + drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c | 4 ++-- 2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h index 4e79e918a1617..58e2aeebc14f8 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h +++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h @@ -732,6 +732,7 @@ enum nix_af_status { NIX_AF_ERR_BANDPROF_INVAL_REQ = -428, NIX_AF_ERR_CQ_CTX_WRITE_ERR = -429, NIX_AF_ERR_AQ_CTX_RETRY_WRITE = -430, + NIX_AF_ERR_LINK_CREDITS = -431, };
/* For NIX RX vtag action */ diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c index de6e5a1288640..97fb61915379a 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c @@ -3891,8 +3891,8 @@ nix_config_link_credits(struct rvu *rvu, int blkaddr, int link, NIX_AF_TL1X_SW_XOFF(schq), BIT_ULL(0)); }
- rc = -EBUSY; - poll_tmo = jiffies + usecs_to_jiffies(10000); + rc = NIX_AF_ERR_LINK_CREDITS; + poll_tmo = jiffies + usecs_to_jiffies(200000); /* Wait for credits to return */ do { if (time_after(jiffies, poll_tmo))
From: Geetha sowjanya gakula@marvell.com
[ Upstream commit df66b6ebc5dcf7253e35a640b9ec4add54195c25 ]
Internal looback is not supported to low rate LPCS interface like SGMII/QSGMII. Hence don't allow to enable for such interfaces.
Fixes: 3ad3f8f93c81 ("octeontx2-af: cn10k: MAC internal loopback support") Signed-off-by: Geetha sowjanya gakula@marvell.com Signed-off-by: Subbaraya Sundeep sbhatta@marvell.com Signed-off-by: Sunil Goutham sgoutham@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/marvell/octeontx2/af/rpm.c | 27 +++++++++---------- 1 file changed, 12 insertions(+), 15 deletions(-)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rpm.c b/drivers/net/ethernet/marvell/octeontx2/af/rpm.c index 4cbd91540f999..9ea2f6ac38ec1 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rpm.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rpm.c @@ -291,23 +291,20 @@ int rpm_lmac_internal_loopback(void *rpmd, int lmac_id, bool enable) if (!rpm || lmac_id >= rpm->lmac_count) return -ENODEV; lmac_type = rpm->mac_ops->get_lmac_type(rpm, lmac_id); - if (lmac_type == LMAC_MODE_100G_R) { - cfg = rpm_read(rpm, lmac_id, RPMX_MTI_PCS100X_CONTROL1); - - if (enable) - cfg |= RPMX_MTI_PCS_LBK; - else - cfg &= ~RPMX_MTI_PCS_LBK; - rpm_write(rpm, lmac_id, RPMX_MTI_PCS100X_CONTROL1, cfg); - } else { - cfg = rpm_read(rpm, lmac_id, RPMX_MTI_LPCSX_CONTROL1); - if (enable) - cfg |= RPMX_MTI_PCS_LBK; - else - cfg &= ~RPMX_MTI_PCS_LBK; - rpm_write(rpm, lmac_id, RPMX_MTI_LPCSX_CONTROL1, cfg); + + if (lmac_type == LMAC_MODE_QSGMII || lmac_type == LMAC_MODE_SGMII) { + dev_err(&rpm->pdev->dev, "loopback not supported for LPC mode\n"); + return 0; }
+ cfg = rpm_read(rpm, lmac_id, RPMX_MTI_PCS100X_CONTROL1); + + if (enable) + cfg |= RPMX_MTI_PCS_LBK; + else + cfg &= ~RPMX_MTI_PCS_LBK; + rpm_write(rpm, lmac_id, RPMX_MTI_PCS100X_CONTROL1, cfg); + return 0; }
From: Subbaraya Sundeep sbhatta@marvell.com
[ Upstream commit a8db854be28622a2477cb21cdf7f829adbb2c42d ]
PF forwards its VF messages to AF and corresponding replies from AF to VF. AF sets proper error code in the replies after processing message requests. Currently PF checks the error codes in replies and sends invalid message to VF. This way VF lacks the information of error code set by AF for its messages. This patch changes that such that PF simply forwards AF replies so that VF can handle error codes.
Fixes: d424b6c02415 ("octeontx2-pf: Enable SRIOV and added VF mbox handling") Signed-off-by: Subbaraya Sundeep sbhatta@marvell.com Signed-off-by: Sunil Goutham sgoutham@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c index 1e0d0c9c1dac3..ba7f6b295ca55 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c @@ -394,7 +394,12 @@ static int otx2_forward_vf_mbox_msgs(struct otx2_nic *pf, dst_mdev->msg_size = mbox_hdr->msg_size; dst_mdev->num_msgs = num_msgs; err = otx2_sync_mbox_msg(dst_mbox); - if (err) { + /* Error code -EIO indicate there is a communication failure + * to the AF. Rest of the error codes indicate that AF processed + * VF messages and set the error codes in response messages + * (if any) so simply forward responses to VF. + */ + if (err == -EIO) { dev_warn(pf->dev, "AF not responding to VF%d messages\n", vf); /* restore PF mbase and exit */
From: Kiran Kumar K kirankumark@marvell.com
[ Upstream commit 745166fcf01cecc4f5ff3defc6586868349a43f9 ]
With current KPU profile NGIO is being parsed along with CTAG as a single layer. Because of this MCAM/ntuple rules installed with ethertype as 0x8842 are not being hit. Adding KPU profile changes to parse NGIO in separate ltype and CTAG in separate ltype.
Fixes: f9c49be90c05 ("octeontx2-af: Update the default KPU profile and fixes") Signed-off-by: Kiran Kumar K kirankumark@marvell.com Signed-off-by: Subbaraya Sundeep sbhatta@marvell.com Signed-off-by: Sunil Goutham sgoutham@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- .../marvell/octeontx2/af/npc_profile.h | 70 +++++++++---------- 1 file changed, 35 insertions(+), 35 deletions(-)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/npc_profile.h b/drivers/net/ethernet/marvell/octeontx2/af/npc_profile.h index 0fe7ad35e36fd..4180376fa6763 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/npc_profile.h +++ b/drivers/net/ethernet/marvell/octeontx2/af/npc_profile.h @@ -185,7 +185,6 @@ enum npc_kpu_parser_state { NPC_S_KPU2_QINQ, NPC_S_KPU2_ETAG, NPC_S_KPU2_EXDSA, - NPC_S_KPU2_NGIO, NPC_S_KPU2_CPT_CTAG, NPC_S_KPU2_CPT_QINQ, NPC_S_KPU3_CTAG, @@ -212,6 +211,7 @@ enum npc_kpu_parser_state { NPC_S_KPU5_NSH, NPC_S_KPU5_CPT_IP, NPC_S_KPU5_CPT_IP6, + NPC_S_KPU5_NGIO, NPC_S_KPU6_IP6_EXT, NPC_S_KPU6_IP6_HOP_DEST, NPC_S_KPU6_IP6_ROUT, @@ -1120,15 +1120,6 @@ static struct npc_kpu_profile_cam kpu1_cam_entries[] = { 0x0000, 0x0000, }, - { - NPC_S_KPU1_ETHER, 0xff, - NPC_ETYPE_CTAG, - 0xffff, - NPC_ETYPE_NGIO, - 0xffff, - 0x0000, - 0x0000, - }, { NPC_S_KPU1_ETHER, 0xff, NPC_ETYPE_CTAG, @@ -1966,6 +1957,15 @@ static struct npc_kpu_profile_cam kpu2_cam_entries[] = { 0x0000, 0x0000, }, + { + NPC_S_KPU2_CTAG, 0xff, + NPC_ETYPE_NGIO, + 0xffff, + 0x0000, + 0x0000, + 0x0000, + 0x0000, + }, { NPC_S_KPU2_CTAG, 0xff, NPC_ETYPE_PPPOE, @@ -2749,15 +2749,6 @@ static struct npc_kpu_profile_cam kpu2_cam_entries[] = { 0x0000, 0x0000, }, - { - NPC_S_KPU2_NGIO, 0xff, - 0x0000, - 0x0000, - 0x0000, - 0x0000, - 0x0000, - 0x0000, - }, { NPC_S_KPU2_CPT_CTAG, 0xff, NPC_ETYPE_IP, @@ -5089,6 +5080,15 @@ static struct npc_kpu_profile_cam kpu5_cam_entries[] = { 0x0000, 0x0000, }, + { + NPC_S_KPU5_NGIO, 0xff, + 0x0000, + 0x0000, + 0x0000, + 0x0000, + 0x0000, + 0x0000, + }, { NPC_S_NA, 0X00, 0x0000, @@ -8422,14 +8422,6 @@ static struct npc_kpu_profile_action kpu1_action_entries[] = { 0, 0, 0, 0, 0, }, - { - NPC_ERRLEV_RE, NPC_EC_NOERR, - 8, 12, 0, 0, 0, - NPC_S_KPU2_NGIO, 12, 1, - NPC_LID_LA, NPC_LT_LA_ETHER, - 0, - 0, 0, 0, 0, - }, { NPC_ERRLEV_RE, NPC_EC_NOERR, 8, 12, 0, 0, 0, @@ -9194,6 +9186,14 @@ static struct npc_kpu_profile_action kpu2_action_entries[] = { 0, 0, 0, 0, 0, }, + { + NPC_ERRLEV_RE, NPC_EC_NOERR, + 0, 0, 0, 2, 0, + NPC_S_KPU5_NGIO, 6, 1, + NPC_LID_LB, NPC_LT_LB_CTAG, + 0, + 0, 0, 0, 0, + }, { NPC_ERRLEV_RE, NPC_EC_NOERR, 8, 0, 6, 2, 0, @@ -9890,14 +9890,6 @@ static struct npc_kpu_profile_action kpu2_action_entries[] = { NPC_F_LB_U_UNK_ETYPE | NPC_F_LB_L_EXDSA, 0, 0, 0, 0, }, - { - NPC_ERRLEV_RE, NPC_EC_NOERR, - 0, 0, 0, 0, 1, - NPC_S_NA, 0, 1, - NPC_LID_LC, NPC_LT_LC_NGIO, - 0, - 0, 0, 0, 0, - }, { NPC_ERRLEV_RE, NPC_EC_NOERR, 8, 0, 6, 2, 0, @@ -11973,6 +11965,14 @@ static struct npc_kpu_profile_action kpu5_action_entries[] = { 0, 0, 0, 0, 0, }, + { + NPC_ERRLEV_RE, NPC_EC_NOERR, + 0, 0, 0, 0, 1, + NPC_S_NA, 0, 1, + NPC_LID_LC, NPC_LT_LC_NGIO, + 0, + 0, 0, 0, 0, + }, { NPC_ERRLEV_LC, NPC_EC_UNK, 0, 0, 0, 0, 1,
From: David Howells dhowells@redhat.com
[ Upstream commit 2c13c05c5ff4b9fc907b07f7311821910ebaaf8a ]
Improve retransmission backoff by only backing off when we retransmit data packets rather than when we set the lost ack timer.
To this end:
(1) In rxrpc_resend(), use rxrpc_get_rto_backoff() when setting the retransmission timer and only tell it that we are retransmitting if we actually have things to retransmit.
Note that it's possible for the retransmission algorithm to race with the processing of a received ACK, so we may see no packets needing retransmission.
(2) In rxrpc_send_data_packet(), don't bump the backoff when setting the ack_lost_at timer, as it may then get bumped twice.
With this, when looking at one particular packet, the retransmission intervals were seen to be 1.5ms, 2ms, 3ms, 5ms, 9ms, 17ms, 33ms, 71ms, 136ms, 264ms, 544ms, 1.088s, 2.1s, 4.2s and 8.3s.
Fixes: c410bf01933e ("rxrpc: Fix the excessive initial retransmission timeout") Suggested-by: Marc Dionne marc.dionne@auristor.com Signed-off-by: David Howells dhowells@redhat.com Reviewed-by: Marc Dionne marc.dionne@auristor.com Tested-by: Marc Dionne marc.dionne@auristor.com cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/164138117069.2023386.17446904856843997127.stgit@wa... Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/rxrpc/call_event.c | 8 +++----- net/rxrpc/output.c | 2 +- 2 files changed, 4 insertions(+), 6 deletions(-)
diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c index 6be2672a65eab..df864e6922679 100644 --- a/net/rxrpc/call_event.c +++ b/net/rxrpc/call_event.c @@ -157,7 +157,7 @@ static void rxrpc_congestion_timeout(struct rxrpc_call *call) static void rxrpc_resend(struct rxrpc_call *call, unsigned long now_j) { struct sk_buff *skb; - unsigned long resend_at, rto_j; + unsigned long resend_at; rxrpc_seq_t cursor, seq, top; ktime_t now, max_age, oldest, ack_ts; int ix; @@ -165,10 +165,8 @@ static void rxrpc_resend(struct rxrpc_call *call, unsigned long now_j)
_enter("{%d,%d}", call->tx_hard_ack, call->tx_top);
- rto_j = call->peer->rto_j; - now = ktime_get_real(); - max_age = ktime_sub(now, jiffies_to_usecs(rto_j)); + max_age = ktime_sub(now, jiffies_to_usecs(call->peer->rto_j));
spin_lock_bh(&call->lock);
@@ -213,7 +211,7 @@ static void rxrpc_resend(struct rxrpc_call *call, unsigned long now_j) }
resend_at = nsecs_to_jiffies(ktime_to_ns(ktime_sub(now, oldest))); - resend_at += jiffies + rto_j; + resend_at += jiffies + rxrpc_get_rto_backoff(call->peer, retrans); WRITE_ONCE(call->resend_at, resend_at);
if (unacked) diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c index 10f2bf2e9068a..a45c83f22236e 100644 --- a/net/rxrpc/output.c +++ b/net/rxrpc/output.c @@ -468,7 +468,7 @@ done: if (call->peer->rtt_count > 1) { unsigned long nowj = jiffies, ack_lost_at;
- ack_lost_at = rxrpc_get_rto_backoff(call->peer, retrans); + ack_lost_at = rxrpc_get_rto_backoff(call->peer, false); ack_lost_at += nowj; WRITE_ONCE(call->ack_lost_at, ack_lost_at); rxrpc_reduce_call_timer(call, ack_lost_at, nowj,
From: Mihai Carabas mihai.carabas@oracle.com
[ Upstream commit e9b7c3a4263bdcfd31bc3d03d48ce0ded7a94635 ]
The kernel is aligned at SEGMENT_SIZE and this is the size populated in the PE headers:
arch/arm64/kernel/efi-header.S: .long SEGMENT_ALIGN // SectionAlignment
EFI_KIMG_ALIGN is defined as: (SEGMENT_ALIGN > THREAD_ALIGN ? SEGMENT_ALIGN : THREAD_ALIGN)
So it depends on THREAD_ALIGN. On newer builds this message started to appear even though the loader is taking into account the PE header (which is stating SEGMENT_ALIGN).
Fixes: c32ac11da3f8 ("efi/libstub: arm64: Double check image alignment at entry") Signed-off-by: Mihai Carabas mihai.carabas@oracle.com Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/efi/libstub/arm64-stub.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/firmware/efi/libstub/arm64-stub.c b/drivers/firmware/efi/libstub/arm64-stub.c index 2363fee9211c9..9cc556013d085 100644 --- a/drivers/firmware/efi/libstub/arm64-stub.c +++ b/drivers/firmware/efi/libstub/arm64-stub.c @@ -119,9 +119,9 @@ efi_status_t handle_kernel_image(unsigned long *image_addr, if (image->image_base != _text) efi_err("FIRMWARE BUG: efi_loaded_image_t::image_base has bogus value\n");
- if (!IS_ALIGNED((u64)_text, EFI_KIMG_ALIGN)) - efi_err("FIRMWARE BUG: kernel image not aligned on %ldk boundary\n", - EFI_KIMG_ALIGN >> 10); + if (!IS_ALIGNED((u64)_text, SEGMENT_ALIGN)) + efi_err("FIRMWARE BUG: kernel image not aligned on %dk boundary\n", + SEGMENT_ALIGN >> 10);
kernel_size = _edata - _text; kernel_memsize = kernel_size + (_end - _edata);
From: Dylan Yudaken dylany@fb.com
[ Upstream commit b36a2050040b2d839bdc044007cdd57101d7f881 ]
In some cases io_rsrc_ref_quiesce will call io_rsrc_node_switch_start, and then immediately flush the delayed work queue &ctx->rsrc_put_work.
However the percpu_ref_put does not immediately destroy the node, it will be called asynchronously via RCU. That ends up with io_rsrc_node_ref_zero only being called after rsrc_put_work has been flushed, and so the process ends up sleeping for 1 second unnecessarily.
This patch executes the put code immediately if we are busy quiescing.
Fixes: 4a38aed2a0a7 ("io_uring: batch reap of dead file registrations") Signed-off-by: Dylan Yudaken dylany@fb.com Link: https://lore.kernel.org/r/20220121123856.3557884-1-dylany@fb.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/io_uring.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c index 15f303180d70c..698db7fb62e06 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -7761,10 +7761,15 @@ static __cold void io_rsrc_node_ref_zero(struct percpu_ref *ref) struct io_ring_ctx *ctx = node->rsrc_data->ctx; unsigned long flags; bool first_add = false; + unsigned long delay = HZ;
spin_lock_irqsave(&ctx->rsrc_ref_lock, flags); node->done = true;
+ /* if we are mid-quiesce then do not delay */ + if (node->rsrc_data->quiesce) + delay = 0; + while (!list_empty(&ctx->rsrc_ref_list)) { node = list_first_entry(&ctx->rsrc_ref_list, struct io_rsrc_node, node); @@ -7777,7 +7782,7 @@ static __cold void io_rsrc_node_ref_zero(struct percpu_ref *ref) spin_unlock_irqrestore(&ctx->rsrc_ref_lock, flags);
if (first_add) - mod_delayed_work(system_wq, &ctx->rsrc_put_work, HZ); + mod_delayed_work(system_wq, &ctx->rsrc_put_work, delay); }
static struct io_rsrc_node *io_rsrc_node_alloc(struct io_ring_ctx *ctx)
From: Miaoqian Lin linmq006@gmail.com
[ Upstream commit 83114df32ae779df57e0af99a8ba6c3968b2ba3d ]
kobject_init_and_add() takes reference even when it fails. According to the doc of kobject_init_and_add()
If this function returns an error, kobject_put() must be called to properly clean up the memory associated with the object.
Fix this issue by adding kobject_put(). Callback function blk_ia_ranges_sysfs_release() in kobject_put() can handle the pointer "iars" properly.
Fixes: a2247f19ee1c ("block: Add independent access ranges support") Signed-off-by: Miaoqian Lin linmq006@gmail.com Reviewed-by: Damien Le Moal damien.lemoal@opensource.wdc.com Link: https://lore.kernel.org/r/20220120101025.22411-1-linmq006@gmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- block/blk-ia-ranges.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/blk-ia-ranges.c b/block/blk-ia-ranges.c index b925f3db3ab7a..18c68d8b9138e 100644 --- a/block/blk-ia-ranges.c +++ b/block/blk-ia-ranges.c @@ -144,7 +144,7 @@ int disk_register_independent_access_ranges(struct gendisk *disk, &q->kobj, "%s", "independent_access_ranges"); if (ret) { q->ia_ranges = NULL; - kfree(iars); + kobject_put(&iars->kobj); return ret; }
From: Yanming Liu yanminglr@gmail.com
[ Upstream commit 96d9d1fa5cd505078534113308ced0aa56d8da58 ]
Commit adae1e931acd ("Drivers: hv: vmbus: Copy packets sent by Hyper-V out of the ring buffer") introduced a notion of maximum packet size in vmbus channel and used that size to initialize a buffer holding all incoming packet along with their vmbus packet header. hv_balloon uses the default maximum packet size VMBUS_DEFAULT_MAX_PKT_SIZE which matches its maximum message size, however vmbus_open expects this size to also include vmbus packet header. This leads to 4096 bytes dm_unballoon_request messages being truncated to 4080 bytes. When the driver tries to read next packet it starts from a wrong read_index, receives garbage and prints a lot of "Unhandled message: type: <garbage>" in dmesg.
Allocate the buffer with HV_HYP_PAGE_SIZE more bytes to make room for the header.
Fixes: adae1e931acd ("Drivers: hv: vmbus: Copy packets sent by Hyper-V out of the ring buffer") Suggested-by: Michael Kelley (LINUX) mikelley@microsoft.com Suggested-by: Andrea Parri (Microsoft) parri.andrea@gmail.com Signed-off-by: Yanming Liu yanminglr@gmail.com Reviewed-by: Michael Kelley mikelley@microsoft.com Reviewed-by: Andrea Parri (Microsoft) parri.andrea@gmail.com Link: https://lore.kernel.org/r/20220119202052.3006981-1-yanminglr@gmail.com Signed-off-by: Wei Liu wei.liu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hv/hv_balloon.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c index ca873a3b98dbe..f2d05bff42453 100644 --- a/drivers/hv/hv_balloon.c +++ b/drivers/hv/hv_balloon.c @@ -1660,6 +1660,13 @@ static int balloon_connect_vsp(struct hv_device *dev) unsigned long t; int ret;
+ /* + * max_pkt_size should be large enough for one vmbus packet header plus + * our receive buffer size. Hyper-V sends messages up to + * HV_HYP_PAGE_SIZE bytes long on balloon channel. + */ + dev->channel->max_pkt_size = HV_HYP_PAGE_SIZE * 2; + ret = vmbus_open(dev->channel, dm_ring_size, dm_ring_size, NULL, 0, balloon_onchannelcallback, dev); if (ret)
From: Guenter Roeck linux@roeck-us.net
[ Upstream commit bc341a1a98827925082e95db174734fc8bd68af6 ]
If alert handling is broken, interrupts are disabled after an alert and re-enabled after the alert clears. However, if there is an interrupt handler, this does not apply if alerts were originally disabled and enabled when the driver was loaded. In that case, interrupts will stay disabled after an alert was handled though the alert handler even after the alert condition clears. Address the situation by always re-enabling interrupts after the alert condition clears if there is an interrupt handler.
Fixes: 2abdc357c55d9 ("hwmon: (lm90) Unmask hardware interrupt") Cc: Dmitry Osipenko digetx@gmail.com Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hwmon/lm90.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hwmon/lm90.c b/drivers/hwmon/lm90.c index cc5e48fe304b1..e4ecf3440d7cf 100644 --- a/drivers/hwmon/lm90.c +++ b/drivers/hwmon/lm90.c @@ -848,7 +848,7 @@ static int lm90_update_device(struct device *dev) * Re-enable ALERT# output if it was originally enabled and * relevant alarms are all clear */ - if (!(data->config_orig & 0x80) && + if ((client->irq || !(data->config_orig & 0x80)) && !(data->alarms & data->alert_alarms)) { if (data->config & 0x80) { dev_dbg(&client->dev, "Re-enabling ALERT#\n");
From: Guenter Roeck linux@roeck-us.net
[ Upstream commit a53fff96f35763d132a36c620b183fdf11022d7a ]
Experiments with MAX6654 show that its alert function is broken, similar to other chips supported by the lm90 driver. Mark it accordingly.
Fixes: 229d495d8189 ("hwmon: (lm90) Add max6654 support to lm90 driver") Cc: Josh Lehan krellan@google.com Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hwmon/lm90.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/hwmon/lm90.c b/drivers/hwmon/lm90.c index e4ecf3440d7cf..280ae5f58187b 100644 --- a/drivers/hwmon/lm90.c +++ b/drivers/hwmon/lm90.c @@ -400,6 +400,7 @@ static const struct lm90_params lm90_params[] = { .reg_local_ext = MAX6657_REG_R_LOCAL_TEMPL, }, [max6654] = { + .flags = LM90_HAVE_BROKEN_ALERT, .alert_alarms = 0x7c, .max_convrate = 7, .reg_local_ext = MAX6657_REG_R_LOCAL_TEMPL,
From: Guenter Roeck linux@roeck-us.net
[ Upstream commit d379880d9adb9f1ada3f1266aa49ea2561328e08 ]
sysfs and udev notifications need to be sent to the _alarm attributes, not to the value attributes.
Fixes: 94dbd23ed88c ("hwmon: (lm90) Use hwmon_notify_event()") Cc: Dmitry Osipenko digetx@gmail.com Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hwmon/lm90.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/hwmon/lm90.c b/drivers/hwmon/lm90.c index ba01127c1deb1..1c9493c708132 100644 --- a/drivers/hwmon/lm90.c +++ b/drivers/hwmon/lm90.c @@ -1808,22 +1808,22 @@ static bool lm90_is_tripped(struct i2c_client *client, u16 *status)
if (st & LM90_STATUS_LLOW) hwmon_notify_event(data->hwmon_dev, hwmon_temp, - hwmon_temp_min, 0); + hwmon_temp_min_alarm, 0); if (st & LM90_STATUS_RLOW) hwmon_notify_event(data->hwmon_dev, hwmon_temp, - hwmon_temp_min, 1); + hwmon_temp_min_alarm, 1); if (st2 & MAX6696_STATUS2_R2LOW) hwmon_notify_event(data->hwmon_dev, hwmon_temp, - hwmon_temp_min, 2); + hwmon_temp_min_alarm, 2); if (st & LM90_STATUS_LHIGH) hwmon_notify_event(data->hwmon_dev, hwmon_temp, - hwmon_temp_max, 0); + hwmon_temp_max_alarm, 0); if (st & LM90_STATUS_RHIGH) hwmon_notify_event(data->hwmon_dev, hwmon_temp, - hwmon_temp_max, 1); + hwmon_temp_max_alarm, 1); if (st2 & MAX6696_STATUS2_R2HIGH) hwmon_notify_event(data->hwmon_dev, hwmon_temp, - hwmon_temp_max, 2); + hwmon_temp_max_alarm, 2);
return true; }
From: Dan Carpenter dan.carpenter@oracle.com
[ Upstream commit c1ec0cabc36718efc7fe8b4157d41b82d08ec1d2 ]
The "val" variable is controlled by the user and comes from hwmon_attr_store(). The FAN_RPM_TO_PERIOD() macro divides by "val" so a zero will crash the system. Check for that and return -EINVAL. Negatives are also invalid so return -EINVAL for those too.
Fixes: fc958a61ff6d ("hwmon: (adt7470) Convert to devm_hwmon_device_register_with_info API") Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hwmon/adt7470.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/hwmon/adt7470.c b/drivers/hwmon/adt7470.c index d519aca4a9d64..fb6d14d213a18 100644 --- a/drivers/hwmon/adt7470.c +++ b/drivers/hwmon/adt7470.c @@ -662,6 +662,9 @@ static int adt7470_fan_write(struct device *dev, u32 attr, int channel, long val struct adt7470_data *data = dev_get_drvdata(dev); int err;
+ if (val <= 0) + return -EINVAL; + val = FAN_RPM_TO_PERIOD(val); val = clamp_val(val, 1, 65534);
From: Athira Rajeev atrajeev@linux.vnet.ibm.com
[ Upstream commit fb6433b48a178d4672cb26632454ee0b21056eaa ]
Running selftest with CONFIG_PPC_IRQ_SOFT_MASK_DEBUG enabled in kernel triggered below warning:
[ 172.851380] ------------[ cut here ]------------ [ 172.851391] WARNING: CPU: 8 PID: 2901 at arch/powerpc/include/asm/hw_irq.h:246 power_pmu_disable+0x270/0x280 [ 172.851402] Modules linked in: dm_mod bonding nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables rfkill nfnetlink sunrpc xfs libcrc32c pseries_rng xts vmx_crypto uio_pdrv_genirq uio sch_fq_codel ip_tables ext4 mbcache jbd2 sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp fuse [ 172.851442] CPU: 8 PID: 2901 Comm: lost_exception_ Not tainted 5.16.0-rc5-03218-g798527287598 #2 [ 172.851451] NIP: c00000000013d600 LR: c00000000013d5a4 CTR: c00000000013b180 [ 172.851458] REGS: c000000017687860 TRAP: 0700 Not tainted (5.16.0-rc5-03218-g798527287598) [ 172.851465] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 48004884 XER: 20040000 [ 172.851482] CFAR: c00000000013d5b4 IRQMASK: 1 [ 172.851482] GPR00: c00000000013d5a4 c000000017687b00 c000000002a10600 0000000000000004 [ 172.851482] GPR04: 0000000082004000 c0000008ba08f0a8 0000000000000000 00000008b7ed0000 [ 172.851482] GPR08: 00000000446194f6 0000000000008000 c00000000013b118 c000000000d58e68 [ 172.851482] GPR12: c00000000013d390 c00000001ec54a80 0000000000000000 0000000000000000 [ 172.851482] GPR16: 0000000000000000 0000000000000000 c000000015d5c708 c0000000025396d0 [ 172.851482] GPR20: 0000000000000000 0000000000000000 c00000000a3bbf40 0000000000000003 [ 172.851482] GPR24: 0000000000000000 c0000008ba097400 c0000000161e0d00 c00000000a3bb600 [ 172.851482] GPR28: c000000015d5c700 0000000000000001 0000000082384090 c0000008ba0020d8 [ 172.851549] NIP [c00000000013d600] power_pmu_disable+0x270/0x280 [ 172.851557] LR [c00000000013d5a4] power_pmu_disable+0x214/0x280 [ 172.851565] Call Trace: [ 172.851568] [c000000017687b00] [c00000000013d5a4] power_pmu_disable+0x214/0x280 (unreliable) [ 172.851579] [c000000017687b40] [c0000000003403ac] perf_pmu_disable+0x4c/0x60 [ 172.851588] [c000000017687b60] [c0000000003445e4] __perf_event_task_sched_out+0x1d4/0x660 [ 172.851596] [c000000017687c50] [c000000000d1175c] __schedule+0xbcc/0x12a0 [ 172.851602] [c000000017687d60] [c000000000d11ea8] schedule+0x78/0x140 [ 172.851608] [c000000017687d90] [c0000000001a8080] sys_sched_yield+0x20/0x40 [ 172.851615] [c000000017687db0] [c0000000000334dc] system_call_exception+0x18c/0x380 [ 172.851622] [c000000017687e10] [c00000000000c74c] system_call_common+0xec/0x268
The warning indicates that MSR_EE being set(interrupt enabled) when there was an overflown PMC detected. This could happen in power_pmu_disable since it runs under interrupt soft disable condition ( local_irq_save ) and not with interrupts hard disabled. commit 2c9ac51b850d ("powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC") intended to clear PMI pending bit in Paca when disabling the PMU. It could happen that PMC gets overflown while code is in power_pmu_disable callback function. Hence add a check to see if PMI pending bit is set in Paca before clearing it via clear_pmi_pending.
Fixes: 2c9ac51b850d ("powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC") Reported-by: Sachin Sant sachinp@linux.ibm.com Signed-off-by: Athira Rajeev atrajeev@linux.vnet.ibm.com Tested-by: Sachin Sant sachinp@linux.ibm.com Reviewed-by: Nicholas Piggin npiggin@gmail.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20220122033429.25395-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/perf/core-book3s.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c index bef6b1abce702..e78de70509472 100644 --- a/arch/powerpc/perf/core-book3s.c +++ b/arch/powerpc/perf/core-book3s.c @@ -1326,9 +1326,20 @@ static void power_pmu_disable(struct pmu *pmu) * Otherwise provide a warning if there is PMI pending, but * no counter is found overflown. */ - if (any_pmc_overflown(cpuhw)) - clear_pmi_irq_pending(); - else + if (any_pmc_overflown(cpuhw)) { + /* + * Since power_pmu_disable runs under local_irq_save, it + * could happen that code hits a PMC overflow without PMI + * pending in paca. Hence only clear PMI pending if it was + * set. + * + * If a PMI is pending, then MSR[EE] must be disabled (because + * the masked PMI handler disabling EE). So it is safe to + * call clear_pmi_irq_pending(). + */ + if (pmi_irq_pending()) + clear_pmi_irq_pending(); + } else WARN_ON(pmi_irq_pending());
val = mmcra = cpuhw->mmcr.mmcra;
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 27a8caa59babb96c5890569e131bc0eb6d45daee ]
During IP fragmentation we sanitize IP options. This means overwriting options which should not be copied with NOPs. Only the first fragment has the original, full options.
ip_fraglist_prepare() copies the IP header and options from previous fragment to the next one. Commit 19c3401a917b ("net: ipv4: place control buffer handling away from fragmentation iterators") moved sanitizing options before ip_fraglist_prepare() which means options are sanitized and then overwritten again with the old values.
Fixing this is not enough, however, nor did the sanitization work prior to aforementioned commit.
ip_options_fragment() (which does the sanitization) uses ipcb->opt.optlen for the length of the options. ipcb->opt of fragments is not populated (it's 0), only the head skb has the state properly built. So even when called at the right time ip_options_fragment() does nothing. This seems to date back all the way to v2.5.44 when the fast path for pre-fragmented skbs had been introduced. Prior to that ip_options_build() would have been called for every fragment (in fact ever since v2.5.44 the fragmentation handing in ip_options_build() has been dead code, I'll clean it up in -next).
In the original patch (see Link) caixf mentions fixing the handling for fragments other than the second one, but I'm not sure how _any_ fragment could have had their options sanitized with the code as it stood.
Tested with python (MTU on lo lowered to 1000 to force fragmentation):
import socket s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM) s.setsockopt(socket.IPPROTO_IP, socket.IP_OPTIONS, bytearray([7,4,5,192, 20|0x80,4,1,0])) s.sendto(b'1'*2000, ('127.0.0.1', 1234))
Before:
IP (tos 0x0, ttl 64, id 1053, offset 0, flags [+], proto UDP (17), length 996, options (RR [bad length 4] [bad ptr 5] 192.148.4.1,,RA value 256)) localhost.36500 > localhost.search-agent: UDP, length 2000 IP (tos 0x0, ttl 64, id 1053, offset 968, flags [+], proto UDP (17), length 996, options (RR [bad length 4] [bad ptr 5] 192.148.4.1,,RA value 256)) localhost > localhost: udp IP (tos 0x0, ttl 64, id 1053, offset 1936, flags [none], proto UDP (17), length 100, options (RR [bad length 4] [bad ptr 5] 192.148.4.1,,RA value 256)) localhost > localhost: udp
After:
IP (tos 0x0, ttl 96, id 42549, offset 0, flags [+], proto UDP (17), length 996, options (RR [bad length 4] [bad ptr 5] 192.148.4.1,,RA value 256)) localhost.51607 > localhost.search-agent: UDP, bad length 2000 > 960 IP (tos 0x0, ttl 96, id 42549, offset 968, flags [+], proto UDP (17), length 996, options (NOP,NOP,NOP,NOP,RA value 256)) localhost > localhost: udp IP (tos 0x0, ttl 96, id 42549, offset 1936, flags [none], proto UDP (17), length 100, options (NOP,NOP,NOP,NOP,RA value 256)) localhost > localhost: udp
RA (20 | 0x80) is now copied as expected, RR (7) is "NOPed out".
Link: https://lore.kernel.org/netdev/20220107080559.122713-1-ooppublic@163.com/ Fixes: 19c3401a917b ("net: ipv4: place control buffer handling away from fragmentation iterators") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: caixf ooppublic@163.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/ip_output.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 9bca57ef8b838..ff38b46bd4b0f 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -826,15 +826,24 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, /* Everything is OK. Generate! */ ip_fraglist_init(skb, iph, hlen, &iter);
- if (iter.frag) - ip_options_fragment(iter.frag); - for (;;) { /* Prepare header of the next frame, * before previous one went down. */ if (iter.frag) { + bool first_frag = (iter.offset == 0); + IPCB(iter.frag)->flags = IPCB(skb)->flags; ip_fraglist_prepare(skb, &iter); + if (first_frag && IPCB(skb)->opt.optlen) { + /* ipcb->opt is not populated for frags + * coming from __ip_make_skb(), + * ip_options_fragment() needs optlen + */ + IPCB(iter.frag)->opt.optlen = + IPCB(skb)->opt.optlen; + ip_options_fragment(iter.frag); + ip_send_check(iter.iph); + } }
skb->tstamp = tstamp;
From: Sukadev Bhattiprolu sukadev@linux.ibm.com
[ Upstream commit db9f0e8bf79e6da7068b5818fea0ffd9d0d4b4da ]
If auto-priority-failover (APF) is enabled and there are at least two backing devices of different priorities, some resets like fail-over, change-param etc can cause at least two back to back failovers. (Failover from high priority backing device to lower priority one and then back to the higher priority one if that is still functional).
Depending on the timimg of the two failovers it is possible to trigger a "hard" reset and for the hard reset to fail due to failovers. When this occurs, the driver assumes that the network is unstable and disables the VNIC for a 60-second "settling time". This in turn can cause the ethtool command to fail with "No such device" while the vnic automatically recovers a little while later.
Given that it's possible to have two back to back failures, allow for extra failures before disabling the vnic for the settling time.
Fixes: f15fde9d47b8 ("ibmvnic: delay next reset if hard reset fails") Signed-off-by: Sukadev Bhattiprolu sukadev@linux.ibm.com Reviewed-by: Dany Madden drt@linux.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/ibm/ibmvnic.c | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 0bb3911dd014d..9b2d16ad76f12 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -2598,6 +2598,7 @@ static void __ibmvnic_reset(struct work_struct *work) struct ibmvnic_rwi *rwi; unsigned long flags; u32 reset_state; + int num_fails = 0; int rc = 0;
adapter = container_of(work, struct ibmvnic_adapter, ibmvnic_reset); @@ -2651,11 +2652,23 @@ static void __ibmvnic_reset(struct work_struct *work) rc = do_hard_reset(adapter, rwi, reset_state); rtnl_unlock(); } - if (rc) { - /* give backing device time to settle down */ + if (rc) + num_fails++; + else + num_fails = 0; + + /* If auto-priority-failover is enabled we can get + * back to back failovers during resets, resulting + * in at least two failed resets (from high-priority + * backing device to low-priority one and then back) + * If resets continue to fail beyond that, give the + * adapter some time to settle down before retrying. + */ + if (num_fails >= 3) { netdev_dbg(adapter->netdev, - "[S:%s] Hard reset failed, waiting 60 secs\n", - adapter_state_to_string(adapter->state)); + "[S:%s] Hard reset failed %d times, waiting 60 secs\n", + adapter_state_to_string(adapter->state), + num_fails); set_current_state(TASK_UNINTERRUPTIBLE); schedule_timeout(60 * HZ); }
From: Sukadev Bhattiprolu sukadev@linux.ibm.com
[ Upstream commit 151b6a5c06b678687f64f2d9a99fd04d5cd32b72 ]
We use ->running_cap_crqs to determine when the ibmvnic_tasklet() should send out the next protocol message type. i.e when we get back responses to all our QUERY_CAPABILITY CRQs we send out REQUEST_CAPABILITY crqs. Similiary, when we get responses to all the REQUEST_CAPABILITY crqs, we send out the QUERY_IP_OFFLOAD CRQ.
We currently increment ->running_cap_crqs as we send out each CRQ and have the ibmvnic_tasklet() send out the next message type, when this running_cap_crqs count drops to 0.
This assumes that all the CRQs of the current type were sent out before the count drops to 0. However it is possible that we send out say 6 CRQs, get preempted and receive all the 6 responses before we send out the remaining CRQs. This can result in ->running_cap_crqs count dropping to zero before all messages of the current type were sent and we end up sending the next protocol message too early.
Instead initialize the ->running_cap_crqs upfront so the tasklet will only send the next protocol message after all responses are received.
Use the cap_reqs local variable to also detect any discrepancy (either now or in future) in the number of capability requests we actually send.
Currently only send_query_cap() is affected by this behavior (of sending next message early) since it is called from the worker thread (during reset) and from application thread (during ->ndo_open()) and they can be preempted. send_request_cap() is only called from the tasklet which processes CRQ responses sequentially, is not be affected. But to maintain the existing symmtery with send_query_capability() we update send_request_capability() also.
Fixes: 249168ad07cd ("ibmvnic: Make CRQ interrupt tasklet wait for all capabilities crqs") Signed-off-by: Sukadev Bhattiprolu sukadev@linux.ibm.com Reviewed-by: Dany Madden drt@linux.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/ibm/ibmvnic.c | 106 +++++++++++++++++++---------- 1 file changed, 71 insertions(+), 35 deletions(-)
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 9b2d16ad76f12..acd488310bbce 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -3849,11 +3849,25 @@ static void send_request_cap(struct ibmvnic_adapter *adapter, int retry) struct device *dev = &adapter->vdev->dev; union ibmvnic_crq crq; int max_entries; + int cap_reqs; + + /* We send out 6 or 7 REQUEST_CAPABILITY CRQs below (depending on + * the PROMISC flag). Initialize this count upfront. When the tasklet + * receives a response to all of these, it will send the next protocol + * message (QUERY_IP_OFFLOAD). + */ + if (!(adapter->netdev->flags & IFF_PROMISC) || + adapter->promisc_supported) + cap_reqs = 7; + else + cap_reqs = 6;
if (!retry) { /* Sub-CRQ entries are 32 byte long */ int entries_page = 4 * PAGE_SIZE / (sizeof(u64) * 4);
+ atomic_set(&adapter->running_cap_crqs, cap_reqs); + if (adapter->min_tx_entries_per_subcrq > entries_page || adapter->min_rx_add_entries_per_subcrq > entries_page) { dev_err(dev, "Fatal, invalid entries per sub-crq\n"); @@ -3914,44 +3928,45 @@ static void send_request_cap(struct ibmvnic_adapter *adapter, int retry) adapter->opt_rx_comp_queues;
adapter->req_rx_add_queues = adapter->max_rx_add_queues; + } else { + atomic_add(cap_reqs, &adapter->running_cap_crqs); } - memset(&crq, 0, sizeof(crq)); crq.request_capability.first = IBMVNIC_CRQ_CMD; crq.request_capability.cmd = REQUEST_CAPABILITY;
crq.request_capability.capability = cpu_to_be16(REQ_TX_QUEUES); crq.request_capability.number = cpu_to_be64(adapter->req_tx_queues); - atomic_inc(&adapter->running_cap_crqs); + cap_reqs--; ibmvnic_send_crq(adapter, &crq);
crq.request_capability.capability = cpu_to_be16(REQ_RX_QUEUES); crq.request_capability.number = cpu_to_be64(adapter->req_rx_queues); - atomic_inc(&adapter->running_cap_crqs); + cap_reqs--; ibmvnic_send_crq(adapter, &crq);
crq.request_capability.capability = cpu_to_be16(REQ_RX_ADD_QUEUES); crq.request_capability.number = cpu_to_be64(adapter->req_rx_add_queues); - atomic_inc(&adapter->running_cap_crqs); + cap_reqs--; ibmvnic_send_crq(adapter, &crq);
crq.request_capability.capability = cpu_to_be16(REQ_TX_ENTRIES_PER_SUBCRQ); crq.request_capability.number = cpu_to_be64(adapter->req_tx_entries_per_subcrq); - atomic_inc(&adapter->running_cap_crqs); + cap_reqs--; ibmvnic_send_crq(adapter, &crq);
crq.request_capability.capability = cpu_to_be16(REQ_RX_ADD_ENTRIES_PER_SUBCRQ); crq.request_capability.number = cpu_to_be64(adapter->req_rx_add_entries_per_subcrq); - atomic_inc(&adapter->running_cap_crqs); + cap_reqs--; ibmvnic_send_crq(adapter, &crq);
crq.request_capability.capability = cpu_to_be16(REQ_MTU); crq.request_capability.number = cpu_to_be64(adapter->req_mtu); - atomic_inc(&adapter->running_cap_crqs); + cap_reqs--; ibmvnic_send_crq(adapter, &crq);
if (adapter->netdev->flags & IFF_PROMISC) { @@ -3959,16 +3974,21 @@ static void send_request_cap(struct ibmvnic_adapter *adapter, int retry) crq.request_capability.capability = cpu_to_be16(PROMISC_REQUESTED); crq.request_capability.number = cpu_to_be64(1); - atomic_inc(&adapter->running_cap_crqs); + cap_reqs--; ibmvnic_send_crq(adapter, &crq); } } else { crq.request_capability.capability = cpu_to_be16(PROMISC_REQUESTED); crq.request_capability.number = cpu_to_be64(0); - atomic_inc(&adapter->running_cap_crqs); + cap_reqs--; ibmvnic_send_crq(adapter, &crq); } + + /* Keep at end to catch any discrepancy between expected and actual + * CRQs sent. + */ + WARN_ON(cap_reqs != 0); }
static int pending_scrq(struct ibmvnic_adapter *adapter, @@ -4362,118 +4382,132 @@ static void send_query_map(struct ibmvnic_adapter *adapter) static void send_query_cap(struct ibmvnic_adapter *adapter) { union ibmvnic_crq crq; + int cap_reqs; + + /* We send out 25 QUERY_CAPABILITY CRQs below. Initialize this count + * upfront. When the tasklet receives a response to all of these, it + * can send out the next protocol messaage (REQUEST_CAPABILITY). + */ + cap_reqs = 25; + + atomic_set(&adapter->running_cap_crqs, cap_reqs);
- atomic_set(&adapter->running_cap_crqs, 0); memset(&crq, 0, sizeof(crq)); crq.query_capability.first = IBMVNIC_CRQ_CMD; crq.query_capability.cmd = QUERY_CAPABILITY;
crq.query_capability.capability = cpu_to_be16(MIN_TX_QUEUES); - atomic_inc(&adapter->running_cap_crqs); ibmvnic_send_crq(adapter, &crq); + cap_reqs--;
crq.query_capability.capability = cpu_to_be16(MIN_RX_QUEUES); - atomic_inc(&adapter->running_cap_crqs); ibmvnic_send_crq(adapter, &crq); + cap_reqs--;
crq.query_capability.capability = cpu_to_be16(MIN_RX_ADD_QUEUES); - atomic_inc(&adapter->running_cap_crqs); ibmvnic_send_crq(adapter, &crq); + cap_reqs--;
crq.query_capability.capability = cpu_to_be16(MAX_TX_QUEUES); - atomic_inc(&adapter->running_cap_crqs); ibmvnic_send_crq(adapter, &crq); + cap_reqs--;
crq.query_capability.capability = cpu_to_be16(MAX_RX_QUEUES); - atomic_inc(&adapter->running_cap_crqs); ibmvnic_send_crq(adapter, &crq); + cap_reqs--;
crq.query_capability.capability = cpu_to_be16(MAX_RX_ADD_QUEUES); - atomic_inc(&adapter->running_cap_crqs); ibmvnic_send_crq(adapter, &crq); + cap_reqs--;
crq.query_capability.capability = cpu_to_be16(MIN_TX_ENTRIES_PER_SUBCRQ); - atomic_inc(&adapter->running_cap_crqs); ibmvnic_send_crq(adapter, &crq); + cap_reqs--;
crq.query_capability.capability = cpu_to_be16(MIN_RX_ADD_ENTRIES_PER_SUBCRQ); - atomic_inc(&adapter->running_cap_crqs); ibmvnic_send_crq(adapter, &crq); + cap_reqs--;
crq.query_capability.capability = cpu_to_be16(MAX_TX_ENTRIES_PER_SUBCRQ); - atomic_inc(&adapter->running_cap_crqs); ibmvnic_send_crq(adapter, &crq); + cap_reqs--;
crq.query_capability.capability = cpu_to_be16(MAX_RX_ADD_ENTRIES_PER_SUBCRQ); - atomic_inc(&adapter->running_cap_crqs); ibmvnic_send_crq(adapter, &crq); + cap_reqs--;
crq.query_capability.capability = cpu_to_be16(TCP_IP_OFFLOAD); - atomic_inc(&adapter->running_cap_crqs); ibmvnic_send_crq(adapter, &crq); + cap_reqs--;
crq.query_capability.capability = cpu_to_be16(PROMISC_SUPPORTED); - atomic_inc(&adapter->running_cap_crqs); ibmvnic_send_crq(adapter, &crq); + cap_reqs--;
crq.query_capability.capability = cpu_to_be16(MIN_MTU); - atomic_inc(&adapter->running_cap_crqs); ibmvnic_send_crq(adapter, &crq); + cap_reqs--;
crq.query_capability.capability = cpu_to_be16(MAX_MTU); - atomic_inc(&adapter->running_cap_crqs); ibmvnic_send_crq(adapter, &crq); + cap_reqs--;
crq.query_capability.capability = cpu_to_be16(MAX_MULTICAST_FILTERS); - atomic_inc(&adapter->running_cap_crqs); ibmvnic_send_crq(adapter, &crq); + cap_reqs--;
crq.query_capability.capability = cpu_to_be16(VLAN_HEADER_INSERTION); - atomic_inc(&adapter->running_cap_crqs); ibmvnic_send_crq(adapter, &crq); + cap_reqs--;
crq.query_capability.capability = cpu_to_be16(RX_VLAN_HEADER_INSERTION); - atomic_inc(&adapter->running_cap_crqs); ibmvnic_send_crq(adapter, &crq); + cap_reqs--;
crq.query_capability.capability = cpu_to_be16(MAX_TX_SG_ENTRIES); - atomic_inc(&adapter->running_cap_crqs); ibmvnic_send_crq(adapter, &crq); + cap_reqs--;
crq.query_capability.capability = cpu_to_be16(RX_SG_SUPPORTED); - atomic_inc(&adapter->running_cap_crqs); ibmvnic_send_crq(adapter, &crq); + cap_reqs--;
crq.query_capability.capability = cpu_to_be16(OPT_TX_COMP_SUB_QUEUES); - atomic_inc(&adapter->running_cap_crqs); ibmvnic_send_crq(adapter, &crq); + cap_reqs--;
crq.query_capability.capability = cpu_to_be16(OPT_RX_COMP_QUEUES); - atomic_inc(&adapter->running_cap_crqs); ibmvnic_send_crq(adapter, &crq); + cap_reqs--;
crq.query_capability.capability = cpu_to_be16(OPT_RX_BUFADD_Q_PER_RX_COMP_Q); - atomic_inc(&adapter->running_cap_crqs); ibmvnic_send_crq(adapter, &crq); + cap_reqs--;
crq.query_capability.capability = cpu_to_be16(OPT_TX_ENTRIES_PER_SUBCRQ); - atomic_inc(&adapter->running_cap_crqs); ibmvnic_send_crq(adapter, &crq); + cap_reqs--;
crq.query_capability.capability = cpu_to_be16(OPT_RXBA_ENTRIES_PER_SUBCRQ); - atomic_inc(&adapter->running_cap_crqs); ibmvnic_send_crq(adapter, &crq); + cap_reqs--;
crq.query_capability.capability = cpu_to_be16(TX_RX_DESC_REQ); - atomic_inc(&adapter->running_cap_crqs); + ibmvnic_send_crq(adapter, &crq); + cap_reqs--; + + /* Keep at end to catch any discrepancy between expected and actual + * CRQs sent. + */ + WARN_ON(cap_reqs != 0); }
static void send_query_ip_offload(struct ibmvnic_adapter *adapter) @@ -4777,6 +4811,8 @@ static void handle_request_cap_rsp(union ibmvnic_crq *crq, char *name;
atomic_dec(&adapter->running_cap_crqs); + netdev_dbg(adapter->netdev, "Outstanding request-caps: %d\n", + atomic_read(&adapter->running_cap_crqs)); switch (be16_to_cpu(crq->request_capability_rsp.capability)) { case REQ_TX_QUEUES: req_value = &adapter->req_tx_queues;
From: Sukadev Bhattiprolu sukadev@linux.ibm.com
[ Upstream commit 48079e7fdd0269d66b1d7d66ae88bd03162464ad ]
ibmvnic_tasklet() continuously spins waiting for responses to all capability requests. It does this to avoid encountering an error during initialization of the vnic. However if there is a bug in the VIOS and we do not receive a response to one or more queries the tasklet ends up spinning continuously leading to hard lock ups.
If we fail to receive a message from the VIOS it is reasonable to timeout the login attempt rather than spin indefinitely in the tasklet.
Fixes: 249168ad07cd ("ibmvnic: Make CRQ interrupt tasklet wait for all capabilities crqs") Signed-off-by: Sukadev Bhattiprolu sukadev@linux.ibm.com Reviewed-by: Dany Madden drt@linux.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/ibm/ibmvnic.c | 6 ------ 1 file changed, 6 deletions(-)
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index acd488310bbce..682a440151a87 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -5491,12 +5491,6 @@ static void ibmvnic_tasklet(struct tasklet_struct *t) ibmvnic_handle_crq(crq, adapter); crq->generic.first = 0; } - - /* remain in tasklet until all - * capabilities responses are received - */ - if (!adapter->wait_capability) - done = true; } /* if capabilities CRQ's were sent in this tasklet, the following * tasklet must wait until all responses are received
From: Wen Gu guwen@linux.alibaba.com
[ Upstream commit c0bf3d8a943b6f2e912b7c1de03e2ef28e76f760 ]
We encountered a crash in smc_setsockopt() and it is caused by accessing smc->clcsock after clcsock was released.
BUG: kernel NULL pointer dereference, address: 0000000000000020 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 1 PID: 50309 Comm: nginx Kdump: loaded Tainted: G E 5.16.0-rc4+ #53 RIP: 0010:smc_setsockopt+0x59/0x280 [smc] Call Trace: <TASK> __sys_setsockopt+0xfc/0x190 __x64_sys_setsockopt+0x20/0x30 do_syscall_64+0x34/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f16ba83918e </TASK>
This patch tries to fix it by holding clcsock_release_lock and checking whether clcsock has already been released before access.
In case that a crash of the same reason happens in smc_getsockopt() or smc_switch_to_fallback(), this patch also checkes smc->clcsock in them too. And the caller of smc_switch_to_fallback() will identify whether fallback succeeds according to the return value.
Fixes: fd57770dd198 ("net/smc: wait for pending work before clcsock release_sock") Link: https://lore.kernel.org/lkml/5dd7ffd1-28e2-24cc-9442-1defec27375e@linux.ibm.... Signed-off-by: Wen Gu guwen@linux.alibaba.com Acked-by: Karsten Graul kgraul@linux.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/smc/af_smc.c | 63 +++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 51 insertions(+), 12 deletions(-)
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 211cd91b6c408..85e077a69c67d 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -566,12 +566,17 @@ static void smc_stat_fallback(struct smc_sock *smc) mutex_unlock(&net->smc.mutex_fback_rsn); }
-static void smc_switch_to_fallback(struct smc_sock *smc, int reason_code) +static int smc_switch_to_fallback(struct smc_sock *smc, int reason_code) { wait_queue_head_t *smc_wait = sk_sleep(&smc->sk); - wait_queue_head_t *clc_wait = sk_sleep(smc->clcsock->sk); + wait_queue_head_t *clc_wait; unsigned long flags;
+ mutex_lock(&smc->clcsock_release_lock); + if (!smc->clcsock) { + mutex_unlock(&smc->clcsock_release_lock); + return -EBADF; + } smc->use_fallback = true; smc->fallback_rsn = reason_code; smc_stat_fallback(smc); @@ -586,18 +591,30 @@ static void smc_switch_to_fallback(struct smc_sock *smc, int reason_code) * smc socket->wq, which should be removed * to clcsocket->wq during the fallback. */ + clc_wait = sk_sleep(smc->clcsock->sk); spin_lock_irqsave(&smc_wait->lock, flags); spin_lock_nested(&clc_wait->lock, SINGLE_DEPTH_NESTING); list_splice_init(&smc_wait->head, &clc_wait->head); spin_unlock(&clc_wait->lock); spin_unlock_irqrestore(&smc_wait->lock, flags); } + mutex_unlock(&smc->clcsock_release_lock); + return 0; }
/* fall back during connect */ static int smc_connect_fallback(struct smc_sock *smc, int reason_code) { - smc_switch_to_fallback(smc, reason_code); + struct net *net = sock_net(&smc->sk); + int rc = 0; + + rc = smc_switch_to_fallback(smc, reason_code); + if (rc) { /* fallback fails */ + this_cpu_inc(net->smc.smc_stats->clnt_hshake_err_cnt); + if (smc->sk.sk_state == SMC_INIT) + sock_put(&smc->sk); /* passive closing */ + return rc; + } smc_copy_sock_settings_to_clc(smc); smc->connect_nonblock = 0; if (smc->sk.sk_state == SMC_INIT) @@ -1514,11 +1531,12 @@ static void smc_listen_decline(struct smc_sock *new_smc, int reason_code, { /* RDMA setup failed, switch back to TCP */ smc_conn_abort(new_smc, local_first); - if (reason_code < 0) { /* error, no fallback possible */ + if (reason_code < 0 || + smc_switch_to_fallback(new_smc, reason_code)) { + /* error, no fallback possible */ smc_listen_out_err(new_smc); return; } - smc_switch_to_fallback(new_smc, reason_code); if (reason_code && reason_code != SMC_CLC_DECL_PEERDECL) { if (smc_clc_send_decline(new_smc, reason_code, version) < 0) { smc_listen_out_err(new_smc); @@ -1960,8 +1978,11 @@ static void smc_listen_work(struct work_struct *work)
/* check if peer is smc capable */ if (!tcp_sk(newclcsock->sk)->syn_smc) { - smc_switch_to_fallback(new_smc, SMC_CLC_DECL_PEERNOSMC); - smc_listen_out_connected(new_smc); + rc = smc_switch_to_fallback(new_smc, SMC_CLC_DECL_PEERNOSMC); + if (rc) + smc_listen_out_err(new_smc); + else + smc_listen_out_connected(new_smc); return; }
@@ -2250,7 +2271,9 @@ static int smc_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
if (msg->msg_flags & MSG_FASTOPEN) { if (sk->sk_state == SMC_INIT && !smc->connect_nonblock) { - smc_switch_to_fallback(smc, SMC_CLC_DECL_OPTUNSUPP); + rc = smc_switch_to_fallback(smc, SMC_CLC_DECL_OPTUNSUPP); + if (rc) + goto out; } else { rc = -EINVAL; goto out; @@ -2443,6 +2466,11 @@ static int smc_setsockopt(struct socket *sock, int level, int optname, /* generic setsockopts reaching us here always apply to the * CLC socket */ + mutex_lock(&smc->clcsock_release_lock); + if (!smc->clcsock) { + mutex_unlock(&smc->clcsock_release_lock); + return -EBADF; + } if (unlikely(!smc->clcsock->ops->setsockopt)) rc = -EOPNOTSUPP; else @@ -2452,6 +2480,7 @@ static int smc_setsockopt(struct socket *sock, int level, int optname, sk->sk_err = smc->clcsock->sk->sk_err; sk_error_report(sk); } + mutex_unlock(&smc->clcsock_release_lock);
if (optlen < sizeof(int)) return -EINVAL; @@ -2468,7 +2497,7 @@ static int smc_setsockopt(struct socket *sock, int level, int optname, case TCP_FASTOPEN_NO_COOKIE: /* option not supported by SMC */ if (sk->sk_state == SMC_INIT && !smc->connect_nonblock) { - smc_switch_to_fallback(smc, SMC_CLC_DECL_OPTUNSUPP); + rc = smc_switch_to_fallback(smc, SMC_CLC_DECL_OPTUNSUPP); } else { rc = -EINVAL; } @@ -2511,13 +2540,23 @@ static int smc_getsockopt(struct socket *sock, int level, int optname, char __user *optval, int __user *optlen) { struct smc_sock *smc; + int rc;
smc = smc_sk(sock->sk); + mutex_lock(&smc->clcsock_release_lock); + if (!smc->clcsock) { + mutex_unlock(&smc->clcsock_release_lock); + return -EBADF; + } /* socket options apply to the CLC socket */ - if (unlikely(!smc->clcsock->ops->getsockopt)) + if (unlikely(!smc->clcsock->ops->getsockopt)) { + mutex_unlock(&smc->clcsock_release_lock); return -EOPNOTSUPP; - return smc->clcsock->ops->getsockopt(smc->clcsock, level, optname, - optval, optlen); + } + rc = smc->clcsock->ops->getsockopt(smc->clcsock, level, optname, + optval, optlen); + mutex_unlock(&smc->clcsock_release_lock); + return rc; }
static int smc_ioctl(struct socket *sock, unsigned int cmd,
From: Michael Kelley mikelley@microsoft.com
[ Upstream commit 9ff5549b1d1d3c3a9d71220d44bd246586160f1d ]
In the WIN10 version of the Synthetic Video protocol with Hyper-V, Hyper-V reports a list of supported resolutions as part of the protocol negotiation. The driver calculates the maximum width and height from the list of resolutions, and uses those maximums to validate any screen resolution specified in the video= option on the kernel boot line.
This method of validation is incorrect. For example, the list of supported resolutions could contain 1600x1200 and 1920x1080, both of which fit in an 8 Mbyte frame buffer. But calculating the max width and height yields 1920 and 1200, and 1920x1200 resolution does not fit in an 8 Mbyte frame buffer. Unfortunately, this resolution is accepted, causing a kernel fault when the driver accesses memory outside the frame buffer.
Instead, validate the specified screen resolution by calculating its size, and comparing against the frame buffer size. Delete the code for calculating the max width and height from the list of resolutions, since these max values have no use. Also add the frame buffer size to the info message to aid in understanding why a resolution might be rejected.
Fixes: 67e7cdb4829d ("video: hyperv: hyperv_fb: Obtain screen resolution from Hyper-V host") Signed-off-by: Michael Kelley mikelley@microsoft.com Reviewed-by: Haiyang Zhang haiyangz@microsoft.com Acked-by: Helge Deller deller@gmx.de Link: https://lore.kernel.org/r/1642360711-2335-1-git-send-email-mikelley@microsof... Signed-off-by: Wei Liu wei.liu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/video/fbdev/hyperv_fb.c | 16 +++------------- 1 file changed, 3 insertions(+), 13 deletions(-)
diff --git a/drivers/video/fbdev/hyperv_fb.c b/drivers/video/fbdev/hyperv_fb.c index 23999df527393..c8e0ea27caf1d 100644 --- a/drivers/video/fbdev/hyperv_fb.c +++ b/drivers/video/fbdev/hyperv_fb.c @@ -287,8 +287,6 @@ struct hvfb_par {
static uint screen_width = HVFB_WIDTH; static uint screen_height = HVFB_HEIGHT; -static uint screen_width_max = HVFB_WIDTH; -static uint screen_height_max = HVFB_HEIGHT; static uint screen_depth; static uint screen_fb_size; static uint dio_fb_size; /* FB size for deferred IO */ @@ -582,7 +580,6 @@ static int synthvid_get_supported_resolution(struct hv_device *hdev) int ret = 0; unsigned long t; u8 index; - int i;
memset(msg, 0, sizeof(struct synthvid_msg)); msg->vid_hdr.type = SYNTHVID_RESOLUTION_REQUEST; @@ -613,13 +610,6 @@ static int synthvid_get_supported_resolution(struct hv_device *hdev) goto out; }
- for (i = 0; i < msg->resolution_resp.resolution_count; i++) { - screen_width_max = max_t(unsigned int, screen_width_max, - msg->resolution_resp.supported_resolution[i].width); - screen_height_max = max_t(unsigned int, screen_height_max, - msg->resolution_resp.supported_resolution[i].height); - } - screen_width = msg->resolution_resp.supported_resolution[index].width; screen_height = @@ -941,7 +931,7 @@ static void hvfb_get_option(struct fb_info *info)
if (x < HVFB_WIDTH_MIN || y < HVFB_HEIGHT_MIN || (synthvid_ver_ge(par->synthvid_version, SYNTHVID_VERSION_WIN10) && - (x > screen_width_max || y > screen_height_max)) || + (x * y * screen_depth / 8 > screen_fb_size)) || (par->synthvid_version == SYNTHVID_VERSION_WIN8 && x * y * screen_depth / 8 > SYNTHVID_FB_SIZE_WIN8) || (par->synthvid_version == SYNTHVID_VERSION_WIN7 && @@ -1194,8 +1184,8 @@ static int hvfb_probe(struct hv_device *hdev, }
hvfb_get_option(info); - pr_info("Screen resolution: %dx%d, Color depth: %d\n", - screen_width, screen_height, screen_depth); + pr_info("Screen resolution: %dx%d, Color depth: %d, Frame buffer size: %d\n", + screen_width, screen_height, screen_depth, screen_fb_size);
ret = hvfb_getmem(hdev, info); if (ret) {
From: Marc Kleine-Budde mkl@pengutronix.de
[ Upstream commit e59986de5ff701494e14c722b78b6e6d513e0ab5 ]
The MRAM of the tcan4x5x has a size of 2K and starts at 0x8000. There are no further registers in the tcan4x5x making 0x87fc the biggest addressable register.
This patch fixes the max register value of the regmap config from 0x8ffc to 0x87fc.
Fixes: 6e1caaf8ed22 ("can: tcan4x5x: fix max register value") Link: https://lore.kernel.org/all/20220119064011.2943292-1-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/can/m_can/tcan4x5x-regmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/can/m_can/tcan4x5x-regmap.c b/drivers/net/can/m_can/tcan4x5x-regmap.c index ca80dbaf7a3f5..26e212b8ca7a6 100644 --- a/drivers/net/can/m_can/tcan4x5x-regmap.c +++ b/drivers/net/can/m_can/tcan4x5x-regmap.c @@ -12,7 +12,7 @@ #define TCAN4X5X_SPI_INSTRUCTION_WRITE (0x61 << 24) #define TCAN4X5X_SPI_INSTRUCTION_READ (0x41 << 24)
-#define TCAN4X5X_MAX_REGISTER 0x8ffc +#define TCAN4X5X_MAX_REGISTER 0x87fc
static int tcan4x5x_regmap_gather_write(void *context, const void *reg, size_t reg_len,
From: Guenter Roeck linux@roeck-us.net
[ Upstream commit 79da533d3cc717ccc05ddbd3190da8a72bc2408b ]
Paweł Marciniak reports the following crash, observed when clearing the chassis intrusion alarm.
BUG: kernel NULL pointer dereference, address: 0000000000000028 PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 3 PID: 4815 Comm: bash Tainted: G S 5.16.2-200.fc35.x86_64 #1 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z97 Extreme4, BIOS P2.60A 05/03/2018 RIP: 0010:clear_caseopen+0x5a/0x120 [nct6775] Code: 68 70 e8 e9 32 b1 e3 85 c0 0f 85 d2 00 00 00 48 83 7c 24 ... RSP: 0018:ffffabcb02803dd8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000 RDX: ffff8e8808192880 RSI: 0000000000000000 RDI: ffff8e87c7509a68 RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000000000a R10: 000000000000000a R11: f000000000000000 R12: 000000000000001f R13: ffff8e87c7509828 R14: ffff8e87c7509a68 R15: ffff8e88494527a0 FS: 00007f4db9151740(0000) GS:ffff8e8ebfec0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000028 CR3: 0000000166b66001 CR4: 00000000001706e0 Call Trace: <TASK> kernfs_fop_write_iter+0x11c/0x1b0 new_sync_write+0x10b/0x180 vfs_write+0x209/0x2a0 ksys_write+0x4f/0xc0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae
The problem is that the device passed to clear_caseopen() is the hwmon device, not the platform device, and the platform data is not set in the hwmon device. Store the pointer to sio_data in struct nct6775_data and get if from there if needed.
Fixes: 2e7b9886968b ("hwmon: (nct6775) Use superio_*() function pointers in sio_data.") Cc: Denis Pauk pauk.denis@gmail.com Cc: Bernhard Seibold mail@bernhard-seibold.de Reported-by: Paweł Marciniak pmarciniak@lodz.home.pl Tested-by: Denis Pauk pauk.denis@gmail.com Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hwmon/nct6775.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/hwmon/nct6775.c b/drivers/hwmon/nct6775.c index 57ce8633a7256..3496a823e4d4f 100644 --- a/drivers/hwmon/nct6775.c +++ b/drivers/hwmon/nct6775.c @@ -1175,7 +1175,7 @@ static inline u8 in_to_reg(u32 val, u8 nr)
struct nct6775_data { int addr; /* IO base of hw monitor block */ - int sioreg; /* SIO register address */ + struct nct6775_sio_data *sio_data; enum kinds kind; const char *name;
@@ -3561,7 +3561,7 @@ clear_caseopen(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { struct nct6775_data *data = dev_get_drvdata(dev); - struct nct6775_sio_data *sio_data = dev_get_platdata(dev); + struct nct6775_sio_data *sio_data = data->sio_data; int nr = to_sensor_dev_attr(attr)->index - INTRUSION_ALARM_BASE; unsigned long val; u8 reg; @@ -3969,7 +3969,7 @@ static int nct6775_probe(struct platform_device *pdev) return -ENOMEM;
data->kind = sio_data->kind; - data->sioreg = sio_data->sioreg; + data->sio_data = sio_data;
if (sio_data->access == access_direct) { data->addr = res->start;
From: Miaoqian Lin linmq006@gmail.com
[ Upstream commit 774fe0cd838d1b1419d41ab4ea0613c80d4ecbd7 ]
The reference taken by 'of_find_device_by_node()' must be released when not needed anymore. Add the corresponding 'put_device()' in the error handling path.
Fixes: e00012b256d4 ("drm/msm/hdmi: Make HDMI core get its PHY") Signed-off-by: Miaoqian Lin linmq006@gmail.com Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Link: https://lore.kernel.org/r/20220107085026.23831-1-linmq006@gmail.com Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/hdmi/hdmi.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/msm/hdmi/hdmi.c b/drivers/gpu/drm/msm/hdmi/hdmi.c index 75b64e6ae0350..a439794a32e81 100644 --- a/drivers/gpu/drm/msm/hdmi/hdmi.c +++ b/drivers/gpu/drm/msm/hdmi/hdmi.c @@ -95,10 +95,15 @@ static int msm_hdmi_get_phy(struct hdmi *hdmi)
of_node_put(phy_node);
- if (!phy_pdev || !hdmi->phy) { + if (!phy_pdev) { DRM_DEV_ERROR(&pdev->dev, "phy driver is not ready\n"); return -EPROBE_DEFER; } + if (!hdmi->phy) { + DRM_DEV_ERROR(&pdev->dev, "phy driver is not ready\n"); + put_device(&phy_pdev->dev); + return -EPROBE_DEFER; + }
hdmi->phy_dev = get_device(&phy_pdev->dev);
From: José Expósito jose.exposito89@gmail.com
[ Upstream commit 170b22234d5495f5e0844246e23f004639ee89ba ]
The function performs a check on the "ctx" input parameter, however, it is used before the check.
Initialize the "base" variable after the sanity check to avoid a possible NULL pointer dereference.
Fixes: 4259ff7ae509e ("drm/msm/dpu: add support for pcc color block in dpu driver") Addresses-Coverity-ID: 1493866 ("Null pointer dereference") Signed-off-by: José Expósito jose.exposito89@gmail.com Link: https://lore.kernel.org/r/20220109192431.135949-1-jose.exposito89@gmail.com Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dspp.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dspp.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dspp.c index a98e964c3b6fa..355894a3b48c3 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dspp.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dspp.c @@ -26,9 +26,16 @@ static void dpu_setup_dspp_pcc(struct dpu_hw_dspp *ctx, struct dpu_hw_pcc_cfg *cfg) {
- u32 base = ctx->cap->sblk->pcc.base; + u32 base;
- if (!ctx || !base) { + if (!ctx) { + DRM_ERROR("invalid ctx %pK\n", ctx); + return; + } + + base = ctx->cap->sblk->pcc.base; + + if (!base) { DRM_ERROR("invalid ctx %pK pcc base 0x%x\n", ctx, base); return; }
From: Rob Clark robdclark@chromium.org
[ Upstream commit 860a7b2a87b7c743154824d0597b6c3eb3b53154 ]
Reported-by: Danylo Piliaiev dpiliaiev@igalia.com Fixes: 3ab1c5cc3939 ("drm/msm: Add param for userspace to query suspend count") Signed-off-by: Rob Clark robdclark@chromium.org Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Link: https://lore.kernel.org/r/20220113163215.215367-1-robdclark@gmail.com Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c index 78aad5216a613..a305ff7e8c6fb 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c @@ -1557,6 +1557,8 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu) for (i = 0; i < gpu->nr_rings; i++) a6xx_gpu->shadow[i] = 0;
+ gpu->suspend_count++; + return 0; }
From: Hangyu Hua hbh25y@gmail.com
[ Upstream commit 29eb31542787e1019208a2e1047bb7c76c069536 ]
ym needs to be free when ym->cmd != SIOCYAMSMCS.
Fixes: 0781168e23a2 ("yam: fix a missing-check bug") Signed-off-by: Hangyu Hua hbh25y@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/hamradio/yam.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/net/hamradio/yam.c b/drivers/net/hamradio/yam.c index 6376b8485976c..980f2be32f05a 100644 --- a/drivers/net/hamradio/yam.c +++ b/drivers/net/hamradio/yam.c @@ -950,9 +950,7 @@ static int yam_siocdevprivate(struct net_device *dev, struct ifreq *ifr, void __ ym = memdup_user(data, sizeof(struct yamdrv_ioctl_mcs)); if (IS_ERR(ym)) return PTR_ERR(ym); - if (ym->cmd != SIOCYAMSMCS) - return -EINVAL; - if (ym->bitrate > YAM_MAXBITRATE) { + if (ym->cmd != SIOCYAMSMCS || ym->bitrate > YAM_MAXBITRATE) { kfree(ym); return -EINVAL; }
From: Toke Høiland-Jørgensen toke@redhat.com
[ Upstream commit c63003e3d99761afb280add3b30de1cf30fa522b ]
The cpsw driver didn't properly initialise the struct page_pool_params before calling page_pool_create(), which leads to crashes after the struct has been expanded with new parameters.
The second Fixes tag below is where the buggy code was introduced, but because the code was moved around this patch will only apply on top of the commit in the first Fixes tag.
Fixes: c5013ac1dd0e ("net: ethernet: ti: cpsw: move set of common functions in cpsw_priv") Fixes: 9ed4050c0d75 ("net: ethernet: ti: cpsw: add XDP support") Reported-by: Colin Foster colin.foster@in-advantage.com Signed-off-by: Toke Høiland-Jørgensen toke@redhat.com Tested-by: Colin Foster colin.foster@in-advantage.com Acked-by: Jesper Dangaard Brouer brouer@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/ti/cpsw_priv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/ti/cpsw_priv.c b/drivers/net/ethernet/ti/cpsw_priv.c index 6bb5ac51d23c3..f8e591d69d2cb 100644 --- a/drivers/net/ethernet/ti/cpsw_priv.c +++ b/drivers/net/ethernet/ti/cpsw_priv.c @@ -1144,7 +1144,7 @@ int cpsw_fill_rx_channels(struct cpsw_priv *priv) static struct page_pool *cpsw_create_page_pool(struct cpsw_common *cpsw, int size) { - struct page_pool_params pp_params; + struct page_pool_params pp_params = {}; struct page_pool *pool;
pp_params.order = 0;
From: Yufeng Mo moyufeng@huawei.com
[ Upstream commit 2f61353cd2f789a4229b6f5c1c24a40a613357bb ]
Since some interrupt states may be cleared by hardware, the driver may receive an empty interrupt. Currently, the VF driver directly disables the vector0 interrupt in this case. As a result, the VF is unavailable. Therefore, the vector0 interrupt should be enabled in this case.
Fixes: b90fcc5bd904 ("net: hns3: add reset handling for VF when doing Core/Global/IMP reset") Signed-off-by: Yufeng Mo moyufeng@huawei.com Signed-off-by: Guangbin Huang huangguangbin2@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c index 41afaeea881bc..70491e07b0ff6 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c @@ -2496,8 +2496,7 @@ static irqreturn_t hclgevf_misc_irq_handle(int irq, void *data) break; }
- if (event_cause != HCLGEVF_VECTOR0_EVENT_OTHER) - hclgevf_enable_vector(&hdev->misc_vector, true); + hclgevf_enable_vector(&hdev->misc_vector, true);
return IRQ_HANDLED; }
From: David Matlack dmatlack@google.com
[ Upstream commit de1956f48543e90f94b1194395f33140898b39b2 ]
This selftest was accidentally removed by commit 6a58150859fd ("selftest: KVM: Add intra host migration tests"). Add it back.
Fixes: 6a58150859fd ("selftest: KVM: Add intra host migration tests") Signed-off-by: David Matlack dmatlack@google.com Message-Id: 20220120003826.2805036-1-dmatlack@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/kvm/Makefile | 1 + 1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile index 290b1b0552d6e..4fdfb42aeddba 100644 --- a/tools/testing/selftests/kvm/Makefile +++ b/tools/testing/selftests/kvm/Makefile @@ -77,6 +77,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/xen_shinfo_test TEST_GEN_PROGS_x86_64 += x86_64/xen_vmcall_test TEST_GEN_PROGS_x86_64 += x86_64/vmx_pi_mmio_test TEST_GEN_PROGS_x86_64 += x86_64/sev_migrate_tests +TEST_GEN_PROGS_x86_64 += access_tracking_perf_test TEST_GEN_PROGS_x86_64 += demand_paging_test TEST_GEN_PROGS_x86_64 += dirty_log_test TEST_GEN_PROGS_x86_64 += dirty_log_perf_test
From: Maxim Mikityanskiy maximmi@nvidia.com
[ Upstream commit 429c3be8a5e2695b5b92a6a12361eb89eb185495 ]
The current implementation of HTB offload doesn't support some parameters. Instead of ignoring them, actively return the EINVAL error when they are set to non-defaults.
As this patch goes to stable, the driver API is not changed here. If future drivers support more offload parameters, the checks can be moved to the driver side.
Note that the buffer and cbuffer parameters are also not supported, but the tc userspace tool assigns some default values derived from rate and ceil, and identifying these defaults in sch_htb would be unreliable, so they are still ignored.
Fixes: d03b195b5aa0 ("sch_htb: Hierarchical QoS hardware offload") Reported-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Maxim Mikityanskiy maximmi@nvidia.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Link: https://lore.kernel.org/r/20220125100654.424570-1-maximmi@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/sch_htb.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c index 9267922ea9c37..23a9d6242429f 100644 --- a/net/sched/sch_htb.c +++ b/net/sched/sch_htb.c @@ -1810,6 +1810,26 @@ static int htb_change_class(struct Qdisc *sch, u32 classid, if (!hopt->rate.rate || !hopt->ceil.rate) goto failure;
+ if (q->offload) { + /* Options not supported by the offload. */ + if (hopt->rate.overhead || hopt->ceil.overhead) { + NL_SET_ERR_MSG(extack, "HTB offload doesn't support the overhead parameter"); + goto failure; + } + if (hopt->rate.mpu || hopt->ceil.mpu) { + NL_SET_ERR_MSG(extack, "HTB offload doesn't support the mpu parameter"); + goto failure; + } + if (hopt->quantum) { + NL_SET_ERR_MSG(extack, "HTB offload doesn't support the quantum parameter"); + goto failure; + } + if (hopt->prio) { + NL_SET_ERR_MSG(extack, "HTB offload doesn't support the prio parameter"); + goto failure; + } + } + /* Keeping backward compatible with rate_table based iproute2 tc */ if (hopt->rate.linklayer == TC_LINKLAYER_UNAWARE) qdisc_put_rtab(qdisc_get_rtab(&hopt->rate, tb[TCA_HTB_RTAB],
From: Dave Airlie airlied@redhat.com
[ Upstream commit 76cea3d95513fe40000d06a3719c4bb6b53275e2 ]
This reverts commit 9bb7b689274b67ecb3641e399e76f84adc627df1.
This caused a regression reported to Red Hat.
Fixes: 9bb7b689274b ("drm/ast: Support 1600x900 with 108MHz PCLK") Signed-off-by: Dave Airlie airlied@redhat.com Signed-off-by: Thomas Zimmermann tzimmermann@suse.de Link: https://patchwork.freedesktop.org/patch/msgid/20220120040527.552068-1-airlie... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/ast/ast_tables.h | 2 -- 1 file changed, 2 deletions(-)
diff --git a/drivers/gpu/drm/ast/ast_tables.h b/drivers/gpu/drm/ast/ast_tables.h index d9eb353a4bf09..dbe1cc620f6e6 100644 --- a/drivers/gpu/drm/ast/ast_tables.h +++ b/drivers/gpu/drm/ast/ast_tables.h @@ -282,8 +282,6 @@ static const struct ast_vbios_enhtable res_1360x768[] = { };
static const struct ast_vbios_enhtable res_1600x900[] = { - {1800, 1600, 24, 80, 1000, 900, 1, 3, VCLK108, /* 60Hz */ - (SyncPP | Charx8Dot | LineCompareOff | WideScreenMode | NewModeInfo), 60, 3, 0x3A }, {1760, 1600, 48, 32, 926, 900, 3, 5, VCLK97_75, /* 60Hz CVT RB */ (SyncNP | Charx8Dot | LineCompareOff | WideScreenMode | NewModeInfo | AST2500PreCatchCRT), 60, 1, 0x3A },
From: Sean Christopherson seanjc@google.com
[ Upstream commit 4cf3d3ebe8794c449af3e0e8c1d790c97e461d20 ]
Don't skip the vmcall() in l2_guest_code() prior to re-entering L2, doing so will result in L2 running to completion, popping '0' off the stack for RET, jumping to address '0', and ultimately dying with a triple fault shutdown.
It's not at all obvious why the test re-enters L2 and re-executes VMCALL, but presumably it serves a purpose. The VMX path doesn't skip vmcall(), and the test can't possibly have passed on SVM, so just do what VMX does.
Fixes: d951b2210c1a ("KVM: selftests: smm_test: Test SMM enter from L2") Cc: Maxim Levitsky mlevitsk@redhat.com Signed-off-by: Sean Christopherson seanjc@google.com Message-Id: 20220125221725.2101126-1-seanjc@google.com Reviewed-by: Vitaly Kuznetsov vkuznets@redhat.com Tested-by: Vitaly Kuznetsov vkuznets@redhat.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/kvm/x86_64/smm_test.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/x86_64/smm_test.c b/tools/testing/selftests/kvm/x86_64/smm_test.c index d0fe2fdce58c4..db2a17559c3d5 100644 --- a/tools/testing/selftests/kvm/x86_64/smm_test.c +++ b/tools/testing/selftests/kvm/x86_64/smm_test.c @@ -105,7 +105,6 @@ static void guest_code(void *arg)
if (cpu_has_svm()) { run_guest(svm->vmcb, svm->vmcb_gpa); - svm->vmcb->save.rip += 3; run_guest(svm->vmcb, svm->vmcb_gpa); } else { vmlaunch();
From: Xiubo Li xiubli@redhat.com
[ Upstream commit 89d43d0551a848e70e63d9ba11534aaeabc82443 ]
When failing to allocate the sessions memory we should make sure the req1 and req2 and the sessions get put. And also in case the max_sessions decreased so when kreallocate the new memory some sessions maybe missed being put.
And if the max_sessions is 0 krealloc will return ZERO_SIZE_PTR, which will lead to a distinct access fault.
URL: https://tracker.ceph.com/issues/53819 Fixes: e1a4541ec0b9 ("ceph: flush the mdlog before waiting on unsafe reqs") Signed-off-by: Xiubo Li xiubli@redhat.com Reviewed-by: Venky Shankar vshankar@redhat.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ceph/caps.c | 55 +++++++++++++++++++++++++++++++++----------------- 1 file changed, 37 insertions(+), 18 deletions(-)
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c index c447fa2e2d1fe..2f8696f3b925d 100644 --- a/fs/ceph/caps.c +++ b/fs/ceph/caps.c @@ -2218,6 +2218,7 @@ static int unsafe_request_wait(struct inode *inode) struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc; struct ceph_inode_info *ci = ceph_inode(inode); struct ceph_mds_request *req1 = NULL, *req2 = NULL; + unsigned int max_sessions; int ret, err = 0;
spin_lock(&ci->i_unsafe_lock); @@ -2235,37 +2236,45 @@ static int unsafe_request_wait(struct inode *inode) } spin_unlock(&ci->i_unsafe_lock);
+ /* + * The mdsc->max_sessions is unlikely to be changed + * mostly, here we will retry it by reallocating the + * sessions array memory to get rid of the mdsc->mutex + * lock. + */ +retry: + max_sessions = mdsc->max_sessions; + /* * Trigger to flush the journal logs in all the relevant MDSes * manually, or in the worst case we must wait at most 5 seconds * to wait the journal logs to be flushed by the MDSes periodically. */ - if (req1 || req2) { + if ((req1 || req2) && likely(max_sessions)) { struct ceph_mds_session **sessions = NULL; struct ceph_mds_session *s; struct ceph_mds_request *req; - unsigned int max; int i;
- /* - * The mdsc->max_sessions is unlikely to be changed - * mostly, here we will retry it by reallocating the - * sessions arrary memory to get rid of the mdsc->mutex - * lock. - */ -retry: - max = mdsc->max_sessions; - sessions = krealloc(sessions, max * sizeof(s), __GFP_ZERO); - if (!sessions) - return -ENOMEM; + sessions = kzalloc(max_sessions * sizeof(s), GFP_KERNEL); + if (!sessions) { + err = -ENOMEM; + goto out; + }
spin_lock(&ci->i_unsafe_lock); if (req1) { list_for_each_entry(req, &ci->i_unsafe_dirops, r_unsafe_dir_item) { s = req->r_session; - if (unlikely(s->s_mds >= max)) { + if (unlikely(s->s_mds >= max_sessions)) { spin_unlock(&ci->i_unsafe_lock); + for (i = 0; i < max_sessions; i++) { + s = sessions[i]; + if (s) + ceph_put_mds_session(s); + } + kfree(sessions); goto retry; } if (!sessions[s->s_mds]) { @@ -2278,8 +2287,14 @@ retry: list_for_each_entry(req, &ci->i_unsafe_iops, r_unsafe_target_item) { s = req->r_session; - if (unlikely(s->s_mds >= max)) { + if (unlikely(s->s_mds >= max_sessions)) { spin_unlock(&ci->i_unsafe_lock); + for (i = 0; i < max_sessions; i++) { + s = sessions[i]; + if (s) + ceph_put_mds_session(s); + } + kfree(sessions); goto retry; } if (!sessions[s->s_mds]) { @@ -2300,7 +2315,7 @@ retry: spin_unlock(&ci->i_ceph_lock);
/* send flush mdlog request to MDSes */ - for (i = 0; i < max; i++) { + for (i = 0; i < max_sessions; i++) { s = sessions[i]; if (s) { send_flush_mdlog(s); @@ -2317,15 +2332,19 @@ retry: ceph_timeout_jiffies(req1->r_timeout)); if (ret) err = -EIO; - ceph_mdsc_put_request(req1); } if (req2) { ret = !wait_for_completion_timeout(&req2->r_safe_completion, ceph_timeout_jiffies(req2->r_timeout)); if (ret) err = -EIO; - ceph_mdsc_put_request(req2); } + +out: + if (req1) + ceph_mdsc_put_request(req1); + if (req2) + ceph_mdsc_put_request(req2); return err; }
From: Catherine Sullivan csully@google.com
[ Upstream commit a92f7a6feeb3884c69c1c7c1f13bccecb2228ad0 ]
Use GFP_ATOMIC when allocating pages out of the hotpath, continue to use GFP_KERNEL when allocating pages during setup.
GFP_KERNEL will allow blocking which allows it to succeed more often in a low memory enviornment but in the hotpath we do not want to allow the allocation to block.
Fixes: f5cedc84a30d2 ("gve: Add transmit and receive support") Signed-off-by: Catherine Sullivan csully@google.com Signed-off-by: David Awogbemila awogbemila@google.com Link: https://lore.kernel.org/r/20220126003843.3584521-1-awogbemila@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/google/gve/gve.h | 2 +- drivers/net/ethernet/google/gve/gve_main.c | 6 +++--- drivers/net/ethernet/google/gve/gve_rx.c | 3 ++- drivers/net/ethernet/google/gve/gve_rx_dqo.c | 2 +- 4 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h index b719f72281c44..3d469073fbc54 100644 --- a/drivers/net/ethernet/google/gve/gve.h +++ b/drivers/net/ethernet/google/gve/gve.h @@ -830,7 +830,7 @@ static inline bool gve_is_gqi(struct gve_priv *priv) /* buffers */ int gve_alloc_page(struct gve_priv *priv, struct device *dev, struct page **page, dma_addr_t *dma, - enum dma_data_direction); + enum dma_data_direction, gfp_t gfp_flags); void gve_free_page(struct device *dev, struct page *page, dma_addr_t dma, enum dma_data_direction); /* tx handling */ diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c index 59b66f679e46e..28e2d4d8ed7c6 100644 --- a/drivers/net/ethernet/google/gve/gve_main.c +++ b/drivers/net/ethernet/google/gve/gve_main.c @@ -752,9 +752,9 @@ static void gve_free_rings(struct gve_priv *priv)
int gve_alloc_page(struct gve_priv *priv, struct device *dev, struct page **page, dma_addr_t *dma, - enum dma_data_direction dir) + enum dma_data_direction dir, gfp_t gfp_flags) { - *page = alloc_page(GFP_KERNEL); + *page = alloc_page(gfp_flags); if (!*page) { priv->page_alloc_fail++; return -ENOMEM; @@ -797,7 +797,7 @@ static int gve_alloc_queue_page_list(struct gve_priv *priv, u32 id, for (i = 0; i < pages; i++) { err = gve_alloc_page(priv, &priv->pdev->dev, &qpl->pages[i], &qpl->page_buses[i], - gve_qpl_dma_dir(priv, id)); + gve_qpl_dma_dir(priv, id), GFP_KERNEL); /* caller handles clean up */ if (err) return -ENOMEM; diff --git a/drivers/net/ethernet/google/gve/gve_rx.c b/drivers/net/ethernet/google/gve/gve_rx.c index 3d04b5aff331b..04a08904305a9 100644 --- a/drivers/net/ethernet/google/gve/gve_rx.c +++ b/drivers/net/ethernet/google/gve/gve_rx.c @@ -86,7 +86,8 @@ static int gve_rx_alloc_buffer(struct gve_priv *priv, struct device *dev, dma_addr_t dma; int err;
- err = gve_alloc_page(priv, dev, &page, &dma, DMA_FROM_DEVICE); + err = gve_alloc_page(priv, dev, &page, &dma, DMA_FROM_DEVICE, + GFP_ATOMIC); if (err) return err;
diff --git a/drivers/net/ethernet/google/gve/gve_rx_dqo.c b/drivers/net/ethernet/google/gve/gve_rx_dqo.c index beb8bb079023c..8c939628e2d85 100644 --- a/drivers/net/ethernet/google/gve/gve_rx_dqo.c +++ b/drivers/net/ethernet/google/gve/gve_rx_dqo.c @@ -157,7 +157,7 @@ static int gve_alloc_page_dqo(struct gve_priv *priv, int err;
err = gve_alloc_page(priv, &priv->pdev->dev, &buf_state->page_info.page, - &buf_state->addr, DMA_FROM_DEVICE); + &buf_state->addr, DMA_FROM_DEVICE, GFP_KERNEL); if (err) return err;
From: Guillaume Nault gnault@redhat.com
[ Upstream commit 36268983e90316b37000a005642af42234dabb36 ]
This reverts commit b75326c201242de9495ff98e5d5cff41d7fc0d9d.
This commit breaks Linux compatibility with USGv6 tests. The RFC this commit was based on is actually an expired draft: no published RFC currently allows the new behaviour it introduced.
Without full IETF endorsement, the flash renumbering scenario this patch was supposed to enable is never going to work, as other IPv6 equipements on the same LAN will keep the 2 hours limit.
Fixes: b75326c20124 ("ipv6: Honor all IPv6 PIO Valid Lifetime values") Signed-off-by: Guillaume Nault gnault@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/addrconf.h | 2 ++ net/ipv6/addrconf.c | 27 ++++++++++++++++++++------- 2 files changed, 22 insertions(+), 7 deletions(-)
diff --git a/include/net/addrconf.h b/include/net/addrconf.h index 78ea3e332688f..e7ce719838b5e 100644 --- a/include/net/addrconf.h +++ b/include/net/addrconf.h @@ -6,6 +6,8 @@ #define RTR_SOLICITATION_INTERVAL (4*HZ) #define RTR_SOLICITATION_MAX_INTERVAL (3600*HZ) /* 1 hour */
+#define MIN_VALID_LIFETIME (2*3600) /* 2 hours */ + #define TEMP_VALID_LIFETIME (7*86400) #define TEMP_PREFERRED_LIFETIME (86400) #define REGEN_MAX_RETRY (3) diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 3445f8017430f..87961f1d9959b 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -2589,7 +2589,7 @@ int addrconf_prefix_rcv_add_addr(struct net *net, struct net_device *dev, __u32 valid_lft, u32 prefered_lft) { struct inet6_ifaddr *ifp = ipv6_get_ifaddr(net, addr, dev, 1); - int create = 0; + int create = 0, update_lft = 0;
if (!ifp && valid_lft) { int max_addresses = in6_dev->cnf.max_addresses; @@ -2633,19 +2633,32 @@ int addrconf_prefix_rcv_add_addr(struct net *net, struct net_device *dev, unsigned long now; u32 stored_lft;
- /* Update lifetime (RFC4862 5.5.3 e) - * We deviate from RFC4862 by honoring all Valid Lifetimes to - * improve the reaction of SLAAC to renumbering events - * (draft-gont-6man-slaac-renum-06, Section 4.2) - */ + /* update lifetime (RFC2462 5.5.3 e) */ spin_lock_bh(&ifp->lock); now = jiffies; if (ifp->valid_lft > (now - ifp->tstamp) / HZ) stored_lft = ifp->valid_lft - (now - ifp->tstamp) / HZ; else stored_lft = 0; - if (!create && stored_lft) { + const u32 minimum_lft = min_t(u32, + stored_lft, MIN_VALID_LIFETIME); + valid_lft = max(valid_lft, minimum_lft); + + /* RFC4862 Section 5.5.3e: + * "Note that the preferred lifetime of the + * corresponding address is always reset to + * the Preferred Lifetime in the received + * Prefix Information option, regardless of + * whether the valid lifetime is also reset or + * ignored." + * + * So we should always update prefered_lft here. + */ + update_lft = 1; + } + + if (update_lft) { ifp->valid_lft = valid_lft; ifp->prefered_lft = prefered_lft; ifp->tstamp = now;
From: Nikolay Aleksandrov nikolay@nvidia.com
[ Upstream commit dcb2c5c6ca9b9177f04abaf76e5a983d177c9414 ]
When dumping vlan options for a single net device we send the same entries infinitely because user-space expects a 0 return at the end but we keep returning skb->len and restarting the dump on retry. Fix it by returning the value from br_vlan_dump_dev() if it completed or there was an error. The only case that must return skb->len is when the dump was incomplete and needs to continue (-EMSGSIZE).
Reported-by: Benjamin Poirier bpoirier@nvidia.com Fixes: 8dcea187088b ("net: bridge: vlan: add rtm definitions and dump support") Signed-off-by: Nikolay Aleksandrov nikolay@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/bridge/br_vlan.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c index 49e105e0a4479..d0ebcc99bfa9d 100644 --- a/net/bridge/br_vlan.c +++ b/net/bridge/br_vlan.c @@ -2020,7 +2020,8 @@ static int br_vlan_rtm_dump(struct sk_buff *skb, struct netlink_callback *cb) goto out_err; } err = br_vlan_dump_dev(dev, skb, cb, dump_flags); - if (err && err != -EMSGSIZE) + /* if the dump completed without an error we return 0 here */ + if (err != -EMSGSIZE) goto out_err; } else { for_each_netdev_rcu(net, dev) {
From: Eric Dumazet edumazet@google.com
[ Upstream commit 153a0d187e767c68733b8e9f46218eb1f41ab902 ]
For some reason, raw_bind() forgot to lock the socket.
BUG: KCSAN: data-race in __ip4_datagram_connect / raw_bind
write to 0xffff8881170d4308 of 4 bytes by task 5466 on cpu 0: raw_bind+0x1b0/0x250 net/ipv4/raw.c:739 inet_bind+0x56/0xa0 net/ipv4/af_inet.c:443 __sys_bind+0x14b/0x1b0 net/socket.c:1697 __do_sys_bind net/socket.c:1708 [inline] __se_sys_bind net/socket.c:1706 [inline] __x64_sys_bind+0x3d/0x50 net/socket.c:1706 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff8881170d4308 of 4 bytes by task 5468 on cpu 1: __ip4_datagram_connect+0xb7/0x7b0 net/ipv4/datagram.c:39 ip4_datagram_connect+0x2a/0x40 net/ipv4/datagram.c:89 inet_dgram_connect+0x107/0x190 net/ipv4/af_inet.c:576 __sys_connect_file net/socket.c:1900 [inline] __sys_connect+0x197/0x1b0 net/socket.c:1917 __do_sys_connect net/socket.c:1927 [inline] __se_sys_connect net/socket.c:1924 [inline] __x64_sys_connect+0x3d/0x50 net/socket.c:1924 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0x00000000 -> 0x0003007f
Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 5468 Comm: syz-executor.5 Not tainted 5.17.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/raw.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index bb446e60cf580..b8689052079cd 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -721,6 +721,7 @@ static int raw_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len) int ret = -EINVAL; int chk_addr_ret;
+ lock_sock(sk); if (sk->sk_state != TCP_CLOSE || addr_len < sizeof(struct sockaddr_in)) goto out;
@@ -740,7 +741,9 @@ static int raw_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len) inet->inet_saddr = 0; /* Use device */ sk_dst_reset(sk); ret = 0; -out: return ret; +out: + release_sock(sk); + return ret; }
/*
From: Eric Dumazet edumazet@google.com
[ Upstream commit 970a5a3ea86da637471d3cd04d513a0755aba4bf ]
In commit 431280eebed9 ("ipv4: tcp: send zero IPID for RST and ACK sent in SYN-RECV and TIME-WAIT state") we took care of some ctl packets sent by TCP.
It turns out we need to use a similar strategy for SYNACK packets.
By default, they carry IP_DF and IPID==0, but there are ways to ask them to use the hashed IP ident generator and thus be used to build off-path attacks. (Ref: Off-Path TCP Exploits of the Mixed IPID Assignment)
One of this way is to force (before listener is started) echo 1 >/proc/sys/net/ipv4/ip_no_pmtu_disc
Another way is using forged ICMP ICMP_FRAG_NEEDED with a very small MTU (like 68) to force a false return from ip_dont_fragment()
In this patch, ip_build_and_send_pkt() uses the following heuristics.
1) Most SYNACK packets are smaller than IPV4_MIN_MTU and therefore can use IP_DF regardless of the listener or route pmtu setting.
2) In case the SYNACK packet is bigger than IPV4_MIN_MTU, we use prandom_u32() generator instead of the IPv4 hashed ident one.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: Ray Che xijiache@gmail.com Reviewed-by: David Ahern dsahern@kernel.org Cc: Geoff Alexander alexandg@cs.unm.edu Cc: Willy Tarreau w@1wt.eu Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/ip_output.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index ff38b46bd4b0f..a4d2eb691cbc1 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -162,12 +162,19 @@ int ip_build_and_send_pkt(struct sk_buff *skb, const struct sock *sk, iph->daddr = (opt && opt->opt.srr ? opt->opt.faddr : daddr); iph->saddr = saddr; iph->protocol = sk->sk_protocol; - if (ip_dont_fragment(sk, &rt->dst)) { + /* Do not bother generating IPID for small packets (eg SYNACK) */ + if (skb->len <= IPV4_MIN_MTU || ip_dont_fragment(sk, &rt->dst)) { iph->frag_off = htons(IP_DF); iph->id = 0; } else { iph->frag_off = 0; - __ip_select_ident(net, iph, 1); + /* TCP packets here are SYNACK with fat IPv4/TCP options. + * Avoid using the hashed IP ident generator. + */ + if (sk->sk_protocol == IPPROTO_TCP) + iph->id = (__force __be16)prandom_u32(); + else + __ip_select_ident(net, iph, 1); }
if (opt && opt->opt.optlen) {
From: Eric Dumazet edumazet@google.com
[ Upstream commit 3c42b2019863b327caa233072c50739d4144dd16 ]
./include/net/route.h:373:48: warning: incorrect type in argument 2 (different base types) ./include/net/route.h:373:48: expected unsigned int [usertype] key ./include/net/route.h:373:48: got restricted __be32 [usertype] daddr
Fixes: 5c9f7c1dfc2e ("ipv4: Add helpers for neigh lookup for nexthop") Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: David Ahern dsahern@kernel.org Link: https://lore.kernel.org/r/20220127013404.1279313-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/route.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/net/route.h b/include/net/route.h index 2e6c0e153e3a5..2551f3f03b37e 100644 --- a/include/net/route.h +++ b/include/net/route.h @@ -369,7 +369,7 @@ static inline struct neighbour *ip_neigh_gw4(struct net_device *dev, { struct neighbour *neigh;
- neigh = __ipv4_neigh_lookup_noref(dev, daddr); + neigh = __ipv4_neigh_lookup_noref(dev, (__force u32)daddr); if (unlikely(!neigh)) neigh = __neigh_create(&arp_tbl, &daddr, dev, false);
From: Tim Yi tim.yi@pica8.com
[ Upstream commit fd20d9738395cf8e27d0a17eba34169699fccdff ]
When using per-vlan state, if vlan snooping and stats are disabled, untagged or priority-tagged ingress frame will go to check pvid state. If the port state is forwarding and the pvid state is not learning/forwarding, untagged or priority-tagged frame will be dropped but skb memory is not freed. Should free skb when __allowed_ingress returns false.
Fixes: a580c76d534c ("net: bridge: vlan: add per-vlan state") Signed-off-by: Tim Yi tim.yi@pica8.com Acked-by: Nikolay Aleksandrov nikolay@nvidia.com Link: https://lore.kernel.org/r/20220127074953.12632-1-tim.yi@pica8.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/bridge/br_vlan.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/net/bridge/br_vlan.c +++ b/net/bridge/br_vlan.c @@ -560,10 +560,10 @@ static bool __allowed_ingress(const stru !br_opt_get(br, BROPT_VLAN_STATS_ENABLED)) { if (*state == BR_STATE_FORWARDING) { *state = br_vlan_get_pvid_state(vg); - return br_vlan_state_allowed(*state, true); - } else { - return true; + if (!br_vlan_state_allowed(*state, true)) + goto drop; } + return true; } } v = br_vlan_find(vg, *vid);
From: Sander Vanheule sander@svanheule.net
commit 291e79c7e2eb6fdc016453597b78482e06199d0f upstream.
The driver assigned the irqchip and irq handler to the hardware irq, instead of the virq. This is incorrect, and only worked because these irq numbers happened to be the same on the devices used for testing the original driver.
Fixes: 9f3a0f34b84a ("irqchip: Add support for Realtek RTL838x/RTL839x interrupt controller") Signed-off-by: Sander Vanheule sander@svanheule.net Signed-off-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/4b4936606480265db47df152f00bc2ed46340599.164173971... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/irqchip/irq-realtek-rtl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/irqchip/irq-realtek-rtl.c +++ b/drivers/irqchip/irq-realtek-rtl.c @@ -62,7 +62,7 @@ static struct irq_chip realtek_ictl_irq
static int intc_map(struct irq_domain *d, unsigned int irq, irq_hw_number_t hw) { - irq_set_chip_and_handler(hw, &realtek_ictl_irq, handle_level_irq); + irq_set_chip_and_handler(irq, &realtek_ictl_irq, handle_level_irq);
return 0; }
From: Sander Vanheule sander@svanheule.net
commit 91351b5dd0fd494eb2d85e1bb6aca77b067447e0 upstream.
There is an offset between routing values (1..6) and the connected MIPS CPU interrupts (2..7), but no distinction was made between these two values.
This issue was previously hidden during testing, because an interrupt mapping was used where for each required interrupt another (unused) routing was configured, with an offset of +1.
Offset the CPU IRQ numbers by -1 to retrieve the correct routing value.
Fixes: 9f3a0f34b84a ("irqchip: Add support for Realtek RTL838x/RTL839x interrupt controller") Signed-off-by: Sander Vanheule sander@svanheule.net Signed-off-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/177b920aa8d8610615692d0e657e509f363c85ca.164173971... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/irqchip/irq-realtek-rtl.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
--- a/drivers/irqchip/irq-realtek-rtl.c +++ b/drivers/irqchip/irq-realtek-rtl.c @@ -95,7 +95,8 @@ out: * SoC interrupts are cascaded to MIPS CPU interrupts according to the * interrupt-map in the device tree. Each SoC interrupt gets 4 bits for * the CPU interrupt in an Interrupt Routing Register. Max 32 SoC interrupts - * thus go into 4 IRRs. + * thus go into 4 IRRs. A routing value of '0' means the interrupt is left + * disconnected. Routing values {1..15} connect to output lines {0..14}. */ static int __init map_interrupts(struct device_node *node, struct irq_domain *domain) { @@ -134,7 +135,7 @@ static int __init map_interrupts(struct of_node_put(cpu_ictl);
cpu_int = be32_to_cpup(imap + 2); - if (cpu_int > 7) + if (cpu_int > 7 || cpu_int < 2) return -EINVAL;
if (!(mips_irqs_set & BIT(cpu_int))) { @@ -143,7 +144,8 @@ static int __init map_interrupts(struct mips_irqs_set |= BIT(cpu_int); }
- regs[(soc_int * 4) / 32] |= cpu_int << (soc_int * 4) % 32; + /* Use routing values (1..6) for CPU interrupts (2..7) */ + regs[(soc_int * 4) / 32] |= (cpu_int - 1) << (soc_int * 4) % 32; imap += 3; }
From: Marc Kleine-Budde mkl@pengutronix.de
commit 17a30422621c0e04cb6060d20d7edcefd7463347 upstream.
This tcan4x5x only comes with 2K of MRAM, a RX FIFO with a dept of 32 doesn't fit into the MRAM. Use a depth of 16 instead.
Fixes: 4edd396a1911 ("dt-bindings: can: tcan4x5x: Add DT bindings for TCAN4x5X driver") Link: https://lore.kernel.org/all/20220119062951.2939851-1-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/devicetree/bindings/net/can/tcan4x5x.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/Documentation/devicetree/bindings/net/can/tcan4x5x.txt +++ b/Documentation/devicetree/bindings/net/can/tcan4x5x.txt @@ -31,7 +31,7 @@ tcan4x5x: tcan4x5x@0 { #address-cells = <1>; #size-cells = <1>; spi-max-frequency = <10000000>; - bosch,mram-cfg = <0x0 0 0 32 0 0 1 1>; + bosch,mram-cfg = <0x0 0 0 16 0 0 1 1>; interrupt-parent = <&gpio1>; interrupts = <14 IRQ_TYPE_LEVEL_LOW>; device-state-gpios = <&gpio3 21 GPIO_ACTIVE_HIGH>;
From: Sergio Paracuellos sergio.paracuellos@gmail.com
commit c035366d9c9fe48d947ee6c43465ab43d42e20f2 upstream.
Function pcie_rmw() is not being used at all and can be deleted. Hence get rid of it, which fixes this warning:
drivers/pci/controller/pcie-mt7621.c:112:20: warning: unused function 'pcie_rmw' [-Wunused-function]
Fixes: 2bdd5238e756 ("PCI: mt7621: Add MediaTek MT7621 PCIe host controller driver") Link: https://lore.kernel.org/r/20220124113003.406224-3-sergio.paracuellos@gmail.c... Link: https://lore.kernel.org/all/202201241754.igtHzgHv-lkp@intel.com/ Reported-by: kernel test robot lkp@intel.com Signed-off-by: Sergio Paracuellos sergio.paracuellos@gmail.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/controller/pcie-mt7621.c | 9 --------- 1 file changed, 9 deletions(-)
--- a/drivers/pci/controller/pcie-mt7621.c +++ b/drivers/pci/controller/pcie-mt7621.c @@ -109,15 +109,6 @@ static inline void pcie_write(struct mt7 writel_relaxed(val, pcie->base + reg); }
-static inline void pcie_rmw(struct mt7621_pcie *pcie, u32 reg, u32 clr, u32 set) -{ - u32 val = readl_relaxed(pcie->base + reg); - - val &= ~clr; - val |= set; - writel_relaxed(val, pcie->base + reg); -} - static inline u32 pcie_port_read(struct mt7621_pcie_port *port, u32 reg) { return readl_relaxed(port->base + reg);
From: Namhyung Kim namhyung@kernel.org
commit c5de60cd622a2607c043ba65e25a6e9998a369f9 upstream.
The active cgroup events are managed in the per-cpu cgrp_cpuctx_list. This list is only accessed from current cpu and not protected by any locks. But from the commit ef54c1a476ae ("perf: Rework perf_event_exit_event()"), it's possible to access (actually modify) the list from another cpu.
In the perf_remove_from_context(), it can remove an event from the context without an IPI when the context is not active. This is not safe with cgroup events which can have some active events in the context even if ctx->is_active is 0 at the moment. The target cpu might be in the middle of list iteration at the same time.
If the event is enabled when it's about to be closed, it might call perf_cgroup_event_disable() and list_del() with the cgrp_cpuctx_list on a different cpu.
This resulted in a crash due to an invalid list pointer access during the cgroup list traversal on the cpu which the event belongs to.
Let's fallback to IPI to access the cgrp_cpuctx_list from that cpu. Similarly, perf_install_in_context() should use IPI for the cgroup events too.
Fixes: ef54c1a476ae ("perf: Rework perf_event_exit_event()") Signed-off-by: Namhyung Kim namhyung@kernel.org Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Link: https://lkml.kernel.org/r/20220124195808.2252071-1-namhyung@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/events/core.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
--- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -2458,7 +2458,11 @@ static void perf_remove_from_context(str * event_function_call() user. */ raw_spin_lock_irq(&ctx->lock); - if (!ctx->is_active) { + /* + * Cgroup events are per-cpu events, and must IPI because of + * cgrp_cpuctx_list. + */ + if (!ctx->is_active && !is_cgroup_event(event)) { __perf_remove_from_context(event, __get_cpu_context(ctx), ctx, (void *)flags); raw_spin_unlock_irq(&ctx->lock); @@ -2891,11 +2895,14 @@ perf_install_in_context(struct perf_even * perf_event_attr::disabled events will not run and can be initialized * without IPI. Except when this is the first event for the context, in * that case we need the magic of the IPI to set ctx->is_active. + * Similarly, cgroup events for the context also needs the IPI to + * manipulate the cgrp_cpuctx_list. * * The IOC_ENABLE that is sure to follow the creation of a disabled * event will issue the IPI and reprogram the hardware. */ - if (__perf_effective_state(event) == PERF_EVENT_STATE_OFF && ctx->nr_events) { + if (__perf_effective_state(event) == PERF_EVENT_STATE_OFF && + ctx->nr_events && !is_cgroup_event(event)) { raw_spin_lock_irq(&ctx->lock); if (ctx->task == TASK_TOMBSTONE) { raw_spin_unlock_irq(&ctx->lock);
From: Suren Baghdasaryan surenb@google.com
commit 51e50fbd3efc6064c30ed73a5e009018b36e290a upstream.
When CONFIG_CGROUPS is disabled psi code generates the following warnings:
kernel/sched/psi.c:1112:21: warning: no previous prototype for 'psi_trigger_create' [-Wmissing-prototypes] 1112 | struct psi_trigger *psi_trigger_create(struct psi_group *group, | ^~~~~~~~~~~~~~~~~~ kernel/sched/psi.c:1182:6: warning: no previous prototype for 'psi_trigger_destroy' [-Wmissing-prototypes] 1182 | void psi_trigger_destroy(struct psi_trigger *t) | ^~~~~~~~~~~~~~~~~~~ kernel/sched/psi.c:1249:10: warning: no previous prototype for 'psi_trigger_poll' [-Wmissing-prototypes] 1249 | __poll_t psi_trigger_poll(void **trigger_ptr, | ^~~~~~~~~~~~~~~~
Change the declarations of these functions in the header to provide the prototypes even when they are unused.
Link: https://lkml.kernel.org/r/20220119223940.787748-2-surenb@google.com Fixes: 0e94682b73bf ("psi: introduce psi monitor") Signed-off-by: Suren Baghdasaryan surenb@google.com Reported-by: kernel test robot lkp@intel.com Acked-by: Johannes Weiner hannes@cmpxchg.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/psi.h | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
--- a/include/linux/psi.h +++ b/include/linux/psi.h @@ -24,18 +24,17 @@ void psi_memstall_enter(unsigned long *f void psi_memstall_leave(unsigned long *flags);
int psi_show(struct seq_file *s, struct psi_group *group, enum psi_res res); - -#ifdef CONFIG_CGROUPS -int psi_cgroup_alloc(struct cgroup *cgrp); -void psi_cgroup_free(struct cgroup *cgrp); -void cgroup_move_task(struct task_struct *p, struct css_set *to); - struct psi_trigger *psi_trigger_create(struct psi_group *group, char *buf, size_t nbytes, enum psi_res res); void psi_trigger_destroy(struct psi_trigger *t);
__poll_t psi_trigger_poll(void **trigger_ptr, struct file *file, poll_table *wait); + +#ifdef CONFIG_CGROUPS +int psi_cgroup_alloc(struct cgroup *cgrp); +void psi_cgroup_free(struct cgroup *cgrp); +void cgroup_move_task(struct task_struct *p, struct css_set *to); #endif
#else /* CONFIG_PSI */
From: Suren Baghdasaryan surenb@google.com
commit 44585f7bc0cb01095bc2ad4258049c02bbad21ef upstream.
When CONFIG_PROC_FS is disabled psi code generates the following warnings:
kernel/sched/psi.c:1364:30: warning: 'psi_cpu_proc_ops' defined but not used [-Wunused-const-variable=] 1364 | static const struct proc_ops psi_cpu_proc_ops = { | ^~~~~~~~~~~~~~~~ kernel/sched/psi.c:1355:30: warning: 'psi_memory_proc_ops' defined but not used [-Wunused-const-variable=] 1355 | static const struct proc_ops psi_memory_proc_ops = { | ^~~~~~~~~~~~~~~~~~~ kernel/sched/psi.c:1346:30: warning: 'psi_io_proc_ops' defined but not used [-Wunused-const-variable=] 1346 | static const struct proc_ops psi_io_proc_ops = { | ^~~~~~~~~~~~~~~
Make definitions of these structures and related functions conditional on CONFIG_PROC_FS config.
Link: https://lkml.kernel.org/r/20220119223940.787748-3-surenb@google.com Fixes: 0e94682b73bf ("psi: introduce psi monitor") Signed-off-by: Suren Baghdasaryan surenb@google.com Reported-by: kernel test robot lkp@intel.com Acked-by: Johannes Weiner hannes@cmpxchg.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/sched/psi.c | 79 +++++++++++++++++++++++++++-------------------------- 1 file changed, 41 insertions(+), 38 deletions(-)
--- a/kernel/sched/psi.c +++ b/kernel/sched/psi.c @@ -1082,44 +1082,6 @@ int psi_show(struct seq_file *m, struct return 0; }
-static int psi_io_show(struct seq_file *m, void *v) -{ - return psi_show(m, &psi_system, PSI_IO); -} - -static int psi_memory_show(struct seq_file *m, void *v) -{ - return psi_show(m, &psi_system, PSI_MEM); -} - -static int psi_cpu_show(struct seq_file *m, void *v) -{ - return psi_show(m, &psi_system, PSI_CPU); -} - -static int psi_open(struct file *file, int (*psi_show)(struct seq_file *, void *)) -{ - if (file->f_mode & FMODE_WRITE && !capable(CAP_SYS_RESOURCE)) - return -EPERM; - - return single_open(file, psi_show, NULL); -} - -static int psi_io_open(struct inode *inode, struct file *file) -{ - return psi_open(file, psi_io_show); -} - -static int psi_memory_open(struct inode *inode, struct file *file) -{ - return psi_open(file, psi_memory_show); -} - -static int psi_cpu_open(struct inode *inode, struct file *file) -{ - return psi_open(file, psi_cpu_show); -} - struct psi_trigger *psi_trigger_create(struct psi_group *group, char *buf, size_t nbytes, enum psi_res res) { @@ -1278,6 +1240,45 @@ __poll_t psi_trigger_poll(void **trigger return ret; }
+#ifdef CONFIG_PROC_FS +static int psi_io_show(struct seq_file *m, void *v) +{ + return psi_show(m, &psi_system, PSI_IO); +} + +static int psi_memory_show(struct seq_file *m, void *v) +{ + return psi_show(m, &psi_system, PSI_MEM); +} + +static int psi_cpu_show(struct seq_file *m, void *v) +{ + return psi_show(m, &psi_system, PSI_CPU); +} + +static int psi_open(struct file *file, int (*psi_show)(struct seq_file *, void *)) +{ + if (file->f_mode & FMODE_WRITE && !capable(CAP_SYS_RESOURCE)) + return -EPERM; + + return single_open(file, psi_show, NULL); +} + +static int psi_io_open(struct inode *inode, struct file *file) +{ + return psi_open(file, psi_io_show); +} + +static int psi_memory_open(struct inode *inode, struct file *file) +{ + return psi_open(file, psi_memory_show); +} + +static int psi_cpu_open(struct inode *inode, struct file *file) +{ + return psi_open(file, psi_cpu_show); +} + static ssize_t psi_write(struct file *file, const char __user *user_buf, size_t nbytes, enum psi_res res) { @@ -1392,3 +1393,5 @@ static int __init psi_proc_init(void) return 0; } module_init(psi_proc_init); + +#endif /* CONFIG_PROC_FS */
From: Robert Hancock robert.hancock@calian.com
commit b470947c3672f7eb7c4c271d510383d896831cc2 upstream.
A previous patch to skip part of the initialization when a USB3 PHY was not present could result in the return value being uninitialized in that case, causing spurious probe failures. Initialize ret to 0 to avoid this.
Fixes: 9678f3361afc ("usb: dwc3: xilinx: Skip resets and USB3 register settings for USB2.0 mode") Cc: stable@vger.kernel.org Reviewed-by: Nathan Chancellor nathan@kernel.org Signed-off-by: Robert Hancock robert.hancock@calian.com Link: https://lore.kernel.org/r/20220127221500.177021-1-robert.hancock@calian.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/dwc3/dwc3-xilinx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/usb/dwc3/dwc3-xilinx.c +++ b/drivers/usb/dwc3/dwc3-xilinx.c @@ -99,7 +99,7 @@ static int dwc3_xlnx_init_zynqmp(struct struct device *dev = priv_data->dev; struct reset_control *crst, *hibrst, *apbrst; struct phy *usb3_phy; - int ret; + int ret = 0; u32 reg;
usb3_phy = devm_phy_optional_get(dev, "usb3-phy");
From: Dmitry V. Levin ldv@altlinux.org
commit 10756dc5b02bff370ddd351d7744bc99ada659c2 upstream.
As linux/nfc.h userspace compilation was finally fixed by commits 79b69a83705e ("nfc: uapi: use kernel size_t to fix user-space builds") and 7175f02c4e5f ("uapi: fix linux/nfc.h userspace compilation errors"), there is no need to keep the compile-test exception for it in usr/include/Makefile.
Signed-off-by: Dmitry V. Levin ldv@altlinux.org Signed-off-by: Masahiro Yamada masahiroy@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- usr/include/Makefile | 1 - 1 file changed, 1 deletion(-)
--- a/usr/include/Makefile +++ b/usr/include/Makefile @@ -35,7 +35,6 @@ no-header-test += linux/hdlc/ioctl.h no-header-test += linux/ivtv.h no-header-test += linux/kexec.h no-header-test += linux/matroxfb.h -no-header-test += linux/nfc.h no-header-test += linux/omap3isp.h no-header-test += linux/omapfb.h no-header-test += linux/patchkey.h
From: Maor Gottlieb maorg@nvidia.com
[ Upstream commit 0226bd64da52aa23120d1450c37a424387827a21 ]
The cited commits replaced preemptible with pagefault_disabled and flush_kernel_dcache_page with flush_dcache_page respectively, hence need to update the corresponding defines in the test.
scatterlist.c: In function ‘sg_miter_stop’: scatterlist.c:919:4: warning: implicit declaration of function ‘flush_dcache_page’ [-Wimplicit-function-declaration] flush_dcache_page(miter->page); ^~~~~~~~~~~~~~~~~ In file included from linux/scatterlist.h:8:0, from scatterlist.c:9: scatterlist.c:922:18: warning: implicit declaration of function ‘pagefault_disabled’ [-Wimplicit-function-declaration] WARN_ON_ONCE(!pagefault_disabled()); ^ linux/mm.h:23:25: note: in definition of macro ‘WARN_ON_ONCE’ int __ret_warn_on = !!(condition); \ ^~~~~~~~~
Link: https://lkml.kernel.org/r/20220118082105.1737320-1-maorg@nvidia.com Fixes: 723aca208516 ("mm/scatterlist: replace the !preemptible warning in sg_miter_stop()") Fixes: 0e84f5dbf8d6 ("scatterlist: replace flush_kernel_dcache_page with flush_dcache_page") Signed-off-by: Maor Gottlieb maorg@nvidia.com Tested-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Cc: Thomas Gleixner tglx@linutronix.de Cc: Christoph Hellwig hch@lst.de Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/scatterlist/linux/mm.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/tools/testing/scatterlist/linux/mm.h +++ b/tools/testing/scatterlist/linux/mm.h @@ -74,7 +74,7 @@ static inline unsigned long page_to_phys __UNIQUE_ID(min1_), __UNIQUE_ID(min2_), \ x, y)
-#define preemptible() (1) +#define pagefault_disabled() (0)
static inline void *kmap(struct page *page) { @@ -127,6 +127,7 @@ kmalloc_array(unsigned int n, unsigned i #define kmemleak_free(a)
#define PageSlab(p) (0) +#define flush_dcache_page(p)
#define MAX_ERRNO 4095
From: Vitaly Kuznetsov vkuznets@redhat.com
commit 2423a4c0d17418eca1ba1e3f48684cb2ab7523d5 upstream.
vmcs_to_field_offset{,_table} may sound misleading as VMCS is an opaque blob which is not supposed to be accessed directly. In fact, vmcs_to_field_offset{,_table} are related to KVM defined VMCS12 structure.
Rename vmcs_field_to_offset() to get_vmcs12_field_offset() for clarity.
No functional change intended.
Reviewed-by: Sean Christopherson seanjc@google.com Signed-off-by: Vitaly Kuznetsov vkuznets@redhat.com Message-Id: 20220112170134.1904308-4-vkuznets@redhat.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/vmx/nested.c | 6 +++--- arch/x86/kvm/vmx/vmcs12.c | 4 ++-- arch/x86/kvm/vmx/vmcs12.h | 6 +++--- 3 files changed, 8 insertions(+), 8 deletions(-)
--- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -5086,7 +5086,7 @@ static int handle_vmread(struct kvm_vcpu /* Decode instruction info and find the field to read */ field = kvm_register_read(vcpu, (((instr_info) >> 28) & 0xf));
- offset = vmcs_field_to_offset(field); + offset = get_vmcs12_field_offset(field); if (offset < 0) return nested_vmx_fail(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
@@ -5189,7 +5189,7 @@ static int handle_vmwrite(struct kvm_vcp
field = kvm_register_read(vcpu, (((instr_info) >> 28) & 0xf));
- offset = vmcs_field_to_offset(field); + offset = get_vmcs12_field_offset(field); if (offset < 0) return nested_vmx_fail(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
@@ -6435,7 +6435,7 @@ static u64 nested_vmx_calc_vmcs_enum_msr max_idx = 0; for (i = 0; i < nr_vmcs12_fields; i++) { /* The vmcs12 table is very, very sparsely populated. */ - if (!vmcs_field_to_offset_table[i]) + if (!vmcs12_field_offsets[i]) continue;
idx = vmcs_field_index(VMCS12_IDX_TO_ENC(i)); --- a/arch/x86/kvm/vmx/vmcs12.c +++ b/arch/x86/kvm/vmx/vmcs12.c @@ -8,7 +8,7 @@ FIELD(number, name), \ [ROL16(number##_HIGH, 6)] = VMCS12_OFFSET(name) + sizeof(u32)
-const unsigned short vmcs_field_to_offset_table[] = { +const unsigned short vmcs12_field_offsets[] = { FIELD(VIRTUAL_PROCESSOR_ID, virtual_processor_id), FIELD(POSTED_INTR_NV, posted_intr_nv), FIELD(GUEST_ES_SELECTOR, guest_es_selector), @@ -151,4 +151,4 @@ const unsigned short vmcs_field_to_offse FIELD(HOST_RSP, host_rsp), FIELD(HOST_RIP, host_rip), }; -const unsigned int nr_vmcs12_fields = ARRAY_SIZE(vmcs_field_to_offset_table); +const unsigned int nr_vmcs12_fields = ARRAY_SIZE(vmcs12_field_offsets); --- a/arch/x86/kvm/vmx/vmcs12.h +++ b/arch/x86/kvm/vmx/vmcs12.h @@ -361,10 +361,10 @@ static inline void vmx_check_vmcs12_offs CHECK_OFFSET(guest_pml_index, 996); }
-extern const unsigned short vmcs_field_to_offset_table[]; +extern const unsigned short vmcs12_field_offsets[]; extern const unsigned int nr_vmcs12_fields;
-static inline short vmcs_field_to_offset(unsigned long field) +static inline short get_vmcs12_field_offset(unsigned long field) { unsigned short offset; unsigned int index; @@ -377,7 +377,7 @@ static inline short vmcs_field_to_offset return -ENOENT;
index = array_index_nospec(index, nr_vmcs12_fields); - offset = vmcs_field_to_offset_table[index]; + offset = vmcs12_field_offsets[index]; if (offset == 0) return -ENOENT; return offset;
From: Vitaly Kuznetsov vkuznets@redhat.com
commit 892a42c10ddb945d3a4dcf07dccdf9cb98b21548 upstream.
In preparation to allowing reads from Enlightened VMCS from handle_vmread(), implement evmcs_field_offset() to get the correct read offset. get_evmcs_offset(), which is being used by KVM-on-Hyper-V, is almost what's needed but a few things need to be adjusted. First, WARN_ON() is unacceptable for handle_vmread() as any field can (in theory) be supplied by the guest and not all fields are defined in eVMCS v1. Second, we need to handle 'holes' in eVMCS (missing fields). It also sounds like a good idea to WARN_ON() if such fields are ever accessed by KVM-on-Hyper-V.
Implement dedicated evmcs_field_offset() helper.
No functional change intended.
Signed-off-by: Vitaly Kuznetsov vkuznets@redhat.com Message-Id: 20220112170134.1904308-5-vkuznets@redhat.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/vmx/evmcs.c | 3 +-- arch/x86/kvm/vmx/evmcs.h | 32 ++++++++++++++++++++++++-------- 2 files changed, 25 insertions(+), 10 deletions(-)
--- a/arch/x86/kvm/vmx/evmcs.c +++ b/arch/x86/kvm/vmx/evmcs.c @@ -12,8 +12,6 @@
DEFINE_STATIC_KEY_FALSE(enable_evmcs);
-#if IS_ENABLED(CONFIG_HYPERV) - #define EVMCS1_OFFSET(x) offsetof(struct hv_enlightened_vmcs, x) #define EVMCS1_FIELD(number, name, clean_field)[ROL16(number, 6)] = \ {EVMCS1_OFFSET(name), clean_field} @@ -296,6 +294,7 @@ const struct evmcs_field vmcs_field_to_e }; const unsigned int nr_evmcs_1_fields = ARRAY_SIZE(vmcs_field_to_evmcs_1);
+#if IS_ENABLED(CONFIG_HYPERV) __init void evmcs_sanitize_exec_ctrls(struct vmcs_config *vmcs_conf) { vmcs_conf->pin_based_exec_ctrl &= ~EVMCS1_UNSUPPORTED_PINCTRL; --- a/arch/x86/kvm/vmx/evmcs.h +++ b/arch/x86/kvm/vmx/evmcs.h @@ -63,8 +63,6 @@ DECLARE_STATIC_KEY_FALSE(enable_evmcs); #define EVMCS1_UNSUPPORTED_VMENTRY_CTRL (VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL) #define EVMCS1_UNSUPPORTED_VMFUNC (VMX_VMFUNC_EPTP_SWITCHING)
-#if IS_ENABLED(CONFIG_HYPERV) - struct evmcs_field { u16 offset; u16 clean_field; @@ -73,26 +71,44 @@ struct evmcs_field { extern const struct evmcs_field vmcs_field_to_evmcs_1[]; extern const unsigned int nr_evmcs_1_fields;
-static __always_inline int get_evmcs_offset(unsigned long field, - u16 *clean_field) +static __always_inline int evmcs_field_offset(unsigned long field, + u16 *clean_field) { unsigned int index = ROL16(field, 6); const struct evmcs_field *evmcs_field;
- if (unlikely(index >= nr_evmcs_1_fields)) { - WARN_ONCE(1, "KVM: accessing unsupported EVMCS field %lx\n", - field); + if (unlikely(index >= nr_evmcs_1_fields)) return -ENOENT; - }
evmcs_field = &vmcs_field_to_evmcs_1[index];
+ /* + * Use offset=0 to detect holes in eVMCS. This offset belongs to + * 'revision_id' but this field has no encoding and is supposed to + * be accessed directly. + */ + if (unlikely(!evmcs_field->offset)) + return -ENOENT; + if (clean_field) *clean_field = evmcs_field->clean_field;
return evmcs_field->offset; }
+#if IS_ENABLED(CONFIG_HYPERV) + +static __always_inline int get_evmcs_offset(unsigned long field, + u16 *clean_field) +{ + int offset = evmcs_field_offset(field, clean_field); + + WARN_ONCE(offset < 0, "KVM: accessing unsupported EVMCS field %lx\n", + field); + + return offset; +} + static __always_inline void evmcs_write64(unsigned long field, u64 value) { u16 clean_field;
From: Vitaly Kuznetsov vkuznets@redhat.com
commit 6cbbaab60ff33f59355492c241318046befd9ffc upstream.
Hyper-V TLFS explicitly forbids VMREAD and VMWRITE instructions when Enlightened VMCS interface is in use:
"Any VMREAD or VMWRITE instructions while an enlightened VMCS is active is unsupported and can result in unexpected behavior.""
Windows 11 + WSL2 seems to ignore this, attempts to VMREAD VMCS field 0x4404 ("VM-exit interruption information") are observed. Failing these attempts with nested_vmx_failInvalid() makes such guests unbootable.
Microsoft confirms this is a Hyper-V bug and claims that it'll get fixed eventually but for the time being we need a workaround. (Temporary) allow VMREAD to get data from the currently loaded Enlightened VMCS.
Note: VMWRITE instructions remain forbidden, it is not clear how to handle them properly and hopefully won't ever be needed.
Reviewed-by: Sean Christopherson seanjc@google.com Signed-off-by: Vitaly Kuznetsov vkuznets@redhat.com Message-Id: 20220112170134.1904308-6-vkuznets@redhat.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/vmx/evmcs.h | 12 ++++++++++ arch/x86/kvm/vmx/nested.c | 55 ++++++++++++++++++++++++++++++++-------------- 2 files changed, 51 insertions(+), 16 deletions(-)
--- a/arch/x86/kvm/vmx/evmcs.h +++ b/arch/x86/kvm/vmx/evmcs.h @@ -96,6 +96,18 @@ static __always_inline int evmcs_field_o return evmcs_field->offset; }
+static inline u64 evmcs_read_any(struct hv_enlightened_vmcs *evmcs, + unsigned long field, u16 offset) +{ + /* + * vmcs12_read_any() doesn't care whether the supplied structure + * is 'struct vmcs12' or 'struct hv_enlightened_vmcs' as it takes + * the exact offset of the required field, use it for convenience + * here. + */ + return vmcs12_read_any((void *)evmcs, field, offset); +} + #if IS_ENABLED(CONFIG_HYPERV)
static __always_inline int get_evmcs_offset(unsigned long field, --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -7,6 +7,7 @@ #include <asm/mmu_context.h>
#include "cpuid.h" +#include "evmcs.h" #include "hyperv.h" #include "mmu.h" #include "nested.h" @@ -5074,27 +5075,49 @@ static int handle_vmread(struct kvm_vcpu if (!nested_vmx_check_permission(vcpu)) return 1;
- /* - * In VMX non-root operation, when the VMCS-link pointer is INVALID_GPA, - * any VMREAD sets the ALU flags for VMfailInvalid. - */ - if (vmx->nested.current_vmptr == INVALID_GPA || - (is_guest_mode(vcpu) && - get_vmcs12(vcpu)->vmcs_link_pointer == INVALID_GPA)) - return nested_vmx_failInvalid(vcpu); - /* Decode instruction info and find the field to read */ field = kvm_register_read(vcpu, (((instr_info) >> 28) & 0xf));
- offset = get_vmcs12_field_offset(field); - if (offset < 0) - return nested_vmx_fail(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT); + if (!evmptr_is_valid(vmx->nested.hv_evmcs_vmptr)) { + /* + * In VMX non-root operation, when the VMCS-link pointer is INVALID_GPA, + * any VMREAD sets the ALU flags for VMfailInvalid. + */ + if (vmx->nested.current_vmptr == INVALID_GPA || + (is_guest_mode(vcpu) && + get_vmcs12(vcpu)->vmcs_link_pointer == INVALID_GPA)) + return nested_vmx_failInvalid(vcpu); + + offset = get_vmcs12_field_offset(field); + if (offset < 0) + return nested_vmx_fail(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT); + + if (!is_guest_mode(vcpu) && is_vmcs12_ext_field(field)) + copy_vmcs02_to_vmcs12_rare(vcpu, vmcs12); + + /* Read the field, zero-extended to a u64 value */ + value = vmcs12_read_any(vmcs12, field, offset); + } else { + /* + * Hyper-V TLFS (as of 6.0b) explicitly states, that while an + * enlightened VMCS is active VMREAD/VMWRITE instructions are + * unsupported. Unfortunately, certain versions of Windows 11 + * don't comply with this requirement which is not enforced in + * genuine Hyper-V. Allow VMREAD from an enlightened VMCS as a + * workaround, as misbehaving guests will panic on VM-Fail. + * Note, enlightened VMCS is incompatible with shadow VMCS so + * all VMREADs from L2 should go to L1. + */ + if (WARN_ON_ONCE(is_guest_mode(vcpu))) + return nested_vmx_failInvalid(vcpu);
- if (!is_guest_mode(vcpu) && is_vmcs12_ext_field(field)) - copy_vmcs02_to_vmcs12_rare(vcpu, vmcs12); + offset = evmcs_field_offset(field, NULL); + if (offset < 0) + return nested_vmx_fail(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
- /* Read the field, zero-extended to a u64 value */ - value = vmcs12_read_any(vmcs12, field, offset); + /* Read the field, zero-extended to a u64 value */ + value = evmcs_read_any(vmx->nested.hv_evmcs, field, offset); + }
/* * Now copy part of this value to register or memory, as requested.
From: OGAWA Hirofumi hirofumi@mail.parknet.co.jp
commit 3ee859e384d453d6ac68bfd5971f630d9fa46ad3 upstream.
bio_truncate() clears the buffer outside of last block of bdev, however current bio_truncate() is using the wrong offset of page. So it can return the uninitialized data.
This happened when both of truncated/corrupted FS and userspace (via bdev) are trying to read the last of bdev.
Reported-by: syzbot+ac94ae5f68b84197f41c@syzkaller.appspotmail.com Signed-off-by: OGAWA Hirofumi hirofumi@mail.parknet.co.jp Reviewed-by: Ming Lei ming.lei@redhat.com Link: https://lore.kernel.org/r/875yqt1c9g.fsf@mail.parknet.co.jp Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- block/bio.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/block/bio.c +++ b/block/bio.c @@ -569,7 +569,8 @@ static void bio_truncate(struct bio *bio offset = new_size - done; else offset = 0; - zero_user(bv.bv_page, offset, bv.bv_len - offset); + zero_user(bv.bv_page, bv.bv_offset + offset, + bv.bv_len - offset); truncated = true; } done += bv.bv_len;
On Mon, 31 Jan 2022 11:54:23 +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.16.5 release. There are 200 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 02 Feb 2022 10:51:59 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.16.5-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.16.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v5.16: 10 builds: 10 pass, 0 fail 28 boots: 28 pass, 0 fail 122 tests: 122 pass, 0 fail
Linux version: 5.16.5-rc1-g04f8430582e6 Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra194-p3509-0000+p3668-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
On Mon, 31 Jan 2022 at 16:46, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.16.5 release. There are 200 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 02 Feb 2022 10:51:59 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.16.5-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.16.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build * kernel: 5.16.5-rc1 * git: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git * git branch: linux-5.16.y * git commit: 04f8430582e6a2a7bb728d05119e2d66bd3c819a * git describe: v5.16.4-201-g04f8430582e6 * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.16.y/build/v5.16....
## Test Regressions (compared to v5.16.4-187-g950d978a980e) No test regressions found.
## Metric Regressions (compared to v5.16.4-187-g950d978a980e) No metric regressions found.
## Test Fixes (compared to v5.16.4-187-g950d978a980e) No test fixes found.
## Metric Fixes (compared to v5.16.4-187-g950d978a980e) No metric fixes found.
## Test result summary total: 84643, pass: 72410, fail: 848, skip: 10560, xfail: 825
## Build Summary * arc: 10 total, 10 passed, 0 failed * arm: 259 total, 259 passed, 0 failed * arm64: 37 total, 37 passed, 0 failed * dragonboard-410c: 1 total, 1 passed, 0 failed * hi6220-hikey: 1 total, 1 passed, 0 failed * i386: 36 total, 36 passed, 0 failed * juno-r2: 1 total, 1 passed, 0 failed * mips: 34 total, 34 passed, 0 failed * parisc: 12 total, 12 passed, 0 failed * powerpc: 52 total, 48 passed, 4 failed * riscv: 24 total, 20 passed, 4 failed * s390: 18 total, 18 passed, 0 failed * sh: 24 total, 24 passed, 0 failed * sparc: 12 total, 12 passed, 0 failed * x15: 1 total, 1 passed, 0 failed * x86: 1 total, 1 passed, 0 failed * x86_64: 37 total, 37 passed, 0 failed
## Test suites summary * fwts * igt-gpu-tools * kselftest- * kselftest-android * kselftest-arm64 * kselftest-arm64/arm64.btitest.bti_c_func * kselftest-arm64/arm64.btitest.bti_j_func * kselftest-arm64/arm64.btitest.bti_jc_func * kselftest-arm64/arm64.btitest.bti_none_func * kselftest-arm64/arm64.btitest.nohint_func * kselftest-arm64/arm64.btitest.paciasp_func * kselftest-arm64/arm64.nobtitest.bti_c_func * kselftest-arm64/arm64.nobtitest.bti_j_func * kselftest-arm64/arm64.nobtitest.bti_jc_func * kselftest-arm64/arm64.nobtitest.bti_none_func * kselftest-arm64/arm64.nobtitest.nohint_func * kselftest-arm64/arm64.nobtitest.paciasp_func * kselftest-bpf * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-drivers * kselftest-efivarfs * kselftest-filesystems * kselftest-firmware * kselftest-fpu * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-kvm * kselftest-lib * kselftest-livepatch * kselftest-lkdtm * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-vm * kselftest-x86 * kselftest-zram * kunit * kvm-unit-tests * libgpiod * libhugetlbfs * linux-log-parser * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-controllers-tests * ltp-cpuhotplug-tests * ltp-crypto-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-open-posix-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-tracing-tests * network-basic-tests * packetdrill * perf * prep-inline * rcutorture * ssuite * v4l2-compliance
-- Linaro LKFT https://lkft.linaro.org
On 1/31/22 3:54 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.16.5 release. There are 200 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 02 Feb 2022 10:51:59 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.16.5-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.16.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
On 1/31/2022 2:54 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.16.5 release. There are 200 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 02 Feb 2022 10:51:59 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.16.5-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.16.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels:
Tested-by: Florian Fainelli f.fainelli@gmail.com
On Mon, Jan 31, 2022 at 8:38 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.16.5 release. There are 200 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 02 Feb 2022 10:51:59 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.16.5-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.16.y and the diffstat can be found below.
thanks,
greg k-h
Hi Greg,
Compiled and booted on my test system Lenovo P50s: Intel Core i7 No emergency and critical messages in the dmesg
./perf bench sched all # Running sched/messaging benchmark... # 20 sender and receiver processes per group # 10 groups == 400 processes run
Total time: 0.436 [sec]
# Running sched/pipe benchmark... # Executed 1000000 pipe operations between two processes
Total time: 7.170 [sec]
7.170326 usecs/op 139463 ops/sec
Tested-by: Zan Aziz zanaziz313@gmail.com
Thanks -Zan
On Mon, Jan 31, 2022 at 11:54:23AM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.16.5 release. There are 200 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 02 Feb 2022 10:51:59 +0000. Anything received after that time might be too late.
Build results: total: 155 pass: 154 fail: 1 Failed builds: powerpc:ppc32_allmodconfig Qemu test results: total: 488 pass: 488 fail: 0
As with v5.15.y, the build error is not new and fixed with commit 33a0da68fb07 ("mtd: rawnand: mpc5121: Remove unused variable in ads5121_select_chip()").
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
On 1/31/22 02:54, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.16.5 release. There are 200 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 02 Feb 2022 10:51:59 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.16.5-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.16.y and the diffstat can be found below.
thanks,
greg k-h
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos re@w6rz.net
On 31/01/22 17.54, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.16.5 release. There are 200 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Successfully compiled and booted on my system (x86_64 localmodconfig). Also successfully cross-compiled for arm64 (bcm2711_defconfig taken from raspberry pi kernel sources) and ppc64 (gamecube_defconfig).
Tested-by: Bagas Sanjaya bagasdotme@gmail.com
On Mon, Jan 31, 2022 at 11:54:23AM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.16.5 release. There are 200 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 02 Feb 2022 10:51:59 +0000. Anything received after that time might be too late.
Hi Greg,
5.16.5-rc1 tested.
Run tested on: - Allwinner H6 (Tanix TX6) - Intel Tiger Lake x86_64 (nuc11 i7-1165G7)
In addition - build tested on: - Allwinner A64 - Allwinner H3 - Allwinner H5 - NXP iMX6 - NXP iMX8 - Qualcomm Dragonboard - Rockchip RK3288 - Rockchip RK3328 - Rockchip RK3399pro - Samsung Exynos
Tested-by: Rudi Heitbaum rudi@heitbaum.com -- Rudi
On Mon, 31 Jan 2022 11:54:23 +0100, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.16.5 release. There are 200 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 02 Feb 2022 10:51:59 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.16.5-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.16.y and the diffstat can be found below.
thanks,
greg k-h
5.16.5-rc1 Successfully Compiled and booted on my Raspberry PI 4b (8g) (bcm2711)
Tested-by: Fox Chen foxhlchen@gmail.com
linux-stable-mirror@lists.linaro.org