This is the start of the stable review cycle for the 4.14.257 release. There are 106 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 08 Dec 2021 14:55:37 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.257-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.14.257-rc1
Helge Deller deller@gmx.de parisc: Mark cr16 CPU clocksource unstable on all SMP machines
Johan Hovold johan@kernel.org serial: core: fix transmit-buffer reset and memleak
Pierre Gondois Pierre.Gondois@arm.com serial: pl011: Add ACPI SBSA UART match id
Sven Eckelmann sven@narfation.org tty: serial: msm_serial: Deactivate RX DMA for polling support
Joerg Roedel jroedel@suse.de x86/64/mm: Map all kernel memory into trampoline_pgd
Badhri Jagan Sridharan badhri@google.com usb: typec: tcpm: Wait in SNK_DEBOUNCED until disconnect
Mathias Nyman mathias.nyman@linux.intel.com xhci: Fix commad ring abort, write all 64 bits to CRCR register.
Maciej W. Rozycki macro@orcam.me.uk vgacon: Propagate console boot parameters before calling `vc_resize'
Helge Deller deller@gmx.de parisc: Fix "make install" on newer debian releases
Helge Deller deller@gmx.de parisc: Fix KBUILD_IMAGE for self-extracting kernel
Tony Lu tonylu@linux.alibaba.com net/smc: Keep smc_close_final rc during active close
William Kucharski william.kucharski@oracle.com net/rds: correct socket tunable error in rds_tcp_tune()
Sven Schuchmann schuchmann@schleissheimer.de net: usb: lan78xx: lan78xx_phy_init(): use PHY_POLL instead of "0" if no IRQ is available
Zhou Qingyang zhou1615@umn.edu net/mlx4_en: Fix an use-after-free bug in mlx4_en_try_alloc_resources()
Arnd Bergmann arnd@arndb.de siphash: use _unaligned version by default
Benjamin Poirier bpoirier@nvidia.com net: mpls: Fix notifications when deleting a device
Zhou Qingyang zhou1615@umn.edu net: qlogic: qlcnic: Fix a NULL pointer dereference in qlcnic_83xx_add_rings()
Randy Dunlap rdunlap@infradead.org natsemi: xtensa: fix section mismatch warnings
Linus Torvalds torvalds@linux-foundation.org fget: check that the fd still exists after getting a ref to it
Jens Axboe axboe@kernel.dk fs: add fget_many() and fput_many()
Baokun Li libaokun1@huawei.com sata_fsl: fix warning in remove_proc_entry when rmmod sata_fsl
Baokun Li libaokun1@huawei.com sata_fsl: fix UAF in sata_fsl_port_stop when rmmod sata_fsl
Masami Hiramatsu mhiramat@kernel.org kprobes: Limit max data_size of the kretprobe instances
Stephen Suryaputra ssuryaextr@gmail.com vrf: Reset IPCB/IP6CB when processing outbound pkts in vrf dev xmit
Ian Rogers irogers@google.com perf hist: Fix memory leak of a perf_hpp_fmt
Teng Qi starmiku1207184332@gmail.com net: ethernet: dec: tulip: de4x5: fix possible array overflows in type3_infoblock()
zhangyue zhangyue1@kylinos.cn net: tulip: de4x5: fix the problem that the array 'lp->phy[8]' may be out of bound
Teng Qi starmiku1207184332@gmail.com ethernet: hisilicon: hns: hns_dsaf_misc: fix a possible array overflow in hns_dsaf_ge_srst_by_port()
Mike Christie michael.christie@oracle.com scsi: iscsi: Unblock session then wake up error handler
Manaf Meethalavalappu Pallikunhi manafm@codeaurora.org thermal: core: Reset previous low and high trip during thermal zone init
Wang Yugui wangyugui@e16-tech.com btrfs: check-integrity: fix a warning on write caching disabled disk
Vasily Gorbik gor@linux.ibm.com s390/setup: avoid using memblock_enforce_memory_limit
Slark Xiao slark_xiao@163.com platform/x86: thinkpad_acpi: Fix WWAN device disabled issue after S3 deep
liuguoqiang liuguoqiang@uniontech.com net: return correct error code
Mike Kravetz mike.kravetz@oracle.com hugetlb: take PMD sharing into account when flushing tlb/caches
Benjamin Coddington bcodding@redhat.com NFSv42: Fix pagecache invalidation after COPY/CLONE
Alexander Mikhalitsyn alexander.mikhalitsyn@virtuozzo.com ipc: WARN if trying to remove ipc object which is absent
Alexander Mikhalitsyn alexander.mikhalitsyn@virtuozzo.com shm: extend forced shm destroy to support objects from several IPC nses
Juergen Gross jgross@suse.com tty: hvc: replace BUG_ON() with negative return value
Juergen Gross jgross@suse.com xen/netfront: don't trust the backend response data blindly
Juergen Gross jgross@suse.com xen/netfront: disentangle tx_skb_freelist
Juergen Gross jgross@suse.com xen/netfront: don't read data from request on the ring page
Juergen Gross jgross@suse.com xen/netfront: read response from backend only once
Juergen Gross jgross@suse.com xen/blkfront: don't trust the backend response data blindly
Juergen Gross jgross@suse.com xen/blkfront: don't take local copy of a request from the ring page
Juergen Gross jgross@suse.com xen/blkfront: read response from backend only once
Juergen Gross jgross@suse.com xen: sync include/xen/interface/io/ring.h with Xen's newest version
Miklos Szeredi mszeredi@redhat.com fuse: release pipe buf after last use
Lin Ma linma@zju.edu.cn NFC: add NCI_UNREG flag to eliminate the race
David Hildenbrand david@redhat.com proc/vmcore: fix clearing user buffer by properly using clear_user()
Nadav Amit namit@vmware.com hugetlbfs: flush TLBs correctly after huge_pmd_unshare
Marek Behún marek.behun@nic.cz arm64: dts: marvell: armada-37xx: Set pcie_reset_pin to gpio function
Miquel Raynal miquel.raynal@bootlin.com arm64: dts: marvell: armada-37xx: declare PCIe reset pin
Marek Behún kabel@kernel.org pinctrl: armada-37xx: Correct PWM pins definitions
Gregory CLEMENT gregory.clement@bootlin.com pinctrl: armada-37xx: add missing pin: PCIe1 Wakeup
Marek Behún marek.behun@nic.cz pinctrl: armada-37xx: Correct mpp definitions
Pali Rohár pali@kernel.org PCI: aardvark: Fix checking for link up via LTSSM state
Pali Rohár pali@kernel.org PCI: aardvark: Fix link training
Frederick Lawler fred@fredlawl.com PCI: Add PCI_EXP_LNKCTL2_TLS* macros
Pali Rohár pali@kernel.org PCI: aardvark: Fix PCIe Max Payload Size setting
Pali Rohár pali@kernel.org PCI: aardvark: Configure PCIe resources from 'ranges' DT property
Evan Wang xswang@marvell.com PCI: aardvark: Remove PCIe outbound window configuration
Pali Rohár pali@kernel.org PCI: aardvark: Update comment about disabling link training
Pali Rohár pali@kernel.org PCI: aardvark: Move PCIe reset card code to advk_pcie_train_link()
Pali Rohár pali@kernel.org PCI: aardvark: Fix compilation on s390
Pali Rohár pali@kernel.org PCI: aardvark: Don't touch PCIe registers if no card connected
Thomas Petazzoni thomas.petazzoni@bootlin.com PCI: aardvark: Introduce an advk_pcie_valid_device() helper
Pali Rohár pali@kernel.org PCI: aardvark: Indicate error in 'val' when config read fails
Pali Rohár pali@kernel.org PCI: aardvark: Replace custom macros by standard linux/pci_regs.h macros
Pali Rohár pali@kernel.org PCI: aardvark: Issue PERST via GPIO
Marek Behún marek.behun@nic.cz PCI: aardvark: Improve link training
Pali Rohár pali@kernel.org PCI: aardvark: Train link immediately after enabling training
Remi Pommarel repk@triplefau.lt PCI: aardvark: Wait for endpoint to be ready before training link
Wen Yang wen.yang99@zte.com.cn PCI: aardvark: Fix a leaked reference by adding missing of_node_put()
Sergei Shtylyov sergei.shtylyov@cogentembedded.com PCI: aardvark: Fix I/O space page leak
David Hildenbrand david@redhat.com s390/mm: validate VMA in PGSTE manipulation functions
Steven Rostedt (VMware) rostedt@goodmis.org tracing: Check pid filtering when creating events
Stefano Garzarella sgarzare@redhat.com vhost/vsock: fix incorrect used length reported to the guest
Tony Lu tonylu@linux.alibaba.com net/smc: Don't call clcsock shutdown twice when smc shutdown
Huang Pei huangpei@loongson.cn MIPS: use 3-level pgtable for 64KB page size on MIPS_VA_BITS_48
Eric Dumazet edumazet@google.com tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows
Thomas Zeitlhofer thomas.zeitlhofer+lkml@ze-it.at PM: hibernate: use correct mode for swsusp_close()
Tony Lu tonylu@linux.alibaba.com net/smc: Ensure the active closing peer first closes clcsock
Eric Dumazet edumazet@google.com ipv6: fix typos in __ip6_finish_output()
Dan Carpenter dan.carpenter@oracle.com drm/vc4: fix error code in vc4_create_object()
Sreekanth Reddy sreekanth.reddy@broadcom.com scsi: mpt3sas: Fix kernel panic during drive powercycle test
Takashi Iwai tiwai@suse.de ARM: socfpga: Fix crash with CONFIG_FORTIRY_SOURCE
Trond Myklebust trond.myklebust@hammerspace.com NFSv42: Don't fail clone() unless the OP_CLONE operation failed
Alexander Aring aahringo@redhat.com net: ieee802154: handle iftypes as u32
Takashi Iwai tiwai@suse.de ASoC: topology: Add missing rwsem around snd_ctl_remove() calls
Florian Fainelli f.fainelli@gmail.com ARM: dts: BCM5301X: Add interrupt properties to GPIO node
Florian Fainelli f.fainelli@gmail.com ARM: dts: BCM5301X: Fix I2C controller interrupt
yangxingwu xingwu.yang@gmail.com netfilter: ipvs: Fix reuse connection if RS weight is 0
Steven Rostedt (VMware) rostedt@goodmis.org tracing: Fix pid filtering when triggers are attached
Stefano Stabellini stefano.stabellini@xilinx.com xen: detect uninitialized xenbus in xenbus_init
Stefano Stabellini stefano.stabellini@xilinx.com xen: don't continue xenstore initialization in case of errors
Miklos Szeredi mszeredi@redhat.com fuse: fix page stealing
Dan Carpenter dan.carpenter@oracle.com staging: rtl8192e: Fix use after free in _rtl92e_pci_disconnect()
Jason Gerecke killertofu@gmail.com HID: wacom: Use "Confidence" flag to prevent reporting invalid contacts
Hans Verkuil hverkuil-cisco@xs4all.nl media: cec: copy sequence field for the reply
Takashi Iwai tiwai@suse.de ALSA: ctxfi: Fix out-of-range access
Todd Kjos tkjos@google.com binder: fix test regression due to sender_euid change
Mathias Nyman mathias.nyman@linux.intel.com usb: hub: Fix locking issues with address0_mutex
Mathias Nyman mathias.nyman@linux.intel.com usb: hub: Fix usb enumeration issue due to address0 race
Mingjie Zhang superzmj@fibocom.com USB: serial: option: add Fibocom FM101-GL variants
Daniele Palmas dnlplm@gmail.com USB: serial: option: add Telit LE910S1 0x9200 composition
-------------
Diffstat:
.../pinctrl/marvell,armada-37xx-pinctrl.txt | 26 +- Documentation/networking/ipvs-sysctl.txt | 3 +- Makefile | 4 +- arch/arm/boot/dts/bcm5301x.dtsi | 4 +- arch/arm/include/asm/tlb.h | 8 + arch/arm/mach-socfpga/core.h | 2 +- arch/arm/mach-socfpga/platsmp.c | 8 +- arch/arm64/boot/dts/marvell/armada-3720-db.dts | 3 + .../boot/dts/marvell/armada-3720-espressobin.dts | 3 + arch/arm64/boot/dts/marvell/armada-37xx.dtsi | 9 + arch/ia64/include/asm/tlb.h | 10 + arch/mips/Kconfig | 2 +- arch/parisc/Makefile | 5 + arch/parisc/install.sh | 1 + arch/parisc/kernel/time.c | 24 +- arch/s390/include/asm/tlb.h | 14 + arch/s390/kernel/setup.c | 3 - arch/s390/mm/pgtable.c | 13 + arch/sh/include/asm/tlb.h | 10 + arch/um/include/asm/tlb.h | 12 + arch/x86/realmode/init.c | 12 +- drivers/android/binder.c | 2 +- drivers/ata/sata_fsl.c | 20 +- drivers/block/xen-blkfront.c | 126 ++++-- drivers/gpu/drm/vc4/vc4_bo.c | 2 +- drivers/hid/wacom_wac.c | 8 +- drivers/hid/wacom_wac.h | 1 + drivers/media/cec/cec-adap.c | 1 + drivers/net/ethernet/dec/tulip/de4x5.c | 34 +- drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c | 4 + drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 9 +- drivers/net/ethernet/natsemi/xtsonic.c | 2 +- .../net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c | 10 +- drivers/net/usb/lan78xx.c | 2 +- drivers/net/vrf.c | 2 + drivers/net/xen-netfront.c | 257 +++++++----- drivers/pci/host/pci-aardvark.c | 463 +++++++++++++++++---- drivers/pinctrl/mvebu/pinctrl-armada-37xx.c | 28 +- drivers/platform/x86/thinkpad_acpi.c | 12 - drivers/scsi/mpt3sas/mpt3sas_scsih.c | 2 +- drivers/scsi/scsi_transport_iscsi.c | 6 +- drivers/staging/rtl8192e/rtl8192e/rtl_core.c | 3 +- drivers/staging/typec/tcpm.c | 4 - drivers/thermal/thermal_core.c | 2 + drivers/tty/hvc/hvc_xen.c | 17 +- drivers/tty/serial/amba-pl011.c | 1 + drivers/tty/serial/msm_serial.c | 3 + drivers/tty/serial/serial_core.c | 13 +- drivers/usb/core/hub.c | 23 +- drivers/usb/host/xhci-ring.c | 21 +- drivers/usb/serial/option.c | 5 + drivers/vhost/vsock.c | 2 +- drivers/video/console/vgacon.c | 14 +- drivers/xen/xenbus/xenbus_probe.c | 27 +- fs/btrfs/disk-io.c | 14 +- fs/file.c | 19 +- fs/file_table.c | 9 +- fs/fuse/dev.c | 14 +- fs/nfs/nfs42proc.c | 5 +- fs/nfs/nfs42xdr.c | 3 +- fs/proc/vmcore.c | 15 +- include/asm-generic/tlb.h | 2 + include/linux/file.h | 2 + include/linux/fs.h | 4 +- include/linux/ipc_namespace.h | 15 + include/linux/kprobes.h | 2 + include/linux/sched/task.h | 2 +- include/linux/shm.h | 13 +- include/linux/siphash.h | 14 +- include/net/nfc/nci_core.h | 1 + include/net/nl802154.h | 7 +- include/uapi/linux/pci_regs.h | 5 + include/xen/interface/io/ring.h | 293 +++++++------ ipc/shm.c | 176 ++++++-- ipc/util.c | 6 +- kernel/kprobes.c | 3 + kernel/power/hibernate.c | 6 +- kernel/trace/trace.h | 24 +- kernel/trace/trace_events.c | 7 + lib/siphash.c | 12 +- mm/hugetlb.c | 72 +++- mm/memory.c | 10 + net/ipv4/devinet.c | 2 +- net/ipv4/tcp_cubic.c | 5 +- net/ipv6/ip6_output.c | 2 +- net/mpls/af_mpls.c | 68 ++- net/netfilter/ipvs/ip_vs_core.c | 8 +- net/nfc/nci/core.c | 19 +- net/rds/tcp.c | 2 +- net/smc/af_smc.c | 8 +- net/smc/smc_close.c | 10 + sound/pci/ctxfi/ctamixer.c | 14 +- sound/pci/ctxfi/ctdaio.c | 16 +- sound/pci/ctxfi/ctresource.c | 7 +- sound/pci/ctxfi/ctresource.h | 4 +- sound/pci/ctxfi/ctsrc.c | 7 +- sound/soc/soc-topology.c | 3 + tools/perf/ui/hist.c | 28 +- tools/perf/util/hist.h | 1 - 99 files changed, 1591 insertions(+), 670 deletions(-)
From: Daniele Palmas dnlplm@gmail.com
commit e353f3e88720300c3d72f49a4bea54f42db1fa5e upstream.
Add the following Telit LE910S1 composition:
0x9200: tty
Signed-off-by: Daniele Palmas dnlplm@gmail.com Link: https://lore.kernel.org/r/20211119140319.10448-1-dnlplm@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold johan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/serial/option.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -1270,6 +1270,8 @@ static const struct usb_device_id option .driver_info = NCTRL(2) }, { USB_DEVICE(TELIT_VENDOR_ID, 0x9010), /* Telit SBL FN980 flashing device */ .driver_info = NCTRL(0) | ZLP }, + { USB_DEVICE(TELIT_VENDOR_ID, 0x9200), /* Telit LE910S1 flashing device */ + .driver_info = NCTRL(0) | ZLP }, { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, ZTE_PRODUCT_MF622, 0xff, 0xff, 0xff) }, /* ZTE WCDMA products */ { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0002, 0xff, 0xff, 0xff), .driver_info = RSVD(1) },
From: Mingjie Zhang superzmj@fibocom.com
commit 88459e3e42760abb2299bbf6cb1026491170e02a upstream.
Update the USB serial option driver support for the Fibocom FM101-GL Cat.6 LTE modules as there are actually several different variants. - VID:PID 2cb7:01a2, FM101-GL are laptop M.2 cards (with MBIM interfaces for /Linux/Chrome OS) - VID:PID 2cb7:01a4, FM101-GL for laptop debug M.2 cards(with adb interface for /Linux/Chrome OS)
0x01a2: mbim, tty, tty, diag, gnss 0x01a4: mbim, diag, tty, adb, gnss, gnss
Here are the outputs of lsusb -v and usb-devices:
T: Bus=02 Lev=01 Prnt=01 Port=03 Cnt=01 Dev#= 86 Spd=5000 MxCh= 0 D: Ver= 3.20 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 9 #Cfgs= 1 P: Vendor=2cb7 ProdID=01a2 Rev= 5.04 S: Manufacturer=Fibocom Wireless Inc. S: Product=Fibocom FM101-GL Module S: SerialNumber=673326ce C:* #Ifs= 6 Cfg#= 1 Atr=a0 MxPwr=896mA A: FirstIf#= 0 IfCount= 2 Cls=02(comm.) Sub=0e Prot=00 I:* If#= 0 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=0e Prot=00 Driver=cdc_mbim I: If#= 1 Alt= 0 #EPs= 0 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim I:* If#= 1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=(none) I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=(none) I:* If#= 4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=(none) I:* If#= 5 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=40 Driver=(none)
Bus 002 Device 084: ID 2cb7:01a2 Fibocom Wireless Inc. Fibocom FM101-GL Module Device Descriptor: bLength 18 bDescriptorType 1 bcdUSB 3.20 bDeviceClass 0 bDeviceSubClass 0 bDeviceProtocol 0 bMaxPacketSize0 9 idVendor 0x2cb7 idProduct 0x01a2 bcdDevice 5.04 iManufacturer 1 Fibocom Wireless Inc. iProduct 2 Fibocom FM101-GL Module iSerial 3 673326ce bNumConfigurations 1 Configuration Descriptor: bLength 9 bDescriptorType 2 wTotalLength 0x015d bNumInterfaces 6 bConfigurationValue 1 iConfiguration 4 MBIM_DUN_DUN_DIAG_NMEA bmAttributes 0xa0 (Bus Powered) Remote Wakeup MaxPower 896mA Interface Association: bLength 8 bDescriptorType 11 bFirstInterface 0 bInterfaceCount 2 bFunctionClass 2 Communications bFunctionSubClass 14 bFunctionProtocol 0 iFunction 0 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 0 bAlternateSetting 0 bNumEndpoints 1 bInterfaceClass 2 Communications bInterfaceSubClass 14 bInterfaceProtocol 0 iInterface 5 Fibocom FM101-GL LTE Modem CDC Header: bcdCDC 1.10 CDC Union: bMasterInterface 0 bSlaveInterface 1 CDC MBIM: bcdMBIMVersion 1.00 wMaxControlMessage 4096 bNumberFilters 32 bMaxFilterSize 128 wMaxSegmentSize 2048 bmNetworkCapabilities 0x20 8-byte ntb input size CDC MBIM Extended: bcdMBIMExtendedVersion 1.00 bMaxOutstandingCommandMessages 64 wMTU 1500 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x81 EP 1 IN bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x0040 1x 64 bytes bInterval 9 bMaxBurst 0 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 1 bAlternateSetting 0 bNumEndpoints 0 bInterfaceClass 10 CDC Data bInterfaceSubClass 0 bInterfaceProtocol 2 iInterface 0 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 1 bAlternateSetting 1 bNumEndpoints 2 bInterfaceClass 10 CDC Data bInterfaceSubClass 0 bInterfaceProtocol 2 iInterface 6 MBIM Data Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x8e EP 14 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 6 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x0f EP 15 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 2 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 2 bAlternateSetting 0 bNumEndpoints 3 bInterfaceClass 255 Vendor Specific Class bInterfaceSubClass 255 Vendor Specific Subclass bInterfaceProtocol 64 iInterface 0 ** UNRECOGNIZED: 05 24 00 10 01 ** UNRECOGNIZED: 05 24 01 00 00 ** UNRECOGNIZED: 04 24 02 02 ** UNRECOGNIZED: 05 24 06 00 00 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x83 EP 3 IN bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x000a 1x 10 bytes bInterval 9 bMaxBurst 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x82 EP 2 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x01 EP 1 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 0 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 3 bAlternateSetting 0 bNumEndpoints 3 bInterfaceClass 255 Vendor Specific Class bInterfaceSubClass 255 Vendor Specific Subclass bInterfaceProtocol 64 iInterface 0 ** UNRECOGNIZED: 05 24 00 10 01 ** UNRECOGNIZED: 05 24 01 00 00 ** UNRECOGNIZED: 04 24 02 02 ** UNRECOGNIZED: 05 24 06 00 00 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x85 EP 5 IN bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x000a 1x 10 bytes bInterval 9 bMaxBurst 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x84 EP 4 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x02 EP 2 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 0 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 4 bAlternateSetting 0 bNumEndpoints 2 bInterfaceClass 255 Vendor Specific Class bInterfaceSubClass 255 Vendor Specific Subclass bInterfaceProtocol 48 iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x03 EP 3 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x86 EP 6 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 0 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 5 bAlternateSetting 0 bNumEndpoints 3 bInterfaceClass 255 Vendor Specific Class bInterfaceSubClass 0 bInterfaceProtocol 64 iInterface 0 ** UNRECOGNIZED: 05 24 00 10 01 ** UNRECOGNIZED: 05 24 01 00 00 ** UNRECOGNIZED: 04 24 02 02 ** UNRECOGNIZED: 05 24 06 00 00 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x88 EP 8 IN bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x000a 1x 10 bytes bInterval 9 bMaxBurst 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x87 EP 7 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x04 EP 4 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 0
T: Bus=02 Lev=01 Prnt=01 Port=03 Cnt=01 Dev#= 85 Spd=5000 MxCh= 0 D: Ver= 3.20 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 9 #Cfgs= 1 P: Vendor=2cb7 ProdID=01a4 Rev= 5.04 S: Manufacturer=Fibocom Wireless Inc. S: Product=Fibocom FM101-GL Module S: SerialNumber=673326ce C:* #Ifs= 7 Cfg#= 1 Atr=a0 MxPwr=896mA A: FirstIf#= 0 IfCount= 2 Cls=02(comm.) Sub=0e Prot=00 I:* If#= 0 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=0e Prot=00 Driver=cdc_mbim I: If#= 1 Alt= 0 #EPs= 0 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim I:* If#= 1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim I:* If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=(none) I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=(none) I:* If#= 4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none) I:* If#= 5 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=40 Driver=(none) I:* If#= 6 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=40 Driver=(none)
Bus 002 Device 085: ID 2cb7:01a4 Fibocom Wireless Inc. Fibocom FM101-GL Module Device Descriptor: bLength 18 bDescriptorType 1 bcdUSB 3.20 bDeviceClass 0 bDeviceSubClass 0 bDeviceProtocol 0 bMaxPacketSize0 9 idVendor 0x2cb7 idProduct 0x01a4 bcdDevice 5.04 iManufacturer 1 Fibocom Wireless Inc. iProduct 2 Fibocom FM101-GL Module iSerial 3 673326ce bNumConfigurations 1 Configuration Descriptor: bLength 9 bDescriptorType 2 wTotalLength 0x0180 bNumInterfaces 7 bConfigurationValue 1 iConfiguration 4 MBIM_DIAG_DUN_ADB_GNSS_GNSS bmAttributes 0xa0 (Bus Powered) Remote Wakeup MaxPower 896mA Interface Association: bLength 8 bDescriptorType 11 bFirstInterface 0 bInterfaceCount 2 bFunctionClass 2 Communications bFunctionSubClass 14 bFunctionProtocol 0 iFunction 0 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 0 bAlternateSetting 0 bNumEndpoints 1 bInterfaceClass 2 Communications bInterfaceSubClass 14 bInterfaceProtocol 0 iInterface 5 Fibocom FM101-GL LTE Modem CDC Header: bcdCDC 1.10 CDC Union: bMasterInterface 0 bSlaveInterface 1 CDC MBIM: bcdMBIMVersion 1.00 wMaxControlMessage 4096 bNumberFilters 32 bMaxFilterSize 128 wMaxSegmentSize 2048 bmNetworkCapabilities 0x20 8-byte ntb input size CDC MBIM Extended: bcdMBIMExtendedVersion 1.00 bMaxOutstandingCommandMessages 64 wMTU 1500 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x81 EP 1 IN bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x0040 1x 64 bytes bInterval 9 bMaxBurst 0 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 1 bAlternateSetting 0 bNumEndpoints 0 bInterfaceClass 10 CDC Data bInterfaceSubClass 0 bInterfaceProtocol 2 iInterface 0 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 1 bAlternateSetting 1 bNumEndpoints 2 bInterfaceClass 10 CDC Data bInterfaceSubClass 0 bInterfaceProtocol 2 iInterface 6 MBIM Data Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x8e EP 14 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 6 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x0f EP 15 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 2 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 2 bAlternateSetting 0 bNumEndpoints 2 bInterfaceClass 255 Vendor Specific Class bInterfaceSubClass 255 Vendor Specific Subclass bInterfaceProtocol 48 iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x01 EP 1 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x82 EP 2 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 0 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 3 bAlternateSetting 0 bNumEndpoints 3 bInterfaceClass 255 Vendor Specific Class bInterfaceSubClass 255 Vendor Specific Subclass bInterfaceProtocol 64 iInterface 0 ** UNRECOGNIZED: 05 24 00 10 01 ** UNRECOGNIZED: 05 24 01 00 00 ** UNRECOGNIZED: 04 24 02 02 ** UNRECOGNIZED: 05 24 06 00 00 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x84 EP 4 IN bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x000a 1x 10 bytes bInterval 9 bMaxBurst 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x83 EP 3 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x02 EP 2 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 0 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 4 bAlternateSetting 0 bNumEndpoints 2 bInterfaceClass 255 Vendor Specific Class bInterfaceSubClass 66 bInterfaceProtocol 1 iInterface 8 ADB Interface Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x03 EP 3 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x85 EP 5 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 0 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 5 bAlternateSetting 0 bNumEndpoints 3 bInterfaceClass 255 Vendor Specific Class bInterfaceSubClass 0 bInterfaceProtocol 64 iInterface 0 ** UNRECOGNIZED: 05 24 00 10 01 ** UNRECOGNIZED: 05 24 01 00 00 ** UNRECOGNIZED: 04 24 02 02 ** UNRECOGNIZED: 05 24 06 00 00 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x87 EP 7 IN bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x000a 1x 10 bytes bInterval 9 bMaxBurst 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x86 EP 6 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x04 EP 4 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 0 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 6 bAlternateSetting 0 bNumEndpoints 3 bInterfaceClass 255 Vendor Specific Class bInterfaceSubClass 0 bInterfaceProtocol 64 iInterface 0 ** UNRECOGNIZED: 05 24 00 10 01 ** UNRECOGNIZED: 05 24 01 00 00 ** UNRECOGNIZED: 04 24 02 02 ** UNRECOGNIZED: 05 24 06 00 00 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x89 EP 9 IN bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x000a 1x 10 bytes bInterval 9 bMaxBurst 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x88 EP 8 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x05 EP 5 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 0
Signed-off-by: Mingjie Zhang superzmj@fibocom.com Link: https://lore.kernel.org/r/20211123133757.37475-1-superzmj@fibocom.com Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold johan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/serial/option.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -2098,6 +2098,9 @@ static const struct usb_device_id option { USB_DEVICE_AND_INTERFACE_INFO(0x2cb7, 0x010b, 0xff, 0xff, 0x30) }, /* Fibocom FG150 Diag */ { USB_DEVICE_AND_INTERFACE_INFO(0x2cb7, 0x010b, 0xff, 0, 0) }, /* Fibocom FG150 AT */ { USB_DEVICE_INTERFACE_CLASS(0x2cb7, 0x01a0, 0xff) }, /* Fibocom NL668-AM/NL652-EU (laptop MBIM) */ + { USB_DEVICE_INTERFACE_CLASS(0x2cb7, 0x01a2, 0xff) }, /* Fibocom FM101-GL (laptop MBIM) */ + { USB_DEVICE_INTERFACE_CLASS(0x2cb7, 0x01a4, 0xff), /* Fibocom FM101-GL (laptop MBIM) */ + .driver_info = RSVD(4) }, { USB_DEVICE_INTERFACE_CLASS(0x2df3, 0x9d03, 0xff) }, /* LongSung M5710 */ { USB_DEVICE_INTERFACE_CLASS(0x305a, 0x1404, 0xff) }, /* GosunCn GM500 RNDIS */ { USB_DEVICE_INTERFACE_CLASS(0x305a, 0x1405, 0xff) }, /* GosunCn GM500 MBIM */
From: Mathias Nyman mathias.nyman@linux.intel.com
commit 6ae6dc22d2d1ce6aa77a6da8a761e61aca216f8b upstream.
xHC hardware can only have one slot in default state with address 0 waiting for a unique address at a time, otherwise "undefined behavior may occur" according to xhci spec 5.4.3.4
The address0_mutex exists to prevent this across both xhci roothubs.
If hub_port_init() fails, it may unlock the mutex and exit with a xhci slot in default state. If the other xhci roothub calls hub_port_init() at this point we end up with two slots in default state.
Make sure the address0_mutex protects the slot default state across hub_port_init() retries, until slot is addressed or disabled.
Note, one known minor case is not fixed by this patch. If device needs to be reset during resume, but fails all hub_port_init() retries in usb_reset_and_verify_device(), then it's possible the slot is still left in default state when address0_mutex is unlocked.
Cc: stable@vger.kernel.org Fixes: 638139eb95d2 ("usb: hub: allow to process more usb hub events in parallel") Signed-off-by: Mathias Nyman mathias.nyman@linux.intel.com Link: https://lore.kernel.org/r/20211115221630.871204-1-mathias.nyman@linux.intel.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/core/hub.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-)
--- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -4472,8 +4472,6 @@ hub_port_init(struct usb_hub *hub, struc if (oldspeed == USB_SPEED_LOW) delay = HUB_LONG_RESET_TIME;
- mutex_lock(hcd->address0_mutex); - /* Reset the device; full speed may morph to high speed */ /* FIXME a USB 2.0 device may morph into SuperSpeed on reset. */ retval = hub_port_reset(hub, port1, udev, delay, false); @@ -4772,7 +4770,6 @@ fail: hub_port_disable(hub, port1, 0); update_devnum(udev, devnum); /* for disconnect processing */ } - mutex_unlock(hcd->address0_mutex); return retval; }
@@ -4917,6 +4914,9 @@ static void hub_port_connect(struct usb_ unit_load = 100;
status = 0; + + mutex_lock(hcd->address0_mutex); + for (i = 0; i < SET_CONFIG_TRIES; i++) {
/* reallocate for each attempt, since references @@ -4953,6 +4953,8 @@ static void hub_port_connect(struct usb_ if (status < 0) goto loop;
+ mutex_unlock(hcd->address0_mutex); + if (udev->quirks & USB_QUIRK_DELAY_INIT) msleep(2000);
@@ -5041,6 +5043,7 @@ static void hub_port_connect(struct usb_
loop_disable: hub_port_disable(hub, port1, 1); + mutex_lock(hcd->address0_mutex); loop: usb_ep0_reinit(udev); release_devnum(udev); @@ -5067,6 +5070,8 @@ loop: }
done: + mutex_unlock(hcd->address0_mutex); + hub_port_disable(hub, port1, 1); if (hcd->driver->relinquish_port && !hub->hdev->parent) { if (status != -ENOTCONN && status != -ENODEV) @@ -5608,6 +5613,8 @@ static int usb_reset_and_verify_device(s bos = udev->bos; udev->bos = NULL;
+ mutex_lock(hcd->address0_mutex); + for (i = 0; i < SET_CONFIG_TRIES; ++i) {
/* ep0 maxpacket size may change; let the HCD know about it. @@ -5617,6 +5624,7 @@ static int usb_reset_and_verify_device(s if (ret >= 0 || ret == -ENOTCONN || ret == -ENODEV) break; } + mutex_unlock(hcd->address0_mutex);
if (ret < 0) goto re_enumerate;
From: Mathias Nyman mathias.nyman@linux.intel.com
commit 6cca13de26eea6d32a98d96d916a048d16a12822 upstream.
Fix the circular lock dependency and unbalanced unlock of addess0_mutex introduced when fixing an address0_mutex enumeration retry race in commit ae6dc22d2d1 ("usb: hub: Fix usb enumeration issue due to address0 race")
Make sure locking order between port_dev->status_lock and address0_mutex is correct, and that address0_mutex is not unlocked in hub_port_connect "done:" codepath which may be reached without locking address0_mutex
Fixes: 6ae6dc22d2d1 ("usb: hub: Fix usb enumeration issue due to address0 race") Cc: stable@vger.kernel.org Reported-by: Marek Szyprowski m.szyprowski@samsung.com Tested-by: Hans de Goede hdegoede@redhat.com Tested-by: Marek Szyprowski m.szyprowski@samsung.com Acked-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Mathias Nyman mathias.nyman@linux.intel.com Link: https://lore.kernel.org/r/20211123101656.1113518-1-mathias.nyman@linux.intel... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/core/hub.c | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-)
--- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -4859,6 +4859,7 @@ static void hub_port_connect(struct usb_ struct usb_port *port_dev = hub->ports[port1 - 1]; struct usb_device *udev = port_dev->child; static int unreliable_port = -1; + bool retry_locked;
/* Disconnect any existing devices under this port */ if (udev) { @@ -4915,9 +4916,10 @@ static void hub_port_connect(struct usb_
status = 0;
- mutex_lock(hcd->address0_mutex); - for (i = 0; i < SET_CONFIG_TRIES; i++) { + usb_lock_port(port_dev); + mutex_lock(hcd->address0_mutex); + retry_locked = true;
/* reallocate for each attempt, since references * to the previous one can escape in various ways @@ -4926,6 +4928,8 @@ static void hub_port_connect(struct usb_ if (!udev) { dev_err(&port_dev->dev, "couldn't allocate usb_device\n"); + mutex_unlock(hcd->address0_mutex); + usb_unlock_port(port_dev); goto done; }
@@ -4947,13 +4951,13 @@ static void hub_port_connect(struct usb_ }
/* reset (non-USB 3.0 devices) and get descriptor */ - usb_lock_port(port_dev); status = hub_port_init(hub, udev, port1, i); - usb_unlock_port(port_dev); if (status < 0) goto loop;
mutex_unlock(hcd->address0_mutex); + usb_unlock_port(port_dev); + retry_locked = false;
if (udev->quirks & USB_QUIRK_DELAY_INIT) msleep(2000); @@ -5043,11 +5047,14 @@ static void hub_port_connect(struct usb_
loop_disable: hub_port_disable(hub, port1, 1); - mutex_lock(hcd->address0_mutex); loop: usb_ep0_reinit(udev); release_devnum(udev); hub_free_dev(udev); + if (retry_locked) { + mutex_unlock(hcd->address0_mutex); + usb_unlock_port(port_dev); + } usb_put_dev(udev); if ((status == -ENOTCONN) || (status == -ENOTSUPP)) break; @@ -5070,8 +5077,6 @@ loop: }
done: - mutex_unlock(hcd->address0_mutex); - hub_port_disable(hub, port1, 1); if (hcd->driver->relinquish_port && !hub->hdev->parent) { if (status != -ENOTCONN && status != -ENODEV)
From: Todd Kjos tkjos@google.com
commit c21a80ca0684ec2910344d72556c816cb8940c01 upstream.
This is a partial revert of commit 29bc22ac5e5b ("binder: use euid from cred instead of using task"). Setting sender_euid using proc->cred caused some Android system test regressions that need further investigation. It is a partial reversion because subsequent patches rely on proc->cred.
Fixes: 29bc22ac5e5b ("binder: use euid from cred instead of using task") Cc: stable@vger.kernel.org # 4.4+ Acked-by: Christian Brauner christian.brauner@ubuntu.com Signed-off-by: Todd Kjos tkjos@google.com Change-Id: I9b1769a3510fed250bb21859ef8beebabe034c66 Link: https://lore.kernel.org/r/20211112180720.2858135-1-tkjos@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/android/binder.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -2894,7 +2894,7 @@ static void binder_transaction(struct bi t->from = thread; else t->from = NULL; - t->sender_euid = proc->cred->euid; + t->sender_euid = task_euid(proc->tsk); t->to_proc = target_proc; t->to_thread = target_thread; t->code = tr->code;
From: Takashi Iwai tiwai@suse.de
commit 76c47183224c86e4011048b80f0e2d0d166f01c2 upstream.
The master and next_conj of rcs_ops are used for iterating the resource list entries, and currently those are supposed to return the current value. The problem is that next_conf may go over the last entry before the loop abort condition is evaluated, and it may return the "current" value that is beyond the array size. It was caught recently as a GPF, for example.
Those return values are, however, never actually evaluated, hence basically we don't have to consider the current value as the return at all. By dropping those return values, the potential out-of-range access above is also fixed automatically.
This patch changes the return type of master and next_conj callbacks to void and drop the superfluous code accordingly.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=214985 Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211118215729.26257-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/ctxfi/ctamixer.c | 14 ++++++-------- sound/pci/ctxfi/ctdaio.c | 16 ++++++++-------- sound/pci/ctxfi/ctresource.c | 7 +++---- sound/pci/ctxfi/ctresource.h | 4 ++-- sound/pci/ctxfi/ctsrc.c | 7 +++---- 5 files changed, 22 insertions(+), 26 deletions(-)
--- a/sound/pci/ctxfi/ctamixer.c +++ b/sound/pci/ctxfi/ctamixer.c @@ -27,16 +27,15 @@
#define BLANK_SLOT 4094
-static int amixer_master(struct rsc *rsc) +static void amixer_master(struct rsc *rsc) { rsc->conj = 0; - return rsc->idx = container_of(rsc, struct amixer, rsc)->idx[0]; + rsc->idx = container_of(rsc, struct amixer, rsc)->idx[0]; }
-static int amixer_next_conj(struct rsc *rsc) +static void amixer_next_conj(struct rsc *rsc) { rsc->conj++; - return container_of(rsc, struct amixer, rsc)->idx[rsc->conj]; }
static int amixer_index(const struct rsc *rsc) @@ -335,16 +334,15 @@ int amixer_mgr_destroy(struct amixer_mgr
/* SUM resource management */
-static int sum_master(struct rsc *rsc) +static void sum_master(struct rsc *rsc) { rsc->conj = 0; - return rsc->idx = container_of(rsc, struct sum, rsc)->idx[0]; + rsc->idx = container_of(rsc, struct sum, rsc)->idx[0]; }
-static int sum_next_conj(struct rsc *rsc) +static void sum_next_conj(struct rsc *rsc) { rsc->conj++; - return container_of(rsc, struct sum, rsc)->idx[rsc->conj]; }
static int sum_index(const struct rsc *rsc) --- a/sound/pci/ctxfi/ctdaio.c +++ b/sound/pci/ctxfi/ctdaio.c @@ -55,12 +55,12 @@ static struct daio_rsc_idx idx_20k2[NUM_ [SPDIFIO] = {.left = 0x05, .right = 0x85}, };
-static int daio_master(struct rsc *rsc) +static void daio_master(struct rsc *rsc) { /* Actually, this is not the resource index of DAIO. * For DAO, it is the input mapper index. And, for DAI, * it is the output time-slot index. */ - return rsc->conj = rsc->idx; + rsc->conj = rsc->idx; }
static int daio_index(const struct rsc *rsc) @@ -68,19 +68,19 @@ static int daio_index(const struct rsc * return rsc->conj; }
-static int daio_out_next_conj(struct rsc *rsc) +static void daio_out_next_conj(struct rsc *rsc) { - return rsc->conj += 2; + rsc->conj += 2; }
-static int daio_in_next_conj_20k1(struct rsc *rsc) +static void daio_in_next_conj_20k1(struct rsc *rsc) { - return rsc->conj += 0x200; + rsc->conj += 0x200; }
-static int daio_in_next_conj_20k2(struct rsc *rsc) +static void daio_in_next_conj_20k2(struct rsc *rsc) { - return rsc->conj += 0x100; + rsc->conj += 0x100; }
static const struct rsc_ops daio_out_rsc_ops = { --- a/sound/pci/ctxfi/ctresource.c +++ b/sound/pci/ctxfi/ctresource.c @@ -113,18 +113,17 @@ static int audio_ring_slot(const struct return (rsc->conj << 4) + offset_in_audio_slot_block[rsc->type]; }
-static int rsc_next_conj(struct rsc *rsc) +static void rsc_next_conj(struct rsc *rsc) { unsigned int i; for (i = 0; (i < 8) && (!(rsc->msr & (0x1 << i))); ) i++; rsc->conj += (AUDIO_SLOT_BLOCK_NUM >> i); - return rsc->conj; }
-static int rsc_master(struct rsc *rsc) +static void rsc_master(struct rsc *rsc) { - return rsc->conj = rsc->idx; + rsc->conj = rsc->idx; }
static const struct rsc_ops rsc_generic_ops = { --- a/sound/pci/ctxfi/ctresource.h +++ b/sound/pci/ctxfi/ctresource.h @@ -43,8 +43,8 @@ struct rsc { };
struct rsc_ops { - int (*master)(struct rsc *rsc); /* Move to master resource */ - int (*next_conj)(struct rsc *rsc); /* Move to next conjugate resource */ + void (*master)(struct rsc *rsc); /* Move to master resource */ + void (*next_conj)(struct rsc *rsc); /* Move to next conjugate resource */ int (*index)(const struct rsc *rsc); /* Return the index of resource */ /* Return the output slot number */ int (*output_slot)(const struct rsc *rsc); --- a/sound/pci/ctxfi/ctsrc.c +++ b/sound/pci/ctxfi/ctsrc.c @@ -594,16 +594,15 @@ int src_mgr_destroy(struct src_mgr *src_
/* SRCIMP resource manager operations */
-static int srcimp_master(struct rsc *rsc) +static void srcimp_master(struct rsc *rsc) { rsc->conj = 0; - return rsc->idx = container_of(rsc, struct srcimp, rsc)->idx[0]; + rsc->idx = container_of(rsc, struct srcimp, rsc)->idx[0]; }
-static int srcimp_next_conj(struct rsc *rsc) +static void srcimp_next_conj(struct rsc *rsc) { rsc->conj++; - return container_of(rsc, struct srcimp, rsc)->idx[rsc->conj]; }
static int srcimp_index(const struct rsc *rsc)
From: Hans Verkuil hverkuil-cisco@xs4all.nl
commit 13cbaa4c2b7bf9f8285e1164d005dbf08244ecd5 upstream.
When the reply for a non-blocking transmit arrives, the sequence field for that reply was never filled in, so userspace would have no way of associating the reply to the original transmit.
Copy the sequence field to ensure that this is now possible.
Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Fixes: 0dbacebede1e ([media] cec: move the CEC framework out of staging and to media) Cc: stable@vger.kernel.org Signed-off-by: Mauro Carvalho Chehab mchehab+huawei@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/cec/cec-adap.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/media/cec/cec-adap.c +++ b/drivers/media/cec/cec-adap.c @@ -1135,6 +1135,7 @@ void cec_received_msg_ts(struct cec_adap if (abort) dst->rx_status |= CEC_RX_STATUS_FEATURE_ABORT; msg->flags = dst->flags; + msg->sequence = dst->sequence; /* Remove it from the wait_queue */ list_del_init(&data->list);
From: Jason Gerecke killertofu@gmail.com
commit 7fb0413baa7f8a04caef0c504df9af7e0623d296 upstream.
The HID descriptor of many of Wacom's touch input devices include a "Confidence" usage that signals if a particular touch collection contains useful data. The driver does not look at this flag, however, which causes even invalid contacts to be reported to userspace. A lucky combination of kernel event filtering and device behavior (specifically: contact ID 0 == invalid, contact ID >0 == valid; and order all data so that all valid contacts are reported before any invalid contacts) spare most devices from any visibly-bad behavior.
The DTH-2452 is one example of an unlucky device that misbehaves. It uses ID 0 for both the first valid contact and all invalid contacts. Because we report both the valid and invalid contacts, the kernel reports that contact 0 first goes down (valid) and then goes up (invalid) in every report. This causes ~100 clicks per second simply by touching the screen.
This patch inroduces new `confidence` flag in our `hid_data` structure. The value is initially set to `true` at the start of a report and can be set to `false` if an invalid touch usage is seen.
Link: https://github.com/linuxwacom/input-wacom/issues/270 Fixes: f8b6a74719b5 ("HID: wacom: generic: Support multiple tools per report") Signed-off-by: Jason Gerecke jason.gerecke@wacom.com Tested-by: Joshua Dickens joshua.dickens@wacom.com Cc: stable@vger.kernel.org Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hid/wacom_wac.c | 8 +++++++- drivers/hid/wacom_wac.h | 1 + 2 files changed, 8 insertions(+), 1 deletion(-)
--- a/drivers/hid/wacom_wac.c +++ b/drivers/hid/wacom_wac.c @@ -2433,6 +2433,9 @@ static void wacom_wac_finger_event(struc struct wacom_features *features = &wacom->wacom_wac.features;
switch (equivalent_usage) { + case HID_DG_CONFIDENCE: + wacom_wac->hid_data.confidence = value; + break; case HID_GD_X: wacom_wac->hid_data.x = value; break; @@ -2463,7 +2466,8 @@ static void wacom_wac_finger_event(struc
if (usage->usage_index + 1 == field->report_count) { - if (equivalent_usage == wacom_wac->hid_data.last_slot_field) + if (equivalent_usage == wacom_wac->hid_data.last_slot_field && + wacom_wac->hid_data.confidence) wacom_wac_finger_slot(wacom_wac, wacom_wac->touch_input); } } @@ -2476,6 +2480,8 @@ static void wacom_wac_finger_pre_report( struct hid_data* hid_data = &wacom_wac->hid_data; int i;
+ hid_data->confidence = true; + for (i = 0; i < report->maxfield; i++) { struct hid_field *field = report->field[i]; int j; --- a/drivers/hid/wacom_wac.h +++ b/drivers/hid/wacom_wac.h @@ -293,6 +293,7 @@ struct hid_data { bool inrange_state; bool invert_state; bool tipswitch; + bool confidence; int x; int y; int pressure;
From: Dan Carpenter dan.carpenter@oracle.com
commit b535917c51acc97fb0761b1edec85f1f3d02bda4 upstream.
The free_rtllib() function frees the "dev" pointer so there is use after free on the next line. Re-arrange things to avoid that.
Fixes: 66898177e7e5 ("staging: rtl8192e: Fix unload/reload problem") Cc: stable stable@vger.kernel.org Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Link: https://lore.kernel.org/r/20211117072016.GA5237@kili Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/staging/rtl8192e/rtl8192e/rtl_core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/staging/rtl8192e/rtl8192e/rtl_core.c +++ b/drivers/staging/rtl8192e/rtl8192e/rtl_core.c @@ -2582,13 +2582,14 @@ static void _rtl92e_pci_disconnect(struc free_irq(dev->irq, dev); priv->irq = 0; } - free_rtllib(dev);
if (dev->mem_start != 0) { iounmap((void __iomem *)dev->mem_start); release_mem_region(pci_resource_start(pdev, 1), pci_resource_len(pdev, 1)); } + + free_rtllib(dev); } else { priv = rtllib_priv(dev); }
From: Miklos Szeredi mszeredi@redhat.com
commit 712a951025c0667ff00b25afc360f74e639dfabe upstream.
It is possible to trigger a crash by splicing anon pipe bufs to the fuse device.
The reason for this is that anon_pipe_buf_release() will reuse buf->page if the refcount is 1, but that page might have already been stolen and its flags modified (e.g. PG_lru added).
This happens in the unlikely case of fuse_dev_splice_write() getting around to calling pipe_buf_release() after a page has been stolen, added to the page cache and removed from the page cache.
Fix by calling pipe_buf_release() right after the page was inserted into the page cache. In this case the page has an elevated refcount so any release function will know that the page isn't reusable.
Reported-by: Frank Dinoff fdinoff@google.com Link: https://lore.kernel.org/r/CAAmZXrsGg2xsP1CK+cbuEMumtrqdvD-NKnWzhNcvn71RV3c1y... Fixes: dd3bb14f44a6 ("fuse: support splice() writing to fuse device") Cc: stable@vger.kernel.org # v2.6.35 Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/fuse/dev.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
--- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -897,6 +897,12 @@ static int fuse_try_move_page(struct fus goto out_put_old; }
+ /* + * Release while we have extra ref on stolen page. Otherwise + * anon_pipe_buf_release() might think the page can be reused. + */ + pipe_buf_release(cs->pipe, buf); + get_page(newpage);
if (!(buf->flags & PIPE_BUF_FLAG_LRU)) @@ -2046,8 +2052,12 @@ static ssize_t fuse_dev_splice_write(str
pipe_lock(pipe); out_free: - for (idx = 0; idx < nbuf; idx++) - pipe_buf_release(pipe, &bufs[idx]); + for (idx = 0; idx < nbuf; idx++) { + struct pipe_buffer *buf = &bufs[idx]; + + if (buf->ops) + pipe_buf_release(pipe, buf); + } pipe_unlock(pipe);
kfree(bufs);
From: Stefano Stabellini stefano.stabellini@xilinx.com
commit 08f6c2b09ebd4b326dbe96d13f94fee8f9814c78 upstream.
In case of errors in xenbus_init (e.g. missing xen_store_gfn parameter), we goto out_error but we forget to reset xen_store_domain_type to XS_UNKNOWN. As a consequence xenbus_probe_initcall and other initcalls will still try to initialize xenstore resulting into a crash at boot.
[ 2.479830] Call trace: [ 2.482314] xb_init_comms+0x18/0x150 [ 2.486354] xs_init+0x34/0x138 [ 2.489786] xenbus_probe+0x4c/0x70 [ 2.498432] xenbus_probe_initcall+0x2c/0x7c [ 2.503944] do_one_initcall+0x54/0x1b8 [ 2.507358] kernel_init_freeable+0x1ac/0x210 [ 2.511617] kernel_init+0x28/0x130 [ 2.516112] ret_from_fork+0x10/0x20
Cc: Stable@vger.kernel.org Cc: jbeulich@suse.com Signed-off-by: Stefano Stabellini stefano.stabellini@xilinx.com Link: https://lore.kernel.org/r/20211115222719.2558207-1-sstabellini@kernel.org Reviewed-by: Jan Beulich jbeulich@suse.com Signed-off-by: Boris Ostrovsky boris.ostrovsky@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/xen/xenbus/xenbus_probe.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/xen/xenbus/xenbus_probe.c +++ b/drivers/xen/xenbus/xenbus_probe.c @@ -838,7 +838,7 @@ static struct notifier_block xenbus_resu
static int __init xenbus_init(void) { - int err = 0; + int err; uint64_t v = 0; xen_store_domain_type = XS_UNKNOWN;
@@ -912,8 +912,10 @@ static int __init xenbus_init(void) */ proc_create_mount_point("xen"); #endif + return 0;
out_error: + xen_store_domain_type = XS_UNKNOWN; return err; }
From: Stefano Stabellini stefano.stabellini@xilinx.com
commit 36e8f60f0867d3b70d398d653c17108459a04efe upstream.
If the xenstore page hasn't been allocated properly, reading the value of the related hvm_param (HVM_PARAM_STORE_PFN) won't actually return error. Instead, it will succeed and return zero. Instead of attempting to xen_remap a bad guest physical address, detect this condition and return early.
Note that although a guest physical address of zero for HVM_PARAM_STORE_PFN is theoretically possible, it is not a good choice and zero has never been validly used in that capacity.
Also recognize all bits set as an invalid value.
For 32-bit Linux, any pfn above ULONG_MAX would get truncated. Pfns above ULONG_MAX should never be passed by the Xen tools to HVM guests anyway, so check for this condition and return early.
Cc: stable@vger.kernel.org Signed-off-by: Stefano Stabellini stefano.stabellini@xilinx.com Reviewed-by: Juergen Gross jgross@suse.com Reviewed-by: Jan Beulich jbeulich@suse.com Link: https://lore.kernel.org/r/20211123210748.1910236-1-sstabellini@kernel.org Signed-off-by: Boris Ostrovsky boris.ostrovsky@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/xen/xenbus/xenbus_probe.c | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+)
--- a/drivers/xen/xenbus/xenbus_probe.c +++ b/drivers/xen/xenbus/xenbus_probe.c @@ -878,6 +878,29 @@ static int __init xenbus_init(void) err = hvm_get_parameter(HVM_PARAM_STORE_PFN, &v); if (err) goto out_error; + /* + * Uninitialized hvm_params are zero and return no error. + * Although it is theoretically possible to have + * HVM_PARAM_STORE_PFN set to zero on purpose, in reality it is + * not zero when valid. If zero, it means that Xenstore hasn't + * been properly initialized. Instead of attempting to map a + * wrong guest physical address return error. + * + * Also recognize all bits set as an invalid value. + */ + if (!v || !~v) { + err = -ENOENT; + goto out_error; + } + /* Avoid truncation on 32-bit. */ +#if BITS_PER_LONG == 32 + if (v > ULONG_MAX) { + pr_err("%s: cannot handle HVM_PARAM_STORE_PFN=%llx > ULONG_MAX\n", + __func__, v); + err = -EINVAL; + goto out_error; + } +#endif xen_store_gfn = (unsigned long)v; xen_store_interface = xen_remap(xen_store_gfn << XEN_PAGE_SHIFT,
From: Steven Rostedt (VMware) rostedt@goodmis.org
commit a55f224ff5f238013de8762c4287117e47b86e22 upstream.
If a event is filtered by pid and a trigger that requires processing of the event to happen is a attached to the event, the discard portion does not take the pid filtering into account, and the event will then be recorded when it should not have been.
Cc: stable@vger.kernel.org Fixes: 3fdaf80f4a836 ("tracing: Implement event pid filtering") Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace.h | 24 ++++++++++++++++++------ 1 file changed, 18 insertions(+), 6 deletions(-)
--- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1363,14 +1363,26 @@ __event_trigger_test_discard(struct trac if (eflags & EVENT_FILE_FL_TRIGGER_COND) *tt = event_triggers_call(file, entry);
- if (test_bit(EVENT_FILE_FL_SOFT_DISABLED_BIT, &file->flags) || - (unlikely(file->flags & EVENT_FILE_FL_FILTERED) && - !filter_match_preds(file->filter, entry))) { - __trace_event_discard_commit(buffer, event); - return true; - } + if (likely(!(file->flags & (EVENT_FILE_FL_SOFT_DISABLED | + EVENT_FILE_FL_FILTERED | + EVENT_FILE_FL_PID_FILTER)))) + return false; + + if (file->flags & EVENT_FILE_FL_SOFT_DISABLED) + goto discard; + + if (file->flags & EVENT_FILE_FL_FILTERED && + !filter_match_preds(file->filter, entry)) + goto discard; + + if ((file->flags & EVENT_FILE_FL_PID_FILTER) && + trace_event_ignore_this_pid(file)) + goto discard;
return false; + discard: + __trace_event_discard_commit(buffer, event); + return true; }
/**
From: yangxingwu xingwu.yang@gmail.com
[ Upstream commit c95c07836fa4c1767ed11d8eca0769c652760e32 ]
We are changing expire_nodest_conn to work even for reused connections when conn_reuse_mode=0, just as what was done with commit dc7b3eb900aa ("ipvs: Fix reuse connection if real server is dead").
For controlled and persistent connections, the new connection will get the needed real server depending on the rules in ip_vs_check_template().
Fixes: d752c3645717 ("ipvs: allow rescheduling of new connections when port reuse is detected") Co-developed-by: Chuanqi Liu legend050709@qq.com Signed-off-by: Chuanqi Liu legend050709@qq.com Signed-off-by: yangxingwu xingwu.yang@gmail.com Acked-by: Simon Horman horms@verge.net.au Acked-by: Julian Anastasov ja@ssi.bg Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/networking/ipvs-sysctl.txt | 3 +-- net/netfilter/ipvs/ip_vs_core.c | 8 ++++---- 2 files changed, 5 insertions(+), 6 deletions(-)
diff --git a/Documentation/networking/ipvs-sysctl.txt b/Documentation/networking/ipvs-sysctl.txt index 056898685d408..fc531c29a2e83 100644 --- a/Documentation/networking/ipvs-sysctl.txt +++ b/Documentation/networking/ipvs-sysctl.txt @@ -30,8 +30,7 @@ conn_reuse_mode - INTEGER
0: disable any special handling on port reuse. The new connection will be delivered to the same real server that was - servicing the previous connection. This will effectively - disable expire_nodest_conn. + servicing the previous connection.
bit 1: enable rescheduling of new connections when it is safe. That is, whenever expire_nodest_conn and for TCP sockets, when diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c index a95fe5fe9f046..4b9cd1c1c9987 100644 --- a/net/netfilter/ipvs/ip_vs_core.c +++ b/net/netfilter/ipvs/ip_vs_core.c @@ -1838,7 +1838,6 @@ ip_vs_in(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff *skb, int struct ip_vs_proto_data *pd; struct ip_vs_conn *cp; int ret, pkts; - int conn_reuse_mode; struct sock *sk;
/* Already marked as IPVS request or reply? */ @@ -1914,15 +1913,16 @@ ip_vs_in(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff *skb, int */ cp = pp->conn_in_get(ipvs, af, skb, &iph);
- conn_reuse_mode = sysctl_conn_reuse_mode(ipvs); - if (conn_reuse_mode && !iph.fragoffs && is_new_conn(skb, &iph) && cp) { + if (!iph.fragoffs && is_new_conn(skb, &iph) && cp) { + int conn_reuse_mode = sysctl_conn_reuse_mode(ipvs); bool old_ct = false, resched = false;
if (unlikely(sysctl_expire_nodest_conn(ipvs)) && cp->dest && unlikely(!atomic_read(&cp->dest->weight))) { resched = true; old_ct = ip_vs_conn_uses_old_conntrack(cp, skb); - } else if (is_new_conn_expected(cp, conn_reuse_mode)) { + } else if (conn_reuse_mode && + is_new_conn_expected(cp, conn_reuse_mode)) { old_ct = ip_vs_conn_uses_old_conntrack(cp, skb); if (!atomic_read(&cp->n_control)) { resched = true;
From: Florian Fainelli f.fainelli@gmail.com
[ Upstream commit 754c4050a00e802e122690112fc2c3a6abafa7e2 ]
The I2C interrupt controller line is off by 32 because the datasheet describes interrupt inputs into the GIC which are for Shared Peripheral Interrupts and are starting at offset 32. The ARM GIC binding expects the SPI interrupts to be numbered from 0 relative to the SPI base.
Fixes: bb097e3e0045 ("ARM: dts: BCM5301X: Add I2C support to the DT") Tested-by: Christian Lamparter chunkeey@gmail.com Signed-off-by: Florian Fainelli f.fainelli@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/boot/dts/bcm5301x.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/boot/dts/bcm5301x.dtsi b/arch/arm/boot/dts/bcm5301x.dtsi index 165fd1c1461a0..01a4935598915 100644 --- a/arch/arm/boot/dts/bcm5301x.dtsi +++ b/arch/arm/boot/dts/bcm5301x.dtsi @@ -365,7 +365,7 @@ mdio: mdio@18003000 { i2c0: i2c@18009000 { compatible = "brcm,iproc-i2c"; reg = <0x18009000 0x50>; - interrupts = <GIC_SPI 121 IRQ_TYPE_LEVEL_HIGH>; + interrupts = <GIC_SPI 89 IRQ_TYPE_LEVEL_HIGH>; #address-cells = <1>; #size-cells = <0>; clock-frequency = <100000>;
From: Florian Fainelli f.fainelli@gmail.com
[ Upstream commit 40f7342f0587639e5ad625adaa15efdd3cffb18f ]
The GPIO controller is also an interrupt controller provider and is currently missing the appropriate 'interrupt-controller' and '#interrupt-cells' properties to denote that.
Fixes: fb026d3de33b ("ARM: BCM5301X: Add Broadcom's bus-axi to the DTS file") Signed-off-by: Florian Fainelli f.fainelli@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/boot/dts/bcm5301x.dtsi | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/arch/arm/boot/dts/bcm5301x.dtsi b/arch/arm/boot/dts/bcm5301x.dtsi index 01a4935598915..c3b6ba4db8e3d 100644 --- a/arch/arm/boot/dts/bcm5301x.dtsi +++ b/arch/arm/boot/dts/bcm5301x.dtsi @@ -246,6 +246,8 @@ chipcommon: chipcommon@0 {
gpio-controller; #gpio-cells = <2>; + interrupt-controller; + #interrupt-cells = <2>; };
pcie0: pcie@12000 {
From: Takashi Iwai tiwai@suse.de
[ Upstream commit 7e567b5ae06315ef2d70666b149962e2bb4b97af ]
snd_ctl_remove() has to be called with card->controls_rwsem held (when called after the card instantiation). This patch add the missing rwsem calls around it.
Fixes: 8a9782346dcc ("ASoC: topology: Add topology core") Signed-off-by: Takashi Iwai tiwai@suse.de Link: https://lore.kernel.org/r/20211116071812.18109-1-tiwai@suse.de Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/soc-topology.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/sound/soc/soc-topology.c b/sound/soc/soc-topology.c index 50aa45525be5a..0fbe505026997 100644 --- a/sound/soc/soc-topology.c +++ b/sound/soc/soc-topology.c @@ -2585,6 +2585,7 @@ EXPORT_SYMBOL_GPL(snd_soc_tplg_widget_remove_all); /* remove dynamic controls from the component driver */ int snd_soc_tplg_component_remove(struct snd_soc_component *comp, u32 index) { + struct snd_card *card = comp->card->snd_card; struct snd_soc_dobj *dobj, *next_dobj; int pass = SOC_TPLG_PASS_END;
@@ -2592,6 +2593,7 @@ int snd_soc_tplg_component_remove(struct snd_soc_component *comp, u32 index) while (pass >= SOC_TPLG_PASS_START) {
/* remove mixer controls */ + down_write(&card->controls_rwsem); list_for_each_entry_safe(dobj, next_dobj, &comp->dobj_list, list) {
@@ -2625,6 +2627,7 @@ int snd_soc_tplg_component_remove(struct snd_soc_component *comp, u32 index) break; } } + up_write(&card->controls_rwsem); pass--; }
From: Alexander Aring aahringo@redhat.com
[ Upstream commit 451dc48c806a7ce9fbec5e7a24ccf4b2c936e834 ]
This patch fixes an issue that an u32 netlink value is handled as a signed enum value which doesn't fit into the range of u32 netlink type. If it's handled as -1 value some BIT() evaluation ends in a shift-out-of-bounds issue. To solve the issue we set the to u32 max which is s32 "-1" value to keep backwards compatibility and let the followed enum values start counting at 0. This brings the compiler to never handle the enum as signed and a check if the value is above NL802154_IFTYPE_MAX should filter -1 out.
Fixes: f3ea5e44231a ("ieee802154: add new interface command") Signed-off-by: Alexander Aring aahringo@redhat.com Link: https://lore.kernel.org/r/20211112030916.685793-1-aahringo@redhat.com Signed-off-by: Stefan Schmidt stefan@datenfreihafen.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/nl802154.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/include/net/nl802154.h b/include/net/nl802154.h index ddcee128f5d9a..145acb8f25095 100644 --- a/include/net/nl802154.h +++ b/include/net/nl802154.h @@ -19,6 +19,8 @@ * */
+#include <linux/types.h> + #define NL802154_GENL_NAME "nl802154"
enum nl802154_commands { @@ -150,10 +152,9 @@ enum nl802154_attrs { };
enum nl802154_iftype { - /* for backwards compatibility TODO */ - NL802154_IFTYPE_UNSPEC = -1, + NL802154_IFTYPE_UNSPEC = (~(__u32)0),
- NL802154_IFTYPE_NODE, + NL802154_IFTYPE_NODE = 0, NL802154_IFTYPE_MONITOR, NL802154_IFTYPE_COORD,
From: Trond Myklebust trond.myklebust@hammerspace.com
[ Upstream commit d3c45824ad65aebf765fcf51366d317a29538820 ]
The failure to retrieve post-op attributes has no bearing on whether or not the clone operation itself was successful. We must therefore ignore the return value of decode_getfattr() when looking at the success or failure of nfs4_xdr_dec_clone().
Fixes: 36022770de6c ("nfs42: add CLONE xdr functions") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs42xdr.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c index 5966e1e7b1f51..09c683402f950 100644 --- a/fs/nfs/nfs42xdr.c +++ b/fs/nfs/nfs42xdr.c @@ -625,8 +625,7 @@ static int nfs4_xdr_dec_clone(struct rpc_rqst *rqstp, status = decode_clone(xdr); if (status) goto out; - status = decode_getfattr(xdr, res->dst_fattr, res->server); - + decode_getfattr(xdr, res->dst_fattr, res->server); out: res->rpc_status = status; return status;
From: Takashi Iwai tiwai@suse.de
[ Upstream commit 187bea472600dcc8d2eb714335053264dd437172 ]
When CONFIG_FORTIFY_SOURCE is set, memcpy() checks the potential buffer overflow and panics. The code in sofcpga bootstrapping contains the memcpy() calls are mistakenly translated as the shorter size, hence it triggers a panic as if it were overflowing.
This patch changes the secondary_trampoline and *_end definitions to arrays for avoiding the false-positive crash above.
Fixes: 9c4566a117a6 ("ARM: socfpga: Enable SMP for socfpga") Suggested-by: Kees Cook keescook@chromium.org Buglink: https://bugzilla.suse.com/show_bug.cgi?id=1192473 Link: https://lore.kernel.org/r/20211117193244.31162-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Dinh Nguyen dinguyen@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/mach-socfpga/core.h | 2 +- arch/arm/mach-socfpga/platsmp.c | 8 ++++---- 2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/arm/mach-socfpga/core.h b/arch/arm/mach-socfpga/core.h index 65e1817d8afe6..692a287a8712d 100644 --- a/arch/arm/mach-socfpga/core.h +++ b/arch/arm/mach-socfpga/core.h @@ -48,7 +48,7 @@ extern void __iomem *sdr_ctl_base_addr; u32 socfpga_sdram_self_refresh(u32 sdr_base); extern unsigned int socfpga_sdram_self_refresh_sz;
-extern char secondary_trampoline, secondary_trampoline_end; +extern char secondary_trampoline[], secondary_trampoline_end[];
extern unsigned long socfpga_cpu1start_addr;
diff --git a/arch/arm/mach-socfpga/platsmp.c b/arch/arm/mach-socfpga/platsmp.c index 0ee76772b5074..a272999ce04b9 100644 --- a/arch/arm/mach-socfpga/platsmp.c +++ b/arch/arm/mach-socfpga/platsmp.c @@ -31,14 +31,14 @@
static int socfpga_boot_secondary(unsigned int cpu, struct task_struct *idle) { - int trampoline_size = &secondary_trampoline_end - &secondary_trampoline; + int trampoline_size = secondary_trampoline_end - secondary_trampoline;
if (socfpga_cpu1start_addr) { /* This will put CPU #1 into reset. */ writel(RSTMGR_MPUMODRST_CPU1, rst_manager_base_addr + SOCFPGA_RSTMGR_MODMPURST);
- memcpy(phys_to_virt(0), &secondary_trampoline, trampoline_size); + memcpy(phys_to_virt(0), secondary_trampoline, trampoline_size);
writel(__pa_symbol(secondary_startup), sys_manager_base_addr + (socfpga_cpu1start_addr & 0x000000ff)); @@ -56,12 +56,12 @@ static int socfpga_boot_secondary(unsigned int cpu, struct task_struct *idle)
static int socfpga_a10_boot_secondary(unsigned int cpu, struct task_struct *idle) { - int trampoline_size = &secondary_trampoline_end - &secondary_trampoline; + int trampoline_size = secondary_trampoline_end - secondary_trampoline;
if (socfpga_cpu1start_addr) { writel(RSTMGR_MPUMODRST_CPU1, rst_manager_base_addr + SOCFPGA_A10_RSTMGR_MODMPURST); - memcpy(phys_to_virt(0), &secondary_trampoline, trampoline_size); + memcpy(phys_to_virt(0), secondary_trampoline, trampoline_size);
writel(__pa_symbol(secondary_startup), sys_manager_base_addr + (socfpga_cpu1start_addr & 0x00000fff));
From: Sreekanth Reddy sreekanth.reddy@broadcom.com
[ Upstream commit 0ee4ba13e09c9d9c1cb6abb59da8295d9952328b ]
While looping over shost's sdev list it is possible that one of the drives is getting removed and its sas_target object is freed but its sdev object remains intact.
Consequently, a kernel panic can occur while the driver is trying to access the sas_address field of sas_target object without also checking the sas_target object for NULL.
Link: https://lore.kernel.org/r/20211117104909.2069-1-sreekanth.reddy@broadcom.com Fixes: f92363d12359 ("[SCSI] mpt3sas: add new driver supporting 12GB SAS") Signed-off-by: Sreekanth Reddy sreekanth.reddy@broadcom.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/mpt3sas/mpt3sas_scsih.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index 332ea3af69ec3..79c5a193308f4 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -2955,7 +2955,7 @@ _scsih_ublock_io_device(struct MPT3SAS_ADAPTER *ioc, u64 sas_address)
shost_for_each_device(sdev, ioc->shost) { sas_device_priv_data = sdev->hostdata; - if (!sas_device_priv_data) + if (!sas_device_priv_data || !sas_device_priv_data->sas_target) continue; if (sas_device_priv_data->sas_target->sas_address != sas_address)
From: Dan Carpenter dan.carpenter@oracle.com
[ Upstream commit 96c5f82ef0a145d3e56e5b26f2bf6dcd2ffeae1c ]
The ->gem_create_object() functions are supposed to return NULL if there is an error. None of the callers expect error pointers so returing one will lead to an Oops. See drm_gem_vram_create(), for example.
Fixes: c826a6e10644 ("drm/vc4: Add a BO cache.") Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Maxime Ripard maxime@cerno.tech Link: https://patchwork.freedesktop.org/patch/msgid/20211118111416.GC1147@kili Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/vc4/vc4_bo.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/vc4/vc4_bo.c b/drivers/gpu/drm/vc4/vc4_bo.c index eff0a8ece8bcf..85abba74ed981 100644 --- a/drivers/gpu/drm/vc4/vc4_bo.c +++ b/drivers/gpu/drm/vc4/vc4_bo.c @@ -292,7 +292,7 @@ struct drm_gem_object *vc4_create_object(struct drm_device *dev, size_t size)
bo = kzalloc(sizeof(*bo), GFP_KERNEL); if (!bo) - return ERR_PTR(-ENOMEM); + return NULL;
mutex_lock(&vc4->bo_lock); bo->label = VC4_BO_TYPE_KERNEL;
From: Eric Dumazet edumazet@google.com
[ Upstream commit 19d36c5f294879949c9d6f57cb61d39cc4c48553 ]
We deal with IPv6 packets, so we need to use IP6CB(skb)->flags and IP6SKB_REROUTED, instead of IPCB(skb)->flags and IPSKB_REROUTED
Found by code inspection, please double check that fixing this bug does not surface other bugs.
Fixes: 09ee9dba9611 ("ipv6: Reinject IPv6 packets if IPsec policy matches after SNAT") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Tobias Brunner tobias@strongswan.org Cc: Steffen Klassert steffen.klassert@secunet.com Cc: David Ahern dsahern@kernel.org Reviewed-by: David Ahern dsahern@kernel.org Tested-by: Tobias Brunner tobias@strongswan.org Acked-by: Tobias Brunner tobias@strongswan.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/ip6_output.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index a903d0ce7e701..f906fe2acedd3 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -175,7 +175,7 @@ static int ip6_finish_output(struct net *net, struct sock *sk, struct sk_buff *s #if defined(CONFIG_NETFILTER) && defined(CONFIG_XFRM) /* Policy lookup after SNAT yielded a new policy */ if (skb_dst(skb)->xfrm) { - IPCB(skb)->flags |= IPSKB_REROUTED; + IP6CB(skb)->flags |= IP6SKB_REROUTED; return dst_output(net, sk, skb); } #endif
From: Tony Lu tonylu@linux.alibaba.com
[ Upstream commit 606a63c9783a32a45bd2ef0eee393711d75b3284 ]
The side that actively closed socket, it's clcsock doesn't enter TIME_WAIT state, but the passive side does it. It should show the same behavior as TCP sockets.
Consider this, when client actively closes the socket, the clcsock in server enters TIME_WAIT state, which means the address is occupied and won't be reused before TIME_WAIT dismissing. If we restarted server, the service would be unavailable for a long time.
To solve this issue, shutdown the clcsock in [A], perform the TCP active close progress first, before the passive closed side closing it. So that the actively closed side enters TIME_WAIT, not the passive one.
Client | Server close() // client actively close | smc_release() | smc_close_active() // PEERCLOSEWAIT1 | smc_close_final() // abort or closed = 1| smc_cdc_get_slot_and_msg_send() | [A] | |smc_cdc_msg_recv_action() // ACTIVE | queue_work(smc_close_wq, &conn->close_work) | smc_close_passive_work() // PROCESSABORT or APPCLOSEWAIT1 | smc_close_passive_abort_received() // only in abort | |close() // server recv zero, close | smc_release() // PROCESSABORT or APPCLOSEWAIT1 | smc_close_active() | smc_close_abort() or smc_close_final() // CLOSED | smc_cdc_get_slot_and_msg_send() // abort or closed = 1 smc_cdc_msg_recv_action() | smc_clcsock_release() queue_work(smc_close_wq, &conn->close_work) | sock_release(tcp) // actively close clc, enter TIME_WAIT smc_close_passive_work() // PEERCLOSEWAIT1 | smc_conn_free() smc_close_passive_abort_received() // CLOSED| smc_conn_free() | smc_clcsock_release() | sock_release(tcp) // passive close clc |
Link: https://www.spinics.net/lists/netdev/msg780407.html Fixes: b38d732477e4 ("smc: socket closing and linkgroup cleanup") Signed-off-by: Tony Lu tonylu@linux.alibaba.com Reviewed-by: Wen Gu guwen@linux.alibaba.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/smc/smc_close.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c index 2427a1f3d0d12..d915907ac29ae 100644 --- a/net/smc/smc_close.c +++ b/net/smc/smc_close.c @@ -215,6 +215,12 @@ int smc_close_active(struct smc_sock *smc) /* send close request */ rc = smc_close_final(conn); sk->sk_state = SMC_PEERCLOSEWAIT1; + + /* actively shutdown clcsock before peer close it, + * prevent peer from entering TIME_WAIT state. + */ + if (smc->clcsock && smc->clcsock->sk) + rc = kernel_sock_shutdown(smc->clcsock, SHUT_RDWR); } else { /* peer event has changed the state */ goto again;
From: Thomas Zeitlhofer thomas.zeitlhofer+lkml@ze-it.at
[ Upstream commit cefcf24b4d351daf70ecd945324e200d3736821e ]
Commit 39fbef4b0f77 ("PM: hibernate: Get block device exclusively in swsusp_check()") changed the opening mode of the block device to (FMODE_READ | FMODE_EXCL).
In the corresponding calls to swsusp_close(), the mode is still just FMODE_READ which triggers the warning in blkdev_flush_mapping() on resume from hibernate.
So, use the mode (FMODE_READ | FMODE_EXCL) also when closing the device.
Fixes: 39fbef4b0f77 ("PM: hibernate: Get block device exclusively in swsusp_check()") Signed-off-by: Thomas Zeitlhofer thomas.zeitlhofer+lkml@ze-it.at Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/power/hibernate.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c index 02df69a8ee3c0..e68b1c20ad3d2 100644 --- a/kernel/power/hibernate.c +++ b/kernel/power/hibernate.c @@ -668,7 +668,7 @@ static int load_image_and_restore(void) goto Unlock;
error = swsusp_read(&flags); - swsusp_close(FMODE_READ); + swsusp_close(FMODE_READ | FMODE_EXCL); if (!error) hibernation_restore(flags & SF_PLATFORM_MODE);
@@ -865,7 +865,7 @@ static int software_resume(void) /* The snapshot device should not be opened while we're running */ if (!atomic_add_unless(&snapshot_device_available, -1, 0)) { error = -EBUSY; - swsusp_close(FMODE_READ); + swsusp_close(FMODE_READ | FMODE_EXCL); goto Unlock; }
@@ -901,7 +901,7 @@ static int software_resume(void) pm_pr_dbg("Hibernation image not present or could not be loaded.\n"); return error; Close_Finish: - swsusp_close(FMODE_READ); + swsusp_close(FMODE_READ | FMODE_EXCL); goto Finish; }
From: Eric Dumazet edumazet@google.com
[ Upstream commit 4e1fddc98d2585ddd4792b5e44433dcee7ece001 ]
While testing BIG TCP patch series, I was expecting that TCP_RR workloads with 80KB requests/answers would send one 80KB TSO packet, then being received as a single GRO packet.
It turns out this was not happening, and the root cause was that cubic Hystart ACK train was triggering after a few (2 or 3) rounds of RPC.
Hystart was wrongly setting CWND/SSTHRESH to 30, while my RPC needed a budget of ~20 segments.
Ideally these TCP_RR flows should not exit slow start.
Cubic Hystart should reset itself at each round, instead of assuming every TCP flow is a bulk one.
Note that even after this patch, Hystart can still trigger, depending on scheduling artifacts, but at a higher CWND/SSTHRESH threshold, keeping optimal TSO packet sizes.
Tested:
ip link set dev eth0 gro_ipv6_max_size 131072 gso_ipv6_max_size 131072 nstat -n; netperf -H ... -t TCP_RR -l 5 -- -r 80000,80000 -K cubic; nstat|egrep "Ip6InReceives|Hystart|Ip6OutRequests"
Before:
8605 Ip6InReceives 87541 0.0 Ip6OutRequests 129496 0.0 TcpExtTCPHystartTrainDetect 1 0.0 TcpExtTCPHystartTrainCwnd 30 0.0
After:
8760 Ip6InReceives 88514 0.0 Ip6OutRequests 87975 0.0
Fixes: ae27e98a5152 ("[TCP] CUBIC v2.3") Co-developed-by: Neal Cardwell ncardwell@google.com Signed-off-by: Neal Cardwell ncardwell@google.com Signed-off-by: Eric Dumazet edumazet@google.com Cc: Stephen Hemminger stephen@networkplumber.org Cc: Yuchung Cheng ycheng@google.com Cc: Soheil Hassas Yeganeh soheil@google.com Link: https://lore.kernel.org/r/20211123202535.1843771-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/tcp_cubic.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c index 8b5ba0a5cd386..93530bd332470 100644 --- a/net/ipv4/tcp_cubic.c +++ b/net/ipv4/tcp_cubic.c @@ -340,8 +340,6 @@ static void bictcp_cong_avoid(struct sock *sk, u32 ack, u32 acked) return;
if (tcp_in_slow_start(tp)) { - if (hystart && after(ack, ca->end_seq)) - bictcp_hystart_reset(sk); acked = tcp_slow_start(tp, acked); if (!acked) return; @@ -383,6 +381,9 @@ static void hystart_update(struct sock *sk, u32 delay) if (ca->found & hystart_detect) return;
+ if (after(tp->snd_una, ca->end_seq)) + bictcp_hystart_reset(sk); + if (hystart_detect & HYSTART_ACK_TRAIN) { u32 now = bictcp_clock();
From: Huang Pei huangpei@loongson.cn
[ Upstream commit 41ce097f714401e6ad8f3f5eb30d7f91b0b5e495 ]
It hangup when booting Loongson 3A1000 with BOTH CONFIG_PAGE_SIZE_64KB and CONFIG_MIPS_VA_BITS_48, that it turn out to use 2-level pgtable instead of 3-level. 64KB page size with 2-level pgtable only cover 42 bits VA, use 3-level pgtable to cover all 48 bits VA(55 bits)
Fixes: 1e321fa917fb ("MIPS64: Support of at least 48 bits of SEGBITS) Signed-off-by: Huang Pei huangpei@loongson.cn Signed-off-by: Thomas Bogendoerfer tsbogend@alpha.franken.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/mips/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index 85afd6b4297b2..45a5801ff467b 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -2990,7 +2990,7 @@ config HAVE_LATENCYTOP_SUPPORT config PGTABLE_LEVELS int default 4 if PAGE_SIZE_4KB && MIPS_VA_BITS_48 - default 3 if 64BIT && !PAGE_SIZE_64KB + default 3 if 64BIT && (!PAGE_SIZE_64KB || MIPS_VA_BITS_48) default 2
source "init/Kconfig"
From: Tony Lu tonylu@linux.alibaba.com
[ Upstream commit bacb6c1e47691cda4a95056c21b5487fb7199fcc ]
When applications call shutdown() with SHUT_RDWR in userspace, smc_close_active() calls kernel_sock_shutdown(), and it is called twice in smc_shutdown().
This fixes this by checking sk_state before do clcsock shutdown, and avoids missing the application's call of smc_shutdown().
Link: https://lore.kernel.org/linux-s390/1f67548e-cbf6-0dce-82b5-10288a4583bd@linu... Fixes: 606a63c9783a ("net/smc: Ensure the active closing peer first closes clcsock") Signed-off-by: Tony Lu tonylu@linux.alibaba.com Reviewed-by: Wen Gu guwen@linux.alibaba.com Acked-by: Karsten Graul kgraul@linux.ibm.com Link: https://lore.kernel.org/r/20211126024134.45693-1-tonylu@linux.alibaba.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/smc/af_smc.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 8c71f0929fbb7..f2c907d52812b 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -1178,8 +1178,10 @@ static unsigned int smc_poll(struct file *file, struct socket *sock, static int smc_shutdown(struct socket *sock, int how) { struct sock *sk = sock->sk; + bool do_shutdown = true; struct smc_sock *smc; int rc = -EINVAL; + int old_state; int rc1 = 0;
smc = smc_sk(sk); @@ -1206,7 +1208,11 @@ static int smc_shutdown(struct socket *sock, int how) } switch (how) { case SHUT_RDWR: /* shutdown in both directions */ + old_state = sk->sk_state; rc = smc_close_active(smc); + if (old_state == SMC_ACTIVE && + sk->sk_state == SMC_PEERCLOSEWAIT1) + do_shutdown = false; break; case SHUT_WR: rc = smc_close_shutdown_write(smc); @@ -1216,7 +1222,7 @@ static int smc_shutdown(struct socket *sock, int how) /* nothing more to do because peer is not involved */ break; } - if (smc->clcsock) + if (do_shutdown && smc->clcsock) rc1 = kernel_sock_shutdown(smc->clcsock, how); /* map sock_shutdown_cmd constants to sk_shutdown value range */ sk->sk_shutdown |= how + 1;
From: Stefano Garzarella sgarzare@redhat.com
commit 49d8c5ffad07ca014cfae72a1b9b8c52b6ad9cb8 upstream.
The "used length" reported by calling vhost_add_used() must be the number of bytes written by the device (using "in" buffers).
In vhost_vsock_handle_tx_kick() the device only reads the guest buffers (they are all "out" buffers), without writing anything, so we must pass 0 as "used length" to comply virtio spec.
Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko") Cc: stable@vger.kernel.org Reported-by: Halil Pasic pasic@linux.ibm.com Suggested-by: Jason Wang jasowang@redhat.com Signed-off-by: Stefano Garzarella sgarzare@redhat.com Link: https://lore.kernel.org/r/20211122163525.294024-2-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Reviewed-by: Stefan Hajnoczi stefanha@redhat.com Reviewed-by: Halil Pasic pasic@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/vhost/vsock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -490,7 +490,7 @@ static void vhost_vsock_handle_tx_kick(s virtio_transport_free_pkt(pkt);
len += sizeof(pkt->hdr); - vhost_add_used(vq, head, len); + vhost_add_used(vq, head, 0); total_len += len; added = true; } while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));
From: Steven Rostedt (VMware) rostedt@goodmis.org
commit 6cb206508b621a9a0a2c35b60540e399225c8243 upstream.
When pid filtering is activated in an instance, all of the events trace files for that instance has the PID_FILTER flag set. This determines whether or not pid filtering needs to be done on the event, otherwise the event is executed as normal.
If pid filtering is enabled when an event is created (via a dynamic event or modules), its flag is not updated to reflect the current state, and the events are not filtered properly.
Cc: stable@vger.kernel.org Fixes: 3fdaf80f4a836 ("tracing: Implement event pid filtering") Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace_events.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -2254,12 +2254,19 @@ static struct trace_event_file * trace_create_new_event(struct trace_event_call *call, struct trace_array *tr) { + struct trace_pid_list *pid_list; struct trace_event_file *file;
file = kmem_cache_alloc(file_cachep, GFP_TRACE); if (!file) return NULL;
+ pid_list = rcu_dereference_protected(tr->filtered_pids, + lockdep_is_held(&event_mutex)); + + if (pid_list) + file->flags |= EVENT_FILE_FL_PID_FILTER; + file->event_call = call; file->tr = tr; atomic_set(&file->sm_ref, 0);
From: David Hildenbrand david@redhat.com
commit fe3d10024073f06f04c74b9674bd71ccc1d787cf upstream.
We should not walk/touch page tables outside of VMA boundaries when holding only the mmap sem in read mode. Evil user space can modify the VMA layout just before this function runs and e.g., trigger races with page table removal code since commit dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap"). gfn_to_hva() will only translate using KVM memory regions, but won't validate the VMA.
Further, we should not allocate page tables outside of VMA boundaries: if evil user space decides to map hugetlbfs to these ranges, bad things will happen because we suddenly have PTE or PMD page tables where we shouldn't have them.
Similarly, we have to check if we suddenly find a hugetlbfs VMA, before calling get_locked_pte().
Fixes: 2d42f9477320 ("s390/kvm: Add PGSTE manipulation functions") Signed-off-by: David Hildenbrand david@redhat.com Reviewed-by: Claudio Imbrenda imbrenda@linux.ibm.com Acked-by: Heiko Carstens hca@linux.ibm.com Link: https://lore.kernel.org/r/20210909162248.14969-4-david@redhat.com Signed-off-by: Christian Borntraeger borntraeger@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/s390/mm/pgtable.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
--- a/arch/s390/mm/pgtable.c +++ b/arch/s390/mm/pgtable.c @@ -896,6 +896,7 @@ EXPORT_SYMBOL(get_guest_storage_key); int pgste_perform_essa(struct mm_struct *mm, unsigned long hva, int orc, unsigned long *oldpte, unsigned long *oldpgste) { + struct vm_area_struct *vma; unsigned long pgstev; spinlock_t *ptl; pgste_t pgste; @@ -905,6 +906,10 @@ int pgste_perform_essa(struct mm_struct WARN_ON_ONCE(orc > ESSA_MAX); if (unlikely(orc > ESSA_MAX)) return -EINVAL; + + vma = find_vma(mm, hva); + if (!vma || hva < vma->vm_start || is_vm_hugetlb_page(vma)) + return -EFAULT; ptep = get_locked_pte(mm, hva, &ptl); if (unlikely(!ptep)) return -EFAULT; @@ -997,10 +1002,14 @@ EXPORT_SYMBOL(pgste_perform_essa); int set_pgste_bits(struct mm_struct *mm, unsigned long hva, unsigned long bits, unsigned long value) { + struct vm_area_struct *vma; spinlock_t *ptl; pgste_t new; pte_t *ptep;
+ vma = find_vma(mm, hva); + if (!vma || hva < vma->vm_start || is_vm_hugetlb_page(vma)) + return -EFAULT; ptep = get_locked_pte(mm, hva, &ptl); if (unlikely(!ptep)) return -EFAULT; @@ -1025,9 +1034,13 @@ EXPORT_SYMBOL(set_pgste_bits); */ int get_pgste(struct mm_struct *mm, unsigned long hva, unsigned long *pgstep) { + struct vm_area_struct *vma; spinlock_t *ptl; pte_t *ptep;
+ vma = find_vma(mm, hva); + if (!vma || hva < vma->vm_start || is_vm_hugetlb_page(vma)) + return -EFAULT; ptep = get_locked_pte(mm, hva, &ptl); if (unlikely(!ptep)) return -EFAULT;
From: Sergei Shtylyov sergei.shtylyov@cogentembedded.com
commit 1df3e5b3feebf29a3ecfa0c0f06f79544ca573e4 upstream.
When testing the R-Car PCIe driver on the Condor board, if the PCIe PHY driver was left disabled, the kernel crashed with this BUG:
kernel BUG at lib/ioremap.c:72! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 39 Comm: kworker/0:1 Not tainted 4.17.0-dirty #1092 Hardware name: Renesas Condor board based on r8a77980 (DT) Workqueue: events deferred_probe_work_func pstate: 80000005 (Nzcv daif -PAN -UAO) pc : ioremap_page_range+0x370/0x3c8 lr : ioremap_page_range+0x40/0x3c8 sp : ffff000008da39e0 x29: ffff000008da39e0 x28: 00e8000000000f07 x27: ffff7dfffee00000 x26: 0140000000000000 x25: ffff7dfffef00000 x24: 00000000000fe100 x23: ffff80007b906000 x22: ffff000008ab8000 x21: ffff000008bb1d58 x20: ffff7dfffef00000 x19: ffff800009c30fb8 x18: 0000000000000001 x17: 00000000000152d0 x16: 00000000014012d0 x15: 0000000000000000 x14: 0720072007200720 x13: 0720072007200720 x12: 0720072007200720 x11: 0720072007300730 x10: 00000000000000ae x9 : 0000000000000000 x8 : ffff7dffff000000 x7 : 0000000000000000 x6 : 0000000000000100 x5 : 0000000000000000 x4 : 000000007b906000 x3 : ffff80007c61a880 x2 : ffff7dfffeefffff x1 : 0000000040000000 x0 : 00e80000fe100f07 Process kworker/0:1 (pid: 39, stack limit = 0x (ptrval)) Call trace: ioremap_page_range+0x370/0x3c8 pci_remap_iospace+0x7c/0xac pci_parse_request_of_pci_ranges+0x13c/0x190 rcar_pcie_probe+0x4c/0xb04 platform_drv_probe+0x50/0xbc driver_probe_device+0x21c/0x308 __device_attach_driver+0x98/0xc8 bus_for_each_drv+0x54/0x94 __device_attach+0xc4/0x12c device_initial_probe+0x10/0x18 bus_probe_device+0x90/0x98 deferred_probe_work_func+0xb0/0x150 process_one_work+0x12c/0x29c worker_thread+0x200/0x3fc kthread+0x108/0x134 ret_from_fork+0x10/0x18 Code: f9004ba2 54000080 aa0003fb 17ffff48 (d4210000)
It turned out that pci_remap_iospace() wasn't undone when the driver's probe failed, and since devm_phy_optional_get() returned -EPROBE_DEFER, the probe was retried, finally causing the BUG due to trying to remap already remapped pages.
The Aardvark PCI controller driver has the same issue. Replace pci_remap_iospace() with its devm_ managed version to fix the bug.
Fixes: 8c39d710363c ("PCI: aardvark: Add Aardvark PCI host controller driver") Signed-off-by: Sergei Shtylyov sergei.shtylyov@cogentembedded.com [lorenzo.pieralisi@arm.com: updated the commit log] Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Reviewed-by: Thomas Petazzoni thomas.petazzoni@bootlin.com Reviewed-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/host/pci-aardvark.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/pci/host/pci-aardvark.c +++ b/drivers/pci/host/pci-aardvark.c @@ -932,7 +932,7 @@ static int advk_pcie_parse_request_of_pc 0, 0xF8000000, 0, lower_32_bits(res->start), OB_PCIE_IO); - err = pci_remap_iospace(res, iobase); + err = devm_pci_remap_iospace(dev, res, iobase); if (err) { dev_warn(dev, "error %d: failed to map resource %pR\n", err, res);
From: Wen Yang wen.yang99@zte.com.cn
commit 3842f5166bf1ef286fe7a39f262b5c9581308366 upstream.
The call to of_get_next_child() returns a node pointer with refcount incremented thus it must be explicitly decremented after the last usage.
irq_domain_add_linear() also calls of_node_get() to increase refcount, so irq_domain will not be affected when it is released.
Detected by coccinelle with the following warnings: ./drivers/pci/controller/pci-aardvark.c:826:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 798, but without a corresponding object release within this function.
Signed-off-by: Wen Yang wen.yang99@zte.com.cn Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Cc: Thomas Petazzoni thomas.petazzoni@bootlin.com Cc: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Cc: Bjorn Helgaas bhelgaas@google.com Cc: linux-pci@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/host/pci-aardvark.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)
--- a/drivers/pci/host/pci-aardvark.c +++ b/drivers/pci/host/pci-aardvark.c @@ -789,6 +789,7 @@ static int advk_pcie_init_irq_domain(str struct device_node *node = dev->of_node; struct device_node *pcie_intc_node; struct irq_chip *irq_chip; + int ret = 0;
raw_spin_lock_init(&pcie->irq_lock);
@@ -803,8 +804,8 @@ static int advk_pcie_init_irq_domain(str irq_chip->name = devm_kasprintf(dev, GFP_KERNEL, "%s-irq", dev_name(dev)); if (!irq_chip->name) { - of_node_put(pcie_intc_node); - return -ENOMEM; + ret = -ENOMEM; + goto out_put_node; }
irq_chip->irq_mask = advk_pcie_irq_mask; @@ -816,11 +817,13 @@ static int advk_pcie_init_irq_domain(str &advk_pcie_irq_domain_ops, pcie); if (!pcie->irq_domain) { dev_err(dev, "Failed to get a INTx IRQ domain\n"); - of_node_put(pcie_intc_node); - return -ENOMEM; + ret = -ENOMEM; + goto out_put_node; }
- return 0; +out_put_node: + of_node_put(pcie_intc_node); + return ret; }
static void advk_pcie_remove_irq_domain(struct advk_pcie *pcie)
From: Remi Pommarel repk@triplefau.lt
commit f4c7d053d7f77cd5c1a1ba7c7ce085ddba13d1d7 upstream.
When configuring pcie reset pin from gpio (e.g. initially set by u-boot) to pcie function this pin goes low for a brief moment asserting the PERST# signal. Thus connected device enters fundamental reset process and link configuration can only begin after a minimal 100ms delay (see [1]).
Because the pin configuration comes from the "default" pinctrl it is implicitly configured before the probe callback is called:
driver_probe_device() really_probe() ... pinctrl_bind_pins() /* Here pin goes from gpio to PCIE reset function and PERST# is asserted */ ... drv->probe()
[1] "PCI Express Base Specification", REV. 4.0 PCI Express, February 19 2014, 6.6.1 Conventional Reset
Signed-off-by: Remi Pommarel repk@triplefau.lt Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Acked-by: Thomas Petazzoni thomas.petazzoni@bootlin.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/host/pci-aardvark.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/drivers/pci/host/pci-aardvark.c +++ b/drivers/pci/host/pci-aardvark.c @@ -362,6 +362,14 @@ static void advk_pcie_setup_hw(struct ad reg |= PIO_CTRL_ADDR_WIN_DISABLE; advk_writel(pcie, reg, PIO_CTRL);
+ /* + * PERST# signal could have been asserted by pinctrl subsystem before + * probe() callback has been called, making the endpoint going into + * fundamental reset. As required by PCI Express spec a delay for at + * least 100ms after such a reset before link training is needed. + */ + msleep(PCI_PM_D3COLD_WAIT); + /* Start link training */ reg = advk_readl(pcie, PCIE_CORE_LINK_CTRL_STAT_REG); reg |= PCIE_CORE_LINK_TRAINING;
From: Pali Rohár pali@kernel.org
commit 6964494582f56a3882c2c53b0edbfe99eb32b2e1 upstream.
Adding even 100ms (PCI_PM_D3COLD_WAIT) delay between enabling link training and starting link training causes detection issues with some buggy cards (such as Compex WLE900VX).
Move the code which enables link training immediately before the one which starts link traning.
This fixes detection issues of Compex WLE900VX card on Turris MOX after cold boot.
Link: https://lore.kernel.org/r/20200430080625.26070-2-pali@kernel.org Fixes: f4c7d053d7f7 ("PCI: aardvark: Wait for endpoint to be ready...") Tested-by: Tomasz Maciej Nowak tmn505@gmail.com Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Acked-by: Rob Herring robh@kernel.org Acked-by: Thomas Petazzoni thomas.petazzoni@bootlin.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/host/pci-aardvark.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-)
--- a/drivers/pci/host/pci-aardvark.c +++ b/drivers/pci/host/pci-aardvark.c @@ -324,11 +324,6 @@ static void advk_pcie_setup_hw(struct ad reg |= LANE_COUNT_1; advk_writel(pcie, reg, PCIE_CORE_CTRL0_REG);
- /* Enable link training */ - reg = advk_readl(pcie, PCIE_CORE_CTRL0_REG); - reg |= LINK_TRAINING_EN; - advk_writel(pcie, reg, PCIE_CORE_CTRL0_REG); - /* Enable MSI */ reg = advk_readl(pcie, PCIE_CORE_CTRL2_REG); reg |= PCIE_CORE_CTRL2_MSI_ENABLE; @@ -370,7 +365,15 @@ static void advk_pcie_setup_hw(struct ad */ msleep(PCI_PM_D3COLD_WAIT);
- /* Start link training */ + /* Enable link training */ + reg = advk_readl(pcie, PCIE_CORE_CTRL0_REG); + reg |= LINK_TRAINING_EN; + advk_writel(pcie, reg, PCIE_CORE_CTRL0_REG); + + /* + * Start link training immediately after enabling it. + * This solves problems for some buggy cards. + */ reg = advk_readl(pcie, PCIE_CORE_LINK_CTRL_STAT_REG); reg |= PCIE_CORE_LINK_TRAINING; advk_writel(pcie, reg, PCIE_CORE_LINK_CTRL_STAT_REG);
From: Marek Behún marek.behun@nic.cz
commit 43fc679ced18006b12d918d7a8a4af392b7fbfe7 upstream.
Currently the aardvark driver trains link in PCIe gen2 mode. This may cause some buggy gen1 cards (such as Compex WLE900VX) to be unstable or even not detected. Moreover when ASPM code tries to retrain link second time, these cards may stop responding and link goes down. If gen1 is used this does not happen.
Unconditionally forcing gen1 is not a good solution since it may have performance impact on gen2 cards.
To overcome this, read 'max-link-speed' property (as defined in PCI device tree bindings) and use this as max gen mode. Then iteratively try link training at this mode or lower until successful. After successful link training choose final controller gen based on Negotiated Link Speed from Link Status register, which should match card speed.
Link: https://lore.kernel.org/r/20200430080625.26070-5-pali@kernel.org Tested-by: Tomasz Maciej Nowak tmn505@gmail.com Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Marek Behún marek.behun@nic.cz Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Reviewed-by: Rob Herring robh@kernel.org Acked-by: Thomas Petazzoni thomas.petazzoni@bootlin.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/host/pci-aardvark.c | 114 +++++++++++++++++++++++++++++++--------- 1 file changed, 89 insertions(+), 25 deletions(-)
--- a/drivers/pci/host/pci-aardvark.c +++ b/drivers/pci/host/pci-aardvark.c @@ -36,6 +36,7 @@ #define PCIE_CORE_LINK_CTRL_STAT_REG 0xd0 #define PCIE_CORE_LINK_L0S_ENTRY BIT(0) #define PCIE_CORE_LINK_TRAINING BIT(5) +#define PCIE_CORE_LINK_SPEED_SHIFT 16 #define PCIE_CORE_LINK_WIDTH_SHIFT 20 #define PCIE_CORE_ERR_CAPCTL_REG 0x118 #define PCIE_CORE_ERR_CAPCTL_ECRC_CHK_TX BIT(5) @@ -212,6 +213,7 @@ struct advk_pcie { struct mutex msi_used_lock; u16 msi_msg; int root_bus_nr; + int link_gen; };
static inline void advk_writel(struct advk_pcie *pcie, u32 val, u64 reg) @@ -235,20 +237,16 @@ static int advk_pcie_link_up(struct advk
static int advk_pcie_wait_for_link(struct advk_pcie *pcie) { - struct device *dev = &pcie->pdev->dev; int retries;
/* check if the link is up or not */ for (retries = 0; retries < LINK_WAIT_MAX_RETRIES; retries++) { - if (advk_pcie_link_up(pcie)) { - dev_info(dev, "link up\n"); + if (advk_pcie_link_up(pcie)) return 0; - }
usleep_range(LINK_WAIT_USLEEP_MIN, LINK_WAIT_USLEEP_MAX); }
- dev_err(dev, "link never came up\n"); return -ETIMEDOUT; }
@@ -272,6 +270,85 @@ static void advk_pcie_set_ob_win(struct advk_writel(pcie, match_ls | BIT(0), OB_WIN_MATCH_LS(win_num)); }
+static int advk_pcie_train_at_gen(struct advk_pcie *pcie, int gen) +{ + int ret, neg_gen; + u32 reg; + + /* Setup link speed */ + reg = advk_readl(pcie, PCIE_CORE_CTRL0_REG); + reg &= ~PCIE_GEN_SEL_MSK; + if (gen == 3) + reg |= SPEED_GEN_3; + else if (gen == 2) + reg |= SPEED_GEN_2; + else + reg |= SPEED_GEN_1; + advk_writel(pcie, reg, PCIE_CORE_CTRL0_REG); + + /* + * Enable link training. This is not needed in every call to this + * function, just once suffices, but it does not break anything either. + */ + reg = advk_readl(pcie, PCIE_CORE_CTRL0_REG); + reg |= LINK_TRAINING_EN; + advk_writel(pcie, reg, PCIE_CORE_CTRL0_REG); + + /* + * Start link training immediately after enabling it. + * This solves problems for some buggy cards. + */ + reg = advk_readl(pcie, PCIE_CORE_LINK_CTRL_STAT_REG); + reg |= PCIE_CORE_LINK_TRAINING; + advk_writel(pcie, reg, PCIE_CORE_LINK_CTRL_STAT_REG); + + ret = advk_pcie_wait_for_link(pcie); + if (ret) + return ret; + + reg = advk_readl(pcie, PCIE_CORE_LINK_CTRL_STAT_REG); + neg_gen = (reg >> PCIE_CORE_LINK_SPEED_SHIFT) & 0xf; + + return neg_gen; +} + +static void advk_pcie_train_link(struct advk_pcie *pcie) +{ + struct device *dev = &pcie->pdev->dev; + int neg_gen = -1, gen; + + /* + * Try link training at link gen specified by device tree property + * 'max-link-speed'. If this fails, iteratively train at lower gen. + */ + for (gen = pcie->link_gen; gen > 0; --gen) { + neg_gen = advk_pcie_train_at_gen(pcie, gen); + if (neg_gen > 0) + break; + } + + if (neg_gen < 0) + goto err; + + /* + * After successful training if negotiated gen is lower than requested, + * train again on negotiated gen. This solves some stability issues for + * some buggy gen1 cards. + */ + if (neg_gen < gen) { + gen = neg_gen; + neg_gen = advk_pcie_train_at_gen(pcie, gen); + } + + if (neg_gen == gen) { + dev_info(dev, "link up at gen %i\n", gen); + return; + } + +err: + dev_err(dev, "link never came up\n"); +} + static void advk_pcie_setup_hw(struct advk_pcie *pcie) { u32 reg; @@ -312,12 +389,6 @@ static void advk_pcie_setup_hw(struct ad PCIE_CORE_CTRL2_TD_ENABLE; advk_writel(pcie, reg, PCIE_CORE_CTRL2_REG);
- /* Set GEN2 */ - reg = advk_readl(pcie, PCIE_CORE_CTRL0_REG); - reg &= ~PCIE_GEN_SEL_MSK; - reg |= SPEED_GEN_2; - advk_writel(pcie, reg, PCIE_CORE_CTRL0_REG); - /* Set lane X1 */ reg = advk_readl(pcie, PCIE_CORE_CTRL0_REG); reg &= ~LANE_CNT_MSK; @@ -365,20 +436,7 @@ static void advk_pcie_setup_hw(struct ad */ msleep(PCI_PM_D3COLD_WAIT);
- /* Enable link training */ - reg = advk_readl(pcie, PCIE_CORE_CTRL0_REG); - reg |= LINK_TRAINING_EN; - advk_writel(pcie, reg, PCIE_CORE_CTRL0_REG); - - /* - * Start link training immediately after enabling it. - * This solves problems for some buggy cards. - */ - reg = advk_readl(pcie, PCIE_CORE_LINK_CTRL_STAT_REG); - reg |= PCIE_CORE_LINK_TRAINING; - advk_writel(pcie, reg, PCIE_CORE_LINK_CTRL_STAT_REG); - - advk_pcie_wait_for_link(pcie); + advk_pcie_train_link(pcie);
reg = advk_readl(pcie, PCIE_CORE_CMD_STATUS_REG); reg |= PCIE_CORE_CMD_MEM_ACCESS_EN | @@ -1017,6 +1075,12 @@ static int advk_pcie_probe(struct platfo return ret; }
+ ret = of_pci_get_max_link_speed(dev->of_node); + if (ret <= 0 || ret > 3) + pcie->link_gen = 3; + else + pcie->link_gen = ret; + advk_pcie_setup_hw(pcie);
ret = advk_pcie_init_irq_domain(pcie);
From: Pali Rohár pali@kernel.org
commit 5169a9851daaa2782a7bd2bb83d5b1bd224b2879 upstream.
Add support for issuing PERST via GPIO specified in 'reset-gpios' property (as described in PCI device tree bindings).
Some buggy cards (e.g. Compex WLE900VX or WLE1216) are not detected after reboot when PERST is not issued during driver initialization.
If bootloader already enabled link training then issuing PERST has no effect for some buggy cards (e.g. Compex WLE900VX) and these cards are not detected. We therefore clear the LINK_TRAINING_EN register before.
It was observed that Compex WLE900VX card needs to be in PERST reset for at least 10ms if bootloader enabled link training.
Tested on Turris MOX.
Link: https://lore.kernel.org/r/20200430080625.26070-6-pali@kernel.org Tested-by: Tomasz Maciej Nowak tmn505@gmail.com Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Acked-by: Thomas Petazzoni thomas.petazzoni@bootlin.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/host/pci-aardvark.c | 44 +++++++++++++++++++++++++++++++++++++++- 1 file changed, 43 insertions(+), 1 deletion(-)
--- a/drivers/pci/host/pci-aardvark.c +++ b/drivers/pci/host/pci-aardvark.c @@ -12,6 +12,7 @@ */
#include <linux/delay.h> +#include <linux/gpio.h> #include <linux/interrupt.h> #include <linux/irq.h> #include <linux/irqdomain.h> @@ -20,6 +21,7 @@ #include <linux/init.h> #include <linux/platform_device.h> #include <linux/of_address.h> +#include <linux/of_gpio.h> #include <linux/of_pci.h>
/* PCIe core registers */ @@ -214,6 +216,7 @@ struct advk_pcie { u16 msi_msg; int root_bus_nr; int link_gen; + struct gpio_desc *reset_gpio; };
static inline void advk_writel(struct advk_pcie *pcie, u32 val, u64 reg) @@ -349,6 +352,25 @@ err: dev_err(dev, "link never came up\n"); }
+static void advk_pcie_issue_perst(struct advk_pcie *pcie) +{ + u32 reg; + + if (!pcie->reset_gpio) + return; + + /* PERST does not work for some cards when link training is enabled */ + reg = advk_readl(pcie, PCIE_CORE_CTRL0_REG); + reg &= ~LINK_TRAINING_EN; + advk_writel(pcie, reg, PCIE_CORE_CTRL0_REG); + + /* 10ms delay is needed for some cards */ + dev_info(&pcie->pdev->dev, "issuing PERST via reset GPIO for 10ms\n"); + gpiod_set_value_cansleep(pcie->reset_gpio, 1); + usleep_range(10000, 11000); + gpiod_set_value_cansleep(pcie->reset_gpio, 0); +} + static void advk_pcie_setup_hw(struct advk_pcie *pcie) { u32 reg; @@ -358,6 +380,8 @@ static void advk_pcie_setup_hw(struct ad for (i = 0; i < 8; i++) advk_pcie_set_ob_win(pcie, i, 0, 0, 0, 0, 0, 0, 0);
+ advk_pcie_issue_perst(pcie); + /* Set to Direct mode */ reg = advk_readl(pcie, CTRL_CONFIG_REG); reg &= ~(CTRL_MODE_MASK << CTRL_MODE_SHIFT); @@ -430,7 +454,8 @@ static void advk_pcie_setup_hw(struct ad
/* * PERST# signal could have been asserted by pinctrl subsystem before - * probe() callback has been called, making the endpoint going into + * probe() callback has been called or issued explicitly by reset gpio + * function advk_pcie_issue_perst(), making the endpoint going into * fundamental reset. As required by PCI Express spec a delay for at * least 100ms after such a reset before link training is needed. */ @@ -1075,6 +1100,23 @@ static int advk_pcie_probe(struct platfo return ret; }
+ pcie->reset_gpio = devm_fwnode_get_index_gpiod_from_child(dev, "reset", + 0, + dev_fwnode(dev), + GPIOD_OUT_LOW, + "pcie1-reset"); + ret = PTR_ERR_OR_ZERO(pcie->reset_gpio); + if (ret) { + if (ret == -ENOENT) { + pcie->reset_gpio = NULL; + } else { + if (ret != -EPROBE_DEFER) + dev_err(dev, "Failed to get reset-gpio: %i\n", + ret); + return ret; + } + } + ret = of_pci_get_max_link_speed(dev->of_node); if (ret <= 0 || ret > 3) pcie->link_gen = 3;
From: Pali Rohár pali@kernel.org
commit 96be36dbffacea0aa9e6ec4839583e79faa141a1 upstream.
PCI-E capability macros are already defined in linux/pci_regs.h. Remove their reimplementation in pcie-aardvark.
Link: https://lore.kernel.org/r/20200430080625.26070-9-pali@kernel.org Tested-by: Tomasz Maciej Nowak tmn505@gmail.com Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Reviewed-by: Rob Herring robh@kernel.org Acked-by: Thomas Petazzoni thomas.petazzoni@bootlin.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/host/pci-aardvark.c | 42 ++++++++++++++++++---------------------- 1 file changed, 19 insertions(+), 23 deletions(-)
--- a/drivers/pci/host/pci-aardvark.c +++ b/drivers/pci/host/pci-aardvark.c @@ -29,17 +29,7 @@ #define PCIE_CORE_CMD_IO_ACCESS_EN BIT(0) #define PCIE_CORE_CMD_MEM_ACCESS_EN BIT(1) #define PCIE_CORE_CMD_MEM_IO_REQ_EN BIT(2) -#define PCIE_CORE_DEV_CTRL_STATS_REG 0xc8 -#define PCIE_CORE_DEV_CTRL_STATS_RELAX_ORDER_DISABLE (0 << 4) -#define PCIE_CORE_DEV_CTRL_STATS_MAX_PAYLOAD_SZ_SHIFT 5 -#define PCIE_CORE_DEV_CTRL_STATS_SNOOP_DISABLE (0 << 11) -#define PCIE_CORE_DEV_CTRL_STATS_MAX_RD_REQ_SIZE_SHIFT 12 -#define PCIE_CORE_DEV_CTRL_STATS_MAX_RD_REQ_SZ 0x2 -#define PCIE_CORE_LINK_CTRL_STAT_REG 0xd0 -#define PCIE_CORE_LINK_L0S_ENTRY BIT(0) -#define PCIE_CORE_LINK_TRAINING BIT(5) -#define PCIE_CORE_LINK_SPEED_SHIFT 16 -#define PCIE_CORE_LINK_WIDTH_SHIFT 20 +#define PCIE_CORE_PCIEXP_CAP 0xc0 #define PCIE_CORE_ERR_CAPCTL_REG 0x118 #define PCIE_CORE_ERR_CAPCTL_ECRC_CHK_TX BIT(5) #define PCIE_CORE_ERR_CAPCTL_ECRC_CHK_TX_EN BIT(6) @@ -229,6 +219,11 @@ static inline u32 advk_readl(struct advk return readl(pcie->base + reg); }
+static inline u16 advk_read16(struct advk_pcie *pcie, u64 reg) +{ + return advk_readl(pcie, (reg & ~0x3)) >> ((reg & 0x3) * 8); +} + static int advk_pcie_link_up(struct advk_pcie *pcie) { u32 val, ltssm_state; @@ -301,16 +296,16 @@ static int advk_pcie_train_at_gen(struct * Start link training immediately after enabling it. * This solves problems for some buggy cards. */ - reg = advk_readl(pcie, PCIE_CORE_LINK_CTRL_STAT_REG); - reg |= PCIE_CORE_LINK_TRAINING; - advk_writel(pcie, reg, PCIE_CORE_LINK_CTRL_STAT_REG); + reg = advk_readl(pcie, PCIE_CORE_PCIEXP_CAP + PCI_EXP_LNKCTL); + reg |= PCI_EXP_LNKCTL_RL; + advk_writel(pcie, reg, PCIE_CORE_PCIEXP_CAP + PCI_EXP_LNKCTL);
ret = advk_pcie_wait_for_link(pcie); if (ret) return ret;
- reg = advk_readl(pcie, PCIE_CORE_LINK_CTRL_STAT_REG); - neg_gen = (reg >> PCIE_CORE_LINK_SPEED_SHIFT) & 0xf; + reg = advk_read16(pcie, PCIE_CORE_PCIEXP_CAP + PCI_EXP_LNKSTA); + neg_gen = reg & PCI_EXP_LNKSTA_CLS;
return neg_gen; } @@ -400,13 +395,14 @@ static void advk_pcie_setup_hw(struct ad PCIE_CORE_ERR_CAPCTL_ECRC_CHCK_RCV; advk_writel(pcie, reg, PCIE_CORE_ERR_CAPCTL_REG);
- /* Set PCIe Device Control and Status 1 PF0 register */ - reg = PCIE_CORE_DEV_CTRL_STATS_RELAX_ORDER_DISABLE | - (7 << PCIE_CORE_DEV_CTRL_STATS_MAX_PAYLOAD_SZ_SHIFT) | - PCIE_CORE_DEV_CTRL_STATS_SNOOP_DISABLE | - (PCIE_CORE_DEV_CTRL_STATS_MAX_RD_REQ_SZ << - PCIE_CORE_DEV_CTRL_STATS_MAX_RD_REQ_SIZE_SHIFT); - advk_writel(pcie, reg, PCIE_CORE_DEV_CTRL_STATS_REG); + /* Set PCIe Device Control register */ + reg = advk_readl(pcie, PCIE_CORE_PCIEXP_CAP + PCI_EXP_DEVCTL); + reg &= ~PCI_EXP_DEVCTL_RELAX_EN; + reg &= ~PCI_EXP_DEVCTL_NOSNOOP_EN; + reg &= ~PCI_EXP_DEVCTL_READRQ; + reg |= PCI_EXP_DEVCTL_PAYLOAD; /* Set max payload size */ + reg |= PCI_EXP_DEVCTL_READRQ_512B; + advk_writel(pcie, reg, PCIE_CORE_PCIEXP_CAP + PCI_EXP_DEVCTL);
/* Program PCIe Control 2 to disable strict ordering */ reg = PCIE_CORE_CTRL2_RESERVED |
From: Pali Rohár pali@kernel.org
commit b1bd5714472cc72e14409f5659b154c765a76c65 upstream.
Most callers of config read do not check for return value. But most of the ones that do, checks for error indication in 'val' variable.
This patch updates error handling in advk_pcie_rd_conf() function. If PIO transfer fails then 'val' variable is set to 0xffffffff which indicates failture.
Link: https://lore.kernel.org/r/20200528162604.GA323482@bjorn-Precision-5520 Link: https://lore.kernel.org/r/20200601130315.18895-1-pali@kernel.org Reported-by: Bjorn Helgaas helgaas@kernel.org Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/host/pci-aardvark.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/pci/host/pci-aardvark.c +++ b/drivers/pci/host/pci-aardvark.c @@ -631,8 +631,10 @@ static int advk_pcie_rd_conf(struct pci_ advk_writel(pcie, 1, PIO_START);
ret = advk_pcie_wait_pio(pcie); - if (ret < 0) + if (ret < 0) { + *val = 0xffffffff; return PCIBIOS_SET_FAILED; + }
/* Check PIO status and get the read result */ ret = advk_pcie_check_pio_status(pcie, val);
From: Thomas Petazzoni thomas.petazzoni@bootlin.com
commit 248d4e59616c632f37f04c233eec6d5008384926 upstream.
In other to mimic other PCIe host controller drivers, introduce an advk_pcie_valid_device() helper, used in the configuration read/write functions.
Signed-off-by: Thomas Petazzoni thomas.petazzoni@bootlin.com [lorenzo.pieralisi@arm.com: updated host->controller dir move] Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/host/pci-aardvark.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-)
--- a/drivers/pci/host/pci-aardvark.c +++ b/drivers/pci/host/pci-aardvark.c @@ -592,6 +592,15 @@ static bool advk_pcie_pio_is_running(str return false; }
+static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus, + int devfn) +{ + if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0) + return false; + + return true; +} + static int advk_pcie_rd_conf(struct pci_bus *bus, u32 devfn, int where, int size, u32 *val) { @@ -599,7 +608,7 @@ static int advk_pcie_rd_conf(struct pci_ u32 reg; int ret;
- if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0) { + if (!advk_pcie_valid_device(pcie, bus, devfn)) { *val = 0xffffffff; return PCIBIOS_DEVICE_NOT_FOUND; } @@ -660,7 +669,7 @@ static int advk_pcie_wr_conf(struct pci_ int offset; int ret;
- if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0) + if (!advk_pcie_valid_device(pcie, bus, devfn)) return PCIBIOS_DEVICE_NOT_FOUND;
if (where % size)
From: Pali Rohár pali@kernel.org
commit 70e380250c3621c55ff218cbaf2272830d9dbb1d upstream.
When there is no PCIe card connected and advk_pcie_rd_conf() or advk_pcie_wr_conf() is called for PCI bus which doesn't belong to emulated root bridge, the aardvark driver throws the following error message:
advk-pcie d0070000.pcie: config read/write timed out
Obviously accessing PCIe registers of disconnected card is not possible.
Extend check in advk_pcie_valid_device() function for validating availability of PCIe bus. If PCIe link is down, then the device is marked as Not Found and the driver does not try to access these registers.
This is just an optimization to prevent accessing PCIe registers when card is disconnected. Trying to access PCIe registers of disconnected card does not cause any crash, kernel just needs to wait for a timeout. So if card disappear immediately after checking for PCIe link (before accessing PCIe registers), it does not cause any problems.
Link: https://lore.kernel.org/r/20200702083036.12230-1-pali@kernel.org Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/host/pci-aardvark.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/drivers/pci/host/pci-aardvark.c +++ b/drivers/pci/host/pci-aardvark.c @@ -598,6 +598,13 @@ static bool advk_pcie_valid_device(struc if ((bus->number == pcie->root_bus_nr) && PCI_SLOT(devfn) != 0) return false;
+ /* + * If the link goes down after we check for link-up, nothing bad + * happens but the config access times out. + */ + if (bus->number != pcie->root_bus_nr && !advk_pcie_link_up(pcie)) + return false; + return true; }
From: Pali Rohár pali@kernel.org
commit b32c012e4b98f0126aa327be2d1f409963057643 upstream.
Include linux/gpio/consumer.h instead of linux/gpio.h, as is said in the latter file.
This was reported by kernel test bot when compiling for s390.
drivers/pci/controller/pci-aardvark.c:350:2: error: implicit declaration of function 'gpiod_set_value_cansleep' [-Werror,-Wimplicit-function-declaration] drivers/pci/controller/pci-aardvark.c:1074:21: error: implicit declaration of function 'devm_gpiod_get_from_of_node' [-Werror,-Wimplicit-function-declaration] drivers/pci/controller/pci-aardvark.c:1076:14: error: use of undeclared identifier 'GPIOD_OUT_LOW'
Link: https://lore.kernel.org/r/202006211118.LxtENQfl%25lkp@intel.com Link: https://lore.kernel.org/r/20200907111038.5811-2-pali@kernel.org Fixes: 5169a9851daa ("PCI: aardvark: Issue PERST via GPIO") Reported-by: kernel test robot lkp@intel.com Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Reviewed-by: Marek Behún marek.behun@nic.cz Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/host/pci-aardvark.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/pci/host/pci-aardvark.c +++ b/drivers/pci/host/pci-aardvark.c @@ -12,7 +12,7 @@ */
#include <linux/delay.h> -#include <linux/gpio.h> +#include <linux/gpio/consumer.h> #include <linux/interrupt.h> #include <linux/irq.h> #include <linux/irqdomain.h>
From: Pali Rohár pali@kernel.org
commit d0c6a3475b033960e85ae2bf176b14cab0a627d2 upstream.
Move code which belongs to link training (delays and resets) into advk_pcie_train_link() function, so everything related to link training, including timings is at one place.
After experiments it can be observed that link training in aardvark hardware is very sensitive to timings and delays, so it is a good idea to have this code at the same place as link training calls.
This patch does not change behavior of aardvark initialization.
Link: https://lore.kernel.org/r/20200907111038.5811-6-pali@kernel.org Tested-by: Marek Behún marek.behun@nic.cz Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/host/pci-aardvark.c | 64 +++++++++++++++++++++------------------- 1 file changed, 34 insertions(+), 30 deletions(-)
--- a/drivers/pci/host/pci-aardvark.c +++ b/drivers/pci/host/pci-aardvark.c @@ -268,6 +268,25 @@ static void advk_pcie_set_ob_win(struct advk_writel(pcie, match_ls | BIT(0), OB_WIN_MATCH_LS(win_num)); }
+static void advk_pcie_issue_perst(struct advk_pcie *pcie) +{ + u32 reg; + + if (!pcie->reset_gpio) + return; + + /* PERST does not work for some cards when link training is enabled */ + reg = advk_readl(pcie, PCIE_CORE_CTRL0_REG); + reg &= ~LINK_TRAINING_EN; + advk_writel(pcie, reg, PCIE_CORE_CTRL0_REG); + + /* 10ms delay is needed for some cards */ + dev_info(&pcie->pdev->dev, "issuing PERST via reset GPIO for 10ms\n"); + gpiod_set_value_cansleep(pcie->reset_gpio, 1); + usleep_range(10000, 11000); + gpiod_set_value_cansleep(pcie->reset_gpio, 0); +} + static int advk_pcie_train_at_gen(struct advk_pcie *pcie, int gen) { int ret, neg_gen; @@ -316,6 +335,21 @@ static void advk_pcie_train_link(struct int neg_gen = -1, gen;
/* + * Reset PCIe card via PERST# signal. Some cards are not detected + * during link training when they are in some non-initial state. + */ + advk_pcie_issue_perst(pcie); + + /* + * PERST# signal could have been asserted by pinctrl subsystem before + * probe() callback has been called or issued explicitly by reset gpio + * function advk_pcie_issue_perst(), making the endpoint going into + * fundamental reset. As required by PCI Express spec a delay for at + * least 100ms after such a reset before link training is needed. + */ + msleep(PCI_PM_D3COLD_WAIT); + + /* * Try link training at link gen specified by device tree property * 'max-link-speed'. If this fails, iteratively train at lower gen. */ @@ -347,25 +381,6 @@ err: dev_err(dev, "link never came up\n"); }
-static void advk_pcie_issue_perst(struct advk_pcie *pcie) -{ - u32 reg; - - if (!pcie->reset_gpio) - return; - - /* PERST does not work for some cards when link training is enabled */ - reg = advk_readl(pcie, PCIE_CORE_CTRL0_REG); - reg &= ~LINK_TRAINING_EN; - advk_writel(pcie, reg, PCIE_CORE_CTRL0_REG); - - /* 10ms delay is needed for some cards */ - dev_info(&pcie->pdev->dev, "issuing PERST via reset GPIO for 10ms\n"); - gpiod_set_value_cansleep(pcie->reset_gpio, 1); - usleep_range(10000, 11000); - gpiod_set_value_cansleep(pcie->reset_gpio, 0); -} - static void advk_pcie_setup_hw(struct advk_pcie *pcie) { u32 reg; @@ -375,8 +390,6 @@ static void advk_pcie_setup_hw(struct ad for (i = 0; i < 8; i++) advk_pcie_set_ob_win(pcie, i, 0, 0, 0, 0, 0, 0, 0);
- advk_pcie_issue_perst(pcie); - /* Set to Direct mode */ reg = advk_readl(pcie, CTRL_CONFIG_REG); reg &= ~(CTRL_MODE_MASK << CTRL_MODE_SHIFT); @@ -448,15 +461,6 @@ static void advk_pcie_setup_hw(struct ad reg |= PIO_CTRL_ADDR_WIN_DISABLE; advk_writel(pcie, reg, PIO_CTRL);
- /* - * PERST# signal could have been asserted by pinctrl subsystem before - * probe() callback has been called or issued explicitly by reset gpio - * function advk_pcie_issue_perst(), making the endpoint going into - * fundamental reset. As required by PCI Express spec a delay for at - * least 100ms after such a reset before link training is needed. - */ - msleep(PCI_PM_D3COLD_WAIT); - advk_pcie_train_link(pcie);
reg = advk_readl(pcie, PCIE_CORE_CMD_STATUS_REG);
From: Pali Rohár pali@kernel.org
commit 1d1cd163d0de22a4041a6f1aeabcf78f80076539 upstream.
According to PCI Express Base Specifications (rev 4.0, 6.6.1 "Conventional reset"), after fundamental reset a 100ms delay is needed prior to enabling link training.
Update comment in code to reflect this requirement.
Link: https://lore.kernel.org/r/20201202184659.3795-1-pali@kernel.org Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/host/pci-aardvark.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
--- a/drivers/pci/host/pci-aardvark.c +++ b/drivers/pci/host/pci-aardvark.c @@ -275,7 +275,14 @@ static void advk_pcie_issue_perst(struct if (!pcie->reset_gpio) return;
- /* PERST does not work for some cards when link training is enabled */ + /* + * As required by PCI Express spec (PCI Express Base Specification, REV. + * 4.0 PCI Express, February 19 2014, 6.6.1 Conventional Reset) a delay + * for at least 100ms after de-asserting PERST# signal is needed before + * link training is enabled. So ensure that link training is disabled + * prior de-asserting PERST# signal to fulfill that PCI Express spec + * requirement. + */ reg = advk_readl(pcie, PCIE_CORE_CTRL0_REG); reg &= ~LINK_TRAINING_EN; advk_writel(pcie, reg, PCIE_CORE_CTRL0_REG);
From: Evan Wang xswang@marvell.com
commit 6df6ba974a55678a2c7d9a0c06eb15cde0c4b184 upstream.
Outbound window is used to translate CPU space addresses to PCIe space addresses when the CPU initiates PCIe transactions.
According to the suggestion of the HW designers, the recommended solution is to use the default outbound parameters, even though the current outbound window setting does not cause any known functional issue.
This patch doesn't address any known functional issue, but aligns to HW design guidelines, and removes code that isn't needed.
Signed-off-by: Evan Wang xswang@marvell.com [Thomas: tweak commit log.] Signed-off-by: Thomas Petazzoni thomas.petazzoni@bootlin.com [lorenzo.pieralisi@arm.com: handled host->controller dir move] Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Reviewed-by: Victor Gu xigu@marvell.com Reviewed-by: Nadav Haklai nadavh@marvell.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/host/pci-aardvark.c | 55 ---------------------------------------- 1 file changed, 55 deletions(-)
--- a/drivers/pci/host/pci-aardvark.c +++ b/drivers/pci/host/pci-aardvark.c @@ -107,24 +107,6 @@ #define PCIE_MSI_PAYLOAD_REG (CONTROL_BASE_ADDR + 0x9C) #define PCIE_MSI_DATA_MASK GENMASK(15, 0)
-/* PCIe window configuration */ -#define OB_WIN_BASE_ADDR 0x4c00 -#define OB_WIN_BLOCK_SIZE 0x20 -#define OB_WIN_REG_ADDR(win, offset) (OB_WIN_BASE_ADDR + \ - OB_WIN_BLOCK_SIZE * (win) + \ - (offset)) -#define OB_WIN_MATCH_LS(win) OB_WIN_REG_ADDR(win, 0x00) -#define OB_WIN_MATCH_MS(win) OB_WIN_REG_ADDR(win, 0x04) -#define OB_WIN_REMAP_LS(win) OB_WIN_REG_ADDR(win, 0x08) -#define OB_WIN_REMAP_MS(win) OB_WIN_REG_ADDR(win, 0x0c) -#define OB_WIN_MASK_LS(win) OB_WIN_REG_ADDR(win, 0x10) -#define OB_WIN_MASK_MS(win) OB_WIN_REG_ADDR(win, 0x14) -#define OB_WIN_ACTIONS(win) OB_WIN_REG_ADDR(win, 0x18) - -/* PCIe window types */ -#define OB_PCIE_MEM 0x0 -#define OB_PCIE_IO 0x4 - /* LMI registers base address and register offsets */ #define LMI_BASE_ADDR 0x6000 #define CFG_REG (LMI_BASE_ADDR + 0x0) @@ -248,26 +230,6 @@ static int advk_pcie_wait_for_link(struc return -ETIMEDOUT; }
-/* - * Set PCIe address window register which could be used for memory - * mapping. - */ -static void advk_pcie_set_ob_win(struct advk_pcie *pcie, - u32 win_num, u32 match_ms, - u32 match_ls, u32 mask_ms, - u32 mask_ls, u32 remap_ms, - u32 remap_ls, u32 action) -{ - advk_writel(pcie, match_ls, OB_WIN_MATCH_LS(win_num)); - advk_writel(pcie, match_ms, OB_WIN_MATCH_MS(win_num)); - advk_writel(pcie, mask_ms, OB_WIN_MASK_MS(win_num)); - advk_writel(pcie, mask_ls, OB_WIN_MASK_LS(win_num)); - advk_writel(pcie, remap_ms, OB_WIN_REMAP_MS(win_num)); - advk_writel(pcie, remap_ls, OB_WIN_REMAP_LS(win_num)); - advk_writel(pcie, action, OB_WIN_ACTIONS(win_num)); - advk_writel(pcie, match_ls | BIT(0), OB_WIN_MATCH_LS(win_num)); -} - static void advk_pcie_issue_perst(struct advk_pcie *pcie) { u32 reg; @@ -391,11 +353,6 @@ err: static void advk_pcie_setup_hw(struct advk_pcie *pcie) { u32 reg; - int i; - - /* Point PCIe unit MBUS decode windows to DRAM space */ - for (i = 0; i < 8; i++) - advk_pcie_set_ob_win(pcie, i, 0, 0, 0, 0, 0, 0, 0);
/* Set to Direct mode */ reg = advk_readl(pcie, CTRL_CONFIG_REG); @@ -1048,12 +1005,6 @@ static int advk_pcie_parse_request_of_pc
switch (resource_type(res)) { case IORESOURCE_IO: - advk_pcie_set_ob_win(pcie, 1, - upper_32_bits(res->start), - lower_32_bits(res->start), - 0, 0xF8000000, 0, - lower_32_bits(res->start), - OB_PCIE_IO); err = devm_pci_remap_iospace(dev, res, iobase); if (err) { dev_warn(dev, "error %d: failed to map resource %pR\n", @@ -1062,12 +1013,6 @@ static int advk_pcie_parse_request_of_pc } break; case IORESOURCE_MEM: - advk_pcie_set_ob_win(pcie, 0, - upper_32_bits(res->start), - lower_32_bits(res->start), - 0x0, 0xF8000000, 0, - lower_32_bits(res->start), - (2 << 20) | OB_PCIE_MEM); res_valid |= !(res->flags & IORESOURCE_PREFETCH); break; case IORESOURCE_BUS:
From: Pali Rohár pali@kernel.org
commit 64f160e19e9264a7f6d89c516baae1473b6f8359 upstream.
In commit 6df6ba974a55 ("PCI: aardvark: Remove PCIe outbound window configuration") was removed aardvark PCIe outbound window configuration and commit description said that was recommended solution by HW designers.
But that commit completely removed support for configuring PCIe IO resources without removing PCIe IO 'ranges' from DTS files. After that commit PCIe IO space started to be treated as PCIe MEM space and accessing it just caused kernel crash.
Moreover implementation of PCIe outbound windows prior that commit was incorrect. It completely ignored offset between CPU address and PCIe bus address and expected that in DTS is CPU address always same as PCIe bus address without doing any checks. Also it completely ignored size of every PCIe resource specified in 'ranges' DTS property and expected that every PCIe resource has size 128 MB (also for PCIe IO range). Again without any check. Apparently none of PCIe resource has in DTS specified size of 128 MB. So it was completely broken and thanks to how aardvark mask works, configuration was completely ignored.
This patch reverts back support for PCIe outbound window configuration but implementation is a new without issues mentioned above. PCIe outbound window is required when DTS specify in 'ranges' property non-zero offset between CPU and PCIe address space. To address recommendation by HW designers as specified in commit description of 6df6ba974a55, set default outbound parameters as PCIe MEM access without translation and therefore for this PCIe 'ranges' it is not needed to configure PCIe outbound window. For PCIe IO space is needed to configure aardvark PCIe outbound window.
This patch fixes kernel crash when trying to access PCIe IO space.
Link: https://lore.kernel.org/r/20210624215546.4015-2-pali@kernel.org Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Cc: stable@vger.kernel.org # 6df6ba974a55 ("PCI: aardvark: Remove PCIe outbound window configuration") Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/host/pci-aardvark.c | 190 +++++++++++++++++++++++++++++++++++++++- 1 file changed, 189 insertions(+), 1 deletion(-)
--- a/drivers/pci/host/pci-aardvark.c +++ b/drivers/pci/host/pci-aardvark.c @@ -107,6 +107,46 @@ #define PCIE_MSI_PAYLOAD_REG (CONTROL_BASE_ADDR + 0x9C) #define PCIE_MSI_DATA_MASK GENMASK(15, 0)
+/* PCIe window configuration */ +#define OB_WIN_BASE_ADDR 0x4c00 +#define OB_WIN_BLOCK_SIZE 0x20 +#define OB_WIN_COUNT 8 +#define OB_WIN_REG_ADDR(win, offset) (OB_WIN_BASE_ADDR + \ + OB_WIN_BLOCK_SIZE * (win) + \ + (offset)) +#define OB_WIN_MATCH_LS(win) OB_WIN_REG_ADDR(win, 0x00) +#define OB_WIN_ENABLE BIT(0) +#define OB_WIN_MATCH_MS(win) OB_WIN_REG_ADDR(win, 0x04) +#define OB_WIN_REMAP_LS(win) OB_WIN_REG_ADDR(win, 0x08) +#define OB_WIN_REMAP_MS(win) OB_WIN_REG_ADDR(win, 0x0c) +#define OB_WIN_MASK_LS(win) OB_WIN_REG_ADDR(win, 0x10) +#define OB_WIN_MASK_MS(win) OB_WIN_REG_ADDR(win, 0x14) +#define OB_WIN_ACTIONS(win) OB_WIN_REG_ADDR(win, 0x18) +#define OB_WIN_DEFAULT_ACTIONS (OB_WIN_ACTIONS(OB_WIN_COUNT-1) + 0x4) +#define OB_WIN_FUNC_NUM_MASK GENMASK(31, 24) +#define OB_WIN_FUNC_NUM_SHIFT 24 +#define OB_WIN_FUNC_NUM_ENABLE BIT(23) +#define OB_WIN_BUS_NUM_BITS_MASK GENMASK(22, 20) +#define OB_WIN_BUS_NUM_BITS_SHIFT 20 +#define OB_WIN_MSG_CODE_ENABLE BIT(22) +#define OB_WIN_MSG_CODE_MASK GENMASK(21, 14) +#define OB_WIN_MSG_CODE_SHIFT 14 +#define OB_WIN_MSG_PAYLOAD_LEN BIT(12) +#define OB_WIN_ATTR_ENABLE BIT(11) +#define OB_WIN_ATTR_TC_MASK GENMASK(10, 8) +#define OB_WIN_ATTR_TC_SHIFT 8 +#define OB_WIN_ATTR_RELAXED BIT(7) +#define OB_WIN_ATTR_NOSNOOP BIT(6) +#define OB_WIN_ATTR_POISON BIT(5) +#define OB_WIN_ATTR_IDO BIT(4) +#define OB_WIN_TYPE_MASK GENMASK(3, 0) +#define OB_WIN_TYPE_SHIFT 0 +#define OB_WIN_TYPE_MEM 0x0 +#define OB_WIN_TYPE_IO 0x4 +#define OB_WIN_TYPE_CONFIG_TYPE0 0x8 +#define OB_WIN_TYPE_CONFIG_TYPE1 0x9 +#define OB_WIN_TYPE_MSG 0xc + /* LMI registers base address and register offsets */ #define LMI_BASE_ADDR 0x6000 #define CFG_REG (LMI_BASE_ADDR + 0x0) @@ -175,6 +215,13 @@ struct advk_pcie { struct platform_device *pdev; void __iomem *base; struct list_head resources; + struct { + phys_addr_t match; + phys_addr_t remap; + phys_addr_t mask; + u32 actions; + } wins[OB_WIN_COUNT]; + u8 wins_count; struct irq_domain *irq_domain; struct irq_chip irq_chip; raw_spinlock_t irq_lock; @@ -350,9 +397,39 @@ err: dev_err(dev, "link never came up\n"); }
+/* + * Set PCIe address window register which could be used for memory + * mapping. + */ +static void advk_pcie_set_ob_win(struct advk_pcie *pcie, u8 win_num, + phys_addr_t match, phys_addr_t remap, + phys_addr_t mask, u32 actions) +{ + advk_writel(pcie, OB_WIN_ENABLE | + lower_32_bits(match), OB_WIN_MATCH_LS(win_num)); + advk_writel(pcie, upper_32_bits(match), OB_WIN_MATCH_MS(win_num)); + advk_writel(pcie, lower_32_bits(remap), OB_WIN_REMAP_LS(win_num)); + advk_writel(pcie, upper_32_bits(remap), OB_WIN_REMAP_MS(win_num)); + advk_writel(pcie, lower_32_bits(mask), OB_WIN_MASK_LS(win_num)); + advk_writel(pcie, upper_32_bits(mask), OB_WIN_MASK_MS(win_num)); + advk_writel(pcie, actions, OB_WIN_ACTIONS(win_num)); +} + +static void advk_pcie_disable_ob_win(struct advk_pcie *pcie, u8 win_num) +{ + advk_writel(pcie, 0, OB_WIN_MATCH_LS(win_num)); + advk_writel(pcie, 0, OB_WIN_MATCH_MS(win_num)); + advk_writel(pcie, 0, OB_WIN_REMAP_LS(win_num)); + advk_writel(pcie, 0, OB_WIN_REMAP_MS(win_num)); + advk_writel(pcie, 0, OB_WIN_MASK_LS(win_num)); + advk_writel(pcie, 0, OB_WIN_MASK_MS(win_num)); + advk_writel(pcie, 0, OB_WIN_ACTIONS(win_num)); +} + static void advk_pcie_setup_hw(struct advk_pcie *pcie) { u32 reg; + int i;
/* Set to Direct mode */ reg = advk_readl(pcie, CTRL_CONFIG_REG); @@ -416,15 +493,51 @@ static void advk_pcie_setup_hw(struct ad reg = PCIE_IRQ_ALL_MASK & (~PCIE_IRQ_ENABLE_INTS_MASK); advk_writel(pcie, reg, HOST_CTRL_INT_MASK_REG);
+ /* + * Enable AXI address window location generation: + * When it is enabled, the default outbound window + * configurations (Default User Field: 0xD0074CFC) + * are used to transparent address translation for + * the outbound transactions. Thus, PCIe address + * windows are not required for transparent memory + * access when default outbound window configuration + * is set for memory access. + */ reg = advk_readl(pcie, PCIE_CORE_CTRL2_REG); reg |= PCIE_CORE_CTRL2_OB_WIN_ENABLE; advk_writel(pcie, reg, PCIE_CORE_CTRL2_REG);
- /* Bypass the address window mapping for PIO */ + /* + * Set memory access in Default User Field so it + * is not required to configure PCIe address for + * transparent memory access. + */ + advk_writel(pcie, OB_WIN_TYPE_MEM, OB_WIN_DEFAULT_ACTIONS); + + /* + * Bypass the address window mapping for PIO: + * Since PIO access already contains all required + * info over AXI interface by PIO registers, the + * address window is not required. + */ reg = advk_readl(pcie, PIO_CTRL); reg |= PIO_CTRL_ADDR_WIN_DISABLE; advk_writel(pcie, reg, PIO_CTRL);
+ /* + * Configure PCIe address windows for non-memory or + * non-transparent access as by default PCIe uses + * transparent memory access. + */ + for (i = 0; i < pcie->wins_count; i++) + advk_pcie_set_ob_win(pcie, i, + pcie->wins[i].match, pcie->wins[i].remap, + pcie->wins[i].mask, pcie->wins[i].actions); + + /* Disable remaining PCIe outbound windows */ + for (i = pcie->wins_count; i < OB_WIN_COUNT; i++) + advk_pcie_disable_ob_win(pcie, i); + advk_pcie_train_link(pcie);
reg = advk_readl(pcie, PCIE_CORE_CMD_STATUS_REG); @@ -1041,6 +1154,7 @@ static int advk_pcie_probe(struct platfo struct resource *res; struct pci_bus *bus, *child; struct pci_host_bridge *bridge; + struct resource_entry *entry; int ret, irq;
bridge = devm_pci_alloc_host_bridge(dev, sizeof(struct advk_pcie)); @@ -1070,6 +1184,80 @@ static int advk_pcie_probe(struct platfo return ret; }
+ resource_list_for_each_entry(entry, &pcie->resources) { + resource_size_t start = entry->res->start; + resource_size_t size = resource_size(entry->res); + unsigned long type = resource_type(entry->res); + u64 win_size; + + /* + * Aardvark hardware allows to configure also PCIe window + * for config type 0 and type 1 mapping, but driver uses + * only PIO for issuing configuration transfers which does + * not use PCIe window configuration. + */ + if (type != IORESOURCE_MEM && type != IORESOURCE_MEM_64 && + type != IORESOURCE_IO) + continue; + + /* + * Skip transparent memory resources. Default outbound access + * configuration is set to transparent memory access so it + * does not need window configuration. + */ + if ((type == IORESOURCE_MEM || type == IORESOURCE_MEM_64) && + entry->offset == 0) + continue; + + /* + * The n-th PCIe window is configured by tuple (match, remap, mask) + * and an access to address A uses this window if A matches the + * match with given mask. + * So every PCIe window size must be a power of two and every start + * address must be aligned to window size. Minimal size is 64 KiB + * because lower 16 bits of mask must be zero. Remapped address + * may have set only bits from the mask. + */ + while (pcie->wins_count < OB_WIN_COUNT && size > 0) { + /* Calculate the largest aligned window size */ + win_size = (1ULL << (fls64(size)-1)) | + (start ? (1ULL << __ffs64(start)) : 0); + win_size = 1ULL << __ffs64(win_size); + if (win_size < 0x10000) + break; + + dev_dbg(dev, + "Configuring PCIe window %d: [0x%llx-0x%llx] as %lu\n", + pcie->wins_count, (unsigned long long)start, + (unsigned long long)start + win_size, type); + + if (type == IORESOURCE_IO) { + pcie->wins[pcie->wins_count].actions = OB_WIN_TYPE_IO; + pcie->wins[pcie->wins_count].match = pci_pio_to_address(start); + } else { + pcie->wins[pcie->wins_count].actions = OB_WIN_TYPE_MEM; + pcie->wins[pcie->wins_count].match = start; + } + pcie->wins[pcie->wins_count].remap = start - entry->offset; + pcie->wins[pcie->wins_count].mask = ~(win_size - 1); + + if (pcie->wins[pcie->wins_count].remap & (win_size - 1)) + break; + + start += win_size; + size -= win_size; + pcie->wins_count++; + } + + if (size > 0) { + dev_err(&pcie->pdev->dev, + "Invalid PCIe region [0x%llx-0x%llx]\n", + (unsigned long long)entry->res->start, + (unsigned long long)entry->res->end + 1); + return -EINVAL; + } + } + pcie->reset_gpio = devm_fwnode_get_index_gpiod_from_child(dev, "reset", 0, dev_fwnode(dev),
From: Pali Rohár pali@kernel.org
commit a4e17d65dafdd3513042d8f00404c9b6068a825c upstream.
Change PCIe Max Payload Size setting in PCIe Device Control register to 512 bytes to align with PCIe Link Initialization sequence as defined in Marvell Armada 3700 Functional Specification. According to the specification, maximal Max Payload Size supported by this device is 512 bytes.
Without this kernel prints suspicious line:
pci 0000:01:00.0: Upstream bridge's Max Payload Size set to 256 (was 16384, max 512)
With this change it changes to:
pci 0000:01:00.0: Upstream bridge's Max Payload Size set to 256 (was 512, max 512)
Link: https://lore.kernel.org/r/20211005180952.6812-3-kabel@kernel.org Fixes: 8c39d710363c ("PCI: aardvark: Add Aardvark PCI host controller driver") Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Reviewed-by: Marek Behún kabel@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/host/pci-aardvark.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/pci/host/pci-aardvark.c +++ b/drivers/pci/host/pci-aardvark.c @@ -453,8 +453,9 @@ static void advk_pcie_setup_hw(struct ad reg = advk_readl(pcie, PCIE_CORE_PCIEXP_CAP + PCI_EXP_DEVCTL); reg &= ~PCI_EXP_DEVCTL_RELAX_EN; reg &= ~PCI_EXP_DEVCTL_NOSNOOP_EN; + reg &= ~PCI_EXP_DEVCTL_PAYLOAD; reg &= ~PCI_EXP_DEVCTL_READRQ; - reg |= PCI_EXP_DEVCTL_PAYLOAD; /* Set max payload size */ + reg |= PCI_EXP_DEVCTL_PAYLOAD_512B; reg |= PCI_EXP_DEVCTL_READRQ_512B; advk_writel(pcie, reg, PCIE_CORE_PCIEXP_CAP + PCI_EXP_DEVCTL);
From: Frederick Lawler fred@fredlawl.com
commit c80851f6ce63a6e313f8c7b4b6eb82c67aa4497b upstream.
The Link Control 2 register is missing macros for Target Link Speeds. Add those in.
Signed-off-by: Frederick Lawler fred@fredlawl.com [bhelgaas: use "GT" instead of "GB"] Signed-off-by: Bjorn Helgaas bhelgaas@google.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/uapi/linux/pci_regs.h | 5 +++++ 1 file changed, 5 insertions(+)
--- a/include/uapi/linux/pci_regs.h +++ b/include/uapi/linux/pci_regs.h @@ -654,6 +654,11 @@ #define PCI_EXP_LNKCAP2_SLS_8_0GB 0x00000008 /* Supported Speed 8.0GT/s */ #define PCI_EXP_LNKCAP2_CROSSLINK 0x00000100 /* Crosslink supported */ #define PCI_EXP_LNKCTL2 48 /* Link Control 2 */ +#define PCI_EXP_LNKCTL2_TLS 0x000f +#define PCI_EXP_LNKCTL2_TLS_2_5GT 0x0001 /* Supported Speed 2.5GT/s */ +#define PCI_EXP_LNKCTL2_TLS_5_0GT 0x0002 /* Supported Speed 5GT/s */ +#define PCI_EXP_LNKCTL2_TLS_8_0GT 0x0003 /* Supported Speed 8GT/s */ +#define PCI_EXP_LNKCTL2_TLS_16_0GT 0x0004 /* Supported Speed 16GT/s */ #define PCI_EXP_LNKSTA2 50 /* Link Status 2 */ #define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 52 /* v2 endpoints with link end here */ #define PCI_EXP_SLTCAP2 52 /* Slot Capabilities 2 */
From: Pali Rohár pali@kernel.org
commit f76b36d40beee0a13aa8f6aa011df0d7cbbb8a7f upstream.
Fix multiple link training issues in aardvark driver. The main reason of these issues was misunderstanding of what certain registers do, since their names and comments were misleading: before commit 96be36dbffac ("PCI: aardvark: Replace custom macros by standard linux/pci_regs.h macros"), the pci-aardvark.c driver used custom macros for accessing standard PCIe Root Bridge registers, and misleading comments did not help to understand what the code was really doing.
After doing more tests and experiments I've come to the conclusion that the SPEED_GEN register in aardvark sets the PCIe revision / generation compliance and forces maximal link speed. Both GEN3 and GEN2 values set the read-only PCI_EXP_FLAGS_VERS bits (PCIe capabilities version of Root Bridge) to value 2, while GEN1 value sets PCI_EXP_FLAGS_VERS to 1, which matches with PCI Express specifications revisions 3, 2 and 1 respectively. Changing SPEED_GEN also sets the read-only bits PCI_EXP_LNKCAP_SLS and PCI_EXP_LNKCAP2_SLS to corresponding speed.
(Note that PCI Express rev 1 specification does not define PCI_EXP_LNKCAP2 and PCI_EXP_LNKCTL2 registers and when SPEED_GEN is set to GEN1 (which also sets PCI_EXP_FLAGS_VERS set to 1), lspci cannot access PCI_EXP_LNKCAP2 and PCI_EXP_LNKCTL2 registers.)
Changing PCIe link speed can be done via PCI_EXP_LNKCTL2_TLS bits of PCI_EXP_LNKCTL2 register. Armada 3700 Functional Specifications says that the default value of PCI_EXP_LNKCTL2_TLS is based on SPEED_GEN value, but tests showed that the default value is always 8.0 GT/s, independently of speed set by SPEED_GEN. So after setting SPEED_GEN, we must also set value in PCI_EXP_LNKCTL2 register via PCI_EXP_LNKCTL2_TLS bits.
Triggering PCI_EXP_LNKCTL_RL bit immediately after setting LINK_TRAINING_EN bit actually doesn't do anything. Tests have shown that a delay is needed after enabling LINK_TRAINING_EN bit. As triggering PCI_EXP_LNKCTL_RL currently does nothing, remove it.
Commit 43fc679ced18 ("PCI: aardvark: Improve link training") introduced code which sets SPEED_GEN register based on negotiated link speed from PCI_EXP_LNKSTA_CLS bits of PCI_EXP_LNKSTA register. This code was added to fix detection of Compex WLE900VX (Atheros QCA9880) WiFi GEN1 PCIe cards, as otherwise these cards were "invisible" on PCIe bus (probably because they crashed). But apparently more people reported the same issues with these cards also with other PCIe controllers [1] and I was able to reproduce this issue also with other "noname" WiFi cards based on Atheros QCA9890 chip (with the same PCI vendor/device ids as Atheros QCA9880). So this is not an issue in aardvark but rather an issue in Atheros QCA98xx chips. Also, this issue only exists if the kernel is compiled with PCIe ASPM support, and a generic workaround for this is to change PCIe Bridge to 2.5 GT/s link speed via PCI_EXP_LNKCTL2_TLS_2_5GT bits in PCI_EXP_LNKCTL2 register [2], before triggering PCI_EXP_LNKCTL_RL bit. This workaround also works when SPEED_GEN is set to value GEN2 (5 GT/s). So remove this hack completely in the aardvark driver and always set SPEED_GEN to value from 'max-link-speed' DT property. Fix for Atheros QCA98xx chips is handled separately by patch [2].
These two things (code for triggering PCI_EXP_LNKCTL_RL bit and changing SPEED_GEN value) also explain why commit 6964494582f5 ("PCI: aardvark: Train link immediately after enabling training") somehow fixed detection of those problematic Compex cards with Atheros chips: if triggering link retraining (via PCI_EXP_LNKCTL_RL bit) was done immediately after enabling link training (via LINK_TRAINING_EN), it did nothing. If there was a specific delay, aardvark HW already initialized PCIe link and therefore triggering link retraining caused the above issue. Compex cards triggered link down event and disappeared from the PCIe bus.
Commit f4c7d053d7f7 ("PCI: aardvark: Wait for endpoint to be ready before training link") added 100ms sleep before calling 'Start link training' command and explained that it is a requirement of PCI Express specification. But the code after this 100ms sleep was not doing 'Start link training', rather it triggered PCI_EXP_LNKCTL_RL bit via PCIe Root Bridge to put link into Recovery state.
The required delay after fundamental reset is already done in function advk_pcie_wait_for_link() which also checks whether PCIe link is up. So after removing the code which triggers PCI_EXP_LNKCTL_RL bit on PCIe Root Bridge, there is no need to wait 100ms again. Remove the extra msleep() call and update comment about the delay required by the PCI Express specification.
According to Marvell Armada 3700 Functional Specifications, Link training should be enabled via aardvark register LINK_TRAINING_EN after selecting PCIe generation and x1 lane. There is no need to disable it prior resetting card via PERST# signal. This disabling code was introduced in commit 5169a9851daa ("PCI: aardvark: Issue PERST via GPIO") as a workaround for some Atheros cards. It turns out that this also is Atheros specific issue and affects any PCIe controller, not only aardvark. Moreover this Atheros issue was triggered by juggling with PCI_EXP_LNKCTL_RL, LINK_TRAINING_EN and SPEED_GEN bits interleaved with sleeps. Now, after removing triggering PCI_EXP_LNKCTL_RL, there is no need to explicitly disable LINK_TRAINING_EN bit. So remove this code too. The problematic Compex cards described in previous git commits are correctly detected in advk_pcie_train_link() function even after applying all these changes.
Note that with this patch, and also prior this patch, some NVMe disks which support PCIe GEN3 with 8 GT/s speed are negotiated only at the lowest link speed 2.5 GT/s, independently of SPEED_GEN value. After manually triggering PCI_EXP_LNKCTL_RL bit (e.g. from userspace via setpci), these NVMe disks change link speed to 5 GT/s when SPEED_GEN was configured to GEN2. This issue first needs to be properly investigated. I will send a fix in the future.
On the other hand, some other GEN2 PCIe cards with 5 GT/s speed are autonomously by HW autonegotiated at full 5 GT/s speed without need of any software interaction.
Armada 3700 Functional Specifications describes the following steps for link training: set SPEED_GEN to GEN2, enable LINK_TRAINING_EN, poll until link training is complete, trigger PCI_EXP_LNKCTL_RL, poll until signal rate is 5 GT/s, poll until link training is complete, enable ASPM L0s.
The requirement for triggering PCI_EXP_LNKCTL_RL can be explained by the need to achieve 5 GT/s speed (as changing link speed is done by throw to recovery state entered by PCI_EXP_LNKCTL_RL) or maybe as a part of enabling ASPM L0s (but in this case ASPM L0s should have been enabled prior PCI_EXP_LNKCTL_RL).
It is unknown why the original pci-aardvark.c driver was triggering PCI_EXP_LNKCTL_RL bit before waiting for the link to be up. This does not align with neither PCIe base specifications nor with Armada 3700 Functional Specification. (Note that in older versions of aardvark, this bit was called incorrectly PCIE_CORE_LINK_TRAINING, so this may be the reason.)
It is also unknown why Armada 3700 Functional Specification says that it is needed to trigger PCI_EXP_LNKCTL_RL for GEN2 mode, as according to PCIe base specification 5 GT/s speed negotiation is supposed to be entirely autonomous, even if initial speed is 2.5 GT/s.
[1] - https://lore.kernel.org/linux-pci/87h7l8axqp.fsf@toke.dk/ [2] - https://lore.kernel.org/linux-pci/20210326124326.21163-1-pali@kernel.org/
Link: https://lore.kernel.org/r/20211005180952.6812-12-kabel@kernel.org Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Reviewed-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/host/pci-aardvark.c | 119 +++++++++++----------------------------- 1 file changed, 35 insertions(+), 84 deletions(-)
--- a/drivers/pci/host/pci-aardvark.c +++ b/drivers/pci/host/pci-aardvark.c @@ -248,11 +248,6 @@ static inline u32 advk_readl(struct advk return readl(pcie->base + reg); }
-static inline u16 advk_read16(struct advk_pcie *pcie, u64 reg) -{ - return advk_readl(pcie, (reg & ~0x3)) >> ((reg & 0x3) * 8); -} - static int advk_pcie_link_up(struct advk_pcie *pcie) { u32 val, ltssm_state; @@ -279,23 +274,9 @@ static int advk_pcie_wait_for_link(struc
static void advk_pcie_issue_perst(struct advk_pcie *pcie) { - u32 reg; - if (!pcie->reset_gpio) return;
- /* - * As required by PCI Express spec (PCI Express Base Specification, REV. - * 4.0 PCI Express, February 19 2014, 6.6.1 Conventional Reset) a delay - * for at least 100ms after de-asserting PERST# signal is needed before - * link training is enabled. So ensure that link training is disabled - * prior de-asserting PERST# signal to fulfill that PCI Express spec - * requirement. - */ - reg = advk_readl(pcie, PCIE_CORE_CTRL0_REG); - reg &= ~LINK_TRAINING_EN; - advk_writel(pcie, reg, PCIE_CORE_CTRL0_REG); - /* 10ms delay is needed for some cards */ dev_info(&pcie->pdev->dev, "issuing PERST via reset GPIO for 10ms\n"); gpiod_set_value_cansleep(pcie->reset_gpio, 1); @@ -303,54 +284,47 @@ static void advk_pcie_issue_perst(struct gpiod_set_value_cansleep(pcie->reset_gpio, 0); }
-static int advk_pcie_train_at_gen(struct advk_pcie *pcie, int gen) +static void advk_pcie_train_link(struct advk_pcie *pcie) { - int ret, neg_gen; + struct device *dev = &pcie->pdev->dev; u32 reg; + int ret;
- /* Setup link speed */ + /* + * Setup PCIe rev / gen compliance based on device tree property + * 'max-link-speed' which also forces maximal link speed. + */ reg = advk_readl(pcie, PCIE_CORE_CTRL0_REG); reg &= ~PCIE_GEN_SEL_MSK; - if (gen == 3) + if (pcie->link_gen == 3) reg |= SPEED_GEN_3; - else if (gen == 2) + else if (pcie->link_gen == 2) reg |= SPEED_GEN_2; else reg |= SPEED_GEN_1; advk_writel(pcie, reg, PCIE_CORE_CTRL0_REG);
/* - * Enable link training. This is not needed in every call to this - * function, just once suffices, but it does not break anything either. - */ + * Set maximal link speed value also into PCIe Link Control 2 register. + * Armada 3700 Functional Specification says that default value is based + * on SPEED_GEN but tests showed that default value is always 8.0 GT/s. + */ + reg = advk_readl(pcie, PCIE_CORE_PCIEXP_CAP + PCI_EXP_LNKCTL2); + reg &= ~PCI_EXP_LNKCTL2_TLS; + if (pcie->link_gen == 3) + reg |= PCI_EXP_LNKCTL2_TLS_8_0GT; + else if (pcie->link_gen == 2) + reg |= PCI_EXP_LNKCTL2_TLS_5_0GT; + else + reg |= PCI_EXP_LNKCTL2_TLS_2_5GT; + advk_writel(pcie, reg, PCIE_CORE_PCIEXP_CAP + PCI_EXP_LNKCTL2); + + /* Enable link training after selecting PCIe generation */ reg = advk_readl(pcie, PCIE_CORE_CTRL0_REG); reg |= LINK_TRAINING_EN; advk_writel(pcie, reg, PCIE_CORE_CTRL0_REG);
/* - * Start link training immediately after enabling it. - * This solves problems for some buggy cards. - */ - reg = advk_readl(pcie, PCIE_CORE_PCIEXP_CAP + PCI_EXP_LNKCTL); - reg |= PCI_EXP_LNKCTL_RL; - advk_writel(pcie, reg, PCIE_CORE_PCIEXP_CAP + PCI_EXP_LNKCTL); - - ret = advk_pcie_wait_for_link(pcie); - if (ret) - return ret; - - reg = advk_read16(pcie, PCIE_CORE_PCIEXP_CAP + PCI_EXP_LNKSTA); - neg_gen = reg & PCI_EXP_LNKSTA_CLS; - - return neg_gen; -} - -static void advk_pcie_train_link(struct advk_pcie *pcie) -{ - struct device *dev = &pcie->pdev->dev; - int neg_gen = -1, gen; - - /* * Reset PCIe card via PERST# signal. Some cards are not detected * during link training when they are in some non-initial state. */ @@ -360,41 +334,18 @@ static void advk_pcie_train_link(struct * PERST# signal could have been asserted by pinctrl subsystem before * probe() callback has been called or issued explicitly by reset gpio * function advk_pcie_issue_perst(), making the endpoint going into - * fundamental reset. As required by PCI Express spec a delay for at - * least 100ms after such a reset before link training is needed. - */ - msleep(PCI_PM_D3COLD_WAIT); - - /* - * Try link training at link gen specified by device tree property - * 'max-link-speed'. If this fails, iteratively train at lower gen. - */ - for (gen = pcie->link_gen; gen > 0; --gen) { - neg_gen = advk_pcie_train_at_gen(pcie, gen); - if (neg_gen > 0) - break; - } - - if (neg_gen < 0) - goto err; - - /* - * After successful training if negotiated gen is lower than requested, - * train again on negotiated gen. This solves some stability issues for - * some buggy gen1 cards. + * fundamental reset. As required by PCI Express spec (PCI Express + * Base Specification, REV. 4.0 PCI Express, February 19 2014, 6.6.1 + * Conventional Reset) a delay for at least 100ms after such a reset + * before sending a Configuration Request to the device is needed. + * So wait until PCIe link is up. Function advk_pcie_wait_for_link() + * waits for link at least 900ms. */ - if (neg_gen < gen) { - gen = neg_gen; - neg_gen = advk_pcie_train_at_gen(pcie, gen); - } - - if (neg_gen == gen) { - dev_info(dev, "link up at gen %i\n", gen); - return; - } - -err: - dev_err(dev, "link never came up\n"); + ret = advk_pcie_wait_for_link(pcie); + if (ret < 0) + dev_err(dev, "link never came up\n"); + else + dev_info(dev, "link up\n"); }
/*
From: Pali Rohár pali@kernel.org
commit 661c399a651c11aaf83c45cbfe0b4a1fb7bc3179 upstream.
Current implementation of advk_pcie_link_up() is wrong as it marks also link disabled or hot reset states as link up.
Fix it by marking link up only to those states which are defined in PCIe Base specification 3.0, Table 4-14: Link Status Mapped to the LTSSM.
To simplify implementation, Define macros for every LTSSM state which aardvark hardware can return in CFG_REG register.
Fix also checking for link training according to the same Table 4-14. Define a new function advk_pcie_link_training() for this purpose.
Link: https://lore.kernel.org/r/20211005180952.6812-13-kabel@kernel.org Fixes: 8c39d710363c ("PCI: aardvark: Add Aardvark PCI host controller driver") Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Reviewed-by: Marek Behún kabel@kernel.org Cc: stable@vger.kernel.org Cc: Remi Pommarel repk@triplefau.lt Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/host/pci-aardvark.c | 71 +++++++++++++++++++++++++++++++++++++--- 1 file changed, 67 insertions(+), 4 deletions(-)
--- a/drivers/pci/host/pci-aardvark.c +++ b/drivers/pci/host/pci-aardvark.c @@ -152,9 +152,50 @@ #define CFG_REG (LMI_BASE_ADDR + 0x0) #define LTSSM_SHIFT 24 #define LTSSM_MASK 0x3f -#define LTSSM_L0 0x10 #define RC_BAR_CONFIG 0x300
+/* LTSSM values in CFG_REG */ +enum { + LTSSM_DETECT_QUIET = 0x0, + LTSSM_DETECT_ACTIVE = 0x1, + LTSSM_POLLING_ACTIVE = 0x2, + LTSSM_POLLING_COMPLIANCE = 0x3, + LTSSM_POLLING_CONFIGURATION = 0x4, + LTSSM_CONFIG_LINKWIDTH_START = 0x5, + LTSSM_CONFIG_LINKWIDTH_ACCEPT = 0x6, + LTSSM_CONFIG_LANENUM_ACCEPT = 0x7, + LTSSM_CONFIG_LANENUM_WAIT = 0x8, + LTSSM_CONFIG_COMPLETE = 0x9, + LTSSM_CONFIG_IDLE = 0xa, + LTSSM_RECOVERY_RCVR_LOCK = 0xb, + LTSSM_RECOVERY_SPEED = 0xc, + LTSSM_RECOVERY_RCVR_CFG = 0xd, + LTSSM_RECOVERY_IDLE = 0xe, + LTSSM_L0 = 0x10, + LTSSM_RX_L0S_ENTRY = 0x11, + LTSSM_RX_L0S_IDLE = 0x12, + LTSSM_RX_L0S_FTS = 0x13, + LTSSM_TX_L0S_ENTRY = 0x14, + LTSSM_TX_L0S_IDLE = 0x15, + LTSSM_TX_L0S_FTS = 0x16, + LTSSM_L1_ENTRY = 0x17, + LTSSM_L1_IDLE = 0x18, + LTSSM_L2_IDLE = 0x19, + LTSSM_L2_TRANSMIT_WAKE = 0x1a, + LTSSM_DISABLED = 0x20, + LTSSM_LOOPBACK_ENTRY_MASTER = 0x21, + LTSSM_LOOPBACK_ACTIVE_MASTER = 0x22, + LTSSM_LOOPBACK_EXIT_MASTER = 0x23, + LTSSM_LOOPBACK_ENTRY_SLAVE = 0x24, + LTSSM_LOOPBACK_ACTIVE_SLAVE = 0x25, + LTSSM_LOOPBACK_EXIT_SLAVE = 0x26, + LTSSM_HOT_RESET = 0x27, + LTSSM_RECOVERY_EQUALIZATION_PHASE0 = 0x28, + LTSSM_RECOVERY_EQUALIZATION_PHASE1 = 0x29, + LTSSM_RECOVERY_EQUALIZATION_PHASE2 = 0x2a, + LTSSM_RECOVERY_EQUALIZATION_PHASE3 = 0x2b, +}; + /* PCIe core controller registers */ #define CTRL_CORE_BASE_ADDR 0x18000 #define CTRL_CONFIG_REG (CTRL_CORE_BASE_ADDR + 0x0) @@ -248,13 +289,35 @@ static inline u32 advk_readl(struct advk return readl(pcie->base + reg); }
-static int advk_pcie_link_up(struct advk_pcie *pcie) +static u8 advk_pcie_ltssm_state(struct advk_pcie *pcie) { - u32 val, ltssm_state; + u32 val; + u8 ltssm_state;
val = advk_readl(pcie, CFG_REG); ltssm_state = (val >> LTSSM_SHIFT) & LTSSM_MASK; - return ltssm_state >= LTSSM_L0; + return ltssm_state; +} + +static inline bool advk_pcie_link_up(struct advk_pcie *pcie) +{ + /* check if LTSSM is in normal operation - some L* state */ + u8 ltssm_state = advk_pcie_ltssm_state(pcie); + return ltssm_state >= LTSSM_L0 && ltssm_state < LTSSM_DISABLED; +} + +static inline bool advk_pcie_link_training(struct advk_pcie *pcie) +{ + /* + * According to PCIe Base specification 3.0, Table 4-14: Link + * Status Mapped to the LTSSM is Link Training mapped to LTSSM + * Configuration and Recovery states. + */ + u8 ltssm_state = advk_pcie_ltssm_state(pcie); + return ((ltssm_state >= LTSSM_CONFIG_LINKWIDTH_START && + ltssm_state < LTSSM_L0) || + (ltssm_state >= LTSSM_RECOVERY_EQUALIZATION_PHASE0 && + ltssm_state <= LTSSM_RECOVERY_EQUALIZATION_PHASE3)); }
static int advk_pcie_wait_for_link(struct advk_pcie *pcie)
From: Marek Behún marek.behun@nic.cz
commit 823868fceae3bac07cf5eccb128d6916e7a5ae9d upstream.
This is a cleanup and fix of the patch by Ken Ma make@marvell.com.
Fix the mpp definitions according to newest revision of the specification: - northbridge: fix pmic1 gpio number to 7 fix pmic0 gpio number to 6 - southbridge split pcie1 group bit mask to BIT(5) and BIT(9) fix ptp group bit mask to BIT(11) | BIT(12) | BIT(13) add smi group with bit mask BIT(4)
[gregory: split the pcie group in 2, as at hardware level they can be configured separately] Signed-off-by: Marek Behún marek.behun@nic.cz Signed-off-by: Gregory CLEMENT gregory.clement@bootlin.com Tested-by: Miquel Raynal miquel.raynal@bootlin.com Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/devicetree/bindings/pinctrl/marvell,armada-37xx-pinctrl.txt | 18 +++++++--- drivers/pinctrl/mvebu/pinctrl-armada-37xx.c | 10 +++-- 2 files changed, 19 insertions(+), 9 deletions(-)
--- a/Documentation/devicetree/bindings/pinctrl/marvell,armada-37xx-pinctrl.txt +++ b/Documentation/devicetree/bindings/pinctrl/marvell,armada-37xx-pinctrl.txt @@ -58,11 +58,11 @@ group pwm3 - functions pwm, gpio
group pmic1 - - pin 17 + - pin 7 - functions pmic, gpio
group pmic0 - - pin 16 + - pin 6 - functions pmic, gpio
group i2c2 @@ -112,17 +112,25 @@ group usb2_drvvbus1 - functions drvbus, gpio
group sdio_sb - - pins 60-64 + - pins 60-65 - functions sdio, gpio
group rgmii - - pins 42-55 + - pins 42-53 - functions mii, gpio
group pcie1 - - pins 39-40 + - pins 39 + - functions pcie, gpio + +group pcie1_clkreq + - pins 40 - functions pcie, gpio
+group smi + - pins 54-55 + - functions smi, gpio + group ptp - pins 56-58 - functions ptp, gpio --- a/drivers/pinctrl/mvebu/pinctrl-armada-37xx.c +++ b/drivers/pinctrl/mvebu/pinctrl-armada-37xx.c @@ -157,8 +157,8 @@ static struct armada_37xx_pin_group arma PIN_GRP_GPIO("pwm1", 12, 1, BIT(4), "pwm"), PIN_GRP_GPIO("pwm2", 13, 1, BIT(5), "pwm"), PIN_GRP_GPIO("pwm3", 14, 1, BIT(6), "pwm"), - PIN_GRP_GPIO("pmic1", 17, 1, BIT(7), "pmic"), - PIN_GRP_GPIO("pmic0", 16, 1, BIT(8), "pmic"), + PIN_GRP_GPIO("pmic1", 7, 1, BIT(7), "pmic"), + PIN_GRP_GPIO("pmic0", 6, 1, BIT(8), "pmic"), PIN_GRP_GPIO("i2c2", 2, 2, BIT(9), "i2c"), PIN_GRP_GPIO("i2c1", 0, 2, BIT(10), "i2c"), PIN_GRP_GPIO("spi_cs1", 17, 1, BIT(12), "spi"), @@ -182,8 +182,10 @@ static struct armada_37xx_pin_group arma PIN_GRP_GPIO("usb2_drvvbus1", 1, 1, BIT(1), "drvbus"), PIN_GRP_GPIO("sdio_sb", 24, 6, BIT(2), "sdio"), PIN_GRP_GPIO("rgmii", 6, 12, BIT(3), "mii"), - PIN_GRP_GPIO("pcie1", 3, 2, BIT(4), "pcie"), - PIN_GRP_GPIO("ptp", 20, 3, BIT(5), "ptp"), + PIN_GRP_GPIO("smi", 18, 2, BIT(4), "smi"), + PIN_GRP_GPIO("pcie1", 3, 1, BIT(5), "pcie"), + PIN_GRP_GPIO("pcie1_clkreq", 4, 1, BIT(9), "pcie"), + PIN_GRP_GPIO("ptp", 20, 3, BIT(11) | BIT(12) | BIT(13), "ptp"), PIN_GRP("ptp_clk", 21, 1, BIT(6), "ptp", "mii"), PIN_GRP("ptp_trig", 22, 1, BIT(7), "ptp", "mii"), PIN_GRP_GPIO_3("mii_col", 23, 1, BIT(8) | BIT(14), 0, BIT(8), BIT(14),
From: Gregory CLEMENT gregory.clement@bootlin.com
commit 4d98fbaacd79a82f408febb66a9c42fe42361b16 upstream.
Declare the PCIe1 Wakeup which was initially missing.
Signed-off-by: Gregory CLEMENT gregory.clement@bootlin.com Tested-by: Miquel Raynal miquel.raynal@bootlin.com Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pinctrl/mvebu/pinctrl-armada-37xx.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/pinctrl/mvebu/pinctrl-armada-37xx.c +++ b/drivers/pinctrl/mvebu/pinctrl-armada-37xx.c @@ -185,6 +185,7 @@ static struct armada_37xx_pin_group arma PIN_GRP_GPIO("smi", 18, 2, BIT(4), "smi"), PIN_GRP_GPIO("pcie1", 3, 1, BIT(5), "pcie"), PIN_GRP_GPIO("pcie1_clkreq", 4, 1, BIT(9), "pcie"), + PIN_GRP_GPIO("pcie1_wakeup", 5, 1, BIT(10), "pcie"), PIN_GRP_GPIO("ptp", 20, 3, BIT(11) | BIT(12) | BIT(13), "ptp"), PIN_GRP("ptp_clk", 21, 1, BIT(6), "ptp", "mii"), PIN_GRP("ptp_trig", 22, 1, BIT(7), "ptp", "mii"),
From: "Marek Beh�n" kabel@kernel.org
commit baf8d6899b1e8906dc076ef26cc633e96a8bb0c3 upstream.
The PWM pins on North Bridge on Armada 37xx can be configured into PWM or GPIO functions. When in PWM function, each pin can also be configured to drive low on 0 and tri-state on 1 (LED mode).
The current definitions handle this by declaring two pin groups for each pin: - group "pwmN" with functions "pwm" and "gpio" - group "ledN_od" ("od" for open drain) with functions "led" and "gpio"
This is semantically incorrect. The correct definition for each pin should be one group with three functions: "pwm", "led" and "gpio".
Change the "pwmN" groups to support "led" function.
Remove "ledN_od" groups. This cannot break backwards compatibility with older device trees: no device tree uses it since there is no PWM driver for this SOC yet. Also "ledN_od" groups are not even documented.
Fixes: b835d6953009 ("pinctrl: armada-37xx: swap polarity on LED group") Signed-off-by: Marek Behún kabel@kernel.org Acked-by: Rob Herring robh@kernel.org Link: https://lore.kernel.org/r/20210719112938.27594-1-kabel@kernel.org Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/devicetree/bindings/pinctrl/marvell,armada-37xx-pinctrl.txt | 8 ++-- drivers/pinctrl/mvebu/pinctrl-armada-37xx.c | 17 ++++------ 2 files changed, 12 insertions(+), 13 deletions(-)
--- a/Documentation/devicetree/bindings/pinctrl/marvell,armada-37xx-pinctrl.txt +++ b/Documentation/devicetree/bindings/pinctrl/marvell,armada-37xx-pinctrl.txt @@ -43,19 +43,19 @@ group emmc_nb
group pwm0 - pin 11 (GPIO1-11) - - functions pwm, gpio + - functions pwm, led, gpio
group pwm1 - pin 12 - - functions pwm, gpio + - functions pwm, led, gpio
group pwm2 - pin 13 - - functions pwm, gpio + - functions pwm, led, gpio
group pwm3 - pin 14 - - functions pwm, gpio + - functions pwm, led, gpio
group pmic1 - pin 7 --- a/drivers/pinctrl/mvebu/pinctrl-armada-37xx.c +++ b/drivers/pinctrl/mvebu/pinctrl-armada-37xx.c @@ -153,10 +153,14 @@ static struct armada_37xx_pin_group arma PIN_GRP_GPIO("jtag", 20, 5, BIT(0), "jtag"), PIN_GRP_GPIO("sdio0", 8, 3, BIT(1), "sdio"), PIN_GRP_GPIO("emmc_nb", 27, 9, BIT(2), "emmc"), - PIN_GRP_GPIO("pwm0", 11, 1, BIT(3), "pwm"), - PIN_GRP_GPIO("pwm1", 12, 1, BIT(4), "pwm"), - PIN_GRP_GPIO("pwm2", 13, 1, BIT(5), "pwm"), - PIN_GRP_GPIO("pwm3", 14, 1, BIT(6), "pwm"), + PIN_GRP_GPIO_3("pwm0", 11, 1, BIT(3) | BIT(20), 0, BIT(20), BIT(3), + "pwm", "led"), + PIN_GRP_GPIO_3("pwm1", 12, 1, BIT(4) | BIT(21), 0, BIT(21), BIT(4), + "pwm", "led"), + PIN_GRP_GPIO_3("pwm2", 13, 1, BIT(5) | BIT(22), 0, BIT(22), BIT(5), + "pwm", "led"), + PIN_GRP_GPIO_3("pwm3", 14, 1, BIT(6) | BIT(23), 0, BIT(23), BIT(6), + "pwm", "led"), PIN_GRP_GPIO("pmic1", 7, 1, BIT(7), "pmic"), PIN_GRP_GPIO("pmic0", 6, 1, BIT(8), "pmic"), PIN_GRP_GPIO("i2c2", 2, 2, BIT(9), "i2c"), @@ -170,11 +174,6 @@ static struct armada_37xx_pin_group arma PIN_GRP_EXTRA("uart2", 9, 2, BIT(1) | BIT(13) | BIT(14) | BIT(19), BIT(1) | BIT(13) | BIT(14), BIT(1) | BIT(19), 18, 2, "gpio", "uart"), - PIN_GRP_GPIO_2("led0_od", 11, 1, BIT(20), BIT(20), 0, "led"), - PIN_GRP_GPIO_2("led1_od", 12, 1, BIT(21), BIT(21), 0, "led"), - PIN_GRP_GPIO_2("led2_od", 13, 1, BIT(22), BIT(22), 0, "led"), - PIN_GRP_GPIO_2("led3_od", 14, 1, BIT(23), BIT(23), 0, "led"), - };
static struct armada_37xx_pin_group armada_37xx_sb_groups[] = {
From: Miquel Raynal miquel.raynal@bootlin.com
commit a5470af981a0cc14a650af8da5186668971a4fc8 upstream.
One pin can be muxed as PCIe endpoint card reset.
Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Signed-off-by: Gregory CLEMENT gregory.clement@bootlin.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/marvell/armada-37xx.dtsi | 9 +++++++++ 1 file changed, 9 insertions(+)
--- a/arch/arm64/boot/dts/marvell/armada-37xx.dtsi +++ b/arch/arm64/boot/dts/marvell/armada-37xx.dtsi @@ -239,6 +239,15 @@ function = "mii"; };
+ pcie_reset_pins: pcie-reset-pins { + groups = "pcie1"; + function = "pcie"; + }; + + pcie_clkreq_pins: pcie-clkreq-pins { + groups = "pcie1_clkreq"; + function = "pcie"; + }; };
eth0: ethernet@30000 {
From: Marek Behún marek.behun@nic.cz
commit 715878016984b2617f6c1f177c50039e12e7bd5b upstream.
We found out that we are unable to control the PERST# signal via the default pin dedicated to be PERST# pin (GPIO2[3] pin) on A3700 SOC when this pin is in EP_PCIE1_Resetn mode. There is a register in the PCIe register space called PERSTN_GPIO_EN (D0088004[3]), but changing the value of this register does not change the pin output when measuring with voltmeter.
We do not know if this is a bug in the SOC, or if it works only when PCIe controller is in a certain state.
Commit f4c7d053d7f7 ("PCI: aardvark: Wait for endpoint to be ready before training link") says that when this pin changes pinctrl mode from EP_PCIE1_Resetn to GPIO, the PERST# signal is asserted for a brief moment.
So currently the situation is that on A3700 boards the PERST# signal is asserted in U-Boot (because the code in U-Boot issues reset via this pin via GPIO mode), and then in Linux by the obscure and undocumented mechanism described by the above mentioned commit.
We want to issue PERST# signal in a known way, therefore this patch changes the pcie_reset_pin function from "pcie" to "gpio" and adds the reset-gpios property to the PCIe node in device tree files of EspressoBin and Armada 3720 Dev Board (Turris Mox device tree already has this property and uDPU does not have a PCIe port).
Signed-off-by: Marek Behún marek.behun@nic.cz Cc: Remi Pommarel repk@triplefau.lt Tested-by: Tomasz Maciej Nowak tmn505@gmail.com Acked-by: Thomas Petazzoni thomas.petazzoni@bootlin.com Signed-off-by: Gregory CLEMENT gregory.clement@bootlin.com Signed-off-by: Marek Behún kabel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/marvell/armada-3720-db.dts | 3 +++ arch/arm64/boot/dts/marvell/armada-3720-espressobin.dts | 3 +++ arch/arm64/boot/dts/marvell/armada-37xx.dtsi | 2 +- 3 files changed, 7 insertions(+), 1 deletion(-)
--- a/arch/arm64/boot/dts/marvell/armada-3720-db.dts +++ b/arch/arm64/boot/dts/marvell/armada-3720-db.dts @@ -155,6 +155,9 @@
/* CON15(V2.0)/CON17(V1.4) : PCIe / CON15(V2.0)/CON12(V1.4) :mini-PCIe */ &pcie0 { + pinctrl-names = "default"; + pinctrl-0 = <&pcie_reset_pins &pcie_clkreq_pins>; + reset-gpios = <&gpiosb 3 GPIO_ACTIVE_LOW>; status = "okay"; };
--- a/arch/arm64/boot/dts/marvell/armada-3720-espressobin.dts +++ b/arch/arm64/boot/dts/marvell/armada-3720-espressobin.dts @@ -82,6 +82,9 @@
/* J9 */ &pcie0 { + pinctrl-names = "default"; + pinctrl-0 = <&pcie_reset_pins &pcie_clkreq_pins>; + reset-gpios = <&gpiosb 3 GPIO_ACTIVE_LOW>; status = "okay"; };
--- a/arch/arm64/boot/dts/marvell/armada-37xx.dtsi +++ b/arch/arm64/boot/dts/marvell/armada-37xx.dtsi @@ -241,7 +241,7 @@
pcie_reset_pins: pcie-reset-pins { groups = "pcie1"; - function = "pcie"; + function = "gpio"; };
pcie_clkreq_pins: pcie-clkreq-pins {
From: Nadav Amit namit@vmware.com
commit a4a118f2eead1d6c49e00765de89878288d4b890 upstream.
When __unmap_hugepage_range() calls to huge_pmd_unshare() succeed, a TLB flush is missing. This TLB flush must be performed before releasing the i_mmap_rwsem, in order to prevent an unshared PMDs page from being released and reused before the TLB flush took place.
Arguably, a comprehensive solution would use mmu_gather interface to batch the TLB flushes and the PMDs page release, however it is not an easy solution: (1) try_to_unmap_one() and try_to_migrate_one() also call huge_pmd_unshare() and they cannot use the mmu_gather interface; and (2) deferring the release of the page reference for the PMDs page until after i_mmap_rwsem is dropeed can confuse huge_pmd_unshare() into thinking PMDs are shared when they are not.
Fix __unmap_hugepage_range() by adding the missing TLB flush, and forcing a flush when unshare is successful.
Fixes: 24669e58477e ("hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages)" # 3.6 Signed-off-by: Nadav Amit namit@vmware.com Reviewed-by: Mike Kravetz mike.kravetz@oracle.com Cc: Aneesh Kumar K.V aneesh.kumar@linux.vnet.ibm.com Cc: KAMEZAWA Hiroyuki kamezawa.hiroyu@jp.fujitsu.com Cc: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/include/asm/tlb.h | 8 ++++++++ arch/ia64/include/asm/tlb.h | 10 ++++++++++ arch/s390/include/asm/tlb.h | 14 ++++++++++++++ arch/sh/include/asm/tlb.h | 10 ++++++++++ arch/um/include/asm/tlb.h | 12 ++++++++++++ include/asm-generic/tlb.h | 2 ++ mm/hugetlb.c | 19 +++++++++++++++++++ mm/memory.c | 10 ++++++++++ 8 files changed, 85 insertions(+)
--- a/arch/arm/include/asm/tlb.h +++ b/arch/arm/include/asm/tlb.h @@ -280,6 +280,14 @@ tlb_remove_pmd_tlb_entry(struct mmu_gath tlb_add_flush(tlb, addr); }
+static inline void +tlb_flush_pmd_range(struct mmu_gather *tlb, unsigned long address, + unsigned long size) +{ + tlb_add_flush(tlb, address); + tlb_add_flush(tlb, address + size - PMD_SIZE); +} + #define pte_free_tlb(tlb, ptep, addr) __pte_free_tlb(tlb, ptep, addr) #define pmd_free_tlb(tlb, pmdp, addr) __pmd_free_tlb(tlb, pmdp, addr) #define pud_free_tlb(tlb, pudp, addr) pud_free((tlb)->mm, pudp) --- a/arch/ia64/include/asm/tlb.h +++ b/arch/ia64/include/asm/tlb.h @@ -269,6 +269,16 @@ __tlb_remove_tlb_entry (struct mmu_gathe tlb->end_addr = address + PAGE_SIZE; }
+static inline void +tlb_flush_pmd_range(struct mmu_gather *tlb, unsigned long address, + unsigned long size) +{ + if (tlb->start_addr > address) + tlb->start_addr = address; + if (tlb->end_addr < address + size) + tlb->end_addr = address + size; +} + #define tlb_migrate_finish(mm) platform_tlb_migrate_finish(mm)
#define tlb_start_vma(tlb, vma) do { } while (0) --- a/arch/s390/include/asm/tlb.h +++ b/arch/s390/include/asm/tlb.h @@ -116,6 +116,20 @@ static inline void tlb_remove_page_size( return tlb_remove_page(tlb, page); }
+static inline void tlb_flush_pmd_range(struct mmu_gather *tlb, + unsigned long address, unsigned long size) +{ + /* + * the range might exceed the original range that was provided to + * tlb_gather_mmu(), so we need to update it despite the fact it is + * usually not updated. + */ + if (tlb->start > address) + tlb->start = address; + if (tlb->end < address + size) + tlb->end = address + size; +} + /* * pte_free_tlb frees a pte table and clears the CRSTE for the * page table from the tlb. --- a/arch/sh/include/asm/tlb.h +++ b/arch/sh/include/asm/tlb.h @@ -127,6 +127,16 @@ static inline void tlb_remove_page_size( return tlb_remove_page(tlb, page); }
+static inline void +tlb_flush_pmd_range(struct mmu_gather *tlb, unsigned long address, + unsigned long size) +{ + if (tlb->start > address) + tlb->start = address; + if (tlb->end < address + size) + tlb->end = address + size; +} + #define tlb_remove_check_page_size_change tlb_remove_check_page_size_change static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb, unsigned int page_size) --- a/arch/um/include/asm/tlb.h +++ b/arch/um/include/asm/tlb.h @@ -130,6 +130,18 @@ static inline void tlb_remove_page_size( return tlb_remove_page(tlb, page); }
+static inline void +tlb_flush_pmd_range(struct mmu_gather *tlb, unsigned long address, + unsigned long size) +{ + tlb->need_flush = 1; + + if (tlb->start > address) + tlb->start = address; + if (tlb->end < address + size) + tlb->end = address + size; +} + /** * tlb_remove_tlb_entry - remember a pte unmapping for later tlb invalidation. * --- a/include/asm-generic/tlb.h +++ b/include/asm-generic/tlb.h @@ -117,6 +117,8 @@ void arch_tlb_gather_mmu(struct mmu_gath void tlb_flush_mmu(struct mmu_gather *tlb); void arch_tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end, bool force); +void tlb_flush_pmd_range(struct mmu_gather *tlb, unsigned long address, + unsigned long size); extern bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page, int page_size);
--- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -3386,6 +3386,7 @@ void __unmap_hugepage_range(struct mmu_g unsigned long sz = huge_page_size(h); const unsigned long mmun_start = start; /* For mmu_notifiers */ const unsigned long mmun_end = end; /* For mmu_notifiers */ + bool force_flush = false;
WARN_ON(!is_vm_hugetlb_page(vma)); BUG_ON(start & ~huge_page_mask(h)); @@ -3407,6 +3408,8 @@ void __unmap_hugepage_range(struct mmu_g ptl = huge_pte_lock(h, mm, ptep); if (huge_pmd_unshare(mm, &address, ptep)) { spin_unlock(ptl); + tlb_flush_pmd_range(tlb, address & PUD_MASK, PUD_SIZE); + force_flush = true; continue; }
@@ -3463,6 +3466,22 @@ void __unmap_hugepage_range(struct mmu_g } mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end); tlb_end_vma(tlb, vma); + + /* + * If we unshared PMDs, the TLB flush was not recorded in mmu_gather. We + * could defer the flush until now, since by holding i_mmap_rwsem we + * guaranteed that the last refernece would not be dropped. But we must + * do the flushing before we return, as otherwise i_mmap_rwsem will be + * dropped and the last reference to the shared PMDs page might be + * dropped as well. + * + * In theory we could defer the freeing of the PMD pages as well, but + * huge_pmd_unshare() relies on the exact page_count for the PMD page to + * detect sharing, so we cannot defer the release of the page either. + * Instead, do flush now. + */ + if (force_flush) + tlb_flush_mmu(tlb); }
void __unmap_hugepage_range_final(struct mmu_gather *tlb, --- a/mm/memory.c +++ b/mm/memory.c @@ -335,6 +335,16 @@ bool __tlb_remove_page_size(struct mmu_g return false; }
+void tlb_flush_pmd_range(struct mmu_gather *tlb, unsigned long address, + unsigned long size) +{ + if (tlb->page_size != 0 && tlb->page_size != PMD_SIZE) + tlb_flush_mmu(tlb); + + tlb->page_size = PMD_SIZE; + tlb->start = min(tlb->start, address); + tlb->end = max(tlb->end, address + size); +} #endif /* HAVE_GENERIC_MMU_GATHER */
#ifdef CONFIG_HAVE_RCU_TABLE_FREE
From: David Hildenbrand david@redhat.com
commit c1e63117711977cc4295b2ce73de29dd17066c82 upstream.
To clear a user buffer we cannot simply use memset, we have to use clear_user(). With a virtio-mem device that registers a vmcore_cb and has some logically unplugged memory inside an added Linux memory block, I can easily trigger a BUG by copying the vmcore via "cp":
systemd[1]: Starting Kdump Vmcore Save Service... kdump[420]: Kdump is using the default log level(3). kdump[453]: saving to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/ kdump[458]: saving vmcore-dmesg.txt to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/ kdump[465]: saving vmcore-dmesg.txt complete kdump[467]: saving vmcore BUG: unable to handle page fault for address: 00007f2374e01000 #PF: supervisor write access in kernel mode #PF: error_code(0x0003) - permissions violation PGD 7a523067 P4D 7a523067 PUD 7a528067 PMD 7a525067 PTE 800000007048f867 Oops: 0003 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 468 Comm: cp Not tainted 5.15.0+ #6 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014 RIP: 0010:read_from_oldmem.part.0.cold+0x1d/0x86 Code: ff ff ff e8 05 ff fe ff e9 b9 e9 7f ff 48 89 de 48 c7 c7 38 3b 60 82 e8 f1 fe fe ff 83 fd 08 72 3c 49 8d 7d 08 4c 89 e9 89 e8 <49> c7 45 00 00 00 00 00 49 c7 44 05 f8 00 00 00 00 48 83 e7 f81 RSP: 0018:ffffc9000073be08 EFLAGS: 00010212 RAX: 0000000000001000 RBX: 00000000002fd000 RCX: 00007f2374e01000 RDX: 0000000000000001 RSI: 00000000ffffdfff RDI: 00007f2374e01008 RBP: 0000000000001000 R08: 0000000000000000 R09: ffffc9000073bc50 R10: ffffc9000073bc48 R11: ffffffff829461a8 R12: 000000000000f000 R13: 00007f2374e01000 R14: 0000000000000000 R15: ffff88807bd421e8 FS: 00007f2374e12140(0000) GS:ffff88807f000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f2374e01000 CR3: 000000007a4aa000 CR4: 0000000000350eb0 Call Trace: read_vmcore+0x236/0x2c0 proc_reg_read+0x55/0xa0 vfs_read+0x95/0x190 ksys_read+0x4f/0xc0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae
Some x86-64 CPUs have a CPU feature called "Supervisor Mode Access Prevention (SMAP)", which is used to detect wrong access from the kernel to user buffers like this: SMAP triggers a permissions violation on wrong access. In the x86-64 variant of clear_user(), SMAP is properly handled via clac()+stac().
To fix, properly use clear_user() when we're dealing with a user buffer.
Link: https://lkml.kernel.org/r/20211112092750.6921-1-david@redhat.com Fixes: 997c136f518c ("fs/proc/vmcore.c: add hook to read_from_oldmem() to check for non-ram pages") Signed-off-by: David Hildenbrand david@redhat.com Acked-by: Baoquan He bhe@redhat.com Cc: Dave Young dyoung@redhat.com Cc: Baoquan He bhe@redhat.com Cc: Vivek Goyal vgoyal@redhat.com Cc: Philipp Rudo prudo@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/proc/vmcore.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-)
--- a/fs/proc/vmcore.c +++ b/fs/proc/vmcore.c @@ -105,14 +105,19 @@ static ssize_t read_from_oldmem(char *bu nr_bytes = count;
/* If pfn is not ram, return zeros for sparse dump files */ - if (pfn_is_ram(pfn) == 0) - memset(buf, 0, nr_bytes); - else { + if (pfn_is_ram(pfn) == 0) { + tmp = 0; + if (!userbuf) + memset(buf, 0, nr_bytes); + else if (clear_user(buf, nr_bytes)) + tmp = -EFAULT; + } else { tmp = copy_oldmem_page(pfn, buf, nr_bytes, offset, userbuf); - if (tmp < 0) - return tmp; } + if (tmp < 0) + return tmp; + *ppos += nr_bytes; count -= nr_bytes; buf += nr_bytes;
From: Lin Ma linma@zju.edu.cn
commit 48b71a9e66c2eab60564b1b1c85f4928ed04e406 upstream.
There are two sites that calls queue_work() after the destroy_workqueue() and lead to possible UAF.
The first site is nci_send_cmd(), which can happen after the nci_close_device as below
nfcmrvl_nci_unregister_dev | nfc_genl_dev_up nci_close_device | flush_workqueue | del_timer_sync | nci_unregister_device | nfc_get_device destroy_workqueue | nfc_dev_up nfc_unregister_device | nci_dev_up device_del | nci_open_device | __nci_request | nci_send_cmd | queue_work !!!
Another site is nci_cmd_timer, awaked by the nci_cmd_work from the nci_send_cmd.
... | ... nci_unregister_device | queue_work destroy_workqueue | nfc_unregister_device | ... device_del | nci_cmd_work | mod_timer | ... | nci_cmd_timer | queue_work !!!
For the above two UAF, the root cause is that the nfc_dev_up can race between the nci_unregister_device routine. Therefore, this patch introduce NCI_UNREG flag to easily eliminate the possible race. In addition, the mutex_lock in nci_close_device can act as a barrier.
Signed-off-by: Lin Ma linma@zju.edu.cn Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation") Reviewed-by: Jakub Kicinski kuba@kernel.org Reviewed-by: Krzysztof Kozlowski krzysztof.kozlowski@canonical.com Link: https://lore.kernel.org/r/20211116152732.19238-1-linma@zju.edu.cn Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/nfc/nci_core.h | 1 + net/nfc/nci/core.c | 19 +++++++++++++++++-- 2 files changed, 18 insertions(+), 2 deletions(-)
--- a/include/net/nfc/nci_core.h +++ b/include/net/nfc/nci_core.h @@ -42,6 +42,7 @@ enum nci_flag { NCI_UP, NCI_DATA_EXCHANGE, NCI_DATA_EXCHANGE_TO, + NCI_UNREG, };
/* NCI device states */ --- a/net/nfc/nci/core.c +++ b/net/nfc/nci/core.c @@ -485,6 +485,11 @@ static int nci_open_device(struct nci_de
mutex_lock(&ndev->req_lock);
+ if (test_bit(NCI_UNREG, &ndev->flags)) { + rc = -ENODEV; + goto done; + } + if (test_bit(NCI_UP, &ndev->flags)) { rc = -EALREADY; goto done; @@ -548,6 +553,10 @@ done: static int nci_close_device(struct nci_dev *ndev) { nci_req_cancel(ndev, ENODEV); + + /* This mutex needs to be held as a barrier for + * caller nci_unregister_device + */ mutex_lock(&ndev->req_lock);
if (!test_and_clear_bit(NCI_UP, &ndev->flags)) { @@ -585,8 +594,8 @@ static int nci_close_device(struct nci_d /* Flush cmd wq */ flush_workqueue(ndev->cmd_wq);
- /* Clear flags */ - ndev->flags = 0; + /* Clear flags except NCI_UNREG */ + ndev->flags &= BIT(NCI_UNREG);
mutex_unlock(&ndev->req_lock);
@@ -1270,6 +1279,12 @@ void nci_unregister_device(struct nci_de { struct nci_conn_info *conn_info, *n;
+ /* This set_bit is not protected with specialized barrier, + * However, it is fine because the mutex_lock(&ndev->req_lock); + * in nci_close_device() will help to emit one. + */ + set_bit(NCI_UNREG, &ndev->flags); + nci_close_device(ndev);
destroy_workqueue(ndev->cmd_wq);
From: Miklos Szeredi mszeredi@redhat.com
commit 473441720c8616dfaf4451f9c7ea14f0eb5e5d65 upstream.
Checking buf->flags should be done before the pipe_buf_release() is called on the pipe buffer, since releasing the buffer might modify the flags.
This is exactly what page_cache_pipe_buf_release() does, and which results in the same VM_BUG_ON_PAGE(PageLRU(page)) that the original patch was trying to fix.
Reported-by: Justin Forbes jmforbes@linuxtx.org Fixes: 712a951025c0 ("fuse: fix page stealing") Cc: stable@vger.kernel.org # v2.6.35 Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/fuse/dev.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -897,17 +897,17 @@ static int fuse_try_move_page(struct fus goto out_put_old; }
+ get_page(newpage); + + if (!(buf->flags & PIPE_BUF_FLAG_LRU)) + lru_cache_add_file(newpage); + /* * Release while we have extra ref on stolen page. Otherwise * anon_pipe_buf_release() might think the page can be reused. */ pipe_buf_release(cs->pipe, buf);
- get_page(newpage); - - if (!(buf->flags & PIPE_BUF_FLAG_LRU)) - lru_cache_add_file(newpage); - err = 0; spin_lock(&cs->req->waitq.lock); if (test_bit(FR_ABORTED, &cs->req->flags))
From: Juergen Gross jgross@suse.com
commit 629a5d87e26fe96bcaab44cbb81f5866af6f7008 upstream.
Sync include/xen/interface/io/ring.h with Xen's newest version in order to get the RING_COPY_RESPONSE() and RING_RESPONSE_PROD_OVERFLOW() macros.
Note that this will correct the wrong license info by adding the missing original copyright notice.
Signed-off-by: Juergen Gross jgross@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/xen/interface/io/ring.h | 307 +++++++++++++++++++++------------------- 1 file changed, 165 insertions(+), 142 deletions(-)
--- a/include/xen/interface/io/ring.h +++ b/include/xen/interface/io/ring.h @@ -1,21 +1,53 @@ -/* SPDX-License-Identifier: GPL-2.0 */ /****************************************************************************** * ring.h * * Shared producer-consumer ring macros. * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * * Tim Deegan and Andrew Warfield November 2004. */
#ifndef __XEN_PUBLIC_IO_RING_H__ #define __XEN_PUBLIC_IO_RING_H__
+/* + * When #include'ing this header, you need to provide the following + * declaration upfront: + * - standard integers types (uint8_t, uint16_t, etc) + * They are provided by stdint.h of the standard headers. + * + * In addition, if you intend to use the FLEX macros, you also need to + * provide the following, before invoking the FLEX macros: + * - size_t + * - memcpy + * - grant_ref_t + * These declarations are provided by string.h of the standard headers, + * and grant_table.h from the Xen public headers. + */ + #include <xen/interface/grant_table.h>
typedef unsigned int RING_IDX;
/* Round a 32-bit unsigned constant down to the nearest power of two. */ -#define __RD2(_x) (((_x) & 0x00000002) ? 0x2 : ((_x) & 0x1)) +#define __RD2(_x) (((_x) & 0x00000002) ? 0x2 : ((_x) & 0x1)) #define __RD4(_x) (((_x) & 0x0000000c) ? __RD2((_x)>>2)<<2 : __RD2(_x)) #define __RD8(_x) (((_x) & 0x000000f0) ? __RD4((_x)>>4)<<4 : __RD4(_x)) #define __RD16(_x) (((_x) & 0x0000ff00) ? __RD8((_x)>>8)<<8 : __RD8(_x)) @@ -27,82 +59,79 @@ typedef unsigned int RING_IDX; * A ring contains as many entries as will fit, rounded down to the nearest * power of two (so we can mask with (size-1) to loop around). */ -#define __CONST_RING_SIZE(_s, _sz) \ - (__RD32(((_sz) - offsetof(struct _s##_sring, ring)) / \ - sizeof(((struct _s##_sring *)0)->ring[0]))) - +#define __CONST_RING_SIZE(_s, _sz) \ + (__RD32(((_sz) - offsetof(struct _s##_sring, ring)) / \ + sizeof(((struct _s##_sring *)0)->ring[0]))) /* * The same for passing in an actual pointer instead of a name tag. */ -#define __RING_SIZE(_s, _sz) \ - (__RD32(((_sz) - (long)&(_s)->ring + (long)(_s)) / sizeof((_s)->ring[0]))) +#define __RING_SIZE(_s, _sz) \ + (__RD32(((_sz) - (long)(_s)->ring + (long)(_s)) / sizeof((_s)->ring[0])))
/* * Macros to make the correct C datatypes for a new kind of ring. * * To make a new ring datatype, you need to have two message structures, - * let's say struct request, and struct response already defined. + * let's say request_t, and response_t already defined. * * In a header where you want the ring datatype declared, you then do: * - * DEFINE_RING_TYPES(mytag, struct request, struct response); + * DEFINE_RING_TYPES(mytag, request_t, response_t); * * These expand out to give you a set of types, as you can see below. * The most important of these are: * - * struct mytag_sring - The shared ring. - * struct mytag_front_ring - The 'front' half of the ring. - * struct mytag_back_ring - The 'back' half of the ring. + * mytag_sring_t - The shared ring. + * mytag_front_ring_t - The 'front' half of the ring. + * mytag_back_ring_t - The 'back' half of the ring. * * To initialize a ring in your code you need to know the location and size * of the shared memory area (PAGE_SIZE, for instance). To initialise * the front half: * - * struct mytag_front_ring front_ring; - * SHARED_RING_INIT((struct mytag_sring *)shared_page); - * FRONT_RING_INIT(&front_ring, (struct mytag_sring *)shared_page, - * PAGE_SIZE); + * mytag_front_ring_t front_ring; + * SHARED_RING_INIT((mytag_sring_t *)shared_page); + * FRONT_RING_INIT(&front_ring, (mytag_sring_t *)shared_page, PAGE_SIZE); * * Initializing the back follows similarly (note that only the front * initializes the shared ring): * - * struct mytag_back_ring back_ring; - * BACK_RING_INIT(&back_ring, (struct mytag_sring *)shared_page, - * PAGE_SIZE); + * mytag_back_ring_t back_ring; + * BACK_RING_INIT(&back_ring, (mytag_sring_t *)shared_page, PAGE_SIZE); */
-#define DEFINE_RING_TYPES(__name, __req_t, __rsp_t) \ - \ -/* Shared ring entry */ \ -union __name##_sring_entry { \ - __req_t req; \ - __rsp_t rsp; \ -}; \ - \ -/* Shared ring page */ \ -struct __name##_sring { \ - RING_IDX req_prod, req_event; \ - RING_IDX rsp_prod, rsp_event; \ - uint8_t pad[48]; \ - union __name##_sring_entry ring[1]; /* variable-length */ \ -}; \ - \ -/* "Front" end's private variables */ \ -struct __name##_front_ring { \ - RING_IDX req_prod_pvt; \ - RING_IDX rsp_cons; \ - unsigned int nr_ents; \ - struct __name##_sring *sring; \ -}; \ - \ -/* "Back" end's private variables */ \ -struct __name##_back_ring { \ - RING_IDX rsp_prod_pvt; \ - RING_IDX req_cons; \ - unsigned int nr_ents; \ - struct __name##_sring *sring; \ -}; - +#define DEFINE_RING_TYPES(__name, __req_t, __rsp_t) \ + \ +/* Shared ring entry */ \ +union __name##_sring_entry { \ + __req_t req; \ + __rsp_t rsp; \ +}; \ + \ +/* Shared ring page */ \ +struct __name##_sring { \ + RING_IDX req_prod, req_event; \ + RING_IDX rsp_prod, rsp_event; \ + uint8_t __pad[48]; \ + union __name##_sring_entry ring[1]; /* variable-length */ \ +}; \ + \ +/* "Front" end's private variables */ \ +struct __name##_front_ring { \ + RING_IDX req_prod_pvt; \ + RING_IDX rsp_cons; \ + unsigned int nr_ents; \ + struct __name##_sring *sring; \ +}; \ + \ +/* "Back" end's private variables */ \ +struct __name##_back_ring { \ + RING_IDX rsp_prod_pvt; \ + RING_IDX req_cons; \ + unsigned int nr_ents; \ + struct __name##_sring *sring; \ +}; \ + \ /* * Macros for manipulating rings. * @@ -119,105 +148,99 @@ struct __name##_back_ring { \ */
/* Initialising empty rings */ -#define SHARED_RING_INIT(_s) do { \ - (_s)->req_prod = (_s)->rsp_prod = 0; \ - (_s)->req_event = (_s)->rsp_event = 1; \ - memset((_s)->pad, 0, sizeof((_s)->pad)); \ +#define SHARED_RING_INIT(_s) do { \ + (_s)->req_prod = (_s)->rsp_prod = 0; \ + (_s)->req_event = (_s)->rsp_event = 1; \ + (void)memset((_s)->__pad, 0, sizeof((_s)->__pad)); \ } while(0)
-#define FRONT_RING_INIT(_r, _s, __size) do { \ - (_r)->req_prod_pvt = 0; \ - (_r)->rsp_cons = 0; \ - (_r)->nr_ents = __RING_SIZE(_s, __size); \ - (_r)->sring = (_s); \ +#define FRONT_RING_ATTACH(_r, _s, _i, __size) do { \ + (_r)->req_prod_pvt = (_i); \ + (_r)->rsp_cons = (_i); \ + (_r)->nr_ents = __RING_SIZE(_s, __size); \ + (_r)->sring = (_s); \ } while (0)
-#define BACK_RING_INIT(_r, _s, __size) do { \ - (_r)->rsp_prod_pvt = 0; \ - (_r)->req_cons = 0; \ - (_r)->nr_ents = __RING_SIZE(_s, __size); \ - (_r)->sring = (_s); \ -} while (0) +#define FRONT_RING_INIT(_r, _s, __size) FRONT_RING_ATTACH(_r, _s, 0, __size)
-/* Initialize to existing shared indexes -- for recovery */ -#define FRONT_RING_ATTACH(_r, _s, __size) do { \ - (_r)->sring = (_s); \ - (_r)->req_prod_pvt = (_s)->req_prod; \ - (_r)->rsp_cons = (_s)->rsp_prod; \ - (_r)->nr_ents = __RING_SIZE(_s, __size); \ +#define BACK_RING_ATTACH(_r, _s, _i, __size) do { \ + (_r)->rsp_prod_pvt = (_i); \ + (_r)->req_cons = (_i); \ + (_r)->nr_ents = __RING_SIZE(_s, __size); \ + (_r)->sring = (_s); \ } while (0)
-#define BACK_RING_ATTACH(_r, _s, __size) do { \ - (_r)->sring = (_s); \ - (_r)->rsp_prod_pvt = (_s)->rsp_prod; \ - (_r)->req_cons = (_s)->req_prod; \ - (_r)->nr_ents = __RING_SIZE(_s, __size); \ -} while (0) +#define BACK_RING_INIT(_r, _s, __size) BACK_RING_ATTACH(_r, _s, 0, __size)
/* How big is this ring? */ -#define RING_SIZE(_r) \ +#define RING_SIZE(_r) \ ((_r)->nr_ents)
/* Number of free requests (for use on front side only). */ -#define RING_FREE_REQUESTS(_r) \ +#define RING_FREE_REQUESTS(_r) \ (RING_SIZE(_r) - ((_r)->req_prod_pvt - (_r)->rsp_cons))
/* Test if there is an empty slot available on the front ring. * (This is only meaningful from the front. ) */ -#define RING_FULL(_r) \ +#define RING_FULL(_r) \ (RING_FREE_REQUESTS(_r) == 0)
/* Test if there are outstanding messages to be processed on a ring. */ -#define RING_HAS_UNCONSUMED_RESPONSES(_r) \ +#define RING_HAS_UNCONSUMED_RESPONSES(_r) \ ((_r)->sring->rsp_prod - (_r)->rsp_cons)
-#define RING_HAS_UNCONSUMED_REQUESTS(_r) \ - ({ \ - unsigned int req = (_r)->sring->req_prod - (_r)->req_cons; \ - unsigned int rsp = RING_SIZE(_r) - \ - ((_r)->req_cons - (_r)->rsp_prod_pvt); \ - req < rsp ? req : rsp; \ - }) +#define RING_HAS_UNCONSUMED_REQUESTS(_r) ({ \ + unsigned int req = (_r)->sring->req_prod - (_r)->req_cons; \ + unsigned int rsp = RING_SIZE(_r) - \ + ((_r)->req_cons - (_r)->rsp_prod_pvt); \ + req < rsp ? req : rsp; \ +})
/* Direct access to individual ring elements, by index. */ -#define RING_GET_REQUEST(_r, _idx) \ +#define RING_GET_REQUEST(_r, _idx) \ (&((_r)->sring->ring[((_idx) & (RING_SIZE(_r) - 1))].req))
+#define RING_GET_RESPONSE(_r, _idx) \ + (&((_r)->sring->ring[((_idx) & (RING_SIZE(_r) - 1))].rsp)) + /* - * Get a local copy of a request. + * Get a local copy of a request/response. * - * Use this in preference to RING_GET_REQUEST() so all processing is + * Use this in preference to RING_GET_{REQUEST,RESPONSE}() so all processing is * done on a local copy that cannot be modified by the other end. * * Note that https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145 may cause this - * to be ineffective where _req is a struct which consists of only bitfields. + * to be ineffective where dest is a struct which consists of only bitfields. */ -#define RING_COPY_REQUEST(_r, _idx, _req) do { \ - /* Use volatile to force the copy into _req. */ \ - *(_req) = *(volatile typeof(_req))RING_GET_REQUEST(_r, _idx); \ +#define RING_COPY_(type, r, idx, dest) do { \ + /* Use volatile to force the copy into dest. */ \ + *(dest) = *(volatile typeof(dest))RING_GET_##type(r, idx); \ } while (0)
-#define RING_GET_RESPONSE(_r, _idx) \ - (&((_r)->sring->ring[((_idx) & (RING_SIZE(_r) - 1))].rsp)) +#define RING_COPY_REQUEST(r, idx, req) RING_COPY_(REQUEST, r, idx, req) +#define RING_COPY_RESPONSE(r, idx, rsp) RING_COPY_(RESPONSE, r, idx, rsp)
/* Loop termination condition: Would the specified index overflow the ring? */ -#define RING_REQUEST_CONS_OVERFLOW(_r, _cons) \ +#define RING_REQUEST_CONS_OVERFLOW(_r, _cons) \ (((_cons) - (_r)->rsp_prod_pvt) >= RING_SIZE(_r))
/* Ill-behaved frontend determination: Can there be this many requests? */ -#define RING_REQUEST_PROD_OVERFLOW(_r, _prod) \ +#define RING_REQUEST_PROD_OVERFLOW(_r, _prod) \ (((_prod) - (_r)->rsp_prod_pvt) > RING_SIZE(_r))
- -#define RING_PUSH_REQUESTS(_r) do { \ - virt_wmb(); /* back sees requests /before/ updated producer index */ \ - (_r)->sring->req_prod = (_r)->req_prod_pvt; \ +/* Ill-behaved backend determination: Can there be this many responses? */ +#define RING_RESPONSE_PROD_OVERFLOW(_r, _prod) \ + (((_prod) - (_r)->rsp_cons) > RING_SIZE(_r)) + +#define RING_PUSH_REQUESTS(_r) do { \ + virt_wmb(); /* back sees requests /before/ updated producer index */\ + (_r)->sring->req_prod = (_r)->req_prod_pvt; \ } while (0)
-#define RING_PUSH_RESPONSES(_r) do { \ - virt_wmb(); /* front sees responses /before/ updated producer index */ \ - (_r)->sring->rsp_prod = (_r)->rsp_prod_pvt; \ +#define RING_PUSH_RESPONSES(_r) do { \ + virt_wmb(); /* front sees resps /before/ updated producer index */ \ + (_r)->sring->rsp_prod = (_r)->rsp_prod_pvt; \ } while (0)
/* @@ -250,40 +273,40 @@ struct __name##_back_ring { \ * field appropriately. */
-#define RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(_r, _notify) do { \ - RING_IDX __old = (_r)->sring->req_prod; \ - RING_IDX __new = (_r)->req_prod_pvt; \ - virt_wmb(); /* back sees requests /before/ updated producer index */ \ - (_r)->sring->req_prod = __new; \ - virt_mb(); /* back sees new requests /before/ we check req_event */ \ - (_notify) = ((RING_IDX)(__new - (_r)->sring->req_event) < \ - (RING_IDX)(__new - __old)); \ -} while (0) - -#define RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(_r, _notify) do { \ - RING_IDX __old = (_r)->sring->rsp_prod; \ - RING_IDX __new = (_r)->rsp_prod_pvt; \ - virt_wmb(); /* front sees responses /before/ updated producer index */ \ - (_r)->sring->rsp_prod = __new; \ - virt_mb(); /* front sees new responses /before/ we check rsp_event */ \ - (_notify) = ((RING_IDX)(__new - (_r)->sring->rsp_event) < \ - (RING_IDX)(__new - __old)); \ -} while (0) - -#define RING_FINAL_CHECK_FOR_REQUESTS(_r, _work_to_do) do { \ - (_work_to_do) = RING_HAS_UNCONSUMED_REQUESTS(_r); \ - if (_work_to_do) break; \ - (_r)->sring->req_event = (_r)->req_cons + 1; \ - virt_mb(); \ - (_work_to_do) = RING_HAS_UNCONSUMED_REQUESTS(_r); \ -} while (0) - -#define RING_FINAL_CHECK_FOR_RESPONSES(_r, _work_to_do) do { \ - (_work_to_do) = RING_HAS_UNCONSUMED_RESPONSES(_r); \ - if (_work_to_do) break; \ - (_r)->sring->rsp_event = (_r)->rsp_cons + 1; \ - virt_mb(); \ - (_work_to_do) = RING_HAS_UNCONSUMED_RESPONSES(_r); \ +#define RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(_r, _notify) do { \ + RING_IDX __old = (_r)->sring->req_prod; \ + RING_IDX __new = (_r)->req_prod_pvt; \ + virt_wmb(); /* back sees requests /before/ updated producer index */\ + (_r)->sring->req_prod = __new; \ + virt_mb(); /* back sees new requests /before/ we check req_event */ \ + (_notify) = ((RING_IDX)(__new - (_r)->sring->req_event) < \ + (RING_IDX)(__new - __old)); \ +} while (0) + +#define RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(_r, _notify) do { \ + RING_IDX __old = (_r)->sring->rsp_prod; \ + RING_IDX __new = (_r)->rsp_prod_pvt; \ + virt_wmb(); /* front sees resps /before/ updated producer index */ \ + (_r)->sring->rsp_prod = __new; \ + virt_mb(); /* front sees new resps /before/ we check rsp_event */ \ + (_notify) = ((RING_IDX)(__new - (_r)->sring->rsp_event) < \ + (RING_IDX)(__new - __old)); \ +} while (0) + +#define RING_FINAL_CHECK_FOR_REQUESTS(_r, _work_to_do) do { \ + (_work_to_do) = RING_HAS_UNCONSUMED_REQUESTS(_r); \ + if (_work_to_do) break; \ + (_r)->sring->req_event = (_r)->req_cons + 1; \ + virt_mb(); \ + (_work_to_do) = RING_HAS_UNCONSUMED_REQUESTS(_r); \ +} while (0) + +#define RING_FINAL_CHECK_FOR_RESPONSES(_r, _work_to_do) do { \ + (_work_to_do) = RING_HAS_UNCONSUMED_RESPONSES(_r); \ + if (_work_to_do) break; \ + (_r)->sring->rsp_event = (_r)->rsp_cons + 1; \ + virt_mb(); \ + (_work_to_do) = RING_HAS_UNCONSUMED_RESPONSES(_r); \ } while (0)
From: Juergen Gross jgross@suse.com
commit 71b66243f9898d0e54296b4e7035fb33cdcb0707 upstream.
In order to avoid problems in case the backend is modifying a response on the ring page while the frontend has already seen it, just read the response into a local buffer in one go and then operate on that buffer only.
Signed-off-by: Juergen Gross jgross@suse.com Reviewed-by: Jan Beulich jbeulich@suse.com Acked-by: Roger Pau Monné roger.pau@citrix.com Link: https://lore.kernel.org/r/20210730103854.12681-2-jgross@suse.com Signed-off-by: Juergen Gross jgross@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/block/xen-blkfront.c | 35 ++++++++++++++++++----------------- 1 file changed, 18 insertions(+), 17 deletions(-)
--- a/drivers/block/xen-blkfront.c +++ b/drivers/block/xen-blkfront.c @@ -1550,7 +1550,7 @@ static bool blkif_completion(unsigned lo static irqreturn_t blkif_interrupt(int irq, void *dev_id) { struct request *req; - struct blkif_response *bret; + struct blkif_response bret; RING_IDX i, rp; unsigned long flags; struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)dev_id; @@ -1567,8 +1567,9 @@ static irqreturn_t blkif_interrupt(int i for (i = rinfo->ring.rsp_cons; i != rp; i++) { unsigned long id;
- bret = RING_GET_RESPONSE(&rinfo->ring, i); - id = bret->id; + RING_COPY_RESPONSE(&rinfo->ring, i, &bret); + id = bret.id; + /* * The backend has messed up and given us an id that we would * never have given to it (we stamp it up to BLK_RING_SIZE - @@ -1576,39 +1577,39 @@ static irqreturn_t blkif_interrupt(int i */ if (id >= BLK_RING_SIZE(info)) { WARN(1, "%s: response to %s has incorrect id (%ld)\n", - info->gd->disk_name, op_name(bret->operation), id); + info->gd->disk_name, op_name(bret.operation), id); /* We can't safely get the 'struct request' as * the id is busted. */ continue; } req = rinfo->shadow[id].request;
- if (bret->operation != BLKIF_OP_DISCARD) { + if (bret.operation != BLKIF_OP_DISCARD) { /* * We may need to wait for an extra response if the * I/O request is split in 2 */ - if (!blkif_completion(&id, rinfo, bret)) + if (!blkif_completion(&id, rinfo, &bret)) continue; }
if (add_id_to_freelist(rinfo, id)) { WARN(1, "%s: response to %s (id %ld) couldn't be recycled!\n", - info->gd->disk_name, op_name(bret->operation), id); + info->gd->disk_name, op_name(bret.operation), id); continue; }
- if (bret->status == BLKIF_RSP_OKAY) + if (bret.status == BLKIF_RSP_OKAY) blkif_req(req)->error = BLK_STS_OK; else blkif_req(req)->error = BLK_STS_IOERR;
- switch (bret->operation) { + switch (bret.operation) { case BLKIF_OP_DISCARD: - if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) { + if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) { struct request_queue *rq = info->rq; printk(KERN_WARNING "blkfront: %s: %s op failed\n", - info->gd->disk_name, op_name(bret->operation)); + info->gd->disk_name, op_name(bret.operation)); blkif_req(req)->error = BLK_STS_NOTSUPP; info->feature_discard = 0; info->feature_secdiscard = 0; @@ -1618,15 +1619,15 @@ static irqreturn_t blkif_interrupt(int i break; case BLKIF_OP_FLUSH_DISKCACHE: case BLKIF_OP_WRITE_BARRIER: - if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) { + if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) { printk(KERN_WARNING "blkfront: %s: %s op failed\n", - info->gd->disk_name, op_name(bret->operation)); + info->gd->disk_name, op_name(bret.operation)); blkif_req(req)->error = BLK_STS_NOTSUPP; } - if (unlikely(bret->status == BLKIF_RSP_ERROR && + if (unlikely(bret.status == BLKIF_RSP_ERROR && rinfo->shadow[id].req.u.rw.nr_segments == 0)) { printk(KERN_WARNING "blkfront: %s: empty %s op failed\n", - info->gd->disk_name, op_name(bret->operation)); + info->gd->disk_name, op_name(bret.operation)); blkif_req(req)->error = BLK_STS_NOTSUPP; } if (unlikely(blkif_req(req)->error)) { @@ -1639,9 +1640,9 @@ static irqreturn_t blkif_interrupt(int i /* fall through */ case BLKIF_OP_READ: case BLKIF_OP_WRITE: - if (unlikely(bret->status != BLKIF_RSP_OKAY)) + if (unlikely(bret.status != BLKIF_RSP_OKAY)) dev_dbg(&info->xbdev->dev, "Bad return from blkdev data " - "request: %x\n", bret->status); + "request: %x\n", bret.status);
break; default:
From: Juergen Gross jgross@suse.com
commit 8f5a695d99000fc3aa73934d7ced33cfc64dcdab upstream.
In order to avoid a malicious backend being able to influence the local copy of a request build the request locally first and then copy it to the ring page instead of doing it the other way round as today.
Signed-off-by: Juergen Gross jgross@suse.com Reviewed-by: Jan Beulich jbeulich@suse.com Acked-by: Roger Pau Monné roger.pau@citrix.com Link: https://lore.kernel.org/r/20210730103854.12681-3-jgross@suse.com Signed-off-by: Juergen Gross jgross@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/block/xen-blkfront.c | 25 +++++++++++++++---------- 1 file changed, 15 insertions(+), 10 deletions(-)
--- a/drivers/block/xen-blkfront.c +++ b/drivers/block/xen-blkfront.c @@ -537,7 +537,7 @@ static unsigned long blkif_ring_get_requ rinfo->shadow[id].status = REQ_WAITING; rinfo->shadow[id].associated_id = NO_ASSOCIATED_ID;
- (*ring_req)->u.rw.id = id; + rinfo->shadow[id].req.u.rw.id = id;
return id; } @@ -545,11 +545,12 @@ static unsigned long blkif_ring_get_requ static int blkif_queue_discard_req(struct request *req, struct blkfront_ring_info *rinfo) { struct blkfront_info *info = rinfo->dev_info; - struct blkif_request *ring_req; + struct blkif_request *ring_req, *final_ring_req; unsigned long id;
/* Fill out a communications ring structure. */ - id = blkif_ring_get_request(rinfo, req, &ring_req); + id = blkif_ring_get_request(rinfo, req, &final_ring_req); + ring_req = &rinfo->shadow[id].req;
ring_req->operation = BLKIF_OP_DISCARD; ring_req->u.discard.nr_sectors = blk_rq_sectors(req); @@ -560,8 +561,8 @@ static int blkif_queue_discard_req(struc else ring_req->u.discard.flag = 0;
- /* Keep a private copy so we can reissue requests when recovering. */ - rinfo->shadow[id].req = *ring_req; + /* Copy the request to the ring page. */ + *final_ring_req = *ring_req;
return 0; } @@ -694,6 +695,7 @@ static int blkif_queue_rw_req(struct req { struct blkfront_info *info = rinfo->dev_info; struct blkif_request *ring_req, *extra_ring_req = NULL; + struct blkif_request *final_ring_req, *final_extra_ring_req = NULL; unsigned long id, extra_id = NO_ASSOCIATED_ID; bool require_extra_req = false; int i; @@ -738,7 +740,8 @@ static int blkif_queue_rw_req(struct req }
/* Fill out a communications ring structure. */ - id = blkif_ring_get_request(rinfo, req, &ring_req); + id = blkif_ring_get_request(rinfo, req, &final_ring_req); + ring_req = &rinfo->shadow[id].req;
num_sg = blk_rq_map_sg(req->q, req, rinfo->shadow[id].sg); num_grant = 0; @@ -789,7 +792,9 @@ static int blkif_queue_rw_req(struct req ring_req->u.rw.nr_segments = num_grant; if (unlikely(require_extra_req)) { extra_id = blkif_ring_get_request(rinfo, req, - &extra_ring_req); + &final_extra_ring_req); + extra_ring_req = &rinfo->shadow[extra_id].req; + /* * Only the first request contains the scatter-gather * list. @@ -831,10 +836,10 @@ static int blkif_queue_rw_req(struct req if (setup.segments) kunmap_atomic(setup.segments);
- /* Keep a private copy so we can reissue requests when recovering. */ - rinfo->shadow[id].req = *ring_req; + /* Copy request(s) to the ring page. */ + *final_ring_req = *ring_req; if (unlikely(require_extra_req)) - rinfo->shadow[extra_id].req = *extra_ring_req; + *final_extra_ring_req = *extra_ring_req;
if (new_persistent_gnts) gnttab_free_grant_references(setup.gref_head);
From: Juergen Gross jgross@suse.com
commit b94e4b147fd1992ad450e1fea1fdaa3738753373 upstream.
Today blkfront will trust the backend to send only sane response data. In order to avoid privilege escalations or crashes in case of malicious backends verify the data to be within expected limits. Especially make sure that the response always references an outstanding request.
Introduce a new state of the ring BLKIF_STATE_ERROR which will be switched to in case an inconsistency is being detected. Recovering from this state is possible only via removing and adding the virtual device again (e.g. via a suspend/resume cycle).
Make all warning messages issued due to valid error responses rate limited in order to avoid message floods being triggered by a malicious backend.
Signed-off-by: Juergen Gross jgross@suse.com Reviewed-by: Jan Beulich jbeulich@suse.com Acked-by: Roger Pau Monné roger.pau@citrix.com Link: https://lore.kernel.org/r/20210730103854.12681-4-jgross@suse.com Signed-off-by: Juergen Gross jgross@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/block/xen-blkfront.c | 70 ++++++++++++++++++++++++++++++++----------- 1 file changed, 53 insertions(+), 17 deletions(-)
--- a/drivers/block/xen-blkfront.c +++ b/drivers/block/xen-blkfront.c @@ -78,6 +78,7 @@ enum blkif_state { BLKIF_STATE_DISCONNECTED, BLKIF_STATE_CONNECTED, BLKIF_STATE_SUSPENDED, + BLKIF_STATE_ERROR, };
struct grant { @@ -87,6 +88,7 @@ struct grant { };
enum blk_req_status { + REQ_PROCESSING, REQ_WAITING, REQ_DONE, REQ_ERROR, @@ -534,7 +536,7 @@ static unsigned long blkif_ring_get_requ
id = get_id_from_freelist(rinfo); rinfo->shadow[id].request = req; - rinfo->shadow[id].status = REQ_WAITING; + rinfo->shadow[id].status = REQ_PROCESSING; rinfo->shadow[id].associated_id = NO_ASSOCIATED_ID;
rinfo->shadow[id].req.u.rw.id = id; @@ -563,6 +565,7 @@ static int blkif_queue_discard_req(struc
/* Copy the request to the ring page. */ *final_ring_req = *ring_req; + rinfo->shadow[id].status = REQ_WAITING;
return 0; } @@ -838,8 +841,11 @@ static int blkif_queue_rw_req(struct req
/* Copy request(s) to the ring page. */ *final_ring_req = *ring_req; - if (unlikely(require_extra_req)) + rinfo->shadow[id].status = REQ_WAITING; + if (unlikely(require_extra_req)) { *final_extra_ring_req = *extra_ring_req; + rinfo->shadow[extra_id].status = REQ_WAITING; + }
if (new_persistent_gnts) gnttab_free_grant_references(setup.gref_head); @@ -1413,8 +1419,8 @@ static enum blk_req_status blkif_rsp_to_ static int blkif_get_final_status(enum blk_req_status s1, enum blk_req_status s2) { - BUG_ON(s1 == REQ_WAITING); - BUG_ON(s2 == REQ_WAITING); + BUG_ON(s1 < REQ_DONE); + BUG_ON(s2 < REQ_DONE);
if (s1 == REQ_ERROR || s2 == REQ_ERROR) return BLKIF_RSP_ERROR; @@ -1447,7 +1453,7 @@ static bool blkif_completion(unsigned lo s->status = blkif_rsp_to_req_status(bret->status);
/* Wait the second response if not yet here. */ - if (s2->status == REQ_WAITING) + if (s2->status < REQ_DONE) return 0;
bret->status = blkif_get_final_status(s->status, @@ -1566,11 +1572,17 @@ static irqreturn_t blkif_interrupt(int i
spin_lock_irqsave(&rinfo->ring_lock, flags); again: - rp = rinfo->ring.sring->rsp_prod; - rmb(); /* Ensure we see queued responses up to 'rp'. */ + rp = READ_ONCE(rinfo->ring.sring->rsp_prod); + virt_rmb(); /* Ensure we see queued responses up to 'rp'. */ + if (RING_RESPONSE_PROD_OVERFLOW(&rinfo->ring, rp)) { + pr_alert("%s: illegal number of responses %u\n", + info->gd->disk_name, rp - rinfo->ring.rsp_cons); + goto err; + }
for (i = rinfo->ring.rsp_cons; i != rp; i++) { unsigned long id; + unsigned int op;
RING_COPY_RESPONSE(&rinfo->ring, i, &bret); id = bret.id; @@ -1581,14 +1593,28 @@ static irqreturn_t blkif_interrupt(int i * look in get_id_from_freelist. */ if (id >= BLK_RING_SIZE(info)) { - WARN(1, "%s: response to %s has incorrect id (%ld)\n", - info->gd->disk_name, op_name(bret.operation), id); - /* We can't safely get the 'struct request' as - * the id is busted. */ - continue; + pr_alert("%s: response has incorrect id (%ld)\n", + info->gd->disk_name, id); + goto err; + } + if (rinfo->shadow[id].status != REQ_WAITING) { + pr_alert("%s: response references no pending request\n", + info->gd->disk_name); + goto err; } + + rinfo->shadow[id].status = REQ_PROCESSING; req = rinfo->shadow[id].request;
+ op = rinfo->shadow[id].req.operation; + if (op == BLKIF_OP_INDIRECT) + op = rinfo->shadow[id].req.u.indirect.indirect_op; + if (bret.operation != op) { + pr_alert("%s: response has wrong operation (%u instead of %u)\n", + info->gd->disk_name, bret.operation, op); + goto err; + } + if (bret.operation != BLKIF_OP_DISCARD) { /* * We may need to wait for an extra response if the @@ -1613,7 +1639,8 @@ static irqreturn_t blkif_interrupt(int i case BLKIF_OP_DISCARD: if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) { struct request_queue *rq = info->rq; - printk(KERN_WARNING "blkfront: %s: %s op failed\n", + + pr_warn_ratelimited("blkfront: %s: %s op failed\n", info->gd->disk_name, op_name(bret.operation)); blkif_req(req)->error = BLK_STS_NOTSUPP; info->feature_discard = 0; @@ -1625,13 +1652,13 @@ static irqreturn_t blkif_interrupt(int i case BLKIF_OP_FLUSH_DISKCACHE: case BLKIF_OP_WRITE_BARRIER: if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) { - printk(KERN_WARNING "blkfront: %s: %s op failed\n", + pr_warn_ratelimited("blkfront: %s: %s op failed\n", info->gd->disk_name, op_name(bret.operation)); blkif_req(req)->error = BLK_STS_NOTSUPP; } if (unlikely(bret.status == BLKIF_RSP_ERROR && rinfo->shadow[id].req.u.rw.nr_segments == 0)) { - printk(KERN_WARNING "blkfront: %s: empty %s op failed\n", + pr_warn_ratelimited("blkfront: %s: empty %s op failed\n", info->gd->disk_name, op_name(bret.operation)); blkif_req(req)->error = BLK_STS_NOTSUPP; } @@ -1646,8 +1673,9 @@ static irqreturn_t blkif_interrupt(int i case BLKIF_OP_READ: case BLKIF_OP_WRITE: if (unlikely(bret.status != BLKIF_RSP_OKAY)) - dev_dbg(&info->xbdev->dev, "Bad return from blkdev data " - "request: %x\n", bret.status); + dev_dbg_ratelimited(&info->xbdev->dev, + "Bad return from blkdev data request: %#x\n", + bret.status);
break; default: @@ -1672,6 +1700,14 @@ static irqreturn_t blkif_interrupt(int i spin_unlock_irqrestore(&rinfo->ring_lock, flags);
return IRQ_HANDLED; + + err: + info->connected = BLKIF_STATE_ERROR; + + spin_unlock_irqrestore(&rinfo->ring_lock, flags); + + pr_alert("%s disabled for further use\n", info->gd->disk_name); + return IRQ_HANDLED; }
From: Juergen Gross jgross@suse.com
commit 8446066bf8c1f9f7b7412c43fbea0fb87464d75b upstream.
In order to avoid problems in case the backend is modifying a response on the ring page while the frontend has already seen it, just read the response into a local buffer in one go and then operate on that buffer only.
Signed-off-by: Juergen Gross jgross@suse.com Reviewed-by: Jan Beulich jbeulich@suse.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/xen-netfront.c | 38 +++++++++++++++++++------------------- 1 file changed, 19 insertions(+), 19 deletions(-)
--- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -389,13 +389,13 @@ static void xennet_tx_buf_gc(struct netf rmb(); /* Ensure we see responses up to 'rp'. */
for (cons = queue->tx.rsp_cons; cons != prod; cons++) { - struct xen_netif_tx_response *txrsp; + struct xen_netif_tx_response txrsp;
- txrsp = RING_GET_RESPONSE(&queue->tx, cons); - if (txrsp->status == XEN_NETIF_RSP_NULL) + RING_COPY_RESPONSE(&queue->tx, cons, &txrsp); + if (txrsp.status == XEN_NETIF_RSP_NULL) continue;
- id = txrsp->id; + id = txrsp.id; skb = queue->tx_skbs[id].skb; if (unlikely(gnttab_query_foreign_access( queue->grant_tx_ref[id]) != 0)) { @@ -743,7 +743,7 @@ static int xennet_get_extras(struct netf RING_IDX rp)
{ - struct xen_netif_extra_info *extra; + struct xen_netif_extra_info extra; struct device *dev = &queue->info->netdev->dev; RING_IDX cons = queue->rx.rsp_cons; int err = 0; @@ -759,24 +759,22 @@ static int xennet_get_extras(struct netf break; }
- extra = (struct xen_netif_extra_info *) - RING_GET_RESPONSE(&queue->rx, ++cons); + RING_COPY_RESPONSE(&queue->rx, ++cons, &extra);
- if (unlikely(!extra->type || - extra->type >= XEN_NETIF_EXTRA_TYPE_MAX)) { + if (unlikely(!extra.type || + extra.type >= XEN_NETIF_EXTRA_TYPE_MAX)) { if (net_ratelimit()) dev_warn(dev, "Invalid extra type: %d\n", - extra->type); + extra.type); err = -EINVAL; } else { - memcpy(&extras[extra->type - 1], extra, - sizeof(*extra)); + extras[extra.type - 1] = extra; }
skb = xennet_get_rx_skb(queue, cons); ref = xennet_get_rx_ref(queue, cons); xennet_move_rx_slot(queue, skb, ref); - } while (extra->flags & XEN_NETIF_EXTRA_FLAG_MORE); + } while (extra.flags & XEN_NETIF_EXTRA_FLAG_MORE);
queue->rx.rsp_cons = cons; return err; @@ -786,7 +784,7 @@ static int xennet_get_responses(struct n struct netfront_rx_info *rinfo, RING_IDX rp, struct sk_buff_head *list) { - struct xen_netif_rx_response *rx = &rinfo->rx; + struct xen_netif_rx_response *rx = &rinfo->rx, rx_local; struct xen_netif_extra_info *extras = rinfo->extras; struct device *dev = &queue->info->netdev->dev; RING_IDX cons = queue->rx.rsp_cons; @@ -844,7 +842,8 @@ next: break; }
- rx = RING_GET_RESPONSE(&queue->rx, cons + slots); + RING_COPY_RESPONSE(&queue->rx, cons + slots, &rx_local); + rx = &rx_local; skb = xennet_get_rx_skb(queue, cons + slots); ref = xennet_get_rx_ref(queue, cons + slots); slots++; @@ -899,10 +898,11 @@ static int xennet_fill_frags(struct netf struct sk_buff *nskb;
while ((nskb = __skb_dequeue(list))) { - struct xen_netif_rx_response *rx = - RING_GET_RESPONSE(&queue->rx, ++cons); + struct xen_netif_rx_response rx; skb_frag_t *nfrag = &skb_shinfo(nskb)->frags[0];
+ RING_COPY_RESPONSE(&queue->rx, ++cons, &rx); + if (skb_shinfo(skb)->nr_frags == MAX_SKB_FRAGS) { unsigned int pull_to = NETFRONT_SKB_CB(skb)->pull_to;
@@ -917,7 +917,7 @@ static int xennet_fill_frags(struct netf
skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, skb_frag_page(nfrag), - rx->offset, rx->status, PAGE_SIZE); + rx.offset, rx.status, PAGE_SIZE);
skb_shinfo(nskb)->nr_frags = 0; kfree_skb(nskb); @@ -1015,7 +1015,7 @@ static int xennet_poll(struct napi_struc i = queue->rx.rsp_cons; work_done = 0; while ((i != rp) && (work_done < budget)) { - memcpy(rx, RING_GET_RESPONSE(&queue->rx, i), sizeof(*rx)); + RING_COPY_RESPONSE(&queue->rx, i, rx); memset(extras, 0, sizeof(rinfo.extras));
err = xennet_get_responses(queue, &rinfo, rp, &tmpq);
From: Juergen Gross jgross@suse.com
commit 162081ec33c2686afa29d91bf8d302824aa846c7 upstream.
In order to avoid a malicious backend being able to influence the local processing of a request build the request locally first and then copy it to the ring page. Any reading from the request influencing the processing in the frontend needs to be done on the local instance.
Signed-off-by: Juergen Gross jgross@suse.com Reviewed-by: Jan Beulich jbeulich@suse.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/xen-netfront.c | 80 ++++++++++++++++++++------------------------- 1 file changed, 37 insertions(+), 43 deletions(-)
--- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -425,7 +425,8 @@ struct xennet_gnttab_make_txreq { struct netfront_queue *queue; struct sk_buff *skb; struct page *page; - struct xen_netif_tx_request *tx; /* Last request */ + struct xen_netif_tx_request *tx; /* Last request on ring page */ + struct xen_netif_tx_request tx_local; /* Last request local copy*/ unsigned int size; };
@@ -453,30 +454,27 @@ static void xennet_tx_setup_grant(unsign queue->grant_tx_page[id] = page; queue->grant_tx_ref[id] = ref;
- tx->id = id; - tx->gref = ref; - tx->offset = offset; - tx->size = len; - tx->flags = 0; + info->tx_local.id = id; + info->tx_local.gref = ref; + info->tx_local.offset = offset; + info->tx_local.size = len; + info->tx_local.flags = 0; + + *tx = info->tx_local;
info->tx = tx; - info->size += tx->size; + info->size += info->tx_local.size; }
static struct xen_netif_tx_request *xennet_make_first_txreq( - struct netfront_queue *queue, struct sk_buff *skb, - struct page *page, unsigned int offset, unsigned int len) + struct xennet_gnttab_make_txreq *info, + unsigned int offset, unsigned int len) { - struct xennet_gnttab_make_txreq info = { - .queue = queue, - .skb = skb, - .page = page, - .size = 0, - }; + info->size = 0;
- gnttab_for_one_grant(page, offset, len, xennet_tx_setup_grant, &info); + gnttab_for_one_grant(info->page, offset, len, xennet_tx_setup_grant, info);
- return info.tx; + return info->tx; }
static void xennet_make_one_txreq(unsigned long gfn, unsigned int offset, @@ -489,35 +487,27 @@ static void xennet_make_one_txreq(unsign xennet_tx_setup_grant(gfn, offset, len, data); }
-static struct xen_netif_tx_request *xennet_make_txreqs( - struct netfront_queue *queue, struct xen_netif_tx_request *tx, - struct sk_buff *skb, struct page *page, +static void xennet_make_txreqs( + struct xennet_gnttab_make_txreq *info, + struct page *page, unsigned int offset, unsigned int len) { - struct xennet_gnttab_make_txreq info = { - .queue = queue, - .skb = skb, - .tx = tx, - }; - /* Skip unused frames from start of page */ page += offset >> PAGE_SHIFT; offset &= ~PAGE_MASK;
while (len) { - info.page = page; - info.size = 0; + info->page = page; + info->size = 0;
gnttab_foreach_grant_in_range(page, offset, len, xennet_make_one_txreq, - &info); + info);
page++; offset = 0; - len -= info.size; + len -= info->size; } - - return info.tx; }
/* @@ -570,7 +560,7 @@ static int xennet_start_xmit(struct sk_b { struct netfront_info *np = netdev_priv(dev); struct netfront_stats *tx_stats = this_cpu_ptr(np->tx_stats); - struct xen_netif_tx_request *tx, *first_tx; + struct xen_netif_tx_request *first_tx; unsigned int i; int notify; int slots; @@ -579,6 +569,7 @@ static int xennet_start_xmit(struct sk_b unsigned int len; unsigned long flags; struct netfront_queue *queue = NULL; + struct xennet_gnttab_make_txreq info = { }; unsigned int num_queues = dev->real_num_tx_queues; u16 queue_index; struct sk_buff *nskb; @@ -636,21 +627,24 @@ static int xennet_start_xmit(struct sk_b }
/* First request for the linear area. */ - first_tx = tx = xennet_make_first_txreq(queue, skb, - page, offset, len); - offset += tx->size; + info.queue = queue; + info.skb = skb; + info.page = page; + first_tx = xennet_make_first_txreq(&info, offset, len); + offset += info.tx_local.size; if (offset == PAGE_SIZE) { page++; offset = 0; } - len -= tx->size; + len -= info.tx_local.size;
if (skb->ip_summed == CHECKSUM_PARTIAL) /* local packet? */ - tx->flags |= XEN_NETTXF_csum_blank | XEN_NETTXF_data_validated; + first_tx->flags |= XEN_NETTXF_csum_blank | + XEN_NETTXF_data_validated; else if (skb->ip_summed == CHECKSUM_UNNECESSARY) /* remote but checksummed. */ - tx->flags |= XEN_NETTXF_data_validated; + first_tx->flags |= XEN_NETTXF_data_validated;
/* Optional extra info after the first request. */ if (skb_shinfo(skb)->gso_size) { @@ -659,7 +653,7 @@ static int xennet_start_xmit(struct sk_b gso = (struct xen_netif_extra_info *) RING_GET_REQUEST(&queue->tx, queue->tx.req_prod_pvt++);
- tx->flags |= XEN_NETTXF_extra_info; + first_tx->flags |= XEN_NETTXF_extra_info;
gso->u.gso.size = skb_shinfo(skb)->gso_size; gso->u.gso.type = (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6) ? @@ -673,13 +667,13 @@ static int xennet_start_xmit(struct sk_b }
/* Requests for the rest of the linear area. */ - tx = xennet_make_txreqs(queue, tx, skb, page, offset, len); + xennet_make_txreqs(&info, page, offset, len);
/* Requests for all the frags. */ for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { skb_frag_t *frag = &skb_shinfo(skb)->frags[i]; - tx = xennet_make_txreqs(queue, tx, skb, - skb_frag_page(frag), frag->page_offset, + xennet_make_txreqs(&info, skb_frag_page(frag), + frag->page_offset, skb_frag_size(frag)); }
From: Juergen Gross jgross@suse.com
commit 21631d2d741a64a073e167c27769e73bc7844a2f upstream.
The tx_skb_freelist elements are in a single linked list with the request id used as link reference. The per element link field is in a union with the skb pointer of an in use request.
Move the link reference out of the union in order to enable a later reuse of it for requests which need a populated skb pointer.
Rename add_id_to_freelist() and get_id_from_freelist() to add_id_to_list() and get_id_from_list() in order to prepare using those for other lists as well. Define ~0 as value to indicate the end of a list and place that value into the link for a request not being on the list.
When freeing a skb zero the skb pointer in the request. Use a NULL value of the skb pointer instead of skb_entry_is_link() for deciding whether a request has a skb linked to it.
Remove skb_entry_set_link() and open code it instead as it is really trivial now.
Signed-off-by: Juergen Gross jgross@suse.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/xen-netfront.c | 61 ++++++++++++++++++--------------------------- 1 file changed, 25 insertions(+), 36 deletions(-)
--- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -121,17 +121,11 @@ struct netfront_queue {
/* * {tx,rx}_skbs store outstanding skbuffs. Free tx_skb entries - * are linked from tx_skb_freelist through skb_entry.link. - * - * NB. Freelist index entries are always going to be less than - * PAGE_OFFSET, whereas pointers to skbs will always be equal or - * greater than PAGE_OFFSET: we use this property to distinguish - * them. + * are linked from tx_skb_freelist through tx_link. */ - union skb_entry { - struct sk_buff *skb; - unsigned long link; - } tx_skbs[NET_TX_RING_SIZE]; + struct sk_buff *tx_skbs[NET_TX_RING_SIZE]; + unsigned short tx_link[NET_TX_RING_SIZE]; +#define TX_LINK_NONE 0xffff grant_ref_t gref_tx_head; grant_ref_t grant_tx_ref[NET_TX_RING_SIZE]; struct page *grant_tx_page[NET_TX_RING_SIZE]; @@ -169,33 +163,25 @@ struct netfront_rx_info { struct xen_netif_extra_info extras[XEN_NETIF_EXTRA_TYPE_MAX - 1]; };
-static void skb_entry_set_link(union skb_entry *list, unsigned short id) -{ - list->link = id; -} - -static int skb_entry_is_link(const union skb_entry *list) -{ - BUILD_BUG_ON(sizeof(list->skb) != sizeof(list->link)); - return (unsigned long)list->skb < PAGE_OFFSET; -} - /* * Access macros for acquiring freeing slots in tx_skbs[]. */
-static void add_id_to_freelist(unsigned *head, union skb_entry *list, - unsigned short id) +static void add_id_to_list(unsigned *head, unsigned short *list, + unsigned short id) { - skb_entry_set_link(&list[id], *head); + list[id] = *head; *head = id; }
-static unsigned short get_id_from_freelist(unsigned *head, - union skb_entry *list) +static unsigned short get_id_from_list(unsigned *head, unsigned short *list) { unsigned int id = *head; - *head = list[id].link; + + if (id != TX_LINK_NONE) { + *head = list[id]; + list[id] = TX_LINK_NONE; + } return id; }
@@ -396,7 +382,8 @@ static void xennet_tx_buf_gc(struct netf continue;
id = txrsp.id; - skb = queue->tx_skbs[id].skb; + skb = queue->tx_skbs[id]; + queue->tx_skbs[id] = NULL; if (unlikely(gnttab_query_foreign_access( queue->grant_tx_ref[id]) != 0)) { pr_alert("%s: warning -- grant still in use by backend domain\n", @@ -409,7 +396,7 @@ static void xennet_tx_buf_gc(struct netf &queue->gref_tx_head, queue->grant_tx_ref[id]); queue->grant_tx_ref[id] = GRANT_INVALID_REF; queue->grant_tx_page[id] = NULL; - add_id_to_freelist(&queue->tx_skb_freelist, queue->tx_skbs, id); + add_id_to_list(&queue->tx_skb_freelist, queue->tx_link, id); dev_kfree_skb_irq(skb); }
@@ -442,7 +429,7 @@ static void xennet_tx_setup_grant(unsign struct netfront_queue *queue = info->queue; struct sk_buff *skb = info->skb;
- id = get_id_from_freelist(&queue->tx_skb_freelist, queue->tx_skbs); + id = get_id_from_list(&queue->tx_skb_freelist, queue->tx_link); tx = RING_GET_REQUEST(&queue->tx, queue->tx.req_prod_pvt++); ref = gnttab_claim_grant_reference(&queue->gref_tx_head); WARN_ON_ONCE(IS_ERR_VALUE((unsigned long)(int)ref)); @@ -450,7 +437,7 @@ static void xennet_tx_setup_grant(unsign gnttab_grant_foreign_access_ref(ref, queue->info->xbdev->otherend_id, gfn, GNTMAP_readonly);
- queue->tx_skbs[id].skb = skb; + queue->tx_skbs[id] = skb; queue->grant_tx_page[id] = page; queue->grant_tx_ref[id] = ref;
@@ -1131,17 +1118,18 @@ static void xennet_release_tx_bufs(struc
for (i = 0; i < NET_TX_RING_SIZE; i++) { /* Skip over entries which are actually freelist references */ - if (skb_entry_is_link(&queue->tx_skbs[i])) + if (!queue->tx_skbs[i]) continue;
- skb = queue->tx_skbs[i].skb; + skb = queue->tx_skbs[i]; + queue->tx_skbs[i] = NULL; get_page(queue->grant_tx_page[i]); gnttab_end_foreign_access(queue->grant_tx_ref[i], GNTMAP_readonly, (unsigned long)page_address(queue->grant_tx_page[i])); queue->grant_tx_page[i] = NULL; queue->grant_tx_ref[i] = GRANT_INVALID_REF; - add_id_to_freelist(&queue->tx_skb_freelist, queue->tx_skbs, i); + add_id_to_list(&queue->tx_skb_freelist, queue->tx_link, i); dev_kfree_skb_irq(skb); } } @@ -1624,13 +1612,14 @@ static int xennet_init_queue(struct netf snprintf(queue->name, sizeof(queue->name), "vif%s-q%u", devid, queue->id);
- /* Initialise tx_skbs as a free chain containing every entry. */ + /* Initialise tx_skb_freelist as a free chain containing every entry. */ queue->tx_skb_freelist = 0; for (i = 0; i < NET_TX_RING_SIZE; i++) { - skb_entry_set_link(&queue->tx_skbs[i], i+1); + queue->tx_link[i] = i + 1; queue->grant_tx_ref[i] = GRANT_INVALID_REF; queue->grant_tx_page[i] = NULL; } + queue->tx_link[NET_TX_RING_SIZE - 1] = TX_LINK_NONE;
/* Clear out rx_skbs */ for (i = 0; i < NET_RX_RING_SIZE; i++) {
From: Juergen Gross jgross@suse.com
commit a884daa61a7d91650987e855464526aef219590f upstream.
Today netfront will trust the backend to send only sane response data. In order to avoid privilege escalations or crashes in case of malicious backends verify the data to be within expected limits. Especially make sure that the response always references an outstanding request.
Note that only the tx queue needs special id handling, as for the rx queue the id is equal to the index in the ring page.
Introduce a new indicator for the device whether it is broken and let the device stop working when it is set. Set this indicator in case the backend sets any weird data.
Signed-off-by: Juergen Gross jgross@suse.com Reviewed-by: Jan Beulich jbeulich@suse.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/xen-netfront.c | 80 ++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 75 insertions(+), 5 deletions(-)
--- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -126,10 +126,12 @@ struct netfront_queue { struct sk_buff *tx_skbs[NET_TX_RING_SIZE]; unsigned short tx_link[NET_TX_RING_SIZE]; #define TX_LINK_NONE 0xffff +#define TX_PENDING 0xfffe grant_ref_t gref_tx_head; grant_ref_t grant_tx_ref[NET_TX_RING_SIZE]; struct page *grant_tx_page[NET_TX_RING_SIZE]; unsigned tx_skb_freelist; + unsigned int tx_pend_queue;
spinlock_t rx_lock ____cacheline_aligned_in_smp; struct xen_netif_rx_front_ring rx; @@ -155,6 +157,9 @@ struct netfront_info { struct netfront_stats __percpu *rx_stats; struct netfront_stats __percpu *tx_stats;
+ /* Is device behaving sane? */ + bool broken; + atomic_t rx_gso_checksum_fixup; };
@@ -339,7 +344,7 @@ static int xennet_open(struct net_device unsigned int i = 0; struct netfront_queue *queue = NULL;
- if (!np->queues) + if (!np->queues || np->broken) return -ENODEV;
for (i = 0; i < num_queues; ++i) { @@ -367,11 +372,17 @@ static void xennet_tx_buf_gc(struct netf unsigned short id; struct sk_buff *skb; bool more_to_do; + const struct device *dev = &queue->info->netdev->dev;
BUG_ON(!netif_carrier_ok(queue->info->netdev));
do { prod = queue->tx.sring->rsp_prod; + if (RING_RESPONSE_PROD_OVERFLOW(&queue->tx, prod)) { + dev_alert(dev, "Illegal number of responses %u\n", + prod - queue->tx.rsp_cons); + goto err; + } rmb(); /* Ensure we see responses up to 'rp'. */
for (cons = queue->tx.rsp_cons; cons != prod; cons++) { @@ -381,14 +392,27 @@ static void xennet_tx_buf_gc(struct netf if (txrsp.status == XEN_NETIF_RSP_NULL) continue;
- id = txrsp.id; + id = txrsp.id; + if (id >= RING_SIZE(&queue->tx)) { + dev_alert(dev, + "Response has incorrect id (%u)\n", + id); + goto err; + } + if (queue->tx_link[id] != TX_PENDING) { + dev_alert(dev, + "Response for inactive request\n"); + goto err; + } + + queue->tx_link[id] = TX_LINK_NONE; skb = queue->tx_skbs[id]; queue->tx_skbs[id] = NULL; if (unlikely(gnttab_query_foreign_access( queue->grant_tx_ref[id]) != 0)) { - pr_alert("%s: warning -- grant still in use by backend domain\n", - __func__); - BUG(); + dev_alert(dev, + "Grant still in use by backend domain\n"); + goto err; } gnttab_end_foreign_access_ref( queue->grant_tx_ref[id], GNTMAP_readonly); @@ -406,6 +430,12 @@ static void xennet_tx_buf_gc(struct netf } while (more_to_do);
xennet_maybe_wake_tx(queue); + + return; + + err: + queue->info->broken = true; + dev_alert(dev, "Disabled for further use\n"); }
struct xennet_gnttab_make_txreq { @@ -449,6 +479,12 @@ static void xennet_tx_setup_grant(unsign
*tx = info->tx_local;
+ /* + * Put the request in the pending queue, it will be set to be pending + * when the producer index is about to be raised. + */ + add_id_to_list(&queue->tx_pend_queue, queue->tx_link, id); + info->tx = tx; info->size += info->tx_local.size; } @@ -541,6 +577,15 @@ static u16 xennet_select_queue(struct ne return queue_idx; }
+static void xennet_mark_tx_pending(struct netfront_queue *queue) +{ + unsigned int i; + + while ((i = get_id_from_list(&queue->tx_pend_queue, queue->tx_link)) != + TX_LINK_NONE) + queue->tx_link[i] = TX_PENDING; +} + #define MAX_XEN_SKB_FRAGS (65536 / XEN_PAGE_SIZE + 1)
static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev) @@ -564,6 +609,8 @@ static int xennet_start_xmit(struct sk_b /* Drop the packet if no queues are set up */ if (num_queues < 1) goto drop; + if (unlikely(np->broken)) + goto drop; /* Determine which queue to transmit this SKB on */ queue_index = skb_get_queue_mapping(skb); queue = &np->queues[queue_index]; @@ -667,6 +714,8 @@ static int xennet_start_xmit(struct sk_b /* First request has the packet length. */ first_tx->size = skb->len;
+ xennet_mark_tx_pending(queue); + RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&queue->tx, notify); if (notify) notify_remote_via_irq(queue->tx_irq); @@ -991,6 +1040,13 @@ static int xennet_poll(struct napi_struc skb_queue_head_init(&tmpq);
rp = queue->rx.sring->rsp_prod; + if (RING_RESPONSE_PROD_OVERFLOW(&queue->rx, rp)) { + dev_alert(&dev->dev, "Illegal number of responses %u\n", + rp - queue->rx.rsp_cons); + queue->info->broken = true; + spin_unlock(&queue->rx_lock); + return 0; + } rmb(); /* Ensure we see queued responses up to 'rp'. */
i = queue->rx.rsp_cons; @@ -1209,6 +1265,9 @@ static irqreturn_t xennet_tx_interrupt(i struct netfront_queue *queue = dev_id; unsigned long flags;
+ if (queue->info->broken) + return IRQ_HANDLED; + spin_lock_irqsave(&queue->tx_lock, flags); xennet_tx_buf_gc(queue); spin_unlock_irqrestore(&queue->tx_lock, flags); @@ -1221,6 +1280,9 @@ static irqreturn_t xennet_rx_interrupt(i struct netfront_queue *queue = dev_id; struct net_device *dev = queue->info->netdev;
+ if (queue->info->broken) + return IRQ_HANDLED; + if (likely(netif_carrier_ok(dev) && RING_HAS_UNCONSUMED_RESPONSES(&queue->rx))) napi_schedule(&queue->napi); @@ -1242,6 +1304,10 @@ static void xennet_poll_controller(struc struct netfront_info *info = netdev_priv(dev); unsigned int num_queues = dev->real_num_tx_queues; unsigned int i; + + if (info->broken) + return; + for (i = 0; i < num_queues; ++i) xennet_interrupt(0, &info->queues[i]); } @@ -1614,6 +1680,7 @@ static int xennet_init_queue(struct netf
/* Initialise tx_skb_freelist as a free chain containing every entry. */ queue->tx_skb_freelist = 0; + queue->tx_pend_queue = TX_LINK_NONE; for (i = 0; i < NET_TX_RING_SIZE; i++) { queue->tx_link[i] = i + 1; queue->grant_tx_ref[i] = GRANT_INVALID_REF; @@ -1824,6 +1891,9 @@ static int talk_to_netback(struct xenbus if (info->queues) xennet_destroy_queues(info);
+ /* For the case of a reconnect reset the "broken" indicator. */ + info->broken = false; + err = xennet_create_queues(info, &num_queues); if (err < 0) { xenbus_dev_fatal(dev, err, "creating queues");
From: Juergen Gross jgross@suse.com
commit e679004dec37566f658a255157d3aed9d762a2b7 upstream.
Xen frontends shouldn't BUG() in case of illegal data received from their backends. So replace the BUG_ON()s when reading illegal data from the ring page with negative return values.
Reviewed-by: Jan Beulich jbeulich@suse.com Signed-off-by: Juergen Gross jgross@suse.com Link: https://lore.kernel.org/r/20210707091045.460-1-jgross@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/hvc/hvc_xen.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-)
--- a/drivers/tty/hvc/hvc_xen.c +++ b/drivers/tty/hvc/hvc_xen.c @@ -99,7 +99,11 @@ static int __write_console(struct xencon cons = intf->out_cons; prod = intf->out_prod; mb(); /* update queue values before going on */ - BUG_ON((prod - cons) > sizeof(intf->out)); + + if ((prod - cons) > sizeof(intf->out)) { + pr_err_once("xencons: Illegal ring page indices"); + return -EINVAL; + }
while ((sent < len) && ((prod - cons) < sizeof(intf->out))) intf->out[MASK_XENCONS_IDX(prod++, intf->out)] = data[sent++]; @@ -127,7 +131,10 @@ static int domU_write_console(uint32_t v */ while (len) { int sent = __write_console(cons, data, len); - + + if (sent < 0) + return sent; + data += sent; len -= sent;
@@ -151,7 +158,11 @@ static int domU_read_console(uint32_t vt cons = intf->in_cons; prod = intf->in_prod; mb(); /* get pointers before reading ring */ - BUG_ON((prod - cons) > sizeof(intf->in)); + + if ((prod - cons) > sizeof(intf->in)) { + pr_err_once("xencons: Illegal ring page indices"); + return -EINVAL; + }
while (cons != prod && recv < len) buf[recv++] = intf->in[MASK_XENCONS_IDX(cons++, intf->in)];
From: Alexander Mikhalitsyn alexander.mikhalitsyn@virtuozzo.com
commit 85b6d24646e4125c591639841169baa98a2da503 upstream.
Currently, the exit_shm() function not designed to work properly when task->sysvshm.shm_clist holds shm objects from different IPC namespaces.
This is a real pain when sysctl kernel.shm_rmid_forced = 1, because it leads to use-after-free (reproducer exists).
This is an attempt to fix the problem by extending exit_shm mechanism to handle shm's destroy from several IPC ns'es.
To achieve that we do several things:
1. add a namespace (non-refcounted) pointer to the struct shmid_kernel
2. during new shm object creation (newseg()/shmget syscall) we initialize this pointer by current task IPC ns
3. exit_shm() fully reworked such that it traverses over all shp's in task->sysvshm.shm_clist and gets IPC namespace not from current task as it was before but from shp's object itself, then call shm_destroy(shp, ns).
Note: We need to be really careful here, because as it was said before (1), our pointer to IPC ns non-refcnt'ed. To be on the safe side we using special helper get_ipc_ns_not_zero() which allows to get IPC ns refcounter only if IPC ns not in the "state of destruction".
Q/A
Q: Why can we access shp->ns memory using non-refcounted pointer? A: Because shp object lifetime is always shorther than IPC namespace lifetime, so, if we get shp object from the task->sysvshm.shm_clist while holding task_lock(task) nobody can steal our namespace.
Q: Does this patch change semantics of unshare/setns/clone syscalls? A: No. It's just fixes non-covered case when process may leave IPC namespace without getting task->sysvshm.shm_clist list cleaned up.
Link: https://lkml.kernel.org/r/67bb03e5-f79c-1815-e2bf-949c67047418@colorfullife.... Link: https://lkml.kernel.org/r/20211109151501.4921-1-manfred@colorfullife.com Fixes: ab602f79915 ("shm: make exit_shm work proportional to task activity") Co-developed-by: Manfred Spraul manfred@colorfullife.com Signed-off-by: Manfred Spraul manfred@colorfullife.com Signed-off-by: Alexander Mikhalitsyn alexander.mikhalitsyn@virtuozzo.com Cc: "Eric W. Biederman" ebiederm@xmission.com Cc: Davidlohr Bueso dave@stgolabs.net Cc: Greg KH gregkh@linuxfoundation.org Cc: Andrei Vagin avagin@gmail.com Cc: Pavel Tikhomirov ptikhomirov@virtuozzo.com Cc: Vasily Averin vvs@virtuozzo.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/ipc_namespace.h | 15 +++ include/linux/sched/task.h | 2 include/linux/shm.h | 13 ++- ipc/shm.c | 176 +++++++++++++++++++++++++++++++----------- 4 files changed, 159 insertions(+), 47 deletions(-)
--- a/include/linux/ipc_namespace.h +++ b/include/linux/ipc_namespace.h @@ -127,6 +127,16 @@ static inline struct ipc_namespace *get_ return ns; }
+static inline struct ipc_namespace *get_ipc_ns_not_zero(struct ipc_namespace *ns) +{ + if (ns) { + if (refcount_inc_not_zero(&ns->count)) + return ns; + } + + return NULL; +} + extern void put_ipc_ns(struct ipc_namespace *ns); #else static inline struct ipc_namespace *copy_ipcs(unsigned long flags, @@ -142,6 +152,11 @@ static inline struct ipc_namespace *get_ { return ns; } + +static inline struct ipc_namespace *get_ipc_ns_not_zero(struct ipc_namespace *ns) +{ + return ns; +}
static inline void put_ipc_ns(struct ipc_namespace *ns) { --- a/include/linux/sched/task.h +++ b/include/linux/sched/task.h @@ -122,7 +122,7 @@ static inline struct vm_struct *task_sta * Protects ->fs, ->files, ->mm, ->group_info, ->comm, keyring * subscriptions and synchronises with wait4(). Also used in procfs. Also * pins the final release of task.io_context. Also protects ->cpuset and - * ->cgroup.subsys[]. And ->vfork_done. + * ->cgroup.subsys[]. And ->vfork_done. And ->sysvshm.shm_clist. * * Nests both inside and outside of read_lock(&tasklist_lock). * It must not be nested with write_lock_irq(&tasklist_lock), --- a/include/linux/shm.h +++ b/include/linux/shm.h @@ -20,9 +20,18 @@ struct shmid_kernel /* private to the ke pid_t shm_lprid; struct user_struct *mlock_user;
- /* The task created the shm object. NULL if the task is dead. */ + /* + * The task created the shm object, for + * task_lock(shp->shm_creator) + */ struct task_struct *shm_creator; - struct list_head shm_clist; /* list by creator */ + + /* + * List by creator. task_lock(->shm_creator) required for read/write. + * If list_empty(), then the creator is dead already. + */ + struct list_head shm_clist; + struct ipc_namespace *ns; } __randomize_layout;
/* shm_mode upper byte flags */ --- a/ipc/shm.c +++ b/ipc/shm.c @@ -92,6 +92,7 @@ static void do_shm_rmid(struct ipc_names struct shmid_kernel *shp;
shp = container_of(ipcp, struct shmid_kernel, shm_perm); + WARN_ON(ns != shp->ns);
if (shp->shm_nattch) { shp->shm_perm.mode |= SHM_DEST; @@ -185,10 +186,43 @@ static void shm_rcu_free(struct rcu_head kvfree(shp); }
-static inline void shm_rmid(struct ipc_namespace *ns, struct shmid_kernel *s) +/* + * It has to be called with shp locked. + * It must be called before ipc_rmid() + */ +static inline void shm_clist_rm(struct shmid_kernel *shp) { - list_del(&s->shm_clist); - ipc_rmid(&shm_ids(ns), &s->shm_perm); + struct task_struct *creator; + + /* ensure that shm_creator does not disappear */ + rcu_read_lock(); + + /* + * A concurrent exit_shm may do a list_del_init() as well. + * Just do nothing if exit_shm already did the work + */ + if (!list_empty(&shp->shm_clist)) { + /* + * shp->shm_creator is guaranteed to be valid *only* + * if shp->shm_clist is not empty. + */ + creator = shp->shm_creator; + + task_lock(creator); + /* + * list_del_init() is a nop if the entry was already removed + * from the list. + */ + list_del_init(&shp->shm_clist); + task_unlock(creator); + } + rcu_read_unlock(); +} + +static inline void shm_rmid(struct shmid_kernel *s) +{ + shm_clist_rm(s); + ipc_rmid(&shm_ids(s->ns), &s->shm_perm); }
@@ -243,7 +277,7 @@ static void shm_destroy(struct ipc_names shm_file = shp->shm_file; shp->shm_file = NULL; ns->shm_tot -= (shp->shm_segsz + PAGE_SIZE - 1) >> PAGE_SHIFT; - shm_rmid(ns, shp); + shm_rmid(shp); shm_unlock(shp); if (!is_file_hugepages(shm_file)) shmem_lock(shm_file, 0, shp->mlock_user); @@ -264,10 +298,10 @@ static void shm_destroy(struct ipc_names * * 2) sysctl kernel.shm_rmid_forced is set to 1. */ -static bool shm_may_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp) +static bool shm_may_destroy(struct shmid_kernel *shp) { return (shp->shm_nattch == 0) && - (ns->shm_rmid_forced || + (shp->ns->shm_rmid_forced || (shp->shm_perm.mode & SHM_DEST)); }
@@ -298,7 +332,7 @@ static void shm_close(struct vm_area_str shp->shm_lprid = task_tgid_vnr(current); shp->shm_dtim = ktime_get_real_seconds(); shp->shm_nattch--; - if (shm_may_destroy(ns, shp)) + if (shm_may_destroy(shp)) shm_destroy(ns, shp); else shm_unlock(shp); @@ -319,10 +353,10 @@ static int shm_try_destroy_orphaned(int * * As shp->* are changed under rwsem, it's safe to skip shp locking. */ - if (shp->shm_creator != NULL) + if (!list_empty(&shp->shm_clist)) return 0;
- if (shm_may_destroy(ns, shp)) { + if (shm_may_destroy(shp)) { shm_lock_by_ptr(shp); shm_destroy(ns, shp); } @@ -340,48 +374,97 @@ void shm_destroy_orphaned(struct ipc_nam /* Locking assumes this will only be called with task == current */ void exit_shm(struct task_struct *task) { - struct ipc_namespace *ns = task->nsproxy->ipc_ns; - struct shmid_kernel *shp, *n; + for (;;) { + struct shmid_kernel *shp; + struct ipc_namespace *ns;
- if (list_empty(&task->sysvshm.shm_clist)) - return; + task_lock(task); + + if (list_empty(&task->sysvshm.shm_clist)) { + task_unlock(task); + break; + } + + shp = list_first_entry(&task->sysvshm.shm_clist, struct shmid_kernel, + shm_clist);
- /* - * If kernel.shm_rmid_forced is not set then only keep track of - * which shmids are orphaned, so that a later set of the sysctl - * can clean them up. - */ - if (!ns->shm_rmid_forced) { - down_read(&shm_ids(ns).rwsem); - list_for_each_entry(shp, &task->sysvshm.shm_clist, shm_clist) - shp->shm_creator = NULL; /* - * Only under read lock but we are only called on current - * so no entry on the list will be shared. + * 1) Get pointer to the ipc namespace. It is worth to say + * that this pointer is guaranteed to be valid because + * shp lifetime is always shorter than namespace lifetime + * in which shp lives. + * We taken task_lock it means that shp won't be freed. */ - list_del(&task->sysvshm.shm_clist); - up_read(&shm_ids(ns).rwsem); - return; - } + ns = shp->ns;
- /* - * Destroy all already created segments, that were not yet mapped, - * and mark any mapped as orphan to cover the sysctl toggling. - * Destroy is skipped if shm_may_destroy() returns false. - */ - down_write(&shm_ids(ns).rwsem); - list_for_each_entry_safe(shp, n, &task->sysvshm.shm_clist, shm_clist) { - shp->shm_creator = NULL; + /* + * 2) If kernel.shm_rmid_forced is not set then only keep track of + * which shmids are orphaned, so that a later set of the sysctl + * can clean them up. + */ + if (!ns->shm_rmid_forced) + goto unlink_continue;
- if (shm_may_destroy(ns, shp)) { - shm_lock_by_ptr(shp); - shm_destroy(ns, shp); + /* + * 3) get a reference to the namespace. + * The refcount could be already 0. If it is 0, then + * the shm objects will be free by free_ipc_work(). + */ + ns = get_ipc_ns_not_zero(ns); + if (!ns) { +unlink_continue: + list_del_init(&shp->shm_clist); + task_unlock(task); + continue; } - }
- /* Remove the list head from any segments still attached. */ - list_del(&task->sysvshm.shm_clist); - up_write(&shm_ids(ns).rwsem); + /* + * 4) get a reference to shp. + * This cannot fail: shm_clist_rm() is called before + * ipc_rmid(), thus the refcount cannot be 0. + */ + WARN_ON(!ipc_rcu_getref(&shp->shm_perm)); + + /* + * 5) unlink the shm segment from the list of segments + * created by current. + * This must be done last. After unlinking, + * only the refcounts obtained above prevent IPC_RMID + * from destroying the segment or the namespace. + */ + list_del_init(&shp->shm_clist); + + task_unlock(task); + + /* + * 6) we have all references + * Thus lock & if needed destroy shp. + */ + down_write(&shm_ids(ns).rwsem); + shm_lock_by_ptr(shp); + /* + * rcu_read_lock was implicitly taken in shm_lock_by_ptr, it's + * safe to call ipc_rcu_putref here + */ + ipc_rcu_putref(&shp->shm_perm, shm_rcu_free); + + if (ipc_valid_object(&shp->shm_perm)) { + if (shm_may_destroy(shp)) + shm_destroy(ns, shp); + else + shm_unlock(shp); + } else { + /* + * Someone else deleted the shp from namespace + * idr/kht while we have waited. + * Just unlock and continue. + */ + shm_unlock(shp); + } + + up_write(&shm_ids(ns).rwsem); + put_ipc_ns(ns); /* paired with get_ipc_ns_not_zero */ + } }
static int shm_fault(struct vm_fault *vmf) @@ -625,7 +708,11 @@ static int newseg(struct ipc_namespace * if (error < 0) goto no_id;
+ shp->ns = ns; + + task_lock(current); list_add(&shp->shm_clist, ¤t->sysvshm.shm_clist); + task_unlock(current);
/* * shmid gets reported as "inode#" in /proc/pid/maps. @@ -1449,7 +1536,8 @@ out_nattch: down_write(&shm_ids(ns).rwsem); shp = shm_lock(ns, shmid); shp->shm_nattch--; - if (shm_may_destroy(ns, shp)) + + if (shm_may_destroy(shp)) shm_destroy(ns, shp); else shm_unlock(shp);
From: Alexander Mikhalitsyn alexander.mikhalitsyn@virtuozzo.com
commit 126e8bee943e9926238c891e2df5b5573aee76bc upstream.
Patch series "shm: shm_rmid_forced feature fixes".
Some time ago I met kernel crash after CRIU restore procedure, fortunately, it was CRIU restore, so, I had dump files and could do restore many times and crash reproduced easily. After some investigation I've constructed the minimal reproducer. It was found that it's use-after-free and it happens only if sysctl kernel.shm_rmid_forced = 1.
The key of the problem is that the exit_shm() function not handles shp's object destroy when task->sysvshm.shm_clist contains items from different IPC namespaces. In most cases this list will contain only items from one IPC namespace.
How can this list contain object from different namespaces? The exit_shm() function is designed to clean up this list always when process leaves IPC namespace. But we made a mistake a long time ago and did not add a exit_shm() call into the setns() syscall procedures.
The first idea was just to add this call to setns() syscall but it obviously changes semantics of setns() syscall and that's userspace-visible change. So, I gave up on this idea.
The first real attempt to address the issue was just to omit forced destroy if we meet shp object not from current task IPC namespace [1]. But that was not the best idea because task->sysvshm.shm_clist was protected by rwsem which belongs to current task IPC namespace. It means that list corruption may occur.
Second approach is just extend exit_shm() to properly handle shp's from different IPC namespaces [2]. This is really non-trivial thing, I've put a lot of effort into that but not believed that it's possible to make it fully safe, clean and clear.
Thanks to the efforts of Manfred Spraul working an elegant solution was designed. Thanks a lot, Manfred!
Eric also suggested the way to address the issue in ("[RFC][PATCH] shm: In shm_exit destroy all created and never attached segments") Eric's idea was to maintain a list of shm_clists one per IPC namespace, use lock-less lists. But there is some extra memory consumption-related concerns.
An alternative solution which was suggested by me was implemented in ("shm: reset shm_clist on setns but omit forced shm destroy"). The idea is pretty simple, we add exit_shm() syscall to setns() but DO NOT destroy shm segments even if sysctl kernel.shm_rmid_forced = 1, we just clean up the task->sysvshm.shm_clist list.
This chages semantics of setns() syscall a little bit but in comparision to the "naive" solution when we just add exit_shm() without any special exclusions this looks like a safer option.
[1] https://lkml.org/lkml/2021/7/6/1108 [2] https://lkml.org/lkml/2021/7/14/736
This patch (of 2):
Let's produce a warning if we trying to remove non-existing IPC object from IPC namespace kht/idr structures.
This allows us to catch possible bugs when the ipc_rmid() function was called with inconsistent struct ipc_ids*, struct kern_ipc_perm* arguments.
Link: https://lkml.kernel.org/r/20211027224348.611025-1-alexander.mikhalitsyn@virt... Link: https://lkml.kernel.org/r/20211027224348.611025-2-alexander.mikhalitsyn@virt... Co-developed-by: Manfred Spraul manfred@colorfullife.com Signed-off-by: Manfred Spraul manfred@colorfullife.com Signed-off-by: Alexander Mikhalitsyn alexander.mikhalitsyn@virtuozzo.com Cc: "Eric W. Biederman" ebiederm@xmission.com Cc: Davidlohr Bueso dave@stgolabs.net Cc: Greg KH gregkh@linuxfoundation.org Cc: Andrei Vagin avagin@gmail.com Cc: Pavel Tikhomirov ptikhomirov@virtuozzo.com Cc: Vasily Averin vvs@virtuozzo.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- ipc/util.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/ipc/util.c +++ b/ipc/util.c @@ -409,8 +409,8 @@ static int ipcget_public(struct ipc_name static void ipc_kht_remove(struct ipc_ids *ids, struct kern_ipc_perm *ipcp) { if (ipcp->key != IPC_PRIVATE) - rhashtable_remove_fast(&ids->key_ht, &ipcp->khtnode, - ipc_kht_params); + WARN_ON_ONCE(rhashtable_remove_fast(&ids->key_ht, &ipcp->khtnode, + ipc_kht_params)); }
/** @@ -425,7 +425,7 @@ void ipc_rmid(struct ipc_ids *ids, struc { int lid = ipcid_to_idx(ipcp->id);
- idr_remove(&ids->ipcs_idr, lid); + WARN_ON_ONCE(idr_remove(&ids->ipcs_idr, lid) != ipcp); ipc_kht_remove(ids, ipcp); ids->in_use--; ipcp->deleted = true;
From: Benjamin Coddington bcodding@redhat.com
commit 3f015d89a47cd8855cd92f71fff770095bd885a1 upstream.
The mechanism in use to allow the client to see the results of COPY/CLONE is to drop those pages from the pagecache. This forces the client to read those pages once more from the server. However, truncate_pagecache_range() zeros out partial pages instead of dropping them. Let us instead use invalidate_inode_pages2_range() with full-page offsets to ensure the client properly sees the results of COPY/CLONE operations.
Cc: stable@vger.kernel.org # v4.7+ Fixes: 2e72448b07dc ("NFS: Add COPY nfs operation") Signed-off-by: Benjamin Coddington bcodding@redhat.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nfs/nfs42proc.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/fs/nfs/nfs42proc.c +++ b/fs/nfs/nfs42proc.c @@ -186,8 +186,9 @@ static ssize_t _nfs42_proc_copy(struct f goto out; }
- truncate_pagecache_range(dst_inode, pos_dst, - pos_dst + res->write_res.count); + WARN_ON_ONCE(invalidate_inode_pages2_range(dst_inode->i_mapping, + pos_dst >> PAGE_SHIFT, + (pos_dst + res->write_res.count - 1) >> PAGE_SHIFT));
status = res->write_res.count; out:
From: Mike Kravetz mike.kravetz@oracle.com
commit dff11abe280b47c21b804a8ace318e0638bb9a49 upstream.
When fixing an issue with PMD sharing and migration, it was discovered via code inspection that other callers of huge_pmd_unshare potentially have an issue with cache and tlb flushing.
Use the routine adjust_range_if_pmd_sharing_possible() to calculate worst case ranges for mmu notifiers. Ensure that this range is flushed if huge_pmd_unshare succeeds and unmaps a PUD_SUZE area.
Link: http://lkml.kernel.org/r/20180823205917.16297-3-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz mike.kravetz@oracle.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Reviewed-by: Naoya Horiguchi n-horiguchi@ah.jp.nec.com Cc: Vlastimil Babka vbabka@suse.cz Cc: Davidlohr Bueso dave@stgolabs.net Cc: Michal Hocko mhocko@kernel.org Cc: Jerome Glisse jglisse@redhat.com Cc: Mike Kravetz mike.kravetz@oracle.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/hugetlb.c | 53 +++++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 43 insertions(+), 10 deletions(-)
--- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -3384,8 +3384,8 @@ void __unmap_hugepage_range(struct mmu_g struct page *page; struct hstate *h = hstate_vma(vma); unsigned long sz = huge_page_size(h); - const unsigned long mmun_start = start; /* For mmu_notifiers */ - const unsigned long mmun_end = end; /* For mmu_notifiers */ + unsigned long mmun_start = start; /* For mmu_notifiers */ + unsigned long mmun_end = end; /* For mmu_notifiers */ bool force_flush = false;
WARN_ON(!is_vm_hugetlb_page(vma)); @@ -3398,6 +3398,11 @@ void __unmap_hugepage_range(struct mmu_g */ tlb_remove_check_page_size_change(tlb, sz); tlb_start_vma(tlb, vma); + + /* + * If sharing possible, alert mmu notifiers of worst case. + */ + adjust_range_if_pmd_sharing_possible(vma, &mmun_start, &mmun_end); mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end); address = start; for (; address < end; address += sz) { @@ -3508,12 +3513,23 @@ void unmap_hugepage_range(struct vm_area { struct mm_struct *mm; struct mmu_gather tlb; + unsigned long tlb_start = start; + unsigned long tlb_end = end; + + /* + * If shared PMDs were possibly used within this vma range, adjust + * start/end for worst case tlb flushing. + * Note that we can not be sure if PMDs are shared until we try to + * unmap pages. However, we want to make sure TLB flushing covers + * the largest possible range. + */ + adjust_range_if_pmd_sharing_possible(vma, &tlb_start, &tlb_end);
mm = vma->vm_mm;
- tlb_gather_mmu(&tlb, mm, start, end); + tlb_gather_mmu(&tlb, mm, tlb_start, tlb_end); __unmap_hugepage_range(&tlb, vma, start, end, ref_page); - tlb_finish_mmu(&tlb, start, end); + tlb_finish_mmu(&tlb, tlb_start, tlb_end); }
/* @@ -4408,11 +4424,21 @@ unsigned long hugetlb_change_protection( pte_t pte; struct hstate *h = hstate_vma(vma); unsigned long pages = 0; + unsigned long f_start = start; + unsigned long f_end = end; + bool shared_pmd = false; + + /* + * In the case of shared PMDs, the area to flush could be beyond + * start/end. Set f_start/f_end to cover the maximum possible + * range if PMD sharing is possible. + */ + adjust_range_if_pmd_sharing_possible(vma, &f_start, &f_end);
BUG_ON(address >= end); - flush_cache_range(vma, address, end); + flush_cache_range(vma, f_start, f_end);
- mmu_notifier_invalidate_range_start(mm, start, end); + mmu_notifier_invalidate_range_start(mm, f_start, f_end); i_mmap_lock_write(vma->vm_file->f_mapping); for (; address < end; address += huge_page_size(h)) { spinlock_t *ptl; @@ -4423,6 +4449,7 @@ unsigned long hugetlb_change_protection( if (huge_pmd_unshare(mm, &address, ptep)) { pages++; spin_unlock(ptl); + shared_pmd = true; continue; } pte = huge_ptep_get(ptep); @@ -4458,12 +4485,18 @@ unsigned long hugetlb_change_protection( * Must flush TLB before releasing i_mmap_rwsem: x86's huge_pmd_unshare * may have cleared our pud entry and done put_page on the page table: * once we release i_mmap_rwsem, another task can do the final put_page - * and that page table be reused and filled with junk. + * and that page table be reused and filled with junk. If we actually + * did unshare a page of pmds, flush the range corresponding to the pud. */ - flush_hugetlb_tlb_range(vma, start, end); - mmu_notifier_invalidate_range(mm, start, end); + if (shared_pmd) { + flush_hugetlb_tlb_range(vma, f_start, f_end); + mmu_notifier_invalidate_range(mm, f_start, f_end); + } else { + flush_hugetlb_tlb_range(vma, start, end); + mmu_notifier_invalidate_range(mm, start, end); + } i_mmap_unlock_write(vma->vm_file->f_mapping); - mmu_notifier_invalidate_range_end(mm, start, end); + mmu_notifier_invalidate_range_end(mm, f_start, f_end);
return pages << h->order; }
From: liuguoqiang liuguoqiang@uniontech.com
[ Upstream commit 6def480181f15f6d9ec812bca8cbc62451ba314c ]
When kmemdup called failed and register_net_sysctl return NULL, should return ENOMEM instead of ENOBUFS
Signed-off-by: liuguoqiang liuguoqiang@uniontech.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/devinet.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c index d4d53aea2c600..d9bb3ae785608 100644 --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -2324,7 +2324,7 @@ static int __devinet_sysctl_register(struct net *net, char *dev_name, free: kfree(t); out: - return -ENOBUFS; + return -ENOMEM; }
static void __devinet_sysctl_unregister(struct net *net,
From: Slark Xiao slark_xiao@163.com
[ Upstream commit 39f53292181081d35174a581a98441de5da22bc9 ]
When WWAN device wake from S3 deep, under thinkpad platform, WWAN would be disabled. This disable status could be checked by command 'nmcli r wwan' or 'rfkill list'.
Issue analysis as below: When host resume from S3 deep, thinkpad_acpi driver would call hotkey_resume() function. Finnaly, it will use wan_get_status to check the current status of WWAN device. During this resume progress, wan_get_status would always return off even WWAN boot up completely. In patch V2, Hans said 'sw_state should be unchanged after a suspend/resume. It's better to drop the tpacpi_rfk_update_swstate call all together from the resume path'. And it's confimed by Lenovo that GWAN is no longer available from WHL generation because the design does not match with current pin control.
Signed-off-by: Slark Xiao slark_xiao@163.com Link: https://lore.kernel.org/r/20211108060648.8212-1-slark_xiao@163.com Reviewed-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/platform/x86/thinkpad_acpi.c | 12 ------------ 1 file changed, 12 deletions(-)
diff --git a/drivers/platform/x86/thinkpad_acpi.c b/drivers/platform/x86/thinkpad_acpi.c index 9d836d779d475..05b3e0f724fcf 100644 --- a/drivers/platform/x86/thinkpad_acpi.c +++ b/drivers/platform/x86/thinkpad_acpi.c @@ -1180,15 +1180,6 @@ static int tpacpi_rfk_update_swstate(const struct tpacpi_rfk *tp_rfk) return status; }
-/* Query FW and update rfkill sw state for all rfkill switches */ -static void tpacpi_rfk_update_swstate_all(void) -{ - unsigned int i; - - for (i = 0; i < TPACPI_RFK_SW_MAX; i++) - tpacpi_rfk_update_swstate(tpacpi_rfkill_switches[i]); -} - /* * Sync the HW-blocking state of all rfkill switches, * do notice it causes the rfkill core to schedule uevents @@ -3025,9 +3016,6 @@ static void tpacpi_send_radiosw_update(void) if (wlsw == TPACPI_RFK_RADIO_OFF) tpacpi_rfk_update_hwblock_state(true);
- /* Sync sw blocking state */ - tpacpi_rfk_update_swstate_all(); - /* Sync hw blocking state last if it is hw-unblocked */ if (wlsw == TPACPI_RFK_RADIO_ON) tpacpi_rfk_update_hwblock_state(false);
From: Vasily Gorbik gor@linux.ibm.com
[ Upstream commit 5dbc4cb4667457b0c53bcd7bff11500b3c362975 ]
There is a difference in how architectures treat "mem=" option. For some that is an amount of online memory, for s390 and x86 this is the limiting max address. Some memblock api like memblock_enforce_memory_limit() take limit argument and explicitly treat it as the size of online memory, and use __find_max_addr to convert it to an actual max address. Current s390 usage:
memblock_enforce_memory_limit(memblock_end_of_DRAM());
yields different results depending on presence of memory holes (offline memory blocks in between online memory). If there are no memory holes limit == max_addr in memblock_enforce_memory_limit() and it does trim online memory and reserved memory regions. With memory holes present it actually does nothing.
Since we already use memblock_remove() explicitly to trim online memory regions to potential limit (think mem=, kdump, addressing limits, etc.) drop the usage of memblock_enforce_memory_limit() altogether. Trimming reserved regions should not be required, since we now use memblock_set_current_limit() to limit allocations and any explicit memory reservations above the limit is an actual problem we should not hide.
Reviewed-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Vasily Gorbik gor@linux.ibm.com Signed-off-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/s390/kernel/setup.c | 3 --- 1 file changed, 3 deletions(-)
diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c index ceaee215e2436..e9ef093eb6767 100644 --- a/arch/s390/kernel/setup.c +++ b/arch/s390/kernel/setup.c @@ -706,9 +706,6 @@ static void __init setup_memory(void) storage_key_init_range(reg->base, reg->base + reg->size); } psw_set_key(PAGE_DEFAULT_KEY); - - /* Only cosmetics */ - memblock_enforce_memory_limit(memblock_end_of_DRAM()); }
/*
From: Wang Yugui wangyugui@e16-tech.com
[ Upstream commit a91cf0ffbc244792e0b3ecf7d0fddb2f344b461f ]
When a disk has write caching disabled, we skip submission of a bio with flush and sync requests before writing the superblock, since it's not needed. However when the integrity checker is enabled, this results in reports that there are metadata blocks referred by a superblock that were not properly flushed. So don't skip the bio submission only when the integrity checker is enabled for the sake of simplicity, since this is a debug tool and not meant for use in non-debug builds.
fstests/btrfs/220 trigger a check-integrity warning like the following when CONFIG_BTRFS_FS_CHECK_INTEGRITY=y and the disk with WCE=0.
btrfs: attempt to write superblock which references block M @5242880 (sdb2/5242880/0) which is not flushed out of disk's write cache (block flush_gen=1, dev->flush_gen=0)! ------------[ cut here ]------------ WARNING: CPU: 28 PID: 843680 at fs/btrfs/check-integrity.c:2196 btrfsic_process_written_superblock+0x22a/0x2a0 [btrfs] CPU: 28 PID: 843680 Comm: umount Not tainted 5.15.0-0.rc5.39.el8.x86_64 #1 Hardware name: Dell Inc. Precision T7610/0NK70N, BIOS A18 09/11/2019 RIP: 0010:btrfsic_process_written_superblock+0x22a/0x2a0 [btrfs] RSP: 0018:ffffb642afb47940 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000 RDX: 00000000ffffffff RSI: ffff8b722fc97d00 RDI: ffff8b722fc97d00 RBP: ffff8b5601c00000 R08: 0000000000000000 R09: c0000000ffff7fff R10: 0000000000000001 R11: ffffb642afb476f8 R12: ffffffffffffffff R13: ffffb642afb47974 R14: ffff8b5499254c00 R15: 0000000000000003 FS: 00007f00a06d4080(0000) GS:ffff8b722fc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fff5cff5ff0 CR3: 00000001c0c2a006 CR4: 00000000001706e0 Call Trace: btrfsic_process_written_block+0x2f7/0x850 [btrfs] __btrfsic_submit_bio.part.19+0x310/0x330 [btrfs] ? bio_associate_blkg_from_css+0xa4/0x2c0 btrfsic_submit_bio+0x18/0x30 [btrfs] write_dev_supers+0x81/0x2a0 [btrfs] ? find_get_pages_range_tag+0x219/0x280 ? pagevec_lookup_range_tag+0x24/0x30 ? __filemap_fdatawait_range+0x6d/0xf0 ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e ? find_first_extent_bit+0x9b/0x160 [btrfs] ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e write_all_supers+0x1b3/0xa70 [btrfs] ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e btrfs_commit_transaction+0x59d/0xac0 [btrfs] close_ctree+0x11d/0x339 [btrfs] generic_shutdown_super+0x71/0x110 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x70 cleanup_mnt+0xb8/0x140 task_work_run+0x6d/0xb0 exit_to_user_mode_prepare+0x1f0/0x200 syscall_exit_to_user_mode+0x12/0x30 do_syscall_64+0x46/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f009f711dfb RSP: 002b:00007fff5cff7928 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 RAX: 0000000000000000 RBX: 000055b68c6c9970 RCX: 00007f009f711dfb RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055b68c6c9b50 RBP: 0000000000000000 R08: 000055b68c6ca900 R09: 00007f009f795580 R10: 0000000000000000 R11: 0000000000000246 R12: 000055b68c6c9b50 R13: 00007f00a04bf184 R14: 0000000000000000 R15: 00000000ffffffff ---[ end trace 2c4b82abcef9eec4 ]--- S-65536(sdb2/65536/1) --> M-1064960(sdb2/1064960/1)
Reviewed-by: Filipe Manana fdmanana@gmail.com Signed-off-by: Wang Yugui wangyugui@e16-tech.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/disk-io.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index ace58d6a270b6..41ebc613ca4cf 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3339,11 +3339,23 @@ static void btrfs_end_empty_barrier(struct bio *bio) */ static void write_dev_flush(struct btrfs_device *device) { - struct request_queue *q = bdev_get_queue(device->bdev); struct bio *bio = device->flush_bio;
+#ifndef CONFIG_BTRFS_FS_CHECK_INTEGRITY + /* + * When a disk has write caching disabled, we skip submission of a bio + * with flush and sync requests before writing the superblock, since + * it's not needed. However when the integrity checker is enabled, this + * results in reports that there are metadata blocks referred by a + * superblock that were not properly flushed. So don't skip the bio + * submission only when the integrity checker is enabled for the sake + * of simplicity, since this is a debug tool and not meant for use in + * non-debug builds. + */ + struct request_queue *q = bdev_get_queue(device->bdev); if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags)) return; +#endif
bio_reset(bio); bio->bi_end_io = btrfs_end_empty_barrier;
From: Manaf Meethalavalappu Pallikunhi manafm@codeaurora.org
[ Upstream commit 99b63316c39988039965693f5f43d8b4ccb1c86c ]
During the suspend is in process, thermal_zone_device_update bails out thermal zone re-evaluation for any sensor trip violation without setting next valid trip to that sensor. It assumes during resume it will re-evaluate same thermal zone and update trip. But when it is in suspend temperature goes down and on resume path while updating thermal zone if temperature is less than previously violated trip, thermal zone set trip function evaluates the same previous high and previous low trip as new high and low trip. Since there is no change in high/low trip, it bails out from thermal zone set trip API without setting any trip. It leads to a case where sensor high trip or low trip is disabled forever even though thermal zone has a valid high or low trip.
During thermal zone device init, reset thermal zone previous high and low trip. It resolves above mentioned scenario.
Signed-off-by: Manaf Meethalavalappu Pallikunhi manafm@codeaurora.org Reviewed-by: Thara Gopinath thara.gopinath@linaro.org Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/thermal/thermal_core.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c index 94820f25a15ff..8374b8078b7df 100644 --- a/drivers/thermal/thermal_core.c +++ b/drivers/thermal/thermal_core.c @@ -457,6 +457,8 @@ static void thermal_zone_device_init(struct thermal_zone_device *tz) { struct thermal_instance *pos; tz->temperature = THERMAL_TEMP_INVALID; + tz->prev_low_trip = -INT_MAX; + tz->prev_high_trip = INT_MAX; list_for_each_entry(pos, &tz->thermal_instances, tz_node) pos->initialized = false; }
From: Mike Christie michael.christie@oracle.com
[ Upstream commit a0c2f8b6709a9a4af175497ca65f93804f57b248 ]
We can race where iscsi_session_recovery_timedout() has woken up the error handler thread and it's now setting the devices to offline, and session_recovery_timedout()'s call to scsi_target_unblock() is also trying to set the device's state to transport-offline. We can then get a mix of states.
For the case where we can't relogin we want the devices to be in transport-offline so when we have repaired the connection __iscsi_unblock_session() can set the state back to running.
Set the device state then call into libiscsi to wake up the error handler.
Link: https://lore.kernel.org/r/20211105221048.6541-2-michael.christie@oracle.com Reviewed-by: Lee Duncan lduncan@suse.com Signed-off-by: Mike Christie michael.christie@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/scsi_transport_iscsi.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c index d276d84c0f7a2..26c6f1b288013 100644 --- a/drivers/scsi/scsi_transport_iscsi.c +++ b/drivers/scsi/scsi_transport_iscsi.c @@ -1892,12 +1892,12 @@ static void session_recovery_timedout(struct work_struct *work) } spin_unlock_irqrestore(&session->lock, flags);
- if (session->transport->session_recovery_timedout) - session->transport->session_recovery_timedout(session); - ISCSI_DBG_TRANS_SESSION(session, "Unblocking SCSI target\n"); scsi_target_unblock(&session->dev, SDEV_TRANSPORT_OFFLINE); ISCSI_DBG_TRANS_SESSION(session, "Completed unblocking SCSI target\n"); + + if (session->transport->session_recovery_timedout) + session->transport->session_recovery_timedout(session); }
static void __iscsi_unblock_session(struct work_struct *work)
From: Teng Qi starmiku1207184332@gmail.com
[ Upstream commit a66998e0fbf213d47d02813b9679426129d0d114 ]
The if statement: if (port >= DSAF_GE_NUM) return;
limits the value of port less than DSAF_GE_NUM (i.e., 8). However, if the value of port is 6 or 7, an array overflow could occur: port_rst_off = dsaf_dev->mac_cb[port]->port_rst_off;
because the length of dsaf_dev->mac_cb is DSAF_MAX_PORT_NUM (i.e., 6).
To fix this possible array overflow, we first check port and if it is greater than or equal to DSAF_MAX_PORT_NUM, the function returns.
Reported-by: TOTE Robot oslab@tsinghua.edu.cn Signed-off-by: Teng Qi starmiku1207184332@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c index 408b63faf9a81..c4e56784ed1b8 100644 --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c @@ -336,6 +336,10 @@ static void hns_dsaf_ge_srst_by_port(struct dsaf_device *dsaf_dev, u32 port, return;
if (!HNS_DSAF_IS_DEBUG(dsaf_dev)) { + /* DSAF_MAX_PORT_NUM is 6, but DSAF_GE_NUM is 8. + We need check to prevent array overflow */ + if (port >= DSAF_MAX_PORT_NUM) + return; reg_val_1 = 0x1 << port; port_rst_off = dsaf_dev->mac_cb[port]->port_rst_off; /* there is difference between V1 and V2 in register.*/
From: zhangyue zhangyue1@kylinos.cn
[ Upstream commit 61217be886b5f7402843677e4be7e7e83de9cb41 ]
In line 5001, if all id in the array 'lp->phy[8]' is not 0, when the 'for' end, the 'k' is 8.
At this time, the array 'lp->phy[8]' may be out of bound.
Signed-off-by: zhangyue zhangyue1@kylinos.cn Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/dec/tulip/de4x5.c | 30 +++++++++++++++----------- 1 file changed, 17 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/dec/tulip/de4x5.c b/drivers/net/ethernet/dec/tulip/de4x5.c index 8f108a30cba66..84cf7b4582f3e 100644 --- a/drivers/net/ethernet/dec/tulip/de4x5.c +++ b/drivers/net/ethernet/dec/tulip/de4x5.c @@ -4994,19 +4994,23 @@ mii_get_phy(struct net_device *dev) } if ((j == limit) && (i < DE4X5_MAX_MII)) { for (k=0; k < DE4X5_MAX_PHY && lp->phy[k].id; k++); - lp->phy[k].addr = i; - lp->phy[k].id = id; - lp->phy[k].spd.reg = GENERIC_REG; /* ANLPA register */ - lp->phy[k].spd.mask = GENERIC_MASK; /* 100Mb/s technologies */ - lp->phy[k].spd.value = GENERIC_VALUE; /* TX & T4, H/F Duplex */ - lp->mii_cnt++; - lp->active++; - printk("%s: Using generic MII device control. If the board doesn't operate,\nplease mail the following dump to the author:\n", dev->name); - j = de4x5_debug; - de4x5_debug |= DEBUG_MII; - de4x5_dbg_mii(dev, k); - de4x5_debug = j; - printk("\n"); + if (k < DE4X5_MAX_PHY) { + lp->phy[k].addr = i; + lp->phy[k].id = id; + lp->phy[k].spd.reg = GENERIC_REG; /* ANLPA register */ + lp->phy[k].spd.mask = GENERIC_MASK; /* 100Mb/s technologies */ + lp->phy[k].spd.value = GENERIC_VALUE; /* TX & T4, H/F Duplex */ + lp->mii_cnt++; + lp->active++; + printk("%s: Using generic MII device control. If the board doesn't operate,\nplease mail the following dump to the author:\n", dev->name); + j = de4x5_debug; + de4x5_debug |= DEBUG_MII; + de4x5_dbg_mii(dev, k); + de4x5_debug = j; + printk("\n"); + } else { + goto purgatory; + } } } purgatory:
From: Teng Qi starmiku1207184332@gmail.com
[ Upstream commit 0fa68da72c3be09e06dd833258ee89c33374195f ]
The definition of macro MOTO_SROM_BUG is: #define MOTO_SROM_BUG (lp->active == 8 && (get_unaligned_le32( dev->dev_addr) & 0x00ffffff) == 0x3e0008)
and the if statement if (MOTO_SROM_BUG) lp->active = 0;
using this macro indicates lp->active could be 8. If lp->active is 8 and the second comparison of this macro is false. lp->active will remain 8 in: lp->phy[lp->active].gep = (*p ? p : NULL); p += (2 * (*p) + 1); lp->phy[lp->active].rst = (*p ? p : NULL); p += (2 * (*p) + 1); lp->phy[lp->active].mc = get_unaligned_le16(p); p += 2; lp->phy[lp->active].ana = get_unaligned_le16(p); p += 2; lp->phy[lp->active].fdx = get_unaligned_le16(p); p += 2; lp->phy[lp->active].ttm = get_unaligned_le16(p); p += 2; lp->phy[lp->active].mci = *p;
However, the length of array lp->phy is 8, so array overflows can occur. To fix these possible array overflows, we first check lp->active and then return -EINVAL if it is greater or equal to ARRAY_SIZE(lp->phy) (i.e. 8).
Reported-by: TOTE Robot oslab@tsinghua.edu.cn Signed-off-by: Teng Qi starmiku1207184332@gmail.com Reviewed-by: Arnd Bergmann arnd@arndb.de Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/dec/tulip/de4x5.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/net/ethernet/dec/tulip/de4x5.c b/drivers/net/ethernet/dec/tulip/de4x5.c index 84cf7b4582f3e..09a65f8f62038 100644 --- a/drivers/net/ethernet/dec/tulip/de4x5.c +++ b/drivers/net/ethernet/dec/tulip/de4x5.c @@ -4703,6 +4703,10 @@ type3_infoblock(struct net_device *dev, u_char count, u_char *p) lp->ibn = 3; lp->active = *p++; if (MOTO_SROM_BUG) lp->active = 0; + /* if (MOTO_SROM_BUG) statement indicates lp->active could + * be 8 (i.e. the size of array lp->phy) */ + if (WARN_ON(lp->active >= ARRAY_SIZE(lp->phy))) + return -EINVAL; lp->phy[lp->active].gep = (*p ? p : NULL); p += (2 * (*p) + 1); lp->phy[lp->active].rst = (*p ? p : NULL); p += (2 * (*p) + 1); lp->phy[lp->active].mc = get_unaligned_le16(p); p += 2;
From: Ian Rogers irogers@google.com
[ Upstream commit 0ca1f534a776cc7d42f2c33da4732b74ec2790cd ]
perf_hpp__column_unregister() removes an entry from a list but doesn't free the memory causing a memory leak spotted by leak sanitizer.
Add the free while at the same time reducing the scope of the function to static.
Signed-off-by: Ian Rogers irogers@google.com Reviewed-by: Kajol Jain kjain@linux.ibm.com Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Jiri Olsa jolsa@redhat.com Cc: Mark Rutland mark.rutland@arm.com Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Stephane Eranian eranian@google.com Link: http://lore.kernel.org/lkml/20211118071247.2140392-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/ui/hist.c | 28 ++++++++++++++-------------- tools/perf/util/hist.h | 1 - 2 files changed, 14 insertions(+), 15 deletions(-)
diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c index 706f6f1e9c7d6..445a7012b1179 100644 --- a/tools/perf/ui/hist.c +++ b/tools/perf/ui/hist.c @@ -468,6 +468,18 @@ struct perf_hpp_list perf_hpp_list = { #undef __HPP_SORT_ACC_FN #undef __HPP_SORT_RAW_FN
+static void fmt_free(struct perf_hpp_fmt *fmt) +{ + /* + * At this point fmt should be completely + * unhooked, if not it's a bug. + */ + BUG_ON(!list_empty(&fmt->list)); + BUG_ON(!list_empty(&fmt->sort_list)); + + if (fmt->free) + fmt->free(fmt); +}
void perf_hpp__init(void) { @@ -531,9 +543,10 @@ void perf_hpp_list__prepend_sort_field(struct perf_hpp_list *list, list_add(&format->sort_list, &list->sorts); }
-void perf_hpp__column_unregister(struct perf_hpp_fmt *format) +static void perf_hpp__column_unregister(struct perf_hpp_fmt *format) { list_del_init(&format->list); + fmt_free(format); }
void perf_hpp__cancel_cumulate(void) @@ -605,19 +618,6 @@ void perf_hpp__append_sort_keys(struct perf_hpp_list *list) }
-static void fmt_free(struct perf_hpp_fmt *fmt) -{ - /* - * At this point fmt should be completely - * unhooked, if not it's a bug. - */ - BUG_ON(!list_empty(&fmt->list)); - BUG_ON(!list_empty(&fmt->sort_list)); - - if (fmt->free) - fmt->free(fmt); -} - void perf_hpp__reset_output_field(struct perf_hpp_list *list) { struct perf_hpp_fmt *fmt, *tmp; diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h index 595f91f46811f..2eb71eeec4858 100644 --- a/tools/perf/util/hist.h +++ b/tools/perf/util/hist.h @@ -339,7 +339,6 @@ enum { };
void perf_hpp__init(void); -void perf_hpp__column_unregister(struct perf_hpp_fmt *format); void perf_hpp__cancel_cumulate(void); void perf_hpp__setup_output_field(struct perf_hpp_list *list); void perf_hpp__reset_output_field(struct perf_hpp_list *list);
From: Stephen Suryaputra ssuryaextr@gmail.com
commit ee201011c1e1563c114a55c86eb164b236f18e84 upstream.
IPCB/IP6CB need to be initialized when processing outbound v4 or v6 pkts in the codepath of vrf device xmit function so that leftover garbage doesn't cause futher code that uses the CB to incorrectly process the pkt.
One occasion of the issue might occur when MPLS route uses the vrf device as the outgoing device such as when the route is added using "ip -f mpls route add <label> dev <vrf>" command.
The problems seems to exist since day one. Hence I put the day one commits on the Fixes tags.
Fixes: 193125dbd8eb ("net: Introduce VRF device driver") Fixes: 35402e313663 ("net: Add IPv6 support to VRF device") Cc: stable@vger.kernel.org Signed-off-by: Stephen Suryaputra ssuryaextr@gmail.com Reviewed-by: David Ahern dsahern@kernel.org Link: https://lore.kernel.org/r/20211130162637.3249-1-ssuryaextr@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/vrf.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/net/vrf.c +++ b/drivers/net/vrf.c @@ -208,6 +208,7 @@ static netdev_tx_t vrf_process_v6_outbou /* strip the ethernet header added for pass through VRF device */ __skb_pull(skb, skb_network_offset(skb));
+ memset(IP6CB(skb), 0, sizeof(*IP6CB(skb))); ret = vrf_ip6_local_out(net, skb->sk, skb); if (unlikely(net_xmit_eval(ret))) dev->stats.tx_errors++; @@ -289,6 +290,7 @@ static netdev_tx_t vrf_process_v4_outbou RT_SCOPE_LINK); }
+ memset(IPCB(skb), 0, sizeof(*IPCB(skb))); ret = vrf_ip_local_out(dev_net(skb_dst(skb)->dev), skb->sk, skb); if (unlikely(net_xmit_eval(ret))) vrf_dev->stats.tx_errors++;
From: Masami Hiramatsu mhiramat@kernel.org
commit 6bbfa44116689469267f1a6e3d233b52114139d2 upstream.
The 'kprobe::data_size' is unsigned, thus it can not be negative. But if user sets it enough big number (e.g. (size_t)-8), the result of 'data_size + sizeof(struct kretprobe_instance)' becomes smaller than sizeof(struct kretprobe_instance) or zero. In result, the kretprobe_instance are allocated without enough memory, and kretprobe accesses outside of allocated memory.
To avoid this issue, introduce a max limitation of the kretprobe::data_size. 4KB per instance should be OK.
Link: https://lkml.kernel.org/r/163836995040.432120.10322772773821182925.stgit@dev...
Cc: stable@vger.kernel.org Fixes: f47cd9b553aa ("kprobes: kretprobe user entry-handler") Reported-by: zhangyue zhangyue1@kylinos.cn Signed-off-by: Masami Hiramatsu mhiramat@kernel.org Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/kprobes.h | 2 ++ kernel/kprobes.c | 3 +++ 2 files changed, 5 insertions(+)
--- a/include/linux/kprobes.h +++ b/include/linux/kprobes.h @@ -193,6 +193,8 @@ struct kretprobe { raw_spinlock_t lock; };
+#define KRETPROBE_MAX_DATA_SIZE 4096 + struct kretprobe_instance { struct hlist_node hlist; struct kretprobe *rp; --- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -2004,6 +2004,9 @@ int register_kretprobe(struct kretprobe } }
+ if (rp->data_size > KRETPROBE_MAX_DATA_SIZE) + return -E2BIG; + rp->kp.pre_handler = pre_handler_kretprobe; rp->kp.post_handler = NULL; rp->kp.fault_handler = NULL;
From: Baokun Li libaokun1@huawei.com
commit 6c8ad7e8cf29eb55836e7a0215f967746ab2b504 upstream.
When the `rmmod sata_fsl.ko` command is executed in the PPC64 GNU/Linux, a bug is reported: ================================================================== BUG: Unable to handle kernel data access on read at 0x80000800805b502c Oops: Kernel access of bad area, sig: 11 [#1] NIP [c0000000000388a4] .ioread32+0x4/0x20 LR [80000000000c6034] .sata_fsl_port_stop+0x44/0xe0 [sata_fsl] Call Trace: .free_irq+0x1c/0x4e0 (unreliable) .ata_host_stop+0x74/0xd0 [libata] .release_nodes+0x330/0x3f0 .device_release_driver_internal+0x178/0x2c0 .driver_detach+0x64/0xd0 .bus_remove_driver+0x70/0xf0 .driver_unregister+0x38/0x80 .platform_driver_unregister+0x14/0x30 .fsl_sata_driver_exit+0x18/0xa20 [sata_fsl] .__se_sys_delete_module+0x1ec/0x2d0 .system_call_exception+0xfc/0x1f0 system_call_common+0xf8/0x200 ==================================================================
The triggering of the BUG is shown in the following stack:
driver_detach device_release_driver_internal __device_release_driver drv->remove(dev) --> platform_drv_remove/platform_remove drv->remove(dev) --> sata_fsl_remove iounmap(host_priv->hcr_base); <---- unmap kfree(host_priv); <---- free devres_release_all release_nodes dr->node.release(dev, dr->data) --> ata_host_stop ap->ops->port_stop(ap) --> sata_fsl_port_stop ioread32(hcr_base + HCONTROL) <---- UAF host->ops->host_stop(host)
The iounmap(host_priv->hcr_base) and kfree(host_priv) functions should not be executed in drv->remove. These functions should be executed in host_stop after port_stop. Therefore, we move these functions to the new function sata_fsl_host_stop and bind the new function to host_stop.
Fixes: faf0b2e5afe7 ("drivers/ata: add support to Freescale 3.0Gbps SATA Controller") Cc: stable@vger.kernel.org Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com Reviewed-by: Sergei Shtylyov sergei.shtylyov@gmail.com Signed-off-by: Damien Le Moal damien.lemoal@opensource.wdc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/ata/sata_fsl.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
--- a/drivers/ata/sata_fsl.c +++ b/drivers/ata/sata_fsl.c @@ -1406,6 +1406,14 @@ static int sata_fsl_init_controller(stru return 0; }
+static void sata_fsl_host_stop(struct ata_host *host) +{ + struct sata_fsl_host_priv *host_priv = host->private_data; + + iounmap(host_priv->hcr_base); + kfree(host_priv); +} + /* * scsi mid-layer and libata interface structures */ @@ -1438,6 +1446,8 @@ static struct ata_port_operations sata_f .port_start = sata_fsl_port_start, .port_stop = sata_fsl_port_stop,
+ .host_stop = sata_fsl_host_stop, + .pmp_attach = sata_fsl_pmp_attach, .pmp_detach = sata_fsl_pmp_detach, }; @@ -1570,8 +1580,6 @@ static int sata_fsl_remove(struct platfo ata_host_detach(host);
irq_dispose_mapping(host_priv->irq); - iounmap(host_priv->hcr_base); - kfree(host_priv);
return 0; }
From: Baokun Li libaokun1@huawei.com
commit 6f48394cf1f3e8486591ad98c11cdadb8f1ef2ad upstream.
Trying to remove the fsl-sata module in the PPC64 GNU/Linux leads to the following warning: ------------[ cut here ]------------ remove_proc_entry: removing non-empty directory 'irq/69', leaking at least 'fsl-sata[ff0221000.sata]' WARNING: CPU: 3 PID: 1048 at fs/proc/generic.c:722 .remove_proc_entry+0x20c/0x220 IRQMASK: 0 NIP [c00000000033826c] .remove_proc_entry+0x20c/0x220 LR [c000000000338268] .remove_proc_entry+0x208/0x220 Call Trace: .remove_proc_entry+0x208/0x220 (unreliable) .unregister_irq_proc+0x104/0x140 .free_desc+0x44/0xb0 .irq_free_descs+0x9c/0xf0 .irq_dispose_mapping+0x64/0xa0 .sata_fsl_remove+0x58/0xa0 [sata_fsl] .platform_drv_remove+0x40/0x90 .device_release_driver_internal+0x160/0x2c0 .driver_detach+0x64/0xd0 .bus_remove_driver+0x70/0xf0 .driver_unregister+0x38/0x80 .platform_driver_unregister+0x14/0x30 .fsl_sata_driver_exit+0x18/0xa20 [sata_fsl] ---[ end trace 0ea876d4076908f5 ]---
The driver creates the mapping by calling irq_of_parse_and_map(), so it also has to dispose the mapping. But the easy way out is to simply use platform_get_irq() instead of irq_of_parse_map(). Also we should adapt return value checking and propagate error values.
In this case the mapping is not managed by the device but by the of core, so the device has not to dispose the mapping.
Fixes: faf0b2e5afe7 ("drivers/ata: add support to Freescale 3.0Gbps SATA Controller") Cc: stable@vger.kernel.org Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com Reviewed-by: Sergei Shtylyov sergei.shtylyov@gmail.com Signed-off-by: Damien Le Moal damien.lemoal@opensource.wdc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/ata/sata_fsl.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-)
--- a/drivers/ata/sata_fsl.c +++ b/drivers/ata/sata_fsl.c @@ -1502,9 +1502,9 @@ static int sata_fsl_probe(struct platfor host_priv->ssr_base = ssr_base; host_priv->csr_base = csr_base;
- irq = irq_of_parse_and_map(ofdev->dev.of_node, 0); - if (!irq) { - dev_err(&ofdev->dev, "invalid irq from platform\n"); + irq = platform_get_irq(ofdev, 0); + if (irq < 0) { + retval = irq; goto error_exit_with_cleanup; } host_priv->irq = irq; @@ -1579,8 +1579,6 @@ static int sata_fsl_remove(struct platfo
ata_host_detach(host);
- irq_dispose_mapping(host_priv->irq); - return 0; }
From: Jens Axboe axboe@kernel.dk
commit 091141a42e15fe47ada737f3996b317072afcefb upstream.
Some uses cases repeatedly get and put references to the same file, but the only exposed interface is doing these one at the time. As each of these entail an atomic inc or dec on a shared structure, that cost can add up.
Add fget_many(), which works just like fget(), except it takes an argument for how many references to get on the file. Ditto fput_many(), which can drop an arbitrary number of references to a file.
Reviewed-by: Hannes Reinecke hare@suse.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/file.c | 15 ++++++++++----- fs/file_table.c | 9 +++++++-- include/linux/file.h | 2 ++ include/linux/fs.h | 4 +++- 4 files changed, 22 insertions(+), 8 deletions(-)
--- a/fs/file.c +++ b/fs/file.c @@ -679,7 +679,7 @@ void do_close_on_exec(struct files_struc spin_unlock(&files->file_lock); }
-static struct file *__fget(unsigned int fd, fmode_t mask) +static struct file *__fget(unsigned int fd, fmode_t mask, unsigned int refs) { struct files_struct *files = current->files; struct file *file; @@ -694,7 +694,7 @@ loop: */ if (file->f_mode & mask) file = NULL; - else if (!get_file_rcu(file)) + else if (!get_file_rcu_many(file, refs)) goto loop; } rcu_read_unlock(); @@ -702,15 +702,20 @@ loop: return file; }
+struct file *fget_many(unsigned int fd, unsigned int refs) +{ + return __fget(fd, FMODE_PATH, refs); +} + struct file *fget(unsigned int fd) { - return __fget(fd, FMODE_PATH); + return __fget(fd, FMODE_PATH, 1); } EXPORT_SYMBOL(fget);
struct file *fget_raw(unsigned int fd) { - return __fget(fd, 0); + return __fget(fd, 0, 1); } EXPORT_SYMBOL(fget_raw);
@@ -741,7 +746,7 @@ static unsigned long __fget_light(unsign return 0; return (unsigned long)file; } else { - file = __fget(fd, mask); + file = __fget(fd, mask, 1); if (!file) return 0; return FDPUT_FPUT | (unsigned long)file; --- a/fs/file_table.c +++ b/fs/file_table.c @@ -261,9 +261,9 @@ void flush_delayed_fput(void)
static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
-void fput(struct file *file) +void fput_many(struct file *file, unsigned int refs) { - if (atomic_long_dec_and_test(&file->f_count)) { + if (atomic_long_sub_and_test(refs, &file->f_count)) { struct task_struct *task = current;
if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) { @@ -282,6 +282,11 @@ void fput(struct file *file) } }
+void fput(struct file *file) +{ + fput_many(file, 1); +} + /* * synchronous analog of fput(); for kernel threads that might be needed * in some umount() (and thus can't use flush_delayed_fput() without --- a/include/linux/file.h +++ b/include/linux/file.h @@ -13,6 +13,7 @@ struct file;
extern void fput(struct file *); +extern void fput_many(struct file *, unsigned int);
struct file_operations; struct vfsmount; @@ -41,6 +42,7 @@ static inline void fdput(struct fd fd) }
extern struct file *fget(unsigned int fd); +extern struct file *fget_many(unsigned int fd, unsigned int refs); extern struct file *fget_raw(unsigned int fd); extern unsigned long __fdget(unsigned int fd); extern unsigned long __fdget_raw(unsigned int fd); --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -908,7 +908,9 @@ static inline struct file *get_file(stru atomic_long_inc(&f->f_count); return f; } -#define get_file_rcu(x) atomic_long_inc_not_zero(&(x)->f_count) +#define get_file_rcu_many(x, cnt) \ + atomic_long_add_unless(&(x)->f_count, (cnt), 0) +#define get_file_rcu(x) get_file_rcu_many((x), 1) #define fput_atomic(x) atomic_long_add_unless(&(x)->f_count, -1, 1) #define file_count(x) atomic_long_read(&(x)->f_count)
From: Linus Torvalds torvalds@linux-foundation.org
commit 054aa8d439b9185d4f5eb9a90282d1ce74772969 upstream.
Jann Horn points out that there is another possible race wrt Unix domain socket garbage collection, somewhat reminiscent of the one fixed in commit cbcf01128d0a ("af_unix: fix garbage collect vs MSG_PEEK").
See the extended comment about the garbage collection requirements added to unix_peek_fds() by that commit for details.
The race comes from how we can locklessly look up a file descriptor just as it is in the process of being closed, and with the right artificial timing (Jann added a few strategic 'mdelay(500)' calls to do that), the Unix domain socket garbage collector could see the reference count decrement of the close() happen before fget() took its reference to the file and the file was attached onto a new file descriptor.
This is all (intentionally) correct on the 'struct file *' side, with RCU lookups and lockless reference counting very much part of the design. Getting that reference count out of order isn't a problem per se.
But the garbage collector can get confused by seeing this situation of having seen a file not having any remaining external references and then seeing it being attached to an fd.
In commit cbcf01128d0a ("af_unix: fix garbage collect vs MSG_PEEK") the fix was to serialize the file descriptor install with the garbage collector by taking and releasing the unix_gc_lock.
That's not really an option here, but since this all happens when we are in the process of looking up a file descriptor, we can instead simply just re-check that the file hasn't been closed in the meantime, and just re-do the lookup if we raced with a concurrent close() of the same file descriptor.
Reported-and-tested-by: Jann Horn jannh@google.com Acked-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/file.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/fs/file.c +++ b/fs/file.c @@ -696,6 +696,10 @@ loop: file = NULL; else if (!get_file_rcu_many(file, refs)) goto loop; + else if (__fcheck_files(files, fd) != file) { + fput_many(file, refs); + goto loop; + } } rcu_read_unlock();
From: Randy Dunlap rdunlap@infradead.org
commit b0f38e15979fa8851e88e8aa371367f264e7b6e9 upstream.
Fix section mismatch warnings in xtsonic. The first one appears to be bogus and after fixing the second one, the first one is gone.
WARNING: modpost: vmlinux.o(.text+0x529adc): Section mismatch in reference from the function sonic_get_stats() to the function .init.text:set_reset_devices() The function sonic_get_stats() references the function __init set_reset_devices(). This is often because sonic_get_stats lacks a __init annotation or the annotation of set_reset_devices is wrong.
WARNING: modpost: vmlinux.o(.text+0x529b3b): Section mismatch in reference from the function xtsonic_probe() to the function .init.text:sonic_probe1() The function xtsonic_probe() references the function __init sonic_probe1(). This is often because xtsonic_probe lacks a __init annotation or the annotation of sonic_probe1 is wrong.
Fixes: 74f2a5f0ef64 ("xtensa: Add support for the Sonic Ethernet device for the XT2000 board.") Signed-off-by: Randy Dunlap rdunlap@infradead.org Reported-by: kernel test robot lkp@intel.com Cc: Christophe JAILLET christophe.jaillet@wanadoo.fr Cc: Finn Thain fthain@telegraphics.com.au Cc: Chris Zankel chris@zankel.net Cc: linux-xtensa@linux-xtensa.org Cc: Thomas Bogendoerfer tsbogend@alpha.franken.de Acked-by: Max Filippov jcmvbkbc@gmail.com Link: https://lore.kernel.org/r/20211130063947.7529-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/natsemi/xtsonic.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/natsemi/xtsonic.c +++ b/drivers/net/ethernet/natsemi/xtsonic.c @@ -128,7 +128,7 @@ static const struct net_device_ops xtson .ndo_set_mac_address = eth_mac_addr, };
-static int __init sonic_probe1(struct net_device *dev) +static int sonic_probe1(struct net_device *dev) { static unsigned version_printed = 0; unsigned int silicon_revision;
From: Zhou Qingyang zhou1615@umn.edu
commit e2dabc4f7e7b60299c20a36d6a7b24ed9bf8e572 upstream.
In qlcnic_83xx_add_rings(), the indirect function of ahw->hw_ops->alloc_mbx_args will be called to allocate memory for cmd.req.arg, and there is a dereference of it in qlcnic_83xx_add_rings(), which could lead to a NULL pointer dereference on failure of the indirect function like qlcnic_83xx_alloc_mbx_args().
Fix this bug by adding a check of alloc_mbx_args(), this patch imitates the logic of mbx_cmd()'s failure handling.
This bug was found by a static analyzer. The analysis employs differential checking to identify inconsistent security operations (e.g., checks or kfrees) between two code paths and confirms that the inconsistent operations are not recovered in the current function or the callers, so they constitute bugs.
Note that, as a bug found by static analysis, it can be a false positive or hard to trigger. Multiple researchers have cross-reviewed the bug.
Builds with CONFIG_QLCNIC=m show no new warnings, and our static analyzer no longer warns about this code.
Fixes: 7f9664525f9c ("qlcnic: 83xx memory map and HW access routine") Signed-off-by: Zhou Qingyang zhou1615@umn.edu Link: https://lore.kernel.org/r/20211130110848.109026-1-zhou1615@umn.edu Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c @@ -1078,8 +1078,14 @@ static int qlcnic_83xx_add_rings(struct sds_mbx_size = sizeof(struct qlcnic_sds_mbx); context_id = recv_ctx->context_id; num_sds = adapter->drv_sds_rings - QLCNIC_MAX_SDS_RINGS; - ahw->hw_ops->alloc_mbx_args(&cmd, adapter, - QLCNIC_CMD_ADD_RCV_RINGS); + err = ahw->hw_ops->alloc_mbx_args(&cmd, adapter, + QLCNIC_CMD_ADD_RCV_RINGS); + if (err) { + dev_err(&adapter->pdev->dev, + "Failed to alloc mbx args %d\n", err); + return err; + } + cmd.req.arg[1] = 0 | (num_sds << 8) | (context_id << 16);
/* set up status rings, mbx 2-81 */
From: Benjamin Poirier bpoirier@nvidia.com
commit 7d4741eacdefa5f0475431645b56baf00784df1f upstream.
There are various problems related to netlink notifications for mpls route changes in response to interfaces being deleted: * delete interface of only nexthop DELROUTE notification is missing RTA_OIF attribute * delete interface of non-last nexthop NEWROUTE notification is missing entirely * delete interface of last nexthop DELROUTE notification is missing nexthop
All of these problems stem from the fact that existing routes are modified in-place before sending a notification. Restructure mpls_ifdown() to avoid changing the route in the DELROUTE cases and to create a copy in the NEWROUTE case.
Fixes: f8efb73c97e2 ("mpls: multipath route support") Signed-off-by: Benjamin Poirier bpoirier@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mpls/af_mpls.c | 68 ++++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 52 insertions(+), 16 deletions(-)
--- a/net/mpls/af_mpls.c +++ b/net/mpls/af_mpls.c @@ -1407,22 +1407,52 @@ static void mpls_dev_destroy_rcu(struct kfree(mdev); }
-static void mpls_ifdown(struct net_device *dev, int event) +static int mpls_ifdown(struct net_device *dev, int event) { struct mpls_route __rcu **platform_label; struct net *net = dev_net(dev); - u8 alive, deleted; unsigned index;
platform_label = rtnl_dereference(net->mpls.platform_label); for (index = 0; index < net->mpls.platform_labels; index++) { struct mpls_route *rt = rtnl_dereference(platform_label[index]); + bool nh_del = false; + u8 alive = 0;
if (!rt) continue;
- alive = 0; - deleted = 0; + if (event == NETDEV_UNREGISTER) { + u8 deleted = 0; + + for_nexthops(rt) { + struct net_device *nh_dev = + rtnl_dereference(nh->nh_dev); + + if (!nh_dev || nh_dev == dev) + deleted++; + if (nh_dev == dev) + nh_del = true; + } endfor_nexthops(rt); + + /* if there are no more nexthops, delete the route */ + if (deleted == rt->rt_nhn) { + mpls_route_update(net, index, NULL, NULL); + continue; + } + + if (nh_del) { + size_t size = sizeof(*rt) + rt->rt_nhn * + rt->rt_nh_size; + struct mpls_route *orig = rt; + + rt = kmalloc(size, GFP_KERNEL); + if (!rt) + return -ENOMEM; + memcpy(rt, orig, size); + } + } + change_nexthops(rt) { unsigned int nh_flags = nh->nh_flags;
@@ -1446,16 +1476,15 @@ static void mpls_ifdown(struct net_devic next: if (!(nh_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN))) alive++; - if (!rtnl_dereference(nh->nh_dev)) - deleted++; } endfor_nexthops(rt);
WRITE_ONCE(rt->rt_nhn_alive, alive);
- /* if there are no more nexthops, delete the route */ - if (event == NETDEV_UNREGISTER && deleted == rt->rt_nhn) - mpls_route_update(net, index, NULL, NULL); + if (nh_del) + mpls_route_update(net, index, rt, NULL); } + + return 0; }
static void mpls_ifup(struct net_device *dev, unsigned int flags) @@ -1519,8 +1548,12 @@ static int mpls_dev_notify(struct notifi return NOTIFY_OK;
switch (event) { + int err; + case NETDEV_DOWN: - mpls_ifdown(dev, event); + err = mpls_ifdown(dev, event); + if (err) + return notifier_from_errno(err); break; case NETDEV_UP: flags = dev_get_flags(dev); @@ -1531,13 +1564,18 @@ static int mpls_dev_notify(struct notifi break; case NETDEV_CHANGE: flags = dev_get_flags(dev); - if (flags & (IFF_RUNNING | IFF_LOWER_UP)) + if (flags & (IFF_RUNNING | IFF_LOWER_UP)) { mpls_ifup(dev, RTNH_F_DEAD | RTNH_F_LINKDOWN); - else - mpls_ifdown(dev, event); + } else { + err = mpls_ifdown(dev, event); + if (err) + return notifier_from_errno(err); + } break; case NETDEV_UNREGISTER: - mpls_ifdown(dev, event); + err = mpls_ifdown(dev, event); + if (err) + return notifier_from_errno(err); mdev = mpls_dev_get(dev); if (mdev) { mpls_dev_sysctl_unregister(dev, mdev); @@ -1548,8 +1586,6 @@ static int mpls_dev_notify(struct notifi case NETDEV_CHANGENAME: mdev = mpls_dev_get(dev); if (mdev) { - int err; - mpls_dev_sysctl_unregister(dev, mdev); err = mpls_dev_sysctl_register(dev, mdev); if (err)
From: Arnd Bergmann arnd@arndb.de
commit f7e5b9bfa6c8820407b64eabc1f29c9a87e8993d upstream.
On ARM v6 and later, we define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS because the ordinary load/store instructions (ldr, ldrh, ldrb) can tolerate any misalignment of the memory address. However, load/store double and load/store multiple instructions (ldrd, ldm) may still only be used on memory addresses that are 32-bit aligned, and so we have to use the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS macro with care, or we may end up with a severe performance hit due to alignment traps that require fixups by the kernel. Testing shows that this currently happens with clang-13 but not gcc-11. In theory, any compiler version can produce this bug or other problems, as we are dealing with undefined behavior in C99 even on architectures that support this in hardware, see also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363.
Fortunately, the get_unaligned() accessors do the right thing: when building for ARMv6 or later, the compiler will emit unaligned accesses using the ordinary load/store instructions (but avoid the ones that require 32-bit alignment). When building for older ARM, those accessors will emit the appropriate sequence of ldrb/mov/orr instructions. And on architectures that can truly tolerate any kind of misalignment, the get_unaligned() accessors resolve to the leXX_to_cpup accessors that operate on aligned addresses.
Since the compiler will in fact emit ldrd or ldm instructions when building this code for ARM v6 or later, the solution is to use the unaligned accessors unconditionally on architectures where this is known to be fast. The _aligned version of the hash function is however still needed to get the best performance on architectures that cannot do any unaligned access in hardware.
This new version avoids the undefined behavior and should produce the fastest hash on all architectures we support.
Link: https://lore.kernel.org/linux-arm-kernel/20181008211554.5355-4-ard.biesheuve... Link: https://lore.kernel.org/linux-crypto/CAK8P3a2KfmmGDbVHULWevB0hv71P2oi2ZCHEAq... Reported-by: Ard Biesheuvel ard.biesheuvel@linaro.org Fixes: 2c956a60778c ("siphash: add cryptographically secure PRF") Signed-off-by: Arnd Bergmann arnd@arndb.de Reviewed-by: Jason A. Donenfeld Jason@zx2c4.com Acked-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/siphash.h | 14 ++++---------- lib/siphash.c | 12 ++++++------ 2 files changed, 10 insertions(+), 16 deletions(-)
--- a/include/linux/siphash.h +++ b/include/linux/siphash.h @@ -27,9 +27,7 @@ static inline bool siphash_key_is_zero(c }
u64 __siphash_aligned(const void *data, size_t len, const siphash_key_t *key); -#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS u64 __siphash_unaligned(const void *data, size_t len, const siphash_key_t *key); -#endif
u64 siphash_1u64(const u64 a, const siphash_key_t *key); u64 siphash_2u64(const u64 a, const u64 b, const siphash_key_t *key); @@ -82,10 +80,9 @@ static inline u64 ___siphash_aligned(con static inline u64 siphash(const void *data, size_t len, const siphash_key_t *key) { -#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS - if (!IS_ALIGNED((unsigned long)data, SIPHASH_ALIGNMENT)) + if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) || + !IS_ALIGNED((unsigned long)data, SIPHASH_ALIGNMENT)) return __siphash_unaligned(data, len, key); -#endif return ___siphash_aligned(data, len, key); }
@@ -96,10 +93,8 @@ typedef struct {
u32 __hsiphash_aligned(const void *data, size_t len, const hsiphash_key_t *key); -#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS u32 __hsiphash_unaligned(const void *data, size_t len, const hsiphash_key_t *key); -#endif
u32 hsiphash_1u32(const u32 a, const hsiphash_key_t *key); u32 hsiphash_2u32(const u32 a, const u32 b, const hsiphash_key_t *key); @@ -135,10 +130,9 @@ static inline u32 ___hsiphash_aligned(co static inline u32 hsiphash(const void *data, size_t len, const hsiphash_key_t *key) { -#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS - if (!IS_ALIGNED((unsigned long)data, HSIPHASH_ALIGNMENT)) + if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) || + !IS_ALIGNED((unsigned long)data, HSIPHASH_ALIGNMENT)) return __hsiphash_unaligned(data, len, key); -#endif return ___hsiphash_aligned(data, len, key); }
--- a/lib/siphash.c +++ b/lib/siphash.c @@ -49,6 +49,7 @@ SIPROUND; \ return (v0 ^ v1) ^ (v2 ^ v3);
+#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS u64 __siphash_aligned(const void *data, size_t len, const siphash_key_t *key) { const u8 *end = data + len - (len % sizeof(u64)); @@ -80,8 +81,8 @@ u64 __siphash_aligned(const void *data, POSTAMBLE } EXPORT_SYMBOL(__siphash_aligned); +#endif
-#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS u64 __siphash_unaligned(const void *data, size_t len, const siphash_key_t *key) { const u8 *end = data + len - (len % sizeof(u64)); @@ -113,7 +114,6 @@ u64 __siphash_unaligned(const void *data POSTAMBLE } EXPORT_SYMBOL(__siphash_unaligned); -#endif
/** * siphash_1u64 - compute 64-bit siphash PRF value of a u64 @@ -250,6 +250,7 @@ EXPORT_SYMBOL(siphash_3u32); HSIPROUND; \ return (v0 ^ v1) ^ (v2 ^ v3);
+#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS u32 __hsiphash_aligned(const void *data, size_t len, const hsiphash_key_t *key) { const u8 *end = data + len - (len % sizeof(u64)); @@ -280,8 +281,8 @@ u32 __hsiphash_aligned(const void *data, HPOSTAMBLE } EXPORT_SYMBOL(__hsiphash_aligned); +#endif
-#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS u32 __hsiphash_unaligned(const void *data, size_t len, const hsiphash_key_t *key) { @@ -313,7 +314,6 @@ u32 __hsiphash_unaligned(const void *dat HPOSTAMBLE } EXPORT_SYMBOL(__hsiphash_unaligned); -#endif
/** * hsiphash_1u32 - compute 64-bit hsiphash PRF value of a u32 @@ -418,6 +418,7 @@ EXPORT_SYMBOL(hsiphash_4u32); HSIPROUND; \ return v1 ^ v3;
+#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS u32 __hsiphash_aligned(const void *data, size_t len, const hsiphash_key_t *key) { const u8 *end = data + len - (len % sizeof(u32)); @@ -438,8 +439,8 @@ u32 __hsiphash_aligned(const void *data, HPOSTAMBLE } EXPORT_SYMBOL(__hsiphash_aligned); +#endif
-#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS u32 __hsiphash_unaligned(const void *data, size_t len, const hsiphash_key_t *key) { @@ -461,7 +462,6 @@ u32 __hsiphash_unaligned(const void *dat HPOSTAMBLE } EXPORT_SYMBOL(__hsiphash_unaligned); -#endif
/** * hsiphash_1u32 - compute 32-bit hsiphash PRF value of a u32
From: Zhou Qingyang zhou1615@umn.edu
commit addad7643142f500080417dd7272f49b7a185570 upstream.
In mlx4_en_try_alloc_resources(), mlx4_en_copy_priv() is called and tmp->tx_cq will be freed on the error path of mlx4_en_copy_priv(). After that mlx4_en_alloc_resources() is called and there is a dereference of &tmp->tx_cq[t][i] in mlx4_en_alloc_resources(), which could lead to a use after free problem on failure of mlx4_en_copy_priv().
Fix this bug by adding a check of mlx4_en_copy_priv()
This bug was found by a static analyzer. The analysis employs differential checking to identify inconsistent security operations (e.g., checks or kfrees) between two code paths and confirms that the inconsistent operations are not recovered in the current function or the callers, so they constitute bugs.
Note that, as a bug found by static analysis, it can be a false positive or hard to trigger. Multiple researchers have cross-reviewed the bug.
Builds with CONFIG_MLX4_EN=m show no new warnings, and our static analyzer no longer warns about this code.
Fixes: ec25bc04ed8e ("net/mlx4_en: Add resilience in low memory systems") Signed-off-by: Zhou Qingyang zhou1615@umn.edu Reviewed-by: Leon Romanovsky leonro@nvidia.com Link: https://lore.kernel.org/r/20211130164438.190591-1-zhou1615@umn.edu Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c @@ -2283,9 +2283,14 @@ int mlx4_en_try_alloc_resources(struct m bool carry_xdp_prog) { struct bpf_prog *xdp_prog; - int i, t; + int i, t, ret;
- mlx4_en_copy_priv(tmp, priv, prof); + ret = mlx4_en_copy_priv(tmp, priv, prof); + if (ret) { + en_warn(priv, "%s: mlx4_en_copy_priv() failed, return\n", + __func__); + return ret; + }
if (mlx4_en_alloc_resources(tmp)) { en_warn(priv,
From: Sven Schuchmann schuchmann@schleissheimer.de
commit 817b653160db9852d5a0498a31f047e18ce27e5b upstream.
On most systems request for IRQ 0 will fail, phylib will print an error message and fall back to polling. To fix this set the phydev->irq to PHY_POLL if no IRQ is available.
Fixes: cc89c323a30e ("lan78xx: Use irq_domain for phy interrupt from USB Int. EP") Reviewed-by: Andrew Lunn andrew@lunn.ch Signed-off-by: Sven Schuchmann schuchmann@schleissheimer.de Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/usb/lan78xx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/usb/lan78xx.c +++ b/drivers/net/usb/lan78xx.c @@ -2052,7 +2052,7 @@ static int lan78xx_phy_init(struct lan78 if (dev->domain_data.phyirq > 0) phydev->irq = dev->domain_data.phyirq; else - phydev->irq = 0; + phydev->irq = PHY_POLL; netdev_dbg(dev->net, "phydev->irq = %d\n", phydev->irq);
/* set to AUTOMDIX */
From: William Kucharski william.kucharski@oracle.com
commit 19f36edf14bcdb783aef3af8217df96f76a8ce34 upstream.
Correct an error where setting /proc/sys/net/rds/tcp/rds_tcp_rcvbuf would instead modify the socket's sk_sndbuf and would leave sk_rcvbuf untouched.
Fixes: c6a58ffed536 ("RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket") Signed-off-by: William Kucharski william.kucharski@oracle.com Acked-by: Santosh Shilimkar santosh.shilimkar@oracle.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/rds/tcp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/rds/tcp.c +++ b/net/rds/tcp.c @@ -392,7 +392,7 @@ void rds_tcp_tune(struct socket *sock) sk->sk_userlocks |= SOCK_SNDBUF_LOCK; } if (rtn->rcvbuf_size > 0) { - sk->sk_sndbuf = rtn->rcvbuf_size; + sk->sk_rcvbuf = rtn->rcvbuf_size; sk->sk_userlocks |= SOCK_RCVBUF_LOCK; } release_sock(sk);
From: Tony Lu tonylu@linux.alibaba.com
commit 00e158fb91dfaff3f94746f260d11f1a4853506e upstream.
When smc_close_final() returns error, the return code overwrites by kernel_sock_shutdown() in smc_close_active(). The return code of smc_close_final() is more important than kernel_sock_shutdown(), and it will pass to userspace directly.
Fix it by keeping both return codes, if smc_close_final() raises an error, return it or kernel_sock_shutdown()'s.
Link: https://lore.kernel.org/linux-s390/1f67548e-cbf6-0dce-82b5-10288a4583bd@linu... Fixes: 606a63c9783a ("net/smc: Ensure the active closing peer first closes clcsock") Suggested-by: Karsten Graul kgraul@linux.ibm.com Signed-off-by: Tony Lu tonylu@linux.alibaba.com Reviewed-by: Wen Gu guwen@linux.alibaba.com Acked-by: Karsten Graul kgraul@linux.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/smc/smc_close.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/net/smc/smc_close.c +++ b/net/smc/smc_close.c @@ -180,6 +180,7 @@ int smc_close_active(struct smc_sock *sm int old_state; long timeout; int rc = 0; + int rc1 = 0;
timeout = current->flags & PF_EXITING ? 0 : sock_flag(sk, SOCK_LINGER) ? @@ -219,8 +220,11 @@ again: /* actively shutdown clcsock before peer close it, * prevent peer from entering TIME_WAIT state. */ - if (smc->clcsock && smc->clcsock->sk) - rc = kernel_sock_shutdown(smc->clcsock, SHUT_RDWR); + if (smc->clcsock && smc->clcsock->sk) { + rc1 = kernel_sock_shutdown(smc->clcsock, + SHUT_RDWR); + rc = rc ? rc : rc1; + } } else { /* peer event has changed the state */ goto again;
From: Helge Deller deller@gmx.de
commit 1d7c29b77725d05faff6754d2f5e7c147aedcf93 upstream.
Default KBUILD_IMAGE to $(boot)/bzImage if a self-extracting (CONFIG_PARISC_SELF_EXTRACT=y) kernel is to be built. This fixes the bindeb-pkg make target.
Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/parisc/Makefile | 5 +++++ 1 file changed, 5 insertions(+)
--- a/arch/parisc/Makefile +++ b/arch/parisc/Makefile @@ -17,7 +17,12 @@ # Mike Shaver, Helge Deller and Martin K. Petersen #
+ifdef CONFIG_PARISC_SELF_EXTRACT +boot := arch/parisc/boot +KBUILD_IMAGE := $(boot)/bzImage +else KBUILD_IMAGE := vmlinuz +endif
KBUILD_DEFCONFIG := default_defconfig
From: Helge Deller deller@gmx.de
commit 0f9fee4cdebfbe695c297e5b603a275e2557c1cc upstream.
On newer debian releases the debian-provided "installkernel" script is installed in /usr/sbin. Fix the kernel install.sh script to look for the script in this directory as well.
Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org # v3.13+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/parisc/install.sh | 1 + 1 file changed, 1 insertion(+)
--- a/arch/parisc/install.sh +++ b/arch/parisc/install.sh @@ -39,6 +39,7 @@ verify "$3" if [ -n "${INSTALLKERNEL}" ]; then if [ -x ~/bin/${INSTALLKERNEL} ]; then exec ~/bin/${INSTALLKERNEL} "$@"; fi if [ -x /sbin/${INSTALLKERNEL} ]; then exec /sbin/${INSTALLKERNEL} "$@"; fi + if [ -x /usr/sbin/${INSTALLKERNEL} ]; then exec /usr/sbin/${INSTALLKERNEL} "$@"; fi fi
# Default install
From: Maciej W. Rozycki macro@orcam.me.uk
commit 3dfac26e2ef29ff2abc2a75aa4cd48fce25a2c4b upstream.
Fix a division by zero in `vgacon_resize' with a backtrace like:
vgacon_resize vc_do_resize vgacon_init do_bind_con_driver do_unbind_con_driver fbcon_fb_unbind do_unregister_framebuffer do_register_framebuffer register_framebuffer __drm_fb_helper_initial_config_and_unlock drm_helper_hpd_irq_event dw_hdmi_irq irq_thread kthread
caused by `c->vc_cell_height' not having been initialized. This has only started to trigger with commit 860dafa90259 ("vt: Fix character height handling with VT_RESIZEX"), however the ultimate offender is commit 50ec42edd978 ("[PATCH] Detaching fbcon: fix vgacon to allow retaking of the console").
Said commit has added a call to `vc_resize' whenever `vgacon_init' is called with the `init' argument set to 0, which did not happen before. And the call is made before a key vgacon boot parameter retrieved in `vgacon_startup' has been propagated in `vgacon_init' for `vc_resize' to use to the console structure being worked on. Previously the parameter was `c->vc_font.height' and now it is `c->vc_cell_height'.
In this particular scenario the registration of fbcon has failed and vt resorts to vgacon. Now fbcon does have initialized `c->vc_font.height' somehow, unlike `c->vc_cell_height', which is why this code did not crash before, but either way the boot parameters should have been copied to the console structure ahead of the call to `vc_resize' rather than afterwards, so that first the call has a chance to use them and second they do not change the console structure to something possibly different from what was used by `vc_resize'.
Move the propagation of the vgacon boot parameters ahead of the call to `vc_resize' then. Adjust the comment accordingly.
Fixes: 50ec42edd978 ("[PATCH] Detaching fbcon: fix vgacon to allow retaking of the console") Cc: stable@vger.kernel.org # v2.6.18+ Reported-by: Wim Osterholt wim@djo.tudelft.nl Reported-by: Pavel V. Panteleev panteleev_p@mcst.ru Signed-off-by: Maciej W. Rozycki macro@orcam.me.uk Link: https://lore.kernel.org/r/alpine.DEB.2.21.2110252317110.58149@angie.orcam.me... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/video/console/vgacon.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-)
--- a/drivers/video/console/vgacon.c +++ b/drivers/video/console/vgacon.c @@ -365,11 +365,17 @@ static void vgacon_init(struct vc_data * struct uni_pagedir *p;
/* - * We cannot be loaded as a module, therefore init is always 1, - * but vgacon_init can be called more than once, and init will - * not be 1. + * We cannot be loaded as a module, therefore init will be 1 + * if we are the default console, however if we are a fallback + * console, for example if fbcon has failed registration, then + * init will be 0, so we need to make sure our boot parameters + * have been copied to the console structure for vgacon_resize + * ultimately called by vc_resize. Any subsequent calls to + * vgacon_init init will have init set to 0 too. */ c->vc_can_do_color = vga_can_do_color; + c->vc_scan_lines = vga_scan_lines; + c->vc_font.height = c->vc_cell_height = vga_video_font_height;
/* set dimensions manually if init != 0 since vc_resize() will fail */ if (init) { @@ -378,8 +384,6 @@ static void vgacon_init(struct vc_data * } else vc_resize(c, vga_video_num_columns, vga_video_num_lines);
- c->vc_scan_lines = vga_scan_lines; - c->vc_font.height = c->vc_cell_height = vga_video_font_height; c->vc_complement_mask = 0x7700; if (vga_512_chars) c->vc_hi_font_mask = 0x0800;
From: Mathias Nyman mathias.nyman@linux.intel.com
commit 09f736aa95476631227d2dc0e6b9aeee1ad7ed58 upstream.
Turns out some xHC controllers require all 64 bits in the CRCR register to be written to execute a command abort.
The lower 32 bits containing the command abort bit is written first. In case the command ring stops before we write the upper 32 bits then hardware may use these upper bits to set the commnd ring dequeue pointer.
Solve this by making sure the upper 32 bits contain a valid command ring dequeue pointer.
The original patch that only wrote the first 32 to stop the ring went to stable, so this fix should go there as well.
Fixes: ff0e50d3564f ("xhci: Fix command ring pointer corruption while aborting a command") Cc: stable@vger.kernel.org Tested-by: Pavankumar Kondeti quic_pkondeti@quicinc.com Signed-off-by: Mathias Nyman mathias.nyman@linux.intel.com Link: https://lore.kernel.org/r/20211126122340.1193239-2-mathias.nyman@linux.intel... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/host/xhci-ring.c | 21 ++++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-)
--- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -350,7 +350,9 @@ static void xhci_handle_stopped_cmd_ring /* Must be called with xhci->lock held, releases and aquires lock back */ static int xhci_abort_cmd_ring(struct xhci_hcd *xhci, unsigned long flags) { - u32 temp_32; + struct xhci_segment *new_seg = xhci->cmd_ring->deq_seg; + union xhci_trb *new_deq = xhci->cmd_ring->dequeue; + u64 crcr; int ret;
xhci_dbg(xhci, "Abort command ring\n"); @@ -359,13 +361,18 @@ static int xhci_abort_cmd_ring(struct xh
/* * The control bits like command stop, abort are located in lower - * dword of the command ring control register. Limit the write - * to the lower dword to avoid corrupting the command ring pointer - * in case if the command ring is stopped by the time upper dword - * is written. + * dword of the command ring control register. + * Some controllers require all 64 bits to be written to abort the ring. + * Make sure the upper dword is valid, pointing to the next command, + * avoiding corrupting the command ring pointer in case the command ring + * is stopped by the time the upper dword is written. */ - temp_32 = readl(&xhci->op_regs->cmd_ring); - writel(temp_32 | CMD_RING_ABORT, &xhci->op_regs->cmd_ring); + next_trb(xhci, NULL, &new_seg, &new_deq); + if (trb_is_link(new_deq)) + next_trb(xhci, NULL, &new_seg, &new_deq); + + crcr = xhci_trb_virt_to_dma(new_seg, new_deq); + xhci_write_64(xhci, crcr | CMD_RING_ABORT, &xhci->op_regs->cmd_ring);
/* Section 4.6.1.2 of xHCI 1.0 spec says software should also time the * completion of the Command Abort operation. If CRR is not negated in 5
From: Badhri Jagan Sridharan badhri@google.com
commit fbcd13df1e78eb2ba83a3c160eefe2d6f574beaf upstream.
Stub from the spec: "4.5.2.2.4.2 Exiting from AttachWait.SNK State A Sink shall transition to Unattached.SNK when the state of both the CC1 and CC2 pins is SNK.Open for at least tPDDebounce. A DRP shall transition to Unattached.SRC when the state of both the CC1 and CC2 pins is SNK.Open for at least tPDDebounce."
This change makes TCPM to wait in SNK_DEBOUNCED state until CC1 and CC2 pins is SNK.Open for at least tPDDebounce. Previously, TCPM resets the port if vbus is not present in PD_T_PS_SOURCE_ON. This causes TCPM to loop continuously when connected to a faulty power source that does not present vbus. Waiting in SNK_DEBOUNCED also ensures that TCPM is adherant to "4.5.2.2.4.2 Exiting from AttachWait.SNK State" requirements.
[ 6169.280751] CC1: 0 -> 0, CC2: 0 -> 5 [state TOGGLING, polarity 0, connected] [ 6169.280759] state change TOGGLING -> SNK_ATTACH_WAIT [rev2 NONE_AMS] [ 6169.280771] pending state change SNK_ATTACH_WAIT -> SNK_DEBOUNCED @ 170 ms [rev2 NONE_AMS] [ 6169.282427] CC1: 0 -> 0, CC2: 5 -> 5 [state SNK_ATTACH_WAIT, polarity 0, connected] [ 6169.450825] state change SNK_ATTACH_WAIT -> SNK_DEBOUNCED [delayed 170 ms] [ 6169.450834] pending state change SNK_DEBOUNCED -> PORT_RESET @ 480 ms [rev2 NONE_AMS] [ 6169.930892] state change SNK_DEBOUNCED -> PORT_RESET [delayed 480 ms] [ 6169.931296] disable vbus discharge ret:0 [ 6169.931301] Setting usb_comm capable false [ 6169.932783] Setting voltage/current limit 0 mV 0 mA [ 6169.932802] polarity 0 [ 6169.933706] Requesting mux state 0, usb-role 0, orientation 0 [ 6169.936689] cc:=0 [ 6169.936812] pending state change PORT_RESET -> PORT_RESET_WAIT_OFF @ 100 ms [rev2 NONE_AMS] [ 6169.937157] CC1: 0 -> 0, CC2: 5 -> 0 [state PORT_RESET, polarity 0, disconnected] [ 6170.036880] state change PORT_RESET -> PORT_RESET_WAIT_OFF [delayed 100 ms] [ 6170.036890] state change PORT_RESET_WAIT_OFF -> SNK_UNATTACHED [rev2 NONE_AMS] [ 6170.036896] Start toggling [ 6170.041412] CC1: 0 -> 0, CC2: 0 -> 0 [state TOGGLING, polarity 0, disconnected] [ 6170.042973] CC1: 0 -> 0, CC2: 0 -> 5 [state TOGGLING, polarity 0, connected] [ 6170.042976] state change TOGGLING -> SNK_ATTACH_WAIT [rev2 NONE_AMS] [ 6170.042981] pending state change SNK_ATTACH_WAIT -> SNK_DEBOUNCED @ 170 ms [rev2 NONE_AMS] [ 6170.213014] state change SNK_ATTACH_WAIT -> SNK_DEBOUNCED [delayed 170 ms] [ 6170.213019] pending state change SNK_DEBOUNCED -> PORT_RESET @ 480 ms [rev2 NONE_AMS] [ 6170.693068] state change SNK_DEBOUNCED -> PORT_RESET [delayed 480 ms] [ 6170.693304] disable vbus discharge ret:0 [ 6170.693308] Setting usb_comm capable false [ 6170.695193] Setting voltage/current limit 0 mV 0 mA [ 6170.695210] polarity 0 [ 6170.695990] Requesting mux state 0, usb-role 0, orientation 0 [ 6170.701896] cc:=0 [ 6170.702181] pending state change PORT_RESET -> PORT_RESET_WAIT_OFF @ 100 ms [rev2 NONE_AMS] [ 6170.703343] CC1: 0 -> 0, CC2: 5 -> 0 [state PORT_RESET, polarity 0, disconnected]
Fixes: f0690a25a140b8 ("staging: typec: USB Type-C Port Manager (tcpm)") Cc: stable@vger.kernel.org Acked-by: Heikki Krogerus heikki.krogerus@linux.intel.com Signed-off-by: Badhri Jagan Sridharan badhri@google.com Link: https://lore.kernel.org/r/20211130001825.3142830-1-badhri@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/staging/typec/tcpm.c | 4 ---- 1 file changed, 4 deletions(-)
--- a/drivers/staging/typec/tcpm.c +++ b/drivers/staging/typec/tcpm.c @@ -2386,11 +2386,7 @@ static void run_state_machine(struct tcp tcpm_try_src(port) ? SRC_TRY : SNK_ATTACHED, 0); - else - /* Wait for VBUS, but not forever */ - tcpm_set_state(port, PORT_RESET, PD_T_PS_SOURCE_ON); break; - case SRC_TRY: port->try_src_count++; tcpm_set_cc(port, tcpm_rp_cc(port));
From: Joerg Roedel jroedel@suse.de
commit 51523ed1c26758de1af7e58730a656875f72f783 upstream.
The trampoline_pgd only maps the 0xfffffff000000000-0xffffffffffffffff range of kernel memory (with 4-level paging). This range contains the kernel's text+data+bss mappings and the module mapping space but not the direct mapping and the vmalloc area.
This is enough to get the application processors out of real-mode, but for code that switches back to real-mode the trampoline_pgd is missing important parts of the address space. For example, consider this code from arch/x86/kernel/reboot.c, function machine_real_restart() for a 64-bit kernel:
#ifdef CONFIG_X86_32 load_cr3(initial_page_table); #else write_cr3(real_mode_header->trampoline_pgd);
/* Exiting long mode will fail if CR4.PCIDE is set. */ if (boot_cpu_has(X86_FEATURE_PCID)) cr4_clear_bits(X86_CR4_PCIDE); #endif
/* Jump to the identity-mapped low memory code */ #ifdef CONFIG_X86_32 asm volatile("jmpl *%0" : : "rm" (real_mode_header->machine_real_restart_asm), "a" (type)); #else asm volatile("ljmpl *%0" : : "m" (real_mode_header->machine_real_restart_asm), "D" (type)); #endif
The code switches to the trampoline_pgd, which unmaps the direct mapping and also the kernel stack. The call to cr4_clear_bits() will find no stack and crash the machine. The real_mode_header pointer below points into the direct mapping, and dereferencing it also causes a crash.
The reason this does not crash always is only that kernel mappings are global and the CR3 switch does not flush those mappings. But if theses mappings are not in the TLB already, the above code will crash before it can jump to the real-mode stub.
Extend the trampoline_pgd to contain all kernel mappings to prevent these crashes and to make code which runs on this page-table more robust.
Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Borislav Petkov bp@suse.de Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20211202153226.22946-5-joro@8bytes.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/realmode/init.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
--- a/arch/x86/realmode/init.c +++ b/arch/x86/realmode/init.c @@ -57,6 +57,7 @@ static void __init setup_real_mode(void) #ifdef CONFIG_X86_64 u64 *trampoline_pgd; u64 efer; + int i; #endif
base = (unsigned char *)real_mode_header; @@ -114,8 +115,17 @@ static void __init setup_real_mode(void) trampoline_header->flags |= TH_FLAGS_SME_ACTIVE;
trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd); + + /* Map the real mode stub as virtual == physical */ trampoline_pgd[0] = trampoline_pgd_entry.pgd; - trampoline_pgd[511] = init_top_pgt[511].pgd; + + /* + * Include the entirety of the kernel mapping into the trampoline + * PGD. This way, all mappings present in the normal kernel page + * tables are usable while running on trampoline_pgd. + */ + for (i = pgd_index(__PAGE_OFFSET); i < PTRS_PER_PGD; i++) + trampoline_pgd[i] = init_top_pgt[i].pgd; #endif }
From: Sven Eckelmann sven@narfation.org
commit 7492ffc90fa126afb67d4392d56cb4134780194a upstream.
The CONSOLE_POLLING mode is used for tools like k(g)db. In this kind of setup, it is often sharing a serial device with the normal system console. This is usually no problem because the polling helpers can consume input values directly (when in kgdb context) and the normal Linux handlers can only consume new input values after kgdb switched back.
This is not true anymore when RX DMA is enabled for UARTDM controllers. Single input values can no longer be received correctly. Instead following seems to happen:
* on 1. input, some old input is read (continuously) * on 2. input, two old inputs are read (continuously) * on 3. input, three old input values are read (continuously) * on 4. input, 4 previous inputs are received
This repeats then for each group of 4 input values.
This behavior changes slightly depending on what state the controller was when the first input was received. But this makes working with kgdb basically impossible because control messages are always corrupted when kgdboc tries to parse them.
RX DMA should therefore be off when CONSOLE_POLLING is enabled to avoid these kind of problems. No such problem was noticed for TX DMA.
Fixes: 99693945013a ("tty: serial: msm: Add RX DMA support") Cc: stable@vger.kernel.org Signed-off-by: Sven Eckelmann sven@narfation.org Link: https://lore.kernel.org/r/20211113121050.7266-1-sven@narfation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/msm_serial.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/tty/serial/msm_serial.c +++ b/drivers/tty/serial/msm_serial.c @@ -611,6 +611,9 @@ static void msm_start_rx_dma(struct msm_ u32 val; int ret;
+ if (IS_ENABLED(CONFIG_CONSOLE_POLL)) + return; + if (!dma->chan) return;
From: Pierre Gondois Pierre.Gondois@arm.com
commit ac442a077acf9a6bf1db4320ec0c3f303be092b3 upstream.
The document 'ACPI for Arm Components 1.0' defines the following _HID mappings: -'Prime cell UART (PL011)': ARMH0011 -'SBSA UART': ARMHB000
Use the sbsa-uart driver when a device is described with the 'ARMHB000' _HID.
Note: PL011 devices currently use the sbsa-uart driver instead of the uart-pl011 driver. Indeed, PL011 devices are not bound to a clock in ACPI. It is not possible to change their baudrate.
Cc: stable@vger.kernel.org Signed-off-by: Pierre Gondois Pierre.Gondois@arm.com Link: https://lore.kernel.org/r/20211109172248.19061-1-Pierre.Gondois@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/amba-pl011.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/tty/serial/amba-pl011.c +++ b/drivers/tty/serial/amba-pl011.c @@ -2790,6 +2790,7 @@ MODULE_DEVICE_TABLE(of, sbsa_uart_of_mat
static const struct acpi_device_id sbsa_uart_acpi_match[] = { { "ARMH0011", 0 }, + { "ARMHB000", 0 }, {}, }; MODULE_DEVICE_TABLE(acpi, sbsa_uart_acpi_match);
From: Johan Hovold johan@kernel.org
commit 00de977f9e0aa9760d9a79d1e41ff780f74e3424 upstream.
Commit 761ed4a94582 ("tty: serial_core: convert uart_close to use tty_port_close") converted serial core to use tty_port_close() but failed to notice that the transmit buffer still needs to be freed on final close.
Not freeing the transmit buffer means that the buffer is no longer cleared on next open so that any ioctl() waiting for the buffer to drain might wait indefinitely (e.g. on termios changes) or that stale data can end up being transmitted in case tx is restarted.
Furthermore, the buffer of any port that has been opened would leak on driver unbind.
Note that the port lock is held when clearing the buffer pointer due to the ldisc race worked around by commit a5ba1d95e46e ("uart: fix race between uart_put_char() and uart_shutdown()").
Also note that the tty-port shutdown() callback is not called for console ports so it is not strictly necessary to free the buffer page after releasing the lock (cf. d72402145ace ("tty/serial: do not free trasnmit buffer page under port lock")).
Link: https://lore.kernel.org/r/319321886d97c456203d5c6a576a5480d07c3478.163578168... Fixes: 761ed4a94582 ("tty: serial_core: convert uart_close to use tty_port_close") Cc: stable@vger.kernel.org # 4.9 Cc: Rob Herring robh@kernel.org Reported-by: Baruch Siach baruch@tkos.co.il Tested-by: Baruch Siach baruch@tkos.co.il Signed-off-by: Johan Hovold johan@kernel.org Link: https://lore.kernel.org/r/20211108085431.12637-1-johan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/serial_core.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)
--- a/drivers/tty/serial/serial_core.c +++ b/drivers/tty/serial/serial_core.c @@ -1541,6 +1541,7 @@ static void uart_tty_port_shutdown(struc { struct uart_state *state = container_of(port, struct uart_state, port); struct uart_port *uport = uart_port_check(state); + char *buf;
/* * At this point, we stop accepting input. To do this, we @@ -1562,8 +1563,18 @@ static void uart_tty_port_shutdown(struc */ tty_port_set_suspended(port, 0);
- uart_change_pm(state, UART_PM_STATE_OFF); + /* + * Free the transmit buffer. + */ + spin_lock_irq(&uport->lock); + buf = state->xmit.buf; + state->xmit.buf = NULL; + spin_unlock_irq(&uport->lock); + + if (buf) + free_page((unsigned long)buf);
+ uart_change_pm(state, UART_PM_STATE_OFF); }
static void uart_wait_until_sent(struct tty_struct *tty, int timeout)
From: Helge Deller deller@gmx.de
commit afdb4a5b1d340e4afffc65daa21cc71890d7d589 upstream.
In commit c8c3735997a3 ("parisc: Enhance detection of synchronous cr16 clocksources") I assumed that CPUs on the same physical core are syncronous. While booting up the kernel on two different C8000 machines, one with a dual-core PA8800 and one with a dual-core PA8900 CPU, this turned out to be wrong. The symptom was that I saw a jump in the internal clocks printed to the syslog and strange overall behaviour. On machines which have 4 cores (2 dual-cores) the problem isn't visible, because the current logic already marked the cr16 clocksource unstable in this case.
This patch now marks the cr16 interval timers unstable if we have more than one CPU in the system, and it fixes this issue.
Fixes: c8c3735997a3 ("parisc: Enhance detection of synchronous cr16 clocksources") Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/parisc/kernel/time.c | 24 +++++------------------- 1 file changed, 5 insertions(+), 19 deletions(-)
--- a/arch/parisc/kernel/time.c +++ b/arch/parisc/kernel/time.c @@ -245,27 +245,13 @@ void __init time_init(void) static int __init init_cr16_clocksource(void) { /* - * The cr16 interval timers are not syncronized across CPUs on - * different sockets, so mark them unstable and lower rating on - * multi-socket SMP systems. + * The cr16 interval timers are not syncronized across CPUs, even if + * they share the same socket. */ if (num_online_cpus() > 1 && !running_on_qemu) { - int cpu; - unsigned long cpu0_loc; - cpu0_loc = per_cpu(cpu_data, 0).cpu_loc; - - for_each_online_cpu(cpu) { - if (cpu == 0) - continue; - if ((cpu0_loc != 0) && - (cpu0_loc == per_cpu(cpu_data, cpu).cpu_loc)) - continue; - - clocksource_cr16.name = "cr16_unstable"; - clocksource_cr16.flags = CLOCK_SOURCE_UNSTABLE; - clocksource_cr16.rating = 0; - break; - } + clocksource_cr16.name = "cr16_unstable"; + clocksource_cr16.flags = CLOCK_SOURCE_UNSTABLE; + clocksource_cr16.rating = 0; }
/* XXX: We may want to mark sched_clock stable here if cr16 clocks are
On Mon, 06 Dec 2021 15:55:08 +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.14.257 release. There are 106 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 08 Dec 2021 14:55:37 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.257-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v4.14: 8 builds: 8 pass, 0 fail 16 boots: 16 pass, 0 fail 32 tests: 32 pass, 0 fail
Linux version: 4.14.257-rc1-gbfbeffc30386 Boards tested: tegra124-jetson-tk1, tegra20-ventana, tegra210-p2371-2180, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
On Mon, Dec 06, 2021 at 03:55:08PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.14.257 release. There are 106 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 08 Dec 2021 14:55:37 +0000. Anything received after that time might be too late.
Build results: total: 168 pass: 168 fail: 0 Qemu test results: total: 423 pass: 423 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
linux-stable-mirror@lists.linaro.org