This is the start of the stable review cycle for the 5.10.211 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Feb 2024 13:15:36 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.211-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.10.211-rc1
Baokun Li libaokun1@huawei.com ext4: regenerate buddy after block freeing failed if under fc replay
Kuniyuki Iwashima kuniyu@amazon.com arp: Prevent overflow in arp_req_get().
Bart Van Assche bvanassche@acm.org fs/aio: Restrict kiocb_set_cancel_fn() to I/O submitted via libaio
Michael Schmitz schmitzmic@gmail.com block: ataflop: more blk-mq refactoring fixes
Armin Wolf W_Armin@gmx.de drm/amd/display: Fix memory leak in dm_sw_fini()
Erik Kurzinger ekurzinger@nvidia.com drm/syncobj: call drm_syncobj_fence_add_wait when WAIT_AVAILABLE flag is set
Christian König christian.koenig@amd.com drm/syncobj: make lockdep complain on WAIT_FOR_SUBMIT v3
Florian Westphal fw@strlen.de netfilter: nf_tables: set dormant flag on hook register failure
Sabrina Dubroca sd@queasysnail.net tls: stop recv() if initial process_rx_list gave us non-DATA
Jakub Kicinski kuba@kernel.org tls: rx: drop pointless else after goto
Jakub Kicinski kuba@kernel.org tls: rx: jump to a more appropriate label
Jason Gunthorpe jgg@ziepe.ca s390: use the correct count for __iowrite64_copy()
Kees Cook keescook@chromium.org net: dev: Convert sa_data to flexible array in struct sockaddr
Wolfram Sang wsa+renesas@sang-engineering.com packet: move from strlcpy with unused retval to strscpy
Vasiliy Kovalev kovalev@altlinux.org ipv6: sr: fix possible use-after-free and null-ptr-deref
Daniil Dulov d.dulov@aladdin.ru afs: Increase buffer size in afs_update_volume_status()
Eric Dumazet edumazet@google.com ipv6: properly combine dev_base_seq and ipv6.dev_addr_genid
Eric Dumazet edumazet@google.com ipv4: properly combine dev_base_seq and ipv4.dev_addr_genid
Arnd Bergmann arnd@arndb.de nouveau: fix function cast warnings
Randy Dunlap rdunlap@infradead.org scsi: jazz_esp: Only build if SCSI core is builtin
Gianmarco Lusvardi glusvardi@posteo.net bpf, scripts: Correct GPL license name
Arnd Bergmann arnd@arndb.de RDMA/srpt: fix function pointer cast warnings
Heiko Stuebner heiko.stuebner@cherry.de arm64: dts: rockchip: set num-cs property for spi on px30
Kamal Heib kheib@redhat.com RDMA/qedr: Fix qedr_create_user_qp error flow
Bart Van Assche bvanassche@acm.org RDMA/srpt: Support specifying the srpt_service_guid parameter
Kalesh AP kalesh-anakkur.purayil@broadcom.com RDMA/bnxt_re: Return error for SRQ resize
Zhipeng Lu alexious@zju.edu.cn IB/hfi1: Fix a memleak in init_credit_return
Paolo Abeni pabeni@redhat.com mptcp: fix lockless access in subflow ULP diag
Xu Yang xu.yang_2@nxp.com usb: roles: don't get/set_role() when usb_role_switch is unregistered
Xu Yang xu.yang_2@nxp.com usb: roles: fix NULL pointer issue when put module's reference
Krishna Kurapati quic_kriskura@quicinc.com usb: gadget: ncm: Avoid dropping datagrams of properly parsed NTBs
Frank Li Frank.Li@nxp.com usb: cdns3: fix memory double free when handle zero packet
Frank Li Frank.Li@nxp.com usb: cdns3: fixed memory use after free at cdns3_gadget_ep_disable()
Peter Zijlstra peterz@infradead.org x86/alternative: Make custom return thunk unconditional
Borislav Petkov (AMD) bp@alien8.de Revert "x86/alternative: Make custom return thunk unconditional"
Peter Zijlstra peterz@infradead.org x86/returnthunk: Allow different return thunks
Peter Zijlstra peterz@infradead.org x86/ftrace: Use alternative RET encoding
Peter Zijlstra peterz@infradead.org x86/ibt,paravirt: Use text_gen_insn() for paravirt_patch()
Peter Zijlstra peterz@infradead.org x86/text-patching: Make text_gen_insn() play nice with ANNOTATE_NOENDBR
Borislav Petkov (AMD) bp@alien8.de Revert "x86/ftrace: Use alternative RET encoding"
Nikita Shubin nikita.shubin@maquefel.me ARM: ep93xx: Add terminator to gpiod_lookup_table
Tom Parkin tparkin@katalix.com l2tp: pass correct message length to ip6_append_data
Vidya Sagar vidyas@nvidia.com PCI/MSI: Prevent MSI hardware interrupt number truncation
Vasiliy Kovalev kovalev@altlinux.org gtp: fix use-after-free and null-ptr-deref in gtp_genl_dump_pdp()
Oliver Upton oliver.upton@linux.dev KVM: arm64: vgic-its: Test for valid IRQ in its_sync_lpi_pending_table()
Oliver Upton oliver.upton@linux.dev KVM: arm64: vgic-its: Test for valid IRQ in MOVALL handler
Mikulas Patocka mpatocka@redhat.com dm-crypt: don't modify the data when using authenticated encryption
Peter Oberparleiter oberpar@linux.ibm.com s390/cio: fix invalid -EBUSY on ccw_device_start
Daniel Vacek neelx@redhat.com IB/hfi1: Fix sdma.h tx->num_descs off-by-one error
Gao Xiang hsiangkao@linux.alibaba.com erofs: fix lz4 inplace decompression
Jan Beulich jbeulich@suse.com x86: drop bogus "cc" clobber from __try_cmpxchg_user_asm()
Zhihao Cheng chengzhihao1@huawei.com jbd2: Fix wrongly judgement for buffer head removing while doing checkpoint
Zhang Yi yi.zhang@huawei.com jbd2: recheck chechpointing non-dirty buffer
Zhang Yi yi.zhang@huawei.com jbd2: remove redundant buffer io error checks
Johannes Berg johannes.berg@intel.com iwlwifi: mvm: write queue_sync_state only for sync
Johannes Berg johannes.berg@intel.com iwlwifi: mvm: do more useful queue sync accounting
Max Verevkin me@maxverevkin.tk platform/x86: intel-vbtn: Support for tablet mode on HP Pavilion 13 x360 PC
Sergej Bauer sbauer@blackbox.su lan743x: fix for potential NULL pointer dereference with bare card
Filipe Manana fdmanana@suse.com btrfs: do not pin logs too early during renames
Filipe Manana fdmanana@suse.com btrfs: unify lookup return value when dir entry is missing
Marcos Paulo de Souza mpdesouza@suse.com btrfs: introduce btrfs_lookup_match_dir
Josef Bacik josef@toxicpanda.com btrfs: tree-checker: check for overlapping extent items
Borislav Petkov bp@suse.de task_stack, x86/cea: Force-inline stack helpers
Andy Shevchenko andriy.shevchenko@linux.intel.com ASoC: Intel: bytcr_rt5651: Drop reference count of ACPI device after use
Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com ASoC: Intel: boards: get codec device with ACPI instead of bus search
Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com ASoC: Intel: boards: harden codec property handling
YouChing Lin ycllin@mxic.com.tw mtd: spinand: macronix: Add support for MX35LFxGE4AD
Shyam Prasad N sprasad@microsoft.com cifs: add a warning when the in-flight count goes negative
Benjamin Gray bgray@linux.ibm.com powerpc/watchpoints: Annotate atomic context in more places
Ravi Bangoria ravi.bangoria@linux.ibm.com powerpc/watchpoint: Workaround P10 DD1 issue with VSX-32 byte instructions
Michael Schmitz schmitzmic@gmail.com block: ataflop: fix breakage introduced at blk-mq refactoring
Kees Cook keescook@chromium.org seccomp: Invalidate seccomp mode to catch death failures
Peter Zijlstra peterz@infradead.org x86/uaccess: Implement macros for CMPXCHG on user addresses
Sebastian Andrzej Siewior bigeasy@linutronix.de hsr: Avoid double remove of a node.
Roger Pau Monne roger.pau@citrix.com hvc/xen: prevent concurrent accesses to the shared ring
Dan Carpenter error27@gmail.com media: av7110: prevent underflow in write_ts_to_decoder()
Shengjiu Wang shengjiu.wang@nxp.com ASoC: fsl_micfil: register platform component before registering cpu dai
Xiaolei Wang xiaolei.wang@windriver.com ARM: dts: imx: Set default tuning step for imx6sx usdhc
Jiaxun Yang jiaxun.yang@flygoat.com irqchip/mips-gic: Don't touch vl_map if a local interrupt is not routable
Rafał Miłecki rafal@milecki.pl ARM: dts: BCM53573: Drop nonexistent "default-off" LED trigger
Geert Uytterhoeven geert+renesas@glider.be pmdomain: renesas: r8a77980-sysc: CR7 must be always on
Yi Sun yi.sun@unisoc.com virtio-blk: Ensure no requests in virtqueues before deleting vqs.
Takashi Sakamoto o-takashi@sakamocchi.jp firewire: core: send bus reset promptly on gap count error
Hannes Reinecke hare@suse.de scsi: lpfc: Use unsigned type for num_sge
Zhang Rui rui.zhang@intel.com hwmon: (coretemp) Enlarge per package core count limit
Andrew Bresticker abrestic@rivosinc.com efi: Don't add memblocks for soft-reserved memory
Andrew Bresticker abrestic@rivosinc.com efi: runtime: Fix potential overflow of soft-reserved region size
Szilard Fabian szfabian@bluemarch.art Input: i8042 - add Fujitsu Lifebook U728 to i8042 quirk table
Zhang Yi yi.zhang@huawei.com ext4: correct the hole length returned by ext4_map_blocks()
Daniel Wagner dwagner@suse.de nvmet-fc: abort command when there is no binding
Daniel Wagner dwagner@suse.de nvmet-fc: release reference on target port
Daniel Wagner dwagner@suse.de nvmet-fcloop: swap the list_add_tail arguments
Daniel Wagner dwagner@suse.de nvme-fc: do not wait in vain when unloading module
Xin Long lucien.xin@gmail.com netfilter: conntrack: check SCTP_CID_SHUTDOWN_ACK for vtag setting in sctp_new
Wolfram Sang wsa+renesas@sang-engineering.com spi: sh-msiof: avoid integer overflow in constants
Chen-Yu Tsai wens@csie.org ASoC: sunxi: sun4i-spdif: Add support for Allwinner H616
Guixin Liu kanie@linux.alibaba.com nvmet-tcp: fix nvme tcp ida memory leak
Martin Blumenstingl martin.blumenstingl@googlemail.com regulator: pwm-regulator: Add validity checks in continuous .get_voltage
Kunwu Chan chentao@kylinos.cn dmaengine: ti: edma: Add some null pointer checks to the edma_probe
Baokun Li libaokun1@huawei.com ext4: avoid allocating blocks from corrupted group in ext4_mb_find_by_goal()
Baokun Li libaokun1@huawei.com ext4: avoid allocating blocks from corrupted group in ext4_mb_try_best_found()
Lennert Buytenhek kernel@wantstofly.org ahci: add 43-bit DMA address quirk for ASMedia ASM1061 controllers
Conrad Kostecki conikost@gentoo.org ahci: asm1166: correct count of reported ports
Devyn Liu liudingyuan@huawei.com spi: hisi-sfc-v3xx: Return IRQ_NONE if no interrupts were detected
Fullway Wang fullwaywang@outlook.com fbdev: sis: Error out if pixclock equals zero
Fullway Wang fullwaywang@outlook.com fbdev: savage: Error out if pixclock equals zero
Felix Fietkau nbd@nbd.name wifi: mac80211: fix race condition on enabling fast-xmit
Michal Kazior michal@plume.com wifi: cfg80211: fix missing interfaces when dumping
Vinod Koul vkoul@kernel.org dmaengine: fsl-qdma: increase size of 'irq_name'
Vinod Koul vkoul@kernel.org dmaengine: shdma: increase size of 'dev_id'
Dmitry Bogdanov d.bogdanov@yadro.com scsi: target: core: Add TMF to tmr_list handling
Cyril Hrubis chrubis@suse.cz sched/rt: Disallow writing invalid values to sched_rt_period_us
Cyril Hrubis chrubis@suse.cz sched/rt: Fix sysctl_sched_rr_timeslice intial value
Damien Le Moal dlemoal@kernel.org zonefs: Improve error handling
Lokesh Gidra lokeshgidra@google.com userfaultfd: fix mmap_changing checking in mfill_atomic_hugetlb
Cyril Hrubis chrubis@suse.cz sched/rt: sysctl_sched_rr_timeslice show default timeslice after reset
Paulo Alcantara pc@manguebit.com smb: client: fix parsing of SMB3.1.1 POSIX create context
Paulo Alcantara pc@manguebit.com smb: client: fix potential OOBs in smb2_parse_contexts()
Paulo Alcantara pc@manguebit.com smb: client: fix OOB in receive_encrypted_standard()
Jamal Hadi Salim jhs@mojatatu.com net/sched: Retire dsmark qdisc
Jamal Hadi Salim jhs@mojatatu.com net/sched: Retire ATM qdisc
Jamal Hadi Salim jhs@mojatatu.com net/sched: Retire CBQ qdisc
-------------
Diffstat:
Makefile | 4 +- arch/arm/boot/dts/bcm47189-luxul-xap-1440.dts | 1 - arch/arm/boot/dts/bcm47189-luxul-xap-810.dts | 2 - arch/arm/boot/dts/imx6sx.dtsi | 6 + arch/arm/mach-ep93xx/core.c | 1 + arch/arm64/boot/dts/rockchip/px30.dtsi | 2 + arch/arm64/kvm/vgic/vgic-its.c | 5 + arch/powerpc/kernel/hw_breakpoint.c | 76 +- arch/s390/pci/pci.c | 2 +- arch/x86/include/asm/cpu_entry_area.h | 2 +- arch/x86/include/asm/nospec-branch.h | 2 + arch/x86/include/asm/text-patching.h | 46 +- arch/x86/include/asm/uaccess.h | 142 ++ arch/x86/kernel/alternative.c | 13 +- arch/x86/kernel/ftrace.c | 4 +- arch/x86/kernel/paravirt.c | 22 +- arch/x86/kernel/static_call.c | 2 +- arch/x86/net/bpf_jit_comp.c | 2 +- drivers/ata/ahci.c | 34 +- drivers/ata/ahci.h | 1 + drivers/block/ataflop.c | 56 +- drivers/block/virtio_blk.c | 7 +- drivers/dma/fsl-qdma.c | 2 +- drivers/dma/sh/shdma.h | 2 +- drivers/dma/ti/edma.c | 10 + drivers/firewire/core-card.c | 18 +- drivers/firmware/efi/arm-runtime.c | 2 +- drivers/firmware/efi/efi-init.c | 19 +- drivers/firmware/efi/riscv-runtime.c | 2 +- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 + drivers/gpu/drm/drm_syncobj.c | 16 +- drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadow.c | 8 +- drivers/hwmon/coretemp.c | 2 +- drivers/infiniband/hw/bnxt_re/ib_verbs.c | 5 +- drivers/infiniband/hw/hfi1/pio.c | 6 +- drivers/infiniband/hw/hfi1/sdma.c | 2 +- drivers/infiniband/hw/qedr/verbs.c | 11 +- drivers/infiniband/ulp/srpt/ib_srpt.c | 17 +- drivers/input/serio/i8042-acpipnpio.h | 8 + drivers/irqchip/irq-mips-gic.c | 2 + drivers/md/dm-crypt.c | 6 + drivers/media/pci/ttpci/av7110_av.c | 4 +- drivers/mtd/nand/spi/macronix.c | 20 + drivers/net/ethernet/microchip/lan743x_ethtool.c | 9 +- drivers/net/gtp.c | 10 +- drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c | 14 +- drivers/net/wireless/intel/iwlwifi/mvm/mvm.h | 2 +- drivers/net/wireless/intel/iwlwifi/mvm/ops.c | 2 +- drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c | 10 +- drivers/nvme/host/fc.c | 47 +- drivers/nvme/target/fc.c | 11 +- drivers/nvme/target/fcloop.c | 6 +- drivers/nvme/target/tcp.c | 1 + drivers/pci/msi.c | 2 +- drivers/platform/x86/intel-vbtn.c | 6 + drivers/regulator/pwm-regulator.c | 3 + drivers/s390/cio/device_ops.c | 6 +- drivers/scsi/Kconfig | 2 +- drivers/scsi/lpfc/lpfc_scsi.c | 12 +- drivers/soc/renesas/r8a77980-sysc.c | 3 +- drivers/spi/spi-hisi-sfc-v3xx.c | 5 + drivers/spi/spi-sh-msiof.c | 16 +- drivers/target/target_core_device.c | 5 - drivers/target/target_core_transport.c | 4 + drivers/tty/hvc/hvc_xen.c | 19 +- drivers/usb/cdns3/gadget.c | 8 +- drivers/usb/gadget/function/f_ncm.c | 10 +- drivers/usb/roles/class.c | 29 +- drivers/video/fbdev/savage/savagefb_driver.c | 3 + drivers/video/fbdev/sis/sis_main.c | 2 + fs/afs/volume.c | 4 +- fs/aio.c | 9 +- fs/btrfs/ctree.h | 2 +- fs/btrfs/dir-item.c | 122 +- fs/btrfs/inode.c | 48 +- fs/btrfs/tree-checker.c | 25 +- fs/btrfs/tree-log.c | 14 +- fs/cifs/smb2ops.c | 19 +- fs/cifs/smb2pdu.c | 95 +- fs/cifs/smb2proto.h | 12 +- fs/erofs/decompressor.c | 24 +- fs/ext4/extents.c | 111 +- fs/ext4/mballoc.c | 33 +- fs/jbd2/checkpoint.c | 137 +- fs/zonefs/super.c | 68 +- include/linux/fs.h | 2 + include/linux/lockdep.h | 5 + include/linux/sched/task_stack.h | 2 +- include/linux/socket.h | 5 +- include/net/tcp.h | 2 +- kernel/sched/rt.c | 10 +- kernel/seccomp.c | 10 + kernel/sysctl.c | 4 + mm/userfaultfd.c | 14 +- net/core/dev.c | 2 +- net/core/dev_ioctl.c | 2 +- net/hsr/hsr_framereg.c | 16 +- net/hsr/hsr_framereg.h | 1 + net/ipv4/arp.c | 3 +- net/ipv4/devinet.c | 21 +- net/ipv6/addrconf.c | 21 +- net/ipv6/seg6.c | 20 +- net/l2tp/l2tp_ip6.c | 2 +- net/mac80211/sta_info.c | 2 + net/mac80211/tx.c | 2 +- net/mptcp/diag.c | 6 +- net/netfilter/nf_conntrack_proto_sctp.c | 2 +- net/netfilter/nf_tables_api.c | 1 + net/packet/af_packet.c | 12 +- net/sched/Kconfig | 42 - net/sched/Makefile | 3 - net/sched/sch_atm.c | 709 -------- net/sched/sch_cbq.c | 1816 --------------------- net/sched/sch_dsmark.c | 521 ------ net/tls/tls_main.c | 2 +- net/tls/tls_sw.c | 12 +- net/wireless/nl80211.c | 1 + scripts/bpf_helpers_doc.py | 2 +- sound/soc/fsl/fsl_micfil.c | 15 +- sound/soc/intel/boards/bytcht_es8316.c | 14 +- sound/soc/intel/boards/bytcr_rt5640.c | 46 +- sound/soc/intel/boards/bytcr_rt5651.c | 41 +- sound/soc/sunxi/sun4i-spdif.c | 5 + 123 files changed, 1267 insertions(+), 3694 deletions(-)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jamal Hadi Salim jhs@mojatatu.com
commit 051d442098421c28c7951625652f61b1e15c4bd5 upstream.
While this amazing qdisc has served us well over the years it has not been getting any tender love and care and has bitrotted over time. It has become mostly a shooting target for syzkaller lately. For this reason, we are retiring it. Goodbye CBQ - we loved you.
Signed-off-by: Jamal Hadi Salim jhs@mojatatu.com Acked-by: Jiri Pirko jiri@nvidia.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/Kconfig | 17 net/sched/Makefile | 1 net/sched/sch_cbq.c | 1816 ---------------------------------------------------- 3 files changed, 1834 deletions(-) delete mode 100644 net/sched/sch_cbq.c delete mode 100644 tools/testing/selftests/tc-testing/tc-tests/qdiscs/cbq.json
--- a/net/sched/Kconfig +++ b/net/sched/Kconfig @@ -45,23 +45,6 @@ if NET_SCHED
comment "Queueing/Scheduling"
-config NET_SCH_CBQ - tristate "Class Based Queueing (CBQ)" - help - Say Y here if you want to use the Class-Based Queueing (CBQ) packet - scheduling algorithm. This algorithm classifies the waiting packets - into a tree-like hierarchy of classes; the leaves of this tree are - in turn scheduled by separate algorithms. - - See the top of file:net/sched/sch_cbq.c for more details. - - CBQ is a commonly used scheduler, so if you're unsure, you should - say Y here. Then say Y to all the queueing algorithms below that you - want to use as leaf disciplines. - - To compile this code as a module, choose M here: the - module will be called sch_cbq. - config NET_SCH_HTB tristate "Hierarchical Token Bucket (HTB)" help --- a/net/sched/Makefile +++ b/net/sched/Makefile @@ -32,7 +32,6 @@ obj-$(CONFIG_NET_ACT_TUNNEL_KEY)+= act_t obj-$(CONFIG_NET_ACT_CT) += act_ct.o obj-$(CONFIG_NET_ACT_GATE) += act_gate.o obj-$(CONFIG_NET_SCH_FIFO) += sch_fifo.o -obj-$(CONFIG_NET_SCH_CBQ) += sch_cbq.o obj-$(CONFIG_NET_SCH_HTB) += sch_htb.o obj-$(CONFIG_NET_SCH_HFSC) += sch_hfsc.o obj-$(CONFIG_NET_SCH_RED) += sch_red.o --- a/net/sched/sch_cbq.c +++ /dev/null @@ -1,1816 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-or-later -/* - * net/sched/sch_cbq.c Class-Based Queueing discipline. - * - * Authors: Alexey Kuznetsov, kuznet@ms2.inr.ac.ru - */ - -#include <linux/module.h> -#include <linux/slab.h> -#include <linux/types.h> -#include <linux/kernel.h> -#include <linux/string.h> -#include <linux/errno.h> -#include <linux/skbuff.h> -#include <net/netlink.h> -#include <net/pkt_sched.h> -#include <net/pkt_cls.h> - - -/* Class-Based Queueing (CBQ) algorithm. - ======================================= - - Sources: [1] Sally Floyd and Van Jacobson, "Link-sharing and Resource - Management Models for Packet Networks", - IEEE/ACM Transactions on Networking, Vol.3, No.4, 1995 - - [2] Sally Floyd, "Notes on CBQ and Guaranteed Service", 1995 - - [3] Sally Floyd, "Notes on Class-Based Queueing: Setting - Parameters", 1996 - - [4] Sally Floyd and Michael Speer, "Experimental Results - for Class-Based Queueing", 1998, not published. - - ----------------------------------------------------------------------- - - Algorithm skeleton was taken from NS simulator cbq.cc. - If someone wants to check this code against the LBL version, - he should take into account that ONLY the skeleton was borrowed, - the implementation is different. Particularly: - - --- The WRR algorithm is different. Our version looks more - reasonable (I hope) and works when quanta are allowed to be - less than MTU, which is always the case when real time classes - have small rates. Note, that the statement of [3] is - incomplete, delay may actually be estimated even if class - per-round allotment is less than MTU. Namely, if per-round - allotment is W*r_i, and r_1+...+r_k = r < 1 - - delay_i <= ([MTU/(W*r_i)]*W*r + W*r + k*MTU)/B - - In the worst case we have IntServ estimate with D = W*r+k*MTU - and C = MTU*r. The proof (if correct at all) is trivial. - - - --- It seems that cbq-2.0 is not very accurate. At least, I cannot - interpret some places, which look like wrong translations - from NS. Anyone is advised to find these differences - and explain to me, why I am wrong 8). - - --- Linux has no EOI event, so that we cannot estimate true class - idle time. Workaround is to consider the next dequeue event - as sign that previous packet is finished. This is wrong because of - internal device queueing, but on a permanently loaded link it is true. - Moreover, combined with clock integrator, this scheme looks - very close to an ideal solution. */ - -struct cbq_sched_data; - - -struct cbq_class { - struct Qdisc_class_common common; - struct cbq_class *next_alive; /* next class with backlog in this priority band */ - -/* Parameters */ - unsigned char priority; /* class priority */ - unsigned char priority2; /* priority to be used after overlimit */ - unsigned char ewma_log; /* time constant for idle time calculation */ - - u32 defmap; - - /* Link-sharing scheduler parameters */ - long maxidle; /* Class parameters: see below. */ - long offtime; - long minidle; - u32 avpkt; - struct qdisc_rate_table *R_tab; - - /* General scheduler (WRR) parameters */ - long allot; - long quantum; /* Allotment per WRR round */ - long weight; /* Relative allotment: see below */ - - struct Qdisc *qdisc; /* Ptr to CBQ discipline */ - struct cbq_class *split; /* Ptr to split node */ - struct cbq_class *share; /* Ptr to LS parent in the class tree */ - struct cbq_class *tparent; /* Ptr to tree parent in the class tree */ - struct cbq_class *borrow; /* NULL if class is bandwidth limited; - parent otherwise */ - struct cbq_class *sibling; /* Sibling chain */ - struct cbq_class *children; /* Pointer to children chain */ - - struct Qdisc *q; /* Elementary queueing discipline */ - - -/* Variables */ - unsigned char cpriority; /* Effective priority */ - unsigned char delayed; - unsigned char level; /* level of the class in hierarchy: - 0 for leaf classes, and maximal - level of children + 1 for nodes. - */ - - psched_time_t last; /* Last end of service */ - psched_time_t undertime; - long avgidle; - long deficit; /* Saved deficit for WRR */ - psched_time_t penalized; - struct gnet_stats_basic_packed bstats; - struct gnet_stats_queue qstats; - struct net_rate_estimator __rcu *rate_est; - struct tc_cbq_xstats xstats; - - struct tcf_proto __rcu *filter_list; - struct tcf_block *block; - - int filters; - - struct cbq_class *defaults[TC_PRIO_MAX + 1]; -}; - -struct cbq_sched_data { - struct Qdisc_class_hash clhash; /* Hash table of all classes */ - int nclasses[TC_CBQ_MAXPRIO + 1]; - unsigned int quanta[TC_CBQ_MAXPRIO + 1]; - - struct cbq_class link; - - unsigned int activemask; - struct cbq_class *active[TC_CBQ_MAXPRIO + 1]; /* List of all classes - with backlog */ - -#ifdef CONFIG_NET_CLS_ACT - struct cbq_class *rx_class; -#endif - struct cbq_class *tx_class; - struct cbq_class *tx_borrowed; - int tx_len; - psched_time_t now; /* Cached timestamp */ - unsigned int pmask; - - struct hrtimer delay_timer; - struct qdisc_watchdog watchdog; /* Watchdog timer, - started when CBQ has - backlog, but cannot - transmit just now */ - psched_tdiff_t wd_expires; - int toplevel; - u32 hgenerator; -}; - - -#define L2T(cl, len) qdisc_l2t((cl)->R_tab, len) - -static inline struct cbq_class * -cbq_class_lookup(struct cbq_sched_data *q, u32 classid) -{ - struct Qdisc_class_common *clc; - - clc = qdisc_class_find(&q->clhash, classid); - if (clc == NULL) - return NULL; - return container_of(clc, struct cbq_class, common); -} - -#ifdef CONFIG_NET_CLS_ACT - -static struct cbq_class * -cbq_reclassify(struct sk_buff *skb, struct cbq_class *this) -{ - struct cbq_class *cl; - - for (cl = this->tparent; cl; cl = cl->tparent) { - struct cbq_class *new = cl->defaults[TC_PRIO_BESTEFFORT]; - - if (new != NULL && new != this) - return new; - } - return NULL; -} - -#endif - -/* Classify packet. The procedure is pretty complicated, but - * it allows us to combine link sharing and priority scheduling - * transparently. - * - * Namely, you can put link sharing rules (f.e. route based) at root of CBQ, - * so that it resolves to split nodes. Then packets are classified - * by logical priority, or a more specific classifier may be attached - * to the split node. - */ - -static struct cbq_class * -cbq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct cbq_class *head = &q->link; - struct cbq_class **defmap; - struct cbq_class *cl = NULL; - u32 prio = skb->priority; - struct tcf_proto *fl; - struct tcf_result res; - - /* - * Step 1. If skb->priority points to one of our classes, use it. - */ - if (TC_H_MAJ(prio ^ sch->handle) == 0 && - (cl = cbq_class_lookup(q, prio)) != NULL) - return cl; - - *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; - for (;;) { - int result = 0; - defmap = head->defaults; - - fl = rcu_dereference_bh(head->filter_list); - /* - * Step 2+n. Apply classifier. - */ - result = tcf_classify(skb, fl, &res, true); - if (!fl || result < 0) - goto fallback; - if (result == TC_ACT_SHOT) - return NULL; - - cl = (void *)res.class; - if (!cl) { - if (TC_H_MAJ(res.classid)) - cl = cbq_class_lookup(q, res.classid); - else if ((cl = defmap[res.classid & TC_PRIO_MAX]) == NULL) - cl = defmap[TC_PRIO_BESTEFFORT]; - - if (cl == NULL) - goto fallback; - } - if (cl->level >= head->level) - goto fallback; -#ifdef CONFIG_NET_CLS_ACT - switch (result) { - case TC_ACT_QUEUED: - case TC_ACT_STOLEN: - case TC_ACT_TRAP: - *qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN; - fallthrough; - case TC_ACT_RECLASSIFY: - return cbq_reclassify(skb, cl); - } -#endif - if (cl->level == 0) - return cl; - - /* - * Step 3+n. If classifier selected a link sharing class, - * apply agency specific classifier. - * Repeat this procdure until we hit a leaf node. - */ - head = cl; - } - -fallback: - cl = head; - - /* - * Step 4. No success... - */ - if (TC_H_MAJ(prio) == 0 && - !(cl = head->defaults[prio & TC_PRIO_MAX]) && - !(cl = head->defaults[TC_PRIO_BESTEFFORT])) - return head; - - return cl; -} - -/* - * A packet has just been enqueued on the empty class. - * cbq_activate_class adds it to the tail of active class list - * of its priority band. - */ - -static inline void cbq_activate_class(struct cbq_class *cl) -{ - struct cbq_sched_data *q = qdisc_priv(cl->qdisc); - int prio = cl->cpriority; - struct cbq_class *cl_tail; - - cl_tail = q->active[prio]; - q->active[prio] = cl; - - if (cl_tail != NULL) { - cl->next_alive = cl_tail->next_alive; - cl_tail->next_alive = cl; - } else { - cl->next_alive = cl; - q->activemask |= (1<<prio); - } -} - -/* - * Unlink class from active chain. - * Note that this same procedure is done directly in cbq_dequeue* - * during round-robin procedure. - */ - -static void cbq_deactivate_class(struct cbq_class *this) -{ - struct cbq_sched_data *q = qdisc_priv(this->qdisc); - int prio = this->cpriority; - struct cbq_class *cl; - struct cbq_class *cl_prev = q->active[prio]; - - do { - cl = cl_prev->next_alive; - if (cl == this) { - cl_prev->next_alive = cl->next_alive; - cl->next_alive = NULL; - - if (cl == q->active[prio]) { - q->active[prio] = cl_prev; - if (cl == q->active[prio]) { - q->active[prio] = NULL; - q->activemask &= ~(1<<prio); - return; - } - } - return; - } - } while ((cl_prev = cl) != q->active[prio]); -} - -static void -cbq_mark_toplevel(struct cbq_sched_data *q, struct cbq_class *cl) -{ - int toplevel = q->toplevel; - - if (toplevel > cl->level) { - psched_time_t now = psched_get_time(); - - do { - if (cl->undertime < now) { - q->toplevel = cl->level; - return; - } - } while ((cl = cl->borrow) != NULL && toplevel > cl->level); - } -} - -static int -cbq_enqueue(struct sk_buff *skb, struct Qdisc *sch, - struct sk_buff **to_free) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - int ret; - struct cbq_class *cl = cbq_classify(skb, sch, &ret); - -#ifdef CONFIG_NET_CLS_ACT - q->rx_class = cl; -#endif - if (cl == NULL) { - if (ret & __NET_XMIT_BYPASS) - qdisc_qstats_drop(sch); - __qdisc_drop(skb, to_free); - return ret; - } - - ret = qdisc_enqueue(skb, cl->q, to_free); - if (ret == NET_XMIT_SUCCESS) { - sch->q.qlen++; - cbq_mark_toplevel(q, cl); - if (!cl->next_alive) - cbq_activate_class(cl); - return ret; - } - - if (net_xmit_drop_count(ret)) { - qdisc_qstats_drop(sch); - cbq_mark_toplevel(q, cl); - cl->qstats.drops++; - } - return ret; -} - -/* Overlimit action: penalize leaf class by adding offtime */ -static void cbq_overlimit(struct cbq_class *cl) -{ - struct cbq_sched_data *q = qdisc_priv(cl->qdisc); - psched_tdiff_t delay = cl->undertime - q->now; - - if (!cl->delayed) { - delay += cl->offtime; - - /* - * Class goes to sleep, so that it will have no - * chance to work avgidle. Let's forgive it 8) - * - * BTW cbq-2.0 has a crap in this - * place, apparently they forgot to shift it by cl->ewma_log. - */ - if (cl->avgidle < 0) - delay -= (-cl->avgidle) - ((-cl->avgidle) >> cl->ewma_log); - if (cl->avgidle < cl->minidle) - cl->avgidle = cl->minidle; - if (delay <= 0) - delay = 1; - cl->undertime = q->now + delay; - - cl->xstats.overactions++; - cl->delayed = 1; - } - if (q->wd_expires == 0 || q->wd_expires > delay) - q->wd_expires = delay; - - /* Dirty work! We must schedule wakeups based on - * real available rate, rather than leaf rate, - * which may be tiny (even zero). - */ - if (q->toplevel == TC_CBQ_MAXLEVEL) { - struct cbq_class *b; - psched_tdiff_t base_delay = q->wd_expires; - - for (b = cl->borrow; b; b = b->borrow) { - delay = b->undertime - q->now; - if (delay < base_delay) { - if (delay <= 0) - delay = 1; - base_delay = delay; - } - } - - q->wd_expires = base_delay; - } -} - -static psched_tdiff_t cbq_undelay_prio(struct cbq_sched_data *q, int prio, - psched_time_t now) -{ - struct cbq_class *cl; - struct cbq_class *cl_prev = q->active[prio]; - psched_time_t sched = now; - - if (cl_prev == NULL) - return 0; - - do { - cl = cl_prev->next_alive; - if (now - cl->penalized > 0) { - cl_prev->next_alive = cl->next_alive; - cl->next_alive = NULL; - cl->cpriority = cl->priority; - cl->delayed = 0; - cbq_activate_class(cl); - - if (cl == q->active[prio]) { - q->active[prio] = cl_prev; - if (cl == q->active[prio]) { - q->active[prio] = NULL; - return 0; - } - } - - cl = cl_prev->next_alive; - } else if (sched - cl->penalized > 0) - sched = cl->penalized; - } while ((cl_prev = cl) != q->active[prio]); - - return sched - now; -} - -static enum hrtimer_restart cbq_undelay(struct hrtimer *timer) -{ - struct cbq_sched_data *q = container_of(timer, struct cbq_sched_data, - delay_timer); - struct Qdisc *sch = q->watchdog.qdisc; - psched_time_t now; - psched_tdiff_t delay = 0; - unsigned int pmask; - - now = psched_get_time(); - - pmask = q->pmask; - q->pmask = 0; - - while (pmask) { - int prio = ffz(~pmask); - psched_tdiff_t tmp; - - pmask &= ~(1<<prio); - - tmp = cbq_undelay_prio(q, prio, now); - if (tmp > 0) { - q->pmask |= 1<<prio; - if (tmp < delay || delay == 0) - delay = tmp; - } - } - - if (delay) { - ktime_t time; - - time = 0; - time = ktime_add_ns(time, PSCHED_TICKS2NS(now + delay)); - hrtimer_start(&q->delay_timer, time, HRTIMER_MODE_ABS_PINNED); - } - - __netif_schedule(qdisc_root(sch)); - return HRTIMER_NORESTART; -} - -/* - * It is mission critical procedure. - * - * We "regenerate" toplevel cutoff, if transmitting class - * has backlog and it is not regulated. It is not part of - * original CBQ description, but looks more reasonable. - * Probably, it is wrong. This question needs further investigation. - */ - -static inline void -cbq_update_toplevel(struct cbq_sched_data *q, struct cbq_class *cl, - struct cbq_class *borrowed) -{ - if (cl && q->toplevel >= borrowed->level) { - if (cl->q->q.qlen > 1) { - do { - if (borrowed->undertime == PSCHED_PASTPERFECT) { - q->toplevel = borrowed->level; - return; - } - } while ((borrowed = borrowed->borrow) != NULL); - } -#if 0 - /* It is not necessary now. Uncommenting it - will save CPU cycles, but decrease fairness. - */ - q->toplevel = TC_CBQ_MAXLEVEL; -#endif - } -} - -static void -cbq_update(struct cbq_sched_data *q) -{ - struct cbq_class *this = q->tx_class; - struct cbq_class *cl = this; - int len = q->tx_len; - psched_time_t now; - - q->tx_class = NULL; - /* Time integrator. We calculate EOS time - * by adding expected packet transmission time. - */ - now = q->now + L2T(&q->link, len); - - for ( ; cl; cl = cl->share) { - long avgidle = cl->avgidle; - long idle; - - cl->bstats.packets++; - cl->bstats.bytes += len; - - /* - * (now - last) is total time between packet right edges. - * (last_pktlen/rate) is "virtual" busy time, so that - * - * idle = (now - last) - last_pktlen/rate - */ - - idle = now - cl->last; - if ((unsigned long)idle > 128*1024*1024) { - avgidle = cl->maxidle; - } else { - idle -= L2T(cl, len); - - /* true_avgidle := (1-W)*true_avgidle + W*idle, - * where W=2^{-ewma_log}. But cl->avgidle is scaled: - * cl->avgidle == true_avgidle/W, - * hence: - */ - avgidle += idle - (avgidle>>cl->ewma_log); - } - - if (avgidle <= 0) { - /* Overlimit or at-limit */ - - if (avgidle < cl->minidle) - avgidle = cl->minidle; - - cl->avgidle = avgidle; - - /* Calculate expected time, when this class - * will be allowed to send. - * It will occur, when: - * (1-W)*true_avgidle + W*delay = 0, i.e. - * idle = (1/W - 1)*(-true_avgidle) - * or - * idle = (1 - W)*(-cl->avgidle); - */ - idle = (-avgidle) - ((-avgidle) >> cl->ewma_log); - - /* - * That is not all. - * To maintain the rate allocated to the class, - * we add to undertime virtual clock, - * necessary to complete transmitted packet. - * (len/phys_bandwidth has been already passed - * to the moment of cbq_update) - */ - - idle -= L2T(&q->link, len); - idle += L2T(cl, len); - - cl->undertime = now + idle; - } else { - /* Underlimit */ - - cl->undertime = PSCHED_PASTPERFECT; - if (avgidle > cl->maxidle) - cl->avgidle = cl->maxidle; - else - cl->avgidle = avgidle; - } - if ((s64)(now - cl->last) > 0) - cl->last = now; - } - - cbq_update_toplevel(q, this, q->tx_borrowed); -} - -static inline struct cbq_class * -cbq_under_limit(struct cbq_class *cl) -{ - struct cbq_sched_data *q = qdisc_priv(cl->qdisc); - struct cbq_class *this_cl = cl; - - if (cl->tparent == NULL) - return cl; - - if (cl->undertime == PSCHED_PASTPERFECT || q->now >= cl->undertime) { - cl->delayed = 0; - return cl; - } - - do { - /* It is very suspicious place. Now overlimit - * action is generated for not bounded classes - * only if link is completely congested. - * Though it is in agree with ancestor-only paradigm, - * it looks very stupid. Particularly, - * it means that this chunk of code will either - * never be called or result in strong amplification - * of burstiness. Dangerous, silly, and, however, - * no another solution exists. - */ - cl = cl->borrow; - if (!cl) { - this_cl->qstats.overlimits++; - cbq_overlimit(this_cl); - return NULL; - } - if (cl->level > q->toplevel) - return NULL; - } while (cl->undertime != PSCHED_PASTPERFECT && q->now < cl->undertime); - - cl->delayed = 0; - return cl; -} - -static inline struct sk_buff * -cbq_dequeue_prio(struct Qdisc *sch, int prio) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct cbq_class *cl_tail, *cl_prev, *cl; - struct sk_buff *skb; - int deficit; - - cl_tail = cl_prev = q->active[prio]; - cl = cl_prev->next_alive; - - do { - deficit = 0; - - /* Start round */ - do { - struct cbq_class *borrow = cl; - - if (cl->q->q.qlen && - (borrow = cbq_under_limit(cl)) == NULL) - goto skip_class; - - if (cl->deficit <= 0) { - /* Class exhausted its allotment per - * this round. Switch to the next one. - */ - deficit = 1; - cl->deficit += cl->quantum; - goto next_class; - } - - skb = cl->q->dequeue(cl->q); - - /* Class did not give us any skb :-( - * It could occur even if cl->q->q.qlen != 0 - * f.e. if cl->q == "tbf" - */ - if (skb == NULL) - goto skip_class; - - cl->deficit -= qdisc_pkt_len(skb); - q->tx_class = cl; - q->tx_borrowed = borrow; - if (borrow != cl) { -#ifndef CBQ_XSTATS_BORROWS_BYTES - borrow->xstats.borrows++; - cl->xstats.borrows++; -#else - borrow->xstats.borrows += qdisc_pkt_len(skb); - cl->xstats.borrows += qdisc_pkt_len(skb); -#endif - } - q->tx_len = qdisc_pkt_len(skb); - - if (cl->deficit <= 0) { - q->active[prio] = cl; - cl = cl->next_alive; - cl->deficit += cl->quantum; - } - return skb; - -skip_class: - if (cl->q->q.qlen == 0 || prio != cl->cpriority) { - /* Class is empty or penalized. - * Unlink it from active chain. - */ - cl_prev->next_alive = cl->next_alive; - cl->next_alive = NULL; - - /* Did cl_tail point to it? */ - if (cl == cl_tail) { - /* Repair it! */ - cl_tail = cl_prev; - - /* Was it the last class in this band? */ - if (cl == cl_tail) { - /* Kill the band! */ - q->active[prio] = NULL; - q->activemask &= ~(1<<prio); - if (cl->q->q.qlen) - cbq_activate_class(cl); - return NULL; - } - - q->active[prio] = cl_tail; - } - if (cl->q->q.qlen) - cbq_activate_class(cl); - - cl = cl_prev; - } - -next_class: - cl_prev = cl; - cl = cl->next_alive; - } while (cl_prev != cl_tail); - } while (deficit); - - q->active[prio] = cl_prev; - - return NULL; -} - -static inline struct sk_buff * -cbq_dequeue_1(struct Qdisc *sch) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct sk_buff *skb; - unsigned int activemask; - - activemask = q->activemask & 0xFF; - while (activemask) { - int prio = ffz(~activemask); - activemask &= ~(1<<prio); - skb = cbq_dequeue_prio(sch, prio); - if (skb) - return skb; - } - return NULL; -} - -static struct sk_buff * -cbq_dequeue(struct Qdisc *sch) -{ - struct sk_buff *skb; - struct cbq_sched_data *q = qdisc_priv(sch); - psched_time_t now; - - now = psched_get_time(); - - if (q->tx_class) - cbq_update(q); - - q->now = now; - - for (;;) { - q->wd_expires = 0; - - skb = cbq_dequeue_1(sch); - if (skb) { - qdisc_bstats_update(sch, skb); - sch->q.qlen--; - return skb; - } - - /* All the classes are overlimit. - * - * It is possible, if: - * - * 1. Scheduler is empty. - * 2. Toplevel cutoff inhibited borrowing. - * 3. Root class is overlimit. - * - * Reset 2d and 3d conditions and retry. - * - * Note, that NS and cbq-2.0 are buggy, peeking - * an arbitrary class is appropriate for ancestor-only - * sharing, but not for toplevel algorithm. - * - * Our version is better, but slower, because it requires - * two passes, but it is unavoidable with top-level sharing. - */ - - if (q->toplevel == TC_CBQ_MAXLEVEL && - q->link.undertime == PSCHED_PASTPERFECT) - break; - - q->toplevel = TC_CBQ_MAXLEVEL; - q->link.undertime = PSCHED_PASTPERFECT; - } - - /* No packets in scheduler or nobody wants to give them to us :-( - * Sigh... start watchdog timer in the last case. - */ - - if (sch->q.qlen) { - qdisc_qstats_overlimit(sch); - if (q->wd_expires) - qdisc_watchdog_schedule(&q->watchdog, - now + q->wd_expires); - } - return NULL; -} - -/* CBQ class maintanance routines */ - -static void cbq_adjust_levels(struct cbq_class *this) -{ - if (this == NULL) - return; - - do { - int level = 0; - struct cbq_class *cl; - - cl = this->children; - if (cl) { - do { - if (cl->level > level) - level = cl->level; - } while ((cl = cl->sibling) != this->children); - } - this->level = level + 1; - } while ((this = this->tparent) != NULL); -} - -static void cbq_normalize_quanta(struct cbq_sched_data *q, int prio) -{ - struct cbq_class *cl; - unsigned int h; - - if (q->quanta[prio] == 0) - return; - - for (h = 0; h < q->clhash.hashsize; h++) { - hlist_for_each_entry(cl, &q->clhash.hash[h], common.hnode) { - /* BUGGGG... Beware! This expression suffer of - * arithmetic overflows! - */ - if (cl->priority == prio) { - cl->quantum = (cl->weight*cl->allot*q->nclasses[prio])/ - q->quanta[prio]; - } - if (cl->quantum <= 0 || - cl->quantum > 32*qdisc_dev(cl->qdisc)->mtu) { - pr_warn("CBQ: class %08x has bad quantum==%ld, repaired.\n", - cl->common.classid, cl->quantum); - cl->quantum = qdisc_dev(cl->qdisc)->mtu/2 + 1; - } - } - } -} - -static void cbq_sync_defmap(struct cbq_class *cl) -{ - struct cbq_sched_data *q = qdisc_priv(cl->qdisc); - struct cbq_class *split = cl->split; - unsigned int h; - int i; - - if (split == NULL) - return; - - for (i = 0; i <= TC_PRIO_MAX; i++) { - if (split->defaults[i] == cl && !(cl->defmap & (1<<i))) - split->defaults[i] = NULL; - } - - for (i = 0; i <= TC_PRIO_MAX; i++) { - int level = split->level; - - if (split->defaults[i]) - continue; - - for (h = 0; h < q->clhash.hashsize; h++) { - struct cbq_class *c; - - hlist_for_each_entry(c, &q->clhash.hash[h], - common.hnode) { - if (c->split == split && c->level < level && - c->defmap & (1<<i)) { - split->defaults[i] = c; - level = c->level; - } - } - } - } -} - -static void cbq_change_defmap(struct cbq_class *cl, u32 splitid, u32 def, u32 mask) -{ - struct cbq_class *split = NULL; - - if (splitid == 0) { - split = cl->split; - if (!split) - return; - splitid = split->common.classid; - } - - if (split == NULL || split->common.classid != splitid) { - for (split = cl->tparent; split; split = split->tparent) - if (split->common.classid == splitid) - break; - } - - if (split == NULL) - return; - - if (cl->split != split) { - cl->defmap = 0; - cbq_sync_defmap(cl); - cl->split = split; - cl->defmap = def & mask; - } else - cl->defmap = (cl->defmap & ~mask) | (def & mask); - - cbq_sync_defmap(cl); -} - -static void cbq_unlink_class(struct cbq_class *this) -{ - struct cbq_class *cl, **clp; - struct cbq_sched_data *q = qdisc_priv(this->qdisc); - - qdisc_class_hash_remove(&q->clhash, &this->common); - - if (this->tparent) { - clp = &this->sibling; - cl = *clp; - do { - if (cl == this) { - *clp = cl->sibling; - break; - } - clp = &cl->sibling; - } while ((cl = *clp) != this->sibling); - - if (this->tparent->children == this) { - this->tparent->children = this->sibling; - if (this->sibling == this) - this->tparent->children = NULL; - } - } else { - WARN_ON(this->sibling != this); - } -} - -static void cbq_link_class(struct cbq_class *this) -{ - struct cbq_sched_data *q = qdisc_priv(this->qdisc); - struct cbq_class *parent = this->tparent; - - this->sibling = this; - qdisc_class_hash_insert(&q->clhash, &this->common); - - if (parent == NULL) - return; - - if (parent->children == NULL) { - parent->children = this; - } else { - this->sibling = parent->children->sibling; - parent->children->sibling = this; - } -} - -static void -cbq_reset(struct Qdisc *sch) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct cbq_class *cl; - int prio; - unsigned int h; - - q->activemask = 0; - q->pmask = 0; - q->tx_class = NULL; - q->tx_borrowed = NULL; - qdisc_watchdog_cancel(&q->watchdog); - hrtimer_cancel(&q->delay_timer); - q->toplevel = TC_CBQ_MAXLEVEL; - q->now = psched_get_time(); - - for (prio = 0; prio <= TC_CBQ_MAXPRIO; prio++) - q->active[prio] = NULL; - - for (h = 0; h < q->clhash.hashsize; h++) { - hlist_for_each_entry(cl, &q->clhash.hash[h], common.hnode) { - qdisc_reset(cl->q); - - cl->next_alive = NULL; - cl->undertime = PSCHED_PASTPERFECT; - cl->avgidle = cl->maxidle; - cl->deficit = cl->quantum; - cl->cpriority = cl->priority; - } - } -} - - -static int cbq_set_lss(struct cbq_class *cl, struct tc_cbq_lssopt *lss) -{ - if (lss->change & TCF_CBQ_LSS_FLAGS) { - cl->share = (lss->flags & TCF_CBQ_LSS_ISOLATED) ? NULL : cl->tparent; - cl->borrow = (lss->flags & TCF_CBQ_LSS_BOUNDED) ? NULL : cl->tparent; - } - if (lss->change & TCF_CBQ_LSS_EWMA) - cl->ewma_log = lss->ewma_log; - if (lss->change & TCF_CBQ_LSS_AVPKT) - cl->avpkt = lss->avpkt; - if (lss->change & TCF_CBQ_LSS_MINIDLE) - cl->minidle = -(long)lss->minidle; - if (lss->change & TCF_CBQ_LSS_MAXIDLE) { - cl->maxidle = lss->maxidle; - cl->avgidle = lss->maxidle; - } - if (lss->change & TCF_CBQ_LSS_OFFTIME) - cl->offtime = lss->offtime; - return 0; -} - -static void cbq_rmprio(struct cbq_sched_data *q, struct cbq_class *cl) -{ - q->nclasses[cl->priority]--; - q->quanta[cl->priority] -= cl->weight; - cbq_normalize_quanta(q, cl->priority); -} - -static void cbq_addprio(struct cbq_sched_data *q, struct cbq_class *cl) -{ - q->nclasses[cl->priority]++; - q->quanta[cl->priority] += cl->weight; - cbq_normalize_quanta(q, cl->priority); -} - -static int cbq_set_wrr(struct cbq_class *cl, struct tc_cbq_wrropt *wrr) -{ - struct cbq_sched_data *q = qdisc_priv(cl->qdisc); - - if (wrr->allot) - cl->allot = wrr->allot; - if (wrr->weight) - cl->weight = wrr->weight; - if (wrr->priority) { - cl->priority = wrr->priority - 1; - cl->cpriority = cl->priority; - if (cl->priority >= cl->priority2) - cl->priority2 = TC_CBQ_MAXPRIO - 1; - } - - cbq_addprio(q, cl); - return 0; -} - -static int cbq_set_fopt(struct cbq_class *cl, struct tc_cbq_fopt *fopt) -{ - cbq_change_defmap(cl, fopt->split, fopt->defmap, fopt->defchange); - return 0; -} - -static const struct nla_policy cbq_policy[TCA_CBQ_MAX + 1] = { - [TCA_CBQ_LSSOPT] = { .len = sizeof(struct tc_cbq_lssopt) }, - [TCA_CBQ_WRROPT] = { .len = sizeof(struct tc_cbq_wrropt) }, - [TCA_CBQ_FOPT] = { .len = sizeof(struct tc_cbq_fopt) }, - [TCA_CBQ_OVL_STRATEGY] = { .len = sizeof(struct tc_cbq_ovl) }, - [TCA_CBQ_RATE] = { .len = sizeof(struct tc_ratespec) }, - [TCA_CBQ_RTAB] = { .type = NLA_BINARY, .len = TC_RTAB_SIZE }, - [TCA_CBQ_POLICE] = { .len = sizeof(struct tc_cbq_police) }, -}; - -static int cbq_opt_parse(struct nlattr *tb[TCA_CBQ_MAX + 1], - struct nlattr *opt, - struct netlink_ext_ack *extack) -{ - int err; - - if (!opt) { - NL_SET_ERR_MSG(extack, "CBQ options are required for this operation"); - return -EINVAL; - } - - err = nla_parse_nested_deprecated(tb, TCA_CBQ_MAX, opt, - cbq_policy, extack); - if (err < 0) - return err; - - if (tb[TCA_CBQ_WRROPT]) { - const struct tc_cbq_wrropt *wrr = nla_data(tb[TCA_CBQ_WRROPT]); - - if (wrr->priority > TC_CBQ_MAXPRIO) { - NL_SET_ERR_MSG(extack, "priority is bigger than TC_CBQ_MAXPRIO"); - err = -EINVAL; - } - } - return err; -} - -static int cbq_init(struct Qdisc *sch, struct nlattr *opt, - struct netlink_ext_ack *extack) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct nlattr *tb[TCA_CBQ_MAX + 1]; - struct tc_ratespec *r; - int err; - - qdisc_watchdog_init(&q->watchdog, sch); - hrtimer_init(&q->delay_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED); - q->delay_timer.function = cbq_undelay; - - err = cbq_opt_parse(tb, opt, extack); - if (err < 0) - return err; - - if (!tb[TCA_CBQ_RTAB] || !tb[TCA_CBQ_RATE]) { - NL_SET_ERR_MSG(extack, "Rate specification missing or incomplete"); - return -EINVAL; - } - - r = nla_data(tb[TCA_CBQ_RATE]); - - q->link.R_tab = qdisc_get_rtab(r, tb[TCA_CBQ_RTAB], extack); - if (!q->link.R_tab) - return -EINVAL; - - err = tcf_block_get(&q->link.block, &q->link.filter_list, sch, extack); - if (err) - goto put_rtab; - - err = qdisc_class_hash_init(&q->clhash); - if (err < 0) - goto put_block; - - q->link.sibling = &q->link; - q->link.common.classid = sch->handle; - q->link.qdisc = sch; - q->link.q = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops, - sch->handle, NULL); - if (!q->link.q) - q->link.q = &noop_qdisc; - else - qdisc_hash_add(q->link.q, true); - - q->link.priority = TC_CBQ_MAXPRIO - 1; - q->link.priority2 = TC_CBQ_MAXPRIO - 1; - q->link.cpriority = TC_CBQ_MAXPRIO - 1; - q->link.allot = psched_mtu(qdisc_dev(sch)); - q->link.quantum = q->link.allot; - q->link.weight = q->link.R_tab->rate.rate; - - q->link.ewma_log = TC_CBQ_DEF_EWMA; - q->link.avpkt = q->link.allot/2; - q->link.minidle = -0x7FFFFFFF; - - q->toplevel = TC_CBQ_MAXLEVEL; - q->now = psched_get_time(); - - cbq_link_class(&q->link); - - if (tb[TCA_CBQ_LSSOPT]) - cbq_set_lss(&q->link, nla_data(tb[TCA_CBQ_LSSOPT])); - - cbq_addprio(q, &q->link); - return 0; - -put_block: - tcf_block_put(q->link.block); - -put_rtab: - qdisc_put_rtab(q->link.R_tab); - return err; -} - -static int cbq_dump_rate(struct sk_buff *skb, struct cbq_class *cl) -{ - unsigned char *b = skb_tail_pointer(skb); - - if (nla_put(skb, TCA_CBQ_RATE, sizeof(cl->R_tab->rate), &cl->R_tab->rate)) - goto nla_put_failure; - return skb->len; - -nla_put_failure: - nlmsg_trim(skb, b); - return -1; -} - -static int cbq_dump_lss(struct sk_buff *skb, struct cbq_class *cl) -{ - unsigned char *b = skb_tail_pointer(skb); - struct tc_cbq_lssopt opt; - - opt.flags = 0; - if (cl->borrow == NULL) - opt.flags |= TCF_CBQ_LSS_BOUNDED; - if (cl->share == NULL) - opt.flags |= TCF_CBQ_LSS_ISOLATED; - opt.ewma_log = cl->ewma_log; - opt.level = cl->level; - opt.avpkt = cl->avpkt; - opt.maxidle = cl->maxidle; - opt.minidle = (u32)(-cl->minidle); - opt.offtime = cl->offtime; - opt.change = ~0; - if (nla_put(skb, TCA_CBQ_LSSOPT, sizeof(opt), &opt)) - goto nla_put_failure; - return skb->len; - -nla_put_failure: - nlmsg_trim(skb, b); - return -1; -} - -static int cbq_dump_wrr(struct sk_buff *skb, struct cbq_class *cl) -{ - unsigned char *b = skb_tail_pointer(skb); - struct tc_cbq_wrropt opt; - - memset(&opt, 0, sizeof(opt)); - opt.flags = 0; - opt.allot = cl->allot; - opt.priority = cl->priority + 1; - opt.cpriority = cl->cpriority + 1; - opt.weight = cl->weight; - if (nla_put(skb, TCA_CBQ_WRROPT, sizeof(opt), &opt)) - goto nla_put_failure; - return skb->len; - -nla_put_failure: - nlmsg_trim(skb, b); - return -1; -} - -static int cbq_dump_fopt(struct sk_buff *skb, struct cbq_class *cl) -{ - unsigned char *b = skb_tail_pointer(skb); - struct tc_cbq_fopt opt; - - if (cl->split || cl->defmap) { - opt.split = cl->split ? cl->split->common.classid : 0; - opt.defmap = cl->defmap; - opt.defchange = ~0; - if (nla_put(skb, TCA_CBQ_FOPT, sizeof(opt), &opt)) - goto nla_put_failure; - } - return skb->len; - -nla_put_failure: - nlmsg_trim(skb, b); - return -1; -} - -static int cbq_dump_attr(struct sk_buff *skb, struct cbq_class *cl) -{ - if (cbq_dump_lss(skb, cl) < 0 || - cbq_dump_rate(skb, cl) < 0 || - cbq_dump_wrr(skb, cl) < 0 || - cbq_dump_fopt(skb, cl) < 0) - return -1; - return 0; -} - -static int cbq_dump(struct Qdisc *sch, struct sk_buff *skb) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct nlattr *nest; - - nest = nla_nest_start_noflag(skb, TCA_OPTIONS); - if (nest == NULL) - goto nla_put_failure; - if (cbq_dump_attr(skb, &q->link) < 0) - goto nla_put_failure; - return nla_nest_end(skb, nest); - -nla_put_failure: - nla_nest_cancel(skb, nest); - return -1; -} - -static int -cbq_dump_stats(struct Qdisc *sch, struct gnet_dump *d) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - - q->link.xstats.avgidle = q->link.avgidle; - return gnet_stats_copy_app(d, &q->link.xstats, sizeof(q->link.xstats)); -} - -static int -cbq_dump_class(struct Qdisc *sch, unsigned long arg, - struct sk_buff *skb, struct tcmsg *tcm) -{ - struct cbq_class *cl = (struct cbq_class *)arg; - struct nlattr *nest; - - if (cl->tparent) - tcm->tcm_parent = cl->tparent->common.classid; - else - tcm->tcm_parent = TC_H_ROOT; - tcm->tcm_handle = cl->common.classid; - tcm->tcm_info = cl->q->handle; - - nest = nla_nest_start_noflag(skb, TCA_OPTIONS); - if (nest == NULL) - goto nla_put_failure; - if (cbq_dump_attr(skb, cl) < 0) - goto nla_put_failure; - return nla_nest_end(skb, nest); - -nla_put_failure: - nla_nest_cancel(skb, nest); - return -1; -} - -static int -cbq_dump_class_stats(struct Qdisc *sch, unsigned long arg, - struct gnet_dump *d) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct cbq_class *cl = (struct cbq_class *)arg; - __u32 qlen; - - cl->xstats.avgidle = cl->avgidle; - cl->xstats.undertime = 0; - qdisc_qstats_qlen_backlog(cl->q, &qlen, &cl->qstats.backlog); - - if (cl->undertime != PSCHED_PASTPERFECT) - cl->xstats.undertime = cl->undertime - q->now; - - if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch), - d, NULL, &cl->bstats) < 0 || - gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 || - gnet_stats_copy_queue(d, NULL, &cl->qstats, qlen) < 0) - return -1; - - return gnet_stats_copy_app(d, &cl->xstats, sizeof(cl->xstats)); -} - -static int cbq_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new, - struct Qdisc **old, struct netlink_ext_ack *extack) -{ - struct cbq_class *cl = (struct cbq_class *)arg; - - if (new == NULL) { - new = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops, - cl->common.classid, extack); - if (new == NULL) - return -ENOBUFS; - } - - *old = qdisc_replace(sch, new, &cl->q); - return 0; -} - -static struct Qdisc *cbq_leaf(struct Qdisc *sch, unsigned long arg) -{ - struct cbq_class *cl = (struct cbq_class *)arg; - - return cl->q; -} - -static void cbq_qlen_notify(struct Qdisc *sch, unsigned long arg) -{ - struct cbq_class *cl = (struct cbq_class *)arg; - - cbq_deactivate_class(cl); -} - -static unsigned long cbq_find(struct Qdisc *sch, u32 classid) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - - return (unsigned long)cbq_class_lookup(q, classid); -} - -static void cbq_destroy_class(struct Qdisc *sch, struct cbq_class *cl) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - - WARN_ON(cl->filters); - - tcf_block_put(cl->block); - qdisc_put(cl->q); - qdisc_put_rtab(cl->R_tab); - gen_kill_estimator(&cl->rate_est); - if (cl != &q->link) - kfree(cl); -} - -static void cbq_destroy(struct Qdisc *sch) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct hlist_node *next; - struct cbq_class *cl; - unsigned int h; - -#ifdef CONFIG_NET_CLS_ACT - q->rx_class = NULL; -#endif - /* - * Filters must be destroyed first because we don't destroy the - * classes from root to leafs which means that filters can still - * be bound to classes which have been destroyed already. --TGR '04 - */ - for (h = 0; h < q->clhash.hashsize; h++) { - hlist_for_each_entry(cl, &q->clhash.hash[h], common.hnode) { - tcf_block_put(cl->block); - cl->block = NULL; - } - } - for (h = 0; h < q->clhash.hashsize; h++) { - hlist_for_each_entry_safe(cl, next, &q->clhash.hash[h], - common.hnode) - cbq_destroy_class(sch, cl); - } - qdisc_class_hash_destroy(&q->clhash); -} - -static int -cbq_change_class(struct Qdisc *sch, u32 classid, u32 parentid, struct nlattr **tca, - unsigned long *arg, struct netlink_ext_ack *extack) -{ - int err; - struct cbq_sched_data *q = qdisc_priv(sch); - struct cbq_class *cl = (struct cbq_class *)*arg; - struct nlattr *opt = tca[TCA_OPTIONS]; - struct nlattr *tb[TCA_CBQ_MAX + 1]; - struct cbq_class *parent; - struct qdisc_rate_table *rtab = NULL; - - err = cbq_opt_parse(tb, opt, extack); - if (err < 0) - return err; - - if (tb[TCA_CBQ_OVL_STRATEGY] || tb[TCA_CBQ_POLICE]) { - NL_SET_ERR_MSG(extack, "Neither overlimit strategy nor policing attributes can be used for changing class params"); - return -EOPNOTSUPP; - } - - if (cl) { - /* Check parent */ - if (parentid) { - if (cl->tparent && - cl->tparent->common.classid != parentid) { - NL_SET_ERR_MSG(extack, "Invalid parent id"); - return -EINVAL; - } - if (!cl->tparent && parentid != TC_H_ROOT) { - NL_SET_ERR_MSG(extack, "Parent must be root"); - return -EINVAL; - } - } - - if (tb[TCA_CBQ_RATE]) { - rtab = qdisc_get_rtab(nla_data(tb[TCA_CBQ_RATE]), - tb[TCA_CBQ_RTAB], extack); - if (rtab == NULL) - return -EINVAL; - } - - if (tca[TCA_RATE]) { - err = gen_replace_estimator(&cl->bstats, NULL, - &cl->rate_est, - NULL, - qdisc_root_sleeping_running(sch), - tca[TCA_RATE]); - if (err) { - NL_SET_ERR_MSG(extack, "Failed to replace specified rate estimator"); - qdisc_put_rtab(rtab); - return err; - } - } - - /* Change class parameters */ - sch_tree_lock(sch); - - if (cl->next_alive != NULL) - cbq_deactivate_class(cl); - - if (rtab) { - qdisc_put_rtab(cl->R_tab); - cl->R_tab = rtab; - } - - if (tb[TCA_CBQ_LSSOPT]) - cbq_set_lss(cl, nla_data(tb[TCA_CBQ_LSSOPT])); - - if (tb[TCA_CBQ_WRROPT]) { - cbq_rmprio(q, cl); - cbq_set_wrr(cl, nla_data(tb[TCA_CBQ_WRROPT])); - } - - if (tb[TCA_CBQ_FOPT]) - cbq_set_fopt(cl, nla_data(tb[TCA_CBQ_FOPT])); - - if (cl->q->q.qlen) - cbq_activate_class(cl); - - sch_tree_unlock(sch); - - return 0; - } - - if (parentid == TC_H_ROOT) - return -EINVAL; - - if (!tb[TCA_CBQ_WRROPT] || !tb[TCA_CBQ_RATE] || !tb[TCA_CBQ_LSSOPT]) { - NL_SET_ERR_MSG(extack, "One of the following attributes MUST be specified: WRR, rate or link sharing"); - return -EINVAL; - } - - rtab = qdisc_get_rtab(nla_data(tb[TCA_CBQ_RATE]), tb[TCA_CBQ_RTAB], - extack); - if (rtab == NULL) - return -EINVAL; - - if (classid) { - err = -EINVAL; - if (TC_H_MAJ(classid ^ sch->handle) || - cbq_class_lookup(q, classid)) { - NL_SET_ERR_MSG(extack, "Specified class not found"); - goto failure; - } - } else { - int i; - classid = TC_H_MAKE(sch->handle, 0x8000); - - for (i = 0; i < 0x8000; i++) { - if (++q->hgenerator >= 0x8000) - q->hgenerator = 1; - if (cbq_class_lookup(q, classid|q->hgenerator) == NULL) - break; - } - err = -ENOSR; - if (i >= 0x8000) { - NL_SET_ERR_MSG(extack, "Unable to generate classid"); - goto failure; - } - classid = classid|q->hgenerator; - } - - parent = &q->link; - if (parentid) { - parent = cbq_class_lookup(q, parentid); - err = -EINVAL; - if (!parent) { - NL_SET_ERR_MSG(extack, "Failed to find parentid"); - goto failure; - } - } - - err = -ENOBUFS; - cl = kzalloc(sizeof(*cl), GFP_KERNEL); - if (cl == NULL) - goto failure; - - err = tcf_block_get(&cl->block, &cl->filter_list, sch, extack); - if (err) { - kfree(cl); - goto failure; - } - - if (tca[TCA_RATE]) { - err = gen_new_estimator(&cl->bstats, NULL, &cl->rate_est, - NULL, - qdisc_root_sleeping_running(sch), - tca[TCA_RATE]); - if (err) { - NL_SET_ERR_MSG(extack, "Couldn't create new estimator"); - tcf_block_put(cl->block); - kfree(cl); - goto failure; - } - } - - cl->R_tab = rtab; - rtab = NULL; - cl->q = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops, classid, - NULL); - if (!cl->q) - cl->q = &noop_qdisc; - else - qdisc_hash_add(cl->q, true); - - cl->common.classid = classid; - cl->tparent = parent; - cl->qdisc = sch; - cl->allot = parent->allot; - cl->quantum = cl->allot; - cl->weight = cl->R_tab->rate.rate; - - sch_tree_lock(sch); - cbq_link_class(cl); - cl->borrow = cl->tparent; - if (cl->tparent != &q->link) - cl->share = cl->tparent; - cbq_adjust_levels(parent); - cl->minidle = -0x7FFFFFFF; - cbq_set_lss(cl, nla_data(tb[TCA_CBQ_LSSOPT])); - cbq_set_wrr(cl, nla_data(tb[TCA_CBQ_WRROPT])); - if (cl->ewma_log == 0) - cl->ewma_log = q->link.ewma_log; - if (cl->maxidle == 0) - cl->maxidle = q->link.maxidle; - if (cl->avpkt == 0) - cl->avpkt = q->link.avpkt; - if (tb[TCA_CBQ_FOPT]) - cbq_set_fopt(cl, nla_data(tb[TCA_CBQ_FOPT])); - sch_tree_unlock(sch); - - qdisc_class_hash_grow(sch, &q->clhash); - - *arg = (unsigned long)cl; - return 0; - -failure: - qdisc_put_rtab(rtab); - return err; -} - -static int cbq_delete(struct Qdisc *sch, unsigned long arg) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct cbq_class *cl = (struct cbq_class *)arg; - - if (cl->filters || cl->children || cl == &q->link) - return -EBUSY; - - sch_tree_lock(sch); - - qdisc_purge_queue(cl->q); - - if (cl->next_alive) - cbq_deactivate_class(cl); - - if (q->tx_borrowed == cl) - q->tx_borrowed = q->tx_class; - if (q->tx_class == cl) { - q->tx_class = NULL; - q->tx_borrowed = NULL; - } -#ifdef CONFIG_NET_CLS_ACT - if (q->rx_class == cl) - q->rx_class = NULL; -#endif - - cbq_unlink_class(cl); - cbq_adjust_levels(cl->tparent); - cl->defmap = 0; - cbq_sync_defmap(cl); - - cbq_rmprio(q, cl); - sch_tree_unlock(sch); - - cbq_destroy_class(sch, cl); - return 0; -} - -static struct tcf_block *cbq_tcf_block(struct Qdisc *sch, unsigned long arg, - struct netlink_ext_ack *extack) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct cbq_class *cl = (struct cbq_class *)arg; - - if (cl == NULL) - cl = &q->link; - - return cl->block; -} - -static unsigned long cbq_bind_filter(struct Qdisc *sch, unsigned long parent, - u32 classid) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct cbq_class *p = (struct cbq_class *)parent; - struct cbq_class *cl = cbq_class_lookup(q, classid); - - if (cl) { - if (p && p->level <= cl->level) - return 0; - cl->filters++; - return (unsigned long)cl; - } - return 0; -} - -static void cbq_unbind_filter(struct Qdisc *sch, unsigned long arg) -{ - struct cbq_class *cl = (struct cbq_class *)arg; - - cl->filters--; -} - -static void cbq_walk(struct Qdisc *sch, struct qdisc_walker *arg) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct cbq_class *cl; - unsigned int h; - - if (arg->stop) - return; - - for (h = 0; h < q->clhash.hashsize; h++) { - hlist_for_each_entry(cl, &q->clhash.hash[h], common.hnode) { - if (arg->count < arg->skip) { - arg->count++; - continue; - } - if (arg->fn(sch, (unsigned long)cl, arg) < 0) { - arg->stop = 1; - return; - } - arg->count++; - } - } -} - -static const struct Qdisc_class_ops cbq_class_ops = { - .graft = cbq_graft, - .leaf = cbq_leaf, - .qlen_notify = cbq_qlen_notify, - .find = cbq_find, - .change = cbq_change_class, - .delete = cbq_delete, - .walk = cbq_walk, - .tcf_block = cbq_tcf_block, - .bind_tcf = cbq_bind_filter, - .unbind_tcf = cbq_unbind_filter, - .dump = cbq_dump_class, - .dump_stats = cbq_dump_class_stats, -}; - -static struct Qdisc_ops cbq_qdisc_ops __read_mostly = { - .next = NULL, - .cl_ops = &cbq_class_ops, - .id = "cbq", - .priv_size = sizeof(struct cbq_sched_data), - .enqueue = cbq_enqueue, - .dequeue = cbq_dequeue, - .peek = qdisc_peek_dequeued, - .init = cbq_init, - .reset = cbq_reset, - .destroy = cbq_destroy, - .change = NULL, - .dump = cbq_dump, - .dump_stats = cbq_dump_stats, - .owner = THIS_MODULE, -}; - -static int __init cbq_module_init(void) -{ - return register_qdisc(&cbq_qdisc_ops); -} -static void __exit cbq_module_exit(void) -{ - unregister_qdisc(&cbq_qdisc_ops); -} -module_init(cbq_module_init) -module_exit(cbq_module_exit) -MODULE_LICENSE("GPL");
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jamal Hadi Salim jhs@mojatatu.com
commit fb38306ceb9e770adfb5ffa6e3c64047b55f7a07 upstream.
The ATM qdisc has served us well over the years but has not been getting much TLC due to lack of known users. Most recently it has become a shooting target for syzkaller. For this reason, we are retiring it.
Signed-off-by: Jamal Hadi Salim jhs@mojatatu.com Acked-by: Jiri Pirko jiri@nvidia.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/Kconfig | 14 - net/sched/Makefile | 1 net/sched/sch_atm.c | 709 ---------------------------------------------------- 3 files changed, 724 deletions(-) delete mode 100644 net/sched/sch_atm.c delete mode 100644 tools/testing/selftests/tc-testing/tc-tests/qdiscs/atm.json
--- a/net/sched/Kconfig +++ b/net/sched/Kconfig @@ -68,20 +68,6 @@ config NET_SCH_HFSC To compile this code as a module, choose M here: the module will be called sch_hfsc.
-config NET_SCH_ATM - tristate "ATM Virtual Circuits (ATM)" - depends on ATM - help - Say Y here if you want to use the ATM pseudo-scheduler. This - provides a framework for invoking classifiers, which in turn - select classes of this queuing discipline. Each class maps - the flow(s) it is handling to a given virtual circuit. - - See the top of file:net/sched/sch_atm.c for more details. - - To compile this code as a module, choose M here: the - module will be called sch_atm. - config NET_SCH_PRIO tristate "Multi Band Priority Queueing (PRIO)" help --- a/net/sched/Makefile +++ b/net/sched/Makefile @@ -44,7 +44,6 @@ obj-$(CONFIG_NET_SCH_TBF) += sch_tbf.o obj-$(CONFIG_NET_SCH_TEQL) += sch_teql.o obj-$(CONFIG_NET_SCH_PRIO) += sch_prio.o obj-$(CONFIG_NET_SCH_MULTIQ) += sch_multiq.o -obj-$(CONFIG_NET_SCH_ATM) += sch_atm.o obj-$(CONFIG_NET_SCH_NETEM) += sch_netem.o obj-$(CONFIG_NET_SCH_DRR) += sch_drr.o obj-$(CONFIG_NET_SCH_PLUG) += sch_plug.o --- a/net/sched/sch_atm.c +++ /dev/null @@ -1,709 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-only -/* net/sched/sch_atm.c - ATM VC selection "queueing discipline" */ - -/* Written 1998-2000 by Werner Almesberger, EPFL ICA */ - -#include <linux/module.h> -#include <linux/slab.h> -#include <linux/init.h> -#include <linux/interrupt.h> -#include <linux/string.h> -#include <linux/errno.h> -#include <linux/skbuff.h> -#include <linux/atmdev.h> -#include <linux/atmclip.h> -#include <linux/rtnetlink.h> -#include <linux/file.h> /* for fput */ -#include <net/netlink.h> -#include <net/pkt_sched.h> -#include <net/pkt_cls.h> - -/* - * The ATM queuing discipline provides a framework for invoking classifiers - * (aka "filters"), which in turn select classes of this queuing discipline. - * Each class maps the flow(s) it is handling to a given VC. Multiple classes - * may share the same VC. - * - * When creating a class, VCs are specified by passing the number of the open - * socket descriptor by which the calling process references the VC. The kernel - * keeps the VC open at least until all classes using it are removed. - * - * In this file, most functions are named atm_tc_* to avoid confusion with all - * the atm_* in net/atm. This naming convention differs from what's used in the - * rest of net/sched. - * - * Known bugs: - * - sometimes messes up the IP stack - * - any manipulations besides the few operations described in the README, are - * untested and likely to crash the system - * - should lock the flow while there is data in the queue (?) - */ - -#define VCC2FLOW(vcc) ((struct atm_flow_data *) ((vcc)->user_back)) - -struct atm_flow_data { - struct Qdisc_class_common common; - struct Qdisc *q; /* FIFO, TBF, etc. */ - struct tcf_proto __rcu *filter_list; - struct tcf_block *block; - struct atm_vcc *vcc; /* VCC; NULL if VCC is closed */ - void (*old_pop)(struct atm_vcc *vcc, - struct sk_buff *skb); /* chaining */ - struct atm_qdisc_data *parent; /* parent qdisc */ - struct socket *sock; /* for closing */ - int ref; /* reference count */ - struct gnet_stats_basic_packed bstats; - struct gnet_stats_queue qstats; - struct list_head list; - struct atm_flow_data *excess; /* flow for excess traffic; - NULL to set CLP instead */ - int hdr_len; - unsigned char hdr[]; /* header data; MUST BE LAST */ -}; - -struct atm_qdisc_data { - struct atm_flow_data link; /* unclassified skbs go here */ - struct list_head flows; /* NB: "link" is also on this - list */ - struct tasklet_struct task; /* dequeue tasklet */ -}; - -/* ------------------------- Class/flow operations ------------------------- */ - -static inline struct atm_flow_data *lookup_flow(struct Qdisc *sch, u32 classid) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow; - - list_for_each_entry(flow, &p->flows, list) { - if (flow->common.classid == classid) - return flow; - } - return NULL; -} - -static int atm_tc_graft(struct Qdisc *sch, unsigned long arg, - struct Qdisc *new, struct Qdisc **old, - struct netlink_ext_ack *extack) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow = (struct atm_flow_data *)arg; - - pr_debug("atm_tc_graft(sch %p,[qdisc %p],flow %p,new %p,old %p)\n", - sch, p, flow, new, old); - if (list_empty(&flow->list)) - return -EINVAL; - if (!new) - new = &noop_qdisc; - *old = flow->q; - flow->q = new; - if (*old) - qdisc_reset(*old); - return 0; -} - -static struct Qdisc *atm_tc_leaf(struct Qdisc *sch, unsigned long cl) -{ - struct atm_flow_data *flow = (struct atm_flow_data *)cl; - - pr_debug("atm_tc_leaf(sch %p,flow %p)\n", sch, flow); - return flow ? flow->q : NULL; -} - -static unsigned long atm_tc_find(struct Qdisc *sch, u32 classid) -{ - struct atm_qdisc_data *p __maybe_unused = qdisc_priv(sch); - struct atm_flow_data *flow; - - pr_debug("%s(sch %p,[qdisc %p],classid %x)\n", __func__, sch, p, classid); - flow = lookup_flow(sch, classid); - pr_debug("%s: flow %p\n", __func__, flow); - return (unsigned long)flow; -} - -static unsigned long atm_tc_bind_filter(struct Qdisc *sch, - unsigned long parent, u32 classid) -{ - struct atm_qdisc_data *p __maybe_unused = qdisc_priv(sch); - struct atm_flow_data *flow; - - pr_debug("%s(sch %p,[qdisc %p],classid %x)\n", __func__, sch, p, classid); - flow = lookup_flow(sch, classid); - if (flow) - flow->ref++; - pr_debug("%s: flow %p\n", __func__, flow); - return (unsigned long)flow; -} - -/* - * atm_tc_put handles all destructions, including the ones that are explicitly - * requested (atm_tc_destroy, etc.). The assumption here is that we never drop - * anything that still seems to be in use. - */ -static void atm_tc_put(struct Qdisc *sch, unsigned long cl) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow = (struct atm_flow_data *)cl; - - pr_debug("atm_tc_put(sch %p,[qdisc %p],flow %p)\n", sch, p, flow); - if (--flow->ref) - return; - pr_debug("atm_tc_put: destroying\n"); - list_del_init(&flow->list); - pr_debug("atm_tc_put: qdisc %p\n", flow->q); - qdisc_put(flow->q); - tcf_block_put(flow->block); - if (flow->sock) { - pr_debug("atm_tc_put: f_count %ld\n", - file_count(flow->sock->file)); - flow->vcc->pop = flow->old_pop; - sockfd_put(flow->sock); - } - if (flow->excess) - atm_tc_put(sch, (unsigned long)flow->excess); - if (flow != &p->link) - kfree(flow); - /* - * If flow == &p->link, the qdisc no longer works at this point and - * needs to be removed. (By the caller of atm_tc_put.) - */ -} - -static void sch_atm_pop(struct atm_vcc *vcc, struct sk_buff *skb) -{ - struct atm_qdisc_data *p = VCC2FLOW(vcc)->parent; - - pr_debug("sch_atm_pop(vcc %p,skb %p,[qdisc %p])\n", vcc, skb, p); - VCC2FLOW(vcc)->old_pop(vcc, skb); - tasklet_schedule(&p->task); -} - -static const u8 llc_oui_ip[] = { - 0xaa, /* DSAP: non-ISO */ - 0xaa, /* SSAP: non-ISO */ - 0x03, /* Ctrl: Unnumbered Information Command PDU */ - 0x00, /* OUI: EtherType */ - 0x00, 0x00, - 0x08, 0x00 -}; /* Ethertype IP (0800) */ - -static const struct nla_policy atm_policy[TCA_ATM_MAX + 1] = { - [TCA_ATM_FD] = { .type = NLA_U32 }, - [TCA_ATM_EXCESS] = { .type = NLA_U32 }, -}; - -static int atm_tc_change(struct Qdisc *sch, u32 classid, u32 parent, - struct nlattr **tca, unsigned long *arg, - struct netlink_ext_ack *extack) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow = (struct atm_flow_data *)*arg; - struct atm_flow_data *excess = NULL; - struct nlattr *opt = tca[TCA_OPTIONS]; - struct nlattr *tb[TCA_ATM_MAX + 1]; - struct socket *sock; - int fd, error, hdr_len; - void *hdr; - - pr_debug("atm_tc_change(sch %p,[qdisc %p],classid %x,parent %x," - "flow %p,opt %p)\n", sch, p, classid, parent, flow, opt); - /* - * The concept of parents doesn't apply for this qdisc. - */ - if (parent && parent != TC_H_ROOT && parent != sch->handle) - return -EINVAL; - /* - * ATM classes cannot be changed. In order to change properties of the - * ATM connection, that socket needs to be modified directly (via the - * native ATM API. In order to send a flow to a different VC, the old - * class needs to be removed and a new one added. (This may be changed - * later.) - */ - if (flow) - return -EBUSY; - if (opt == NULL) - return -EINVAL; - - error = nla_parse_nested_deprecated(tb, TCA_ATM_MAX, opt, atm_policy, - NULL); - if (error < 0) - return error; - - if (!tb[TCA_ATM_FD]) - return -EINVAL; - fd = nla_get_u32(tb[TCA_ATM_FD]); - pr_debug("atm_tc_change: fd %d\n", fd); - if (tb[TCA_ATM_HDR]) { - hdr_len = nla_len(tb[TCA_ATM_HDR]); - hdr = nla_data(tb[TCA_ATM_HDR]); - } else { - hdr_len = RFC1483LLC_LEN; - hdr = NULL; /* default LLC/SNAP for IP */ - } - if (!tb[TCA_ATM_EXCESS]) - excess = NULL; - else { - excess = (struct atm_flow_data *) - atm_tc_find(sch, nla_get_u32(tb[TCA_ATM_EXCESS])); - if (!excess) - return -ENOENT; - } - pr_debug("atm_tc_change: type %d, payload %d, hdr_len %d\n", - opt->nla_type, nla_len(opt), hdr_len); - sock = sockfd_lookup(fd, &error); - if (!sock) - return error; /* f_count++ */ - pr_debug("atm_tc_change: f_count %ld\n", file_count(sock->file)); - if (sock->ops->family != PF_ATMSVC && sock->ops->family != PF_ATMPVC) { - error = -EPROTOTYPE; - goto err_out; - } - /* @@@ should check if the socket is really operational or we'll crash - on vcc->send */ - if (classid) { - if (TC_H_MAJ(classid ^ sch->handle)) { - pr_debug("atm_tc_change: classid mismatch\n"); - error = -EINVAL; - goto err_out; - } - } else { - int i; - unsigned long cl; - - for (i = 1; i < 0x8000; i++) { - classid = TC_H_MAKE(sch->handle, 0x8000 | i); - cl = atm_tc_find(sch, classid); - if (!cl) - break; - } - } - pr_debug("atm_tc_change: new id %x\n", classid); - flow = kzalloc(sizeof(struct atm_flow_data) + hdr_len, GFP_KERNEL); - pr_debug("atm_tc_change: flow %p\n", flow); - if (!flow) { - error = -ENOBUFS; - goto err_out; - } - - error = tcf_block_get(&flow->block, &flow->filter_list, sch, - extack); - if (error) { - kfree(flow); - goto err_out; - } - - flow->q = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops, classid, - extack); - if (!flow->q) - flow->q = &noop_qdisc; - pr_debug("atm_tc_change: qdisc %p\n", flow->q); - flow->sock = sock; - flow->vcc = ATM_SD(sock); /* speedup */ - flow->vcc->user_back = flow; - pr_debug("atm_tc_change: vcc %p\n", flow->vcc); - flow->old_pop = flow->vcc->pop; - flow->parent = p; - flow->vcc->pop = sch_atm_pop; - flow->common.classid = classid; - flow->ref = 1; - flow->excess = excess; - list_add(&flow->list, &p->link.list); - flow->hdr_len = hdr_len; - if (hdr) - memcpy(flow->hdr, hdr, hdr_len); - else - memcpy(flow->hdr, llc_oui_ip, sizeof(llc_oui_ip)); - *arg = (unsigned long)flow; - return 0; -err_out: - sockfd_put(sock); - return error; -} - -static int atm_tc_delete(struct Qdisc *sch, unsigned long arg) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow = (struct atm_flow_data *)arg; - - pr_debug("atm_tc_delete(sch %p,[qdisc %p],flow %p)\n", sch, p, flow); - if (list_empty(&flow->list)) - return -EINVAL; - if (rcu_access_pointer(flow->filter_list) || flow == &p->link) - return -EBUSY; - /* - * Reference count must be 2: one for "keepalive" (set at class - * creation), and one for the reference held when calling delete. - */ - if (flow->ref < 2) { - pr_err("atm_tc_delete: flow->ref == %d\n", flow->ref); - return -EINVAL; - } - if (flow->ref > 2) - return -EBUSY; /* catch references via excess, etc. */ - atm_tc_put(sch, arg); - return 0; -} - -static void atm_tc_walk(struct Qdisc *sch, struct qdisc_walker *walker) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow; - - pr_debug("atm_tc_walk(sch %p,[qdisc %p],walker %p)\n", sch, p, walker); - if (walker->stop) - return; - list_for_each_entry(flow, &p->flows, list) { - if (walker->count >= walker->skip && - walker->fn(sch, (unsigned long)flow, walker) < 0) { - walker->stop = 1; - break; - } - walker->count++; - } -} - -static struct tcf_block *atm_tc_tcf_block(struct Qdisc *sch, unsigned long cl, - struct netlink_ext_ack *extack) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow = (struct atm_flow_data *)cl; - - pr_debug("atm_tc_find_tcf(sch %p,[qdisc %p],flow %p)\n", sch, p, flow); - return flow ? flow->block : p->link.block; -} - -/* --------------------------- Qdisc operations ---------------------------- */ - -static int atm_tc_enqueue(struct sk_buff *skb, struct Qdisc *sch, - struct sk_buff **to_free) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow; - struct tcf_result res; - int result; - int ret = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; - - pr_debug("atm_tc_enqueue(skb %p,sch %p,[qdisc %p])\n", skb, sch, p); - result = TC_ACT_OK; /* be nice to gcc */ - flow = NULL; - if (TC_H_MAJ(skb->priority) != sch->handle || - !(flow = (struct atm_flow_data *)atm_tc_find(sch, skb->priority))) { - struct tcf_proto *fl; - - list_for_each_entry(flow, &p->flows, list) { - fl = rcu_dereference_bh(flow->filter_list); - if (fl) { - result = tcf_classify(skb, fl, &res, true); - if (result < 0) - continue; - if (result == TC_ACT_SHOT) - goto done; - - flow = (struct atm_flow_data *)res.class; - if (!flow) - flow = lookup_flow(sch, res.classid); - goto drop; - } - } - flow = NULL; -done: - ; - } - if (!flow) { - flow = &p->link; - } else { - if (flow->vcc) - ATM_SKB(skb)->atm_options = flow->vcc->atm_options; - /*@@@ looks good ... but it's not supposed to work :-) */ -#ifdef CONFIG_NET_CLS_ACT - switch (result) { - case TC_ACT_QUEUED: - case TC_ACT_STOLEN: - case TC_ACT_TRAP: - __qdisc_drop(skb, to_free); - return NET_XMIT_SUCCESS | __NET_XMIT_STOLEN; - case TC_ACT_SHOT: - __qdisc_drop(skb, to_free); - goto drop; - case TC_ACT_RECLASSIFY: - if (flow->excess) - flow = flow->excess; - else - ATM_SKB(skb)->atm_options |= ATM_ATMOPT_CLP; - break; - } -#endif - } - - ret = qdisc_enqueue(skb, flow->q, to_free); - if (ret != NET_XMIT_SUCCESS) { -drop: __maybe_unused - if (net_xmit_drop_count(ret)) { - qdisc_qstats_drop(sch); - if (flow) - flow->qstats.drops++; - } - return ret; - } - /* - * Okay, this may seem weird. We pretend we've dropped the packet if - * it goes via ATM. The reason for this is that the outer qdisc - * expects to be able to q->dequeue the packet later on if we return - * success at this place. Also, sch->q.qdisc needs to reflect whether - * there is a packet egligible for dequeuing or not. Note that the - * statistics of the outer qdisc are necessarily wrong because of all - * this. There's currently no correct solution for this. - */ - if (flow == &p->link) { - sch->q.qlen++; - return NET_XMIT_SUCCESS; - } - tasklet_schedule(&p->task); - return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; -} - -/* - * Dequeue packets and send them over ATM. Note that we quite deliberately - * avoid checking net_device's flow control here, simply because sch_atm - * uses its own channels, which have nothing to do with any CLIP/LANE/or - * non-ATM interfaces. - */ - -static void sch_atm_dequeue(unsigned long data) -{ - struct Qdisc *sch = (struct Qdisc *)data; - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow; - struct sk_buff *skb; - - pr_debug("sch_atm_dequeue(sch %p,[qdisc %p])\n", sch, p); - list_for_each_entry(flow, &p->flows, list) { - if (flow == &p->link) - continue; - /* - * If traffic is properly shaped, this won't generate nasty - * little bursts. Otherwise, it may ... (but that's okay) - */ - while ((skb = flow->q->ops->peek(flow->q))) { - if (!atm_may_send(flow->vcc, skb->truesize)) - break; - - skb = qdisc_dequeue_peeked(flow->q); - if (unlikely(!skb)) - break; - - qdisc_bstats_update(sch, skb); - bstats_update(&flow->bstats, skb); - pr_debug("atm_tc_dequeue: sending on class %p\n", flow); - /* remove any LL header somebody else has attached */ - skb_pull(skb, skb_network_offset(skb)); - if (skb_headroom(skb) < flow->hdr_len) { - struct sk_buff *new; - - new = skb_realloc_headroom(skb, flow->hdr_len); - dev_kfree_skb(skb); - if (!new) - continue; - skb = new; - } - pr_debug("sch_atm_dequeue: ip %p, data %p\n", - skb_network_header(skb), skb->data); - ATM_SKB(skb)->vcc = flow->vcc; - memcpy(skb_push(skb, flow->hdr_len), flow->hdr, - flow->hdr_len); - refcount_add(skb->truesize, - &sk_atm(flow->vcc)->sk_wmem_alloc); - /* atm.atm_options are already set by atm_tc_enqueue */ - flow->vcc->send(flow->vcc, skb); - } - } -} - -static struct sk_buff *atm_tc_dequeue(struct Qdisc *sch) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct sk_buff *skb; - - pr_debug("atm_tc_dequeue(sch %p,[qdisc %p])\n", sch, p); - tasklet_schedule(&p->task); - skb = qdisc_dequeue_peeked(p->link.q); - if (skb) - sch->q.qlen--; - return skb; -} - -static struct sk_buff *atm_tc_peek(struct Qdisc *sch) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - - pr_debug("atm_tc_peek(sch %p,[qdisc %p])\n", sch, p); - - return p->link.q->ops->peek(p->link.q); -} - -static int atm_tc_init(struct Qdisc *sch, struct nlattr *opt, - struct netlink_ext_ack *extack) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - int err; - - pr_debug("atm_tc_init(sch %p,[qdisc %p],opt %p)\n", sch, p, opt); - INIT_LIST_HEAD(&p->flows); - INIT_LIST_HEAD(&p->link.list); - list_add(&p->link.list, &p->flows); - p->link.q = qdisc_create_dflt(sch->dev_queue, - &pfifo_qdisc_ops, sch->handle, extack); - if (!p->link.q) - p->link.q = &noop_qdisc; - pr_debug("atm_tc_init: link (%p) qdisc %p\n", &p->link, p->link.q); - p->link.vcc = NULL; - p->link.sock = NULL; - p->link.common.classid = sch->handle; - p->link.ref = 1; - - err = tcf_block_get(&p->link.block, &p->link.filter_list, sch, - extack); - if (err) - return err; - - tasklet_init(&p->task, sch_atm_dequeue, (unsigned long)sch); - return 0; -} - -static void atm_tc_reset(struct Qdisc *sch) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow; - - pr_debug("atm_tc_reset(sch %p,[qdisc %p])\n", sch, p); - list_for_each_entry(flow, &p->flows, list) - qdisc_reset(flow->q); -} - -static void atm_tc_destroy(struct Qdisc *sch) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow, *tmp; - - pr_debug("atm_tc_destroy(sch %p,[qdisc %p])\n", sch, p); - list_for_each_entry(flow, &p->flows, list) { - tcf_block_put(flow->block); - flow->block = NULL; - } - - list_for_each_entry_safe(flow, tmp, &p->flows, list) { - if (flow->ref > 1) - pr_err("atm_destroy: %p->ref = %d\n", flow, flow->ref); - atm_tc_put(sch, (unsigned long)flow); - } - tasklet_kill(&p->task); -} - -static int atm_tc_dump_class(struct Qdisc *sch, unsigned long cl, - struct sk_buff *skb, struct tcmsg *tcm) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow = (struct atm_flow_data *)cl; - struct nlattr *nest; - - pr_debug("atm_tc_dump_class(sch %p,[qdisc %p],flow %p,skb %p,tcm %p)\n", - sch, p, flow, skb, tcm); - if (list_empty(&flow->list)) - return -EINVAL; - tcm->tcm_handle = flow->common.classid; - tcm->tcm_info = flow->q->handle; - - nest = nla_nest_start_noflag(skb, TCA_OPTIONS); - if (nest == NULL) - goto nla_put_failure; - - if (nla_put(skb, TCA_ATM_HDR, flow->hdr_len, flow->hdr)) - goto nla_put_failure; - if (flow->vcc) { - struct sockaddr_atmpvc pvc; - int state; - - memset(&pvc, 0, sizeof(pvc)); - pvc.sap_family = AF_ATMPVC; - pvc.sap_addr.itf = flow->vcc->dev ? flow->vcc->dev->number : -1; - pvc.sap_addr.vpi = flow->vcc->vpi; - pvc.sap_addr.vci = flow->vcc->vci; - if (nla_put(skb, TCA_ATM_ADDR, sizeof(pvc), &pvc)) - goto nla_put_failure; - state = ATM_VF2VS(flow->vcc->flags); - if (nla_put_u32(skb, TCA_ATM_STATE, state)) - goto nla_put_failure; - } - if (flow->excess) { - if (nla_put_u32(skb, TCA_ATM_EXCESS, flow->common.classid)) - goto nla_put_failure; - } else { - if (nla_put_u32(skb, TCA_ATM_EXCESS, 0)) - goto nla_put_failure; - } - return nla_nest_end(skb, nest); - -nla_put_failure: - nla_nest_cancel(skb, nest); - return -1; -} -static int -atm_tc_dump_class_stats(struct Qdisc *sch, unsigned long arg, - struct gnet_dump *d) -{ - struct atm_flow_data *flow = (struct atm_flow_data *)arg; - - if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch), - d, NULL, &flow->bstats) < 0 || - gnet_stats_copy_queue(d, NULL, &flow->qstats, flow->q->q.qlen) < 0) - return -1; - - return 0; -} - -static int atm_tc_dump(struct Qdisc *sch, struct sk_buff *skb) -{ - return 0; -} - -static const struct Qdisc_class_ops atm_class_ops = { - .graft = atm_tc_graft, - .leaf = atm_tc_leaf, - .find = atm_tc_find, - .change = atm_tc_change, - .delete = atm_tc_delete, - .walk = atm_tc_walk, - .tcf_block = atm_tc_tcf_block, - .bind_tcf = atm_tc_bind_filter, - .unbind_tcf = atm_tc_put, - .dump = atm_tc_dump_class, - .dump_stats = atm_tc_dump_class_stats, -}; - -static struct Qdisc_ops atm_qdisc_ops __read_mostly = { - .cl_ops = &atm_class_ops, - .id = "atm", - .priv_size = sizeof(struct atm_qdisc_data), - .enqueue = atm_tc_enqueue, - .dequeue = atm_tc_dequeue, - .peek = atm_tc_peek, - .init = atm_tc_init, - .reset = atm_tc_reset, - .destroy = atm_tc_destroy, - .dump = atm_tc_dump, - .owner = THIS_MODULE, -}; - -static int __init atm_init(void) -{ - return register_qdisc(&atm_qdisc_ops); -} - -static void __exit atm_exit(void) -{ - unregister_qdisc(&atm_qdisc_ops); -} - -module_init(atm_init) -module_exit(atm_exit) -MODULE_LICENSE("GPL");
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jamal Hadi Salim jhs@mojatatu.com
commit bbe77c14ee6185a61ba6d5e435c1cbb489d2a9ed upstream.
The dsmark qdisc has served us well over the years for diffserv but has not been getting much attention due to other more popular approaches to do diffserv services. Most recently it has become a shooting target for syzkaller. For this reason, we are retiring it.
Signed-off-by: Jamal Hadi Salim jhs@mojatatu.com Acked-by: Jiri Pirko jiri@nvidia.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/Kconfig | 11 - net/sched/Makefile | 1 net/sched/sch_dsmark.c | 521 ------------------------------------------------- 3 files changed, 533 deletions(-) delete mode 100644 net/sched/sch_dsmark.c delete mode 100644 tools/testing/selftests/tc-testing/tc-tests/qdiscs/dsmark.json
--- a/net/sched/Kconfig +++ b/net/sched/Kconfig @@ -186,17 +186,6 @@ config NET_SCH_GRED To compile this code as a module, choose M here: the module will be called sch_gred.
-config NET_SCH_DSMARK - tristate "Differentiated Services marker (DSMARK)" - help - Say Y if you want to schedule packets according to the - Differentiated Services architecture proposed in RFC 2475. - Technical information on this method, with pointers to associated - RFCs, is available at http://www.gta.ufrj.br/diffserv/. - - To compile this code as a module, choose M here: the - module will be called sch_dsmark. - config NET_SCH_NETEM tristate "Network emulator (NETEM)" help --- a/net/sched/Makefile +++ b/net/sched/Makefile @@ -37,7 +37,6 @@ obj-$(CONFIG_NET_SCH_HFSC) += sch_hfsc.o obj-$(CONFIG_NET_SCH_RED) += sch_red.o obj-$(CONFIG_NET_SCH_GRED) += sch_gred.o obj-$(CONFIG_NET_SCH_INGRESS) += sch_ingress.o -obj-$(CONFIG_NET_SCH_DSMARK) += sch_dsmark.o obj-$(CONFIG_NET_SCH_SFB) += sch_sfb.o obj-$(CONFIG_NET_SCH_SFQ) += sch_sfq.o obj-$(CONFIG_NET_SCH_TBF) += sch_tbf.o --- a/net/sched/sch_dsmark.c +++ /dev/null @@ -1,521 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-only -/* net/sched/sch_dsmark.c - Differentiated Services field marker */ - -/* Written 1998-2000 by Werner Almesberger, EPFL ICA */ - - -#include <linux/module.h> -#include <linux/init.h> -#include <linux/slab.h> -#include <linux/types.h> -#include <linux/string.h> -#include <linux/errno.h> -#include <linux/skbuff.h> -#include <linux/rtnetlink.h> -#include <linux/bitops.h> -#include <net/pkt_sched.h> -#include <net/pkt_cls.h> -#include <net/dsfield.h> -#include <net/inet_ecn.h> -#include <asm/byteorder.h> - -/* - * classid class marking - * ------- ----- ------- - * n/a 0 n/a - * x:0 1 use entry [0] - * ... ... ... - * x:y y>0 y+1 use entry [y] - * ... ... ... - * x:indices-1 indices use entry [indices-1] - * ... ... ... - * x:y y+1 use entry [y & (indices-1)] - * ... ... ... - * 0xffff 0x10000 use entry [indices-1] - */ - - -#define NO_DEFAULT_INDEX (1 << 16) - -struct mask_value { - u8 mask; - u8 value; -}; - -struct dsmark_qdisc_data { - struct Qdisc *q; - struct tcf_proto __rcu *filter_list; - struct tcf_block *block; - struct mask_value *mv; - u16 indices; - u8 set_tc_index; - u32 default_index; /* index range is 0...0xffff */ -#define DSMARK_EMBEDDED_SZ 16 - struct mask_value embedded[DSMARK_EMBEDDED_SZ]; -}; - -static inline int dsmark_valid_index(struct dsmark_qdisc_data *p, u16 index) -{ - return index <= p->indices && index > 0; -} - -/* ------------------------- Class/flow operations ------------------------- */ - -static int dsmark_graft(struct Qdisc *sch, unsigned long arg, - struct Qdisc *new, struct Qdisc **old, - struct netlink_ext_ack *extack) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - - pr_debug("%s(sch %p,[qdisc %p],new %p,old %p)\n", - __func__, sch, p, new, old); - - if (new == NULL) { - new = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops, - sch->handle, NULL); - if (new == NULL) - new = &noop_qdisc; - } - - *old = qdisc_replace(sch, new, &p->q); - return 0; -} - -static struct Qdisc *dsmark_leaf(struct Qdisc *sch, unsigned long arg) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - return p->q; -} - -static unsigned long dsmark_find(struct Qdisc *sch, u32 classid) -{ - return TC_H_MIN(classid) + 1; -} - -static unsigned long dsmark_bind_filter(struct Qdisc *sch, - unsigned long parent, u32 classid) -{ - pr_debug("%s(sch %p,[qdisc %p],classid %x)\n", - __func__, sch, qdisc_priv(sch), classid); - - return dsmark_find(sch, classid); -} - -static void dsmark_unbind_filter(struct Qdisc *sch, unsigned long cl) -{ -} - -static const struct nla_policy dsmark_policy[TCA_DSMARK_MAX + 1] = { - [TCA_DSMARK_INDICES] = { .type = NLA_U16 }, - [TCA_DSMARK_DEFAULT_INDEX] = { .type = NLA_U16 }, - [TCA_DSMARK_SET_TC_INDEX] = { .type = NLA_FLAG }, - [TCA_DSMARK_MASK] = { .type = NLA_U8 }, - [TCA_DSMARK_VALUE] = { .type = NLA_U8 }, -}; - -static int dsmark_change(struct Qdisc *sch, u32 classid, u32 parent, - struct nlattr **tca, unsigned long *arg, - struct netlink_ext_ack *extack) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - struct nlattr *opt = tca[TCA_OPTIONS]; - struct nlattr *tb[TCA_DSMARK_MAX + 1]; - int err = -EINVAL; - - pr_debug("%s(sch %p,[qdisc %p],classid %x,parent %x), arg 0x%lx\n", - __func__, sch, p, classid, parent, *arg); - - if (!dsmark_valid_index(p, *arg)) { - err = -ENOENT; - goto errout; - } - - if (!opt) - goto errout; - - err = nla_parse_nested_deprecated(tb, TCA_DSMARK_MAX, opt, - dsmark_policy, NULL); - if (err < 0) - goto errout; - - if (tb[TCA_DSMARK_VALUE]) - p->mv[*arg - 1].value = nla_get_u8(tb[TCA_DSMARK_VALUE]); - - if (tb[TCA_DSMARK_MASK]) - p->mv[*arg - 1].mask = nla_get_u8(tb[TCA_DSMARK_MASK]); - - err = 0; - -errout: - return err; -} - -static int dsmark_delete(struct Qdisc *sch, unsigned long arg) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - - if (!dsmark_valid_index(p, arg)) - return -EINVAL; - - p->mv[arg - 1].mask = 0xff; - p->mv[arg - 1].value = 0; - - return 0; -} - -static void dsmark_walk(struct Qdisc *sch, struct qdisc_walker *walker) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - int i; - - pr_debug("%s(sch %p,[qdisc %p],walker %p)\n", - __func__, sch, p, walker); - - if (walker->stop) - return; - - for (i = 0; i < p->indices; i++) { - if (p->mv[i].mask == 0xff && !p->mv[i].value) - goto ignore; - if (walker->count >= walker->skip) { - if (walker->fn(sch, i + 1, walker) < 0) { - walker->stop = 1; - break; - } - } -ignore: - walker->count++; - } -} - -static struct tcf_block *dsmark_tcf_block(struct Qdisc *sch, unsigned long cl, - struct netlink_ext_ack *extack) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - - return p->block; -} - -/* --------------------------- Qdisc operations ---------------------------- */ - -static int dsmark_enqueue(struct sk_buff *skb, struct Qdisc *sch, - struct sk_buff **to_free) -{ - unsigned int len = qdisc_pkt_len(skb); - struct dsmark_qdisc_data *p = qdisc_priv(sch); - int err; - - pr_debug("%s(skb %p,sch %p,[qdisc %p])\n", __func__, skb, sch, p); - - if (p->set_tc_index) { - int wlen = skb_network_offset(skb); - - switch (skb_protocol(skb, true)) { - case htons(ETH_P_IP): - wlen += sizeof(struct iphdr); - if (!pskb_may_pull(skb, wlen) || - skb_try_make_writable(skb, wlen)) - goto drop; - - skb->tc_index = ipv4_get_dsfield(ip_hdr(skb)) - & ~INET_ECN_MASK; - break; - - case htons(ETH_P_IPV6): - wlen += sizeof(struct ipv6hdr); - if (!pskb_may_pull(skb, wlen) || - skb_try_make_writable(skb, wlen)) - goto drop; - - skb->tc_index = ipv6_get_dsfield(ipv6_hdr(skb)) - & ~INET_ECN_MASK; - break; - default: - skb->tc_index = 0; - break; - } - } - - if (TC_H_MAJ(skb->priority) == sch->handle) - skb->tc_index = TC_H_MIN(skb->priority); - else { - struct tcf_result res; - struct tcf_proto *fl = rcu_dereference_bh(p->filter_list); - int result = tcf_classify(skb, fl, &res, false); - - pr_debug("result %d class 0x%04x\n", result, res.classid); - - switch (result) { -#ifdef CONFIG_NET_CLS_ACT - case TC_ACT_QUEUED: - case TC_ACT_STOLEN: - case TC_ACT_TRAP: - __qdisc_drop(skb, to_free); - return NET_XMIT_SUCCESS | __NET_XMIT_STOLEN; - - case TC_ACT_SHOT: - goto drop; -#endif - case TC_ACT_OK: - skb->tc_index = TC_H_MIN(res.classid); - break; - - default: - if (p->default_index != NO_DEFAULT_INDEX) - skb->tc_index = p->default_index; - break; - } - } - - err = qdisc_enqueue(skb, p->q, to_free); - if (err != NET_XMIT_SUCCESS) { - if (net_xmit_drop_count(err)) - qdisc_qstats_drop(sch); - return err; - } - - sch->qstats.backlog += len; - sch->q.qlen++; - - return NET_XMIT_SUCCESS; - -drop: - qdisc_drop(skb, sch, to_free); - return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; -} - -static struct sk_buff *dsmark_dequeue(struct Qdisc *sch) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - struct sk_buff *skb; - u32 index; - - pr_debug("%s(sch %p,[qdisc %p])\n", __func__, sch, p); - - skb = qdisc_dequeue_peeked(p->q); - if (skb == NULL) - return NULL; - - qdisc_bstats_update(sch, skb); - qdisc_qstats_backlog_dec(sch, skb); - sch->q.qlen--; - - index = skb->tc_index & (p->indices - 1); - pr_debug("index %d->%d\n", skb->tc_index, index); - - switch (skb_protocol(skb, true)) { - case htons(ETH_P_IP): - ipv4_change_dsfield(ip_hdr(skb), p->mv[index].mask, - p->mv[index].value); - break; - case htons(ETH_P_IPV6): - ipv6_change_dsfield(ipv6_hdr(skb), p->mv[index].mask, - p->mv[index].value); - break; - default: - /* - * Only complain if a change was actually attempted. - * This way, we can send non-IP traffic through dsmark - * and don't need yet another qdisc as a bypass. - */ - if (p->mv[index].mask != 0xff || p->mv[index].value) - pr_warn("%s: unsupported protocol %d\n", - __func__, ntohs(skb_protocol(skb, true))); - break; - } - - return skb; -} - -static struct sk_buff *dsmark_peek(struct Qdisc *sch) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - - pr_debug("%s(sch %p,[qdisc %p])\n", __func__, sch, p); - - return p->q->ops->peek(p->q); -} - -static int dsmark_init(struct Qdisc *sch, struct nlattr *opt, - struct netlink_ext_ack *extack) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - struct nlattr *tb[TCA_DSMARK_MAX + 1]; - int err = -EINVAL; - u32 default_index = NO_DEFAULT_INDEX; - u16 indices; - int i; - - pr_debug("%s(sch %p,[qdisc %p],opt %p)\n", __func__, sch, p, opt); - - if (!opt) - goto errout; - - err = tcf_block_get(&p->block, &p->filter_list, sch, extack); - if (err) - return err; - - err = nla_parse_nested_deprecated(tb, TCA_DSMARK_MAX, opt, - dsmark_policy, NULL); - if (err < 0) - goto errout; - - err = -EINVAL; - if (!tb[TCA_DSMARK_INDICES]) - goto errout; - indices = nla_get_u16(tb[TCA_DSMARK_INDICES]); - - if (hweight32(indices) != 1) - goto errout; - - if (tb[TCA_DSMARK_DEFAULT_INDEX]) - default_index = nla_get_u16(tb[TCA_DSMARK_DEFAULT_INDEX]); - - if (indices <= DSMARK_EMBEDDED_SZ) - p->mv = p->embedded; - else - p->mv = kmalloc_array(indices, sizeof(*p->mv), GFP_KERNEL); - if (!p->mv) { - err = -ENOMEM; - goto errout; - } - for (i = 0; i < indices; i++) { - p->mv[i].mask = 0xff; - p->mv[i].value = 0; - } - p->indices = indices; - p->default_index = default_index; - p->set_tc_index = nla_get_flag(tb[TCA_DSMARK_SET_TC_INDEX]); - - p->q = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops, sch->handle, - NULL); - if (p->q == NULL) - p->q = &noop_qdisc; - else - qdisc_hash_add(p->q, true); - - pr_debug("%s: qdisc %p\n", __func__, p->q); - - err = 0; -errout: - return err; -} - -static void dsmark_reset(struct Qdisc *sch) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - - pr_debug("%s(sch %p,[qdisc %p])\n", __func__, sch, p); - if (p->q) - qdisc_reset(p->q); -} - -static void dsmark_destroy(struct Qdisc *sch) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - - pr_debug("%s(sch %p,[qdisc %p])\n", __func__, sch, p); - - tcf_block_put(p->block); - qdisc_put(p->q); - if (p->mv != p->embedded) - kfree(p->mv); -} - -static int dsmark_dump_class(struct Qdisc *sch, unsigned long cl, - struct sk_buff *skb, struct tcmsg *tcm) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - struct nlattr *opts = NULL; - - pr_debug("%s(sch %p,[qdisc %p],class %ld\n", __func__, sch, p, cl); - - if (!dsmark_valid_index(p, cl)) - return -EINVAL; - - tcm->tcm_handle = TC_H_MAKE(TC_H_MAJ(sch->handle), cl - 1); - tcm->tcm_info = p->q->handle; - - opts = nla_nest_start_noflag(skb, TCA_OPTIONS); - if (opts == NULL) - goto nla_put_failure; - if (nla_put_u8(skb, TCA_DSMARK_MASK, p->mv[cl - 1].mask) || - nla_put_u8(skb, TCA_DSMARK_VALUE, p->mv[cl - 1].value)) - goto nla_put_failure; - - return nla_nest_end(skb, opts); - -nla_put_failure: - nla_nest_cancel(skb, opts); - return -EMSGSIZE; -} - -static int dsmark_dump(struct Qdisc *sch, struct sk_buff *skb) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - struct nlattr *opts = NULL; - - opts = nla_nest_start_noflag(skb, TCA_OPTIONS); - if (opts == NULL) - goto nla_put_failure; - if (nla_put_u16(skb, TCA_DSMARK_INDICES, p->indices)) - goto nla_put_failure; - - if (p->default_index != NO_DEFAULT_INDEX && - nla_put_u16(skb, TCA_DSMARK_DEFAULT_INDEX, p->default_index)) - goto nla_put_failure; - - if (p->set_tc_index && - nla_put_flag(skb, TCA_DSMARK_SET_TC_INDEX)) - goto nla_put_failure; - - return nla_nest_end(skb, opts); - -nla_put_failure: - nla_nest_cancel(skb, opts); - return -EMSGSIZE; -} - -static const struct Qdisc_class_ops dsmark_class_ops = { - .graft = dsmark_graft, - .leaf = dsmark_leaf, - .find = dsmark_find, - .change = dsmark_change, - .delete = dsmark_delete, - .walk = dsmark_walk, - .tcf_block = dsmark_tcf_block, - .bind_tcf = dsmark_bind_filter, - .unbind_tcf = dsmark_unbind_filter, - .dump = dsmark_dump_class, -}; - -static struct Qdisc_ops dsmark_qdisc_ops __read_mostly = { - .next = NULL, - .cl_ops = &dsmark_class_ops, - .id = "dsmark", - .priv_size = sizeof(struct dsmark_qdisc_data), - .enqueue = dsmark_enqueue, - .dequeue = dsmark_dequeue, - .peek = dsmark_peek, - .init = dsmark_init, - .reset = dsmark_reset, - .destroy = dsmark_destroy, - .change = NULL, - .dump = dsmark_dump, - .owner = THIS_MODULE, -}; - -static int __init dsmark_module_init(void) -{ - return register_qdisc(&dsmark_qdisc_ops); -} - -static void __exit dsmark_module_exit(void) -{ - unregister_qdisc(&dsmark_qdisc_ops); -} - -module_init(dsmark_module_init) -module_exit(dsmark_module_exit) - -MODULE_LICENSE("GPL");
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paulo Alcantara pc@manguebit.com
[ Upstream commit eec04ea119691e65227a97ce53c0da6b9b74b0b7 ]
Fix potential OOB in receive_encrypted_standard() if server returned a large shdr->NextCommand that would end up writing off the end of @next_buffer.
Fixes: b24df3e30cbf ("cifs: update receive_encrypted_standard to handle compounded responses") Cc: stable@vger.kernel.org Reported-by: Robert Morris rtm@csail.mit.edu Signed-off-by: Paulo Alcantara (SUSE) pc@manguebit.com Signed-off-by: Steve French stfrench@microsoft.com [Guru: receive_encrypted_standard() is present in file smb2ops.c, smb2ops.c file location is changed, modified patch accordingly.] Signed-off-by: Guruswamy Basavaiah guruswamy.basavaiah@broadcom.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/cifs/smb2ops.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-)
--- a/fs/cifs/smb2ops.c +++ b/fs/cifs/smb2ops.c @@ -4892,6 +4892,7 @@ receive_encrypted_standard(struct TCP_Se struct smb2_sync_hdr *shdr; unsigned int pdu_length = server->pdu_size; unsigned int buf_size; + unsigned int next_cmd; struct mid_q_entry *mid_entry; int next_is_large; char *next_buffer = NULL; @@ -4920,14 +4921,15 @@ receive_encrypted_standard(struct TCP_Se next_is_large = server->large_buf; one_more: shdr = (struct smb2_sync_hdr *)buf; - if (shdr->NextCommand) { + next_cmd = le32_to_cpu(shdr->NextCommand); + if (next_cmd) { + if (WARN_ON_ONCE(next_cmd > pdu_length)) + return -1; if (next_is_large) next_buffer = (char *)cifs_buf_get(); else next_buffer = (char *)cifs_small_buf_get(); - memcpy(next_buffer, - buf + le32_to_cpu(shdr->NextCommand), - pdu_length - le32_to_cpu(shdr->NextCommand)); + memcpy(next_buffer, buf + next_cmd, pdu_length - next_cmd); }
mid_entry = smb2_find_mid(server, buf); @@ -4951,8 +4953,8 @@ one_more: else ret = cifs_handle_standard(server, mid_entry);
- if (ret == 0 && shdr->NextCommand) { - pdu_length -= le32_to_cpu(shdr->NextCommand); + if (ret == 0 && next_cmd) { + pdu_length -= next_cmd; server->large_buf = next_is_large; if (next_is_large) server->bigbuf = buf = next_buffer;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paulo Alcantara pc@manguebit.com
[ Upstream commit af1689a9b7701d9907dfc84d2a4b57c4bc907144 ]
Validate offsets and lengths before dereferencing create contexts in smb2_parse_contexts().
This fixes following oops when accessing invalid create contexts from server:
BUG: unable to handle page fault for address: ffff8881178d8cc3 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 4a01067 P4D 4a01067 PUD 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 3 PID: 1736 Comm: mount.cifs Not tainted 6.7.0-rc4 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 RIP: 0010:smb2_parse_contexts+0xa0/0x3a0 [cifs] Code: f8 10 75 13 48 b8 93 ad 25 50 9c b4 11 e7 49 39 06 0f 84 d2 00 00 00 8b 45 00 85 c0 74 61 41 29 c5 48 01 c5 41 83 fd 0f 76 55 <0f> b7 7d 04 0f b7 45 06 4c 8d 74 3d 00 66 83 f8 04 75 bc ba 04 00 RSP: 0018:ffffc900007939e0 EFLAGS: 00010216 RAX: ffffc90000793c78 RBX: ffff8880180cc000 RCX: ffffc90000793c90 RDX: ffffc90000793cc0 RSI: ffff8880178d8cc0 RDI: ffff8880180cc000 RBP: ffff8881178d8cbf R08: ffffc90000793c22 R09: 0000000000000000 R10: ffff8880180cc000 R11: 0000000000000024 R12: 0000000000000000 R13: 0000000000000020 R14: 0000000000000000 R15: ffffc90000793c22 FS: 00007f873753cbc0(0000) GS:ffff88806bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff8881178d8cc3 CR3: 00000000181ca000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x23/0x70 ? page_fault_oops+0x181/0x480 ? search_module_extables+0x19/0x60 ? srso_alias_return_thunk+0x5/0xfbef5 ? exc_page_fault+0x1b6/0x1c0 ? asm_exc_page_fault+0x26/0x30 ? smb2_parse_contexts+0xa0/0x3a0 [cifs] SMB2_open+0x38d/0x5f0 [cifs] ? smb2_is_path_accessible+0x138/0x260 [cifs] smb2_is_path_accessible+0x138/0x260 [cifs] cifs_is_path_remote+0x8d/0x230 [cifs] cifs_mount+0x7e/0x350 [cifs] cifs_smb3_do_mount+0x128/0x780 [cifs] smb3_get_tree+0xd9/0x290 [cifs] vfs_get_tree+0x2c/0x100 ? capable+0x37/0x70 path_mount+0x2d7/0xb80 ? srso_alias_return_thunk+0x5/0xfbef5 ? _raw_spin_unlock_irqrestore+0x44/0x60 __x64_sys_mount+0x11a/0x150 do_syscall_64+0x47/0xf0 entry_SYSCALL_64_after_hwframe+0x6f/0x77 RIP: 0033:0x7f8737657b1e
Reported-by: Robert Morris rtm@csail.mit.edu Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (SUSE) pc@manguebit.com Signed-off-by: Steve French stfrench@microsoft.com [Guru: Removed changes to cached_dir.c and checking return value of smb2_parse_contexts in smb2ops.c] Signed-off-by: Guruswamy Basavaiah guruswamy.basavaiah@broadcom.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/cifs/smb2ops.c | 4 +- fs/cifs/smb2pdu.c | 91 +++++++++++++++++++++++++++++++--------------------- fs/cifs/smb2proto.h | 12 ++++-- 3 files changed, 65 insertions(+), 42 deletions(-)
--- a/fs/cifs/smb2ops.c +++ b/fs/cifs/smb2ops.c @@ -818,10 +818,12 @@ int open_shroot(unsigned int xid, struct if (o_rsp->OplockLevel == SMB2_OPLOCK_LEVEL_LEASE) { kref_get(&tcon->crfid.refcount); tcon->crfid.has_lease = true; - smb2_parse_contexts(server, o_rsp, + rc = smb2_parse_contexts(server, rsp_iov, &oparms.fid->epoch, oparms.fid->lease_key, &oplock, NULL, NULL); + if (rc) + goto oshr_exit; } else goto oshr_exit;
--- a/fs/cifs/smb2pdu.c +++ b/fs/cifs/smb2pdu.c @@ -1991,17 +1991,18 @@ parse_posix_ctxt(struct create_context * posix->nlink, posix->mode, posix->reparse_tag); }
-void -smb2_parse_contexts(struct TCP_Server_Info *server, - struct smb2_create_rsp *rsp, - unsigned int *epoch, char *lease_key, __u8 *oplock, - struct smb2_file_all_info *buf, - struct create_posix_rsp *posix) +int smb2_parse_contexts(struct TCP_Server_Info *server, + struct kvec *rsp_iov, + unsigned int *epoch, + char *lease_key, __u8 *oplock, + struct smb2_file_all_info *buf, + struct create_posix_rsp *posix) { - char *data_offset; + struct smb2_create_rsp *rsp = rsp_iov->iov_base; struct create_context *cc; - unsigned int next; - unsigned int remaining; + size_t rem, off, len; + size_t doff, dlen; + size_t noff, nlen; char *name; static const char smb3_create_tag_posix[] = { 0x93, 0xAD, 0x25, 0x50, 0x9C, @@ -2010,45 +2011,63 @@ smb2_parse_contexts(struct TCP_Server_In };
*oplock = 0; - data_offset = (char *)rsp + le32_to_cpu(rsp->CreateContextsOffset); - remaining = le32_to_cpu(rsp->CreateContextsLength); - cc = (struct create_context *)data_offset; + + off = le32_to_cpu(rsp->CreateContextsOffset); + rem = le32_to_cpu(rsp->CreateContextsLength); + if (check_add_overflow(off, rem, &len) || len > rsp_iov->iov_len) + return -EINVAL; + cc = (struct create_context *)((u8 *)rsp + off);
/* Initialize inode number to 0 in case no valid data in qfid context */ if (buf) buf->IndexNumber = 0;
- while (remaining >= sizeof(struct create_context)) { - name = le16_to_cpu(cc->NameOffset) + (char *)cc; - if (le16_to_cpu(cc->NameLength) == 4 && - strncmp(name, SMB2_CREATE_REQUEST_LEASE, 4) == 0) - *oplock = server->ops->parse_lease_buf(cc, epoch, - lease_key); - else if (buf && (le16_to_cpu(cc->NameLength) == 4) && - strncmp(name, SMB2_CREATE_QUERY_ON_DISK_ID, 4) == 0) - parse_query_id_ctxt(cc, buf); - else if ((le16_to_cpu(cc->NameLength) == 16)) { - if (posix && - memcmp(name, smb3_create_tag_posix, 16) == 0) + while (rem >= sizeof(*cc)) { + doff = le16_to_cpu(cc->DataOffset); + dlen = le32_to_cpu(cc->DataLength); + if (check_add_overflow(doff, dlen, &len) || len > rem) + return -EINVAL; + + noff = le16_to_cpu(cc->NameOffset); + nlen = le16_to_cpu(cc->NameLength); + if (noff + nlen >= doff) + return -EINVAL; + + name = (char *)cc + noff; + switch (nlen) { + case 4: + if (!strncmp(name, SMB2_CREATE_REQUEST_LEASE, 4)) { + *oplock = server->ops->parse_lease_buf(cc, epoch, + lease_key); + } else if (buf && + !strncmp(name, SMB2_CREATE_QUERY_ON_DISK_ID, 4)) { + parse_query_id_ctxt(cc, buf); + } + break; + case 16: + if (posix && !memcmp(name, smb3_create_tag_posix, 16)) parse_posix_ctxt(cc, buf, posix); + break; + default: + cifs_dbg(FYI, "%s: unhandled context (nlen=%zu dlen=%zu)\n", + __func__, nlen, dlen); + if (IS_ENABLED(CONFIG_CIFS_DEBUG2)) + cifs_dump_mem("context data: ", cc, dlen); + break; } - /* else { - cifs_dbg(FYI, "Context not matched with len %d\n", - le16_to_cpu(cc->NameLength)); - cifs_dump_mem("Cctxt name: ", name, 4); - } */
- next = le32_to_cpu(cc->Next); - if (!next) + off = le32_to_cpu(cc->Next); + if (!off) break; - remaining -= next; - cc = (struct create_context *)((char *)cc + next); + if (check_sub_overflow(rem, off, &rem)) + return -EINVAL; + cc = (struct create_context *)((u8 *)cc + off); }
if (rsp->OplockLevel != SMB2_OPLOCK_LEVEL_LEASE) *oplock = rsp->OplockLevel;
- return; + return 0; }
static int @@ -2915,8 +2934,8 @@ SMB2_open(const unsigned int xid, struct }
- smb2_parse_contexts(server, rsp, &oparms->fid->epoch, - oparms->fid->lease_key, oplock, buf, posix); + rc = smb2_parse_contexts(server, &rsp_iov, &oparms->fid->epoch, + oparms->fid->lease_key, oplock, buf, posix); creat_exit: SMB2_open_free(&rqst); free_rsp_buf(resp_buftype, rsp); --- a/fs/cifs/smb2proto.h +++ b/fs/cifs/smb2proto.h @@ -270,11 +270,13 @@ extern int smb3_validate_negotiate(const
extern enum securityEnum smb2_select_sectype(struct TCP_Server_Info *, enum securityEnum); -extern void smb2_parse_contexts(struct TCP_Server_Info *server, - struct smb2_create_rsp *rsp, - unsigned int *epoch, char *lease_key, - __u8 *oplock, struct smb2_file_all_info *buf, - struct create_posix_rsp *posix); +int smb2_parse_contexts(struct TCP_Server_Info *server, + struct kvec *rsp_iov, + unsigned int *epoch, + char *lease_key, __u8 *oplock, + struct smb2_file_all_info *buf, + struct create_posix_rsp *posix); + extern int smb3_encryption_required(const struct cifs_tcon *tcon); extern int smb2_validate_iov(unsigned int offset, unsigned int buffer_length, struct kvec *iov, unsigned int min_buf_size);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paulo Alcantara pc@manguebit.com
[ Upstream commit 76025cc2285d9ede3d717fe4305d66f8be2d9346 ]
The data offset for the SMB3.1.1 POSIX create context will always be 8-byte aligned so having the check 'noff + nlen >= doff' in smb2_parse_contexts() is wrong as it will lead to -EINVAL because noff + nlen == doff.
Fix the sanity check to correctly handle aligned create context data.
Fixes: af1689a9b770 ("smb: client: fix potential OOBs in smb2_parse_contexts()") Signed-off-by: Paulo Alcantara pc@manguebit.com Signed-off-by: Steve French stfrench@microsoft.com [Guru:smb2_parse_contexts() is present in file smb2ops.c, smb2ops.c file location is changed, modified patch accordingly.] Signed-off-by: Guruswamy Basavaiah guruswamy.basavaiah@broadcom.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/cifs/smb2pdu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/cifs/smb2pdu.c +++ b/fs/cifs/smb2pdu.c @@ -2030,7 +2030,7 @@ int smb2_parse_contexts(struct TCP_Serve
noff = le16_to_cpu(cc->NameOffset); nlen = le16_to_cpu(cc->NameLength); - if (noff + nlen >= doff) + if (noff + nlen > doff) return -EINVAL;
name = (char *)cc + noff;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cyril Hrubis chrubis@suse.cz
commit c1fc6484e1fb7cc2481d169bfef129a1b0676abe upstream.
The sched_rr_timeslice can be reset to default by writing value that is <= 0. However after reading from this file we always got the last value written, which is not useful at all.
$ echo -1 > /proc/sys/kernel/sched_rr_timeslice_ms $ cat /proc/sys/kernel/sched_rr_timeslice_ms -1
Fix this by setting the variable that holds the sysctl file value to the jiffies_to_msecs(RR_TIMESLICE) in case that <= 0 value was written.
Signed-off-by: Cyril Hrubis chrubis@suse.cz Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Petr Vorel pvorel@suse.cz Acked-by: Mel Gorman mgorman@suse.de Tested-by: Petr Vorel pvorel@suse.cz Cc: Mahmoud Adam mngyadam@amazon.com Link: https://lore.kernel.org/r/20230802151906.25258-3-chrubis@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/sched/rt.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -2804,6 +2804,9 @@ int sched_rr_handler(struct ctl_table *t sched_rr_timeslice = sysctl_sched_rr_timeslice <= 0 ? RR_TIMESLICE : msecs_to_jiffies(sysctl_sched_rr_timeslice); + + if (sysctl_sched_rr_timeslice <= 0) + sysctl_sched_rr_timeslice = jiffies_to_msecs(RR_TIMESLICE); } mutex_unlock(&mutex);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lokesh Gidra lokeshgidra@google.com
commit 67695f18d55924b2013534ef3bdc363bc9e14605 upstream.
In mfill_atomic_hugetlb(), mmap_changing isn't being checked again if we drop mmap_lock and reacquire it. When the lock is not held, mmap_changing could have been incremented. This is also inconsistent with the behavior in mfill_atomic().
Link: https://lkml.kernel.org/r/20240117223729.1444522-1-lokeshgidra@google.com Fixes: df2cc96e77011 ("userfaultfd: prevent non-cooperative events vs mcopy_atomic races") Signed-off-by: Lokesh Gidra lokeshgidra@google.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: Mike Rapoport rppt@kernel.org Cc: Axel Rasmussen axelrasmussen@google.com Cc: Brian Geffon bgeffon@google.com Cc: David Hildenbrand david@redhat.com Cc: Jann Horn jannh@google.com Cc: Kalesh Singh kaleshsingh@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Nicolas Geoffray ngeoffray@google.com Cc: Peter Xu peterx@redhat.com Cc: Suren Baghdasaryan surenb@google.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Mike Rapoport (IBM) rppt@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/userfaultfd.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-)
--- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -209,6 +209,7 @@ static __always_inline ssize_t __mcopy_a unsigned long dst_start, unsigned long src_start, unsigned long len, + bool *mmap_changing, bool zeropage) { int vm_alloc_shared = dst_vma->vm_flags & VM_SHARED; @@ -329,6 +330,15 @@ retry: goto out; } mmap_read_lock(dst_mm); + /* + * If memory mappings are changing because of non-cooperative + * operation (e.g. mremap) running in parallel, bail out and + * request the user to retry later + */ + if (mmap_changing && READ_ONCE(*mmap_changing)) { + err = -EAGAIN; + break; + }
dst_vma = NULL; goto retry; @@ -410,6 +420,7 @@ extern ssize_t __mcopy_atomic_hugetlb(st unsigned long dst_start, unsigned long src_start, unsigned long len, + bool *mmap_changing, bool zeropage); #endif /* CONFIG_HUGETLB_PAGE */
@@ -529,7 +540,8 @@ retry: */ if (is_vm_hugetlb_page(dst_vma)) return __mcopy_atomic_hugetlb(dst_mm, dst_vma, dst_start, - src_start, len, zeropage); + src_start, len, mmap_changing, + zeropage);
if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma)) goto out_unlock;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Damien Le Moal dlemoal@kernel.org
commit 14db5f64a971fce3d8ea35de4dfc7f443a3efb92 upstream.
Write error handling is racy and can sometime lead to the error recovery path wrongly changing the inode size of a sequential zone file to an incorrect value which results in garbage data being readable at the end of a file. There are 2 problems:
1) zonefs_file_dio_write() updates a zone file write pointer offset after issuing a direct IO with iomap_dio_rw(). This update is done only if the IO succeed for synchronous direct writes. However, for asynchronous direct writes, the update is done without waiting for the IO completion so that the next asynchronous IO can be immediately issued. However, if an asynchronous IO completes with a failure right before the i_truncate_mutex lock protecting the update, the update may change the value of the inode write pointer offset that was corrected by the error path (zonefs_io_error() function).
2) zonefs_io_error() is called when a read or write error occurs. This function executes a report zone operation using the callback function zonefs_io_error_cb(), which does all the error recovery handling based on the current zone condition, write pointer position and according to the mount options being used. However, depending on the zoned device being used, a report zone callback may be executed in a context that is different from the context of __zonefs_io_error(). As a result, zonefs_io_error_cb() may be executed without the inode truncate mutex lock held, which can lead to invalid error processing.
Fix both problems as follows: - Problem 1: Perform the inode write pointer offset update before a direct write is issued with iomap_dio_rw(). This is safe to do as partial direct writes are not supported (IOMAP_DIO_PARTIAL is not set) and any failed IO will trigger the execution of zonefs_io_error() which will correct the inode write pointer offset to reflect the current state of the one on the device. - Problem 2: Change zonefs_io_error_cb() into zonefs_handle_io_error() and call this function directly from __zonefs_io_error() after obtaining the zone information using blkdev_report_zones() with a simple callback function that copies to a local stack variable the struct blk_zone obtained from the device. This ensures that error handling is performed holding the inode truncate mutex. This change also simplifies error handling for conventional zone files by bypassing the execution of report zones entirely. This is safe to do because the condition of conventional zones cannot be read-only or offline and conventional zone files are always fully mapped with a constant file size.
Reported-by: Shin'ichiro Kawasaki shinichiro.kawasaki@wdc.com Fixes: 8dcc1a9d90c1 ("fs: New zonefs file system") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal dlemoal@kernel.org Tested-by: Shin'ichiro Kawasaki shinichiro.kawasaki@wdc.com Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Reviewed-by: Himanshu Madhani himanshu.madhani@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/zonefs/super.c | 70 ++++++++++++++++++++++++++++++------------------------ 1 file changed, 40 insertions(+), 30 deletions(-)
--- a/fs/zonefs/super.c +++ b/fs/zonefs/super.c @@ -319,16 +319,18 @@ static loff_t zonefs_check_zone_conditio } }
-struct zonefs_ioerr_data { - struct inode *inode; - bool write; -}; - static int zonefs_io_error_cb(struct blk_zone *zone, unsigned int idx, void *data) { - struct zonefs_ioerr_data *err = data; - struct inode *inode = err->inode; + struct blk_zone *z = data; + + *z = *zone; + return 0; +} + +static void zonefs_handle_io_error(struct inode *inode, struct blk_zone *zone, + bool write) +{ struct zonefs_inode_info *zi = ZONEFS_I(inode); struct super_block *sb = inode->i_sb; struct zonefs_sb_info *sbi = ZONEFS_SB(sb); @@ -344,8 +346,8 @@ static int zonefs_io_error_cb(struct blk isize = i_size_read(inode); if (zone->cond != BLK_ZONE_COND_OFFLINE && zone->cond != BLK_ZONE_COND_READONLY && - !err->write && isize == data_size) - return 0; + !write && isize == data_size) + return;
/* * At this point, we detected either a bad zone or an inconsistency @@ -366,8 +368,9 @@ static int zonefs_io_error_cb(struct blk * In all cases, warn about inode size inconsistency and handle the * IO error according to the zone condition and to the mount options. */ - if (zi->i_ztype == ZONEFS_ZTYPE_SEQ && isize != data_size) - zonefs_warn(sb, "inode %lu: invalid size %lld (should be %lld)\n", + if (isize != data_size) + zonefs_warn(sb, + "inode %lu: invalid size %lld (should be %lld)\n", inode->i_ino, isize, data_size);
/* @@ -427,8 +430,6 @@ static int zonefs_io_error_cb(struct blk zonefs_update_stats(inode, data_size); zonefs_i_size_write(inode, data_size); zi->i_wpoffset = data_size; - - return 0; }
/* @@ -442,23 +443,25 @@ static void __zonefs_io_error(struct ino { struct zonefs_inode_info *zi = ZONEFS_I(inode); struct super_block *sb = inode->i_sb; - struct zonefs_sb_info *sbi = ZONEFS_SB(sb); unsigned int noio_flag; - unsigned int nr_zones = 1; - struct zonefs_ioerr_data err = { - .inode = inode, - .write = write, - }; + struct blk_zone zone; int ret;
/* - * The only files that have more than one zone are conventional zone - * files with aggregated conventional zones, for which the inode zone - * size is always larger than the device zone size. - */ - if (zi->i_zone_size > bdev_zone_sectors(sb->s_bdev)) - nr_zones = zi->i_zone_size >> - (sbi->s_zone_sectors_shift + SECTOR_SHIFT); + * Conventional zone have no write pointer and cannot become read-only + * or offline. So simply fake a report for a single or aggregated zone + * and let zonefs_handle_io_error() correct the zone inode information + * according to the mount options. + */ + if (zi->i_ztype != ZONEFS_ZTYPE_SEQ) { + zone.start = zi->i_zsector; + zone.len = zi->i_max_size >> SECTOR_SHIFT; + zone.wp = zone.start + zone.len; + zone.type = BLK_ZONE_TYPE_CONVENTIONAL; + zone.cond = BLK_ZONE_COND_NOT_WP; + zone.capacity = zone.len; + goto handle_io_error; + }
/* * Memory allocations in blkdev_report_zones() can trigger a memory @@ -469,12 +472,19 @@ static void __zonefs_io_error(struct ino * the GFP_NOIO context avoids both problems. */ noio_flag = memalloc_noio_save(); - ret = blkdev_report_zones(sb->s_bdev, zi->i_zsector, nr_zones, - zonefs_io_error_cb, &err); - if (ret != nr_zones) + ret = blkdev_report_zones(sb->s_bdev, zi->i_zsector, 1, + zonefs_io_error_cb, &zone); + memalloc_noio_restore(noio_flag); + if (ret != 1) { zonefs_err(sb, "Get inode %lu zone information failed %d\n", inode->i_ino, ret); - memalloc_noio_restore(noio_flag); + zonefs_warn(sb, "remounting filesystem read-only\n"); + sb->s_flags |= SB_RDONLY; + return; + } + +handle_io_error: + zonefs_handle_io_error(inode, &zone, write); }
static void zonefs_io_error(struct inode *inode, bool write)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cyril Hrubis chrubis@suse.cz
commit c7fcb99877f9f542c918509b2801065adcaf46fa upstream.
There is a 10% rounding error in the intial value of the sysctl_sched_rr_timeslice with CONFIG_HZ_300=y.
This was found with LTP test sched_rr_get_interval01:
sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90 sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90
What this test does is to compare the return value from the sched_rr_get_interval() and the sched_rr_timeslice_ms sysctl file and fails if they do not match.
The problem it found is the intial sysctl file value which was computed as:
static int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE;
which works fine as long as MSEC_PER_SEC is multiple of HZ, however it introduces 10% rounding error for CONFIG_HZ_300:
(MSEC_PER_SEC / HZ) * (100 * HZ / 1000)
(1000 / 300) * (100 * 300 / 1000)
3 * 30 = 90
This can be easily fixed by reversing the order of the multiplication and division. After this fix we get:
(MSEC_PER_SEC * (100 * HZ / 1000)) / HZ
(1000 * (100 * 300 / 1000)) / 300
(1000 * 30) / 300 = 100
Fixes: 975e155ed873 ("sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in milliseconds") Signed-off-by: Cyril Hrubis chrubis@suse.cz Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Petr Vorel pvorel@suse.cz Acked-by: Mel Gorman mgorman@suse.de Tested-by: Petr Vorel pvorel@suse.cz Link: https://lore.kernel.org/r/20230802151906.25258-2-chrubis@suse.cz [ pvorel: rebased for 5.15, 5.10 ] Signed-off-by: Petr Vorel pvorel@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/sched/rt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -8,7 +8,7 @@ #include "pelt.h"
int sched_rr_timeslice = RR_TIMESLICE; -int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE; +int sysctl_sched_rr_timeslice = (MSEC_PER_SEC * RR_TIMESLICE) / HZ; /* More than 4 hours if BW_SHIFT equals 20. */ static const u64 max_rt_runtime = MAX_BW;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cyril Hrubis chrubis@suse.cz
commit 079be8fc630943d9fc70a97807feb73d169ee3fc upstream.
The validation of the value written to sched_rt_period_us was broken because:
- the sysclt_sched_rt_period is declared as unsigned int - parsed by proc_do_intvec() - the range is asserted after the value parsed by proc_do_intvec()
Because of this negative values written to the file were written into a unsigned integer that were later on interpreted as large positive integers which did passed the check:
if (sysclt_sched_rt_period <= 0) return EINVAL;
This commit fixes the parsing by setting explicit range for both perid_us and runtime_us into the sched_rt_sysctls table and processes the values with proc_dointvec_minmax() instead.
Alternatively if we wanted to use full range of unsigned int for the period value we would have to split the proc_handler and use proc_douintvec() for it however even the Documentation/scheduller/sched-rt-group.rst describes the range as 1 to INT_MAX.
As far as I can tell the only problem this causes is that the sysctl file allows writing negative values which when read back may confuse userspace.
There is also a LTP test being submitted for these sysctl files at:
http://patchwork.ozlabs.org/project/ltp/patch/20230901144433.2526-1-chrubis@...
Signed-off-by: Cyril Hrubis chrubis@suse.cz Signed-off-by: Ingo Molnar mingo@kernel.org Link: https://lore.kernel.org/r/20231002115553.3007-2-chrubis@suse.cz [ pvorel: rebased for 5.15, 5.10 ] Reviewed-by: Petr Vorel pvorel@suse.cz Signed-off-by: Petr Vorel pvorel@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/sched/rt.c | 5 +---- kernel/sysctl.c | 4 ++++ 2 files changed, 5 insertions(+), 4 deletions(-)
--- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -2727,9 +2727,6 @@ static int sched_rt_global_constraints(v
static int sched_rt_global_validate(void) { - if (sysctl_sched_rt_period <= 0) - return -EINVAL; - if ((sysctl_sched_rt_runtime != RUNTIME_INF) && ((sysctl_sched_rt_runtime > sysctl_sched_rt_period) || ((u64)sysctl_sched_rt_runtime * @@ -2760,7 +2757,7 @@ int sched_rt_handler(struct ctl_table *t old_period = sysctl_sched_rt_period; old_runtime = sysctl_sched_rt_runtime;
- ret = proc_dointvec(table, write, buffer, lenp, ppos); + ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
if (!ret && write) { ret = sched_rt_global_validate(); --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -1859,6 +1859,8 @@ static struct ctl_table kern_table[] = { .maxlen = sizeof(unsigned int), .mode = 0644, .proc_handler = sched_rt_handler, + .extra1 = SYSCTL_ONE, + .extra2 = SYSCTL_INT_MAX, }, { .procname = "sched_rt_runtime_us", @@ -1866,6 +1868,8 @@ static struct ctl_table kern_table[] = { .maxlen = sizeof(int), .mode = 0644, .proc_handler = sched_rt_handler, + .extra1 = SYSCTL_NEG_ONE, + .extra2 = SYSCTL_INT_MAX, }, { .procname = "sched_deadline_period_max_us",
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Bogdanov d.bogdanov@yadro.com
[ Upstream commit 83ab68168a3d990d5ff39ab030ad5754cbbccb25 ]
An abort that is responded to by iSCSI itself is added to tmr_list but does not go to target core. A LUN_RESET that goes through tmr_list takes a refcounter on the abort and waits for completion. However, the abort will be never complete because it was not started in target core.
Unable to locate ITT: 0x05000000 on CID: 0 Unable to locate RefTaskTag: 0x05000000 on CID: 0. wait_for_tasks: Stopping tmf LUN_RESET with tag 0x0 ref_task_tag 0x0 i_state 34 t_state ISTATE_PROCESSING refcnt 2 transport_state active,stop,fabric_stop wait for tasks: tmf LUN_RESET with tag 0x0 ref_task_tag 0x0 i_state 34 t_state ISTATE_PROCESSING refcnt 2 transport_state active,stop,fabric_stop ... INFO: task kworker/0:2:49 blocked for more than 491 seconds. task:kworker/0:2 state:D stack: 0 pid: 49 ppid: 2 flags:0x00000800 Workqueue: events target_tmr_work [target_core_mod] Call Trace: __switch_to+0x2c4/0x470 _schedule+0x314/0x1730 schedule+0x64/0x130 schedule_timeout+0x168/0x430 wait_for_completion+0x140/0x270 target_put_cmd_and_wait+0x64/0xb0 [target_core_mod] core_tmr_lun_reset+0x30/0xa0 [target_core_mod] target_tmr_work+0xc8/0x1b0 [target_core_mod] process_one_work+0x2d4/0x5d0 worker_thread+0x78/0x6c0
To fix this, only add abort to tmr_list if it will be handled by target core.
Signed-off-by: Dmitry Bogdanov d.bogdanov@yadro.com Link: https://lore.kernel.org/r/20240111125941.8688-1-d.bogdanov@yadro.com Reviewed-by: Mike Christie michael.christie@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/target/target_core_device.c | 5 ----- drivers/target/target_core_transport.c | 4 ++++ 2 files changed, 4 insertions(+), 5 deletions(-)
diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c index 9aeedcff7d02e..daa4d06ce2336 100644 --- a/drivers/target/target_core_device.c +++ b/drivers/target/target_core_device.c @@ -150,7 +150,6 @@ int transport_lookup_tmr_lun(struct se_cmd *se_cmd) struct se_session *se_sess = se_cmd->se_sess; struct se_node_acl *nacl = se_sess->se_node_acl; struct se_tmr_req *se_tmr = se_cmd->se_tmr_req; - unsigned long flags;
rcu_read_lock(); deve = target_nacl_find_deve(nacl, se_cmd->orig_fe_lun); @@ -181,10 +180,6 @@ int transport_lookup_tmr_lun(struct se_cmd *se_cmd) se_cmd->se_dev = rcu_dereference_raw(se_lun->lun_se_dev); se_tmr->tmr_dev = rcu_dereference_raw(se_lun->lun_se_dev);
- spin_lock_irqsave(&se_tmr->tmr_dev->se_tmr_lock, flags); - list_add_tail(&se_tmr->tmr_list, &se_tmr->tmr_dev->dev_tmr_list); - spin_unlock_irqrestore(&se_tmr->tmr_dev->se_tmr_lock, flags); - return 0; } EXPORT_SYMBOL(transport_lookup_tmr_lun); diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index 2e97937f005ff..8d294b658592c 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -3436,6 +3436,10 @@ int transport_generic_handle_tmr( unsigned long flags; bool aborted = false;
+ spin_lock_irqsave(&cmd->se_dev->se_tmr_lock, flags); + list_add_tail(&cmd->se_tmr_req->tmr_list, &cmd->se_dev->dev_tmr_list); + spin_unlock_irqrestore(&cmd->se_dev->se_tmr_lock, flags); + spin_lock_irqsave(&cmd->t_state_lock, flags); if (cmd->transport_state & CMD_T_ABORTED) { aborted = true;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vinod Koul vkoul@kernel.org
[ Upstream commit 404290240827c3bb5c4e195174a8854eef2f89ac ]
We seem to have hit warnings of 'output may be truncated' which is fixed by increasing the size of 'dev_id'
drivers/dma/sh/shdmac.c: In function ‘sh_dmae_probe’: drivers/dma/sh/shdmac.c:541:34: error: ‘%d’ directive output may be truncated writing between 1 and 10 bytes into a region of size 9 [-Werror=format-truncation=] 541 | "sh-dmae%d.%d", pdev->id, id); | ^~ In function ‘sh_dmae_chan_probe’, inlined from ‘sh_dmae_probe’ at drivers/dma/sh/shdmac.c:845:9: drivers/dma/sh/shdmac.c:541:26: note: directive argument in the range [0, 2147483647] 541 | "sh-dmae%d.%d", pdev->id, id); | ^~~~~~~~~~~~~~ drivers/dma/sh/shdmac.c:541:26: note: directive argument in the range [0, 19] drivers/dma/sh/shdmac.c:540:17: note: ‘snprintf’ output between 11 and 21 bytes into a destination of size 16 540 | snprintf(sh_chan->dev_id, sizeof(sh_chan->dev_id), | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 541 | "sh-dmae%d.%d", pdev->id, id); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/sh/shdma.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/dma/sh/shdma.h b/drivers/dma/sh/shdma.h index 9c121a4b33ad8..f97d80343aea4 100644 --- a/drivers/dma/sh/shdma.h +++ b/drivers/dma/sh/shdma.h @@ -25,7 +25,7 @@ struct sh_dmae_chan { const struct sh_dmae_slave_config *config; /* Slave DMA configuration */ int xmit_shift; /* log_2(bytes_per_xfer) */ void __iomem *base; - char dev_id[16]; /* unique name per DMAC of channel */ + char dev_id[32]; /* unique name per DMAC of channel */ int pm_error; dma_addr_t slave_addr; };
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vinod Koul vkoul@kernel.org
[ Upstream commit 6386f6c995b3ab91c72cfb76e4465553c555a8da ]
We seem to have hit warnings of 'output may be truncated' which is fixed by increasing the size of 'irq_name'
drivers/dma/fsl-qdma.c: In function ‘fsl_qdma_irq_init’: drivers/dma/fsl-qdma.c:824:46: error: ‘%d’ directive writing between 1 and 11 bytes into a region of size 10 [-Werror=format-overflow=] 824 | sprintf(irq_name, "qdma-queue%d", i); | ^~ drivers/dma/fsl-qdma.c:824:35: note: directive argument in the range [-2147483641, 2147483646] 824 | sprintf(irq_name, "qdma-queue%d", i); | ^~~~~~~~~~~~~~ drivers/dma/fsl-qdma.c:824:17: note: ‘sprintf’ output between 12 and 22 bytes into a destination of size 20 824 | sprintf(irq_name, "qdma-queue%d", i); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/fsl-qdma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/dma/fsl-qdma.c b/drivers/dma/fsl-qdma.c index 69385f32e2756..f383f219ed008 100644 --- a/drivers/dma/fsl-qdma.c +++ b/drivers/dma/fsl-qdma.c @@ -805,7 +805,7 @@ fsl_qdma_irq_init(struct platform_device *pdev, int i; int cpu; int ret; - char irq_name[20]; + char irq_name[32];
fsl_qdma->error_irq = platform_get_irq_byname(pdev, "qdma-error");
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Kazior michal@plume.com
[ Upstream commit a6e4f85d3820d00694ed10f581f4c650445dbcda ]
The nl80211_dump_interface() supports resumption in case nl80211_send_iface() doesn't have the resources to complete its work.
The logic would store the progress as iteration offsets for rdev and wdev loops.
However the logic did not properly handle resumption for non-last rdev. Assuming a system with 2 rdevs, with 2 wdevs each, this could happen:
dump(cb=[0, 0]): if_start=cb[1] (=0) send rdev0.wdev0 -> ok send rdev0.wdev1 -> yield cb[1] = 1
dump(cb=[0, 1]): if_start=cb[1] (=1) send rdev0.wdev1 -> ok // since if_start=1 the rdev0.wdev0 got skipped // through if_idx < if_start send rdev1.wdev1 -> ok
The if_start needs to be reset back to 0 upon wdev loop end.
The problem is actually hard to hit on a desktop, and even on most routers. The prerequisites for this manifesting was: - more than 1 wiphy - a few handful of interfaces - dump without rdev or wdev filter
I was seeing this with 4 wiphys 9 interfaces each. It'd miss 6 interfaces from the last wiphy reported to userspace.
Signed-off-by: Michal Kazior michal@plume.com Link: https://msgid.link/20240116142340.89678-1-kazikcz@gmail.com Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/wireless/nl80211.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index 0ac829c8f1888..279f4977e2eed 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -3595,6 +3595,7 @@ static int nl80211_dump_interface(struct sk_buff *skb, struct netlink_callback * if_idx++; }
+ if_start = 0; wp_idx++; } out:
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Felix Fietkau nbd@nbd.name
[ Upstream commit bcbc84af1183c8cf3d1ca9b78540c2185cd85e7f ]
fast-xmit must only be enabled after the sta has been uploaded to the driver, otherwise it could end up passing the not-yet-uploaded sta via drv_tx calls to the driver, leading to potential crashes because of uninitialized drv_priv data. Add a missing sta->uploaded check and re-check fast xmit after inserting a sta.
Signed-off-by: Felix Fietkau nbd@nbd.name Link: https://msgid.link/20240104181059.84032-1-nbd@nbd.name Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/mac80211/sta_info.c | 2 ++ net/mac80211/tx.c | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c index 2e84360990f0c..44bd03c6b8473 100644 --- a/net/mac80211/sta_info.c +++ b/net/mac80211/sta_info.c @@ -700,6 +700,8 @@ static int sta_info_insert_finish(struct sta_info *sta) __acquires(RCU) if (ieee80211_vif_is_mesh(&sdata->vif)) mesh_accept_plinks_update(sdata);
+ ieee80211_check_fast_xmit(sta); + return 0; out_remove: sta_info_hash_del(local, sta); diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c index 55abc06214c4d..0d6d12fc3c07e 100644 --- a/net/mac80211/tx.c +++ b/net/mac80211/tx.c @@ -2959,7 +2959,7 @@ void ieee80211_check_fast_xmit(struct sta_info *sta) sdata->vif.type == NL80211_IFTYPE_STATION) goto out;
- if (!test_sta_flag(sta, WLAN_STA_AUTHORIZED)) + if (!test_sta_flag(sta, WLAN_STA_AUTHORIZED) || !sta->uploaded) goto out;
if (test_sta_flag(sta, WLAN_STA_PS_STA) ||
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fullway Wang fullwaywang@outlook.com
[ Upstream commit 04e5eac8f3ab2ff52fa191c187a46d4fdbc1e288 ]
The userspace program could pass any values to the driver through ioctl() interface. If the driver doesn't check the value of pixclock, it may cause divide-by-zero error.
Although pixclock is checked in savagefb_decode_var(), but it is not checked properly in savagefb_probe(). Fix this by checking whether pixclock is zero in the function savagefb_check_var() before info->var.pixclock is used as the divisor.
This is similar to CVE-2022-3061 in i740fb which was fixed by commit 15cf0b8.
Signed-off-by: Fullway Wang fullwaywang@outlook.com Signed-off-by: Helge Deller deller@gmx.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/video/fbdev/savage/savagefb_driver.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/video/fbdev/savage/savagefb_driver.c b/drivers/video/fbdev/savage/savagefb_driver.c index 0ac750cc5ea13..94ebd8af50cf7 100644 --- a/drivers/video/fbdev/savage/savagefb_driver.c +++ b/drivers/video/fbdev/savage/savagefb_driver.c @@ -868,6 +868,9 @@ static int savagefb_check_var(struct fb_var_screeninfo *var,
DBG("savagefb_check_var");
+ if (!var->pixclock) + return -EINVAL; + var->transp.offset = 0; var->transp.length = 0; switch (var->bits_per_pixel) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fullway Wang fullwaywang@outlook.com
[ Upstream commit e421946be7d9bf545147bea8419ef8239cb7ca52 ]
The userspace program could pass any values to the driver through ioctl() interface. If the driver doesn't check the value of pixclock, it may cause divide-by-zero error.
In sisfb_check_var(), var->pixclock is used as a divisor to caculate drate before it is checked against zero. Fix this by checking it at the beginning.
This is similar to CVE-2022-3061 in i740fb which was fixed by commit 15cf0b8.
Signed-off-by: Fullway Wang fullwaywang@outlook.com Signed-off-by: Helge Deller deller@gmx.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/video/fbdev/sis/sis_main.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/video/fbdev/sis/sis_main.c b/drivers/video/fbdev/sis/sis_main.c index 03c736f6f3d08..e540cb0c51726 100644 --- a/drivers/video/fbdev/sis/sis_main.c +++ b/drivers/video/fbdev/sis/sis_main.c @@ -1474,6 +1474,8 @@ sisfb_check_var(struct fb_var_screeninfo *var, struct fb_info *info)
vtotal = var->upper_margin + var->lower_margin + var->vsync_len;
+ if (!var->pixclock) + return -EINVAL; pixclock = var->pixclock;
if((var->vmode & FB_VMODE_MASK) == FB_VMODE_NONINTERLACED) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Devyn Liu liudingyuan@huawei.com
[ Upstream commit de8b6e1c231a95abf95ad097b993d34b31458ec9 ]
Return IRQ_NONE from the interrupt handler when no interrupt was detected. Because an empty interrupt will cause a null pointer error:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 Call trace: complete+0x54/0x100 hisi_sfc_v3xx_isr+0x2c/0x40 [spi_hisi_sfc_v3xx] __handle_irq_event_percpu+0x64/0x1e0 handle_irq_event+0x7c/0x1cc
Signed-off-by: Devyn Liu liudingyuan@huawei.com Link: https://msgid.link/r/20240123071149.917678-1-liudingyuan@huawei.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-hisi-sfc-v3xx.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/spi/spi-hisi-sfc-v3xx.c b/drivers/spi/spi-hisi-sfc-v3xx.c index 4650b483a33d3..e0c3ad73c576d 100644 --- a/drivers/spi/spi-hisi-sfc-v3xx.c +++ b/drivers/spi/spi-hisi-sfc-v3xx.c @@ -365,6 +365,11 @@ static const struct spi_controller_mem_ops hisi_sfc_v3xx_mem_ops = { static irqreturn_t hisi_sfc_v3xx_isr(int irq, void *data) { struct hisi_sfc_v3xx_host *host = data; + u32 reg; + + reg = readl(host->regbase + HISI_SFC_V3XX_INT_STAT); + if (!reg) + return IRQ_NONE;
hisi_sfc_v3xx_disable_int(host);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Conrad Kostecki conikost@gentoo.org
[ Upstream commit 0077a504e1a4468669fd2e011108db49133db56e ]
The ASM1166 SATA host controller always reports wrongly, that it has 32 ports. But in reality, it only has six ports.
This seems to be a hardware issue, as all tested ASM1166 SATA host controllers reports such high count of ports.
Example output: ahci 0000:09:00.0: AHCI 0001.0301 32 slots 32 ports 6 Gbps 0xffffff3f impl SATA mode.
By adjusting the port_map, the count is limited to six ports.
New output: ahci 0000:09:00.0: AHCI 0001.0301 32 slots 32 ports 6 Gbps 0x3f impl SATA mode.
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=211873 Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218346 Signed-off-by: Conrad Kostecki conikost@gentoo.org Reviewed-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Niklas Cassel cassel@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/ata/ahci.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c index 4297a8d69dbf7..8bfada4843d83 100644 --- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c @@ -654,6 +654,11 @@ MODULE_PARM_DESC(mobile_lpm_policy, "Default LPM policy for mobile chipsets"); static void ahci_pci_save_initial_config(struct pci_dev *pdev, struct ahci_host_priv *hpriv) { + if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA && pdev->device == 0x1166) { + dev_info(&pdev->dev, "ASM1166 has only six ports\n"); + hpriv->saved_port_map = 0x3f; + } + if (pdev->vendor == PCI_VENDOR_ID_JMICRON && pdev->device == 0x2361) { dev_info(&pdev->dev, "JMB361 has only one port\n"); hpriv->force_port_map = 1;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lennert Buytenhek kernel@wantstofly.org
[ Upstream commit 20730e9b277873deeb6637339edcba64468f3da3 ]
With one of the on-board ASM1061 AHCI controllers (1b21:0612) on an ASUSTeK Pro WS WRX80E-SAGE SE WIFI mainboard, a controller hang was observed that was immediately preceded by the following kernel messages:
ahci 0000:28:00.0: Using 64-bit DMA addresses ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00000 flags=0x0000] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00300 flags=0x0000] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00380 flags=0x0000] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00400 flags=0x0000] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00680 flags=0x0000] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00700 flags=0x0000]
The first message is produced by code in drivers/iommu/dma-iommu.c which is accompanied by the following comment that seems to apply:
/* * Try to use all the 32-bit PCI addresses first. The original SAC vs. * DAC reasoning loses relevance with PCIe, but enough hardware and * firmware bugs are still lurking out there that it's safest not to * venture into the 64-bit space until necessary. * * If your device goes wrong after seeing the notice then likely either * its driver is not setting DMA masks accurately, the hardware has * some inherent bug in handling >32-bit addresses, or not all the * expected address bits are wired up between the device and the IOMMU. */
Asking the ASM1061 on a discrete PCIe card to DMA from I/O virtual address 0xffffffff00000000 produces the following I/O page faults:
vfio-pci 0000:07:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0021 address=0x7ff00000000 flags=0x0010] vfio-pci 0000:07:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0021 address=0x7ff00000500 flags=0x0010]
Note that the upper 21 bits of the logged DMA address are zero. (When asking a different PCIe device in the same PCIe slot to DMA to the same I/O virtual address, we do see all the upper 32 bits of the DMA address as 1, so this is not an issue with the chipset or IOMMU configuration on the test system.)
Also, hacking libahci to always set the upper 21 bits of all DMA addresses to 1 produces no discernible effect on the behavior of the ASM1061, and mkfs/mount/scrub/etc work as without this hack.
This all strongly suggests that the ASM1061 has a 43 bit DMA address limit, and this commit therefore adds a quirk to deal with this limit.
This issue probably applies to (some of) the other supported ASMedia parts as well, but we limit it to the PCI IDs known to refer to ASM1061 parts, as that's the only part we know for sure to be affected by this issue at this point.
Link: https://lore.kernel.org/linux-ide/ZaZ2PIpEId-rl6jv@wantstofly.org/ Signed-off-by: Lennert Buytenhek kernel@wantstofly.org [cassel: drop date from error messages in commit log] Signed-off-by: Niklas Cassel cassel@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/ata/ahci.c | 29 +++++++++++++++++++++++------ drivers/ata/ahci.h | 1 + 2 files changed, 24 insertions(+), 6 deletions(-)
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c index 8bfada4843d83..6f7f8e41404dc 100644 --- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c @@ -49,6 +49,7 @@ enum { enum board_ids { /* board IDs by feature in alphabetical order */ board_ahci, + board_ahci_43bit_dma, board_ahci_ign_iferr, board_ahci_low_power, board_ahci_no_debounce_delay, @@ -129,6 +130,13 @@ static const struct ata_port_info ahci_port_info[] = { .udma_mask = ATA_UDMA6, .port_ops = &ahci_ops, }, + [board_ahci_43bit_dma] = { + AHCI_HFLAGS (AHCI_HFLAG_43BIT_ONLY), + .flags = AHCI_FLAG_COMMON, + .pio_mask = ATA_PIO4, + .udma_mask = ATA_UDMA6, + .port_ops = &ahci_ops, + }, [board_ahci_ign_iferr] = { AHCI_HFLAGS (AHCI_HFLAG_IGN_IRQ_IF_ERR), .flags = AHCI_FLAG_COMMON, @@ -594,11 +602,11 @@ static const struct pci_device_id ahci_pci_tbl[] = { { PCI_VDEVICE(PROMISE, 0x3f20), board_ahci }, /* PDC42819 */ { PCI_VDEVICE(PROMISE, 0x3781), board_ahci }, /* FastTrak TX8660 ahci-mode */
- /* Asmedia */ + /* ASMedia */ { PCI_VDEVICE(ASMEDIA, 0x0601), board_ahci }, /* ASM1060 */ { PCI_VDEVICE(ASMEDIA, 0x0602), board_ahci }, /* ASM1060 */ - { PCI_VDEVICE(ASMEDIA, 0x0611), board_ahci }, /* ASM1061 */ - { PCI_VDEVICE(ASMEDIA, 0x0612), board_ahci }, /* ASM1062 */ + { PCI_VDEVICE(ASMEDIA, 0x0611), board_ahci_43bit_dma }, /* ASM1061 */ + { PCI_VDEVICE(ASMEDIA, 0x0612), board_ahci_43bit_dma }, /* ASM1061/1062 */ { PCI_VDEVICE(ASMEDIA, 0x0621), board_ahci }, /* ASM1061R */ { PCI_VDEVICE(ASMEDIA, 0x0622), board_ahci }, /* ASM1062R */
@@ -951,11 +959,20 @@ static int ahci_pci_device_resume(struct device *dev)
#endif /* CONFIG_PM */
-static int ahci_configure_dma_masks(struct pci_dev *pdev, int using_dac) +static int ahci_configure_dma_masks(struct pci_dev *pdev, + struct ahci_host_priv *hpriv) { - const int dma_bits = using_dac ? 64 : 32; + int dma_bits; int rc;
+ if (hpriv->cap & HOST_CAP_64) { + dma_bits = 64; + if (hpriv->flags & AHCI_HFLAG_43BIT_ONLY) + dma_bits = 43; + } else { + dma_bits = 32; + } + /* * If the device fixup already set the dma_mask to some non-standard * value, don't extend it here. This happens on STA2X11, for example. @@ -1933,7 +1950,7 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) ahci_gtf_filter_workaround(host);
/* initialize adapter */ - rc = ahci_configure_dma_masks(pdev, hpriv->cap & HOST_CAP_64); + rc = ahci_configure_dma_masks(pdev, hpriv); if (rc) return rc;
diff --git a/drivers/ata/ahci.h b/drivers/ata/ahci.h index 7cc6feb17e972..b8db2b0d74146 100644 --- a/drivers/ata/ahci.h +++ b/drivers/ata/ahci.h @@ -244,6 +244,7 @@ enum { AHCI_HFLAG_IGN_NOTSUPP_POWER_ON = BIT(27), /* ignore -EOPNOTSUPP from phy_power_on() */ AHCI_HFLAG_NO_SXS = BIT(28), /* SXS not supported */ + AHCI_HFLAG_43BIT_ONLY = BIT(29), /* 43bit DMA addr limit */
/* ap->flags bits */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baokun Li libaokun1@huawei.com
[ Upstream commit 4530b3660d396a646aad91a787b6ab37cf604b53 ]
Determine if the group block bitmap is corrupted before using ac_b_ex in ext4_mb_try_best_found() to avoid allocating blocks from a group with a corrupted block bitmap in the following concurrency and making the situation worse.
ext4_mb_regular_allocator ext4_lock_group(sb, group) ext4_mb_good_group // check if the group bbitmap is corrupted ext4_mb_complex_scan_group // Scan group gets ac_b_ex but doesn't use it ext4_unlock_group(sb, group) ext4_mark_group_bitmap_corrupted(group) // The block bitmap was corrupted during // the group unlock gap. ext4_mb_try_best_found ext4_lock_group(ac->ac_sb, group) ext4_mb_use_best_found mb_mark_used // Allocating blocks in block bitmap corrupted group
Signed-off-by: Baokun Li libaokun1@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20240104142040.2835097-7-libaokun1@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext4/mballoc.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 9bec75847b856..ebb79f10330a8 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -1854,6 +1854,9 @@ int ext4_mb_try_best_found(struct ext4_allocation_context *ac, return err;
ext4_lock_group(ac->ac_sb, group); + if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info))) + goto out; + max = mb_find_extent(e4b, ex.fe_start, ex.fe_len, &ex);
if (max > 0) { @@ -1861,6 +1864,7 @@ int ext4_mb_try_best_found(struct ext4_allocation_context *ac, ext4_mb_use_best_found(ac, e4b); }
+out: ext4_unlock_group(ac->ac_sb, group); ext4_mb_unload_buddy(e4b);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baokun Li libaokun1@huawei.com
[ Upstream commit 832698373a25950942c04a512daa652c18a9b513 ]
Places the logic for checking if the group's block bitmap is corrupt under the protection of the group lock to avoid allocating blocks from the group with a corrupted block bitmap.
Signed-off-by: Baokun Li libaokun1@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20240104142040.2835097-8-libaokun1@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext4/mballoc.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index ebb79f10330a8..46da6845c3072 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -1893,12 +1893,10 @@ int ext4_mb_find_by_goal(struct ext4_allocation_context *ac, if (err) return err;
- if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info))) { - ext4_mb_unload_buddy(e4b); - return 0; - } - ext4_lock_group(ac->ac_sb, group); + if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info))) + goto out; + max = mb_find_extent(e4b, ac->ac_g_ex.fe_start, ac->ac_g_ex.fe_len, &ex); ex.fe_logical = 0xDEADFA11; /* debug value */ @@ -1931,6 +1929,7 @@ int ext4_mb_find_by_goal(struct ext4_allocation_context *ac, ac->ac_b_ex = ex; ext4_mb_use_best_found(ac, e4b); } +out: ext4_unlock_group(ac->ac_sb, group); ext4_mb_unload_buddy(e4b);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kunwu Chan chentao@kylinos.cn
[ Upstream commit 6e2276203ac9ff10fc76917ec9813c660f627369 ]
devm_kasprintf() returns a pointer to dynamically allocated memory which can be NULL upon failure. Ensure the allocation was successful by checking the pointer validity.
Signed-off-by: Kunwu Chan chentao@kylinos.cn Link: https://lore.kernel.org/r/20240118031929.192192-1-chentao@kylinos.cn Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/ti/edma.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/drivers/dma/ti/edma.c b/drivers/dma/ti/edma.c index a1adc8d91fd8d..69292d4a0c441 100644 --- a/drivers/dma/ti/edma.c +++ b/drivers/dma/ti/edma.c @@ -2462,6 +2462,11 @@ static int edma_probe(struct platform_device *pdev) if (irq > 0) { irq_name = devm_kasprintf(dev, GFP_KERNEL, "%s_ccint", dev_name(dev)); + if (!irq_name) { + ret = -ENOMEM; + goto err_disable_pm; + } + ret = devm_request_irq(dev, irq, dma_irq_handler, 0, irq_name, ecc); if (ret) { @@ -2478,6 +2483,11 @@ static int edma_probe(struct platform_device *pdev) if (irq > 0) { irq_name = devm_kasprintf(dev, GFP_KERNEL, "%s_ccerrint", dev_name(dev)); + if (!irq_name) { + ret = -ENOMEM; + goto err_disable_pm; + } + ret = devm_request_irq(dev, irq, dma_ccerr_handler, 0, irq_name, ecc); if (ret) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Blumenstingl martin.blumenstingl@googlemail.com
[ Upstream commit c92688cac239794e4a1d976afa5203a4d3a2ac0e ]
Continuous regulators can be configured to operate only in a certain duty cycle range (for example from 0..91%). Add a check to error out if the duty cycle translates to an unsupported (or out of range) voltage.
Suggested-by: Uwe Kleine-König u.kleine-koenig@pengutronix.de Signed-off-by: Martin Blumenstingl martin.blumenstingl@googlemail.com Link: https://msgid.link/r/20240113224628.377993-2-martin.blumenstingl@googlemail.... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/regulator/pwm-regulator.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/regulator/pwm-regulator.c b/drivers/regulator/pwm-regulator.c index 7629476d94aeb..f4d9d9455dea6 100644 --- a/drivers/regulator/pwm-regulator.c +++ b/drivers/regulator/pwm-regulator.c @@ -158,6 +158,9 @@ static int pwm_regulator_get_voltage(struct regulator_dev *rdev) pwm_get_state(drvdata->pwm, &pstate);
voltage = pwm_get_relative_duty_cycle(&pstate, duty_unit); + if (voltage < min(max_uV_duty, min_uV_duty) || + voltage > max(max_uV_duty, min_uV_duty)) + return -ENOTRECOVERABLE;
/* * The dutycycle for min_uV might be greater than the one for max_uV.
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Guixin Liu kanie@linux.alibaba.com
[ Upstream commit 47c5dd66c1840524572dcdd956f4af2bdb6fbdff ]
The nvmet_tcp_queue_ida should be destroy when the nvmet-tcp module exit.
Signed-off-by: Guixin Liu kanie@linux.alibaba.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Chaitanya Kulkarni kch@nvidia.com Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/target/tcp.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c index 116ae6fd35e2d..d70a2fa4ba45f 100644 --- a/drivers/nvme/target/tcp.c +++ b/drivers/nvme/target/tcp.c @@ -1852,6 +1852,7 @@ static void __exit nvmet_tcp_exit(void) flush_scheduled_work();
destroy_workqueue(nvmet_tcp_wq); + ida_destroy(&nvmet_tcp_queue_ida); }
module_init(nvmet_tcp_init);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chen-Yu Tsai wens@csie.org
[ Upstream commit 0adf963b8463faa44653e22e56ce55f747e68868 ]
The SPDIF hardware block found in the H616 SoC has the same layout as the one found in the H6 SoC, except that it is missing the receiver side.
Since the driver currently only supports the transmit function, support for the H616 is identical to what is currently done for the H6.
Signed-off-by: Chen-Yu Tsai wens@csie.org Reviewed-by: Andre Przywara andre.przywara@arm.com Reviewed-by: Jernej Skrabec jernej.skrabec@gmail.com Link: https://msgid.link/r/20240127163247.384439-4-wens@kernel.org Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/sunxi/sun4i-spdif.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/sound/soc/sunxi/sun4i-spdif.c b/sound/soc/sunxi/sun4i-spdif.c index 228485fe07342..6dcad1aa25037 100644 --- a/sound/soc/sunxi/sun4i-spdif.c +++ b/sound/soc/sunxi/sun4i-spdif.c @@ -464,6 +464,11 @@ static const struct of_device_id sun4i_spdif_of_match[] = { .compatible = "allwinner,sun50i-h6-spdif", .data = &sun50i_h6_spdif_quirks, }, + { + .compatible = "allwinner,sun50i-h616-spdif", + /* Essentially the same as the H6, but without RX */ + .data = &sun50i_h6_spdif_quirks, + }, { /* sentinel */ } }; MODULE_DEVICE_TABLE(of, sun4i_spdif_of_match);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wolfram Sang wsa+renesas@sang-engineering.com
[ Upstream commit 6500ad28fd5d67d5ca0fee9da73c463090842440 ]
cppcheck rightfully warned:
drivers/spi/spi-sh-msiof.c:792:28: warning: Signed integer overflow for expression '7<<29'. [integerOverflow] sh_msiof_write(p, SIFCTR, SIFCTR_TFWM_1 | SIFCTR_RFWM_1);
Signed-off-by: Wolfram Sang wsa+renesas@sang-engineering.com Reviewed-by: Geert Uytterhoeven geert+renesas@glider.be Link: https://msgid.link/r/20240130094053.10672-1-wsa+renesas@sang-engineering.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-sh-msiof.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/spi/spi-sh-msiof.c b/drivers/spi/spi-sh-msiof.c index 35d30378256f6..12fd02f92e37b 100644 --- a/drivers/spi/spi-sh-msiof.c +++ b/drivers/spi/spi-sh-msiof.c @@ -137,14 +137,14 @@ struct sh_msiof_spi_priv {
/* SIFCTR */ #define SIFCTR_TFWM_MASK GENMASK(31, 29) /* Transmit FIFO Watermark */ -#define SIFCTR_TFWM_64 (0 << 29) /* Transfer Request when 64 empty stages */ -#define SIFCTR_TFWM_32 (1 << 29) /* Transfer Request when 32 empty stages */ -#define SIFCTR_TFWM_24 (2 << 29) /* Transfer Request when 24 empty stages */ -#define SIFCTR_TFWM_16 (3 << 29) /* Transfer Request when 16 empty stages */ -#define SIFCTR_TFWM_12 (4 << 29) /* Transfer Request when 12 empty stages */ -#define SIFCTR_TFWM_8 (5 << 29) /* Transfer Request when 8 empty stages */ -#define SIFCTR_TFWM_4 (6 << 29) /* Transfer Request when 4 empty stages */ -#define SIFCTR_TFWM_1 (7 << 29) /* Transfer Request when 1 empty stage */ +#define SIFCTR_TFWM_64 (0UL << 29) /* Transfer Request when 64 empty stages */ +#define SIFCTR_TFWM_32 (1UL << 29) /* Transfer Request when 32 empty stages */ +#define SIFCTR_TFWM_24 (2UL << 29) /* Transfer Request when 24 empty stages */ +#define SIFCTR_TFWM_16 (3UL << 29) /* Transfer Request when 16 empty stages */ +#define SIFCTR_TFWM_12 (4UL << 29) /* Transfer Request when 12 empty stages */ +#define SIFCTR_TFWM_8 (5UL << 29) /* Transfer Request when 8 empty stages */ +#define SIFCTR_TFWM_4 (6UL << 29) /* Transfer Request when 4 empty stages */ +#define SIFCTR_TFWM_1 (7UL << 29) /* Transfer Request when 1 empty stage */ #define SIFCTR_TFUA_MASK GENMASK(26, 20) /* Transmit FIFO Usable Area */ #define SIFCTR_TFUA_SHIFT 20 #define SIFCTR_TFUA(i) ((i) << SIFCTR_TFUA_SHIFT)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long lucien.xin@gmail.com
[ Upstream commit 6e348067ee4bc5905e35faa3a8fafa91c9124bc7 ]
The annotation says in sctp_new(): "If it is a shutdown ack OOTB packet, we expect a return shutdown complete, otherwise an ABORT Sec 8.4 (5) and (8)". However, it does not check SCTP_CID_SHUTDOWN_ACK before setting vtag[REPLY] in the conntrack entry(ct).
Because of that, if the ct in Router disappears for some reason in [1] with the packet sequence like below:
Client > Server: sctp (1) [INIT] [init tag: 3201533963] Server > Client: sctp (1) [INIT ACK] [init tag: 972498433] Client > Server: sctp (1) [COOKIE ECHO] Server > Client: sctp (1) [COOKIE ACK] Client > Server: sctp (1) [DATA] (B)(E) [TSN: 3075057809] Server > Client: sctp (1) [SACK] [cum ack 3075057809] Server > Client: sctp (1) [HB REQ] (the ct in Router disappears somehow) <-------- [1] Client > Server: sctp (1) [HB ACK] Client > Server: sctp (1) [DATA] (B)(E) [TSN: 3075057810] Client > Server: sctp (1) [DATA] (B)(E) [TSN: 3075057810] Client > Server: sctp (1) [HB REQ] Client > Server: sctp (1) [DATA] (B)(E) [TSN: 3075057810] Client > Server: sctp (1) [HB REQ] Client > Server: sctp (1) [ABORT]
when processing HB ACK packet in Router it calls sctp_new() to initialize the new ct with vtag[REPLY] set to HB_ACK packet's vtag.
Later when sending DATA from Client, all the SACKs from Server will get dropped in Router, as the SACK packet's vtag does not match vtag[REPLY] in the ct. The worst thing is the vtag in this ct will never get fixed by the upcoming packets from Server.
This patch fixes it by checking SCTP_CID_SHUTDOWN_ACK before setting vtag[REPLY] in the ct in sctp_new() as the annotation says. With this fix, it will leave vtag[REPLY] in ct to 0 in the case above, and the next HB REQ/ACK from Server is able to fix the vtag as its value is 0 in nf_conntrack_sctp_packet().
Signed-off-by: Xin Long lucien.xin@gmail.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_conntrack_proto_sctp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/nf_conntrack_proto_sctp.c b/net/netfilter/nf_conntrack_proto_sctp.c index e7545bcca805e..6b2a215b27862 100644 --- a/net/netfilter/nf_conntrack_proto_sctp.c +++ b/net/netfilter/nf_conntrack_proto_sctp.c @@ -299,7 +299,7 @@ sctp_new(struct nf_conn *ct, const struct sk_buff *skb, pr_debug("Setting vtag %x for secondary conntrack\n", sh->vtag); ct->proto.sctp.vtag[IP_CT_DIR_ORIGINAL] = sh->vtag; - } else { + } else if (sch->type == SCTP_CID_SHUTDOWN_ACK) { /* If it is a shutdown ack OOTB packet, we expect a return shutdown complete, otherwise an ABORT Sec 8.4 (5) and (8) */ pr_debug("Setting vtag %x for new conn OOTB\n",
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Wagner dwagner@suse.de
[ Upstream commit 70fbfc47a392b98e5f8dba70c6efc6839205c982 ]
The module exit path has race between deleting all controllers and freeing 'left over IDs'. To prevent double free a synchronization between nvme_delete_ctrl and ida_destroy has been added by the initial commit.
There is some logic around trying to prevent from hanging forever in wait_for_completion, though it does not handling all cases. E.g. blktests is able to reproduce the situation where the module unload hangs forever.
If we completely rely on the cleanup code executed from the nvme_delete_ctrl path, all IDs will be freed eventually. This makes calling ida_destroy unnecessary. We only have to ensure that all nvme_delete_ctrl code has been executed before we leave nvme_fc_exit_module. This is done by flushing the nvme_delete_wq workqueue.
While at it, remove the unused nvme_fc_wq workqueue too.
Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Hannes Reinecke hare@suse.de Signed-off-by: Daniel Wagner dwagner@suse.de Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/fc.c | 47 ++++++------------------------------------ 1 file changed, 6 insertions(+), 41 deletions(-)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c index 906cab35afe7a..8e05239073ef2 100644 --- a/drivers/nvme/host/fc.c +++ b/drivers/nvme/host/fc.c @@ -220,11 +220,6 @@ static LIST_HEAD(nvme_fc_lport_list); static DEFINE_IDA(nvme_fc_local_port_cnt); static DEFINE_IDA(nvme_fc_ctrl_cnt);
-static struct workqueue_struct *nvme_fc_wq; - -static bool nvme_fc_waiting_to_unload; -static DECLARE_COMPLETION(nvme_fc_unload_proceed); - /* * These items are short-term. They will eventually be moved into * a generic FC class. See comments in module init. @@ -254,8 +249,6 @@ nvme_fc_free_lport(struct kref *ref) /* remove from transport list */ spin_lock_irqsave(&nvme_fc_lock, flags); list_del(&lport->port_list); - if (nvme_fc_waiting_to_unload && list_empty(&nvme_fc_lport_list)) - complete(&nvme_fc_unload_proceed); spin_unlock_irqrestore(&nvme_fc_lock, flags);
ida_simple_remove(&nvme_fc_local_port_cnt, lport->localport.port_num); @@ -3823,10 +3816,6 @@ static int __init nvme_fc_init_module(void) { int ret;
- nvme_fc_wq = alloc_workqueue("nvme_fc_wq", WQ_MEM_RECLAIM, 0); - if (!nvme_fc_wq) - return -ENOMEM; - /* * NOTE: * It is expected that in the future the kernel will combine @@ -3844,7 +3833,7 @@ static int __init nvme_fc_init_module(void) ret = class_register(&fc_class); if (ret) { pr_err("couldn't register class fc\n"); - goto out_destroy_wq; + return ret; }
/* @@ -3868,8 +3857,6 @@ static int __init nvme_fc_init_module(void) device_destroy(&fc_class, MKDEV(0, 0)); out_destroy_class: class_unregister(&fc_class); -out_destroy_wq: - destroy_workqueue(nvme_fc_wq);
return ret; } @@ -3889,45 +3876,23 @@ nvme_fc_delete_controllers(struct nvme_fc_rport *rport) spin_unlock(&rport->lock); }
-static void -nvme_fc_cleanup_for_unload(void) +static void __exit nvme_fc_exit_module(void) { struct nvme_fc_lport *lport; struct nvme_fc_rport *rport; - - list_for_each_entry(lport, &nvme_fc_lport_list, port_list) { - list_for_each_entry(rport, &lport->endp_list, endp_list) { - nvme_fc_delete_controllers(rport); - } - } -} - -static void __exit nvme_fc_exit_module(void) -{ unsigned long flags; - bool need_cleanup = false;
spin_lock_irqsave(&nvme_fc_lock, flags); - nvme_fc_waiting_to_unload = true; - if (!list_empty(&nvme_fc_lport_list)) { - need_cleanup = true; - nvme_fc_cleanup_for_unload(); - } + list_for_each_entry(lport, &nvme_fc_lport_list, port_list) + list_for_each_entry(rport, &lport->endp_list, endp_list) + nvme_fc_delete_controllers(rport); spin_unlock_irqrestore(&nvme_fc_lock, flags); - if (need_cleanup) { - pr_info("%s: waiting for ctlr deletes\n", __func__); - wait_for_completion(&nvme_fc_unload_proceed); - pr_info("%s: ctrl deletes complete\n", __func__); - } + flush_workqueue(nvme_delete_wq);
nvmf_unregister_transport(&nvme_fc_transport);
- ida_destroy(&nvme_fc_local_port_cnt); - ida_destroy(&nvme_fc_ctrl_cnt); - device_destroy(&fc_class, MKDEV(0, 0)); class_unregister(&fc_class); - destroy_workqueue(nvme_fc_wq); }
module_init(nvme_fc_init_module);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Wagner dwagner@suse.de
[ Upstream commit dcfad4ab4d6733f2861cd241d8532a0004fc835a ]
The first argument of list_add_tail function is the new element which should be added to the list which is the second argument. Swap the arguments to allow processing more than one element at a time.
Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Hannes Reinecke hare@suse.de Signed-off-by: Daniel Wagner dwagner@suse.de Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/target/fcloop.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/nvme/target/fcloop.c b/drivers/nvme/target/fcloop.c index 80a208fb34f52..f2c5136bf2b82 100644 --- a/drivers/nvme/target/fcloop.c +++ b/drivers/nvme/target/fcloop.c @@ -358,7 +358,7 @@ fcloop_h2t_ls_req(struct nvme_fc_local_port *localport, if (!rport->targetport) { tls_req->status = -ECONNREFUSED; spin_lock(&rport->lock); - list_add_tail(&rport->ls_list, &tls_req->ls_list); + list_add_tail(&tls_req->ls_list, &rport->ls_list); spin_unlock(&rport->lock); schedule_work(&rport->ls_work); return ret; @@ -391,7 +391,7 @@ fcloop_h2t_xmt_ls_rsp(struct nvmet_fc_target_port *targetport, if (remoteport) { rport = remoteport->private; spin_lock(&rport->lock); - list_add_tail(&rport->ls_list, &tls_req->ls_list); + list_add_tail(&tls_req->ls_list, &rport->ls_list); spin_unlock(&rport->lock); schedule_work(&rport->ls_work); } @@ -446,7 +446,7 @@ fcloop_t2h_ls_req(struct nvmet_fc_target_port *targetport, void *hosthandle, if (!tport->remoteport) { tls_req->status = -ECONNREFUSED; spin_lock(&tport->lock); - list_add_tail(&tport->ls_list, &tls_req->ls_list); + list_add_tail(&tls_req->ls_list, &tport->ls_list); spin_unlock(&tport->lock); schedule_work(&tport->ls_work); return ret;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Wagner dwagner@suse.de
[ Upstream commit c691e6d7e13dab81ac8c7489c83b5dea972522a5 ]
In case we return early out of __nvmet_fc_finish_ls_req() we still have to release the reference on the target port.
Reviewed-by: Hannes Reinecke hare@suse.de Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Daniel Wagner dwagner@suse.de Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/target/fc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/target/fc.c b/drivers/nvme/target/fc.c index 46fc44ce86712..18a64a4fd8da9 100644 --- a/drivers/nvme/target/fc.c +++ b/drivers/nvme/target/fc.c @@ -357,7 +357,7 @@ __nvmet_fc_finish_ls_req(struct nvmet_fc_ls_req_op *lsop)
if (!lsop->req_queued) { spin_unlock_irqrestore(&tgtport->lock, flags); - return; + goto out_puttgtport; }
list_del(&lsop->lsreq_list); @@ -370,6 +370,7 @@ __nvmet_fc_finish_ls_req(struct nvmet_fc_ls_req_op *lsop) (lsreq->rqstlen + lsreq->rsplen), DMA_BIDIRECTIONAL);
+out_puttgtport: nvmet_fc_tgtport_put(tgtport); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Wagner dwagner@suse.de
[ Upstream commit 3146345c2e9c2f661527054e402b0cfad80105a4 ]
When the target port has not active port binding, there is no point in trying to process the command as it has to fail anyway. Instead adding checks to all commands abort the command early.
Reviewed-by: Hannes Reinecke hare@suse.de Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Daniel Wagner dwagner@suse.de Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/target/fc.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/target/fc.c b/drivers/nvme/target/fc.c index 18a64a4fd8da9..846fb41da6430 100644 --- a/drivers/nvme/target/fc.c +++ b/drivers/nvme/target/fc.c @@ -1102,6 +1102,9 @@ nvmet_fc_alloc_target_assoc(struct nvmet_fc_tgtport *tgtport, void *hosthandle) int idx; bool needrandom = true;
+ if (!tgtport->pe) + return NULL; + assoc = kzalloc(sizeof(*assoc), GFP_KERNEL); if (!assoc) return NULL; @@ -2529,8 +2532,9 @@ nvmet_fc_handle_fcp_rqst(struct nvmet_fc_tgtport *tgtport,
fod->req.cmd = &fod->cmdiubuf.sqe; fod->req.cqe = &fod->rspiubuf.cqe; - if (tgtport->pe) - fod->req.port = tgtport->pe->port; + if (!tgtport->pe) + goto transport_error; + fod->req.port = tgtport->pe->port;
/* clear any response payload */ memset(&fod->rspiubuf, 0, sizeof(fod->rspiubuf));
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhang Yi yi.zhang@huawei.com
[ Upstream commit 6430dea07e85958fa87d0276c0c4388dd51e630b ]
In ext4_map_blocks(), if we can't find a range of mapping in the extents cache, we are calling ext4_ext_map_blocks() to search the real path and ext4_ext_determine_hole() to determine the hole range. But if the querying range was partially or completely overlaped by a delalloc extent, we can't find it in the real extent path, so the returned hole length could be incorrect.
Fortunately, ext4_ext_put_gap_in_cache() have already handle delalloc extent, but it searches start from the expanded hole_start, doesn't start from the querying range, so the delalloc extent found could not be the one that overlaped the querying range, plus, it also didn't adjust the hole length. Let's just remove ext4_ext_put_gap_in_cache(), handle delalloc and insert adjusted hole extent in ext4_ext_determine_hole().
Signed-off-by: Zhang Yi yi.zhang@huawei.com Suggested-by: Jan Kara jack@suse.cz Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20240127015825.1608160-4-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext4/extents.c | 111 +++++++++++++++++++++++++++++----------------- 1 file changed, 70 insertions(+), 41 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 193b13630ac1e..68aa8760cb465 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -2222,7 +2222,7 @@ static int ext4_fill_es_cache_info(struct inode *inode,
/* - * ext4_ext_determine_hole - determine hole around given block + * ext4_ext_find_hole - find hole around given block according to the given path * @inode: inode we lookup in * @path: path in extent tree to @lblk * @lblk: pointer to logical block around which we want to determine hole @@ -2234,9 +2234,9 @@ static int ext4_fill_es_cache_info(struct inode *inode, * The function returns the length of a hole starting at @lblk. We update @lblk * to the beginning of the hole if we managed to find it. */ -static ext4_lblk_t ext4_ext_determine_hole(struct inode *inode, - struct ext4_ext_path *path, - ext4_lblk_t *lblk) +static ext4_lblk_t ext4_ext_find_hole(struct inode *inode, + struct ext4_ext_path *path, + ext4_lblk_t *lblk) { int depth = ext_depth(inode); struct ext4_extent *ex; @@ -2263,30 +2263,6 @@ static ext4_lblk_t ext4_ext_determine_hole(struct inode *inode, return len; }
-/* - * ext4_ext_put_gap_in_cache: - * calculate boundaries of the gap that the requested block fits into - * and cache this gap - */ -static void -ext4_ext_put_gap_in_cache(struct inode *inode, ext4_lblk_t hole_start, - ext4_lblk_t hole_len) -{ - struct extent_status es; - - ext4_es_find_extent_range(inode, &ext4_es_is_delayed, hole_start, - hole_start + hole_len - 1, &es); - if (es.es_len) { - /* There's delayed extent containing lblock? */ - if (es.es_lblk <= hole_start) - return; - hole_len = min(es.es_lblk - hole_start, hole_len); - } - ext_debug(inode, " -> %u:%u\n", hole_start, hole_len); - ext4_es_insert_extent(inode, hole_start, hole_len, ~0, - EXTENT_STATUS_HOLE); -} - /* * ext4_ext_rm_idx: * removes index from the index block. @@ -4058,6 +4034,69 @@ static int get_implied_cluster_alloc(struct super_block *sb, return 0; }
+/* + * Determine hole length around the given logical block, first try to + * locate and expand the hole from the given @path, and then adjust it + * if it's partially or completely converted to delayed extents, insert + * it into the extent cache tree if it's indeed a hole, finally return + * the length of the determined extent. + */ +static ext4_lblk_t ext4_ext_determine_insert_hole(struct inode *inode, + struct ext4_ext_path *path, + ext4_lblk_t lblk) +{ + ext4_lblk_t hole_start, len; + struct extent_status es; + + hole_start = lblk; + len = ext4_ext_find_hole(inode, path, &hole_start); +again: + ext4_es_find_extent_range(inode, &ext4_es_is_delayed, hole_start, + hole_start + len - 1, &es); + if (!es.es_len) + goto insert_hole; + + /* + * There's a delalloc extent in the hole, handle it if the delalloc + * extent is in front of, behind and straddle the queried range. + */ + if (lblk >= es.es_lblk + es.es_len) { + /* + * The delalloc extent is in front of the queried range, + * find again from the queried start block. + */ + len -= lblk - hole_start; + hole_start = lblk; + goto again; + } else if (in_range(lblk, es.es_lblk, es.es_len)) { + /* + * The delalloc extent containing lblk, it must have been + * added after ext4_map_blocks() checked the extent status + * tree, adjust the length to the delalloc extent's after + * lblk. + */ + len = es.es_lblk + es.es_len - lblk; + return len; + } else { + /* + * The delalloc extent is partially or completely behind + * the queried range, update hole length until the + * beginning of the delalloc extent. + */ + len = min(es.es_lblk - hole_start, len); + } + +insert_hole: + /* Put just found gap into cache to speed up subsequent requests */ + ext_debug(inode, " -> %u:%u\n", hole_start, len); + ext4_es_insert_extent(inode, hole_start, len, ~0, EXTENT_STATUS_HOLE); + + /* Update hole_len to reflect hole size after lblk */ + if (hole_start != lblk) + len -= lblk - hole_start; + + return len; +}
/* * Block allocation/map/preallocation routine for extents based files @@ -4175,22 +4214,12 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode, * we couldn't try to create block if create flag is zero */ if ((flags & EXT4_GET_BLOCKS_CREATE) == 0) { - ext4_lblk_t hole_start, hole_len; + ext4_lblk_t len;
- hole_start = map->m_lblk; - hole_len = ext4_ext_determine_hole(inode, path, &hole_start); - /* - * put just found gap into cache to speed up - * subsequent requests - */ - ext4_ext_put_gap_in_cache(inode, hole_start, hole_len); + len = ext4_ext_determine_insert_hole(inode, path, map->m_lblk);
- /* Update hole_len to reflect hole size after map->m_lblk */ - if (hole_start != map->m_lblk) - hole_len -= map->m_lblk - hole_start; map->m_pblk = 0; - map->m_len = min_t(unsigned int, map->m_len, hole_len); - + map->m_len = min_t(unsigned int, map->m_len, len); goto out; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Szilard Fabian szfabian@bluemarch.art
[ Upstream commit 4255447ad34c5c3785fcdcf76cfa0271d6e5ed39 ]
Another Fujitsu-related patch.
In the initial boot stage the integrated keyboard of Fujitsu Lifebook U728 refuses to work and it's not possible to type for example a dm-crypt passphrase without the help of an external keyboard.
i8042.nomux kernel parameter resolves this issue but using that a PS/2 mouse is detected. This input device is unused even when the i2c-hid-acpi kernel module is blacklisted making the integrated ELAN touchpad (04F3:3092) not working at all.
So this notebook uses a hid-over-i2c touchpad which is managed by the i2c_designware input driver. Since you can't find a PS/2 mouse port on this computer and you can't connect a PS/2 mouse to it even with an official port replicator I think it's safe to not use the PS/2 mouse port at all.
Signed-off-by: Szilard Fabian szfabian@bluemarch.art Link: https://lore.kernel.org/r/20240103014717.127307-2-szfabian@bluemarch.art Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/input/serio/i8042-acpipnpio.h | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/input/serio/i8042-acpipnpio.h b/drivers/input/serio/i8042-acpipnpio.h index cd21c92a6b2cd..6804970d8f51a 100644 --- a/drivers/input/serio/i8042-acpipnpio.h +++ b/drivers/input/serio/i8042-acpipnpio.h @@ -625,6 +625,14 @@ static const struct dmi_system_id i8042_dmi_quirk_table[] __initconst = { }, .driver_data = (void *)(SERIO_QUIRK_NOAUX) }, + { + /* Fujitsu Lifebook U728 */ + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"), + DMI_MATCH(DMI_PRODUCT_NAME, "LIFEBOOK U728"), + }, + .driver_data = (void *)(SERIO_QUIRK_NOAUX) + }, { /* Gigabyte M912 */ .matches = {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrew Bresticker abrestic@rivosinc.com
[ Upstream commit de1034b38a346ef6be25fe8792f5d1e0684d5ff4 ]
md_size will have been narrowed if we have >= 4GB worth of pages in a soft-reserved region.
Signed-off-by: Andrew Bresticker abrestic@rivosinc.com Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/efi/arm-runtime.c | 2 +- drivers/firmware/efi/riscv-runtime.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/firmware/efi/arm-runtime.c b/drivers/firmware/efi/arm-runtime.c index 3359ae2adf24b..9054c2852580d 100644 --- a/drivers/firmware/efi/arm-runtime.c +++ b/drivers/firmware/efi/arm-runtime.c @@ -107,7 +107,7 @@ static int __init arm_enable_runtime_services(void) efi_memory_desc_t *md;
for_each_efi_memory_desc(md) { - int md_size = md->num_pages << EFI_PAGE_SHIFT; + u64 md_size = md->num_pages << EFI_PAGE_SHIFT; struct resource *res;
if (!(md->attribute & EFI_MEMORY_SP)) diff --git a/drivers/firmware/efi/riscv-runtime.c b/drivers/firmware/efi/riscv-runtime.c index d28e715d2bcc8..6711e64eb0b16 100644 --- a/drivers/firmware/efi/riscv-runtime.c +++ b/drivers/firmware/efi/riscv-runtime.c @@ -85,7 +85,7 @@ static int __init riscv_enable_runtime_services(void) efi_memory_desc_t *md;
for_each_efi_memory_desc(md) { - int md_size = md->num_pages << EFI_PAGE_SHIFT; + u64 md_size = md->num_pages << EFI_PAGE_SHIFT; struct resource *res;
if (!(md->attribute & EFI_MEMORY_SP))
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrew Bresticker abrestic@rivosinc.com
[ Upstream commit 0bcff59ef7a652fcdc6d535554b63278c2406c8f ]
Adding memblocks for soft-reserved regions prevents them from later being hotplugged in by dax_kmem.
Signed-off-by: Andrew Bresticker abrestic@rivosinc.com Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/efi/efi-init.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-)
diff --git a/drivers/firmware/efi/efi-init.c b/drivers/firmware/efi/efi-init.c index f55a92ff12c0f..86da3c7a5036a 100644 --- a/drivers/firmware/efi/efi-init.c +++ b/drivers/firmware/efi/efi-init.c @@ -141,15 +141,6 @@ static __init int is_usable_memory(efi_memory_desc_t *md) case EFI_BOOT_SERVICES_DATA: case EFI_CONVENTIONAL_MEMORY: case EFI_PERSISTENT_MEMORY: - /* - * Special purpose memory is 'soft reserved', which means it - * is set aside initially, but can be hotplugged back in or - * be assigned to the dax driver after boot. - */ - if (efi_soft_reserve_enabled() && - (md->attribute & EFI_MEMORY_SP)) - return false; - /* * According to the spec, these regions are no longer reserved * after calling ExitBootServices(). However, we can only use @@ -194,6 +185,16 @@ static __init void reserve_regions(void) size = npages << PAGE_SHIFT;
if (is_memory(md)) { + /* + * Special purpose memory is 'soft reserved', which + * means it is set aside initially. Don't add a memblock + * for it now so that it can be hotplugged back in or + * be assigned to the dax driver after boot. + */ + if (efi_soft_reserve_enabled() && + (md->attribute & EFI_MEMORY_SP)) + continue; + early_init_dt_add_memory_arch(paddr, size);
if (!is_usable_memory(md))
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhang Rui rui.zhang@intel.com
[ Upstream commit 34cf8c657cf0365791cdc658ddbca9cc907726ce ]
Currently, coretemp driver supports only 128 cores per package. This loses some core temperature information on systems that have more than 128 cores per package. [ 58.685033] coretemp coretemp.0: Adding Core 128 failed [ 58.692009] coretemp coretemp.0: Adding Core 129 failed ...
Enlarge the limitation to 512 because there are platforms with more than 256 cores per package.
Signed-off-by: Zhang Rui rui.zhang@intel.com Link: https://lore.kernel.org/r/20240202092144.71180-4-rui.zhang@intel.com Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hwmon/coretemp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c index d67d972d18aa2..cbe2f874b5e2f 100644 --- a/drivers/hwmon/coretemp.c +++ b/drivers/hwmon/coretemp.c @@ -40,7 +40,7 @@ MODULE_PARM_DESC(tjmax, "TjMax value in degrees Celsius");
#define PKG_SYSFS_ATTR_NO 1 /* Sysfs attribute for package temp */ #define BASE_SYSFS_ATTR_NO 2 /* Sysfs Base attr no for coretemp */ -#define NUM_REAL_CORES 128 /* Number of Real cores per cpu */ +#define NUM_REAL_CORES 512 /* Number of Real cores per cpu */ #define CORETEMP_NAME_LENGTH 28 /* String Length of attrs */ #define MAX_CORE_ATTRS 4 /* Maximum no of basic attrs */ #define TOTAL_ATTRS (MAX_CORE_ATTRS + 1)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hannes Reinecke hare@suse.de
[ Upstream commit d6c1b19153f92e95e5e1801d540e98771053afae ]
LUNs going into "failed ready running" state observed on >1T and on even numbers of size (2T, 4T, 6T, 8T and 10T). The issue occurs when DIF is enabled at the host.
The kernel logs:
Cannot setup S/G List for HBAIO segs 1/1 SGL 512 SCSI 256: 3 0
The host lpfc driver is failing to setup scatter/gather list (protection data) for the I/Os.
The return type lpfc_bg_setup_sgl()/lpfc_bg_setup_sgl_prot() causes the compiler to remove the most significant bit. Use an unsigned type instead.
Signed-off-by: Hannes Reinecke hare@suse.de [dwagner: added commit message] Signed-off-by: Daniel Wagner dwagner@suse.de Link: https://lore.kernel.org/r/20231220162658.12392-1-dwagner@suse.de Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/lpfc/lpfc_scsi.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c index 983eeb0e3d07e..b4b87e5d8b291 100644 --- a/drivers/scsi/lpfc/lpfc_scsi.c +++ b/drivers/scsi/lpfc/lpfc_scsi.c @@ -1944,7 +1944,7 @@ lpfc_bg_setup_bpl_prot(struct lpfc_hba *phba, struct scsi_cmnd *sc, * * Returns the number of SGEs added to the SGL. **/ -static int +static uint32_t lpfc_bg_setup_sgl(struct lpfc_hba *phba, struct scsi_cmnd *sc, struct sli4_sge *sgl, int datasegcnt, struct lpfc_io_buf *lpfc_cmd) @@ -1952,8 +1952,8 @@ lpfc_bg_setup_sgl(struct lpfc_hba *phba, struct scsi_cmnd *sc, struct scatterlist *sgde = NULL; /* s/g data entry */ struct sli4_sge_diseed *diseed = NULL; dma_addr_t physaddr; - int i = 0, num_sge = 0, status; - uint32_t reftag; + int i = 0, status; + uint32_t reftag, num_sge = 0; uint8_t txop, rxop; #ifdef CONFIG_SCSI_LPFC_DEBUG_FS uint32_t rc; @@ -2124,7 +2124,7 @@ lpfc_bg_setup_sgl(struct lpfc_hba *phba, struct scsi_cmnd *sc, * * Returns the number of SGEs added to the SGL. **/ -static int +static uint32_t lpfc_bg_setup_sgl_prot(struct lpfc_hba *phba, struct scsi_cmnd *sc, struct sli4_sge *sgl, int datacnt, int protcnt, struct lpfc_io_buf *lpfc_cmd) @@ -2148,8 +2148,8 @@ lpfc_bg_setup_sgl_prot(struct lpfc_hba *phba, struct scsi_cmnd *sc, uint32_t rc; #endif uint32_t checking = 1; - uint32_t dma_offset = 0; - int num_sge = 0, j = 2; + uint32_t dma_offset = 0, num_sge = 0; + int j = 2; struct sli4_hybrid_sgl *sgl_xtra = NULL;
sgpe = scsi_prot_sglist(sc);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Sakamoto o-takashi@sakamocchi.jp
[ Upstream commit 7ed4380009e96d9e9c605e12822e987b35b05648 ]
If we are bus manager and the bus has inconsistent gap counts, send a bus reset immediately instead of trying to read the root node's config ROM first. Otherwise, we could spend a lot of time trying to read the config ROM but never succeeding.
This eliminates a 50+ second delay before the FireWire bus is usable after a newly connected device is powered on in certain circumstances.
The delay occurs if a gap count inconsistency occurs, we are not the root node, and we become bus manager. One scenario that causes this is with a TI XIO2213B OHCI, the first time a Sony DSR-25 is powered on after being connected to the FireWire cable. In this configuration, the Linux box will not receive the initial PHY configuration packet sent by the DSR-25 as IRM, resulting in the DSR-25 having a gap count of 44 while the Linux box has a gap count of 63.
FireWire devices have a gap count parameter, which is set to 63 on power-up and can be changed with a PHY configuration packet. This determines the duration of the subaction and arbitration gaps. For reliable communication, all nodes on a FireWire bus must have the same gap count.
A node may have zero or more of the following roles: root node, bus manager (BM), isochronous resource manager (IRM), and cycle master. Unless a root node was forced with a PHY configuration packet, any node might become root node after a bus reset. Only the root node can become cycle master. If the root node is not cycle master capable, the BM or IRM should force a change of root node.
After a bus reset, each node sends a self-ID packet, which contains its current gap count. A single bus reset does not change the gap count, but two bus resets in a row will set the gap count to 63. Because a consistent gap count is required for reliable communication, IEEE 1394a-2000 requires that the bus manager generate a bus reset if it detects that the gap count is inconsistent.
When the gap count is inconsistent, build_tree() will notice this after the self identification process. It will set card->gap_count to the invalid value 0. If we become bus master, this will force bm_work() to send a bus reset when it performs gap count optimization.
After a bus reset, there is no bus manager. We will almost always try to become bus manager. Once we become bus manager, we will first determine whether the root node is cycle master capable. Then, we will determine if the gap count should be changed. If either the root node or the gap count should be changed, we will generate a bus reset.
To determine if the root node is cycle master capable, we read its configuration ROM. bm_work() will wait until we have finished trying to read the configuration ROM.
However, an inconsistent gap count can make this take a long time. read_config_rom() will read the first few quadlets from the config ROM. Due to the gap count inconsistency, eventually one of the reads will time out. When read_config_rom() fails, fw_device_init() calls it again until MAX_RETRIES is reached. This takes 50+ seconds.
Once we give up trying to read the configuration ROM, bm_work() will wake up, assume that the root node is not cycle master capable, and do a bus reset. Hopefully, this will resolve the gap count inconsistency.
This change makes bm_work() check for an inconsistent gap count before waiting for the root node's configuration ROM. If the gap count is inconsistent, bm_work() will immediately do a bus reset. This eliminates the 50+ second delay and rapidly brings the bus to a working state.
I considered that if the gap count is inconsistent, a PHY configuration packet might not be successful, so it could be desirable to skip the PHY configuration packet before the bus reset in this case. However, IEEE 1394a-2000 and IEEE 1394-2008 say that the bus manager may transmit a PHY configuration packet before a bus reset when correcting a gap count error. Since the standard endorses this, I decided it's safe to retain the PHY configuration packet transmission.
Normally, after a topology change, we will reset the bus a maximum of 5 times to change the root node and perform gap count optimization. However, if there is a gap count inconsistency, we must always generate a bus reset. Otherwise the gap count inconsistency will persist and communication will be unreliable. For that reason, if there is a gap count inconstency, we generate a bus reset even if we already reached the 5 reset limit.
Signed-off-by: Adam Goldman adamg@pobox.com Reference: https://sourceforge.net/p/linux1394/mailman/message/58727806/ Signed-off-by: Takashi Sakamoto o-takashi@sakamocchi.jp Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firewire/core-card.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/drivers/firewire/core-card.c b/drivers/firewire/core-card.c index f3b3953cac834..be195ba834632 100644 --- a/drivers/firewire/core-card.c +++ b/drivers/firewire/core-card.c @@ -429,7 +429,23 @@ static void bm_work(struct work_struct *work) */ card->bm_generation = generation;
- if (root_device == NULL) { + if (card->gap_count == 0) { + /* + * If self IDs have inconsistent gap counts, do a + * bus reset ASAP. The config rom read might never + * complete, so don't wait for it. However, still + * send a PHY configuration packet prior to the + * bus reset. The PHY configuration packet might + * fail, but 1394-2008 8.4.5.2 explicitly permits + * it in this case, so it should be safe to try. + */ + new_root_id = local_id; + /* + * We must always send a bus reset if the gap count + * is inconsistent, so bypass the 5-reset limit. + */ + card->bm_retries = 0; + } else if (root_device == NULL) { /* * Either link_on is false, or we failed to read the * config rom. In either case, pick another root.
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yi Sun yi.sun@unisoc.com
[ Upstream commit 4ce6e2db00de8103a0687fb0f65fd17124a51aaa ]
Ensure no remaining requests in virtqueues before resetting vdev and deleting virtqueues. Otherwise these requests will never be completed. It may cause the system to become unresponsive.
Function blk_mq_quiesce_queue() can ensure that requests have become in_flight status, but it cannot guarantee that requests have been processed by the device. Virtqueues should never be deleted before all requests become complete status.
Function blk_mq_freeze_queue() ensure that all requests in virtqueues become complete status. And no requests can enter in virtqueues.
Signed-off-by: Yi Sun yi.sun@unisoc.com Reviewed-by: Stefan Hajnoczi stefanha@redhat.com Link: https://lore.kernel.org/r/20240129085250.1550594-1-yi.sun@unisoc.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/block/virtio_blk.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 9b54eec9b17eb..7eae3f3732336 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -952,14 +952,15 @@ static int virtblk_freeze(struct virtio_device *vdev) { struct virtio_blk *vblk = vdev->priv;
+ /* Ensure no requests in virtqueues before deleting vqs. */ + blk_mq_freeze_queue(vblk->disk->queue); + /* Ensure we don't receive any more interrupts */ vdev->config->reset(vdev);
/* Make sure no work handler is accessing the device. */ flush_work(&vblk->config_work);
- blk_mq_quiesce_queue(vblk->disk->queue); - vdev->config->del_vqs(vdev); kfree(vblk->vqs);
@@ -977,7 +978,7 @@ static int virtblk_restore(struct virtio_device *vdev)
virtio_device_ready(vdev);
- blk_mq_unquiesce_queue(vblk->disk->queue); + blk_mq_unfreeze_queue(vblk->disk->queue); return 0; } #endif
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geert Uytterhoeven geert+renesas@glider.be
[ Upstream commit f0e4a1356466ec1858ae8e5c70bea2ce5e55008b ]
The power domain containing the Cortex-R7 CPU core on the R-Car V3H SoC must always be in power-on state, unlike on other SoCs in the R-Car Gen3 family. See Table 9.4 "Power domains" in the R-Car Series, 3rd Generation Hardware User’s Manual Rev.1.00 and later.
Fix this by marking the domain as a CPU domain without control registers, so the driver will not touch it.
Fixes: 41d6d8bd8ae9 ("soc: renesas: rcar-sysc: add R8A77980 support") Signed-off-by: Geert Uytterhoeven geert+renesas@glider.be Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/fdad9a86132d53ecddf72b734dac406915c4edc0.170507673... Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/soc/renesas/r8a77980-sysc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/soc/renesas/r8a77980-sysc.c b/drivers/soc/renesas/r8a77980-sysc.c index 39ca84a67daad..621e411fc9991 100644 --- a/drivers/soc/renesas/r8a77980-sysc.c +++ b/drivers/soc/renesas/r8a77980-sysc.c @@ -25,7 +25,8 @@ static const struct rcar_sysc_area r8a77980_areas[] __initconst = { PD_CPU_NOCR }, { "ca53-cpu3", 0x200, 3, R8A77980_PD_CA53_CPU3, R8A77980_PD_CA53_SCU, PD_CPU_NOCR }, - { "cr7", 0x240, 0, R8A77980_PD_CR7, R8A77980_PD_ALWAYS_ON }, + { "cr7", 0x240, 0, R8A77980_PD_CR7, R8A77980_PD_ALWAYS_ON, + PD_CPU_NOCR }, { "a3ir", 0x180, 0, R8A77980_PD_A3IR, R8A77980_PD_ALWAYS_ON }, { "a2ir0", 0x400, 0, R8A77980_PD_A2IR0, R8A77980_PD_A3IR }, { "a2ir1", 0x400, 1, R8A77980_PD_A2IR1, R8A77980_PD_A3IR },
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rafał Miłecki rafal@milecki.pl
[ Upstream commit be7e1e5b0f67c58ec4be0a54db23b6a4fa6e2116 ]
There is no such trigger documented or implemented in Linux. It was a copy & paste mistake.
This fixes: arch/arm/boot/dts/broadcom/bcm47189-luxul-xap-1440.dtb: leds: led-wlan:linux,default-trigger: 'oneOf' conditional failed, one must be fixed: 'default-off' is not one of ['backlight', 'default-on', 'heartbeat', 'disk-activity', 'disk-read', 'disk-write', 'timer', 'pattern', 'audio-micmute', 'audio-mute', 'bluetooth-power', 'flash', 'kbd-capslock', 'mtd', 'nand-disk', 'none', 'torch', 'usb-gadget', 'usb-host', 'usbport'] 'default-off' does not match '^cpu[0-9]*$' 'default-off' does not match '^hci[0-9]+-power$' 'default-off' does not match '^mmc[0-9]+$' 'default-off' does not match '^phy[0-9]+tx$' From schema: Documentation/devicetree/bindings/leds/leds-gpio.yaml
Signed-off-by: Rafał Miłecki rafal@milecki.pl Link: https://lore.kernel.org/r/20230707114004.2740-1-zajec5@gmail.com Signed-off-by: Florian Fainelli florian.fainelli@broadcom.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/boot/dts/bcm47189-luxul-xap-1440.dts | 1 - arch/arm/boot/dts/bcm47189-luxul-xap-810.dts | 2 -- 2 files changed, 3 deletions(-)
diff --git a/arch/arm/boot/dts/bcm47189-luxul-xap-1440.dts b/arch/arm/boot/dts/bcm47189-luxul-xap-1440.dts index 00e688b45d981..5901160919dcd 100644 --- a/arch/arm/boot/dts/bcm47189-luxul-xap-1440.dts +++ b/arch/arm/boot/dts/bcm47189-luxul-xap-1440.dts @@ -26,7 +26,6 @@ leds { wlan { label = "bcm53xx:blue:wlan"; gpios = <&chipcommon 10 GPIO_ACTIVE_LOW>; - linux,default-trigger = "default-off"; };
system { diff --git a/arch/arm/boot/dts/bcm47189-luxul-xap-810.dts b/arch/arm/boot/dts/bcm47189-luxul-xap-810.dts index 78c80a5d3f4fa..8e7483272d47d 100644 --- a/arch/arm/boot/dts/bcm47189-luxul-xap-810.dts +++ b/arch/arm/boot/dts/bcm47189-luxul-xap-810.dts @@ -26,7 +26,6 @@ leds { 5ghz { label = "bcm53xx:blue:5ghz"; gpios = <&chipcommon 11 GPIO_ACTIVE_HIGH>; - linux,default-trigger = "default-off"; };
system { @@ -42,7 +41,6 @@ pcie0_leds { 2ghz { label = "bcm53xx:blue:2ghz"; gpios = <&pcie0_chipcommon 3 GPIO_ACTIVE_HIGH>; - linux,default-trigger = "default-off"; }; };
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiaxun Yang jiaxun.yang@flygoat.com
[ Upstream commit 2c6c9c049510163090b979ea5f92a68ae8d93c45 ]
When a GIC local interrupt is not routable, it's vl_map will be used to control some internal states for core (providing IPTI, IPPCI, IPFDC input signal for core). Overriding it will interfere core's intetrupt controller.
Do not touch vl_map if a local interrupt is not routable, we are not going to remap it.
Before dd098a0e0319 (" irqchip/mips-gic: Get rid of the reliance on irq_cpu_online()"), if a local interrupt is not routable, then it won't be requested from GIC Local domain, and thus gic_all_vpes_irq_cpu_online won't be called for that particular interrupt.
Fixes: dd098a0e0319 (" irqchip/mips-gic: Get rid of the reliance on irq_cpu_online()") Cc: stable@vger.kernel.org Signed-off-by: Jiaxun Yang jiaxun.yang@flygoat.com Reviewed-by: Serge Semin fancer.lancer@gmail.com Tested-by: Serge Semin fancer.lancer@gmail.com Signed-off-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20230424103156.66753-2-jiaxun.yang@flygoat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/irqchip/irq-mips-gic.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/irqchip/irq-mips-gic.c b/drivers/irqchip/irq-mips-gic.c index fc25b900cef71..7888e3c08df40 100644 --- a/drivers/irqchip/irq-mips-gic.c +++ b/drivers/irqchip/irq-mips-gic.c @@ -398,6 +398,8 @@ static void gic_all_vpes_irq_cpu_online(void) unsigned int intr = local_intrs[i]; struct gic_all_vpes_chip_data *cd;
+ if (!gic_local_irq_is_routable(intr)) + continue; cd = &gic_all_vpes_chip_data[intr]; write_gic_vl_map(mips_gic_vx_map_reg(intr), cd->map); if (cd->mask)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xiaolei Wang xiaolei.wang@windriver.com
[ Upstream commit 0a2b96e42a0284c4fc03022236f656a085ca714a ]
If the tuning step is not set, the tuning step is set to 1. For some sd cards, the following Tuning timeout will occur.
Tuning failed, falling back to fixed sampling clock
So set the default tuning step. This refers to the NXP vendor's commit below:
https://github.com/nxp-imx/linux-imx/blob/lf-6.1.y/ arch/arm/boot/dts/imx6sx.dtsi#L1108-L1109
Fixes: 1e336aa0c025 ("mmc: sdhci-esdhc-imx: correct the tuning start tap and step setting") Signed-off-by: Xiaolei Wang xiaolei.wang@windriver.com Reviewed-by: Fabio Estevam festevam@gmail.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/boot/dts/imx6sx.dtsi | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/arch/arm/boot/dts/imx6sx.dtsi b/arch/arm/boot/dts/imx6sx.dtsi index 08332f70a8dc2..51491b7418e40 100644 --- a/arch/arm/boot/dts/imx6sx.dtsi +++ b/arch/arm/boot/dts/imx6sx.dtsi @@ -981,6 +981,8 @@ usdhc1: mmc@2190000 { <&clks IMX6SX_CLK_USDHC1>; clock-names = "ipg", "ahb", "per"; bus-width = <4>; + fsl,tuning-start-tap = <20>; + fsl,tuning-step= <2>; status = "disabled"; };
@@ -993,6 +995,8 @@ usdhc2: mmc@2194000 { <&clks IMX6SX_CLK_USDHC2>; clock-names = "ipg", "ahb", "per"; bus-width = <4>; + fsl,tuning-start-tap = <20>; + fsl,tuning-step= <2>; status = "disabled"; };
@@ -1005,6 +1009,8 @@ usdhc3: mmc@2198000 { <&clks IMX6SX_CLK_USDHC3>; clock-names = "ipg", "ahb", "per"; bus-width = <4>; + fsl,tuning-start-tap = <20>; + fsl,tuning-step= <2>; status = "disabled"; };
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shengjiu Wang shengjiu.wang@nxp.com
[ Upstream commit 0adf292069dcca8bab76a603251fcaabf77468ca ]
There is no defer probe when adding platform component to snd_soc_pcm_runtime(rtd), the code is in snd_soc_add_pcm_runtime()
snd_soc_register_card() -> snd_soc_bind_card() -> snd_soc_add_pcm_runtime() -> adding cpu dai -> adding codec dai -> adding platform component.
So if the platform component is not ready at that time, then the sound card still registered successfully, but platform component is empty, the sound card can't be used.
As there is defer probe checking for cpu dai component, then register platform component before cpu dai to avoid such issue.
Fixes: 47a70e6fc9a8 ("ASoC: Add MICFIL SoC Digital Audio Interface driver.") Signed-off-by: Shengjiu Wang shengjiu.wang@nxp.com Link: https://lore.kernel.org/r/1630665006-31437-4-git-send-email-shengjiu.wang@nx... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/fsl/fsl_micfil.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/sound/soc/fsl/fsl_micfil.c b/sound/soc/fsl/fsl_micfil.c index 97f83c63e7652..826829e3ff7a2 100644 --- a/sound/soc/fsl/fsl_micfil.c +++ b/sound/soc/fsl/fsl_micfil.c @@ -756,18 +756,23 @@ static int fsl_micfil_probe(struct platform_device *pdev)
pm_runtime_enable(&pdev->dev);
+ /* + * Register platform component before registering cpu dai for there + * is not defer probe for platform component in snd_soc_add_pcm_runtime(). + */ + ret = devm_snd_dmaengine_pcm_register(&pdev->dev, NULL, 0); + if (ret) { + dev_err(&pdev->dev, "failed to pcm register\n"); + return ret; + } + ret = devm_snd_soc_register_component(&pdev->dev, &fsl_micfil_component, &fsl_micfil_dai, 1); if (ret) { dev_err(&pdev->dev, "failed to register component %s\n", fsl_micfil_component.name); - return ret; }
- ret = devm_snd_dmaengine_pcm_register(&pdev->dev, NULL, 0); - if (ret) - dev_err(&pdev->dev, "failed to pcm register\n"); - return ret; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter error27@gmail.com
[ Upstream commit eed9496a0501357aa326ddd6b71408189ed872eb ]
The buf[4] value comes from the user via ts_play(). It is a value in the u8 range. The final length we pass to av7110_ipack_instant_repack() is "len - (buf[4] + 1) - 4" so add a check to ensure that the length is not negative. It's not clear that passing a negative len value does anything bad necessarily, but it's not best practice.
With the new bounds checking the "if (!len)" condition is no longer possible or required so remove that.
Fixes: fd46d16d602a ("V4L/DVB (11759): dvb-ttpci: Add TS replay capability") Signed-off-by: Dan Carpenter error27@gmail.com Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/pci/ttpci/av7110_av.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/media/pci/ttpci/av7110_av.c b/drivers/media/pci/ttpci/av7110_av.c index ea9f7d0058a21..e201d5a56bc65 100644 --- a/drivers/media/pci/ttpci/av7110_av.c +++ b/drivers/media/pci/ttpci/av7110_av.c @@ -822,10 +822,10 @@ static int write_ts_to_decoder(struct av7110 *av7110, int type, const u8 *buf, s av7110_ipack_flush(ipack);
if (buf[3] & ADAPT_FIELD) { + if (buf[4] > len - 1 - 4) + return 0; len -= buf[4] + 1; buf += buf[4] + 1; - if (!len) - return 0; }
av7110_ipack_instant_repack(buf + 4, len - 4, ipack);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Roger Pau Monne roger.pau@citrix.com
[ Upstream commit 6214894f49a967c749ee6c07cb00f9cede748df4 ]
The hvc machinery registers both a console and a tty device based on the hv ops provided by the specific implementation. Those two interfaces however have different locks, and there's no single locks that's shared between the tty and the console implementations, hence the driver needs to protect itself against concurrent accesses. Otherwise concurrent calls using the split interfaces are likely to corrupt the ring indexes, leaving the console unusable.
Introduce a lock to xencons_info to serialize accesses to the shared ring. This is only required when using the shared memory console, concurrent accesses to the hypercall based console implementation are not an issue.
Note the conditional logic in domU_read_console() is slightly modified so the notify_daemon() call can be done outside of the locked region: it's an hypercall and there's no need for it to be done with the lock held.
Fixes: b536b4b96230 ('xen: use the hvc console infrastructure for Xen console') Signed-off-by: Roger Pau Monné roger.pau@citrix.com Reviewed-by: Juergen Gross jgross@suse.com Link: https://lore.kernel.org/r/20221130150919.13935-1-roger.pau@citrix.com Signed-off-by: Juergen Gross jgross@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/tty/hvc/hvc_xen.c | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c index cf0fb650a9247..4886cad0fde61 100644 --- a/drivers/tty/hvc/hvc_xen.c +++ b/drivers/tty/hvc/hvc_xen.c @@ -43,6 +43,7 @@ struct xencons_info { int irq; int vtermno; grant_ref_t gntref; + spinlock_t ring_lock; };
static LIST_HEAD(xenconsoles); @@ -89,12 +90,15 @@ static int __write_console(struct xencons_info *xencons, XENCONS_RING_IDX cons, prod; struct xencons_interface *intf = xencons->intf; int sent = 0; + unsigned long flags;
+ spin_lock_irqsave(&xencons->ring_lock, flags); cons = intf->out_cons; prod = intf->out_prod; mb(); /* update queue values before going on */
if ((prod - cons) > sizeof(intf->out)) { + spin_unlock_irqrestore(&xencons->ring_lock, flags); pr_err_once("xencons: Illegal ring page indices"); return -EINVAL; } @@ -104,6 +108,7 @@ static int __write_console(struct xencons_info *xencons,
wmb(); /* write ring before updating pointer */ intf->out_prod = prod; + spin_unlock_irqrestore(&xencons->ring_lock, flags);
if (sent) notify_daemon(xencons); @@ -146,16 +151,19 @@ static int domU_read_console(uint32_t vtermno, char *buf, int len) int recv = 0; struct xencons_info *xencons = vtermno_to_xencons(vtermno); unsigned int eoiflag = 0; + unsigned long flags;
if (xencons == NULL) return -EINVAL; intf = xencons->intf;
+ spin_lock_irqsave(&xencons->ring_lock, flags); cons = intf->in_cons; prod = intf->in_prod; mb(); /* get pointers before reading ring */
if ((prod - cons) > sizeof(intf->in)) { + spin_unlock_irqrestore(&xencons->ring_lock, flags); pr_err_once("xencons: Illegal ring page indices"); return -EINVAL; } @@ -179,10 +187,13 @@ static int domU_read_console(uint32_t vtermno, char *buf, int len) xencons->out_cons = intf->out_cons; xencons->out_cons_same = 0; } + if (!recv && xencons->out_cons_same++ > 1) { + eoiflag = XEN_EOI_FLAG_SPURIOUS; + } + spin_unlock_irqrestore(&xencons->ring_lock, flags); + if (recv) { notify_daemon(xencons); - } else if (xencons->out_cons_same++ > 1) { - eoiflag = XEN_EOI_FLAG_SPURIOUS; }
xen_irq_lateeoi(xencons->irq, eoiflag); @@ -239,6 +250,7 @@ static int xen_hvm_console_init(void) info = kzalloc(sizeof(struct xencons_info), GFP_KERNEL); if (!info) return -ENOMEM; + spin_lock_init(&info->ring_lock); } else if (info->intf != NULL) { /* already configured */ return 0; @@ -275,6 +287,7 @@ static int xen_hvm_console_init(void)
static int xencons_info_pv_init(struct xencons_info *info, int vtermno) { + spin_lock_init(&info->ring_lock); info->evtchn = xen_start_info->console.domU.evtchn; /* GFN == MFN for PV guest */ info->intf = gfn_to_virt(xen_start_info->console.domU.mfn); @@ -325,6 +338,7 @@ static int xen_initial_domain_console_init(void) info = kzalloc(sizeof(struct xencons_info), GFP_KERNEL); if (!info) return -ENOMEM; + spin_lock_init(&info->ring_lock); }
info->irq = bind_virq_to_irq(VIRQ_CONSOLE, 0, false); @@ -485,6 +499,7 @@ static int xencons_probe(struct xenbus_device *dev, info = kzalloc(sizeof(struct xencons_info), GFP_KERNEL); if (!info) return -ENOMEM; + spin_lock_init(&info->ring_lock); dev_set_drvdata(&dev->dev, info); info->xbdev = dev; info->vtermno = xenbus_devid_to_vtermno(devid);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sebastian Andrzej Siewior bigeasy@linutronix.de
[ Upstream commit 0c74d9f79ec4299365bbe803baa736ae0068179e ]
Due to the hashed-MAC optimisation one problem become visible: hsr_handle_sup_frame() walks over the list of available nodes and merges two node entries into one if based on the information in the supervision both MAC addresses belong to one node. The list-walk happens on a RCU protected list and delete operation happens under a lock.
If the supervision arrives on both slave interfaces at the same time then this delete operation can occur simultaneously on two CPUs. The result is the first-CPU deletes the from the list and the second CPUs BUGs while attempting to dereference a poisoned list-entry. This happens more likely with the optimisation because a new node for the mac_B entry is created once a packet has been received and removed (merged) once the supervision frame has been received.
Avoid removing/ cleaning up a hsr_node twice by adding a `removed' field which is set to true after the removal and checked before the removal.
Fixes: f266a683a4804 ("net/hsr: Better frame dispatch") Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/hsr/hsr_framereg.c | 16 +++++++++++----- net/hsr/hsr_framereg.h | 1 + 2 files changed, 12 insertions(+), 5 deletions(-)
diff --git a/net/hsr/hsr_framereg.c b/net/hsr/hsr_framereg.c index afc97d65cf2d8..87fc86aade5c9 100644 --- a/net/hsr/hsr_framereg.c +++ b/net/hsr/hsr_framereg.c @@ -327,9 +327,12 @@ void hsr_handle_sup_frame(struct hsr_frame_info *frame) node_real->addr_B_port = port_rcv->type;
spin_lock_bh(&hsr->list_lock); - list_del_rcu(&node_curr->mac_list); + if (!node_curr->removed) { + list_del_rcu(&node_curr->mac_list); + node_curr->removed = true; + kfree_rcu(node_curr, rcu_head); + } spin_unlock_bh(&hsr->list_lock); - kfree_rcu(node_curr, rcu_head);
done: /* PRP uses v0 header */ @@ -506,9 +509,12 @@ void hsr_prune_nodes(struct timer_list *t) if (time_is_before_jiffies(timestamp + msecs_to_jiffies(HSR_NODE_FORGET_TIME))) { hsr_nl_nodedown(hsr, node->macaddress_A); - list_del_rcu(&node->mac_list); - /* Note that we need to free this entry later: */ - kfree_rcu(node, rcu_head); + if (!node->removed) { + list_del_rcu(&node->mac_list); + node->removed = true; + /* Note that we need to free this entry later: */ + kfree_rcu(node, rcu_head); + } } } spin_unlock_bh(&hsr->list_lock); diff --git a/net/hsr/hsr_framereg.h b/net/hsr/hsr_framereg.h index 5a771cb3f0325..48990166e4c4e 100644 --- a/net/hsr/hsr_framereg.h +++ b/net/hsr/hsr_framereg.h @@ -82,6 +82,7 @@ struct hsr_node { bool san_a; bool san_b; u16 seq_out[HSR_PT_PORTS]; + bool removed; struct rcu_head rcu_head; };
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Zijlstra peterz@infradead.org
[ Upstream commit 989b5db215a2f22f89d730b607b071d964780f10 ]
Add support for CMPXCHG loops on userspace addresses. Provide both an "unsafe" version for tight loops that do their own uaccess begin/end, as well as a "safe" version for use cases where the CMPXCHG is not buried in a loop, e.g. KVM will resume the guest instead of looping when emulation of a guest atomic accesses fails the CMPXCHG.
Provide 8-byte versions for 32-bit kernels so that KVM can do CMPXCHG on guest PAE PTEs, which are accessed via userspace addresses.
Guard the asm_volatile_goto() variation with CC_HAS_ASM_GOTO_TIED_OUTPUT, the "+m" constraint fails on some compilers that otherwise support CC_HAS_ASM_GOTO_OUTPUT.
Cc: stable@vger.kernel.org Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Co-developed-by: Sean Christopherson seanjc@google.com Signed-off-by: Sean Christopherson seanjc@google.com Message-Id: 20220202004945.2540433-3-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/include/asm/uaccess.h | 142 +++++++++++++++++++++++++++++++++ 1 file changed, 142 insertions(+)
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h index bf2561a5eb581..68b910f30b222 100644 --- a/arch/x86/include/asm/uaccess.h +++ b/arch/x86/include/asm/uaccess.h @@ -414,6 +414,103 @@ do { \
#endif // CONFIG_CC_ASM_GOTO_OUTPUT
+#ifdef CONFIG_CC_HAS_ASM_GOTO_TIED_OUTPUT +#define __try_cmpxchg_user_asm(itype, ltype, _ptr, _pold, _new, label) ({ \ + bool success; \ + __typeof__(_ptr) _old = (__typeof__(_ptr))(_pold); \ + __typeof__(*(_ptr)) __old = *_old; \ + __typeof__(*(_ptr)) __new = (_new); \ + asm_volatile_goto("\n" \ + "1: " LOCK_PREFIX "cmpxchg"itype" %[new], %[ptr]\n"\ + _ASM_EXTABLE_UA(1b, %l[label]) \ + : CC_OUT(z) (success), \ + [ptr] "+m" (*_ptr), \ + [old] "+a" (__old) \ + : [new] ltype (__new) \ + : "memory" \ + : label); \ + if (unlikely(!success)) \ + *_old = __old; \ + likely(success); }) + +#ifdef CONFIG_X86_32 +#define __try_cmpxchg64_user_asm(_ptr, _pold, _new, label) ({ \ + bool success; \ + __typeof__(_ptr) _old = (__typeof__(_ptr))(_pold); \ + __typeof__(*(_ptr)) __old = *_old; \ + __typeof__(*(_ptr)) __new = (_new); \ + asm_volatile_goto("\n" \ + "1: " LOCK_PREFIX "cmpxchg8b %[ptr]\n" \ + _ASM_EXTABLE_UA(1b, %l[label]) \ + : CC_OUT(z) (success), \ + "+A" (__old), \ + [ptr] "+m" (*_ptr) \ + : "b" ((u32)__new), \ + "c" ((u32)((u64)__new >> 32)) \ + : "memory" \ + : label); \ + if (unlikely(!success)) \ + *_old = __old; \ + likely(success); }) +#endif // CONFIG_X86_32 +#else // !CONFIG_CC_HAS_ASM_GOTO_TIED_OUTPUT +#define __try_cmpxchg_user_asm(itype, ltype, _ptr, _pold, _new, label) ({ \ + int __err = 0; \ + bool success; \ + __typeof__(_ptr) _old = (__typeof__(_ptr))(_pold); \ + __typeof__(*(_ptr)) __old = *_old; \ + __typeof__(*(_ptr)) __new = (_new); \ + asm volatile("\n" \ + "1: " LOCK_PREFIX "cmpxchg"itype" %[new], %[ptr]\n"\ + CC_SET(z) \ + "2:\n" \ + _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_EFAULT_REG, \ + %[errout]) \ + : CC_OUT(z) (success), \ + [errout] "+r" (__err), \ + [ptr] "+m" (*_ptr), \ + [old] "+a" (__old) \ + : [new] ltype (__new) \ + : "memory", "cc"); \ + if (unlikely(__err)) \ + goto label; \ + if (unlikely(!success)) \ + *_old = __old; \ + likely(success); }) + +#ifdef CONFIG_X86_32 +/* + * Unlike the normal CMPXCHG, hardcode ECX for both success/fail and error. + * There are only six GPRs available and four (EAX, EBX, ECX, and EDX) are + * hardcoded by CMPXCHG8B, leaving only ESI and EDI. If the compiler uses + * both ESI and EDI for the memory operand, compilation will fail if the error + * is an input+output as there will be no register available for input. + */ +#define __try_cmpxchg64_user_asm(_ptr, _pold, _new, label) ({ \ + int __result; \ + __typeof__(_ptr) _old = (__typeof__(_ptr))(_pold); \ + __typeof__(*(_ptr)) __old = *_old; \ + __typeof__(*(_ptr)) __new = (_new); \ + asm volatile("\n" \ + "1: " LOCK_PREFIX "cmpxchg8b %[ptr]\n" \ + "mov $0, %%ecx\n\t" \ + "setz %%cl\n" \ + "2:\n" \ + _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_EFAULT_REG, %%ecx) \ + : [result]"=c" (__result), \ + "+A" (__old), \ + [ptr] "+m" (*_ptr) \ + : "b" ((u32)__new), \ + "c" ((u32)((u64)__new >> 32)) \ + : "memory", "cc"); \ + if (unlikely(__result < 0)) \ + goto label; \ + if (unlikely(!__result)) \ + *_old = __old; \ + likely(__result); }) +#endif // CONFIG_X86_32 +#endif // CONFIG_CC_HAS_ASM_GOTO_TIED_OUTPUT + /* FIXME: this hack is definitely wrong -AK */ struct __large_struct { unsigned long buf[100]; }; #define __m(x) (*(struct __large_struct __user *)(x)) @@ -506,6 +603,51 @@ do { \ } while (0) #endif // CONFIG_CC_HAS_ASM_GOTO_OUTPUT
+extern void __try_cmpxchg_user_wrong_size(void); + +#ifndef CONFIG_X86_32 +#define __try_cmpxchg64_user_asm(_ptr, _oldp, _nval, _label) \ + __try_cmpxchg_user_asm("q", "r", (_ptr), (_oldp), (_nval), _label) +#endif + +/* + * Force the pointer to u<size> to match the size expected by the asm helper. + * clang/LLVM compiles all cases and only discards the unused paths after + * processing errors, which breaks i386 if the pointer is an 8-byte value. + */ +#define unsafe_try_cmpxchg_user(_ptr, _oldp, _nval, _label) ({ \ + bool __ret; \ + __chk_user_ptr(_ptr); \ + switch (sizeof(*(_ptr))) { \ + case 1: __ret = __try_cmpxchg_user_asm("b", "q", \ + (__force u8 *)(_ptr), (_oldp), \ + (_nval), _label); \ + break; \ + case 2: __ret = __try_cmpxchg_user_asm("w", "r", \ + (__force u16 *)(_ptr), (_oldp), \ + (_nval), _label); \ + break; \ + case 4: __ret = __try_cmpxchg_user_asm("l", "r", \ + (__force u32 *)(_ptr), (_oldp), \ + (_nval), _label); \ + break; \ + case 8: __ret = __try_cmpxchg64_user_asm((__force u64 *)(_ptr), (_oldp),\ + (_nval), _label); \ + break; \ + default: __try_cmpxchg_user_wrong_size(); \ + } \ + __ret; }) + +/* "Returns" 0 on success, 1 on failure, -EFAULT if the access faults. */ +#define __try_cmpxchg_user(_ptr, _oldp, _nval, _label) ({ \ + int __ret = -EFAULT; \ + __uaccess_begin_nospec(); \ + __ret = !unsafe_try_cmpxchg_user(_ptr, _oldp, _nval, _label); \ +_label: \ + __uaccess_end(); \ + __ret; \ + }) + /* * We want the unsafe accessors to always be inlined and use * the error labels - thus the macro games.
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kees Cook keescook@chromium.org
[ Upstream commit 495ac3069a6235bfdf516812a2a9b256671bbdf9 ]
If seccomp tries to kill a process, it should never see that process again. To enforce this proactively, switch the mode to something impossible. If encountered: WARN, reject all syscalls, and attempt to kill the process again even harder.
Cc: Andy Lutomirski luto@amacapital.net Cc: Will Drewry wad@chromium.org Fixes: 8112c4f140fa ("seccomp: remove 2-phase API") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/seccomp.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/kernel/seccomp.c b/kernel/seccomp.c index 305f0eca163ed..0b0331346e4be 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -29,6 +29,9 @@ #include <linux/syscalls.h> #include <linux/sysctl.h>
+/* Not exposed in headers: strictly internal use only. */ +#define SECCOMP_MODE_DEAD (SECCOMP_MODE_FILTER + 1) + #ifdef CONFIG_HAVE_ARCH_SECCOMP_FILTER #include <asm/syscall.h> #endif @@ -795,6 +798,7 @@ static void __secure_computing_strict(int this_syscall) #ifdef SECCOMP_DEBUG dump_stack(); #endif + current->seccomp.mode = SECCOMP_MODE_DEAD; seccomp_log(this_syscall, SIGKILL, SECCOMP_RET_KILL_THREAD, true); do_exit(SIGKILL); } @@ -1023,6 +1027,7 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd, case SECCOMP_RET_KILL_THREAD: case SECCOMP_RET_KILL_PROCESS: default: + current->seccomp.mode = SECCOMP_MODE_DEAD; seccomp_log(this_syscall, SIGSYS, action, true); /* Dump core only if this is the last remaining thread. */ if (action != SECCOMP_RET_KILL_THREAD || @@ -1075,6 +1080,11 @@ int __secure_computing(const struct seccomp_data *sd) return 0; case SECCOMP_MODE_FILTER: return __seccomp_filter(this_syscall, sd, false); + /* Surviving SECCOMP_RET_KILL_* must be proactively impossible. */ + case SECCOMP_MODE_DEAD: + WARN_ON_ONCE(1); + do_exit(SIGKILL); + return -1; default: BUG(); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Schmitz schmitzmic@gmail.com
[ Upstream commit 86d46fdaa12ae5befc16b8d73fc85a3ca0399ea6 ]
Refactoring of the Atari floppy driver when converting to blk-mq has broken the state machine in not-so-subtle ways:
finish_fdc() must be called when operations on the floppy device have completed. This is crucial in order to relase the ST-DMA lock, which protects against concurrent access to the ST-DMA controller by other drivers (some DMA related, most just related to device register access - broken beyond compare, I know).
When rewriting the driver's old do_request() function, the fact that finish_fdc() was called only when all queued requests had completed appears to have been overlooked. Instead, the new request function calls finish_fdc() immediately after the last request has been queued. finish_fdc() executes a dummy seek after most requests, and this overwrites the state machine's interrupt hander that was set up to wait for completion of the read/write request just prior. To make matters worse, finish_fdc() is called before device interrupts are re-enabled, making certain that the read/write interupt is missed.
Shifting the finish_fdc() call into the read/write request completion handler ensures the driver waits for the request to actually complete. With a queue depth of 2, we won't see long request sequences, so calling finish_fdc() unconditionally just adds a little overhead for the dummy seeks, and keeps the code simple.
While we're at it, kill ataflop_commit_rqs() which does nothing but run finish_fdc() unconditionally, again likely wiping out an in-flight request.
Signed-off-by: Michael Schmitz schmitzmic@gmail.com Fixes: 6ec3938cff95 ("ataflop: convert to blk-mq") CC: linux-block@vger.kernel.org CC: Tetsuo Handa penguin-kernel@i-love.sakura.ne.jp Link: https://lore.kernel.org/r/20211019061321.26425-1-schmitzmic@gmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/block/ataflop.c | 18 +++--------------- 1 file changed, 3 insertions(+), 15 deletions(-)
diff --git a/drivers/block/ataflop.c b/drivers/block/ataflop.c index 3e881fdb06e0a..cd612cd04767a 100644 --- a/drivers/block/ataflop.c +++ b/drivers/block/ataflop.c @@ -653,9 +653,6 @@ static inline void copy_buffer(void *from, void *to) *p2++ = *p1++; }
- - - /* General Interrupt Handling */
static void (*FloppyIRQHandler)( int status ) = NULL; @@ -1225,6 +1222,7 @@ static void fd_rwsec_done1(int status) } else { /* all sectors finished */ + finish_fdc(); fd_end_request_cur(BLK_STS_OK); } return; @@ -1472,15 +1470,6 @@ static void setup_req_params( int drive ) ReqTrack, ReqSector, (unsigned long)ReqData )); }
-static void ataflop_commit_rqs(struct blk_mq_hw_ctx *hctx) -{ - spin_lock_irq(&ataflop_lock); - atari_disable_irq(IRQ_MFP_FDC); - finish_fdc(); - atari_enable_irq(IRQ_MFP_FDC); - spin_unlock_irq(&ataflop_lock); -} - static blk_status_t ataflop_queue_rq(struct blk_mq_hw_ctx *hctx, const struct blk_mq_queue_data *bd) { @@ -1488,6 +1477,8 @@ static blk_status_t ataflop_queue_rq(struct blk_mq_hw_ctx *hctx, int drive = floppy - unit; int type = floppy->type;
+ DPRINT(("Queue request: drive %d type %d last %d\n", drive, type, bd->last)); + spin_lock_irq(&ataflop_lock); if (fd_request) { spin_unlock_irq(&ataflop_lock); @@ -1547,8 +1538,6 @@ static blk_status_t ataflop_queue_rq(struct blk_mq_hw_ctx *hctx, setup_req_params( drive ); do_fd_action( drive );
- if (bd->last) - finish_fdc(); atari_enable_irq( IRQ_MFP_FDC );
out: @@ -1959,7 +1948,6 @@ static const struct block_device_operations floppy_fops = {
static const struct blk_mq_ops ataflop_mq_ops = { .queue_rq = ataflop_queue_rq, - .commit_rqs = ataflop_commit_rqs, };
static struct kobject *floppy_find(dev_t dev, int *part, void *data)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ravi Bangoria ravi.bangoria@linux.ibm.com
[ Upstream commit 3d2ffcdd2a982e8bbe65fa0f94fb21bf304c281e ]
POWER10 DD1 has an issue where it generates watchpoint exceptions when it shouldn't. The conditions where this occur are:
- octword op - ending address of DAWR range is less than starting address of op - those addresses need to be in the same or in two consecutive 512B blocks - 'op address + 64B' generates an address that has a carry into bit 52 (crosses 2K boundary)
Handle such spurious exception by considering them as extraneous and emulating/single-steeping instruction without generating an event.
[ravi: Fixed build warning reported by lkp@intel.com] Signed-off-by: Ravi Bangoria ravi.bangoria@linux.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20201106045650.278987-1-ravi.bangoria@linux.ibm.co... Stable-dep-of: 27646b2e02b0 ("powerpc/watchpoints: Annotate atomic context in more places") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kernel/hw_breakpoint.c | 67 ++++++++++++++++++++++++++++- 1 file changed, 65 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index 6e5bed50c3578..49273f67c7498 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -504,6 +504,11 @@ static bool is_larx_stcx_instr(int type) return type == LARX || type == STCX; }
+static bool is_octword_vsx_instr(int type, int size) +{ + return ((type == LOAD_VSX || type == STORE_VSX) && size == 32); +} + /* * We've failed in reliably handling the hw-breakpoint. Unregister * it and throw a warning message to let the user know about it. @@ -554,6 +559,58 @@ static bool stepping_handler(struct pt_regs *regs, struct perf_event **bp, return true; }
+static void handle_p10dd1_spurious_exception(struct arch_hw_breakpoint **info, + int *hit, unsigned long ea) +{ + int i; + unsigned long hw_end_addr; + + /* + * Handle spurious exception only when any bp_per_reg is set. + * Otherwise this might be created by xmon and not actually a + * spurious exception. + */ + for (i = 0; i < nr_wp_slots(); i++) { + if (!info[i]) + continue; + + hw_end_addr = ALIGN(info[i]->address + info[i]->len, HW_BREAKPOINT_SIZE); + + /* + * Ending address of DAWR range is less than starting + * address of op. + */ + if ((hw_end_addr - 1) >= ea) + continue; + + /* + * Those addresses need to be in the same or in two + * consecutive 512B blocks; + */ + if (((hw_end_addr - 1) >> 10) != (ea >> 10)) + continue; + + /* + * 'op address + 64B' generates an address that has a + * carry into bit 52 (crosses 2K boundary). + */ + if ((ea & 0x800) == ((ea + 64) & 0x800)) + continue; + + break; + } + + if (i == nr_wp_slots()) + return; + + for (i = 0; i < nr_wp_slots(); i++) { + if (info[i]) { + hit[i] = 1; + info[i]->type |= HW_BRK_TYPE_EXTRANEOUS_IRQ; + } + } +} + int hw_breakpoint_handler(struct die_args *args) { bool err = false; @@ -612,8 +669,14 @@ int hw_breakpoint_handler(struct die_args *args) goto reset;
if (!nr_hit) { - rc = NOTIFY_DONE; - goto out; + /* Workaround for Power10 DD1 */ + if (!IS_ENABLED(CONFIG_PPC_8xx) && mfspr(SPRN_PVR) == 0x800100 && + is_octword_vsx_instr(type, size)) { + handle_p10dd1_spurious_exception(info, hit, ea); + } else { + rc = NOTIFY_DONE; + goto out; + } }
/*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Benjamin Gray bgray@linux.ibm.com
[ Upstream commit 27646b2e02b096a6936b3e3b6ba334ae20763eab ]
It can be easy to miss that the notifier mechanism invokes the callbacks in an atomic context, so add some comments to that effect on the two handlers we register here.
Signed-off-by: Benjamin Gray bgray@linux.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://msgid.link/20230829063457.54157-4-bgray@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kernel/hw_breakpoint.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index 49273f67c7498..ca3374c6f3749 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -611,6 +611,11 @@ static void handle_p10dd1_spurious_exception(struct arch_hw_breakpoint **info, } }
+/* + * Handle a DABR or DAWR exception. + * + * Called in atomic context. + */ int hw_breakpoint_handler(struct die_args *args) { bool err = false; @@ -737,6 +742,8 @@ NOKPROBE_SYMBOL(hw_breakpoint_handler);
/* * Handle single-step exceptions following a DABR hit. + * + * Called in atomic context. */ static int single_step_dabr_instruction(struct die_args *args) { @@ -794,6 +801,8 @@ NOKPROBE_SYMBOL(single_step_dabr_instruction);
/* * Handle debug exception notifications. + * + * Called in atomic context. */ int hw_breakpoint_exceptions_notify( struct notifier_block *unused, unsigned long val, void *data)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shyam Prasad N sprasad@microsoft.com
[ Upstream commit e4645cc2f1e2d6f268bb8dcfac40997c52432aed ]
We've seen the in-flight count go into negative with some internal stress testing in Microsoft.
Adding a WARN when this happens, in hope of understanding why this happens when it happens.
Signed-off-by: Shyam Prasad N sprasad@microsoft.com Reviewed-by: Bharath SM bharathsm@microsoft.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/cifs/smb2ops.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c index f83901c1c17a5..b2a7238a34221 100644 --- a/fs/cifs/smb2ops.c +++ b/fs/cifs/smb2ops.c @@ -82,6 +82,7 @@ smb2_add_credits(struct TCP_Server_Info *server, *val = 65000; /* Don't get near 64K credits, avoid srv bugs */ pr_warn_once("server overflowed SMB3 credits\n"); } + WARN_ON_ONCE(server->in_flight == 0); server->in_flight--; if (server->in_flight == 0 && (optype & CIFS_OP_MASK) != CIFS_NEG_OP) rc = change_conf(server);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: YouChing Lin ycllin@mxic.com.tw
[ Upstream commit 5ece78de88739b4c68263e9f2582380c1fd8314f ]
The Macronix MX35LF2GE4AD / MX35LF4GE4AD are 3V, 2G / 4Gbit serial SLC NAND flash device (with on-die ECC).
Validated by read, erase, read back, write, read back and nandtest on Xilinx Zynq PicoZed FPGA board which included Macronix SPI Host (drivers/spi/spi-mxic.c).
Signed-off-by: YouChing Lin ycllin@mxic.com.tw Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Link: https://lore.kernel.org/linux-mtd/1604561020-13499-1-git-send-email-ycllin@m... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mtd/nand/spi/macronix.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
diff --git a/drivers/mtd/nand/spi/macronix.c b/drivers/mtd/nand/spi/macronix.c index cd7a9cacc3fbf..8bd3f6bf9b103 100644 --- a/drivers/mtd/nand/spi/macronix.c +++ b/drivers/mtd/nand/spi/macronix.c @@ -119,6 +119,26 @@ static const struct spinand_info macronix_spinand_table[] = { &update_cache_variants), SPINAND_HAS_QE_BIT, SPINAND_ECCINFO(&mx35lfxge4ab_ooblayout, NULL)), + SPINAND_INFO("MX35LF2GE4AD", + SPINAND_ID(SPINAND_READID_METHOD_OPCODE_DUMMY, 0x26), + NAND_MEMORG(1, 2048, 64, 64, 2048, 40, 1, 1, 1), + NAND_ECCREQ(8, 512), + SPINAND_INFO_OP_VARIANTS(&read_cache_variants, + &write_cache_variants, + &update_cache_variants), + 0, + SPINAND_ECCINFO(&mx35lfxge4ab_ooblayout, + mx35lf1ge4ab_ecc_get_status)), + SPINAND_INFO("MX35LF4GE4AD", + SPINAND_ID(SPINAND_READID_METHOD_OPCODE_DUMMY, 0x37), + NAND_MEMORG(1, 4096, 128, 64, 2048, 40, 1, 1, 1), + NAND_ECCREQ(8, 512), + SPINAND_INFO_OP_VARIANTS(&read_cache_variants, + &write_cache_variants, + &update_cache_variants), + 0, + SPINAND_ECCINFO(&mx35lfxge4ab_ooblayout, + mx35lf1ge4ab_ecc_get_status)), SPINAND_INFO("MX31LF1GE4BC", SPINAND_ID(SPINAND_READID_METHOD_OPCODE_DUMMY, 0x1e), NAND_MEMORG(1, 2048, 64, 64, 1024, 20, 1, 1, 1),
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
[ Upstream commit c50f126b3c9ebb77585838726a3a490ad33b92cd ]
In current ACPI-based devices, the DSDT does not include any of the properties required by the codec driver. This is not an ACPI limitation proper since the _DSD method could be used, as done for Camera and SoundWire in newer platforms. For legacy devices, there is unfortunately no other option than using a work-around: we add properties to the codec device from the machine driver.
To avoid any issues with the codec driver being unbound, we need to keep a reference to the codec device until the card is removed.
Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Co-developed-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Link: https://lore.kernel.org/r/20210813151116.23931-2-pierre-louis.bossart@linux.... Signed-off-by: Mark Brown broonie@kernel.org Stable-dep-of: 721858823d7c ("ASoC: Intel: bytcr_rt5651: Drop reference count of ACPI device after use") Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/intel/boards/bytcht_es8316.c | 12 +++++-- sound/soc/intel/boards/bytcr_rt5640.c | 47 +++++++++++++++++--------- sound/soc/intel/boards/bytcr_rt5651.c | 37 +++++++++++++------- 3 files changed, 64 insertions(+), 32 deletions(-)
diff --git a/sound/soc/intel/boards/bytcht_es8316.c b/sound/soc/intel/boards/bytcht_es8316.c index 81269ed5a2aaa..91351b8536aa1 100644 --- a/sound/soc/intel/boards/bytcht_es8316.c +++ b/sound/soc/intel/boards/bytcht_es8316.c @@ -37,6 +37,7 @@ struct byt_cht_es8316_private { struct clk *mclk; struct snd_soc_jack jack; struct gpio_desc *speaker_en_gpio; + struct device *codec_dev; bool speaker_en; };
@@ -569,7 +570,7 @@ static int snd_byt_cht_es8316_mc_probe(struct platform_device *pdev) gpiod_get_index(codec_dev, "speaker-enable", 0, /* see comment in byt_cht_es8316_resume */ GPIOD_OUT_LOW | GPIOD_FLAGS_BIT_NONEXCLUSIVE); - put_device(codec_dev); + priv->codec_dev = codec_dev;
if (IS_ERR(priv->speaker_en_gpio)) { ret = PTR_ERR(priv->speaker_en_gpio); @@ -581,7 +582,7 @@ static int snd_byt_cht_es8316_mc_probe(struct platform_device *pdev) dev_err(dev, "get speaker GPIO failed: %d\n", ret); fallthrough; case -EPROBE_DEFER: - return ret; + goto err_put_codec; } }
@@ -604,10 +605,14 @@ static int snd_byt_cht_es8316_mc_probe(struct platform_device *pdev) if (ret) { gpiod_put(priv->speaker_en_gpio); dev_err(dev, "snd_soc_register_card failed: %d\n", ret); - return ret; + goto err_put_codec; } platform_set_drvdata(pdev, &byt_cht_es8316_card); return 0; + +err_put_codec: + put_device(priv->codec_dev); + return ret; }
static int snd_byt_cht_es8316_mc_remove(struct platform_device *pdev) @@ -616,6 +621,7 @@ static int snd_byt_cht_es8316_mc_remove(struct platform_device *pdev) struct byt_cht_es8316_private *priv = snd_soc_card_get_drvdata(card);
gpiod_put(priv->speaker_en_gpio); + put_device(priv->codec_dev); return 0; }
diff --git a/sound/soc/intel/boards/bytcr_rt5640.c b/sound/soc/intel/boards/bytcr_rt5640.c index 9a5ab96f917d3..b531c348fb697 100644 --- a/sound/soc/intel/boards/bytcr_rt5640.c +++ b/sound/soc/intel/boards/bytcr_rt5640.c @@ -86,6 +86,7 @@ enum { struct byt_rt5640_private { struct snd_soc_jack jack; struct clk *mclk; + struct device *codec_dev; }; static bool is_bytcr;
@@ -941,15 +942,11 @@ static const struct dmi_system_id byt_rt5640_quirk_table[] = { * Note this MUST be called before snd_soc_register_card(), so that the props * are in place before the codec component driver's probe function parses them. */ -static int byt_rt5640_add_codec_device_props(const char *i2c_dev_name) +static int byt_rt5640_add_codec_device_props(struct device *i2c_dev, + struct byt_rt5640_private *priv) { struct property_entry props[MAX_NO_PROPS] = {}; - struct device *i2c_dev; - int ret, cnt = 0; - - i2c_dev = bus_find_device_by_name(&i2c_bus_type, NULL, i2c_dev_name); - if (!i2c_dev) - return -EPROBE_DEFER; + int cnt = 0;
switch (BYT_RT5640_MAP(byt_rt5640_quirk)) { case BYT_RT5640_DMIC1_MAP: @@ -989,10 +986,7 @@ static int byt_rt5640_add_codec_device_props(const char *i2c_dev_name) if (byt_rt5640_quirk & BYT_RT5640_JD_NOT_INV) props[cnt++] = PROPERTY_ENTRY_BOOL("realtek,jack-detect-not-inverted");
- ret = device_add_properties(i2c_dev, props); - put_device(i2c_dev); - - return ret; + return device_add_properties(i2c_dev, props); }
static int byt_rt5640_init(struct snd_soc_pcm_runtime *runtime) @@ -1324,6 +1318,7 @@ static int snd_byt_rt5640_mc_probe(struct platform_device *pdev) struct snd_soc_acpi_mach *mach; const char *platform_name; struct acpi_device *adev; + struct device *codec_dev; int ret_val = 0; int dai_index = 0; int i, cfg_spk; @@ -1430,10 +1425,16 @@ static int snd_byt_rt5640_mc_probe(struct platform_device *pdev) byt_rt5640_quirk = quirk_override; }
+ codec_dev = bus_find_device_by_name(&i2c_bus_type, NULL, byt_rt5640_codec_name); + if (!codec_dev) + return -EPROBE_DEFER; + + priv->codec_dev = codec_dev; + /* Must be called before register_card, also see declaration comment. */ - ret_val = byt_rt5640_add_codec_device_props(byt_rt5640_codec_name); + ret_val = byt_rt5640_add_codec_device_props(codec_dev, priv); if (ret_val) - return ret_val; + goto err;
log_quirks(&pdev->dev);
@@ -1460,7 +1461,7 @@ static int snd_byt_rt5640_mc_probe(struct platform_device *pdev) * for all other errors, including -EPROBE_DEFER */ if (ret_val != -ENOENT) - return ret_val; + goto err; byt_rt5640_quirk &= ~BYT_RT5640_MCLK_EN; } } @@ -1493,17 +1494,30 @@ static int snd_byt_rt5640_mc_probe(struct platform_device *pdev) ret_val = snd_soc_fixup_dai_links_platform_name(&byt_rt5640_card, platform_name); if (ret_val) - return ret_val; + goto err;
ret_val = devm_snd_soc_register_card(&pdev->dev, &byt_rt5640_card);
if (ret_val) { dev_err(&pdev->dev, "devm_snd_soc_register_card failed %d\n", ret_val); - return ret_val; + goto err; } platform_set_drvdata(pdev, &byt_rt5640_card); return ret_val; + +err: + put_device(priv->codec_dev); + return ret_val; +} + +static int snd_byt_rt5640_mc_remove(struct platform_device *pdev) +{ + struct snd_soc_card *card = platform_get_drvdata(pdev); + struct byt_rt5640_private *priv = snd_soc_card_get_drvdata(card); + + put_device(priv->codec_dev); + return 0; }
static struct platform_driver snd_byt_rt5640_mc_driver = { @@ -1514,6 +1528,7 @@ static struct platform_driver snd_byt_rt5640_mc_driver = { #endif }, .probe = snd_byt_rt5640_mc_probe, + .remove = snd_byt_rt5640_mc_remove, };
module_platform_driver(snd_byt_rt5640_mc_driver); diff --git a/sound/soc/intel/boards/bytcr_rt5651.c b/sound/soc/intel/boards/bytcr_rt5651.c index bf8b87d45cb0a..acea83e814acb 100644 --- a/sound/soc/intel/boards/bytcr_rt5651.c +++ b/sound/soc/intel/boards/bytcr_rt5651.c @@ -85,6 +85,7 @@ struct byt_rt5651_private { struct gpio_desc *ext_amp_gpio; struct gpio_desc *hp_detect; struct snd_soc_jack jack; + struct device *codec_dev; };
static const struct acpi_gpio_mapping *byt_rt5651_gpios; @@ -995,12 +996,12 @@ static int snd_byt_rt5651_mc_probe(struct platform_device *pdev) byt_rt5651_quirk = quirk_override; }
+ priv->codec_dev = codec_dev; + /* Must be called before register_card, also see declaration comment. */ ret_val = byt_rt5651_add_codec_device_props(codec_dev); - if (ret_val) { - put_device(codec_dev); - return ret_val; - } + if (ret_val) + goto err;
/* Cherry Trail devices use an external amplifier enable gpio */ if (soc_intel_is_cht() && !byt_rt5651_gpios) @@ -1024,8 +1025,7 @@ static int snd_byt_rt5651_mc_probe(struct platform_device *pdev) ret_val); fallthrough; case -EPROBE_DEFER: - put_device(codec_dev); - return ret_val; + goto err; } } priv->hp_detect = devm_fwnode_gpiod_get(&pdev->dev, @@ -1044,14 +1044,11 @@ static int snd_byt_rt5651_mc_probe(struct platform_device *pdev) ret_val); fallthrough; case -EPROBE_DEFER: - put_device(codec_dev); - return ret_val; + goto err; } } }
- put_device(codec_dev); - log_quirks(&pdev->dev);
if ((byt_rt5651_quirk & BYT_RT5651_SSP2_AIF2) || @@ -1075,7 +1072,7 @@ static int snd_byt_rt5651_mc_probe(struct platform_device *pdev) * for all other errors, including -EPROBE_DEFER */ if (ret_val != -ENOENT) - return ret_val; + goto err; byt_rt5651_quirk &= ~BYT_RT5651_MCLK_EN; } } @@ -1104,17 +1101,30 @@ static int snd_byt_rt5651_mc_probe(struct platform_device *pdev) ret_val = snd_soc_fixup_dai_links_platform_name(&byt_rt5651_card, platform_name); if (ret_val) - return ret_val; + goto err;
ret_val = devm_snd_soc_register_card(&pdev->dev, &byt_rt5651_card);
if (ret_val) { dev_err(&pdev->dev, "devm_snd_soc_register_card failed %d\n", ret_val); - return ret_val; + goto err; } platform_set_drvdata(pdev, &byt_rt5651_card); return ret_val; + +err: + put_device(priv->codec_dev); + return ret_val; +} + +static int snd_byt_rt5651_mc_remove(struct platform_device *pdev) +{ + struct snd_soc_card *card = platform_get_drvdata(pdev); + struct byt_rt5651_private *priv = snd_soc_card_get_drvdata(card); + + put_device(priv->codec_dev); + return 0; }
static struct platform_driver snd_byt_rt5651_mc_driver = { @@ -1125,6 +1135,7 @@ static struct platform_driver snd_byt_rt5651_mc_driver = { #endif }, .probe = snd_byt_rt5651_mc_probe, + .remove = snd_byt_rt5651_mc_remove, };
module_platform_driver(snd_byt_rt5651_mc_driver);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
[ Upstream commit d3409eb20d3ed7d9e021cd13243e9e63255a315f ]
We have an existing 'adev' handle from which we can find the codec device, no need for an I2C bus search.
Suggested-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Reviewed-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Link: https://lore.kernel.org/r/20210813151116.23931-4-pierre-louis.bossart@linux.... Signed-off-by: Mark Brown broonie@kernel.org Stable-dep-of: 721858823d7c ("ASoC: Intel: bytcr_rt5651: Drop reference count of ACPI device after use") Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/intel/boards/bytcht_es8316.c | 4 ++-- sound/soc/intel/boards/bytcr_rt5640.c | 5 ++--- sound/soc/intel/boards/bytcr_rt5651.c | 6 ++---- 3 files changed, 6 insertions(+), 9 deletions(-)
diff --git a/sound/soc/intel/boards/bytcht_es8316.c b/sound/soc/intel/boards/bytcht_es8316.c index 91351b8536aa1..03b9cdbd3170f 100644 --- a/sound/soc/intel/boards/bytcht_es8316.c +++ b/sound/soc/intel/boards/bytcht_es8316.c @@ -550,9 +550,10 @@ static int snd_byt_cht_es8316_mc_probe(struct platform_device *pdev) }
/* get speaker enable GPIO */ - codec_dev = bus_find_device_by_name(&i2c_bus_type, NULL, codec_name); + codec_dev = acpi_get_first_physical_node(adev); if (!codec_dev) return -EPROBE_DEFER; + priv->codec_dev = get_device(codec_dev);
if (quirk & BYT_CHT_ES8316_JD_INVERTED) props[cnt++] = PROPERTY_ENTRY_BOOL("everest,jack-detect-inverted"); @@ -570,7 +571,6 @@ static int snd_byt_cht_es8316_mc_probe(struct platform_device *pdev) gpiod_get_index(codec_dev, "speaker-enable", 0, /* see comment in byt_cht_es8316_resume */ GPIOD_OUT_LOW | GPIOD_FLAGS_BIT_NONEXCLUSIVE); - priv->codec_dev = codec_dev;
if (IS_ERR(priv->speaker_en_gpio)) { ret = PTR_ERR(priv->speaker_en_gpio); diff --git a/sound/soc/intel/boards/bytcr_rt5640.c b/sound/soc/intel/boards/bytcr_rt5640.c index b531c348fb697..f5b1b3b876980 100644 --- a/sound/soc/intel/boards/bytcr_rt5640.c +++ b/sound/soc/intel/boards/bytcr_rt5640.c @@ -1425,11 +1425,10 @@ static int snd_byt_rt5640_mc_probe(struct platform_device *pdev) byt_rt5640_quirk = quirk_override; }
- codec_dev = bus_find_device_by_name(&i2c_bus_type, NULL, byt_rt5640_codec_name); + codec_dev = acpi_get_first_physical_node(adev); if (!codec_dev) return -EPROBE_DEFER; - - priv->codec_dev = codec_dev; + priv->codec_dev = get_device(codec_dev);
/* Must be called before register_card, also see declaration comment. */ ret_val = byt_rt5640_add_codec_device_props(codec_dev, priv); diff --git a/sound/soc/intel/boards/bytcr_rt5651.c b/sound/soc/intel/boards/bytcr_rt5651.c index acea83e814acb..472f6e3327960 100644 --- a/sound/soc/intel/boards/bytcr_rt5651.c +++ b/sound/soc/intel/boards/bytcr_rt5651.c @@ -926,10 +926,10 @@ static int snd_byt_rt5651_mc_probe(struct platform_device *pdev) return -ENODEV; }
- codec_dev = bus_find_device_by_name(&i2c_bus_type, NULL, - byt_rt5651_codec_name); + codec_dev = acpi_get_first_physical_node(adev); if (!codec_dev) return -EPROBE_DEFER; + priv->codec_dev = get_device(codec_dev);
/* * swap SSP0 if bytcr is detected @@ -996,8 +996,6 @@ static int snd_byt_rt5651_mc_probe(struct platform_device *pdev) byt_rt5651_quirk = quirk_override; }
- priv->codec_dev = codec_dev; - /* Must be called before register_card, also see declaration comment. */ ret_val = byt_rt5651_add_codec_device_props(codec_dev); if (ret_val)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Shevchenko andriy.shevchenko@linux.intel.com
[ Upstream commit 721858823d7cdc8f2a897579b040e935989f6f02 ]
Theoretically the device might gone if its reference count drops to 0. This might be the case when we try to find the first physical node of the ACPI device. We need to keep reference to it until we get a result of the above mentioned call. Refactor the code to drop the reference count at the correct place.
While at it, move to acpi_dev_put() as symmetrical call to the acpi_dev_get_first_match_dev().
Fixes: 02c0a3b3047f ("ASoC: Intel: bytcr_rt5651: add MCLK, quirks and cleanups") Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Acked-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Link: https://lore.kernel.org/r/20230112112852.67714-3-andriy.shevchenko@linux.int... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/intel/boards/bytcr_rt5651.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/soc/intel/boards/bytcr_rt5651.c b/sound/soc/intel/boards/bytcr_rt5651.c index 472f6e3327960..a8289f74463e9 100644 --- a/sound/soc/intel/boards/bytcr_rt5651.c +++ b/sound/soc/intel/boards/bytcr_rt5651.c @@ -919,7 +919,6 @@ static int snd_byt_rt5651_mc_probe(struct platform_device *pdev) if (adev) { snprintf(byt_rt5651_codec_name, sizeof(byt_rt5651_codec_name), "i2c-%s", acpi_dev_name(adev)); - put_device(&adev->dev); byt_rt5651_dais[dai_index].codecs->name = byt_rt5651_codec_name; } else { dev_err(&pdev->dev, "Error cannot find '%s' dev\n", mach->id); @@ -927,6 +926,7 @@ static int snd_byt_rt5651_mc_probe(struct platform_device *pdev) }
codec_dev = acpi_get_first_physical_node(adev); + acpi_dev_put(adev); if (!codec_dev) return -EPROBE_DEFER; priv->codec_dev = get_device(codec_dev);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Borislav Petkov bp@suse.de
[ Upstream commit e87f4152e542610d0b4c6c8548964a68a59d2040 ]
Force-inline two stack helpers to fix the following objtool warnings:
vmlinux.o: warning: objtool: in_task_stack()+0xc: call to task_stack_page() leaves .noinstr.text section vmlinux.o: warning: objtool: in_entry_stack()+0x10: call to cpu_entry_stack() leaves .noinstr.text section
Signed-off-by: Borislav Petkov bp@suse.de Acked-by: Peter Zijlstra (Intel) peterz@infradead.org Link: https://lore.kernel.org/r/20220324183607.31717-2-bp@alien8.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/include/asm/cpu_entry_area.h | 2 +- include/linux/sched/task_stack.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/cpu_entry_area.h b/arch/x86/include/asm/cpu_entry_area.h index dd5ea1bdf04c5..75efc4c6f0766 100644 --- a/arch/x86/include/asm/cpu_entry_area.h +++ b/arch/x86/include/asm/cpu_entry_area.h @@ -143,7 +143,7 @@ extern void cea_set_pte(void *cea_vaddr, phys_addr_t pa, pgprot_t flags);
extern struct cpu_entry_area *get_cpu_entry_area(int cpu);
-static inline struct entry_stack *cpu_entry_stack(int cpu) +static __always_inline struct entry_stack *cpu_entry_stack(int cpu) { return &get_cpu_entry_area(cpu)->entry_stack_page.stack; } diff --git a/include/linux/sched/task_stack.h b/include/linux/sched/task_stack.h index f24575942dabe..879a5c8f930b6 100644 --- a/include/linux/sched/task_stack.h +++ b/include/linux/sched/task_stack.h @@ -16,7 +16,7 @@ * try_get_task_stack() instead. task_stack_page will return a pointer * that could get freed out from under you. */ -static inline void *task_stack_page(const struct task_struct *task) +static __always_inline void *task_stack_page(const struct task_struct *task) { return task->stack; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josef Bacik josef@toxicpanda.com
[ Upstream commit 899b7f69f244e539ea5df1b4d756046337de44a5 ]
We're seeing a weird problem in production where we have overlapping extent items in the extent tree. It's unclear where these are coming from, and in debugging we realized there's no check in the tree checker for this sort of problem. Add a check to the tree-checker to make sure that the extents do not overlap each other.
Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Josef Bacik josef@toxicpanda.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/tree-checker.c | 25 +++++++++++++++++++++++-- 1 file changed, 23 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c index c0eda3816f685..5b952f69bc1f6 100644 --- a/fs/btrfs/tree-checker.c +++ b/fs/btrfs/tree-checker.c @@ -1189,7 +1189,8 @@ static void extent_err(const struct extent_buffer *eb, int slot, }
static int check_extent_item(struct extent_buffer *leaf, - struct btrfs_key *key, int slot) + struct btrfs_key *key, int slot, + struct btrfs_key *prev_key) { struct btrfs_fs_info *fs_info = leaf->fs_info; struct btrfs_extent_item *ei; @@ -1400,6 +1401,26 @@ static int check_extent_item(struct extent_buffer *leaf, total_refs, inline_refs); return -EUCLEAN; } + + if ((prev_key->type == BTRFS_EXTENT_ITEM_KEY) || + (prev_key->type == BTRFS_METADATA_ITEM_KEY)) { + u64 prev_end = prev_key->objectid; + + if (prev_key->type == BTRFS_METADATA_ITEM_KEY) + prev_end += fs_info->nodesize; + else + prev_end += prev_key->offset; + + if (unlikely(prev_end > key->objectid)) { + extent_err(leaf, slot, + "previous extent [%llu %u %llu] overlaps current extent [%llu %u %llu]", + prev_key->objectid, prev_key->type, + prev_key->offset, key->objectid, key->type, + key->offset); + return -EUCLEAN; + } + } + return 0; }
@@ -1568,7 +1589,7 @@ static int check_leaf_item(struct extent_buffer *leaf, break; case BTRFS_EXTENT_ITEM_KEY: case BTRFS_METADATA_ITEM_KEY: - ret = check_extent_item(leaf, key, slot); + ret = check_extent_item(leaf, key, slot, prev_key); break; case BTRFS_TREE_BLOCK_REF_KEY: case BTRFS_SHARED_DATA_REF_KEY:
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marcos Paulo de Souza mpdesouza@suse.com
[ Upstream commit a7d1c5dc8632e9b370ad26478c468d4e4e29f263 ]
btrfs_search_slot is called in multiple places in dir-item.c to search for a dir entry, and then calling btrfs_match_dir_name to return a btrfs_dir_item.
In order to reduce the number of callers of btrfs_search_slot, create a common function that looks for the dir key, and if found call btrfs_match_dir_item_name.
Signed-off-by: Marcos Paulo de Souza mpdesouza@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Stable-dep-of: 8dcbc26194eb ("btrfs: unify lookup return value when dir entry is missing") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/dir-item.c | 76 +++++++++++++++++++++++---------------------- 1 file changed, 39 insertions(+), 37 deletions(-)
diff --git a/fs/btrfs/dir-item.c b/fs/btrfs/dir-item.c index 863367c2c6205..1c0a7cd6b9b0a 100644 --- a/fs/btrfs/dir-item.c +++ b/fs/btrfs/dir-item.c @@ -171,6 +171,25 @@ int btrfs_insert_dir_item(struct btrfs_trans_handle *trans, const char *name, return 0; }
+static struct btrfs_dir_item *btrfs_lookup_match_dir( + struct btrfs_trans_handle *trans, + struct btrfs_root *root, struct btrfs_path *path, + struct btrfs_key *key, const char *name, + int name_len, int mod) +{ + const int ins_len = (mod < 0 ? -1 : 0); + const int cow = (mod != 0); + int ret; + + ret = btrfs_search_slot(trans, root, key, path, ins_len, cow); + if (ret < 0) + return ERR_PTR(ret); + if (ret > 0) + return ERR_PTR(-ENOENT); + + return btrfs_match_dir_item_name(root->fs_info, path, name, name_len); +} + /* * lookup a directory item based on name. 'dir' is the objectid * we're searching in, and 'mod' tells us if you plan on deleting the @@ -182,23 +201,18 @@ struct btrfs_dir_item *btrfs_lookup_dir_item(struct btrfs_trans_handle *trans, const char *name, int name_len, int mod) { - int ret; struct btrfs_key key; - int ins_len = mod < 0 ? -1 : 0; - int cow = mod != 0; + struct btrfs_dir_item *di;
key.objectid = dir; key.type = BTRFS_DIR_ITEM_KEY; - key.offset = btrfs_name_hash(name, name_len);
- ret = btrfs_search_slot(trans, root, &key, path, ins_len, cow); - if (ret < 0) - return ERR_PTR(ret); - if (ret > 0) + di = btrfs_lookup_match_dir(trans, root, path, &key, name, name_len, mod); + if (IS_ERR(di) && PTR_ERR(di) == -ENOENT) return NULL;
- return btrfs_match_dir_item_name(root->fs_info, path, name, name_len); + return di; }
int btrfs_check_dir_item_collision(struct btrfs_root *root, u64 dir, @@ -212,7 +226,6 @@ int btrfs_check_dir_item_collision(struct btrfs_root *root, u64 dir, int slot; struct btrfs_path *path;
- path = btrfs_alloc_path(); if (!path) return -ENOMEM; @@ -221,20 +234,20 @@ int btrfs_check_dir_item_collision(struct btrfs_root *root, u64 dir, key.type = BTRFS_DIR_ITEM_KEY; key.offset = btrfs_name_hash(name, name_len);
- ret = btrfs_search_slot(NULL, root, &key, path, 0, 0); - - /* return back any errors */ - if (ret < 0) - goto out; + di = btrfs_lookup_match_dir(NULL, root, path, &key, name, name_len, 0); + if (IS_ERR(di)) { + ret = PTR_ERR(di); + /* Nothing found, we're safe */ + if (ret == -ENOENT) { + ret = 0; + goto out; + }
- /* nothing found, we're safe */ - if (ret > 0) { - ret = 0; - goto out; + if (ret < 0) + goto out; }
/* we found an item, look for our name in the item */ - di = btrfs_match_dir_item_name(root->fs_info, path, name, name_len); if (di) { /* our exact name was found */ ret = -EEXIST; @@ -275,21 +288,13 @@ btrfs_lookup_dir_index_item(struct btrfs_trans_handle *trans, u64 objectid, const char *name, int name_len, int mod) { - int ret; struct btrfs_key key; - int ins_len = mod < 0 ? -1 : 0; - int cow = mod != 0;
key.objectid = dir; key.type = BTRFS_DIR_INDEX_KEY; key.offset = objectid;
- ret = btrfs_search_slot(trans, root, &key, path, ins_len, cow); - if (ret < 0) - return ERR_PTR(ret); - if (ret > 0) - return ERR_PTR(-ENOENT); - return btrfs_match_dir_item_name(root->fs_info, path, name, name_len); + return btrfs_lookup_match_dir(trans, root, path, &key, name, name_len, mod); }
struct btrfs_dir_item * @@ -346,21 +351,18 @@ struct btrfs_dir_item *btrfs_lookup_xattr(struct btrfs_trans_handle *trans, const char *name, u16 name_len, int mod) { - int ret; struct btrfs_key key; - int ins_len = mod < 0 ? -1 : 0; - int cow = mod != 0; + struct btrfs_dir_item *di;
key.objectid = dir; key.type = BTRFS_XATTR_ITEM_KEY; key.offset = btrfs_name_hash(name, name_len); - ret = btrfs_search_slot(trans, root, &key, path, ins_len, cow); - if (ret < 0) - return ERR_PTR(ret); - if (ret > 0) + + di = btrfs_lookup_match_dir(trans, root, path, &key, name, name_len, mod); + if (IS_ERR(di) && PTR_ERR(di) == -ENOENT) return NULL;
- return btrfs_match_dir_item_name(root->fs_info, path, name, name_len); + return di; }
/*
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana fdmanana@suse.com
[ Upstream commit 8dcbc26194eb872cc3430550fb70bb461424d267 ]
btrfs_lookup_dir_index_item() and btrfs_lookup_dir_item() lookup for dir entries and both are used during log replay or when updating a log tree during an unlink.
However when the dir item does not exists, btrfs_lookup_dir_item() returns NULL while btrfs_lookup_dir_index_item() returns PTR_ERR(-ENOENT), and if the dir item exists but there is no matching entry for a given name or index, both return NULL. This makes the call sites during log replay to be more verbose than necessary and it makes it easy to miss this slight difference. Since we don't need to distinguish between those two cases, make btrfs_lookup_dir_index_item() always return NULL when there is no matching directory entry - either because there isn't any dir entry or because there is one but it does not match the given name and index.
Also rename the argument 'objectid' of btrfs_lookup_dir_index_item() to 'index' since it is supposed to match an index number, and the name 'objectid' is not very good because it can easily be confused with an inode number (like the inode number a dir entry points to).
CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/ctree.h | 2 +- fs/btrfs/dir-item.c | 48 ++++++++++++++++++++++++++++++++++----------- fs/btrfs/tree-log.c | 14 ++++--------- 3 files changed, 42 insertions(+), 22 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 67831868ef0de..3ddb09f2b1685 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -2879,7 +2879,7 @@ struct btrfs_dir_item * btrfs_lookup_dir_index_item(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct btrfs_path *path, u64 dir, - u64 objectid, const char *name, int name_len, + u64 index, const char *name, int name_len, int mod); struct btrfs_dir_item * btrfs_search_dir_index_item(struct btrfs_root *root, diff --git a/fs/btrfs/dir-item.c b/fs/btrfs/dir-item.c index 1c0a7cd6b9b0a..98c6faa8ce15b 100644 --- a/fs/btrfs/dir-item.c +++ b/fs/btrfs/dir-item.c @@ -191,9 +191,20 @@ static struct btrfs_dir_item *btrfs_lookup_match_dir( }
/* - * lookup a directory item based on name. 'dir' is the objectid - * we're searching in, and 'mod' tells us if you plan on deleting the - * item (use mod < 0) or changing the options (use mod > 0) + * Lookup for a directory item by name. + * + * @trans: The transaction handle to use. Can be NULL if @mod is 0. + * @root: The root of the target tree. + * @path: Path to use for the search. + * @dir: The inode number (objectid) of the directory. + * @name: The name associated to the directory entry we are looking for. + * @name_len: The length of the name. + * @mod: Used to indicate if the tree search is meant for a read only + * lookup, for a modification lookup or for a deletion lookup, so + * its value should be 0, 1 or -1, respectively. + * + * Returns: NULL if the dir item does not exists, an error pointer if an error + * happened, or a pointer to a dir item if a dir item exists for the given name. */ struct btrfs_dir_item *btrfs_lookup_dir_item(struct btrfs_trans_handle *trans, struct btrfs_root *root, @@ -274,27 +285,42 @@ int btrfs_check_dir_item_collision(struct btrfs_root *root, u64 dir, }
/* - * lookup a directory item based on index. 'dir' is the objectid - * we're searching in, and 'mod' tells us if you plan on deleting the - * item (use mod < 0) or changing the options (use mod > 0) + * Lookup for a directory index item by name and index number. * - * The name is used to make sure the index really points to the name you were - * looking for. + * @trans: The transaction handle to use. Can be NULL if @mod is 0. + * @root: The root of the target tree. + * @path: Path to use for the search. + * @dir: The inode number (objectid) of the directory. + * @index: The index number. + * @name: The name associated to the directory entry we are looking for. + * @name_len: The length of the name. + * @mod: Used to indicate if the tree search is meant for a read only + * lookup, for a modification lookup or for a deletion lookup, so + * its value should be 0, 1 or -1, respectively. + * + * Returns: NULL if the dir index item does not exists, an error pointer if an + * error happened, or a pointer to a dir item if the dir index item exists and + * matches the criteria (name and index number). */ struct btrfs_dir_item * btrfs_lookup_dir_index_item(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct btrfs_path *path, u64 dir, - u64 objectid, const char *name, int name_len, + u64 index, const char *name, int name_len, int mod) { + struct btrfs_dir_item *di; struct btrfs_key key;
key.objectid = dir; key.type = BTRFS_DIR_INDEX_KEY; - key.offset = objectid; + key.offset = index;
- return btrfs_lookup_match_dir(trans, root, path, &key, name, name_len, mod); + di = btrfs_lookup_match_dir(trans, root, path, &key, name, name_len, mod); + if (di == ERR_PTR(-ENOENT)) + return NULL; + + return di; }
struct btrfs_dir_item * diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 10a0913ffb492..34e9eb5010cda 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -912,8 +912,7 @@ static noinline int inode_in_dir(struct btrfs_root *root, di = btrfs_lookup_dir_index_item(NULL, root, path, dirid, index, name, name_len, 0); if (IS_ERR(di)) { - if (PTR_ERR(di) != -ENOENT) - ret = PTR_ERR(di); + ret = PTR_ERR(di); goto out; } else if (di) { btrfs_dir_item_key_to_cpu(path->nodes[0], di, &location); @@ -1149,8 +1148,7 @@ static inline int __add_inode_ref(struct btrfs_trans_handle *trans, di = btrfs_lookup_dir_index_item(trans, root, path, btrfs_ino(dir), ref_index, name, namelen, 0); if (IS_ERR(di)) { - if (PTR_ERR(di) != -ENOENT) - return PTR_ERR(di); + return PTR_ERR(di); } else if (di) { ret = drop_one_dir_item(trans, root, path, dir, di); if (ret) @@ -1976,9 +1974,6 @@ static noinline int replay_one_name(struct btrfs_trans_handle *trans, goto out; }
- if (dst_di == ERR_PTR(-ENOENT)) - dst_di = NULL; - if (IS_ERR(dst_di)) { ret = PTR_ERR(dst_di); goto out; @@ -2286,7 +2281,7 @@ static noinline int check_item_in_log(struct btrfs_trans_handle *trans, dir_key->offset, name, name_len, 0); } - if (!log_di || log_di == ERR_PTR(-ENOENT)) { + if (!log_di) { btrfs_dir_item_key_to_cpu(eb, di, &location); btrfs_release_path(path); btrfs_release_path(log_path); @@ -3495,8 +3490,7 @@ int btrfs_del_dir_entries_in_log(struct btrfs_trans_handle *trans, if (err == -ENOSPC) { btrfs_set_log_full_commit(trans); err = 0; - } else if (err < 0 && err != -ENOENT) { - /* ENOENT can be returned if the entry hasn't been fsynced yet */ + } else if (err < 0) { btrfs_abort_transaction(trans, err); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana fdmanana@suse.com
[ Upstream commit bd54f381a12ac695593271a663d36d14220215b2 ]
During renames we pin the logs of the roots a bit too early, before the calls to btrfs_insert_inode_ref(). We can pin the logs after those calls, since those will not change anything in a log tree.
In a scenario where we have multiple and diverse filesystem operations running in parallel, those calls can take a significant amount of time, due to lock contention on extent buffers, and delay log commits from other tasks for longer than necessary.
So just pin logs after calls to btrfs_insert_inode_ref() and right before the first operation that can update a log tree.
The following script that uses dbench was used for testing:
$ cat dbench-test.sh #!/bin/bash
DEV=/dev/nvme0n1 MNT=/mnt/nvme0n1 MOUNT_OPTIONS="-o ssd" MKFS_OPTIONS="-m single -d single"
echo "performance" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
umount $DEV &> /dev/null mkfs.btrfs -f $MKFS_OPTIONS $DEV mount $MOUNT_OPTIONS $DEV $MNT
dbench -D $MNT -t 120 16
umount $MNT
The tests were run on a machine with 12 cores, 64G of RAN, a NVMe device and using a non-debug kernel config (Debian's default config).
The results compare a branch without this patch and without the previous patch in the series, that has the subject:
"btrfs: eliminate some false positives when checking if inode was logged"
Versus the same branch with these two patches applied.
dbench with 8 clients, results before:
Operation Count AvgLat MaxLat ---------------------------------------- NTCreateX 4391359 0.009 249.745 Close 3225882 0.001 3.243 Rename 185953 0.065 240.643 Unlink 886669 0.049 249.906 Deltree 112 2.455 217.433 Mkdir 56 0.002 0.004 Qpathinfo 3980281 0.004 3.109 Qfileinfo 697579 0.001 0.187 Qfsinfo 729780 0.002 2.424 Sfileinfo 357764 0.004 1.415 Find 1538861 0.016 4.863 WriteX 2189666 0.010 3.327 ReadX 6883443 0.002 0.729 LockX 14298 0.002 0.073 UnlockX 14298 0.001 0.042 Flush 307777 2.447 303.663
Throughput 1149.6 MB/sec 8 clients 8 procs max_latency=303.666 ms
dbench with 8 clients, results after:
Operation Count AvgLat MaxLat ---------------------------------------- NTCreateX 4269920 0.009 213.532 Close 3136653 0.001 0.690 Rename 180805 0.082 213.858 Unlink 862189 0.050 172.893 Deltree 112 2.998 218.328 Mkdir 56 0.002 0.003 Qpathinfo 3870158 0.004 5.072 Qfileinfo 678375 0.001 0.194 Qfsinfo 709604 0.002 0.485 Sfileinfo 347850 0.004 1.304 Find 1496310 0.017 5.504 WriteX 2129613 0.010 2.882 ReadX 6693066 0.002 1.517 LockX 13902 0.002 0.075 UnlockX 13902 0.001 0.055 Flush 299276 2.511 220.189
Throughput 1187.33 MB/sec 8 clients 8 procs max_latency=220.194 ms
+3.2% throughput, -31.8% max latency
dbench with 16 clients, results before:
Operation Count AvgLat MaxLat ---------------------------------------- NTCreateX 5978334 0.028 156.507 Close 4391598 0.001 1.345 Rename 253136 0.241 155.057 Unlink 1207220 0.182 257.344 Deltree 160 6.123 36.277 Mkdir 80 0.003 0.005 Qpathinfo 5418817 0.012 6.867 Qfileinfo 949929 0.001 0.941 Qfsinfo 993560 0.002 1.386 Sfileinfo 486904 0.004 2.829 Find 2095088 0.059 8.164 WriteX 2982319 0.017 9.029 ReadX 9371484 0.002 4.052 LockX 19470 0.002 0.461 UnlockX 19470 0.001 0.990 Flush 418936 2.740 347.902
Throughput 1495.31 MB/sec 16 clients 16 procs max_latency=347.909 ms
dbench with 16 clients, results after:
Operation Count AvgLat MaxLat ---------------------------------------- NTCreateX 5711833 0.029 131.240 Close 4195897 0.001 1.732 Rename 241849 0.204 147.831 Unlink 1153341 0.184 231.322 Deltree 160 6.086 30.198 Mkdir 80 0.003 0.021 Qpathinfo 5177011 0.012 7.150 Qfileinfo 907768 0.001 0.793 Qfsinfo 949205 0.002 1.431 Sfileinfo 465317 0.004 2.454 Find 2001541 0.058 7.819 WriteX 2850661 0.017 9.110 ReadX 8952289 0.002 3.991 LockX 18596 0.002 0.655 UnlockX 18596 0.001 0.179 Flush 400342 2.879 293.607
Throughput 1565.73 MB/sec 16 clients 16 procs max_latency=293.611 ms
+4.6% throughput, -16.9% max latency
Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/inode.c | 48 ++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 42 insertions(+), 6 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 250b6064876de..591caac2bf814 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8968,8 +8968,6 @@ static int btrfs_rename_exchange(struct inode *old_dir, /* force full log commit if subvolume involved. */ btrfs_set_log_full_commit(trans); } else { - btrfs_pin_log_trans(root); - root_log_pinned = true; ret = btrfs_insert_inode_ref(trans, dest, new_dentry->d_name.name, new_dentry->d_name.len, @@ -8986,8 +8984,6 @@ static int btrfs_rename_exchange(struct inode *old_dir, /* force full log commit if subvolume involved. */ btrfs_set_log_full_commit(trans); } else { - btrfs_pin_log_trans(dest); - dest_log_pinned = true; ret = btrfs_insert_inode_ref(trans, root, old_dentry->d_name.name, old_dentry->d_name.len, @@ -9018,6 +9014,29 @@ static int btrfs_rename_exchange(struct inode *old_dir, BTRFS_I(new_inode), 1); }
+ /* + * Now pin the logs of the roots. We do it to ensure that no other task + * can sync the logs while we are in progress with the rename, because + * that could result in an inconsistency in case any of the inodes that + * are part of this rename operation were logged before. + * + * We pin the logs even if at this precise moment none of the inodes was + * logged before. This is because right after we checked for that, some + * other task fsyncing some other inode not involved with this rename + * operation could log that one of our inodes exists. + * + * We don't need to pin the logs before the above calls to + * btrfs_insert_inode_ref(), since those don't ever need to change a log. + */ + if (old_ino != BTRFS_FIRST_FREE_OBJECTID) { + btrfs_pin_log_trans(root); + root_log_pinned = true; + } + if (new_ino != BTRFS_FIRST_FREE_OBJECTID) { + btrfs_pin_log_trans(dest); + dest_log_pinned = true; + } + /* src is a subvolume */ if (old_ino == BTRFS_FIRST_FREE_OBJECTID) { ret = btrfs_unlink_subvol(trans, old_dir, old_dentry); @@ -9267,8 +9286,6 @@ static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry, /* force full log commit if subvolume involved. */ btrfs_set_log_full_commit(trans); } else { - btrfs_pin_log_trans(root); - log_pinned = true; ret = btrfs_insert_inode_ref(trans, dest, new_dentry->d_name.name, new_dentry->d_name.len, @@ -9292,6 +9309,25 @@ static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry, if (unlikely(old_ino == BTRFS_FIRST_FREE_OBJECTID)) { ret = btrfs_unlink_subvol(trans, old_dir, old_dentry); } else { + /* + * Now pin the log. We do it to ensure that no other task can + * sync the log while we are in progress with the rename, as + * that could result in an inconsistency in case any of the + * inodes that are part of this rename operation were logged + * before. + * + * We pin the log even if at this precise moment none of the + * inodes was logged before. This is because right after we + * checked for that, some other task fsyncing some other inode + * not involved with this rename operation could log that one of + * our inodes exists. + * + * We don't need to pin the logs before the above call to + * btrfs_insert_inode_ref(), since that does not need to change + * a log. + */ + btrfs_pin_log_trans(root); + log_pinned = true; ret = __btrfs_unlink_inode(trans, root, BTRFS_I(old_dir), BTRFS_I(d_inode(old_dentry)), old_dentry->d_name.name,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sergej Bauer sbauer@blackbox.su
[ Upstream commit e9e13b6adc338be1eb88db87bcb392696144bd02 ]
This is the 3rd revision of the patch fix for potential null pointer dereference with lan743x card.
The simpliest way to reproduce: boot with bare lan743x and issue "ethtool ethN" commant where ethN is the interface with lan743x card. Example:
$ sudo ethtool eth7 dmesg: [ 103.510336] BUG: kernel NULL pointer dereference, address: 0000000000000340 ... [ 103.510836] RIP: 0010:phy_ethtool_get_wol+0x5/0x30 [libphy] ... [ 103.511629] Call Trace: [ 103.511666] lan743x_ethtool_get_wol+0x21/0x40 [lan743x] [ 103.511724] dev_ethtool+0x1507/0x29d0 [ 103.511769] ? avc_has_extended_perms+0x17f/0x440 [ 103.511820] ? tomoyo_init_request_info+0x84/0x90 [ 103.511870] ? tomoyo_path_number_perm+0x68/0x1e0 [ 103.511919] ? tty_insert_flip_string_fixed_flag+0x82/0xe0 [ 103.511973] ? inet_ioctl+0x187/0x1d0 [ 103.512016] dev_ioctl+0xb5/0x560 [ 103.512055] sock_do_ioctl+0xa0/0x140 [ 103.512098] sock_ioctl+0x2cb/0x3c0 [ 103.512139] __x64_sys_ioctl+0x84/0xc0 [ 103.512183] do_syscall_64+0x33/0x80 [ 103.512224] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 103.512274] RIP: 0033:0x7f54a9cba427 ...
Previous versions can be found at: v1: initial version https://lkml.org/lkml/2020/10/28/921
v2: do not return from lan743x_ethtool_set_wol if netdev->phydev == NULL, just skip the call of phy_ethtool_set_wol() instead. https://lkml.org/lkml/2020/10/31/380
v3: in function lan743x_ethtool_set_wol: use ternary operator instead of if-else sentence (review by Markus Elfring) return -ENETDOWN insted of -EIO (review by Andrew Lunn)
Signed-off-by: Sergej Bauer sbauer@blackbox.su
Link: https://lore.kernel.org/r/20201101223556.16116-1-sbauer@blackbox.su Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/microchip/lan743x_ethtool.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/microchip/lan743x_ethtool.c b/drivers/net/ethernet/microchip/lan743x_ethtool.c index dcde496da7fb4..c5de8f46cdd35 100644 --- a/drivers/net/ethernet/microchip/lan743x_ethtool.c +++ b/drivers/net/ethernet/microchip/lan743x_ethtool.c @@ -780,7 +780,9 @@ static void lan743x_ethtool_get_wol(struct net_device *netdev,
wol->supported = 0; wol->wolopts = 0; - phy_ethtool_get_wol(netdev->phydev, wol); + + if (netdev->phydev) + phy_ethtool_get_wol(netdev->phydev, wol);
wol->supported |= WAKE_BCAST | WAKE_UCAST | WAKE_MCAST | WAKE_MAGIC | WAKE_PHY | WAKE_ARP; @@ -809,9 +811,8 @@ static int lan743x_ethtool_set_wol(struct net_device *netdev,
device_set_wakeup_enable(&adapter->pdev->dev, (bool)wol->wolopts);
- phy_ethtool_set_wol(netdev->phydev, wol); - - return 0; + return netdev->phydev ? phy_ethtool_set_wol(netdev->phydev, wol) + : -ENETDOWN; } #endif /* CONFIG_PM */
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Max Verevkin me@maxverevkin.tk
[ Upstream commit 07b211992d6c0d80b321403244d43bbd2d6cf48c ]
The Pavilion 13 x360 PC has a chassis-type which does not indicate it is a convertible, while it is actually a convertible. Add it to the dmi_switches_allow_list.
Signed-off-by: Max Verevkin me@maxverevkin.tk Link: https://lore.kernel.org/r/20201124131652.11165-1-me@maxverevkin.tk Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/platform/x86/intel-vbtn.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/platform/x86/intel-vbtn.c b/drivers/platform/x86/intel-vbtn.c index a90c32d072da3..9c8a6722f115e 100644 --- a/drivers/platform/x86/intel-vbtn.c +++ b/drivers/platform/x86/intel-vbtn.c @@ -230,6 +230,12 @@ static const struct dmi_system_id dmi_switches_allow_list[] = { DMI_MATCH(DMI_PRODUCT_NAME, "Inspiron 7352"), }, }, + { + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Hewlett-Packard"), + DMI_MATCH(DMI_PRODUCT_NAME, "HP Pavilion 13 x360 PC"), + }, + }, {} /* Array terminator */ };
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Berg johannes.berg@intel.com
[ Upstream commit 2f7a04c7b03b7fd63b7618e29295fc25732faac1 ]
We're currently doing accounting on the queue sync with an atomic variable that counts down the number of remaining notifications that we still need.
As we've been hitting issues in this area, modify this to track a bitmap of queues, not just the number of queues, and print out the remaining bitmap in the warning.
Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Luca Coelho luciano.coelho@intel.com Link: https://lore.kernel.org/r/iwlwifi.20201209231352.0a3fa177cd6b.I7c69ff9994193... Signed-off-by: Luca Coelho luciano.coelho@intel.com Stable-dep-of: 5f8a3561ea8b ("iwlwifi: mvm: write queue_sync_state only for sync") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c | 11 ++++++----- drivers/net/wireless/intel/iwlwifi/mvm/mvm.h | 2 +- drivers/net/wireless/intel/iwlwifi/mvm/ops.c | 2 +- drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c | 10 +++++++--- 4 files changed, 15 insertions(+), 10 deletions(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c index d2c6fdb702732..f2096729ac5ac 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c @@ -5155,8 +5155,7 @@ void iwl_mvm_sync_rx_queues_internal(struct iwl_mvm *mvm,
if (notif->sync) { notif->cookie = mvm->queue_sync_cookie; - atomic_set(&mvm->queue_sync_counter, - mvm->trans->num_rx_queues); + mvm->queue_sync_state = (1 << mvm->trans->num_rx_queues) - 1; }
ret = iwl_mvm_notify_rx_queue(mvm, qmask, (u8 *)notif, @@ -5169,14 +5168,16 @@ void iwl_mvm_sync_rx_queues_internal(struct iwl_mvm *mvm, if (notif->sync) { lockdep_assert_held(&mvm->mutex); ret = wait_event_timeout(mvm->rx_sync_waitq, - atomic_read(&mvm->queue_sync_counter) == 0 || + READ_ONCE(mvm->queue_sync_state) == 0 || iwl_mvm_is_radio_killed(mvm), HZ); - WARN_ON_ONCE(!ret && !iwl_mvm_is_radio_killed(mvm)); + WARN_ONCE(!ret && !iwl_mvm_is_radio_killed(mvm), + "queue sync: failed to sync, state is 0x%lx\n", + mvm->queue_sync_state); }
out: - atomic_set(&mvm->queue_sync_counter, 0); + mvm->queue_sync_state = 0; if (notif->sync) mvm->queue_sync_cookie++; } diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h b/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h index 64f5a4cb3d3ac..8b779c3a92d43 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h +++ b/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h @@ -842,7 +842,7 @@ struct iwl_mvm { unsigned long status;
u32 queue_sync_cookie; - atomic_t queue_sync_counter; + unsigned long queue_sync_state; /* * for beacon filtering - * currently only one interface can be supported diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/ops.c b/drivers/net/wireless/intel/iwlwifi/mvm/ops.c index 5b173f21e87bf..3548eb57f1f30 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/ops.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/ops.c @@ -725,7 +725,7 @@ iwl_op_mode_mvm_start(struct iwl_trans *trans, const struct iwl_cfg *cfg,
init_waitqueue_head(&mvm->rx_sync_waitq);
- atomic_set(&mvm->queue_sync_counter, 0); + mvm->queue_sync_state = 0;
SET_IEEE80211_DEV(mvm->hw, mvm->trans->dev);
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c b/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c index 86b3fb321dfdd..e2a39e8b98d07 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c @@ -853,9 +853,13 @@ void iwl_mvm_rx_queue_notif(struct iwl_mvm *mvm, struct napi_struct *napi, WARN_ONCE(1, "Invalid identifier %d", internal_notif->type); }
- if (internal_notif->sync && - !atomic_dec_return(&mvm->queue_sync_counter)) - wake_up(&mvm->rx_sync_waitq); + if (internal_notif->sync) { + WARN_ONCE(!test_and_clear_bit(queue, &mvm->queue_sync_state), + "queue sync: queue %d responded a second time!\n", + queue); + if (READ_ONCE(mvm->queue_sync_state) == 0) + wake_up(&mvm->rx_sync_waitq); + } }
static void iwl_mvm_oldsn_workaround(struct iwl_mvm *mvm,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Berg johannes.berg@intel.com
[ Upstream commit 5f8a3561ea8bf75ad52cb16dafe69dd550fa542e ]
We use mvm->queue_sync_state to wait for synchronous queue sync messages, but if an async one happens inbetween we shouldn't clear mvm->queue_sync_state after sending the async one, that can run concurrently (at least from the CPU POV) with another synchronous queue sync.
Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Luca Coelho luciano.coelho@intel.com Link: https://lore.kernel.org/r/iwlwifi.20210331121101.d11c9bcdb4aa.I0772171dbaec8... Signed-off-by: Luca Coelho luciano.coelho@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c index f2096729ac5ac..08008b0c0637c 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c @@ -5177,9 +5177,10 @@ void iwl_mvm_sync_rx_queues_internal(struct iwl_mvm *mvm, }
out: - mvm->queue_sync_state = 0; - if (notif->sync) + if (notif->sync) { + mvm->queue_sync_state = 0; mvm->queue_sync_cookie++; + } }
static void iwl_mvm_sync_rx_queues(struct ieee80211_hw *hw)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhang Yi yi.zhang@huawei.com
[ Upstream commit 214eb5a4d8a2032fb9f0711d1b202eb88ee02920 ]
Now that __jbd2_journal_remove_checkpoint() can detect buffer io error and mark journal checkpoint error, then we abort the journal later before updating log tail to ensure the filesystem works consistently. So we could remove other redundant buffer io error checkes.
Signed-off-by: Zhang Yi yi.zhang@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20210610112440.3438139-5-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Stable-dep-of: e34c8dd238d0 ("jbd2: Fix wrongly judgement for buffer head removing while doing checkpoint") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/jbd2/checkpoint.c | 13 ++----------- 1 file changed, 2 insertions(+), 11 deletions(-)
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c index 472932b9e6bca..ea08adcea84c3 100644 --- a/fs/jbd2/checkpoint.c +++ b/fs/jbd2/checkpoint.c @@ -91,8 +91,7 @@ static int __try_to_free_cp_buf(struct journal_head *jh) int ret = 0; struct buffer_head *bh = jh2bh(jh);
- if (jh->b_transaction == NULL && !buffer_locked(bh) && - !buffer_dirty(bh) && !buffer_write_io_error(bh)) { + if (!jh->b_transaction && !buffer_locked(bh) && !buffer_dirty(bh)) { JBUFFER_TRACE(jh, "remove from checkpoint list"); ret = __jbd2_journal_remove_checkpoint(jh) + 1; } @@ -228,7 +227,6 @@ int jbd2_log_do_checkpoint(journal_t *journal) * OK, we need to start writing disk blocks. Take one transaction * and write it. */ - result = 0; spin_lock(&journal->j_list_lock); if (!journal->j_checkpoint_transactions) goto out; @@ -295,8 +293,6 @@ int jbd2_log_do_checkpoint(journal_t *journal) goto restart; } if (!buffer_dirty(bh)) { - if (unlikely(buffer_write_io_error(bh)) && !result) - result = -EIO; BUFFER_TRACE(bh, "remove from checkpoint"); if (__jbd2_journal_remove_checkpoint(jh)) /* The transaction was released; we're done */ @@ -356,8 +352,6 @@ int jbd2_log_do_checkpoint(journal_t *journal) spin_lock(&journal->j_list_lock); goto restart2; } - if (unlikely(buffer_write_io_error(bh)) && !result) - result = -EIO;
/* * Now in whatever state the buffer currently is, we @@ -369,10 +363,7 @@ int jbd2_log_do_checkpoint(journal_t *journal) } out: spin_unlock(&journal->j_list_lock); - if (result < 0) - jbd2_journal_abort(journal, result); - else - result = jbd2_cleanup_journal_tail(journal); + result = jbd2_cleanup_journal_tail(journal);
return (result < 0) ? result : 0; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhang Yi yi.zhang@huawei.com
[ Upstream commit c2d6fd9d6f35079f1669f0100f05b46708c74b7f ]
There is a long-standing metadata corruption issue that happens from time to time, but it's very difficult to reproduce and analyse, benefit from the JBD2_CYCLE_RECORD option, we found out that the problem is the checkpointing process miss to write out some buffers which are raced by another do_get_write_access(). Looks below for detail.
jbd2_log_do_checkpoint() //transaction X //buffer A is dirty and not belones to any transaction __buffer_relink_io() //move it to the IO list __flush_batch() write_dirty_buffer() do_get_write_access() clear_buffer_dirty __jbd2_journal_file_buffer() //add buffer A to a new transaction Y lock_buffer(bh) //doesn't write out __jbd2_journal_remove_checkpoint() //finish checkpoint except buffer A //filesystem corrupt if the new transaction Y isn't fully write out.
Due to the t_checkpoint_list walking loop in jbd2_log_do_checkpoint() have already handles waiting for buffers under IO and re-added new transaction to complete commit, and it also removing cleaned buffers, this makes sure the list will eventually get empty. So it's fine to leave buffers on the t_checkpoint_list while flushing out and completely stop using the t_checkpoint_io_list.
Cc: stable@vger.kernel.org Suggested-by: Jan Kara jack@suse.cz Signed-off-by: Zhang Yi yi.zhang@huawei.com Tested-by: Zhihao Cheng chengzhihao1@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20230606135928.434610-2-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o tytso@mit.edu Stable-dep-of: e34c8dd238d0 ("jbd2: Fix wrongly judgement for buffer head removing while doing checkpoint") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/jbd2/checkpoint.c | 102 ++++++++++++------------------------------- 1 file changed, 29 insertions(+), 73 deletions(-)
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c index ea08adcea84c3..aac17c59b7ece 100644 --- a/fs/jbd2/checkpoint.c +++ b/fs/jbd2/checkpoint.c @@ -57,28 +57,6 @@ static inline void __buffer_unlink(struct journal_head *jh) } }
-/* - * Move a buffer from the checkpoint list to the checkpoint io list - * - * Called with j_list_lock held - */ -static inline void __buffer_relink_io(struct journal_head *jh) -{ - transaction_t *transaction = jh->b_cp_transaction; - - __buffer_unlink_first(jh); - - if (!transaction->t_checkpoint_io_list) { - jh->b_cpnext = jh->b_cpprev = jh; - } else { - jh->b_cpnext = transaction->t_checkpoint_io_list; - jh->b_cpprev = transaction->t_checkpoint_io_list->b_cpprev; - jh->b_cpprev->b_cpnext = jh; - jh->b_cpnext->b_cpprev = jh; - } - transaction->t_checkpoint_io_list = jh; -} - /* * Try to release a checkpointed buffer from its transaction. * Returns 1 if we released it and 2 if we also released the @@ -190,6 +168,7 @@ __flush_batch(journal_t *journal, int *batch_count) struct buffer_head *bh = journal->j_chkpt_bhs[i]; BUFFER_TRACE(bh, "brelse"); __brelse(bh); + journal->j_chkpt_bhs[i] = NULL; } *batch_count = 0; } @@ -249,6 +228,11 @@ int jbd2_log_do_checkpoint(journal_t *journal) jh = transaction->t_checkpoint_list; bh = jh2bh(jh);
+ /* + * The buffer may be writing back, or flushing out in the + * last couple of cycles, or re-adding into a new transaction, + * need to check it again until it's unlocked. + */ if (buffer_locked(bh)) { get_bh(bh); spin_unlock(&journal->j_list_lock); @@ -294,28 +278,32 @@ int jbd2_log_do_checkpoint(journal_t *journal) } if (!buffer_dirty(bh)) { BUFFER_TRACE(bh, "remove from checkpoint"); - if (__jbd2_journal_remove_checkpoint(jh)) - /* The transaction was released; we're done */ + /* + * If the transaction was released or the checkpoint + * list was empty, we're done. + */ + if (__jbd2_journal_remove_checkpoint(jh) || + !transaction->t_checkpoint_list) goto out; - continue; + } else { + /* + * We are about to write the buffer, it could be + * raced by some other transaction shrink or buffer + * re-log logic once we release the j_list_lock, + * leave it on the checkpoint list and check status + * again to make sure it's clean. + */ + BUFFER_TRACE(bh, "queue"); + get_bh(bh); + J_ASSERT_BH(bh, !buffer_jwrite(bh)); + journal->j_chkpt_bhs[batch_count++] = bh; + transaction->t_chp_stats.cs_written++; + transaction->t_checkpoint_list = jh->b_cpnext; } - /* - * Important: we are about to write the buffer, and - * possibly block, while still holding the journal - * lock. We cannot afford to let the transaction - * logic start messing around with this buffer before - * we write it to disk, as that would break - * recoverability. - */ - BUFFER_TRACE(bh, "queue"); - get_bh(bh); - J_ASSERT_BH(bh, !buffer_jwrite(bh)); - journal->j_chkpt_bhs[batch_count++] = bh; - __buffer_relink_io(jh); - transaction->t_chp_stats.cs_written++; + if ((batch_count == JBD2_NR_BATCH) || - need_resched() || - spin_needbreak(&journal->j_list_lock)) + need_resched() || spin_needbreak(&journal->j_list_lock) || + jh2bh(transaction->t_checkpoint_list) == journal->j_chkpt_bhs[0]) goto unlock_and_flush; }
@@ -329,38 +317,6 @@ int jbd2_log_do_checkpoint(journal_t *journal) goto restart; }
- /* - * Now we issued all of the transaction's buffers, let's deal - * with the buffers that are out for I/O. - */ -restart2: - /* Did somebody clean up the transaction in the meanwhile? */ - if (journal->j_checkpoint_transactions != transaction || - transaction->t_tid != this_tid) - goto out; - - while (transaction->t_checkpoint_io_list) { - jh = transaction->t_checkpoint_io_list; - bh = jh2bh(jh); - if (buffer_locked(bh)) { - get_bh(bh); - spin_unlock(&journal->j_list_lock); - wait_on_buffer(bh); - /* the journal_head may have gone by now */ - BUFFER_TRACE(bh, "brelse"); - __brelse(bh); - spin_lock(&journal->j_list_lock); - goto restart2; - } - - /* - * Now in whatever state the buffer currently is, we - * know that it has been written out and so we can - * drop it from the list - */ - if (__jbd2_journal_remove_checkpoint(jh)) - break; - } out: spin_unlock(&journal->j_list_lock); result = jbd2_cleanup_journal_tail(journal);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhihao Cheng chengzhihao1@huawei.com
[ Upstream commit e34c8dd238d0c9368b746480f313055f5bab5040 ]
Following process,
jbd2_journal_commit_transaction // there are several dirty buffer heads in transaction->t_checkpoint_list P1 wb_workfn jbd2_log_do_checkpoint if (buffer_locked(bh)) // false __block_write_full_page trylock_buffer(bh) test_clear_buffer_dirty(bh) if (!buffer_dirty(bh)) __jbd2_journal_remove_checkpoint(jh) if (buffer_write_io_error(bh)) // false >> bh IO error occurs << jbd2_cleanup_journal_tail __jbd2_update_log_tail jbd2_write_superblock // The bh won't be replayed in next mount. , which could corrupt the ext4 image, fetch a reproducer in [Link].
Since writeback process clears buffer dirty after locking buffer head, we can fix it by try locking buffer and check dirtiness while buffer is locked, the buffer head can be removed if it is neither dirty nor locked.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490 Fixes: 470decc613ab ("[PATCH] jbd2: initial copy of files from jbd") Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com Signed-off-by: Zhang Yi yi.zhang@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20230606135928.434610-5-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Sasha Levin sashal@kernel.org --- fs/jbd2/checkpoint.c | 32 +++++++++++++++++--------------- 1 file changed, 17 insertions(+), 15 deletions(-)
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c index aac17c59b7ece..7898983c9fba0 100644 --- a/fs/jbd2/checkpoint.c +++ b/fs/jbd2/checkpoint.c @@ -228,20 +228,6 @@ int jbd2_log_do_checkpoint(journal_t *journal) jh = transaction->t_checkpoint_list; bh = jh2bh(jh);
- /* - * The buffer may be writing back, or flushing out in the - * last couple of cycles, or re-adding into a new transaction, - * need to check it again until it's unlocked. - */ - if (buffer_locked(bh)) { - get_bh(bh); - spin_unlock(&journal->j_list_lock); - wait_on_buffer(bh); - /* the journal_head may have gone by now */ - BUFFER_TRACE(bh, "brelse"); - __brelse(bh); - goto retry; - } if (jh->b_transaction != NULL) { transaction_t *t = jh->b_transaction; tid_t tid = t->t_tid; @@ -276,7 +262,22 @@ int jbd2_log_do_checkpoint(journal_t *journal) spin_lock(&journal->j_list_lock); goto restart; } - if (!buffer_dirty(bh)) { + if (!trylock_buffer(bh)) { + /* + * The buffer is locked, it may be writing back, or + * flushing out in the last couple of cycles, or + * re-adding into a new transaction, need to check + * it again until it's unlocked. + */ + get_bh(bh); + spin_unlock(&journal->j_list_lock); + wait_on_buffer(bh); + /* the journal_head may have gone by now */ + BUFFER_TRACE(bh, "brelse"); + __brelse(bh); + goto retry; + } else if (!buffer_dirty(bh)) { + unlock_buffer(bh); BUFFER_TRACE(bh, "remove from checkpoint"); /* * If the transaction was released or the checkpoint @@ -286,6 +287,7 @@ int jbd2_log_do_checkpoint(journal_t *journal) !transaction->t_checkpoint_list) goto out; } else { + unlock_buffer(bh); /* * We are about to write the buffer, it could be * raced by some other transaction shrink or buffer
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jan Beulich jbeulich@suse.com
commit 1df931d95f4dc1c11db1123e85d4e08156e46ef9 upstream.
As noted (and fixed) a couple of times in the past, "=@cc<cond>" outputs and clobbering of "cc" don't work well together. The compiler appears to mean to reject such, but doesn't - in its upstream form - quite manage to yet for "cc". Furthermore two similar macros don't clobber "cc", and clobbering "cc" is pointless in asm()-s for x86 anyway - the compiler always assumes status flags to be clobbered there.
Fixes: 989b5db215a2 ("x86/uaccess: Implement macros for CMPXCHG on user addresses") Signed-off-by: Jan Beulich jbeulich@suse.com Message-Id: 485c0c0b-a3a7-0b7c-5264-7d00c01de032@suse.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/uaccess.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/include/asm/uaccess.h +++ b/arch/x86/include/asm/uaccess.h @@ -471,7 +471,7 @@ do { \ [ptr] "+m" (*_ptr), \ [old] "+a" (__old) \ : [new] ltype (__new) \ - : "memory", "cc"); \ + : "memory"); \ if (unlikely(__err)) \ goto label; \ if (unlikely(!success)) \
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gao Xiang hsiangkao@linux.alibaba.com
commit 3c12466b6b7bf1e56f9b32c366a3d83d87afb4de upstream.
Currently EROFS can map another compressed buffer for inplace decompression, that was used to handle the cases that some pages of compressed data are actually not in-place I/O.
However, like most simple LZ77 algorithms, LZ4 expects the compressed data is arranged at the end of the decompressed buffer and it explicitly uses memmove() to handle overlapping: __________________________________________________________ |_ direction of decompression --> ____ |_ compressed data _|
Although EROFS arranges compressed data like this, it typically maps two individual virtual buffers so the relative order is uncertain. Previously, it was hardly observed since LZ4 only uses memmove() for short overlapped literals and x86/arm64 memmove implementations seem to completely cover it up and they don't have this issue. Juhyung reported that EROFS data corruption can be found on a new Intel x86 processor. After some analysis, it seems that recent x86 processors with the new FSRM feature expose this issue with "rep movsb".
Let's strictly use the decompressed buffer for lz4 inplace decompression for now. Later, as an useful improvement, we could try to tie up these two buffers together in the correct order.
Reported-and-tested-by: Juhyung Park qkrwngud825@gmail.com Closes: https://lore.kernel.org/r/CAD14+f2AVKf8Fa2OO1aAUdDNTDsVzzR6ctU_oJSmTyd6zSYR2... Fixes: 0ffd71bcc3a0 ("staging: erofs: introduce LZ4 decompression inplace") Fixes: 598162d05080 ("erofs: support decompress big pcluster for lz4 backend") Cc: stable stable@vger.kernel.org # 5.4+ Tested-by: Yifan Zhao zhaoyifan@sjtu.edu.cn Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Link: https://lore.kernel.org/r/20231206045534.3920847-1-hsiangkao@linux.alibaba.c... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/erofs/decompressor.c | 24 ++++++++++++++---------- 1 file changed, 14 insertions(+), 10 deletions(-)
--- a/fs/erofs/decompressor.c +++ b/fs/erofs/decompressor.c @@ -24,7 +24,8 @@ struct z_erofs_decompressor { */ int (*prepare_destpages)(struct z_erofs_decompress_req *rq, struct list_head *pagepool); - int (*decompress)(struct z_erofs_decompress_req *rq, u8 *out); + int (*decompress)(struct z_erofs_decompress_req *rq, u8 *out, + u8 *obase); char *name; };
@@ -114,10 +115,13 @@ static void *generic_copy_inplace_data(s return tmp; }
-static int z_erofs_lz4_decompress(struct z_erofs_decompress_req *rq, u8 *out) +static int z_erofs_lz4_decompress(struct z_erofs_decompress_req *rq, u8 *out, + u8 *obase) { + const uint nrpages_out = PAGE_ALIGN(rq->pageofs_out + + rq->outputsize) >> PAGE_SHIFT; unsigned int inputmargin, inlen; - u8 *src; + u8 *src, *src2; bool copied, support_0padding; int ret;
@@ -125,6 +129,7 @@ static int z_erofs_lz4_decompress(struct return -EOPNOTSUPP;
src = kmap_atomic(*rq->in); + src2 = src; inputmargin = 0; support_0padding = false;
@@ -148,16 +153,15 @@ static int z_erofs_lz4_decompress(struct if (rq->inplace_io) { const uint oend = (rq->pageofs_out + rq->outputsize) & ~PAGE_MASK; - const uint nr = PAGE_ALIGN(rq->pageofs_out + - rq->outputsize) >> PAGE_SHIFT; - if (rq->partial_decoding || !support_0padding || - rq->out[nr - 1] != rq->in[0] || + rq->out[nrpages_out - 1] != rq->in[0] || rq->inputsize - oend < LZ4_DECOMPRESS_INPLACE_MARGIN(inlen)) { src = generic_copy_inplace_data(rq, src, inputmargin); inputmargin = 0; copied = true; + } else { + src = obase + ((nrpages_out - 1) << PAGE_SHIFT); } }
@@ -187,7 +191,7 @@ static int z_erofs_lz4_decompress(struct if (copied) erofs_put_pcpubuf(src); else - kunmap_atomic(src); + kunmap_atomic(src2); return ret; }
@@ -257,7 +261,7 @@ static int z_erofs_decompress_generic(st return PTR_ERR(dst);
rq->inplace_io = false; - ret = alg->decompress(rq, dst); + ret = alg->decompress(rq, dst, NULL); if (!ret) copy_from_pcpubuf(rq->out, dst, rq->pageofs_out, rq->outputsize); @@ -291,7 +295,7 @@ static int z_erofs_decompress_generic(st dst_maptype = 2;
dstmap_out: - ret = alg->decompress(rq, dst + rq->pageofs_out); + ret = alg->decompress(rq, dst + rq->pageofs_out, dst);
if (!dst_maptype) kunmap_atomic(dst);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Vacek neelx@redhat.com
commit e6f57c6881916df39db7d95981a8ad2b9c3458d6 upstream.
Unfortunately the commit `fd8958efe877` introduced another error causing the `descs` array to overflow. This reults in further crashes easily reproducible by `sendmsg` system call.
[ 1080.836473] general protection fault, probably for non-canonical address 0x400300015528b00a: 0000 [#1] PREEMPT SMP PTI [ 1080.869326] RIP: 0010:hfi1_ipoib_build_ib_tx_headers.constprop.0+0xe1/0x2b0 [hfi1] -- [ 1080.974535] Call Trace: [ 1080.976990] <TASK> [ 1081.021929] hfi1_ipoib_send_dma_common+0x7a/0x2e0 [hfi1] [ 1081.027364] hfi1_ipoib_send_dma_list+0x62/0x270 [hfi1] [ 1081.032633] hfi1_ipoib_send+0x112/0x300 [hfi1] [ 1081.042001] ipoib_start_xmit+0x2a9/0x2d0 [ib_ipoib] [ 1081.046978] dev_hard_start_xmit+0xc4/0x210 -- [ 1081.148347] __sys_sendmsg+0x59/0xa0
crash> ipoib_txreq 0xffff9cfeba229f00 struct ipoib_txreq { txreq = { list = { next = 0xffff9cfeba229f00, prev = 0xffff9cfeba229f00 }, descp = 0xffff9cfeba229f40, coalesce_buf = 0x0, wait = 0xffff9cfea4e69a48, complete = 0xffffffffc0fe0760 <hfi1_ipoib_sdma_complete>, packet_len = 0x46d, tlen = 0x0, num_desc = 0x0, desc_limit = 0x6, next_descq_idx = 0x45c, coalesce_idx = 0x0, flags = 0x0, descs = {{ qw = {0x8024000120dffb00, 0x4} # SDMA_DESC0_FIRST_DESC_FLAG (bit 63) }, { qw = { 0x3800014231b108, 0x4} }, { qw = { 0x310000e4ee0fcf0, 0x8} }, { qw = { 0x3000012e9f8000, 0x8} }, { qw = { 0x59000dfb9d0000, 0x8} }, { qw = { 0x78000e02e40000, 0x8} }} }, sdma_hdr = 0x400300015528b000, <<< invalid pointer in the tx request structure sdma_status = 0x0, SDMA_DESC0_LAST_DESC_FLAG (bit 62) complete = 0x0, priv = 0x0, txq = 0xffff9cfea4e69880, skb = 0xffff9d099809f400 }
If an SDMA send consists of exactly 6 descriptors and requires dword padding (in the 7th descriptor), the sdma_txreq descriptor array is not properly expanded and the packet will overflow into the container structure. This results in a panic when the send completion runs. The exact panic varies depending on what elements of the container structure get corrupted. The fix is to use the correct expression in _pad_sdma_tx_descs() to test the need to expand the descriptor array.
With this patch the crashes are no longer reproducible and the machine is stable.
Fixes: fd8958efe877 ("IB/hfi1: Fix sdma.h tx->num_descs off-by-one errors") Cc: stable@vger.kernel.org Reported-by: Mats Kronberg kronberg@nsc.liu.se Tested-by: Mats Kronberg kronberg@nsc.liu.se Signed-off-by: Daniel Vacek neelx@redhat.com Link: https://lore.kernel.org/r/20240201081009.1109442-1-neelx@redhat.com Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/infiniband/hw/hfi1/sdma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/infiniband/hw/hfi1/sdma.c +++ b/drivers/infiniband/hw/hfi1/sdma.c @@ -3200,7 +3200,7 @@ int _pad_sdma_tx_descs(struct hfi1_devda { int rval = 0;
- if ((unlikely(tx->num_desc + 1 == tx->desc_limit))) { + if ((unlikely(tx->num_desc == tx->desc_limit))) { rval = _extend_sdma_tx_descs(dd, tx); if (rval) { __sdma_txclean(dd, tx);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Oberparleiter oberpar@linux.ibm.com
commit 5ef1dc40ffa6a6cb968b0fdc43c3a61727a9e950 upstream.
The s390 common I/O layer (CIO) returns an unexpected -EBUSY return code when drivers try to start I/O while a path-verification (PV) process is pending. This can lead to failed device initialization attempts with symptoms like broken network connectivity after boot.
Fix this by replacing the -EBUSY return code with a deferred condition code 1 reply to make path-verification handling consistent from a driver's point of view.
The problem can be reproduced semi-regularly using the following process, while repeating steps 2-3 as necessary (example assumes an OSA device with bus-IDs 0.0.a000-0.0.a002 on CHPID 0.02):
1. echo 0.0.a000,0.0.a001,0.0.a002 >/sys/bus/ccwgroup/drivers/qeth/group 2. echo 0 > /sys/bus/ccwgroup/devices/0.0.a000/online 3. echo 1 > /sys/bus/ccwgroup/devices/0.0.a000/online ; \ echo on > /sys/devices/css0/chp0.02/status
Background information:
The common I/O layer starts path-verification I/Os when it receives indications about changes in a device path's availability. This occurs for example when hardware events indicate a change in channel-path status, or when a manual operation such as a CHPID vary or configure operation is performed.
If a driver attempts to start I/O while a PV is running, CIO reports a successful I/O start (ccw_device_start() return code 0). Then, after completion of PV, CIO synthesizes an interrupt response that indicates an asynchronous status condition that prevented the start of the I/O (deferred condition code 1).
If a PV indication arrives while a device is busy with driver-owned I/O, PV is delayed until after I/O completion was reported to the driver's interrupt handler. To ensure that PV can be started eventually, CIO reports a device busy condition (ccw_device_start() return code -EBUSY) if a driver tries to start another I/O while PV is pending.
In some cases this -EBUSY return code causes device drivers to consider a device not operational, resulting in failed device initialization.
Note: The code that introduced the problem was added in 2003. Symptoms started appearing with the following CIO commit that causes a PV indication when a device is removed from the cio_ignore list after the associated parent subchannel device was probed, but before online processing of the CCW device has started:
2297791c92d0 ("s390/cio: dont unregister subchannel from child-drivers")
During boot, the cio_ignore list is modified by the cio_ignore dracut module [1] as well as Linux vendor-specific systemd service scripts[2]. When combined, this commit and boot scripts cause a frequent occurrence of the problem during boot.
[1] https://github.com/dracutdevs/dracut/tree/master/modules.d/81cio_ignore [2] https://github.com/SUSE/s390-tools/blob/master/cio_ignore.service
Cc: stable@vger.kernel.org # v5.15+ Fixes: 2297791c92d0 ("s390/cio: dont unregister subchannel from child-drivers") Tested-By: Thorsten Winkler twinkler@linux.ibm.com Reviewed-by: Thorsten Winkler twinkler@linux.ibm.com Signed-off-by: Peter Oberparleiter oberpar@linux.ibm.com Signed-off-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/s390/cio/device_ops.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/s390/cio/device_ops.c b/drivers/s390/cio/device_ops.c index c533d1dadc6b..a5dba3829769 100644 --- a/drivers/s390/cio/device_ops.c +++ b/drivers/s390/cio/device_ops.c @@ -202,7 +202,8 @@ int ccw_device_start_timeout_key(struct ccw_device *cdev, struct ccw1 *cpa, return -EINVAL; if (cdev->private->state == DEV_STATE_NOT_OPER) return -ENODEV; - if (cdev->private->state == DEV_STATE_VERIFY) { + if (cdev->private->state == DEV_STATE_VERIFY || + cdev->private->flags.doverify) { /* Remember to fake irb when finished. */ if (!cdev->private->flags.fake_irb) { cdev->private->flags.fake_irb = FAKE_CMD_IRB; @@ -214,8 +215,7 @@ int ccw_device_start_timeout_key(struct ccw_device *cdev, struct ccw1 *cpa, } if (cdev->private->state != DEV_STATE_ONLINE || ((sch->schib.scsw.cmd.stctl & SCSW_STCTL_PRIM_STATUS) && - !(sch->schib.scsw.cmd.stctl & SCSW_STCTL_SEC_STATUS)) || - cdev->private->flags.doverify) + !(sch->schib.scsw.cmd.stctl & SCSW_STCTL_SEC_STATUS))) return -EBUSY; ret = cio_set_options (sch, flags); if (ret)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mikulas Patocka mpatocka@redhat.com
commit 50c70240097ce41fe6bce6478b80478281e4d0f7 upstream.
It was said that authenticated encryption could produce invalid tag when the data that is being encrypted is modified [1]. So, fix this problem by copying the data into the clone bio first and then encrypt them inside the clone bio.
This may reduce performance, but it is needed to prevent the user from corrupting the device by writing data with O_DIRECT and modifying them at the same time.
[1] https://lore.kernel.org/all/20240207004723.GA35324@sol.localdomain/T/
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer snitzer@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/dm-crypt.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/md/dm-crypt.c +++ b/drivers/md/dm-crypt.c @@ -2064,6 +2064,12 @@ static void kcryptd_crypt_write_convert( io->ctx.bio_out = clone; io->ctx.iter_out = clone->bi_iter;
+ if (crypt_integrity_aead(cc)) { + bio_copy_data(clone, io->base_bio); + io->ctx.bio_in = clone; + io->ctx.iter_in = clone->bi_iter; + } + sector += bio_sectors(clone);
crypt_inc_pending(io);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oliver Upton oliver.upton@linux.dev
commit 85a71ee9a0700f6c18862ef3b0011ed9dad99aca upstream.
It is possible that an LPI mapped in a different ITS gets unmapped while handling the MOVALL command. If that is the case, there is no state that can be migrated to the destination. Silently ignore it and continue migrating other LPIs.
Cc: stable@vger.kernel.org Fixes: ff9c114394aa ("KVM: arm/arm64: GICv4: Handle MOVALL applied to a vPE") Signed-off-by: Oliver Upton oliver.upton@linux.dev Link: https://lore.kernel.org/r/20240221092732.4126848-3-oliver.upton@linux.dev Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kvm/vgic/vgic-its.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/arch/arm64/kvm/vgic/vgic-its.c +++ b/arch/arm64/kvm/vgic/vgic-its.c @@ -1374,6 +1374,8 @@ static int vgic_its_cmd_handle_movall(st
for (i = 0; i < irq_count; i++) { irq = vgic_get_irq(kvm, NULL, intids[i]); + if (!irq) + continue;
update_affinity(irq, vcpu2);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oliver Upton oliver.upton@linux.dev
commit 8d3a7dfb801d157ac423261d7cd62c33e95375f8 upstream.
vgic_get_irq() may not return a valid descriptor if there is no ITS that holds a valid translation for the specified INTID. If that is the case, it is safe to silently ignore it and continue processing the LPI pending table.
Cc: stable@vger.kernel.org Fixes: 33d3bc9556a7 ("KVM: arm64: vgic-its: Read initial LPI pending table") Signed-off-by: Oliver Upton oliver.upton@linux.dev Link: https://lore.kernel.org/r/20240221092732.4126848-2-oliver.upton@linux.dev Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kvm/vgic/vgic-its.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/arch/arm64/kvm/vgic/vgic-its.c +++ b/arch/arm64/kvm/vgic/vgic-its.c @@ -462,6 +462,9 @@ static int its_sync_lpi_pending_table(st }
irq = vgic_get_irq(vcpu->kvm, NULL, intids[i]); + if (!irq) + continue; + raw_spin_lock_irqsave(&irq->irq_lock, flags); irq->pending_latch = pendmask & (1U << bit_nr); vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vasiliy Kovalev kovalev@altlinux.org
commit 136cfaca22567a03bbb3bf53a43d8cb5748b80ec upstream.
The gtp_net_ops pernet operations structure for the subsystem must be registered before registering the generic netlink family.
Syzkaller hit 'general protection fault in gtp_genl_dump_pdp' bug:
general protection fault, probably for non-canonical address 0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017] CPU: 1 PID: 5826 Comm: gtp Not tainted 6.8.0-rc3-std-def-alt1 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-alt1 04/01/2014 RIP: 0010:gtp_genl_dump_pdp+0x1be/0x800 [gtp] Code: c6 89 c6 e8 64 e9 86 df 58 45 85 f6 0f 85 4e 04 00 00 e8 c5 ee 86 df 48 8b 54 24 18 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 de 05 00 00 48 8b 44 24 18 4c 8b 30 4c 39 f0 74 RSP: 0018:ffff888014107220 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff88800fcda588 R14: 0000000000000001 R15: 0000000000000000 FS: 00007f1be4eb05c0(0000) GS:ffff88806ce80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1be4e766cf CR3: 000000000c33e000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> ? show_regs+0x90/0xa0 ? die_addr+0x50/0xd0 ? exc_general_protection+0x148/0x220 ? asm_exc_general_protection+0x22/0x30 ? gtp_genl_dump_pdp+0x1be/0x800 [gtp] ? __alloc_skb+0x1dd/0x350 ? __pfx___alloc_skb+0x10/0x10 genl_dumpit+0x11d/0x230 netlink_dump+0x5b9/0xce0 ? lockdep_hardirqs_on_prepare+0x253/0x430 ? __pfx_netlink_dump+0x10/0x10 ? kasan_save_track+0x10/0x40 ? __kasan_kmalloc+0x9b/0xa0 ? genl_start+0x675/0x970 __netlink_dump_start+0x6fc/0x9f0 genl_family_rcv_msg_dumpit+0x1bb/0x2d0 ? __pfx_genl_family_rcv_msg_dumpit+0x10/0x10 ? genl_op_from_small+0x2a/0x440 ? cap_capable+0x1d0/0x240 ? __pfx_genl_start+0x10/0x10 ? __pfx_genl_dumpit+0x10/0x10 ? __pfx_genl_done+0x10/0x10 ? security_capable+0x9d/0xe0
Cc: stable@vger.kernel.org Signed-off-by: Vasiliy Kovalev kovalev@altlinux.org Fixes: 459aa660eb1d ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)") Link: https://lore.kernel.org/r/20240214162733.34214-1-kovalev@altlinux.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/gtp.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/drivers/net/gtp.c +++ b/drivers/net/gtp.c @@ -1410,20 +1410,20 @@ static int __init gtp_init(void) if (err < 0) goto error_out;
- err = genl_register_family(>p_genl_family); + err = register_pernet_subsys(>p_net_ops); if (err < 0) goto unreg_rtnl_link;
- err = register_pernet_subsys(>p_net_ops); + err = genl_register_family(>p_genl_family); if (err < 0) - goto unreg_genl_family; + goto unreg_pernet_subsys;
pr_info("GTP module loaded (pdp ctx size %zd bytes)\n", sizeof(struct pdp_ctx)); return 0;
-unreg_genl_family: - genl_unregister_family(>p_genl_family); +unreg_pernet_subsys: + unregister_pernet_subsys(>p_net_ops); unreg_rtnl_link: rtnl_link_unregister(>p_link_ops); error_out:
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vidya Sagar vidyas@nvidia.com
commit db744ddd59be798c2627efbfc71f707f5a935a40 upstream.
While calculating the hardware interrupt number for a MSI interrupt, the higher bits (i.e. from bit-5 onwards a.k.a domain_nr >= 32) of the PCI domain number gets truncated because of the shifted value casting to return type of pci_domain_nr() which is 'int'. This for example is resulting in same hardware interrupt number for devices 0019:00:00.0 and 0039:00:00.0.
To address this cast the PCI domain number to 'irq_hw_number_t' before left shifting it to calculate the hardware interrupt number.
Please note that this fixes the issue only on 64-bit systems and doesn't change the behavior for 32-bit systems i.e. the 32-bit systems continue to have the issue. Since the issue surfaces only if there are too many PCIe controllers in the system which usually is the case in modern server systems and they don't tend to run 32-bit kernels.
Fixes: 3878eaefb89a ("PCI/MSI: Enhance core to support hierarchy irqdomain") Signed-off-by: Vidya Sagar vidyas@nvidia.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Shanker Donthineni sdonthineni@nvidia.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240115135649.708536-1-vidyas@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/msi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -1409,7 +1409,7 @@ static irq_hw_number_t pci_msi_domain_ca
return (irq_hw_number_t)desc->msi_attrib.entry_nr | pci_dev_id(dev) << 11 | - (pci_domain_nr(dev->bus) & 0xFFFFFFFF) << 27; + ((irq_hw_number_t)(pci_domain_nr(dev->bus) & 0xFFFFFFFF)) << 27; }
static inline bool pci_msi_desc_is_multi_msi(struct msi_desc *desc)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tom Parkin tparkin@katalix.com
commit 359e54a93ab43d32ee1bff3c2f9f10cb9f6b6e79 upstream.
l2tp_ip6_sendmsg needs to avoid accounting for the transport header twice when splicing more data into an already partially-occupied skbuff.
To manage this, we check whether the skbuff contains data using skb_queue_empty when deciding how much data to append using ip6_append_data.
However, the code which performed the calculation was incorrect:
ulen = len + skb_queue_empty(&sk->sk_write_queue) ? transhdrlen : 0;
...due to C operator precedence, this ends up setting ulen to transhdrlen for messages with a non-zero length, which results in corrupted packets on the wire.
Add parentheses to correct the calculation in line with the original intent.
Fixes: 9d4c75800f61 ("ipv4, ipv6: Fix handling of transhdrlen in __ip{,6}_append_data()") Cc: David Howells dhowells@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Tom Parkin tparkin@katalix.com Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20240220122156.43131-1-tparkin@katalix.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/l2tp/l2tp_ip6.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/l2tp/l2tp_ip6.c +++ b/net/l2tp/l2tp_ip6.c @@ -628,7 +628,7 @@ static int l2tp_ip6_sendmsg(struct sock
back_from_confirm: lock_sock(sk); - ulen = len + skb_queue_empty(&sk->sk_write_queue) ? transhdrlen : 0; + ulen = len + (skb_queue_empty(&sk->sk_write_queue) ? transhdrlen : 0); err = ip6_append_data(sk, ip_generic_getfrag, msg, ulen, transhdrlen, &ipc6, &fl6, (struct rt6_info *)dst,
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nikita Shubin nikita.shubin@maquefel.me
commit fdf87a0dc26d0550c60edc911cda42f9afec3557 upstream.
Without the terminator, if a con_id is passed to gpio_find() that does not exist in the lookup table the function will not stop looping correctly, and eventually cause an oops.
Cc: stable@vger.kernel.org Fixes: b2e63555592f ("i2c: gpio: Convert to use descriptors") Reported-by: Andy Shevchenko andriy.shevchenko@intel.com Signed-off-by: Nikita Shubin nikita.shubin@maquefel.me Reviewed-by: Linus Walleij linus.walleij@linaro.org Acked-by: Alexander Sverdlin alexander.sverdlin@gmail.com Signed-off-by: Alexander Sverdlin alexander.sverdlin@gmail.com Link: https://lore.kernel.org/r/20240205102337.439002-1-alexander.sverdlin@gmail.c... Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/mach-ep93xx/core.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/arm/mach-ep93xx/core.c +++ b/arch/arm/mach-ep93xx/core.c @@ -337,6 +337,7 @@ static struct gpiod_lookup_table ep93xx_ GPIO_ACTIVE_HIGH | GPIO_OPEN_DRAIN), GPIO_LOOKUP_IDX("G", 0, NULL, 1, GPIO_ACTIVE_HIGH | GPIO_OPEN_DRAIN), + { } }, };
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Borislav Petkov (AMD)" bp@alien8.de
This reverts commit 3eb602ad6a94a76941f93173131a71ad36fa1324.
Revert the backport of upstream commit
1f001e9da6bb ("x86/ftrace: Use alternative RET encoding")
in favor of a proper backport after backporting the commit which adds __text_gen_insn().
Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/ftrace.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-)
--- a/arch/x86/kernel/ftrace.c +++ b/arch/x86/kernel/ftrace.c @@ -311,7 +311,7 @@ union ftrace_op_code_union { } __attribute__((packed)); };
-#define RET_SIZE (IS_ENABLED(CONFIG_RETPOLINE) ? 5 : 1 + IS_ENABLED(CONFIG_SLS)) +#define RET_SIZE 1 + IS_ENABLED(CONFIG_SLS)
static unsigned long create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size) @@ -367,12 +367,7 @@ create_trampoline(struct ftrace_ops *ops goto fail;
ip = trampoline + size; - - /* The trampoline ends with ret(q) */ - if (cpu_feature_enabled(X86_FEATURE_RETHUNK)) - memcpy(ip, text_gen_insn(JMP32_INSN_OPCODE, ip, &__x86_return_thunk), JMP32_INSN_SIZE); - else - memcpy(ip, retq, sizeof(retq)); + memcpy(ip, retq, RET_SIZE);
/* No need to test direct calls on created trampolines */ if (ops->flags & FTRACE_OPS_FL_SAVE_REGS) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Zijlstra peterz@infradead.org
Upstream commit: bbf92368b0b1fe472d489e62d3340d7897e9c697
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Acked-by: Josh Poimboeuf jpoimboe@redhat.com Link: https://lore.kernel.org/r/20220308154317.638561109@infradead.org Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/text-patching.h | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
--- a/arch/x86/include/asm/text-patching.h +++ b/arch/x86/include/asm/text-patching.h @@ -101,13 +101,21 @@ void *text_gen_insn(u8 opcode, const voi static union text_poke_insn insn; /* per instance */ int size = text_opcode_size(opcode);
+ /* + * Hide the addresses to avoid the compiler folding in constants when + * referencing code, these can mess up annotations like + * ANNOTATE_NOENDBR. + */ + OPTIMIZER_HIDE_VAR(addr); + OPTIMIZER_HIDE_VAR(dest); + insn.opcode = opcode;
if (size > 1) { insn.disp = (long)dest - (long)(addr + size); if (size == 2) { /* - * Ensure that for JMP9 the displacement + * Ensure that for JMP8 the displacement * actually fits the signed byte. */ BUG_ON((insn.disp >> 31) != (insn.disp >> 7));
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Zijlstra peterz@infradead.org
Upstream commit: ba27d1a80871eb8dbeddf34ec7d396c149cbb8d7
Less duplication is more better.
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Acked-by: Josh Poimboeuf jpoimboe@redhat.com Link: https://lore.kernel.org/r/20220308154317.697253958@infradead.org [ Keep struct branch. ] Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/text-patching.h | 20 ++++++++++++++------ arch/x86/kernel/paravirt.c | 22 +++++----------------- 2 files changed, 19 insertions(+), 23 deletions(-)
--- a/arch/x86/include/asm/text-patching.h +++ b/arch/x86/include/asm/text-patching.h @@ -96,32 +96,40 @@ union text_poke_insn { };
static __always_inline -void *text_gen_insn(u8 opcode, const void *addr, const void *dest) +void __text_gen_insn(void *buf, u8 opcode, const void *addr, const void *dest, int size) { - static union text_poke_insn insn; /* per instance */ - int size = text_opcode_size(opcode); + union text_poke_insn *insn = buf; + + BUG_ON(size < text_opcode_size(opcode));
/* * Hide the addresses to avoid the compiler folding in constants when * referencing code, these can mess up annotations like * ANNOTATE_NOENDBR. */ + OPTIMIZER_HIDE_VAR(insn); OPTIMIZER_HIDE_VAR(addr); OPTIMIZER_HIDE_VAR(dest);
- insn.opcode = opcode; + insn->opcode = opcode;
if (size > 1) { - insn.disp = (long)dest - (long)(addr + size); + insn->disp = (long)dest - (long)(addr + size); if (size == 2) { /* * Ensure that for JMP8 the displacement * actually fits the signed byte. */ - BUG_ON((insn.disp >> 31) != (insn.disp >> 7)); + BUG_ON((insn->disp >> 31) != (insn->disp >> 7)); } } +}
+static __always_inline +void *text_gen_insn(u8 opcode, const void *addr, const void *dest) +{ + static union text_poke_insn insn; /* per instance */ + __text_gen_insn(&insn, opcode, addr, dest, text_opcode_size(opcode)); return &insn.text; }
--- a/arch/x86/kernel/paravirt.c +++ b/arch/x86/kernel/paravirt.c @@ -55,28 +55,16 @@ void __init default_banner(void) static const unsigned char ud2a[] = { 0x0f, 0x0b };
struct branch { - unsigned char opcode; - u32 delta; + unsigned char opcode; + u32 delta; } __attribute__((packed));
static unsigned paravirt_patch_call(void *insn_buff, const void *target, unsigned long addr, unsigned len) { - const int call_len = 5; - struct branch *b = insn_buff; - unsigned long delta = (unsigned long)target - (addr+call_len); - - if (len < call_len) { - pr_warn("paravirt: Failed to patch indirect CALL at %ps\n", (void *)addr); - /* Kernel might not be viable if patching fails, bail out: */ - BUG_ON(1); - } - - b->opcode = 0xe8; /* call */ - b->delta = delta; - BUILD_BUG_ON(sizeof(*b) != call_len); - - return call_len; + __text_gen_insn(insn_buff, CALL_INSN_OPCODE, + (void *)addr, target, CALL_INSN_SIZE); + return CALL_INSN_SIZE; }
#ifdef CONFIG_PARAVIRT_XXL
On Tue, Feb 27, 2024 at 02:27:26PM +0100, Greg Kroah-Hartman wrote:
5.10-stable review patch. If anyone has any objections, please let me know.
From: Peter Zijlstra peterz@infradead.org
Upstream commit: ba27d1a80871eb8dbeddf34ec7d396c149cbb8d7
Less duplication is more better.
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Acked-by: Josh Poimboeuf jpoimboe@redhat.com Link: https://lore.kernel.org/r/20220308154317.697253958@infradead.org [ Keep struct branch. ] Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
arch/x86/include/asm/text-patching.h | 20 ++++++++++++++------ arch/x86/kernel/paravirt.c | 22 +++++----------------- 2 files changed, 19 insertions(+), 23 deletions(-)
--- a/arch/x86/include/asm/text-patching.h +++ b/arch/x86/include/asm/text-patching.h @@ -96,32 +96,40 @@ union text_poke_insn { }; static __always_inline -void *text_gen_insn(u8 opcode, const void *addr, const void *dest) +void __text_gen_insn(void *buf, u8 opcode, const void *addr, const void *dest, int size) {
- static union text_poke_insn insn; /* per instance */
- int size = text_opcode_size(opcode);
- union text_poke_insn *insn = buf;
- BUG_ON(size < text_opcode_size(opcode));
/* * Hide the addresses to avoid the compiler folding in constants when * referencing code, these can mess up annotations like * ANNOTATE_NOENDBR. */
- OPTIMIZER_HIDE_VAR(insn); OPTIMIZER_HIDE_VAR(addr); OPTIMIZER_HIDE_VAR(dest);
- insn.opcode = opcode;
- insn->opcode = opcode;
if (size > 1) {
insn.disp = (long)dest - (long)(addr + size);
if (size == 2) { /* * Ensure that for JMP8 the displacement * actually fits the signed byte. */insn->disp = (long)dest - (long)(addr + size);
BUG_ON((insn.disp >> 31) != (insn.disp >> 7));
} }BUG_ON((insn->disp >> 31) != (insn->disp >> 7));
+} +static __always_inline +void *text_gen_insn(u8 opcode, const void *addr, const void *dest) +{
- static union text_poke_insn insn; /* per instance */
- __text_gen_insn(&insn, opcode, addr, dest, text_opcode_size(opcode)); return &insn.text;
} --- a/arch/x86/kernel/paravirt.c +++ b/arch/x86/kernel/paravirt.c @@ -55,28 +55,16 @@ void __init default_banner(void) static const unsigned char ud2a[] = { 0x0f, 0x0b }; struct branch {
- unsigned char opcode;
- u32 delta;
unsigned char opcode;
u32 delta;
} __attribute__((packed)); static unsigned paravirt_patch_call(void *insn_buff, const void *target, unsigned long addr, unsigned len) {
- const int call_len = 5;
- struct branch *b = insn_buff;
- unsigned long delta = (unsigned long)target - (addr+call_len);
- if (len < call_len) {
pr_warn("paravirt: Failed to patch indirect CALL at %ps\n", (void *)addr);
/* Kernel might not be viable if patching fails, bail out: */
BUG_ON(1);
- }
- b->opcode = 0xe8; /* call */
- b->delta = delta;
- BUILD_BUG_ON(sizeof(*b) != call_len);
- return call_len;
- __text_gen_insn(insn_buff, CALL_INSN_OPCODE,
(void *)addr, target, CALL_INSN_SIZE);
- return CALL_INSN_SIZE;
v5.10.211 is failing to build with the attached .config with the following error:
/home/runner/work/drgn/drgn/build/vmtest/linux.git/arch/x86/kernel/paravirt.c: In function 'paravirt_patch_call': /home/runner/work/drgn/drgn/build/vmtest/linux.git/arch/x86/kernel/paravirt.c:65:9: error: implicit declaration of function '__text_gen_insn' [-Werror=implicit-function-declaration] 65 | __text_gen_insn(insn_buff, CALL_INSN_OPCODE, | ^~~~~~~~~~~~~~~ /home/runner/work/drgn/drgn/build/vmtest/linux.git/arch/x86/kernel/paravirt.c:65:36: error: 'CALL_INSN_OPCODE' undeclared (first use in this function) 65 | __text_gen_insn(insn_buff, CALL_INSN_OPCODE, | ^~~~~~~~~~~~~~~~ /home/runner/work/drgn/drgn/build/vmtest/linux.git/arch/x86/kernel/paravirt.c:65:36: note: each undeclared identifier is reported only once for each function it appears in /home/runner/work/drgn/drgn/build/vmtest/linux.git/arch/x86/kernel/paravirt.c:66:47: error: 'CALL_INSN_SIZE' undeclared (first use in this function) 66 | (void *)addr, target, CALL_INSN_SIZE); | ^~~~~~~~~~~~~~ /home/runner/work/drgn/drgn/build/vmtest/linux.git/arch/x86/kernel/paravirt.c:68:1: error: control reaches end of non-void function [-Werror=return-type] 68 | } | ^
Newer versions include asm/text-patching.h through the include of linux/static-call.h added by a0e2bf7cb700 ("x86/paravirt: Switch time pvops functions to use static_call()"). This code is gone in mainline as of commit f7af6977621a ("x86/paravirt: Remove no longer needed paravirt patching code"). What's the preferred way to resolve this?
Thanks, Omar
On Mon, Mar 04, 2024 at 10:49:33AM -0800, Omar Sandoval wrote:
v5.10.211 is failing to build with the attached .config with the following error:
Ok, let's try this:
--- From: "Borislav Petkov (AMD)" bp@alien8.de
The Link tag has all the details but basically due to missing upstream commits, the header which contains __text_gen_insn() is not in the includes in paravirt.c, leading to:
arch/x86/kernel/paravirt.c: In function 'paravirt_patch_call': arch/x86/kernel/paravirt.c:65:9: error: implicit declaration of function '__text_gen_insn' \ [-Werror=implicit-function-declaration] 65 | __text_gen_insn(insn_buff, CALL_INSN_OPCODE, | ^~~~~~~~~~~~~~~
Add the missing include.
Reported-by: Omar Sandoval osandov@osandov.com Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Link: https://lore.kernel.org/r/ZeYXvd1-rVkPGvvW@telecaster --- arch/x86/kernel/paravirt.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c index 5bea8d93883a..f0e4ad8595ca 100644 --- a/arch/x86/kernel/paravirt.c +++ b/arch/x86/kernel/paravirt.c @@ -31,6 +31,7 @@ #include <asm/special_insns.h> #include <asm/tlb.h> #include <asm/io_bitmap.h> +#include <asm/text-patching.h>
/* * nop stub, which must not clobber anything *including the stack* to
On Tue, Mar 05, 2024 at 12:27:11PM +0100, Borislav Petkov wrote:
On Mon, Mar 04, 2024 at 10:49:33AM -0800, Omar Sandoval wrote:
v5.10.211 is failing to build with the attached .config with the following error:
Ok, let's try this:
From: "Borislav Petkov (AMD)" bp@alien8.de
The Link tag has all the details but basically due to missing upstream commits, the header which contains __text_gen_insn() is not in the includes in paravirt.c, leading to:
arch/x86/kernel/paravirt.c: In function 'paravirt_patch_call': arch/x86/kernel/paravirt.c:65:9: error: implicit declaration of function '__text_gen_insn' \ [-Werror=implicit-function-declaration] 65 | __text_gen_insn(insn_buff, CALL_INSN_OPCODE, | ^~~~~~~~~~~~~~~
Add the missing include.
Reported-by: Omar Sandoval osandov@osandov.com Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Link: https://lore.kernel.org/r/ZeYXvd1-rVkPGvvW@telecaster
arch/x86/kernel/paravirt.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c index 5bea8d93883a..f0e4ad8595ca 100644 --- a/arch/x86/kernel/paravirt.c +++ b/arch/x86/kernel/paravirt.c @@ -31,6 +31,7 @@ #include <asm/special_insns.h> #include <asm/tlb.h> #include <asm/io_bitmap.h> +#include <asm/text-patching.h> /*
- nop stub, which must not clobber anything *including the stack* to
-- 2.43.0
-- Regards/Gruss, Boris.
Tested-by: Omar Sandoval osandov@osandov.com
Thanks!
Hi All,
v5.10.211 is failing to build with the attached .config with the following error:
/home/runner/work/drgn/drgn/build/vmtest/linux.git/arch/x86/kernel/paravirt.c: In function 'paravirt_patch_call': /home/runner/work/drgn/drgn/build/vmtest/linux.git/arch/x86/kernel/paravirt.c:65:9: error: implicit declaration of function '__text_gen_insn' [-Werror=implicit-function-declaration] 65 | __text_gen_insn(insn_buff, CALL_INSN_OPCODE, | ^~~~~~~~~~~~~~~ /home/runner/work/drgn/drgn/build/vmtest/linux.git/arch/x86/kernel/paravirt.c:65:36: error: 'CALL_INSN_OPCODE' undeclared (first use in this function) 65 | __text_gen_insn(insn_buff, CALL_INSN_OPCODE, | ^~~~~~~~~~~~~~~~ /home/runner/work/drgn/drgn/build/vmtest/linux.git/arch/x86/kernel/paravirt.c:65:36: note: each undeclared identifier is reported only once for each function it appears in /home/runner/work/drgn/drgn/build/vmtest/linux.git/arch/x86/kernel/paravirt.c:66:47: error: 'CALL_INSN_SIZE' undeclared (first use in this function) 66 | (void *)addr, target, CALL_INSN_SIZE); | ^~~~~~~~~~~~~~ /home/runner/work/drgn/drgn/build/vmtest/linux.git/arch/x86/kernel/paravirt.c:68:1: error: control reaches end of non-void function [-Werror=return-type] 68 | } | ^
Exact same issue here, also since 5.10.211 and still present in 5.10.213.
A bisect shows the same commit: [b253061d4b863f0e9ebf25a3cf0313501a5d51b4] x86/ibt,paravirt: Use text_gen_insn() for paravirt_patch()
I add the used config file in attachment.
Regards, Emeric Brun
On Mon, Mar 18, 2024 at 11:17:10AM +0100, Emeric Brun wrote:
Exact same issue here, also since 5.10.211 and still present in 5.10.213.
Sasha,
can you pls pick up this one:
https://lore.kernel.org/r/20240305112711.GAZecBj5TMaQDSz6Ym@fat_crate.local
?
See upthread for what it fixes.
Thx.
On Mon, Mar 18, 2024 at 03:20:14PM +0100, Borislav Petkov wrote:
On Mon, Mar 18, 2024 at 11:17:10AM +0100, Emeric Brun wrote:
Exact same issue here, also since 5.10.211 and still present in 5.10.213.
Sasha,
can you pls pick up this one:
https://lore.kernel.org/r/20240305112711.GAZecBj5TMaQDSz6Ym@fat_crate.local
I'll take it, thanks!
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Zijlstra peterz@infradead.org
Upstream commit: 1f001e9da6bbf482311e45e48f53c2bd2179e59c
Use the return thunk in ftrace trampolines, if needed.
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Borislav Petkov bp@suse.de Reviewed-by: Josh Poimboeuf jpoimboe@kernel.org Signed-off-by: Borislav Petkov bp@suse.de Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/ftrace.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/arch/x86/kernel/ftrace.c +++ b/arch/x86/kernel/ftrace.c @@ -311,7 +311,7 @@ union ftrace_op_code_union { } __attribute__((packed)); };
-#define RET_SIZE 1 + IS_ENABLED(CONFIG_SLS) +#define RET_SIZE (IS_ENABLED(CONFIG_RETPOLINE) ? 5 : 1 + IS_ENABLED(CONFIG_SLS))
static unsigned long create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size) @@ -367,7 +367,10 @@ create_trampoline(struct ftrace_ops *ops goto fail;
ip = trampoline + size; - memcpy(ip, retq, RET_SIZE); + if (cpu_feature_enabled(X86_FEATURE_RETHUNK)) + __text_gen_insn(ip, JMP32_INSN_OPCODE, ip, &__x86_return_thunk, JMP32_INSN_SIZE); + else + memcpy(ip, retq, sizeof(retq));
/* No need to test direct calls on created trampolines */ if (ops->flags & FTRACE_OPS_FL_SAVE_REGS) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Zijlstra peterz@infradead.org
Upstream commit: 770ae1b709528a6a173b5c7b183818ee9b45e376
In preparation for call depth tracking on Intel SKL CPUs, make it possible to patch in a SKL specific return thunk.
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Link: https://lore.kernel.org/r/20220915111147.680469665@infradead.org Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/nospec-branch.h | 6 ++++++ arch/x86/kernel/alternative.c | 19 ++++++++++++++----- arch/x86/kernel/cpu/bugs.c | 2 ++ arch/x86/kernel/ftrace.c | 2 +- arch/x86/kernel/static_call.c | 2 +- arch/x86/net/bpf_jit_comp.c | 2 +- 6 files changed, 25 insertions(+), 8 deletions(-)
--- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -207,6 +207,12 @@ extern void srso_alias_untrain_ret(void) extern void entry_untrain_ret(void); extern void entry_ibpb(void);
+#ifdef CONFIG_CALL_THUNKS +extern void (*x86_return_thunk)(void); +#else +#define x86_return_thunk (&__x86_return_thunk) +#endif + #ifdef CONFIG_RETPOLINE
typedef u8 retpoline_thunk_t[RETPOLINE_THUNK_SIZE]; --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -676,6 +676,11 @@ void __init_or_module noinline apply_ret }
#ifdef CONFIG_RETHUNK + +#ifdef CONFIG_CALL_THUNKS +void (*x86_return_thunk)(void) __ro_after_init = &__x86_return_thunk; +#endif + /* * Rewrite the compiler generated return thunk tail-calls. * @@ -691,14 +696,18 @@ static int patch_return(void *addr, stru { int i = 0;
- if (cpu_feature_enabled(X86_FEATURE_RETHUNK)) - return -1; - - bytes[i++] = RET_INSN_OPCODE; + if (cpu_feature_enabled(X86_FEATURE_RETHUNK)) { + if (x86_return_thunk == __x86_return_thunk) + return -1; + + i = JMP32_INSN_SIZE; + __text_gen_insn(bytes, JMP32_INSN_OPCODE, addr, x86_return_thunk, i); + } else { + bytes[i++] = RET_INSN_OPCODE; + }
for (; i < insn->length;) bytes[i++] = INT3_INSN_OPCODE; - return i; }
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -61,7 +61,9 @@ EXPORT_SYMBOL_GPL(x86_pred_cmd);
static DEFINE_MUTEX(spec_ctrl_mutex);
+#ifdef CONFIG_CALL_THUNKS void (*x86_return_thunk)(void) __ro_after_init = &__x86_return_thunk; +#endif
/* Update SPEC_CTRL MSR and its cached copy unconditionally */ static void update_spec_ctrl(u64 val) --- a/arch/x86/kernel/ftrace.c +++ b/arch/x86/kernel/ftrace.c @@ -368,7 +368,7 @@ create_trampoline(struct ftrace_ops *ops
ip = trampoline + size; if (cpu_feature_enabled(X86_FEATURE_RETHUNK)) - __text_gen_insn(ip, JMP32_INSN_OPCODE, ip, &__x86_return_thunk, JMP32_INSN_SIZE); + __text_gen_insn(ip, JMP32_INSN_OPCODE, ip, x86_return_thunk, JMP32_INSN_SIZE); else memcpy(ip, retq, sizeof(retq));
--- a/arch/x86/kernel/static_call.c +++ b/arch/x86/kernel/static_call.c @@ -41,7 +41,7 @@ static void __ref __static_call_transfor
case RET: if (cpu_feature_enabled(X86_FEATURE_RETHUNK)) - code = text_gen_insn(JMP32_INSN_OPCODE, insn, &__x86_return_thunk); + code = text_gen_insn(JMP32_INSN_OPCODE, insn, x86_return_thunk); else code = &retinsn; break; --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -405,7 +405,7 @@ static void emit_return(u8 **pprog, u8 * int cnt = 0;
if (cpu_feature_enabled(X86_FEATURE_RETHUNK)) { - emit_jump(&prog, &__x86_return_thunk, ip); + emit_jump(&prog, x86_return_thunk, ip); } else { EMIT1(0xC3); /* ret */ if (IS_ENABLED(CONFIG_SLS))
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Borislav Petkov (AMD)" bp@alien8.de
This reverts commit 08f7cfd44f77b2796582bc26164fdef44dd33b6c.
Revert the backport of upstream commit:
095b8303f383 ("x86/alternative: Make custom return thunk unconditional")
in order to backport the full version now that
770ae1b70952 ("x86/returnthunk: Allow different return thunks")
has been backported.
Revert it here so that the build breakage is kept at minimum.
Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/nospec-branch.h | 4 ---- arch/x86/kernel/cpu/bugs.c | 4 ---- 2 files changed, 8 deletions(-)
--- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -190,11 +190,7 @@ _ASM_PTR " 999b\n\t" \ ".popsection\n\t"
-#ifdef CONFIG_RETHUNK extern void __x86_return_thunk(void); -#else -static inline void __x86_return_thunk(void) {} -#endif
extern void retbleed_return_thunk(void); extern void srso_return_thunk(void); --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -61,10 +61,6 @@ EXPORT_SYMBOL_GPL(x86_pred_cmd);
static DEFINE_MUTEX(spec_ctrl_mutex);
-#ifdef CONFIG_CALL_THUNKS -void (*x86_return_thunk)(void) __ro_after_init = &__x86_return_thunk; -#endif - /* Update SPEC_CTRL MSR and its cached copy unconditionally */ static void update_spec_ctrl(u64 val) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Zijlstra peterz@infradead.org
Upstream commit: 095b8303f3835c68ac4a8b6d754ca1c3b6230711
There is infrastructure to rewrite return thunks to point to any random thunk one desires, unwrap that from CALL_THUNKS, which up to now was the sole user of that.
[ bp: Make the thunks visible on 32-bit and add ifdeffery for the 32-bit builds. ]
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Link: https://lore.kernel.org/r/20230814121148.775293785@infradead.org Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/nospec-branch.h | 8 ++++---- arch/x86/kernel/alternative.c | 4 ---- arch/x86/kernel/cpu/bugs.c | 2 ++ 3 files changed, 6 insertions(+), 8 deletions(-)
--- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -190,7 +190,11 @@ _ASM_PTR " 999b\n\t" \ ".popsection\n\t"
+#ifdef CONFIG_RETHUNK extern void __x86_return_thunk(void); +#else +static inline void __x86_return_thunk(void) {} +#endif
extern void retbleed_return_thunk(void); extern void srso_return_thunk(void); @@ -203,11 +207,7 @@ extern void srso_alias_untrain_ret(void) extern void entry_untrain_ret(void); extern void entry_ibpb(void);
-#ifdef CONFIG_CALL_THUNKS extern void (*x86_return_thunk)(void); -#else -#define x86_return_thunk (&__x86_return_thunk) -#endif
#ifdef CONFIG_RETPOLINE
--- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -677,10 +677,6 @@ void __init_or_module noinline apply_ret
#ifdef CONFIG_RETHUNK
-#ifdef CONFIG_CALL_THUNKS -void (*x86_return_thunk)(void) __ro_after_init = &__x86_return_thunk; -#endif - /* * Rewrite the compiler generated return thunk tail-calls. * --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -61,6 +61,8 @@ EXPORT_SYMBOL_GPL(x86_pred_cmd);
static DEFINE_MUTEX(spec_ctrl_mutex);
+void (*x86_return_thunk)(void) __ro_after_init = &__x86_return_thunk; + /* Update SPEC_CTRL MSR and its cached copy unconditionally */ static void update_spec_ctrl(u64 val) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frank Li Frank.Li@nxp.com
commit cd45f99034b0c8c9cb346dd0d6407a95ca3d36f6 upstream.
... cdns3_gadget_ep_free_request(&priv_ep->endpoint, &priv_req->request); list_del_init(&priv_req->list); ...
'priv_req' actually free at cdns3_gadget_ep_free_request(). But list_del_init() use priv_req->list after it.
[ 1542.642868][ T534] BUG: KFENCE: use-after-free read in __list_del_entry_valid+0x10/0xd4 [ 1542.642868][ T534] [ 1542.653162][ T534] Use-after-free read at 0x000000009ed0ba99 (in kfence-#3): [ 1542.660311][ T534] __list_del_entry_valid+0x10/0xd4 [ 1542.665375][ T534] cdns3_gadget_ep_disable+0x1f8/0x388 [cdns3] [ 1542.671571][ T534] usb_ep_disable+0x44/0xe4 [ 1542.675948][ T534] ffs_func_eps_disable+0x64/0xc8 [ 1542.680839][ T534] ffs_func_set_alt+0x74/0x368 [ 1542.685478][ T534] ffs_func_disable+0x18/0x28
Move list_del_init() before cdns3_gadget_ep_free_request() to resolve this problem.
Cc: stable@vger.kernel.org Fixes: 7733f6c32e36 ("usb: cdns3: Add Cadence USB3 DRD Driver") Signed-off-by: Frank Li Frank.Li@nxp.com Reviewed-by: Roger Quadros rogerq@kernel.org Acked-by: Peter Chen peter.chen@kernel.org Link: https://lore.kernel.org/r/20240202154217.661867-1-Frank.Li@nxp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/cdns3/gadget.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/usb/cdns3/gadget.c +++ b/drivers/usb/cdns3/gadget.c @@ -2538,11 +2538,11 @@ static int cdns3_gadget_ep_disable(struc
while (!list_empty(&priv_ep->wa2_descmiss_req_list)) { priv_req = cdns3_next_priv_request(&priv_ep->wa2_descmiss_req_list); + list_del_init(&priv_req->list);
kfree(priv_req->request.buf); cdns3_gadget_ep_free_request(&priv_ep->endpoint, &priv_req->request); - list_del_init(&priv_req->list); --priv_ep->wa2_counter; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frank Li Frank.Li@nxp.com
commit 5fd9e45f1ebcd57181358af28506e8a661a260b3 upstream.
829 if (request->complete) { 830 spin_unlock(&priv_dev->lock); 831 usb_gadget_giveback_request(&priv_ep->endpoint, 832 request); 833 spin_lock(&priv_dev->lock); 834 } 835 836 if (request->buf == priv_dev->zlp_buf) 837 cdns3_gadget_ep_free_request(&priv_ep->endpoint, request);
Driver append an additional zero packet request when queue a packet, which length mod max packet size is 0. When transfer complete, run to line 831, usb_gadget_giveback_request() will free this requestion. 836 condition is true, so cdns3_gadget_ep_free_request() free this request again.
Log:
[ 1920.140696][ T150] BUG: KFENCE: use-after-free read in cdns3_gadget_giveback+0x134/0x2c0 [cdns3] [ 1920.140696][ T150] [ 1920.151837][ T150] Use-after-free read at 0x000000003d1cd10b (in kfence-#36): [ 1920.159082][ T150] cdns3_gadget_giveback+0x134/0x2c0 [cdns3] [ 1920.164988][ T150] cdns3_transfer_completed+0x438/0x5f8 [cdns3]
Add check at line 829, skip call usb_gadget_giveback_request() if it is additional zero length packet request. Needn't call usb_gadget_giveback_request() because it is allocated in this driver.
Cc: stable@vger.kernel.org Fixes: 7733f6c32e36 ("usb: cdns3: Add Cadence USB3 DRD Driver") Signed-off-by: Frank Li Frank.Li@nxp.com Reviewed-by: Roger Quadros rogerq@kernel.org Acked-by: Peter Chen peter.chen@kernel.org Link: https://lore.kernel.org/r/20240202154217.661867-2-Frank.Li@nxp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/cdns3/gadget.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/usb/cdns3/gadget.c +++ b/drivers/usb/cdns3/gadget.c @@ -837,7 +837,11 @@ void cdns3_gadget_giveback(struct cdns3_ return; }
- if (request->complete) { + /* + * zlp request is appended by driver, needn't call usb_gadget_giveback_request() to notify + * gadget composite driver. + */ + if (request->complete && request->buf != priv_dev->zlp_buf) { spin_unlock(&priv_dev->lock); usb_gadget_giveback_request(&priv_ep->endpoint, request);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Krishna Kurapati quic_kriskura@quicinc.com
commit 76c51146820c5dac629f21deafab0a7039bc3ccd upstream.
It is observed sometimes when tethering is used over NCM with Windows 11 as host, at some instances, the gadget_giveback has one byte appended at the end of a proper NTB. When the NTB is parsed, unwrap call looks for any leftover bytes in SKB provided by u_ether and if there are any pending bytes, it treats them as a separate NTB and parses it. But in case the second NTB (as per unwrap call) is faulty/corrupt, all the datagrams that were parsed properly in the first NTB and saved in rx_list are dropped.
Adding a few custom traces showed the following: [002] d..1 7828.532866: dwc3_gadget_giveback: ep1out: req 000000003868811a length 1025/16384 zsI ==> 0 [002] d..1 7828.532867: ncm_unwrap_ntb: K: ncm_unwrap_ntb toprocess: 1025 [002] d..1 7828.532867: ncm_unwrap_ntb: K: ncm_unwrap_ntb nth: 1751999342 [002] d..1 7828.532868: ncm_unwrap_ntb: K: ncm_unwrap_ntb seq: 0xce67 [002] d..1 7828.532868: ncm_unwrap_ntb: K: ncm_unwrap_ntb blk_len: 0x400 [002] d..1 7828.532868: ncm_unwrap_ntb: K: ncm_unwrap_ntb ndp_len: 0x10 [002] d..1 7828.532869: ncm_unwrap_ntb: K: Parsed NTB with 1 frames
In this case, the giveback is of 1025 bytes and block length is 1024. The rest 1 byte (which is 0x00) won't be parsed resulting in drop of all datagrams in rx_list.
Same is case with packets of size 2048: [002] d..1 7828.557948: dwc3_gadget_giveback: ep1out: req 0000000011dfd96e length 2049/16384 zsI ==> 0 [002] d..1 7828.557949: ncm_unwrap_ntb: K: ncm_unwrap_ntb nth: 1751999342 [002] d..1 7828.557950: ncm_unwrap_ntb: K: ncm_unwrap_ntb blk_len: 0x800
Lecroy shows one byte coming in extra confirming that the byte is coming in from PC:
Transfer 2959 - Bytes Transferred(1025) Timestamp((18.524 843 590) - Transaction 8391 - Data(1025 bytes) Timestamp(18.524 843 590) --- Packet 4063861 Data(1024 bytes) Duration(2.117us) Idle(14.700ns) Timestamp(18.524 843 590) --- Packet 4063863 Data(1 byte) Duration(66.160ns) Time(282.000ns) Timestamp(18.524 845 722)
According to Windows driver, no ZLP is needed if wBlockLength is non-zero, because the non-zero wBlockLength has already told the function side the size of transfer to be expected. However, there are in-market NCM devices that rely on ZLP as long as the wBlockLength is multiple of wMaxPacketSize. To deal with such devices, it pads an extra 0 at end so the transfer is no longer multiple of wMaxPacketSize.
Cc: stable@vger.kernel.org Fixes: 9f6ce4240a2b ("usb: gadget: f_ncm.c added") Signed-off-by: Krishna Kurapati quic_kriskura@quicinc.com Reviewed-by: Maciej Żenczykowski maze@google.com Link: https://lore.kernel.org/r/20240205074650.200304-1-quic_kriskura@quicinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/gadget/function/f_ncm.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
--- a/drivers/usb/gadget/function/f_ncm.c +++ b/drivers/usb/gadget/function/f_ncm.c @@ -1349,7 +1349,15 @@ parse_ntb: "Parsed NTB with %d frames\n", dgram_counter);
to_process -= block_len; - if (to_process != 0) { + + /* + * Windows NCM driver avoids USB ZLPs by adding a 1-byte + * zero pad as needed. + */ + if (to_process == 1 && + (*(unsigned char *)(ntb_ptr + block_len) == 0x00)) { + to_process--; + } else if (to_process > 0) { ntb_ptr = (unsigned char *)(ntb_ptr + block_len); goto parse_ntb; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xu Yang xu.yang_2@nxp.com
commit 1c9be13846c0b2abc2480602f8ef421360e1ad9e upstream.
In current design, usb role class driver will get usb_role_switch parent's module reference after the user get usb_role_switch device and put the reference after the user put the usb_role_switch device. However, the parent device of usb_role_switch may be removed before the user put the usb_role_switch. If so, then, NULL pointer issue will be met when the user put the parent module's reference.
This will save the module pointer in structure of usb_role_switch. Then, we don't need to find module by iterating long relations.
Fixes: 5c54fcac9a9d ("usb: roles: Take care of driver module reference counting") cc: stable@vger.kernel.org Signed-off-by: Xu Yang xu.yang_2@nxp.com Acked-by: Heikki Krogerus heikki.krogerus@linux.intel.com Link: https://lore.kernel.org/r/20240129093739.2371530-1-xu.yang_2@nxp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/roles/class.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-)
--- a/drivers/usb/roles/class.c +++ b/drivers/usb/roles/class.c @@ -19,6 +19,7 @@ static struct class *role_class; struct usb_role_switch { struct device dev; struct mutex lock; /* device lock*/ + struct module *module; /* the module this device depends on */ enum usb_role role;
/* From descriptor */ @@ -133,7 +134,7 @@ struct usb_role_switch *usb_role_switch_ usb_role_switch_match);
if (!IS_ERR_OR_NULL(sw)) - WARN_ON(!try_module_get(sw->dev.parent->driver->owner)); + WARN_ON(!try_module_get(sw->module));
return sw; } @@ -155,7 +156,7 @@ struct usb_role_switch *fwnode_usb_role_ sw = fwnode_connection_find_match(fwnode, "usb-role-switch", NULL, usb_role_switch_match); if (!IS_ERR_OR_NULL(sw)) - WARN_ON(!try_module_get(sw->dev.parent->driver->owner)); + WARN_ON(!try_module_get(sw->module));
return sw; } @@ -170,7 +171,7 @@ EXPORT_SYMBOL_GPL(fwnode_usb_role_switch void usb_role_switch_put(struct usb_role_switch *sw) { if (!IS_ERR_OR_NULL(sw)) { - module_put(sw->dev.parent->driver->owner); + module_put(sw->module); put_device(&sw->dev); } } @@ -187,15 +188,18 @@ struct usb_role_switch * usb_role_switch_find_by_fwnode(const struct fwnode_handle *fwnode) { struct device *dev; + struct usb_role_switch *sw = NULL;
if (!fwnode) return NULL;
dev = class_find_device_by_fwnode(role_class, fwnode); - if (dev) - WARN_ON(!try_module_get(dev->parent->driver->owner)); + if (dev) { + sw = to_role_switch(dev); + WARN_ON(!try_module_get(sw->module)); + }
- return dev ? to_role_switch(dev) : NULL; + return sw; } EXPORT_SYMBOL_GPL(usb_role_switch_find_by_fwnode);
@@ -328,6 +332,7 @@ usb_role_switch_register(struct device * sw->set = desc->set; sw->get = desc->get;
+ sw->module = parent->driver->owner; sw->dev.parent = parent; sw->dev.fwnode = desc->fwnode; sw->dev.class = role_class;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xu Yang xu.yang_2@nxp.com
commit b787a3e781759026a6212736ef8e52cf83d1821a upstream.
There is a possibility that usb_role_switch device is unregistered before the user put usb_role_switch. In this case, the user may still want to get/set_role() since the user can't sense the changes of usb_role_switch.
This will add a flag to show if usb_role_switch is already registered and avoid unwanted behaviors.
Fixes: fde0aa6c175a ("usb: common: Small class for USB role switches") cc: stable@vger.kernel.org Signed-off-by: Xu Yang xu.yang_2@nxp.com Acked-by: Heikki Krogerus heikki.krogerus@linux.intel.com Link: https://lore.kernel.org/r/20240129093739.2371530-2-xu.yang_2@nxp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/roles/class.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
--- a/drivers/usb/roles/class.c +++ b/drivers/usb/roles/class.c @@ -21,6 +21,7 @@ struct usb_role_switch { struct mutex lock; /* device lock*/ struct module *module; /* the module this device depends on */ enum usb_role role; + bool registered;
/* From descriptor */ struct device *usb2_port; @@ -47,6 +48,9 @@ int usb_role_switch_set_role(struct usb_ if (IS_ERR_OR_NULL(sw)) return 0;
+ if (!sw->registered) + return -EOPNOTSUPP; + mutex_lock(&sw->lock);
ret = sw->set(sw, role); @@ -72,7 +76,7 @@ enum usb_role usb_role_switch_get_role(s { enum usb_role role;
- if (IS_ERR_OR_NULL(sw)) + if (IS_ERR_OR_NULL(sw) || !sw->registered) return USB_ROLE_NONE;
mutex_lock(&sw->lock); @@ -347,6 +351,8 @@ usb_role_switch_register(struct device * return ERR_PTR(ret); }
+ sw->registered = true; + /* TODO: Symlinks for the host port and the device controller. */
return sw; @@ -361,8 +367,10 @@ EXPORT_SYMBOL_GPL(usb_role_switch_regist */ void usb_role_switch_unregister(struct usb_role_switch *sw) { - if (!IS_ERR_OR_NULL(sw)) + if (!IS_ERR_OR_NULL(sw)) { + sw->registered = false; device_unregister(&sw->dev); + } } EXPORT_SYMBOL_GPL(usb_role_switch_unregister);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
commit b8adb69a7d29c2d33eb327bca66476fb6066516b upstream.
Since the introduction of the subflow ULP diag interface, the dump callback accessed all the subflow data with lockless.
We need either to annotate all the read and write operation accordingly, or acquire the subflow socket lock. Let's do latter, even if slower, to avoid a diffstat havoc.
Fixes: 5147dfb50832 ("mptcp: allow dumping subflow context to userspace") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/tcp.h | 2 +- net/mptcp/diag.c | 6 +++++- net/tls/tls_main.c | 2 +- 3 files changed, 7 insertions(+), 3 deletions(-)
--- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -2224,7 +2224,7 @@ struct tcp_ulp_ops { /* cleanup ulp */ void (*release)(struct sock *sk); /* diagnostic */ - int (*get_info)(const struct sock *sk, struct sk_buff *skb); + int (*get_info)(struct sock *sk, struct sk_buff *skb); size_t (*get_info_size)(const struct sock *sk); /* clone ulp */ void (*clone)(const struct request_sock *req, struct sock *newsk, --- a/net/mptcp/diag.c +++ b/net/mptcp/diag.c @@ -13,17 +13,19 @@ #include <uapi/linux/mptcp.h> #include "protocol.h"
-static int subflow_get_info(const struct sock *sk, struct sk_buff *skb) +static int subflow_get_info(struct sock *sk, struct sk_buff *skb) { struct mptcp_subflow_context *sf; struct nlattr *start; u32 flags = 0; + bool slow; int err;
start = nla_nest_start_noflag(skb, INET_ULP_INFO_MPTCP); if (!start) return -EMSGSIZE;
+ slow = lock_sock_fast(sk); rcu_read_lock(); sf = rcu_dereference(inet_csk(sk)->icsk_ulp_data); if (!sf) { @@ -69,11 +71,13 @@ static int subflow_get_info(const struct }
rcu_read_unlock(); + unlock_sock_fast(sk, slow); nla_nest_end(skb, start); return 0;
nla_failure: rcu_read_unlock(); + unlock_sock_fast(sk, slow); nla_nest_cancel(skb, start); return err; } --- a/net/tls/tls_main.c +++ b/net/tls/tls_main.c @@ -800,7 +800,7 @@ static void tls_update(struct sock *sk, } }
-static int tls_get_info(const struct sock *sk, struct sk_buff *skb) +static int tls_get_info(struct sock *sk, struct sk_buff *skb) { u16 version, cipher_type; struct tls_context *ctx;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhipeng Lu alexious@zju.edu.cn
[ Upstream commit 809aa64ebff51eb170ee31a95f83b2d21efa32e2 ]
When dma_alloc_coherent fails to allocate dd->cr_base[i].va, init_credit_return should deallocate dd->cr_base and dd->cr_base[i] that allocated before. Or those resources would be never freed and a memleak is triggered.
Fixes: 7724105686e7 ("IB/hfi1: add driver files") Signed-off-by: Zhipeng Lu alexious@zju.edu.cn Link: https://lore.kernel.org/r/20240112085523.3731720-1-alexious@zju.edu.cn Acked-by: Dennis Dalessandro dennis.dalessandro@cornelisnetworks.com Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/hfi1/pio.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/hfi1/pio.c b/drivers/infiniband/hw/hfi1/pio.c index 60eb3a64518f3..969004258692b 100644 --- a/drivers/infiniband/hw/hfi1/pio.c +++ b/drivers/infiniband/hw/hfi1/pio.c @@ -2131,7 +2131,7 @@ int init_credit_return(struct hfi1_devdata *dd) "Unable to allocate credit return DMA range for NUMA %d\n", i); ret = -ENOMEM; - goto done; + goto free_cr_base; } } set_dev_node(&dd->pcidev->dev, dd->node); @@ -2139,6 +2139,10 @@ int init_credit_return(struct hfi1_devdata *dd) ret = 0; done: return ret; + +free_cr_base: + free_credit_return(dd); + goto done; }
void free_credit_return(struct hfi1_devdata *dd)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kalesh AP kalesh-anakkur.purayil@broadcom.com
[ Upstream commit 3687b450c5f32e80f179ce4b09e0454da1449eac ]
SRQ resize is not supported in the driver. But driver is not returning error from bnxt_re_modify_srq() for SRQ resize.
Fixes: 37cb11acf1f7 ("RDMA/bnxt_re: Add SRQ support for Broadcom adapters") Signed-off-by: Kalesh AP kalesh-anakkur.purayil@broadcom.com Signed-off-by: Selvin Xavier selvin.xavier@broadcom.com Link: https://lore.kernel.org/r/1705985677-15551-5-git-send-email-selvin.xavier@br... Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/bnxt_re/ib_verbs.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c index 2a973a1390a4a..a0d7777acb6d4 100644 --- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c +++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c @@ -1711,7 +1711,7 @@ int bnxt_re_modify_srq(struct ib_srq *ib_srq, struct ib_srq_attr *srq_attr, switch (srq_attr_mask) { case IB_SRQ_MAX_WR: /* SRQ resize is not supported */ - break; + return -EINVAL; case IB_SRQ_LIMIT: /* Change the SRQ threshold */ if (srq_attr->srq_limit > srq->qplib_srq.max_wqe) @@ -1726,13 +1726,12 @@ int bnxt_re_modify_srq(struct ib_srq *ib_srq, struct ib_srq_attr *srq_attr, /* On success, update the shadow */ srq->srq_limit = srq_attr->srq_limit; /* No need to Build and send response back to udata */ - break; + return 0; default: ibdev_err(&rdev->ibdev, "Unsupported srq_attr_mask 0x%x", srq_attr_mask); return -EINVAL; } - return 0; }
int bnxt_re_query_srq(struct ib_srq *ib_srq, struct ib_srq_attr *srq_attr)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bart Van Assche bvanassche@acm.org
[ Upstream commit fdfa083549de5d50ebf7f6811f33757781e838c0 ]
Make loading ib_srpt with this parameter set work. The current behavior is that setting that parameter while loading the ib_srpt kernel module triggers the following kernel crash:
BUG: kernel NULL pointer dereference, address: 0000000000000000 Call Trace: <TASK> parse_one+0x18c/0x1d0 parse_args+0xe1/0x230 load_module+0x8de/0xa60 init_module_from_file+0x8b/0xd0 idempotent_init_module+0x181/0x240 __x64_sys_finit_module+0x5a/0xb0 do_syscall_64+0x5f/0xe0 entry_SYSCALL_64_after_hwframe+0x6e/0x76
Cc: LiHonggang honggangli@163.com Reported-by: LiHonggang honggangli@163.com Fixes: a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1") Signed-off-by: Bart Van Assche bvanassche@acm.org Link: https://lore.kernel.org/r/20240205004207.17031-1-bvanassche@acm.org Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/ulp/srpt/ib_srpt.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index 983f59c87b79f..80e99e9e97172 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -79,12 +79,16 @@ module_param(srpt_srq_size, int, 0444); MODULE_PARM_DESC(srpt_srq_size, "Shared receive queue (SRQ) size.");
+static int srpt_set_u64_x(const char *buffer, const struct kernel_param *kp) +{ + return kstrtou64(buffer, 16, (u64 *)kp->arg); +} static int srpt_get_u64_x(char *buffer, const struct kernel_param *kp) { return sprintf(buffer, "0x%016llx\n", *(u64 *)kp->arg); } -module_param_call(srpt_service_guid, NULL, srpt_get_u64_x, &srpt_service_guid, - 0444); +module_param_call(srpt_service_guid, srpt_set_u64_x, srpt_get_u64_x, + &srpt_service_guid, 0444); MODULE_PARM_DESC(srpt_service_guid, "Using this value for ioc_guid, id_ext, and cm_listen_id instead of using the node_guid of the first HCA.");
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kamal Heib kheib@redhat.com
[ Upstream commit 5ba4e6d5863c53e937f49932dee0ecb004c65928 ]
Avoid the following warning by making sure to free the allocated resources in case that qedr_init_user_queue() fail.
-----------[ cut here ]----------- WARNING: CPU: 0 PID: 143192 at drivers/infiniband/core/rdma_core.c:874 uverbs_destroy_ufile_hw+0xcf/0xf0 [ib_uverbs] Modules linked in: tls target_core_user uio target_core_pscsi target_core_file target_core_iblock ib_srpt ib_srp scsi_transport_srp nfsd nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs 8021q garp mrp stp llc ext4 mbcache jbd2 opa_vnic ib_umad ib_ipoib sunrpc rdma_ucm ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm hfi1 intel_rapl_msr intel_rapl_common mgag200 qedr sb_edac drm_shmem_helper rdmavt x86_pkg_temp_thermal drm_kms_helper intel_powerclamp ib_uverbs coretemp i2c_algo_bit kvm_intel dell_wmi_descriptor ipmi_ssif sparse_keymap kvm ib_core rfkill syscopyarea sysfillrect video sysimgblt irqbypass ipmi_si ipmi_devintf fb_sys_fops rapl iTCO_wdt mxm_wmi iTCO_vendor_support intel_cstate pcspkr dcdbas intel_uncore ipmi_msghandler lpc_ich acpi_power_meter mei_me mei fuse drm xfs libcrc32c qede sd_mod ahci libahci t10_pi sg crct10dif_pclmul crc32_pclmul crc32c_intel qed libata tg3 ghash_clmulni_intel megaraid_sas crc8 wmi [last unloaded: ib_srpt] CPU: 0 PID: 143192 Comm: fi_rdm_tagged_p Kdump: loaded Not tainted 5.14.0-408.el9.x86_64 #1 Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 2.14.0 01/25/2022 RIP: 0010:uverbs_destroy_ufile_hw+0xcf/0xf0 [ib_uverbs] Code: 5d 41 5c 41 5d 41 5e e9 0f 26 1b dd 48 89 df e8 67 6a ff ff 49 8b 86 10 01 00 00 48 85 c0 74 9c 4c 89 e7 e8 83 c0 cb dd eb 92 <0f> 0b eb be 0f 0b be 04 00 00 00 48 89 df e8 8e f5 ff ff e9 6d ff RSP: 0018:ffffb7c6cadfbc60 EFLAGS: 00010286 RAX: ffff8f0889ee3f60 RBX: ffff8f088c1a5200 RCX: 00000000802a0016 RDX: 00000000802a0017 RSI: 0000000000000001 RDI: ffff8f0880042600 RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 R10: ffff8f11fffd5000 R11: 0000000000039000 R12: ffff8f0d5b36cd80 R13: ffff8f088c1a5250 R14: ffff8f1206d91000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8f11d7c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000147069200e20 CR3: 00000001c7210002 CR4: 00000000001706f0 Call Trace: <TASK> ? show_trace_log_lvl+0x1c4/0x2df ? show_trace_log_lvl+0x1c4/0x2df ? ib_uverbs_close+0x1f/0xb0 [ib_uverbs] ? uverbs_destroy_ufile_hw+0xcf/0xf0 [ib_uverbs] ? __warn+0x81/0x110 ? uverbs_destroy_ufile_hw+0xcf/0xf0 [ib_uverbs] ? report_bug+0x10a/0x140 ? handle_bug+0x3c/0x70 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 ? uverbs_destroy_ufile_hw+0xcf/0xf0 [ib_uverbs] ib_uverbs_close+0x1f/0xb0 [ib_uverbs] __fput+0x94/0x250 task_work_run+0x5c/0x90 do_exit+0x270/0x4a0 do_group_exit+0x2d/0x90 get_signal+0x87c/0x8c0 arch_do_signal_or_restart+0x25/0x100 ? ib_uverbs_ioctl+0xc2/0x110 [ib_uverbs] exit_to_user_mode_loop+0x9c/0x130 exit_to_user_mode_prepare+0xb6/0x100 syscall_exit_to_user_mode+0x12/0x40 do_syscall_64+0x69/0x90 ? syscall_exit_work+0x103/0x130 ? syscall_exit_to_user_mode+0x22/0x40 ? do_syscall_64+0x69/0x90 ? syscall_exit_work+0x103/0x130 ? syscall_exit_to_user_mode+0x22/0x40 ? do_syscall_64+0x69/0x90 ? do_syscall_64+0x69/0x90 ? common_interrupt+0x43/0xa0 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x1470abe3ec6b Code: Unable to access opcode bytes at RIP 0x1470abe3ec41. RSP: 002b:00007fff13ce9108 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: fffffffffffffffc RBX: 00007fff13ce9218 RCX: 00001470abe3ec6b RDX: 00007fff13ce9200 RSI: 00000000c0181b01 RDI: 0000000000000004 RBP: 00007fff13ce91e0 R08: 0000558d9655da10 R09: 0000558d9655dd00 R10: 00007fff13ce95c0 R11: 0000000000000246 R12: 00007fff13ce9358 R13: 0000000000000013 R14: 0000558d9655db50 R15: 00007fff13ce9470 </TASK> --[ end trace 888a9b92e04c5c97 ]--
Fixes: df15856132bc ("RDMA/qedr: restructure functions that create/destroy QPs") Signed-off-by: Kamal Heib kheib@redhat.com Link: https://lore.kernel.org/r/20240208223628.2040841-1-kheib@redhat.com Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/qedr/verbs.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c index 3543b9af10b7a..d382ac21159c2 100644 --- a/drivers/infiniband/hw/qedr/verbs.c +++ b/drivers/infiniband/hw/qedr/verbs.c @@ -1865,8 +1865,17 @@ static int qedr_create_user_qp(struct qedr_dev *dev, /* RQ - read access only (0) */ rc = qedr_init_user_queue(udata, dev, &qp->urq, ureq.rq_addr, ureq.rq_len, true, 0, alloc_and_init); - if (rc) + if (rc) { + ib_umem_release(qp->usq.umem); + qp->usq.umem = NULL; + if (rdma_protocol_roce(&dev->ibdev, 1)) { + qedr_free_pbl(dev, &qp->usq.pbl_info, + qp->usq.pbl_tbl); + } else { + kfree(qp->usq.pbl_tbl); + } return rc; + } }
memset(&in_params, 0, sizeof(in_params));
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Heiko Stuebner heiko.stuebner@cherry.de
[ Upstream commit 334bf0710c98d391f4067b72f535d6c4c84dfb6f ]
The px30 has two spi controllers with two chip-selects each. The num-cs property is specified as the total number of chip selects a controllers has and is used since 2020 to find uses of chipselects outside that range in the Rockchip spi driver.
Without the property set, the default is 1, so spi devices using the second chipselect will not be created.
Fixes: eb1262e3cc8b ("spi: spi-rockchip: use num-cs property and ctlr->enable_gpiods") Signed-off-by: Heiko Stuebner heiko.stuebner@cherry.de Reviewed-by: Quentin Schulz quentin.schulz@theobroma-systems.com Link: https://lore.kernel.org/r/20240119101656.965744-1-heiko@sntech.de Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/rockchip/px30.dtsi | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/rockchip/px30.dtsi b/arch/arm64/boot/dts/rockchip/px30.dtsi index 0d6761074b11a..f241e7c318bcd 100644 --- a/arch/arm64/boot/dts/rockchip/px30.dtsi +++ b/arch/arm64/boot/dts/rockchip/px30.dtsi @@ -577,6 +577,7 @@ clock-names = "spiclk", "apb_pclk"; dmas = <&dmac 12>, <&dmac 13>; dma-names = "tx", "rx"; + num-cs = <2>; pinctrl-names = "default"; pinctrl-0 = <&spi0_clk &spi0_csn &spi0_miso &spi0_mosi>; #address-cells = <1>; @@ -592,6 +593,7 @@ clock-names = "spiclk", "apb_pclk"; dmas = <&dmac 14>, <&dmac 15>; dma-names = "tx", "rx"; + num-cs = <2>; pinctrl-names = "default"; pinctrl-0 = <&spi1_clk &spi1_csn0 &spi1_csn1 &spi1_miso &spi1_mosi>; #address-cells = <1>;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann arnd@arndb.de
[ Upstream commit eb5c7465c3240151cd42a55c7ace9da0026308a1 ]
clang-16 notices that srpt_qp_event() gets called through an incompatible pointer here:
drivers/infiniband/ulp/srpt/ib_srpt.c:1815:5: error: cast from 'void (*)(struct ib_event *, struct srpt_rdma_ch *)' to 'void (*)(struct ib_event *, void *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict] 1815 | = (void(*)(struct ib_event *, void*))srpt_qp_event;
Change srpt_qp_event() to use the correct prototype and adjust the argument inside of it.
Fixes: a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1") Signed-off-by: Arnd Bergmann arnd@arndb.de Link: https://lore.kernel.org/r/20240213100728.458348-1-arnd@kernel.org Reviewed-by: Bart Van Assche bvanassche@acm.org Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/ulp/srpt/ib_srpt.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index 80e99e9e97172..41abf9cf11c67 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -214,10 +214,12 @@ static const char *get_ch_state_name(enum rdma_ch_state s) /** * srpt_qp_event - QP event callback function * @event: Description of the event that occurred. - * @ch: SRPT RDMA channel. + * @ptr: SRPT RDMA channel. */ -static void srpt_qp_event(struct ib_event *event, struct srpt_rdma_ch *ch) +static void srpt_qp_event(struct ib_event *event, void *ptr) { + struct srpt_rdma_ch *ch = ptr; + pr_debug("QP event %d on ch=%p sess_name=%s-%d state=%s\n", event->event, ch, ch->sess_name, ch->qp->qp_num, get_ch_state_name(ch->state)); @@ -1807,8 +1809,7 @@ static int srpt_create_ch_ib(struct srpt_rdma_ch *ch) ch->cq_size = ch->rq_size + sq_size;
qp_init->qp_context = (void *)ch; - qp_init->event_handler - = (void(*)(struct ib_event *, void*))srpt_qp_event; + qp_init->event_handler = srpt_qp_event; qp_init->send_cq = ch->cq; qp_init->recv_cq = ch->cq; qp_init->sq_sig_type = IB_SIGNAL_REQ_WR;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gianmarco Lusvardi glusvardi@posteo.net
[ Upstream commit e37243b65d528a8a9f8b9a57a43885f8e8dfc15c ]
The bpf_doc script refers to the GPL as the "GNU Privacy License". I strongly suspect that the author wanted to refer to the GNU General Public License, under which the Linux kernel is released, as, to the best of my knowledge, there is no license named "GNU Privacy License". This patch corrects the license name in the script accordingly.
Fixes: 56a092c89505 ("bpf: add script and prepare bpf.h for new helpers documentation") Signed-off-by: Gianmarco Lusvardi glusvardi@posteo.net Signed-off-by: Daniel Borkmann daniel@iogearbox.net Reviewed-by: Quentin Monnet quentin@isovalent.com Link: https://lore.kernel.org/bpf/20240213230544.930018-3-glusvardi@posteo.net Signed-off-by: Sasha Levin sashal@kernel.org --- scripts/bpf_helpers_doc.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py index 31484377b8b11..806240dda6090 100755 --- a/scripts/bpf_helpers_doc.py +++ b/scripts/bpf_helpers_doc.py @@ -284,7 +284,7 @@ eBPF programs can have an associated license, passed along with the bytecode instructions to the kernel when the programs are loaded. The format for that string is identical to the one in use for kernel modules (Dual licenses, such as "Dual BSD/GPL", may be used). Some helper functions are only accessible to -programs that are compatible with the GNU Privacy License (GPL). +programs that are compatible with the GNU General Public License (GNU GPL).
In order to use such helpers, the eBPF program must be loaded with the correct license string passed (via **attr**) to the **bpf**\ () system call, and this
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Randy Dunlap rdunlap@infradead.org
[ Upstream commit 9ddf190a7df77b77817f955fdb9c2ae9d1c9c9a3 ]
JAZZ_ESP is a bool kconfig symbol that selects SCSI_SPI_ATTRS. When CONFIG_SCSI=m, this results in SCSI_SPI_ATTRS=m while JAZZ_ESP=y, which causes many undefined symbol linker errors.
Fix this by only offering to build this driver when CONFIG_SCSI=y.
[mkp: JAZZ_ESP is unique in that it does not support being compiled as a module unlike the remaining SPI SCSI HBA drivers]
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Randy Dunlap rdunlap@infradead.org Link: https://lore.kernel.org/r/20240214055953.9612-1-rdunlap@infradead.org Cc: Thomas Bogendoerfer tsbogend@alpha.franken.de Cc: linux-mips@vger.kernel.org Cc: Arnd Bergmann arnd@arndb.de Cc: Masahiro Yamada masahiroy@kernel.org Cc: Nicolas Schier nicolas@fjasle.eu Cc: James E.J. Bottomley jejb@linux.ibm.com Cc: Martin K. Petersen martin.petersen@oracle.com Cc: linux-scsi@vger.kernel.org Cc: Geert Uytterhoeven geert@linux-m68k.org Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202402112222.Gl0udKyU-lkp@intel.com/ Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig index 6524e1fe54d2e..f59c9002468cc 100644 --- a/drivers/scsi/Kconfig +++ b/drivers/scsi/Kconfig @@ -1289,7 +1289,7 @@ source "drivers/scsi/arm/Kconfig"
config JAZZ_ESP bool "MIPS JAZZ FAS216 SCSI support" - depends on MACH_JAZZ && SCSI + depends on MACH_JAZZ && SCSI=y select SCSI_SPI_ATTRS help This is the driver for the onboard SCSI host adapter of MIPS Magnum
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann arnd@arndb.de
[ Upstream commit 0affdba22aca5573f9d989bcb1d71d32a6a03efe ]
clang-16 warns about casting between incompatible function types:
drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadow.c:161:10: error: cast from 'void (*)(const struct firmware *)' to 'void (*)(void *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict] 161 | .fini = (void(*)(void *))release_firmware,
This one was done to use the generic shadow_fw_release() function as a callback for struct nvbios_source. Change it to use the same prototype as the other five instances, with a trivial helper function that actually calls release_firmware.
Fixes: 70c0f263cc2e ("drm/nouveau/bios: pull in basic vbios subdev, more to come later") Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Danilo Krummrich dakr@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20240213095753.455062-1-arnd@k... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadow.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadow.c b/drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadow.c index 4b571cc6bc70f..6597def18627e 100644 --- a/drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadow.c +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadow.c @@ -154,11 +154,17 @@ shadow_fw_init(struct nvkm_bios *bios, const char *name) return (void *)fw; }
+static void +shadow_fw_release(void *fw) +{ + release_firmware(fw); +} + static const struct nvbios_source shadow_fw = { .name = "firmware", .init = shadow_fw_init, - .fini = (void(*)(void *))release_firmware, + .fini = shadow_fw_release, .read = shadow_fw_read, .rw = false, };
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 081a0e3b0d4c061419d3f4679dec9f68725b17e4 ]
net->dev_base_seq and ipv4.dev_addr_genid are monotonically increasing.
If we XOR their values, we could miss to detect if both values were changed with the same amount.
Fixes: 0465277f6b3f ("ipv4: provide addr and netconf dump consistency info") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Nicolas Dichtel nicolas.dichtel@6wind.com Acked-by: Nicolas Dichtel nicolas.dichtel@6wind.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/devinet.c | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c index da1ca8081c035..9ac7d47d27b81 100644 --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -1798,6 +1798,21 @@ static int in_dev_dump_addr(struct in_device *in_dev, struct sk_buff *skb, return err; }
+/* Combine dev_addr_genid and dev_base_seq to detect changes. + */ +static u32 inet_base_seq(const struct net *net) +{ + u32 res = atomic_read(&net->ipv4.dev_addr_genid) + + net->dev_base_seq; + + /* Must not return 0 (see nl_dump_check_consistent()). + * Chose a value far away from 0. + */ + if (!res) + res = 0x80000000; + return res; +} + static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb) { const struct nlmsghdr *nlh = cb->nlh; @@ -1849,8 +1864,7 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb) idx = 0; head = &tgt_net->dev_index_head[h]; rcu_read_lock(); - cb->seq = atomic_read(&tgt_net->ipv4.dev_addr_genid) ^ - tgt_net->dev_base_seq; + cb->seq = inet_base_seq(tgt_net); hlist_for_each_entry_rcu(dev, head, index_hlist) { if (idx < s_idx) goto cont; @@ -2249,8 +2263,7 @@ static int inet_netconf_dump_devconf(struct sk_buff *skb, idx = 0; head = &net->dev_index_head[h]; rcu_read_lock(); - cb->seq = atomic_read(&net->ipv4.dev_addr_genid) ^ - net->dev_base_seq; + cb->seq = inet_base_seq(net); hlist_for_each_entry_rcu(dev, head, index_hlist) { if (idx < s_idx) goto cont;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit e898e4cd1aab271ca414f9ac6e08e4c761f6913c ]
net->dev_base_seq and ipv6.dev_addr_genid are monotonically increasing.
If we XOR their values, we could miss to detect if both values were changed with the same amount.
Fixes: 63998ac24f83 ("ipv6: provide addr and netconf dump consistency info") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Nicolas Dichtel nicolas.dichtel@6wind.com
Signed-off-by: Eric Dumazet edumazet@google.com Acked-by: Nicolas Dichtel nicolas.dichtel@6wind.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/addrconf.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 79787a1f5ab34..150c2f71ec89f 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -698,6 +698,22 @@ static int inet6_netconf_get_devconf(struct sk_buff *in_skb, return err; }
+/* Combine dev_addr_genid and dev_base_seq to detect changes. + */ +static u32 inet6_base_seq(const struct net *net) +{ + u32 res = atomic_read(&net->ipv6.dev_addr_genid) + + net->dev_base_seq; + + /* Must not return 0 (see nl_dump_check_consistent()). + * Chose a value far away from 0. + */ + if (!res) + res = 0x80000000; + return res; +} + + static int inet6_netconf_dump_devconf(struct sk_buff *skb, struct netlink_callback *cb) { @@ -731,8 +747,7 @@ static int inet6_netconf_dump_devconf(struct sk_buff *skb, idx = 0; head = &net->dev_index_head[h]; rcu_read_lock(); - cb->seq = atomic_read(&net->ipv6.dev_addr_genid) ^ - net->dev_base_seq; + cb->seq = inet6_base_seq(net); hlist_for_each_entry_rcu(dev, head, index_hlist) { if (idx < s_idx) goto cont; @@ -5288,7 +5303,7 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb, }
rcu_read_lock(); - cb->seq = atomic_read(&tgt_net->ipv6.dev_addr_genid) ^ tgt_net->dev_base_seq; + cb->seq = inet6_base_seq(tgt_net); for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) { idx = 0; head = &tgt_net->dev_index_head[h];
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniil Dulov d.dulov@aladdin.ru
[ Upstream commit 6ea38e2aeb72349cad50e38899b0ba6fbcb2af3d ]
The max length of volume->vid value is 20 characters. So increase idbuf[] size up to 24 to avoid overflow.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
[DH: Actually, it's 20 + NUL, so increase it to 24 and use snprintf()]
Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: Daniil Dulov d.dulov@aladdin.ru Signed-off-by: David Howells dhowells@redhat.com Link: https://lore.kernel.org/r/20240211150442.3416-1-d.dulov@aladdin.ru/ # v1 Link: https://lore.kernel.org/r/20240212083347.10742-1-d.dulov@aladdin.ru/ # v2 Link: https://lore.kernel.org/r/20240219143906.138346-3-dhowells@redhat.com Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/afs/volume.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/afs/volume.c b/fs/afs/volume.c index f84194b791d3e..fb19c69284ab2 100644 --- a/fs/afs/volume.c +++ b/fs/afs/volume.c @@ -302,7 +302,7 @@ static int afs_update_volume_status(struct afs_volume *volume, struct key *key) { struct afs_server_list *new, *old, *discard; struct afs_vldb_entry *vldb; - char idbuf[16]; + char idbuf[24]; int ret, idsz;
_enter(""); @@ -310,7 +310,7 @@ static int afs_update_volume_status(struct afs_volume *volume, struct key *key) /* We look up an ID by passing it as a decimal string in the * operation's name parameter. */ - idsz = sprintf(idbuf, "%llu", volume->vid); + idsz = snprintf(idbuf, sizeof(idbuf), "%llu", volume->vid);
vldb = afs_vl_lookup_vldb(volume->cell, key, idbuf, idsz); if (IS_ERR(vldb)) {
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vasiliy Kovalev kovalev@altlinux.org
[ Upstream commit 5559cea2d5aa3018a5f00dd2aca3427ba09b386b ]
The pernet operations structure for the subsystem must be registered before registering the generic netlink family.
Fixes: 915d7e5e5930 ("ipv6: sr: add code base for control plane support of SR-IPv6") Signed-off-by: Vasiliy Kovalev kovalev@altlinux.org Link: https://lore.kernel.org/r/20240215202717.29815-1-kovalev@altlinux.org Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/seg6.c | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/net/ipv6/seg6.c b/net/ipv6/seg6.c index 2278c0234c497..a8439fded12dc 100644 --- a/net/ipv6/seg6.c +++ b/net/ipv6/seg6.c @@ -451,22 +451,24 @@ int __init seg6_init(void) { int err;
- err = genl_register_family(&seg6_genl_family); + err = register_pernet_subsys(&ip6_segments_ops); if (err) goto out;
- err = register_pernet_subsys(&ip6_segments_ops); + err = genl_register_family(&seg6_genl_family); if (err) - goto out_unregister_genl; + goto out_unregister_pernet;
#ifdef CONFIG_IPV6_SEG6_LWTUNNEL err = seg6_iptunnel_init(); if (err) - goto out_unregister_pernet; + goto out_unregister_genl;
err = seg6_local_init(); - if (err) - goto out_unregister_pernet; + if (err) { + seg6_iptunnel_exit(); + goto out_unregister_genl; + } #endif
#ifdef CONFIG_IPV6_SEG6_HMAC @@ -487,11 +489,11 @@ int __init seg6_init(void) #endif #endif #ifdef CONFIG_IPV6_SEG6_LWTUNNEL -out_unregister_pernet: - unregister_pernet_subsys(&ip6_segments_ops); -#endif out_unregister_genl: genl_unregister_family(&seg6_genl_family); +#endif +out_unregister_pernet: + unregister_pernet_subsys(&ip6_segments_ops); goto out; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wolfram Sang wsa+renesas@sang-engineering.com
[ Upstream commit 8fc9d51ea2d32a05f7d7cf86a25cc86ecc57eb45 ]
Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script.
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmkn... Signed-off-by: Wolfram Sang wsa+renesas@sang-engineering.com Link: https://lore.kernel.org/r/20220818210227.8611-1-wsa+renesas@sang-engineering... Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: a7d6027790ac ("arp: Prevent overflow in arp_req_get().") Signed-off-by: Sasha Levin sashal@kernel.org --- net/packet/af_packet.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index b292d58fdcc4c..1052cbcdd3c8d 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -1871,7 +1871,7 @@ static int packet_rcv_spkt(struct sk_buff *skb, struct net_device *dev, */
spkt->spkt_family = dev->type; - strlcpy(spkt->spkt_device, dev->name, sizeof(spkt->spkt_device)); + strscpy(spkt->spkt_device, dev->name, sizeof(spkt->spkt_device)); spkt->spkt_protocol = skb->protocol;
/* @@ -3540,7 +3540,7 @@ static int packet_getname_spkt(struct socket *sock, struct sockaddr *uaddr, rcu_read_lock(); dev = dev_get_by_index_rcu(sock_net(sk), READ_ONCE(pkt_sk(sk)->ifindex)); if (dev) - strlcpy(uaddr->sa_data, dev->name, sizeof(uaddr->sa_data)); + strscpy(uaddr->sa_data, dev->name, sizeof(uaddr->sa_data)); rcu_read_unlock();
return sizeof(*uaddr);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kees Cook keescook@chromium.org
[ Upstream commit b5f0de6df6dce8d641ef58ef7012f3304dffb9a1 ]
One of the worst offenders of "fake flexible arrays" is struct sockaddr, as it is the classic example of why GCC and Clang have been traditionally forced to treat all trailing arrays as fake flexible arrays: in the distant misty past, sa_data became too small, and code started just treating it as a flexible array, even though it was fixed-size. The special case by the compiler is specifically that sizeof(sa->sa_data) and FORTIFY_SOURCE (which uses __builtin_object_size(sa->sa_data, 1)) do not agree (14 and -1 respectively), which makes FORTIFY_SOURCE treat it as a flexible array.
However, the coming -fstrict-flex-arrays compiler flag will remove these special cases so that FORTIFY_SOURCE can gain coverage over all the trailing arrays in the kernel that are _not_ supposed to be treated as a flexible array. To deal with this change, convert sa_data to a true flexible array. To keep the structure size the same, move sa_data into a union with a newly introduced sa_data_min with the original size. The result is that FORTIFY_SOURCE can continue to have no idea how large sa_data may actually be, but anything using sizeof(sa->sa_data) must switch to sizeof(sa->sa_data_min).
Cc: Jens Axboe axboe@kernel.dk Cc: Pavel Begunkov asml.silence@gmail.com Cc: David Ahern dsahern@kernel.org Cc: Dylan Yudaken dylany@fb.com Cc: Yajun Deng yajun.deng@linux.dev Cc: Petr Machata petrm@nvidia.com Cc: Hangbin Liu liuhangbin@gmail.com Cc: Leon Romanovsky leon@kernel.org Cc: syzbot syzkaller@googlegroups.com Cc: Willem de Bruijn willemb@google.com Cc: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Kees Cook keescook@chromium.org Link: https://lore.kernel.org/r/20221018095503.never.671-kees@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: a7d6027790ac ("arp: Prevent overflow in arp_req_get().") Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/socket.h | 5 ++++- net/core/dev.c | 2 +- net/core/dev_ioctl.c | 2 +- net/packet/af_packet.c | 10 +++++----- 4 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/include/linux/socket.h b/include/linux/socket.h index c3b35d18bcd30..daf51fef5a8d1 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -31,7 +31,10 @@ typedef __kernel_sa_family_t sa_family_t;
struct sockaddr { sa_family_t sa_family; /* address family, AF_xxx */ - char sa_data[14]; /* 14 bytes of protocol address */ + union { + char sa_data_min[14]; /* Minimum 14 bytes of protocol address */ + DECLARE_FLEX_ARRAY(char, sa_data); + }; };
struct linger { diff --git a/net/core/dev.c b/net/core/dev.c index fc881d60a9dcc..0619d2253aa24 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -8787,7 +8787,7 @@ EXPORT_SYMBOL(dev_set_mac_address_user);
int dev_get_mac_address(struct sockaddr *sa, struct net *net, char *dev_name) { - size_t size = sizeof(sa->sa_data); + size_t size = sizeof(sa->sa_data_min); struct net_device *dev; int ret = 0;
diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c index 993420da29307..60e815a71909a 100644 --- a/net/core/dev_ioctl.c +++ b/net/core/dev_ioctl.c @@ -245,7 +245,7 @@ static int dev_ifsioc(struct net *net, struct ifreq *ifr, unsigned int cmd) if (ifr->ifr_hwaddr.sa_family != dev->type) return -EINVAL; memcpy(dev->broadcast, ifr->ifr_hwaddr.sa_data, - min(sizeof(ifr->ifr_hwaddr.sa_data), + min(sizeof(ifr->ifr_hwaddr.sa_data_min), (size_t)dev->addr_len)); call_netdevice_notifiers(NETDEV_CHANGEADDR, dev); return 0; diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 1052cbcdd3c8d..6cc054dd53b6e 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -3252,7 +3252,7 @@ static int packet_bind_spkt(struct socket *sock, struct sockaddr *uaddr, int addr_len) { struct sock *sk = sock->sk; - char name[sizeof(uaddr->sa_data) + 1]; + char name[sizeof(uaddr->sa_data_min) + 1];
/* * Check legality @@ -3263,8 +3263,8 @@ static int packet_bind_spkt(struct socket *sock, struct sockaddr *uaddr, /* uaddr->sa_data comes from the userspace, it's not guaranteed to be * zero-terminated. */ - memcpy(name, uaddr->sa_data, sizeof(uaddr->sa_data)); - name[sizeof(uaddr->sa_data)] = 0; + memcpy(name, uaddr->sa_data, sizeof(uaddr->sa_data_min)); + name[sizeof(uaddr->sa_data_min)] = 0;
return packet_do_bind(sk, name, 0, 0); } @@ -3536,11 +3536,11 @@ static int packet_getname_spkt(struct socket *sock, struct sockaddr *uaddr, return -EOPNOTSUPP;
uaddr->sa_family = AF_PACKET; - memset(uaddr->sa_data, 0, sizeof(uaddr->sa_data)); + memset(uaddr->sa_data, 0, sizeof(uaddr->sa_data_min)); rcu_read_lock(); dev = dev_get_by_index_rcu(sock_net(sk), READ_ONCE(pkt_sk(sk)->ifindex)); if (dev) - strscpy(uaddr->sa_data, dev->name, sizeof(uaddr->sa_data)); + strscpy(uaddr->sa_data, dev->name, sizeof(uaddr->sa_data_min)); rcu_read_unlock();
return sizeof(*uaddr);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason Gunthorpe jgg@nvidia.com
[ Upstream commit 723a2cc8d69d4342b47dfddbfe6c19f1b135f09b ]
The signature for __iowrite64_copy() requires the number of 64 bit quantities, not bytes. Multiple by 8 to get to a byte length before invoking zpci_memcpy_toio()
Fixes: 87bc359b9822 ("s390/pci: speed up __iowrite64_copy by using pci store block insn") Acked-by: Niklas Schnelle schnelle@linux.ibm.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/0-v1-9223d11a7662+1d7785-s390_iowrite64_jgg@nvidia... Signed-off-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/s390/pci/pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c index 74799439b2598..beecc36c30276 100644 --- a/arch/s390/pci/pci.c +++ b/arch/s390/pci/pci.c @@ -225,7 +225,7 @@ resource_size_t pcibios_align_resource(void *data, const struct resource *res, /* combine single writes by using store-block insn */ void __iowrite64_copy(void __iomem *to, const void *from, size_t count) { - zpci_memcpy_toio(to, from, count); + zpci_memcpy_toio(to, from, count * 8); }
static void __iomem *__ioremap(phys_addr_t addr, size_t size, pgprot_t prot)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit bfc06e1aaa130b86a81ce3c41ec71a2f5e191690 ]
'recv_end:' checks num_async and decrypted, and is then followed by the 'end' label. Since we know that decrypted and num_async are 0 at the start we can jump to 'end'.
Move the init of decrypted and num_async to let the compiler catch if I'm wrong.
Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: fdfbaec5923d ("tls: stop recv() if initial process_rx_list gave us non-DATA") Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index dd980438f201f..d11094a81ee6c 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1754,6 +1754,7 @@ int tls_sw_recvmsg(struct sock *sk, struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx); struct tls_prot_info *prot = &tls_ctx->prot_info; struct sk_psock *psock; + int num_async, pending; unsigned char control = 0; ssize_t decrypted = 0; struct strp_msg *rxm; @@ -1766,8 +1767,6 @@ int tls_sw_recvmsg(struct sock *sk, bool is_kvec = iov_iter_is_kvec(&msg->msg_iter); bool is_peek = flags & MSG_PEEK; bool bpf_strp_enabled; - int num_async = 0; - int pending;
flags |= nonblock;
@@ -1789,12 +1788,14 @@ int tls_sw_recvmsg(struct sock *sk, }
if (len <= copied) - goto recv_end; + goto end;
target = sock_rcvlowat(sk, flags & MSG_WAITALL, len); len = len - copied; timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
+ decrypted = 0; + num_async = 0; while (len && (decrypted + copied < target || ctx->recv_pkt)) { bool retain_skb = false; bool zc = false;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit d5123edd10cf9d324fcb88e276bdc7375f3c5321 ]
Pointless else branch after goto makes the code harder to refactor down the line.
Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: fdfbaec5923d ("tls: stop recv() if initial process_rx_list gave us non-DATA") Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index d11094a81ee6c..732f96b7bafc8 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1783,10 +1783,9 @@ int tls_sw_recvmsg(struct sock *sk, if (err < 0) { tls_err_abort(sk, err); goto end; - } else { - copied = err; }
+ copied = err; if (len <= copied) goto end;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sabrina Dubroca sd@queasysnail.net
[ Upstream commit fdfbaec5923d9359698cbb286bc0deadbb717504 ]
If we have a non-DATA record on the rx_list and another record of the same type still on the queue, we will end up merging them: - process_rx_list copies the non-DATA record - we start the loop and process the first available record since it's of the same type - we break out of the loop since the record was not DATA
Just check the record type and jump to the end in case process_rx_list did some work.
Fixes: 692d7b5d1f91 ("tls: Fix recvmsg() to be able to peek across multiple records") Signed-off-by: Sabrina Dubroca sd@queasysnail.net Link: https://lore.kernel.org/r/bd31449e43bd4b6ff546f5c51cf958c31c511deb.170800737... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 732f96b7bafc8..46f1c19f7c60b 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1786,7 +1786,7 @@ int tls_sw_recvmsg(struct sock *sk, }
copied = err; - if (len <= copied) + if (len <= copied || (copied && control != TLS_RECORD_TYPE_DATA)) goto end;
target = sock_rcvlowat(sk, flags & MSG_WAITALL, len);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit bccebf64701735533c8db37773eeacc6566cc8ec ]
We need to set the dormant flag again if we fail to register the hooks.
During memory pressure hook registration can fail and we end up with a table marked as active but no registered hooks.
On table/base chain deletion, nf_tables will attempt to unregister the hook again which yields a warn splat from the nftables core.
Reported-and-tested-by: syzbot+de4025c006ec68ac56fc@syzkaller.appspotmail.com Fixes: 179d9ba5559a ("netfilter: nf_tables: fix table flag updates") Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index f586e8b3c6cfa..73b0a6925304c 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -1132,6 +1132,7 @@ static int nf_tables_updtable(struct nft_ctx *ctx) return 0;
err_register_hooks: + ctx->table->flags |= NFT_TABLE_F_DORMANT; nft_trans_destroy(trans); return ret; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian König christian.koenig@amd.com
[ Upstream commit 7621350c6bb20fb6ab7eb988833ab96eac3dcbef ]
DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT can't be used when we hold locks since we are basically waiting for userspace to do something.
Holding a lock while doing so can trivial deadlock with page faults etc...
So make lockdep complain when a driver tries to do this.
v2: Add lockdep_assert_none_held() macro. v3: Add might_sleep() and also use lockdep_assert_none_held() in the IOCTL path.
Signed-off-by: Christian König christian.koenig@amd.com Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch Acked-by: Peter Zijlstra (Intel) peterz@infradead.org Link: https://patchwork.freedesktop.org/patch/414944/ Stable-dep-of: 3c43177ffb54 ("drm/syncobj: call drm_syncobj_fence_add_wait when WAIT_AVAILABLE flag is set") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/drm_syncobj.c | 12 ++++++++++++ include/linux/lockdep.h | 5 +++++ 2 files changed, 17 insertions(+)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index 738e60139db90..4c3c8f8da0215 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -387,6 +387,15 @@ int drm_syncobj_find_fence(struct drm_file *file_private, if (!syncobj) return -ENOENT;
+ /* Waiting for userspace with locks help is illegal cause that can + * trivial deadlock with page faults for example. Make lockdep complain + * about it early on. + */ + if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) { + might_sleep(); + lockdep_assert_none_held_once(); + } + *fence = drm_syncobj_fence_get(syncobj);
if (*fence) { @@ -951,6 +960,9 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs, uint64_t *points; uint32_t signaled_count, i;
+ if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) + lockdep_assert_none_held_once(); + points = kmalloc_array(count, sizeof(*points), GFP_KERNEL); if (points == NULL) return -ENOMEM; diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index 2c2586312b447..3eca9f91b9a56 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -321,6 +321,10 @@ extern void lock_unpin_lock(struct lockdep_map *lock, struct pin_cookie); WARN_ON_ONCE(debug_locks && !lockdep_is_held(l)); \ } while (0)
+#define lockdep_assert_none_held_once() do { \ + WARN_ON_ONCE(debug_locks && current->lockdep_depth); \ + } while (0) + #define lockdep_recursing(tsk) ((tsk)->lockdep_recursion)
#define lockdep_pin_lock(l) lock_pin_lock(&(l)->dep_map) @@ -394,6 +398,7 @@ static inline void lockdep_unregister_key(struct lock_class_key *key) #define lockdep_assert_held_write(l) do { (void)(l); } while (0) #define lockdep_assert_held_read(l) do { (void)(l); } while (0) #define lockdep_assert_held_once(l) do { (void)(l); } while (0) +#define lockdep_assert_none_held_once() do { } while (0)
#define lockdep_recursing(tsk) (0)
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Erik Kurzinger ekurzinger@nvidia.com
[ Upstream commit 3c43177ffb54ea5be97505eb8e2690e99ac96bc9 ]
When waiting for a syncobj timeline point whose fence has not yet been submitted with the WAIT_FOR_SUBMIT flag, a callback is registered using drm_syncobj_fence_add_wait and the thread is put to sleep until the timeout expires. If the fence is submitted before then, drm_syncobj_add_point will wake up the sleeping thread immediately which will proceed to wait for the fence to be signaled.
However, if the WAIT_AVAILABLE flag is used instead, drm_syncobj_fence_add_wait won't get called, meaning the waiting thread will always sleep for the full timeout duration, even if the fence gets submitted earlier. If it turns out that the fence *has* been submitted by the time it eventually wakes up, it will still indicate to userspace that the wait completed successfully (it won't return -ETIME), but it will have taken much longer than it should have.
To fix this, we must call drm_syncobj_fence_add_wait if *either* the WAIT_FOR_SUBMIT flag or the WAIT_AVAILABLE flag is set. The only difference being that with WAIT_FOR_SUBMIT we will also wait for the fence to be signaled after it has been submitted while with WAIT_AVAILABLE we will return immediately.
IGT test patch: https://lists.freedesktop.org/archives/igt-dev/2024-January/067537.html
v1 -> v2: adjust lockdep_assert_none_held_once condition
(cherry picked from commit 8c44ea81634a4a337df70a32621a5f3791be23df)
Fixes: 01d6c3578379 ("drm/syncobj: add support for timeline point wait v8") Signed-off-by: Erik Kurzinger ekurzinger@nvidia.com Signed-off-by: Simon Ser contact@emersion.fr Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch Reviewed-by: Simon Ser contact@emersion.fr Link: https://patchwork.freedesktop.org/patch/msgid/20240119163208.3723457-1-ekurz... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/drm_syncobj.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index 4c3c8f8da0215..6ce446cc88780 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -960,7 +960,8 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs, uint64_t *points; uint32_t signaled_count, i;
- if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) + if (flags & (DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT | + DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE)) lockdep_assert_none_held_once();
points = kmalloc_array(count, sizeof(*points), GFP_KERNEL); @@ -1029,7 +1030,8 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs, * fallthough and try a 0 timeout wait! */
- if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) { + if (flags & (DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT | + DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE)) { for (i = 0; i < count; ++i) drm_syncobj_fence_add_wait(syncobjs[i], &entries[i]); }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Armin Wolf W_Armin@gmx.de
[ Upstream commit bae67893578d608e35691dcdfa90c4957debf1d3 ]
After destroying dmub_srv, the memory associated with it is not freed, causing a memory leak:
unreferenced object 0xffff896302b45800 (size 1024): comm "(udev-worker)", pid 222, jiffies 4294894636 hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 6265fd77): [<ffffffff993495ed>] kmalloc_trace+0x29d/0x340 [<ffffffffc0ea4a94>] dm_dmub_sw_init+0xb4/0x450 [amdgpu] [<ffffffffc0ea4e55>] dm_sw_init+0x15/0x2b0 [amdgpu] [<ffffffffc0ba8557>] amdgpu_device_init+0x1417/0x24e0 [amdgpu] [<ffffffffc0bab285>] amdgpu_driver_load_kms+0x15/0x190 [amdgpu] [<ffffffffc0ba09c7>] amdgpu_pci_probe+0x187/0x4e0 [amdgpu] [<ffffffff9968fd1e>] local_pci_probe+0x3e/0x90 [<ffffffff996918a3>] pci_device_probe+0xc3/0x230 [<ffffffff99805872>] really_probe+0xe2/0x480 [<ffffffff99805c98>] __driver_probe_device+0x78/0x160 [<ffffffff99805daf>] driver_probe_device+0x1f/0x90 [<ffffffff9980601e>] __driver_attach+0xce/0x1c0 [<ffffffff99803170>] bus_for_each_dev+0x70/0xc0 [<ffffffff99804822>] bus_add_driver+0x112/0x210 [<ffffffff99807245>] driver_register+0x55/0x100 [<ffffffff990012d1>] do_one_initcall+0x41/0x300
Fix this by freeing dmub_srv after destroying it.
Fixes: 743b9786b14a ("drm/amd/display: Hook up the DMUB service in DM") Signed-off-by: Armin Wolf W_Armin@gmx.de Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 54d6b4128721e..3578e3b3536e3 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -1456,6 +1456,7 @@ static int dm_sw_fini(void *handle)
if (adev->dm.dmub_srv) { dmub_srv_destroy(adev->dm.dmub_srv); + kfree(adev->dm.dmub_srv); adev->dm.dmub_srv = NULL; }
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Schmitz schmitzmic@gmail.com
commit d28e4dff085c5a87025c9a0a85fb798bd8e9ca17 upstream.
As it turns out, my earlier patch in commit 86d46fdaa12a (block: ataflop: fix breakage introduced at blk-mq refactoring) was incomplete. This patch fixes any remaining issues found during more testing and code review.
Requests exceeding 4 k are handled in 4k segments but __blk_mq_end_request() is never called on these (still sectors outstanding on the request). With redo_fd_request() removed, there is no provision to kick off processing of the next segment, causing requests exceeding 4k to hang. (By setting /sys/block/fd0/queue/max_sectors_k <= 4 as workaround, this behaviour can be avoided).
Instead of reintroducing redo_fd_request(), requeue the remainder of the request by calling blk_mq_requeue_request() on incomplete requests (i.e. when blk_update_request() still returns true), and rely on the block layer to queue the residual as new request.
Both error handling and formatting needs to release the ST-DMA lock, so call finish_fdc() on these (this was previously handled by redo_fd_request()). finish_fdc() may be called legitimately without the ST-DMA lock held - make sure we only release the lock if we actually held it. In a similar way, early exit due to errors in ataflop_queue_rq() must release the lock.
After minor errors, fd_error sets up to recalibrate the drive but never re-runs the current operation (another task handled by redo_fd_request() before). Call do_fd_action() to get the next steps (seek, retry read/write) underway.
Signed-off-by: Michael Schmitz schmitzmic@gmail.com Fixes: 6ec3938cff95f (ataflop: convert to blk-mq) CC: linux-block@vger.kernel.org Link: https://lore.kernel.org/r/20211024002013.9332-1-schmitzmic@gmail.com Signed-off-by: Jens Axboe axboe@kernel.dk [MSch: v5.10 backport merge conflict fix] Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/block/ataflop.c | 42 +++++++++++++++++++++++++++++++++++++----- 1 file changed, 37 insertions(+), 5 deletions(-)
--- a/drivers/block/ataflop.c +++ b/drivers/block/ataflop.c @@ -456,10 +456,20 @@ static DEFINE_TIMER(fd_timer, check_chan static void fd_end_request_cur(blk_status_t err) { + DPRINT(("fd_end_request_cur(), bytes %d of %d\n", + blk_rq_cur_bytes(fd_request), + blk_rq_bytes(fd_request))); + if (!blk_update_request(fd_request, err, blk_rq_cur_bytes(fd_request))) { + DPRINT(("calling __blk_mq_end_request()\n")); __blk_mq_end_request(fd_request, err); fd_request = NULL; + } else { + /* requeue rest of request */ + DPRINT(("calling blk_mq_requeue_request()\n")); + blk_mq_requeue_request(fd_request, true); + fd_request = NULL; } }
@@ -697,12 +707,21 @@ static void fd_error( void ) if (fd_request->error_count >= MAX_ERRORS) { printk(KERN_ERR "fd%d: too many errors.\n", SelectedDrive ); fd_end_request_cur(BLK_STS_IOERR); + finish_fdc(); + return; } else if (fd_request->error_count == RECALIBRATE_ERRORS) { printk(KERN_WARNING "fd%d: recalibrating\n", SelectedDrive ); if (SelectedDrive != -1) SUD.track = -1; } + /* need to re-run request to recalibrate */ + atari_disable_irq( IRQ_MFP_FDC ); + + setup_req_params( SelectedDrive ); + do_fd_action( SelectedDrive ); + + atari_enable_irq( IRQ_MFP_FDC ); }
@@ -737,6 +756,7 @@ static int do_format(int drive, int type if (type) { if (--type >= NUM_DISK_MINORS || minor2disktype[type].drive_types > DriveType) { + finish_fdc(); ret = -EINVAL; goto out; } @@ -745,6 +765,7 @@ static int do_format(int drive, int type }
if (!UDT || desc->track >= UDT->blocks/UDT->spt/2 || desc->head >= 2) { + finish_fdc(); ret = -EINVAL; goto out; } @@ -785,6 +806,7 @@ static int do_format(int drive, int type
wait_for_completion(&format_wait);
+ finish_fdc(); ret = FormatError ? -EIO : 0; out: blk_mq_unquiesce_queue(q); @@ -819,6 +841,7 @@ static void do_fd_action( int drive ) else { /* all sectors finished */ fd_end_request_cur(BLK_STS_OK); + finish_fdc(); return; } } @@ -1222,8 +1245,8 @@ static void fd_rwsec_done1(int status) } else { /* all sectors finished */ - finish_fdc(); fd_end_request_cur(BLK_STS_OK); + finish_fdc(); } return;
@@ -1345,7 +1368,7 @@ static void fd_times_out(struct timer_li
static void finish_fdc( void ) { - if (!NeedSeek) { + if (!NeedSeek || !stdma_is_locked_by(floppy_irq)) { finish_fdc_done( 0 ); } else { @@ -1380,7 +1403,8 @@ static void finish_fdc_done( int dummy ) start_motor_off_timer();
local_irq_save(flags); - stdma_release(); + if (stdma_is_locked_by(floppy_irq)) + stdma_release(); local_irq_restore(flags);
DPRINT(("finish_fdc() finished\n")); @@ -1477,7 +1501,9 @@ static blk_status_t ataflop_queue_rq(str int drive = floppy - unit; int type = floppy->type;
- DPRINT(("Queue request: drive %d type %d last %d\n", drive, type, bd->last)); + DPRINT(("Queue request: drive %d type %d sectors %d of %d last %d\n", + drive, type, blk_rq_cur_sectors(bd->rq), + blk_rq_sectors(bd->rq), bd->last));
spin_lock_irq(&ataflop_lock); if (fd_request) { @@ -1499,6 +1525,7 @@ static blk_status_t ataflop_queue_rq(str /* drive not connected */ printk(KERN_ERR "Unknown Device: fd%d\n", drive ); fd_end_request_cur(BLK_STS_IOERR); + stdma_release(); goto out; } @@ -1515,11 +1542,13 @@ static blk_status_t ataflop_queue_rq(str if (--type >= NUM_DISK_MINORS) { printk(KERN_WARNING "fd%d: invalid disk format", drive ); fd_end_request_cur(BLK_STS_IOERR); + stdma_release(); goto out; } if (minor2disktype[type].drive_types > DriveType) { printk(KERN_WARNING "fd%d: unsupported disk format", drive ); fd_end_request_cur(BLK_STS_IOERR); + stdma_release(); goto out; } type = minor2disktype[type].index; @@ -1620,6 +1649,7 @@ static int fd_locked_ioctl(struct block_ /* what if type > 0 here? Overwrite specified entry ? */ if (type) { /* refuse to re-set a predefined type for now */ + finish_fdc(); return -EINVAL; }
@@ -1687,8 +1717,10 @@ static int fd_locked_ioctl(struct block_
/* sanity check */ if (setprm.track != dtp->blocks/dtp->spt/2 || - setprm.head != 2) + setprm.head != 2) { + finish_fdc(); return -EINVAL; + }
UDT = dtp; set_capacity(floppy->disk, UDT->blocks);
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bart Van Assche bvanassche@acm.org
commit b820de741ae48ccf50dd95e297889c286ff4f760 upstream.
If kiocb_set_cancel_fn() is called for I/O submitted via io_uring, the following kernel warning appears:
WARNING: CPU: 3 PID: 368 at fs/aio.c:598 kiocb_set_cancel_fn+0x9c/0xa8 Call trace: kiocb_set_cancel_fn+0x9c/0xa8 ffs_epfile_read_iter+0x144/0x1d0 io_read+0x19c/0x498 io_issue_sqe+0x118/0x27c io_submit_sqes+0x25c/0x5fc __arm64_sys_io_uring_enter+0x104/0xab0 invoke_syscall+0x58/0x11c el0_svc_common+0xb4/0xf4 do_el0_svc+0x2c/0xb0 el0_svc+0x2c/0xa4 el0t_64_sync_handler+0x68/0xb4 el0t_64_sync+0x1a4/0x1a8
Fix this by setting the IOCB_AIO_RW flag for read and write I/O that is submitted by libaio.
Suggested-by: Jens Axboe axboe@kernel.dk Cc: Christoph Hellwig hch@lst.de Cc: Avi Kivity avi@scylladb.com Cc: Sandeep Dhavale dhavale@google.com Cc: Jens Axboe axboe@kernel.dk Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Kent Overstreet kent.overstreet@linux.dev Cc: stable@vger.kernel.org Signed-off-by: Bart Van Assche bvanassche@acm.org Link: https://lore.kernel.org/r/20240215204739.2677806-2-bvanassche@acm.org Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Bart Van Assche bvanassche@acm.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/aio.c | 9 ++++++++- include/linux/fs.h | 2 ++ 2 files changed, 10 insertions(+), 1 deletion(-)
--- a/fs/aio.c +++ b/fs/aio.c @@ -569,6 +569,13 @@ void kiocb_set_cancel_fn(struct kiocb *i struct kioctx *ctx = req->ki_ctx; unsigned long flags;
+ /* + * kiocb didn't come from aio or is neither a read nor a write, hence + * ignore it. + */ + if (!(iocb->ki_flags & IOCB_AIO_RW)) + return; + if (WARN_ON_ONCE(!list_empty(&req->ki_list))) return;
@@ -1454,7 +1461,7 @@ static int aio_prep_rw(struct kiocb *req req->ki_complete = aio_complete_rw; req->private = NULL; req->ki_pos = iocb->aio_offset; - req->ki_flags = iocb_flags(req->ki_filp); + req->ki_flags = iocb_flags(req->ki_filp) | IOCB_AIO_RW; if (iocb->aio_flags & IOCB_FLAG_RESFD) req->ki_flags |= IOCB_EVENTFD; req->ki_hint = ki_hint_validate(file_write_hint(req->ki_filp)); --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -316,6 +316,8 @@ enum rw_hint { /* iocb->ki_waitq is valid */ #define IOCB_WAITQ (1 << 19) #define IOCB_NOIO (1 << 20) +/* kiocb is a read or write operation submitted by fs/aio.c. */ +#define IOCB_AIO_RW (1 << 23)
struct kiocb { struct file *ki_filp;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
commit a7d6027790acea24446ddd6632d394096c0f4667 upstream.
syzkaller reported an overflown write in arp_req_get(). [0]
When ioctl(SIOCGARP) is issued, arp_req_get() looks up an neighbour entry and copies neigh->ha to struct arpreq.arp_ha.sa_data.
The arp_ha here is struct sockaddr, not struct sockaddr_storage, so the sa_data buffer is just 14 bytes.
In the splat below, 2 bytes are overflown to the next int field, arp_flags. We initialise the field just after the memcpy(), so it's not a problem.
However, when dev->addr_len is greater than 22 (e.g. MAX_ADDR_LEN), arp_netmask is overwritten, which could be set as htonl(0xFFFFFFFFUL) in arp_ioctl() before calling arp_req_get().
To avoid the overflow, let's limit the max length of memcpy().
Note that commit b5f0de6df6dc ("net: dev: Convert sa_data to flexible array in struct sockaddr") just silenced syzkaller.
[0]: memcpy: detected field-spanning write (size 16) of single field "r->arp_ha.sa_data" at net/ipv4/arp.c:1128 (size 14) WARNING: CPU: 0 PID: 144638 at net/ipv4/arp.c:1128 arp_req_get+0x411/0x4a0 net/ipv4/arp.c:1128 Modules linked in: CPU: 0 PID: 144638 Comm: syz-executor.4 Not tainted 6.1.74 #31 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-5 04/01/2014 RIP: 0010:arp_req_get+0x411/0x4a0 net/ipv4/arp.c:1128 Code: fd ff ff e8 41 42 de fb b9 0e 00 00 00 4c 89 fe 48 c7 c2 20 6d ab 87 48 c7 c7 80 6d ab 87 c6 05 25 af 72 04 01 e8 5f 8d ad fb <0f> 0b e9 6c fd ff ff e8 13 42 de fb be 03 00 00 00 4c 89 e7 e8 a6 RSP: 0018:ffffc900050b7998 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff88803a815000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff8641a44a RDI: 0000000000000001 RBP: ffffc900050b7a98 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 203a7970636d656d R12: ffff888039c54000 R13: 1ffff92000a16f37 R14: ffff88803a815084 R15: 0000000000000010 FS: 00007f172bf306c0(0000) GS:ffff88805aa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f172b3569f0 CR3: 0000000057f12005 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> arp_ioctl+0x33f/0x4b0 net/ipv4/arp.c:1261 inet_ioctl+0x314/0x3a0 net/ipv4/af_inet.c:981 sock_do_ioctl+0xdf/0x260 net/socket.c:1204 sock_ioctl+0x3ef/0x650 net/socket.c:1321 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl fs/ioctl.c:856 [inline] __x64_sys_ioctl+0x18e/0x220 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x37/0x90 arch/x86/entry/common.c:81 entry_SYSCALL_64_after_hwframe+0x64/0xce RIP: 0033:0x7f172b262b8d Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f172bf300b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007f172b3abf80 RCX: 00007f172b262b8d RDX: 0000000020000000 RSI: 0000000000008954 RDI: 0000000000000003 RBP: 00007f172b2d3493 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000000000b R14: 00007f172b3abf80 R15: 00007f172bf10000 </TASK>
Reported-by: syzkaller syzkaller@googlegroups.com Reported-by: Bjoern Doebel doebel@amazon.de Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://lore.kernel.org/r/20240215230516.31330-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/arp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/ipv4/arp.c +++ b/net/ipv4/arp.c @@ -1104,7 +1104,8 @@ static int arp_req_get(struct arpreq *r, if (neigh) { if (!(neigh->nud_state & NUD_NOARP)) { read_lock_bh(&neigh->lock); - memcpy(r->arp_ha.sa_data, neigh->ha, dev->addr_len); + memcpy(r->arp_ha.sa_data, neigh->ha, + min(dev->addr_len, sizeof(r->arp_ha.sa_data_min))); r->arp_flags = arp_state_to_flags(neigh); read_unlock_bh(&neigh->lock); r->arp_ha.sa_family = dev->type;
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baokun Li libaokun1@huawei.com
commit c9b528c35795b711331ed36dc3dbee90d5812d4e upstream.
This mostly reverts commit 6bd97bf273bd ("ext4: remove redundant mb_regenerate_buddy()") and reintroduces mb_regenerate_buddy(). Based on code in mb_free_blocks(), fast commit replay can end up marking as free blocks that are already marked as such. This causes corruption of the buddy bitmap so we need to regenerate it in that case.
Reported-by: Jan Kara jack@suse.cz Fixes: 6bd97bf273bd ("ext4: remove redundant mb_regenerate_buddy()") Signed-off-by: Baokun Li libaokun1@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20240104142040.2835097-4-libaokun1@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Baokun Li libaokun1@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/mballoc.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
--- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -823,6 +823,24 @@ void ext4_mb_generate_buddy(struct super atomic64_add(period, &sbi->s_mb_generation_time); }
+static void mb_regenerate_buddy(struct ext4_buddy *e4b) +{ + int count; + int order = 1; + void *buddy; + + while ((buddy = mb_find_buddy(e4b, order++, &count))) + ext4_set_bits(buddy, 0, count); + + e4b->bd_info->bb_fragments = 0; + memset(e4b->bd_info->bb_counters, 0, + sizeof(*e4b->bd_info->bb_counters) * + (e4b->bd_sb->s_blocksize_bits + 2)); + + ext4_mb_generate_buddy(e4b->bd_sb, e4b->bd_buddy, + e4b->bd_bitmap, e4b->bd_group, e4b->bd_info); +} + /* The buddy information is attached the buddy cache inode * for convenience. The information regarding each group * is loaded via ext4_mb_load_buddy. The information involve @@ -1505,6 +1523,8 @@ static void mb_free_blocks(struct inode ext4_mark_group_bitmap_corrupted( sb, e4b->bd_group, EXT4_GROUP_INFO_BBITMAP_CORRUPT); + } else { + mb_regenerate_buddy(e4b); } goto done; }
Hi!
This is the start of the stable review cycle for the 5.10.211 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
CIP testing did not find any problems here:
https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/tree/linux-5...
Tested-by: Pavel Machek (CIP) pavel@denx.de
Best regards, Pavel
Hello!
On 27/02/24 7:26 a. m., Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.211 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Feb 2024 13:15:36 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.211-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
We're seeing new warnings on 32-bits architectures: Arm, i386, PowerPC, RISC-V and System/390:
-----8<----- builds/linux/net/ipv4/arp.c: In function 'arp_req_get': /builds/linux/include/linux/minmax.h:20:35: warning: comparison of distinct pointer types lacks a cast 20 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1))) | ^~ /builds/linux/include/linux/minmax.h:26:18: note: in expansion of macro '__typecheck' 26 | (__typecheck(x, y) && __no_side_effects(x, y)) | ^~~~~~~~~~~ /builds/linux/include/linux/minmax.h:36:31: note: in expansion of macro '__safe_cmp' 36 | __builtin_choose_expr(__safe_cmp(x, y), \ | ^~~~~~~~~~ /builds/linux/include/linux/minmax.h:45:25: note: in expansion of macro '__careful_cmp' 45 | #define min(x, y) __careful_cmp(x, y, <) | ^~~~~~~~~~~~~ /builds/linux/net/ipv4/arp.c:1108:32: note: in expansion of macro 'min' 1108 | min(dev->addr_len, sizeof(r->arp_ha.sa_data_min))); | ^~~ ----->8-----
Bisection points to:
commit 5a2d57992eca13530ac79ae287243b3ff6b01128 Author: Kuniyuki Iwashima kuniyu@amazon.com Date: Thu Feb 15 15:05:16 2024 -0800
arp: Prevent overflow in arp_req_get().
commit a7d6027790acea24446ddd6632d394096c0f4667 upstream.
Tuxmake reproducers:
tuxmake --runtime podman --target-arch arm --toolchain gcc-12 --kconfig davinci_all_defconfig tuxmake --runtime podman --target-arch i386 --toolchain gcc-12 --kconfig defconfig tuxmake --runtime podman --target-arch powerpc --toolchain gcc-12 --kconfig maple_defconfig tuxmake --runtime podman --target-arch riscv --toolchain gcc-12 --kconfig defconfig
Reverting the change made the build pass again without warnings.
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
Greetings!
Daniel Díaz daniel.diaz@linaro.org
On 2/27/24 10:56, Daniel Díaz wrote:
Hello!
On 27/02/24 7:26 a. m., Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.211 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Feb 2024 13:15:36 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.211-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
We're seeing new warnings on 32-bits architectures: Arm, i386, PowerPC, RISC-V and System/390:
Seeing the same thing here.
On Tue, Feb 27, 2024 at 12:56:00PM -0600, Daniel Díaz wrote:
Hello!
On 27/02/24 7:26 a. m., Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.211 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Feb 2024 13:15:36 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.211-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
We're seeing new warnings on 32-bits architectures: Arm, i386, PowerPC, RISC-V and System/390:
-----8<----- builds/linux/net/ipv4/arp.c: In function 'arp_req_get': /builds/linux/include/linux/minmax.h:20:35: warning: comparison of distinct pointer types lacks a cast 20 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1))) | ^~ /builds/linux/include/linux/minmax.h:26:18: note: in expansion of macro '__typecheck' 26 | (__typecheck(x, y) && __no_side_effects(x, y)) | ^~~~~~~~~~~ /builds/linux/include/linux/minmax.h:36:31: note: in expansion of macro '__safe_cmp' 36 | __builtin_choose_expr(__safe_cmp(x, y), \ | ^~~~~~~~~~ /builds/linux/include/linux/minmax.h:45:25: note: in expansion of macro '__careful_cmp' 45 | #define min(x, y) __careful_cmp(x, y, <) | ^~~~~~~~~~~~~ /builds/linux/net/ipv4/arp.c:1108:32: note: in expansion of macro 'min' 1108 | min(dev->addr_len, sizeof(r->arp_ha.sa_data_min))); | ^~~ ----->8-----
Bisection points to:
commit 5a2d57992eca13530ac79ae287243b3ff6b01128 Author: Kuniyuki Iwashima kuniyu@amazon.com Date: Thu Feb 15 15:05:16 2024 -0800
arp: Prevent overflow in arp_req_get(). commit a7d6027790acea24446ddd6632d394096c0f4667 upstream.
Ugh, I fixed this up for 5.15, but forgot to do so for older kernels, my fault. I'll go update it now.
thanks,
greg k-h
On 2/27/24 05:26, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.211 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Feb 2024 13:15:36 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.211-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on BMIPS_GENERIC:
Tested-by: Florian Fainelli florian.fainelli@broadcom.com
Same warning as what Daniel reported for ARM 32-bit:
In file included from ./include/linux/kernel.h:15, from ./include/linux/list.h:9, from ./include/linux/module.h:12, from net/ipv4/arp.c:74: net/ipv4/arp.c: In function 'arp_req_get': ./include/linux/minmax.h:20:35: warning: comparison of distinct pointer types lacks a cast 20 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1))) | ^~ ./include/linux/minmax.h:26:18: note: in expansion of macro '__typecheck' 26 | (__typecheck(x, y) && __no_side_effects(x, y)) | ^~~~~~~~~~~ ./include/linux/minmax.h:36:31: note: in expansion of macro '__safe_cmp' 36 | __builtin_choose_expr(__safe_cmp(x, y), \ | ^~~~~~~~~~ ./include/linux/minmax.h:45:25: note: in expansion of macro '__careful_cmp' 45 | #define min(x, y) __careful_cmp(x, y, <) | ^~~~~~~~~~~~~ net/ipv4/arp.c:1108:32: note: in expansion of macro 'min' 1108 | min(dev->addr_len, sizeof(r->arp_ha.sa_data_min))); | ^~~
Greg Kroah-Hartman wrote on Tue, Feb 27, 2024 at 02:26:01PM +0100:
Kees Cook keescook@chromium.org net: dev: Convert sa_data to flexible array in struct sockaddr (ca13c2b1e9e4b5d982c2f1e75f28b1586e5c0f7f in this tree, b5f0de6df6dce8d641ef58ef7012f3304dffb9a1 upstream)
This commit breaks build of some 3rd party wireless module we use here (because sizeof(sa->sa_data) no longer works and needs to use sa_data_min) With that said I guess it really is a dependency on the arp_req_get overflow, so probably necessary evil, and I don't think we explicitly pretend to preserve APIs for 3rd party modules so this is probably fine... The new warnings that poped up (and were reported in other messages) a probably worth checking though.
That aside no particular problem actually running this, so-- Tested 5d69d611e74d ("Linux 5.10.211-rc1") on: - arm i.MX6ULL (Armadillo 640) - arm64 i.MX8MP (Armadillo G4)
No obvious regression in dmesg or basic tests: Tested-by: Dominique Martinet dominique.martinet@atmark-techno.com
On Wed, Feb 28, 2024 at 08:59:36AM +0900, Dominique Martinet wrote:
Greg Kroah-Hartman wrote on Tue, Feb 27, 2024 at 02:26:01PM +0100:
Kees Cook keescook@chromium.org net: dev: Convert sa_data to flexible array in struct sockaddr (ca13c2b1e9e4b5d982c2f1e75f28b1586e5c0f7f in this tree, b5f0de6df6dce8d641ef58ef7012f3304dffb9a1 upstream)
This commit breaks build of some 3rd party wireless module we use here (because sizeof(sa->sa_data) no longer works and needs to use sa_data_min) With that said I guess it really is a dependency on the arp_req_get overflow, so probably necessary evil, and I don't think we explicitly pretend to preserve APIs for 3rd party modules so this is probably fine... The new warnings that poped up (and were reported in other messages) a probably worth checking though.
We NEVER preserve in-kernel APIs for any out-of-tree code as obviously, we have no idea what out-of-tree code is actually using, so it would be impossible to do so.
Also, it's odd that a driver is hit by this as no in-kernel driver was, so perhaps it's using the wrong api to start with :)
thanks,
greg k-h
On Wed, Feb 28, 2024 at 07:06:38AM +0100, Greg Kroah-Hartman wrote:
On Wed, Feb 28, 2024 at 08:59:36AM +0900, Dominique Martinet wrote:
Greg Kroah-Hartman wrote on Tue, Feb 27, 2024 at 02:26:01PM +0100:
Kees Cook keescook@chromium.org net: dev: Convert sa_data to flexible array in struct sockaddr (ca13c2b1e9e4b5d982c2f1e75f28b1586e5c0f7f in this tree, b5f0de6df6dce8d641ef58ef7012f3304dffb9a1 upstream)
This commit breaks build of some 3rd party wireless module we use here (because sizeof(sa->sa_data) no longer works and needs to use sa_data_min)
Just FYI, it's possible that things using sizeof(sa->sa_data) were buggy to begin with since the struct size isn't actually dictated by that size (it's only the minimum possible size).
With that said I guess it really is a dependency on the arp_req_get overflow, so probably necessary evil, and I don't think we explicitly pretend to preserve APIs for 3rd party modules so this is probably fine... The new warnings that poped up (and were reported in other messages) a probably worth checking though.
We NEVER preserve in-kernel APIs for any out-of-tree code as obviously, we have no idea what out-of-tree code is actually using, so it would be impossible to do so.
Also, it's odd that a driver is hit by this as no in-kernel driver was, so perhaps it's using the wrong api to start with :)
The reason is that most drivers don't want this size (see above) and all the in-tree code that did need adjustment got adjusted (visible in the referenced patch). :) But that's the risk of an out-of-tree driver: it doesn't get those fixes automatically.
Out of curiosity, which drivers broke and what's needed to get them into upstream (or at least staging)?
-Kees
Kees Cook wrote on Wed, Feb 28, 2024 at 12:39:42PM -0800:
This commit breaks build of some 3rd party wireless module we use here (because sizeof(sa->sa_data) no longer works and needs to use sa_data_min)
Just FYI, it's possible that things using sizeof(sa->sa_data) were buggy to begin with since the struct size isn't actually dictated by that size (it's only the minimum possible size).
Yes, I definitely agree with this. As it's "vendor stuff" I just replaced with sa_data_min because that preserves the values, but it ought to get a second look. I'd love to pretend that driver's upstream will do the right thing and use proper values here on newer kernel but upon checking its >6.2 tree support now they apparently did the same instead of getting the size properly.
We NEVER preserve in-kernel APIs for any out-of-tree code as obviously, we have no idea what out-of-tree code is actually using, so it would be impossible to do so.
Right, I just don't see much "common struct" changes in stable tree patches -- stuff like livepatches or weak modules and whatsnot don't like these so some downstreams (redhat to name them) try very hard to keep these constants for the lifetime of a given stable release... iirc they go as far as adding some padding fields to some structs that are likely to need fiddling so they can do this while preserving binary compatibility.
I understand the upstream stable kernels don't make such promise (and given the amount of work that probably goes into it, rightfully so! I wouldn't exect you or anyone to do this here), just pointed it out as part of my usual test round for anyone else who'd care.
Out of curiosity, which drivers broke and what's needed to get them into upstream (or at least staging)?
Sure, it was NXP's wifi chips driver: https://github.com/nxp-imx/mwifiex/
It's mostly based on drivers/net/wireless/marvell/mwifiex but has since been quite extensively modified, so it'd take quite a bit of effort to upstream as a separate entity (changing a few names to avoid conflcts so both can be built together... Add to that the requirement for a compatible firmware with a restrictive license... And that NXP isn't exactly focused on upstreaming); I have little hope of seeing it upstream at this point unfortunately and gave up on it as part of maintaining an embedded kernel "port" as it's sadly far from being only one :/
(I especially don't get it as I consider maintaining a bunch of spaghetti ifdef on kernel versions to be much more work than getting the driver upstream once, but I guess I'm barking at the wrong tree here)
Thanks,
On Tue, 27 Feb 2024 14:26:01 +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.211 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Feb 2024 13:15:36 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.211-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v5.10: 10 builds: 10 pass, 0 fail 26 boots: 26 pass, 0 fail 68 tests: 68 pass, 0 fail
Linux version: 5.10.211-rc1-g5d69d611e74d Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra194-p3509-0000+p3668-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
On Tuesday, February 27, 2024 18:56 IST, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.10.211 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Feb 2024 13:15:36 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.211-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
KernelCI report for stable-rc/linux-5.10.y for this week.
## stable-rc HEAD for linux-5.10.y: Date: 2024-02-27 https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/l...
## Build failures: No build failures seen for the stable-rc/linux-5.10.y commit head \o/
## Boot failures: No **new** boot failures seen for the stable-rc/linux-5.10.y commit head \o/
Tested-by: kernelci.org bot bot@kernelci.org
Thanks, Shreeya Patel
linux-stable-mirror@lists.linaro.org