This is the start of the stable review cycle for the 6.7.6 release. There are 309 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 22 Feb 2024 20:55:42 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.7.6-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.7.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.7.6-rc1
NeilBrown neilb@suse.de nfsd: don't take fi_lock in nfsd_break_deleg_cb()
Jozsef Kadlecsik kadlec@netfilter.org netfilter: ipset: Missing gc cancellations fixed
Jozsef Kadlecsik kadlec@netfilter.org netfilter: ipset: fix performance regression in swap operation
Mark Brown broonie@kernel.org usb: typec: tpcm: Fix issues with power being removed during reset
Damien Le Moal dlemoal@kernel.org block: fix partial zone append completion handling in req_bio_endio()
Junxiao Bi junxiao.bi@oracle.com md: bypass block throttle for superblock update
Steven Rostedt (Google) rostedt@goodmis.org tracing: Inform kmemleak of saved_cmdlines allocation
Petr Pavlu petr.pavlu@suse.com tracing: Fix HAVE_DYNAMIC_FTRACE_WITH_REGS ifdef
Oleg Nesterov oleg@redhat.com fs/proc: do_task_stat: use sig->stats_lock to gather the threads/children stats
Oleg Nesterov oleg@redhat.com fs/proc: do_task_stat: move thread_group_cputime_adjusted() outside of lock_task_sighand()
Konrad Dybcio konrad.dybcio@linaro.org pmdomain: core: Move the unused cleanup to a _sync initcall
Oleksij Rempel o.rempel@pengutronix.de can: j1939: Fix UAF in j1939_sk_match_filter during setsockopt(SO_J1939_FILTER)
Ziqi Zhao astrajoan@yahoo.com can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock
Maxime Jayat maxime.jayat@mobile-devices.fr can: netlink: Fix TDCO calculation using the old data bittiming
Maximilian Heyne mheyne@amazon.de xen/events: close evtchn after mapping cleanup
Nuno Sa nuno.sa@analog.com of: property: fix typo in io-channels
Vegard Nossum vegard.nossum@oracle.com docs: kernel_feat.py: fix build error for missing files
Jan Kara jack@suse.cz blk-wbt: Fix detection of dirty-throttled tasks
Huacai Chen chenhuacai@kernel.org LoongArch: Fix earlycon parameter if KASAN enabled
Prakash Sangappa prakash.sangappa@oracle.com mm: hugetlb pages should not be reserved by shmat() if SHM_NORESERVE
Oscar Salvador osalvador@suse.de fs,hugetlb: fix NULL pointer dereference in hugetlbs_fill_super
Dave Airlie airlied@redhat.com nouveau/gsp: use correct size for registry rpc.
Rishabh Dave ridave@redhat.com ceph: prevent use-after-free in encode_cap_msg()
Shradha Gupta shradhagupta@linux.microsoft.com hv_netvsc: Register VF in netvsc_probe if NET_DEVICE_REGISTER missed
Petr Tesarik petr@tesarici.cz net: stmmac: protect updates of 64-bit statistics counters
Jan Kiszka jan.kiszka@siemens.com riscv/efistub: Ensure GP-relative addressing is not used
Geert Uytterhoeven geert+renesas@glider.be pmdomain: renesas: r8a77980-sysc: CR7 must be always on
Sinthu Raja sinthu.raja@ti.com net: ethernet: ti: cpsw_new: enable mac_managed_pm to fix mdio
SeongJae Park sj@kernel.org mm/damon/sysfs-schemes: fix wrong DAMOS tried regions update timeout setup
Alexandra Winter wintera@linux.ibm.com s390/qeth: Fix potential loss of L3-IP@ in case of network issues
Sinthu Raja sinthu.raja@ti.com net: ethernet: ti: cpsw: enable mac_managed_pm to fix mdio
Christian Brauner brauner@kernel.org fs: relax mount_setattr() permission checks
Daniel Bristot de Oliveira bristot@kernel.org tools/rtla: Fix Makefile compiler options for clang
Daniel Bristot de Oliveira bristot@kernel.org tools/rtla: Fix uninitialized bucket/data->bucket_size warning
John Kacur jkacur@redhat.com tools/rtla: Exit with EXIT_SUCCESS when help is invoked
Daniel Bristot de Oliveira bristot@kernel.org tools/rtla: Fix clang warning about mount_point var size
limingming3 limingming890315@gmail.com tools/rtla: Replace setting prio with nice for SCHED_OTHER
Daniel Bristot de Oliveira bristot@kernel.org tools/rtla: Remove unused sched_getattr() function
Daniel Bristot de Oliveira bristot@kernel.org tools/rv: Fix Makefile compiler options for clang
Daniel Bristot de Oliveira bristot@kernel.org tools/rv: Fix curr_reactor uninitialized variable
Mario Limonciello mario.limonciello@amd.com ASoC: amd: yc: Add DMI quirk for Lenovo Ideapad Pro 5 16ARP8
Gergo Koteles soyer@irl.hu ASoC: tas2781: add module parameter to tascodec_init()
Curtis Malainey cujomalainey@chromium.org ASoC: SOF: IPC3: fix message bounds on ipc ops
Easwar Hariharan eahariha@linux.microsoft.com arm64: Subscribe Microsoft Azure Cobalt 100 to ARM Neoverse N2 errata
Mark Brown broonie@kernel.org arm64/signal: Don't assume that TIF_SVE means we saved SVE state
Fred Ai fred.ai@bayhubtech.com mmc: sdhci-pci-o2micro: Fix a warm reboot issue that disk can't be detected by BIOS
Damien Le Moal dlemoal@kernel.org zonefs: Improve error handling
Sebastian Ene sebastianene@google.com KVM: arm64: Fix circular locking dependency
Christian Borntraeger borntraeger@linux.ibm.com KVM: s390: vsie: fix race during shadow creation
Steve French stfrench@microsoft.com smb: Fix regression in writes when non-standard maximum write size negotiated
Paulo Alcantara pc@manguebit.com smb: client: set correct id, uid and cruid for multiuser automounts
Mohammad Rahimi rahimi.mhmmd@gmail.com thunderbolt: Fix setting the CNS bit in ROUTER_CS_5
Marc Zyngier maz@kernel.org irqchip/gic-v3-its: Fix GICv4.1 VPE affinity update
Marc Zyngier maz@kernel.org irqchip/gic-v3-its: Restore quirk probing for ACPI-based systems
Doug Berger opendmb@gmail.com irqchip/irq-brcmstb-l2: Add write memory barrier before exit
Dan Carpenter dan.carpenter@linaro.org PCI: dwc: Fix a 64bit bug in dw_pcie_ep_raise_msix_irq()
Emmanuel Grumbach emmanuel.grumbach@intel.com wifi: iwlwifi: mvm: fix a crash when we run out of stations
Johannes Berg johannes.berg@intel.com wifi: mac80211: reload info pointer in ieee80211_tx_dequeue()
Johannes Berg johannes.berg@intel.com wifi: cfg80211: fix wiphy delayed work queueing
Johannes Berg johannes.berg@intel.com wifi: iwlwifi: fix double-free bug
Daniel de Villiers daniel.devilliers@corigine.com nfp: flower: prevent re-adding mac index for bonded port
James Hershaw james.hershaw@corigine.com nfp: enable NETDEV_XDP_ACT_REDIRECT feature flag
Daniel Basilio daniel.basilio@corigine.com nfp: use correct macro for LengthSelect in BAR config
Herbert Xu herbert@gondor.apana.org.au crypto: algif_hash - Remove bogus SGL free on zero-length error path
Kim Phillips kim.phillips@amd.com crypto: ccp - Fix null pointer dereference in __sev_platform_shutdown_locked
Ryusuke Konishi konishi.ryusuke@gmail.com nilfs2: fix hang in nilfs_lookup_dirty_data_buffers()
Ryusuke Konishi konishi.ryusuke@gmail.com nilfs2: fix data corruption in dsync block recovery for small block sizes
Shuming Fan shumingf@realtek.com ALSA: hda/realtek: add IDs for Dell dual spk platform
bo liu bo.liu@senarytech.com ALSA: hda/conexant: Add quirk for SWS JS201D
Eniac Zhang eniac-xw.zhang@hp.com ALSA: hda/realtek: fix mute/micmute LED For HP mt645
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org gpiolib: add gpiod_to_gpio_device() stub for !GPIOLIB
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org gpiolib: add gpio_device_get_base() stub for !GPIOLIB
Alexander Stein alexander.stein@ew.tq-group.com mmc: slot-gpio: Allow non-sleeping GPIO ro
Jens Axboe axboe@kernel.dk io_uring/net: fix multishot accept overflow handling
Steve Wahl steve.wahl@hpe.com x86/mm/ident_map: Use gbpages only where full GB page should be mapped.
Mingwei Zhang mizhang@google.com KVM: x86/pmu: Fix type length error when reading pmu->fixed_ctr_ctrl
Prasad Pandit pjp@fedoraproject.org KVM: x86: make KVM_REQ_NMI request iff NMI pending for vcpu
Andrei Vagin avagin@google.com x86/fpu: Stop relying on userspace for info to fault in xsave buffer
Aleksander Mazur deweloper@wp.pl x86/Kconfig: Transmeta Crusoe is CPU family 5, not 6
Jiri Slaby (SUSE) jirislaby@kernel.org serial: mxs-auart: fix tx
Jiri Slaby (SUSE) jirislaby@kernel.org serial: core: introduce uart_port_tx_flags()
Shrikanth Hegde sshegde@linux.ibm.com powerpc/pseries: fix accuracy of stolen time
David Engraf david.engraf@sysgo.com powerpc/cputable: Add missing PPC_FEATURE_BOOKE on PPC64 Book-E
Naveen N Rao naveen@kernel.org powerpc/64: Set task pt_regs->link to the LR value on scv entry
Masami Hiramatsu (Google) mhiramat@kernel.org ftrace: Fix DIRECT_CALLS to use SAVE_REGS by default
Hugo Villeneuve hvilleneuve@dimonoff.com serial: max310x: prevent infinite while() loop in port startup
Hugo Villeneuve hvilleneuve@dimonoff.com serial: max310x: fail probe if clock crystal is unstable
Hugo Villeneuve hvilleneuve@dimonoff.com serial: max310x: improve crystal stable clock detection
Hugo Villeneuve hvilleneuve@dimonoff.com serial: max310x: set default value when reading clock ready bit
Gui-Dong Han 2045gemini@gmail.com serial: core: Fix atomicity violation in uart_tiocmget
Hui Zhou hui.zhou@corigine.com nfp: flower: fix hardware offload for the transfer layer port
Hui Zhou hui.zhou@corigine.com nfp: flower: add hardware offload check for post ct entry
Andrew Lunn andrew@lunn.ch net: dsa: mv88e6xxx: Fix failed probe due to unsupported C45 reads
Vincent Donnefort vdonnefort@google.com ring-buffer: Clean ring_buffer_poll_wait() error return
Souradeep Chakrabarti schakrabarti@linux.microsoft.com hv_netvsc: Fix race condition between netvsc_probe and netvsc_remove
Lijo Lazar lijo.lazar@amd.com drm/amdgpu: Avoid fetching VRAM vendor info
Tom Chung chiahsuan.chung@amd.com drm/amd/display: Preserve original aspect ratio in create stream
Roman Li roman.li@amd.com drm/amd/display: Fix array-index-out-of-bounds in dcn35_clkmgr
Nathan Chancellor nathan@kernel.org drm/amd/display: Increase frame-larger-than for all display_mode_vba files
Fangzhi Zuo jerry.zuo@amd.com drm/amd/display: Fix MST Null Ptr for RV
Thong thong.thai@amd.com drm/amdgpu/soc21: update VCN 4 max HEVC encoding resolution
Philip Yang Philip.Yang@amd.com drm/prime: Support page array >= 4GB
Ville Syrjälä ville.syrjala@linux.intel.com drm/i915/dp: Limit SST link rate to <=8.1Gbps
Zhikai Zhai zhikai.zhai@amd.com drm/amd/display: Add align done check
Rob Clark robdclark@chromium.org drm/msm: Wire up tlb ops
Arunpravin Paneer Selvam Arunpravin.PaneerSelvam@amd.com drm/buddy: Fix alloc_range() error handling code
Timur Tabi ttabi@nvidia.com drm/nouveau: fix several DMA buffer leaks
Fedor Pchelkin pchelkin@ispras.ru ksmbd: free aux buffer if ksmbd_iov_pin_rsp_read fails
Oleg Nesterov oleg@redhat.com getrusage: use sig->stats_lock rather than lock_task_sighand()
Oleg Nesterov oleg@redhat.com getrusage: move thread_group_cputime_adjusted() outside of lock_task_sighand()
Steven Rostedt (Google) rostedt@goodmis.org eventfs: Keep all directory links at 1
Steven Rostedt (Google) rostedt@goodmis.org eventfs: Remove fsnotify*() functions from lookup()
Steven Rostedt (Google) rostedt@goodmis.org eventfs: Restructure eventfs_inode structure to be more condensed
Steven Rostedt (Google) rostedt@goodmis.org eventfs: Warn if an eventfs_inode is freed without is_freed being set
Linus Torvalds torvalds@linux-foundation.org eventfs: Get rid of dentry pointers without refcounts
Linus Torvalds torvalds@linux-foundation.org eventfs: Clean up dentry ops and add revalidate function
Linus Torvalds torvalds@linux-foundation.org eventfs: Remove unused d_parent pointer field
Linus Torvalds torvalds@linux-foundation.org tracefs: dentry lookup crapectomy
Linus Torvalds torvalds@linux-foundation.org tracefs: Avoid using the ei->dentry pointer unnecessarily
Linus Torvalds torvalds@linux-foundation.org eventfs: Initialize the tracefs inode properly
Steven Rostedt (Google) rostedt@goodmis.org tracefs: Zero out the tracefs_inode when allocating it
Steven Rostedt (Google) rostedt@goodmis.org eventfs: Save directory inodes in the eventfs_inode structure
Erick Archer erick.archer@gmx.com eventfs: Use kcalloc() instead of kzalloc()
Steven Rostedt (Google) rostedt@goodmis.org eventfs: Do not create dentries nor inodes in iterate_shared
Steven Rostedt (Google) rostedt@goodmis.org eventfs: Have the inodes all for files and directories all be the same
Steven Rostedt (Google) rostedt@goodmis.org eventfs: Shortcut eventfs_iterate() by skipping entries already read
Steven Rostedt (Google) rostedt@goodmis.org eventfs: Read ei->entries before ei->children in eventfs_iterate()
Steven Rostedt (Google) rostedt@goodmis.org eventfs: Do ctx->pos update for all iterations in eventfs_iterate()
Steven Rostedt (Google) rostedt@goodmis.org eventfs: Have eventfs_iterate() stop immediately if ei->is_freed is set
Steven Rostedt (Google) rostedt@goodmis.org eventfs: Stop using dcache_readdir() for getdents()
Steven Rostedt (Google) rostedt@goodmis.org eventfs: Remove "lookup" parameter from create_dir/file_dentry()
Sean Young sean@mess.org media: rc: bpf attach/detach requires write permission
Eugen Hristev eugen.hristev@collabora.com pmdomain: mediatek: fix race conditions with genpd
Sam Protsenko semen.protsenko@linaro.org iio: pressure: bmp280: Add missing bmp085 to SPI id table
Randy Dunlap rdunlap@infradead.org iio: imu: bno055: serdev requires REGMAP
Nuno Sa nuno.sa@analog.com iio: imu: adis: ensure proper DMA alignment
Nuno Sa nuno.sa@analog.com iio: adc: ad_sigma_delta: ensure proper DMA alignment
Mario Limonciello mario.limonciello@amd.com iio: accel: bma400: Fix a compilation problem
Nuno Sa nuno.sa@analog.com iio: commom: st_sensors: ensure proper DMA alignment
Dinghao Liu dinghao.liu@zju.edu.cn iio: core: fix memleak in iio_device_register_sysfs
zhili.liu zhili.liu@ucas.com.cn iio: magnetometer: rm3100: add boundary check for the value read from RM3100_REG_TMRC
David Schiller david.schiller@jku.at staging: iio: ad5933: fix type mismatch regression
Tejun Heo tj@kernel.org Revert "workqueue: Override implicit ordered attribute in workqueue_apply_unbound_cpumask()"
Masami Hiramatsu (Google) mhiramat@kernel.org tracing/probes: Fix to search structure fields correctly
Masami Hiramatsu (Google) mhiramat@kernel.org tracing/probes: Fix to set arg size and fmt after setting type from BTF
Masami Hiramatsu (Google) mhiramat@kernel.org tracing/probes: Fix to show a parse error for bad type for $comm
Thorsten Blum thorsten.blum@toblux.com tracing/synthetic: Fix trace_string() return value
Steven Rostedt (Google) rostedt@goodmis.org tracing: Fix wasted memory in saved_cmdlines logic
Daniel Bristot de Oliveira bristot@kernel.org tracing/timerlat: Move hrtimer_init to timerlat_fd open()
Baokun Li libaokun1@huawei.com ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks()
Baokun Li libaokun1@huawei.com ext4: fix double-free of blocks due to wrong extents moved_len
Ekansh Gupta quic_ekangupt@quicinc.com misc: fastrpc: Mark all sessions as invalid in cb_remove
Carlos Llamas cmllamas@google.com binder: signal epoll threads of self-work
Andy Chi andy.chi@canonical.com ALSA: hda/realtek: fix mute/micmute LEDs for HP ZBook Power
Vitaly Rodionov vitalyr@opensource.cirrus.com ALSA: hda/cs8409: Suppress vmaster control for Dolphin models
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org ASoC: codecs: wcd938x: handle deferred probe
Kailang Yang kailang@realtek.com ALSA: hda/realtek - Add speaker pin verbtable for Dell dual speaker platform
Edson Juliano Drosdeck edson.drosdeck@gmail.com ALSA: hda/realtek: Enable headset mic on Vaio VJFE-ADL
Nathan Chancellor nathan@kernel.org modpost: Add '.ltext' and '.ltext.*' to TEXT_SECTIONS
Nathan Chancellor nathan@kernel.org um: Fix adding '-no-pie' for clang
Jan Beulich jbeulich@suse.com xen-netback: properly sync TX responses
Helge Deller deller@gmx.de parisc: BTLB: Fix crash when setting up BTLB at CPU bringup
Helge Deller deller@gmx.de parisc: Fix random data corruption from exception handler
Esben Haabendal esben@geanix.com net: stmmac: do not clear TBS enable bit on link up/down
Nikita Zhandarovich n.zhandarovich@fintech.ru net: hsr: remove WARN_ONCE() in send_hsr_supervision_frame()
Fedor Pchelkin pchelkin@ispras.ru nfc: nci: free rx_data_reassembly skb on NCI device cleanup
Nathan Chancellor nathan@kernel.org kbuild: Fix changing ELF file type for output of gen_btf for big endian
José Relvas josemonsantorelvas@gmail.com ALSA: hda/realtek: Apply headset jack quirk for non-bass alc287 thinkpads
Takashi Sakamoto o-takashi@sakamocchi.jp firewire: core: correct documentation of fw_csr_string() kernel API
Ondrej Mosnacek omosnace@redhat.com lsm: fix the logic in security_inode_getsecctx()
Ondrej Mosnacek omosnace@redhat.com lsm: fix default return value of the socket_getpeersec_*() hooks
Fangzhi Zuo jerry.zuo@amd.com drm/amd/display: Fix dcn35 8k30 Underflow/Corruption Issue
Wenjing Liu wenjing.liu@amd.com drm/amd/display: fix incorrect mpc_combine array size
David McFarland corngood@gmail.com drm/amd: Don't init MEC2 firmware when it fails to load
Friedrich Vock friedrich.vock@gmx.de drm/amdgpu: Reset IH OVERFLOW_CLEAR bit
Sebastian Ott sebott@redhat.com drm/virtio: Set segment size for virtio_gpu device
Vaishnav Achath vaishnav.a@ti.com spi: omap2-mcspi: Revert FIFO support without DMA
Keqi Wang wangkeqi_chris@163.com connector/cn_proc: revert "connector: Fix proc_event_num_listeners count not cleared"
Rob Clark robdclark@chromium.org Revert "drm/msm/gpu: Push gpu lock down past runpm"
Mario Limonciello mario.limonciello@amd.com Revert "drm/amd: flush any delayed gfxoff on suspend entry"
Lee Duncan lduncan@suse.com scsi: Revert "scsi: fcoe: Fix potential deadlock on &fip->ctlr_lock"
Tomi Valkeinen tomi.valkeinen@ideasonboard.com media: Revert "media: rkisp1: Drop IRQF_SHARED"
Michael Ellerman mpe@ellerman.id.au Revert "powerpc/pseries/iommu: Fix iommu initialisation during DLPAR add"
Paolo Abeni pabeni@redhat.com mptcp: really cope with fastopen race
Geliang Tang geliang@kernel.org mptcp: check addrs list in userspace_pm_get_local_id
Paolo Abeni pabeni@redhat.com mptcp: fix rcv space initialization
Paolo Abeni pabeni@redhat.com mptcp: drop the push_pending field
Geliang Tang geliang.tang@linux.dev selftests: mptcp: add mptcp_lib_kill_wait
Matthieu Baerts (NGI0) matttbe@kernel.org selftests: mptcp: allow changing subtests prefix
Matthieu Baerts (NGI0) matttbe@kernel.org selftests: mptcp: increase timeout to 30 min
Matthieu Baerts (NGI0) matttbe@kernel.org selftests: mptcp: add missing kconfig for NF Mangle
Matthieu Baerts (NGI0) matttbe@kernel.org selftests: mptcp: add missing kconfig for NF Filter in v6
Matthieu Baerts (NGI0) matttbe@kernel.org selftests: mptcp: add missing kconfig for NF Filter
Paolo Abeni pabeni@redhat.com mptcp: fix data re-injection from stale subflow
Arnd Bergmann arnd@arndb.de kallsyms: ignore ARMv4 thunks along with others
Radek Krejci radek.krejci@oracle.com modpost: trim leading spaces when processing source files list
Jean Delvare jdelvare@suse.de i2c: i801: Fix block process call transactions
Arnd Bergmann arnd@arndb.de i2c: pasemi: split driver into two separate modules
Shivaprasad G Bhat sbhat@linux.ibm.com powerpc/iommu: Fix the missing iommu_group_put() during platform domain attach
Michael Ellerman mpe@ellerman.id.au powerpc/kasan: Limit KASAN thread size increase to 32KB
Marc Zyngier maz@kernel.org irqchip/gic-v3-its: Handle non-coherent GICv4 redistributors
Bibo Mao maobibo@loongson.cn irqchip/loongson-eiointc: Use correct struct type in eiointc_domain_alloc()
Viken Dadhaniya quic_vdadhani@quicinc.com i2c: qcom-geni: Correct I2C TRE sequence
Dan Carpenter dan.carpenter@linaro.org cifs: fix underflow in parse_server_interfaces()
Cosmin Tanislav demonsingur@gmail.com iio: adc: ad4130: only set GPIO_CTRL if pin is unused
Cosmin Tanislav demonsingur@gmail.com iio: adc: ad4130: zero-initialize clock init data
Alex Williamson alex.williamson@redhat.com PCI: Fix active state requirement in PME polling
Greg Kroah-Hartman gregkh@linuxfoundation.org Revert "kobject: Remove redundant checks for whether ktype is NULL"
Jiangfeng Xiao xiaojiangfeng@huawei.com powerpc/kasan: Fix addr error caused by page alignment
Matthias Schiffer matthias.schiffer@ew.tq-group.com powerpc/6xx: set High BAT Enable flag on G2_LE cores
Gaurav Batra gbatra@linux.ibm.com powerpc/pseries/iommu: Fix iommu initialisation during DLPAR add
Saravana Kannan saravanak@google.com driver core: fw_devlink: Improve detection of overlapping cycles
Zhipeng Lu alexious@zju.edu.cn media: ir_toy: fix a memleak in irtoy_tx
Konrad Dybcio konrad.dybcio@linaro.org interconnect: qcom: sm8550: Enable sync_state
Konrad Dybcio konrad.dybcio@linaro.org interconnect: qcom: sc8180x: Mark CO0 BCM keepalive
Uttkarsh Aggarwal quic_uaggarwa@quicinc.com usb: dwc3: gadget: Fix NULL pointer dereference in dwc3_gadget_suspend
Udipto Goswami quic_ugoswami@quicinc.com usb: core: Prevent null pointer dereference in update_port_device_state
Xu Yang xu.yang_2@nxp.com usb: chipidea: core: handle power lost in workqueue
yuan linyu yuanlinyu@hihonor.com usb: f_mass_storage: forbid async queue when shutdown happen
Oliver Neukum oneukum@suse.com USB: hub: check for alternate port before enabling A_ALT_HNP_SUPPORT
Christian A. Ehrhardt lk@c--e.de usb: ucsi_acpi: Fix command completion handling
Sean Anderson sean.anderson@seco.com usb: ulpi: Fix debugfs directory leak
Christian A. Ehrhardt lk@c--e.de usb: ucsi: Add missing ppm_lock
Srinivas Pandruvada srinivas.pandruvada@linux.intel.com iio: hid-sensor-als: Return 0 for HID_USAGE_SENSOR_TIME_TIMESTAMP
Jason Gerecke killertofu@gmail.com HID: wacom: Do not register input devices until after hid_hw_start
Tatsunosuke Tobita tatsunosuke.tobita@wacom.com HID: wacom: generic: Avoid reporting a serial of '0' to userspace
Johan Hovold johan+linaro@kernel.org HID: i2c-hid-of: fix NULL-deref on failed power up
Benjamin Tissoires bentiss@kernel.org HID: bpf: actually free hdev memory after attaching a HID-BPF program
Benjamin Tissoires bentiss@kernel.org HID: bpf: remove double fdget()
Luka Guzenko l.guzenko@web.de ALSA: hda/realtek: Enable Mute LED on HP Laptop 14-fq0xxx
David Senoner seda18@rolmail.net ALSA: hda/realtek: Fix the external mic not being recognised for Acer Swift 1 SF114-32
Helge Deller deller@gmx.de parisc: Prevent hung tasks when printing inventory on serial console
Techno Mooney techno.mooney@gmail.com ASoC: amd: yc: Add DMI quirk for MSI Bravo 15 C7VF
Mikulas Patocka mpatocka@redhat.com dm-crypt, dm-verity: disable tasklets
Dave Airlie airlied@redhat.com nouveau: offload fence uevents work to workqueue
Michael Kelley mhklinux@outlook.com scsi: storvsc: Fix ring buffer size calculation
Nico Pache npache@redhat.com selftests: mm: fix map_hugetlb failure on 64K page size systems
Audra Mitchell audra@redhat.com selftests/mm: Update va_high_addr_switch.sh to check CPU for la57 flag
Zach O'Keefe zokeefe@google.com mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again
Jan Kara jack@suse.cz readahead: avoid multiple marked readahead pages
Muhammad Usama Anjum usama.anjum@collabora.com selftests/mm: switch to bash from sh
Sidhartha Kumar sidhartha.kumar@oracle.com fs/hugetlbfs/inode.c: mm/memory-failure.c: fix hugetlbfs hwpoison handling
Masami Hiramatsu (Google) mhiramat@kernel.org tracing/trigger: Fix to return error if failed to alloc snapshot
Samuel Holland samuel.holland@sifive.com scs: add CONFIG_MMU dependency for vfree_atomic()
Ryan Roberts ryan.roberts@arm.com selftests/mm: ksm_tests should only MADV_HUGEPAGE valid memory
Lokesh Gidra lokeshgidra@google.com userfaultfd: fix mmap_changing checking in mfill_atomic_hugetlb
Ryan Roberts ryan.roberts@arm.com mm: thp_get_unmapped_area must honour topdown preference
Ivan Vecera ivecera@redhat.com i40e: Fix waiting for queues of all VSIs to be disabled
Ivan Vecera ivecera@redhat.com i40e: Do not allow untrusted VF to remove administratively set MAC
Jiaxun Yang jiaxun.yang@flygoat.com mm/memory: Use exception ip to search exception tables
Jiaxun Yang jiaxun.yang@flygoat.com ptrace: Introduce exception_ip arch hook
Guenter Roeck linux@roeck-us.net MIPS: Add 'memory' clobber to csum_ipv6_magic() inline assembler
Arnd Bergmann arnd@arndb.de nouveau/svm: fix kvcalloc() argument order
Breno Leitao leitao@debian.org net: sysfs: Fix /sys/class/net/<iface> path for statistics
Manasi Navare navaremanasi@chromium.org drm/i915/dsc: Fix the macro that calculates DSCC_/DSCA_ PPS reg address
Alexey Khoroshilov khoroshilov@ispras.ru ASoC: rt5645: Fix deadlock in rt5645_jack_detect_work()
Uwe Kleine-König u.kleine-koenig@pengutronix.de spi: ppc4xx: Drop write-only variable
Jakub Kicinski kuba@kernel.org net: tls: fix returned read length with async decrypt
Sabrina Dubroca sd@queasysnail.net net: tls: fix use-after-free with partial reads and async decrypt
Jakub Kicinski kuba@kernel.org net: tls: handle backlogging of crypto requests
Jakub Kicinski kuba@kernel.org tls: fix race between tx work scheduling and socket close
Jakub Kicinski kuba@kernel.org tls: fix race between async notify and socket close
Jakub Kicinski kuba@kernel.org net: tls: factor out tls_*crypt_async_wait()
Horatiu Vultur horatiu.vultur@microchip.com lan966x: Fix crash when adding interface under a lag
Aaron Conole aconole@redhat.com net: openvswitch: limit the number of recursions from action sets
Ido Schimmel idosch@nvidia.com selftests: forwarding: Fix bridge locked port test flakiness
Ido Schimmel idosch@nvidia.com selftests: forwarding: Suppress grep warnings
Ido Schimmel idosch@nvidia.com selftests: forwarding: Fix bridge MDB test flakiness
Ido Schimmel idosch@nvidia.com selftests: forwarding: Fix layer 2 miss test flakiness
Ido Schimmel idosch@nvidia.com selftests: net: Fix bridge backup port test flakiness
Hangbin Liu liuhangbin@gmail.com selftests/net: convert test_bridge_backup_port.sh to run it in unique namespace
Hojin Nam hj96.nam@samsung.com perf: CXL: fix mismatched cpmu event opcode
Lukas Bulwahn lukas.bulwahn@gmail.com ALSA: hda/cs35l56: select intended config FW_CS_DSP
Saravana Kannan saravanak@google.com of: property: Improve finding the supplier of a remote-endpoint property
Saravana Kannan saravanak@google.com of: property: Improve finding the consumer of a remote-endpoint property
Parav Pandit parav@nvidia.com devlink: Fix command annotation documentation
Magnus Karlsson magnus.karlsson@intel.com bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY
Chuck Lever chuck.lever@oracle.com net/handshake: Fix handshake_req_destroy_test1
Jiri Pirko jiri@resnulli.us net/mlx5: DPLL, Fix possible use after free after delayed work timer triggers
Jiri Pirko jiri@resnulli.us dpll: fix possible deadlock during netlink dump operation
Ranjani Sridharan ranjani.sridharan@linux.intel.com ASoC: SOF: ipc3-topology: Fix pipeline tear down logic
Dan Carpenter dan.carpenter@linaro.org wifi: iwlwifi: uninitialized variable in iwl_acpi_get_ppag_table()
Dan Carpenter dan.carpenter@linaro.org wifi: iwlwifi: Fix some error codes
Miri Korenblit miriam.rachel.korenblit@intel.com wifi: iwlwifi: clear link_id in time_event
Amadeusz Sławiński amadeuszx.slawinski@linux.intel.com ASoC: Intel: avs: Fix dynamic port assignment when TDM is set
Carlos Song carlos.song@nxp.com spi: imx: fix the burst length at DMA mode and CPU mode
Cezary Rojewski cezary.rojewski@intel.com ASoC: Intel: avs: Fix pci_probe() error path
Mickaël Salaün mic@digikod.net selftests/landlock: Fix capability for net_test
Rob Clark robdclark@chromium.org drm/msm/gem: Fix double resv lock aquire
Christian A. Ehrhardt lk@c--e.de of: unittest: Fix compile in the non-dynamic case
Hu Yadi hu.yadi@h3c.com selftests/landlock: Fix fs_test build with old libc
Hu Yadi hu.yadi@h3c.com selftests/landlock: Fix net_test build with old libc
Nícolas F. R. A. Prado nfraprado@collabora.com kselftest: dt: Stop relying on dirname to improve performance
Saravana Kannan saravanak@google.com driver core: Fix device_link_flag_is_sync_state_only()
Josef Bacik josef@toxicpanda.com btrfs: don't drop extent_map for free space inode on write error
Filipe Manana fdmanana@suse.com btrfs: reject encoded write if inode has nodatasum flag set
Filipe Manana fdmanana@suse.com btrfs: don't reserve space for checksums when writing to nocow files
David Sterba dsterba@suse.com btrfs: send: return EOPNOTSUPP on unknown flags
Boris Burkov boris@bur.io btrfs: forbid deleting live subvol qgroup
Qu Wenruo wqu@suse.com btrfs: do not ASSERT() if the newly created subvolume already got read
Boris Burkov boris@bur.io btrfs: forbid creating subvol qgroups
Filipe Manana fdmanana@suse.com btrfs: don't refill whole delayed refs block reserve when starting transaction
Filipe Manana fdmanana@suse.com btrfs: add new unused block groups to the list of unused block groups
Filipe Manana fdmanana@suse.com btrfs: do not delete unused block group if it may be used soon
Filipe Manana fdmanana@suse.com btrfs: add and use helper to check if block group is used
Yang Shi yang@os.amperecomputing.com mm: mmap: map MAP_STACK to VM_NOHUGEPAGE
Yang Shi yang@os.amperecomputing.com mm: huge_memory: don't force huge page alignment on 32 bit
Linus Torvalds torvalds@linux-foundation.org update workarounds for gcc "asm goto" issue
Linus Torvalds torvalds@linux-foundation.org work around gcc bugs with 'asm goto' with outputs
-------------
Diffstat:
.../ABI/testing/sysfs-class-net-statistics | 48 +- Documentation/arch/arm64/silicon-errata.rst | 7 + Documentation/netlink/specs/dpll.yaml | 4 - Documentation/networking/devlink/devlink-port.rst | 2 +- Documentation/sphinx/kernel_feat.py | 2 +- Makefile | 4 +- arch/Kconfig | 1 + arch/arc/include/asm/jump_label.h | 4 +- arch/arm/include/asm/jump_label.h | 4 +- arch/arm64/include/asm/alternative-macros.h | 4 +- arch/arm64/include/asm/cputype.h | 4 + arch/arm64/include/asm/jump_label.h | 4 +- arch/arm64/kernel/cpu_errata.c | 3 + arch/arm64/kernel/fpsimd.c | 2 +- arch/arm64/kernel/signal.c | 4 +- arch/arm64/kvm/pkvm.c | 27 +- arch/csky/include/asm/jump_label.h | 4 +- arch/loongarch/include/asm/jump_label.h | 4 +- arch/loongarch/mm/kasan_init.c | 3 + arch/mips/include/asm/checksum.h | 3 +- arch/mips/include/asm/jump_label.h | 4 +- arch/mips/include/asm/ptrace.h | 2 + arch/mips/kernel/ptrace.c | 7 + arch/parisc/Kconfig | 1 - arch/parisc/include/asm/assembly.h | 1 + arch/parisc/include/asm/extable.h | 64 ++ arch/parisc/include/asm/jump_label.h | 4 +- arch/parisc/include/asm/special_insns.h | 6 +- arch/parisc/include/asm/uaccess.h | 48 +- arch/parisc/kernel/cache.c | 6 +- arch/parisc/kernel/drivers.c | 3 + arch/parisc/kernel/unaligned.c | 44 +- arch/parisc/mm/fault.c | 11 +- arch/powerpc/include/asm/jump_label.h | 4 +- arch/powerpc/include/asm/reg.h | 2 + arch/powerpc/include/asm/thread_info.h | 2 +- arch/powerpc/include/asm/uaccess.h | 12 +- arch/powerpc/kernel/cpu_setup_6xx.S | 20 +- arch/powerpc/kernel/cpu_specs_e500mc.h | 3 +- arch/powerpc/kernel/interrupt_64.S | 4 +- arch/powerpc/kernel/iommu.c | 4 +- arch/powerpc/kernel/irq_64.c | 2 +- arch/powerpc/mm/kasan/init_32.c | 1 + arch/powerpc/platforms/pseries/lpar.c | 8 +- arch/riscv/include/asm/bitops.h | 8 +- arch/riscv/include/asm/cpufeature.h | 4 +- arch/riscv/include/asm/jump_label.h | 4 +- arch/s390/include/asm/jump_label.h | 4 +- arch/s390/kvm/vsie.c | 1 - arch/s390/mm/gmap.c | 1 + arch/sparc/include/asm/jump_label.h | 4 +- arch/um/Makefile | 4 +- arch/um/include/asm/cpufeature.h | 2 +- arch/x86/Kconfig.cpu | 2 +- arch/x86/include/asm/cpufeature.h | 2 +- arch/x86/include/asm/jump_label.h | 6 +- arch/x86/include/asm/rmwcc.h | 2 +- arch/x86/include/asm/special_insns.h | 2 +- arch/x86/include/asm/uaccess.h | 10 +- arch/x86/kernel/fpu/signal.c | 13 +- arch/x86/kvm/svm/svm_ops.h | 6 +- arch/x86/kvm/vmx/pmu_intel.c | 2 +- arch/x86/kvm/vmx/vmx.c | 4 +- arch/x86/kvm/vmx/vmx_ops.h | 6 +- arch/x86/kvm/x86.c | 3 +- arch/x86/mm/ident_map.c | 23 +- arch/xtensa/include/asm/jump_label.h | 4 +- block/blk-mq.c | 9 +- block/blk-wbt.c | 4 +- crypto/algif_hash.c | 5 +- drivers/android/binder.c | 10 + drivers/base/core.c | 15 +- drivers/base/power/domain.c | 2 +- drivers/connector/cn_proc.c | 5 +- drivers/crypto/ccp/sev-dev.c | 10 +- drivers/dpll/dpll_netlink.c | 20 +- drivers/dpll/dpll_nl.c | 4 - drivers/dpll/dpll_nl.h | 2 - drivers/firewire/core-device.c | 7 +- drivers/firmware/efi/libstub/Makefile | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 - drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 9 +- drivers/gpu/drm/amd/amdgpu/cik_ih.c | 6 + drivers/gpu/drm/amd/amdgpu/cz_ih.c | 5 + drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 2 - drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 8 - drivers/gpu/drm/amd/amdgpu/iceland_ih.c | 5 + drivers/gpu/drm/amd/amdgpu/ih_v6_0.c | 6 + drivers/gpu/drm/amd/amdgpu/ih_v6_1.c | 7 + drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 6 + drivers/gpu/drm/amd/amdgpu/si_ih.c | 6 + drivers/gpu/drm/amd/amdgpu/soc21.c | 4 +- drivers/gpu/drm/amd/amdgpu/tonga_ih.c | 6 + drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 6 + drivers/gpu/drm/amd/amdgpu/vega20_ih.c | 6 + drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 14 +- .../amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c | 15 +- drivers/gpu/drm/amd/display/dc/dml/Makefile | 6 +- .../gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c | 2 +- .../amd/display/dc/dml2/dml2_translation_helper.c | 27 +- drivers/gpu/drm/amd/display/dc/inc/core_types.h | 2 + .../display/dc/link/protocols/link_dp_training.c | 5 +- drivers/gpu/drm/drm_buddy.c | 6 + drivers/gpu/drm/drm_prime.c | 2 +- drivers/gpu/drm/i915/display/intel_dp.c | 3 + drivers/gpu/drm/i915/display/intel_vdsc_regs.h | 4 +- drivers/gpu/drm/msm/msm_gem_prime.c | 4 +- drivers/gpu/drm/msm/msm_gpu.c | 11 +- drivers/gpu/drm/msm/msm_iommu.c | 32 +- drivers/gpu/drm/msm/msm_ringbuffer.c | 7 +- drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h | 2 +- drivers/gpu/drm/nouveau/nouveau_fence.c | 26 +- drivers/gpu/drm/nouveau/nouveau_fence.h | 1 + drivers/gpu/drm/nouveau/nouveau_svm.c | 2 +- drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 61 +- drivers/gpu/drm/virtio/virtgpu_drv.c | 1 + drivers/hid/bpf/hid_bpf_dispatch.c | 83 ++- drivers/hid/bpf/hid_bpf_dispatch.h | 4 +- drivers/hid/bpf/hid_bpf_jmp_table.c | 40 +- drivers/hid/i2c-hid/i2c-hid-of.c | 1 + drivers/hid/wacom_sys.c | 63 +- drivers/hid/wacom_wac.c | 9 +- drivers/i2c/busses/Makefile | 6 +- drivers/i2c/busses/i2c-i801.c | 4 +- drivers/i2c/busses/i2c-pasemi-core.c | 6 + drivers/i2c/busses/i2c-qcom-geni.c | 16 +- drivers/iio/accel/Kconfig | 2 + drivers/iio/adc/ad4130.c | 12 +- drivers/iio/imu/bno055/Kconfig | 1 + drivers/iio/industrialio-core.c | 5 +- drivers/iio/light/hid-sensor-als.c | 1 + drivers/iio/magnetometer/rm3100-core.c | 10 +- drivers/iio/pressure/bmp280-spi.c | 1 + drivers/interconnect/qcom/sc8180x.c | 1 + drivers/interconnect/qcom/sm8550.c | 1 + drivers/irqchip/irq-brcmstb-l2.c | 5 +- drivers/irqchip/irq-gic-v3-its.c | 62 +- drivers/irqchip/irq-loongson-eiointc.c | 2 +- drivers/md/dm-crypt.c | 38 +- drivers/md/dm-verity-target.c | 26 +- drivers/md/dm-verity.h | 1 - drivers/md/md.c | 7 +- .../media/platform/rockchip/rkisp1/rkisp1-dev.c | 2 +- drivers/media/rc/bpf-lirc.c | 6 +- drivers/media/rc/ir_toy.c | 2 + drivers/media/rc/lirc_dev.c | 5 +- drivers/media/rc/rc-core-priv.h | 2 +- drivers/misc/fastrpc.c | 2 +- drivers/mmc/core/slot-gpio.c | 6 +- drivers/mmc/host/sdhci-pci-o2micro.c | 30 + drivers/net/bonding/bond_main.c | 5 +- drivers/net/can/dev/netlink.c | 2 +- drivers/net/dsa/mv88e6xxx/chip.c | 2 +- drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +- drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 38 +- drivers/net/ethernet/mellanox/mlx5/core/dpll.c | 2 +- .../net/ethernet/microchip/lan966x/lan966x_lag.c | 9 +- .../net/ethernet/netronome/nfp/flower/conntrack.c | 46 +- .../ethernet/netronome/nfp/flower/tunnel_conf.c | 2 +- .../net/ethernet/netronome/nfp/nfp_net_common.c | 1 + .../ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c | 6 +- drivers/net/ethernet/stmicro/stmmac/common.h | 56 +- drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c | 15 +- drivers/net/ethernet/stmicro/stmmac/dwmac4_lib.c | 15 +- drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c | 15 +- drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c | 15 +- .../net/ethernet/stmicro/stmmac/stmmac_ethtool.c | 125 +++- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 136 ++-- drivers/net/ethernet/ti/cpsw.c | 2 + drivers/net/ethernet/ti/cpsw_new.c | 3 + drivers/net/hyperv/netvsc.c | 5 +- drivers/net/hyperv/netvsc_drv.c | 82 ++- drivers/net/wireless/intel/iwlwifi/fw/acpi.c | 15 +- drivers/net/wireless/intel/iwlwifi/iwl-drv.c | 1 + drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c | 3 + drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c | 4 + .../net/wireless/intel/iwlwifi/mvm/time-event.c | 3 +- drivers/net/xen-netback/netback.c | 100 ++- drivers/of/property.c | 61 +- drivers/of/unittest.c | 12 +- drivers/pci/controller/dwc/pcie-designware-ep.c | 3 +- drivers/pci/pci.c | 37 +- drivers/perf/cxl_pmu.c | 2 +- drivers/pmdomain/mediatek/mtk-pm-domains.c | 15 +- drivers/pmdomain/renesas/r8a77980-sysc.c | 3 +- drivers/s390/net/qeth_l3_main.c | 9 +- drivers/scsi/fcoe/fcoe_ctlr.c | 20 +- drivers/scsi/storvsc_drv.c | 12 +- drivers/spi/spi-imx.c | 9 +- drivers/spi/spi-omap2-mcspi.c | 137 +--- drivers/spi/spi-ppc4xx.c | 5 - drivers/staging/iio/impedance-analyzer/ad5933.c | 2 +- drivers/thunderbolt/tb_regs.h | 2 +- drivers/thunderbolt/usb4.c | 2 +- drivers/tty/serial/max310x.c | 53 +- drivers/tty/serial/mxs-auart.c | 5 +- drivers/tty/serial/serial_core.c | 2 +- drivers/usb/chipidea/ci.h | 2 + drivers/usb/chipidea/core.c | 44 +- drivers/usb/common/ulpi.c | 2 +- drivers/usb/core/hub.c | 46 +- drivers/usb/dwc3/gadget.c | 6 +- drivers/usb/gadget/function/f_mass_storage.c | 20 +- drivers/usb/typec/tcpm/tcpm.c | 3 +- drivers/usb/typec/ucsi/ucsi.c | 2 + drivers/usb/typec/ucsi/ucsi_acpi.c | 17 +- drivers/xen/events/events_base.c | 8 +- fs/btrfs/block-group.c | 80 +- fs/btrfs/block-group.h | 7 + fs/btrfs/delalloc-space.c | 29 +- fs/btrfs/disk-io.c | 13 +- fs/btrfs/inode.c | 26 +- fs/btrfs/ioctl.c | 5 + fs/btrfs/qgroup.c | 14 + fs/btrfs/send.c | 2 +- fs/btrfs/transaction.c | 38 +- fs/ceph/caps.c | 3 +- fs/ext4/mballoc.c | 39 +- fs/ext4/move_extent.c | 6 +- fs/hugetlbfs/inode.c | 21 +- fs/namespace.c | 11 +- fs/nfsd/nfs4state.c | 11 +- fs/nilfs2/file.c | 8 +- fs/nilfs2/recovery.c | 7 +- fs/proc/array.c | 66 +- fs/smb/client/connect.c | 14 +- fs/smb/client/fs_context.c | 11 + fs/smb/client/namespace.c | 16 + fs/smb/client/smb2ops.c | 2 +- fs/smb/server/smb2pdu.c | 8 +- fs/tracefs/event_inode.c | 814 ++++++--------------- fs/tracefs/inode.c | 102 +-- fs/tracefs/internal.h | 46 +- fs/zonefs/file.c | 42 +- fs/zonefs/super.c | 66 +- include/linux/backing-dev-defs.h | 7 +- include/linux/compiler-gcc.h | 20 + include/linux/compiler_types.h | 11 +- include/linux/gpio/driver.h | 12 + include/linux/iio/adc/ad_sigma_delta.h | 4 +- include/linux/iio/common/st_sensors.h | 4 +- include/linux/iio/imu/adis.h | 3 +- include/linux/lsm_hook_defs.h | 4 +- include/linux/mman.h | 1 + include/linux/netfilter/ipset/ip_set.h | 4 + include/linux/ptrace.h | 4 + include/linux/serial_core.h | 32 +- include/net/tls.h | 5 - include/sound/tas2781.h | 1 + init/Kconfig | 9 + io_uring/net.c | 5 +- kernel/sys.c | 54 +- kernel/trace/ftrace.c | 10 + kernel/trace/ring_buffer.c | 2 +- kernel/trace/trace.c | 78 +- kernel/trace/trace_btf.c | 4 +- kernel/trace/trace_events_synth.c | 3 +- kernel/trace/trace_events_trigger.c | 6 +- kernel/trace/trace_osnoise.c | 6 +- kernel/trace/trace_probe.c | 32 +- kernel/trace/trace_probe.h | 3 +- kernel/workqueue.c | 8 +- lib/kobject.c | 24 +- mm/backing-dev.c | 2 +- mm/damon/sysfs-schemes.c | 2 +- mm/huge_memory.c | 14 +- mm/memory-failure.c | 2 +- mm/memory.c | 4 +- mm/mmap.c | 6 +- mm/page-writeback.c | 4 +- mm/readahead.c | 4 +- mm/userfaultfd.c | 15 +- net/can/j1939/j1939-priv.h | 3 +- net/can/j1939/main.c | 2 +- net/can/j1939/socket.c | 46 +- net/handshake/handshake-test.c | 5 +- net/hsr/hsr_device.c | 4 +- net/mac80211/tx.c | 5 +- net/mptcp/pm_userspace.c | 13 +- net/mptcp/protocol.c | 25 +- net/mptcp/protocol.h | 7 +- net/mptcp/subflow.c | 4 +- net/netfilter/ipset/ip_set_bitmap_gen.h | 14 +- net/netfilter/ipset/ip_set_core.c | 39 +- net/netfilter/ipset/ip_set_hash_gen.h | 19 +- net/netfilter/ipset/ip_set_list_set.c | 13 +- net/netfilter/nft_set_pipapo_avx2.c | 2 +- net/nfc/nci/core.c | 4 + net/openvswitch/flow_netlink.c | 49 +- net/tls/tls_sw.c | 135 ++-- net/wireless/core.c | 3 +- samples/bpf/asm_goto_workaround.h | 8 +- scripts/link-vmlinux.sh | 9 +- scripts/mksysmap | 13 +- scripts/mod/modpost.c | 3 +- scripts/mod/sumversion.c | 7 +- security/security.c | 45 +- sound/pci/hda/Kconfig | 4 +- sound/pci/hda/patch_conexant.c | 18 + sound/pci/hda/patch_cs8409.c | 1 + sound/pci/hda/patch_realtek.c | 20 +- sound/pci/hda/tas2781_hda_i2c.c | 2 +- sound/soc/amd/yc/acp6x-mach.c | 14 + sound/soc/codecs/rt5645.c | 1 + sound/soc/codecs/tas2781-comlib.c | 3 +- sound/soc/codecs/tas2781-i2c.c | 2 +- sound/soc/codecs/wcd938x.c | 2 +- sound/soc/intel/avs/core.c | 3 + sound/soc/intel/avs/topology.c | 2 +- sound/soc/sof/ipc3-topology.c | 69 +- sound/soc/sof/ipc3.c | 2 +- tools/arch/x86/include/asm/rmwcc.h | 2 +- tools/include/linux/compiler_types.h | 4 +- .../testing/selftests/dt/test_unprobed_devices.sh | 13 +- tools/testing/selftests/landlock/common.h | 48 +- tools/testing/selftests/landlock/fs_test.c | 11 +- tools/testing/selftests/landlock/net_test.c | 13 +- .../selftests/mm/charge_reserved_hugetlb.sh | 2 +- tools/testing/selftests/mm/ksm_tests.c | 2 +- tools/testing/selftests/mm/map_hugetlb.c | 7 + tools/testing/selftests/mm/va_high_addr_switch.sh | 6 + tools/testing/selftests/mm/write_hugetlb_memory.sh | 2 +- .../selftests/net/forwarding/bridge_locked_port.sh | 4 +- .../testing/selftests/net/forwarding/bridge_mdb.sh | 14 +- .../selftests/net/forwarding/tc_flower_l2_miss.sh | 8 +- tools/testing/selftests/net/mptcp/config | 3 + tools/testing/selftests/net/mptcp/mptcp_join.sh | 10 +- tools/testing/selftests/net/mptcp/mptcp_lib.sh | 11 +- tools/testing/selftests/net/mptcp/settings | 2 +- tools/testing/selftests/net/mptcp/userspace_pm.sh | 31 +- .../selftests/net/test_bridge_backup_port.sh | 394 +++++----- tools/tracing/rtla/Makefile | 7 +- tools/tracing/rtla/src/osnoise_hist.c | 9 +- tools/tracing/rtla/src/osnoise_top.c | 6 +- tools/tracing/rtla/src/timerlat_hist.c | 9 +- tools/tracing/rtla/src/timerlat_top.c | 6 +- tools/tracing/rtla/src/utils.c | 14 +- tools/tracing/rtla/src/utils.h | 2 + tools/verification/rv/Makefile | 7 +- tools/verification/rv/src/in_kernel.c | 2 +- 340 files changed, 3297 insertions(+), 2530 deletions(-)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Linus Torvalds torvalds@linux-foundation.org
commit 4356e9f841f7fbb945521cef3577ba394c65f3fc upstream.
We've had issues with gcc and 'asm goto' before, and we created a 'asm_volatile_goto()' macro for that in the past: see commits 3f0116c3238a ("compiler/gcc4: Add quirk for 'asm goto' miscompilation bug") and a9f180345f53 ("compiler/gcc4: Make quirk for asm_volatile_goto() unconditional").
Then, much later, we ended up removing the workaround in commit 43c249ea0b1e ("compiler-gcc.h: remove ancient workaround for gcc PR 58670") because we no longer supported building the kernel with the affected gcc versions, but we left the macro uses around.
Now, Sean Christopherson reports a new version of a very similar problem, which is fixed by re-applying that ancient workaround. But the problem in question is limited to only the 'asm goto with outputs' cases, so instead of re-introducing the old workaround as-is, let's rename and limit the workaround to just that much less common case.
It looks like there are at least two separate issues that all hit in this area:
(a) some versions of gcc don't mark the asm goto as 'volatile' when it has outputs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420
which is easy to work around by just adding the 'volatile' by hand.
(b) Internal compiler errors:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422
which are worked around by adding the extra empty 'asm' as a barrier, as in the original workaround.
but the problem Sean sees may be a third thing since it involves bad code generation (not an ICE) even with the manually added 'volatile'.
but the same old workaround works for this case, even if this feels a bit like voodoo programming and may only be hiding the issue.
Reported-and-tested-by: Sean Christopherson seanjc@google.com Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/ Cc: Nick Desaulniers ndesaulniers@google.com Cc: Uros Bizjak ubizjak@gmail.com Cc: Jakub Jelinek jakub@redhat.com Cc: Andrew Pinski quic_apinski@quicinc.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arc/include/asm/jump_label.h | 4 ++-- arch/arm/include/asm/jump_label.h | 4 ++-- arch/arm64/include/asm/alternative-macros.h | 4 ++-- arch/arm64/include/asm/jump_label.h | 4 ++-- arch/csky/include/asm/jump_label.h | 4 ++-- arch/loongarch/include/asm/jump_label.h | 4 ++-- arch/mips/include/asm/jump_label.h | 4 ++-- arch/parisc/include/asm/jump_label.h | 4 ++-- arch/powerpc/include/asm/jump_label.h | 4 ++-- arch/powerpc/include/asm/uaccess.h | 12 ++++++------ arch/powerpc/kernel/irq_64.c | 2 +- arch/riscv/include/asm/bitops.h | 8 ++++---- arch/riscv/include/asm/cpufeature.h | 4 ++-- arch/riscv/include/asm/jump_label.h | 4 ++-- arch/s390/include/asm/jump_label.h | 4 ++-- arch/sparc/include/asm/jump_label.h | 4 ++-- arch/um/include/asm/cpufeature.h | 2 +- arch/x86/include/asm/cpufeature.h | 2 +- arch/x86/include/asm/jump_label.h | 6 +++--- arch/x86/include/asm/rmwcc.h | 2 +- arch/x86/include/asm/special_insns.h | 2 +- arch/x86/include/asm/uaccess.h | 10 +++++----- arch/x86/kvm/svm/svm_ops.h | 6 +++--- arch/x86/kvm/vmx/vmx.c | 4 ++-- arch/x86/kvm/vmx/vmx_ops.h | 6 +++--- arch/xtensa/include/asm/jump_label.h | 4 ++-- include/linux/compiler-gcc.h | 19 +++++++++++++++++++ include/linux/compiler_types.h | 4 ++-- net/netfilter/nft_set_pipapo_avx2.c | 2 +- samples/bpf/asm_goto_workaround.h | 8 ++++---- tools/arch/x86/include/asm/rmwcc.h | 2 +- tools/include/linux/compiler_types.h | 4 ++-- 32 files changed, 88 insertions(+), 69 deletions(-)
--- a/arch/arc/include/asm/jump_label.h +++ b/arch/arc/include/asm/jump_label.h @@ -31,7 +31,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool branch) { - asm_volatile_goto(".balign "__stringify(JUMP_LABEL_NOP_SIZE)" \n" + asm goto(".balign "__stringify(JUMP_LABEL_NOP_SIZE)" \n" "1: \n" "nop \n" ".pushsection __jump_table, "aw" \n" @@ -47,7 +47,7 @@ l_yes: static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch) { - asm_volatile_goto(".balign "__stringify(JUMP_LABEL_NOP_SIZE)" \n" + asm goto(".balign "__stringify(JUMP_LABEL_NOP_SIZE)" \n" "1: \n" "b %l[l_yes] \n" ".pushsection __jump_table, "aw" \n" --- a/arch/arm/include/asm/jump_label.h +++ b/arch/arm/include/asm/jump_label.h @@ -11,7 +11,7 @@
static __always_inline bool arch_static_branch(struct static_key *key, bool branch) { - asm_volatile_goto("1:\n\t" + asm goto("1:\n\t" WASM(nop) "\n\t" ".pushsection __jump_table, "aw"\n\t" ".word 1b, %l[l_yes], %c0\n\t" @@ -25,7 +25,7 @@ l_yes:
static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch) { - asm_volatile_goto("1:\n\t" + asm goto("1:\n\t" WASM(b) " %l[l_yes]\n\t" ".pushsection __jump_table, "aw"\n\t" ".word 1b, %l[l_yes], %c0\n\t" --- a/arch/arm64/include/asm/alternative-macros.h +++ b/arch/arm64/include/asm/alternative-macros.h @@ -229,7 +229,7 @@ alternative_has_cap_likely(const unsigne if (!cpucap_is_possible(cpucap)) return false;
- asm_volatile_goto( + asm goto( ALTERNATIVE_CB("b %l[l_no]", %[cpucap], alt_cb_patch_nops) : : [cpucap] "i" (cpucap) @@ -247,7 +247,7 @@ alternative_has_cap_unlikely(const unsig if (!cpucap_is_possible(cpucap)) return false;
- asm_volatile_goto( + asm goto( ALTERNATIVE("nop", "b %l[l_yes]", %[cpucap]) : : [cpucap] "i" (cpucap) --- a/arch/arm64/include/asm/jump_label.h +++ b/arch/arm64/include/asm/jump_label.h @@ -18,7 +18,7 @@ static __always_inline bool arch_static_branch(struct static_key * const key, const bool branch) { - asm_volatile_goto( + asm goto( "1: nop \n\t" " .pushsection __jump_table, "aw" \n\t" " .align 3 \n\t" @@ -35,7 +35,7 @@ l_yes: static __always_inline bool arch_static_branch_jump(struct static_key * const key, const bool branch) { - asm_volatile_goto( + asm goto( "1: b %l[l_yes] \n\t" " .pushsection __jump_table, "aw" \n\t" " .align 3 \n\t" --- a/arch/csky/include/asm/jump_label.h +++ b/arch/csky/include/asm/jump_label.h @@ -12,7 +12,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool branch) { - asm_volatile_goto( + asm goto( "1: nop32 \n" " .pushsection __jump_table, "aw" \n" " .align 2 \n" @@ -29,7 +29,7 @@ label: static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch) { - asm_volatile_goto( + asm goto( "1: bsr32 %l[label] \n" " .pushsection __jump_table, "aw" \n" " .align 2 \n" --- a/arch/loongarch/include/asm/jump_label.h +++ b/arch/loongarch/include/asm/jump_label.h @@ -22,7 +22,7 @@
static __always_inline bool arch_static_branch(struct static_key * const key, const bool branch) { - asm_volatile_goto( + asm goto( "1: nop \n\t" JUMP_TABLE_ENTRY : : "i"(&((char *)key)[branch]) : : l_yes); @@ -35,7 +35,7 @@ l_yes:
static __always_inline bool arch_static_branch_jump(struct static_key * const key, const bool branch) { - asm_volatile_goto( + asm goto( "1: b %l[l_yes] \n\t" JUMP_TABLE_ENTRY : : "i"(&((char *)key)[branch]) : : l_yes); --- a/arch/mips/include/asm/jump_label.h +++ b/arch/mips/include/asm/jump_label.h @@ -36,7 +36,7 @@
static __always_inline bool arch_static_branch(struct static_key *key, bool branch) { - asm_volatile_goto("1:\t" B_INSN " 2f\n\t" + asm goto("1:\t" B_INSN " 2f\n\t" "2:\t.insn\n\t" ".pushsection __jump_table, "aw"\n\t" WORD_INSN " 1b, %l[l_yes], %0\n\t" @@ -50,7 +50,7 @@ l_yes:
static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch) { - asm_volatile_goto("1:\t" J_INSN " %l[l_yes]\n\t" + asm goto("1:\t" J_INSN " %l[l_yes]\n\t" ".pushsection __jump_table, "aw"\n\t" WORD_INSN " 1b, %l[l_yes], %0\n\t" ".popsection\n\t" --- a/arch/parisc/include/asm/jump_label.h +++ b/arch/parisc/include/asm/jump_label.h @@ -12,7 +12,7 @@
static __always_inline bool arch_static_branch(struct static_key *key, bool branch) { - asm_volatile_goto("1:\n\t" + asm goto("1:\n\t" "nop\n\t" ".pushsection __jump_table, "aw"\n\t" ".align %1\n\t" @@ -29,7 +29,7 @@ l_yes:
static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch) { - asm_volatile_goto("1:\n\t" + asm goto("1:\n\t" "b,n %l[l_yes]\n\t" ".pushsection __jump_table, "aw"\n\t" ".align %1\n\t" --- a/arch/powerpc/include/asm/jump_label.h +++ b/arch/powerpc/include/asm/jump_label.h @@ -17,7 +17,7 @@
static __always_inline bool arch_static_branch(struct static_key *key, bool branch) { - asm_volatile_goto("1:\n\t" + asm goto("1:\n\t" "nop # arch_static_branch\n\t" ".pushsection __jump_table, "aw"\n\t" ".long 1b - ., %l[l_yes] - .\n\t" @@ -32,7 +32,7 @@ l_yes:
static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch) { - asm_volatile_goto("1:\n\t" + asm goto("1:\n\t" "b %l[l_yes] # arch_static_branch_jump\n\t" ".pushsection __jump_table, "aw"\n\t" ".long 1b - ., %l[l_yes] - .\n\t" --- a/arch/powerpc/include/asm/uaccess.h +++ b/arch/powerpc/include/asm/uaccess.h @@ -74,7 +74,7 @@ __pu_failed: \ /* -mprefixed can generate offsets beyond range, fall back hack */ #ifdef CONFIG_PPC_KERNEL_PREFIXED #define __put_user_asm_goto(x, addr, label, op) \ - asm_volatile_goto( \ + asm goto( \ "1: " op " %0,0(%1) # put_user\n" \ EX_TABLE(1b, %l2) \ : \ @@ -83,7 +83,7 @@ __pu_failed: \ : label) #else #define __put_user_asm_goto(x, addr, label, op) \ - asm_volatile_goto( \ + asm goto( \ "1: " op "%U1%X1 %0,%1 # put_user\n" \ EX_TABLE(1b, %l2) \ : \ @@ -97,7 +97,7 @@ __pu_failed: \ __put_user_asm_goto(x, ptr, label, "std") #else /* __powerpc64__ */ #define __put_user_asm2_goto(x, addr, label) \ - asm_volatile_goto( \ + asm goto( \ "1: stw%X1 %0, %1\n" \ "2: stw%X1 %L0, %L1\n" \ EX_TABLE(1b, %l2) \ @@ -146,7 +146,7 @@ do { \ /* -mprefixed can generate offsets beyond range, fall back hack */ #ifdef CONFIG_PPC_KERNEL_PREFIXED #define __get_user_asm_goto(x, addr, label, op) \ - asm_volatile_goto( \ + asm_goto_output( \ "1: "op" %0,0(%1) # get_user\n" \ EX_TABLE(1b, %l2) \ : "=r" (x) \ @@ -155,7 +155,7 @@ do { \ : label) #else #define __get_user_asm_goto(x, addr, label, op) \ - asm_volatile_goto( \ + asm_goto_output( \ "1: "op"%U1%X1 %0, %1 # get_user\n" \ EX_TABLE(1b, %l2) \ : "=r" (x) \ @@ -169,7 +169,7 @@ do { \ __get_user_asm_goto(x, addr, label, "ld") #else /* __powerpc64__ */ #define __get_user_asm2_goto(x, addr, label) \ - asm_volatile_goto( \ + asm_goto_output( \ "1: lwz%X1 %0, %1\n" \ "2: lwz%X1 %L0, %L1\n" \ EX_TABLE(1b, %l2) \ --- a/arch/powerpc/kernel/irq_64.c +++ b/arch/powerpc/kernel/irq_64.c @@ -230,7 +230,7 @@ again: * This allows interrupts to be unmasked without hard disabling, and * also without new hard interrupts coming in ahead of pending ones. */ - asm_volatile_goto( + asm goto( "1: \n" " lbz 9,%0(13) \n" " cmpwi 9,0 \n" --- a/arch/riscv/include/asm/bitops.h +++ b/arch/riscv/include/asm/bitops.h @@ -39,7 +39,7 @@ static __always_inline unsigned long var { int num;
- asm_volatile_goto(ALTERNATIVE("j %l[legacy]", "nop", 0, + asm goto(ALTERNATIVE("j %l[legacy]", "nop", 0, RISCV_ISA_EXT_ZBB, 1) : : : : legacy);
@@ -95,7 +95,7 @@ static __always_inline unsigned long var { int num;
- asm_volatile_goto(ALTERNATIVE("j %l[legacy]", "nop", 0, + asm goto(ALTERNATIVE("j %l[legacy]", "nop", 0, RISCV_ISA_EXT_ZBB, 1) : : : : legacy);
@@ -154,7 +154,7 @@ static __always_inline int variable_ffs( if (!x) return 0;
- asm_volatile_goto(ALTERNATIVE("j %l[legacy]", "nop", 0, + asm goto(ALTERNATIVE("j %l[legacy]", "nop", 0, RISCV_ISA_EXT_ZBB, 1) : : : : legacy);
@@ -209,7 +209,7 @@ static __always_inline int variable_fls( if (!x) return 0;
- asm_volatile_goto(ALTERNATIVE("j %l[legacy]", "nop", 0, + asm goto(ALTERNATIVE("j %l[legacy]", "nop", 0, RISCV_ISA_EXT_ZBB, 1) : : : : legacy);
--- a/arch/riscv/include/asm/cpufeature.h +++ b/arch/riscv/include/asm/cpufeature.h @@ -78,7 +78,7 @@ riscv_has_extension_likely(const unsigne "ext must be < RISCV_ISA_EXT_MAX");
if (IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) { - asm_volatile_goto( + asm goto( ALTERNATIVE("j %l[l_no]", "nop", 0, %[ext], 1) : : [ext] "i" (ext) @@ -101,7 +101,7 @@ riscv_has_extension_unlikely(const unsig "ext must be < RISCV_ISA_EXT_MAX");
if (IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) { - asm_volatile_goto( + asm goto( ALTERNATIVE("nop", "j %l[l_yes]", 0, %[ext], 1) : : [ext] "i" (ext) --- a/arch/riscv/include/asm/jump_label.h +++ b/arch/riscv/include/asm/jump_label.h @@ -17,7 +17,7 @@ static __always_inline bool arch_static_branch(struct static_key * const key, const bool branch) { - asm_volatile_goto( + asm goto( " .align 2 \n\t" " .option push \n\t" " .option norelax \n\t" @@ -39,7 +39,7 @@ label: static __always_inline bool arch_static_branch_jump(struct static_key * const key, const bool branch) { - asm_volatile_goto( + asm goto( " .align 2 \n\t" " .option push \n\t" " .option norelax \n\t" --- a/arch/s390/include/asm/jump_label.h +++ b/arch/s390/include/asm/jump_label.h @@ -25,7 +25,7 @@ */ static __always_inline bool arch_static_branch(struct static_key *key, bool branch) { - asm_volatile_goto("0: brcl 0,%l[label]\n" + asm goto("0: brcl 0,%l[label]\n" ".pushsection __jump_table,"aw"\n" ".balign 8\n" ".long 0b-.,%l[label]-.\n" @@ -39,7 +39,7 @@ label:
static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch) { - asm_volatile_goto("0: brcl 15,%l[label]\n" + asm goto("0: brcl 15,%l[label]\n" ".pushsection __jump_table,"aw"\n" ".balign 8\n" ".long 0b-.,%l[label]-.\n" --- a/arch/sparc/include/asm/jump_label.h +++ b/arch/sparc/include/asm/jump_label.h @@ -10,7 +10,7 @@
static __always_inline bool arch_static_branch(struct static_key *key, bool branch) { - asm_volatile_goto("1:\n\t" + asm goto("1:\n\t" "nop\n\t" "nop\n\t" ".pushsection __jump_table, "aw"\n\t" @@ -26,7 +26,7 @@ l_yes:
static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch) { - asm_volatile_goto("1:\n\t" + asm goto("1:\n\t" "b %l[l_yes]\n\t" "nop\n\t" ".pushsection __jump_table, "aw"\n\t" --- a/arch/um/include/asm/cpufeature.h +++ b/arch/um/include/asm/cpufeature.h @@ -75,7 +75,7 @@ extern void setup_clear_cpu_cap(unsigned */ static __always_inline bool _static_cpu_has(u16 bit) { - asm_volatile_goto("1: jmp 6f\n" + asm goto("1: jmp 6f\n" "2:\n" ".skip -(((5f-4f) - (2b-1b)) > 0) * " "((5f-4f) - (2b-1b)),0x90\n" --- a/arch/x86/include/asm/cpufeature.h +++ b/arch/x86/include/asm/cpufeature.h @@ -168,7 +168,7 @@ extern void clear_cpu_cap(struct cpuinfo */ static __always_inline bool _static_cpu_has(u16 bit) { - asm_volatile_goto( + asm goto( ALTERNATIVE_TERNARY("jmp 6f", %P[feature], "", "jmp %l[t_no]") ".pushsection .altinstr_aux,"ax"\n" "6:\n" --- a/arch/x86/include/asm/jump_label.h +++ b/arch/x86/include/asm/jump_label.h @@ -24,7 +24,7 @@
static __always_inline bool arch_static_branch(struct static_key *key, bool branch) { - asm_volatile_goto("1:" + asm goto("1:" "jmp %l[l_yes] # objtool NOPs this \n\t" JUMP_TABLE_ENTRY : : "i" (key), "i" (2 | branch) : : l_yes); @@ -38,7 +38,7 @@ l_yes:
static __always_inline bool arch_static_branch(struct static_key * const key, const bool branch) { - asm_volatile_goto("1:" + asm goto("1:" ".byte " __stringify(BYTES_NOP5) "\n\t" JUMP_TABLE_ENTRY : : "i" (key), "i" (branch) : : l_yes); @@ -52,7 +52,7 @@ l_yes:
static __always_inline bool arch_static_branch_jump(struct static_key * const key, const bool branch) { - asm_volatile_goto("1:" + asm goto("1:" "jmp %l[l_yes]\n\t" JUMP_TABLE_ENTRY : : "i" (key), "i" (branch) : : l_yes); --- a/arch/x86/include/asm/rmwcc.h +++ b/arch/x86/include/asm/rmwcc.h @@ -13,7 +13,7 @@ #define __GEN_RMWcc(fullop, _var, cc, clobbers, ...) \ ({ \ bool c = false; \ - asm_volatile_goto (fullop "; j" #cc " %l[cc_label]" \ + asm goto (fullop "; j" #cc " %l[cc_label]" \ : : [var] "m" (_var), ## __VA_ARGS__ \ : clobbers : cc_label); \ if (0) { \ --- a/arch/x86/include/asm/special_insns.h +++ b/arch/x86/include/asm/special_insns.h @@ -205,7 +205,7 @@ static inline void clwb(volatile void *_ #ifdef CONFIG_X86_USER_SHADOW_STACK static inline int write_user_shstk_64(u64 __user *addr, u64 val) { - asm_volatile_goto("1: wrussq %[val], (%[addr])\n" + asm goto("1: wrussq %[val], (%[addr])\n" _ASM_EXTABLE(1b, %l[fail]) :: [addr] "r" (addr), [val] "r" (val) :: fail); --- a/arch/x86/include/asm/uaccess.h +++ b/arch/x86/include/asm/uaccess.h @@ -133,7 +133,7 @@ extern int __get_user_bad(void);
#ifdef CONFIG_X86_32 #define __put_user_goto_u64(x, addr, label) \ - asm_volatile_goto("\n" \ + asm goto("\n" \ "1: movl %%eax,0(%1)\n" \ "2: movl %%edx,4(%1)\n" \ _ASM_EXTABLE_UA(1b, %l2) \ @@ -295,7 +295,7 @@ do { \ } while (0)
#define __get_user_asm(x, addr, itype, ltype, label) \ - asm_volatile_goto("\n" \ + asm_goto_output("\n" \ "1: mov"itype" %[umem],%[output]\n" \ _ASM_EXTABLE_UA(1b, %l2) \ : [output] ltype(x) \ @@ -375,7 +375,7 @@ do { \ __typeof__(_ptr) _old = (__typeof__(_ptr))(_pold); \ __typeof__(*(_ptr)) __old = *_old; \ __typeof__(*(_ptr)) __new = (_new); \ - asm_volatile_goto("\n" \ + asm_goto_output("\n" \ "1: " LOCK_PREFIX "cmpxchg"itype" %[new], %[ptr]\n"\ _ASM_EXTABLE_UA(1b, %l[label]) \ : CC_OUT(z) (success), \ @@ -394,7 +394,7 @@ do { \ __typeof__(_ptr) _old = (__typeof__(_ptr))(_pold); \ __typeof__(*(_ptr)) __old = *_old; \ __typeof__(*(_ptr)) __new = (_new); \ - asm_volatile_goto("\n" \ + asm_goto_output("\n" \ "1: " LOCK_PREFIX "cmpxchg8b %[ptr]\n" \ _ASM_EXTABLE_UA(1b, %l[label]) \ : CC_OUT(z) (success), \ @@ -477,7 +477,7 @@ struct __large_struct { unsigned long bu * aliasing issues. */ #define __put_user_goto(x, addr, itype, ltype, label) \ - asm_volatile_goto("\n" \ + asm goto("\n" \ "1: mov"itype" %0,%1\n" \ _ASM_EXTABLE_UA(1b, %l2) \ : : ltype(x), "m" (__m(addr)) \ --- a/arch/x86/kvm/svm/svm_ops.h +++ b/arch/x86/kvm/svm/svm_ops.h @@ -8,7 +8,7 @@
#define svm_asm(insn, clobber...) \ do { \ - asm_volatile_goto("1: " __stringify(insn) "\n\t" \ + asm goto("1: " __stringify(insn) "\n\t" \ _ASM_EXTABLE(1b, %l[fault]) \ ::: clobber : fault); \ return; \ @@ -18,7 +18,7 @@ fault: \
#define svm_asm1(insn, op1, clobber...) \ do { \ - asm_volatile_goto("1: " __stringify(insn) " %0\n\t" \ + asm goto("1: " __stringify(insn) " %0\n\t" \ _ASM_EXTABLE(1b, %l[fault]) \ :: op1 : clobber : fault); \ return; \ @@ -28,7 +28,7 @@ fault: \
#define svm_asm2(insn, op1, op2, clobber...) \ do { \ - asm_volatile_goto("1: " __stringify(insn) " %1, %0\n\t" \ + asm goto("1: " __stringify(insn) " %1, %0\n\t" \ _ASM_EXTABLE(1b, %l[fault]) \ :: op1, op2 : clobber : fault); \ return; \ --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -745,7 +745,7 @@ static int vmx_set_guest_uret_msr(struct */ static int kvm_cpu_vmxoff(void) { - asm_volatile_goto("1: vmxoff\n\t" + asm goto("1: vmxoff\n\t" _ASM_EXTABLE(1b, %l[fault]) ::: "cc", "memory" : fault);
@@ -2789,7 +2789,7 @@ static int kvm_cpu_vmxon(u64 vmxon_point
cr4_set_bits(X86_CR4_VMXE);
- asm_volatile_goto("1: vmxon %[vmxon_pointer]\n\t" + asm goto("1: vmxon %[vmxon_pointer]\n\t" _ASM_EXTABLE(1b, %l[fault]) : : [vmxon_pointer] "m"(vmxon_pointer) : : fault); --- a/arch/x86/kvm/vmx/vmx_ops.h +++ b/arch/x86/kvm/vmx/vmx_ops.h @@ -94,7 +94,7 @@ static __always_inline unsigned long __v
#ifdef CONFIG_CC_HAS_ASM_GOTO_OUTPUT
- asm_volatile_goto("1: vmread %[field], %[output]\n\t" + asm_goto_output("1: vmread %[field], %[output]\n\t" "jna %l[do_fail]\n\t"
_ASM_EXTABLE(1b, %l[do_exception]) @@ -188,7 +188,7 @@ static __always_inline unsigned long vmc
#define vmx_asm1(insn, op1, error_args...) \ do { \ - asm_volatile_goto("1: " __stringify(insn) " %0\n\t" \ + asm goto("1: " __stringify(insn) " %0\n\t" \ ".byte 0x2e\n\t" /* branch not taken hint */ \ "jna %l[error]\n\t" \ _ASM_EXTABLE(1b, %l[fault]) \ @@ -205,7 +205,7 @@ fault: \
#define vmx_asm2(insn, op1, op2, error_args...) \ do { \ - asm_volatile_goto("1: " __stringify(insn) " %1, %0\n\t" \ + asm goto("1: " __stringify(insn) " %1, %0\n\t" \ ".byte 0x2e\n\t" /* branch not taken hint */ \ "jna %l[error]\n\t" \ _ASM_EXTABLE(1b, %l[fault]) \ --- a/arch/xtensa/include/asm/jump_label.h +++ b/arch/xtensa/include/asm/jump_label.h @@ -13,7 +13,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool branch) { - asm_volatile_goto("1:\n\t" + asm goto("1:\n\t" "_nop\n\t" ".pushsection __jump_table, "aw"\n\t" ".word 1b, %l[l_yes], %c0\n\t" @@ -38,7 +38,7 @@ static __always_inline bool arch_static_ * make it reachable and wrap both into a no-transform block * to avoid any assembler interference with this. */ - asm_volatile_goto("1:\n\t" + asm goto("1:\n\t" ".begin no-transform\n\t" "_j %l[l_yes]\n\t" "2:\n\t" --- a/include/linux/compiler-gcc.h +++ b/include/linux/compiler-gcc.h @@ -64,6 +64,25 @@ __builtin_unreachable(); \ } while (0)
+/* + * GCC 'asm goto' with outputs miscompiles certain code sequences: + * + * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420 + * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422 + * + * Work it around via the same compiler barrier quirk that we used + * to use for the old 'asm goto' workaround. + * + * Also, always mark such 'asm goto' statements as volatile: all + * asm goto statements are supposed to be volatile as per the + * documentation, but some versions of gcc didn't actually do + * that for asms with outputs: + * + * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619 + */ +#define asm_goto_output(x...) \ + do { asm volatile goto(x); asm (""); } while (0) + #if defined(CONFIG_ARCH_USE_BUILTIN_BSWAP) #define __HAVE_BUILTIN_BSWAP32__ #define __HAVE_BUILTIN_BSWAP64__ --- a/include/linux/compiler_types.h +++ b/include/linux/compiler_types.h @@ -362,8 +362,8 @@ struct ftrace_likely_data { #define __member_size(p) __builtin_object_size(p, 1) #endif
-#ifndef asm_volatile_goto -#define asm_volatile_goto(x...) asm goto(x) +#ifndef asm_goto_output +#define asm_goto_output(x...) asm goto(x) #endif
#ifdef CONFIG_CC_HAS_ASM_INLINE --- a/net/netfilter/nft_set_pipapo_avx2.c +++ b/net/netfilter/nft_set_pipapo_avx2.c @@ -57,7 +57,7 @@
/* Jump to label if @reg is zero */ #define NFT_PIPAPO_AVX2_NOMATCH_GOTO(reg, label) \ - asm_volatile_goto("vptest %%ymm" #reg ", %%ymm" #reg ";" \ + asm goto("vptest %%ymm" #reg ", %%ymm" #reg ";" \ "je %l[" #label "]" : : : : label)
/* Store 256 bits from YMM register into memory. Contrary to bucket load --- a/samples/bpf/asm_goto_workaround.h +++ b/samples/bpf/asm_goto_workaround.h @@ -4,14 +4,14 @@ #define __ASM_GOTO_WORKAROUND_H
/* - * This will bring in asm_volatile_goto and asm_inline macro definitions + * This will bring in asm_goto_output and asm_inline macro definitions * if enabled by compiler and config options. */ #include <linux/types.h>
-#ifdef asm_volatile_goto -#undef asm_volatile_goto -#define asm_volatile_goto(x...) asm volatile("invalid use of asm_volatile_goto") +#ifdef asm_goto_output +#undef asm_goto_output +#define asm_goto_output(x...) asm volatile("invalid use of asm_goto_output") #endif
/* --- a/tools/arch/x86/include/asm/rmwcc.h +++ b/tools/arch/x86/include/asm/rmwcc.h @@ -4,7 +4,7 @@
#define __GEN_RMWcc(fullop, var, cc, ...) \ do { \ - asm_volatile_goto (fullop "; j" cc " %l[cc_label]" \ + asm goto (fullop "; j" cc " %l[cc_label]" \ : : "m" (var), ## __VA_ARGS__ \ : "memory" : cc_label); \ return 0; \ --- a/tools/include/linux/compiler_types.h +++ b/tools/include/linux/compiler_types.h @@ -36,8 +36,8 @@ #include <linux/compiler-gcc.h> #endif
-#ifndef asm_volatile_goto -#define asm_volatile_goto(x...) asm goto(x) +#ifndef asm_goto_output +#define asm_goto_output(x...) asm goto(x) #endif
#endif /* __LINUX_COMPILER_TYPES_H */
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Linus Torvalds torvalds@linux-foundation.org
commit 68fb3ca0e408e00db1c3f8fccdfa19e274c033be upstream.
In commit 4356e9f841f7 ("work around gcc bugs with 'asm goto' with outputs") I did the gcc workaround unconditionally, because the cause of the bad code generation wasn't entirely clear.
In the meantime, Jakub Jelinek debugged the issue, and has come up with a fix in gcc [2], which also got backported to the still maintained branches of gcc-11, gcc-12 and gcc-13.
Note that while the fix technically wasn't in the original gcc-14 branch, Jakub says:
"while it is true that no GCC 14 snapshots until today (or whenever the fix will be committed) have the fix, for GCC trunk it is up to the distros to use the latest snapshot if they use it at all and would allow better testing of the kernel code without the workaround, so that if there are other issues they won't be discovered years later. Most userland code doesn't actually use asm goto with outputs..."
so we will consider gcc-14 to be fixed - if somebody is using gcc snapshots of the gcc-14 before the fix, they should upgrade.
Note that while the bug goes back to gcc-11, in practice other gcc changes seem to have effectively hidden it since gcc-12.1 as per a bisect by Jakub. So even a gcc-14 snapshot without the fix likely doesn't show actual problems.
Also, make the default 'asm_goto_output()' macro mark the asm as volatile by hand, because of an unrelated gcc issue [1] where it doesn't match the documented behavior ("asm goto is always volatile").
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103979 [1] Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 [2] Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/ Requested-by: Jakub Jelinek jakub@redhat.com Cc: Uros Bizjak ubizjak@gmail.com Cc: Nick Desaulniers ndesaulniers@google.com Cc: Sean Christopherson seanjc@google.com Cc: Andrew Pinski quic_apinski@quicinc.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/compiler-gcc.h | 7 ++++--- include/linux/compiler_types.h | 9 ++++++++- init/Kconfig | 9 +++++++++ 3 files changed, 21 insertions(+), 4 deletions(-)
--- a/include/linux/compiler-gcc.h +++ b/include/linux/compiler-gcc.h @@ -67,10 +67,9 @@ /* * GCC 'asm goto' with outputs miscompiles certain code sequences: * - * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420 - * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422 + * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 * - * Work it around via the same compiler barrier quirk that we used + * Work around it via the same compiler barrier quirk that we used * to use for the old 'asm goto' workaround. * * Also, always mark such 'asm goto' statements as volatile: all @@ -80,8 +79,10 @@ * * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619 */ +#ifdef CONFIG_GCC_ASM_GOTO_OUTPUT_WORKAROUND #define asm_goto_output(x...) \ do { asm volatile goto(x); asm (""); } while (0) +#endif
#if defined(CONFIG_ARCH_USE_BUILTIN_BSWAP) #define __HAVE_BUILTIN_BSWAP32__ --- a/include/linux/compiler_types.h +++ b/include/linux/compiler_types.h @@ -362,8 +362,15 @@ struct ftrace_likely_data { #define __member_size(p) __builtin_object_size(p, 1) #endif
+/* + * Some versions of gcc do not mark 'asm goto' volatile: + * + * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103979 + * + * We do it here by hand, because it doesn't hurt. + */ #ifndef asm_goto_output -#define asm_goto_output(x...) asm goto(x) +#define asm_goto_output(x...) asm volatile goto(x) #endif
#ifdef CONFIG_CC_HAS_ASM_INLINE --- a/init/Kconfig +++ b/init/Kconfig @@ -89,6 +89,15 @@ config CC_HAS_ASM_GOTO_TIED_OUTPUT # Detect buggy gcc and clang, fixed in gcc-11 clang-14. def_bool $(success,echo 'int foo(int *x) { asm goto (".long (%l[bar]) - .": "+m"(*x) ::: bar); return *x; bar: return 0; }' | $CC -x c - -c -o /dev/null)
+config GCC_ASM_GOTO_OUTPUT_WORKAROUND + bool + depends on CC_IS_GCC && CC_HAS_ASM_GOTO_OUTPUT + # Fixed in GCC 14, 13.3, 12.4 and 11.5 + # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 + default y if GCC_VERSION < 110500 + default y if GCC_VERSION >= 120000 && GCC_VERSION < 120400 + default y if GCC_VERSION >= 130000 && GCC_VERSION < 130300 + config TOOLS_SUPPORT_RELR def_bool $(success,env "CC=$(CC)" "LD=$(LD)" "NM=$(NM)" "OBJCOPY=$(OBJCOPY)" $(srctree)/scripts/tools-support-relr.sh)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yang Shi yang@os.amperecomputing.com
commit 4ef9ad19e17676b9ef071309bc62020e2373705d upstream.
commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries") caused two issues [1] [2] reported on 32 bit system or compat userspace.
It doesn't make too much sense to force huge page alignment on 32 bit system due to the constrained virtual address space.
[1] https://lore.kernel.org/linux-mm/d0a136a0-4a31-46bc-adf4-2db109a61672@kernel... [2] https://lore.kernel.org/linux-mm/CAJuCfpHXLdQy1a2B6xN2d7quTYwg2OoZseYPZTRpU0...
Link: https://lkml.kernel.org/r/20240118180505.2914778-1-shy828301@gmail.com Fixes: efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries") Signed-off-by: Yang Shi yang@os.amperecomputing.com Reported-by: Jiri Slaby jirislaby@kernel.org Reported-by: Suren Baghdasaryan surenb@google.com Tested-by: Jiri Slaby jirislaby@kernel.org Tested-by: Suren Baghdasaryan surenb@google.com Reviewed-by: Matthew Wilcox (Oracle) willy@infradead.org Cc: Rik van Riel riel@surriel.com Cc: Christopher Lameter cl@linux.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/huge_memory.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -37,6 +37,7 @@ #include <linux/page_owner.h> #include <linux/sched/sysctl.h> #include <linux/memory-tiers.h> +#include <linux/compat.h>
#include <asm/tlb.h> #include <asm/pgalloc.h> @@ -634,6 +635,9 @@ static unsigned long __thp_get_unmapped_ loff_t off_align = round_up(off, size); unsigned long len_pad, ret;
+ if (IS_ENABLED(CONFIG_32BIT) || in_compat_syscall()) + return 0; + if (off_end <= off_align || (off_end - off_align) < size) return 0;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yang Shi yang@os.amperecomputing.com
commit c4608d1bf7c6536d1a3d233eb21e50678681564e upstream.
commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries") incured regression for stress-ng pthread benchmark [1]. It is because THP get allocated to pthread's stack area much more possible than before. Pthread's stack area is allocated by mmap without VM_GROWSDOWN or VM_GROWSUP flag, so kernel can't tell whether it is a stack area or not.
The MAP_STACK flag is used to mark the stack area, but it is a no-op on Linux. Mapping MAP_STACK to VM_NOHUGEPAGE to prevent from allocating THP for such stack area.
With this change the stack area looks like:
fffd18e10000-fffd19610000 rw-p 00000000 00:00 0 Size: 8192 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Rss: 12 kB Pss: 12 kB Pss_Dirty: 12 kB Shared_Clean: 0 kB Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 12 kB Referenced: 12 kB Anonymous: 12 kB KSM: 0 kB LazyFree: 0 kB AnonHugePages: 0 kB ShmemPmdMapped: 0 kB FilePmdMapped: 0 kB Shared_Hugetlb: 0 kB Private_Hugetlb: 0 kB Swap: 0 kB SwapPss: 0 kB Locked: 0 kB THPeligible: 0 VmFlags: rd wr mr mw me ac nh
The "nh" flag is set.
[1] https://lore.kernel.org/linux-mm/202312192310.56367035-oliver.sang@intel.com...
Link: https://lkml.kernel.org/r/20231221065943.2803551-2-shy828301@gmail.com Fixes: efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries") Signed-off-by: Yang Shi yang@os.amperecomputing.com Reported-by: kernel test robot oliver.sang@intel.com Tested-by: Oliver Sang oliver.sang@intel.com Reviewed-by: Yin Fengwei fengwei.yin@intel.com Cc: Rik van Riel riel@surriel.com Cc: Matthew Wilcox willy@infradead.org Cc: Christopher Lameter cl@linux.com Cc: Huang, Ying ying.huang@intel.com Cc: stable@vger.kerenl.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/mman.h | 1 + 1 file changed, 1 insertion(+)
--- a/include/linux/mman.h +++ b/include/linux/mman.h @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags) return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) | _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) | _calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) | + _calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) | arch_calc_vm_flag_bits(flags); }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana fdmanana@suse.com
commit 1693d5442c458ae8d5b0d58463b873cd879569ed upstream.
Add a helper function to determine if a block group is being used and make use of it at btrfs_delete_unused_bgs(). This helper will also be used in future code changes.
Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Reviewed-by: Josef Bacik josef@toxicpanda.com Reviewed-by: Boris Burkov boris@bur.io Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/block-group.c | 3 +-- fs/btrfs/block-group.h | 7 +++++++ 2 files changed, 8 insertions(+), 2 deletions(-)
--- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -1524,8 +1524,7 @@ void btrfs_delete_unused_bgs(struct btrf }
spin_lock(&block_group->lock); - if (block_group->reserved || block_group->pinned || - block_group->used || block_group->ro || + if (btrfs_is_block_group_used(block_group) || block_group->ro || list_is_singular(&block_group->list)) { /* * We want to bail if we made new allocations or have --- a/fs/btrfs/block-group.h +++ b/fs/btrfs/block-group.h @@ -255,6 +255,13 @@ static inline u64 btrfs_block_group_end( return (block_group->start + block_group->length); }
+static inline bool btrfs_is_block_group_used(const struct btrfs_block_group *bg) +{ + lockdep_assert_held(&bg->lock); + + return (bg->used > 0 || bg->reserved > 0 || bg->pinned > 0); +} + static inline bool btrfs_is_block_group_data_only( struct btrfs_block_group *block_group) {
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana fdmanana@suse.com
commit f4a9f219411f318ae60d6ff7f129082a75686c6c upstream.
Before deleting a block group that is in the list of unused block groups (fs_info->unused_bgs), we check if the block group became used before deleting it, as extents from it may have been allocated after it was added to the list.
However even if the block group was not yet used, there may be tasks that have only reserved space and have not yet allocated extents, and they might be relying on the availability of the unused block group in order to allocate extents. The reservation works first by increasing the "bytes_may_use" field of the corresponding space_info object (which may first require flushing delayed items, allocating a new block group, etc), and only later a task does the actual allocation of extents.
For metadata we usually don't end up using all reserved space, as we are pessimistic and typically account for the worst cases (need to COW every single node in a path of a tree at maximum possible height, etc). For data we usually reserve the exact amount of space we're going to allocate later, except when using compression where we always reserve space based on the uncompressed size, as compression is only triggered when writeback starts so we don't know in advance how much space we'll actually need, or if the data is compressible.
So don't delete an unused block group if the total size of its space_info object minus the block group's size is less then the sum of used space and space that may be used (space_info->bytes_may_use), as that means we have tasks that reserved space and may need to allocate extents from the block group. In this case, besides skipping the deletion, re-add the block group to the list of unused block groups so that it may be reconsidered later, in case the tasks that reserved space end up not needing to allocate extents from it.
Allowing the deletion of the block group while we have reserved space, can result in tasks failing to allocate metadata extents (-ENOSPC) while under a transaction handle, resulting in a transaction abort, or failure during writeback for the case of data extents.
CC: stable@vger.kernel.org # 6.0+ Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Reviewed-by: Josef Bacik josef@toxicpanda.com Reviewed-by: Boris Burkov boris@bur.io Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/block-group.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 46 insertions(+)
--- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -1467,6 +1467,7 @@ out: */ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info) { + LIST_HEAD(retry_list); struct btrfs_block_group *block_group; struct btrfs_space_info *space_info; struct btrfs_trans_handle *trans; @@ -1488,6 +1489,7 @@ void btrfs_delete_unused_bgs(struct btrf
spin_lock(&fs_info->unused_bgs_lock); while (!list_empty(&fs_info->unused_bgs)) { + u64 used; int trimming;
block_group = list_first_entry(&fs_info->unused_bgs, @@ -1523,6 +1525,7 @@ void btrfs_delete_unused_bgs(struct btrf goto next; }
+ spin_lock(&space_info->lock); spin_lock(&block_group->lock); if (btrfs_is_block_group_used(block_group) || block_group->ro || list_is_singular(&block_group->list)) { @@ -1534,10 +1537,49 @@ void btrfs_delete_unused_bgs(struct btrf */ trace_btrfs_skip_unused_block_group(block_group); spin_unlock(&block_group->lock); + spin_unlock(&space_info->lock); up_write(&space_info->groups_sem); goto next; } + + /* + * The block group may be unused but there may be space reserved + * accounting with the existence of that block group, that is, + * space_info->bytes_may_use was incremented by a task but no + * space was yet allocated from the block group by the task. + * That space may or may not be allocated, as we are generally + * pessimistic about space reservation for metadata as well as + * for data when using compression (as we reserve space based on + * the worst case, when data can't be compressed, and before + * actually attempting compression, before starting writeback). + * + * So check if the total space of the space_info minus the size + * of this block group is less than the used space of the + * space_info - if that's the case, then it means we have tasks + * that might be relying on the block group in order to allocate + * extents, and add back the block group to the unused list when + * we finish, so that we retry later in case no tasks ended up + * needing to allocate extents from the block group. + */ + used = btrfs_space_info_used(space_info, true); + if (space_info->total_bytes - block_group->length < used) { + /* + * Add a reference for the list, compensate for the ref + * drop under the "next" label for the + * fs_info->unused_bgs list. + */ + btrfs_get_block_group(block_group); + list_add_tail(&block_group->bg_list, &retry_list); + + trace_btrfs_skip_unused_block_group(block_group); + spin_unlock(&block_group->lock); + spin_unlock(&space_info->lock); + up_write(&space_info->groups_sem); + goto next; + } + spin_unlock(&block_group->lock); + spin_unlock(&space_info->lock);
/* We don't want to force the issue, only flip if it's ok. */ ret = inc_block_group_ro(block_group, 0); @@ -1661,12 +1703,16 @@ next: btrfs_put_block_group(block_group); spin_lock(&fs_info->unused_bgs_lock); } + list_splice_tail(&retry_list, &fs_info->unused_bgs); spin_unlock(&fs_info->unused_bgs_lock); mutex_unlock(&fs_info->reclaim_bgs_lock); return;
flip_async: btrfs_end_transaction(trans); + spin_lock(&fs_info->unused_bgs_lock); + list_splice_tail(&retry_list, &fs_info->unused_bgs); + spin_unlock(&fs_info->unused_bgs_lock); mutex_unlock(&fs_info->reclaim_bgs_lock); btrfs_put_block_group(block_group); btrfs_discard_punt_unused_bgs_list(fs_info);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana fdmanana@suse.com
commit 12c5128f101bfa47a08e4c0e1a75cfa2d0872bcd upstream.
Space reservations for metadata are, most of the time, pessimistic as we reserve space for worst possible cases - where tree heights are at the maximum possible height (8), we need to COW every extent buffer in a tree path, need to split extent buffers, etc.
For data, we generally reserve the exact amount of space we are going to allocate. The exception here is when using compression, in which case we reserve space matching the uncompressed size, as the compression only happens at writeback time and in the worst possible case we need that amount of space in case the data is not compressible.
This means that when there's not available space in the corresponding space_info object, we may need to allocate a new block group, and then that block group might not be used after all. In this case the block group is never added to the list of unused block groups and ends up never being deleted - except if we unmount and mount again the fs, as when reading block groups from disk we add unused ones to the list of unused block groups (fs_info->unused_bgs). Otherwise a block group is only added to the list of unused block groups when we deallocate the last extent from it, so if no extent is ever allocated, the block group is kept around forever.
This also means that if we have a bunch of tasks reserving space in parallel we can end up allocating many block groups that end up never being used or kept around for too long without being used, which has the potential to result in ENOSPC failures in case for example we over allocate too many metadata block groups and then end up in a state without enough unallocated space to allocate a new data block group.
This is more likely to happen with metadata reservations as of kernel 6.7, namely since commit 28270e25c69a ("btrfs: always reserve space for delayed refs when starting transaction"), because we started to always reserve space for delayed references when starting a transaction handle for a non-zero number of items, and also to try to reserve space to fill the gap between the delayed block reserve's reserved space and its size.
So to avoid this, when finishing the creation a new block group, add the block group to the list of unused block groups if it's still unused at that time. This way the next time the cleaner kthread runs, it will delete the block group if it's still unused and not needed to satisfy existing space reservations.
Reported-by: Ivan Shapovalov intelfx@intelfx.name Link: https://lore.kernel.org/linux-btrfs/9cdbf0ca9cdda1b4c84e15e548af7d7f9f926382... CC: stable@vger.kernel.org # 6.7+ Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Reviewed-by: Josef Bacik josef@toxicpanda.com Reviewed-by: Boris Burkov boris@bur.io Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/block-group.c | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+)
--- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -2757,6 +2757,37 @@ next: btrfs_dec_delayed_refs_rsv_bg_inserts(fs_info); list_del_init(&block_group->bg_list); clear_bit(BLOCK_GROUP_FLAG_NEW, &block_group->runtime_flags); + + /* + * If the block group is still unused, add it to the list of + * unused block groups. The block group may have been created in + * order to satisfy a space reservation, in which case the + * extent allocation only happens later. But often we don't + * actually need to allocate space that we previously reserved, + * so the block group may become unused for a long time. For + * example for metadata we generally reserve space for a worst + * possible scenario, but then don't end up allocating all that + * space or none at all (due to no need to COW, extent buffers + * were already COWed in the current transaction and still + * unwritten, tree heights lower than the maximum possible + * height, etc). For data we generally reserve the axact amount + * of space we are going to allocate later, the exception is + * when using compression, as we must reserve space based on the + * uncompressed data size, because the compression is only done + * when writeback triggered and we don't know how much space we + * are actually going to need, so we reserve the uncompressed + * size because the data may be uncompressible in the worst case. + */ + if (ret == 0) { + bool used; + + spin_lock(&block_group->lock); + used = btrfs_is_block_group_used(block_group); + spin_unlock(&block_group->lock); + + if (!used) + btrfs_mark_bg_unused(block_group); + } } btrfs_trans_release_chunk_metadata(trans); }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana fdmanana@suse.com
commit 2f6397e448e689adf57e6788c90f913abd7e1af8 upstream.
Since commit 28270e25c69a ("btrfs: always reserve space for delayed refs when starting transaction") we started not only to reserve metadata space for the delayed refs a caller of btrfs_start_transaction() might generate but also to try to fully refill the delayed refs block reserve, because there are several case where we generate delayed refs and haven't reserved space for them, relying on the global block reserve. Relying too much on the global block reserve is not always safe, and can result in hitting -ENOSPC during transaction commits or worst, in rare cases, being unable to mount a filesystem that needs to do orphan cleanup or anything that requires modifying the filesystem during mount, and has no more unallocated space and the metadata space is nearly full. This was explained in detail in that commit's change log.
However the gap between the reserved amount and the size of the delayed refs block reserve can be huge, so attempting to reserve space for such a gap can result in allocating many metadata block groups that end up not being used. After a recent patch, with the subject:
"btrfs: add new unused block groups to the list of unused block groups"
We started to add new block groups that are unused to the list of unused block groups, to avoid having them around for a very long time in case they are never used, because a block group is only added to the list of unused block groups when we deallocate the last extent or when mounting the filesystem and the block group has 0 bytes used. This is not a problem introduced by the commit mentioned earlier, it always existed as our metadata space reservations are, most of the time, pessimistic and end up not using all the space they reserved, so we can occasionally end up with one or two unused metadata block groups for a long period. However after that commit mentioned earlier, we are just more pessimistic in the metadata space reservations when starting a transaction and therefore the issue is more likely to happen.
This however is not always enough because we might create unused metadata block groups when reserving metadata space at a high rate if there's always a gap in the delayed refs block reserve and the cleaner kthread isn't triggered often enough or is busy with other work (running delayed iputs, cleaning deleted roots, etc), not to mention the block group's allocated space is only usable for a new block group after the transaction used to remove it is committed.
A user reported that he's getting a lot of allocated metadata block groups but the usage percentage of metadata space was very low compared to the total allocated space, specially after running a series of block group relocations.
So for now stop trying to refill the gap in the delayed refs block reserve and reserve space only for the delayed refs we are expected to generate when starting a transaction.
CC: stable@vger.kernel.org # 6.7+ Reported-by: Ivan Shapovalov intelfx@intelfx.name Link: https://lore.kernel.org/linux-btrfs/9cdbf0ca9cdda1b4c84e15e548af7d7f9f926382... Link: https://lore.kernel.org/linux-btrfs/CAL3q7H6802ayLHUJFztzZAVzBLJAGdFx=6FHNNy... Tested-by: Ivan Shapovalov intelfx@intelfx.name Reported-by: Heddxh g311571057@gmail.com Link: https://lore.kernel.org/linux-btrfs/CAE93xANEby6RezOD=zcofENYZOT-wpYygJyauyU... Reviewed-by: Josef Bacik josef@toxicpanda.com Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/transaction.c | 38 ++------------------------------------ 1 file changed, 2 insertions(+), 36 deletions(-)
--- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -564,56 +564,22 @@ static int btrfs_reserve_trans_metadata( u64 num_bytes, u64 *delayed_refs_bytes) { - struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv; struct btrfs_space_info *si = fs_info->trans_block_rsv.space_info; - u64 extra_delayed_refs_bytes = 0; - u64 bytes; + u64 bytes = num_bytes + *delayed_refs_bytes; int ret;
/* - * If there's a gap between the size of the delayed refs reserve and - * its reserved space, than some tasks have added delayed refs or bumped - * its size otherwise (due to block group creation or removal, or block - * group item update). Also try to allocate that gap in order to prevent - * using (and possibly abusing) the global reserve when committing the - * transaction. - */ - if (flush == BTRFS_RESERVE_FLUSH_ALL && - !btrfs_block_rsv_full(delayed_refs_rsv)) { - spin_lock(&delayed_refs_rsv->lock); - if (delayed_refs_rsv->size > delayed_refs_rsv->reserved) - extra_delayed_refs_bytes = delayed_refs_rsv->size - - delayed_refs_rsv->reserved; - spin_unlock(&delayed_refs_rsv->lock); - } - - bytes = num_bytes + *delayed_refs_bytes + extra_delayed_refs_bytes; - - /* * We want to reserve all the bytes we may need all at once, so we only * do 1 enospc flushing cycle per transaction start. */ ret = btrfs_reserve_metadata_bytes(fs_info, si, bytes, flush); - if (ret == 0) { - if (extra_delayed_refs_bytes > 0) - btrfs_migrate_to_delayed_refs_rsv(fs_info, - extra_delayed_refs_bytes); - return 0; - } - - if (extra_delayed_refs_bytes > 0) { - bytes -= extra_delayed_refs_bytes; - ret = btrfs_reserve_metadata_bytes(fs_info, si, bytes, flush); - if (ret == 0) - return 0; - }
/* * If we are an emergency flush, which can steal from the global block * reserve, then attempt to not reserve space for the delayed refs, as * we will consume space for them from the global block reserve. */ - if (flush == BTRFS_RESERVE_FLUSH_ALL_STEAL) { + if (ret && flush == BTRFS_RESERVE_FLUSH_ALL_STEAL) { bytes -= *delayed_refs_bytes; *delayed_refs_bytes = 0; ret = btrfs_reserve_metadata_bytes(fs_info, si, bytes, flush);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Boris Burkov boris@bur.io
commit 0c309d66dacddf8ce939b891d9ead4a8e21ad6f0 upstream.
Creating a qgroup 0/subvolid leads to various races and it isn't helpful, because you can't specify a subvol id when creating a subvol, so you can't be sure it will be the right one. Any requirements on the automatic subvol can be gratified by using a higher level qgroup and the inheritance parameters of subvol creation.
Fixes: cecbb533b5fc ("btrfs: record simple quota deltas in delayed refs") CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Boris Burkov boris@bur.io Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/ioctl.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -3815,6 +3815,11 @@ static long btrfs_ioctl_qgroup_create(st goto out; }
+ if (sa->create && is_fstree(sa->qgroupid)) { + ret = -EINVAL; + goto out; + } + trans = btrfs_join_transaction(root); if (IS_ERR(trans)) { ret = PTR_ERR(trans);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Qu Wenruo wqu@suse.com
commit e03ee2fe873eb68c1f9ba5112fee70303ebf9dfb upstream.
[BUG] There is a syzbot crash, triggered by the ASSERT() during subvolume creation:
assertion failed: !anon_dev, in fs/btrfs/disk-io.c:1319 ------------[ cut here ]------------ kernel BUG at fs/btrfs/disk-io.c:1319! invalid opcode: 0000 [#1] PREEMPT SMP KASAN RIP: 0010:btrfs_get_root_ref.part.0+0x9aa/0xa60 <TASK> btrfs_get_new_fs_root+0xd3/0xf0 create_subvol+0xd02/0x1650 btrfs_mksubvol+0xe95/0x12b0 __btrfs_ioctl_snap_create+0x2f9/0x4f0 btrfs_ioctl_snap_create+0x16b/0x200 btrfs_ioctl+0x35f0/0x5cf0 __x64_sys_ioctl+0x19d/0x210 do_syscall_64+0x3f/0xe0 entry_SYSCALL_64_after_hwframe+0x63/0x6b ---[ end trace 0000000000000000 ]---
[CAUSE] During create_subvol(), after inserting root item for the newly created subvolume, we would trigger btrfs_get_new_fs_root() to get the btrfs_root of that subvolume.
The idea here is, we have preallocated an anonymous device number for the subvolume, thus we can assign it to the new subvolume.
But there is really nothing preventing things like backref walk to read the new subvolume. If that happens before we call btrfs_get_new_fs_root(), the subvolume would be read out, with a new anonymous device number assigned already.
In that case, we would trigger ASSERT(), as we really expect no one to read out that subvolume (which is not yet accessible from the fs). But things like backref walk is still possible to trigger the read on the subvolume.
Thus our assumption on the ASSERT() is not correct in the first place.
[FIX] Fix it by removing the ASSERT(), and just free the @anon_dev, reset it to 0, and continue.
If the subvolume tree is read out by something else, it should have already get a new anon_dev assigned thus we only need to free the preallocated one.
Reported-by: Chenyuan Yang chenyuan0y@gmail.com Fixes: 2dfb1e43f57d ("btrfs: preallocate anon block device at first phase of snapshot creation") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Filipe Manana fdmanana@suse.com Signed-off-by: Qu Wenruo wqu@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/disk-io.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-)
--- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1315,8 +1315,17 @@ static struct btrfs_root *btrfs_get_root again: root = btrfs_lookup_fs_root(fs_info, objectid); if (root) { - /* Shouldn't get preallocated anon_dev for cached roots */ - ASSERT(!anon_dev); + /* + * Some other caller may have read out the newly inserted + * subvolume already (for things like backref walk etc). Not + * that common but still possible. In that case, we just need + * to free the anon_dev. + */ + if (unlikely(anon_dev)) { + free_anon_bdev(anon_dev); + anon_dev = 0; + } + if (check_ref && btrfs_root_refs(&root->root_item) == 0) { btrfs_put_root(root); return ERR_PTR(-ENOENT);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Boris Burkov boris@bur.io
commit a8df35619948bd8363d330c20a90c9a7fbff28c0 upstream.
If a subvolume still exists, forbid deleting its qgroup 0/subvolid. This behavior generally leads to incorrect behavior in squotas and doesn't have a legitimate purpose.
Fixes: cecbb533b5fc ("btrfs: record simple quota deltas in delayed refs") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Boris Burkov boris@bur.io Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/qgroup.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
--- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -1736,6 +1736,15 @@ out: return ret; }
+static bool qgroup_has_usage(struct btrfs_qgroup *qgroup) +{ + return (qgroup->rfer > 0 || qgroup->rfer_cmpr > 0 || + qgroup->excl > 0 || qgroup->excl_cmpr > 0 || + qgroup->rsv.values[BTRFS_QGROUP_RSV_DATA] > 0 || + qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PREALLOC] > 0 || + qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PERTRANS] > 0); +} + int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid) { struct btrfs_fs_info *fs_info = trans->fs_info; @@ -1755,6 +1764,11 @@ int btrfs_remove_qgroup(struct btrfs_tra goto out; }
+ if (is_fstree(qgroupid) && qgroup_has_usage(qgroup)) { + ret = -EBUSY; + goto out; + } + /* Check if there are no children of this qgroup */ if (!list_empty(&qgroup->members)) { ret = -EBUSY;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Sterba dsterba@suse.com
commit f884a9f9e59206a2d41f265e7e403f080d10b493 upstream.
When some ioctl flags are checked we return EOPNOTSUPP, like for BTRFS_SCRUB_SUPPORTED_FLAGS, BTRFS_SUBVOL_CREATE_ARGS_MASK or fallocate modes. The EINVAL is supposed to be for a supported but invalid values or combination of options. Fix that when checking send flags so it's consistent with the rest.
CC: stable@vger.kernel.org # 4.14+ Link: https://lore.kernel.org/linux-btrfs/CAL3q7H5rryOLzp3EKq8RTbjMHMHeaJubfpsVLF6... Reviewed-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/send.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -8111,7 +8111,7 @@ long btrfs_ioctl_send(struct inode *inod }
if (arg->flags & ~BTRFS_SEND_FLAG_MASK) { - ret = -EINVAL; + ret = -EOPNOTSUPP; goto out; }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana fdmanana@suse.com
commit feefe1f49d26bad9d8997096e3a200280fa7b1c5 upstream.
Currently when doing a write to a file we always reserve metadata space for inserting data checksums. However we don't need to do it if we have a nodatacow file (-o nodatacow mount option or chattr +C) or if checksums are disabled (-o nodatasum mount option), as in that case we are only adding unnecessary pressure to metadata reservations.
For example on x86_64, with the default node size of 16K, a 4K buffered write into a nodatacow file is reserving 655360 bytes of metadata space, as it's accounting for checksums. After this change, which stops reserving space for checksums if we have a nodatacow file or checksums are disabled, we only need to reserve 393216 bytes of metadata.
CC: stable@vger.kernel.org # 6.1+ Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/delalloc-space.c | 29 +++++++++++++++++++---------- 1 file changed, 19 insertions(+), 10 deletions(-)
--- a/fs/btrfs/delalloc-space.c +++ b/fs/btrfs/delalloc-space.c @@ -245,7 +245,6 @@ static void btrfs_calculate_inode_block_ struct btrfs_block_rsv *block_rsv = &inode->block_rsv; u64 reserve_size = 0; u64 qgroup_rsv_size = 0; - u64 csum_leaves; unsigned outstanding_extents;
lockdep_assert_held(&inode->lock); @@ -260,10 +259,12 @@ static void btrfs_calculate_inode_block_ outstanding_extents); reserve_size += btrfs_calc_metadata_size(fs_info, 1); } - csum_leaves = btrfs_csum_bytes_to_leaves(fs_info, - inode->csum_bytes); - reserve_size += btrfs_calc_insert_metadata_size(fs_info, - csum_leaves); + if (!(inode->flags & BTRFS_INODE_NODATASUM)) { + u64 csum_leaves; + + csum_leaves = btrfs_csum_bytes_to_leaves(fs_info, inode->csum_bytes); + reserve_size += btrfs_calc_insert_metadata_size(fs_info, csum_leaves); + } /* * For qgroup rsv, the calculation is very simple: * account one nodesize for each outstanding extent @@ -278,14 +279,20 @@ static void btrfs_calculate_inode_block_ spin_unlock(&block_rsv->lock); }
-static void calc_inode_reservations(struct btrfs_fs_info *fs_info, +static void calc_inode_reservations(struct btrfs_inode *inode, u64 num_bytes, u64 disk_num_bytes, u64 *meta_reserve, u64 *qgroup_reserve) { + struct btrfs_fs_info *fs_info = inode->root->fs_info; u64 nr_extents = count_max_extents(fs_info, num_bytes); - u64 csum_leaves = btrfs_csum_bytes_to_leaves(fs_info, disk_num_bytes); + u64 csum_leaves; u64 inode_update = btrfs_calc_metadata_size(fs_info, 1);
+ if (inode->flags & BTRFS_INODE_NODATASUM) + csum_leaves = 0; + else + csum_leaves = btrfs_csum_bytes_to_leaves(fs_info, disk_num_bytes); + *meta_reserve = btrfs_calc_insert_metadata_size(fs_info, nr_extents + csum_leaves);
@@ -337,7 +344,7 @@ int btrfs_delalloc_reserve_metadata(stru * everything out and try again, which is bad. This way we just * over-reserve slightly, and clean up the mess when we are done. */ - calc_inode_reservations(fs_info, num_bytes, disk_num_bytes, + calc_inode_reservations(inode, num_bytes, disk_num_bytes, &meta_reserve, &qgroup_reserve); ret = btrfs_qgroup_reserve_meta_prealloc(root, qgroup_reserve, true, noflush); @@ -359,7 +366,8 @@ int btrfs_delalloc_reserve_metadata(stru nr_extents = count_max_extents(fs_info, num_bytes); spin_lock(&inode->lock); btrfs_mod_outstanding_extents(inode, nr_extents); - inode->csum_bytes += disk_num_bytes; + if (!(inode->flags & BTRFS_INODE_NODATASUM)) + inode->csum_bytes += disk_num_bytes; btrfs_calculate_inode_block_rsv_size(fs_info, inode); spin_unlock(&inode->lock);
@@ -393,7 +401,8 @@ void btrfs_delalloc_release_metadata(str
num_bytes = ALIGN(num_bytes, fs_info->sectorsize); spin_lock(&inode->lock); - inode->csum_bytes -= num_bytes; + if (!(inode->flags & BTRFS_INODE_NODATASUM)) + inode->csum_bytes -= num_bytes; btrfs_calculate_inode_block_rsv_size(fs_info, inode); spin_unlock(&inode->lock);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana fdmanana@suse.com
commit 1bd96c92c6a0a4d43815eb685c15aa4b78879dc9 upstream.
Currently we allow an encoded write against inodes that have the NODATASUM flag set, either because they are NOCOW files or they were created while the filesystem was mounted with "-o nodatasum". This results in having compressed extents without corresponding checksums, which is a filesystem inconsistency reported by 'btrfs check'.
For example, running btrfs/281 with MOUNT_OPTIONS="-o nodatacow" triggers this and 'btrfs check' errors out with:
[1/7] checking root items [2/7] checking extents [3/7] checking free space tree [4/7] checking fs roots root 256 inode 257 errors 1040, bad file extent, some csum missing root 256 inode 258 errors 1040, bad file extent, some csum missing ERROR: errors found in fs roots (...)
So reject encoded writes if the target inode has NODATASUM set.
CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/inode.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -10233,6 +10233,13 @@ ssize_t btrfs_do_encoded_write(struct ki if (encoded->encryption != BTRFS_ENCODED_IO_ENCRYPTION_NONE) return -EINVAL;
+ /* + * Compressed extents should always have checksums, so error out if we + * have a NOCOW file or inode was created while mounted with NODATASUM. + */ + if (inode->flags & BTRFS_INODE_NODATASUM) + return -EINVAL; + orig_count = iov_iter_count(from);
/* The extent size must be sane. */
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josef Bacik josef@toxicpanda.com
commit 5571e41ec6e56e35f34ae9f5b3a335ef510e0ade upstream.
While running the CI for an unrelated change I hit the following panic with generic/648 on btrfs_holes_spacecache.
assertion failed: block_start != EXTENT_MAP_HOLE, in fs/btrfs/extent_io.c:1385 ------------[ cut here ]------------ kernel BUG at fs/btrfs/extent_io.c:1385! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 2695096 Comm: fsstress Kdump: loaded Tainted: G W 6.8.0-rc2+ #1 RIP: 0010:__extent_writepage_io.constprop.0+0x4c1/0x5c0 Call Trace: <TASK> extent_write_cache_pages+0x2ac/0x8f0 extent_writepages+0x87/0x110 do_writepages+0xd5/0x1f0 filemap_fdatawrite_wbc+0x63/0x90 __filemap_fdatawrite_range+0x5c/0x80 btrfs_fdatawrite_range+0x1f/0x50 btrfs_write_out_cache+0x507/0x560 btrfs_write_dirty_block_groups+0x32a/0x420 commit_cowonly_roots+0x21b/0x290 btrfs_commit_transaction+0x813/0x1360 btrfs_sync_file+0x51a/0x640 __x64_sys_fdatasync+0x52/0x90 do_syscall_64+0x9c/0x190 entry_SYSCALL_64_after_hwframe+0x6e/0x76
This happens because we fail to write out the free space cache in one instance, come back around and attempt to write it again. However on the second pass through we go to call btrfs_get_extent() on the inode to get the extent mapping. Because this is a new block group, and with the free space inode we always search the commit root to avoid deadlocking with the tree, we find nothing and return a EXTENT_MAP_HOLE for the requested range.
This happens because the first time we try to write the space cache out we hit an error, and on an error we drop the extent mapping. This is normal for normal files, but the free space cache inode is special. We always expect the extent map to be correct. Thus the second time through we end up with a bogus extent map.
Since we're deprecating this feature, the most straightforward way to fix this is to simply skip dropping the extent map range for this failed range.
I shortened the test by using error injection to stress the area to make it easier to reproduce. With this patch in place we no longer panic with my error injection test.
CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Filipe Manana fdmanana@suse.com Signed-off-by: Josef Bacik josef@toxicpanda.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/inode.c | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-)
--- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -3175,8 +3175,23 @@ out: unwritten_start += logical_len; clear_extent_uptodate(io_tree, unwritten_start, end, NULL);
- /* Drop extent maps for the part of the extent we didn't write. */ - btrfs_drop_extent_map_range(inode, unwritten_start, end, false); + /* + * Drop extent maps for the part of the extent we didn't write. + * + * We have an exception here for the free_space_inode, this is + * because when we do btrfs_get_extent() on the free space inode + * we will search the commit root. If this is a new block group + * we won't find anything, and we will trip over the assert in + * writepage where we do ASSERT(em->block_start != + * EXTENT_MAP_HOLE). + * + * Theoretically we could also skip this for any NOCOW extent as + * we don't mess with the extent map tree in the NOCOW case, but + * for now simply skip this if we are the free space inode. + */ + if (!btrfs_is_free_space_inode(inode)) + btrfs_drop_extent_map_range(inode, unwritten_start, + end, false);
/* * If the ordered extent had an IOERR or something else went
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Saravana Kannan saravanak@google.com
commit 7fddac12c38237252431d5b8af7b6d5771b6d125 upstream.
device_link_flag_is_sync_state_only() correctly returns true on the flags of an existing device link that only implements sync_state() functionality. However, it incorrectly and confusingly returns false if it's called with DL_FLAG_SYNC_STATE_ONLY.
This bug doesn't manifest in any of the existing calls to this function, but fix this confusing behavior to avoid future bugs.
Fixes: 67cad5c67019 ("driver core: fw_devlink: Add DL_FLAG_CYCLE support to device links") Signed-off-by: Saravana Kannan saravanak@google.com Tested-by: Xu Yang xu.yang_2@nxp.com Link: https://lore.kernel.org/r/20240202095636.868578-2-saravanak@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/base/core.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -284,10 +284,12 @@ static bool device_is_ancestor(struct de return false; }
+#define DL_MARKER_FLAGS (DL_FLAG_INFERRED | \ + DL_FLAG_CYCLE | \ + DL_FLAG_MANAGED) static inline bool device_link_flag_is_sync_state_only(u32 flags) { - return (flags & ~(DL_FLAG_INFERRED | DL_FLAG_CYCLE)) == - (DL_FLAG_SYNC_STATE_ONLY | DL_FLAG_MANAGED); + return (flags & ~DL_MARKER_FLAGS) == DL_FLAG_SYNC_STATE_ONLY; }
/**
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nícolas F. R. A. Prado nfraprado@collabora.com
[ Upstream commit 6154fb9c2134f8d9534b2de10491aa3a22f3c9ff ]
When walking directory trees, instead of looking for specific files and running dirname to get the parent folder, traverse all folders and ignore the ones not containing the desired files. This avoids the need to call dirname inside the loop, which drastically decreases run time: Running locally on a mt8192-asurada-spherion, which reports 160 test cases, has gone from 5.5s to 2.9s, while running remotely with an nfsroot has gone from 13.5s to 5.5s.
This change has a side-effect, which is that the root DT node now also shows in the output, even though it isn't expected to bind to a driver. However there shouldn't be a matching driver for the board compatible, so the end result will be just an extra skipped test:
ok 1 / # SKIP
Reported-by: Mark Brown broonie@kernel.org Closes: https://lore.kernel.org/all/310391e8-fdf2-4c2f-a680-7744eb685177@sirena.org.... Fixes: 14571ab1ad21 ("kselftest: Add new test for detecting unprobed Devicetree devices") Tested-by: Mark Brown broonie@kernel.org Signed-off-by: Nícolas F. R. A. Prado nfraprado@collabora.com Link: https://lore.kernel.org/r/20240122-dt-kselftest-dirname-perf-fix-v2-1-f16305... Signed-off-by: Rob Herring robh@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/dt/test_unprobed_devices.sh | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/dt/test_unprobed_devices.sh b/tools/testing/selftests/dt/test_unprobed_devices.sh index b07af2a4c4de..7fae90293a9d 100755 --- a/tools/testing/selftests/dt/test_unprobed_devices.sh +++ b/tools/testing/selftests/dt/test_unprobed_devices.sh @@ -33,8 +33,8 @@ if [[ ! -d "${PDT}" ]]; then fi
nodes_compatible=$( - for node_compat in $(find ${PDT} -name compatible); do - node=$(dirname "${node_compat}") + for node in $(find ${PDT} -type d); do + [ ! -f "${node}"/compatible ] && continue # Check if node is available if [[ -e "${node}"/status ]]; then status=$(tr -d '\000' < "${node}"/status) @@ -46,10 +46,11 @@ nodes_compatible=$(
nodes_dev_bound=$( IFS=$'\n' - for uevent in $(find /sys/devices -name uevent); do - if [[ -d "$(dirname "${uevent}")"/driver ]]; then - grep '^OF_FULLNAME=' "${uevent}" | sed -e 's|OF_FULLNAME=||' - fi + for dev_dir in $(find /sys/devices -type d); do + [ ! -f "${dev_dir}"/uevent ] && continue + [ ! -d "${dev_dir}"/driver ] && continue + + grep '^OF_FULLNAME=' "${dev_dir}"/uevent | sed -e 's|OF_FULLNAME=||' done )
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hu Yadi hu.yadi@h3c.com
[ Upstream commit 116099ed345c932a8ae4a0d884a8f6cc54fd5fed ]
One issue comes up while building selftest/landlock/net_test on my side (gcc 7.3/glibc-2.28/kernel-4.19).
net_test.c: In function ‘set_service’: net_test.c:91:45: warning: implicit declaration of function ‘gettid’; [-Wimplicit-function-declaration] "_selftests-landlock-net-tid%d-index%d", gettid(), ^~~~~~ getgid net_test.c:(.text+0x4e0): undefined reference to `gettid'
Signed-off-by: Hu Yadi hu.yadi@h3c.com Suggested-by: Jiao jiaoxupo@h3c.com Reviewed-by: Berlin berlin@h3c.com Fixes: a549d055a22e ("selftests/landlock: Add network tests") Link: https://lore.kernel.org/r/20240123062621.25082-1-hu.yadi@h3c.com [mic: Cosmetic fixes] Signed-off-by: Mickaël Salaün mic@digikod.net Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/landlock/net_test.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/landlock/net_test.c b/tools/testing/selftests/landlock/net_test.c index 929e21c4db05..e07267acbc9a 100644 --- a/tools/testing/selftests/landlock/net_test.c +++ b/tools/testing/selftests/landlock/net_test.c @@ -17,6 +17,7 @@ #include <string.h> #include <sys/prctl.h> #include <sys/socket.h> +#include <sys/syscall.h> #include <sys/un.h>
#include "common.h" @@ -54,6 +55,11 @@ struct service_fixture { }; };
+static pid_t sys_gettid(void) +{ + return syscall(__NR_gettid); +} + static int set_service(struct service_fixture *const srv, const struct protocol_variant prot, const unsigned short index) @@ -88,7 +94,7 @@ static int set_service(struct service_fixture *const srv, case AF_UNIX: srv->unix_addr.sun_family = prot.domain; sprintf(srv->unix_addr.sun_path, - "_selftests-landlock-net-tid%d-index%d", gettid(), + "_selftests-landlock-net-tid%d-index%d", sys_gettid(), index); srv->unix_addr_len = SUN_LEN(&srv->unix_addr); srv->unix_addr.sun_path[0] = '\0';
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hu Yadi hu.yadi@h3c.com
[ Upstream commit 40b7835e74e0383be308d528c5e0e41b3bf72ade ]
One issue comes up while building selftest/landlock/fs_test on my side (gcc 7.3/glibc-2.28/kernel-4.19).
gcc -Wall -O2 -isystem fs_test.c -lcap -o selftests/landlock/fs_test fs_test.c:4575:9: error: initializer element is not constant .mnt = mnt_tmp, ^~~~~~~
Signed-off-by: Hu Yadi hu.yadi@h3c.com Suggested-by: Jiao jiaoxupo@h3c.com Reviewed-by: Berlin berlin@h3c.com Link: https://lore.kernel.org/r/20240124022908.42100-1-hu.yadi@h3c.com Fixes: 04f9070e99a4 ("selftests/landlock: Add tests for pseudo filesystems") [mic: Factor out mount's data string and make mnt_tmp static] Signed-off-by: Mickaël Salaün mic@digikod.net Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/landlock/fs_test.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/landlock/fs_test.c b/tools/testing/selftests/landlock/fs_test.c index 18e1f86a6234..fde1a96ef9f4 100644 --- a/tools/testing/selftests/landlock/fs_test.c +++ b/tools/testing/selftests/landlock/fs_test.c @@ -241,9 +241,11 @@ struct mnt_opt { const char *const data; };
-const struct mnt_opt mnt_tmp = { +#define MNT_TMP_DATA "size=4m,mode=700" + +static const struct mnt_opt mnt_tmp = { .type = "tmpfs", - .data = "size=4m,mode=700", + .data = MNT_TMP_DATA, };
static int mount_opt(const struct mnt_opt *const mnt, const char *const target) @@ -4572,7 +4574,10 @@ FIXTURE_VARIANT(layout3_fs) /* clang-format off */ FIXTURE_VARIANT_ADD(layout3_fs, tmpfs) { /* clang-format on */ - .mnt = mnt_tmp, + .mnt = { + .type = "tmpfs", + .data = MNT_TMP_DATA, + }, .file_path = file1_s1d1, };
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian A. Ehrhardt lk@c--e.de
[ Upstream commit 607aad1e4356c210dbef9022955a3089377909b2 ]
If CONFIG_OF_KOBJ is not set, a device_node does not contain a kobj and attempts to access the embedded kobj via kref_read break the compile.
Replace affected kref_read calls with a macro that reads the refcount if it exists and returns 1 if there is no embedded kobj.
Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202401291740.VP219WIz-lkp@intel.com/ Fixes: 4dde83569832 ("of: Fix double free in of_parse_phandle_with_args_map") Signed-off-by: Christian A. Ehrhardt lk@c--e.de Link: https://lore.kernel.org/r/20240129192556.403271-1-lk@c--e.de Signed-off-by: Rob Herring robh@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/of/unittest.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c index cfd60e35a899..d7593bde2d02 100644 --- a/drivers/of/unittest.c +++ b/drivers/of/unittest.c @@ -50,6 +50,12 @@ static struct unittest_results { failed; \ })
+#ifdef CONFIG_OF_KOBJ +#define OF_KREF_READ(NODE) kref_read(&(NODE)->kobj.kref) +#else +#define OF_KREF_READ(NODE) 1 +#endif + /* * Expected message may have a message level other than KERN_INFO. * Print the expected message only if the current loglevel will allow @@ -570,7 +576,7 @@ static void __init of_unittest_parse_phandle_with_args_map(void) pr_err("missing testcase data\n"); return; } - prefs[i] = kref_read(&p[i]->kobj.kref); + prefs[i] = OF_KREF_READ(p[i]); }
rc = of_count_phandle_with_args(np, "phandle-list", "#phandle-cells"); @@ -693,9 +699,9 @@ static void __init of_unittest_parse_phandle_with_args_map(void) unittest(rc == -EINVAL, "expected:%i got:%i\n", -EINVAL, rc);
for (i = 0; i < ARRAY_SIZE(p); ++i) { - unittest(prefs[i] == kref_read(&p[i]->kobj.kref), + unittest(prefs[i] == OF_KREF_READ(p[i]), "provider%d: expected:%d got:%d\n", - i, prefs[i], kref_read(&p[i]->kobj.kref)); + i, prefs[i], OF_KREF_READ(p[i])); of_node_put(p[i]); } }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rob Clark robdclark@chromium.org
[ Upstream commit 03facb39d6c6433a78d0f79c7a146b1e6a61943e ]
Since commit 79e2cf2e7a19 ("drm/gem: Take reservation lock for vmap/vunmap operations"), the resv lock is already held in the prime vmap path, so don't try to grab it again.
v2: This applies to vunmap path as well v3: Fix fixes commit
Fixes: 79e2cf2e7a19 ("drm/gem: Take reservation lock for vmap/vunmap operations") Signed-off-by: Rob Clark robdclark@chromium.org Acked-by: Christian König christian.koenig@amd.com Patchwork: https://patchwork.freedesktop.org/patch/576642/ Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/msm_gem_prime.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/msm/msm_gem_prime.c b/drivers/gpu/drm/msm/msm_gem_prime.c index 5f68e31a3e4e..0915f3b68752 100644 --- a/drivers/gpu/drm/msm/msm_gem_prime.c +++ b/drivers/gpu/drm/msm/msm_gem_prime.c @@ -26,7 +26,7 @@ int msm_gem_prime_vmap(struct drm_gem_object *obj, struct iosys_map *map) { void *vaddr;
- vaddr = msm_gem_get_vaddr(obj); + vaddr = msm_gem_get_vaddr_locked(obj); if (IS_ERR(vaddr)) return PTR_ERR(vaddr); iosys_map_set_vaddr(map, vaddr); @@ -36,7 +36,7 @@ int msm_gem_prime_vmap(struct drm_gem_object *obj, struct iosys_map *map)
void msm_gem_prime_vunmap(struct drm_gem_object *obj, struct iosys_map *map) { - msm_gem_put_vaddr(obj); + msm_gem_put_vaddr_locked(obj); }
struct drm_gem_object *msm_gem_prime_import_sg_table(struct drm_device *dev,
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mickaël Salaün mic@digikod.net
[ Upstream commit bb6f4dbe2639d5b8a9fde4bfb6fefecfd3f18df3 ]
CAP_NET_ADMIN allows to configure network interfaces, not CAP_SYS_ADMIN which only allows to call unshare(2). Without this change, running network tests as a non-root user but with all capabilities would fail at the setup_loopback() step with "RTNETLINK answers: Operation not permitted".
The issue is only visible when running tests with non-root users (i.e. only relying on ambient capabilities). Indeed, when configuring the network interface, the "ip" command is called, which may lead to the special handling of capabilities for the root user by execve(2). If root is the caller, then the inherited, permitted and effective capabilities are all reset, which then includes CAP_NET_ADMIN. However, if a non-root user is the caller, then ambient capabilities are masked by the inherited ones, which were explicitly dropped.
To make execution deterministic whatever users are running the tests, set the noroot secure bit for each test, and set the inheritable and ambient capabilities to CAP_NET_ADMIN, the only capability that may be required after an execve(2).
Factor out _effective_cap() into _change_cap(), and use it to manage ambient capabilities with the new set_ambient_cap() and clear_ambient_cap() helpers.
This makes it possible to run all Landlock tests with check-linux.sh from https://github.com/landlock-lsm/landlock-test-tools
Cc: Konstantin Meskhidze konstantin.meskhidze@huawei.com Fixes: a549d055a22e ("selftests/landlock: Add network tests") Link: https://lore.kernel.org/r/20240125153230.3817165-2-mic@digikod.net [mic: Make sure SECBIT_NOROOT_LOCKED is set] Signed-off-by: Mickaël Salaün mic@digikod.net Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/landlock/common.h | 48 +++++++++++++++++---- tools/testing/selftests/landlock/net_test.c | 5 ++- 2 files changed, 44 insertions(+), 9 deletions(-)
diff --git a/tools/testing/selftests/landlock/common.h b/tools/testing/selftests/landlock/common.h index 5b79758cae62..e64bbdf0e86e 100644 --- a/tools/testing/selftests/landlock/common.h +++ b/tools/testing/selftests/landlock/common.h @@ -9,6 +9,7 @@
#include <errno.h> #include <linux/landlock.h> +#include <linux/securebits.h> #include <sys/capability.h> #include <sys/socket.h> #include <sys/syscall.h> @@ -115,11 +116,16 @@ static void _init_caps(struct __test_metadata *const _metadata, bool drop_all) /* clang-format off */ CAP_DAC_OVERRIDE, CAP_MKNOD, + CAP_NET_ADMIN, + CAP_NET_BIND_SERVICE, CAP_SYS_ADMIN, CAP_SYS_CHROOT, - CAP_NET_BIND_SERVICE, /* clang-format on */ }; + const unsigned int noroot = SECBIT_NOROOT | SECBIT_NOROOT_LOCKED; + + if ((cap_get_secbits() & noroot) != noroot) + EXPECT_EQ(0, cap_set_secbits(noroot));
cap_p = cap_get_proc(); EXPECT_NE(NULL, cap_p) @@ -137,6 +143,8 @@ static void _init_caps(struct __test_metadata *const _metadata, bool drop_all) TH_LOG("Failed to cap_set_flag: %s", strerror(errno)); } } + + /* Automatically resets ambient capabilities. */ EXPECT_NE(-1, cap_set_proc(cap_p)) { TH_LOG("Failed to cap_set_proc: %s", strerror(errno)); @@ -145,6 +153,9 @@ static void _init_caps(struct __test_metadata *const _metadata, bool drop_all) { TH_LOG("Failed to cap_free: %s", strerror(errno)); } + + /* Quickly checks that ambient capabilities are cleared. */ + EXPECT_NE(-1, cap_get_ambient(caps[0])); }
/* We cannot put such helpers in a library because of kselftest_harness.h . */ @@ -158,8 +169,9 @@ static void __maybe_unused drop_caps(struct __test_metadata *const _metadata) _init_caps(_metadata, true); }
-static void _effective_cap(struct __test_metadata *const _metadata, - const cap_value_t caps, const cap_flag_value_t value) +static void _change_cap(struct __test_metadata *const _metadata, + const cap_flag_t flag, const cap_value_t cap, + const cap_flag_value_t value) { cap_t cap_p;
@@ -168,7 +180,7 @@ static void _effective_cap(struct __test_metadata *const _metadata, { TH_LOG("Failed to cap_get_proc: %s", strerror(errno)); } - EXPECT_NE(-1, cap_set_flag(cap_p, CAP_EFFECTIVE, 1, &caps, value)) + EXPECT_NE(-1, cap_set_flag(cap_p, flag, 1, &cap, value)) { TH_LOG("Failed to cap_set_flag: %s", strerror(errno)); } @@ -183,15 +195,35 @@ static void _effective_cap(struct __test_metadata *const _metadata, }
static void __maybe_unused set_cap(struct __test_metadata *const _metadata, - const cap_value_t caps) + const cap_value_t cap) { - _effective_cap(_metadata, caps, CAP_SET); + _change_cap(_metadata, CAP_EFFECTIVE, cap, CAP_SET); }
static void __maybe_unused clear_cap(struct __test_metadata *const _metadata, - const cap_value_t caps) + const cap_value_t cap) +{ + _change_cap(_metadata, CAP_EFFECTIVE, cap, CAP_CLEAR); +} + +static void __maybe_unused +set_ambient_cap(struct __test_metadata *const _metadata, const cap_value_t cap) +{ + _change_cap(_metadata, CAP_INHERITABLE, cap, CAP_SET); + + EXPECT_NE(-1, cap_set_ambient(cap, CAP_SET)) + { + TH_LOG("Failed to set ambient capability %d: %s", cap, + strerror(errno)); + } +} + +static void __maybe_unused clear_ambient_cap( + struct __test_metadata *const _metadata, const cap_value_t cap) { - _effective_cap(_metadata, caps, CAP_CLEAR); + EXPECT_EQ(1, cap_get_ambient(cap)); + _change_cap(_metadata, CAP_INHERITABLE, cap, CAP_CLEAR); + EXPECT_EQ(0, cap_get_ambient(cap)); }
/* Receives an FD from a UNIX socket. Returns the received FD, or -errno. */ diff --git a/tools/testing/selftests/landlock/net_test.c b/tools/testing/selftests/landlock/net_test.c index e07267acbc9a..4499b2736e1a 100644 --- a/tools/testing/selftests/landlock/net_test.c +++ b/tools/testing/selftests/landlock/net_test.c @@ -107,8 +107,11 @@ static void setup_loopback(struct __test_metadata *const _metadata) { set_cap(_metadata, CAP_SYS_ADMIN); ASSERT_EQ(0, unshare(CLONE_NEWNET)); - ASSERT_EQ(0, system("ip link set dev lo up")); clear_cap(_metadata, CAP_SYS_ADMIN); + + set_ambient_cap(_metadata, CAP_NET_ADMIN); + ASSERT_EQ(0, system("ip link set dev lo up")); + clear_ambient_cap(_metadata, CAP_NET_ADMIN); }
static bool is_restricted(const struct protocol_variant *const prot,
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cezary Rojewski cezary.rojewski@intel.com
[ Upstream commit b5fbde22684af5456d1de60758950944d69d69ad ]
Recent changes modified operation-order in the probe() function without updating its error path accordingly. If snd_hdac_i915_init() exists with status EPROBE_DEFER the error path must cleanup allocated IRQs before leaving the scope.
Fixes: 2dddc514b6e4 ("ASoC: Intel: avs: Move snd_hdac_i915_init to before probe_work.") Signed-off-by: Cezary Rojewski cezary.rojewski@intel.com Link: https://lore.kernel.org/r/20240202114901.1002127-1-cezary.rojewski@intel.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/intel/avs/core.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/sound/soc/intel/avs/core.c b/sound/soc/intel/avs/core.c index 59c3793f65df..db78eb2f0108 100644 --- a/sound/soc/intel/avs/core.c +++ b/sound/soc/intel/avs/core.c @@ -477,6 +477,9 @@ static int avs_pci_probe(struct pci_dev *pci, const struct pci_device_id *id) return 0;
err_i915_init: + pci_free_irq(pci, 0, adev); + pci_free_irq(pci, 0, bus); + pci_free_irq_vectors(pci); pci_clear_master(pci); pci_set_drvdata(pci, NULL); err_acquire_irq:
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Carlos Song carlos.song@nxp.com
[ Upstream commit c712c05e46c8ce550842951e9e2606e24dbf0475 ]
For DMA mode, the bus width of the DMA is equal to the size of data word, so burst length should be configured as bits per word.
For CPU mode, because of the spi transfer len is in byte, so calculate the total number of words according to spi transfer len and bits per word, burst length should be configured as total data bits.
Signed-off-by: Carlos Song carlos.song@nxp.com Reviewed-by: Clark Wang xiaoning.wang@nxp.com Fixes: e9b220aeacf1 ("spi: spi-imx: correctly configure burst length when using dma") Fixes: 5f66db08cbd3 ("spi: imx: Take in account bits per word instead of assuming 8-bits") Link: https://lore.kernel.org/r/20240204091912.36488-1-carlos.song@nxp.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-imx.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c index 272bc871a848..e2d3e3ec1378 100644 --- a/drivers/spi/spi-imx.c +++ b/drivers/spi/spi-imx.c @@ -2,6 +2,7 @@ // Copyright 2004-2007 Freescale Semiconductor, Inc. All Rights Reserved. // Copyright (C) 2008 Juergen Beisert
+#include <linux/bits.h> #include <linux/clk.h> #include <linux/completion.h> #include <linux/delay.h> @@ -660,15 +661,15 @@ static int mx51_ecspi_prepare_transfer(struct spi_imx_data *spi_imx, << MX51_ECSPI_CTRL_BL_OFFSET; else { if (spi_imx->usedma) { - ctrl |= (spi_imx->bits_per_word * - spi_imx_bytes_per_word(spi_imx->bits_per_word) - 1) + ctrl |= (spi_imx->bits_per_word - 1) << MX51_ECSPI_CTRL_BL_OFFSET; } else { if (spi_imx->count >= MX51_ECSPI_CTRL_MAX_BURST) - ctrl |= (MX51_ECSPI_CTRL_MAX_BURST - 1) + ctrl |= (MX51_ECSPI_CTRL_MAX_BURST * BITS_PER_BYTE - 1) << MX51_ECSPI_CTRL_BL_OFFSET; else - ctrl |= (spi_imx->count * spi_imx->bits_per_word - 1) + ctrl |= spi_imx->count / DIV_ROUND_UP(spi_imx->bits_per_word, + BITS_PER_BYTE) * spi_imx->bits_per_word << MX51_ECSPI_CTRL_BL_OFFSET; } }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amadeusz Sławiński amadeuszx.slawinski@linux.intel.com
[ Upstream commit 44d3b8a19b91cd2af11f918b2fd05628383172de ]
In case TDM is set in topology on SSP0, parser will overwrite vindex value, because it only checks if port is set. Fix this by checking whole field value.
Fixes: e6d50e474e45 ("ASoC: Intel: avs: Improve topology parsing of dynamic strings") Reviewed-by: Cezary Rojewski cezary.rojewski@intel.com Signed-off-by: Amadeusz Sławiński amadeuszx.slawinski@linux.intel.com Link: https://lore.kernel.org/r/20240207112624.2132821-1-amadeuszx.slawinski@linux... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/intel/avs/topology.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/soc/intel/avs/topology.c b/sound/soc/intel/avs/topology.c index c74e9d622e4c..41020409ffb6 100644 --- a/sound/soc/intel/avs/topology.c +++ b/sound/soc/intel/avs/topology.c @@ -857,7 +857,7 @@ assign_copier_gtw_instance(struct snd_soc_component *comp, struct avs_tplg_modcf }
/* If topology sets value don't overwrite it */ - if (cfg->copier.vindex.i2s.instance) + if (cfg->copier.vindex.val) return;
mach = dev_get_platdata(comp->card->dev);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miri Korenblit miriam.rachel.korenblit@intel.com
[ Upstream commit 3012477cd510044d346c5e0465ead4732aef8349 ]
Before sending a SESSION PROTECTION cmd the driver checks if the link_id indicated in the time event (and for which the cmd will be sent) is valid and exists. Clear the te_data::link_id when FW notifies that a session protection ended, so the check will actually fail when it should.
Fixes: 135065837310 ("wifi: iwlwifi: support link_id in SESSION_PROTECTION cmd") Signed-off-by: Miri Korenblit miriam.rachel.korenblit@intel.com Link: https://msgid.link/20240204235836.c64a6b3606c2.I35cdc08e8a3be282563163690f8c... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/intel/iwlwifi/mvm/time-event.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/time-event.c b/drivers/net/wireless/intel/iwlwifi/mvm/time-event.c index 218fdf1ed530..2e653a417d62 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/time-event.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/time-event.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause /* - * Copyright (C) 2012-2014, 2018-2023 Intel Corporation + * Copyright (C) 2012-2014, 2018-2024 Intel Corporation * Copyright (C) 2013-2015 Intel Mobile Communications GmbH * Copyright (C) 2017 Intel Deutschland GmbH */ @@ -972,6 +972,7 @@ void iwl_mvm_rx_session_protect_notif(struct iwl_mvm *mvm, if (!le32_to_cpu(notif->status) || !le32_to_cpu(notif->start)) { /* End TE, notify mac80211 */ mvmvif->time_event_data.id = SESSION_PROTECT_CONF_MAX_ID; + mvmvif->time_event_data.link_id = -1; iwl_mvm_p2p_roc_finished(mvm); ieee80211_remain_on_channel_expired(mvm->hw); } else if (le32_to_cpu(notif->start)) {
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
[ Upstream commit c6ebb5b67641994de8bc486b33457fe0b681d6fe ]
This saves the error as PTR_ERR(wifi_pkg). The problem is that "wifi_pkg" is a valid pointer, not an error pointer. Set the error code to -EINVAL instead.
Fixes: 2a8084147bff ("iwlwifi: acpi: support reading and storing WRDS revision 1 and 2") Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Link: https://msgid.link/9620bb77-2d7c-4d76-b255-ad824ebf8e35@moroto.mountain Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/intel/iwlwifi/fw/acpi.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/fw/acpi.c b/drivers/net/wireless/intel/iwlwifi/fw/acpi.c index b96f30d11644..d73d561709d3 100644 --- a/drivers/net/wireless/intel/iwlwifi/fw/acpi.c +++ b/drivers/net/wireless/intel/iwlwifi/fw/acpi.c @@ -618,7 +618,7 @@ int iwl_sar_get_wrds_table(struct iwl_fw_runtime *fwrt) &tbl_rev); if (!IS_ERR(wifi_pkg)) { if (tbl_rev != 2) { - ret = PTR_ERR(wifi_pkg); + ret = -EINVAL; goto out_free; }
@@ -634,7 +634,7 @@ int iwl_sar_get_wrds_table(struct iwl_fw_runtime *fwrt) &tbl_rev); if (!IS_ERR(wifi_pkg)) { if (tbl_rev != 1) { - ret = PTR_ERR(wifi_pkg); + ret = -EINVAL; goto out_free; }
@@ -650,7 +650,7 @@ int iwl_sar_get_wrds_table(struct iwl_fw_runtime *fwrt) &tbl_rev); if (!IS_ERR(wifi_pkg)) { if (tbl_rev != 0) { - ret = PTR_ERR(wifi_pkg); + ret = -EINVAL; goto out_free; }
@@ -707,7 +707,7 @@ int iwl_sar_get_ewrd_table(struct iwl_fw_runtime *fwrt) &tbl_rev); if (!IS_ERR(wifi_pkg)) { if (tbl_rev != 2) { - ret = PTR_ERR(wifi_pkg); + ret = -EINVAL; goto out_free; }
@@ -723,7 +723,7 @@ int iwl_sar_get_ewrd_table(struct iwl_fw_runtime *fwrt) &tbl_rev); if (!IS_ERR(wifi_pkg)) { if (tbl_rev != 1) { - ret = PTR_ERR(wifi_pkg); + ret = -EINVAL; goto out_free; }
@@ -739,7 +739,7 @@ int iwl_sar_get_ewrd_table(struct iwl_fw_runtime *fwrt) &tbl_rev); if (!IS_ERR(wifi_pkg)) { if (tbl_rev != 0) { - ret = PTR_ERR(wifi_pkg); + ret = -EINVAL; goto out_free; }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
[ Upstream commit 65c6ee90455053cfd3067c17aaa4a42b0c766543 ]
This is an error path and Smatch complains that "tbl_rev" is uninitialized on this path. All the other functions follow this same patter where they set the error code and goto out_free so that's probably what was intended here as well.
Fixes: e8e10a37c51c ("iwlwifi: acpi: move ppag code from mvm to fw/acpi") Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Link: https://msgid.link/09900c01-6540-4a32-9451-563da0029cb6@moroto.mountain Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/intel/iwlwifi/fw/acpi.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/net/wireless/intel/iwlwifi/fw/acpi.c b/drivers/net/wireless/intel/iwlwifi/fw/acpi.c index d73d561709d3..dcc4810cb324 100644 --- a/drivers/net/wireless/intel/iwlwifi/fw/acpi.c +++ b/drivers/net/wireless/intel/iwlwifi/fw/acpi.c @@ -1116,6 +1116,9 @@ int iwl_acpi_get_ppag_table(struct iwl_fw_runtime *fwrt) goto read_table; }
+ ret = PTR_ERR(wifi_pkg); + goto out_free; + read_table: fwrt->ppag_ver = tbl_rev; flags = &wifi_pkg->package.elements[1];
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ranjani Sridharan ranjani.sridharan@linux.intel.com
[ Upstream commit d7332c4a4f1a7d16f054c6357fb65c597b6a86a7 ]
With the change in the widget free logic to power down the cores only when the scheduler widgets are freed, we need to ensure that the scheduler widget is freed only after all the widgets associated with the scheduler are freed. This is to ensure that the secondary core that the scheduler is scheduled to run on is kept powered on until all widgets that need them are in use. While this works well for dynamic pipelines, in the case of static pipelines the current logic does not take this into account and frees all widgets in the order they occur in the widget_list. So, modify this to ensure that the scheduler widgets are freed only after all other types of widgets in the widget_list are freed.
Link: https://github.com/thesofproject/linux/issues/4807 Fixes: 31ed8da1c8e5 ("ASoC: SOF: sof-audio: Modify logic for enabling/disabling topology cores") Signed-off-by: Ranjani Sridharan ranjani.sridharan@linux.intel.com Reviewed-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Reviewed-by: Péter Ujfalusi peter.ujfalusi@linux.intel.com Signed-off-by: Peter Ujfalusi peter.ujfalusi@linux.intel.com Link: https://lore.kernel.org/r/20240208133432.1688-1-peter.ujfalusi@linux.intel.c... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/sof/ipc3-topology.c | 55 ++++++++++++++++++++++++++--------- 1 file changed, 41 insertions(+), 14 deletions(-)
diff --git a/sound/soc/sof/ipc3-topology.c b/sound/soc/sof/ipc3-topology.c index 2c7a5e7a364c..d96555438c6b 100644 --- a/sound/soc/sof/ipc3-topology.c +++ b/sound/soc/sof/ipc3-topology.c @@ -2309,27 +2309,16 @@ static int sof_tear_down_left_over_pipelines(struct snd_sof_dev *sdev) return 0; }
-/* - * For older firmware, this function doesn't free widgets for static pipelines during suspend. - * It only resets use_count for all widgets. - */ -static int sof_ipc3_tear_down_all_pipelines(struct snd_sof_dev *sdev, bool verify) +static int sof_ipc3_free_widgets_in_list(struct snd_sof_dev *sdev, bool include_scheduler, + bool *dyn_widgets, bool verify) { struct sof_ipc_fw_version *v = &sdev->fw_ready.version; struct snd_sof_widget *swidget; - struct snd_sof_route *sroute; - bool dyn_widgets = false; int ret;
- /* - * This function is called during suspend and for one-time topology verification during - * first boot. In both cases, there is no need to protect swidget->use_count and - * sroute->setup because during suspend all running streams are suspended and during - * topology loading the sound card unavailable to open PCMs. - */ list_for_each_entry(swidget, &sdev->widget_list, list) { if (swidget->dynamic_pipeline_widget) { - dyn_widgets = true; + *dyn_widgets = true; continue; }
@@ -2344,11 +2333,49 @@ static int sof_ipc3_tear_down_all_pipelines(struct snd_sof_dev *sdev, bool verif continue; }
+ if (include_scheduler && swidget->id != snd_soc_dapm_scheduler) + continue; + + if (!include_scheduler && swidget->id == snd_soc_dapm_scheduler) + continue; + ret = sof_widget_free(sdev, swidget); if (ret < 0) return ret; }
+ return 0; +} + +/* + * For older firmware, this function doesn't free widgets for static pipelines during suspend. + * It only resets use_count for all widgets. + */ +static int sof_ipc3_tear_down_all_pipelines(struct snd_sof_dev *sdev, bool verify) +{ + struct sof_ipc_fw_version *v = &sdev->fw_ready.version; + struct snd_sof_widget *swidget; + struct snd_sof_route *sroute; + bool dyn_widgets = false; + int ret; + + /* + * This function is called during suspend and for one-time topology verification during + * first boot. In both cases, there is no need to protect swidget->use_count and + * sroute->setup because during suspend all running streams are suspended and during + * topology loading the sound card unavailable to open PCMs. Do not free the scheduler + * widgets yet so that the secondary cores do not get powered down before all the widgets + * associated with the scheduler are freed. + */ + ret = sof_ipc3_free_widgets_in_list(sdev, false, &dyn_widgets, verify); + if (ret < 0) + return ret; + + /* free all the scheduler widgets now */ + ret = sof_ipc3_free_widgets_in_list(sdev, true, &dyn_widgets, verify); + if (ret < 0) + return ret; + /* * Tear down all pipelines associated with PCMs that did not get suspended * and unset the prepare flag so that they can be set up again during resume.
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Pirko jiri@nvidia.com
[ Upstream commit 53c0441dd2c44ee93fddb5473885fd41e4bc2361 ]
Recently, I've been hitting following deadlock warning during dpll pin dump:
[52804.637962] ====================================================== [52804.638536] WARNING: possible circular locking dependency detected [52804.639111] 6.8.0-rc2jiri+ #1 Not tainted [52804.639529] ------------------------------------------------------ [52804.640104] python3/2984 is trying to acquire lock: [52804.640581] ffff88810e642678 (nlk_cb_mutex-GENERIC){+.+.}-{3:3}, at: netlink_dump+0xb3/0x780 [52804.641417] but task is already holding lock: [52804.642010] ffffffff83bde4c8 (dpll_lock){+.+.}-{3:3}, at: dpll_lock_dumpit+0x13/0x20 [52804.642747] which lock already depends on the new lock.
[52804.643551] the existing dependency chain (in reverse order) is: [52804.644259] -> #1 (dpll_lock){+.+.}-{3:3}: [52804.644836] lock_acquire+0x174/0x3e0 [52804.645271] __mutex_lock+0x119/0x1150 [52804.645723] dpll_lock_dumpit+0x13/0x20 [52804.646169] genl_start+0x266/0x320 [52804.646578] __netlink_dump_start+0x321/0x450 [52804.647056] genl_family_rcv_msg_dumpit+0x155/0x1e0 [52804.647575] genl_rcv_msg+0x1ed/0x3b0 [52804.648001] netlink_rcv_skb+0xdc/0x210 [52804.648440] genl_rcv+0x24/0x40 [52804.648831] netlink_unicast+0x2f1/0x490 [52804.649290] netlink_sendmsg+0x36d/0x660 [52804.649742] __sock_sendmsg+0x73/0xc0 [52804.650165] __sys_sendto+0x184/0x210 [52804.650597] __x64_sys_sendto+0x72/0x80 [52804.651045] do_syscall_64+0x6f/0x140 [52804.651474] entry_SYSCALL_64_after_hwframe+0x46/0x4e [52804.652001] -> #0 (nlk_cb_mutex-GENERIC){+.+.}-{3:3}: [52804.652650] check_prev_add+0x1ae/0x1280 [52804.653107] __lock_acquire+0x1ed3/0x29a0 [52804.653559] lock_acquire+0x174/0x3e0 [52804.653984] __mutex_lock+0x119/0x1150 [52804.654423] netlink_dump+0xb3/0x780 [52804.654845] __netlink_dump_start+0x389/0x450 [52804.655321] genl_family_rcv_msg_dumpit+0x155/0x1e0 [52804.655842] genl_rcv_msg+0x1ed/0x3b0 [52804.656272] netlink_rcv_skb+0xdc/0x210 [52804.656721] genl_rcv+0x24/0x40 [52804.657119] netlink_unicast+0x2f1/0x490 [52804.657570] netlink_sendmsg+0x36d/0x660 [52804.658022] __sock_sendmsg+0x73/0xc0 [52804.658450] __sys_sendto+0x184/0x210 [52804.658877] __x64_sys_sendto+0x72/0x80 [52804.659322] do_syscall_64+0x6f/0x140 [52804.659752] entry_SYSCALL_64_after_hwframe+0x46/0x4e [52804.660281] other info that might help us debug this:
[52804.661077] Possible unsafe locking scenario:
[52804.661671] CPU0 CPU1 [52804.662129] ---- ---- [52804.662577] lock(dpll_lock); [52804.662924] lock(nlk_cb_mutex-GENERIC); [52804.663538] lock(dpll_lock); [52804.664073] lock(nlk_cb_mutex-GENERIC); [52804.664490]
The issue as follows: __netlink_dump_start() calls control->start(cb) with nlk->cb_mutex held. In control->start(cb) the dpll_lock is taken. Then nlk->cb_mutex is released and taken again in netlink_dump(), while dpll_lock still being held. That leads to ABBA deadlock when another CPU races with the same operation.
Fix this by moving dpll_lock taking into dumpit() callback which ensures correct lock taking order.
Fixes: 9d71b54b65b1 ("dpll: netlink: Add DPLL framework base functions") Signed-off-by: Jiri Pirko jiri@nvidia.com Reviewed-by: Arkadiusz Kubalewski arkadiusz.kubalewski@intel.com Link: https://lore.kernel.org/r/20240207115902.371649-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/netlink/specs/dpll.yaml | 4 ---- drivers/dpll/dpll_netlink.c | 20 ++++++-------------- drivers/dpll/dpll_nl.c | 4 ---- drivers/dpll/dpll_nl.h | 2 -- 4 files changed, 6 insertions(+), 24 deletions(-)
diff --git a/Documentation/netlink/specs/dpll.yaml b/Documentation/netlink/specs/dpll.yaml index cf8abe1c0550..2b4c4bcd8361 100644 --- a/Documentation/netlink/specs/dpll.yaml +++ b/Documentation/netlink/specs/dpll.yaml @@ -374,8 +374,6 @@ operations: - type
dump: - pre: dpll-lock-dumpit - post: dpll-unlock-dumpit reply: *dev-attrs
- @@ -462,8 +460,6 @@ operations: - phase-adjust
dump: - pre: dpll-lock-dumpit - post: dpll-unlock-dumpit request: attributes: - id diff --git a/drivers/dpll/dpll_netlink.c b/drivers/dpll/dpll_netlink.c index 7cc99d627942..c8c2e836193a 100644 --- a/drivers/dpll/dpll_netlink.c +++ b/drivers/dpll/dpll_netlink.c @@ -1171,6 +1171,7 @@ int dpll_nl_pin_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb) unsigned long i; int ret = 0;
+ mutex_lock(&dpll_lock); xa_for_each_marked_start(&dpll_pin_xa, i, pin, DPLL_REGISTERED, ctx->idx) { if (!dpll_pin_available(pin)) @@ -1190,6 +1191,8 @@ int dpll_nl_pin_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb) } genlmsg_end(skb, hdr); } + mutex_unlock(&dpll_lock); + if (ret == -EMSGSIZE) { ctx->idx = i; return skb->len; @@ -1345,6 +1348,7 @@ int dpll_nl_device_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb) unsigned long i; int ret = 0;
+ mutex_lock(&dpll_lock); xa_for_each_marked_start(&dpll_device_xa, i, dpll, DPLL_REGISTERED, ctx->idx) { hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).portid, @@ -1361,6 +1365,8 @@ int dpll_nl_device_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb) } genlmsg_end(skb, hdr); } + mutex_unlock(&dpll_lock); + if (ret == -EMSGSIZE) { ctx->idx = i; return skb->len; @@ -1411,20 +1417,6 @@ dpll_unlock_doit(const struct genl_split_ops *ops, struct sk_buff *skb, mutex_unlock(&dpll_lock); }
-int dpll_lock_dumpit(struct netlink_callback *cb) -{ - mutex_lock(&dpll_lock); - - return 0; -} - -int dpll_unlock_dumpit(struct netlink_callback *cb) -{ - mutex_unlock(&dpll_lock); - - return 0; -} - int dpll_pin_pre_doit(const struct genl_split_ops *ops, struct sk_buff *skb, struct genl_info *info) { diff --git a/drivers/dpll/dpll_nl.c b/drivers/dpll/dpll_nl.c index eaee5be7aa64..1e95f5397cfc 100644 --- a/drivers/dpll/dpll_nl.c +++ b/drivers/dpll/dpll_nl.c @@ -95,9 +95,7 @@ static const struct genl_split_ops dpll_nl_ops[] = { }, { .cmd = DPLL_CMD_DEVICE_GET, - .start = dpll_lock_dumpit, .dumpit = dpll_nl_device_get_dumpit, - .done = dpll_unlock_dumpit, .flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DUMP, }, { @@ -129,9 +127,7 @@ static const struct genl_split_ops dpll_nl_ops[] = { }, { .cmd = DPLL_CMD_PIN_GET, - .start = dpll_lock_dumpit, .dumpit = dpll_nl_pin_get_dumpit, - .done = dpll_unlock_dumpit, .policy = dpll_pin_get_dump_nl_policy, .maxattr = DPLL_A_PIN_ID, .flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DUMP, diff --git a/drivers/dpll/dpll_nl.h b/drivers/dpll/dpll_nl.h index 92d4c9c4f788..f491262bee4f 100644 --- a/drivers/dpll/dpll_nl.h +++ b/drivers/dpll/dpll_nl.h @@ -30,8 +30,6 @@ dpll_post_doit(const struct genl_split_ops *ops, struct sk_buff *skb, void dpll_pin_post_doit(const struct genl_split_ops *ops, struct sk_buff *skb, struct genl_info *info); -int dpll_lock_dumpit(struct netlink_callback *cb); -int dpll_unlock_dumpit(struct netlink_callback *cb);
int dpll_nl_device_id_get_doit(struct sk_buff *skb, struct genl_info *info); int dpll_nl_device_get_doit(struct sk_buff *skb, struct genl_info *info);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Pirko jiri@nvidia.com
[ Upstream commit aa1eec2f546f2afa8c98ec41e5d8ee488165d685 ]
I managed to hit following use after free warning recently:
[ 2169.711665] ================================================================== [ 2169.714009] BUG: KASAN: slab-use-after-free in __run_timers.part.0+0x179/0x4c0 [ 2169.716293] Write of size 8 at addr ffff88812b326a70 by task swapper/4/0
[ 2169.719022] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 6.8.0-rc2jiri+ #2 [ 2169.720974] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 2169.722457] Call Trace: [ 2169.722756] <IRQ> [ 2169.723024] dump_stack_lvl+0x58/0xb0 [ 2169.723417] print_report+0xc5/0x630 [ 2169.723807] ? __virt_addr_valid+0x126/0x2b0 [ 2169.724268] kasan_report+0xbe/0xf0 [ 2169.724667] ? __run_timers.part.0+0x179/0x4c0 [ 2169.725116] ? __run_timers.part.0+0x179/0x4c0 [ 2169.725570] __run_timers.part.0+0x179/0x4c0 [ 2169.726003] ? call_timer_fn+0x320/0x320 [ 2169.726404] ? lock_downgrade+0x3a0/0x3a0 [ 2169.726820] ? kvm_clock_get_cycles+0x14/0x20 [ 2169.727257] ? ktime_get+0x92/0x150 [ 2169.727630] ? lapic_next_deadline+0x35/0x60 [ 2169.728069] run_timer_softirq+0x40/0x80 [ 2169.728475] __do_softirq+0x1a1/0x509 [ 2169.728866] irq_exit_rcu+0x95/0xc0 [ 2169.729241] sysvec_apic_timer_interrupt+0x6b/0x80 [ 2169.729718] </IRQ> [ 2169.729993] <TASK> [ 2169.730259] asm_sysvec_apic_timer_interrupt+0x16/0x20 [ 2169.730755] RIP: 0010:default_idle+0x13/0x20 [ 2169.731190] Code: c0 08 00 00 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 72 ff ff ff cc cc cc cc 8b 05 9a 7f 1f 02 85 c0 7e 07 0f 00 2d cf 69 43 00 fb f4 <fa> c3 66 66 2e 0f 1f 84 00 00 00 00 00 65 48 8b 04 25 c0 93 04 00 [ 2169.732759] RSP: 0018:ffff888100dbfe10 EFLAGS: 00000242 [ 2169.733264] RAX: 0000000000000001 RBX: ffff888100d9c200 RCX: ffffffff8241bd62 [ 2169.733925] RDX: ffffed109a848b15 RSI: 0000000000000004 RDI: ffffffff8127ac55 [ 2169.734566] RBP: 0000000000000004 R08: 0000000000000000 R09: ffffed109a848b14 [ 2169.735200] R10: ffff8884d42458a3 R11: 000000000000ba7e R12: ffffffff83d7d3a0 [ 2169.735835] R13: 1ffff110201b7fc6 R14: 0000000000000000 R15: ffff888100d9c200 [ 2169.736478] ? ct_kernel_exit.constprop.0+0xa2/0xc0 [ 2169.736954] ? do_idle+0x285/0x290 [ 2169.737323] default_idle_call+0x63/0x90 [ 2169.737730] do_idle+0x285/0x290 [ 2169.738089] ? arch_cpu_idle_exit+0x30/0x30 [ 2169.738511] ? mark_held_locks+0x1a/0x80 [ 2169.738917] ? lockdep_hardirqs_on_prepare+0x12e/0x200 [ 2169.739417] cpu_startup_entry+0x30/0x40 [ 2169.739825] start_secondary+0x19a/0x1c0 [ 2169.740229] ? set_cpu_sibling_map+0xbd0/0xbd0 [ 2169.740673] secondary_startup_64_no_verify+0x15d/0x16b [ 2169.741179] </TASK>
[ 2169.741686] Allocated by task 1098: [ 2169.742058] kasan_save_stack+0x1c/0x40 [ 2169.742456] kasan_save_track+0x10/0x30 [ 2169.742852] __kasan_kmalloc+0x83/0x90 [ 2169.743246] mlx5_dpll_probe+0xf5/0x3c0 [mlx5_dpll] [ 2169.743730] auxiliary_bus_probe+0x62/0xb0 [ 2169.744148] really_probe+0x127/0x590 [ 2169.744534] __driver_probe_device+0xd2/0x200 [ 2169.744973] device_driver_attach+0x6b/0xf0 [ 2169.745402] bind_store+0x90/0xe0 [ 2169.745761] kernfs_fop_write_iter+0x1df/0x2a0 [ 2169.746210] vfs_write+0x41f/0x790 [ 2169.746579] ksys_write+0xc7/0x160 [ 2169.746947] do_syscall_64+0x6f/0x140 [ 2169.747333] entry_SYSCALL_64_after_hwframe+0x46/0x4e
[ 2169.748049] Freed by task 1220: [ 2169.748393] kasan_save_stack+0x1c/0x40 [ 2169.748789] kasan_save_track+0x10/0x30 [ 2169.749188] kasan_save_free_info+0x3b/0x50 [ 2169.749621] poison_slab_object+0x106/0x180 [ 2169.750044] __kasan_slab_free+0x14/0x50 [ 2169.750451] kfree+0x118/0x330 [ 2169.750792] mlx5_dpll_remove+0xf5/0x110 [mlx5_dpll] [ 2169.751271] auxiliary_bus_remove+0x2e/0x40 [ 2169.751694] device_release_driver_internal+0x24b/0x2e0 [ 2169.752191] unbind_store+0xa6/0xb0 [ 2169.752563] kernfs_fop_write_iter+0x1df/0x2a0 [ 2169.753004] vfs_write+0x41f/0x790 [ 2169.753381] ksys_write+0xc7/0x160 [ 2169.753750] do_syscall_64+0x6f/0x140 [ 2169.754132] entry_SYSCALL_64_after_hwframe+0x46/0x4e
[ 2169.754847] Last potentially related work creation: [ 2169.755315] kasan_save_stack+0x1c/0x40 [ 2169.755709] __kasan_record_aux_stack+0x9b/0xf0 [ 2169.756165] __queue_work+0x382/0x8f0 [ 2169.756552] call_timer_fn+0x126/0x320 [ 2169.756941] __run_timers.part.0+0x2ea/0x4c0 [ 2169.757376] run_timer_softirq+0x40/0x80 [ 2169.757782] __do_softirq+0x1a1/0x509
[ 2169.758387] Second to last potentially related work creation: [ 2169.758924] kasan_save_stack+0x1c/0x40 [ 2169.759322] __kasan_record_aux_stack+0x9b/0xf0 [ 2169.759773] __queue_work+0x382/0x8f0 [ 2169.760156] call_timer_fn+0x126/0x320 [ 2169.760550] __run_timers.part.0+0x2ea/0x4c0 [ 2169.760978] run_timer_softirq+0x40/0x80 [ 2169.761381] __do_softirq+0x1a1/0x509
[ 2169.761998] The buggy address belongs to the object at ffff88812b326a00 which belongs to the cache kmalloc-256 of size 256 [ 2169.763061] The buggy address is located 112 bytes inside of freed 256-byte region [ffff88812b326a00, ffff88812b326b00)
[ 2169.764346] The buggy address belongs to the physical page: [ 2169.764866] page:000000000f2b1e89 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x12b324 [ 2169.765731] head:000000000f2b1e89 order:2 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [ 2169.766484] anon flags: 0x200000000000840(slab|head|node=0|zone=2) [ 2169.767048] page_type: 0xffffffff() [ 2169.767422] raw: 0200000000000840 ffff888100042b40 0000000000000000 dead000000000001 [ 2169.768183] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000 [ 2169.768899] page dumped because: kasan: bad access detected
[ 2169.769649] Memory state around the buggy address: [ 2169.770116] ffff88812b326900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 2169.770805] ffff88812b326980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 2169.771485] >ffff88812b326a00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 2169.772173] ^ [ 2169.772787] ffff88812b326a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 2169.773477] ffff88812b326b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 2169.774160] ================================================================== [ 2169.774845] ==================================================================
I didn't manage to reproduce it. Though the issue seems to be obvious. There is a chance that the mlx5_dpll_remove() calls cancel_delayed_work() when the work runs and manages to re-arm itself. In that case, after delay timer triggers next attempt to queue it, it works with freed memory.
Fix this by using cancel_delayed_work_sync() instead which makes sure that work is done when it returns.
Fixes: 496fd0a26bbf ("mlx5: Implement SyncE support using DPLL infrastructure") Signed-off-by: Jiri Pirko jiri@nvidia.com Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20240206164328.360313-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/dpll.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/dpll.c b/drivers/net/ethernet/mellanox/mlx5/core/dpll.c index 2cd81bb32c66..8ce5c8bcda1c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/dpll.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/dpll.c @@ -374,7 +374,7 @@ static void mlx5_dpll_remove(struct auxiliary_device *adev) struct mlx5_dpll *mdpll = auxiliary_get_drvdata(adev); struct mlx5_core_dev *mdev = mdpll->mdev;
- cancel_delayed_work(&mdpll->work); + cancel_delayed_work_sync(&mdpll->work); mlx5_dpll_mdev_netdev_untrack(mdpll, mdev); destroy_workqueue(mdpll->wq); dpll_pin_unregister(mdpll->dpll, mdpll->dpll_pin,
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
[ Upstream commit 4e1d71cabb19ec2586827adfc60d68689c68c194 ]
Recently, handshake_req_destroy_test1 started failing:
Expected handshake_req_destroy_test == req, but handshake_req_destroy_test == 0000000000000000 req == 0000000060f99b40 not ok 11 req_destroy works
This is because "sock_release(sock)" was replaced with "fput(filp)" to address a memory leak. Note that sock_release() is synchronous but fput() usually delays the final close and clean-up.
The delay is not consequential in the other cases that were changed but handshake_req_destroy_test1 is testing that handshake_req_cancel() followed by closing the file actually does call the ->hp_destroy method. Thus the PTR_EQ test at the end has to be sure that the final close is complete before it checks the pointer.
We cannot use a completion here because if ->hp_destroy is never called (ie, there is an API bug) then the test will hang.
Reported by: Guenter Roeck linux@roeck-us.net Closes: https://lore.kernel.org/netdev/ZcKDd1to4MPANCrn@tissot.1015granger.net/T/#ma... Fixes: 4a0f07d71b04 ("net/handshake: Fix memory leak in __sock_create() and sock_alloc_file()") Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: Hannes Reinecke hare@suse.de Link: https://lore.kernel.org/r/170724699027.91401.7839730697326806733.stgit@oracl... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/handshake/handshake-test.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/handshake/handshake-test.c b/net/handshake/handshake-test.c index 16ed7bfd29e4..34fd1d9b2db8 100644 --- a/net/handshake/handshake-test.c +++ b/net/handshake/handshake-test.c @@ -471,7 +471,10 @@ static void handshake_req_destroy_test1(struct kunit *test) handshake_req_cancel(sock->sk);
/* Act */ - fput(filp); + /* Ensure the close/release/put process has run to + * completion before checking the result. + */ + __fput_sync(filp);
/* Assert */ KUNIT_EXPECT_PTR_EQ(test, handshake_req_destroy_test, req);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Magnus Karlsson magnus.karlsson@intel.com
[ Upstream commit 9b0ed890ac2ae233efd8b27d11aee28a19437bb8 ]
Do not report the XDP capability NETDEV_XDP_ACT_XSK_ZEROCOPY as the bonding driver does not support XDP and AF_XDP in zero-copy mode even if the real NIC drivers do.
Note that the driver used to report everything as supported before a device was bonded. Instead of just masking out the zero-copy support from this, have the driver report that no XDP feature is supported until a real device is bonded. This seems to be more truthful as it is the real drivers that decide what XDP features are supported.
Fixes: cb9e6e584d58 ("bonding: add xdp_features support") Reported-by: Prashant Batra prbatra.mail@gmail.com Link: https://lore.kernel.org/all/CAJ8uoz2ieZCopgqTvQ9ZY6xQgTbujmC6XkMTamhp68O-h_-... Signed-off-by: Magnus Karlsson magnus.karlsson@intel.com Reviewed-by: Toke Høiland-Jørgensen toke@redhat.com Link: https://lore.kernel.org/r/20240207084737.20890-1-magnus.karlsson@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/bonding/bond_main.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 8e6cc0e133b7..6cf7f364704e 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1819,6 +1819,8 @@ void bond_xdp_set_features(struct net_device *bond_dev) bond_for_each_slave(bond, slave, iter) val &= slave->dev->xdp_features;
+ val &= ~NETDEV_XDP_ACT_XSK_ZEROCOPY; + xdp_set_features_flag(bond_dev, val); }
@@ -5934,9 +5936,6 @@ void bond_setup(struct net_device *bond_dev) if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP) bond_dev->features |= BOND_XFRM_FEATURES; #endif /* CONFIG_XFRM_OFFLOAD */ - - if (bond_xdp_check(bond)) - bond_dev->xdp_features = NETDEV_XDP_ACT_MASK; }
/* Destroy a bonding device.
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Parav Pandit parav@nvidia.com
[ Upstream commit 4ab18af47a2c2a80ac11674122935700caf80cc6 ]
Command example string is not read as command. Fix command annotation.
Fixes: a8ce7b26a51e ("devlink: Expose port function commands to control migratable") Signed-off-by: Parav Pandit parav@nvidia.com Reviewed-by: Jiri Pirko jiri@nvidia.com Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20240206161717.466653-1-parav@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/networking/devlink/devlink-port.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/networking/devlink/devlink-port.rst b/Documentation/networking/devlink/devlink-port.rst index e33ad2401ad7..562f46b41274 100644 --- a/Documentation/networking/devlink/devlink-port.rst +++ b/Documentation/networking/devlink/devlink-port.rst @@ -126,7 +126,7 @@ Users may also set the RoCE capability of the function using `devlink port function set roce` command.
Users may also set the function as migratable using -'devlink port function set migratable' command. +`devlink port function set migratable` command.
Users may also set the IPsec crypto capability of the function using `devlink port function set ipsec_crypto` command.
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Saravana Kannan saravanak@google.com
[ Upstream commit f4653ec9861cd96a1a6a3258c4a807898ee8cf3c ]
We have a more accurate function to find the right consumer of a remote-endpoint property instead of searching for a parent with compatible string property. So, use that instead. While at it, make the code to find the consumer a bit more flexible and based on the property being parsed.
Fixes: f7514a663016 ("of: property: fw_devlink: Add support for remote-endpoint") Signed-off-by: Saravana Kannan saravanak@google.com Link: https://lore.kernel.org/r/20240207011803.2637531-2-saravanak@google.com Signed-off-by: Rob Herring robh@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/of/property.c | 47 +++++++++---------------------------------- 1 file changed, 10 insertions(+), 37 deletions(-)
diff --git a/drivers/of/property.c b/drivers/of/property.c index afdaefbd03f6..41c3da8a54b6 100644 --- a/drivers/of/property.c +++ b/drivers/of/property.c @@ -1062,36 +1062,6 @@ of_fwnode_device_get_match_data(const struct fwnode_handle *fwnode, return of_device_get_match_data(dev); }
-static struct device_node *of_get_compat_node(struct device_node *np) -{ - of_node_get(np); - - while (np) { - if (!of_device_is_available(np)) { - of_node_put(np); - np = NULL; - } - - if (of_property_present(np, "compatible")) - break; - - np = of_get_next_parent(np); - } - - return np; -} - -static struct device_node *of_get_compat_node_parent(struct device_node *np) -{ - struct device_node *parent, *node; - - parent = of_get_parent(np); - node = of_get_compat_node(parent); - of_node_put(parent); - - return node; -} - static void of_link_to_phandle(struct device_node *con_np, struct device_node *sup_np) { @@ -1221,10 +1191,10 @@ static struct device_node *parse_##fname(struct device_node *np, \ * @parse_prop.prop_name: Name of property holding a phandle value * @parse_prop.index: For properties holding a list of phandles, this is the * index into the list + * @get_con_dev: If the consumer node containing the property is never converted + * to a struct device, implement this ops so fw_devlink can use it + * to find the true consumer. * @optional: Describes whether a supplier is mandatory or not - * @node_not_dev: The consumer node containing the property is never converted - * to a struct device. Instead, parse ancestor nodes for the - * compatible property to find a node corresponding to a device. * * Returns: * parse_prop() return values are @@ -1235,8 +1205,8 @@ static struct device_node *parse_##fname(struct device_node *np, \ struct supplier_bindings { struct device_node *(*parse_prop)(struct device_node *np, const char *prop_name, int index); + struct device_node *(*get_con_dev)(struct device_node *np); bool optional; - bool node_not_dev; };
DEFINE_SIMPLE_PROP(clocks, "clocks", "#clock-cells") @@ -1351,7 +1321,10 @@ static const struct supplier_bindings of_supplier_bindings[] = { { .parse_prop = parse_pinctrl6, }, { .parse_prop = parse_pinctrl7, }, { .parse_prop = parse_pinctrl8, }, - { .parse_prop = parse_remote_endpoint, .node_not_dev = true, }, + { + .parse_prop = parse_remote_endpoint, + .get_con_dev = of_graph_get_port_parent, + }, { .parse_prop = parse_pwms, }, { .parse_prop = parse_resets, }, { .parse_prop = parse_leds, }, @@ -1402,8 +1375,8 @@ static int of_link_property(struct device_node *con_np, const char *prop_name) while ((phandle = s->parse_prop(con_np, prop_name, i))) { struct device_node *con_dev_np;
- con_dev_np = s->node_not_dev - ? of_get_compat_node_parent(con_np) + con_dev_np = s->get_con_dev + ? s->get_con_dev(con_np) : of_node_get(con_np); matched = true; i++;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Saravana Kannan saravanak@google.com
[ Upstream commit 782bfd03c3ae2c0e6e01b661b8e18f1de50357be ]
After commit 4a032827daa8 ("of: property: Simplify of_link_to_phandle()"), remote-endpoint properties created a fwnode link from the consumer device to the supplier endpoint. This is a tiny bit inefficient (not buggy) when trying to create device links or detecting cycles. So, improve this the same way we improved finding the consumer of a remote-endpoint property.
Fixes: 4a032827daa8 ("of: property: Simplify of_link_to_phandle()") Signed-off-by: Saravana Kannan saravanak@google.com Link: https://lore.kernel.org/r/20240207011803.2637531-3-saravanak@google.com Signed-off-by: Rob Herring robh@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/of/property.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/of/property.c b/drivers/of/property.c index 41c3da8a54b6..aacedfdfedc6 100644 --- a/drivers/of/property.c +++ b/drivers/of/property.c @@ -1231,7 +1231,6 @@ DEFINE_SIMPLE_PROP(pinctrl5, "pinctrl-5", NULL) DEFINE_SIMPLE_PROP(pinctrl6, "pinctrl-6", NULL) DEFINE_SIMPLE_PROP(pinctrl7, "pinctrl-7", NULL) DEFINE_SIMPLE_PROP(pinctrl8, "pinctrl-8", NULL) -DEFINE_SIMPLE_PROP(remote_endpoint, "remote-endpoint", NULL) DEFINE_SIMPLE_PROP(pwms, "pwms", "#pwm-cells") DEFINE_SIMPLE_PROP(resets, "resets", "#reset-cells") DEFINE_SIMPLE_PROP(leds, "leds", NULL) @@ -1297,6 +1296,17 @@ static struct device_node *parse_interrupts(struct device_node *np, return of_irq_parse_one(np, index, &sup_args) ? NULL : sup_args.np; }
+static struct device_node *parse_remote_endpoint(struct device_node *np, + const char *prop_name, + int index) +{ + /* Return NULL for index > 0 to signify end of remote-endpoints. */ + if (!index || strcmp(prop_name, "remote-endpoint")) + return NULL; + + return of_graph_get_remote_port_parent(np); +} + static const struct supplier_bindings of_supplier_bindings[] = { { .parse_prop = parse_clocks, }, { .parse_prop = parse_interconnects, },
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lukas Bulwahn lukas.bulwahn@gmail.com
[ Upstream commit e5aa6d51a2ef8c7ef7e3fe76bebe530fb68e7f08 ]
Commit 73cfbfa9caea ("ALSA: hda/cs35l56: Add driver for Cirrus Logic CS35L56 amplifier") adds configs SND_HDA_SCODEC_CS35L56_{I2C,SPI}, which selects the non-existing config CS_DSP. Note the renaming in commit d7cfdf17cb9d ("firmware: cs_dsp: Rename KConfig symbol CS_DSP -> FW_CS_DSP"), though.
Select the intended config FW_CS_DSP.
This broken select command probably was not noticed as the configs also select SND_HDA_CS_DSP_CONTROLS and this then selects FW_CS_DSP. So, the select FW_CS_DSP could actually be dropped, but we will keep this redundancy in place as the author originally also intended to have this redundancy of selects in place.
Fixes: 73cfbfa9caea ("ALSA: hda/cs35l56: Add driver for Cirrus Logic CS35L56 amplifier") Signed-off-by: Lukas Bulwahn lukas.bulwahn@gmail.com Reviewed-by: Simon Trimmer simont@opensource.cirrus.com Link: https://lore.kernel.org/r/20240209082044.3981-1-lukas.bulwahn@gmail.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/pci/hda/Kconfig | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/sound/pci/hda/Kconfig b/sound/pci/hda/Kconfig index 21a90b3c4cc7..8e0ff70fb610 100644 --- a/sound/pci/hda/Kconfig +++ b/sound/pci/hda/Kconfig @@ -156,7 +156,7 @@ config SND_HDA_SCODEC_CS35L56_I2C depends on I2C depends on ACPI || COMPILE_TEST depends on SND_SOC - select CS_DSP + select FW_CS_DSP select SND_HDA_GENERIC select SND_SOC_CS35L56_SHARED select SND_HDA_SCODEC_CS35L56 @@ -171,7 +171,7 @@ config SND_HDA_SCODEC_CS35L56_SPI depends on SPI_MASTER depends on ACPI || COMPILE_TEST depends on SND_SOC - select CS_DSP + select FW_CS_DSP select SND_HDA_GENERIC select SND_SOC_CS35L56_SHARED select SND_HDA_SCODEC_CS35L56
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hojin Nam hj96.nam@samsung.com
[ Upstream commit 719da04f2d1285922abca72b074fb6fa75d464ea ]
S2M NDR BI-ConflictAck opcode is described as 4 in the CXL r3.0 3.3.9 Table 3.43. However, it is defined as 3 in macro definition.
Fixes: 5d7107c72796 ("perf: CXL Performance Monitoring Unit driver") Signed-off-by: Hojin Nam hj96.nam@samsung.com Reviewed-by: Jonathan Cameron Jonathan.Cameron@huawei.com Link: https://lore.kernel.org/r/20240208013415epcms2p2904187c8a863f4d0d2adc980fb91... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/perf/cxl_pmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/perf/cxl_pmu.c b/drivers/perf/cxl_pmu.c index 365d964b0f6a..bc0d414a6aff 100644 --- a/drivers/perf/cxl_pmu.c +++ b/drivers/perf/cxl_pmu.c @@ -419,7 +419,7 @@ static struct attribute *cxl_pmu_event_attrs[] = { CXL_PMU_EVENT_CXL_ATTR(s2m_ndr_cmp, CXL_PMU_GID_S2M_NDR, BIT(0)), CXL_PMU_EVENT_CXL_ATTR(s2m_ndr_cmps, CXL_PMU_GID_S2M_NDR, BIT(1)), CXL_PMU_EVENT_CXL_ATTR(s2m_ndr_cmpe, CXL_PMU_GID_S2M_NDR, BIT(2)), - CXL_PMU_EVENT_CXL_ATTR(s2m_ndr_biconflictack, CXL_PMU_GID_S2M_NDR, BIT(3)), + CXL_PMU_EVENT_CXL_ATTR(s2m_ndr_biconflictack, CXL_PMU_GID_S2M_NDR, BIT(4)), /* CXL rev 3.0 Table 3-46 S2M DRS opcodes */ CXL_PMU_EVENT_CXL_ATTR(s2m_drs_memdata, CXL_PMU_GID_S2M_DRS, BIT(0)), CXL_PMU_EVENT_CXL_ATTR(s2m_drs_memdatanxm, CXL_PMU_GID_S2M_DRS, BIT(1)),
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hangbin Liu liuhangbin@gmail.com
[ Upstream commit 4624a78c18c62da815f3253966b7a87995f77e1b ]
There is no h1 h2 actually. Remove it. Here is the test result after conversion.
]# ./test_bridge_backup_port.sh
Backup port ----------- TEST: Forwarding out of swp1 [ OK ] TEST: No forwarding out of vx0 [ OK ] TEST: swp1 carrier off [ OK ] TEST: No forwarding out of swp1 [ OK ] ... Backup nexthop ID - ping ------------------------ TEST: Ping with backup nexthop ID [ OK ] TEST: Ping after disabling backup nexthop ID [ OK ]
Backup nexthop ID - torture test -------------------------------- TEST: Torture test [ OK ]
Tests passed: 83 Tests failed: 0
Acked-by: David Ahern dsahern@kernel.org Signed-off-by: Hangbin Liu liuhangbin@gmail.com Reviewed-by: Ido Schimmel idosch@nvidia.com Tested-by: Ido Schimmel idosch@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: 38ee0cb2a2e2 ("selftests: net: Fix bridge backup port test flakiness") Signed-off-by: Sasha Levin sashal@kernel.org --- .../selftests/net/test_bridge_backup_port.sh | 371 +++++++++--------- 1 file changed, 182 insertions(+), 189 deletions(-)
diff --git a/tools/testing/selftests/net/test_bridge_backup_port.sh b/tools/testing/selftests/net/test_bridge_backup_port.sh index 112cfd8a10ad..70a7d87ba2d2 100755 --- a/tools/testing/selftests/net/test_bridge_backup_port.sh +++ b/tools/testing/selftests/net/test_bridge_backup_port.sh @@ -35,9 +35,8 @@ # | sw1 | | sw2 | # +------------------------------------+ +------------------------------------+
+source lib.sh ret=0 -# Kselftest framework requirement - SKIP code is 4. -ksft_skip=4
# All tests in this script. Can be overridden with -t option. TESTS=" @@ -132,9 +131,6 @@ setup_topo_ns() { local ns=$1; shift
- ip netns add $ns - ip -n $ns link set dev lo up - ip netns exec $ns sysctl -qw net.ipv6.conf.all.keep_addr_on_down=1 ip netns exec $ns sysctl -qw net.ipv6.conf.default.ignore_routes_with_linkdown=1 ip netns exec $ns sysctl -qw net.ipv6.conf.all.accept_dad=0 @@ -145,13 +141,14 @@ setup_topo() { local ns
- for ns in sw1 sw2; do + setup_ns sw1 sw2 + for ns in $sw1 $sw2; do setup_topo_ns $ns done
ip link add name veth0 type veth peer name veth1 - ip link set dev veth0 netns sw1 name veth0 - ip link set dev veth1 netns sw2 name veth0 + ip link set dev veth0 netns $sw1 name veth0 + ip link set dev veth1 netns $sw2 name veth0 }
setup_sw_common() @@ -190,7 +187,7 @@ setup_sw_common()
setup_sw1() { - local ns=sw1 + local ns=$sw1 local local_addr=192.0.2.33 local remote_addr=192.0.2.34 local veth_addr=192.0.2.49 @@ -203,7 +200,7 @@ setup_sw1()
setup_sw2() { - local ns=sw2 + local ns=$sw2 local local_addr=192.0.2.34 local remote_addr=192.0.2.33 local veth_addr=192.0.2.50 @@ -229,11 +226,7 @@ setup()
cleanup() { - local ns - - for ns in h1 h2 sw1 sw2; do - ip netns del $ns &> /dev/null - done + cleanup_ns $sw1 $sw2 }
################################################################################ @@ -248,85 +241,85 @@ backup_port() echo "Backup port" echo "-----------"
- run_cmd "tc -n sw1 qdisc replace dev swp1 clsact" - run_cmd "tc -n sw1 filter replace dev swp1 egress pref 1 handle 101 proto ip flower src_mac $smac dst_mac $dmac action pass" + run_cmd "tc -n $sw1 qdisc replace dev swp1 clsact" + run_cmd "tc -n $sw1 filter replace dev swp1 egress pref 1 handle 101 proto ip flower src_mac $smac dst_mac $dmac action pass"
- run_cmd "tc -n sw1 qdisc replace dev vx0 clsact" - run_cmd "tc -n sw1 filter replace dev vx0 egress pref 1 handle 101 proto ip flower src_mac $smac dst_mac $dmac action pass" + run_cmd "tc -n $sw1 qdisc replace dev vx0 clsact" + run_cmd "tc -n $sw1 filter replace dev vx0 egress pref 1 handle 101 proto ip flower src_mac $smac dst_mac $dmac action pass"
- run_cmd "bridge -n sw1 fdb replace $dmac dev swp1 master static vlan 10" + run_cmd "bridge -n $sw1 fdb replace $dmac dev swp1 master static vlan 10"
# Initial state - check that packets are forwarded out of swp1 when it # has a carrier and not forwarded out of any port when it does not have # a carrier. - run_cmd "ip netns exec sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" - tc_check_packets sw1 "dev swp1 egress" 101 1 + run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" + tc_check_packets $sw1 "dev swp1 egress" 101 1 log_test $? 0 "Forwarding out of swp1" - tc_check_packets sw1 "dev vx0 egress" 101 0 + tc_check_packets $sw1 "dev vx0 egress" 101 0 log_test $? 0 "No forwarding out of vx0"
- run_cmd "ip -n sw1 link set dev swp1 carrier off" + run_cmd "ip -n $sw1 link set dev swp1 carrier off" log_test $? 0 "swp1 carrier off"
- run_cmd "ip netns exec sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" - tc_check_packets sw1 "dev swp1 egress" 101 1 + run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" + tc_check_packets $sw1 "dev swp1 egress" 101 1 log_test $? 0 "No forwarding out of swp1" - tc_check_packets sw1 "dev vx0 egress" 101 0 + tc_check_packets $sw1 "dev vx0 egress" 101 0 log_test $? 0 "No forwarding out of vx0"
- run_cmd "ip -n sw1 link set dev swp1 carrier on" + run_cmd "ip -n $sw1 link set dev swp1 carrier on" log_test $? 0 "swp1 carrier on"
# Configure vx0 as the backup port of swp1 and check that packets are # forwarded out of swp1 when it has a carrier and out of vx0 when swp1 # does not have a carrier. - run_cmd "bridge -n sw1 link set dev swp1 backup_port vx0" - run_cmd "bridge -n sw1 -d link show dev swp1 | grep "backup_port vx0"" + run_cmd "bridge -n $sw1 link set dev swp1 backup_port vx0" + run_cmd "bridge -n $sw1 -d link show dev swp1 | grep "backup_port vx0"" log_test $? 0 "vx0 configured as backup port of swp1"
- run_cmd "ip netns exec sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" - tc_check_packets sw1 "dev swp1 egress" 101 2 + run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" + tc_check_packets $sw1 "dev swp1 egress" 101 2 log_test $? 0 "Forwarding out of swp1" - tc_check_packets sw1 "dev vx0 egress" 101 0 + tc_check_packets $sw1 "dev vx0 egress" 101 0 log_test $? 0 "No forwarding out of vx0"
- run_cmd "ip -n sw1 link set dev swp1 carrier off" + run_cmd "ip -n $sw1 link set dev swp1 carrier off" log_test $? 0 "swp1 carrier off"
- run_cmd "ip netns exec sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" - tc_check_packets sw1 "dev swp1 egress" 101 2 + run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" + tc_check_packets $sw1 "dev swp1 egress" 101 2 log_test $? 0 "No forwarding out of swp1" - tc_check_packets sw1 "dev vx0 egress" 101 1 + tc_check_packets $sw1 "dev vx0 egress" 101 1 log_test $? 0 "Forwarding out of vx0"
- run_cmd "ip -n sw1 link set dev swp1 carrier on" + run_cmd "ip -n $sw1 link set dev swp1 carrier on" log_test $? 0 "swp1 carrier on"
- run_cmd "ip netns exec sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" - tc_check_packets sw1 "dev swp1 egress" 101 3 + run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" + tc_check_packets $sw1 "dev swp1 egress" 101 3 log_test $? 0 "Forwarding out of swp1" - tc_check_packets sw1 "dev vx0 egress" 101 1 + tc_check_packets $sw1 "dev vx0 egress" 101 1 log_test $? 0 "No forwarding out of vx0"
# Remove vx0 as the backup port of swp1 and check that packets are no # longer forwarded out of vx0 when swp1 does not have a carrier. - run_cmd "bridge -n sw1 link set dev swp1 nobackup_port" - run_cmd "bridge -n sw1 -d link show dev swp1 | grep "backup_port vx0"" + run_cmd "bridge -n $sw1 link set dev swp1 nobackup_port" + run_cmd "bridge -n $sw1 -d link show dev swp1 | grep "backup_port vx0"" log_test $? 1 "vx0 not configured as backup port of swp1"
- run_cmd "ip netns exec sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" - tc_check_packets sw1 "dev swp1 egress" 101 4 + run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" + tc_check_packets $sw1 "dev swp1 egress" 101 4 log_test $? 0 "Forwarding out of swp1" - tc_check_packets sw1 "dev vx0 egress" 101 1 + tc_check_packets $sw1 "dev vx0 egress" 101 1 log_test $? 0 "No forwarding out of vx0"
- run_cmd "ip -n sw1 link set dev swp1 carrier off" + run_cmd "ip -n $sw1 link set dev swp1 carrier off" log_test $? 0 "swp1 carrier off"
- run_cmd "ip netns exec sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" - tc_check_packets sw1 "dev swp1 egress" 101 4 + run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" + tc_check_packets $sw1 "dev swp1 egress" 101 4 log_test $? 0 "No forwarding out of swp1" - tc_check_packets sw1 "dev vx0 egress" 101 1 + tc_check_packets $sw1 "dev vx0 egress" 101 1 log_test $? 0 "No forwarding out of vx0" }
@@ -339,125 +332,125 @@ backup_nhid() echo "Backup nexthop ID" echo "-----------------"
- run_cmd "tc -n sw1 qdisc replace dev swp1 clsact" - run_cmd "tc -n sw1 filter replace dev swp1 egress pref 1 handle 101 proto ip flower src_mac $smac dst_mac $dmac action pass" + run_cmd "tc -n $sw1 qdisc replace dev swp1 clsact" + run_cmd "tc -n $sw1 filter replace dev swp1 egress pref 1 handle 101 proto ip flower src_mac $smac dst_mac $dmac action pass"
- run_cmd "tc -n sw1 qdisc replace dev vx0 clsact" - run_cmd "tc -n sw1 filter replace dev vx0 egress pref 1 handle 101 proto ip flower src_mac $smac dst_mac $dmac action pass" + run_cmd "tc -n $sw1 qdisc replace dev vx0 clsact" + run_cmd "tc -n $sw1 filter replace dev vx0 egress pref 1 handle 101 proto ip flower src_mac $smac dst_mac $dmac action pass"
- run_cmd "ip -n sw1 nexthop replace id 1 via 192.0.2.34 fdb" - run_cmd "ip -n sw1 nexthop replace id 2 via 192.0.2.34 fdb" - run_cmd "ip -n sw1 nexthop replace id 10 group 1/2 fdb" + run_cmd "ip -n $sw1 nexthop replace id 1 via 192.0.2.34 fdb" + run_cmd "ip -n $sw1 nexthop replace id 2 via 192.0.2.34 fdb" + run_cmd "ip -n $sw1 nexthop replace id 10 group 1/2 fdb"
- run_cmd "bridge -n sw1 fdb replace $dmac dev swp1 master static vlan 10" - run_cmd "bridge -n sw1 fdb replace $dmac dev vx0 self static dst 192.0.2.36 src_vni 10010" + run_cmd "bridge -n $sw1 fdb replace $dmac dev swp1 master static vlan 10" + run_cmd "bridge -n $sw1 fdb replace $dmac dev vx0 self static dst 192.0.2.36 src_vni 10010"
- run_cmd "ip -n sw2 address replace 192.0.2.36/32 dev lo" + run_cmd "ip -n $sw2 address replace 192.0.2.36/32 dev lo"
# The first filter matches on packets forwarded using the backup # nexthop ID and the second filter matches on packets forwarded using a # regular VXLAN FDB entry. - run_cmd "tc -n sw2 qdisc replace dev vx0 clsact" - run_cmd "tc -n sw2 filter replace dev vx0 ingress pref 1 handle 101 proto ip flower src_mac $smac dst_mac $dmac enc_key_id 10010 enc_dst_ip 192.0.2.34 action pass" - run_cmd "tc -n sw2 filter replace dev vx0 ingress pref 1 handle 102 proto ip flower src_mac $smac dst_mac $dmac enc_key_id 10010 enc_dst_ip 192.0.2.36 action pass" + run_cmd "tc -n $sw2 qdisc replace dev vx0 clsact" + run_cmd "tc -n $sw2 filter replace dev vx0 ingress pref 1 handle 101 proto ip flower src_mac $smac dst_mac $dmac enc_key_id 10010 enc_dst_ip 192.0.2.34 action pass" + run_cmd "tc -n $sw2 filter replace dev vx0 ingress pref 1 handle 102 proto ip flower src_mac $smac dst_mac $dmac enc_key_id 10010 enc_dst_ip 192.0.2.36 action pass"
# Configure vx0 as the backup port of swp1 and check that packets are # forwarded out of swp1 when it has a carrier and out of vx0 when swp1 # does not have a carrier. When packets are forwarded out of vx0, check # that they are forwarded by the VXLAN FDB entry. - run_cmd "bridge -n sw1 link set dev swp1 backup_port vx0" - run_cmd "bridge -n sw1 -d link show dev swp1 | grep "backup_port vx0"" + run_cmd "bridge -n $sw1 link set dev swp1 backup_port vx0" + run_cmd "bridge -n $sw1 -d link show dev swp1 | grep "backup_port vx0"" log_test $? 0 "vx0 configured as backup port of swp1"
- run_cmd "ip netns exec sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" - tc_check_packets sw1 "dev swp1 egress" 101 1 + run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" + tc_check_packets $sw1 "dev swp1 egress" 101 1 log_test $? 0 "Forwarding out of swp1" - tc_check_packets sw1 "dev vx0 egress" 101 0 + tc_check_packets $sw1 "dev vx0 egress" 101 0 log_test $? 0 "No forwarding out of vx0"
- run_cmd "ip -n sw1 link set dev swp1 carrier off" + run_cmd "ip -n $sw1 link set dev swp1 carrier off" log_test $? 0 "swp1 carrier off"
- run_cmd "ip netns exec sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" - tc_check_packets sw1 "dev swp1 egress" 101 1 + run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" + tc_check_packets $sw1 "dev swp1 egress" 101 1 log_test $? 0 "No forwarding out of swp1" - tc_check_packets sw1 "dev vx0 egress" 101 1 + tc_check_packets $sw1 "dev vx0 egress" 101 1 log_test $? 0 "Forwarding out of vx0" - tc_check_packets sw2 "dev vx0 ingress" 101 0 + tc_check_packets $sw2 "dev vx0 ingress" 101 0 log_test $? 0 "No forwarding using backup nexthop ID" - tc_check_packets sw2 "dev vx0 ingress" 102 1 + tc_check_packets $sw2 "dev vx0 ingress" 102 1 log_test $? 0 "Forwarding using VXLAN FDB entry"
- run_cmd "ip -n sw1 link set dev swp1 carrier on" + run_cmd "ip -n $sw1 link set dev swp1 carrier on" log_test $? 0 "swp1 carrier on"
# Configure nexthop ID 10 as the backup nexthop ID of swp1 and check # that when packets are forwarded out of vx0, they are forwarded using # the backup nexthop ID. - run_cmd "bridge -n sw1 link set dev swp1 backup_nhid 10" - run_cmd "bridge -n sw1 -d link show dev swp1 | grep "backup_nhid 10"" + run_cmd "bridge -n $sw1 link set dev swp1 backup_nhid 10" + run_cmd "bridge -n $sw1 -d link show dev swp1 | grep "backup_nhid 10"" log_test $? 0 "nexthop ID 10 configured as backup nexthop ID of swp1"
- run_cmd "ip netns exec sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" - tc_check_packets sw1 "dev swp1 egress" 101 2 + run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" + tc_check_packets $sw1 "dev swp1 egress" 101 2 log_test $? 0 "Forwarding out of swp1" - tc_check_packets sw1 "dev vx0 egress" 101 1 + tc_check_packets $sw1 "dev vx0 egress" 101 1 log_test $? 0 "No forwarding out of vx0"
- run_cmd "ip -n sw1 link set dev swp1 carrier off" + run_cmd "ip -n $sw1 link set dev swp1 carrier off" log_test $? 0 "swp1 carrier off"
- run_cmd "ip netns exec sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" - tc_check_packets sw1 "dev swp1 egress" 101 2 + run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" + tc_check_packets $sw1 "dev swp1 egress" 101 2 log_test $? 0 "No forwarding out of swp1" - tc_check_packets sw1 "dev vx0 egress" 101 2 + tc_check_packets $sw1 "dev vx0 egress" 101 2 log_test $? 0 "Forwarding out of vx0" - tc_check_packets sw2 "dev vx0 ingress" 101 1 + tc_check_packets $sw2 "dev vx0 ingress" 101 1 log_test $? 0 "Forwarding using backup nexthop ID" - tc_check_packets sw2 "dev vx0 ingress" 102 1 + tc_check_packets $sw2 "dev vx0 ingress" 102 1 log_test $? 0 "No forwarding using VXLAN FDB entry"
- run_cmd "ip -n sw1 link set dev swp1 carrier on" + run_cmd "ip -n $sw1 link set dev swp1 carrier on" log_test $? 0 "swp1 carrier on"
- run_cmd "ip netns exec sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" - tc_check_packets sw1 "dev swp1 egress" 101 3 + run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" + tc_check_packets $sw1 "dev swp1 egress" 101 3 log_test $? 0 "Forwarding out of swp1" - tc_check_packets sw1 "dev vx0 egress" 101 2 + tc_check_packets $sw1 "dev vx0 egress" 101 2 log_test $? 0 "No forwarding out of vx0" - tc_check_packets sw2 "dev vx0 ingress" 101 1 + tc_check_packets $sw2 "dev vx0 ingress" 101 1 log_test $? 0 "No forwarding using backup nexthop ID" - tc_check_packets sw2 "dev vx0 ingress" 102 1 + tc_check_packets $sw2 "dev vx0 ingress" 102 1 log_test $? 0 "No forwarding using VXLAN FDB entry"
# Reset the backup nexthop ID to 0 and check that packets are no longer # forwarded using the backup nexthop ID when swp1 does not have a # carrier and are instead forwarded by the VXLAN FDB. - run_cmd "bridge -n sw1 link set dev swp1 backup_nhid 0" - run_cmd "bridge -n sw1 -d link show dev swp1 | grep "backup_nhid"" + run_cmd "bridge -n $sw1 link set dev swp1 backup_nhid 0" + run_cmd "bridge -n $sw1 -d link show dev swp1 | grep "backup_nhid"" log_test $? 1 "No backup nexthop ID configured for swp1"
- run_cmd "ip netns exec sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" - tc_check_packets sw1 "dev swp1 egress" 101 4 + run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" + tc_check_packets $sw1 "dev swp1 egress" 101 4 log_test $? 0 "Forwarding out of swp1" - tc_check_packets sw1 "dev vx0 egress" 101 2 + tc_check_packets $sw1 "dev vx0 egress" 101 2 log_test $? 0 "No forwarding out of vx0" - tc_check_packets sw2 "dev vx0 ingress" 101 1 + tc_check_packets $sw2 "dev vx0 ingress" 101 1 log_test $? 0 "No forwarding using backup nexthop ID" - tc_check_packets sw2 "dev vx0 ingress" 102 1 + tc_check_packets $sw2 "dev vx0 ingress" 102 1 log_test $? 0 "No forwarding using VXLAN FDB entry"
- run_cmd "ip -n sw1 link set dev swp1 carrier off" + run_cmd "ip -n $sw1 link set dev swp1 carrier off" log_test $? 0 "swp1 carrier off"
- run_cmd "ip netns exec sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" - tc_check_packets sw1 "dev swp1 egress" 101 4 + run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" + tc_check_packets $sw1 "dev swp1 egress" 101 4 log_test $? 0 "No forwarding out of swp1" - tc_check_packets sw1 "dev vx0 egress" 101 3 + tc_check_packets $sw1 "dev vx0 egress" 101 3 log_test $? 0 "Forwarding out of vx0" - tc_check_packets sw2 "dev vx0 ingress" 101 1 + tc_check_packets $sw2 "dev vx0 ingress" 101 1 log_test $? 0 "No forwarding using backup nexthop ID" - tc_check_packets sw2 "dev vx0 ingress" 102 2 + tc_check_packets $sw2 "dev vx0 ingress" 102 2 log_test $? 0 "Forwarding using VXLAN FDB entry" }
@@ -475,109 +468,109 @@ backup_nhid_invalid() # is forwarded out of the VXLAN port, but dropped by the VXLAN driver # and does not crash the host.
- run_cmd "tc -n sw1 qdisc replace dev swp1 clsact" - run_cmd "tc -n sw1 filter replace dev swp1 egress pref 1 handle 101 proto ip flower src_mac $smac dst_mac $dmac action pass" + run_cmd "tc -n $sw1 qdisc replace dev swp1 clsact" + run_cmd "tc -n $sw1 filter replace dev swp1 egress pref 1 handle 101 proto ip flower src_mac $smac dst_mac $dmac action pass"
- run_cmd "tc -n sw1 qdisc replace dev vx0 clsact" - run_cmd "tc -n sw1 filter replace dev vx0 egress pref 1 handle 101 proto ip flower src_mac $smac dst_mac $dmac action pass" + run_cmd "tc -n $sw1 qdisc replace dev vx0 clsact" + run_cmd "tc -n $sw1 filter replace dev vx0 egress pref 1 handle 101 proto ip flower src_mac $smac dst_mac $dmac action pass" # Drop all other Tx traffic to avoid changes to Tx drop counter. - run_cmd "tc -n sw1 filter replace dev vx0 egress pref 2 handle 102 proto all matchall action drop" + run_cmd "tc -n $sw1 filter replace dev vx0 egress pref 2 handle 102 proto all matchall action drop"
- tx_drop=$(ip -n sw1 -s -j link show dev vx0 | jq '.[]["stats64"]["tx"]["dropped"]') + tx_drop=$(ip -n $sw1 -s -j link show dev vx0 | jq '.[]["stats64"]["tx"]["dropped"]')
- run_cmd "ip -n sw1 nexthop replace id 1 via 192.0.2.34 fdb" - run_cmd "ip -n sw1 nexthop replace id 2 via 192.0.2.34 fdb" - run_cmd "ip -n sw1 nexthop replace id 10 group 1/2 fdb" + run_cmd "ip -n $sw1 nexthop replace id 1 via 192.0.2.34 fdb" + run_cmd "ip -n $sw1 nexthop replace id 2 via 192.0.2.34 fdb" + run_cmd "ip -n $sw1 nexthop replace id 10 group 1/2 fdb"
- run_cmd "bridge -n sw1 fdb replace $dmac dev swp1 master static vlan 10" + run_cmd "bridge -n $sw1 fdb replace $dmac dev swp1 master static vlan 10"
- run_cmd "tc -n sw2 qdisc replace dev vx0 clsact" - run_cmd "tc -n sw2 filter replace dev vx0 ingress pref 1 handle 101 proto ip flower src_mac $smac dst_mac $dmac enc_key_id 10010 enc_dst_ip 192.0.2.34 action pass" + run_cmd "tc -n $sw2 qdisc replace dev vx0 clsact" + run_cmd "tc -n $sw2 filter replace dev vx0 ingress pref 1 handle 101 proto ip flower src_mac $smac dst_mac $dmac enc_key_id 10010 enc_dst_ip 192.0.2.34 action pass"
# First, check that redirection works. - run_cmd "bridge -n sw1 link set dev swp1 backup_port vx0" - run_cmd "bridge -n sw1 -d link show dev swp1 | grep "backup_port vx0"" + run_cmd "bridge -n $sw1 link set dev swp1 backup_port vx0" + run_cmd "bridge -n $sw1 -d link show dev swp1 | grep "backup_port vx0"" log_test $? 0 "vx0 configured as backup port of swp1"
- run_cmd "bridge -n sw1 link set dev swp1 backup_nhid 10" - run_cmd "bridge -n sw1 -d link show dev swp1 | grep "backup_nhid 10"" + run_cmd "bridge -n $sw1 link set dev swp1 backup_nhid 10" + run_cmd "bridge -n $sw1 -d link show dev swp1 | grep "backup_nhid 10"" log_test $? 0 "Valid nexthop as backup nexthop"
- run_cmd "ip -n sw1 link set dev swp1 carrier off" + run_cmd "ip -n $sw1 link set dev swp1 carrier off" log_test $? 0 "swp1 carrier off"
- run_cmd "ip netns exec sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" - tc_check_packets sw1 "dev swp1 egress" 101 0 + run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" + tc_check_packets $sw1 "dev swp1 egress" 101 0 log_test $? 0 "No forwarding out of swp1" - tc_check_packets sw1 "dev vx0 egress" 101 1 + tc_check_packets $sw1 "dev vx0 egress" 101 1 log_test $? 0 "Forwarding out of vx0" - tc_check_packets sw2 "dev vx0 ingress" 101 1 + tc_check_packets $sw2 "dev vx0 ingress" 101 1 log_test $? 0 "Forwarding using backup nexthop ID" - run_cmd "ip -n sw1 -s -j link show dev vx0 | jq -e '.[]["stats64"]["tx"]["dropped"] == $tx_drop'" + run_cmd "ip -n $sw1 -s -j link show dev vx0 | jq -e '.[]["stats64"]["tx"]["dropped"] == $tx_drop'" log_test $? 0 "No Tx drop increase"
# Use a non-existent nexthop ID. - run_cmd "bridge -n sw1 link set dev swp1 backup_nhid 20" - run_cmd "bridge -n sw1 -d link show dev swp1 | grep "backup_nhid 20"" + run_cmd "bridge -n $sw1 link set dev swp1 backup_nhid 20" + run_cmd "bridge -n $sw1 -d link show dev swp1 | grep "backup_nhid 20"" log_test $? 0 "Non-existent nexthop as backup nexthop"
- run_cmd "ip netns exec sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" - tc_check_packets sw1 "dev swp1 egress" 101 0 + run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" + tc_check_packets $sw1 "dev swp1 egress" 101 0 log_test $? 0 "No forwarding out of swp1" - tc_check_packets sw1 "dev vx0 egress" 101 2 + tc_check_packets $sw1 "dev vx0 egress" 101 2 log_test $? 0 "Forwarding out of vx0" - tc_check_packets sw2 "dev vx0 ingress" 101 1 + tc_check_packets $sw2 "dev vx0 ingress" 101 1 log_test $? 0 "No forwarding using backup nexthop ID" - run_cmd "ip -n sw1 -s -j link show dev vx0 | jq -e '.[]["stats64"]["tx"]["dropped"] == $((tx_drop + 1))'" + run_cmd "ip -n $sw1 -s -j link show dev vx0 | jq -e '.[]["stats64"]["tx"]["dropped"] == $((tx_drop + 1))'" log_test $? 0 "Tx drop increased"
# Use a blckhole nexthop. - run_cmd "ip -n sw1 nexthop replace id 30 blackhole" - run_cmd "bridge -n sw1 link set dev swp1 backup_nhid 30" - run_cmd "bridge -n sw1 -d link show dev swp1 | grep "backup_nhid 30"" + run_cmd "ip -n $sw1 nexthop replace id 30 blackhole" + run_cmd "bridge -n $sw1 link set dev swp1 backup_nhid 30" + run_cmd "bridge -n $sw1 -d link show dev swp1 | grep "backup_nhid 30"" log_test $? 0 "Blackhole nexthop as backup nexthop"
- run_cmd "ip netns exec sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" - tc_check_packets sw1 "dev swp1 egress" 101 0 + run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" + tc_check_packets $sw1 "dev swp1 egress" 101 0 log_test $? 0 "No forwarding out of swp1" - tc_check_packets sw1 "dev vx0 egress" 101 3 + tc_check_packets $sw1 "dev vx0 egress" 101 3 log_test $? 0 "Forwarding out of vx0" - tc_check_packets sw2 "dev vx0 ingress" 101 1 + tc_check_packets $sw2 "dev vx0 ingress" 101 1 log_test $? 0 "No forwarding using backup nexthop ID" - run_cmd "ip -n sw1 -s -j link show dev vx0 | jq -e '.[]["stats64"]["tx"]["dropped"] == $((tx_drop + 2))'" + run_cmd "ip -n $sw1 -s -j link show dev vx0 | jq -e '.[]["stats64"]["tx"]["dropped"] == $((tx_drop + 2))'" log_test $? 0 "Tx drop increased"
# Non-group FDB nexthop. - run_cmd "bridge -n sw1 link set dev swp1 backup_nhid 1" - run_cmd "bridge -n sw1 -d link show dev swp1 | grep "backup_nhid 1"" + run_cmd "bridge -n $sw1 link set dev swp1 backup_nhid 1" + run_cmd "bridge -n $sw1 -d link show dev swp1 | grep "backup_nhid 1"" log_test $? 0 "Non-group FDB nexthop as backup nexthop"
- run_cmd "ip netns exec sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" - tc_check_packets sw1 "dev swp1 egress" 101 0 + run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" + tc_check_packets $sw1 "dev swp1 egress" 101 0 log_test $? 0 "No forwarding out of swp1" - tc_check_packets sw1 "dev vx0 egress" 101 4 + tc_check_packets $sw1 "dev vx0 egress" 101 4 log_test $? 0 "Forwarding out of vx0" - tc_check_packets sw2 "dev vx0 ingress" 101 1 + tc_check_packets $sw2 "dev vx0 ingress" 101 1 log_test $? 0 "No forwarding using backup nexthop ID" - run_cmd "ip -n sw1 -s -j link show dev vx0 | jq -e '.[]["stats64"]["tx"]["dropped"] == $((tx_drop + 3))'" + run_cmd "ip -n $sw1 -s -j link show dev vx0 | jq -e '.[]["stats64"]["tx"]["dropped"] == $((tx_drop + 3))'" log_test $? 0 "Tx drop increased"
# IPv6 address family nexthop. - run_cmd "ip -n sw1 nexthop replace id 100 via 2001:db8:100::1 fdb" - run_cmd "ip -n sw1 nexthop replace id 200 via 2001:db8:100::1 fdb" - run_cmd "ip -n sw1 nexthop replace id 300 group 100/200 fdb" - run_cmd "bridge -n sw1 link set dev swp1 backup_nhid 300" - run_cmd "bridge -n sw1 -d link show dev swp1 | grep "backup_nhid 300"" + run_cmd "ip -n $sw1 nexthop replace id 100 via 2001:db8:100::1 fdb" + run_cmd "ip -n $sw1 nexthop replace id 200 via 2001:db8:100::1 fdb" + run_cmd "ip -n $sw1 nexthop replace id 300 group 100/200 fdb" + run_cmd "bridge -n $sw1 link set dev swp1 backup_nhid 300" + run_cmd "bridge -n $sw1 -d link show dev swp1 | grep "backup_nhid 300"" log_test $? 0 "IPv6 address family nexthop as backup nexthop"
- run_cmd "ip netns exec sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" - tc_check_packets sw1 "dev swp1 egress" 101 0 + run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" + tc_check_packets $sw1 "dev swp1 egress" 101 0 log_test $? 0 "No forwarding out of swp1" - tc_check_packets sw1 "dev vx0 egress" 101 5 + tc_check_packets $sw1 "dev vx0 egress" 101 5 log_test $? 0 "Forwarding out of vx0" - tc_check_packets sw2 "dev vx0 ingress" 101 1 + tc_check_packets $sw2 "dev vx0 ingress" 101 1 log_test $? 0 "No forwarding using backup nexthop ID" - run_cmd "ip -n sw1 -s -j link show dev vx0 | jq -e '.[]["stats64"]["tx"]["dropped"] == $((tx_drop + 4))'" + run_cmd "ip -n $sw1 -s -j link show dev vx0 | jq -e '.[]["stats64"]["tx"]["dropped"] == $((tx_drop + 4))'" log_test $? 0 "Tx drop increased" }
@@ -591,44 +584,44 @@ backup_nhid_ping() echo "------------------------"
# Test bidirectional traffic when traffic is redirected in both VTEPs. - sw1_mac=$(ip -n sw1 -j -p link show br0.10 | jq -r '.[]["address"]') - sw2_mac=$(ip -n sw2 -j -p link show br0.10 | jq -r '.[]["address"]') + sw1_mac=$(ip -n $sw1 -j -p link show br0.10 | jq -r '.[]["address"]') + sw2_mac=$(ip -n $sw2 -j -p link show br0.10 | jq -r '.[]["address"]')
- run_cmd "bridge -n sw1 fdb replace $sw2_mac dev swp1 master static vlan 10" - run_cmd "bridge -n sw2 fdb replace $sw1_mac dev swp1 master static vlan 10" + run_cmd "bridge -n $sw1 fdb replace $sw2_mac dev swp1 master static vlan 10" + run_cmd "bridge -n $sw2 fdb replace $sw1_mac dev swp1 master static vlan 10"
- run_cmd "ip -n sw1 neigh replace 192.0.2.66 lladdr $sw2_mac nud perm dev br0.10" - run_cmd "ip -n sw2 neigh replace 192.0.2.65 lladdr $sw1_mac nud perm dev br0.10" + run_cmd "ip -n $sw1 neigh replace 192.0.2.66 lladdr $sw2_mac nud perm dev br0.10" + run_cmd "ip -n $sw2 neigh replace 192.0.2.65 lladdr $sw1_mac nud perm dev br0.10"
- run_cmd "ip -n sw1 nexthop replace id 1 via 192.0.2.34 fdb" - run_cmd "ip -n sw2 nexthop replace id 1 via 192.0.2.33 fdb" - run_cmd "ip -n sw1 nexthop replace id 10 group 1 fdb" - run_cmd "ip -n sw2 nexthop replace id 10 group 1 fdb" + run_cmd "ip -n $sw1 nexthop replace id 1 via 192.0.2.34 fdb" + run_cmd "ip -n $sw2 nexthop replace id 1 via 192.0.2.33 fdb" + run_cmd "ip -n $sw1 nexthop replace id 10 group 1 fdb" + run_cmd "ip -n $sw2 nexthop replace id 10 group 1 fdb"
- run_cmd "bridge -n sw1 link set dev swp1 backup_port vx0" - run_cmd "bridge -n sw2 link set dev swp1 backup_port vx0" - run_cmd "bridge -n sw1 link set dev swp1 backup_nhid 10" - run_cmd "bridge -n sw2 link set dev swp1 backup_nhid 10" + run_cmd "bridge -n $sw1 link set dev swp1 backup_port vx0" + run_cmd "bridge -n $sw2 link set dev swp1 backup_port vx0" + run_cmd "bridge -n $sw1 link set dev swp1 backup_nhid 10" + run_cmd "bridge -n $sw2 link set dev swp1 backup_nhid 10"
- run_cmd "ip -n sw1 link set dev swp1 carrier off" - run_cmd "ip -n sw2 link set dev swp1 carrier off" + run_cmd "ip -n $sw1 link set dev swp1 carrier off" + run_cmd "ip -n $sw2 link set dev swp1 carrier off"
- run_cmd "ip netns exec sw1 ping -i 0.1 -c 10 -w $PING_TIMEOUT 192.0.2.66" + run_cmd "ip netns exec $sw1 ping -i 0.1 -c 10 -w $PING_TIMEOUT 192.0.2.66" log_test $? 0 "Ping with backup nexthop ID"
# Reset the backup nexthop ID to 0 and check that ping fails. - run_cmd "bridge -n sw1 link set dev swp1 backup_nhid 0" - run_cmd "bridge -n sw2 link set dev swp1 backup_nhid 0" + run_cmd "bridge -n $sw1 link set dev swp1 backup_nhid 0" + run_cmd "bridge -n $sw2 link set dev swp1 backup_nhid 0"
- run_cmd "ip netns exec sw1 ping -i 0.1 -c 10 -w $PING_TIMEOUT 192.0.2.66" + run_cmd "ip netns exec $sw1 ping -i 0.1 -c 10 -w $PING_TIMEOUT 192.0.2.66" log_test $? 1 "Ping after disabling backup nexthop ID" }
backup_nhid_add_del_loop() { while true; do - ip -n sw1 nexthop del id 10 - ip -n sw1 nexthop replace id 10 group 1/2 fdb + ip -n $sw1 nexthop del id 10 + ip -n $sw1 nexthop replace id 10 group 1/2 fdb done >/dev/null 2>&1 }
@@ -648,19 +641,19 @@ backup_nhid_torture() # deleting the group. The test is considered successful if nothing # crashed.
- run_cmd "ip -n sw1 nexthop replace id 1 via 192.0.2.34 fdb" - run_cmd "ip -n sw1 nexthop replace id 2 via 192.0.2.34 fdb" - run_cmd "ip -n sw1 nexthop replace id 10 group 1/2 fdb" + run_cmd "ip -n $sw1 nexthop replace id 1 via 192.0.2.34 fdb" + run_cmd "ip -n $sw1 nexthop replace id 2 via 192.0.2.34 fdb" + run_cmd "ip -n $sw1 nexthop replace id 10 group 1/2 fdb"
- run_cmd "bridge -n sw1 fdb replace $dmac dev swp1 master static vlan 10" + run_cmd "bridge -n $sw1 fdb replace $dmac dev swp1 master static vlan 10"
- run_cmd "bridge -n sw1 link set dev swp1 backup_port vx0" - run_cmd "bridge -n sw1 link set dev swp1 backup_nhid 10" - run_cmd "ip -n sw1 link set dev swp1 carrier off" + run_cmd "bridge -n $sw1 link set dev swp1 backup_port vx0" + run_cmd "bridge -n $sw1 link set dev swp1 backup_nhid 10" + run_cmd "ip -n $sw1 link set dev swp1 carrier off"
backup_nhid_add_del_loop & pid1=$! - ip netns exec sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 0 & + ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 0 & pid2=$!
sleep 30
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ido Schimmel idosch@nvidia.com
[ Upstream commit 38ee0cb2a2e2ade077442085638eb181b0562971 ]
The test toggles the carrier of a bridge port in order to test the bridge backup port feature.
Due to the linkwatch delayed work the carrier change is not always reflected fast enough to the bridge driver and packets are not forwarded as the test expects, resulting in failures [1].
Fix by busy waiting on the bridge port state until it changes to the desired state following the carrier change.
[1] # Backup port # ----------- [...] # TEST: swp1 carrier off [ OK ] # TEST: No forwarding out of swp1 [FAIL] [ 641.995910] br0: port 1(swp1) entered disabled state # TEST: No forwarding out of vx0 [ OK ]
Fixes: b408453053fb ("selftests: net: Add bridge backup port and backup nexthop ID test") Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: Petr Machata petrm@nvidia.com Acked-by: Paolo Abeni pabeni@redhat.com Acked-by: Nikolay Aleksandrov razor@blackwall.org Link: https://lore.kernel.org/r/20240208123110.1063930-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../selftests/net/test_bridge_backup_port.sh | 23 +++++++++++++++++++ 1 file changed, 23 insertions(+)
diff --git a/tools/testing/selftests/net/test_bridge_backup_port.sh b/tools/testing/selftests/net/test_bridge_backup_port.sh index 70a7d87ba2d2..1b3f89e2b86e 100755 --- a/tools/testing/selftests/net/test_bridge_backup_port.sh +++ b/tools/testing/selftests/net/test_bridge_backup_port.sh @@ -124,6 +124,16 @@ tc_check_packets() [[ $pkts == $count ]] }
+bridge_link_check() +{ + local ns=$1; shift + local dev=$1; shift + local state=$1; shift + + bridge -n $ns -d -j link show dev $dev | \ + jq -e ".[]["state"] == "$state"" &> /dev/null +} + ################################################################################ # Setup
@@ -259,6 +269,7 @@ backup_port() log_test $? 0 "No forwarding out of vx0"
run_cmd "ip -n $sw1 link set dev swp1 carrier off" + busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled log_test $? 0 "swp1 carrier off"
run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" @@ -268,6 +279,7 @@ backup_port() log_test $? 0 "No forwarding out of vx0"
run_cmd "ip -n $sw1 link set dev swp1 carrier on" + busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 forwarding log_test $? 0 "swp1 carrier on"
# Configure vx0 as the backup port of swp1 and check that packets are @@ -284,6 +296,7 @@ backup_port() log_test $? 0 "No forwarding out of vx0"
run_cmd "ip -n $sw1 link set dev swp1 carrier off" + busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled log_test $? 0 "swp1 carrier off"
run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" @@ -293,6 +306,7 @@ backup_port() log_test $? 0 "Forwarding out of vx0"
run_cmd "ip -n $sw1 link set dev swp1 carrier on" + busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 forwarding log_test $? 0 "swp1 carrier on"
run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" @@ -314,6 +328,7 @@ backup_port() log_test $? 0 "No forwarding out of vx0"
run_cmd "ip -n $sw1 link set dev swp1 carrier off" + busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled log_test $? 0 "swp1 carrier off"
run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" @@ -369,6 +384,7 @@ backup_nhid() log_test $? 0 "No forwarding out of vx0"
run_cmd "ip -n $sw1 link set dev swp1 carrier off" + busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled log_test $? 0 "swp1 carrier off"
run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" @@ -382,6 +398,7 @@ backup_nhid() log_test $? 0 "Forwarding using VXLAN FDB entry"
run_cmd "ip -n $sw1 link set dev swp1 carrier on" + busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 forwarding log_test $? 0 "swp1 carrier on"
# Configure nexthop ID 10 as the backup nexthop ID of swp1 and check @@ -398,6 +415,7 @@ backup_nhid() log_test $? 0 "No forwarding out of vx0"
run_cmd "ip -n $sw1 link set dev swp1 carrier off" + busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled log_test $? 0 "swp1 carrier off"
run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" @@ -411,6 +429,7 @@ backup_nhid() log_test $? 0 "No forwarding using VXLAN FDB entry"
run_cmd "ip -n $sw1 link set dev swp1 carrier on" + busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 forwarding log_test $? 0 "swp1 carrier on"
run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" @@ -441,6 +460,7 @@ backup_nhid() log_test $? 0 "No forwarding using VXLAN FDB entry"
run_cmd "ip -n $sw1 link set dev swp1 carrier off" + busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled log_test $? 0 "swp1 carrier off"
run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" @@ -497,6 +517,7 @@ backup_nhid_invalid() log_test $? 0 "Valid nexthop as backup nexthop"
run_cmd "ip -n $sw1 link set dev swp1 carrier off" + busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled log_test $? 0 "swp1 carrier off"
run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" @@ -604,7 +625,9 @@ backup_nhid_ping() run_cmd "bridge -n $sw2 link set dev swp1 backup_nhid 10"
run_cmd "ip -n $sw1 link set dev swp1 carrier off" + busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled run_cmd "ip -n $sw2 link set dev swp1 carrier off" + busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw2 swp1 disabled
run_cmd "ip netns exec $sw1 ping -i 0.1 -c 10 -w $PING_TIMEOUT 192.0.2.66" log_test $? 0 "Ping with backup nexthop ID"
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ido Schimmel idosch@nvidia.com
[ Upstream commit 93590849a05edffaefa11695fab98f621259ded2 ]
After enabling a multicast querier on the bridge (like the test is doing), the bridge will wait for the Max Response Delay before starting to forward according to its MDB in order to let Membership Reports enough time to be received and processed.
Currently, the test is waiting for exactly the default Max Response Delay (10 seconds) which is racy and leads to failures [1].
Fix by reducing the Max Response Delay to 1 second.
[1] [...] # TEST: L2 miss - Multicast (IPv4) [FAIL] # Unregistered multicast filter was hit after adding MDB entry
Fixes: 8c33266ae26a ("selftests: forwarding: Add layer 2 miss test cases") Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: Hangbin Liu liuhangbin@gmail.com Acked-by: Nikolay Aleksandrov razor@blackwall.org Link: https://lore.kernel.org/r/20240208155529.1199729-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../testing/selftests/net/forwarding/tc_flower_l2_miss.sh | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/forwarding/tc_flower_l2_miss.sh b/tools/testing/selftests/net/forwarding/tc_flower_l2_miss.sh index 20a7cb7222b8..c2420bb72c12 100755 --- a/tools/testing/selftests/net/forwarding/tc_flower_l2_miss.sh +++ b/tools/testing/selftests/net/forwarding/tc_flower_l2_miss.sh @@ -209,14 +209,17 @@ test_l2_miss_multicast() # both registered and unregistered multicast traffic. bridge link set dev $swp2 mcast_router 2
+ # Set the Max Response Delay to 100 centiseconds (1 second) so that the + # bridge will start forwarding according to its MDB soon after a + # multicast querier is enabled. + ip link set dev br1 type bridge mcast_query_response_interval 100 + # Forwarding according to MDB entries only takes place when the bridge # detects that there is a valid querier in the network. Set the bridge # as the querier and assign it a valid IPv6 link-local address to be # used as the source address for MLD queries. ip link set dev br1 type bridge mcast_querier 1 ip -6 address add fe80::1/64 nodad dev br1 - # Wait the default Query Response Interval (10 seconds) for the bridge - # to determine that there are no other queriers in the network. sleep 10
test_l2_miss_multicast_ipv4 @@ -224,6 +227,7 @@ test_l2_miss_multicast()
ip -6 address del fe80::1/64 dev br1 ip link set dev br1 type bridge mcast_querier 0 + ip link set dev br1 type bridge mcast_query_response_interval 1000 bridge link set dev $swp2 mcast_router 1 }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ido Schimmel idosch@nvidia.com
[ Upstream commit 7399e2ce4d424f426417496eb289458780eea985 ]
After enabling a multicast querier on the bridge (like the test is doing), the bridge will wait for the Max Response Delay before starting to forward according to its MDB in order to let Membership Reports enough time to be received and processed.
Currently, the test is waiting for exactly the default Max Response Delay (10 seconds) which is racy and leads to failures [1].
Fix by reducing the Max Response Delay to 1 second.
[1] [...] # TEST: IPv4 host entries forwarding tests [FAIL] # Packet locally received after flood
Fixes: b6d00da08610 ("selftests: forwarding: Add bridge MDB test") Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: Hangbin Liu liuhangbin@gmail.com Acked-by: Nikolay Aleksandrov razor@blackwall.org Link: https://lore.kernel.org/r/20240208155529.1199729-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/forwarding/bridge_mdb.sh | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/forwarding/bridge_mdb.sh b/tools/testing/selftests/net/forwarding/bridge_mdb.sh index e4e3e9405056..ebeb43f6606c 100755 --- a/tools/testing/selftests/net/forwarding/bridge_mdb.sh +++ b/tools/testing/selftests/net/forwarding/bridge_mdb.sh @@ -1065,14 +1065,17 @@ fwd_test() echo log_info "# Forwarding tests"
+ # Set the Max Response Delay to 100 centiseconds (1 second) so that the + # bridge will start forwarding according to its MDB soon after a + # multicast querier is enabled. + ip link set dev br0 type bridge mcast_query_response_interval 100 + # Forwarding according to MDB entries only takes place when the bridge # detects that there is a valid querier in the network. Set the bridge # as the querier and assign it a valid IPv6 link-local address to be # used as the source address for MLD queries. ip -6 address add fe80::1/64 nodad dev br0 ip link set dev br0 type bridge mcast_querier 1 - # Wait the default Query Response Interval (10 seconds) for the bridge - # to determine that there are no other queriers in the network. sleep 10
fwd_test_host @@ -1080,6 +1083,7 @@ fwd_test()
ip link set dev br0 type bridge mcast_querier 0 ip -6 address del fe80::1/64 dev br0 + ip link set dev br0 type bridge mcast_query_response_interval 1000 }
ctrl_igmpv3_is_in_test()
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ido Schimmel idosch@nvidia.com
[ Upstream commit dd6b34589441f2ad4698dd88a664811550148b41 ]
Suppress the following grep warnings:
[...] INFO: # Port group entries configuration tests - (*, G) TEST: Common port group entries configuration tests (IPv4 (*, G)) [ OK ] TEST: Common port group entries configuration tests (IPv6 (*, G)) [ OK ] grep: warning: stray \ before / grep: warning: stray \ before / grep: warning: stray \ before / TEST: IPv4 (*, G) port group entries configuration tests [ OK ] grep: warning: stray \ before / grep: warning: stray \ before / grep: warning: stray \ before / TEST: IPv6 (*, G) port group entries configuration tests [ OK ] [...]
They do not fail the test, but do clutter the output.
Fixes: b6d00da08610 ("selftests: forwarding: Add bridge MDB test") Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: Hangbin Liu liuhangbin@gmail.com Acked-by: Nikolay Aleksandrov razor@blackwall.org Link: https://lore.kernel.org/r/20240208155529.1199729-4-idosch@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/forwarding/bridge_mdb.sh | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/net/forwarding/bridge_mdb.sh b/tools/testing/selftests/net/forwarding/bridge_mdb.sh index ebeb43f6606c..a3678dfe5848 100755 --- a/tools/testing/selftests/net/forwarding/bridge_mdb.sh +++ b/tools/testing/selftests/net/forwarding/bridge_mdb.sh @@ -329,7 +329,7 @@ __cfg_test_port_ip_star_g()
bridge -d -s mdb get dev br0 grp $grp vid 10 | grep -q " 0.00" check_err $? "(*, G) "permanent" entry has a pending group timer" - bridge -d -s mdb get dev br0 grp $grp vid 10 | grep -q "/0.00" + bridge -d -s mdb get dev br0 grp $grp vid 10 | grep -q "/0.00" check_err $? ""permanent" source entry has a pending source timer"
bridge mdb del dev br0 port $swp1 grp $grp vid 10 @@ -346,7 +346,7 @@ __cfg_test_port_ip_star_g()
bridge -d -s mdb get dev br0 grp $grp vid 10 | grep -q " 0.00" check_fail $? "(*, G) EXCLUDE entry does not have a pending group timer" - bridge -d -s mdb get dev br0 grp $grp vid 10 | grep -q "/0.00" + bridge -d -s mdb get dev br0 grp $grp vid 10 | grep -q "/0.00" check_err $? ""blocked" source entry has a pending source timer"
bridge mdb del dev br0 port $swp1 grp $grp vid 10 @@ -363,7 +363,7 @@ __cfg_test_port_ip_star_g()
bridge -d -s mdb get dev br0 grp $grp vid 10 | grep -q " 0.00" check_err $? "(*, G) INCLUDE entry has a pending group timer" - bridge -d -s mdb get dev br0 grp $grp vid 10 | grep -q "/0.00" + bridge -d -s mdb get dev br0 grp $grp vid 10 | grep -q "/0.00" check_fail $? "Source entry does not have a pending source timer"
bridge mdb del dev br0 port $swp1 grp $grp vid 10
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ido Schimmel idosch@nvidia.com
[ Upstream commit f97f1fcc96908c97a240ff6cb4474e155abfa0d7 ]
The redirection test case fails in the netdev CI on debug kernels because an FDB entry is learned despite the presence of a tc filter that redirects incoming traffic [1].
I am unable to reproduce the failure locally, but I can see how it can happen given that learning is first enabled and only then the ingress tc filter is configured. On debug kernels the time window between these two operations is longer compared to regular kernels, allowing random packets to be transmitted and trigger learning.
Fix by reversing the order and configure the ingress tc filter before enabling learning.
[1] [...] # TEST: Locked port MAB redirect [FAIL] # Locked entry created for redirected traffic
Fixes: 38c43a1ce758 ("selftests: forwarding: Add test case for traffic redirection from a locked port") Signed-off-by: Ido Schimmel idosch@nvidia.com Reviewed-by: Hangbin Liu liuhangbin@gmail.com Acked-by: Nikolay Aleksandrov razor@blackwall.org Link: https://lore.kernel.org/r/20240208155529.1199729-5-idosch@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/forwarding/bridge_locked_port.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/forwarding/bridge_locked_port.sh b/tools/testing/selftests/net/forwarding/bridge_locked_port.sh index 9af9f6964808..c62331b2e006 100755 --- a/tools/testing/selftests/net/forwarding/bridge_locked_port.sh +++ b/tools/testing/selftests/net/forwarding/bridge_locked_port.sh @@ -327,10 +327,10 @@ locked_port_mab_redirect() RET=0 check_port_mab_support || return 0
- bridge link set dev $swp1 learning on locked on mab on tc qdisc add dev $swp1 clsact tc filter add dev $swp1 ingress protocol all pref 1 handle 101 flower \ action mirred egress redirect dev $swp2 + bridge link set dev $swp1 learning on locked on mab on
ping_do $h1 192.0.2.2 check_err $? "Ping did not work with redirection" @@ -349,8 +349,8 @@ locked_port_mab_redirect() check_err $? "Locked entry not created after deleting filter"
bridge fdb del `mac_get $h1` vlan 1 dev $swp1 master - tc qdisc del dev $swp1 clsact bridge link set dev $swp1 learning off locked off mab off + tc qdisc del dev $swp1 clsact
log_test "Locked port MAB redirect" }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aaron Conole aconole@redhat.com
[ Upstream commit 6e2f90d31fe09f2b852de25125ca875aabd81367 ]
The ovs module allows for some actions to recursively contain an action list for complex scenarios, such as sampling, checking lengths, etc. When these actions are copied into the internal flow table, they are evaluated to validate that such actions make sense, and these calls happen recursively.
The ovs-vswitchd userspace won't emit more than 16 recursion levels deep. However, the module has no such limit and will happily accept limits larger than 16 levels nested. Prevent this by tracking the number of recursions happening and manually limiting it to 16 levels nested.
The initial implementation of the sample action would track this depth and prevent more than 3 levels of recursion, but this was removed to support the clone use case, rather than limited at the current userspace limit.
Fixes: 798c166173ff ("openvswitch: Optimize sample action for the clone use cases") Signed-off-by: Aaron Conole aconole@redhat.com Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20240207132416.1488485-2-aconole@redhat.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/openvswitch/flow_netlink.c | 49 +++++++++++++++++++++++----------- 1 file changed, 33 insertions(+), 16 deletions(-)
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c index 88965e2068ac..ebc5728aab4e 100644 --- a/net/openvswitch/flow_netlink.c +++ b/net/openvswitch/flow_netlink.c @@ -48,6 +48,7 @@ struct ovs_len_tbl {
#define OVS_ATTR_NESTED -1 #define OVS_ATTR_VARIABLE -2 +#define OVS_COPY_ACTIONS_MAX_DEPTH 16
static bool actions_may_change_flow(const struct nlattr *actions) { @@ -2545,13 +2546,15 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr, const struct sw_flow_key *key, struct sw_flow_actions **sfa, __be16 eth_type, __be16 vlan_tci, - u32 mpls_label_count, bool log); + u32 mpls_label_count, bool log, + u32 depth);
static int validate_and_copy_sample(struct net *net, const struct nlattr *attr, const struct sw_flow_key *key, struct sw_flow_actions **sfa, __be16 eth_type, __be16 vlan_tci, - u32 mpls_label_count, bool log, bool last) + u32 mpls_label_count, bool log, bool last, + u32 depth) { const struct nlattr *attrs[OVS_SAMPLE_ATTR_MAX + 1]; const struct nlattr *probability, *actions; @@ -2602,7 +2605,8 @@ static int validate_and_copy_sample(struct net *net, const struct nlattr *attr, return err;
err = __ovs_nla_copy_actions(net, actions, key, sfa, - eth_type, vlan_tci, mpls_label_count, log); + eth_type, vlan_tci, mpls_label_count, log, + depth + 1);
if (err) return err; @@ -2617,7 +2621,8 @@ static int validate_and_copy_dec_ttl(struct net *net, const struct sw_flow_key *key, struct sw_flow_actions **sfa, __be16 eth_type, __be16 vlan_tci, - u32 mpls_label_count, bool log) + u32 mpls_label_count, bool log, + u32 depth) { const struct nlattr *attrs[OVS_DEC_TTL_ATTR_MAX + 1]; int start, action_start, err, rem; @@ -2660,7 +2665,8 @@ static int validate_and_copy_dec_ttl(struct net *net, return action_start;
err = __ovs_nla_copy_actions(net, actions, key, sfa, eth_type, - vlan_tci, mpls_label_count, log); + vlan_tci, mpls_label_count, log, + depth + 1); if (err) return err;
@@ -2674,7 +2680,8 @@ static int validate_and_copy_clone(struct net *net, const struct sw_flow_key *key, struct sw_flow_actions **sfa, __be16 eth_type, __be16 vlan_tci, - u32 mpls_label_count, bool log, bool last) + u32 mpls_label_count, bool log, bool last, + u32 depth) { int start, err; u32 exec; @@ -2694,7 +2701,8 @@ static int validate_and_copy_clone(struct net *net, return err;
err = __ovs_nla_copy_actions(net, attr, key, sfa, - eth_type, vlan_tci, mpls_label_count, log); + eth_type, vlan_tci, mpls_label_count, log, + depth + 1); if (err) return err;
@@ -3063,7 +3071,7 @@ static int validate_and_copy_check_pkt_len(struct net *net, struct sw_flow_actions **sfa, __be16 eth_type, __be16 vlan_tci, u32 mpls_label_count, - bool log, bool last) + bool log, bool last, u32 depth) { const struct nlattr *acts_if_greater, *acts_if_lesser_eq; struct nlattr *a[OVS_CHECK_PKT_LEN_ATTR_MAX + 1]; @@ -3111,7 +3119,8 @@ static int validate_and_copy_check_pkt_len(struct net *net, return nested_acts_start;
err = __ovs_nla_copy_actions(net, acts_if_lesser_eq, key, sfa, - eth_type, vlan_tci, mpls_label_count, log); + eth_type, vlan_tci, mpls_label_count, log, + depth + 1);
if (err) return err; @@ -3124,7 +3133,8 @@ static int validate_and_copy_check_pkt_len(struct net *net, return nested_acts_start;
err = __ovs_nla_copy_actions(net, acts_if_greater, key, sfa, - eth_type, vlan_tci, mpls_label_count, log); + eth_type, vlan_tci, mpls_label_count, log, + depth + 1);
if (err) return err; @@ -3152,12 +3162,16 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr, const struct sw_flow_key *key, struct sw_flow_actions **sfa, __be16 eth_type, __be16 vlan_tci, - u32 mpls_label_count, bool log) + u32 mpls_label_count, bool log, + u32 depth) { u8 mac_proto = ovs_key_mac_proto(key); const struct nlattr *a; int rem, err;
+ if (depth > OVS_COPY_ACTIONS_MAX_DEPTH) + return -EOVERFLOW; + nla_for_each_nested(a, attr, rem) { /* Expected argument lengths, (u32)-1 for variable length. */ static const u32 action_lens[OVS_ACTION_ATTR_MAX + 1] = { @@ -3355,7 +3369,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr, err = validate_and_copy_sample(net, a, key, sfa, eth_type, vlan_tci, mpls_label_count, - log, last); + log, last, depth); if (err) return err; skip_copy = true; @@ -3426,7 +3440,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr, err = validate_and_copy_clone(net, a, key, sfa, eth_type, vlan_tci, mpls_label_count, - log, last); + log, last, depth); if (err) return err; skip_copy = true; @@ -3440,7 +3454,8 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr, eth_type, vlan_tci, mpls_label_count, - log, last); + log, last, + depth); if (err) return err; skip_copy = true; @@ -3450,7 +3465,8 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr, case OVS_ACTION_ATTR_DEC_TTL: err = validate_and_copy_dec_ttl(net, a, key, sfa, eth_type, vlan_tci, - mpls_label_count, log); + mpls_label_count, log, + depth); if (err) return err; skip_copy = true; @@ -3495,7 +3511,8 @@ int ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
(*sfa)->orig_len = nla_len(attr); err = __ovs_nla_copy_actions(net, attr, key, sfa, key->eth.type, - key->eth.vlan.tci, mpls_label_count, log); + key->eth.vlan.tci, mpls_label_count, log, + 0); if (err) ovs_nla_free_flow_actions(*sfa);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Horatiu Vultur horatiu.vultur@microchip.com
[ Upstream commit 15faa1f67ab405d47789d4702f587ec7df7ef03e ]
There is a crash when adding one of the lan966x interfaces under a lag interface. The issue can be reproduced like this: ip link add name bond0 type bond miimon 100 mode balance-xor ip link set dev eth0 master bond0
The reason is because when adding a interface under the lag it would go through all the ports and try to figure out which other ports are under that lag interface. And the issue is that lan966x can have ports that are NULL pointer as they are not probed. So then iterating over these ports it would just crash as they are NULL pointers. The fix consists in actually checking for NULL pointers before accessing something from the ports. Like we do in other places.
Fixes: cabc9d49333d ("net: lan966x: Add lag support for lan966x") Signed-off-by: Horatiu Vultur horatiu.vultur@microchip.com Reviewed-by: Michal Swiatkowski michal.swiatkowski@linux.intel.com Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20240206123054.3052966-1-horatiu.vultur@microchip.... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/microchip/lan966x/lan966x_lag.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_lag.c b/drivers/net/ethernet/microchip/lan966x/lan966x_lag.c index 41fa2523d91d..5f2cd9a8cf8f 100644 --- a/drivers/net/ethernet/microchip/lan966x/lan966x_lag.c +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_lag.c @@ -37,19 +37,24 @@ static void lan966x_lag_set_aggr_pgids(struct lan966x *lan966x)
/* Now, set PGIDs for each active LAG */ for (lag = 0; lag < lan966x->num_phys_ports; ++lag) { - struct net_device *bond = lan966x->ports[lag]->bond; + struct lan966x_port *port = lan966x->ports[lag]; int num_active_ports = 0; + struct net_device *bond; unsigned long bond_mask; u8 aggr_idx[16];
- if (!bond || (visited & BIT(lag))) + if (!port || !port->bond || (visited & BIT(lag))) continue;
+ bond = port->bond; bond_mask = lan966x_lag_get_mask(lan966x, bond);
for_each_set_bit(p, &bond_mask, lan966x->num_phys_ports) { struct lan966x_port *port = lan966x->ports[p];
+ if (!port) + continue; + lan_wr(ANA_PGID_PGID_SET(bond_mask), lan966x, ANA_PGID(p)); if (port->lag_tx_active)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit c57ca512f3b68ddcd62bda9cc24a8f5584ab01b1 ]
Factor out waiting for async encrypt and decrypt to finish. There are already multiple copies and a subsequent fix will need more. No functional changes.
Note that crypto_wait_req() returns wait->err
Signed-off-by: Jakub Kicinski kuba@kernel.org Reviewed-by: Simon Horman horms@kernel.org Reviewed-by: Sabrina Dubroca sd@queasysnail.net Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: aec7961916f3 ("tls: fix race between async notify and socket close") Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 96 +++++++++++++++++++++++------------------------- 1 file changed, 45 insertions(+), 51 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 31e8a94dfc11..6a73714f34cc 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -230,6 +230,20 @@ static void tls_decrypt_done(void *data, int err) spin_unlock_bh(&ctx->decrypt_compl_lock); }
+static int tls_decrypt_async_wait(struct tls_sw_context_rx *ctx) +{ + int pending; + + spin_lock_bh(&ctx->decrypt_compl_lock); + reinit_completion(&ctx->async_wait.completion); + pending = atomic_read(&ctx->decrypt_pending); + spin_unlock_bh(&ctx->decrypt_compl_lock); + if (pending) + crypto_wait_req(-EINPROGRESS, &ctx->async_wait); + + return ctx->async_wait.err; +} + static int tls_do_decryption(struct sock *sk, struct scatterlist *sgin, struct scatterlist *sgout, @@ -495,6 +509,28 @@ static void tls_encrypt_done(void *data, int err) schedule_delayed_work(&ctx->tx_work.work, 1); }
+static int tls_encrypt_async_wait(struct tls_sw_context_tx *ctx) +{ + int pending; + + spin_lock_bh(&ctx->encrypt_compl_lock); + ctx->async_notify = true; + + pending = atomic_read(&ctx->encrypt_pending); + spin_unlock_bh(&ctx->encrypt_compl_lock); + if (pending) + crypto_wait_req(-EINPROGRESS, &ctx->async_wait); + else + reinit_completion(&ctx->async_wait.completion); + + /* There can be no concurrent accesses, since we have no + * pending encrypt operations + */ + WRITE_ONCE(ctx->async_notify, false); + + return ctx->async_wait.err; +} + static int tls_do_encryption(struct sock *sk, struct tls_context *tls_ctx, struct tls_sw_context_tx *ctx, @@ -984,7 +1020,6 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg, int num_zc = 0; int orig_size; int ret = 0; - int pending;
if (!eor && (msg->msg_flags & MSG_EOR)) return -EINVAL; @@ -1163,24 +1198,12 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg, if (!num_async) { goto send_end; } else if (num_zc) { - /* Wait for pending encryptions to get completed */ - spin_lock_bh(&ctx->encrypt_compl_lock); - ctx->async_notify = true; - - pending = atomic_read(&ctx->encrypt_pending); - spin_unlock_bh(&ctx->encrypt_compl_lock); - if (pending) - crypto_wait_req(-EINPROGRESS, &ctx->async_wait); - else - reinit_completion(&ctx->async_wait.completion); - - /* There can be no concurrent accesses, since we have no - * pending encrypt operations - */ - WRITE_ONCE(ctx->async_notify, false); + int err;
- if (ctx->async_wait.err) { - ret = ctx->async_wait.err; + /* Wait for pending encryptions to get completed */ + err = tls_encrypt_async_wait(ctx); + if (err) { + ret = err; copied = 0; } } @@ -1229,7 +1252,6 @@ void tls_sw_splice_eof(struct socket *sock) ssize_t copied = 0; bool retrying = false; int ret = 0; - int pending;
if (!ctx->open_rec) return; @@ -1264,22 +1286,7 @@ void tls_sw_splice_eof(struct socket *sock) }
/* Wait for pending encryptions to get completed */ - spin_lock_bh(&ctx->encrypt_compl_lock); - ctx->async_notify = true; - - pending = atomic_read(&ctx->encrypt_pending); - spin_unlock_bh(&ctx->encrypt_compl_lock); - if (pending) - crypto_wait_req(-EINPROGRESS, &ctx->async_wait); - else - reinit_completion(&ctx->async_wait.completion); - - /* There can be no concurrent accesses, since we have no pending - * encrypt operations - */ - WRITE_ONCE(ctx->async_notify, false); - - if (ctx->async_wait.err) + if (tls_encrypt_async_wait(ctx)) goto unlock;
/* Transmit if any encryptions have completed */ @@ -2109,16 +2116,10 @@ int tls_sw_recvmsg(struct sock *sk,
recv_end: if (async) { - int ret, pending; + int ret;
/* Wait for all previously submitted records to be decrypted */ - spin_lock_bh(&ctx->decrypt_compl_lock); - reinit_completion(&ctx->async_wait.completion); - pending = atomic_read(&ctx->decrypt_pending); - spin_unlock_bh(&ctx->decrypt_compl_lock); - ret = 0; - if (pending) - ret = crypto_wait_req(-EINPROGRESS, &ctx->async_wait); + ret = tls_decrypt_async_wait(ctx); __skb_queue_purge(&ctx->async_hold);
if (ret) { @@ -2435,16 +2436,9 @@ void tls_sw_release_resources_tx(struct sock *sk) struct tls_context *tls_ctx = tls_get_ctx(sk); struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx); struct tls_rec *rec, *tmp; - int pending;
/* Wait for any pending async encryptions to complete */ - spin_lock_bh(&ctx->encrypt_compl_lock); - ctx->async_notify = true; - pending = atomic_read(&ctx->encrypt_pending); - spin_unlock_bh(&ctx->encrypt_compl_lock); - - if (pending) - crypto_wait_req(-EINPROGRESS, &ctx->async_wait); + tls_encrypt_async_wait(ctx);
tls_tx_records(sk, -1);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit aec7961916f3f9e88766e2688992da6980f11b8d ]
The submitting thread (one which called recvmsg/sendmsg) may exit as soon as the async crypto handler calls complete() so any code past that point risks touching already freed data.
Try to avoid the locking and extra flags altogether. Have the main thread hold an extra reference, this way we can depend solely on the atomic ref counter for synchronization.
Don't futz with reiniting the completion, either, we are now tightly controlling when completion fires.
Reported-by: valis sec@valis.email Fixes: 0cada33241d9 ("net/tls: fix race condition causing kernel panic") Signed-off-by: Jakub Kicinski kuba@kernel.org Reviewed-by: Simon Horman horms@kernel.org Reviewed-by: Eric Dumazet edumazet@google.com Reviewed-by: Sabrina Dubroca sd@queasysnail.net Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/tls.h | 5 ----- net/tls/tls_sw.c | 43 ++++++++++--------------------------------- 2 files changed, 10 insertions(+), 38 deletions(-)
diff --git a/include/net/tls.h b/include/net/tls.h index 962f0c501111..340ad43971e4 100644 --- a/include/net/tls.h +++ b/include/net/tls.h @@ -97,9 +97,6 @@ struct tls_sw_context_tx { struct tls_rec *open_rec; struct list_head tx_list; atomic_t encrypt_pending; - /* protect crypto_wait with encrypt_pending */ - spinlock_t encrypt_compl_lock; - int async_notify; u8 async_capable:1;
#define BIT_TX_SCHEDULED 0 @@ -136,8 +133,6 @@ struct tls_sw_context_rx { struct tls_strparser strp;
atomic_t decrypt_pending; - /* protect crypto_wait with decrypt_pending*/ - spinlock_t decrypt_compl_lock; struct sk_buff_head async_hold; struct wait_queue_head wq; }; diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 6a73714f34cc..635305bebfef 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -224,22 +224,15 @@ static void tls_decrypt_done(void *data, int err)
kfree(aead_req);
- spin_lock_bh(&ctx->decrypt_compl_lock); - if (!atomic_dec_return(&ctx->decrypt_pending)) + if (atomic_dec_and_test(&ctx->decrypt_pending)) complete(&ctx->async_wait.completion); - spin_unlock_bh(&ctx->decrypt_compl_lock); }
static int tls_decrypt_async_wait(struct tls_sw_context_rx *ctx) { - int pending; - - spin_lock_bh(&ctx->decrypt_compl_lock); - reinit_completion(&ctx->async_wait.completion); - pending = atomic_read(&ctx->decrypt_pending); - spin_unlock_bh(&ctx->decrypt_compl_lock); - if (pending) + if (!atomic_dec_and_test(&ctx->decrypt_pending)) crypto_wait_req(-EINPROGRESS, &ctx->async_wait); + atomic_inc(&ctx->decrypt_pending);
return ctx->async_wait.err; } @@ -267,6 +260,7 @@ static int tls_do_decryption(struct sock *sk, aead_request_set_callback(aead_req, CRYPTO_TFM_REQ_MAY_BACKLOG, tls_decrypt_done, aead_req); + DEBUG_NET_WARN_ON_ONCE(atomic_read(&ctx->decrypt_pending) < 1); atomic_inc(&ctx->decrypt_pending); } else { aead_request_set_callback(aead_req, @@ -455,7 +449,6 @@ static void tls_encrypt_done(void *data, int err) struct sk_msg *msg_en; bool ready = false; struct sock *sk; - int pending;
msg_en = &rec->msg_encrypted;
@@ -494,12 +487,8 @@ static void tls_encrypt_done(void *data, int err) ready = true; }
- spin_lock_bh(&ctx->encrypt_compl_lock); - pending = atomic_dec_return(&ctx->encrypt_pending); - - if (!pending && ctx->async_notify) + if (atomic_dec_and_test(&ctx->encrypt_pending)) complete(&ctx->async_wait.completion); - spin_unlock_bh(&ctx->encrypt_compl_lock);
if (!ready) return; @@ -511,22 +500,9 @@ static void tls_encrypt_done(void *data, int err)
static int tls_encrypt_async_wait(struct tls_sw_context_tx *ctx) { - int pending; - - spin_lock_bh(&ctx->encrypt_compl_lock); - ctx->async_notify = true; - - pending = atomic_read(&ctx->encrypt_pending); - spin_unlock_bh(&ctx->encrypt_compl_lock); - if (pending) + if (!atomic_dec_and_test(&ctx->encrypt_pending)) crypto_wait_req(-EINPROGRESS, &ctx->async_wait); - else - reinit_completion(&ctx->async_wait.completion); - - /* There can be no concurrent accesses, since we have no - * pending encrypt operations - */ - WRITE_ONCE(ctx->async_notify, false); + atomic_inc(&ctx->encrypt_pending);
return ctx->async_wait.err; } @@ -577,6 +553,7 @@ static int tls_do_encryption(struct sock *sk,
/* Add the record in tx_list */ list_add_tail((struct list_head *)&rec->list, &ctx->tx_list); + DEBUG_NET_WARN_ON_ONCE(atomic_read(&ctx->encrypt_pending) < 1); atomic_inc(&ctx->encrypt_pending);
rc = crypto_aead_encrypt(aead_req); @@ -2601,7 +2578,7 @@ static struct tls_sw_context_tx *init_ctx_tx(struct tls_context *ctx, struct soc }
crypto_init_wait(&sw_ctx_tx->async_wait); - spin_lock_init(&sw_ctx_tx->encrypt_compl_lock); + atomic_set(&sw_ctx_tx->encrypt_pending, 1); INIT_LIST_HEAD(&sw_ctx_tx->tx_list); INIT_DELAYED_WORK(&sw_ctx_tx->tx_work.work, tx_work_handler); sw_ctx_tx->tx_work.sk = sk; @@ -2622,7 +2599,7 @@ static struct tls_sw_context_rx *init_ctx_rx(struct tls_context *ctx) }
crypto_init_wait(&sw_ctx_rx->async_wait); - spin_lock_init(&sw_ctx_rx->decrypt_compl_lock); + atomic_set(&sw_ctx_rx->decrypt_pending, 1); init_waitqueue_head(&sw_ctx_rx->wq); skb_queue_head_init(&sw_ctx_rx->rx_list); skb_queue_head_init(&sw_ctx_rx->async_hold);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit e01e3934a1b2d122919f73bc6ddbe1cdafc4bbdb ]
Similarly to previous commit, the submitting thread (recvmsg/sendmsg) may exit as soon as the async crypto handler calls complete(). Reorder scheduling the work before calling complete(). This seems more logical in the first place, as it's the inverse order of what the submitting thread will do.
Reported-by: valis sec@valis.email Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") Signed-off-by: Jakub Kicinski kuba@kernel.org Reviewed-by: Simon Horman horms@kernel.org Reviewed-by: Sabrina Dubroca sd@queasysnail.net Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 16 ++++++---------- 1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 635305bebfef..9374a61cef00 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -447,7 +447,6 @@ static void tls_encrypt_done(void *data, int err) struct tls_rec *rec = data; struct scatterlist *sge; struct sk_msg *msg_en; - bool ready = false; struct sock *sk;
msg_en = &rec->msg_encrypted; @@ -483,19 +482,16 @@ static void tls_encrypt_done(void *data, int err) /* If received record is at head of tx_list, schedule tx */ first_rec = list_first_entry(&ctx->tx_list, struct tls_rec, list); - if (rec == first_rec) - ready = true; + if (rec == first_rec) { + /* Schedule the transmission */ + if (!test_and_set_bit(BIT_TX_SCHEDULED, + &ctx->tx_bitmask)) + schedule_delayed_work(&ctx->tx_work.work, 1); + } }
if (atomic_dec_and_test(&ctx->encrypt_pending)) complete(&ctx->async_wait.completion); - - if (!ready) - return; - - /* Schedule the transmission */ - if (!test_and_set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask)) - schedule_delayed_work(&ctx->tx_work.work, 1); }
static int tls_encrypt_async_wait(struct tls_sw_context_tx *ctx)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 8590541473188741055d27b955db0777569438e3 ]
Since we're setting the CRYPTO_TFM_REQ_MAY_BACKLOG flag on our requests to the crypto API, crypto_aead_{encrypt,decrypt} can return -EBUSY instead of -EINPROGRESS in valid situations. For example, when the cryptd queue for AESNI is full (easy to trigger with an artificially low cryptd.cryptd_max_cpu_qlen), requests will be enqueued to the backlog but still processed. In that case, the async callback will also be called twice: first with err == -EINPROGRESS, which it seems we can just ignore, then with err == 0.
Compared to Sabrina's original patch this version uses the new tls_*crypt_async_wait() helpers and converts the EBUSY to EINPROGRESS to avoid having to modify all the error handling paths. The handling is identical.
Fixes: a54667f6728c ("tls: Add support for encryption using async offload accelerator") Fixes: 94524d8fc965 ("net/tls: Add support for async decryption of tls records") Co-developed-by: Sabrina Dubroca sd@queasysnail.net Signed-off-by: Sabrina Dubroca sd@queasysnail.net Link: https://lore.kernel.org/netdev/9681d1febfec295449a62300938ed2ae66983f28.1694... Signed-off-by: Jakub Kicinski kuba@kernel.org Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 9374a61cef00..63bef5666e36 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -196,6 +196,17 @@ static void tls_decrypt_done(void *data, int err) struct sock *sk; int aead_size;
+ /* If requests get too backlogged crypto API returns -EBUSY and calls + * ->complete(-EINPROGRESS) immediately followed by ->complete(0) + * to make waiting for backlog to flush with crypto_wait_req() easier. + * First wait converts -EBUSY -> -EINPROGRESS, and the second one + * -EINPROGRESS -> 0. + * We have a single struct crypto_async_request per direction, this + * scheme doesn't help us, so just ignore the first ->complete(). + */ + if (err == -EINPROGRESS) + return; + aead_size = sizeof(*aead_req) + crypto_aead_reqsize(aead); aead_size = ALIGN(aead_size, __alignof__(*dctx)); dctx = (void *)((u8 *)aead_req + aead_size); @@ -269,6 +280,10 @@ static int tls_do_decryption(struct sock *sk, }
ret = crypto_aead_decrypt(aead_req); + if (ret == -EBUSY) { + ret = tls_decrypt_async_wait(ctx); + ret = ret ?: -EINPROGRESS; + } if (ret == -EINPROGRESS) { if (darg->async) return 0; @@ -449,6 +464,9 @@ static void tls_encrypt_done(void *data, int err) struct sk_msg *msg_en; struct sock *sk;
+ if (err == -EINPROGRESS) /* see the comment in tls_decrypt_done() */ + return; + msg_en = &rec->msg_encrypted;
sk = rec->sk; @@ -553,6 +571,10 @@ static int tls_do_encryption(struct sock *sk, atomic_inc(&ctx->encrypt_pending);
rc = crypto_aead_encrypt(aead_req); + if (rc == -EBUSY) { + rc = tls_encrypt_async_wait(ctx); + rc = rc ?: -EINPROGRESS; + } if (!rc || rc != -EINPROGRESS) { atomic_dec(&ctx->encrypt_pending); sge->offset -= prot->prepend_size;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sabrina Dubroca sd@queasysnail.net
[ Upstream commit 32b55c5ff9103b8508c1e04bfa5a08c64e7a925f ]
tls_decrypt_sg doesn't take a reference on the pages from clear_skb, so the put_page() in tls_decrypt_done releases them, and we trigger a use-after-free in process_rx_list when we try to read from the partially-read skb.
Fixes: fd31f3996af2 ("tls: rx: decrypt into a fresh skb") Signed-off-by: Sabrina Dubroca sd@queasysnail.net Signed-off-by: Jakub Kicinski kuba@kernel.org Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 63bef5666e36..a6eff21ade23 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -63,6 +63,7 @@ struct tls_decrypt_ctx { u8 iv[TLS_MAX_IV_SIZE]; u8 aad[TLS_MAX_AAD_SIZE]; u8 tail; + bool free_sgout; struct scatterlist sg[]; };
@@ -187,7 +188,6 @@ static void tls_decrypt_done(void *data, int err) struct aead_request *aead_req = data; struct crypto_aead *aead = crypto_aead_reqtfm(aead_req); struct scatterlist *sgout = aead_req->dst; - struct scatterlist *sgin = aead_req->src; struct tls_sw_context_rx *ctx; struct tls_decrypt_ctx *dctx; struct tls_context *tls_ctx; @@ -224,7 +224,7 @@ static void tls_decrypt_done(void *data, int err) }
/* Free the destination pages if skb was not decrypted inplace */ - if (sgout != sgin) { + if (dctx->free_sgout) { /* Skip the first S/G entry as it points to AAD */ for_each_sg(sg_next(sgout), sg, UINT_MAX, pages) { if (!sg) @@ -1583,6 +1583,7 @@ static int tls_decrypt_sg(struct sock *sk, struct iov_iter *out_iov, } else if (out_sg) { memcpy(sgout, out_sg, n_sgout * sizeof(*sgout)); } + dctx->free_sgout = !!pages;
/* Prepare and submit AEAD request */ err = tls_do_decryption(sk, sgin, sgout, dctx->iv,
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit ac437a51ce662364062f704e321227f6728e6adc ]
We double count async, non-zc rx data. The previous fix was lucky because if we fully zc async_copy_bytes is 0 so we add 0. Decrypted already has all the bytes we handled, in all cases. We don't have to adjust anything, delete the erroneous line.
Fixes: 4d42cd6bc2ac ("tls: rx: fix return value for async crypto") Co-developed-by: Sabrina Dubroca sd@queasysnail.net Signed-off-by: Sabrina Dubroca sd@queasysnail.net Signed-off-by: Jakub Kicinski kuba@kernel.org Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index a6eff21ade23..9fbc70200cd0 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -2132,7 +2132,6 @@ int tls_sw_recvmsg(struct sock *sk, else err = process_rx_list(ctx, msg, &control, 0, async_copy_bytes, is_peek); - decrypted += max(err, 0); }
copied += decrypted;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Uwe Kleine-König u.kleine-koenig@pengutronix.de
[ Upstream commit b3aa619a8b4706f35cb62f780c14e68796b37f3f ]
Since commit 24778be20f87 ("spi: convert drivers to use bits_per_word_mask") the bits_per_word variable is only written to. The check that was there before isn't needed any more as the spi core ensures that only 8 bit transfers are used, so the variable can go away together with all assignments to it.
Fixes: 24778be20f87 ("spi: convert drivers to use bits_per_word_mask") Signed-off-by: Uwe Kleine-König u.kleine-koenig@pengutronix.de Link: https://lore.kernel.org/r/20240210164006.208149-8-u.kleine-koenig@pengutroni... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-ppc4xx.c | 5 ----- 1 file changed, 5 deletions(-)
diff --git a/drivers/spi/spi-ppc4xx.c b/drivers/spi/spi-ppc4xx.c index 03aab661be9d..e982d3189fdc 100644 --- a/drivers/spi/spi-ppc4xx.c +++ b/drivers/spi/spi-ppc4xx.c @@ -166,10 +166,8 @@ static int spi_ppc4xx_setupxfer(struct spi_device *spi, struct spi_transfer *t) int scr; u8 cdm = 0; u32 speed; - u8 bits_per_word;
/* Start with the generic configuration for this device. */ - bits_per_word = spi->bits_per_word; speed = spi->max_speed_hz;
/* @@ -177,9 +175,6 @@ static int spi_ppc4xx_setupxfer(struct spi_device *spi, struct spi_transfer *t) * the transfer to overwrite the generic configuration with zeros. */ if (t) { - if (t->bits_per_word) - bits_per_word = t->bits_per_word; - if (t->speed_hz) speed = min(t->speed_hz, spi->max_speed_hz); }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexey Khoroshilov khoroshilov@ispras.ru
[ Upstream commit 6ef5d5b92f7117b324efaac72b3db27ae8bb3082 ]
There is a path in rt5645_jack_detect_work(), where rt5645->jd_mutex is left locked forever. That may lead to deadlock when rt5645_jack_detect_work() is called for the second time.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: cdba4301adda ("ASoC: rt5650: add mutex to avoid the jack detection failure") Signed-off-by: Alexey Khoroshilov khoroshilov@ispras.ru Link: https://lore.kernel.org/r/1707645514-21196-1-git-send-email-khoroshilov@ispr... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/codecs/rt5645.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/sound/soc/codecs/rt5645.c b/sound/soc/codecs/rt5645.c index edcb85bd8ea7..ea08b7cfc31d 100644 --- a/sound/soc/codecs/rt5645.c +++ b/sound/soc/codecs/rt5645.c @@ -3314,6 +3314,7 @@ static void rt5645_jack_detect_work(struct work_struct *work) report, SND_JACK_HEADPHONE); snd_soc_jack_report(rt5645->mic_jack, report, SND_JACK_MICROPHONE); + mutex_unlock(&rt5645->jd_mutex); return; case 4: val = snd_soc_component_read(rt5645->component, RT5645_A_JD_CTRL1) & 0x0020;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Manasi Navare navaremanasi@chromium.org
[ Upstream commit 962ac2dce56bb3aad1f82a4bbe3ada57a020287c ]
Commit bd077259d0a9 ("drm/i915/vdsc: Add function to read any PPS register") defines a new macro to calculate the DSC PPS register addresses with PPS number as an input. This macro correctly calculates the addresses till PPS 11 since the addresses increment by 4. So in that case the following macro works correctly to give correct register address:
_MMIO(_DSCA_PPS_0 + (pps) * 4)
However after PPS 11, the register address for PPS 12 increments by 12 because of RC Buffer memory allocation in between. Because of this discontinuity in the address space, the macro calculates wrong addresses for PPS 12 - 16 resulting into incorrect DSC PPS parameter value read/writes causing DSC corruption.
This fixes it by correcting this macro to add the offset of 12 for PPS
=12.
v3: Add correct paranthesis for pps argument (Jani Nikula)
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10172 Fixes: bd077259d0a9 ("drm/i915/vdsc: Add function to read any PPS register") Cc: Suraj Kandpal suraj.kandpal@intel.com Cc: Ankit Nautiyal ankit.k.nautiyal@intel.com Cc: Animesh Manna animesh.manna@intel.com Cc: Jani Nikula jani.nikula@linux.intel.com Cc: Sean Paul sean@poorly.run Cc: Drew Davenport ddavenport@chromium.org Signed-off-by: Manasi Navare navaremanasi@chromium.org Reviewed-by: Jani Nikula jani.nikula@intel.com Signed-off-by: Jani Nikula jani.nikula@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240205204619.1991673-1-navar... (cherry picked from commit 6074be620c31dc2ae11af96a1a5ea95580976fb5) Signed-off-by: Joonas Lahtinen joonas.lahtinen@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/display/intel_vdsc_regs.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_vdsc_regs.h b/drivers/gpu/drm/i915/display/intel_vdsc_regs.h index 64f440fdc22b..8b21dc8e26d5 100644 --- a/drivers/gpu/drm/i915/display/intel_vdsc_regs.h +++ b/drivers/gpu/drm/i915/display/intel_vdsc_regs.h @@ -51,8 +51,8 @@ #define DSCC_PICTURE_PARAMETER_SET_0 _MMIO(0x6BA00) #define _DSCA_PPS_0 0x6B200 #define _DSCC_PPS_0 0x6BA00 -#define DSCA_PPS(pps) _MMIO(_DSCA_PPS_0 + (pps) * 4) -#define DSCC_PPS(pps) _MMIO(_DSCC_PPS_0 + (pps) * 4) +#define DSCA_PPS(pps) _MMIO(_DSCA_PPS_0 + ((pps) < 12 ? (pps) : (pps) + 12) * 4) +#define DSCC_PPS(pps) _MMIO(_DSCC_PPS_0 + ((pps) < 12 ? (pps) : (pps) + 12) * 4) #define _ICL_DSC0_PICTURE_PARAMETER_SET_0_PB 0x78270 #define _ICL_DSC1_PICTURE_PARAMETER_SET_0_PB 0x78370 #define _ICL_DSC0_PICTURE_PARAMETER_SET_0_PC 0x78470
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Breno Leitao leitao@debian.org
[ Upstream commit 5b3fbd61b9d1f4ed2db95aaf03f9adae0373784d ]
The Documentation/ABI/testing/sysfs-class-net-statistics documentation is pointing to the wrong path for the interface. Documentation is pointing to /sys/class/<iface>, instead of /sys/class/net/<iface>.
Fix it by adding the `net/` directory before the interface.
Fixes: 6044f9700645 ("net: sysfs: document /sys/class/net/statistics/*") Signed-off-by: Breno Leitao leitao@debian.org Reviewed-by: Andrew Lunn andrew@lunn.ch Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- .../ABI/testing/sysfs-class-net-statistics | 48 +++++++++---------- 1 file changed, 24 insertions(+), 24 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-class-net-statistics b/Documentation/ABI/testing/sysfs-class-net-statistics index 55db27815361..53e508c6936a 100644 --- a/Documentation/ABI/testing/sysfs-class-net-statistics +++ b/Documentation/ABI/testing/sysfs-class-net-statistics @@ -1,4 +1,4 @@ -What: /sys/class/<iface>/statistics/collisions +What: /sys/class/net/<iface>/statistics/collisions Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -6,7 +6,7 @@ Description: Indicates the number of collisions seen by this network device. This value might not be relevant with all MAC layers.
-What: /sys/class/<iface>/statistics/multicast +What: /sys/class/net/<iface>/statistics/multicast Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -14,7 +14,7 @@ Description: Indicates the number of multicast packets received by this network device.
-What: /sys/class/<iface>/statistics/rx_bytes +What: /sys/class/net/<iface>/statistics/rx_bytes Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -23,7 +23,7 @@ Description: See the network driver for the exact meaning of when this value is incremented.
-What: /sys/class/<iface>/statistics/rx_compressed +What: /sys/class/net/<iface>/statistics/rx_compressed Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -32,7 +32,7 @@ Description: network device. This value might only be relevant for interfaces that support packet compression (e.g: PPP).
-What: /sys/class/<iface>/statistics/rx_crc_errors +What: /sys/class/net/<iface>/statistics/rx_crc_errors Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -41,7 +41,7 @@ Description: by this network device. Note that the specific meaning might depend on the MAC layer used by the interface.
-What: /sys/class/<iface>/statistics/rx_dropped +What: /sys/class/net/<iface>/statistics/rx_dropped Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -51,7 +51,7 @@ Description: packet processing. See the network driver for the exact meaning of this value.
-What: /sys/class/<iface>/statistics/rx_errors +What: /sys/class/net/<iface>/statistics/rx_errors Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -59,7 +59,7 @@ Description: Indicates the number of receive errors on this network device. See the network driver for the exact meaning of this value.
-What: /sys/class/<iface>/statistics/rx_fifo_errors +What: /sys/class/net/<iface>/statistics/rx_fifo_errors Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -68,7 +68,7 @@ Description: network device. See the network driver for the exact meaning of this value.
-What: /sys/class/<iface>/statistics/rx_frame_errors +What: /sys/class/net/<iface>/statistics/rx_frame_errors Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -78,7 +78,7 @@ Description: on the MAC layer protocol used. See the network driver for the exact meaning of this value.
-What: /sys/class/<iface>/statistics/rx_length_errors +What: /sys/class/net/<iface>/statistics/rx_length_errors Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -87,7 +87,7 @@ Description: error, oversized or undersized. See the network driver for the exact meaning of this value.
-What: /sys/class/<iface>/statistics/rx_missed_errors +What: /sys/class/net/<iface>/statistics/rx_missed_errors Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -96,7 +96,7 @@ Description: due to lack of capacity in the receive side. See the network driver for the exact meaning of this value.
-What: /sys/class/<iface>/statistics/rx_nohandler +What: /sys/class/net/<iface>/statistics/rx_nohandler Date: February 2016 KernelVersion: 4.6 Contact: netdev@vger.kernel.org @@ -104,7 +104,7 @@ Description: Indicates the number of received packets that were dropped on an inactive device by the network core.
-What: /sys/class/<iface>/statistics/rx_over_errors +What: /sys/class/net/<iface>/statistics/rx_over_errors Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -114,7 +114,7 @@ Description: (e.g: larger than MTU). See the network driver for the exact meaning of this value.
-What: /sys/class/<iface>/statistics/rx_packets +What: /sys/class/net/<iface>/statistics/rx_packets Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -122,7 +122,7 @@ Description: Indicates the total number of good packets received by this network device.
-What: /sys/class/<iface>/statistics/tx_aborted_errors +What: /sys/class/net/<iface>/statistics/tx_aborted_errors Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -132,7 +132,7 @@ Description: a medium collision). See the network driver for the exact meaning of this value.
-What: /sys/class/<iface>/statistics/tx_bytes +What: /sys/class/net/<iface>/statistics/tx_bytes Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -143,7 +143,7 @@ Description: transmitted packets or all packets that have been queued for transmission.
-What: /sys/class/<iface>/statistics/tx_carrier_errors +What: /sys/class/net/<iface>/statistics/tx_carrier_errors Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -152,7 +152,7 @@ Description: because of carrier errors (e.g: physical link down). See the network driver for the exact meaning of this value.
-What: /sys/class/<iface>/statistics/tx_compressed +What: /sys/class/net/<iface>/statistics/tx_compressed Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -161,7 +161,7 @@ Description: this might only be relevant for devices that support compression (e.g: PPP).
-What: /sys/class/<iface>/statistics/tx_dropped +What: /sys/class/net/<iface>/statistics/tx_dropped Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -170,7 +170,7 @@ Description: See the driver for the exact reasons as to why the packets were dropped.
-What: /sys/class/<iface>/statistics/tx_errors +What: /sys/class/net/<iface>/statistics/tx_errors Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -179,7 +179,7 @@ Description: a network device. See the driver for the exact reasons as to why the packets were dropped.
-What: /sys/class/<iface>/statistics/tx_fifo_errors +What: /sys/class/net/<iface>/statistics/tx_fifo_errors Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -188,7 +188,7 @@ Description: FIFO error. See the driver for the exact reasons as to why the packets were dropped.
-What: /sys/class/<iface>/statistics/tx_heartbeat_errors +What: /sys/class/net/<iface>/statistics/tx_heartbeat_errors Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -197,7 +197,7 @@ Description: reported as heartbeat errors. See the driver for the exact reasons as to why the packets were dropped.
-What: /sys/class/<iface>/statistics/tx_packets +What: /sys/class/net/<iface>/statistics/tx_packets Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org @@ -206,7 +206,7 @@ Description: device. See the driver for whether this reports the number of all attempted or successful transmissions.
-What: /sys/class/<iface>/statistics/tx_window_errors +What: /sys/class/net/<iface>/statistics/tx_window_errors Date: April 2005 KernelVersion: 2.6.12 Contact: netdev@vger.kernel.org
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann arnd@arndb.de
[ Upstream commit 2c80a2b715df75881359d07dbaacff8ad411f40e ]
The conversion to kvcalloc() mixed up the object size and count arguments, causing a warning:
drivers/gpu/drm/nouveau/nouveau_svm.c: In function 'nouveau_svm_fault_buffer_ctor': drivers/gpu/drm/nouveau/nouveau_svm.c:1010:40: error: 'kvcalloc' sizes specified with 'sizeof' in the earlier argument and not in the later argument [-Werror=calloc-transposed-args] 1010 | buffer->fault = kvcalloc(sizeof(*buffer->fault), buffer->entries, GFP_KERNEL); | ^ drivers/gpu/drm/nouveau/nouveau_svm.c:1010:40: note: earlier argument should specify number of elements, later size of each element
The behavior is still correct aside from the warning, but fixing it avoids the warnings and can help the compiler track the individual objects better.
Fixes: 71e4bbca070e ("nouveau/svm: Use kvcalloc() instead of kvzalloc()") Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Danilo Krummrich dakr@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20240212112230.1117284-1-arnd@... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/nouveau/nouveau_svm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c index cc03e0c22ff3..5e4565c5011a 100644 --- a/drivers/gpu/drm/nouveau/nouveau_svm.c +++ b/drivers/gpu/drm/nouveau/nouveau_svm.c @@ -1011,7 +1011,7 @@ nouveau_svm_fault_buffer_ctor(struct nouveau_svm *svm, s32 oclass, int id) if (ret) return ret;
- buffer->fault = kvcalloc(sizeof(*buffer->fault), buffer->entries, GFP_KERNEL); + buffer->fault = kvcalloc(buffer->entries, sizeof(*buffer->fault), GFP_KERNEL); if (!buffer->fault) return -ENOMEM;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Guenter Roeck linux@roeck-us.net
[ Upstream commit d55347bfe4e66dce2e1e7501e5492f4af3e315f8 ]
After 'lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests' was applied, the test_csum_ipv6_magic unit test started failing for all mips platforms, both little and bit endian. Oddly enough, adding debug code into test_csum_ipv6_magic() made the problem disappear.
The gcc manual says:
"The "memory" clobber tells the compiler that the assembly code performs memory reads or writes to items other than those listed in the input and output operands (for example, accessing the memory pointed to by one of the input parameters) "
This is definitely the case for csum_ipv6_magic(). Indeed, adding the 'memory' clobber fixes the problem.
Cc: Charlie Jenkins charlie@rivosinc.com Cc: Palmer Dabbelt palmer@rivosinc.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Guenter Roeck linux@roeck-us.net Reviewed-by: Charlie Jenkins charlie@rivosinc.com Signed-off-by: Thomas Bogendoerfer tsbogend@alpha.franken.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/mips/include/asm/checksum.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/mips/include/asm/checksum.h b/arch/mips/include/asm/checksum.h index 4044eaf989ac..0921ddda11a4 100644 --- a/arch/mips/include/asm/checksum.h +++ b/arch/mips/include/asm/checksum.h @@ -241,7 +241,8 @@ static __inline__ __sum16 csum_ipv6_magic(const struct in6_addr *saddr, " .set pop" : "=&r" (sum), "=&r" (tmp) : "r" (saddr), "r" (daddr), - "0" (htonl(len)), "r" (htonl(proto)), "r" (sum)); + "0" (htonl(len)), "r" (htonl(proto)), "r" (sum) + : "memory");
return csum_fold(sum); }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiaxun Yang jiaxun.yang@flygoat.com
[ Upstream commit 11ba1728be3edb6928791f4c622f154ebe228ae6 ]
On architectures with delay slot, architecture level instruction pointer (or program counter) in pt_regs may differ from where exception was triggered.
Introduce exception_ip hook to invoke architecture code and determine actual instruction pointer to the exception.
Link: https://lore.kernel.org/lkml/00d1b813-c55f-4365-8d81-d70258e10b16@app.fastma... Signed-off-by: Jiaxun Yang jiaxun.yang@flygoat.com Signed-off-by: Thomas Bogendoerfer tsbogend@alpha.franken.de Stable-dep-of: 8fa507083388 ("mm/memory: Use exception ip to search exception tables") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/mips/include/asm/ptrace.h | 2 ++ arch/mips/kernel/ptrace.c | 7 +++++++ include/linux/ptrace.h | 4 ++++ 3 files changed, 13 insertions(+)
diff --git a/arch/mips/include/asm/ptrace.h b/arch/mips/include/asm/ptrace.h index daf3cf244ea9..701a233583c2 100644 --- a/arch/mips/include/asm/ptrace.h +++ b/arch/mips/include/asm/ptrace.h @@ -154,6 +154,8 @@ static inline long regs_return_value(struct pt_regs *regs) }
#define instruction_pointer(regs) ((regs)->cp0_epc) +extern unsigned long exception_ip(struct pt_regs *regs); +#define exception_ip(regs) exception_ip(regs) #define profile_pc(regs) instruction_pointer(regs)
extern asmlinkage long syscall_trace_enter(struct pt_regs *regs, long syscall); diff --git a/arch/mips/kernel/ptrace.c b/arch/mips/kernel/ptrace.c index d9df543f7e2c..59288c13b581 100644 --- a/arch/mips/kernel/ptrace.c +++ b/arch/mips/kernel/ptrace.c @@ -31,6 +31,7 @@ #include <linux/seccomp.h> #include <linux/ftrace.h>
+#include <asm/branch.h> #include <asm/byteorder.h> #include <asm/cpu.h> #include <asm/cpu-info.h> @@ -48,6 +49,12 @@ #define CREATE_TRACE_POINTS #include <trace/events/syscalls.h>
+unsigned long exception_ip(struct pt_regs *regs) +{ + return exception_epc(regs); +} +EXPORT_SYMBOL(exception_ip); + /* * Called by kernel/ptrace.c when detaching.. * diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h index eaaef3ffec22..90507d4afcd6 100644 --- a/include/linux/ptrace.h +++ b/include/linux/ptrace.h @@ -393,6 +393,10 @@ static inline void user_single_step_report(struct pt_regs *regs) #define current_user_stack_pointer() user_stack_pointer(current_pt_regs()) #endif
+#ifndef exception_ip +#define exception_ip(x) instruction_pointer(x) +#endif + extern int task_current_syscall(struct task_struct *target, struct syscall_info *info);
extern void sigaction_compat_abi(struct k_sigaction *act, struct k_sigaction *oact);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiaxun Yang jiaxun.yang@flygoat.com
[ Upstream commit 8fa5070833886268e4fb646daaca99f725b378e9 ]
On architectures with delay slot, instruction_pointer() may differ from where exception was triggered.
Use exception_ip we just introduced to search exception tables to get rid of the problem.
Fixes: 4bce37a68ff8 ("mips/mm: Convert to using lock_mm_and_find_vma()") Reported-by: Xi Ruoyao xry111@xry111.site Link: https://lore.kernel.org/r/75e9fd7b08562ad9b456a5bdaacb7cc220311cc9.camel@xry... Suggested-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Jiaxun Yang jiaxun.yang@flygoat.com Signed-off-by: Thomas Bogendoerfer tsbogend@alpha.franken.de Signed-off-by: Sasha Levin sashal@kernel.org --- mm/memory.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c index 6e0712d06cd4..f941489d6041 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -5373,7 +5373,7 @@ static inline bool get_mmap_lock_carefully(struct mm_struct *mm, struct pt_regs return true;
if (regs && !user_mode(regs)) { - unsigned long ip = instruction_pointer(regs); + unsigned long ip = exception_ip(regs); if (!search_exception_tables(ip)) return false; } @@ -5398,7 +5398,7 @@ static inline bool upgrade_mmap_lock_carefully(struct mm_struct *mm, struct pt_r { mmap_read_unlock(mm); if (regs && !user_mode(regs)) { - unsigned long ip = instruction_pointer(regs); + unsigned long ip = exception_ip(regs); if (!search_exception_tables(ip)) return false; }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ivan Vecera ivecera@redhat.com
[ Upstream commit 73d9629e1c8c1982f13688c4d1019c3994647ccc ]
Currently when PF administratively sets VF's MAC address and the VF is put down (VF tries to delete all MACs) then the MAC is removed from MAC filters and primary VF MAC is zeroed.
Do not allow untrusted VF to remove primary MAC when it was set administratively by PF.
Reproducer: 1) Create VF 2) Set VF interface up 3) Administratively set the VF's MAC 4) Put VF interface down
[root@host ~]# echo 1 > /sys/class/net/enp2s0f0/device/sriov_numvfs [root@host ~]# ip link set enp2s0f0v0 up [root@host ~]# ip link set enp2s0f0 vf 0 mac fe:6c:b5:da:c7:7d [root@host ~]# ip link show enp2s0f0 23: enp2s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000 link/ether 3c:ec:ef:b7:dd:04 brd ff:ff:ff:ff:ff:ff vf 0 link/ether fe:6c:b5:da:c7:7d brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off [root@host ~]# ip link set enp2s0f0v0 down [root@host ~]# ip link show enp2s0f0 23: enp2s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000 link/ether 3c:ec:ef:b7:dd:04 brd ff:ff:ff:ff:ff:ff vf 0 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
Fixes: 700bbf6c1f9e ("i40e: allow VF to remove any MAC filter") Fixes: ceb29474bbbc ("i40e: Add support for VF to specify its primary MAC address") Signed-off-by: Ivan Vecera ivecera@redhat.com Reviewed-by: Simon Horman horms@kernel.org Tested-by: Rafal Romanowski rafal.romanowski@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Link: https://lore.kernel.org/r/20240208180335.1844996-1-anthony.l.nguyen@intel.co... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../ethernet/intel/i40e/i40e_virtchnl_pf.c | 38 ++++++++++++++++--- 1 file changed, 33 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c index 7db89b294510..3d8a23d3352e 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c @@ -2850,6 +2850,24 @@ static int i40e_vc_get_stats_msg(struct i40e_vf *vf, u8 *msg) (u8 *)&stats, sizeof(stats)); }
+/** + * i40e_can_vf_change_mac + * @vf: pointer to the VF info + * + * Return true if the VF is allowed to change its MAC filters, false otherwise + */ +static bool i40e_can_vf_change_mac(struct i40e_vf *vf) +{ + /* If the VF MAC address has been set administratively (via the + * ndo_set_vf_mac command), then deny permission to the VF to + * add/delete unicast MAC addresses, unless the VF is trusted + */ + if (vf->pf_set_mac && !vf->trusted) + return false; + + return true; +} + #define I40E_MAX_MACVLAN_PER_HW 3072 #define I40E_MAX_MACVLAN_PER_PF(num_ports) (I40E_MAX_MACVLAN_PER_HW / \ (num_ports)) @@ -2909,8 +2927,8 @@ static inline int i40e_check_vf_permission(struct i40e_vf *vf, * The VF may request to set the MAC address filter already * assigned to it so do not return an error in that case. */ - if (!test_bit(I40E_VIRTCHNL_VF_CAP_PRIVILEGE, &vf->vf_caps) && - !is_multicast_ether_addr(addr) && vf->pf_set_mac && + if (!i40e_can_vf_change_mac(vf) && + !is_multicast_ether_addr(addr) && !ether_addr_equal(addr, vf->default_lan_addr.addr)) { dev_err(&pf->pdev->dev, "VF attempting to override administratively set MAC address, bring down and up the VF interface to resume normal operation\n"); @@ -3116,19 +3134,29 @@ static int i40e_vc_del_mac_addr_msg(struct i40e_vf *vf, u8 *msg) ret = -EINVAL; goto error_param; } - if (ether_addr_equal(al->list[i].addr, vf->default_lan_addr.addr)) - was_unimac_deleted = true; } vsi = pf->vsi[vf->lan_vsi_idx];
spin_lock_bh(&vsi->mac_filter_hash_lock); /* delete addresses from the list */ - for (i = 0; i < al->num_elements; i++) + for (i = 0; i < al->num_elements; i++) { + const u8 *addr = al->list[i].addr; + + /* Allow to delete VF primary MAC only if it was not set + * administratively by PF or if VF is trusted. + */ + if (ether_addr_equal(addr, vf->default_lan_addr.addr) && + i40e_can_vf_change_mac(vf)) + was_unimac_deleted = true; + else + continue; + if (i40e_del_mac_filter(vsi, al->list[i].addr)) { ret = -EINVAL; spin_unlock_bh(&vsi->mac_filter_hash_lock); goto error_param; } + }
spin_unlock_bh(&vsi->mac_filter_hash_lock);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ivan Vecera ivecera@redhat.com
[ Upstream commit c73729b64bb692186da080602cd13612783f52ac ]
The function i40e_pf_wait_queues_disabled() iterates all PF's VSIs up to 'pf->hw.func_caps.num_vsis' but this is incorrect because the real number of VSIs can be up to 'pf->num_alloc_vsi' that can be higher. Fix this loop.
Fixes: 69129dc39fac ("i40e: Modify Tx disable wait flow in case of DCB reconfiguration") Signed-off-by: Ivan Vecera ivecera@redhat.com Reviewed-by: Jacob Keller jacob.e.keller@intel.com Reviewed-by: Wojciech Drewek wojciech.drewek@intel.com Tested-by: Pucha Himasekhar Reddy himasekharx.reddy.pucha@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 2bd7b29fb251..d9716bcec81b 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -5361,7 +5361,7 @@ static int i40e_pf_wait_queues_disabled(struct i40e_pf *pf) { int v, ret = 0;
- for (v = 0; v < pf->hw.func_caps.num_vsis; v++) { + for (v = 0; v < pf->num_alloc_vsi; v++) { if (pf->vsi[v]) { ret = i40e_vsi_wait_queues_disabled(pf->vsi[v]); if (ret)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ryan Roberts ryan.roberts@arm.com
commit 96204e15310c218fd9355bdcacd02fed1d18070e upstream.
The addition of commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries") caused the "virtual_address_range" mm selftest to start failing on arm64. Let's fix that regression.
There were 2 visible problems when running the test; 1) it takes much longer to execute, and 2) the test fails. Both are related:
The (first part of the) test allocates as many 1GB anonymous blocks as it can in the low 256TB of address space, passing NULL as the addr hint to mmap. Before the faulty patch, all allocations were abutted and contained in a single, merged VMA. However, after this patch, each allocation is in its own VMA, and there is a 2M gap between each VMA. This causes the 2 problems in the test: 1) mmap becomes MUCH slower because there are so many VMAs to check to find a new 1G gap. 2) mmap fails once it hits the VMA limit (/proc/sys/vm/max_map_count). Hitting this limit then causes a subsequent calloc() to fail, which causes the test to fail.
The problem is that arm64 (unlike x86) selects ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT. But __thp_get_unmapped_area() allocates len+2M then always aligns to the bottom of the discovered gap. That causes the 2M hole.
Fix this by detecting cases where we can still achive the alignment goal when moved to the top of the allocated area, if configured to prefer top-down allocation.
While we are at it, fix thp_get_unmapped_area's use of pgoff, which should always be zero for anonymous mappings. Prior to the faulty change, while it was possible for user space to pass in pgoff!=0, the old mm->get_unmapped_area() handler would not use it. thp_get_unmapped_area() does use it, so let's explicitly zero it before calling the handler. This should also be the correct behavior for arches that define their own get_unmapped_area() handler.
Link: https://lkml.kernel.org/r/20240123171420.3970220-1-ryan.roberts@arm.com Fixes: efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries") Closes: https://lore.kernel.org/linux-mm/1e8f5ac7-54ce-433a-ae53-81522b2320e1@arm.co... Signed-off-by: Ryan Roberts ryan.roberts@arm.com Reviewed-by: Yang Shi shy828301@gmail.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Rik van Riel riel@surriel.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/huge_memory.c | 10 ++++++++-- mm/mmap.c | 6 ++++-- 2 files changed, 12 insertions(+), 4 deletions(-)
--- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -633,7 +633,7 @@ static unsigned long __thp_get_unmapped_ { loff_t off_end = off + len; loff_t off_align = round_up(off, size); - unsigned long len_pad, ret; + unsigned long len_pad, ret, off_sub;
if (IS_ENABLED(CONFIG_32BIT) || in_compat_syscall()) return 0; @@ -662,7 +662,13 @@ static unsigned long __thp_get_unmapped_ if (ret == addr) return addr;
- ret += (off - ret) & (size - 1); + off_sub = (off - ret) & (size - 1); + + if (current->mm->get_unmapped_area == arch_get_unmapped_area_topdown && + !off_sub) + return ret + size; + + ret += off_sub; return ret; }
--- a/mm/mmap.c +++ b/mm/mmap.c @@ -1825,15 +1825,17 @@ get_unmapped_area(struct file *file, uns /* * mmap_region() will call shmem_zero_setup() to create a file, * so use shmem's get_unmapped_area in case it can be huge. - * do_mmap() will clear pgoff, so match alignment. */ - pgoff = 0; get_area = shmem_get_unmapped_area; } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) { /* Ensures that larger anonymous mappings are THP aligned. */ get_area = thp_get_unmapped_area; }
+ /* Always treat pgoff as zero for anonymous memory. */ + if (!file) + pgoff = 0; + addr = get_area(file, addr, len, pgoff, flags); if (IS_ERR_VALUE(addr)) return addr;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lokesh Gidra lokeshgidra@google.com
commit 67695f18d55924b2013534ef3bdc363bc9e14605 upstream.
In mfill_atomic_hugetlb(), mmap_changing isn't being checked again if we drop mmap_lock and reacquire it. When the lock is not held, mmap_changing could have been incremented. This is also inconsistent with the behavior in mfill_atomic().
Link: https://lkml.kernel.org/r/20240117223729.1444522-1-lokeshgidra@google.com Fixes: df2cc96e77011 ("userfaultfd: prevent non-cooperative events vs mcopy_atomic races") Signed-off-by: Lokesh Gidra lokeshgidra@google.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: Mike Rapoport rppt@kernel.org Cc: Axel Rasmussen axelrasmussen@google.com Cc: Brian Geffon bgeffon@google.com Cc: David Hildenbrand david@redhat.com Cc: Jann Horn jannh@google.com Cc: Kalesh Singh kaleshsingh@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Nicolas Geoffray ngeoffray@google.com Cc: Peter Xu peterx@redhat.com Cc: Suren Baghdasaryan surenb@google.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/userfaultfd.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-)
--- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -357,6 +357,7 @@ static __always_inline ssize_t mfill_ato unsigned long dst_start, unsigned long src_start, unsigned long len, + atomic_t *mmap_changing, uffd_flags_t flags) { struct mm_struct *dst_mm = dst_vma->vm_mm; @@ -472,6 +473,15 @@ retry: goto out; } mmap_read_lock(dst_mm); + /* + * If memory mappings are changing because of non-cooperative + * operation (e.g. mremap) running in parallel, bail out and + * request the user to retry later + */ + if (mmap_changing && atomic_read(mmap_changing)) { + err = -EAGAIN; + break; + }
dst_vma = NULL; goto retry; @@ -506,6 +516,7 @@ extern ssize_t mfill_atomic_hugetlb(stru unsigned long dst_start, unsigned long src_start, unsigned long len, + atomic_t *mmap_changing, uffd_flags_t flags); #endif /* CONFIG_HUGETLB_PAGE */
@@ -622,8 +633,8 @@ retry: * If this is a HUGETLB vma, pass off to appropriate routine */ if (is_vm_hugetlb_page(dst_vma)) - return mfill_atomic_hugetlb(dst_vma, dst_start, - src_start, len, flags); + return mfill_atomic_hugetlb(dst_vma, dst_start, src_start, + len, mmap_changing, flags);
if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma)) goto out_unlock;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ryan Roberts ryan.roberts@arm.com
commit d021b442cf312664811783e92b3d5e4548e92a53 upstream.
ksm_tests was previously mmapping a region of memory, aligning the returned pointer to a PMD boundary, then setting MADV_HUGEPAGE, but was setting it past the end of the mmapped area due to not taking the pointer alignment into consideration. Fix this behaviour.
Up until commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries"), this buggy behavior was (usually) masked because the alignment difference was always less than PMD-size. But since the mentioned commit, `ksm_tests -H -s 100` started failing.
Link: https://lkml.kernel.org/r/20240122120554.3108022-1-ryan.roberts@arm.com Fixes: 325254899684 ("selftests: vm: add KSM huge pages merging time test") Signed-off-by: Ryan Roberts ryan.roberts@arm.com Cc: Pedro Demarchi Gomes pedrodemargomes@gmail.com Cc: Shuah Khan shuah@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/mm/ksm_tests.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/testing/selftests/mm/ksm_tests.c +++ b/tools/testing/selftests/mm/ksm_tests.c @@ -566,7 +566,7 @@ static int ksm_merge_hugepages_time(int if (map_ptr_orig == MAP_FAILED) err(2, "initial mmap");
- if (madvise(map_ptr, len + HPAGE_SIZE, MADV_HUGEPAGE)) + if (madvise(map_ptr, len, MADV_HUGEPAGE)) err(2, "MADV_HUGEPAGE");
pagemap_fd = open("/proc/self/pagemap", O_RDONLY);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Samuel Holland samuel.holland@sifive.com
commit 6f9dc684cae638dda0570154509884ee78d0f75c upstream.
The shadow call stack implementation fails to build without CONFIG_MMU:
ld.lld: error: undefined symbol: vfree_atomic
referenced by scs.c kernel/scs.o:(scs_free) in archive vmlinux.a
Link: https://lkml.kernel.org/r/20240122175204.2371009-1-samuel.holland@sifive.com Fixes: a2abe7cbd8fe ("scs: switch to vmapped shadow stacks") Signed-off-by: Samuel Holland samuel.holland@sifive.com Reviewed-by: Sami Tolvanen samitolvanen@google.com Cc: Will Deacon will@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/Kconfig | 1 + 1 file changed, 1 insertion(+)
--- a/arch/Kconfig +++ b/arch/Kconfig @@ -681,6 +681,7 @@ config SHADOW_CALL_STACK bool "Shadow Call Stack" depends on ARCH_SUPPORTS_SHADOW_CALL_STACK depends on DYNAMIC_FTRACE_WITH_ARGS || DYNAMIC_FTRACE_WITH_REGS || !FUNCTION_GRAPH_TRACER + depends on MMU help This option enables the compiler's Shadow Call Stack, which uses a shadow stack to protect function return addresses from
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masami Hiramatsu (Google) mhiramat@kernel.org
commit 0958b33ef5a04ed91f61cef4760ac412080c4e08 upstream.
Fix register_snapshot_trigger() to return error code if it failed to allocate a snapshot instead of 0 (success). Unless that, it will register snapshot trigger without an error.
Link: https://lore.kernel.org/linux-trace-kernel/170622977792.270660.2789298642759...
Fixes: 0bbe7f719985 ("tracing: Fix the race between registering 'snapshot' event trigger and triggering 'snapshot' operation") Cc: stable@vger.kernel.org Cc: Vincent Donnefort vdonnefort@google.com Signed-off-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace_events_trigger.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/kernel/trace/trace_events_trigger.c +++ b/kernel/trace/trace_events_trigger.c @@ -1470,8 +1470,10 @@ register_snapshot_trigger(char *glob, struct event_trigger_data *data, struct trace_event_file *file) { - if (tracing_alloc_snapshot_instance(file->tr) != 0) - return 0; + int ret = tracing_alloc_snapshot_instance(file->tr); + + if (ret < 0) + return ret;
return register_trigger(glob, data, file); }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sidhartha Kumar sidhartha.kumar@oracle.com
commit 19d3e221807772f8443e565234a6fdc5a2b09d26 upstream.
has_extra_refcount() makes the assumption that the page cache adds a ref count of 1 and subtracts this in the extra_pins case. Commit a08c7193e4f1 (mm/filemap: remove hugetlb special casing in filemap.c) modifies __filemap_add_folio() by calling folio_ref_add(folio, nr); for all cases (including hugtetlb) where nr is the number of pages in the folio. We should adjust the number of references coming from the page cache by subtracing the number of pages rather than 1.
In hugetlbfs_read_iter(), folio_test_has_hwpoisoned() is testing the wrong flag as, in the hugetlb case, memory-failure code calls folio_test_set_hwpoison() to indicate poison. folio_test_hwpoison() is the correct function to test for that flag.
After these fixes, the hugetlb hwpoison read selftest passes all cases.
Link: https://lkml.kernel.org/r/20240112180840.367006-1-sidhartha.kumar@oracle.com Fixes: a08c7193e4f1 ("mm/filemap: remove hugetlb special casing in filemap.c") Signed-off-by: Sidhartha Kumar sidhartha.kumar@oracle.com Closes: https://lore.kernel.org/linux-mm/20230713001833.3778937-1-jiaqiyan@google.co... Reported-by: Muhammad Usama Anjum usama.anjum@collabora.com Tested-by: Muhammad Usama Anjum usama.anjum@collabora.com Acked-by: Miaohe Lin linmiaohe@huawei.com Acked-by: Muchun Song muchun.song@linux.dev Cc: James Houghton jthoughton@google.com Cc: Jiaqi Yan jiaqiyan@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Naoya Horiguchi naoya.horiguchi@nec.com Cc: stable@vger.kernel.org [6.7+] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/hugetlbfs/inode.c | 2 +- mm/memory-failure.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
--- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -340,7 +340,7 @@ static ssize_t hugetlbfs_read_iter(struc } else { folio_unlock(folio);
- if (!folio_test_has_hwpoisoned(folio)) + if (!folio_test_hwpoison(folio)) want = nr; else { /* --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -976,7 +976,7 @@ static bool has_extra_refcount(struct pa int count = page_count(p) - 1;
if (extra_pins) - count -= 1; + count -= folio_nr_pages(page_folio(p));
if (count > 0) { pr_err("%#lx: %s still referenced by %d users\n",
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Muhammad Usama Anjum usama.anjum@collabora.com
commit bc29036e1da1cf66e5f8312649aeec2d51ea3d86 upstream.
Running charge_reserved_hugetlb.sh generates errors if sh is set to dash:
./charge_reserved_hugetlb.sh: 9: [[: not found ./charge_reserved_hugetlb.sh: 19: [[: not found ./charge_reserved_hugetlb.sh: 27: [[: not found ./charge_reserved_hugetlb.sh: 37: [[: not found ./charge_reserved_hugetlb.sh: 45: Syntax error: "(" unexpected
Switch to using /bin/bash instead of /bin/sh. Make the switch for write_hugetlb_memory.sh as well which is called from charge_reserved_hugetlb.sh.
Link: https://lkml.kernel.org/r/20240116090455.3407378-1-usama.anjum@collabora.com Signed-off-by: Muhammad Usama Anjum usama.anjum@collabora.com Cc: Muhammad Usama Anjum usama.anjum@collabora.com Cc: Shuah Khan shuah@kernel.org Cc: David Laight David.Laight@ACULAB.COM Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/mm/charge_reserved_hugetlb.sh | 2 +- tools/testing/selftests/mm/write_hugetlb_memory.sh | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
--- a/tools/testing/selftests/mm/charge_reserved_hugetlb.sh +++ b/tools/testing/selftests/mm/charge_reserved_hugetlb.sh @@ -1,4 +1,4 @@ -#!/bin/sh +#!/bin/bash # SPDX-License-Identifier: GPL-2.0
# Kselftest framework requirement - SKIP code is 4. --- a/tools/testing/selftests/mm/write_hugetlb_memory.sh +++ b/tools/testing/selftests/mm/write_hugetlb_memory.sh @@ -1,4 +1,4 @@ -#!/bin/sh +#!/bin/bash # SPDX-License-Identifier: GPL-2.0
set -e
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jan Kara jack@suse.cz
commit ab4443fe3ca6298663a55c4a70efc6c3ce913ca6 upstream.
ra_alloc_folio() marks a page that should trigger next round of async readahead. However it rounds up computed index to the order of page being allocated. This can however lead to multiple consecutive pages being marked with readahead flag. Consider situation with index == 1, mark == 1, order == 0. We insert order 0 page at index 1 and mark it. Then we bump order to 1, index to 2, mark (still == 1) is rounded up to 2 so page at index 2 is marked as well. Then we bump order to 2, index is incremented to 4, mark gets rounded to 4 so page at index 4 is marked as well. The fact that multiple pages get marked within a single readahead window confuses the readahead logic and results in readahead window being trimmed back to 1. This situation is triggered in particular when maximum readahead window size is not a power of two (in the observed case it was 768 KB) and as a result sequential read throughput suffers.
Fix the problem by rounding 'mark' down instead of up. Because the index is naturally aligned to 'order', we are guaranteed 'rounded mark' == index iff 'mark' is within the page we are allocating at 'index' and thus exactly one page is marked with readahead flag as required by the readahead code and sequential read performance is restored.
This effectively reverts part of commit b9ff43dd2743 ("mm/readahead: Fix readahead with large folios"). The commit changed the rounding with the rationale:
"... we were setting the readahead flag on the folio which contains the last byte read from the block. This is wrong because we will trigger readahead at the end of the read without waiting to see if a subsequent read is going to use the pages we just read."
Although this is true, the fact is this was always the case with read sizes not aligned to folio boundaries and large folios in the page cache just make the situation more obvious (and frequent). Also for sequential read workloads it is better to trigger the readahead earlier rather than later. It is true that the difference in the rounding and thus earlier triggering of the readahead can result in reading more for semi-random workloads. However workloads really suffering from this seem to be rare. In particular I have verified that the workload described in commit b9ff43dd2743 ("mm/readahead: Fix readahead with large folios") of reading random 100k blocks from a file like:
[reader] bs=100k rw=randread numjobs=1 size=64g runtime=60s
is not impacted by the rounding change and achieves ~70MB/s in both cases.
[jack@suse.cz: fix one more place where mark rounding was done as well] Link: https://lkml.kernel.org/r/20240123153254.5206-1-jack@suse.cz Link: https://lkml.kernel.org/r/20240104085839.21029-1-jack@suse.cz Fixes: b9ff43dd2743 ("mm/readahead: Fix readahead with large folios") Signed-off-by: Jan Kara jack@suse.cz Cc: Matthew Wilcox willy@infradead.org Cc: Guo Xuenan guoxuenan@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/readahead.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/readahead.c +++ b/mm/readahead.c @@ -469,7 +469,7 @@ static inline int ra_alloc_folio(struct
if (!folio) return -ENOMEM; - mark = round_up(mark, 1UL << order); + mark = round_down(mark, 1UL << order); if (index == mark) folio_set_readahead(folio); err = filemap_add_folio(ractl->mapping, folio, index, gfp); @@ -577,7 +577,7 @@ static void ondemand_readahead(struct re * It's the expected callback index, assume sequential access. * Ramp up sizes, and push forward the readahead window. */ - expected = round_up(ra->start + ra->size - ra->async_size, + expected = round_down(ra->start + ra->size - ra->async_size, 1UL << order); if (index == expected || index == (ra->start + ra->size)) { ra->start += ra->size;
On Tue, Feb 20, 2024 at 09:53:49PM +0100, Greg Kroah-Hartman wrote:
6.7-stable review patch. If anyone has any objections, please let me know.
From: Jan Kara jack@suse.cz
commit ab4443fe3ca6298663a55c4a70efc6c3ce913ca6 upstream.
ibid
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zach O'Keefe zokeefe@google.com
commit 9319b647902cbd5cc884ac08a8a6d54ce111fc78 upstream.
(struct dirty_throttle_control *)->thresh is an unsigned long, but is passed as the u32 divisor argument to div_u64(). On architectures where unsigned long is 64 bytes, the argument will be implicitly truncated.
Use div64_u64() instead of div_u64() so that the value used in the "is this a safe division" check is the same as the divisor.
Also, remove redundant cast of the numerator to u64, as that should happen implicitly.
This would be difficult to exploit in memcg domain, given the ratio-based arithmetic domain_drity_limits() uses, but is much easier in global writeback domain with a BDI_CAP_STRICTLIMIT-backing device, using e.g. vm.dirty_bytes=(1<<32)*PAGE_SIZE so that dtc->thresh == (1<<32)
Link: https://lkml.kernel.org/r/20240118181954.1415197-1-zokeefe@google.com Fixes: f6789593d5ce ("mm/page-writeback.c: fix divide by zero in bdi_dirty_limits()") Signed-off-by: Zach O'Keefe zokeefe@google.com Cc: Maxim Patlasov MPatlasov@parallels.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/page-writeback.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -1638,7 +1638,7 @@ static inline void wb_dirty_limits(struc */ dtc->wb_thresh = __wb_calc_thresh(dtc); dtc->wb_bg_thresh = dtc->thresh ? - div_u64((u64)dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0; + div64_u64(dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
/* * In order to avoid the stacked BDI deadlock we need
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Audra Mitchell audra@redhat.com
commit 52e63d67b5bb423b33d7a262ac7f8bd375a90145 upstream.
In order for the page table level 5 to be in use, the CPU must have the setting enabled in addition to the CONFIG option. Check for the flag to be set to avoid false test failures on systems that do not have this cpu flag set.
The test does a series of mmap calls including three using the MAP_FIXED flag and specifying an address that is 1<<47 or 1<<48. These addresses are only available if you are using level 5 page tables, which requires both the CPU to have the capabiltiy (la57 flag) and the kernel to be configured. Currently the test only checks for the kernel configuration option, so this test can still report a false positive. Here are the three failing lines:
$ ./va_high_addr_switch | grep FAILED mmap(ADDR_SWITCH_HINT, 2 * PAGE_SIZE, MAP_FIXED): 0xffffffffffffffff - FAILED mmap(HIGH_ADDR, MAP_FIXED): 0xffffffffffffffff - FAILED mmap(ADDR_SWITCH_HINT, 2 * PAGE_SIZE, MAP_FIXED): 0xffffffffffffffff - FAILED
I thought (for about a second) refactoring the test so that these three mmap calls will only be run on systems with the level 5 page tables available, but the whole point of the test is to check the level 5 feature...
Link: https://lkml.kernel.org/r/20240119205801.62769-1-audra@redhat.com Fixes: 4f2930c6718a ("selftests/vm: only run 128TBswitch with 5-level paging") Signed-off-by: Audra Mitchell audra@redhat.com Cc: Rafael Aquini raquini@redhat.com Cc: Shuah Khan shuah@kernel.org Cc: Adam Sindelar adam@wowsignal.io Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/mm/va_high_addr_switch.sh | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/tools/testing/selftests/mm/va_high_addr_switch.sh +++ b/tools/testing/selftests/mm/va_high_addr_switch.sh @@ -29,9 +29,15 @@ check_supported_x86_64() # See man 1 gzip under '-f'. local pg_table_levels=$(gzip -dcfq "${config}" | grep PGTABLE_LEVELS | cut -d'=' -f 2)
+ local cpu_supports_pl5=$(awk '/^flags/ {if (/la57/) {print 0;} + else {print 1}; exit}' /proc/cpuinfo 2>/dev/null) + if [[ "${pg_table_levels}" -lt 5 ]]; then echo "$0: PGTABLE_LEVELS=${pg_table_levels}, must be >= 5 to run this test" exit $ksft_skip + elif [[ "${cpu_supports_pl5}" -ne 0 ]]; then + echo "$0: CPU does not have the necessary la57 flag to support page table level 5" + exit $ksft_skip fi }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nico Pache npache@redhat.com
commit 91b80cc5b39f00399e8e2d17527cad2c7fa535e2 upstream.
On systems with 64k page size and 512M huge page sizes, the allocation and test succeeds but errors out at the munmap. As the comment states, munmap will failure if its not HUGEPAGE aligned. This is due to the length of the mapping being 1/2 the size of the hugepage causing the munmap to not be hugepage aligned. Fix this by making the mapping length the full hugepage if the hugepage is larger than the length of the mapping.
Link: https://lkml.kernel.org/r/20240119131429.172448-1-npache@redhat.com Signed-off-by: Nico Pache npache@redhat.com Cc: Donet Tom donettom@linux.vnet.ibm.com Cc: Shuah Khan shuah@kernel.org Cc: Christophe Leroy christophe.leroy@c-s.fr Cc: Michael Ellerman mpe@ellerman.id.au Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/mm/map_hugetlb.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/tools/testing/selftests/mm/map_hugetlb.c +++ b/tools/testing/selftests/mm/map_hugetlb.c @@ -15,6 +15,7 @@ #include <unistd.h> #include <sys/mman.h> #include <fcntl.h> +#include "vm_util.h"
#define LENGTH (256UL*1024*1024) #define PROTECTION (PROT_READ | PROT_WRITE) @@ -58,10 +59,16 @@ int main(int argc, char **argv) { void *addr; int ret; + size_t hugepage_size; size_t length = LENGTH; int flags = FLAGS; int shift = 0;
+ hugepage_size = default_huge_page_size(); + /* munmap with fail if the length is not page aligned */ + if (hugepage_size > length) + length = hugepage_size; + if (argc > 1) length = atol(argv[1]) << 20; if (argc > 2) {
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Kelley mhklinux@outlook.com
commit f4469f3858352ad1197434557150b1f7086762a0 upstream.
Current code uses the specified ring buffer size (either the default of 128 Kbytes or a module parameter specified value) to encompass the one page ring buffer header plus the actual ring itself. When the page size is 4K, carving off one page for the header isn't significant. But when the page size is 64K on ARM64, only half of the default 128 Kbytes is left for the actual ring. While this doesn't break anything, the smaller ring size could be a performance bottleneck.
Fix this by applying the VMBUS_RING_SIZE macro to the specified ring buffer size. This macro adds a page for the header, and rounds up the size to a page boundary, using the page size for which the kernel is built. Use this new size for subsequent ring buffer calculations. For example, on ARM64 with 64K page size and the default ring size, this results in the actual ring being 128 Kbytes, which is intended.
Cc: stable@vger.kernel.org # 5.15.x Signed-off-by: Michael Kelley mhklinux@outlook.com Link: https://lore.kernel.org/r/20240122170956.496436-1-mhklinux@outlook.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/storvsc_drv.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)
--- a/drivers/scsi/storvsc_drv.c +++ b/drivers/scsi/storvsc_drv.c @@ -330,6 +330,7 @@ enum storvsc_request_type { */
static int storvsc_ringbuffer_size = (128 * 1024); +static int aligned_ringbuffer_size; static u32 max_outstanding_req_per_channel; static int storvsc_change_queue_depth(struct scsi_device *sdev, int queue_depth);
@@ -687,8 +688,8 @@ static void handle_sc_creation(struct vm new_sc->next_request_id_callback = storvsc_next_request_id;
ret = vmbus_open(new_sc, - storvsc_ringbuffer_size, - storvsc_ringbuffer_size, + aligned_ringbuffer_size, + aligned_ringbuffer_size, (void *)&props, sizeof(struct vmstorage_channel_properties), storvsc_on_channel_callback, new_sc); @@ -1973,7 +1974,7 @@ static int storvsc_probe(struct hv_devic dma_set_min_align_mask(&device->device, HV_HYP_PAGE_SIZE - 1);
stor_device->port_number = host->host_no; - ret = storvsc_connect_to_vsp(device, storvsc_ringbuffer_size, is_fc); + ret = storvsc_connect_to_vsp(device, aligned_ringbuffer_size, is_fc); if (ret) goto err_out1;
@@ -2164,7 +2165,7 @@ static int storvsc_resume(struct hv_devi { int ret;
- ret = storvsc_connect_to_vsp(hv_dev, storvsc_ringbuffer_size, + ret = storvsc_connect_to_vsp(hv_dev, aligned_ringbuffer_size, hv_dev_is_fc(hv_dev)); return ret; } @@ -2198,8 +2199,9 @@ static int __init storvsc_drv_init(void) * the ring buffer indices) by the max request size (which is * vmbus_channel_packet_multipage_buffer + struct vstor_packet + u64) */ + aligned_ringbuffer_size = VMBUS_RING_SIZE(storvsc_ringbuffer_size); max_outstanding_req_per_channel = - ((storvsc_ringbuffer_size - PAGE_SIZE) / + ((aligned_ringbuffer_size - PAGE_SIZE) / ALIGN(MAX_MULTIPAGE_BUFFER_PACKET + sizeof(struct vstor_packet) + sizeof(u64), sizeof(u64)));
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Airlie airlied@redhat.com
commit 39126abc5e20611579602f03b66627d7cd1422f0 upstream.
This should break the deadlock between the fctx lock and the irq lock.
This offloads the processing off the work from the irq into a workqueue.
Cc: linux-stable@vger.kernel.org Signed-off-by: Dave Airlie airlied@redhat.com Link: https://patchwork.freedesktop.org/patch/576237/ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/nouveau/nouveau_fence.c | 24 ++++++++++++++++++------ drivers/gpu/drm/nouveau/nouveau_fence.h | 1 + 2 files changed, 19 insertions(+), 6 deletions(-)
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c @@ -103,6 +103,7 @@ nouveau_fence_context_kill(struct nouvea void nouveau_fence_context_del(struct nouveau_fence_chan *fctx) { + cancel_work_sync(&fctx->uevent_work); nouveau_fence_context_kill(fctx, 0); nvif_event_dtor(&fctx->event); fctx->dead = 1; @@ -145,12 +146,13 @@ nouveau_fence_update(struct nouveau_chan return drop; }
-static int -nouveau_fence_wait_uevent_handler(struct nvif_event *event, void *repv, u32 repc) +static void +nouveau_fence_uevent_work(struct work_struct *work) { - struct nouveau_fence_chan *fctx = container_of(event, typeof(*fctx), event); + struct nouveau_fence_chan *fctx = container_of(work, struct nouveau_fence_chan, + uevent_work); unsigned long flags; - int ret = NVIF_EVENT_KEEP; + int drop = 0;
spin_lock_irqsave(&fctx->lock, flags); if (!list_empty(&fctx->pending)) { @@ -160,11 +162,20 @@ nouveau_fence_wait_uevent_handler(struct fence = list_entry(fctx->pending.next, typeof(*fence), head); chan = rcu_dereference_protected(fence->channel, lockdep_is_held(&fctx->lock)); if (nouveau_fence_update(chan, fctx)) - ret = NVIF_EVENT_DROP; + drop = 1; } + if (drop) + nvif_event_block(&fctx->event); + spin_unlock_irqrestore(&fctx->lock, flags); +}
- return ret; +static int +nouveau_fence_wait_uevent_handler(struct nvif_event *event, void *repv, u32 repc) +{ + struct nouveau_fence_chan *fctx = container_of(event, typeof(*fctx), event); + schedule_work(&fctx->uevent_work); + return NVIF_EVENT_KEEP; }
void @@ -178,6 +189,7 @@ nouveau_fence_context_new(struct nouveau } args; int ret;
+ INIT_WORK(&fctx->uevent_work, nouveau_fence_uevent_work); INIT_LIST_HEAD(&fctx->flip); INIT_LIST_HEAD(&fctx->pending); spin_lock_init(&fctx->lock); --- a/drivers/gpu/drm/nouveau/nouveau_fence.h +++ b/drivers/gpu/drm/nouveau/nouveau_fence.h @@ -44,6 +44,7 @@ struct nouveau_fence_chan { u32 context; char name[32];
+ struct work_struct uevent_work; struct nvif_event event; int notify_ref, dead, killed; };
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mikulas Patocka mpatocka@redhat.com
commit 0a9bab391e336489169b95cb0d4553d921302189 upstream.
Tasklets have an inherent problem with memory corruption. The function tasklet_action_common calls tasklet_trylock, then it calls the tasklet callback and then it calls tasklet_unlock. If the tasklet callback frees the structure that contains the tasklet or if it calls some code that may free it, tasklet_unlock will write into free memory.
The commits 8e14f610159d and d9a02e016aaf try to fix it for dm-crypt, but it is not a sufficient fix and the data corruption can still happen [1]. There is no fix for dm-verity and dm-verity will write into free memory with every tasklet-processed bio.
There will be atomic workqueues implemented in the kernel 6.9 [2]. They will have better interface and they will not suffer from the memory corruption problem.
But we need something that stops the memory corruption now and that can be backported to the stable kernels. So, I'm proposing this commit that disables tasklets in both dm-crypt and dm-verity. This commit doesn't remove the tasklet support, because the tasklet code will be reused when atomic workqueues will be implemented.
[1] https://lore.kernel.org/all/d390d7ee-f142-44d3-822a-87949e14608b@suse.de/T/ [2] https://lore.kernel.org/lkml/20240130091300.2968534-1-tj@kernel.org/
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Cc: stable@vger.kernel.org Fixes: 39d42fa96ba1b ("dm crypt: add flags to optionally bypass kcryptd workqueues") Fixes: 5721d4e5a9cdb ("dm verity: Add optional "try_verify_in_tasklet" feature") Signed-off-by: Mike Snitzer snitzer@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/dm-crypt.c | 38 ++------------------------------------ drivers/md/dm-verity-target.c | 26 ++------------------------ drivers/md/dm-verity.h | 1 - 3 files changed, 4 insertions(+), 61 deletions(-)
--- a/drivers/md/dm-crypt.c +++ b/drivers/md/dm-crypt.c @@ -73,10 +73,8 @@ struct dm_crypt_io { struct bio *base_bio; u8 *integrity_metadata; bool integrity_metadata_from_pool:1; - bool in_tasklet:1;
struct work_struct work; - struct tasklet_struct tasklet;
struct convert_context ctx;
@@ -1762,7 +1760,6 @@ static void crypt_io_init(struct dm_cryp io->ctx.r.req = NULL; io->integrity_metadata = NULL; io->integrity_metadata_from_pool = false; - io->in_tasklet = false; atomic_set(&io->io_pending, 0); }
@@ -1771,13 +1768,6 @@ static void crypt_inc_pending(struct dm_ atomic_inc(&io->io_pending); }
-static void kcryptd_io_bio_endio(struct work_struct *work) -{ - struct dm_crypt_io *io = container_of(work, struct dm_crypt_io, work); - - bio_endio(io->base_bio); -} - /* * One of the bios was finished. Check for completion of * the whole request and correctly clean up the buffer. @@ -1801,20 +1791,6 @@ static void crypt_dec_pending(struct dm_
base_bio->bi_status = error;
- /* - * If we are running this function from our tasklet, - * we can't call bio_endio() here, because it will call - * clone_endio() from dm.c, which in turn will - * free the current struct dm_crypt_io structure with - * our tasklet. In this case we need to delay bio_endio() - * execution to after the tasklet is done and dequeued. - */ - if (io->in_tasklet) { - INIT_WORK(&io->work, kcryptd_io_bio_endio); - queue_work(cc->io_queue, &io->work); - return; - } - bio_endio(base_bio); }
@@ -2246,11 +2222,6 @@ static void kcryptd_crypt(struct work_st kcryptd_crypt_write_convert(io); }
-static void kcryptd_crypt_tasklet(unsigned long work) -{ - kcryptd_crypt((struct work_struct *)work); -} - static void kcryptd_queue_crypt(struct dm_crypt_io *io) { struct crypt_config *cc = io->cc; @@ -2262,15 +2233,10 @@ static void kcryptd_queue_crypt(struct d * irqs_disabled(): the kernel may run some IO completion from the idle thread, but * it is being executed with irqs disabled. */ - if (in_hardirq() || irqs_disabled()) { - io->in_tasklet = true; - tasklet_init(&io->tasklet, kcryptd_crypt_tasklet, (unsigned long)&io->work); - tasklet_schedule(&io->tasklet); + if (!(in_hardirq() || irqs_disabled())) { + kcryptd_crypt(&io->work); return; } - - kcryptd_crypt(&io->work); - return; }
INIT_WORK(&io->work, kcryptd_crypt); --- a/drivers/md/dm-verity-target.c +++ b/drivers/md/dm-verity-target.c @@ -645,23 +645,6 @@ static void verity_work(struct work_stru verity_finish_io(io, errno_to_blk_status(verity_verify_io(io))); }
-static void verity_tasklet(unsigned long data) -{ - struct dm_verity_io *io = (struct dm_verity_io *)data; - int err; - - io->in_tasklet = true; - err = verity_verify_io(io); - if (err == -EAGAIN || err == -ENOMEM) { - /* fallback to retrying with work-queue */ - INIT_WORK(&io->work, verity_work); - queue_work(io->v->verify_wq, &io->work); - return; - } - - verity_finish_io(io, errno_to_blk_status(err)); -} - static void verity_end_io(struct bio *bio) { struct dm_verity_io *io = bio->bi_private; @@ -674,13 +657,8 @@ static void verity_end_io(struct bio *bi return; }
- if (static_branch_unlikely(&use_tasklet_enabled) && io->v->use_tasklet) { - tasklet_init(&io->tasklet, verity_tasklet, (unsigned long)io); - tasklet_schedule(&io->tasklet); - } else { - INIT_WORK(&io->work, verity_work); - queue_work(io->v->verify_wq, &io->work); - } + INIT_WORK(&io->work, verity_work); + queue_work(io->v->verify_wq, &io->work); }
/* --- a/drivers/md/dm-verity.h +++ b/drivers/md/dm-verity.h @@ -83,7 +83,6 @@ struct dm_verity_io { struct bvec_iter iter;
struct work_struct work; - struct tasklet_struct tasklet;
/* * Three variably-size fields follow this struct:
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Techno Mooney techno.mooney@gmail.com
commit c6dce23ec993f7da7790a9eadb36864ceb60e942 upstream.
The laptop requires a quirk ID to enable its internal microphone. Add it to the DMI quirk table.
Reported-by: Techno Mooney techno.mooney@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218402 Cc: stable@vger.kernel.org Signed-off-by: Techno Mooney techno.mooney@gmail.com Signed-off-by: Bagas Sanjaya bagasdotme@gmail.com Link: https://msgid.link/r/20240129081148.1044891-1-bagasdotme@gmail.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/amd/yc/acp6x-mach.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/sound/soc/amd/yc/acp6x-mach.c +++ b/sound/soc/amd/yc/acp6x-mach.c @@ -300,6 +300,13 @@ static const struct dmi_system_id yc_acp { .driver_data = &acp6x_card, .matches = { + DMI_MATCH(DMI_BOARD_VENDOR, "Micro-Star International Co., Ltd."), + DMI_MATCH(DMI_PRODUCT_NAME, "Bravo 15 C7VF"), + } + }, + { + .driver_data = &acp6x_card, + .matches = { DMI_MATCH(DMI_BOARD_VENDOR, "Alienware"), DMI_MATCH(DMI_PRODUCT_NAME, "Alienware m17 R5 AMD"), }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Helge Deller deller@gmx.de
commit c8708d758e715c3824a73bf0cda97292b52be44d upstream.
Printing the inventory on a serial console can be quite slow and thus may trigger the hung task detector (CONFIG_DETECT_HUNG_TASK=y) and possibly reboot the machine. Adding a cond_resched() prevents this.
Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org # v6.0+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/parisc/kernel/drivers.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/arch/parisc/kernel/drivers.c +++ b/arch/parisc/kernel/drivers.c @@ -1004,6 +1004,9 @@ static __init int qemu_print_iodc_data(s
pr_info("\n");
+ /* Prevent hung task messages when printing on serial console */ + cond_resched(); + pr_info("#define HPA_%08lx_DESCRIPTION "%s"\n", hpa, parisc_hardware_description(&dev->id));
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Senoner seda18@rolmail.net
commit efb56d84dd9c3de3c99fc396abb57c6d330038b5 upstream.
If you connect an external headset/microphone to the 3.5mm jack on the Acer Swift 1 SF114-32 it does not recognize the microphone. This fixes that and gives the user the ability to choose between internal and headset mic.
Signed-off-by: David Senoner seda18@rolmail.net Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240126155626.2304465-1-seda18@rolmail.net Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -9640,6 +9640,7 @@ static const struct snd_pci_quirk alc269 SND_PCI_QUIRK(0x1025, 0x1247, "Acer vCopperbox", ALC269VC_FIXUP_ACER_VCOPPERBOX_PINS), SND_PCI_QUIRK(0x1025, 0x1248, "Acer Veriton N4660G", ALC269VC_FIXUP_ACER_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1025, 0x1269, "Acer SWIFT SF314-54", ALC256_FIXUP_ACER_HEADSET_MIC), + SND_PCI_QUIRK(0x1025, 0x126a, "Acer Swift SF114-32", ALC256_FIXUP_ACER_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1025, 0x128f, "Acer Veriton Z6860G", ALC286_FIXUP_ACER_AIO_HEADSET_MIC), SND_PCI_QUIRK(0x1025, 0x1290, "Acer Veriton Z4860G", ALC286_FIXUP_ACER_AIO_HEADSET_MIC), SND_PCI_QUIRK(0x1025, 0x1291, "Acer Veriton Z4660G", ALC286_FIXUP_ACER_AIO_HEADSET_MIC),
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luka Guzenko l.guzenko@web.de
commit f0d78972f27dc1d1d51fbace2713ad3cdc60a877 upstream.
This HP Laptop uses ALC236 codec with COEF 0x07 controlling the mute LED. Enable existing quirk for this device.
Signed-off-by: Luka Guzenko l.guzenko@web.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240128155704.2333812-1-l.guzenko@web.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -9840,6 +9840,7 @@ static const struct snd_pci_quirk alc269 SND_PCI_QUIRK(0x103c, 0x8786, "HP OMEN 15", ALC285_FIXUP_HP_MUTE_LED), SND_PCI_QUIRK(0x103c, 0x8787, "HP OMEN 15", ALC285_FIXUP_HP_MUTE_LED), SND_PCI_QUIRK(0x103c, 0x8788, "HP OMEN 15", ALC285_FIXUP_HP_MUTE_LED), + SND_PCI_QUIRK(0x103c, 0x87b7, "HP Laptop 14-fq0xxx", ALC236_FIXUP_HP_MUTE_LED_COEFBIT2), SND_PCI_QUIRK(0x103c, 0x87c8, "HP", ALC287_FIXUP_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x87e5, "HP ProBook 440 G8 Notebook PC", ALC236_FIXUP_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x87e7, "HP ProBook 450 G8 Notebook PC", ALC236_FIXUP_HP_GPIO_LED),
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Benjamin Tissoires bentiss@kernel.org
commit 7cdd2108903a4e369eb37579830afc12a6877ec2 upstream.
When the kfunc hid_bpf_attach_prog() is called, we called twice fdget(): one for fetching the type of the bpf program, and one for actually attaching the program to the device.
The problem is that between those two calls, we have no guarantees that the prog_fd is still the same file descriptor for the given program.
Solve this by calling bpf_prog_get() earlier, and use this to fetch the program type.
Reported-by: Dan Carpenter dan.carpenter@linaro.org Link: https://lore.kernel.org/bpf/CAO-hwJJ8vh8JD3-P43L-_CLNmPx0hWj44aom0O838vfP4=_... Cc: stable@vger.kernel.org Fixes: f5c27da4e3c8 ("HID: initial BPF implementation") Link: https://lore.kernel.org/r/20240124-b4-hid-bpf-fixes-v2-1-052520b1e5e6@kernel... Signed-off-by: Benjamin Tissoires bentiss@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hid/bpf/hid_bpf_dispatch.c | 66 ++++++++++++++++++++++++------------ drivers/hid/bpf/hid_bpf_dispatch.h | 4 +- drivers/hid/bpf/hid_bpf_jmp_table.c | 20 +--------- 3 files changed, 49 insertions(+), 41 deletions(-)
--- a/drivers/hid/bpf/hid_bpf_dispatch.c +++ b/drivers/hid/bpf/hid_bpf_dispatch.c @@ -241,6 +241,39 @@ int hid_bpf_reconnect(struct hid_device return 0; }
+static int do_hid_bpf_attach_prog(struct hid_device *hdev, int prog_fd, struct bpf_prog *prog, + __u32 flags) +{ + int fd, err, prog_type; + + prog_type = hid_bpf_get_prog_attach_type(prog); + if (prog_type < 0) + return prog_type; + + if (prog_type >= HID_BPF_PROG_TYPE_MAX) + return -EINVAL; + + if (prog_type == HID_BPF_PROG_TYPE_DEVICE_EVENT) { + err = hid_bpf_allocate_event_data(hdev); + if (err) + return err; + } + + fd = __hid_bpf_attach_prog(hdev, prog_type, prog_fd, prog, flags); + if (fd < 0) + return fd; + + if (prog_type == HID_BPF_PROG_TYPE_RDESC_FIXUP) { + err = hid_bpf_reconnect(hdev); + if (err) { + close_fd(fd); + return err; + } + } + + return fd; +} + /** * hid_bpf_attach_prog - Attach the given @prog_fd to the given HID device * @@ -257,18 +290,13 @@ noinline int hid_bpf_attach_prog(unsigned int hid_id, int prog_fd, __u32 flags) { struct hid_device *hdev; + struct bpf_prog *prog; struct device *dev; - int fd, err, prog_type = hid_bpf_get_prog_attach_type(prog_fd); + int fd;
if (!hid_bpf_ops) return -EINVAL;
- if (prog_type < 0) - return prog_type; - - if (prog_type >= HID_BPF_PROG_TYPE_MAX) - return -EINVAL; - if ((flags & ~HID_BPF_FLAG_MASK)) return -EINVAL;
@@ -278,23 +306,17 @@ hid_bpf_attach_prog(unsigned int hid_id,
hdev = to_hid_device(dev);
- if (prog_type == HID_BPF_PROG_TYPE_DEVICE_EVENT) { - err = hid_bpf_allocate_event_data(hdev); - if (err) - return err; - } + /* + * take a ref on the prog itself, it will be released + * on errors or when it'll be detached + */ + prog = bpf_prog_get(prog_fd); + if (IS_ERR(prog)) + return PTR_ERR(prog);
- fd = __hid_bpf_attach_prog(hdev, prog_type, prog_fd, flags); + fd = do_hid_bpf_attach_prog(hdev, prog_fd, prog, flags); if (fd < 0) - return fd; - - if (prog_type == HID_BPF_PROG_TYPE_RDESC_FIXUP) { - err = hid_bpf_reconnect(hdev); - if (err) { - close_fd(fd); - return err; - } - } + bpf_prog_put(prog);
return fd; } --- a/drivers/hid/bpf/hid_bpf_dispatch.h +++ b/drivers/hid/bpf/hid_bpf_dispatch.h @@ -12,9 +12,9 @@ struct hid_bpf_ctx_kern {
int hid_bpf_preload_skel(void); void hid_bpf_free_links_and_skel(void); -int hid_bpf_get_prog_attach_type(int prog_fd); +int hid_bpf_get_prog_attach_type(struct bpf_prog *prog); int __hid_bpf_attach_prog(struct hid_device *hdev, enum hid_bpf_prog_type prog_type, int prog_fd, - __u32 flags); + struct bpf_prog *prog, __u32 flags); void __hid_bpf_destroy_device(struct hid_device *hdev); int hid_bpf_prog_run(struct hid_device *hdev, enum hid_bpf_prog_type type, struct hid_bpf_ctx_kern *ctx_kern); --- a/drivers/hid/bpf/hid_bpf_jmp_table.c +++ b/drivers/hid/bpf/hid_bpf_jmp_table.c @@ -333,15 +333,10 @@ static int hid_bpf_insert_prog(int prog_ return err; }
-int hid_bpf_get_prog_attach_type(int prog_fd) +int hid_bpf_get_prog_attach_type(struct bpf_prog *prog) { - struct bpf_prog *prog = NULL; - int i; int prog_type = HID_BPF_PROG_TYPE_UNDEF; - - prog = bpf_prog_get(prog_fd); - if (IS_ERR(prog)) - return PTR_ERR(prog); + int i;
for (i = 0; i < HID_BPF_PROG_TYPE_MAX; i++) { if (hid_bpf_btf_ids[i] == prog->aux->attach_btf_id) { @@ -350,8 +345,6 @@ int hid_bpf_get_prog_attach_type(int pro } }
- bpf_prog_put(prog); - return prog_type; }
@@ -388,19 +381,13 @@ static const struct bpf_link_ops hid_bpf /* called from syscall */ noinline int __hid_bpf_attach_prog(struct hid_device *hdev, enum hid_bpf_prog_type prog_type, - int prog_fd, __u32 flags) + int prog_fd, struct bpf_prog *prog, __u32 flags) { struct bpf_link_primer link_primer; struct hid_bpf_link *link; - struct bpf_prog *prog = NULL; struct hid_bpf_prog_entry *prog_entry; int cnt, err = -EINVAL, prog_table_idx = -1;
- /* take a ref on the prog itself */ - prog = bpf_prog_get(prog_fd); - if (IS_ERR(prog)) - return PTR_ERR(prog); - mutex_lock(&hid_bpf_attach_lock);
link = kzalloc(sizeof(*link), GFP_USER); @@ -467,7 +454,6 @@ __hid_bpf_attach_prog(struct hid_device err_unlock: mutex_unlock(&hid_bpf_attach_lock);
- bpf_prog_put(prog); kfree(link);
return err;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Benjamin Tissoires bentiss@kernel.org
commit 89be8aa5b0ecb3b729c7bcff64bb2af7921fec63 upstream.
Turns out that I got my reference counts wrong and each successful bus_find_device() actually calls get_device(), and we need to manually call put_device().
Ensure each bus_find_device() gets a matching put_device() when releasing the bpf programs and fix all the error paths.
Cc: stable@vger.kernel.org Fixes: f5c27da4e3c8 ("HID: initial BPF implementation") Link: https://lore.kernel.org/r/20240124-b4-hid-bpf-fixes-v2-2-052520b1e5e6@kernel... Signed-off-by: Benjamin Tissoires bentiss@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hid/bpf/hid_bpf_dispatch.c | 29 +++++++++++++++++++++++------ drivers/hid/bpf/hid_bpf_jmp_table.c | 20 +++++++++++++++++--- 2 files changed, 40 insertions(+), 9 deletions(-)
--- a/drivers/hid/bpf/hid_bpf_dispatch.c +++ b/drivers/hid/bpf/hid_bpf_dispatch.c @@ -292,7 +292,7 @@ hid_bpf_attach_prog(unsigned int hid_id, struct hid_device *hdev; struct bpf_prog *prog; struct device *dev; - int fd; + int err, fd;
if (!hid_bpf_ops) return -EINVAL; @@ -311,14 +311,24 @@ hid_bpf_attach_prog(unsigned int hid_id, * on errors or when it'll be detached */ prog = bpf_prog_get(prog_fd); - if (IS_ERR(prog)) - return PTR_ERR(prog); + if (IS_ERR(prog)) { + err = PTR_ERR(prog); + goto out_dev_put; + }
fd = do_hid_bpf_attach_prog(hdev, prog_fd, prog, flags); - if (fd < 0) - bpf_prog_put(prog); + if (fd < 0) { + err = fd; + goto out_prog_put; + }
return fd; + + out_prog_put: + bpf_prog_put(prog); + out_dev_put: + put_device(dev); + return err; }
/** @@ -345,8 +355,10 @@ hid_bpf_allocate_context(unsigned int hi hdev = to_hid_device(dev);
ctx_kern = kzalloc(sizeof(*ctx_kern), GFP_KERNEL); - if (!ctx_kern) + if (!ctx_kern) { + put_device(dev); return NULL; + }
ctx_kern->ctx.hid = hdev;
@@ -363,10 +375,15 @@ noinline void hid_bpf_release_context(struct hid_bpf_ctx *ctx) { struct hid_bpf_ctx_kern *ctx_kern; + struct hid_device *hid;
ctx_kern = container_of(ctx, struct hid_bpf_ctx_kern, ctx); + hid = (struct hid_device *)ctx_kern->ctx.hid; /* ignore const */
kfree(ctx_kern); + + /* get_device() is called by bus_find_device() */ + put_device(&hid->dev); }
/** --- a/drivers/hid/bpf/hid_bpf_jmp_table.c +++ b/drivers/hid/bpf/hid_bpf_jmp_table.c @@ -196,6 +196,7 @@ static void __hid_bpf_do_release_prog(in static void hid_bpf_release_progs(struct work_struct *work) { int i, j, n, map_fd = -1; + bool hdev_destroyed;
if (!jmp_table.map) return; @@ -220,6 +221,12 @@ static void hid_bpf_release_progs(struct if (entry->hdev) { hdev = entry->hdev; type = entry->type; + /* + * hdev is still valid, even if we are called after hid_destroy_device(): + * when hid_bpf_attach() gets called, it takes a ref on the dev through + * bus_find_device() + */ + hdev_destroyed = hdev->bpf.destroyed;
hid_bpf_populate_hdev(hdev, type);
@@ -232,12 +239,19 @@ static void hid_bpf_release_progs(struct if (test_bit(next->idx, jmp_table.enabled)) continue;
- if (next->hdev == hdev && next->type == type) + if (next->hdev == hdev && next->type == type) { + /* + * clear the hdev reference and decrement the device ref + * that was taken during bus_find_device() while calling + * hid_bpf_attach() + */ next->hdev = NULL; + put_device(&hdev->dev); + } }
- /* if type was rdesc fixup, reconnect device */ - if (type == HID_BPF_PROG_TYPE_RDESC_FIXUP) + /* if type was rdesc fixup and the device is not gone, reconnect device */ + if (type == HID_BPF_PROG_TYPE_RDESC_FIXUP && !hdev_destroyed) hid_bpf_reconnect(hdev); } }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit 00aab7dcb2267f2aef59447602f34501efe1a07f upstream.
A while back the I2C HID implementation was split in an ACPI and OF part, but the new OF driver never initialises the client pointer which is dereferenced on power-up failures.
Fixes: b33752c30023 ("HID: i2c-hid: Reorganize so ACPI and OF are separate modules") Cc: stable@vger.kernel.org # 5.12 Cc: Douglas Anderson dianders@chromium.org Signed-off-by: Johan Hovold johan+linaro@kernel.org Reviewed-by: Douglas Anderson dianders@chromium.org Signed-off-by: Jiri Kosina jkosina@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hid/i2c-hid/i2c-hid-of.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/hid/i2c-hid/i2c-hid-of.c +++ b/drivers/hid/i2c-hid/i2c-hid-of.c @@ -87,6 +87,7 @@ static int i2c_hid_of_probe(struct i2c_c if (!ihid_of) return -ENOMEM;
+ ihid_of->client = client; ihid_of->ops.power_up = i2c_hid_of_power_up; ihid_of->ops.power_down = i2c_hid_of_power_down;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tatsunosuke Tobita tatsunosuke.tobita@wacom.com
commit ab41a31dd5e2681803642b6d08590b61867840ec upstream.
The xf86-input-wacom driver does not treat '0' as a valid serial number and will drop any input report which contains an MSC_SERIAL = 0 event. The kernel driver already takes care to avoid sending any MSC_SERIAL event if the value of serial[0] == 0 (which is the case for devices that don't actually report a serial number), but this is not quite sufficient. Only the lower 32 bits of the serial get reported to userspace, so if this portion of the serial is zero then there can still be problems.
This commit allows the driver to report either the lower 32 bits if they are non-zero or the upper 32 bits otherwise.
Signed-off-by: Jason Gerecke jason.gerecke@wacom.com Signed-off-by: Tatsunosuke Tobita tatsunosuke.tobita@wacom.com Fixes: f85c9dc678a5 ("HID: wacom: generic: Support tool ID and additional tool types") CC: stable@vger.kernel.org # v4.10 Signed-off-by: Jiri Kosina jkosina@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hid/wacom_wac.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
--- a/drivers/hid/wacom_wac.c +++ b/drivers/hid/wacom_wac.c @@ -2574,7 +2574,14 @@ static void wacom_wac_pen_report(struct wacom_wac->hid_data.tipswitch); input_report_key(input, wacom_wac->tool[0], sense); if (wacom_wac->serial[0]) { - input_event(input, EV_MSC, MSC_SERIAL, wacom_wac->serial[0]); + /* + * xf86-input-wacom does not accept a serial number + * of '0'. Report the low 32 bits if possible, but + * if they are zero, report the upper ones instead. + */ + __u32 serial_lo = wacom_wac->serial[0] & 0xFFFFFFFFu; + __u32 serial_hi = wacom_wac->serial[0] >> 32; + input_event(input, EV_MSC, MSC_SERIAL, (int)(serial_lo ? serial_lo : serial_hi)); input_report_abs(input, ABS_MISC, sense ? id : 0); }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason Gerecke killertofu@gmail.com
commit c1d6708bf0d3dd976460d435373cf5abf21ce258 upstream.
If a input device is opened before hid_hw_start is called, events may not be received from the hardware. In the case of USB-backed devices, for example, the hid_hw_start function is responsible for filling in the URB which is submitted when the input device is opened. If a device is opened prematurely, polling will never start because the device will not have been in the correct state to send the URB.
Because the wacom driver registers its input devices before calling hid_hw_start, there is a window of time where a device can be opened and end up in an inoperable state. Some ARM-based Chromebooks in particular reliably trigger this bug.
This commit splits the wacom_register_inputs function into two pieces. One which is responsible for setting up the allocated inputs (and runs prior to hid_hw_start so that devices are ready for any input events they may end up receiving) and another which only registers the devices (and runs after hid_hw_start to ensure devices can be immediately opened without issue). Note that the functions to initialize the LEDs and remotes are also moved after hid_hw_start to maintain their own dependency chains.
Fixes: 7704ac937345 ("HID: wacom: implement generic HID handling for pen generic devices") Cc: stable@vger.kernel.org # v3.18+ Suggested-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Jason Gerecke jason.gerecke@wacom.com Tested-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Jiri Kosina jkosina@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hid/wacom_sys.c | 63 ++++++++++++++++++++++++++++++++---------------- 1 file changed, 43 insertions(+), 20 deletions(-)
--- a/drivers/hid/wacom_sys.c +++ b/drivers/hid/wacom_sys.c @@ -2080,7 +2080,7 @@ static int wacom_allocate_inputs(struct return 0; }
-static int wacom_register_inputs(struct wacom *wacom) +static int wacom_setup_inputs(struct wacom *wacom) { struct input_dev *pen_input_dev, *touch_input_dev, *pad_input_dev; struct wacom_wac *wacom_wac = &(wacom->wacom_wac); @@ -2099,10 +2099,6 @@ static int wacom_register_inputs(struct input_free_device(pen_input_dev); wacom_wac->pen_input = NULL; pen_input_dev = NULL; - } else { - error = input_register_device(pen_input_dev); - if (error) - goto fail; }
error = wacom_setup_touch_input_capabilities(touch_input_dev, wacom_wac); @@ -2111,10 +2107,6 @@ static int wacom_register_inputs(struct input_free_device(touch_input_dev); wacom_wac->touch_input = NULL; touch_input_dev = NULL; - } else { - error = input_register_device(touch_input_dev); - if (error) - goto fail; }
error = wacom_setup_pad_input_capabilities(pad_input_dev, wacom_wac); @@ -2123,7 +2115,34 @@ static int wacom_register_inputs(struct input_free_device(pad_input_dev); wacom_wac->pad_input = NULL; pad_input_dev = NULL; - } else { + } + + return 0; +} + +static int wacom_register_inputs(struct wacom *wacom) +{ + struct input_dev *pen_input_dev, *touch_input_dev, *pad_input_dev; + struct wacom_wac *wacom_wac = &(wacom->wacom_wac); + int error = 0; + + pen_input_dev = wacom_wac->pen_input; + touch_input_dev = wacom_wac->touch_input; + pad_input_dev = wacom_wac->pad_input; + + if (pen_input_dev) { + error = input_register_device(pen_input_dev); + if (error) + goto fail; + } + + if (touch_input_dev) { + error = input_register_device(touch_input_dev); + if (error) + goto fail; + } + + if (pad_input_dev) { error = input_register_device(pad_input_dev); if (error) goto fail; @@ -2376,6 +2395,20 @@ static int wacom_parse_and_register(stru if (error) goto fail;
+ error = wacom_setup_inputs(wacom); + if (error) + goto fail; + + if (features->type == HID_GENERIC) + connect_mask |= HID_CONNECT_DRIVER; + + /* Regular HID work starts now */ + error = hid_hw_start(hdev, connect_mask); + if (error) { + hid_err(hdev, "hw start failed\n"); + goto fail; + } + error = wacom_register_inputs(wacom); if (error) goto fail; @@ -2390,16 +2423,6 @@ static int wacom_parse_and_register(stru goto fail; }
- if (features->type == HID_GENERIC) - connect_mask |= HID_CONNECT_DRIVER; - - /* Regular HID work starts now */ - error = hid_hw_start(hdev, connect_mask); - if (error) { - hid_err(hdev, "hw start failed\n"); - goto fail; - } - if (!wireless) { /* Note that if query fails it is not a hard failure */ wacom_query_tablet_data(wacom);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com
commit 621c6257128149e45b36ffb973a01c3f3461b893 upstream.
When als_capture_sample() is called with usage ID HID_USAGE_SENSOR_TIME_TIMESTAMP, return 0. The HID sensor core ignores the return value for capture_sample() callback, so return value doesn't make difference. But correct the return value to return success instead of -EINVAL.
Signed-off-by: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com Link: https://lore.kernel.org/r/20240204125617.2635574-1-srinivas.pandruvada@linux... Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/light/hid-sensor-als.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/iio/light/hid-sensor-als.c +++ b/drivers/iio/light/hid-sensor-als.c @@ -226,6 +226,7 @@ static int als_capture_sample(struct hid case HID_USAGE_SENSOR_TIME_TIMESTAMP: als_state->timestamp = hid_sensor_convert_timestamp(&als_state->common_attributes, *(s64 *)raw_data); + ret = 0; break; default: break;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian A. Ehrhardt lk@c--e.de
commit c9aed03a0a683fd1600ea92f2ad32232d4736272 upstream.
Calling ->sync_write must be done while holding the PPM lock as the mailbox logic does not support concurrent commands.
At least since the addition of partner task this means that ucsi_acknowledge_connector_change should be called with the PPM lock held as it calls ->sync_write.
Thus protect the only call to ucsi_acknowledge_connector_change with the PPM. All other calls to ->sync_write already happen under the PPM lock.
Fixes: b9aa02ca39a4 ("usb: typec: ucsi: Add polling mechanism for partner tasks like alt mode checking") Cc: stable@vger.kernel.org Signed-off-by: "Christian A. Ehrhardt" lk@c--e.de Reviewed-by: Heikki Krogerus heikki.krogerus@linux.intel.com Link: https://lore.kernel.org/r/20240121204123.275441-2-lk@c--e.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/typec/ucsi/ucsi.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/usb/typec/ucsi/ucsi.c +++ b/drivers/usb/typec/ucsi/ucsi.c @@ -935,7 +935,9 @@ static void ucsi_handle_connector_change
clear_bit(EVENT_PENDING, &con->ucsi->flags);
+ mutex_lock(&ucsi->ppm_lock); ret = ucsi_acknowledge_connector_change(ucsi); + mutex_unlock(&ucsi->ppm_lock); if (ret) dev_err(ucsi->dev, "%s: ACK failed (%d)", __func__, ret);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Anderson sean.anderson@seco.com
commit 3caf2b2ad7334ef35f55b95f3e1b138c6f77b368 upstream.
The ULPI per-device debugfs root is named after the ulpi device's parent, but ulpi_unregister_interface tries to remove a debugfs directory named after the ulpi device itself. This results in the directory sticking around and preventing subsequent (deferred) probes from succeeding. Change the directory name to match the ulpi device.
Fixes: bd0a0a024f2a ("usb: ulpi: Add debugfs support") Cc: stable@vger.kernel.org Signed-off-by: Sean Anderson sean.anderson@seco.com Link: https://lore.kernel.org/r/20240126223800.2864613-1-sean.anderson@seco.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/common/ulpi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/usb/common/ulpi.c +++ b/drivers/usb/common/ulpi.c @@ -301,7 +301,7 @@ static int ulpi_register(struct device * return ret; }
- root = debugfs_create_dir(dev_name(dev), ulpi_root); + root = debugfs_create_dir(dev_name(&ulpi->dev), ulpi_root); debugfs_create_file("regs", 0444, root, ulpi, &ulpi_regs_fops);
dev_dbg(&ulpi->dev, "registered ULPI PHY: vendor %04x, product %04x\n",
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian A. Ehrhardt lk@c--e.de
commit 2840143e393a4ddc1caab4372969ea337371168c upstream.
In case of a spurious or otherwise delayed notification it is possible that CCI still reports the previous completion. The UCSI spec is aware of this and provides two completion bits in CCI, one for normal commands and one for acks. As acks and commands alternate the notification handler can determine if the completion bit is from the current command.
The initial UCSI code correctly handled this but the distinction between the two completion bits was lost with the introduction of the new API.
To fix this revive the ACK_PENDING bit for ucsi_acpi and only complete commands if the completion bit matches.
Fixes: f56de278e8ec ("usb: typec: ucsi: acpi: Move to the new API") Cc: stable@vger.kernel.org Signed-off-by: "Christian A. Ehrhardt" lk@c--e.de Acked-by: Heikki Krogerus heikki.krogerus@linux.intel.com Link: https://lore.kernel.org/r/20240121204123.275441-3-lk@c--e.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/typec/ucsi/ucsi_acpi.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-)
--- a/drivers/usb/typec/ucsi/ucsi_acpi.c +++ b/drivers/usb/typec/ucsi/ucsi_acpi.c @@ -73,9 +73,13 @@ static int ucsi_acpi_sync_write(struct u const void *val, size_t val_len) { struct ucsi_acpi *ua = ucsi_get_drvdata(ucsi); + bool ack = UCSI_COMMAND(*(u64 *)val) == UCSI_ACK_CC_CI; int ret;
- set_bit(COMMAND_PENDING, &ua->flags); + if (ack) + set_bit(ACK_PENDING, &ua->flags); + else + set_bit(COMMAND_PENDING, &ua->flags);
ret = ucsi_acpi_async_write(ucsi, offset, val, val_len); if (ret) @@ -85,7 +89,10 @@ static int ucsi_acpi_sync_write(struct u ret = -ETIMEDOUT;
out_clear_bit: - clear_bit(COMMAND_PENDING, &ua->flags); + if (ack) + clear_bit(ACK_PENDING, &ua->flags); + else + clear_bit(COMMAND_PENDING, &ua->flags);
return ret; } @@ -142,8 +149,10 @@ static void ucsi_acpi_notify(acpi_handle if (UCSI_CCI_CONNECTOR(cci)) ucsi_connector_change(ua->ucsi, UCSI_CCI_CONNECTOR(cci));
- if (test_bit(COMMAND_PENDING, &ua->flags) && - cci & (UCSI_CCI_ACK_COMPLETE | UCSI_CCI_COMMAND_COMPLETE)) + if (cci & UCSI_CCI_ACK_COMPLETE && test_bit(ACK_PENDING, &ua->flags)) + complete(&ua->complete); + if (cci & UCSI_CCI_COMMAND_COMPLETE && + test_bit(COMMAND_PENDING, &ua->flags)) complete(&ua->complete); }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oliver Neukum oneukum@suse.com
commit f17c34ffc792bbb520e4b61baa16b6cfc7d44b13 upstream.
The OTG 1.3 spec has the feature A_ALT_HNP_SUPPORT, which tells a device that it is connected to the wrong port. Some devices refuse to operate if you enable that feature, because it indicates to them that they ought to request to be connected to another port.
According to the spec this feature may be used based only the following three conditions:
6.5.3 a_alt_hnp_support Setting this feature indicates to the B-device that it is connected to an A-device port that is not capable of HNP, but that the A-device does have an alternate port that is capable of HNP. The A-device is required to set this feature under the following conditions: • the A-device has multiple receptacles • the A-device port that connects to the B-device does not support HNP • the A-device has another port that does support HNP
A check for the third and first condition is missing. Add it.
Signed-off-by: Oliver Neukum oneukum@suse.com Cc: stable stable@kernel.org Fixes: 7d2d641c44269 ("usb: otg: don't set a_alt_hnp_support feature for OTG 2.0 device") Link: https://lore.kernel.org/r/20240122153545.12284-1-oneukum@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/core/hub.c | 30 +++++++++++++++++++----------- 1 file changed, 19 insertions(+), 11 deletions(-)
--- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -2382,17 +2382,25 @@ static int usb_enumerate_device_otg(stru } } else if (desc->bLength == sizeof (struct usb_otg_descriptor)) { - /* Set a_alt_hnp_support for legacy otg device */ - err = usb_control_msg(udev, - usb_sndctrlpipe(udev, 0), - USB_REQ_SET_FEATURE, 0, - USB_DEVICE_A_ALT_HNP_SUPPORT, - 0, NULL, 0, - USB_CTRL_SET_TIMEOUT); - if (err < 0) - dev_err(&udev->dev, - "set a_alt_hnp_support failed: %d\n", - err); + /* + * We are operating on a legacy OTP device + * These should be told that they are operating + * on the wrong port if we have another port that does + * support HNP + */ + if (bus->otg_port != 0) { + /* Set a_alt_hnp_support for legacy otg device */ + err = usb_control_msg(udev, + usb_sndctrlpipe(udev, 0), + USB_REQ_SET_FEATURE, 0, + USB_DEVICE_A_ALT_HNP_SUPPORT, + 0, NULL, 0, + USB_CTRL_SET_TIMEOUT); + if (err < 0) + dev_err(&udev->dev, + "set a_alt_hnp_support failed: %d\n", + err); + } } } #endif
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: yuan linyu yuanlinyu@hihonor.com
commit b2d2d7ea0dd09802cf5a0545bf54d8ad8987d20c upstream.
When write UDC to empty and unbind gadget driver from gadget device, it is possible that there are many queue failures for mass storage function.
The root cause is mass storage main thread alaways try to queue request to receive a command from host if running flag is on, on platform like dwc3, if pull down called, it will not queue request again and return -ESHUTDOWN, but it not affect running flag of mass storage function.
Check return code from mass storage function and clear running flag if it is -ESHUTDOWN, also indicate start in/out transfer failure to break loops.
Cc: stable stable@kernel.org Signed-off-by: yuan linyu yuanlinyu@hihonor.com Reviewed-by: Alan Stern stern@rowland.harvard.edu Link: https://lore.kernel.org/r/20240123034829.3848409-1-yuanlinyu@hihonor.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/gadget/function/f_mass_storage.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-)
--- a/drivers/usb/gadget/function/f_mass_storage.c +++ b/drivers/usb/gadget/function/f_mass_storage.c @@ -545,21 +545,37 @@ static int start_transfer(struct fsg_dev
static bool start_in_transfer(struct fsg_common *common, struct fsg_buffhd *bh) { + int rc; + if (!fsg_is_set(common)) return false; bh->state = BUF_STATE_SENDING; - if (start_transfer(common->fsg, common->fsg->bulk_in, bh->inreq)) + rc = start_transfer(common->fsg, common->fsg->bulk_in, bh->inreq); + if (rc) { bh->state = BUF_STATE_EMPTY; + if (rc == -ESHUTDOWN) { + common->running = 0; + return false; + } + } return true; }
static bool start_out_transfer(struct fsg_common *common, struct fsg_buffhd *bh) { + int rc; + if (!fsg_is_set(common)) return false; bh->state = BUF_STATE_RECEIVING; - if (start_transfer(common->fsg, common->fsg->bulk_out, bh->outreq)) + rc = start_transfer(common->fsg, common->fsg->bulk_out, bh->outreq); + if (rc) { bh->state = BUF_STATE_FULL; + if (rc == -ESHUTDOWN) { + common->running = 0; + return false; + } + } return true; }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xu Yang xu.yang_2@nxp.com
commit cc509b6a47e7c8998d9e41c273191299d5d9d631 upstream.
When power is recycled in usb controller during system power management, the controller will recognize it and switch role if role has been changed during power lost. In current design, it will be completed in resume() function. However, this may bring issues since usb class devices have their pm operations too and these device's resume() functions are still not being called at this point. When usb controller recognized host role should be stopped, these usb class devices will be removed at this point. But these usb class devices can't be removed in some cases, such as scsi devices. Since scsi driver may sync data to U-disk, however it will block there because scsi drvier can only handle pm request when is in suspended state. Therefore, there may exist a dependency between ci_resume() and usb class device's resume(). To break this potential dependency, we need to handle power lost work in a workqueue.
Fixes: 74494b33211d ("usb: chipidea: core: add controller resume support when controller is powered off") cc: stable@vger.kernel.org Signed-off-by: Xu Yang xu.yang_2@nxp.com Link: https://lore.kernel.org/r/20240119123537.3614838-1-xu.yang_2@nxp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/chipidea/ci.h | 2 ++ drivers/usb/chipidea/core.c | 44 ++++++++++++++++++++++++-------------------- 2 files changed, 26 insertions(+), 20 deletions(-)
--- a/drivers/usb/chipidea/ci.h +++ b/drivers/usb/chipidea/ci.h @@ -176,6 +176,7 @@ struct hw_bank { * @enabled_otg_timer_bits: bits of enabled otg timers * @next_otg_timer: next nearest enabled timer to be expired * @work: work for role changing + * @power_lost_work: work for power lost handling * @wq: workqueue thread * @qh_pool: allocation pool for queue heads * @td_pool: allocation pool for transfer descriptors @@ -226,6 +227,7 @@ struct ci_hdrc { enum otg_fsm_timer next_otg_timer; struct usb_role_switch *role_switch; struct work_struct work; + struct work_struct power_lost_work; struct workqueue_struct *wq;
struct dma_pool *qh_pool; --- a/drivers/usb/chipidea/core.c +++ b/drivers/usb/chipidea/core.c @@ -856,6 +856,27 @@ static int ci_extcon_register(struct ci_ return 0; }
+static void ci_power_lost_work(struct work_struct *work) +{ + struct ci_hdrc *ci = container_of(work, struct ci_hdrc, power_lost_work); + enum ci_role role; + + disable_irq_nosync(ci->irq); + pm_runtime_get_sync(ci->dev); + if (!ci_otg_is_fsm_mode(ci)) { + role = ci_get_role(ci); + + if (ci->role != role) { + ci_handle_id_switch(ci); + } else if (role == CI_ROLE_GADGET) { + if (ci->is_otg && hw_read_otgsc(ci, OTGSC_BSV)) + usb_gadget_vbus_connect(&ci->gadget); + } + } + pm_runtime_put_sync(ci->dev); + enable_irq(ci->irq); +} + static DEFINE_IDA(ci_ida);
struct platform_device *ci_hdrc_add_device(struct device *dev, @@ -1045,6 +1066,8 @@ static int ci_hdrc_probe(struct platform
spin_lock_init(&ci->lock); mutex_init(&ci->mutex); + INIT_WORK(&ci->power_lost_work, ci_power_lost_work); + ci->dev = dev; ci->platdata = dev_get_platdata(dev); ci->imx28_write_fix = !!(ci->platdata->flags & @@ -1396,25 +1419,6 @@ static int ci_suspend(struct device *dev return 0; }
-static void ci_handle_power_lost(struct ci_hdrc *ci) -{ - enum ci_role role; - - disable_irq_nosync(ci->irq); - if (!ci_otg_is_fsm_mode(ci)) { - role = ci_get_role(ci); - - if (ci->role != role) { - ci_handle_id_switch(ci); - } else if (role == CI_ROLE_GADGET) { - if (ci->is_otg && hw_read_otgsc(ci, OTGSC_BSV)) - usb_gadget_vbus_connect(&ci->gadget); - } - } - - enable_irq(ci->irq); -} - static int ci_resume(struct device *dev) { struct ci_hdrc *ci = dev_get_drvdata(dev); @@ -1446,7 +1450,7 @@ static int ci_resume(struct device *dev) ci_role(ci)->resume(ci, power_lost);
if (power_lost) - ci_handle_power_lost(ci); + queue_work(system_freezable_wq, &ci->power_lost_work);
if (ci->supports_runtime_pm) { pm_runtime_disable(dev);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Udipto Goswami quic_ugoswami@quicinc.com
commit 12783c0b9e2c7915a50d5ec829630ff2da50472c upstream.
Currently, the function update_port_device_state gets the usb_hub from udev->parent by calling usb_hub_to_struct_hub. However, in case the actconfig or the maxchild is 0, the usb_hub would be NULL and upon further accessing to get port_dev would result in null pointer dereference.
Fix this by introducing an if check after the usb_hub is populated.
Fixes: 83cb2604f641 ("usb: core: add sysfs entry for usb device state") Cc: stable@vger.kernel.org Signed-off-by: Udipto Goswami quic_ugoswami@quicinc.com Reviewed-by: Alan Stern stern@rowland.harvard.edu Link: https://lore.kernel.org/r/20240110095814.7626-1-quic_ugoswami@quicinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/core/hub.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-)
--- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -2047,9 +2047,19 @@ static void update_port_device_state(str
if (udev->parent) { hub = usb_hub_to_struct_hub(udev->parent); - port_dev = hub->ports[udev->portnum - 1]; - WRITE_ONCE(port_dev->state, udev->state); - sysfs_notify_dirent(port_dev->state_kn); + + /* + * The Link Layer Validation System Driver (lvstest) + * has a test step to unbind the hub before running the + * rest of the procedure. This triggers hub_disconnect + * which will set the hub's maxchild to 0, further + * resulting in usb_hub_to_struct_hub returning NULL. + */ + if (hub) { + port_dev = hub->ports[udev->portnum - 1]; + WRITE_ONCE(port_dev->state, udev->state); + sysfs_notify_dirent(port_dev->state_kn); + } } }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Uttkarsh Aggarwal quic_uaggarwa@quicinc.com
commit 61a348857e869432e6a920ad8ea9132e8d44c316 upstream.
In current scenario if Plug-out and Plug-In performed continuously there could be a chance while checking for dwc->gadget_driver in dwc3_gadget_suspend, a NULL pointer dereference may occur.
Call Stack:
CPU1: CPU2: gadget_unbind_driver dwc3_suspend_common dwc3_gadget_stop dwc3_gadget_suspend dwc3_disconnect_gadget
CPU1 basically clears the variable and CPU2 checks the variable. Consider CPU1 is running and right before gadget_driver is cleared and in parallel CPU2 executes dwc3_gadget_suspend where it finds dwc->gadget_driver which is not NULL and resumes execution and then CPU1 completes execution. CPU2 executes dwc3_disconnect_gadget where it checks dwc->gadget_driver is already NULL because of which the NULL pointer deference occur.
Cc: stable@vger.kernel.org Fixes: 9772b47a4c29 ("usb: dwc3: gadget: Fix suspend/resume during device mode") Acked-by: Thinh Nguyen Thinh.Nguyen@synopsys.com Signed-off-by: Uttkarsh Aggarwal quic_uaggarwa@quicinc.com Link: https://lore.kernel.org/r/20240119094825.26530-1-quic_uaggarwa@quicinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/dwc3/gadget.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
--- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -4703,15 +4703,13 @@ int dwc3_gadget_suspend(struct dwc3 *dwc unsigned long flags; int ret;
- if (!dwc->gadget_driver) - return 0; - ret = dwc3_gadget_soft_disconnect(dwc); if (ret) goto err;
spin_lock_irqsave(&dwc->lock, flags); - dwc3_disconnect_gadget(dwc); + if (dwc->gadget_driver) + dwc3_disconnect_gadget(dwc); spin_unlock_irqrestore(&dwc->lock, flags);
return 0;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Konrad Dybcio konrad.dybcio@linaro.org
[ Upstream commit 85e985a4f46e462a37f1875cb74ed380e7c0c2e0 ]
The CO0 BCM needs to be up at all times, otherwise some hardware (like the UFS controller) loses its connection to the rest of the SoC, resulting in a hang of the platform, accompanied by a spectacular logspam.
Mark it as keepalive to prevent such cases.
Fixes: 9c8c6bac1ae8 ("interconnect: qcom: Add SC8180x providers") Signed-off-by: Konrad Dybcio konrad.dybcio@linaro.org Link: https://lore.kernel.org/r/20231214-topic-sc8180_fixes-v1-1-421904863006@lina... Signed-off-by: Georgi Djakov djakov@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/interconnect/qcom/sc8180x.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/interconnect/qcom/sc8180x.c b/drivers/interconnect/qcom/sc8180x.c index 20331e119beb..03d626776ba1 100644 --- a/drivers/interconnect/qcom/sc8180x.c +++ b/drivers/interconnect/qcom/sc8180x.c @@ -1372,6 +1372,7 @@ static struct qcom_icc_bcm bcm_mm0 = {
static struct qcom_icc_bcm bcm_co0 = { .name = "CO0", + .keepalive = true, .num_nodes = 1, .nodes = { &slv_qns_cdsp_mem_noc } };
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Konrad Dybcio konrad.dybcio@linaro.org
[ Upstream commit 24406f6794aa631516241deb9e19de333d6a0600 ]
To ensure the interconnect votes are actually meaningful and in order to prevent holding all buses at FMAX, introduce the sync state callback.
Fixes: e6f0d6a30f73 ("interconnect: qcom: Add SM8550 interconnect provider driver") Signed-off-by: Konrad Dybcio konrad.dybcio@linaro.org Reviewed-by: Bjorn Andersson andersson@kernel.org Link: https://lore.kernel.org/r/20231218-topic-8550_fixes-v1-2-ce1272d77540@linaro... Signed-off-by: Georgi Djakov djakov@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/interconnect/qcom/sm8550.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/interconnect/qcom/sm8550.c b/drivers/interconnect/qcom/sm8550.c index 629faa4c9aae..fc22cecf650f 100644 --- a/drivers/interconnect/qcom/sm8550.c +++ b/drivers/interconnect/qcom/sm8550.c @@ -2223,6 +2223,7 @@ static struct platform_driver qnoc_driver = { .driver = { .name = "qnoc-sm8550", .of_match_table = qnoc_of_match, + .sync_state = icc_sync_state, }, };
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhipeng Lu alexious@zju.edu.cn
[ Upstream commit dc9ceb90c4b42c6e5c6757df1d6257110433788e ]
When irtoy_command fails, buf should be freed since it is allocated by irtoy_tx, or there is a memleak.
Fixes: 4114978dcd24 ("media: ir_toy: prevent device from hanging during transmit") Signed-off-by: Zhipeng Lu alexious@zju.edu.cn Signed-off-by: Sean Young sean@mess.org Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/rc/ir_toy.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/media/rc/ir_toy.c b/drivers/media/rc/ir_toy.c index 196806709259..69e630d85262 100644 --- a/drivers/media/rc/ir_toy.c +++ b/drivers/media/rc/ir_toy.c @@ -332,6 +332,7 @@ static int irtoy_tx(struct rc_dev *rc, uint *txbuf, uint count) sizeof(COMMAND_SMODE_EXIT), STATE_COMMAND_NO_RESP); if (err) { dev_err(irtoy->dev, "exit sample mode: %d\n", err); + kfree(buf); return err; }
@@ -339,6 +340,7 @@ static int irtoy_tx(struct rc_dev *rc, uint *txbuf, uint count) sizeof(COMMAND_SMODE_ENTER), STATE_COMMAND); if (err) { dev_err(irtoy->dev, "enter sample mode: %d\n", err); + kfree(buf); return err; }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Saravana Kannan saravanak@google.com
[ Upstream commit 6442d79d880cf7a2fff18779265d657fef0cce4c ]
fw_devlink can detect most overlapping/intersecting cycles. However it was missing a few corner cases because of an incorrect optimization logic that tries to avoid repeating cycle detection for devices that are already marked as part of a cycle.
Here's an example provided by Xu Yang (edited for clarity):
usb +-----+ tcpc | | +-----+ | +--| | |----------->|EP| |--+ | | +--| |EP|<-----------| | |--+ | | B | | | +-----+ | A | | +-----+ | ^ +-----+ | | | | | +-----| C |<--+ | | +-----+ usb-phy
Node A (tcpc) will be populated as device 1-0050. Node B (usb) will be populated as device 38100000.usb. Node C (usb-phy) will be populated as device 381f0040.usb-phy.
The description below uses the notation: consumer --> supplier child ==> parent
1. Node C is populated as device C. No cycles detected because cycle detection is only run when a fwnode link is converted to a device link.
2. Node B is populated as device B. As we convert B --> C into a device link we run cycle detection and find and mark the device link/fwnode link cycle: C--> A --> B.EP ==> B --> C
3. Node A is populated as device A. As we convert C --> A into a device link, we see it's already part of a cycle (from step 2) and don't run cycle detection. Thus we miss detecting the cycle: A --> B.EP ==> B --> A.EP ==> A
Looking at it another way, A depends on B in one way: A --> B.EP ==> B
But B depends on A in two ways and we only detect the first: B --> C --> A B --> A.EP ==> A
To detect both of these, we remove the incorrect optimization attempt in step 3 and run cycle detection even if the fwnode link from which the device link is being created has already been marked as part of a cycle.
Reported-by: Xu Yang xu.yang_2@nxp.com Closes: https://lore.kernel.org/lkml/DU2PR04MB8822693748725F85DC0CB86C8C792@DU2PR04M... Fixes: 3fb16866b51d ("driver core: fw_devlink: Make cycle detection more robust") Signed-off-by: Saravana Kannan saravanak@google.com Tested-by: Xu Yang xu.yang_2@nxp.com Link: https://lore.kernel.org/r/20240202095636.868578-3-saravanak@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/base/core.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/base/core.c b/drivers/base/core.c index c18c7d4c6971..f9fb69249110 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -2060,9 +2060,14 @@ static int fw_devlink_create_devlink(struct device *con,
/* * SYNC_STATE_ONLY device links don't block probing and supports cycles. - * So cycle detection isn't necessary and shouldn't be done. + * So, one might expect that cycle detection isn't necessary for them. + * However, if the device link was marked as SYNC_STATE_ONLY because + * it's part of a cycle, then we still need to do cycle detection. This + * is because the consumer and supplier might be part of multiple cycles + * and we need to detect all those cycles. */ - if (!(flags & DL_FLAG_SYNC_STATE_ONLY)) { + if (!device_link_flag_is_sync_state_only(flags) || + flags & DL_FLAG_CYCLE) { device_links_write_lock(); if (__fw_devlink_relax_cycles(con, sup_handle)) { __fwnode_link_cycle(link);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gaurav Batra gbatra@linux.ibm.com
[ Upstream commit ed8b94f6e0acd652ce69bd69d678a0c769172df8 ]
When a PCI device is dynamically added, the kernel oopses with a NULL pointer dereference:
BUG: Kernel NULL pointer dereference on read at 0x00000030 Faulting instruction address: 0xc0000000006bbe5c Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: rpadlpar_io rpaphp rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs xsk_diag bonding nft_compat nf_tables nfnetlink rfkill binfmt_misc dm_multipath rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_umad ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_cm iw_cm ib_cm mlx5_ib ib_uverbs ib_core pseries_rng drm drm_panel_orientation_quirks xfs libcrc32c mlx5_core mlxfw sd_mod t10_pi sg tls ibmvscsi ibmveth scsi_transport_srp vmx_crypto pseries_wdt psample dm_mirror dm_region_hash dm_log dm_mod fuse CPU: 17 PID: 2685 Comm: drmgr Not tainted 6.7.0-203405+ #66 Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_008) hv:phyp pSeries NIP: c0000000006bbe5c LR: c000000000a13e68 CTR: c0000000000579f8 REGS: c00000009924f240 TRAP: 0300 Not tainted (6.7.0-203405+) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24002220 XER: 20040006 CFAR: c000000000a13e64 DAR: 0000000000000030 DSISR: 40000000 IRQMASK: 0 ... NIP sysfs_add_link_to_group+0x34/0x94 LR iommu_device_link+0x5c/0x118 Call Trace: iommu_init_device+0x26c/0x318 (unreliable) iommu_device_link+0x5c/0x118 iommu_init_device+0xa8/0x318 iommu_probe_device+0xc0/0x134 iommu_bus_notifier+0x44/0x104 notifier_call_chain+0xb8/0x19c blocking_notifier_call_chain+0x64/0x98 bus_notify+0x50/0x7c device_add+0x640/0x918 pci_device_add+0x23c/0x298 of_create_pci_dev+0x400/0x884 of_scan_pci_dev+0x124/0x1b0 __of_scan_bus+0x78/0x18c pcibios_scan_phb+0x2a4/0x3b0 init_phb_dynamic+0xb8/0x110 dlpar_add_slot+0x170/0x3b8 [rpadlpar_io] add_slot_store.part.0+0xb4/0x130 [rpadlpar_io] kobj_attr_store+0x2c/0x48 sysfs_kf_write+0x64/0x78 kernfs_fop_write_iter+0x1b0/0x290 vfs_write+0x350/0x4a0 ksys_write+0x84/0x140 system_call_exception+0x124/0x330 system_call_vectored_common+0x15c/0x2ec
Commit a940904443e4 ("powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains") broke DLPAR add of PCI devices.
The above added iommu_device structure to pci_controller. During system boot, PCI devices are discovered and this newly added iommu_device structure is initialized by a call to iommu_device_register().
During DLPAR add of a PCI device, a new pci_controller structure is allocated but there are no calls made to iommu_device_register() interface.
Fix is to register the iommu device during DLPAR add as well.
Fixes: a940904443e4 ("powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains") Signed-off-by: Gaurav Batra gbatra@linux.ibm.com [mpe: Trim oops and tweak some change log wording] Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://msgid.link/20240122222407.39603-1-gbatra@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/include/asm/ppc-pci.h | 3 +++ arch/powerpc/kernel/iommu.c | 21 ++++++++++++++++----- arch/powerpc/platforms/pseries/pci_dlpar.c | 4 ++++ 3 files changed, 23 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h index d9fcff575027..e500a7b9d1b5 100644 --- a/arch/powerpc/include/asm/ppc-pci.h +++ b/arch/powerpc/include/asm/ppc-pci.h @@ -29,6 +29,9 @@ void *pci_traverse_device_nodes(struct device_node *start, void *(*fn)(struct device_node *, void *), void *data); extern void pci_devs_phb_init_dynamic(struct pci_controller *phb); +extern void ppc_iommu_register_device(struct pci_controller *phb); +extern void ppc_iommu_unregister_device(struct pci_controller *phb); +
/* From rtas_pci.h */ extern void init_pci_config_tokens (void); diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c index ebe259bdd462..c6f62e130d55 100644 --- a/arch/powerpc/kernel/iommu.c +++ b/arch/powerpc/kernel/iommu.c @@ -1388,6 +1388,21 @@ static const struct attribute_group *spapr_tce_iommu_groups[] = { NULL, };
+void ppc_iommu_register_device(struct pci_controller *phb) +{ + iommu_device_sysfs_add(&phb->iommu, phb->parent, + spapr_tce_iommu_groups, "iommu-phb%04x", + phb->global_number); + iommu_device_register(&phb->iommu, &spapr_tce_iommu_ops, + phb->parent); +} + +void ppc_iommu_unregister_device(struct pci_controller *phb) +{ + iommu_device_unregister(&phb->iommu); + iommu_device_sysfs_remove(&phb->iommu); +} + /* * This registers IOMMU devices of PHBs. This needs to happen * after core_initcall(iommu_init) + postcore_initcall(pci_driver_init) and @@ -1398,11 +1413,7 @@ static int __init spapr_tce_setup_phb_iommus_initcall(void) struct pci_controller *hose;
list_for_each_entry(hose, &hose_list, list_node) { - iommu_device_sysfs_add(&hose->iommu, hose->parent, - spapr_tce_iommu_groups, "iommu-phb%04x", - hose->global_number); - iommu_device_register(&hose->iommu, &spapr_tce_iommu_ops, - hose->parent); + ppc_iommu_register_device(hose); } return 0; } diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c b/arch/powerpc/platforms/pseries/pci_dlpar.c index 4ba824568119..4448386268d9 100644 --- a/arch/powerpc/platforms/pseries/pci_dlpar.c +++ b/arch/powerpc/platforms/pseries/pci_dlpar.c @@ -35,6 +35,8 @@ struct pci_controller *init_phb_dynamic(struct device_node *dn)
pseries_msi_allocate_domains(phb);
+ ppc_iommu_register_device(phb); + /* Create EEH devices for the PHB */ eeh_phb_pe_create(phb);
@@ -76,6 +78,8 @@ int remove_phb_dynamic(struct pci_controller *phb) } }
+ ppc_iommu_unregister_device(phb); + pseries_msi_free_domains(phb);
/* Keep a reference so phb isn't freed yet */
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthias Schiffer matthias.schiffer@ew.tq-group.com
[ Upstream commit a038a3ff8c6582404834852c043dadc73a5b68b4 ]
MMU_FTR_USE_HIGH_BATS is set for G2_LE cores and derivatives like e300cX, but the high BATs need to be enabled in HID2 to work. Add register definitions and add the needed setup to __setup_cpu_603.
This fixes boot on CPUs like the MPC5200B with STRICT_KERNEL_RWX enabled on systems where the flag has not been set by the bootloader already.
Fixes: e4d6654ebe6e ("powerpc/mm/32s: rework mmu_mapin_ram()") Signed-off-by: Matthias Schiffer matthias.schiffer@ew.tq-group.com Reviewed-by: Christophe Leroy christophe.leroy@csgroup.eu Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://msgid.link/20240124103838.43675-1-matthias.schiffer@ew.tq-group.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/include/asm/reg.h | 2 ++ arch/powerpc/kernel/cpu_setup_6xx.S | 20 +++++++++++++++++++- 2 files changed, 21 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index 4ae4ab9090a2..ade5f094dbd2 100644 --- a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -617,6 +617,8 @@ #endif #define SPRN_HID2 0x3F8 /* Hardware Implementation Register 2 */ #define SPRN_HID2_GEKKO 0x398 /* Gekko HID2 Register */ +#define SPRN_HID2_G2_LE 0x3F3 /* G2_LE HID2 Register */ +#define HID2_G2_LE_HBE (1<<18) /* High BAT Enable (G2_LE) */ #define SPRN_IABR 0x3F2 /* Instruction Address Breakpoint Register */ #define SPRN_IABR2 0x3FA /* 83xx */ #define SPRN_IBCR 0x135 /* 83xx Insn Breakpoint Control Reg */ diff --git a/arch/powerpc/kernel/cpu_setup_6xx.S b/arch/powerpc/kernel/cpu_setup_6xx.S index f29ce3dd6140..bfd3f442e5eb 100644 --- a/arch/powerpc/kernel/cpu_setup_6xx.S +++ b/arch/powerpc/kernel/cpu_setup_6xx.S @@ -26,6 +26,15 @@ BEGIN_FTR_SECTION bl __init_fpu_registers END_FTR_SECTION_IFCLR(CPU_FTR_FPU_UNAVAILABLE) bl setup_common_caches + + /* + * This assumes that all cores using __setup_cpu_603 with + * MMU_FTR_USE_HIGH_BATS are G2_LE compatible + */ +BEGIN_MMU_FTR_SECTION + bl setup_g2_le_hid2 +END_MMU_FTR_SECTION_IFSET(MMU_FTR_USE_HIGH_BATS) + mtlr r5 blr _GLOBAL(__setup_cpu_604) @@ -115,6 +124,16 @@ SYM_FUNC_START_LOCAL(setup_604_hid0) blr SYM_FUNC_END(setup_604_hid0)
+/* Enable high BATs for G2_LE and derivatives like e300cX */ +SYM_FUNC_START_LOCAL(setup_g2_le_hid2) + mfspr r11,SPRN_HID2_G2_LE + oris r11,r11,HID2_G2_LE_HBE@h + mtspr SPRN_HID2_G2_LE,r11 + sync + isync + blr +SYM_FUNC_END(setup_g2_le_hid2) + /* 7400 <= rev 2.7 and 7410 rev = 1.0 suffer from some * erratas we work around here. * Moto MPC710CE.pdf describes them, those are errata @@ -495,4 +514,3 @@ _GLOBAL(__restore_cpu_setup) mtcr r7 blr _ASM_NOKPROBE_SYMBOL(__restore_cpu_setup) -
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiangfeng Xiao xiaojiangfeng@huawei.com
[ Upstream commit 4a7aee96200ad281a5cc4cf5c7a2e2a49d2b97b0 ]
In kasan_init_region, when k_start is not page aligned, at the begin of for loop, k_cur = k_start & PAGE_MASK is less than k_start, and then `va = block + k_cur - k_start` is less than block, the addr va is invalid, because the memory address space from va to block is not alloced by memblock_alloc, which will not be reserved by memblock_reserve later, it will be used by other places.
As a result, memory overwriting occurs.
for example: int __init __weak kasan_init_region(void *start, size_t size) { [...] /* if say block(dcd97000) k_start(feef7400) k_end(feeff3fe) */ block = memblock_alloc(k_end - k_start, PAGE_SIZE); [...] for (k_cur = k_start & PAGE_MASK; k_cur < k_end; k_cur += PAGE_SIZE) { /* at the begin of for loop * block(dcd97000) va(dcd96c00) k_cur(feef7000) k_start(feef7400) * va(dcd96c00) is less than block(dcd97000), va is invalid */ void *va = block + k_cur - k_start; [...] } [...] }
Therefore, page alignment is performed on k_start before memblock_alloc() to ensure the validity of the VA address.
Fixes: 663c0c9496a6 ("powerpc/kasan: Fix shadow area set up for modules.") Signed-off-by: Jiangfeng Xiao xiaojiangfeng@huawei.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://msgid.link/1705974359-43790-1-git-send-email-xiaojiangfeng@huawei.co... Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/mm/kasan/init_32.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/powerpc/mm/kasan/init_32.c b/arch/powerpc/mm/kasan/init_32.c index a70828a6d935..aa9aa11927b2 100644 --- a/arch/powerpc/mm/kasan/init_32.c +++ b/arch/powerpc/mm/kasan/init_32.c @@ -64,6 +64,7 @@ int __init __weak kasan_init_region(void *start, size_t size) if (ret) return ret;
+ k_start = k_start & PAGE_MASK; block = memblock_alloc(k_end - k_start, PAGE_SIZE); if (!block) return -ENOMEM;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
[ Upstream commit 3ca8fbabcceb8bfe44f7f50640092fd8f1de375c ]
This reverts commit 1b28cb81dab7c1eedc6034206f4e8d644046ad31.
It is reported to cause problems, so revert it for now until the root cause can be found.
Reported-by: kernel test robot oliver.sang@intel.com Fixes: 1b28cb81dab7 ("kobject: Remove redundant checks for whether ktype is NULL") Cc: Zhen Lei thunder.leizhen@huawei.com Closes: https://lore.kernel.org/oe-lkp/202402071403.e302e33a-oliver.sang@intel.com Link: https://lore.kernel.org/r/2024020849-consensus-length-6264@gregkh Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- lib/kobject.c | 24 ++++++++++++++---------- 1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/lib/kobject.c b/lib/kobject.c index 59dbcbdb1c91..72fa20f405f1 100644 --- a/lib/kobject.c +++ b/lib/kobject.c @@ -74,10 +74,12 @@ static int create_dir(struct kobject *kobj) if (error) return error;
- error = sysfs_create_groups(kobj, ktype->default_groups); - if (error) { - sysfs_remove_dir(kobj); - return error; + if (ktype) { + error = sysfs_create_groups(kobj, ktype->default_groups); + if (error) { + sysfs_remove_dir(kobj); + return error; + } }
/* @@ -589,7 +591,8 @@ static void __kobject_del(struct kobject *kobj) sd = kobj->sd; ktype = get_ktype(kobj);
- sysfs_remove_groups(kobj, ktype->default_groups); + if (ktype) + sysfs_remove_groups(kobj, ktype->default_groups);
/* send "remove" if the caller did not do it but sent "add" */ if (kobj->state_add_uevent_sent && !kobj->state_remove_uevent_sent) { @@ -666,6 +669,10 @@ static void kobject_cleanup(struct kobject *kobj) pr_debug("'%s' (%p): %s, parent %p\n", kobject_name(kobj), kobj, __func__, kobj->parent);
+ if (t && !t->release) + pr_debug("'%s' (%p): does not have a release() function, it is broken and must be fixed. See Documentation/core-api/kobject.rst.\n", + kobject_name(kobj), kobj); + /* remove from sysfs if the caller did not do it */ if (kobj->state_in_sysfs) { pr_debug("'%s' (%p): auto cleanup kobject_del\n", @@ -676,13 +683,10 @@ static void kobject_cleanup(struct kobject *kobj) parent = NULL; }
- if (t->release) { + if (t && t->release) { pr_debug("'%s' (%p): calling ktype release\n", kobject_name(kobj), kobj); t->release(kobj); - } else { - pr_debug("'%s' (%p): does not have a release() function, it is broken and must be fixed. See Documentation/core-api/kobject.rst.\n", - kobject_name(kobj), kobj); }
/* free name if we allocated it */ @@ -1056,7 +1060,7 @@ const struct kobj_ns_type_operations *kobj_child_ns_ops(const struct kobject *pa { const struct kobj_ns_type_operations *ops = NULL;
- if (parent && parent->ktype->child_ns_type) + if (parent && parent->ktype && parent->ktype->child_ns_type) ops = parent->ktype->child_ns_type(parent);
return ops;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Williamson alex.williamson@redhat.com
[ Upstream commit 41044d5360685e78a869d40a168491a70cdb7e73 ]
The commit noted in fixes added a bogus requirement that runtime PM managed devices need to be in the RPM_ACTIVE state for PME polling. In fact, only devices in low power states should be polled.
However there's still a requirement that the device config space must be accessible, which has implications for both the current state of the polled device and the parent bridge, when present. It's not sufficient to assume the bridge remains in D0 and cases have been observed where the bridge passes the D0 test, but the PM state indicates RPM_SUSPENDING and config space of the polled device becomes inaccessible during pci_pme_wakeup().
Therefore, since the bridge is already effectively required to be in the RPM_ACTIVE state, formalize this in the code and elevate the PM usage count to maintain the state while polling the subordinate device.
This resolves a regression reported in the bugzilla below where a Thunderbolt/USB4 hierarchy fails to scan for an attached NVMe endpoint downstream of a bridge in a D3hot power state.
Link: https://lore.kernel.org/r/20240123185548.1040096-1-alex.williamson@redhat.co... Fixes: d3fcd7360338 ("PCI: Fix runtime PM race with PME polling") Reported-by: Sanath S sanath.s@amd.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218360 Signed-off-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Tested-by: Sanath S sanath.s@amd.com Reviewed-by: Rafael J. Wysocki rafael@kernel.org Cc: Lukas Wunner lukas@wunner.de Cc: Mika Westerberg mika.westerberg@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pci/pci.c | 37 ++++++++++++++++++++++--------------- 1 file changed, 22 insertions(+), 15 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index b2000b14faa0..e62de9730422 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -2459,29 +2459,36 @@ static void pci_pme_list_scan(struct work_struct *work) if (pdev->pme_poll) { struct pci_dev *bridge = pdev->bus->self; struct device *dev = &pdev->dev; - int pm_status; + struct device *bdev = bridge ? &bridge->dev : NULL; + int bref = 0;
/* - * If bridge is in low power state, the - * configuration space of subordinate devices - * may be not accessible + * If we have a bridge, it should be in an active/D0 + * state or the configuration space of subordinate + * devices may not be accessible or stable over the + * course of the call. */ - if (bridge && bridge->current_state != PCI_D0) - continue; + if (bdev) { + bref = pm_runtime_get_if_active(bdev, true); + if (!bref) + continue; + + if (bridge->current_state != PCI_D0) + goto put_bridge; + }
/* - * If the device is in a low power state it - * should not be polled either. + * The device itself should be suspended but config + * space must be accessible, therefore it cannot be in + * D3cold. */ - pm_status = pm_runtime_get_if_active(dev, true); - if (!pm_status) - continue; - - if (pdev->current_state != PCI_D3cold) + if (pm_runtime_suspended(dev) && + pdev->current_state != PCI_D3cold) pci_pme_wakeup(pdev, NULL);
- if (pm_status > 0) - pm_runtime_put(dev); +put_bridge: + if (bref > 0) + pm_runtime_put(bdev); } else { list_del(&pme_dev->list); kfree(pme_dev);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cosmin Tanislav demonsingur@gmail.com
[ Upstream commit a22b0a2be69a36511cb5b37d948b651ddf7debf3 ]
The clk_init_data struct does not have all its members initialized, causing issues when trying to expose the internal clock on the CLK pin.
Fix this by zero-initializing the clk_init_data struct.
Fixes: 62094060cf3a ("iio: adc: ad4130: add AD4130 driver") Signed-off-by: Cosmin Tanislav demonsingur@gmail.com Reviewed-by: Nuno Sa nuno.sa@analog.com Link: https://lore.kernel.org/r/20240207132007.253768-1-demonsingur@gmail.com Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iio/adc/ad4130.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iio/adc/ad4130.c b/drivers/iio/adc/ad4130.c index feb86fe6c422..9daeac16499b 100644 --- a/drivers/iio/adc/ad4130.c +++ b/drivers/iio/adc/ad4130.c @@ -1821,7 +1821,7 @@ static int ad4130_setup_int_clk(struct ad4130_state *st) { struct device *dev = &st->spi->dev; struct device_node *of_node = dev_of_node(dev); - struct clk_init_data init; + struct clk_init_data init = {}; const char *clk_name; int ret;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cosmin Tanislav demonsingur@gmail.com
[ Upstream commit 78367c32bebfe833cd30c855755d863a4ff3fdee ]
Currently, GPIO_CTRL bits are set even if the pins are used for measurements.
GPIO_CTRL bits should only be set if the pin is not used for other functionality.
Fix this by only setting the GPIO_CTRL bits if the pin has no other function.
Fixes: 62094060cf3a ("iio: adc: ad4130: add AD4130 driver") Signed-off-by: Cosmin Tanislav demonsingur@gmail.com Reviewed-by: Nuno Sa nuno.sa@analog.com Link: https://lore.kernel.org/r/20240207132007.253768-2-demonsingur@gmail.com Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iio/adc/ad4130.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/iio/adc/ad4130.c b/drivers/iio/adc/ad4130.c index 9daeac16499b..62490424b6ae 100644 --- a/drivers/iio/adc/ad4130.c +++ b/drivers/iio/adc/ad4130.c @@ -1891,10 +1891,14 @@ static int ad4130_setup(struct iio_dev *indio_dev) return ret;
/* - * Configure all GPIOs for output. If configured, the interrupt function - * of P2 takes priority over the GPIO out function. + * Configure unused GPIOs for output. If configured, the interrupt + * function of P2 takes priority over the GPIO out function. */ - val = AD4130_IO_CONTROL_GPIO_CTRL_MASK; + val = 0; + for (i = 0; i < AD4130_MAX_GPIOS; i++) + if (st->pins_fn[i + AD4130_AIN2_P1] == AD4130_PIN_FN_NONE) + val |= FIELD_PREP(AD4130_IO_CONTROL_GPIO_CTRL_MASK, BIT(i)); + val |= FIELD_PREP(AD4130_IO_CONTROL_INT_PIN_SEL_MASK, st->int_pin_sel);
ret = regmap_write(st->regmap, AD4130_IO_CONTROL_REG, val);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
[ Upstream commit cffe487026be13eaf37ea28b783d9638ab147204 ]
In this loop, we step through the buffer and after each item we check if the size_left is greater than the minimum size we need. However, the problem is that "bytes_left" is type ssize_t while sizeof() is type size_t. That means that because of type promotion, the comparison is done as an unsigned and if we have negative bytes left the loop continues instead of ending.
Fixes: fe856be475f7 ("CIFS: parse and store info on iface queries") Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Reviewed-by: Shyam Prasad N sprasad@microsoft.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/client/smb2ops.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/smb/client/smb2ops.c b/fs/smb/client/smb2ops.c index beb81fa00cff..ba734395b036 100644 --- a/fs/smb/client/smb2ops.c +++ b/fs/smb/client/smb2ops.c @@ -619,7 +619,7 @@ parse_server_interfaces(struct network_interface_info_ioctl_rsp *buf, goto out; }
- while (bytes_left >= sizeof(*p)) { + while (bytes_left >= (ssize_t)sizeof(*p)) { memset(&tmp_iface, 0, sizeof(tmp_iface)); tmp_iface.speed = le64_to_cpu(p->LinkSpeed); tmp_iface.rdma_capable = le32_to_cpu(p->Capability & RDMA_CAPABLE) ? 1 : 0;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Viken Dadhaniya quic_vdadhani@quicinc.com
[ Upstream commit 83ef106fa732aea8558253641cd98e8a895604d7 ]
For i2c read operation in GSI mode, we are getting timeout due to malformed TRE basically incorrect TRE sequence in gpi(drivers/dma/qcom/gpi.c) driver.
I2C driver has geni_i2c_gpi(I2C_WRITE) function which generates GO TRE and geni_i2c_gpi(I2C_READ)generates DMA TRE. Hence to generate GO TRE before DMA TRE, we should move geni_i2c_gpi(I2C_WRITE) before geni_i2c_gpi(I2C_READ) inside the I2C GSI mode transfer function i.e. geni_i2c_gpi_xfer().
TRE stands for Transfer Ring Element - which is basically an element with size of 4 words. It contains all information like slave address, clk divider, dma address value data size etc).
Mainly we have 3 TREs(Config, GO and DMA tre). - CONFIG TRE : consists of internal register configuration which is required before start of the transfer. - DMA TRE : contains DDR/Memory address, called as DMA descriptor. - GO TRE : contains Transfer directions, slave ID, Delay flags, Length of the transfer.
I2c driver calls GPI driver API to config each TRE depending on the protocol.
For read operation tre sequence will be as below which is not aligned to hardware programming guide.
- CONFIG tre - DMA tre - GO tre
As per Qualcomm's internal Hardware Programming Guide, we should configure TREs in below sequence for any RX only transfer.
- CONFIG tre - GO tre - DMA tre
Fixes: d8703554f4de ("i2c: qcom-geni: Add support for GPI DMA") Reviewed-by: Andi Shyti andi.shyti@kernel.org Reviewed-by: Bryan O'Donoghue bryan.odonoghue@linaro.org Tested-by: Bryan O'Donoghue bryan.odonoghue@linaro.org # qrb5165-rb5 Co-developed-by: Mukesh Kumar Savaliya quic_msavaliy@quicinc.com Signed-off-by: Mukesh Kumar Savaliya quic_msavaliy@quicinc.com Signed-off-by: Viken Dadhaniya quic_vdadhani@quicinc.com Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Signed-off-by: Andi Shyti andi.shyti@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/busses/i2c-qcom-geni.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/i2c/busses/i2c-qcom-geni.c b/drivers/i2c/busses/i2c-qcom-geni.c index 0d2e7171e3a6..da94df466e83 100644 --- a/drivers/i2c/busses/i2c-qcom-geni.c +++ b/drivers/i2c/busses/i2c-qcom-geni.c @@ -613,20 +613,20 @@ static int geni_i2c_gpi_xfer(struct geni_i2c_dev *gi2c, struct i2c_msg msgs[], i
peripheral.addr = msgs[i].addr;
+ ret = geni_i2c_gpi(gi2c, &msgs[i], &config, + &tx_addr, &tx_buf, I2C_WRITE, gi2c->tx_c); + if (ret) + goto err; + if (msgs[i].flags & I2C_M_RD) { ret = geni_i2c_gpi(gi2c, &msgs[i], &config, &rx_addr, &rx_buf, I2C_READ, gi2c->rx_c); if (ret) goto err; - } - - ret = geni_i2c_gpi(gi2c, &msgs[i], &config, - &tx_addr, &tx_buf, I2C_WRITE, gi2c->tx_c); - if (ret) - goto err;
- if (msgs[i].flags & I2C_M_RD) dma_async_issue_pending(gi2c->rx_c); + } + dma_async_issue_pending(gi2c->tx_c);
timeout = wait_for_completion_timeout(&gi2c->done, XFER_TIMEOUT);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bibo Mao maobibo@loongson.cn
[ Upstream commit f1c2765c6afcd1f71f76ed8c9bf94acedab4cecb ]
eiointc_domain_alloc() uses struct eiointc, which is not defined, for a pointer. Older compilers treat that as a forward declaration and due to assignment of a void pointer there is no warning emitted. As the variable is then handed in as a void pointer argument to irq_domain_set_info() the code is functional.
Use struct eiointc_priv instead.
[ tglx: Rewrote changelog ]
Fixes: dd281e1a1a93 ("irqchip: Add Loongson Extended I/O interrupt controller support") Signed-off-by: Bibo Mao maobibo@loongson.cn Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Huacai Chen chenhuacai@loongson.cn Link: https://lore.kernel.org/r/20240130082722.2912576-2-maobibo@loongson.cn Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/irqchip/irq-loongson-eiointc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/irqchip/irq-loongson-eiointc.c b/drivers/irqchip/irq-loongson-eiointc.c index 1623cd779175..b3736bdd4b9f 100644 --- a/drivers/irqchip/irq-loongson-eiointc.c +++ b/drivers/irqchip/irq-loongson-eiointc.c @@ -241,7 +241,7 @@ static int eiointc_domain_alloc(struct irq_domain *domain, unsigned int virq, int ret; unsigned int i, type; unsigned long hwirq = 0; - struct eiointc *priv = domain->host_data; + struct eiointc_priv *priv = domain->host_data;
ret = irq_domain_translate_onecell(domain, arg, &hwirq, &type); if (ret)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier maz@kernel.org
[ Upstream commit 846297e11e8ae428f8b00156a0cfe2db58100702 ]
Although the GICv3 code base has gained some handling of systems failing to handle the shareability attributes, the GICv4 side of things has been firmly ignored.
This is unfortunate, as the new recent addition of the "dma-noncoherent" is supposed to apply to all of the GICR tables, and not just the ones that are common to v3 and v4.
Add some checks to handle the VPROPBASE/VPENDBASE shareability and cacheability attributes in the same way we deal with the other GICR_BASE registers, wrapping the flag check in a helper for improved readability.
Note that this has been found by inspection only, as I don't have access to HW that suffers from this particular issue.
Fixes: 3a0fff0fb6a3 ("irqchip/gic-v3: Enable non-coherent redistributors/ITSes DT probing") Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Lorenzo Pieralisi lpieralisi@kernel.org Link: https://lore.kernel.org/r/20240213101206.2137483-2-maz@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/irqchip/irq-gic-v3-its.c | 37 +++++++++++++++++++++----------- 1 file changed, 25 insertions(+), 12 deletions(-)
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index 9a7a74239eab..bdc2c8330479 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -207,6 +207,11 @@ static bool require_its_list_vmovp(struct its_vm *vm, struct its_node *its) return (gic_rdists->has_rvpeid || vm->vlpi_count[its->list_nr]); }
+static bool rdists_support_shareable(void) +{ + return !(gic_rdists->flags & RDIST_FLAGS_FORCE_NON_SHAREABLE); +} + static u16 get_its_list(struct its_vm *vm) { struct its_node *its; @@ -2710,10 +2715,12 @@ static u64 inherit_vpe_l1_table_from_its(void) break; } val |= FIELD_PREP(GICR_VPROPBASER_4_1_ADDR, addr >> 12); - val |= FIELD_PREP(GICR_VPROPBASER_SHAREABILITY_MASK, - FIELD_GET(GITS_BASER_SHAREABILITY_MASK, baser)); - val |= FIELD_PREP(GICR_VPROPBASER_INNER_CACHEABILITY_MASK, - FIELD_GET(GITS_BASER_INNER_CACHEABILITY_MASK, baser)); + if (rdists_support_shareable()) { + val |= FIELD_PREP(GICR_VPROPBASER_SHAREABILITY_MASK, + FIELD_GET(GITS_BASER_SHAREABILITY_MASK, baser)); + val |= FIELD_PREP(GICR_VPROPBASER_INNER_CACHEABILITY_MASK, + FIELD_GET(GITS_BASER_INNER_CACHEABILITY_MASK, baser)); + } val |= FIELD_PREP(GICR_VPROPBASER_4_1_SIZE, GITS_BASER_NR_PAGES(baser) - 1);
return val; @@ -2936,8 +2943,10 @@ static int allocate_vpe_l1_table(void) WARN_ON(!IS_ALIGNED(pa, psz));
val |= FIELD_PREP(GICR_VPROPBASER_4_1_ADDR, pa >> 12); - val |= GICR_VPROPBASER_RaWb; - val |= GICR_VPROPBASER_InnerShareable; + if (rdists_support_shareable()) { + val |= GICR_VPROPBASER_RaWb; + val |= GICR_VPROPBASER_InnerShareable; + } val |= GICR_VPROPBASER_4_1_Z; val |= GICR_VPROPBASER_4_1_VALID;
@@ -3126,7 +3135,7 @@ static void its_cpu_init_lpis(void) gicr_write_propbaser(val, rbase + GICR_PROPBASER); tmp = gicr_read_propbaser(rbase + GICR_PROPBASER);
- if (gic_rdists->flags & RDIST_FLAGS_FORCE_NON_SHAREABLE) + if (!rdists_support_shareable()) tmp &= ~GICR_PROPBASER_SHAREABILITY_MASK;
if ((tmp ^ val) & GICR_PROPBASER_SHAREABILITY_MASK) { @@ -3153,7 +3162,7 @@ static void its_cpu_init_lpis(void) gicr_write_pendbaser(val, rbase + GICR_PENDBASER); tmp = gicr_read_pendbaser(rbase + GICR_PENDBASER);
- if (gic_rdists->flags & RDIST_FLAGS_FORCE_NON_SHAREABLE) + if (!rdists_support_shareable()) tmp &= ~GICR_PENDBASER_SHAREABILITY_MASK;
if (!(tmp & GICR_PENDBASER_SHAREABILITY_MASK)) { @@ -3880,14 +3889,18 @@ static void its_vpe_schedule(struct its_vpe *vpe) val = virt_to_phys(page_address(vpe->its_vm->vprop_page)) & GENMASK_ULL(51, 12); val |= (LPI_NRBITS - 1) & GICR_VPROPBASER_IDBITS_MASK; - val |= GICR_VPROPBASER_RaWb; - val |= GICR_VPROPBASER_InnerShareable; + if (rdists_support_shareable()) { + val |= GICR_VPROPBASER_RaWb; + val |= GICR_VPROPBASER_InnerShareable; + } gicr_write_vpropbaser(val, vlpi_base + GICR_VPROPBASER);
val = virt_to_phys(page_address(vpe->vpt_page)) & GENMASK_ULL(51, 16); - val |= GICR_VPENDBASER_RaWaWb; - val |= GICR_VPENDBASER_InnerShareable; + if (rdists_support_shareable()) { + val |= GICR_VPENDBASER_RaWaWb; + val |= GICR_VPENDBASER_InnerShareable; + } /* * There is no good way of finding out if the pending table is * empty as we can race against the doorbell interrupt very
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Ellerman mpe@ellerman.id.au
[ Upstream commit f1acb109505d983779bbb7e20a1ee6244d2b5736 ]
KASAN is seen to increase stack usage, to the point that it was reported to lead to stack overflow on some 32-bit machines (see link).
To avoid overflows the stack size was doubled for KASAN builds in commit 3e8635fb2e07 ("powerpc/kasan: Force thread size increase with KASAN").
However with a 32KB stack size to begin with, the doubling leads to a 64KB stack, which causes build errors: arch/powerpc/kernel/switch.S:249: Error: operand out of range (0x000000000000fe50 is not between 0xffffffffffff8000 and 0x0000000000007fff)
Although the asm could be reworked, in practice a 32KB stack seems sufficient even for KASAN builds - the additional usage seems to be in the 2-3KB range for a 64-bit KASAN build.
So only increase the stack for KASAN if the stack size is < 32KB.
Fixes: 18f14afe2816 ("powerpc/64s: Increase default stack size to 32KB") Reported-by: Spoorthy spoorthy@linux.ibm.com Reported-by: Benjamin Gray bgray@linux.ibm.com Reviewed-by: Benjamin Gray bgray@linux.ibm.com Link: https://lore.kernel.org/linuxppc-dev/bug-207129-206035@https.bugzilla.kernel... Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://msgid.link/20240212064244.3924505-1-mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/include/asm/thread_info.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h index bf5dde1a4114..15c5691dd218 100644 --- a/arch/powerpc/include/asm/thread_info.h +++ b/arch/powerpc/include/asm/thread_info.h @@ -14,7 +14,7 @@
#ifdef __KERNEL__
-#ifdef CONFIG_KASAN +#if defined(CONFIG_KASAN) && CONFIG_THREAD_SHIFT < 15 #define MIN_THREAD_SHIFT (CONFIG_THREAD_SHIFT + 1) #else #define MIN_THREAD_SHIFT CONFIG_THREAD_SHIFT
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shivaprasad G Bhat sbhat@linux.ibm.com
[ Upstream commit 0846dd77c8349ec92ca0079c9c71d130f34cb192 ]
The function spapr_tce_platform_iommu_attach_dev() is missing to call iommu_group_put() when the domain is already set. This refcount leak shows up with BUG_ON() during DLPAR remove operation as:
KernelBug: Kernel bug in state 'None': kernel BUG at arch/powerpc/platforms/pseries/iommu.c:100! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=8192 NUMA pSeries <snip> Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_016) hv:phyp pSeries NIP: c0000000000ff4d4 LR: c0000000000ff4cc CTR: 0000000000000000 REGS: c0000013aed5f840 TRAP: 0700 Tainted: G I (6.8.0-rc3-autotest-g99bd3cb0d12e) MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 44002402 XER: 20040000 CFAR: c000000000a0d170 IRQMASK: 0 ... NIP iommu_reconfig_notifier+0x94/0x200 LR iommu_reconfig_notifier+0x8c/0x200 Call Trace: iommu_reconfig_notifier+0x8c/0x200 (unreliable) notifier_call_chain+0xb8/0x19c blocking_notifier_call_chain+0x64/0x98 of_reconfig_notify+0x44/0xdc of_detach_node+0x78/0xb0 ofdt_write.part.0+0x86c/0xbb8 proc_reg_write+0xf4/0x150 vfs_write+0xf8/0x488 ksys_write+0x84/0x140 system_call_exception+0x138/0x330 system_call_vectored_common+0x15c/0x2ec
The patch adds the missing iommu_group_put() call.
Fixes: a8ca9fc9134c ("powerpc/iommu: Do not do platform domain attach atctions after probe") Reported-by: Venkat Rao Bagalkote venkat88@linux.vnet.ibm.com Closes: https://lore.kernel.org/all/274e0d2b-b5cc-475e-94e6-8427e88e271d@linux.vnet.... Signed-off-by: Shivaprasad G Bhat sbhat@linux.ibm.com Tested-by: Venkat Rao Bagalkote venkat88@linux.vnet.ibm.com Reviewed-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://msgid.link/170784021983.6249.10039296655906636112.stgit@linux.ibm.co... Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kernel/iommu.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c index c6f62e130d55..4393d447cb56 100644 --- a/arch/powerpc/kernel/iommu.c +++ b/arch/powerpc/kernel/iommu.c @@ -1290,8 +1290,10 @@ spapr_tce_platform_iommu_attach_dev(struct iommu_domain *platform_domain, int ret = -EINVAL;
/* At first attach the ownership is already set */ - if (!domain) + if (!domain) { + iommu_group_put(grp); return 0; + }
if (!grp) return -ENODEV;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann arnd@arndb.de
[ Upstream commit f44bff19268517ee98e80e944cad0f04f1db72e3 ]
On powerpc, it is possible to compile test both the new apple (arm) and old pasemi (powerpc) drivers for the i2c hardware at the same time, which leads to a warning about linking the same object file twice:
scripts/Makefile.build:244: drivers/i2c/busses/Makefile: i2c-pasemi-core.o is added to multiple modules: i2c-apple i2c-pasemi
Rework the driver to have an explicit helper module, letting Kbuild take care of whether this should be built-in or a loadable driver.
Fixes: 9bc5f4f660ff ("i2c: pasemi: Split pci driver to its own file") Signed-off-by: Arnd Bergmann arnd@arndb.de Reviewed-by: Sven Peter sven@svenpeter.dev Signed-off-by: Andi Shyti andi.shyti@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/busses/Makefile | 6 ++---- drivers/i2c/busses/i2c-pasemi-core.c | 6 ++++++ 2 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/i2c/busses/Makefile b/drivers/i2c/busses/Makefile index 3757b9391e60..aa0ee8ecd6f2 100644 --- a/drivers/i2c/busses/Makefile +++ b/drivers/i2c/busses/Makefile @@ -90,10 +90,8 @@ obj-$(CONFIG_I2C_NPCM) += i2c-npcm7xx.o obj-$(CONFIG_I2C_OCORES) += i2c-ocores.o obj-$(CONFIG_I2C_OMAP) += i2c-omap.o obj-$(CONFIG_I2C_OWL) += i2c-owl.o -i2c-pasemi-objs := i2c-pasemi-core.o i2c-pasemi-pci.o -obj-$(CONFIG_I2C_PASEMI) += i2c-pasemi.o -i2c-apple-objs := i2c-pasemi-core.o i2c-pasemi-platform.o -obj-$(CONFIG_I2C_APPLE) += i2c-apple.o +obj-$(CONFIG_I2C_PASEMI) += i2c-pasemi-core.o i2c-pasemi-pci.o +obj-$(CONFIG_I2C_APPLE) += i2c-pasemi-core.o i2c-pasemi-platform.o obj-$(CONFIG_I2C_PCA_PLATFORM) += i2c-pca-platform.o obj-$(CONFIG_I2C_PNX) += i2c-pnx.o obj-$(CONFIG_I2C_PXA) += i2c-pxa.o diff --git a/drivers/i2c/busses/i2c-pasemi-core.c b/drivers/i2c/busses/i2c-pasemi-core.c index 7d54a9f34c74..bd8becbdeeb2 100644 --- a/drivers/i2c/busses/i2c-pasemi-core.c +++ b/drivers/i2c/busses/i2c-pasemi-core.c @@ -369,6 +369,7 @@ int pasemi_i2c_common_probe(struct pasemi_smbus *smbus)
return 0; } +EXPORT_SYMBOL_GPL(pasemi_i2c_common_probe);
irqreturn_t pasemi_irq_handler(int irq, void *dev_id) { @@ -378,3 +379,8 @@ irqreturn_t pasemi_irq_handler(int irq, void *dev_id) complete(&smbus->irq_completion); return IRQ_HANDLED; } +EXPORT_SYMBOL_GPL(pasemi_irq_handler); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Olof Johansson olof@lixom.net"); +MODULE_DESCRIPTION("PA Semi PWRficient SMBus driver");
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jean Delvare jdelvare@suse.de
[ Upstream commit c1c9d0f6f7f1dbf29db996bd8e166242843a5f21 ]
According to the Intel datasheets, software must reset the block buffer index twice for block process call transactions: once before writing the outgoing data to the buffer, and once again before reading the incoming data from the buffer.
The driver is currently missing the second reset, causing the wrong portion of the block buffer to be read.
Signed-off-by: Jean Delvare jdelvare@suse.de Reported-by: Piotr Zakowski piotr.zakowski@intel.com Closes: https://lore.kernel.org/linux-i2c/20240213120553.7b0ab120@endymion.delvare/ Fixes: 315cd67c9453 ("i2c: i801: Add Block Write-Block Read Process Call support") Reviewed-by: Alexander Sverdlin alexander.sverdlin@gmail.com Signed-off-by: Andi Shyti andi.shyti@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/busses/i2c-i801.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/i2c/busses/i2c-i801.c b/drivers/i2c/busses/i2c-i801.c index 070999139c6d..6a5a93cf4ecc 100644 --- a/drivers/i2c/busses/i2c-i801.c +++ b/drivers/i2c/busses/i2c-i801.c @@ -498,11 +498,10 @@ static int i801_block_transaction_by_block(struct i801_priv *priv, /* Set block buffer mode */ outb_p(inb_p(SMBAUXCTL(priv)) | SMBAUXCTL_E32B, SMBAUXCTL(priv));
- inb_p(SMBHSTCNT(priv)); /* reset the data buffer index */ - if (read_write == I2C_SMBUS_WRITE) { len = data->block[0]; outb_p(len, SMBHSTDAT0(priv)); + inb_p(SMBHSTCNT(priv)); /* reset the data buffer index */ for (i = 0; i < len; i++) outb_p(data->block[i+1], SMBBLKDAT(priv)); } @@ -520,6 +519,7 @@ static int i801_block_transaction_by_block(struct i801_priv *priv, }
data->block[0] = len; + inb_p(SMBHSTCNT(priv)); /* reset the data buffer index */ for (i = 0; i < len; i++) data->block[i + 1] = inb_p(SMBBLKDAT(priv)); }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Radek Krejci radek.krejci@oracle.com
[ Upstream commit 5d9a16b2a4d9e8fa028892ded43f6501bc2969e5 ]
get_line() does not trim the leading spaces, but the parse_source_files() expects to get lines with source files paths where the first space occurs after the file path.
Fixes: 70f30cfe5b89 ("modpost: use read_text_file() and get_line() for reading text files") Signed-off-by: Radek Krejci radek.krejci@oracle.com Signed-off-by: Masahiro Yamada masahiroy@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- scripts/mod/sumversion.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/scripts/mod/sumversion.c b/scripts/mod/sumversion.c index 31066bfdba04..dc4878502276 100644 --- a/scripts/mod/sumversion.c +++ b/scripts/mod/sumversion.c @@ -326,7 +326,12 @@ static int parse_source_files(const char *objfile, struct md4_ctx *md)
/* Sum all files in the same dir or subdirs. */ while ((line = get_line(&pos))) { - char* p = line; + char* p; + + /* trim the leading spaces away */ + while (isspace(*line)) + line++; + p = line;
if (strncmp(line, "source_", sizeof("source_")-1) == 0) { p = strrchr(line, ' ');
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann arnd@arndb.de
[ Upstream commit a951884d82886d8453d489f84f20ac168d062b38 ]
lld is now able to build ARMv4 and ARMv4T kernels, which means it can generate thunks for those (__ARMv4PILongThunk_*, __ARMv4PILongBXThunk_*) that can interfere with kallsyms table generation since they do not get ignore like the corresponding ARMv5+ ones are:
Inconsistent kallsyms data Try "make KALLSYMS_EXTRA_PASS=1" as a workaround
Replace the hardcoded list of thunk symbols with a more general regex that covers this one along with future symbols that follow the same pattern.
Fixes: 5eb6e280432d ("ARM: 9289/1: Allow pre-ARMv5 builds with ld.lld 16.0.0 and newer") Fixes: efe6e3068067 ("kallsyms: fix nonconverging kallsyms table with lld") Suggested-by: Masahiro Yamada masahiroy@kernel.org Signed-off-by: Arnd Bergmann arnd@arndb.de Reviewed-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Masahiro Yamada masahiroy@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- scripts/mksysmap | 13 ++----------- 1 file changed, 2 insertions(+), 11 deletions(-)
diff --git a/scripts/mksysmap b/scripts/mksysmap index 9ba1c9da0a40..57ff5656d566 100755 --- a/scripts/mksysmap +++ b/scripts/mksysmap @@ -48,17 +48,8 @@ ${NM} -n ${1} | sed >${2} -e " / __kvm_nvhe_\$/d / __kvm_nvhe_.L/d
-# arm64 lld -/ __AArch64ADRPThunk_/d - -# arm lld -/ __ARMV5PILongThunk_/d -/ __ARMV7PILongThunk_/d -/ __ThumbV7PILongThunk_/d - -# mips lld -/ __LA25Thunk_/d -/ __microLA25Thunk_/d +# lld arm/aarch64/mips thunks +/ __[[:alnum:]]*Thunk_/d
# CFI type identifiers / __kcfi_typeid_/d
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
commit b6c620dc43ccb4e802894e54b651cf81495e9598 upstream.
When the MPTCP PM detects that a subflow is stale, all the packet scheduler must re-inject all the mptcp-level unacked data. To avoid acquiring unneeded locks, it first try to check if any unacked data is present at all in the RTX queue, but such check is currently broken, as it uses TCP-specific helper on an MPTCP socket.
Funnily enough fuzzers and static checkers are happy, as the accessed memory still belongs to the mptcp_sock struct, and even from a functional perspective the recovery completed successfully, as the short-cut test always failed.
A recent unrelated TCP change - commit d5fed5addb2b ("tcp: reorganize tcp_sock fast path variables") - exposed the issue, as the tcp field reorganization makes the mptcp code always skip the re-inection.
Fix the issue dropping the bogus call: we are on a slow path, the early optimization proved once again to be evil.
Fixes: 1e1d9d6f119c ("mptcp: handle pending data on closed subflow") Cc: stable@vger.kernel.org Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/468 Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://lore.kernel.org/r/20240131-upstream-net-20240131-mptcp-ci-issues-v1-... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/protocol.c | 3 --- 1 file changed, 3 deletions(-)
--- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -2328,9 +2328,6 @@ bool __mptcp_retransmit_pending_data(str if (__mptcp_check_fallback(msk)) return false;
- if (tcp_rtx_and_write_queues_empty(sk)) - return false; - /* the closing socket has some data untransmitted and/or unacked: * some data in the mptcp rtx queue has not really xmitted yet. * keep it simple and re-inject the whole mptcp level rtx queue
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
commit 3645c844902bd4e173d6704fc2a37e8746904d67 upstream.
Since the commit mentioned below, 'mptcp_join' selftests is using IPTables to add rules to the Filter table.
It is then required to have IP_NF_FILTER KConfig.
This KConfig is usually enabled by default in many defconfig, but we recently noticed that some CI were running our selftests without them enabled.
Fixes: 8d014eaa9254 ("selftests: mptcp: add ADD_ADDR timeout test case") Cc: stable@vger.kernel.org Reviewed-by: Geliang Tang geliang@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/config | 1 + 1 file changed, 1 insertion(+)
--- a/tools/testing/selftests/net/mptcp/config +++ b/tools/testing/selftests/net/mptcp/config @@ -22,6 +22,7 @@ CONFIG_NFT_TPROXY=m CONFIG_NFT_SOCKET=m CONFIG_IP_ADVANCED_ROUTER=y CONFIG_IP_MULTIPLE_TABLES=y +CONFIG_IP_NF_FILTER=m CONFIG_IP_NF_TARGET_REJECT=m CONFIG_IPV6_MULTIPLE_TABLES=y CONFIG_NET_ACT_CSUM=m
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
commit 8c86fad2cecdc6bf7283ecd298b4d0555bd8b8aa upstream.
Since the commit mentioned below, 'mptcp_join' selftests is using IPTables to add rules to the Filter table for IPv6.
It is then required to have IP6_NF_FILTER KConfig.
This KConfig is usually enabled by default in many defconfig, but we recently noticed that some CI were running our selftests without them enabled.
Fixes: 523514ed0a99 ("selftests: mptcp: add ADD_ADDR IPv6 test cases") Cc: stable@vger.kernel.org Reviewed-by: Geliang Tang geliang@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://lore.kernel.org/r/20240131-upstream-net-20240131-mptcp-ci-issues-v1-... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/config | 1 + 1 file changed, 1 insertion(+)
--- a/tools/testing/selftests/net/mptcp/config +++ b/tools/testing/selftests/net/mptcp/config @@ -25,6 +25,7 @@ CONFIG_IP_MULTIPLE_TABLES=y CONFIG_IP_NF_FILTER=m CONFIG_IP_NF_TARGET_REJECT=m CONFIG_IPV6_MULTIPLE_TABLES=y +CONFIG_IP6_NF_FILTER=m CONFIG_NET_ACT_CSUM=m CONFIG_NET_ACT_PEDIT=m CONFIG_NET_CLS_ACT=y
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
commit 2d41f10fa497182df9012d3e95d9cea24eb42e61 upstream.
Since the commit mentioned below, 'mptcp_join' selftests is using IPTables to add rules to the Mangle table, only in IPv4.
This KConfig is usually enabled by default in many defconfig, but we recently noticed that some CI were running our selftests without them enabled.
Fixes: b6e074e171bc ("selftests: mptcp: add infinite map testcase") Cc: stable@vger.kernel.org Reviewed-by: Geliang Tang geliang@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://lore.kernel.org/r/20240131-upstream-net-20240131-mptcp-ci-issues-v1-... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/config | 1 + 1 file changed, 1 insertion(+)
--- a/tools/testing/selftests/net/mptcp/config +++ b/tools/testing/selftests/net/mptcp/config @@ -23,6 +23,7 @@ CONFIG_NFT_SOCKET=m CONFIG_IP_ADVANCED_ROUTER=y CONFIG_IP_MULTIPLE_TABLES=y CONFIG_IP_NF_FILTER=m +CONFIG_IP_NF_MANGLE=m CONFIG_IP_NF_TARGET_REJECT=m CONFIG_IPV6_MULTIPLE_TABLES=y CONFIG_IP6_NF_FILTER=m
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
commit 4d4dfb2019d7010efb65926d9d1c1793f9a367c6 upstream.
On very slow environments -- e.g. when QEmu is used without KVM --, mptcp_join.sh selftest can take a bit more than 20 minutes. Bump the default timeout by 50% as it seems normal to take that long on some environments.
When a debug kernel config is used, this selftest will take even longer, but that's certainly not a common test env to consider for the timeout.
The Fixes tag that has been picked here is there simply to help having this patch backported to older stable versions. It is difficult to point to the exact commit that made some env reaching the timeout from time to time.
Fixes: d17b968b9876 ("selftests: mptcp: increase timeout to 20 minutes") Cc: stable@vger.kernel.org Acked-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://lore.kernel.org/r/20240131-upstream-net-20240131-mptcp-ci-issues-v1-... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/settings | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/testing/selftests/net/mptcp/settings +++ b/tools/testing/selftests/net/mptcp/settings @@ -1 +1 @@ -timeout=1200 +timeout=1800
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
commit de46d138e7735eded9756906747fd3a8c3a42225 upstream.
If a CI executes the same selftest multiple times with different options, all results from the same subtests will have the same title, which confuse the CI. With the same title printed in TAP, the tests are considered as the same ones.
Now, it is possible to override this prefix by using MPTCP_LIB_KSFT_TEST env var, and have a different title.
While at it, use 'basename' to remove the suffix as well instead of using an extra 'sed'.
Fixes: c4192967e62f ("selftests: mptcp: lib: format subtests results in TAP") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://lore.kernel.org/r/20240131-upstream-net-20240131-mptcp-ci-issues-v1-... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_lib.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_lib.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_lib.sh @@ -6,7 +6,7 @@ readonly KSFT_FAIL=1 readonly KSFT_SKIP=4
# shellcheck disable=SC2155 # declare and assign separately -readonly KSFT_TEST=$(basename "${0}" | sed 's/.sh$//g') +readonly KSFT_TEST="${MPTCP_LIB_KSFT_TEST:-$(basename "${0}" .sh)}"
MPTCP_LIB_SUBTESTS=()
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geliang Tang geliang.tang@suse.com
commit bdbef0a6ff10603895b0ba39f56bf874cb2b551a upstream.
To avoid duplicated code in different MPTCP selftests, we can add and use helpers defined in mptcp_lib.sh.
Export kill_wait() helper in userspace_pm.sh into mptcp_lib.sh and rename it as mptcp_lib_kill_wait(). It can be used to instead of kill_wait() in mptcp_join.sh. Use the new helper in both scripts.
Reviewed-by: Matthieu Baerts matttbe@kernel.org Signed-off-by: Geliang Tang geliang.tang@suse.com Signed-off-by: Mat Martineau martineau@kernel.org Link: https://lore.kernel.org/r/20231128-send-net-next-2023107-v4-9-8d6b94150f6b@k... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 10 +------ tools/testing/selftests/net/mptcp/mptcp_lib.sh | 9 ++++++ tools/testing/selftests/net/mptcp/userspace_pm.sh | 31 +++++++--------------- 3 files changed, 22 insertions(+), 28 deletions(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -682,16 +682,10 @@ wait_mpj() done }
-kill_wait() -{ - kill $1 > /dev/null 2>&1 - wait $1 2>/dev/null -} - kill_events_pids() { - kill_wait $evts_ns1_pid - kill_wait $evts_ns2_pid + mptcp_lib_kill_wait $evts_ns1_pid + mptcp_lib_kill_wait $evts_ns2_pid }
kill_tests_wait() --- a/tools/testing/selftests/net/mptcp/mptcp_lib.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_lib.sh @@ -207,3 +207,12 @@ mptcp_lib_result_print_all_tap() { printf "%s\n" "${subtest}" done } + +# $1: PID +mptcp_lib_kill_wait() { + [ "${1}" -eq 0 ] && return 0 + + kill -SIGUSR1 "${1}" > /dev/null 2>&1 + kill "${1}" > /dev/null 2>&1 + wait "${1}" 2>/dev/null +} --- a/tools/testing/selftests/net/mptcp/userspace_pm.sh +++ b/tools/testing/selftests/net/mptcp/userspace_pm.sh @@ -108,15 +108,6 @@ test_fail() mptcp_lib_result_fail "${test_name}" }
-kill_wait() -{ - [ $1 -eq 0 ] && return 0 - - kill -SIGUSR1 $1 > /dev/null 2>&1 - kill $1 > /dev/null 2>&1 - wait $1 2>/dev/null -} - # This function is used in the cleanup trap #shellcheck disable=SC2317 cleanup() @@ -128,7 +119,7 @@ cleanup() for pid in $client4_pid $server4_pid $client6_pid $server6_pid\ $server_evts_pid $client_evts_pid do - kill_wait $pid + mptcp_lib_kill_wait $pid done
local netns @@ -210,7 +201,7 @@ make_connection() fi :>"$client_evts" if [ $client_evts_pid -ne 0 ]; then - kill_wait $client_evts_pid + mptcp_lib_kill_wait $client_evts_pid fi ip netns exec "$ns2" ./pm_nl_ctl events >> "$client_evts" 2>&1 & client_evts_pid=$! @@ -219,7 +210,7 @@ make_connection() fi :>"$server_evts" if [ $server_evts_pid -ne 0 ]; then - kill_wait $server_evts_pid + mptcp_lib_kill_wait $server_evts_pid fi ip netns exec "$ns1" ./pm_nl_ctl events >> "$server_evts" 2>&1 & server_evts_pid=$! @@ -627,7 +618,7 @@ test_subflows() "10.0.2.2" "$client4_port" "23" "$client_addr_id" "ns1" "ns2"
# Delete the listener from the client ns, if one was created - kill_wait $listener_pid + mptcp_lib_kill_wait $listener_pid
local sport sport=$(sed --unbuffered -n 's/.*(sport:)([[:digit:]]*).*$/\2/p;q' "$server_evts") @@ -666,7 +657,7 @@ test_subflows() "$client_addr_id" "ns1" "ns2"
# Delete the listener from the client ns, if one was created - kill_wait $listener_pid + mptcp_lib_kill_wait $listener_pid
sport=$(sed --unbuffered -n 's/.*(sport:)([[:digit:]]*).*$/\2/p;q' "$server_evts")
@@ -705,7 +696,7 @@ test_subflows() "$client_addr_id" "ns1" "ns2"
# Delete the listener from the client ns, if one was created - kill_wait $listener_pid + mptcp_lib_kill_wait $listener_pid
sport=$(sed --unbuffered -n 's/.*(sport:)([[:digit:]]*).*$/\2/p;q' "$server_evts")
@@ -743,7 +734,7 @@ test_subflows() "10.0.2.1" "$app4_port" "23" "$server_addr_id" "ns2" "ns1"
# Delete the listener from the server ns, if one was created - kill_wait $listener_pid + mptcp_lib_kill_wait $listener_pid
sport=$(sed --unbuffered -n 's/.*(sport:)([[:digit:]]*).*$/\2/p;q' "$client_evts")
@@ -782,7 +773,7 @@ test_subflows() "$server_addr_id" "ns2" "ns1"
# Delete the listener from the server ns, if one was created - kill_wait $listener_pid + mptcp_lib_kill_wait $listener_pid
sport=$(sed --unbuffered -n 's/.*(sport:)([[:digit:]]*).*$/\2/p;q' "$client_evts")
@@ -819,7 +810,7 @@ test_subflows() "10.0.2.2" "10.0.2.1" "$new4_port" "23" "$server_addr_id" "ns2" "ns1"
# Delete the listener from the server ns, if one was created - kill_wait $listener_pid + mptcp_lib_kill_wait $listener_pid
sport=$(sed --unbuffered -n 's/.*(sport:)([[:digit:]]*).*$/\2/p;q' "$client_evts")
@@ -865,7 +856,7 @@ test_subflows_v4_v6_mix() "$server_addr_id" "ns2" "ns1"
# Delete the listener from the server ns, if one was created - kill_wait $listener_pid + mptcp_lib_kill_wait $listener_pid
sport=$(sed --unbuffered -n 's/.*(sport:)([[:digit:]]*).*$/\2/p;q' "$client_evts")
@@ -982,7 +973,7 @@ test_listener() sleep 0.5
# Delete the listener from the client ns, if one was created - kill_wait $listener_pid + mptcp_lib_kill_wait $listener_pid
sleep 0.5 verify_listener_events $client_evts $LISTENER_CLOSED $AF_INET 10.0.2.2 $client4_port
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
commit bdd70eb68913c960acb895b00a8c62eb64715b1f upstream.
Such field is there to avoid acquiring the data lock in a few spots, but it adds complexity to the already non trivial locking schema.
All the relevant call sites (mptcp-level re-injection, set socket options), are slow-path, drop such field in favor of 'cb_flags', adding the relevant locking.
This patch could be seen as an improvement, instead of a fix. But it simplifies the next patch. The 'Fixes' tag has been added to help having this series backported to stable.
Fixes: e9d09baca676 ("mptcp: avoid atomic bit manipulation when possible") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/protocol.c | 12 ++++++------ net/mptcp/protocol.h | 1 - 2 files changed, 6 insertions(+), 7 deletions(-)
--- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -1519,8 +1519,11 @@ static void mptcp_update_post_push(struc
void mptcp_check_and_set_pending(struct sock *sk) { - if (mptcp_send_head(sk)) - mptcp_sk(sk)->push_pending |= BIT(MPTCP_PUSH_PENDING); + if (mptcp_send_head(sk)) { + mptcp_data_lock(sk); + mptcp_sk(sk)->cb_flags |= BIT(MPTCP_PUSH_PENDING); + mptcp_data_unlock(sk); + } }
static int __subflow_push_pending(struct sock *sk, struct sock *ssk, @@ -3138,7 +3141,6 @@ static int mptcp_disconnect(struct sock mptcp_destroy_common(msk, MPTCP_CF_FASTCLOSE); WRITE_ONCE(msk->flags, 0); msk->cb_flags = 0; - msk->push_pending = 0; msk->recovery = false; msk->can_ack = false; msk->fully_established = false; @@ -3364,8 +3366,7 @@ static void mptcp_release_cb(struct sock struct mptcp_sock *msk = mptcp_sk(sk);
for (;;) { - unsigned long flags = (msk->cb_flags & MPTCP_FLAGS_PROCESS_CTX_NEED) | - msk->push_pending; + unsigned long flags = (msk->cb_flags & MPTCP_FLAGS_PROCESS_CTX_NEED); struct list_head join_list;
if (!flags) @@ -3381,7 +3382,6 @@ static void mptcp_release_cb(struct sock * datapath acquires the msk socket spinlock while helding * the subflow socket lock */ - msk->push_pending = 0; msk->cb_flags &= ~flags; spin_unlock_bh(&sk->sk_lock.slock);
--- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -286,7 +286,6 @@ struct mptcp_sock { int rmem_released; unsigned long flags; unsigned long cb_flags; - unsigned long push_pending; bool recovery; /* closing subflow write queue reinjected */ bool can_ack; bool fully_established;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
commit 013e3179dbd2bc756ce1dd90354abac62f65b739 upstream.
mptcp_rcv_space_init() is supposed to happen under the msk socket lock, but active msk socket does that without such protection.
Leverage the existing mptcp_propagate_state() helper to that extent. We need to ensure mptcp_rcv_space_init will happen before mptcp_rcv_space_adjust(), and the release_cb does not assure that: explicitly check for such condition.
While at it, move the wnd_end initialization out of mptcp_rcv_space_init(), it never belonged there.
Note that the race does not produce ill effect in practice, but change allows cleaning-up and defying better the locking model.
Fixes: a6b118febbab ("mptcp: add receive buffer auto-tuning") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/protocol.c | 10 ++++++---- net/mptcp/protocol.h | 3 ++- net/mptcp/subflow.c | 4 ++-- 3 files changed, 10 insertions(+), 7 deletions(-)
--- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -1977,6 +1977,9 @@ static void mptcp_rcv_space_adjust(struc if (copied <= 0) return;
+ if (!msk->rcvspace_init) + mptcp_rcv_space_init(msk, msk->first); + msk->rcvq_space.copied += copied;
mstamp = div_u64(tcp_clock_ns(), NSEC_PER_USEC); @@ -3156,6 +3159,7 @@ static int mptcp_disconnect(struct sock msk->bytes_received = 0; msk->bytes_sent = 0; msk->bytes_retrans = 0; + msk->rcvspace_init = 0;
WRITE_ONCE(sk->sk_shutdown, 0); sk_error_report(sk); @@ -3243,6 +3247,7 @@ void mptcp_rcv_space_init(struct mptcp_s { const struct tcp_sock *tp = tcp_sk(ssk);
+ msk->rcvspace_init = 1; msk->rcvq_space.copied = 0; msk->rcvq_space.rtt_us = 0;
@@ -3253,8 +3258,6 @@ void mptcp_rcv_space_init(struct mptcp_s TCP_INIT_CWND * tp->advmss); if (msk->rcvq_space.space == 0) msk->rcvq_space.space = TCP_INIT_CWND * TCP_MSS_DEFAULT; - - WRITE_ONCE(msk->wnd_end, msk->snd_nxt + tcp_sk(ssk)->snd_wnd); }
static struct sock *mptcp_accept(struct sock *ssk, int flags, int *err, @@ -3512,10 +3515,9 @@ void mptcp_finish_connect(struct sock *s WRITE_ONCE(msk->write_seq, subflow->idsn + 1); WRITE_ONCE(msk->snd_nxt, msk->write_seq); WRITE_ONCE(msk->snd_una, msk->write_seq); + WRITE_ONCE(msk->wnd_end, msk->snd_nxt + tcp_sk(ssk)->snd_wnd);
mptcp_pm_new_connection(msk, ssk, 0); - - mptcp_rcv_space_init(msk, ssk); }
void mptcp_sock_graft(struct sock *sk, struct socket *parent) --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -304,7 +304,8 @@ struct mptcp_sock { nodelay:1, fastopening:1, in_accept_queue:1, - free_first:1; + free_first:1, + rcvspace_init:1; struct work_struct work; struct sk_buff *ooo_last_skb; struct rb_root out_of_order_queue; --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -424,6 +424,8 @@ void __mptcp_sync_state(struct sock *sk, struct mptcp_sock *msk = mptcp_sk(sk);
__mptcp_propagate_sndbuf(sk, msk->first); + if (!msk->rcvspace_init) + mptcp_rcv_space_init(msk, msk->first); if (sk->sk_state == TCP_SYN_SENT) { inet_sk_state_store(sk, state); sk->sk_state_change(sk); @@ -545,7 +547,6 @@ static void subflow_finish_connect(struc } } else if (mptcp_check_fallback(sk)) { fallback: - mptcp_rcv_space_init(msk, sk); mptcp_propagate_state(parent, sk); } return; @@ -1744,7 +1745,6 @@ static void subflow_state_change(struct msk = mptcp_sk(parent); if (subflow_simultaneous_connect(sk)) { mptcp_do_fallback(sk); - mptcp_rcv_space_init(msk, sk); pr_fallback(msk); subflow->conn_finished = 1; mptcp_propagate_state(parent, sk);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geliang Tang geliang@kernel.org
commit f012d796a6de662692159c539689e47e662853a8 upstream.
Before adding a new entry in mptcp_userspace_pm_get_local_id(), it's better to check whether this address is already in userspace pm local address list. If it's in the list, no need to add a new entry, just return it's address ID and use this address.
Fixes: 8b20137012d9 ("mptcp: read attributes of addr entries managed by userspace PMs") Cc: stable@vger.kernel.org Signed-off-by: Geliang Tang geliang.tang@linux.dev Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/pm_userspace.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)
--- a/net/mptcp/pm_userspace.c +++ b/net/mptcp/pm_userspace.c @@ -130,10 +130,21 @@ int mptcp_userspace_pm_get_flags_and_ifi int mptcp_userspace_pm_get_local_id(struct mptcp_sock *msk, struct mptcp_addr_info *skc) { - struct mptcp_pm_addr_entry new_entry; + struct mptcp_pm_addr_entry *entry = NULL, *e, new_entry; __be16 msk_sport = ((struct inet_sock *) inet_sk((struct sock *)msk))->inet_sport;
+ spin_lock_bh(&msk->pm.lock); + list_for_each_entry(e, &msk->pm.userspace_pm_local_addr_list, list) { + if (mptcp_addresses_equal(&e->addr, skc, false)) { + entry = e; + break; + } + } + spin_unlock_bh(&msk->pm.lock); + if (entry) + return entry->addr.id; + memset(&new_entry, 0, sizeof(struct mptcp_pm_addr_entry)); new_entry.addr = *skc; new_entry.addr.id = 0;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
commit 337cebbd850f94147cee05252778f8f78b8c337f upstream.
Fastopen and PM-trigger subflow shutdown can race, as reported by syzkaller.
In my first attempt to close such race, I missed the fact that the subflow status can change again before the subflow_state_change callback is invoked.
Address the issue additionally copying with all the states directly reachable from TCP_FIN_WAIT1.
Fixes: 1e777f39b4d7 ("mptcp: add MSG_FASTOPEN sendmsg flag support") Fixes: 4fd19a307016 ("mptcp: fix inconsistent state on fastopen race") Cc: stable@vger.kernel.org Reported-by: syzbot+c53d4d3ddb327e80bc51@syzkaller.appspotmail.com Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/458 Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/protocol.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -1118,7 +1118,8 @@ static inline bool subflow_simultaneous_ { struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
- return (1 << sk->sk_state) & (TCPF_ESTABLISHED | TCPF_FIN_WAIT1) && + return (1 << sk->sk_state) & + (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 | TCPF_FIN_WAIT2 | TCPF_CLOSING) && is_active_ssk(subflow) && !subflow->conn_finished; }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Ellerman mpe@ellerman.id.au
commit 1fba2bf8e9d5a27b7394856181b6200de7260b79 upstream.
This reverts commit ed8b94f6e0acd652ce69bd69d678a0c769172df8.
Gaurav reported that there are still problems with the patch and it should be reverted pending a fuller fix.
Link: https://lore.kernel.org/all/4f6fc1ac-7a76-4447-9d0e-f55c0be373f8@linux.ibm.c... Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/include/asm/ppc-pci.h | 3 --- arch/powerpc/kernel/iommu.c | 21 +++++---------------- arch/powerpc/platforms/pseries/pci_dlpar.c | 4 ---- 3 files changed, 5 insertions(+), 23 deletions(-)
--- a/arch/powerpc/include/asm/ppc-pci.h +++ b/arch/powerpc/include/asm/ppc-pci.h @@ -29,9 +29,6 @@ void *pci_traverse_device_nodes(struct d void *(*fn)(struct device_node *, void *), void *data); extern void pci_devs_phb_init_dynamic(struct pci_controller *phb); -extern void ppc_iommu_register_device(struct pci_controller *phb); -extern void ppc_iommu_unregister_device(struct pci_controller *phb); -
/* From rtas_pci.h */ extern void init_pci_config_tokens (void); --- a/arch/powerpc/kernel/iommu.c +++ b/arch/powerpc/kernel/iommu.c @@ -1390,21 +1390,6 @@ static const struct attribute_group *spa NULL, };
-void ppc_iommu_register_device(struct pci_controller *phb) -{ - iommu_device_sysfs_add(&phb->iommu, phb->parent, - spapr_tce_iommu_groups, "iommu-phb%04x", - phb->global_number); - iommu_device_register(&phb->iommu, &spapr_tce_iommu_ops, - phb->parent); -} - -void ppc_iommu_unregister_device(struct pci_controller *phb) -{ - iommu_device_unregister(&phb->iommu); - iommu_device_sysfs_remove(&phb->iommu); -} - /* * This registers IOMMU devices of PHBs. This needs to happen * after core_initcall(iommu_init) + postcore_initcall(pci_driver_init) and @@ -1415,7 +1400,11 @@ static int __init spapr_tce_setup_phb_io struct pci_controller *hose;
list_for_each_entry(hose, &hose_list, list_node) { - ppc_iommu_register_device(hose); + iommu_device_sysfs_add(&hose->iommu, hose->parent, + spapr_tce_iommu_groups, "iommu-phb%04x", + hose->global_number); + iommu_device_register(&hose->iommu, &spapr_tce_iommu_ops, + hose->parent); } return 0; } --- a/arch/powerpc/platforms/pseries/pci_dlpar.c +++ b/arch/powerpc/platforms/pseries/pci_dlpar.c @@ -35,8 +35,6 @@ struct pci_controller *init_phb_dynamic(
pseries_msi_allocate_domains(phb);
- ppc_iommu_register_device(phb); - /* Create EEH devices for the PHB */ eeh_phb_pe_create(phb);
@@ -78,8 +76,6 @@ int remove_phb_dynamic(struct pci_contro } }
- ppc_iommu_unregister_device(phb); - pseries_msi_free_domains(phb);
/* Keep a reference so phb isn't freed yet */
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tomi Valkeinen tomi.valkeinen@ideasonboard.com
commit a107d643b2a3382e0a2d2c4ef08bf8c6bff4561d upstream.
This reverts commit 85d2a31fe4d9be1555f621ead7a520d8791e0f74.
The rkisp1 does share interrupt lines on some platforms, after all. Thus we need to revert this, and implement a fix for the rkisp1 shared irq handling in a follow-up patch.
Closes: https://lore.kernel.org/all/87o7eo8vym.fsf@gmail.com/ Link: https://lore.kernel.org/r/20231218-rkisp-shirq-fix-v1-1-173007628248@ideason...
Reported-by: Mikhail Rudenko mike.rudenko@gmail.com Signed-off-by: Tomi Valkeinen tomi.valkeinen@ideasonboard.com Signed-off-by: Laurent Pinchart laurent.pinchart@ideasonboard.com Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/platform/rockchip/rkisp1/rkisp1-dev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/media/platform/rockchip/rkisp1/rkisp1-dev.c +++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-dev.c @@ -559,7 +559,7 @@ static int rkisp1_probe(struct platform_ rkisp1->irqs[il] = irq; }
- ret = devm_request_irq(dev, irq, info->isrs[i].isr, 0, + ret = devm_request_irq(dev, irq, info->isrs[i].isr, IRQF_SHARED, dev_driver_string(dev), dev); if (ret) { dev_err(dev, "request irq failed: %d\n", ret);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lee Duncan lduncan@suse.com
commit 977fe773dcc7098d8eaf4ee6382cb51e13e784cb upstream.
This reverts commit 1a1975551943f681772720f639ff42fbaa746212.
This commit causes interrupts to be lost for FCoE devices, since it changed sping locks from "bh" to "irqsave".
Instead, a work queue should be used, and will be addressed in a separate commit.
Fixes: 1a1975551943 ("scsi: fcoe: Fix potential deadlock on &fip->ctlr_lock") Signed-off-by: Lee Duncan lduncan@suse.com Link: https://lore.kernel.org/r/c578cdcd46b60470535c4c4a953e6a1feca0dffd.170750078... Reviewed-by: Hannes Reinecke hare@suse.de Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/fcoe/fcoe_ctlr.c | 20 ++++++++------------ 1 file changed, 8 insertions(+), 12 deletions(-)
--- a/drivers/scsi/fcoe/fcoe_ctlr.c +++ b/drivers/scsi/fcoe/fcoe_ctlr.c @@ -319,17 +319,16 @@ static void fcoe_ctlr_announce(struct fc { struct fcoe_fcf *sel; struct fcoe_fcf *fcf; - unsigned long flags;
mutex_lock(&fip->ctlr_mutex); - spin_lock_irqsave(&fip->ctlr_lock, flags); + spin_lock_bh(&fip->ctlr_lock);
kfree_skb(fip->flogi_req); fip->flogi_req = NULL; list_for_each_entry(fcf, &fip->fcfs, list) fcf->flogi_sent = 0;
- spin_unlock_irqrestore(&fip->ctlr_lock, flags); + spin_unlock_bh(&fip->ctlr_lock); sel = fip->sel_fcf;
if (sel && ether_addr_equal(sel->fcf_mac, fip->dest_addr)) @@ -700,7 +699,6 @@ int fcoe_ctlr_els_send(struct fcoe_ctlr { struct fc_frame *fp; struct fc_frame_header *fh; - unsigned long flags; u16 old_xid; u8 op; u8 mac[ETH_ALEN]; @@ -734,11 +732,11 @@ int fcoe_ctlr_els_send(struct fcoe_ctlr op = FIP_DT_FLOGI; if (fip->mode == FIP_MODE_VN2VN) break; - spin_lock_irqsave(&fip->ctlr_lock, flags); + spin_lock_bh(&fip->ctlr_lock); kfree_skb(fip->flogi_req); fip->flogi_req = skb; fip->flogi_req_send = 1; - spin_unlock_irqrestore(&fip->ctlr_lock, flags); + spin_unlock_bh(&fip->ctlr_lock); schedule_work(&fip->timer_work); return -EINPROGRESS; case ELS_FDISC: @@ -1707,11 +1705,10 @@ static int fcoe_ctlr_flogi_send_locked(s static int fcoe_ctlr_flogi_retry(struct fcoe_ctlr *fip) { struct fcoe_fcf *fcf; - unsigned long flags; int error;
mutex_lock(&fip->ctlr_mutex); - spin_lock_irqsave(&fip->ctlr_lock, flags); + spin_lock_bh(&fip->ctlr_lock); LIBFCOE_FIP_DBG(fip, "re-sending FLOGI - reselect\n"); fcf = fcoe_ctlr_select(fip); if (!fcf || fcf->flogi_sent) { @@ -1722,7 +1719,7 @@ static int fcoe_ctlr_flogi_retry(struct fcoe_ctlr_solicit(fip, NULL); error = fcoe_ctlr_flogi_send_locked(fip); } - spin_unlock_irqrestore(&fip->ctlr_lock, flags); + spin_unlock_bh(&fip->ctlr_lock); mutex_unlock(&fip->ctlr_mutex); return error; } @@ -1739,9 +1736,8 @@ static int fcoe_ctlr_flogi_retry(struct static void fcoe_ctlr_flogi_send(struct fcoe_ctlr *fip) { struct fcoe_fcf *fcf; - unsigned long flags;
- spin_lock_irqsave(&fip->ctlr_lock, flags); + spin_lock_bh(&fip->ctlr_lock); fcf = fip->sel_fcf; if (!fcf || !fip->flogi_req_send) goto unlock; @@ -1768,7 +1764,7 @@ static void fcoe_ctlr_flogi_send(struct } else /* XXX */ LIBFCOE_FIP_DBG(fip, "No FCF selected - defer send\n"); unlock: - spin_unlock_irqrestore(&fip->ctlr_lock, flags); + spin_unlock_bh(&fip->ctlr_lock); }
/**
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mario Limonciello mario.limonciello@amd.com
commit 916361685319098f696b798ef1560f69ed96e934 upstream.
commit ab4750332dbe ("drm/amdgpu/sdma5.2: add begin/end_use ring callbacks") caused GFXOFF control to be used more heavily and the codepath that was removed from commit 0dee72639533 ("drm/amd: flush any delayed gfxoff on suspend entry") now can be exercised at suspend again.
Users report that by using GNOME to suspend the lockscreen trigger will cause SDMA traffic and the system can deadlock.
This reverts commit 0dee726395333fea833eaaf838bc80962df886c8.
Acked-by: Alex Deucher alexander.deucher@amd.com Fixes: ab4750332dbe ("drm/amdgpu/sdma5.2: add begin/end_use ring callbacks") Signed-off-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 - drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 9 ++++++++- 2 files changed, 8 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4496,7 +4496,6 @@ int amdgpu_device_suspend(struct drm_dev drm_fb_helper_set_suspend_unlocked(adev_to_drm(adev)->fb_helper, true);
cancel_delayed_work_sync(&adev->delayed_init_work); - flush_delayed_work(&adev->gfx.gfx_off_delay_work);
amdgpu_ras_suspend(adev);
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c @@ -723,8 +723,15 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_d
if (adev->gfx.gfx_off_req_count == 0 && !adev->gfx.gfx_off_state) { - schedule_delayed_work(&adev->gfx.gfx_off_delay_work, + /* If going to s2idle, no need to wait */ + if (adev->in_s0ix) { + if (!amdgpu_dpm_set_powergating_by_smu(adev, + AMD_IP_BLOCK_TYPE_GFX, true)) + adev->gfx.gfx_off_state = true; + } else { + schedule_delayed_work(&adev->gfx.gfx_off_delay_work, delay); + } } } else { if (adev->gfx.gfx_off_req_count == 0) {
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rob Clark robdclark@chromium.org
commit 917e9b7c2350e3e53162fcf5035e5f2d68e2cbed upstream.
This reverts commit abe2023b4cea192ab266b351fd38dc9dbd846df0.
Changing the locking order means that scheduler/msm_job_run() can race with the recovery kthread worker, with the result that the GPU gets an extra runpm get when we are trying to power it off. Leaving the GPU in an unrecovered state.
I'll need to come up with a different scheme for appeasing lockdep.
Signed-off-by: Rob Clark robdclark@chromium.org Patchwork: https://patchwork.freedesktop.org/patch/573835/ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/msm/msm_gpu.c | 11 +++++------ drivers/gpu/drm/msm/msm_ringbuffer.c | 7 +++++-- 2 files changed, 10 insertions(+), 8 deletions(-)
--- a/drivers/gpu/drm/msm/msm_gpu.c +++ b/drivers/gpu/drm/msm/msm_gpu.c @@ -749,12 +749,14 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_ringbuffer *ring = submit->ring; unsigned long flags;
- pm_runtime_get_sync(&gpu->pdev->dev); + WARN_ON(!mutex_is_locked(&gpu->lock));
- mutex_lock(&gpu->lock); + pm_runtime_get_sync(&gpu->pdev->dev);
msm_gpu_hw_init(gpu);
+ submit->seqno = submit->hw_fence->seqno; + update_sw_cntrs(gpu);
/* @@ -779,11 +781,8 @@ void msm_gpu_submit(struct msm_gpu *gpu, gpu->funcs->submit(gpu, submit); gpu->cur_ctx_seqno = submit->queue->ctx->seqno;
- hangcheck_timer_reset(gpu); - - mutex_unlock(&gpu->lock); - pm_runtime_put(&gpu->pdev->dev); + hangcheck_timer_reset(gpu); }
/* --- a/drivers/gpu/drm/msm/msm_ringbuffer.c +++ b/drivers/gpu/drm/msm/msm_ringbuffer.c @@ -21,8 +21,6 @@ static struct dma_fence *msm_job_run(str
msm_fence_init(submit->hw_fence, fctx);
- submit->seqno = submit->hw_fence->seqno; - mutex_lock(&priv->lru.lock);
for (i = 0; i < submit->nr_bos; i++) { @@ -34,8 +32,13 @@ static struct dma_fence *msm_job_run(str
mutex_unlock(&priv->lru.lock);
+ /* TODO move submit path over to using a per-ring lock.. */ + mutex_lock(&gpu->lock); + msm_gpu_submit(gpu, submit);
+ mutex_unlock(&gpu->lock); + return dma_fence_get(submit->hw_fence); }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Keqi Wang wangkeqi_chris@163.com
commit 8929f95b2b587791a7dcd04cc91520194a76d3a6 upstream.
This reverts commit c46bfba1337d ("connector: Fix proc_event_num_listeners count not cleared").
It is not accurate to reset proc_event_num_listeners according to cn_netlink_send_mult() return value -ESRCH.
In the case of stress-ng netlink-proc, -ESRCH will always be returned, because netlink_broadcast_filtered will return -ESRCH, which may cause stress-ng netlink-proc performance degradation.
Reported-by: kernel test robot oliver.sang@intel.com Closes: https://lore.kernel.org/oe-lkp/202401112259.b23a1567-oliver.sang@intel.com Fixes: c46bfba1337d ("connector: Fix proc_event_num_listeners count not cleared") Signed-off-by: Keqi Wang wangkeqi_chris@163.com Link: https://lore.kernel.org/r/20240209091659.68723-1-wangkeqi_chris@163.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/connector/cn_proc.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
--- a/drivers/connector/cn_proc.c +++ b/drivers/connector/cn_proc.c @@ -108,9 +108,8 @@ static inline void send_msg(struct cn_ms filter_data[1] = 0; }
- if (cn_netlink_send_mult(msg, msg->len, 0, CN_IDX_PROC, GFP_NOWAIT, - cn_filter, (void *)filter_data) == -ESRCH) - atomic_set(&proc_event_num_listeners, 0); + cn_netlink_send_mult(msg, msg->len, 0, CN_IDX_PROC, GFP_NOWAIT, + cn_filter, (void *)filter_data);
local_unlock(&local_event.lock); }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vaishnav Achath vaishnav.a@ti.com
commit e56c671c2272d939d48a66be7e73b92b74c560c2 upstream.
MCSPI controller have few limitations regarding the transaction size when the FIFO buffer is enabled and the WCNT feature is used to find the end of word, in this case if WCNT is not a multiple of the FIFO Almost Empty Level (AEL), then the FIFO empty event is not generated correctly. In addition to this limitation, few other unknown sequence of events that causes the FIFO empty status to not reflect the exact status were found when FIFO is being used without DMA enabled during extended testing in AM65x platform. Till the exact root cause is found and fixed, revert the FIFO support without DMA.
See J721E Technical Reference Manual (SPRUI1C), section 12.1.5 for further details: http://www.ti.com/lit/pdf/spruil1
This reverts commit 75223bbea840e ("spi: omap2-mcspi: Add FIFO support without DMA")
Signed-off-by: Vaishnav Achath vaishnav.a@ti.com Link: https://msgid.link/r/20240212120049.438495-1-vaishnav.a@ti.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/spi/spi-omap2-mcspi.c | 137 ++---------------------------------------- 1 file changed, 8 insertions(+), 129 deletions(-)
--- a/drivers/spi/spi-omap2-mcspi.c +++ b/drivers/spi/spi-omap2-mcspi.c @@ -53,8 +53,6 @@
/* per-register bitmasks: */ #define OMAP2_MCSPI_IRQSTATUS_EOW BIT(17) -#define OMAP2_MCSPI_IRQSTATUS_TX0_EMPTY BIT(0) -#define OMAP2_MCSPI_IRQSTATUS_RX0_FULL BIT(2)
#define OMAP2_MCSPI_MODULCTRL_SINGLE BIT(0) #define OMAP2_MCSPI_MODULCTRL_MS BIT(2) @@ -293,7 +291,7 @@ static void omap2_mcspi_set_mode(struct }
static void omap2_mcspi_set_fifo(const struct spi_device *spi, - struct spi_transfer *t, int enable, int dma_enabled) + struct spi_transfer *t, int enable) { struct spi_controller *ctlr = spi->controller; struct omap2_mcspi_cs *cs = spi->controller_state; @@ -314,28 +312,20 @@ static void omap2_mcspi_set_fifo(const s max_fifo_depth = OMAP2_MCSPI_MAX_FIFODEPTH / 2; else max_fifo_depth = OMAP2_MCSPI_MAX_FIFODEPTH; - if (dma_enabled) - wcnt = t->len / bytes_per_word; - else - wcnt = 0; + + wcnt = t->len / bytes_per_word; if (wcnt > OMAP2_MCSPI_MAX_FIFOWCNT) goto disable_fifo;
xferlevel = wcnt << 16; if (t->rx_buf != NULL) { chconf |= OMAP2_MCSPI_CHCONF_FFER; - if (dma_enabled) - xferlevel |= (bytes_per_word - 1) << 8; - else - xferlevel |= (max_fifo_depth - 1) << 8; + xferlevel |= (bytes_per_word - 1) << 8; }
if (t->tx_buf != NULL) { chconf |= OMAP2_MCSPI_CHCONF_FFET; - if (dma_enabled) - xferlevel |= bytes_per_word - 1; - else - xferlevel |= (max_fifo_depth - 1); + xferlevel |= bytes_per_word - 1; }
mcspi_write_reg(ctlr, OMAP2_MCSPI_XFERLEVEL, xferlevel); @@ -892,113 +882,6 @@ out: return count - c; }
-static unsigned -omap2_mcspi_txrx_piofifo(struct spi_device *spi, struct spi_transfer *xfer) -{ - struct omap2_mcspi_cs *cs = spi->controller_state; - struct omap2_mcspi *mcspi; - unsigned int count, c; - unsigned int iter, cwc; - int last_request; - void __iomem *base = cs->base; - void __iomem *tx_reg; - void __iomem *rx_reg; - void __iomem *chstat_reg; - void __iomem *irqstat_reg; - int word_len, bytes_per_word; - u8 *rx; - const u8 *tx; - - mcspi = spi_controller_get_devdata(spi->controller); - count = xfer->len; - c = count; - word_len = cs->word_len; - bytes_per_word = mcspi_bytes_per_word(word_len); - - /* - * We store the pre-calculated register addresses on stack to speed - * up the transfer loop. - */ - tx_reg = base + OMAP2_MCSPI_TX0; - rx_reg = base + OMAP2_MCSPI_RX0; - chstat_reg = base + OMAP2_MCSPI_CHSTAT0; - irqstat_reg = base + OMAP2_MCSPI_IRQSTATUS; - - if (c < (word_len >> 3)) - return 0; - - rx = xfer->rx_buf; - tx = xfer->tx_buf; - - do { - /* calculate number of words in current iteration */ - cwc = min((unsigned int)mcspi->fifo_depth / bytes_per_word, - c / bytes_per_word); - last_request = cwc != (mcspi->fifo_depth / bytes_per_word); - if (tx) { - if (mcspi_wait_for_reg_bit(irqstat_reg, - OMAP2_MCSPI_IRQSTATUS_TX0_EMPTY) < 0) { - dev_err(&spi->dev, "TX Empty timed out\n"); - goto out; - } - writel_relaxed(OMAP2_MCSPI_IRQSTATUS_TX0_EMPTY, irqstat_reg); - - for (iter = 0; iter < cwc; iter++, tx += bytes_per_word) { - if (bytes_per_word == 1) - writel_relaxed(*tx, tx_reg); - else if (bytes_per_word == 2) - writel_relaxed(*((u16 *)tx), tx_reg); - else if (bytes_per_word == 4) - writel_relaxed(*((u32 *)tx), tx_reg); - } - } - - if (rx) { - if (!last_request && - mcspi_wait_for_reg_bit(irqstat_reg, - OMAP2_MCSPI_IRQSTATUS_RX0_FULL) < 0) { - dev_err(&spi->dev, "RX_FULL timed out\n"); - goto out; - } - writel_relaxed(OMAP2_MCSPI_IRQSTATUS_RX0_FULL, irqstat_reg); - - for (iter = 0; iter < cwc; iter++, rx += bytes_per_word) { - if (last_request && - mcspi_wait_for_reg_bit(chstat_reg, - OMAP2_MCSPI_CHSTAT_RXS) < 0) { - dev_err(&spi->dev, "RXS timed out\n"); - goto out; - } - if (bytes_per_word == 1) - *rx = readl_relaxed(rx_reg); - else if (bytes_per_word == 2) - *((u16 *)rx) = readl_relaxed(rx_reg); - else if (bytes_per_word == 4) - *((u32 *)rx) = readl_relaxed(rx_reg); - } - } - - if (last_request) { - if (mcspi_wait_for_reg_bit(chstat_reg, - OMAP2_MCSPI_CHSTAT_EOT) < 0) { - dev_err(&spi->dev, "EOT timed out\n"); - goto out; - } - if (mcspi_wait_for_reg_bit(chstat_reg, - OMAP2_MCSPI_CHSTAT_TXFFE) < 0) { - dev_err(&spi->dev, "TXFFE timed out\n"); - goto out; - } - omap2_mcspi_set_enable(spi, 0); - } - c -= cwc * bytes_per_word; - } while (c >= bytes_per_word); - -out: - omap2_mcspi_set_enable(spi, 1); - return count - c; -} - static u32 omap2_mcspi_calc_divisor(u32 speed_hz, u32 ref_clk_hz) { u32 div; @@ -1323,9 +1206,7 @@ static int omap2_mcspi_transfer_one(stru if ((mcspi_dma->dma_rx && mcspi_dma->dma_tx) && ctlr->cur_msg_mapped && ctlr->can_dma(ctlr, spi, t)) - omap2_mcspi_set_fifo(spi, t, 1, 1); - else if (t->len > OMAP2_MCSPI_MAX_FIFODEPTH) - omap2_mcspi_set_fifo(spi, t, 1, 0); + omap2_mcspi_set_fifo(spi, t, 1);
omap2_mcspi_set_enable(spi, 1);
@@ -1338,8 +1219,6 @@ static int omap2_mcspi_transfer_one(stru ctlr->cur_msg_mapped && ctlr->can_dma(ctlr, spi, t)) count = omap2_mcspi_txrx_dma(spi, t); - else if (mcspi->fifo_depth > 0) - count = omap2_mcspi_txrx_piofifo(spi, t); else count = omap2_mcspi_txrx_pio(spi, t);
@@ -1352,7 +1231,7 @@ static int omap2_mcspi_transfer_one(stru omap2_mcspi_set_enable(spi, 0);
if (mcspi->fifo_depth > 0) - omap2_mcspi_set_fifo(spi, t, 0, 0); + omap2_mcspi_set_fifo(spi, t, 0);
out: /* Restore defaults if they were overriden */ @@ -1375,7 +1254,7 @@ out: omap2_mcspi_set_cs(spi, !(spi->mode & SPI_CS_HIGH));
if (mcspi->fifo_depth > 0 && t) - omap2_mcspi_set_fifo(spi, t, 0, 0); + omap2_mcspi_set_fifo(spi, t, 0);
return status; }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sebastian Ott sebott@redhat.com
commit 9c64e749cebd9c2d3d55261530a98bcccb83b950 upstream.
Set the segment size of the virtio_gpu device to the value used by the drm helpers when allocating sg lists to fix the following complaint from DMA_API debug code:
DMA-API: virtio-pci 0000:07:00.0: mapping sg segment longer than device claims to support [len=262144] [max=65536]
Cc: stable@vger.kernel.org Tested-by: Zhenyu Zhang zhenyzha@redhat.com Acked-by: Vivek Kasireddy vivek.kasireddy@intel.com Signed-off-by: Sebastian Ott sebott@redhat.com Signed-off-by: Dmitry Osipenko dmitry.osipenko@collabora.com Link: https://patchwork.freedesktop.org/patch/msgid/7258a4cc-da16-5c34-a042-2a23ee... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/virtio/virtgpu_drv.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/gpu/drm/virtio/virtgpu_drv.c +++ b/drivers/gpu/drm/virtio/virtgpu_drv.c @@ -94,6 +94,7 @@ static int virtio_gpu_probe(struct virti goto err_free; }
+ dma_set_max_seg_size(dev->dev, dma_max_mapping_size(dev->dev) ?: UINT_MAX); ret = virtio_gpu_init(vdev, dev); if (ret) goto err_free;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Friedrich Vock friedrich.vock@gmx.de
commit 7330256268664ea0a7dd5b07a3fed363093477dd upstream.
Allows us to detect subsequent IH ring buffer overflows as well.
Cc: Joshua Ashton joshua@froggi.es Cc: Alex Deucher alexander.deucher@amd.com Cc: Christian König christian.koenig@amd.com Cc: stable@vger.kernel.org Signed-off-by: Friedrich Vock friedrich.vock@gmx.de Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/cik_ih.c | 6 ++++++ drivers/gpu/drm/amd/amdgpu/cz_ih.c | 5 +++++ drivers/gpu/drm/amd/amdgpu/iceland_ih.c | 5 +++++ drivers/gpu/drm/amd/amdgpu/ih_v6_0.c | 6 ++++++ drivers/gpu/drm/amd/amdgpu/ih_v6_1.c | 7 +++++++ drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 6 ++++++ drivers/gpu/drm/amd/amdgpu/si_ih.c | 6 ++++++ drivers/gpu/drm/amd/amdgpu/tonga_ih.c | 6 ++++++ drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 6 ++++++ drivers/gpu/drm/amd/amdgpu/vega20_ih.c | 6 ++++++ 10 files changed, 59 insertions(+)
--- a/drivers/gpu/drm/amd/amdgpu/cik_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/cik_ih.c @@ -204,6 +204,12 @@ static u32 cik_ih_get_wptr(struct amdgpu tmp = RREG32(mmIH_RB_CNTL); tmp |= IH_RB_CNTL__WPTR_OVERFLOW_CLEAR_MASK; WREG32(mmIH_RB_CNTL, tmp); + + /* Unset the CLEAR_OVERFLOW bit immediately so new overflows + * can be detected. + */ + tmp &= ~IH_RB_CNTL__WPTR_OVERFLOW_CLEAR_MASK; + WREG32(mmIH_RB_CNTL, tmp); } return (wptr & ih->ptr_mask); } --- a/drivers/gpu/drm/amd/amdgpu/cz_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/cz_ih.c @@ -216,6 +216,11 @@ static u32 cz_ih_get_wptr(struct amdgpu_ tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1); WREG32(mmIH_RB_CNTL, tmp);
+ /* Unset the CLEAR_OVERFLOW bit immediately so new overflows + * can be detected. + */ + tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 0); + WREG32(mmIH_RB_CNTL, tmp);
out: return (wptr & ih->ptr_mask); --- a/drivers/gpu/drm/amd/amdgpu/iceland_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/iceland_ih.c @@ -215,6 +215,11 @@ static u32 iceland_ih_get_wptr(struct am tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1); WREG32(mmIH_RB_CNTL, tmp);
+ /* Unset the CLEAR_OVERFLOW bit immediately so new overflows + * can be detected. + */ + tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 0); + WREG32(mmIH_RB_CNTL, tmp);
out: return (wptr & ih->ptr_mask); --- a/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c +++ b/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c @@ -418,6 +418,12 @@ static u32 ih_v6_0_get_wptr(struct amdgp tmp = RREG32_NO_KIQ(ih_regs->ih_rb_cntl); tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1); WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp); + + /* Unset the CLEAR_OVERFLOW bit immediately so new overflows + * can be detected. + */ + tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 0); + WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp); out: return (wptr & ih->ptr_mask); } --- a/drivers/gpu/drm/amd/amdgpu/ih_v6_1.c +++ b/drivers/gpu/drm/amd/amdgpu/ih_v6_1.c @@ -418,6 +418,13 @@ static u32 ih_v6_1_get_wptr(struct amdgp tmp = RREG32_NO_KIQ(ih_regs->ih_rb_cntl); tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1); WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp); + + /* Unset the CLEAR_OVERFLOW bit immediately so new overflows + * can be detected. + */ + tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 0); + WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp); + out: return (wptr & ih->ptr_mask); } --- a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c @@ -442,6 +442,12 @@ static u32 navi10_ih_get_wptr(struct amd tmp = RREG32_NO_KIQ(ih_regs->ih_rb_cntl); tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1); WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp); + + /* Unset the CLEAR_OVERFLOW bit immediately so new overflows + * can be detected. + */ + tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 0); + WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp); out: return (wptr & ih->ptr_mask); } --- a/drivers/gpu/drm/amd/amdgpu/si_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/si_ih.c @@ -119,6 +119,12 @@ static u32 si_ih_get_wptr(struct amdgpu_ tmp = RREG32(IH_RB_CNTL); tmp |= IH_RB_CNTL__WPTR_OVERFLOW_CLEAR_MASK; WREG32(IH_RB_CNTL, tmp); + + /* Unset the CLEAR_OVERFLOW bit immediately so new overflows + * can be detected. + */ + tmp &= ~IH_RB_CNTL__WPTR_OVERFLOW_CLEAR_MASK; + WREG32(IH_RB_CNTL, tmp); } return (wptr & ih->ptr_mask); } --- a/drivers/gpu/drm/amd/amdgpu/tonga_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/tonga_ih.c @@ -219,6 +219,12 @@ static u32 tonga_ih_get_wptr(struct amdg tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1); WREG32(mmIH_RB_CNTL, tmp);
+ /* Unset the CLEAR_OVERFLOW bit immediately so new overflows + * can be detected. + */ + tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 0); + WREG32(mmIH_RB_CNTL, tmp); + out: return (wptr & ih->ptr_mask); } --- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c @@ -373,6 +373,12 @@ static u32 vega10_ih_get_wptr(struct amd tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1); WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp);
+ /* Unset the CLEAR_OVERFLOW bit immediately so new overflows + * can be detected. + */ + tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 0); + WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp); + out: return (wptr & ih->ptr_mask); } --- a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c @@ -421,6 +421,12 @@ static u32 vega20_ih_get_wptr(struct amd tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1); WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp);
+ /* Unset the CLEAR_OVERFLOW bit immediately so new overflows + * can be detected. + */ + tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 0); + WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp); + out: return (wptr & ih->ptr_mask); }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: David McFarland corngood@gmail.com
commit 8ef85a0ce24a6d9322dfa2a67477e473c3619b4f upstream.
The same calls are made directly above, but conditional on the firmware loading and validating successfully.
Cc: stable@vger.kernel.org Fixes: 9931b67690cf ("drm/amd: Load GFX10 microcode during early_init") Signed-off-by: David McFarland corngood@gmail.com Reviewed-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 2 -- 1 file changed, 2 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -4027,8 +4027,6 @@ static int gfx_v10_0_init_microcode(stru err = 0; adev->gfx.mec2_fw = NULL; } - amdgpu_gfx_cp_init_microcode(adev, AMDGPU_UCODE_ID_CP_MEC2); - amdgpu_gfx_cp_init_microcode(adev, AMDGPU_UCODE_ID_CP_MEC2_JT);
gfx_v10_0_check_fw_write_wait(adev); out:
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wenjing Liu wenjing.liu@amd.com
commit 39079fe8e660851abbafa90cd55cbf029210661f upstream.
[why] MAX_SURFACES is per stream, while MAX_PLANES is per asic. The mpc_combine is an array that records all the planes per asic. Therefore MAX_PLANES should be used as the array size. Using MAX_SURFACES causes array overflow when there are more than 3 planes.
[how] Use the MAX_PLANES for the mpc_combine array size.
Cc: Mario Limonciello mario.limonciello@amd.com Cc: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Reviewed-by: Rodrigo Siqueira rodrigo.siqueira@amd.com Reviewed-by: Nevenko Stupar nevenko.stupar@amd.com Reviewed-by: Chaitanya Dhere chaitanya.dhere@amd.com Acked-by: Tom Chung chiahsuan.chung@amd.com Signed-off-by: Wenjing Liu wenjing.liu@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c @@ -1089,7 +1089,7 @@ struct pipe_slice_table { struct pipe_ctx *pri_pipe; struct dc_plane_state *plane; int slice_count; - } mpc_combines[MAX_SURFACES]; + } mpc_combines[MAX_PLANES]; int mpc_combine_count; };
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fangzhi Zuo jerry.zuo@amd.com
commit faf51b201bc42adf500945732abb6220c707d6f3 upstream.
[why] odm calculation is missing for pipe split policy determination and cause Underflow/Corruption issue.
[how] Add the odm calculation.
Cc: Mario Limonciello mario.limonciello@amd.com Cc: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Reviewed-by: Charlene Liu charlene.liu@amd.com Acked-by: Tom Chung chiahsuan.chung@amd.com Signed-off-by: Fangzhi Zuo jerry.zuo@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/dml2/dml2_translation_helper.c | 27 +++------- drivers/gpu/drm/amd/display/dc/inc/core_types.h | 2 2 files changed, 12 insertions(+), 17 deletions(-)
--- a/drivers/gpu/drm/amd/display/dc/dml2/dml2_translation_helper.c +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml2_translation_helper.c @@ -791,35 +791,28 @@ static void populate_dml_surface_cfg_fro } }
-/*TODO no support for mpc combine, need rework - should calculate scaling params based on plane+stream*/ -static struct scaler_data get_scaler_data_for_plane(const struct dc_plane_state *in, const struct dc_state *context) +static struct scaler_data get_scaler_data_for_plane(const struct dc_plane_state *in, struct dc_state *context) { int i; - struct scaler_data data = { 0 }; + struct pipe_ctx *temp_pipe = &context->res_ctx.temp_pipe; + + memset(temp_pipe, 0, sizeof(struct pipe_ctx));
for (i = 0; i < MAX_PIPES; i++) { const struct pipe_ctx *pipe = &context->res_ctx.pipe_ctx[i];
if (pipe->plane_state == in && !pipe->prev_odm_pipe) { - const struct pipe_ctx *next_pipe = pipe->next_odm_pipe; + temp_pipe->stream = pipe->stream; + temp_pipe->plane_state = pipe->plane_state; + temp_pipe->plane_res.scl_data.taps = pipe->plane_res.scl_data.taps;
- data = context->res_ctx.pipe_ctx[i].plane_res.scl_data; - while (next_pipe) { - data.h_active += next_pipe->plane_res.scl_data.h_active; - data.recout.width += next_pipe->plane_res.scl_data.recout.width; - if (in->rotation == ROTATION_ANGLE_0 || in->rotation == ROTATION_ANGLE_180) { - data.viewport.width += next_pipe->plane_res.scl_data.viewport.width; - } else { - data.viewport.height += next_pipe->plane_res.scl_data.viewport.height; - } - next_pipe = next_pipe->next_odm_pipe; - } + resource_build_scaling_params(temp_pipe); break; } }
ASSERT(i < MAX_PIPES); - return data; + return temp_pipe->plane_res.scl_data; }
static void populate_dummy_dml_plane_cfg(struct dml_plane_cfg_st *out, unsigned int location, const struct dc_stream_state *in) @@ -864,7 +857,7 @@ static void populate_dummy_dml_plane_cfg out->ScalerEnabled[location] = false; }
-static void populate_dml_plane_cfg_from_plane_state(struct dml_plane_cfg_st *out, unsigned int location, const struct dc_plane_state *in, const struct dc_state *context) +static void populate_dml_plane_cfg_from_plane_state(struct dml_plane_cfg_st *out, unsigned int location, const struct dc_plane_state *in, struct dc_state *context) { const struct scaler_data scaler_data = get_scaler_data_for_plane(in, context);
--- a/drivers/gpu/drm/amd/display/dc/inc/core_types.h +++ b/drivers/gpu/drm/amd/display/dc/inc/core_types.h @@ -462,6 +462,8 @@ struct resource_context { unsigned int hpo_dp_link_enc_to_link_idx[MAX_HPO_DP2_LINK_ENCODERS]; int hpo_dp_link_enc_ref_cnts[MAX_HPO_DP2_LINK_ENCODERS]; bool is_mpc_3dlut_acquired[MAX_PIPES]; + /* solely used for build scalar data in dml2 */ + struct pipe_ctx temp_pipe; };
struct dce_bw_output {
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ondrej Mosnacek omosnace@redhat.com
commit 5a287d3d2b9de2b3e747132c615599907ba5c3c1 upstream.
For these hooks the true "neutral" value is -EOPNOTSUPP, which is currently what is returned when no LSM provides this hook and what LSMs return when there is no security context set on the socket. Correct the value in <linux/lsm_hooks.h> and adjust the dispatch functions in security/security.c to avoid issues when the BPF LSM is enabled.
Cc: stable@vger.kernel.org Fixes: 98e828a0650f ("security: Refactor declaration of LSM hooks") Signed-off-by: Ondrej Mosnacek omosnace@redhat.com [PM: subject line tweak] Signed-off-by: Paul Moore paul@paul-moore.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/lsm_hook_defs.h | 4 ++-- security/security.c | 31 +++++++++++++++++++++++++++---- 2 files changed, 29 insertions(+), 6 deletions(-)
--- a/include/linux/lsm_hook_defs.h +++ b/include/linux/lsm_hook_defs.h @@ -311,9 +311,9 @@ LSM_HOOK(int, 0, socket_getsockopt, stru LSM_HOOK(int, 0, socket_setsockopt, struct socket *sock, int level, int optname) LSM_HOOK(int, 0, socket_shutdown, struct socket *sock, int how) LSM_HOOK(int, 0, socket_sock_rcv_skb, struct sock *sk, struct sk_buff *skb) -LSM_HOOK(int, 0, socket_getpeersec_stream, struct socket *sock, +LSM_HOOK(int, -ENOPROTOOPT, socket_getpeersec_stream, struct socket *sock, sockptr_t optval, sockptr_t optlen, unsigned int len) -LSM_HOOK(int, 0, socket_getpeersec_dgram, struct socket *sock, +LSM_HOOK(int, -ENOPROTOOPT, socket_getpeersec_dgram, struct socket *sock, struct sk_buff *skb, u32 *secid) LSM_HOOK(int, 0, sk_alloc_security, struct sock *sk, int family, gfp_t priority) LSM_HOOK(void, LSM_RET_VOID, sk_free_security, struct sock *sk) --- a/security/security.c +++ b/security/security.c @@ -4387,8 +4387,20 @@ EXPORT_SYMBOL(security_sock_rcv_skb); int security_socket_getpeersec_stream(struct socket *sock, sockptr_t optval, sockptr_t optlen, unsigned int len) { - return call_int_hook(socket_getpeersec_stream, -ENOPROTOOPT, sock, - optval, optlen, len); + struct security_hook_list *hp; + int rc; + + /* + * Only one module will provide a security context. + */ + hlist_for_each_entry(hp, &security_hook_heads.socket_getpeersec_stream, + list) { + rc = hp->hook.socket_getpeersec_stream(sock, optval, optlen, + len); + if (rc != LSM_RET_DEFAULT(socket_getpeersec_stream)) + return rc; + } + return LSM_RET_DEFAULT(socket_getpeersec_stream); }
/** @@ -4408,8 +4420,19 @@ int security_socket_getpeersec_stream(st int security_socket_getpeersec_dgram(struct socket *sock, struct sk_buff *skb, u32 *secid) { - return call_int_hook(socket_getpeersec_dgram, -ENOPROTOOPT, sock, - skb, secid); + struct security_hook_list *hp; + int rc; + + /* + * Only one module will provide a security context. + */ + hlist_for_each_entry(hp, &security_hook_heads.socket_getpeersec_dgram, + list) { + rc = hp->hook.socket_getpeersec_dgram(sock, skb, secid); + if (rc != LSM_RET_DEFAULT(socket_getpeersec_dgram)) + return rc; + } + return LSM_RET_DEFAULT(socket_getpeersec_dgram); } EXPORT_SYMBOL(security_socket_getpeersec_dgram);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ondrej Mosnacek omosnace@redhat.com
commit 99b817c173cd213671daecd25ca27f56b0c7c4ec upstream.
The inode_getsecctx LSM hook has previously been corrected to have -EOPNOTSUPP instead of 0 as the default return value to fix BPF LSM behavior. However, the call_int_hook()-generated loop in security_inode_getsecctx() was left treating 0 as the neutral value, so after an LSM returns 0, the loop continues to try other LSMs, and if one of them returns a non-zero value, the function immediately returns with said value. So in a situation where SELinux and the BPF LSMs registered this hook, -EOPNOTSUPP would be incorrectly returned whenever SELinux returned 0.
Fix this by open-coding the call_int_hook() loop and making it use the correct LSM_RET_DEFAULT() value as the neutral one, similar to what other hooks do.
Cc: stable@vger.kernel.org Reported-by: Stephen Smalley stephen.smalley.work@gmail.com Link: https://lore.kernel.org/selinux/CAEjxPJ4ev-pasUwGx48fDhnmjBnq_Wh90jYPwRQRAqX... Link: https://bugzilla.redhat.com/show_bug.cgi?id=2257983 Fixes: b36995b8609a ("lsm: fix default return value for inode_getsecctx") Signed-off-by: Ondrej Mosnacek omosnace@redhat.com Reviewed-by: Casey Schaufler casey@schaufler-ca.com [PM: subject line tweak] Signed-off-by: Paul Moore paul@paul-moore.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- security/security.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-)
--- a/security/security.c +++ b/security/security.c @@ -4030,7 +4030,19 @@ EXPORT_SYMBOL(security_inode_setsecctx); */ int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen) { - return call_int_hook(inode_getsecctx, -EOPNOTSUPP, inode, ctx, ctxlen); + struct security_hook_list *hp; + int rc; + + /* + * Only one module will provide a security context. + */ + hlist_for_each_entry(hp, &security_hook_heads.inode_getsecctx, list) { + rc = hp->hook.inode_getsecctx(inode, ctx, ctxlen); + if (rc != LSM_RET_DEFAULT(inode_getsecctx)) + return rc; + } + + return LSM_RET_DEFAULT(inode_getsecctx); } EXPORT_SYMBOL(security_inode_getsecctx);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Sakamoto o-takashi@sakamocchi.jp
commit 5f9ab17394f831cb7986ec50900fa37507a127f1 upstream.
Against its current description, the kernel API can accepts all types of directory entries.
This commit corrects the documentation.
Cc: stable@vger.kernel.org Fixes: 3c2c58cb33b3 ("firewire: core: fw_csr_string addendum") Link: https://lore.kernel.org/r/20240130100409.30128-2-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto o-takashi@sakamocchi.jp Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/firewire/core-device.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
--- a/drivers/firewire/core-device.c +++ b/drivers/firewire/core-device.c @@ -100,10 +100,9 @@ static int textual_leaf_to_string(const * @buf: where to put the string * @size: size of @buf, in bytes * - * The string is taken from a minimal ASCII text descriptor leaf after - * the immediate entry with @key. The string is zero-terminated. - * An overlong string is silently truncated such that it and the - * zero byte fit into @size. + * The string is taken from a minimal ASCII text descriptor leaf just after the entry with the + * @key. The string is zero-terminated. An overlong string is silently truncated such that it + * and the zero byte fit into @size. * * Returns strlen(buf) or a negative error code. */
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: José Relvas josemonsantorelvas@gmail.com
commit 2468e8922d2f6da81a6192b73023eff67e3fefdd upstream.
There currently exists two thinkpad headset jack fixups: ALC285_FIXUP_THINKPAD_NO_BASS_SPK_HEADSET_JACK ALC285_FIXUP_THINKPAD_HEADSET_JACK
The latter is applied to alc285 and alc287 thinkpads which contain bass speakers. However, the former was only being applied to alc285 thinkpads, leaving non-bass alc287 thinkpads with no headset button controls. This patch fixes that by adding ALC285_FIXUP_THINKPAD_NO_BASS_SPK_HEADSET_JACK to the alc287 chains, allowing the detection of headset buttons.
Signed-off-by: José Relvas josemonsantorelvas@gmail.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240131113407.34698-3-josemonsantorelvas@gmail.co... Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -9570,7 +9570,7 @@ static const struct hda_fixup alc269_fix .type = HDA_FIXUP_FUNC, .v.func = cs35l41_fixup_i2c_two, .chained = true, - .chain_id = ALC269_FIXUP_THINKPAD_ACPI, + .chain_id = ALC285_FIXUP_THINKPAD_NO_BASS_SPK_HEADSET_JACK, }, [ALC287_FIXUP_TAS2781_I2C] = { .type = HDA_FIXUP_FUNC, @@ -9591,6 +9591,8 @@ static const struct hda_fixup alc269_fix [ALC287_FIXUP_THINKPAD_I2S_SPK] = { .type = HDA_FIXUP_FUNC, .v.func = alc287_fixup_bind_dacs, + .chained = true, + .chain_id = ALC285_FIXUP_THINKPAD_NO_BASS_SPK_HEADSET_JACK, }, [ALC287_FIXUP_MG_RTKC_CSAMP_CS35L41_I2C_THINKPAD] = { .type = HDA_FIXUP_FUNC,
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nathan Chancellor nathan@kernel.org
commit e3a9ee963ad8ba677ca925149812c5932b49af69 upstream.
Commit 90ceddcb4950 ("bpf: Support llvm-objcopy for vmlinux BTF") changed the ELF type of .btf.vmlinux.bin.o to ET_REL via dd, which works fine for little endian platforms:
00000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............| -00000010 03 00 b7 00 01 00 00 00 00 00 00 80 00 80 ff ff |................| +00000010 01 00 b7 00 01 00 00 00 00 00 00 80 00 80 ff ff |................|
However, for big endian platforms, it changes the wrong byte, resulting in an invalid ELF file type, which ld.lld rejects:
00000000 7f 45 4c 46 02 02 01 00 00 00 00 00 00 00 00 00 |.ELF............| -00000010 00 03 00 16 00 00 00 01 00 00 00 00 00 10 00 00 |................| +00000010 01 03 00 16 00 00 00 01 00 00 00 00 00 10 00 00 |................|
Type: <unknown>: 103
ld.lld: error: .btf.vmlinux.bin.o: unknown file type
Fix this by updating the entire 16-bit e_type field rather than just a single byte, so that everything works correctly for all platforms and linkers.
00000000 7f 45 4c 46 02 02 01 00 00 00 00 00 00 00 00 00 |.ELF............| -00000010 00 03 00 16 00 00 00 01 00 00 00 00 00 10 00 00 |................| +00000010 00 01 00 16 00 00 00 01 00 00 00 00 00 10 00 00 |................|
Type: REL (Relocatable file)
While in the area, update the comment to mention that binutils 2.35+ matches LLD's behavior of rejecting an ET_EXEC input, which occurred after the comment was added.
Cc: stable@vger.kernel.org Fixes: 90ceddcb4950 ("bpf: Support llvm-objcopy for vmlinux BTF") Link: https://github.com/llvm/llvm-project/pull/75643 Suggested-by: Masahiro Yamada masahiroy@kernel.org Signed-off-by: Nathan Chancellor nathan@kernel.org Reviewed-by: Fangrui Song maskray@google.com Reviewed-by: Nicolas Schier nicolas@fjasle.eu Reviewed-by: Kees Cook keescook@chromium.org Reviewed-by: Justin Stitt justinstitt@google.com Signed-off-by: Masahiro Yamada masahiroy@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- scripts/link-vmlinux.sh | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
--- a/scripts/link-vmlinux.sh +++ b/scripts/link-vmlinux.sh @@ -135,8 +135,13 @@ gen_btf() ${OBJCOPY} --only-section=.BTF --set-section-flags .BTF=alloc,readonly \ --strip-all ${1} ${2} 2>/dev/null # Change e_type to ET_REL so that it can be used to link final vmlinux. - # Unlike GNU ld, lld does not allow an ET_EXEC input. - printf '\1' | dd of=${2} conv=notrunc bs=1 seek=16 status=none + # GNU ld 2.35+ and lld do not allow an ET_EXEC input. + if is_enabled CONFIG_CPU_BIG_ENDIAN; then + et_rel='\0\1' + else + et_rel='\1\0' + fi + printf "${et_rel}" | dd of=${2} conv=notrunc bs=1 seek=16 status=none }
# Create ${2} .S file with all symbols from the ${1} object file
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fedor Pchelkin pchelkin@ispras.ru
commit bfb007aebe6bff451f7f3a4be19f4f286d0d5d9c upstream.
rx_data_reassembly skb is stored during NCI data exchange for processing fragmented packets. It is dropped only when the last fragment is processed or when an NTF packet with NCI_OP_RF_DEACTIVATE_NTF opcode is received. However, the NCI device may be deallocated before that which leads to skb leak.
As by design the rx_data_reassembly skb is bound to the NCI device and nothing prevents the device to be freed before the skb is processed in some way and cleaned, free it on the NCI device cleanup.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation") Cc: stable@vger.kernel.org Reported-by: syzbot+6b7c68d9c21e4ee4251b@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/000000000000f43987060043da7b@google.com/ Signed-off-by: Fedor Pchelkin pchelkin@ispras.ru Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/nfc/nci/core.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/net/nfc/nci/core.c +++ b/net/nfc/nci/core.c @@ -1208,6 +1208,10 @@ void nci_free_device(struct nci_dev *nde { nfc_free_device(ndev->nfc_dev); nci_hci_deallocate(ndev); + + /* drop partial rx data packet if present */ + if (ndev->rx_data_reassembly) + kfree_skb(ndev->rx_data_reassembly); kfree(ndev); } EXPORT_SYMBOL(nci_free_device);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nikita Zhandarovich n.zhandarovich@fintech.ru
commit 37e8c97e539015637cb920d3e6f1e404f707a06e upstream.
Syzkaller reported [1] hitting a warning after failing to allocate resources for skb in hsr_init_skb(). Since a WARN_ONCE() call will not help much in this case, it might be prudent to switch to netdev_warn_once(). At the very least it will suppress syzkaller reports such as [1].
Just in case, use netdev_warn_once() in send_prp_supervision_frame() for similar reasons.
[1] HSR: Could not send supervision frame WARNING: CPU: 1 PID: 85 at net/hsr/hsr_device.c:294 send_hsr_supervision_frame+0x60a/0x810 net/hsr/hsr_device.c:294 RIP: 0010:send_hsr_supervision_frame+0x60a/0x810 net/hsr/hsr_device.c:294 ... Call Trace: <IRQ> hsr_announce+0x114/0x370 net/hsr/hsr_device.c:382 call_timer_fn+0x193/0x590 kernel/time/timer.c:1700 expire_timers kernel/time/timer.c:1751 [inline] __run_timers+0x764/0xb20 kernel/time/timer.c:2022 run_timer_softirq+0x58/0xd0 kernel/time/timer.c:2035 __do_softirq+0x21a/0x8de kernel/softirq.c:553 invoke_softirq kernel/softirq.c:427 [inline] __irq_exit_rcu kernel/softirq.c:632 [inline] irq_exit_rcu+0xb7/0x120 kernel/softirq.c:644 sysvec_apic_timer_interrupt+0x95/0xb0 arch/x86/kernel/apic/apic.c:1076 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:649 ...
This issue is also found in older kernels (at least up to 5.10).
Cc: stable@vger.kernel.org Reported-by: syzbot+3ae0a3f42c84074b7c8e@syzkaller.appspotmail.com Fixes: 121c33b07b31 ("net: hsr: introduce common code for skb initialization") Signed-off-by: Nikita Zhandarovich n.zhandarovich@fintech.ru Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/hsr/hsr_device.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/hsr/hsr_device.c +++ b/net/hsr/hsr_device.c @@ -291,7 +291,7 @@ static void send_hsr_supervision_frame(s
skb = hsr_init_skb(master); if (!skb) { - WARN_ONCE(1, "HSR: Could not send supervision frame\n"); + netdev_warn_once(master->dev, "HSR: Could not send supervision frame\n"); return; }
@@ -338,7 +338,7 @@ static void send_prp_supervision_frame(s
skb = hsr_init_skb(master); if (!skb) { - WARN_ONCE(1, "PRP: Could not send supervision frame\n"); + netdev_warn_once(master->dev, "PRP: Could not send supervision frame\n"); return; }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Esben Haabendal esben@geanix.com
commit 4896bb7c0b31a0a3379b290ea7729900c59e0c69 upstream.
With the dma conf being reallocated on each call to stmmac_open(), any information in there is lost, unless we specifically handle it.
The STMMAC_TBS_EN bit is set when adding an etf qdisc, and the etf qdisc therefore would stop working when link was set down and then back up.
Fixes: ba39b344e924 ("net: ethernet: stmicro: stmmac: generate stmmac dma conf before open") Cc: stable@vger.kernel.org Signed-off-by: Esben Haabendal esben@geanix.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -3869,6 +3869,9 @@ static int __stmmac_open(struct net_devi priv->rx_copybreak = STMMAC_RX_COPYBREAK;
buf_sz = dma_conf->dma_buf_sz; + for (int i = 0; i < MTL_MAX_TX_QUEUES; i++) + if (priv->dma_conf.tx_queue[i].tbs & STMMAC_TBS_EN) + dma_conf->tx_queue[i].tbs = priv->dma_conf.tx_queue[i].tbs; memcpy(&priv->dma_conf, dma_conf, sizeof(*dma_conf));
stmmac_reset_queues_param(priv);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Helge Deller deller@gmx.de
commit 8b1d72395635af45410b66cc4c4ab37a12c4a831 upstream.
The current exception handler implementation, which assists when accessing user space memory, may exhibit random data corruption if the compiler decides to use a different register than the specified register %r29 (defined in ASM_EXCEPTIONTABLE_REG) for the error code. If the compiler choose another register, the fault handler will nevertheless store -EFAULT into %r29 and thus trash whatever this register is used for. Looking at the assembly I found that this happens sometimes in emulate_ldd().
To solve the issue, the easiest solution would be if it somehow is possible to tell the fault handler which register is used to hold the error code. Using %0 or %1 in the inline assembly is not posssible as it will show up as e.g. %r29 (with the "%r" prefix), which the GNU assembler can not convert to an integer.
This patch takes another, better and more flexible approach: We extend the __ex_table (which is out of the execution path) by one 32-word. In this word we tell the compiler to insert the assembler instruction "or %r0,%r0,%reg", where %reg references the register which the compiler choosed for the error return code. In case of an access failure, the fault handler finds the __ex_table entry and can examine the opcode. The used register is encoded in the lowest 5 bits, and the fault handler can then store -EFAULT into this register.
Since we extend the __ex_table to 3 words we can't use the BUILDTIME_TABLE_SORT config option any longer.
Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org # v6.0+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/parisc/Kconfig | 1 arch/parisc/include/asm/assembly.h | 1 arch/parisc/include/asm/extable.h | 64 ++++++++++++++++++++++++++++++++ arch/parisc/include/asm/special_insns.h | 6 ++- arch/parisc/include/asm/uaccess.h | 48 +++--------------------- arch/parisc/kernel/cache.c | 4 +- arch/parisc/kernel/unaligned.c | 44 +++++++++++----------- arch/parisc/mm/fault.c | 11 ++++- 8 files changed, 108 insertions(+), 71 deletions(-) create mode 100644 arch/parisc/include/asm/extable.h
--- a/arch/parisc/Kconfig +++ b/arch/parisc/Kconfig @@ -25,7 +25,6 @@ config PARISC select RTC_DRV_GENERIC select INIT_ALL_POSSIBLE select BUG - select BUILDTIME_TABLE_SORT select HAVE_KERNEL_UNCOMPRESSED select HAVE_PCI select HAVE_PERF_EVENTS --- a/arch/parisc/include/asm/assembly.h +++ b/arch/parisc/include/asm/assembly.h @@ -576,6 +576,7 @@ .section __ex_table,"aw" ! \ .align 4 ! \ .word (fault_addr - .), (except_addr - .) ! \ + or %r0,%r0,%r0 ! \ .previous
--- /dev/null +++ b/arch/parisc/include/asm/extable.h @@ -0,0 +1,64 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __PARISC_EXTABLE_H +#define __PARISC_EXTABLE_H + +#include <asm/ptrace.h> +#include <linux/compiler.h> + +/* + * The exception table consists of three addresses: + * + * - A relative address to the instruction that is allowed to fault. + * - A relative address at which the program should continue (fixup routine) + * - An asm statement which specifies which CPU register will + * receive -EFAULT when an exception happens if the lowest bit in + * the fixup address is set. + * + * Note: The register specified in the err_opcode instruction will be + * modified at runtime if a fault happens. Register %r0 will be ignored. + * + * Since relative addresses are used, 32bit values are sufficient even on + * 64bit kernel. + */ + +struct pt_regs; +int fixup_exception(struct pt_regs *regs); + +#define ARCH_HAS_RELATIVE_EXTABLE +struct exception_table_entry { + int insn; /* relative address of insn that is allowed to fault. */ + int fixup; /* relative address of fixup routine */ + int err_opcode; /* sample opcode with register which holds error code */ +}; + +#define ASM_EXCEPTIONTABLE_ENTRY( fault_addr, except_addr, opcode )\ + ".section __ex_table,"aw"\n" \ + ".align 4\n" \ + ".word (" #fault_addr " - .), (" #except_addr " - .)\n" \ + opcode "\n" \ + ".previous\n" + +/* + * ASM_EXCEPTIONTABLE_ENTRY_EFAULT() creates a special exception table entry + * (with lowest bit set) for which the fault handler in fixup_exception() will + * load -EFAULT on fault into the register specified by the err_opcode instruction, + * and zeroes the target register in case of a read fault in get_user(). + */ +#define ASM_EXCEPTIONTABLE_VAR(__err_var) \ + int __err_var = 0 +#define ASM_EXCEPTIONTABLE_ENTRY_EFAULT( fault_addr, except_addr, register )\ + ASM_EXCEPTIONTABLE_ENTRY( fault_addr, except_addr + 1, "or %%r0,%%r0," register) + +static inline void swap_ex_entry_fixup(struct exception_table_entry *a, + struct exception_table_entry *b, + struct exception_table_entry tmp, + int delta) +{ + a->fixup = b->fixup + delta; + b->fixup = tmp.fixup - delta; + a->err_opcode = b->err_opcode; + b->err_opcode = tmp.err_opcode; +} +#define swap_ex_entry_fixup swap_ex_entry_fixup + +#endif --- a/arch/parisc/include/asm/special_insns.h +++ b/arch/parisc/include/asm/special_insns.h @@ -8,7 +8,8 @@ "copy %%r0,%0\n" \ "8:\tlpa %%r0(%1),%0\n" \ "9:\n" \ - ASM_EXCEPTIONTABLE_ENTRY(8b, 9b) \ + ASM_EXCEPTIONTABLE_ENTRY(8b, 9b, \ + "or %%r0,%%r0,%%r0") \ : "=&r" (pa) \ : "r" (va) \ : "memory" \ @@ -22,7 +23,8 @@ "copy %%r0,%0\n" \ "8:\tlpa %%r0(%%sr3,%1),%0\n" \ "9:\n" \ - ASM_EXCEPTIONTABLE_ENTRY(8b, 9b) \ + ASM_EXCEPTIONTABLE_ENTRY(8b, 9b, \ + "or %%r0,%%r0,%%r0") \ : "=&r" (pa) \ : "r" (va) \ : "memory" \ --- a/arch/parisc/include/asm/uaccess.h +++ b/arch/parisc/include/asm/uaccess.h @@ -7,6 +7,7 @@ */ #include <asm/page.h> #include <asm/cache.h> +#include <asm/extable.h>
#include <linux/bug.h> #include <linux/string.h> @@ -26,37 +27,6 @@ #define STD_USER(sr, x, ptr) __put_user_asm(sr, "std", x, ptr) #endif
-/* - * The exception table contains two values: the first is the relative offset to - * the address of the instruction that is allowed to fault, and the second is - * the relative offset to the address of the fixup routine. Since relative - * addresses are used, 32bit values are sufficient even on 64bit kernel. - */ - -#define ARCH_HAS_RELATIVE_EXTABLE -struct exception_table_entry { - int insn; /* relative address of insn that is allowed to fault. */ - int fixup; /* relative address of fixup routine */ -}; - -#define ASM_EXCEPTIONTABLE_ENTRY( fault_addr, except_addr )\ - ".section __ex_table,"aw"\n" \ - ".align 4\n" \ - ".word (" #fault_addr " - .), (" #except_addr " - .)\n\t" \ - ".previous\n" - -/* - * ASM_EXCEPTIONTABLE_ENTRY_EFAULT() creates a special exception table entry - * (with lowest bit set) for which the fault handler in fixup_exception() will - * load -EFAULT into %r29 for a read or write fault, and zeroes the target - * register in case of a read fault in get_user(). - */ -#define ASM_EXCEPTIONTABLE_REG 29 -#define ASM_EXCEPTIONTABLE_VAR(__variable) \ - register long __variable __asm__ ("r29") = 0 -#define ASM_EXCEPTIONTABLE_ENTRY_EFAULT( fault_addr, except_addr )\ - ASM_EXCEPTIONTABLE_ENTRY( fault_addr, except_addr + 1) - #define __get_user_internal(sr, val, ptr) \ ({ \ ASM_EXCEPTIONTABLE_VAR(__gu_err); \ @@ -83,7 +53,7 @@ struct exception_table_entry { \ __asm__("1: " ldx " 0(%%sr%2,%3),%0\n" \ "9:\n" \ - ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 9b) \ + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 9b, "%1") \ : "=r"(__gu_val), "+r"(__gu_err) \ : "i"(sr), "r"(ptr)); \ \ @@ -115,8 +85,8 @@ struct exception_table_entry { "1: ldw 0(%%sr%2,%3),%0\n" \ "2: ldw 4(%%sr%2,%3),%R0\n" \ "9:\n" \ - ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 9b) \ - ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 9b) \ + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 9b, "%1") \ + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 9b, "%1") \ : "=&r"(__gu_tmp.l), "+r"(__gu_err) \ : "i"(sr), "r"(ptr)); \ \ @@ -174,7 +144,7 @@ struct exception_table_entry { __asm__ __volatile__ ( \ "1: " stx " %1,0(%%sr%2,%3)\n" \ "9:\n" \ - ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 9b) \ + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 9b, "%0") \ : "+r"(__pu_err) \ : "r"(x), "i"(sr), "r"(ptr))
@@ -186,15 +156,14 @@ struct exception_table_entry { "1: stw %1,0(%%sr%2,%3)\n" \ "2: stw %R1,4(%%sr%2,%3)\n" \ "9:\n" \ - ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 9b) \ - ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 9b) \ + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 9b, "%0") \ + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 9b, "%0") \ : "+r"(__pu_err) \ : "r"(__val), "i"(sr), "r"(ptr)); \ } while (0)
#endif /* !defined(CONFIG_64BIT) */
- /* * Complex access routines -- external declarations */ @@ -216,7 +185,4 @@ unsigned long __must_check raw_copy_from #define INLINE_COPY_TO_USER #define INLINE_COPY_FROM_USER
-struct pt_regs; -int fixup_exception(struct pt_regs *regs); - #endif /* __PARISC_UACCESS_H */ --- a/arch/parisc/kernel/cache.c +++ b/arch/parisc/kernel/cache.c @@ -850,7 +850,7 @@ SYSCALL_DEFINE3(cacheflush, unsigned lon #endif " fic,m %3(%4,%0)\n" "2: sync\n" - ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 2b) + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 2b, "%1") : "+r" (start), "+r" (error) : "r" (end), "r" (dcache_stride), "i" (SR_USER)); } @@ -865,7 +865,7 @@ SYSCALL_DEFINE3(cacheflush, unsigned lon #endif " fdc,m %3(%4,%0)\n" "2: sync\n" - ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 2b) + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 2b, "%1") : "+r" (start), "+r" (error) : "r" (end), "r" (icache_stride), "i" (SR_USER)); } --- a/arch/parisc/kernel/unaligned.c +++ b/arch/parisc/kernel/unaligned.c @@ -120,8 +120,8 @@ static int emulate_ldh(struct pt_regs *r "2: ldbs 1(%%sr1,%3), %0\n" " depw %2, 23, 24, %0\n" "3: \n" - ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 3b) - ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 3b) + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 3b, "%1") + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 3b, "%1") : "+r" (val), "+r" (ret), "=&r" (temp1) : "r" (saddr), "r" (regs->isr) );
@@ -152,8 +152,8 @@ static int emulate_ldw(struct pt_regs *r " mtctl %2,11\n" " vshd %0,%3,%0\n" "3: \n" - ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 3b) - ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 3b) + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 3b, "%1") + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 3b, "%1") : "+r" (val), "+r" (ret), "=&r" (temp1), "=&r" (temp2) : "r" (saddr), "r" (regs->isr) );
@@ -189,8 +189,8 @@ static int emulate_ldd(struct pt_regs *r " mtsar %%r19\n" " shrpd %0,%%r20,%%sar,%0\n" "3: \n" - ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 3b) - ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 3b) + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 3b, "%1") + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 3b, "%1") : "=r" (val), "+r" (ret) : "0" (val), "r" (saddr), "r" (regs->isr) : "r19", "r20" ); @@ -209,9 +209,9 @@ static int emulate_ldd(struct pt_regs *r " vshd %0,%R0,%0\n" " vshd %R0,%4,%R0\n" "4: \n" - ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 4b) - ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 4b) - ASM_EXCEPTIONTABLE_ENTRY_EFAULT(3b, 4b) + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 4b, "%1") + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 4b, "%1") + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(3b, 4b, "%1") : "+r" (val), "+r" (ret), "+r" (saddr), "=&r" (shift), "=&r" (temp1) : "r" (regs->isr) ); } @@ -244,8 +244,8 @@ static int emulate_sth(struct pt_regs *r "1: stb %1, 0(%%sr1, %3)\n" "2: stb %2, 1(%%sr1, %3)\n" "3: \n" - ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 3b) - ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 3b) + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 3b, "%0") + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 3b, "%0") : "+r" (ret), "=&r" (temp1) : "r" (val), "r" (regs->ior), "r" (regs->isr) );
@@ -285,8 +285,8 @@ static int emulate_stw(struct pt_regs *r " stw %%r20,0(%%sr1,%2)\n" " stw %%r21,4(%%sr1,%2)\n" "3: \n" - ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 3b) - ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 3b) + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 3b, "%0") + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 3b, "%0") : "+r" (ret) : "r" (val), "r" (regs->ior), "r" (regs->isr) : "r19", "r20", "r21", "r22", "r1" ); @@ -329,10 +329,10 @@ static int emulate_std(struct pt_regs *r "3: std %%r20,0(%%sr1,%2)\n" "4: std %%r21,8(%%sr1,%2)\n" "5: \n" - ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 5b) - ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 5b) - ASM_EXCEPTIONTABLE_ENTRY_EFAULT(3b, 5b) - ASM_EXCEPTIONTABLE_ENTRY_EFAULT(4b, 5b) + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 5b, "%0") + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 5b, "%0") + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(3b, 5b, "%0") + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(4b, 5b, "%0") : "+r" (ret) : "r" (val), "r" (regs->ior), "r" (regs->isr) : "r19", "r20", "r21", "r22", "r1" ); @@ -357,11 +357,11 @@ static int emulate_std(struct pt_regs *r "4: stw %%r1,4(%%sr1,%2)\n" "5: stw %R1,8(%%sr1,%2)\n" "6: \n" - ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 6b) - ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 6b) - ASM_EXCEPTIONTABLE_ENTRY_EFAULT(3b, 6b) - ASM_EXCEPTIONTABLE_ENTRY_EFAULT(4b, 6b) - ASM_EXCEPTIONTABLE_ENTRY_EFAULT(5b, 6b) + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 6b, "%0") + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 6b, "%0") + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(3b, 6b, "%0") + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(4b, 6b, "%0") + ASM_EXCEPTIONTABLE_ENTRY_EFAULT(5b, 6b, "%0") : "+r" (ret) : "r" (val), "r" (regs->ior), "r" (regs->isr) : "r19", "r20", "r21", "r1" ); --- a/arch/parisc/mm/fault.c +++ b/arch/parisc/mm/fault.c @@ -150,11 +150,16 @@ int fixup_exception(struct pt_regs *regs * Fix up get_user() and put_user(). * ASM_EXCEPTIONTABLE_ENTRY_EFAULT() sets the least-significant * bit in the relative address of the fixup routine to indicate - * that gr[ASM_EXCEPTIONTABLE_REG] should be loaded with - * -EFAULT to report a userspace access error. + * that the register encoded in the "or %r0,%r0,register" + * opcode should be loaded with -EFAULT to report a userspace + * access error. */ if (fix->fixup & 1) { - regs->gr[ASM_EXCEPTIONTABLE_REG] = -EFAULT; + int fault_error_reg = fix->err_opcode & 0x1f; + if (!WARN_ON(!fault_error_reg)) + regs->gr[fault_error_reg] = -EFAULT; + pr_debug("Unalignment fixup of register %d at %pS\n", + fault_error_reg, (void*)regs->iaoq[0]);
/* zero target register for get_user() */ if (parisc_acctyp(0, regs->iir) == VM_READ) {
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Helge Deller deller@gmx.de
commit 913b9d443a0180cf0de3548f1ab3149378998486 upstream.
When using hotplug and bringing up a 32-bit CPU, ask the firmware about the BTLB information to set up the static (block) TLB entries.
For that write access to the static btlb_info struct is needed, but since it is marked __ro_after_init the kernel segfaults with missing write permissions.
Fix the crash by dropping the __ro_after_init annotation.
Fixes: e5ef93d02d6c ("parisc: BTLB: Initialize BTLB tables at CPU startup") Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org # v6.6+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/parisc/kernel/cache.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/parisc/kernel/cache.c +++ b/arch/parisc/kernel/cache.c @@ -58,7 +58,7 @@ int pa_serialize_tlb_flushes __ro_after_
struct pdc_cache_info cache_info __ro_after_init; #ifndef CONFIG_PA20 -struct pdc_btlb_info btlb_info __ro_after_init; +struct pdc_btlb_info btlb_info; #endif
DEFINE_STATIC_KEY_TRUE(parisc_has_cache);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jan Beulich jbeulich@suse.com
commit 7b55984c96ffe9e236eb9c82a2196e0b1f84990d upstream.
Invoking the make_tx_response() / push_tx_responses() pair with no lock held would be acceptable only if all such invocations happened from the same context (NAPI instance or dealloc thread). Since this isn't the case, and since the interface "spec" also doesn't demand that multicast operations may only be performed with no in-flight transmits, MCAST_{ADD,DEL} processing also needs to acquire the response lock around the invocations.
To prevent similar mistakes going forward, "downgrade" the present functions to private helpers of just the two remaining ones using them directly, with no forward declarations anymore. This involves renaming what so far was make_tx_response(), for the new function of that name to serve the new (wrapper) purpose.
While there, - constify the txp parameters, - correct xenvif_idx_release()'s status parameter's type, - rename {,_}make_tx_response()'s status parameters for consistency with xenvif_idx_release()'s.
Fixes: 210c34dcd8d9 ("xen-netback: add support for multicast control") Cc: stable@vger.kernel.org Signed-off-by: Jan Beulich jbeulich@suse.com Reviewed-by: Paul Durrant paul@xen.org Link: https://lore.kernel.org/r/980c6c3d-e10e-4459-8565-e8fbde122f00@suse.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/xen-netback/netback.c | 84 ++++++++++++++++++-------------------- 1 file changed, 40 insertions(+), 44 deletions(-)
--- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -104,13 +104,12 @@ bool provides_xdp_headroom = true; module_param(provides_xdp_headroom, bool, 0644);
static void xenvif_idx_release(struct xenvif_queue *queue, u16 pending_idx, - u8 status); + s8 status);
static void make_tx_response(struct xenvif_queue *queue, - struct xen_netif_tx_request *txp, + const struct xen_netif_tx_request *txp, unsigned int extra_count, - s8 st); -static void push_tx_responses(struct xenvif_queue *queue); + s8 status);
static void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx);
@@ -208,13 +207,9 @@ static void xenvif_tx_err(struct xenvif_ unsigned int extra_count, RING_IDX end) { RING_IDX cons = queue->tx.req_cons; - unsigned long flags;
do { - spin_lock_irqsave(&queue->response_lock, flags); make_tx_response(queue, txp, extra_count, XEN_NETIF_RSP_ERROR); - push_tx_responses(queue); - spin_unlock_irqrestore(&queue->response_lock, flags); if (cons == end) break; RING_COPY_REQUEST(&queue->tx, cons++, txp); @@ -465,12 +460,7 @@ static void xenvif_get_requests(struct x for (shinfo->nr_frags = 0; nr_slots > 0 && shinfo->nr_frags < MAX_SKB_FRAGS; nr_slots--) { if (unlikely(!txp->size)) { - unsigned long flags; - - spin_lock_irqsave(&queue->response_lock, flags); make_tx_response(queue, txp, 0, XEN_NETIF_RSP_OKAY); - push_tx_responses(queue); - spin_unlock_irqrestore(&queue->response_lock, flags); ++txp; continue; } @@ -496,14 +486,8 @@ static void xenvif_get_requests(struct x
for (shinfo->nr_frags = 0; shinfo->nr_frags < nr_slots; ++txp) { if (unlikely(!txp->size)) { - unsigned long flags; - - spin_lock_irqsave(&queue->response_lock, flags); make_tx_response(queue, txp, 0, XEN_NETIF_RSP_OKAY); - push_tx_responses(queue); - spin_unlock_irqrestore(&queue->response_lock, - flags); continue; }
@@ -995,7 +979,6 @@ static void xenvif_tx_build_gops(struct (ret == 0) ? XEN_NETIF_RSP_OKAY : XEN_NETIF_RSP_ERROR); - push_tx_responses(queue); continue; }
@@ -1007,7 +990,6 @@ static void xenvif_tx_build_gops(struct
make_tx_response(queue, &txreq, extra_count, XEN_NETIF_RSP_OKAY); - push_tx_responses(queue); continue; }
@@ -1433,8 +1415,35 @@ int xenvif_tx_action(struct xenvif_queue return work_done; }
+static void _make_tx_response(struct xenvif_queue *queue, + const struct xen_netif_tx_request *txp, + unsigned int extra_count, + s8 status) +{ + RING_IDX i = queue->tx.rsp_prod_pvt; + struct xen_netif_tx_response *resp; + + resp = RING_GET_RESPONSE(&queue->tx, i); + resp->id = txp->id; + resp->status = status; + + while (extra_count-- != 0) + RING_GET_RESPONSE(&queue->tx, ++i)->status = XEN_NETIF_RSP_NULL; + + queue->tx.rsp_prod_pvt = ++i; +} + +static void push_tx_responses(struct xenvif_queue *queue) +{ + int notify; + + RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&queue->tx, notify); + if (notify) + notify_remote_via_irq(queue->tx_irq); +} + static void xenvif_idx_release(struct xenvif_queue *queue, u16 pending_idx, - u8 status) + s8 status) { struct pending_tx_info *pending_tx_info; pending_ring_idx_t index; @@ -1444,8 +1453,8 @@ static void xenvif_idx_release(struct xe
spin_lock_irqsave(&queue->response_lock, flags);
- make_tx_response(queue, &pending_tx_info->req, - pending_tx_info->extra_count, status); + _make_tx_response(queue, &pending_tx_info->req, + pending_tx_info->extra_count, status);
/* Release the pending index before pusing the Tx response so * its available before a new Tx request is pushed by the @@ -1459,32 +1468,19 @@ static void xenvif_idx_release(struct xe spin_unlock_irqrestore(&queue->response_lock, flags); }
- static void make_tx_response(struct xenvif_queue *queue, - struct xen_netif_tx_request *txp, + const struct xen_netif_tx_request *txp, unsigned int extra_count, - s8 st) + s8 status) { - RING_IDX i = queue->tx.rsp_prod_pvt; - struct xen_netif_tx_response *resp; - - resp = RING_GET_RESPONSE(&queue->tx, i); - resp->id = txp->id; - resp->status = st; - - while (extra_count-- != 0) - RING_GET_RESPONSE(&queue->tx, ++i)->status = XEN_NETIF_RSP_NULL; + unsigned long flags;
- queue->tx.rsp_prod_pvt = ++i; -} + spin_lock_irqsave(&queue->response_lock, flags);
-static void push_tx_responses(struct xenvif_queue *queue) -{ - int notify; + _make_tx_response(queue, txp, extra_count, status); + push_tx_responses(queue);
- RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&queue->tx, notify); - if (notify) - notify_remote_via_irq(queue->tx_irq); + spin_unlock_irqrestore(&queue->response_lock, flags); }
static void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nathan Chancellor nathan@kernel.org
commit 846cfbeed09b45d985079a9173cf390cc053715b upstream.
The kernel builds with -fno-PIE, so commit 883354afbc10 ("um: link vmlinux with -no-pie") added the compiler linker flag '-no-pie' via cc-option because '-no-pie' was only supported in GCC 6.1.0 and newer.
While this works for GCC, this does not work for clang because cc-option uses '-c', which stops the pipeline right before linking, so '-no-pie' is unconsumed and clang warns, causing cc-option to fail just as it would if the option was entirely unsupported:
$ clang -Werror -no-pie -c -o /dev/null -x c /dev/null clang-16: error: argument unused during compilation: '-no-pie' [-Werror,-Wunused-command-line-argument]
A recent version of clang exposes this because it generates a relocation under '-mcmodel=large' that is not supported in PIE mode:
/usr/sbin/ld: init/main.o: relocation R_X86_64_32 against symbol `saved_command_line' can not be used when making a PIE object; recompile with -fPIE /usr/sbin/ld: failed to set dynamic section sizes: bad value clang: error: linker command failed with exit code 1 (use -v to see invocation)
Remove the cc-option check altogether. It is wasteful to invoke the compiler to check for '-no-pie' because only one supported compiler version does not support it, GCC 5.x (as it is supported with the minimum version of clang and GCC 6.1.0+). Use a combination of the gcc-min-version macro and CONFIG_CC_IS_CLANG to unconditionally add '-no-pie' with CONFIG_LD_SCRIPT_DYN=y, so that it is enabled with all compilers that support this. Furthermore, using gcc-min-version can help turn this back into
LINK-$(CONFIG_LD_SCRIPT_DYN) += -no-pie
when the minimum version of GCC is bumped past 6.1.0.
Cc: stable@vger.kernel.org Closes: https://github.com/ClangBuiltLinux/linux/issues/1982 Signed-off-by: Nathan Chancellor nathan@kernel.org Signed-off-by: Masahiro Yamada masahiroy@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/um/Makefile | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/arch/um/Makefile +++ b/arch/um/Makefile @@ -115,7 +115,9 @@ archprepare: $(Q)$(MAKE) $(build)=$(HOST_DIR)/um include/generated/user_constants.h
LINK-$(CONFIG_LD_SCRIPT_STATIC) += -static -LINK-$(CONFIG_LD_SCRIPT_DYN) += $(call cc-option, -no-pie) +ifdef CONFIG_LD_SCRIPT_DYN +LINK-$(call gcc-min-version, 60100)$(CONFIG_CC_IS_CLANG) += -no-pie +endif LINK-$(CONFIG_LD_SCRIPT_DYN_RPATH) += -Wl,-rpath,/lib
CFLAGS_NO_HARDENING := $(call cc-option, -fno-PIC,) $(call cc-option, -fno-pic,) \
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nathan Chancellor nathan@kernel.org
commit 397586506c3da005b9333ce5947ad01e8018a3be upstream.
After the linked LLVM change, building ARCH=um defconfig results in a segmentation fault in modpost. Prior to commit a23e7584ecf3 ("modpost: unify 'sym' and 'to' in default_mismatch_handler()"), there was a warning:
WARNING: modpost: vmlinux.o(__ex_table+0x88): Section mismatch in reference to the .ltext:(unknown) WARNING: modpost: The relocation at __ex_table+0x88 references section ".ltext" which is not in the list of authorized sections. If you're adding a new section and/or if this reference is valid, add ".ltext" to the list of authorized sections to jump to on fault. This can be achieved by adding ".ltext" to OTHER_TEXT_SECTIONS in scripts/mod/modpost.c.
The linked LLVM change moves global objects to the '.ltext' (and '.ltext.*' with '-ffunction-sections') sections with '-mcmodel=large', which ARCH=um uses. These sections should be handled just as '.text' and '.text.*' are, so add them to TEXT_SECTIONS.
Cc: stable@vger.kernel.org Closes: https://github.com/ClangBuiltLinux/linux/issues/1981 Link: https://github.com/llvm/llvm-project/commit/4bf8a688956a759b7b6b8d94f42d25c1... Signed-off-by: Nathan Chancellor nathan@kernel.org Signed-off-by: Masahiro Yamada masahiroy@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- scripts/mod/modpost.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/scripts/mod/modpost.c +++ b/scripts/mod/modpost.c @@ -807,7 +807,8 @@ static void check_section(const char *mo
#define DATA_SECTIONS ".data", ".data.rel" #define TEXT_SECTIONS ".text", ".text.*", ".sched.text", \ - ".kprobes.text", ".cpuidle.text", ".noinstr.text" + ".kprobes.text", ".cpuidle.text", ".noinstr.text", \ + ".ltext", ".ltext.*" #define OTHER_TEXT_SECTIONS ".ref.text", ".head.text", ".spinlock.text", \ ".fixup", ".entry.text", ".exception.text", \ ".coldtext", ".softirqentry.text"
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Edson Juliano Drosdeck edson.drosdeck@gmail.com
commit c7de2d9bb68a5fc71c25ff96705a80a76c8436eb upstream.
Vaio VJFE-ADL is equipped with ALC269VC, and it needs ALC298_FIXUP_SPK_VOLUME quirk to make its headset mic work.
Signed-off-by: Edson Juliano Drosdeck edson.drosdeck@gmail.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240201122114.30080-1-edson.drosdeck@gmail.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -10312,6 +10312,7 @@ static const struct snd_pci_quirk alc269 SND_PCI_QUIRK(0x1d72, 0x1945, "Redmi G", ALC256_FIXUP_ASUS_HEADSET_MIC), SND_PCI_QUIRK(0x1d72, 0x1947, "RedmiBook Air", ALC255_FIXUP_XIAOMI_HEADSET_MIC), SND_PCI_QUIRK(0x2782, 0x0232, "CHUWI CoreBook XPro", ALC269VB_FIXUP_CHUWI_COREBOOK_XPRO), + SND_PCI_QUIRK(0x2782, 0x1707, "Vaio VJFE-ADL", ALC298_FIXUP_SPK_VOLUME), SND_PCI_QUIRK(0x8086, 0x2074, "Intel NUC 8", ALC233_FIXUP_INTEL_NUC8_DMIC), SND_PCI_QUIRK(0x8086, 0x2080, "Intel NUC 8 Rugged", ALC256_FIXUP_INTEL_NUC8_RUGGED), SND_PCI_QUIRK(0x8086, 0x2081, "Intel NUC 10", ALC256_FIXUP_INTEL_NUC10),
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kailang Yang kailang@realtek.com
commit fcfc9f711d1e2fc7876ac12b1b16c509404b9625 upstream.
SSID 0x0c0d platform. It can't mute speaker when HP plugged. This patch add quirk to fill speaker pin verbtable. And disable speaker passthrough.
Signed-off-by: Kailang Yang kailang@realtek.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/38b82976a875451d833d514cee34ff6a@realtek.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -439,6 +439,10 @@ static void alc_fill_eapd_coef(struct hd alc_update_coef_idx(codec, 0x67, 0xf000, 0x3000); fallthrough; case 0x10ec0215: + case 0x10ec0285: + case 0x10ec0289: + alc_update_coef_idx(codec, 0x36, 1<<13, 0); + fallthrough; case 0x10ec0230: case 0x10ec0233: case 0x10ec0235: @@ -452,9 +456,7 @@ static void alc_fill_eapd_coef(struct hd case 0x10ec0283: case 0x10ec0286: case 0x10ec0288: - case 0x10ec0285: case 0x10ec0298: - case 0x10ec0289: case 0x10ec0300: alc_update_coef_idx(codec, 0x10, 1<<9, 0); break; @@ -9722,6 +9724,7 @@ static const struct snd_pci_quirk alc269 SND_PCI_QUIRK(0x1028, 0x0b71, "Dell Inspiron 16 Plus 7620", ALC295_FIXUP_DELL_INSPIRON_TOP_SPEAKERS), SND_PCI_QUIRK(0x1028, 0x0beb, "Dell XPS 15 9530 (2023)", ALC289_FIXUP_DELL_CS35L41_SPI_2), SND_PCI_QUIRK(0x1028, 0x0c03, "Dell Precision 5340", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE), + SND_PCI_QUIRK(0x1028, 0x0c0d, "Dell Oasis", ALC289_FIXUP_RTK_AMP_DUAL_SPK), SND_PCI_QUIRK(0x1028, 0x0c19, "Dell Precision 3340", ALC236_FIXUP_DELL_DUAL_CODECS), SND_PCI_QUIRK(0x1028, 0x0c1a, "Dell Precision 3340", ALC236_FIXUP_DELL_DUAL_CODECS), SND_PCI_QUIRK(0x1028, 0x0c1b, "Dell Precision 3440", ALC236_FIXUP_DELL_DUAL_CODECS),
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
commit 086df711d9b886194481b4fbe525eb43e9ae7403 upstream.
WCD938x sound codec driver ignores return status of getting regulators and returns EINVAL instead of EPROBE_DEFER. If regulator provider probes after the codec, system is left without probed audio:
wcd938x_codec audio-codec: wcd938x_probe: Fail to obtain platform data wcd938x_codec: probe of audio-codec failed with error -22
Fixes: 16572522aece ("ASoC: codecs: wcd938x-sdw: add SoundWire driver") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Link: https://msgid.link/r/20240117151208.1219755-1-krzysztof.kozlowski@linaro.org Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/codecs/wcd938x.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/soc/codecs/wcd938x.c +++ b/sound/soc/codecs/wcd938x.c @@ -3589,7 +3589,7 @@ static int wcd938x_probe(struct platform ret = wcd938x_populate_dt_data(wcd938x, dev); if (ret) { dev_err(dev, "%s: Fail to obtain platform data\n", __func__); - return -EINVAL; + return ret; }
ret = wcd938x_add_slave_components(wcd938x, dev, &match);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vitaly Rodionov vitalyr@opensource.cirrus.com
commit a2ed0a44d637ef9deca595054c206da7d6cbdcbc upstream.
Customer has reported an issue with specific desktop platform where two CS42L42 codecs are connected to CS8409 HDA bridge. If "Master Volume Control" is created then on Ubuntu OS UCM left/right balance slider in UI audio settings has no effect. This patch will fix this issue for a target paltform.
Fixes: 20e507724113 ("ALSA: hda/cs8409: Add support for dolphin") Signed-off-by: Vitaly Rodionov vitalyr@opensource.cirrus.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240122184710.5802-1-vitalyr@opensource.cirrus.co... Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_cs8409.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/pci/hda/patch_cs8409.c +++ b/sound/pci/hda/patch_cs8409.c @@ -1371,6 +1371,7 @@ void dolphin_fixups(struct hda_codec *co spec->scodecs[CS8409_CODEC1] = &dolphin_cs42l42_1; spec->scodecs[CS8409_CODEC1]->codec = codec; spec->num_scodecs = 2; + spec->gen.suppress_vmaster = 1;
codec->patch_ops = cs8409_dolphin_patch_ops;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Chi andy.chi@canonical.com
commit 1513664f340289cf10402753110f3cff12a738aa upstream.
The HP ZBook Power using ALC236 codec which using 0x02 to control mute LED and 0x01 to control micmute LED. Therefore, add a quirk to make it works.
Signed-off-by: Andy Chi andy.chi@canonical.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240122074826.1020964-1-andy.chi@canonical.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -9951,6 +9951,8 @@ static const struct snd_pci_quirk alc269 SND_PCI_QUIRK(0x103c, 0x8c72, "HP EliteBook 865 G11", ALC287_FIXUP_CS35L41_I2C_2_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x8c96, "HP", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF), SND_PCI_QUIRK(0x103c, 0x8c97, "HP ZBook", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF), + SND_PCI_QUIRK(0x103c, 0x8ca1, "HP ZBook Power", ALC236_FIXUP_HP_GPIO_LED), + SND_PCI_QUIRK(0x103c, 0x8ca2, "HP ZBook Power", ALC236_FIXUP_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x8ca4, "HP ZBook Fury", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x8ca7, "HP ZBook Fury", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x8cf5, "HP ZBook Studio 16", ALC245_FIXUP_CS35L41_SPI_4_HP_GPIO_LED),
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Carlos Llamas cmllamas@google.com
commit 97830f3c3088638ff90b20dfba2eb4d487bf14d7 upstream.
In (e)poll mode, threads often depend on I/O events to determine when data is ready for consumption. Within binder, a thread may initiate a command via BINDER_WRITE_READ without a read buffer and then make use of epoll_wait() or similar to consume any responses afterwards.
It is then crucial that epoll threads are signaled via wakeup when they queue their own work. Otherwise, they risk waiting indefinitely for an event leaving their work unhandled. What is worse, subsequent commands won't trigger a wakeup either as the thread has pending work.
Fixes: 457b9a6f09f0 ("Staging: android: add binder driver") Cc: Arve Hjønnevåg arve@android.com Cc: Martijn Coenen maco@android.com Cc: Alice Ryhl aliceryhl@google.com Cc: Steven Moreland smoreland@google.com Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Carlos Llamas cmllamas@google.com Link: https://lore.kernel.org/r/20240131215347.1808751-1-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/android/binder.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -478,6 +478,16 @@ binder_enqueue_thread_work_ilocked(struc { WARN_ON(!list_empty(&thread->waiting_thread_node)); binder_enqueue_work_ilocked(work, &thread->todo); + + /* (e)poll-based threads require an explicit wakeup signal when + * queuing their own work; they rely on these events to consume + * messages without I/O block. Without it, threads risk waiting + * indefinitely without handling the work. + */ + if (thread->looper & BINDER_LOOPER_STATE_POLL && + thread->pid == current->pid && !thread->process_todo) + wake_up_interruptible_sync(&thread->wait); + thread->process_todo = true; }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ekansh Gupta quic_ekangupt@quicinc.com
commit a4e61de63e34860c36a71d1a364edba16fb6203b upstream.
In remoteproc shutdown sequence, rpmsg_remove will get called which would depopulate all the child nodes that have been created during rpmsg_probe. This would result in cb_remove call for all the context banks for the remoteproc. In cb_remove function, session 0 is getting skipped which is not correct as session 0 will never become available again. Add changes to mark session 0 also as invalid.
Fixes: f6f9279f2bf0 ("misc: fastrpc: Add Qualcomm fastrpc basic driver model") Cc: stable stable@kernel.org Signed-off-by: Ekansh Gupta quic_ekangupt@quicinc.com Link: https://lore.kernel.org/r/20240108114833.20480-1-quic_ekangupt@quicinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/misc/fastrpc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/misc/fastrpc.c +++ b/drivers/misc/fastrpc.c @@ -2191,7 +2191,7 @@ static int fastrpc_cb_remove(struct plat int i;
spin_lock_irqsave(&cctx->lock, flags); - for (i = 1; i < FASTRPC_MAX_SESSIONS; i++) { + for (i = 0; i < FASTRPC_MAX_SESSIONS; i++) { if (cctx->session[i].sid == sess->sid) { cctx->session[i].valid = false; cctx->sesscount--;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baokun Li libaokun1@huawei.com
commit 55583e899a5357308274601364741a83e78d6ac4 upstream.
In ext4_move_extents(), moved_len is only updated when all moves are successfully executed, and only discards orig_inode and donor_inode preallocations when moved_len is not zero. When the loop fails to exit after successfully moving some extents, moved_len is not updated and remains at 0, so it does not discard the preallocations.
If the moved extents overlap with the preallocated extents, the overlapped extents are freed twice in ext4_mb_release_inode_pa() and ext4_process_freed_data() (as described in commit 94d7c16cbbbd ("ext4: Fix double-free of blocks with EXT4_IOC_MOVE_EXT")), and bb_free is incremented twice. Hence when trim is executed, a zero-division bug is triggered in mb_update_avg_fragment_size() because bb_free is not zero and bb_fragments is zero.
Therefore, update move_len after each extent move to avoid the issue.
Reported-by: Wei Chen harperchen1110@gmail.com Reported-by: xingwei lee xrivendell7@gmail.com Closes: https://lore.kernel.org/r/CAO4mrferzqBUnCag8R3m2zf897ts9UEuhjFQGPtODT92rYyR2... Fixes: fcf6b1b729bc ("ext4: refactor ext4_move_extents code base") CC: stable@vger.kernel.org # 3.18 Signed-off-by: Baokun Li libaokun1@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20240104142040.2835097-2-libaokun1@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/move_extent.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
--- a/fs/ext4/move_extent.c +++ b/fs/ext4/move_extent.c @@ -618,6 +618,7 @@ ext4_move_extents(struct file *o_filp, s goto out; o_end = o_start + len;
+ *moved_len = 0; while (o_start < o_end) { struct ext4_extent *ex; ext4_lblk_t cur_blk, next_blk; @@ -672,7 +673,7 @@ ext4_move_extents(struct file *o_filp, s */ ext4_double_up_write_data_sem(orig_inode, donor_inode); /* Swap original branches with new branches */ - move_extent_per_page(o_filp, donor_inode, + *moved_len += move_extent_per_page(o_filp, donor_inode, orig_page_index, donor_page_index, offset_in_page, cur_len, unwritten, &ret); @@ -682,9 +683,6 @@ ext4_move_extents(struct file *o_filp, s o_start += cur_len; d_start += cur_len; } - *moved_len = o_start - orig_blk; - if (*moved_len > len) - *moved_len = len;
out: if (*moved_len) {
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baokun Li libaokun1@huawei.com
commit 2331fd4a49864e1571b4f50aa3aa1536ed6220d0 upstream.
After updating bb_free in mb_free_blocks, it is possible to return without updating bb_fragments because the block being freed is found to have already been freed, which leads to inconsistency between bb_free and bb_fragments.
Since the group may be unlocked in ext4_grp_locked_error(), this can lead to problems such as dividing by zero when calculating the average fragment length. Hence move the update of bb_free to after the block double-free check guarantees that the corresponding statistics are updated only after the core block bitmap is modified.
Fixes: eabe0444df90 ("ext4: speed-up releasing blocks on commit") CC: stable@vger.kernel.org # 3.10 Suggested-by: Jan Kara jack@suse.cz Signed-off-by: Baokun Li libaokun1@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20240104142040.2835097-5-libaokun1@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/mballoc.c | 39 +++++++++++++++++++++------------------ 1 file changed, 21 insertions(+), 18 deletions(-)
--- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -1910,11 +1910,6 @@ static void mb_free_blocks(struct inode mb_check_buddy(e4b); mb_free_blocks_double(inode, e4b, first, count);
- this_cpu_inc(discard_pa_seq); - e4b->bd_info->bb_free += count; - if (first < e4b->bd_info->bb_first_free) - e4b->bd_info->bb_first_free = first; - /* access memory sequentially: check left neighbour, * clear range and then check right neighbour */ @@ -1928,23 +1923,31 @@ static void mb_free_blocks(struct inode struct ext4_sb_info *sbi = EXT4_SB(sb); ext4_fsblk_t blocknr;
+ /* + * Fastcommit replay can free already freed blocks which + * corrupts allocation info. Regenerate it. + */ + if (sbi->s_mount_state & EXT4_FC_REPLAY) { + mb_regenerate_buddy(e4b); + goto check; + } + blocknr = ext4_group_first_block_no(sb, e4b->bd_group); blocknr += EXT4_C2B(sbi, block); - if (!(sbi->s_mount_state & EXT4_FC_REPLAY)) { - ext4_grp_locked_error(sb, e4b->bd_group, - inode ? inode->i_ino : 0, - blocknr, - "freeing already freed block (bit %u); block bitmap corrupt.", - block); - ext4_mark_group_bitmap_corrupted( - sb, e4b->bd_group, + ext4_grp_locked_error(sb, e4b->bd_group, + inode ? inode->i_ino : 0, blocknr, + "freeing already freed block (bit %u); block bitmap corrupt.", + block); + ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group, EXT4_GROUP_INFO_BBITMAP_CORRUPT); - } else { - mb_regenerate_buddy(e4b); - } - goto done; + return; }
+ this_cpu_inc(discard_pa_seq); + e4b->bd_info->bb_free += count; + if (first < e4b->bd_info->bb_first_free) + e4b->bd_info->bb_first_free = first; + /* let's maintain fragments counter */ if (left_is_free && right_is_free) e4b->bd_info->bb_fragments--; @@ -1969,9 +1972,9 @@ static void mb_free_blocks(struct inode if (first <= last) mb_buddy_mark_free(e4b, first >> 1, last >> 1);
-done: mb_set_largest_free_order(sb, e4b->bd_info); mb_update_avg_fragment_size(sb, e4b->bd_info); +check: mb_check_buddy(e4b); }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Bristot de Oliveira bristot@kernel.org
commit 1389358bb008e7625942846e9f03554319b7fecc upstream.
Currently, the timerlat's hrtimer is initialized at the first read of timerlat_fd, and destroyed at close(). It works, but it causes an error if the user program open() and close() the file without reading.
Here's an example:
# echo NO_OSNOISE_WORKLOAD > /sys/kernel/debug/tracing/osnoise/options # echo timerlat > /sys/kernel/debug/tracing/current_tracer
# cat <<EOF > ./timerlat_load.py # !/usr/bin/env python3
timerlat_fd = open("/sys/kernel/tracing/osnoise/per_cpu/cpu0/timerlat_fd", 'r') timerlat_fd.close(); EOF
# ./taskset -c 0 ./timerlat_load.py <BOOM>
BUG: kernel NULL pointer dereference, address: 0000000000000010 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 2673 Comm: python3 Not tainted 6.6.13-200.fc39.x86_64 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-1.fc39 04/01/2014 RIP: 0010:hrtimer_active+0xd/0x50 Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 48 8b 57 30 <8b> 42 10 a8 01 74 09 f3 90 8b 42 10 a8 01 75 f7 80 7f 38 00 75 1d RSP: 0018:ffffb031009b7e10 EFLAGS: 00010286 RAX: 000000000002db00 RBX: ffff9118f786db08 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9117a0e64400 RDI: ffff9118f786db08 RBP: ffff9118f786db80 R08: ffff9117a0ddd420 R09: ffff9117804d4f70 R10: 0000000000000000 R11: 0000000000000000 R12: ffff9118f786db08 R13: ffff91178fdd5e20 R14: ffff9117840978c0 R15: 0000000000000000 FS: 00007f2ffbab1740(0000) GS:ffff9118f7840000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000010 CR3: 00000001b402e000 CR4: 0000000000750ee0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x23/0x70 ? page_fault_oops+0x171/0x4e0 ? srso_alias_return_thunk+0x5/0x7f ? avc_has_extended_perms+0x237/0x520 ? exc_page_fault+0x7f/0x180 ? asm_exc_page_fault+0x26/0x30 ? hrtimer_active+0xd/0x50 hrtimer_cancel+0x15/0x40 timerlat_fd_release+0x48/0xe0 __fput+0xf5/0x290 __x64_sys_close+0x3d/0x80 do_syscall_64+0x60/0x90 ? srso_alias_return_thunk+0x5/0x7f ? __x64_sys_ioctl+0x72/0xd0 ? srso_alias_return_thunk+0x5/0x7f ? syscall_exit_to_user_mode+0x2b/0x40 ? srso_alias_return_thunk+0x5/0x7f ? do_syscall_64+0x6c/0x90 ? srso_alias_return_thunk+0x5/0x7f ? exit_to_user_mode_prepare+0x142/0x1f0 ? srso_alias_return_thunk+0x5/0x7f ? syscall_exit_to_user_mode+0x2b/0x40 ? srso_alias_return_thunk+0x5/0x7f ? do_syscall_64+0x6c/0x90 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7f2ffb321594 Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d d5 cd 0d 00 00 74 13 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3c c3 0f 1f 00 55 48 89 e5 48 83 ec 10 89 7d RSP: 002b:00007ffe8d8eef18 EFLAGS: 00000202 ORIG_RAX: 0000000000000003 RAX: ffffffffffffffda RBX: 00007f2ffba4e668 RCX: 00007f2ffb321594 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 00007ffe8d8eef40 R08: 0000000000000000 R09: 0000000000000000 R10: 55c926e3167eae79 R11: 0000000000000202 R12: 0000000000000003 R13: 00007ffe8d8ef030 R14: 0000000000000000 R15: 00007f2ffba4e668 </TASK> CR2: 0000000000000010 ---[ end trace 0000000000000000 ]---
Move hrtimer_init to timerlat_fd open() to avoid this problem.
Link: https://lore.kernel.org/linux-trace-kernel/7324dd3fc0035658c99b825204a660493...
Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: stable@vger.kernel.org Fixes: e88ed227f639 ("tracing/timerlat: Add user-space interface") Signed-off-by: Daniel Bristot de Oliveira bristot@kernel.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace_osnoise.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/kernel/trace/trace_osnoise.c +++ b/kernel/trace/trace_osnoise.c @@ -2444,6 +2444,9 @@ static int timerlat_fd_open(struct inode tlat = this_cpu_tmr_var(); tlat->count = 0;
+ hrtimer_init(&tlat->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED_HARD); + tlat->timer.function = timerlat_irq; + migrate_enable(); return 0; }; @@ -2526,9 +2529,6 @@ timerlat_fd_read(struct file *file, char tlat->tracing_thread = false; tlat->kthread = current;
- hrtimer_init(&tlat->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED_HARD); - tlat->timer.function = timerlat_irq; - /* Annotate now to drift new period */ tlat->abs_period = hrtimer_cb_get_time(&tlat->timer);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steven Rostedt (Google) rostedt@goodmis.org
commit 44dc5c41b5b1267d4dd037d26afc0c4d3a568acb upstream.
While looking at improving the saved_cmdlines cache I found a huge amount of wasted memory that should be used for the cmdlines.
The tracing data saves pids during the trace. At sched switch, if a trace occurred, it will save the comm of the task that did the trace. This is saved in a "cache" that maps pids to comms and exposed to user space via the /sys/kernel/tracing/saved_cmdlines file. Currently it only caches by default 128 comms.
The structure that uses this creates an array to store the pids using PID_MAX_DEFAULT (which is usually set to 32768). This causes the structure to be of the size of 131104 bytes on 64 bit machines.
In hex: 131104 = 0x20020, and since the kernel allocates generic memory in powers of two, the kernel would allocate 0x40000 or 262144 bytes to store this structure. That leaves 131040 bytes of wasted space.
Worse, the structure points to an allocated array to store the comm names, which is 16 bytes times the amount of names to save (currently 128), which is 2048 bytes. Instead of allocating a separate array, make the structure end with a variable length string and use the extra space for that.
This is similar to a recommendation that Linus had made about eventfs_inode names:
https://lore.kernel.org/all/20240130190355.11486-5-torvalds@linux-foundation...
Instead of allocating a separate string array to hold the saved comms, have the structure end with: char saved_cmdlines[]; and round up to the next power of two over sizeof(struct saved_cmdline_buffers) + num_cmdlines * TASK_COMM_LEN It will use this extra space for the saved_cmdline portion.
Now, instead of saving only 128 comms by default, by using this wasted space at the end of the structure it can save over 8000 comms and even saves space by removing the need for allocating the other array.
Link: https://lore.kernel.org/linux-trace-kernel/20240209063622.1f7b6d5f@rorschach...
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Vincent Donnefort vdonnefort@google.com Cc: Sven Schnelle svens@linux.ibm.com Cc: Mete Durlu meted@linux.ibm.com Fixes: 939c7a4f04fcd ("tracing: Introduce saved_cmdlines_size file") Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace.c | 75 +++++++++++++++++++++++++-------------------------- 1 file changed, 37 insertions(+), 38 deletions(-)
--- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -2312,7 +2312,7 @@ struct saved_cmdlines_buffer { unsigned *map_cmdline_to_pid; unsigned cmdline_num; int cmdline_idx; - char *saved_cmdlines; + char saved_cmdlines[]; }; static struct saved_cmdlines_buffer *savedcmd;
@@ -2326,47 +2326,58 @@ static inline void set_cmdline(int idx, strncpy(get_saved_cmdlines(idx), cmdline, TASK_COMM_LEN); }
-static int allocate_cmdlines_buffer(unsigned int val, - struct saved_cmdlines_buffer *s) +static void free_saved_cmdlines_buffer(struct saved_cmdlines_buffer *s) { + int order = get_order(sizeof(*s) + s->cmdline_num * TASK_COMM_LEN); + + kfree(s->map_cmdline_to_pid); + free_pages((unsigned long)s, order); +} + +static struct saved_cmdlines_buffer *allocate_cmdlines_buffer(unsigned int val) +{ + struct saved_cmdlines_buffer *s; + struct page *page; + int orig_size, size; + int order; + + /* Figure out how much is needed to hold the given number of cmdlines */ + orig_size = sizeof(*s) + val * TASK_COMM_LEN; + order = get_order(orig_size); + size = 1 << (order + PAGE_SHIFT); + page = alloc_pages(GFP_KERNEL, order); + if (!page) + return NULL; + + s = page_address(page); + memset(s, 0, sizeof(*s)); + + /* Round up to actual allocation */ + val = (size - sizeof(*s)) / TASK_COMM_LEN; + s->cmdline_num = val; + s->map_cmdline_to_pid = kmalloc_array(val, sizeof(*s->map_cmdline_to_pid), GFP_KERNEL); - if (!s->map_cmdline_to_pid) - return -ENOMEM; - - s->saved_cmdlines = kmalloc_array(TASK_COMM_LEN, val, GFP_KERNEL); - if (!s->saved_cmdlines) { - kfree(s->map_cmdline_to_pid); - return -ENOMEM; + if (!s->map_cmdline_to_pid) { + free_saved_cmdlines_buffer(s); + return NULL; }
s->cmdline_idx = 0; - s->cmdline_num = val; memset(&s->map_pid_to_cmdline, NO_CMDLINE_MAP, sizeof(s->map_pid_to_cmdline)); memset(s->map_cmdline_to_pid, NO_CMDLINE_MAP, val * sizeof(*s->map_cmdline_to_pid));
- return 0; + return s; }
static int trace_create_savedcmd(void) { - int ret; - - savedcmd = kmalloc(sizeof(*savedcmd), GFP_KERNEL); - if (!savedcmd) - return -ENOMEM; + savedcmd = allocate_cmdlines_buffer(SAVED_CMDLINES_DEFAULT);
- ret = allocate_cmdlines_buffer(SAVED_CMDLINES_DEFAULT, savedcmd); - if (ret < 0) { - kfree(savedcmd); - savedcmd = NULL; - return -ENOMEM; - } - - return 0; + return savedcmd ? 0 : -ENOMEM; }
int is_tracing_stopped(void) @@ -6048,26 +6059,14 @@ tracing_saved_cmdlines_size_read(struct return simple_read_from_buffer(ubuf, cnt, ppos, buf, r); }
-static void free_saved_cmdlines_buffer(struct saved_cmdlines_buffer *s) -{ - kfree(s->saved_cmdlines); - kfree(s->map_cmdline_to_pid); - kfree(s); -} - static int tracing_resize_saved_cmdlines(unsigned int val) { struct saved_cmdlines_buffer *s, *savedcmd_temp;
- s = kmalloc(sizeof(*s), GFP_KERNEL); + s = allocate_cmdlines_buffer(val); if (!s) return -ENOMEM;
- if (allocate_cmdlines_buffer(val, s) < 0) { - kfree(s); - return -ENOMEM; - } - preempt_disable(); arch_spin_lock(&trace_cmdline_lock); savedcmd_temp = savedcmd;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thorsten Blum thorsten.blum@toblux.com
commit 9b6326354cf9a41521b79287da3bfab022ae0b6d upstream.
Fix trace_string() by assigning the string length to the return variable which got lost in commit ddeea494a16f ("tracing/synthetic: Use union instead of casts") and caused trace_string() to always return 0.
Link: https://lore.kernel.org/linux-trace-kernel/20240214220555.711598-1-thorsten....
Cc: stable@vger.kernel.org Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Fixes: ddeea494a16f ("tracing/synthetic: Use union instead of casts") Acked-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Thorsten Blum thorsten.blum@toblux.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace_events_synth.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/kernel/trace/trace_events_synth.c +++ b/kernel/trace/trace_events_synth.c @@ -441,8 +441,9 @@ static unsigned int trace_string(struct if (is_dynamic) { union trace_synth_field *data = &entry->fields[*n_u64];
+ len = fetch_store_strlen((unsigned long)str_val); data->as_dynamic.offset = struct_size(entry, fields, event->n_u64) + data_size; - data->as_dynamic.len = fetch_store_strlen((unsigned long)str_val); + data->as_dynamic.len = len;
ret = fetch_store_string((unsigned long)str_val, &entry->fields[*n_u64], entry);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masami Hiramatsu (Google) mhiramat@kernel.org
commit 8c427cc2fa73684ea140999e121b7b6c1c717632 upstream.
Fix to show a parse error for bad type (non-string) for $comm/$COMM and immediate-string. With this fix, error_log file shows appropriate error message as below.
/sys/kernel/tracing # echo 'p vfs_read $comm:u32' >> kprobe_events sh: write error: Invalid argument /sys/kernel/tracing # echo 'p vfs_read "hoge":u32' >> kprobe_events sh: write error: Invalid argument /sys/kernel/tracing # cat error_log
[ 30.144183] trace_kprobe: error: $comm and immediate-string only accepts string type Command: p vfs_read $comm:u32 ^ [ 62.618500] trace_kprobe: error: $comm and immediate-string only accepts string type Command: p vfs_read "hoge":u32 ^ Link: https://lore.kernel.org/all/170602215411.215583.2238016352271091852.stgit@de...
Fixes: 3dd1f7f24f8c ("tracing: probeevent: Fix to make the type of $comm string") Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace_probe.c | 7 +++++-- kernel/trace/trace_probe.h | 3 ++- 2 files changed, 7 insertions(+), 3 deletions(-)
--- a/kernel/trace/trace_probe.c +++ b/kernel/trace/trace_probe.c @@ -1159,9 +1159,12 @@ static int traceprobe_parse_probe_arg_bo if (!(ctx->flags & TPARG_FL_TEVENT) && (strcmp(arg, "$comm") == 0 || strcmp(arg, "$COMM") == 0 || strncmp(arg, "\"", 2) == 0)) { - /* The type of $comm must be "string", and not an array. */ - if (parg->count || (t && strcmp(t, "string"))) + /* The type of $comm must be "string", and not an array type. */ + if (parg->count || (t && strcmp(t, "string"))) { + trace_probe_log_err(ctx->offset + (t ? (t - arg) : 0), + NEED_STRING_TYPE); goto out; + } parg->type = find_fetch_type("string", ctx->flags); } else parg->type = find_fetch_type(t, ctx->flags); --- a/kernel/trace/trace_probe.h +++ b/kernel/trace/trace_probe.h @@ -515,7 +515,8 @@ extern int traceprobe_define_arg_fields( C(BAD_HYPHEN, "Failed to parse single hyphen. Forgot '>'?"), \ C(NO_BTF_FIELD, "This field is not found."), \ C(BAD_BTF_TID, "Failed to get BTF type info."),\ - C(BAD_TYPE4STR, "This type does not fit for string."), + C(BAD_TYPE4STR, "This type does not fit for string."),\ + C(NEED_STRING_TYPE, "$comm and immediate-string only accepts string type"),
#undef C #define C(a, b) TP_ERR_##a
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masami Hiramatsu (Google) mhiramat@kernel.org
commit 9a571c1e275cedacd48c66a6bddd0c23f1dffdbf upstream.
Since the BTF type setting updates probe_arg::type, the type size calculation and setting print-fmt should be done after that. Without this fix, the argument size and print-fmt can be wrong.
Link: https://lore.kernel.org/all/170602218196.215583.6417859469540955777.stgit@de...
Fixes: b576e09701c7 ("tracing/probes: Support function parameters if BTF is available") Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace_probe.c | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-)
--- a/kernel/trace/trace_probe.c +++ b/kernel/trace/trace_probe.c @@ -1172,18 +1172,6 @@ static int traceprobe_parse_probe_arg_bo trace_probe_log_err(ctx->offset + (t ? (t - arg) : 0), BAD_TYPE); goto out; } - parg->offset = *size; - *size += parg->type->size * (parg->count ?: 1); - - ret = -ENOMEM; - if (parg->count) { - len = strlen(parg->type->fmttype) + 6; - parg->fmt = kmalloc(len, GFP_KERNEL); - if (!parg->fmt) - goto out; - snprintf(parg->fmt, len, "%s[%d]", parg->type->fmttype, - parg->count); - }
code = tmp = kcalloc(FETCH_INSN_MAX, sizeof(*code), GFP_KERNEL); if (!code) @@ -1207,6 +1195,19 @@ static int traceprobe_parse_probe_arg_bo goto fail; } } + parg->offset = *size; + *size += parg->type->size * (parg->count ?: 1); + + if (parg->count) { + len = strlen(parg->type->fmttype) + 6; + parg->fmt = kmalloc(len, GFP_KERNEL); + if (!parg->fmt) { + ret = -ENOMEM; + goto out; + } + snprintf(parg->fmt, len, "%s[%d]", parg->type->fmttype, + parg->count); + }
ret = -EINVAL; /* Store operation */
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masami Hiramatsu (Google) mhiramat@kernel.org
commit 9704669c386f9bbfef2e002e7e690c56b7dcf5de upstream.
Fix to search a field from the structure which has anonymous union correctly. Since the reference `type` pointer was updated in the loop, the search loop suddenly aborted where it hits an anonymous union. Thus it can not find the field after the anonymous union. This avoids updating the cursor `type` pointer in the loop.
Link: https://lore.kernel.org/all/170791694361.389532.10047514554799419688.stgit@d...
Fixes: 302db0f5b3d8 ("tracing/probes: Add a function to search a member of a struct/union") Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace_btf.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/trace_btf.c b/kernel/trace/trace_btf.c index ca224d53bfdc..5bbdbcbbde3c 100644 --- a/kernel/trace/trace_btf.c +++ b/kernel/trace/trace_btf.c @@ -91,8 +91,8 @@ const struct btf_member *btf_find_struct_member(struct btf *btf, for_each_member(i, type, member) { if (!member->name_off) { /* Anonymous union/struct: push it for later use */ - type = btf_type_skip_modifiers(btf, member->type, &tid); - if (type && top < BTF_ANON_STACK_MAX) { + if (btf_type_skip_modifiers(btf, member->type, &tid) && + top < BTF_ANON_STACK_MAX) { anon_stack[top].tid = tid; anon_stack[top++].offset = cur_offset + member->offset;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tejun Heo tj@kernel.org
commit aac8a59537dfc704ff344f1aacfd143c089ee20f upstream.
This reverts commit ca10d851b9ad0338c19e8e3089e24d565ebfffd7.
The commit allowed workqueue_apply_unbound_cpumask() to clear __WQ_ORDERED on now removed implicitly ordered workqueues. This was incorrect in that system-wide config change shouldn't break ordering properties of all workqueues. The reason why apply_workqueue_attrs() path was allowed to do so was because it was targeting the specific workqueue - either the workqueue had WQ_SYSFS set or the workqueue user specifically tried to change max_active, both of which indicate that the workqueue doesn't need to be ordered.
The implicitly ordered workqueue promotion was removed by the previous commit 3bc1e711c26b ("workqueue: Don't implicitly make UNBOUND workqueues w/ @max_active==1 ordered"). However, it didn't update this path and broke build. Let's revert the commit which was incorrect in the first place which also fixes build.
Signed-off-by: Tejun Heo tj@kernel.org Fixes: 3bc1e711c26b ("workqueue: Don't implicitly make UNBOUND workqueues w/ @max_active==1 ordered") Fixes: ca10d851b9ad ("workqueue: Override implicit ordered attribute in workqueue_apply_unbound_cpumask()") Cc: stable@vger.kernel.org # v6.6+ Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/workqueue.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-)
--- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -5793,13 +5793,9 @@ static int workqueue_apply_unbound_cpuma list_for_each_entry(wq, &workqueues, list) { if (!(wq->flags & WQ_UNBOUND)) continue; - /* creating multiple pwqs breaks ordering guarantee */ - if (!list_empty(&wq->pwqs)) { - if (wq->flags & __WQ_ORDERED_EXPLICIT) - continue; - wq->flags &= ~__WQ_ORDERED; - } + if (wq->flags & __WQ_ORDERED) + continue;
ctx = apply_wqattrs_prepare(wq, wq->unbound_attrs, unbound_cpumask); if (IS_ERR(ctx)) {
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Schiller david.schiller@jku.at
commit 6db053cd949fcd6254cea9f2cd5d39f7bd64379c upstream.
Commit 4c3577db3e4f ("Staging: iio: impedance-analyzer: Fix sparse warning") fixed a compiler warning, but introduced a bug that resulted in one of the two 16 bit IIO channels always being zero (when both are enabled).
This is because int is 32 bits wide on most architectures and in the case of a little-endian machine the two most significant bytes would occupy the buffer for the second channel as 'val' is being passed as a void pointer to 'iio_push_to_buffers()'.
Fix by defining 'val' as u16. Tested working on ARM64.
Fixes: 4c3577db3e4f ("Staging: iio: impedance-analyzer: Fix sparse warning") Signed-off-by: David Schiller david.schiller@jku.at Link: https://lore.kernel.org/r/20240122134916.2137957-1-david.schiller@jku.at Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/staging/iio/impedance-analyzer/ad5933.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/staging/iio/impedance-analyzer/ad5933.c +++ b/drivers/staging/iio/impedance-analyzer/ad5933.c @@ -608,7 +608,7 @@ static void ad5933_work(struct work_stru struct ad5933_state, work.work); struct iio_dev *indio_dev = i2c_get_clientdata(st->client); __be16 buf[2]; - int val[2]; + u16 val[2]; unsigned char status; int ret;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: zhili.liu zhili.liu@ucas.com.cn
commit 792595bab4925aa06532a14dd256db523eb4fa5e upstream.
Recently, we encounter kernel crash in function rm3100_common_probe caused by out of bound access of array rm3100_samp_rates (because of underlying hardware failures). Add boundary check to prevent out of bound access.
Fixes: 121354b2eceb ("iio: magnetometer: Add driver support for PNI RM3100") Suggested-by: Zhouyi Zhou zhouzhouyi@gmail.com Signed-off-by: zhili.liu zhili.liu@ucas.com.cn Link: https://lore.kernel.org/r/1704157631-3814-1-git-send-email-zhouzhouyi@gmail.... Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/magnetometer/rm3100-core.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
--- a/drivers/iio/magnetometer/rm3100-core.c +++ b/drivers/iio/magnetometer/rm3100-core.c @@ -530,6 +530,7 @@ int rm3100_common_probe(struct device *d struct rm3100_data *data; unsigned int tmp; int ret; + int samp_rate_index;
indio_dev = devm_iio_device_alloc(dev, sizeof(*data)); if (!indio_dev) @@ -586,9 +587,14 @@ int rm3100_common_probe(struct device *d ret = regmap_read(regmap, RM3100_REG_TMRC, &tmp); if (ret < 0) return ret; + + samp_rate_index = tmp - RM3100_TMRC_OFFSET; + if (samp_rate_index < 0 || samp_rate_index >= RM3100_SAMP_NUM) { + dev_err(dev, "The value read from RM3100_REG_TMRC is invalid!\n"); + return -EINVAL; + } /* Initializing max wait time, which is double conversion time. */ - data->conversion_time = rm3100_samp_rates[tmp - RM3100_TMRC_OFFSET][2] - * 2; + data->conversion_time = rm3100_samp_rates[samp_rate_index][2] * 2;
/* Cycle count values may not be what we want. */ if ((tmp - RM3100_TMRC_OFFSET) == 0)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dinghao Liu dinghao.liu@zju.edu.cn
commit 95a0d596bbd0552a78e13ced43f2be1038883c81 upstream.
When iio_device_register_sysfs_group() fails, we should free iio_dev_opaque->chan_attr_group.attrs to prevent potential memleak.
Fixes: 32f171724e5c ("iio: core: rework iio device group creation") Signed-off-by: Dinghao Liu dinghao.liu@zju.edu.cn Link: https://lore.kernel.org/r/20231208073119.29283-1-dinghao.liu@zju.edu.cn Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/industrialio-core.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/iio/industrialio-core.c +++ b/drivers/iio/industrialio-core.c @@ -1581,10 +1581,13 @@ static int iio_device_register_sysfs(str ret = iio_device_register_sysfs_group(indio_dev, &iio_dev_opaque->chan_attr_group); if (ret) - goto error_clear_attrs; + goto error_free_chan_attrs;
return 0;
+error_free_chan_attrs: + kfree(iio_dev_opaque->chan_attr_group.attrs); + iio_dev_opaque->chan_attr_group.attrs = NULL; error_clear_attrs: iio_free_chan_devattr_list(&iio_dev_opaque->channel_attr_list);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nuno Sa nuno.sa@analog.com
commit 862cf85fef85becc55a173387527adb4f076fab0 upstream.
Aligning the buffer to the L1 cache is not sufficient in some platforms as they might have larger cacheline sizes for caches after L1 and thus, we can't guarantee DMA safety.
That was the whole reason to introduce IIO_DMA_MINALIGN in [1]. Do the same for st_sensors common buffer.
While at it, moved the odr_lock before buffer_data as we definitely don't want any other data to share a cacheline with the buffer.
[1]: https://lore.kernel.org/linux-iio/20220508175712.647246-2-jic23@kernel.org/
Fixes: e031d5f558f1 ("iio:st_sensors: remove buffer allocation at each buffer enable") Signed-off-by: Nuno Sa nuno.sa@analog.com Cc: Stable@vger.kernel.org Link: https://lore.kernel.org/r/20240131-dev_dma_safety_stm-v2-1-580c07fae51b@anal... Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/iio/common/st_sensors.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/include/linux/iio/common/st_sensors.h +++ b/include/linux/iio/common/st_sensors.h @@ -258,9 +258,9 @@ struct st_sensor_data { bool hw_irq_trigger; s64 hw_timestamp;
- char buffer_data[ST_SENSORS_MAX_BUFFER_SIZE] ____cacheline_aligned; - struct mutex odr_lock; + + char buffer_data[ST_SENSORS_MAX_BUFFER_SIZE] __aligned(IIO_DMA_MINALIGN); };
#ifdef CONFIG_IIO_BUFFER
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mario Limonciello mario.limonciello@amd.com
commit 4cb81840d8f29b66d9d05c6d7f360c9560f7e2f4 upstream.
The kernel fails when compiling without `CONFIG_REGMAP_I2C` but with `CONFIG_BMA400`. ``` ld: drivers/iio/accel/bma400_i2c.o: in function `bma400_i2c_probe': bma400_i2c.c:(.text+0x23): undefined reference to `__devm_regmap_init_i2c' ```
Link: https://download.01.org/0day-ci/archive/20240131/202401311634.FE5CBVwe-lkp@i... Fixes: 465c811f1f20 ("iio: accel: Add driver for the BMA400") Fixes: 9bea10642396 ("iio: accel: bma400: add support for bma400 spi") Signed-off-by: Mario Limonciello mario.limonciello@amd.com Link: https://lore.kernel.org/r/20240131225246.14169-1-mario.limonciello@amd.com Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/accel/Kconfig | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/iio/accel/Kconfig +++ b/drivers/iio/accel/Kconfig @@ -219,10 +219,12 @@ config BMA400
config BMA400_I2C tristate + select REGMAP_I2C depends on BMA400
config BMA400_SPI tristate + select REGMAP_SPI depends on BMA400
config BMC150_ACCEL
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nuno Sa nuno.sa@analog.com
commit 59598510be1d49e1cff7fd7593293bb8e1b2398b upstream.
Aligning the buffer to the L1 cache is not sufficient in some platforms as they might have larger cacheline sizes for caches after L1 and thus, we can't guarantee DMA safety.
That was the whole reason to introduce IIO_DMA_MINALIGN in [1]. Do the same for the sigma_delta ADCs.
[1]: https://lore.kernel.org/linux-iio/20220508175712.647246-2-jic23@kernel.org/
Fixes: 0fb6ee8d0b5e ("iio: ad_sigma_delta: Don't put SPI transfer buffer on the stack") Signed-off-by: Nuno Sa nuno.sa@analog.com Link: https://lore.kernel.org/r/20240117-dev_sigma_delta_no_irq_flags-v1-1-db39261... Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/iio/adc/ad_sigma_delta.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/include/linux/iio/adc/ad_sigma_delta.h +++ b/include/linux/iio/adc/ad_sigma_delta.h @@ -8,6 +8,8 @@ #ifndef __AD_SIGMA_DELTA_H__ #define __AD_SIGMA_DELTA_H__
+#include <linux/iio/iio.h> + enum ad_sigma_delta_mode { AD_SD_MODE_CONTINUOUS = 0, AD_SD_MODE_SINGLE = 1, @@ -99,7 +101,7 @@ struct ad_sigma_delta { * 'rx_buf' is up to 32 bits per sample + 64 bit timestamp, * rounded to 16 bytes to take into account padding. */ - uint8_t tx_buf[4] ____cacheline_aligned; + uint8_t tx_buf[4] __aligned(IIO_DMA_MINALIGN); uint8_t rx_buf[16] __aligned(8); };
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nuno Sa nuno.sa@analog.com
commit 8e98b87f515d8c4bae521048a037b2cc431c3fd5 upstream.
Aligning the buffer to the L1 cache is not sufficient in some platforms as they might have larger cacheline sizes for caches after L1 and thus, we can't guarantee DMA safety.
That was the whole reason to introduce IIO_DMA_MINALIGN in [1]. Do the same for the sigma_delta ADCs.
[1]: https://lore.kernel.org/linux-iio/20220508175712.647246-2-jic23@kernel.org/
Fixes: ccd2b52f4ac6 ("staging:iio: Add common ADIS library") Signed-off-by: Nuno Sa nuno.sa@analog.com Link: https://lore.kernel.org/r/20240117-adis-improv-v1-1-7f90e9fad200@analog.com Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/iio/imu/adis.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/include/linux/iio/imu/adis.h +++ b/include/linux/iio/imu/adis.h @@ -11,6 +11,7 @@
#include <linux/spi/spi.h> #include <linux/interrupt.h> +#include <linux/iio/iio.h> #include <linux/iio/types.h>
#define ADIS_WRITE_REG(reg) ((0x80 | (reg))) @@ -131,7 +132,7 @@ struct adis { unsigned long irq_flag; void *buffer;
- u8 tx[10] ____cacheline_aligned; + u8 tx[10] __aligned(IIO_DMA_MINALIGN); u8 rx[4]; };
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Randy Dunlap rdunlap@infradead.org
commit 35ec2d03b282a939949090bd8c39eb37a5856721 upstream.
There are a ton of build errors when REGMAP is not set, so select REGMAP to fix all of them.
Examples (not all of them):
../drivers/iio/imu/bno055/bno055_ser_core.c:495:15: error: variable 'bno055_ser_regmap_bus' has initializer but incomplete type 495 | static struct regmap_bus bno055_ser_regmap_bus = { ../drivers/iio/imu/bno055/bno055_ser_core.c:496:10: error: 'struct regmap_bus' has no member named 'write' 496 | .write = bno055_ser_write_reg, ../drivers/iio/imu/bno055/bno055_ser_core.c:497:10: error: 'struct regmap_bus' has no member named 'read' 497 | .read = bno055_ser_read_reg, ../drivers/iio/imu/bno055/bno055_ser_core.c: In function 'bno055_ser_probe': ../drivers/iio/imu/bno055/bno055_ser_core.c:532:18: error: implicit declaration of function 'devm_regmap_init'; did you mean 'vmem_map_init'? [-Werror=implicit-function-declaration] 532 | regmap = devm_regmap_init(&serdev->dev, &bno055_ser_regmap_bus, ../drivers/iio/imu/bno055/bno055_ser_core.c:532:16: warning: assignment to 'struct regmap *' from 'int' makes pointer from integer without a cast [-Wint-conversion] 532 | regmap = devm_regmap_init(&serdev->dev, &bno055_ser_regmap_bus, ../drivers/iio/imu/bno055/bno055_ser_core.c: At top level: ../drivers/iio/imu/bno055/bno055_ser_core.c:495:26: error: storage size of 'bno055_ser_regmap_bus' isn't known 495 | static struct regmap_bus bno055_ser_regmap_bus = {
Fixes: 2eef5a9cc643 ("iio: imu: add BNO055 serdev driver") Signed-off-by: Randy Dunlap rdunlap@infradead.org Cc: Andrea Merello andrea.merello@iit.it Cc: Jonathan Cameron jic23@kernel.org Cc: Lars-Peter Clausen lars@metafoo.de Cc: linux-iio@vger.kernel.org Cc: Stable@vger.kernel.org Link: https://lore.kernel.org/r/20240110185611.19723-1-rdunlap@infradead.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/imu/bno055/Kconfig | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/iio/imu/bno055/Kconfig +++ b/drivers/iio/imu/bno055/Kconfig @@ -8,6 +8,7 @@ config BOSCH_BNO055 config BOSCH_BNO055_SERIAL tristate "Bosch BNO055 attached via UART" depends on SERIAL_DEV_BUS + select REGMAP select BOSCH_BNO055 help Enable this to support Bosch BNO055 IMUs attached via UART.
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sam Protsenko semen.protsenko@linaro.org
commit b67f3e653e305abf1471934d7b9fdb9ad2df3eef upstream.
"bmp085" is missing in bmp280_spi_id[] table, which leads to the next warning in dmesg:
SPI driver bmp280 has no spi_device_id for bosch,bmp085
Add "bmp085" to bmp280_spi_id[] by mimicking its existing description in bmp280_of_spi_match[] table to fix the above warning.
Signed-off-by: Sam Protsenko semen.protsenko@linaro.org Fixes: b26b4e91700f ("iio: pressure: bmp280: add SPI interface driver") Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Reviewed-by: Linus Walleij linus.walleij@linaro.org Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/pressure/bmp280-spi.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/iio/pressure/bmp280-spi.c +++ b/drivers/iio/pressure/bmp280-spi.c @@ -91,6 +91,7 @@ static const struct of_device_id bmp280_ MODULE_DEVICE_TABLE(of, bmp280_of_spi_match);
static const struct spi_device_id bmp280_spi_id[] = { + { "bmp085", (kernel_ulong_t)&bmp180_chip_info }, { "bmp180", (kernel_ulong_t)&bmp180_chip_info }, { "bmp181", (kernel_ulong_t)&bmp180_chip_info }, { "bmp280", (kernel_ulong_t)&bmp280_chip_info },
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eugen Hristev eugen.hristev@collabora.com
commit c41336f4d69057cbf88fed47951379b384540df5 upstream.
If the power domains are registered first with genpd and *after that* the driver attempts to power them on in the probe sequence, then it is possible that a race condition occurs if genpd tries to power them on in the same time. The same is valid for powering them off before unregistering them from genpd. Attempt to fix race conditions by first removing the domains from genpd and *after that* powering down domains. Also first power up the domains and *after that* register them to genpd.
Fixes: 59b644b01cf4 ("soc: mediatek: Add MediaTek SCPSYS power domains") Signed-off-by: Eugen Hristev eugen.hristev@collabora.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20231225133615.78993-1-eugen.hristev@collabora.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pmdomain/mediatek/mtk-pm-domains.c | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-)
--- a/drivers/pmdomain/mediatek/mtk-pm-domains.c +++ b/drivers/pmdomain/mediatek/mtk-pm-domains.c @@ -561,6 +561,11 @@ static int scpsys_add_subdomain(struct s goto err_put_node; }
+ /* recursive call to add all subdomains */ + ret = scpsys_add_subdomain(scpsys, child); + if (ret) + goto err_put_node; + ret = pm_genpd_add_subdomain(parent_pd, child_pd); if (ret) { dev_err(scpsys->dev, "failed to add %s subdomain to parent %s\n", @@ -570,11 +575,6 @@ static int scpsys_add_subdomain(struct s dev_dbg(scpsys->dev, "%s add subdomain: %s\n", parent_pd->name, child_pd->name); } - - /* recursive call to add all subdomains */ - ret = scpsys_add_subdomain(scpsys, child); - if (ret) - goto err_put_node; }
return 0; @@ -588,9 +588,6 @@ static void scpsys_remove_one_domain(str { int ret;
- if (scpsys_domain_is_on(pd)) - scpsys_power_off(&pd->genpd); - /* * We're in the error cleanup already, so we only complain, * but won't emit another error on top of the original one. @@ -600,6 +597,8 @@ static void scpsys_remove_one_domain(str dev_err(pd->scpsys->dev, "failed to remove domain '%s' : %d - state may be inconsistent\n", pd->genpd.name, ret); + if (scpsys_domain_is_on(pd)) + scpsys_power_off(&pd->genpd);
clk_bulk_put(pd->num_clks, pd->clks); clk_bulk_put(pd->num_subsys_clks, pd->subsys_clks);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Young sean@mess.org
commit 6a9d552483d50953320b9d3b57abdee8d436f23f upstream.
Note that bpf attach/detach also requires CAP_NET_ADMIN.
Cc: stable@vger.kernel.org Signed-off-by: Sean Young sean@mess.org Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/rc/bpf-lirc.c | 6 +++--- drivers/media/rc/lirc_dev.c | 5 ++++- drivers/media/rc/rc-core-priv.h | 2 +- 3 files changed, 8 insertions(+), 5 deletions(-)
--- a/drivers/media/rc/bpf-lirc.c +++ b/drivers/media/rc/bpf-lirc.c @@ -253,7 +253,7 @@ int lirc_prog_attach(const union bpf_att if (attr->attach_flags) return -EINVAL;
- rcdev = rc_dev_get_from_fd(attr->target_fd); + rcdev = rc_dev_get_from_fd(attr->target_fd, true); if (IS_ERR(rcdev)) return PTR_ERR(rcdev);
@@ -278,7 +278,7 @@ int lirc_prog_detach(const union bpf_att if (IS_ERR(prog)) return PTR_ERR(prog);
- rcdev = rc_dev_get_from_fd(attr->target_fd); + rcdev = rc_dev_get_from_fd(attr->target_fd, true); if (IS_ERR(rcdev)) { bpf_prog_put(prog); return PTR_ERR(rcdev); @@ -303,7 +303,7 @@ int lirc_prog_query(const union bpf_attr if (attr->query.query_flags) return -EINVAL;
- rcdev = rc_dev_get_from_fd(attr->query.target_fd); + rcdev = rc_dev_get_from_fd(attr->query.target_fd, false); if (IS_ERR(rcdev)) return PTR_ERR(rcdev);
--- a/drivers/media/rc/lirc_dev.c +++ b/drivers/media/rc/lirc_dev.c @@ -814,7 +814,7 @@ void __exit lirc_dev_exit(void) unregister_chrdev_region(lirc_base_dev, RC_DEV_MAX); }
-struct rc_dev *rc_dev_get_from_fd(int fd) +struct rc_dev *rc_dev_get_from_fd(int fd, bool write) { struct fd f = fdget(fd); struct lirc_fh *fh; @@ -828,6 +828,9 @@ struct rc_dev *rc_dev_get_from_fd(int fd return ERR_PTR(-EINVAL); }
+ if (write && !(f.file->f_mode & FMODE_WRITE)) + return ERR_PTR(-EPERM); + fh = f.file->private_data; dev = fh->rc;
--- a/drivers/media/rc/rc-core-priv.h +++ b/drivers/media/rc/rc-core-priv.h @@ -325,7 +325,7 @@ void lirc_raw_event(struct rc_dev *dev, void lirc_scancode_event(struct rc_dev *dev, struct lirc_scancode *lsc); int lirc_register(struct rc_dev *dev); void lirc_unregister(struct rc_dev *dev); -struct rc_dev *rc_dev_get_from_fd(int fd); +struct rc_dev *rc_dev_get_from_fd(int fd, bool write); #else static inline int lirc_dev_init(void) { return 0; } static inline void lirc_dev_exit(void) {}
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Steven Rostedt (Google)" rostedt@goodmis.org
commit b0f7e2d739b4aac131ea1662d086a07775097b05 upstream.
The "lookup" parameter is a way to differentiate the call to create_file/dir_dentry() from when it's just a lookup (no need to up the dentry refcount) and accessed via a readdir (need to up the refcount).
But reality, it just makes the code more complex. Just up the refcount and let the caller decide to dput() the result or not.
Link: https://lore.kernel.org/linux-trace-kernel/20240103102553.17a19cea@gandalf.l... Link: https://lore.kernel.org/linux-trace-kernel/20240104015435.517502710@goodmis....
Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Ajay Kaher akaher@vmware.com Cc: Al Viro viro@ZenIV.linux.org.uk Cc: Christian Brauner brauner@kernel.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/tracefs/event_inode.c | 55 +++++++++++++++++------------------------------ 1 file changed, 20 insertions(+), 35 deletions(-)
--- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -419,16 +419,14 @@ void eventfs_set_ei_status_free(struct t * @mode: The mode of the file. * @data: The data to use to set the inode of the file with on open() * @fops: The fops of the file to be created. - * @lookup: If called by the lookup routine, in which case, dput() the created dentry. * * Create a dentry for a file of an eventfs_inode @ei and place it into the - * address located at @e_dentry. If the @e_dentry already has a dentry, then - * just do a dget() on it and return. Otherwise create the dentry and attach it. + * address located at @e_dentry. */ static struct dentry * create_file_dentry(struct eventfs_inode *ei, int idx, struct dentry *parent, const char *name, umode_t mode, void *data, - const struct file_operations *fops, bool lookup) + const struct file_operations *fops) { struct eventfs_attr *attr = NULL; struct dentry **e_dentry = &ei->d_children[idx]; @@ -443,9 +441,7 @@ create_file_dentry(struct eventfs_inode } /* If the e_dentry already has a dentry, use it */ if (*e_dentry) { - /* lookup does not need to up the ref count */ - if (!lookup) - dget(*e_dentry); + dget(*e_dentry); mutex_unlock(&eventfs_mutex); return *e_dentry; } @@ -470,13 +466,12 @@ create_file_dentry(struct eventfs_inode * way to being freed, don't return it. If e_dentry is NULL * it means it was already freed. */ - if (ei->is_freed) + if (ei->is_freed) { dentry = NULL; - else + } else { dentry = *e_dentry; - /* The lookup does not need to up the dentry refcount */ - if (dentry && !lookup) dget(dentry); + } mutex_unlock(&eventfs_mutex); return dentry; } @@ -494,9 +489,6 @@ create_file_dentry(struct eventfs_inode } mutex_unlock(&eventfs_mutex);
- if (lookup) - dput(dentry); - return dentry; }
@@ -529,13 +521,12 @@ static void eventfs_post_create_dir(stru * @pei: The eventfs_inode parent of ei. * @ei: The eventfs_inode to create the directory for * @parent: The dentry of the parent of this directory - * @lookup: True if this is called by the lookup code * * This creates and attaches a directory dentry to the eventfs_inode @ei. */ static struct dentry * create_dir_dentry(struct eventfs_inode *pei, struct eventfs_inode *ei, - struct dentry *parent, bool lookup) + struct dentry *parent) { struct dentry *dentry = NULL;
@@ -547,11 +538,9 @@ create_dir_dentry(struct eventfs_inode * return NULL; } if (ei->dentry) { - /* If the dentry already has a dentry, use it */ + /* If the eventfs_inode already has a dentry, use it */ dentry = ei->dentry; - /* lookup does not need to up the ref count */ - if (!lookup) - dget(dentry); + dget(dentry); mutex_unlock(&eventfs_mutex); return dentry; } @@ -571,7 +560,7 @@ create_dir_dentry(struct eventfs_inode * * way to being freed. */ dentry = ei->dentry; - if (dentry && !lookup) + if (dentry) dget(dentry); mutex_unlock(&eventfs_mutex); return dentry; @@ -591,9 +580,6 @@ create_dir_dentry(struct eventfs_inode * } mutex_unlock(&eventfs_mutex);
- if (lookup) - dput(dentry); - return dentry; }
@@ -618,8 +604,8 @@ static struct dentry *eventfs_root_looku struct eventfs_inode *ei; struct dentry *ei_dentry = NULL; struct dentry *ret = NULL; + struct dentry *d; const char *name = dentry->d_name.name; - bool created = false; umode_t mode; void *data; int idx; @@ -655,13 +641,10 @@ static struct dentry *eventfs_root_looku ret = simple_lookup(dir, dentry, flags); if (IS_ERR(ret)) goto out; - create_dir_dentry(ei, ei_child, ei_dentry, true); - created = true; - break; - } - - if (created) + d = create_dir_dentry(ei, ei_child, ei_dentry); + dput(d); goto out; + }
for (i = 0; i < ei->nr_entries; i++) { entry = &ei->entries[i]; @@ -679,8 +662,8 @@ static struct dentry *eventfs_root_looku ret = simple_lookup(dir, dentry, flags); if (IS_ERR(ret)) goto out; - create_file_dentry(ei, i, ei_dentry, name, mode, cdata, - fops, true); + d = create_file_dentry(ei, i, ei_dentry, name, mode, cdata, fops); + dput(d); break; } } @@ -797,9 +780,10 @@ static int dcache_dir_open_wrapper(struc inode_lock(parent->d_inode); list_for_each_entry_srcu(ei_child, &ei->children, list, srcu_read_lock_held(&eventfs_srcu)) { - d = create_dir_dentry(ei, ei_child, parent, false); + d = create_dir_dentry(ei, ei_child, parent); if (d) { ret = add_dentries(&dentries, d, cnt); + dput(d); if (ret < 0) break; cnt++; @@ -819,9 +803,10 @@ static int dcache_dir_open_wrapper(struc mutex_unlock(&eventfs_mutex); if (r <= 0) continue; - d = create_file_dentry(ei, i, parent, name, mode, cdata, fops, false); + d = create_file_dentry(ei, i, parent, name, mode, cdata, fops); if (d) { ret = add_dentries(&dentries, d, cnt); + dput(d); if (ret < 0) break; cnt++;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Steven Rostedt (Google)" rostedt@goodmis.org
commit 493ec81a8fb8e4ada6f223b8b73791a1280d4774 upstream.
The eventfs creates dynamically allocated dentries and inodes. Using the dcache_readdir() logic for its own directory lookups requires hiding the cursor of the dcache logic and playing games to allow the dcache_readdir() to still have access to the cursor while the eventfs saved what it created and what it needs to release.
Instead, just have eventfs have its own iterate_shared callback function that will fill in the dent entries. This simplifies the code quite a bit.
Link: https://lore.kernel.org/linux-trace-kernel/20240104015435.682218477@goodmis....
Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Ajay Kaher akaher@vmware.com Cc: Al Viro viro@ZenIV.linux.org.uk Cc: Christian Brauner brauner@kernel.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/tracefs/event_inode.c | 194 +++++++++++++++-------------------------------- 1 file changed, 64 insertions(+), 130 deletions(-)
--- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -53,9 +53,7 @@ enum { static struct dentry *eventfs_root_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags); -static int dcache_dir_open_wrapper(struct inode *inode, struct file *file); -static int dcache_readdir_wrapper(struct file *file, struct dir_context *ctx); -static int eventfs_release(struct inode *inode, struct file *file); +static int eventfs_iterate(struct file *file, struct dir_context *ctx);
static void update_attr(struct eventfs_attr *attr, struct iattr *iattr) { @@ -213,11 +211,9 @@ static const struct inode_operations eve };
static const struct file_operations eventfs_file_operations = { - .open = dcache_dir_open_wrapper, .read = generic_read_dir, - .iterate_shared = dcache_readdir_wrapper, + .iterate_shared = eventfs_iterate, .llseek = generic_file_llseek, - .release = eventfs_release, };
/* Return the evenfs_inode of the "events" directory */ @@ -672,128 +668,87 @@ static struct dentry *eventfs_root_looku return ret; }
-struct dentry_list { - void *cursor; - struct dentry **dentries; -}; - -/** - * eventfs_release - called to release eventfs file/dir - * @inode: inode to be released - * @file: file to be released (not used) - */ -static int eventfs_release(struct inode *inode, struct file *file) -{ - struct tracefs_inode *ti; - struct dentry_list *dlist = file->private_data; - void *cursor; - int i; - - ti = get_tracefs(inode); - if (!(ti->flags & TRACEFS_EVENT_INODE)) - return -EINVAL; - - if (WARN_ON_ONCE(!dlist)) - return -EINVAL; - - for (i = 0; dlist->dentries && dlist->dentries[i]; i++) { - dput(dlist->dentries[i]); - } - - cursor = dlist->cursor; - kfree(dlist->dentries); - kfree(dlist); - file->private_data = cursor; - return dcache_dir_close(inode, file); -} - -static int add_dentries(struct dentry ***dentries, struct dentry *d, int cnt) -{ - struct dentry **tmp; - - tmp = krealloc(*dentries, sizeof(d) * (cnt + 2), GFP_NOFS); - if (!tmp) - return -1; - tmp[cnt] = d; - tmp[cnt + 1] = NULL; - *dentries = tmp; - return 0; -} - -/** - * dcache_dir_open_wrapper - eventfs open wrapper - * @inode: not used - * @file: dir to be opened (to create it's children) - * - * Used to dynamic create file/dir with-in @file, all the - * file/dir will be created. If already created then references - * will be increased +/* + * Walk the children of a eventfs_inode to fill in getdents(). */ -static int dcache_dir_open_wrapper(struct inode *inode, struct file *file) +static int eventfs_iterate(struct file *file, struct dir_context *ctx) { const struct file_operations *fops; + struct inode *f_inode = file_inode(file); const struct eventfs_entry *entry; struct eventfs_inode *ei_child; struct tracefs_inode *ti; struct eventfs_inode *ei; - struct dentry_list *dlist; - struct dentry **dentries = NULL; - struct dentry *parent = file_dentry(file); - struct dentry *d; - struct inode *f_inode = file_inode(file); - const char *name = parent->d_name.name; + struct dentry *ei_dentry = NULL; + struct dentry *dentry; + const char *name; umode_t mode; - void *data; - int cnt = 0; int idx; - int ret; - int i; - int r; + int ret = -EINVAL; + int ino; + int i, r, c; + + if (!dir_emit_dots(file, ctx)) + return 0;
ti = get_tracefs(f_inode); if (!(ti->flags & TRACEFS_EVENT_INODE)) return -EINVAL;
- if (WARN_ON_ONCE(file->private_data)) - return -EINVAL; + c = ctx->pos - 2;
idx = srcu_read_lock(&eventfs_srcu);
mutex_lock(&eventfs_mutex); ei = READ_ONCE(ti->private); + if (ei && !ei->is_freed) + ei_dentry = READ_ONCE(ei->dentry); mutex_unlock(&eventfs_mutex);
- if (!ei) { - srcu_read_unlock(&eventfs_srcu, idx); - return -EINVAL; - } - - - data = ei->data; + if (!ei || !ei_dentry) + goto out;
- dlist = kmalloc(sizeof(*dlist), GFP_KERNEL); - if (!dlist) { - srcu_read_unlock(&eventfs_srcu, idx); - return -ENOMEM; - } + ret = 0;
- inode_lock(parent->d_inode); + /* + * Need to create the dentries and inodes to have a consistent + * inode number. + */ list_for_each_entry_srcu(ei_child, &ei->children, list, srcu_read_lock_held(&eventfs_srcu)) { - d = create_dir_dentry(ei, ei_child, parent); - if (d) { - ret = add_dentries(&dentries, d, cnt); - dput(d); - if (ret < 0) - break; - cnt++; + + if (c > 0) { + c--; + continue; } + + if (ei_child->is_freed) + continue; + + name = ei_child->name; + + dentry = create_dir_dentry(ei, ei_child, ei_dentry); + if (!dentry) + goto out; + ino = dentry->d_inode->i_ino; + dput(dentry); + + if (!dir_emit(ctx, name, strlen(name), ino, DT_DIR)) + goto out; + ctx->pos++; }
for (i = 0; i < ei->nr_entries; i++) { - void *cdata = data; + void *cdata = ei->data; + + if (c > 0) { + c--; + continue; + } + entry = &ei->entries[i]; name = entry->name; + mutex_lock(&eventfs_mutex); /* If ei->is_freed, then the event itself may be too */ if (!ei->is_freed) @@ -803,42 +758,21 @@ static int dcache_dir_open_wrapper(struc mutex_unlock(&eventfs_mutex); if (r <= 0) continue; - d = create_file_dentry(ei, i, parent, name, mode, cdata, fops); - if (d) { - ret = add_dentries(&dentries, d, cnt); - dput(d); - if (ret < 0) - break; - cnt++; - } - } - inode_unlock(parent->d_inode); - srcu_read_unlock(&eventfs_srcu, idx); - ret = dcache_dir_open(inode, file);
- /* - * dcache_dir_open() sets file->private_data to a dentry cursor. - * Need to save that but also save all the dentries that were - * opened by this function. - */ - dlist->cursor = file->private_data; - dlist->dentries = dentries; - file->private_data = dlist; - return ret; -} + dentry = create_file_dentry(ei, i, ei_dentry, name, mode, cdata, fops); + if (!dentry) + goto out; + ino = dentry->d_inode->i_ino; + dput(dentry);
-/* - * This just sets the file->private_data back to the cursor and back. - */ -static int dcache_readdir_wrapper(struct file *file, struct dir_context *ctx) -{ - struct dentry_list *dlist = file->private_data; - int ret; + if (!dir_emit(ctx, name, strlen(name), ino, DT_REG)) + goto out; + ctx->pos++; + } + ret = 1; + out: + srcu_read_unlock(&eventfs_srcu, idx);
- file->private_data = dlist->cursor; - ret = dcache_readdir(file, ctx); - dlist->cursor = file->private_data; - file->private_data = dlist; return ret; }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Steven Rostedt (Google)" rostedt@goodmis.org
commit e109deadb73318cf4a3bd61287d969f705df278f upstream.
If ei->is_freed is set in eventfs_iterate(), it means that the directory that is being iterated on is in the process of being freed. Just exit the loop immediately when that is ever detected, and separate out the return of the entry->callback() from ei->is_freed.
Link: https://lore.kernel.org/linux-trace-kernel/20240104220048.016261289@goodmis....
Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Al Viro viro@zeniv.linux.org.uk Cc: Christian Brauner brauner@kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/tracefs/event_inode.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
--- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -750,11 +750,12 @@ static int eventfs_iterate(struct file * name = entry->name;
mutex_lock(&eventfs_mutex); - /* If ei->is_freed, then the event itself may be too */ - if (!ei->is_freed) - r = entry->callback(name, &mode, &cdata, &fops); - else - r = -1; + /* If ei->is_freed then just bail here, nothing more to do */ + if (ei->is_freed) { + mutex_unlock(&eventfs_mutex); + goto out; + } + r = entry->callback(name, &mode, &cdata, &fops); mutex_unlock(&eventfs_mutex); if (r <= 0) continue;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Steven Rostedt (Google)" rostedt@goodmis.org
commit 1e4624eb5a0ecaae0d2c4e3019bece119725bb98 upstream.
The ctx->pos was only updated when it added an entry, but the "skip to current pos" check (c--) happened for every loop regardless of if the entry was added or not. This inconsistency caused readdir to be incorrect.
It was due to:
for (i = 0; i < ei->nr_entries; i++) {
if (c > 0) { c--; continue; }
mutex_lock(&eventfs_mutex); /* If ei->is_freed then just bail here, nothing more to do */ if (ei->is_freed) { mutex_unlock(&eventfs_mutex); goto out; } r = entry->callback(name, &mode, &cdata, &fops); mutex_unlock(&eventfs_mutex);
[..] ctx->pos++; }
But this can cause the iterator to return a file that was already read. That's because of the way the callback() works. Some events may not have all files, and the callback can return 0 to tell eventfs to skip the file for this directory.
for instance, we have:
# ls /sys/kernel/tracing/events/ftrace/function format hist hist_debug id inject
and
# ls /sys/kernel/tracing/events/sched/sched_switch/ enable filter format hist hist_debug id inject trigger
Where the function directory is missing "enable", "filter" and "trigger". That's because the callback() for events has:
static int event_callback(const char *name, umode_t *mode, void **data, const struct file_operations **fops) { struct trace_event_file *file = *data; struct trace_event_call *call = file->event_call;
[..]
/* * Only event directories that can be enabled should have * triggers or filters, with the exception of the "print" * event that can have a "trigger" file. */ if (!(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE)) { if (call->class->reg && strcmp(name, "enable") == 0) { *mode = TRACE_MODE_WRITE; *fops = &ftrace_enable_fops; return 1; }
if (strcmp(name, "filter") == 0) { *mode = TRACE_MODE_WRITE; *fops = &ftrace_event_filter_fops; return 1; } }
if (!(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE) || strcmp(trace_event_name(call), "print") == 0) { if (strcmp(name, "trigger") == 0) { *mode = TRACE_MODE_WRITE; *fops = &event_trigger_fops; return 1; } } [..] return 0; }
Where the function event has the TRACE_EVENT_FL_IGNORE_ENABLE set.
This means that the entries array elements for "enable", "filter" and "trigger" when called on the function event will have the callback return 0 and not 1, to tell eventfs to skip these files for it.
Because the "skip to current ctx->pos" check happened for all entries, but the ctx->pos++ only happened to entries that exist, it would confuse the reading of a directory. Which would cause:
# ls /sys/kernel/tracing/events/ftrace/function/ format hist hist hist_debug hist_debug id inject inject
The missing "enable", "filter" and "trigger" caused ls to show "hist", "hist_debug" and "inject" twice.
Update the ctx->pos for every iteration to keep its update and the "skip" update consistent. This also means that on error, the ctx->pos needs to be decremented if it was incremented without adding something.
Link: https://lore.kernel.org/all/20240104150500.38b15a62@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20240104220048.172295263@goodmis....
Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Al Viro viro@zeniv.linux.org.uk Cc: Christian Brauner brauner@kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Fixes: 493ec81a8fb8e ("eventfs: Stop using dcache_readdir() for getdents()") Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/tracefs/event_inode.c | 21 ++++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-)
--- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -722,6 +722,8 @@ static int eventfs_iterate(struct file * continue; }
+ ctx->pos++; + if (ei_child->is_freed) continue;
@@ -729,13 +731,12 @@ static int eventfs_iterate(struct file *
dentry = create_dir_dentry(ei, ei_child, ei_dentry); if (!dentry) - goto out; + goto out_dec; ino = dentry->d_inode->i_ino; dput(dentry);
if (!dir_emit(ctx, name, strlen(name), ino, DT_DIR)) - goto out; - ctx->pos++; + goto out_dec; }
for (i = 0; i < ei->nr_entries; i++) { @@ -746,6 +747,8 @@ static int eventfs_iterate(struct file * continue; }
+ ctx->pos++; + entry = &ei->entries[i]; name = entry->name;
@@ -753,7 +756,7 @@ static int eventfs_iterate(struct file * /* If ei->is_freed then just bail here, nothing more to do */ if (ei->is_freed) { mutex_unlock(&eventfs_mutex); - goto out; + goto out_dec; } r = entry->callback(name, &mode, &cdata, &fops); mutex_unlock(&eventfs_mutex); @@ -762,19 +765,23 @@ static int eventfs_iterate(struct file *
dentry = create_file_dentry(ei, i, ei_dentry, name, mode, cdata, fops); if (!dentry) - goto out; + goto out_dec; ino = dentry->d_inode->i_ino; dput(dentry);
if (!dir_emit(ctx, name, strlen(name), ino, DT_REG)) - goto out; - ctx->pos++; + goto out_dec; } ret = 1; out: srcu_read_unlock(&eventfs_srcu, idx);
return ret; + + out_dec: + /* Incremented ctx->pos without adding something, reset it */ + ctx->pos--; + goto out; }
/**
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Steven Rostedt (Google)" rostedt@goodmis.org
commit 704f960dbee2f1634f4b4e16f208cb16eaf41c1e upstream.
In order to apply a shortcut to skip over the current ctx->pos immediately, by using the ei->entries array, the reading of that array should be first. Moving the array reading before the linked list reading will make the shortcut change diff nicer to read.
Link: https://lore.kernel.org/all/CAHk-=wiKwDUDv3+jCsv-uacDcHDVTYsXtBR9=6sGM5mqX+D... Link: https://lore.kernel.org/linux-trace-kernel/20240104220048.333115095@goodmis....
Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Al Viro viro@zeniv.linux.org.uk Cc: Christian Brauner brauner@kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/tracefs/event_inode.c | 46 +++++++++++++++++++++++----------------------- 1 file changed, 23 insertions(+), 23 deletions(-)
--- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -714,8 +714,8 @@ static int eventfs_iterate(struct file * * Need to create the dentries and inodes to have a consistent * inode number. */ - list_for_each_entry_srcu(ei_child, &ei->children, list, - srcu_read_lock_held(&eventfs_srcu)) { + for (i = 0; i < ei->nr_entries; i++) { + void *cdata = ei->data;
if (c > 0) { c--; @@ -724,23 +724,32 @@ static int eventfs_iterate(struct file *
ctx->pos++;
- if (ei_child->is_freed) - continue; + entry = &ei->entries[i]; + name = entry->name;
- name = ei_child->name; + mutex_lock(&eventfs_mutex); + /* If ei->is_freed then just bail here, nothing more to do */ + if (ei->is_freed) { + mutex_unlock(&eventfs_mutex); + goto out_dec; + } + r = entry->callback(name, &mode, &cdata, &fops); + mutex_unlock(&eventfs_mutex); + if (r <= 0) + continue;
- dentry = create_dir_dentry(ei, ei_child, ei_dentry); + dentry = create_file_dentry(ei, i, ei_dentry, name, mode, cdata, fops); if (!dentry) goto out_dec; ino = dentry->d_inode->i_ino; dput(dentry);
- if (!dir_emit(ctx, name, strlen(name), ino, DT_DIR)) + if (!dir_emit(ctx, name, strlen(name), ino, DT_REG)) goto out_dec; }
- for (i = 0; i < ei->nr_entries; i++) { - void *cdata = ei->data; + list_for_each_entry_srcu(ei_child, &ei->children, list, + srcu_read_lock_held(&eventfs_srcu)) {
if (c > 0) { c--; @@ -749,27 +758,18 @@ static int eventfs_iterate(struct file *
ctx->pos++;
- entry = &ei->entries[i]; - name = entry->name; - - mutex_lock(&eventfs_mutex); - /* If ei->is_freed then just bail here, nothing more to do */ - if (ei->is_freed) { - mutex_unlock(&eventfs_mutex); - goto out_dec; - } - r = entry->callback(name, &mode, &cdata, &fops); - mutex_unlock(&eventfs_mutex); - if (r <= 0) + if (ei_child->is_freed) continue;
- dentry = create_file_dentry(ei, i, ei_dentry, name, mode, cdata, fops); + name = ei_child->name; + + dentry = create_dir_dentry(ei, ei_child, ei_dentry); if (!dentry) goto out_dec; ino = dentry->d_inode->i_ino; dput(dentry);
- if (!dir_emit(ctx, name, strlen(name), ino, DT_REG)) + if (!dir_emit(ctx, name, strlen(name), ino, DT_DIR)) goto out_dec; } ret = 1;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Steven Rostedt (Google)" rostedt@goodmis.org
commit 1de94b52d5e8d8b32f0252f14fad1f1edc2e71f1 upstream.
As the ei->entries array is fixed for the duration of the eventfs_inode, it can be used to skip over already read entries in eventfs_iterate().
That is, if ctx->pos is greater than zero, there's no reason in doing the loop across the ei->entries array for the entries less than ctx->pos. Instead, start the lookup of the entries at the current ctx->pos.
Link: https://lore.kernel.org/all/CAHk-=wiKwDUDv3+jCsv-uacDcHDVTYsXtBR9=6sGM5mqX+D... Link: https://lore.kernel.org/linux-trace-kernel/20240104220048.494956957@goodmis....
Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Al Viro viro@zeniv.linux.org.uk Cc: Christian Brauner brauner@kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Suggested-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/tracefs/event_inode.c | 23 ++++++++++------------- 1 file changed, 10 insertions(+), 13 deletions(-)
--- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -708,21 +708,15 @@ static int eventfs_iterate(struct file * if (!ei || !ei_dentry) goto out;
- ret = 0; - /* * Need to create the dentries and inodes to have a consistent * inode number. */ - for (i = 0; i < ei->nr_entries; i++) { - void *cdata = ei->data; - - if (c > 0) { - c--; - continue; - } + ret = 0;
- ctx->pos++; + /* Start at 'c' to jump over already read entries */ + for (i = c; i < ei->nr_entries; i++, ctx->pos++) { + void *cdata = ei->data;
entry = &ei->entries[i]; name = entry->name; @@ -731,7 +725,7 @@ static int eventfs_iterate(struct file * /* If ei->is_freed then just bail here, nothing more to do */ if (ei->is_freed) { mutex_unlock(&eventfs_mutex); - goto out_dec; + goto out; } r = entry->callback(name, &mode, &cdata, &fops); mutex_unlock(&eventfs_mutex); @@ -740,14 +734,17 @@ static int eventfs_iterate(struct file *
dentry = create_file_dentry(ei, i, ei_dentry, name, mode, cdata, fops); if (!dentry) - goto out_dec; + goto out; ino = dentry->d_inode->i_ino; dput(dentry);
if (!dir_emit(ctx, name, strlen(name), ino, DT_REG)) - goto out_dec; + goto out; }
+ /* Subtract the skipped entries above */ + c -= min((unsigned int)c, (unsigned int)ei->nr_entries); + list_for_each_entry_srcu(ei_child, &ei->children, list, srcu_read_lock_held(&eventfs_srcu)) {
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Steven Rostedt (Google)" rostedt@goodmis.org
commit 53c41052ba3121761e6f62a813961164532a214f upstream.
The dentries and inodes are created in the readdir for the sole purpose of getting a consistent inode number. Linus stated that is unnecessary, and that all inodes can have the same inode number. For a virtual file system they are pretty meaningless.
Instead use a single unique inode number for all files and one for all directories.
Link: https://lore.kernel.org/all/20240116133753.2808d45e@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20240116211353.412180363@goodmis....
Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Christian Brauner brauner@kernel.org Cc: Al Viro viro@ZenIV.linux.org.uk Cc: Ajay Kaher ajay.kaher@broadcom.com Suggested-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/tracefs/event_inode.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -32,6 +32,10 @@ */ static DEFINE_MUTEX(eventfs_mutex);
+/* Choose something "unique" ;-) */ +#define EVENTFS_FILE_INODE_INO 0x12c4e37 +#define EVENTFS_DIR_INODE_INO 0x134b2f5 + /* * The eventfs_inode (ei) itself is protected by SRCU. It is released from * its parent's list and will have is_freed set (under eventfs_mutex). @@ -314,6 +318,9 @@ static struct dentry *create_file(const inode->i_fop = fop; inode->i_private = data;
+ /* All files will have the same inode number */ + inode->i_ino = EVENTFS_FILE_INODE_INO; + ti = get_tracefs(inode); ti->flags |= TRACEFS_EVENT_INODE; d_instantiate(dentry, inode); @@ -350,6 +357,9 @@ static struct dentry *create_dir(struct inode->i_op = &eventfs_root_dir_inode_operations; inode->i_fop = &eventfs_file_operations;
+ /* All directories will have the same inode number */ + inode->i_ino = EVENTFS_DIR_INODE_INO; + ti = get_tracefs(inode); ti->flags |= TRACEFS_EVENT_INODE;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Steven Rostedt (Google)" rostedt@goodmis.org
commit 852e46e239ee6db3cd220614cf8bce96e79227c2 upstream.
The original eventfs code added a wrapper around the dcache_readdir open callback and created all the dentries and inodes at open, and increment their ref count. A wrapper was added around the dcache_readdir release function to decrement all the ref counts of those created inodes and dentries. But this proved to be buggy[1] for when a kprobe was created during a dir read, it would create a dentry between the open and the release, and because the release would decrement all ref counts of all files and directories, that would include the kprobe directory that was not there to have its ref count incremented in open. This would cause the ref count to go to negative and later crash the kernel.
To solve this, the dentries and inodes that were created and had their ref count upped in open needed to be saved. That list needed to be passed from the open to the release, so that the release would only decrement the ref counts of the entries that were incremented in the open.
Unfortunately, the dcache_readdir logic was already using the file->private_data, which is the only field that can be used to pass information from the open to the release. What was done was the eventfs created another descriptor that had a void pointer to save the dcache_readdir pointer, and it wrapped all the callbacks, so that it could save the list of entries that had their ref counts incremented in the open, and pass it to the release. The wrapped callbacks would just put back the dcache_readdir pointer and call the functions it used so it could still use its data[2].
But Linus had an issue with the "hijacking" of the file->private_data (unfortunately this discussion was on a security list, so no public link). Which we finally agreed on doing everything within the iterate_shared callback and leave the dcache_readdir out of it[3]. All the information needed for the getents() could be created then.
But this ended up being buggy too[4]. The iterate_shared callback was not the right place to create the dentries and inodes. Even Christian Brauner had issues with that[5].
An attempt was to go back to creating the inodes and dentries at the open, create an array to store the information in the file->private_data, and pass that information to the other callbacks.[6]
The difference between that and the original method, is that it does not use dcache_readdir. It also does not up the ref counts of the dentries and pass them. Instead, it creates an array of a structure that saves the dentry's name and inode number. That information is used in the iterate_shared callback, and the array is freed in the dir release. The dentries and inodes created in the open are not used for the iterate_share or release callbacks. Just their names and inode numbers.
Linus did not like that either[7] and just wanted to remove the dentries being created in iterate_shared and use the hard coded inode numbers.
[ All this while Linus enjoyed an unexpected vacation during the merge window due to lack of power. ]
[1] https://lore.kernel.org/linux-trace-kernel/20230919211804.230edf1e@gandalf.l... [2] https://lore.kernel.org/linux-trace-kernel/20230922163446.1431d4fa@gandalf.l... [3] https://lore.kernel.org/linux-trace-kernel/20240104015435.682218477@goodmis.... [4] https://lore.kernel.org/all/202401152142.bfc28861-oliver.sang@intel.com/ [5] https://lore.kernel.org/all/20240111-unzahl-gefegt-433acb8a841d@brauner/ [6] https://lore.kernel.org/all/20240116114711.7e8637be@gandalf.local.home/ [7] https://lore.kernel.org/all/20240116170154.5bf0a250@gandalf.local.home/
Link: https://lore.kernel.org/linux-trace-kernel/20240116211353.573784051@goodmis....
Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Christian Brauner brauner@kernel.org Cc: Al Viro viro@ZenIV.linux.org.uk Cc: Ajay Kaher ajay.kaher@broadcom.com Fixes: 493ec81a8fb8 ("eventfs: Stop using dcache_readdir() for getdents()") Reported-by: kernel test robot oliver.sang@intel.com Closes: https://lore.kernel.org/oe-lkp/202401152142.bfc28861-oliver.sang@intel.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/tracefs/event_inode.c | 20 +++++--------------- 1 file changed, 5 insertions(+), 15 deletions(-)
--- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -689,8 +689,6 @@ static int eventfs_iterate(struct file * struct eventfs_inode *ei_child; struct tracefs_inode *ti; struct eventfs_inode *ei; - struct dentry *ei_dentry = NULL; - struct dentry *dentry; const char *name; umode_t mode; int idx; @@ -711,11 +709,11 @@ static int eventfs_iterate(struct file *
mutex_lock(&eventfs_mutex); ei = READ_ONCE(ti->private); - if (ei && !ei->is_freed) - ei_dentry = READ_ONCE(ei->dentry); + if (ei && ei->is_freed) + ei = NULL; mutex_unlock(&eventfs_mutex);
- if (!ei || !ei_dentry) + if (!ei) goto out;
/* @@ -742,11 +740,7 @@ static int eventfs_iterate(struct file * if (r <= 0) continue;
- dentry = create_file_dentry(ei, i, ei_dentry, name, mode, cdata, fops); - if (!dentry) - goto out; - ino = dentry->d_inode->i_ino; - dput(dentry); + ino = EVENTFS_FILE_INODE_INO;
if (!dir_emit(ctx, name, strlen(name), ino, DT_REG)) goto out; @@ -770,11 +764,7 @@ static int eventfs_iterate(struct file *
name = ei_child->name;
- dentry = create_dir_dentry(ei, ei_child, ei_dentry); - if (!dentry) - goto out_dec; - ino = dentry->d_inode->i_ino; - dput(dentry); + ino = EVENTFS_DIR_INODE_INO;
if (!dir_emit(ctx, name, strlen(name), ino, DT_DIR)) goto out_dec;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Erick Archer erick.archer@gmx.com
commit 1057066009c4325bb1d8430c9274894d0860e7c3 upstream.
As noted in the "Deprecated Interfaces, Language Features, Attributes, and Conventions" documentation [1], size calculations (especially multiplication) should not be performed in memory allocator (or similar) function arguments due to the risk of them overflowing. This could lead to values wrapping around and a smaller allocation being made than the caller was expecting. Using those allocations could lead to linear overflows of heap memory and other misbehaviors.
So, use the purpose specific kcalloc() function instead of the argument size * count in the kzalloc() function.
[1] https://www.kernel.org/doc/html/next/process/deprecated.html#open-coded-arit...
Link: https://lore.kernel.org/linux-trace-kernel/20240115181658.4562-1-erick.arche...
Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Mark Rutland mark.rutland@arm.com Link: https://github.com/KSPP/linux/issues/162 Signed-off-by: Erick Archer erick.archer@gmx.com Reviewed-by: Gustavo A. R. Silva gustavoars@kernel.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/tracefs/event_inode.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -97,7 +97,7 @@ static int eventfs_set_attr(struct mnt_i /* Preallocate the children mode array if necessary */ if (!(dentry->d_inode->i_mode & S_IFDIR)) { if (!ei->entry_attrs) { - ei->entry_attrs = kzalloc(sizeof(*ei->entry_attrs) * ei->nr_entries, + ei->entry_attrs = kcalloc(ei->nr_entries, sizeof(*ei->entry_attrs), GFP_NOFS); if (!ei->entry_attrs) { ret = -ENOMEM; @@ -836,7 +836,7 @@ struct eventfs_inode *eventfs_create_dir }
if (size) { - ei->d_children = kzalloc(sizeof(*ei->d_children) * size, GFP_KERNEL); + ei->d_children = kcalloc(size, sizeof(*ei->d_children), GFP_KERNEL); if (!ei->d_children) { kfree_const(ei->name); kfree(ei); @@ -903,7 +903,7 @@ struct eventfs_inode *eventfs_create_eve goto fail;
if (size) { - ei->d_children = kzalloc(sizeof(*ei->d_children) * size, GFP_KERNEL); + ei->d_children = kcalloc(size, sizeof(*ei->d_children), GFP_KERNEL); if (!ei->d_children) goto fail; }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Steven Rostedt (Google)" rostedt@goodmis.org
commit 834bf76add3e6168038150f162cbccf1fd492a67 upstream.
The eventfs inodes and directories are allocated when referenced. But this leaves the issue of keeping consistent inode numbers and the number is only saved in the inode structure itself. When the inode is no longer referenced, it can be freed. When the file that the inode was representing is referenced again, the inode is once again created, but the inode number needs to be the same as it was before.
Just making the inode numbers the same for all files is fine, but that does not work with directories. The find command will check for loops via the inode number and having the same inode number for directories triggers:
# find /sys/kernel/tracing find: File system loop detected; '/sys/kernel/debug/tracing/events/initcall/initcall_finish' is part of the same file system loop as '/sys/kernel/debug/tracing/events/initcall'. [..]
Linus pointed out that the eventfs_inode structure ends with a single 32bit int, and on 64 bit machines, there's likely a 4 byte hole due to alignment. We can use this hole to store the inode number for the eventfs_inode. All directories in eventfs are represented by an eventfs_inode and that data structure can hold its inode number.
That last int was also purposely placed at the end of the structure to prevent holes from within. Now that there's a 4 byte number to hold the inode, both the inode number and the last integer can be moved up in the structure for better cache locality, where the llist and rcu fields can be moved to the end as they are only used when the eventfs_inode is being deleted.
Link: https://lore.kernel.org/all/CAMuHMdXKiorg-jiuKoZpfZyDJ3Ynrfb8=X+c7x0Eewxn-YR... Link: https://lore.kernel.org/linux-trace-kernel/20240122152748.46897388@gandalf.l...
Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Linus Torvalds torvalds@linux-foundation.org Reported-by: Geert Uytterhoeven geert@linux-m68k.org Tested-by: Geert Uytterhoeven geert+renesas@glider.be Fixes: 53c41052ba31 ("eventfs: Have the inodes all for files and directories all be the same") Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Reviewed-by: Kees Cook keescook@chromium.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/tracefs/event_inode.c | 14 +++++++++++--- fs/tracefs/internal.h | 7 ++++--- 2 files changed, 15 insertions(+), 6 deletions(-)
--- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -34,7 +34,15 @@ static DEFINE_MUTEX(eventfs_mutex);
/* Choose something "unique" ;-) */ #define EVENTFS_FILE_INODE_INO 0x12c4e37 -#define EVENTFS_DIR_INODE_INO 0x134b2f5 + +/* Just try to make something consistent and unique */ +static int eventfs_dir_ino(struct eventfs_inode *ei) +{ + if (!ei->ino) + ei->ino = get_next_ino(); + + return ei->ino; +}
/* * The eventfs_inode (ei) itself is protected by SRCU. It is released from @@ -358,7 +366,7 @@ static struct dentry *create_dir(struct inode->i_fop = &eventfs_file_operations;
/* All directories will have the same inode number */ - inode->i_ino = EVENTFS_DIR_INODE_INO; + inode->i_ino = eventfs_dir_ino(ei);
ti = get_tracefs(inode); ti->flags |= TRACEFS_EVENT_INODE; @@ -764,7 +772,7 @@ static int eventfs_iterate(struct file *
name = ei_child->name;
- ino = EVENTFS_DIR_INODE_INO; + ino = eventfs_dir_ino(ei_child);
if (!dir_emit(ctx, name, strlen(name), ino, DT_DIR)) goto out_dec; --- a/fs/tracefs/internal.h +++ b/fs/tracefs/internal.h @@ -55,6 +55,10 @@ struct eventfs_inode { struct eventfs_attr *entry_attrs; struct eventfs_attr attr; void *data; + unsigned int is_freed:1; + unsigned int is_events:1; + unsigned int nr_entries:30; + unsigned int ino; /* * Union - used for deletion * @llist: for calling dput() if needed after RCU @@ -64,9 +68,6 @@ struct eventfs_inode { struct llist_node llist; struct rcu_head rcu; }; - unsigned int is_freed:1; - unsigned int is_events:1; - unsigned int nr_entries:30; };
static inline struct tracefs_inode *get_tracefs(const struct inode *inode)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Steven Rostedt (Google)" rostedt@goodmis.org
commit d81786f53aec14fd4d56263145a0635afbc64617 upstream.
eventfs uses the tracefs_inode and assumes that it's already initialized to zero. That is, it doesn't set fields to zero (like ti->private) after getting its tracefs_inode. This causes bugs due to stale values.
Just initialize the entire structure to zero on allocation so there isn't any more surprises.
This is a partial fix to access to ti->private. The assignment still needs to be made before the dentry is instantiated.
Link: https://lore.kernel.org/linux-trace-kernel/20240131185512.315825944@goodmis....
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Christian Brauner brauner@kernel.org Cc: Al Viro viro@ZenIV.linux.org.uk Cc: Ajay Kaher ajay.kaher@broadcom.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode") Reported-by: kernel test robot oliver.sang@intel.com Closes: https://lore.kernel.org/oe-lkp/202401291043.e62e89dc-oliver.sang@intel.com Suggested-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/tracefs/inode.c | 6 ++++-- fs/tracefs/internal.h | 3 ++- 2 files changed, 6 insertions(+), 3 deletions(-)
--- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -38,8 +38,6 @@ static struct inode *tracefs_alloc_inode if (!ti) return NULL;
- ti->flags = 0; - return &ti->vfs_inode; }
@@ -779,7 +777,11 @@ static void init_once(void *foo) { struct tracefs_inode *ti = (struct tracefs_inode *) foo;
+ /* inode_init_once() calls memset() on the vfs_inode portion */ inode_init_once(&ti->vfs_inode); + + /* Zero out the rest */ + memset_after(ti, 0, vfs_inode); }
static int __init tracefs_init(void) --- a/fs/tracefs/internal.h +++ b/fs/tracefs/internal.h @@ -11,9 +11,10 @@ enum { };
struct tracefs_inode { + struct inode vfs_inode; + /* The below gets initialized with memset_after(ti, 0, vfs_inode) */ unsigned long flags; void *private; - struct inode vfs_inode; };
/*
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Linus Torvalds torvalds@linux-foundation.org
commit 4fa4b010b83fb2f837b5ef79e38072a79e96e4f1 upstream.
The tracefs-specific fields in the inode were not initialized before the inode was exposed to others through the dentry with 'd_instantiate()'.
Move the field initializations up to before the d_instantiate.
Link: https://lore.kernel.org/linux-trace-kernel/20240131185512.478449628@goodmis....
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Christian Brauner brauner@kernel.org Cc: Al Viro viro@ZenIV.linux.org.uk Cc: Ajay Kaher ajay.kaher@broadcom.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode") Reported-by: kernel test robot oliver.sang@intel.com Closes: https://lore.kernel.org/oe-lkp/202401291043.e62e89dc-oliver.sang@intel.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/tracefs/event_inode.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
--- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -370,6 +370,8 @@ static struct dentry *create_dir(struct
ti = get_tracefs(inode); ti->flags |= TRACEFS_EVENT_INODE; + /* Only directories have ti->private set to an ei, not files */ + ti->private = ei;
inc_nlink(inode); d_instantiate(dentry, inode); @@ -515,7 +517,6 @@ create_file_dentry(struct eventfs_inode static void eventfs_post_create_dir(struct eventfs_inode *ei) { struct eventfs_inode *ei_child; - struct tracefs_inode *ti;
lockdep_assert_held(&eventfs_mutex);
@@ -525,9 +526,6 @@ static void eventfs_post_create_dir(stru srcu_read_lock_held(&eventfs_srcu)) { ei_child->d_parent = ei->dentry; } - - ti = get_tracefs(ei->dentry->d_inode); - ti->private = ei; }
/**
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Linus Torvalds torvalds@linux-foundation.org
commit 99c001cb617df409dac275a059d6c3f187a2da7a upstream.
The eventfs_find_events() code tries to walk up the tree to find the event directory that a dentry belongs to, in order to then find the eventfs inode that is associated with that event directory.
However, it uses an odd combination of walking the dentry parent, looking up the eventfs inode associated with that, and then looking up the dentry from there. Repeat.
But the code shouldn't have back-pointers to dentries in the first place, and it should just walk the dentry parenthood chain directly.
Similarly, 'set_top_events_ownership()' looks up the dentry from the eventfs inode, but the only reason it wants a dentry is to look up the superblock in order to look up the root dentry.
But it already has the real filesystem inode, which has that same superblock pointer. So just pass in the superblock pointer using the information that's already there, instead of looking up extraneous data that is irrelevant.
Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang... Link: https://lore.kernel.org/linux-trace-kernel/20240131185512.638645365@goodmis....
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Christian Brauner brauner@kernel.org Cc: Al Viro viro@ZenIV.linux.org.uk Cc: Ajay Kaher ajay.kaher@broadcom.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions") Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/tracefs/event_inode.c | 26 ++++++++++++-------------- 1 file changed, 12 insertions(+), 14 deletions(-)
--- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -156,33 +156,30 @@ static int eventfs_set_attr(struct mnt_i return ret; }
-static void update_top_events_attr(struct eventfs_inode *ei, struct dentry *dentry) +static void update_top_events_attr(struct eventfs_inode *ei, struct super_block *sb) { - struct inode *inode; + struct inode *root;
/* Only update if the "events" was on the top level */ if (!ei || !(ei->attr.mode & EVENTFS_TOPLEVEL)) return;
/* Get the tracefs root inode. */ - inode = d_inode(dentry->d_sb->s_root); - ei->attr.uid = inode->i_uid; - ei->attr.gid = inode->i_gid; + root = d_inode(sb->s_root); + ei->attr.uid = root->i_uid; + ei->attr.gid = root->i_gid; }
static void set_top_events_ownership(struct inode *inode) { struct tracefs_inode *ti = get_tracefs(inode); struct eventfs_inode *ei = ti->private; - struct dentry *dentry;
/* The top events directory doesn't get automatically updated */ if (!ei || !ei->is_events || !(ei->attr.mode & EVENTFS_TOPLEVEL)) return;
- dentry = ei->dentry; - - update_top_events_attr(ei, dentry); + update_top_events_attr(ei, inode->i_sb);
if (!(ei->attr.mode & EVENTFS_SAVE_UID)) inode->i_uid = ei->attr.uid; @@ -235,8 +232,10 @@ static struct eventfs_inode *eventfs_fin
mutex_lock(&eventfs_mutex); do { - /* The parent always has an ei, except for events itself */ - ei = dentry->d_parent->d_fsdata; + // The parent is stable because we do not do renames + dentry = dentry->d_parent; + // ... and directories always have d_fsdata + ei = dentry->d_fsdata;
/* * If the ei is being freed, the ownership of the children @@ -246,12 +245,11 @@ static struct eventfs_inode *eventfs_fin ei = NULL; break; } - - dentry = ei->dentry; + // Walk upwards until you find the events inode } while (!ei->is_events); mutex_unlock(&eventfs_mutex);
- update_top_events_attr(ei, dentry); + update_top_events_attr(ei, dentry->d_sb);
return ei; }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Linus Torvalds torvalds@linux-foundation.org
commit 49304c2b93e4f7468b51ef717cbe637981397115 upstream.
The dentry lookup for eventfs files was very broken, and had lots of signs of the old situation where the filesystem names were all created statically in the dentry tree, rather than being looked up dynamically based on the eventfs data structures.
You could see it in the naming - how it claimed to "create" dentries rather than just look up the dentries that were given it.
You could see it in various nonsensical and very incorrect operations, like using "simple_lookup()" on the dentries that were passed in, which only results in those dentries becoming negative dentries. Which meant that any other lookup would possibly return ENOENT if it saw that negative dentry before the data was then later filled in.
You could see it in the immense amount of nonsensical code that didn't actually just do lookups.
Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang... Link: https://lore.kernel.org/linux-trace-kernel/20240131233227.73db55e1@gandalf.l...
Cc: stable@vger.kernel.org Cc: Al Viro viro@ZenIV.linux.org.uk Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Christian Brauner brauner@kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Ajay Kaher ajay.kaher@broadcom.com Cc: Mark Rutland mark.rutland@arm.com Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions") Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/tracefs/event_inode.c | 275 ++++++++--------------------------------------- fs/tracefs/inode.c | 69 ----------- fs/tracefs/internal.h | 3 3 files changed, 50 insertions(+), 297 deletions(-)
--- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -230,7 +230,6 @@ static struct eventfs_inode *eventfs_fin { struct eventfs_inode *ei;
- mutex_lock(&eventfs_mutex); do { // The parent is stable because we do not do renames dentry = dentry->d_parent; @@ -247,7 +246,6 @@ static struct eventfs_inode *eventfs_fin } // Walk upwards until you find the events inode } while (!ei->is_events); - mutex_unlock(&eventfs_mutex);
update_top_events_attr(ei, dentry->d_sb);
@@ -280,11 +278,10 @@ static void update_inode_attr(struct den }
/** - * create_file - create a file in the tracefs filesystem - * @name: the name of the file to create. + * lookup_file - look up a file in the tracefs filesystem + * @dentry: the dentry to look up * @mode: the permission that the file should have. * @attr: saved attributes changed by user - * @parent: parent dentry for this file. * @data: something that the caller will want to get to later on. * @fop: struct file_operations that should be used for this file. * @@ -292,13 +289,13 @@ static void update_inode_attr(struct den * directory. The inode.i_private pointer will point to @data in the open() * call. */ -static struct dentry *create_file(const char *name, umode_t mode, +static struct dentry *lookup_file(struct dentry *dentry, + umode_t mode, struct eventfs_attr *attr, - struct dentry *parent, void *data, + void *data, const struct file_operations *fop) { struct tracefs_inode *ti; - struct dentry *dentry; struct inode *inode;
if (!(mode & S_IFMT)) @@ -307,15 +304,9 @@ static struct dentry *create_file(const if (WARN_ON_ONCE(!S_ISREG(mode))) return NULL;
- WARN_ON_ONCE(!parent); - dentry = eventfs_start_creating(name, parent); - - if (IS_ERR(dentry)) - return dentry; - inode = tracefs_get_inode(dentry->d_sb); if (unlikely(!inode)) - return eventfs_failed_creating(dentry); + return ERR_PTR(-ENOMEM);
/* If the user updated the directory's attributes, use them */ update_inode_attr(dentry, inode, attr, mode); @@ -329,32 +320,29 @@ static struct dentry *create_file(const
ti = get_tracefs(inode); ti->flags |= TRACEFS_EVENT_INODE; - d_instantiate(dentry, inode); + + d_add(dentry, inode); fsnotify_create(dentry->d_parent->d_inode, dentry); - return eventfs_end_creating(dentry); + return dentry; };
/** - * create_dir - create a dir in the tracefs filesystem + * lookup_dir_entry - look up a dir in the tracefs filesystem + * @dentry: the directory to look up * @ei: the eventfs_inode that represents the directory to create - * @parent: parent dentry for this file. * - * This function will create a dentry for a directory represented by + * This function will look up a dentry for a directory represented by * a eventfs_inode. */ -static struct dentry *create_dir(struct eventfs_inode *ei, struct dentry *parent) +static struct dentry *lookup_dir_entry(struct dentry *dentry, + struct eventfs_inode *pei, struct eventfs_inode *ei) { struct tracefs_inode *ti; - struct dentry *dentry; struct inode *inode;
- dentry = eventfs_start_creating(ei->name, parent); - if (IS_ERR(dentry)) - return dentry; - inode = tracefs_get_inode(dentry->d_sb); if (unlikely(!inode)) - return eventfs_failed_creating(dentry); + return ERR_PTR(-ENOMEM);
/* If the user updated the directory's attributes, use them */ update_inode_attr(dentry, inode, &ei->attr, @@ -371,11 +359,14 @@ static struct dentry *create_dir(struct /* Only directories have ti->private set to an ei, not files */ ti->private = ei;
+ dentry->d_fsdata = ei; + ei->dentry = dentry; // Remove me! + inc_nlink(inode); - d_instantiate(dentry, inode); + d_add(dentry, inode); inc_nlink(dentry->d_parent->d_inode); fsnotify_mkdir(dentry->d_parent->d_inode, dentry); - return eventfs_end_creating(dentry); + return dentry; }
static void free_ei(struct eventfs_inode *ei) @@ -425,7 +416,7 @@ void eventfs_set_ei_status_free(struct t }
/** - * create_file_dentry - create a dentry for a file of an eventfs_inode + * lookup_file_dentry - create a dentry for a file of an eventfs_inode * @ei: the eventfs_inode that the file will be created under * @idx: the index into the d_children[] of the @ei * @parent: The parent dentry of the created file. @@ -438,157 +429,21 @@ void eventfs_set_ei_status_free(struct t * address located at @e_dentry. */ static struct dentry * -create_file_dentry(struct eventfs_inode *ei, int idx, - struct dentry *parent, const char *name, umode_t mode, void *data, +lookup_file_dentry(struct dentry *dentry, + struct eventfs_inode *ei, int idx, + umode_t mode, void *data, const struct file_operations *fops) { struct eventfs_attr *attr = NULL; struct dentry **e_dentry = &ei->d_children[idx]; - struct dentry *dentry;
- WARN_ON_ONCE(!inode_is_locked(parent->d_inode)); - - mutex_lock(&eventfs_mutex); - if (ei->is_freed) { - mutex_unlock(&eventfs_mutex); - return NULL; - } - /* If the e_dentry already has a dentry, use it */ - if (*e_dentry) { - dget(*e_dentry); - mutex_unlock(&eventfs_mutex); - return *e_dentry; - } - - /* ei->entry_attrs are protected by SRCU */ if (ei->entry_attrs) attr = &ei->entry_attrs[idx];
- mutex_unlock(&eventfs_mutex); - - dentry = create_file(name, mode, attr, parent, data, fops); - - mutex_lock(&eventfs_mutex); - - if (IS_ERR_OR_NULL(dentry)) { - /* - * When the mutex was released, something else could have - * created the dentry for this e_dentry. In which case - * use that one. - * - * If ei->is_freed is set, the e_dentry is currently on its - * way to being freed, don't return it. If e_dentry is NULL - * it means it was already freed. - */ - if (ei->is_freed) { - dentry = NULL; - } else { - dentry = *e_dentry; - dget(dentry); - } - mutex_unlock(&eventfs_mutex); - return dentry; - } + dentry->d_fsdata = ei; // NOTE: ei of _parent_ + lookup_file(dentry, mode, attr, data, fops);
- if (!*e_dentry && !ei->is_freed) { - *e_dentry = dentry; - dentry->d_fsdata = ei; - } else { - /* - * Should never happen unless we get here due to being freed. - * Otherwise it means two dentries exist with the same name. - */ - WARN_ON_ONCE(!ei->is_freed); - dentry = NULL; - } - mutex_unlock(&eventfs_mutex); - - return dentry; -} - -/** - * eventfs_post_create_dir - post create dir routine - * @ei: eventfs_inode of recently created dir - * - * Map the meta-data of files within an eventfs dir to their parent dentry - */ -static void eventfs_post_create_dir(struct eventfs_inode *ei) -{ - struct eventfs_inode *ei_child; - - lockdep_assert_held(&eventfs_mutex); - - /* srcu lock already held */ - /* fill parent-child relation */ - list_for_each_entry_srcu(ei_child, &ei->children, list, - srcu_read_lock_held(&eventfs_srcu)) { - ei_child->d_parent = ei->dentry; - } -} - -/** - * create_dir_dentry - Create a directory dentry for the eventfs_inode - * @pei: The eventfs_inode parent of ei. - * @ei: The eventfs_inode to create the directory for - * @parent: The dentry of the parent of this directory - * - * This creates and attaches a directory dentry to the eventfs_inode @ei. - */ -static struct dentry * -create_dir_dentry(struct eventfs_inode *pei, struct eventfs_inode *ei, - struct dentry *parent) -{ - struct dentry *dentry = NULL; - - WARN_ON_ONCE(!inode_is_locked(parent->d_inode)); - - mutex_lock(&eventfs_mutex); - if (pei->is_freed || ei->is_freed) { - mutex_unlock(&eventfs_mutex); - return NULL; - } - if (ei->dentry) { - /* If the eventfs_inode already has a dentry, use it */ - dentry = ei->dentry; - dget(dentry); - mutex_unlock(&eventfs_mutex); - return dentry; - } - mutex_unlock(&eventfs_mutex); - - dentry = create_dir(ei, parent); - - mutex_lock(&eventfs_mutex); - - if (IS_ERR_OR_NULL(dentry) && !ei->is_freed) { - /* - * When the mutex was released, something else could have - * created the dentry for this e_dentry. In which case - * use that one. - * - * If ei->is_freed is set, the e_dentry is currently on its - * way to being freed. - */ - dentry = ei->dentry; - if (dentry) - dget(dentry); - mutex_unlock(&eventfs_mutex); - return dentry; - } - - if (!ei->dentry && !ei->is_freed) { - ei->dentry = dentry; - eventfs_post_create_dir(ei); - dentry->d_fsdata = ei; - } else { - /* - * Should never happen unless we get here due to being freed. - * Otherwise it means two dentries exist with the same name. - */ - WARN_ON_ONCE(!ei->is_freed); - dentry = NULL; - } - mutex_unlock(&eventfs_mutex); + *e_dentry = dentry; // Remove me
return dentry; } @@ -607,79 +462,49 @@ static struct dentry *eventfs_root_looku struct dentry *dentry, unsigned int flags) { - const struct file_operations *fops; - const struct eventfs_entry *entry; struct eventfs_inode *ei_child; struct tracefs_inode *ti; struct eventfs_inode *ei; - struct dentry *ei_dentry = NULL; - struct dentry *ret = NULL; - struct dentry *d; const char *name = dentry->d_name.name; - umode_t mode; - void *data; - int idx; - int i; - int r;
ti = get_tracefs(dir); if (!(ti->flags & TRACEFS_EVENT_INODE)) - return NULL; + return ERR_PTR(-EIO);
- /* Grab srcu to prevent the ei from going away */ - idx = srcu_read_lock(&eventfs_srcu); - - /* - * Grab the eventfs_mutex to consistent value from ti->private. - * This s - */ mutex_lock(&eventfs_mutex); - ei = READ_ONCE(ti->private); - if (ei && !ei->is_freed) - ei_dentry = READ_ONCE(ei->dentry); - mutex_unlock(&eventfs_mutex);
- if (!ei || !ei_dentry) + ei = ti->private; + if (!ei || ei->is_freed) goto out;
- data = ei->data; - - list_for_each_entry_srcu(ei_child, &ei->children, list, - srcu_read_lock_held(&eventfs_srcu)) { + list_for_each_entry(ei_child, &ei->children, list) { if (strcmp(ei_child->name, name) != 0) continue; - ret = simple_lookup(dir, dentry, flags); - if (IS_ERR(ret)) + if (ei_child->is_freed) goto out; - d = create_dir_dentry(ei, ei_child, ei_dentry); - dput(d); + lookup_dir_entry(dentry, ei, ei_child); goto out; }
- for (i = 0; i < ei->nr_entries; i++) { - entry = &ei->entries[i]; - if (strcmp(name, entry->name) == 0) { - void *cdata = data; - mutex_lock(&eventfs_mutex); - /* If ei->is_freed, then the event itself may be too */ - if (!ei->is_freed) - r = entry->callback(name, &mode, &cdata, &fops); - else - r = -1; - mutex_unlock(&eventfs_mutex); - if (r <= 0) - continue; - ret = simple_lookup(dir, dentry, flags); - if (IS_ERR(ret)) - goto out; - d = create_file_dentry(ei, i, ei_dentry, name, mode, cdata, fops); - dput(d); - break; - } + for (int i = 0; i < ei->nr_entries; i++) { + void *data; + umode_t mode; + const struct file_operations *fops; + const struct eventfs_entry *entry = &ei->entries[i]; + + if (strcmp(name, entry->name) != 0) + continue; + + data = ei->data; + if (entry->callback(name, &mode, &data, &fops) <= 0) + goto out; + + lookup_file_dentry(dentry, ei, i, mode, data, fops); + goto out; } out: - srcu_read_unlock(&eventfs_srcu, idx); - return ret; + mutex_unlock(&eventfs_mutex); + return NULL; }
/* --- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -495,75 +495,6 @@ struct dentry *tracefs_end_creating(stru return dentry; }
-/** - * eventfs_start_creating - start the process of creating a dentry - * @name: Name of the file created for the dentry - * @parent: The parent dentry where this dentry will be created - * - * This is a simple helper function for the dynamically created eventfs - * files. When the directory of the eventfs files are accessed, their - * dentries are created on the fly. This function is used to start that - * process. - */ -struct dentry *eventfs_start_creating(const char *name, struct dentry *parent) -{ - struct dentry *dentry; - int error; - - /* Must always have a parent. */ - if (WARN_ON_ONCE(!parent)) - return ERR_PTR(-EINVAL); - - error = simple_pin_fs(&trace_fs_type, &tracefs_mount, - &tracefs_mount_count); - if (error) - return ERR_PTR(error); - - if (unlikely(IS_DEADDIR(parent->d_inode))) - dentry = ERR_PTR(-ENOENT); - else - dentry = lookup_one_len(name, parent, strlen(name)); - - if (!IS_ERR(dentry) && dentry->d_inode) { - dput(dentry); - dentry = ERR_PTR(-EEXIST); - } - - if (IS_ERR(dentry)) - simple_release_fs(&tracefs_mount, &tracefs_mount_count); - - return dentry; -} - -/** - * eventfs_failed_creating - clean up a failed eventfs dentry creation - * @dentry: The dentry to clean up - * - * If after calling eventfs_start_creating(), a failure is detected, the - * resources created by eventfs_start_creating() needs to be cleaned up. In - * that case, this function should be called to perform that clean up. - */ -struct dentry *eventfs_failed_creating(struct dentry *dentry) -{ - dput(dentry); - simple_release_fs(&tracefs_mount, &tracefs_mount_count); - return NULL; -} - -/** - * eventfs_end_creating - Finish the process of creating a eventfs dentry - * @dentry: The dentry that has successfully been created. - * - * This function is currently just a place holder to match - * eventfs_start_creating(). In case any synchronization needs to be added, - * this function will be used to implement that without having to modify - * the callers of eventfs_start_creating(). - */ -struct dentry *eventfs_end_creating(struct dentry *dentry) -{ - return dentry; -} - /* Find the inode that this will use for default */ static struct inode *instance_inode(struct dentry *parent, struct inode *inode) { --- a/fs/tracefs/internal.h +++ b/fs/tracefs/internal.h @@ -80,9 +80,6 @@ struct dentry *tracefs_start_creating(co struct dentry *tracefs_end_creating(struct dentry *dentry); struct dentry *tracefs_failed_creating(struct dentry *dentry); struct inode *tracefs_get_inode(struct super_block *sb); -struct dentry *eventfs_start_creating(const char *name, struct dentry *parent); -struct dentry *eventfs_failed_creating(struct dentry *dentry); -struct dentry *eventfs_end_creating(struct dentry *dentry); void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry);
#endif /* _TRACEFS_INTERNAL_H */
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Linus Torvalds torvalds@linux-foundation.org
commit 408600be78cdb8c650a97ecc7ff411cb216811b5 upstream.
It's never used
Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang... Link: https://lore.kernel.org/linux-trace-kernel/20240131185512.961772428@goodmis....
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Christian Brauner brauner@kernel.org Cc: Al Viro viro@ZenIV.linux.org.uk Cc: Ajay Kaher ajay.kaher@broadcom.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions") Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/tracefs/event_inode.c | 4 +--- fs/tracefs/internal.h | 2 -- 2 files changed, 1 insertion(+), 5 deletions(-)
--- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -680,10 +680,8 @@ struct eventfs_inode *eventfs_create_dir INIT_LIST_HEAD(&ei->list);
mutex_lock(&eventfs_mutex); - if (!parent->is_freed) { + if (!parent->is_freed) list_add_tail(&ei->list, &parent->children); - ei->d_parent = parent->dentry; - } mutex_unlock(&eventfs_mutex);
/* Was the parent freed? */ --- a/fs/tracefs/internal.h +++ b/fs/tracefs/internal.h @@ -36,7 +36,6 @@ struct eventfs_attr { * @name: the name of the directory to create * @children: link list into the child eventfs_inode * @dentry: the dentry of the directory - * @d_parent: pointer to the parent's dentry * @d_children: The array of dentries to represent the files when created * @entry_attrs: Saved mode and ownership of the @d_children * @attr: Saved mode and ownership of eventfs_inode itself @@ -51,7 +50,6 @@ struct eventfs_inode { const char *name; struct list_head children; struct dentry *dentry; /* Check is_freed to access */ - struct dentry *d_parent; struct dentry **d_children; struct eventfs_attr *entry_attrs; struct eventfs_attr attr;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Linus Torvalds torvalds@linux-foundation.org
commit 8dce06e98c70a7fcbb4bca7d90faf40522e65c58 upstream.
In order for the dentries to stay up-to-date with the eventfs changes, just add a 'd_revalidate' function that checks the 'is_freed' bit.
Also, clean up the dentry release to actually use d_release() rather than the slightly odd d_iput() function. We don't care about the inode, all we want to do is to get rid of the refcount to the eventfs data added by dentry->d_fsdata.
It would probably be cleaner to make eventfs its own filesystem, or at least set its own dentry ops when looking up eventfs files. But as it is, only eventfs dentries use d_fsdata, so we don't really need to split these things up by use.
Another thing that might be worth doing is to make all eventfs lookups mark their dentries as not worth caching. We could do that with d_delete(), but the DCACHE_DONTCACHE flag would likely be even better.
As it is, the dentries are all freeable, but they only tend to get freed at memory pressure rather than more proactively. But that's a separate issue.
Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang... Link: https://lore.kernel.org/linux-trace-kernel/20240131185513.124644253@goodmis....
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Christian Brauner brauner@kernel.org Cc: Al Viro viro@ZenIV.linux.org.uk Cc: Ajay Kaher ajay.kaher@broadcom.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions") Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/tracefs/event_inode.c | 5 ++--- fs/tracefs/inode.c | 27 ++++++++++++++++++--------- fs/tracefs/internal.h | 3 ++- 3 files changed, 22 insertions(+), 13 deletions(-)
--- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -378,13 +378,12 @@ static void free_ei(struct eventfs_inode }
/** - * eventfs_set_ei_status_free - remove the dentry reference from an eventfs_inode - * @ti: the tracefs_inode of the dentry + * eventfs_d_release - dentry is going away * @dentry: dentry which has the reference to remove. * * Remove the association between a dentry from an eventfs_inode. */ -void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry) +void eventfs_d_release(struct dentry *dentry) { struct eventfs_inode *ei; int i; --- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -377,21 +377,30 @@ static const struct super_operations tra .show_options = tracefs_show_options, };
-static void tracefs_dentry_iput(struct dentry *dentry, struct inode *inode) +/* + * It would be cleaner if eventfs had its own dentry ops. + * + * Note that d_revalidate is called potentially under RCU, + * so it can't take the eventfs mutex etc. It's fine - if + * we open a file just as it's marked dead, things will + * still work just fine, and just see the old stale case. + */ +static void tracefs_d_release(struct dentry *dentry) { - struct tracefs_inode *ti; + if (dentry->d_fsdata) + eventfs_d_release(dentry); +}
- if (!dentry || !inode) - return; +static int tracefs_d_revalidate(struct dentry *dentry, unsigned int flags) +{ + struct eventfs_inode *ei = dentry->d_fsdata;
- ti = get_tracefs(inode); - if (ti && ti->flags & TRACEFS_EVENT_INODE) - eventfs_set_ei_status_free(ti, dentry); - iput(inode); + return !(ei && ei->is_freed); }
static const struct dentry_operations tracefs_dentry_operations = { - .d_iput = tracefs_dentry_iput, + .d_revalidate = tracefs_d_revalidate, + .d_release = tracefs_d_release, };
static int trace_fill_super(struct super_block *sb, void *data, int silent) --- a/fs/tracefs/internal.h +++ b/fs/tracefs/internal.h @@ -78,6 +78,7 @@ struct dentry *tracefs_start_creating(co struct dentry *tracefs_end_creating(struct dentry *dentry); struct dentry *tracefs_failed_creating(struct dentry *dentry); struct inode *tracefs_get_inode(struct super_block *sb); -void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry); + +void eventfs_d_release(struct dentry *dentry);
#endif /* _TRACEFS_INTERNAL_H */
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Linus Torvalds torvalds@linux-foundation.org
commit 43aa6f97c2d03a52c1ddb86768575fc84344bdbb upstream.
The eventfs inode had pointers to dentries (and child dentries) without actually holding a refcount on said pointer. That is fundamentally broken, and while eventfs tried to then maintain coherence with dentries going away by hooking into the '.d_iput' callback, that doesn't actually work since it's not ordered wrt lookups.
There were two reasonms why eventfs tried to keep a pointer to a dentry:
- the creation of a 'events' directory would actually have a stable dentry pointer that it created with tracefs_start_creating().
And it needed that dentry when tearing it all down again in eventfs_remove_events_dir().
This use is actually ok, because the special top-level events directory dentries are actually stable, not just a temporary cache of the eventfs data structures.
- the 'eventfs_inode' (aka ei) needs to stay around as long as there are dentries that refer to it.
It then used these dentry pointers as a replacement for doing reference counting: it would try to make sure that there was only ever one dentry associated with an event_inode, and keep a child dentry array around to see which dentries might still refer to the parent ei.
This gets rid of the invalid dentry pointer use, and renames the one valid case to a different name to make it clear that it's not just any random dentry.
The magic child dentry array that is kind of a "reverse reference list" is simply replaced by having child dentries take a ref to the ei. As does the directory dentries. That makes the broken use case go away.
Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang... Link: https://lore.kernel.org/linux-trace-kernel/20240131185513.280463000@goodmis....
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Christian Brauner brauner@kernel.org Cc: Al Viro viro@ZenIV.linux.org.uk Cc: Ajay Kaher ajay.kaher@broadcom.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions") Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/tracefs/event_inode.c | 248 ++++++++++++++--------------------------------- fs/tracefs/internal.h | 7 - 2 files changed, 78 insertions(+), 177 deletions(-)
--- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -62,6 +62,35 @@ enum {
#define EVENTFS_MODE_MASK (EVENTFS_SAVE_MODE - 1)
+/* + * eventfs_inode reference count management. + * + * NOTE! We count only references from dentries, in the + * form 'dentry->d_fsdata'. There are also references from + * directory inodes ('ti->private'), but the dentry reference + * count is always a superset of the inode reference count. + */ +static void release_ei(struct kref *ref) +{ + struct eventfs_inode *ei = container_of(ref, struct eventfs_inode, kref); + kfree(ei->entry_attrs); + kfree_const(ei->name); + kfree_rcu(ei, rcu); +} + +static inline void put_ei(struct eventfs_inode *ei) +{ + if (ei) + kref_put(&ei->kref, release_ei); +} + +static inline struct eventfs_inode *get_ei(struct eventfs_inode *ei) +{ + if (ei) + kref_get(&ei->kref); + return ei; +} + static struct dentry *eventfs_root_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags); @@ -289,7 +318,8 @@ static void update_inode_attr(struct den * directory. The inode.i_private pointer will point to @data in the open() * call. */ -static struct dentry *lookup_file(struct dentry *dentry, +static struct dentry *lookup_file(struct eventfs_inode *parent_ei, + struct dentry *dentry, umode_t mode, struct eventfs_attr *attr, void *data, @@ -302,7 +332,7 @@ static struct dentry *lookup_file(struct mode |= S_IFREG;
if (WARN_ON_ONCE(!S_ISREG(mode))) - return NULL; + return ERR_PTR(-EIO);
inode = tracefs_get_inode(dentry->d_sb); if (unlikely(!inode)) @@ -321,9 +351,12 @@ static struct dentry *lookup_file(struct ti = get_tracefs(inode); ti->flags |= TRACEFS_EVENT_INODE;
+ // Files have their parent's ei as their fsdata + dentry->d_fsdata = get_ei(parent_ei); + d_add(dentry, inode); fsnotify_create(dentry->d_parent->d_inode, dentry); - return dentry; + return NULL; };
/** @@ -359,22 +392,29 @@ static struct dentry *lookup_dir_entry(s /* Only directories have ti->private set to an ei, not files */ ti->private = ei;
- dentry->d_fsdata = ei; - ei->dentry = dentry; // Remove me! + dentry->d_fsdata = get_ei(ei);
inc_nlink(inode); d_add(dentry, inode); inc_nlink(dentry->d_parent->d_inode); fsnotify_mkdir(dentry->d_parent->d_inode, dentry); - return dentry; + return NULL; }
-static void free_ei(struct eventfs_inode *ei) +static inline struct eventfs_inode *alloc_ei(const char *name) { - kfree_const(ei->name); - kfree(ei->d_children); - kfree(ei->entry_attrs); - kfree(ei); + struct eventfs_inode *ei = kzalloc(sizeof(*ei), GFP_KERNEL); + + if (!ei) + return NULL; + + ei->name = kstrdup_const(name, GFP_KERNEL); + if (!ei->name) { + kfree(ei); + return NULL; + } + kref_init(&ei->kref); + return ei; }
/** @@ -385,39 +425,13 @@ static void free_ei(struct eventfs_inode */ void eventfs_d_release(struct dentry *dentry) { - struct eventfs_inode *ei; - int i; - - mutex_lock(&eventfs_mutex); - - ei = dentry->d_fsdata; - if (!ei) - goto out; - - /* This could belong to one of the files of the ei */ - if (ei->dentry != dentry) { - for (i = 0; i < ei->nr_entries; i++) { - if (ei->d_children[i] == dentry) - break; - } - if (WARN_ON_ONCE(i == ei->nr_entries)) - goto out; - ei->d_children[i] = NULL; - } else if (ei->is_freed) { - free_ei(ei); - } else { - ei->dentry = NULL; - } - - dentry->d_fsdata = NULL; - out: - mutex_unlock(&eventfs_mutex); + put_ei(dentry->d_fsdata); }
/** * lookup_file_dentry - create a dentry for a file of an eventfs_inode * @ei: the eventfs_inode that the file will be created under - * @idx: the index into the d_children[] of the @ei + * @idx: the index into the entry_attrs[] of the @ei * @parent: The parent dentry of the created file. * @name: The name of the file to create * @mode: The mode of the file. @@ -434,17 +448,11 @@ lookup_file_dentry(struct dentry *dentry const struct file_operations *fops) { struct eventfs_attr *attr = NULL; - struct dentry **e_dentry = &ei->d_children[idx];
if (ei->entry_attrs) attr = &ei->entry_attrs[idx];
- dentry->d_fsdata = ei; // NOTE: ei of _parent_ - lookup_file(dentry, mode, attr, data, fops); - - *e_dentry = dentry; // Remove me - - return dentry; + return lookup_file(ei, dentry, mode, attr, data, fops); }
/** @@ -465,6 +473,7 @@ static struct dentry *eventfs_root_looku struct tracefs_inode *ti; struct eventfs_inode *ei; const char *name = dentry->d_name.name; + struct dentry *result = NULL;
ti = get_tracefs(dir); if (!(ti->flags & TRACEFS_EVENT_INODE)) @@ -481,7 +490,7 @@ static struct dentry *eventfs_root_looku continue; if (ei_child->is_freed) goto out; - lookup_dir_entry(dentry, ei, ei_child); + result = lookup_dir_entry(dentry, ei, ei_child); goto out; }
@@ -498,12 +507,12 @@ static struct dentry *eventfs_root_looku if (entry->callback(name, &mode, &data, &fops) <= 0) goto out;
- lookup_file_dentry(dentry, ei, i, mode, data, fops); + result = lookup_file_dentry(dentry, ei, i, mode, data, fops); goto out; } out: mutex_unlock(&eventfs_mutex); - return NULL; + return result; }
/* @@ -653,25 +662,10 @@ struct eventfs_inode *eventfs_create_dir if (!parent) return ERR_PTR(-EINVAL);
- ei = kzalloc(sizeof(*ei), GFP_KERNEL); + ei = alloc_ei(name); if (!ei) return ERR_PTR(-ENOMEM);
- ei->name = kstrdup_const(name, GFP_KERNEL); - if (!ei->name) { - kfree(ei); - return ERR_PTR(-ENOMEM); - } - - if (size) { - ei->d_children = kcalloc(size, sizeof(*ei->d_children), GFP_KERNEL); - if (!ei->d_children) { - kfree_const(ei->name); - kfree(ei); - return ERR_PTR(-ENOMEM); - } - } - ei->entries = entries; ei->nr_entries = size; ei->data = data; @@ -685,7 +679,7 @@ struct eventfs_inode *eventfs_create_dir
/* Was the parent freed? */ if (list_empty(&ei->list)) { - free_ei(ei); + put_ei(ei); ei = NULL; } return ei; @@ -720,28 +714,20 @@ struct eventfs_inode *eventfs_create_eve if (IS_ERR(dentry)) return ERR_CAST(dentry);
- ei = kzalloc(sizeof(*ei), GFP_KERNEL); + ei = alloc_ei(name); if (!ei) - goto fail_ei; + goto fail;
inode = tracefs_get_inode(dentry->d_sb); if (unlikely(!inode)) goto fail;
- if (size) { - ei->d_children = kcalloc(size, sizeof(*ei->d_children), GFP_KERNEL); - if (!ei->d_children) - goto fail; - } - - ei->dentry = dentry; + // Note: we have a ref to the dentry from tracefs_start_creating() + ei->events_dir = dentry; ei->entries = entries; ei->nr_entries = size; ei->is_events = 1; ei->data = data; - ei->name = kstrdup_const(name, GFP_KERNEL); - if (!ei->name) - goto fail;
/* Save the ownership of this directory */ uid = d_inode(dentry->d_parent)->i_uid; @@ -772,7 +758,7 @@ struct eventfs_inode *eventfs_create_eve inode->i_op = &eventfs_root_dir_inode_operations; inode->i_fop = &eventfs_file_operations;
- dentry->d_fsdata = ei; + dentry->d_fsdata = get_ei(ei);
/* directory inodes start off with i_nlink == 2 (for "." entry) */ inc_nlink(inode); @@ -784,72 +770,11 @@ struct eventfs_inode *eventfs_create_eve return ei;
fail: - kfree(ei->d_children); - kfree(ei); - fail_ei: + put_ei(ei); tracefs_failed_creating(dentry); return ERR_PTR(-ENOMEM); }
-static LLIST_HEAD(free_list); - -static void eventfs_workfn(struct work_struct *work) -{ - struct eventfs_inode *ei, *tmp; - struct llist_node *llnode; - - llnode = llist_del_all(&free_list); - llist_for_each_entry_safe(ei, tmp, llnode, llist) { - /* This dput() matches the dget() from unhook_dentry() */ - for (int i = 0; i < ei->nr_entries; i++) { - if (ei->d_children[i]) - dput(ei->d_children[i]); - } - /* This should only get here if it had a dentry */ - if (!WARN_ON_ONCE(!ei->dentry)) - dput(ei->dentry); - } -} - -static DECLARE_WORK(eventfs_work, eventfs_workfn); - -static void free_rcu_ei(struct rcu_head *head) -{ - struct eventfs_inode *ei = container_of(head, struct eventfs_inode, rcu); - - if (ei->dentry) { - /* Do not free the ei until all references of dentry are gone */ - if (llist_add(&ei->llist, &free_list)) - queue_work(system_unbound_wq, &eventfs_work); - return; - } - - /* If the ei doesn't have a dentry, neither should its children */ - for (int i = 0; i < ei->nr_entries; i++) { - WARN_ON_ONCE(ei->d_children[i]); - } - - free_ei(ei); -} - -static void unhook_dentry(struct dentry *dentry) -{ - if (!dentry) - return; - /* - * Need to add a reference to the dentry that is expected by - * simple_recursive_removal(), which will include a dput(). - */ - dget(dentry); - - /* - * Also add a reference for the dput() in eventfs_workfn(). - * That is required as that dput() will free the ei after - * the SRCU grace period is over. - */ - dget(dentry); -} - /** * eventfs_remove_rec - remove eventfs dir or file from list * @ei: eventfs_inode to be removed. @@ -862,8 +787,6 @@ static void eventfs_remove_rec(struct ev { struct eventfs_inode *ei_child;
- if (!ei) - return; /* * Check recursion depth. It should never be greater than 3: * 0 - events/ @@ -875,28 +798,12 @@ static void eventfs_remove_rec(struct ev return;
/* search for nested folders or files */ - list_for_each_entry_srcu(ei_child, &ei->children, list, - lockdep_is_held(&eventfs_mutex)) { - /* Children only have dentry if parent does */ - WARN_ON_ONCE(ei_child->dentry && !ei->dentry); + list_for_each_entry(ei_child, &ei->children, list) eventfs_remove_rec(ei_child, level + 1); - } -
ei->is_freed = 1; - - for (int i = 0; i < ei->nr_entries; i++) { - if (ei->d_children[i]) { - /* Children only have dentry if parent does */ - WARN_ON_ONCE(!ei->dentry); - unhook_dentry(ei->d_children[i]); - } - } - - unhook_dentry(ei->dentry); - - list_del_rcu(&ei->list); - call_srcu(&eventfs_srcu, &ei->rcu, free_rcu_ei); + list_del(&ei->list); + put_ei(ei); }
/** @@ -907,22 +814,12 @@ static void eventfs_remove_rec(struct ev */ void eventfs_remove_dir(struct eventfs_inode *ei) { - struct dentry *dentry; - if (!ei) return;
mutex_lock(&eventfs_mutex); - dentry = ei->dentry; eventfs_remove_rec(ei, 0); mutex_unlock(&eventfs_mutex); - - /* - * If any of the ei children has a dentry, then the ei itself - * must have a dentry. - */ - if (dentry) - simple_recursive_removal(dentry, NULL); }
/** @@ -935,7 +832,11 @@ void eventfs_remove_events_dir(struct ev { struct dentry *dentry;
- dentry = ei->dentry; + dentry = ei->events_dir; + if (!dentry) + return; + + ei->events_dir = NULL; eventfs_remove_dir(ei);
/* @@ -945,5 +846,6 @@ void eventfs_remove_events_dir(struct ev * sticks around while the other ei->dentry are created * and destroyed dynamically. */ + d_invalidate(dentry); dput(dentry); } --- a/fs/tracefs/internal.h +++ b/fs/tracefs/internal.h @@ -35,8 +35,7 @@ struct eventfs_attr { * @entries: the array of entries representing the files in the directory * @name: the name of the directory to create * @children: link list into the child eventfs_inode - * @dentry: the dentry of the directory - * @d_children: The array of dentries to represent the files when created + * @events_dir: the dentry of the events directory * @entry_attrs: Saved mode and ownership of the @d_children * @attr: Saved mode and ownership of eventfs_inode itself * @data: The private data to pass to the callbacks @@ -45,12 +44,12 @@ struct eventfs_attr { * @nr_entries: The number of items in @entries */ struct eventfs_inode { + struct kref kref; struct list_head list; const struct eventfs_entry *entries; const char *name; struct list_head children; - struct dentry *dentry; /* Check is_freed to access */ - struct dentry **d_children; + struct dentry *events_dir; struct eventfs_attr *entry_attrs; struct eventfs_attr attr; void *data;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Steven Rostedt (Google)" rostedt@goodmis.org
commit 5a49f996046ba947466bc7461e4b19c4d1daf978 upstream.
There should never be a case where an evenfs_inode is being freed without is_freed being set. Add a WARN_ON_ONCE() if it ever happens. That would mean there was one too many put_ei()s.
Link: https://lore.kernel.org/linux-trace-kernel/20240201161616.843551963@goodmis....
Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Christian Brauner brauner@kernel.org Cc: Al Viro viro@ZenIV.linux.org.uk Cc: Ajay Kaher ajay.kaher@broadcom.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/tracefs/event_inode.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-)
--- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -73,6 +73,9 @@ enum { static void release_ei(struct kref *ref) { struct eventfs_inode *ei = container_of(ref, struct eventfs_inode, kref); + + WARN_ON_ONCE(!ei->is_freed); + kfree(ei->entry_attrs); kfree_const(ei->name); kfree_rcu(ei, rcu); @@ -84,6 +87,14 @@ static inline void put_ei(struct eventfs kref_put(&ei->kref, release_ei); }
+static inline void free_ei(struct eventfs_inode *ei) +{ + if (ei) { + ei->is_freed = 1; + put_ei(ei); + } +} + static inline struct eventfs_inode *get_ei(struct eventfs_inode *ei) { if (ei) @@ -679,7 +690,7 @@ struct eventfs_inode *eventfs_create_dir
/* Was the parent freed? */ if (list_empty(&ei->list)) { - put_ei(ei); + free_ei(ei); ei = NULL; } return ei; @@ -770,7 +781,7 @@ struct eventfs_inode *eventfs_create_eve return ei;
fail: - put_ei(ei); + free_ei(ei); tracefs_failed_creating(dentry); return ERR_PTR(-ENOMEM); } @@ -801,9 +812,8 @@ static void eventfs_remove_rec(struct ev list_for_each_entry(ei_child, &ei->children, list) eventfs_remove_rec(ei_child, level + 1);
- ei->is_freed = 1; list_del(&ei->list); - put_ei(ei); + free_ei(ei); }
/**
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Steven Rostedt (Google)" rostedt@goodmis.org
commit 264424dfdd5cbd92bc5b5ddf93944929fc877fac upstream.
Some of the eventfs_inode structure has holes in it. Rework the structure to be a bit more condensed, and also remove the no longer used llist field.
Link: https://lore.kernel.org/linux-trace-kernel/20240201161617.002321438@goodmis....
Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Christian Brauner brauner@kernel.org Cc: Al Viro viro@ZenIV.linux.org.uk Cc: Ajay Kaher ajay.kaher@broadcom.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/tracefs/internal.h | 27 ++++++++++++--------------- 1 file changed, 12 insertions(+), 15 deletions(-)
--- a/fs/tracefs/internal.h +++ b/fs/tracefs/internal.h @@ -32,40 +32,37 @@ struct eventfs_attr { /* * struct eventfs_inode - hold the properties of the eventfs directories. * @list: link list into the parent directory + * @rcu: Union with @list for freeing + * @children: link list into the child eventfs_inode * @entries: the array of entries representing the files in the directory * @name: the name of the directory to create - * @children: link list into the child eventfs_inode * @events_dir: the dentry of the events directory * @entry_attrs: Saved mode and ownership of the @d_children - * @attr: Saved mode and ownership of eventfs_inode itself * @data: The private data to pass to the callbacks + * @attr: Saved mode and ownership of eventfs_inode itself * @is_freed: Flag set if the eventfs is on its way to be freed * Note if is_freed is set, then dentry is corrupted. + * @is_events: Flag set for only the top level "events" directory * @nr_entries: The number of items in @entries + * @ino: The saved inode number */ struct eventfs_inode { - struct kref kref; - struct list_head list; + union { + struct list_head list; + struct rcu_head rcu; + }; + struct list_head children; const struct eventfs_entry *entries; const char *name; - struct list_head children; struct dentry *events_dir; struct eventfs_attr *entry_attrs; - struct eventfs_attr attr; void *data; + struct eventfs_attr attr; + struct kref kref; unsigned int is_freed:1; unsigned int is_events:1; unsigned int nr_entries:30; unsigned int ino; - /* - * Union - used for deletion - * @llist: for calling dput() if needed after RCU - * @rcu: eventfs_inode to delete in RCU - */ - union { - struct llist_node llist; - struct rcu_head rcu; - }; };
static inline struct tracefs_inode *get_tracefs(const struct inode *inode)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Steven Rostedt (Google)" rostedt@goodmis.org
commit 12d823b31fadf47c8f36ecada7abac5f903cac33 upstream.
The dentries and inodes are created when referenced in the lookup code. There's no reason to call fsnotify_*() functions when they are created by a reference. It doesn't make any sense.
Link: https://lore.kernel.org/linux-trace-kernel/20240201002719.GS2087318@ZenIV/ Link: https://lore.kernel.org/linux-trace-kernel/20240201161617.166973329@goodmis....
Cc: stable@vger.kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Christian Brauner brauner@kernel.org Cc: Al Viro viro@ZenIV.linux.org.uk Cc: Ajay Kaher ajay.kaher@broadcom.com Fixes: a376007917776 ("eventfs: Implement functions to create files and dirs when accessed"); Suggested-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/tracefs/event_inode.c | 2 -- 1 file changed, 2 deletions(-)
--- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -366,7 +366,6 @@ static struct dentry *lookup_file(struct dentry->d_fsdata = get_ei(parent_ei);
d_add(dentry, inode); - fsnotify_create(dentry->d_parent->d_inode, dentry); return NULL; };
@@ -408,7 +407,6 @@ static struct dentry *lookup_dir_entry(s inc_nlink(inode); d_add(dentry, inode); inc_nlink(dentry->d_parent->d_inode); - fsnotify_mkdir(dentry->d_parent->d_inode, dentry); return NULL; }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Steven Rostedt (Google)" rostedt@goodmis.org
commit ca185770db914869ff9fe773bac5e0e5e4165b83 upstream.
The directory link count in eventfs was somewhat bogus. It was only being updated when a directory child was being looked up and not on creation.
One solution would be to update in get_attr() the link count by iterating the ei->children list and then adding 2. But that could slow down simple stat() calls, especially if it's done on all directories in eventfs.
Another solution would be to add a parent pointer to the eventfs_inode and keep track of the number of sub directories it has on creation. But this adds overhead for something not really worthwhile.
The solution decided upon is to keep all directory links in eventfs as 1. This tells user space not to rely on the hard links of directories. Which in this case it shouldn't.
Link: https://lore.kernel.org/linux-trace-kernel/20240201002719.GS2087318@ZenIV/ Link: https://lore.kernel.org/linux-trace-kernel/20240201161617.339968298@goodmis....
Cc: stable@vger.kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Christian Brauner brauner@kernel.org Cc: Al Viro viro@ZenIV.linux.org.uk Cc: Ajay Kaher ajay.kaher@broadcom.com Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions") Suggested-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/tracefs/event_inode.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-)
--- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -404,9 +404,7 @@ static struct dentry *lookup_dir_entry(s
dentry->d_fsdata = get_ei(ei);
- inc_nlink(inode); d_add(dentry, inode); - inc_nlink(dentry->d_parent->d_inode); return NULL; }
@@ -769,9 +767,17 @@ struct eventfs_inode *eventfs_create_eve
dentry->d_fsdata = get_ei(ei);
- /* directory inodes start off with i_nlink == 2 (for "." entry) */ - inc_nlink(inode); + /* + * Keep all eventfs directories with i_nlink == 1. + * Due to the dynamic nature of the dentry creations and not + * wanting to add a pointer to the parent eventfs_inode in the + * eventfs_inode structure, keeping the i_nlink in sync with the + * number of directories would cause too much complexity for + * something not worth much. Keeping directory links at 1 + * tells userspace not to trust the link number. + */ d_instantiate(dentry, inode); + /* The dentry of the "events" parent does keep track though */ inc_nlink(dentry->d_parent->d_inode); fsnotify_mkdir(dentry->d_parent->d_inode, dentry); tracefs_end_creating(dentry);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oleg Nesterov oleg@redhat.com
commit daa694e4137571b4ebec330f9a9b4d54aa8b8089 upstream.
Patch series "getrusage: use sig->stats_lock", v2.
This patch (of 2):
thread_group_cputime() does its own locking, we can safely shift thread_group_cputime_adjusted() which does another for_each_thread loop outside of ->siglock protected section.
This is also preparation for the next patch which changes getrusage() to use stats_lock instead of siglock, thread_group_cputime() takes the same lock. With the current implementation recursive read_seqbegin_or_lock() is fine, thread_group_cputime() can't enter the slow mode if the caller holds stats_lock, yet this looks more safe and better performance-wise.
Link: https://lkml.kernel.org/r/20240122155023.GA26169@redhat.com Link: https://lkml.kernel.org/r/20240122155050.GA26205@redhat.com Signed-off-by: Oleg Nesterov oleg@redhat.com Reported-by: Dylan Hatch dylanbhatch@google.com Tested-by: Dylan Hatch dylanbhatch@google.com Cc: Eric W. Biederman ebiederm@xmission.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/sys.c | 36 ++++++++++++++++++++---------------- 1 file changed, 20 insertions(+), 16 deletions(-)
--- a/kernel/sys.c +++ b/kernel/sys.c @@ -1785,17 +1785,19 @@ void getrusage(struct task_struct *p, in struct task_struct *t; unsigned long flags; u64 tgutime, tgstime, utime, stime; - unsigned long maxrss = 0; + unsigned long maxrss; + struct mm_struct *mm; struct signal_struct *sig = p->signal;
- memset((char *)r, 0, sizeof (*r)); + memset(r, 0, sizeof(*r)); utime = stime = 0; + maxrss = 0;
if (who == RUSAGE_THREAD) { task_cputime_adjusted(current, &utime, &stime); accumulate_thread_rusage(p, r); maxrss = sig->maxrss; - goto out; + goto out_thread; }
if (!lock_task_sighand(p, &flags)) @@ -1819,9 +1821,6 @@ void getrusage(struct task_struct *p, in fallthrough;
case RUSAGE_SELF: - thread_group_cputime_adjusted(p, &tgutime, &tgstime); - utime += tgutime; - stime += tgstime; r->ru_nvcsw += sig->nvcsw; r->ru_nivcsw += sig->nivcsw; r->ru_minflt += sig->min_flt; @@ -1839,19 +1838,24 @@ void getrusage(struct task_struct *p, in } unlock_task_sighand(p, &flags);
-out: - r->ru_utime = ns_to_kernel_old_timeval(utime); - r->ru_stime = ns_to_kernel_old_timeval(stime); - - if (who != RUSAGE_CHILDREN) { - struct mm_struct *mm = get_task_mm(p); + if (who == RUSAGE_CHILDREN) + goto out_children;
- if (mm) { - setmax_mm_hiwater_rss(&maxrss, mm); - mmput(mm); - } + thread_group_cputime_adjusted(p, &tgutime, &tgstime); + utime += tgutime; + stime += tgstime; + +out_thread: + mm = get_task_mm(p); + if (mm) { + setmax_mm_hiwater_rss(&maxrss, mm); + mmput(mm); } + +out_children: r->ru_maxrss = maxrss * (PAGE_SIZE / 1024); /* convert pages to KBs */ + r->ru_utime = ns_to_kernel_old_timeval(utime); + r->ru_stime = ns_to_kernel_old_timeval(stime); }
SYSCALL_DEFINE2(getrusage, int, who, struct rusage __user *, ru)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oleg Nesterov oleg@redhat.com
commit f7ec1cd5cc7ef3ad964b677ba82b8b77f1c93009 upstream.
lock_task_sighand() can trigger a hard lockup. If NR_CPUS threads call getrusage() at the same time and the process has NR_THREADS, spin_lock_irq will spin with irqs disabled O(NR_CPUS * NR_THREADS) time.
Change getrusage() to use sig->stats_lock, it was specifically designed for this type of use. This way it runs lockless in the likely case.
TODO: - Change do_task_stat() to use sig->stats_lock too, then we can remove spin_lock_irq(siglock) in wait_task_zombie().
- Turn sig->stats_lock into seqcount_rwlock_t, this way the readers in the slow mode won't exclude each other. See https://lore.kernel.org/all/20230913154907.GA26210@redhat.com/
- stats_lock has to disable irqs because ->siglock can be taken in irq context, it would be very nice to change __exit_signal() to avoid the siglock->stats_lock dependency.
Link: https://lkml.kernel.org/r/20240122155053.GA26214@redhat.com Signed-off-by: Oleg Nesterov oleg@redhat.com Reported-by: Dylan Hatch dylanbhatch@google.com Tested-by: Dylan Hatch dylanbhatch@google.com Cc: Eric W. Biederman ebiederm@xmission.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/sys.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-)
--- a/kernel/sys.c +++ b/kernel/sys.c @@ -1788,7 +1788,9 @@ void getrusage(struct task_struct *p, in unsigned long maxrss; struct mm_struct *mm; struct signal_struct *sig = p->signal; + unsigned int seq = 0;
+retry: memset(r, 0, sizeof(*r)); utime = stime = 0; maxrss = 0; @@ -1800,8 +1802,7 @@ void getrusage(struct task_struct *p, in goto out_thread; }
- if (!lock_task_sighand(p, &flags)) - return; + flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq);
switch (who) { case RUSAGE_BOTH: @@ -1829,14 +1830,23 @@ void getrusage(struct task_struct *p, in r->ru_oublock += sig->oublock; if (maxrss < sig->maxrss) maxrss = sig->maxrss; + + rcu_read_lock(); __for_each_thread(sig, t) accumulate_thread_rusage(t, r); + rcu_read_unlock(); + break;
default: BUG(); } - unlock_task_sighand(p, &flags); + + if (need_seqretry(&sig->stats_lock, seq)) { + seq = 1; + goto retry; + } + done_seqretry_irqrestore(&sig->stats_lock, seq, flags);
if (who == RUSAGE_CHILDREN) goto out_children;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fedor Pchelkin pchelkin@ispras.ru
commit 108a020c64434fed4b69762879d78cd24088b4c7 upstream.
ksmbd_iov_pin_rsp_read() doesn't free the provided aux buffer if it fails. Seems to be the caller's responsibility to clear the buffer in error case.
Found by Linux Verification Center (linuxtesting.org).
Fixes: e2b76ab8b5c9 ("ksmbd: add support for read compound") Cc: stable@vger.kernel.org Signed-off-by: Fedor Pchelkin pchelkin@ispras.ru Acked-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/server/smb2pdu.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/fs/smb/server/smb2pdu.c +++ b/fs/smb/server/smb2pdu.c @@ -6173,8 +6173,10 @@ static noinline int smb2_read_pipe(struc err = ksmbd_iov_pin_rsp_read(work, (void *)rsp, offsetof(struct smb2_read_rsp, Buffer), aux_payload_buf, nbytes); - if (err) + if (err) { + kvfree(aux_payload_buf); goto out; + } kvfree(rpc_resp); } else { err = ksmbd_iov_pin_rsp(work, (void *)rsp, @@ -6384,8 +6386,10 @@ int smb2_read(struct ksmbd_work *work) err = ksmbd_iov_pin_rsp_read(work, (void *)rsp, offsetof(struct smb2_read_rsp, Buffer), aux_payload_buf, nbytes); - if (err) + if (err) { + kvfree(aux_payload_buf); goto out; + } ksmbd_fd_put(work, fp); return 0;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Timur Tabi ttabi@nvidia.com
commit 042b5f83841fbf7ce39474412db3b5e4765a7ea7 upstream.
Nouveau manages GSP-RM DMA buffers with nvkm_gsp_mem objects. Several of these buffers are never dealloced. Some of them can be deallocated right after GSP-RM is initialized, but the rest need to stay until the driver unloads.
Also futher bullet-proof these objects by poisoning the buffer and clearing the nvkm_gsp_mem object when it is deallocated. Poisoning the buffer should trigger an error (or crash) from GSP-RM if it tries to access the buffer after we've deallocated it, because we were wrong about when it is safe to deallocate.
Finally, change the mem->size field to a size_t because that's the same type that dma_alloc_coherent expects.
Cc: stable@vger.kernel.org # v6.7 Fixes: 176fdcbddfd2 ("drm/nouveau/gsp/r535: add support for booting GSP-RM") Signed-off-by: Timur Tabi ttabi@nvidia.com Signed-off-by: Danilo Krummrich dakr@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20240202230608.1981026-1-ttabi... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- .../gpu/drm/nouveau/include/nvkm/subdev/gsp.h | 2 +- .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 59 ++++++++++++------- 2 files changed, 39 insertions(+), 22 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h index d1437c08645f..6f5d376d8fcc 100644 --- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h +++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h @@ -9,7 +9,7 @@ #define GSP_PAGE_SIZE BIT(GSP_PAGE_SHIFT)
struct nvkm_gsp_mem { - u32 size; + size_t size; void *data; dma_addr_t addr; }; diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c index 5e1fa176aac4..6208ddd92964 100644 --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c @@ -997,6 +997,32 @@ r535_gsp_rpc_get_gsp_static_info(struct nvkm_gsp *gsp) return 0; }
+static void +nvkm_gsp_mem_dtor(struct nvkm_gsp *gsp, struct nvkm_gsp_mem *mem) +{ + if (mem->data) { + /* + * Poison the buffer to catch any unexpected access from + * GSP-RM if the buffer was prematurely freed. + */ + memset(mem->data, 0xFF, mem->size); + + dma_free_coherent(gsp->subdev.device->dev, mem->size, mem->data, mem->addr); + memset(mem, 0, sizeof(*mem)); + } +} + +static int +nvkm_gsp_mem_ctor(struct nvkm_gsp *gsp, size_t size, struct nvkm_gsp_mem *mem) +{ + mem->size = size; + mem->data = dma_alloc_coherent(gsp->subdev.device->dev, size, &mem->addr, GFP_KERNEL); + if (WARN_ON(!mem->data)) + return -ENOMEM; + + return 0; +} + static int r535_gsp_postinit(struct nvkm_gsp *gsp) { @@ -1024,6 +1050,13 @@ r535_gsp_postinit(struct nvkm_gsp *gsp)
nvkm_inth_allow(&gsp->subdev.inth); nvkm_wr32(device, 0x110004, 0x00000040); + + /* Release the DMA buffers that were needed only for boot and init */ + nvkm_gsp_mem_dtor(gsp, &gsp->boot.fw); + nvkm_gsp_mem_dtor(gsp, &gsp->libos); + nvkm_gsp_mem_dtor(gsp, &gsp->rmargs); + nvkm_gsp_mem_dtor(gsp, &gsp->wpr_meta); + return ret; }
@@ -1532,27 +1565,6 @@ r535_gsp_msg_run_cpu_sequencer(void *priv, u32 fn, void *repv, u32 repc) return 0; }
-static void -nvkm_gsp_mem_dtor(struct nvkm_gsp *gsp, struct nvkm_gsp_mem *mem) -{ - if (mem->data) { - dma_free_coherent(gsp->subdev.device->dev, mem->size, mem->data, mem->addr); - mem->data = NULL; - } -} - -static int -nvkm_gsp_mem_ctor(struct nvkm_gsp *gsp, u32 size, struct nvkm_gsp_mem *mem) -{ - mem->size = size; - mem->data = dma_alloc_coherent(gsp->subdev.device->dev, size, &mem->addr, GFP_KERNEL); - if (WARN_ON(!mem->data)) - return -ENOMEM; - - return 0; -} - - static int r535_gsp_booter_unload(struct nvkm_gsp *gsp, u32 mbox0, u32 mbox1) { @@ -2150,6 +2162,11 @@ r535_gsp_dtor(struct nvkm_gsp *gsp) mutex_destroy(&gsp->cmdq.mutex);
r535_gsp_dtor_fws(gsp); + + nvkm_gsp_mem_dtor(gsp, &gsp->shm.mem); + nvkm_gsp_mem_dtor(gsp, &gsp->loginit); + nvkm_gsp_mem_dtor(gsp, &gsp->logintr); + nvkm_gsp_mem_dtor(gsp, &gsp->logrm); }
int
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arunpravin Paneer Selvam Arunpravin.PaneerSelvam@amd.com
commit 8746c6c9dfa31d269c65dd52ab42fde0720b7d91 upstream.
Few users have observed display corruption when they boot the machine to KDE Plasma or playing games. We have root caused the problem that whenever alloc_range() couldn't find the required memory blocks the function was returning SUCCESS in some of the corner cases.
The right approach would be if the total allocated size is less than the required size, the function should return -ENOSPC.
Cc: stable@vger.kernel.org # 6.7+ Fixes: 0a1844bf0b53 ("drm/buddy: Improve contiguous memory allocation") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3097 Tested-by: Mario Limonciello mario.limonciello@amd.com Link: https://patchwork.kernel.org/project/dri-devel/patch/20240207174456.341121-1... Acked-by: Christian König christian.koenig@amd.com Reviewed-by: Matthew Auld matthew.auld@intel.com Signed-off-by: Arunpravin Paneer Selvam Arunpravin.PaneerSelvam@amd.com Link: https://patchwork.freedesktop.org/patch/msgid/20240214131853.5934-1-Arunprav... Signed-off-by: Christian König christian.koenig@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/drm_buddy.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index f57e6d74fb0e..c1a99bf4dffd 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -539,6 +539,12 @@ static int __alloc_range(struct drm_buddy *mm, } while (1);
list_splice_tail(&allocated, blocks); + + if (total_allocated < size) { + err = -ENOSPC; + goto err_free; + } + return 0;
err_undo:
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rob Clark robdclark@chromium.org
commit 8c7bfd8262319fd3f127a5380f593ea76f1b88a2 upstream.
The brute force iommu_flush_iotlb_all() was good enough for unmap, but in some cases a map operation could require removing a table pte entry to replace with a block entry. This also requires tlb invalidation. Missing this was resulting an obscure iova fault on what should be a valid buffer address.
Thanks to Robin Murphy for helping me understand the cause of the fault.
Cc: Robin Murphy robin.murphy@arm.com Cc: stable@vger.kernel.org Fixes: b145c6e65eb0 ("drm/msm: Add support to create a local pagetable") Signed-off-by: Rob Clark robdclark@chromium.org Patchwork: https://patchwork.freedesktop.org/patch/578117/ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/msm/msm_iommu.c | 32 +++++++++++++++++++++++++++++--- 1 file changed, 29 insertions(+), 3 deletions(-)
--- a/drivers/gpu/drm/msm/msm_iommu.c +++ b/drivers/gpu/drm/msm/msm_iommu.c @@ -21,6 +21,8 @@ struct msm_iommu_pagetable { struct msm_mmu base; struct msm_mmu *parent; struct io_pgtable_ops *pgtbl_ops; + const struct iommu_flush_ops *tlb; + struct device *iommu_dev; unsigned long pgsize_bitmap; /* Bitmap of page sizes in use */ phys_addr_t ttbr; u32 asid; @@ -201,11 +203,33 @@ static const struct msm_mmu_funcs pageta
static void msm_iommu_tlb_flush_all(void *cookie) { + struct msm_iommu_pagetable *pagetable = cookie; + struct adreno_smmu_priv *adreno_smmu; + + if (!pm_runtime_get_if_in_use(pagetable->iommu_dev)) + return; + + adreno_smmu = dev_get_drvdata(pagetable->parent->dev); + + pagetable->tlb->tlb_flush_all((void *)adreno_smmu->cookie); + + pm_runtime_put_autosuspend(pagetable->iommu_dev); }
static void msm_iommu_tlb_flush_walk(unsigned long iova, size_t size, size_t granule, void *cookie) { + struct msm_iommu_pagetable *pagetable = cookie; + struct adreno_smmu_priv *adreno_smmu; + + if (!pm_runtime_get_if_in_use(pagetable->iommu_dev)) + return; + + adreno_smmu = dev_get_drvdata(pagetable->parent->dev); + + pagetable->tlb->tlb_flush_walk(iova, size, granule, (void *)adreno_smmu->cookie); + + pm_runtime_put_autosuspend(pagetable->iommu_dev); }
static void msm_iommu_tlb_add_page(struct iommu_iotlb_gather *gather, @@ -213,7 +237,7 @@ static void msm_iommu_tlb_add_page(struc { }
-static const struct iommu_flush_ops null_tlb_ops = { +static const struct iommu_flush_ops tlb_ops = { .tlb_flush_all = msm_iommu_tlb_flush_all, .tlb_flush_walk = msm_iommu_tlb_flush_walk, .tlb_add_page = msm_iommu_tlb_add_page, @@ -254,10 +278,10 @@ struct msm_mmu *msm_iommu_pagetable_crea
/* The incoming cfg will have the TTBR1 quirk enabled */ ttbr0_cfg.quirks &= ~IO_PGTABLE_QUIRK_ARM_TTBR1; - ttbr0_cfg.tlb = &null_tlb_ops; + ttbr0_cfg.tlb = &tlb_ops;
pagetable->pgtbl_ops = alloc_io_pgtable_ops(ARM_64_LPAE_S1, - &ttbr0_cfg, iommu->domain); + &ttbr0_cfg, pagetable);
if (!pagetable->pgtbl_ops) { kfree(pagetable); @@ -279,6 +303,8 @@ struct msm_mmu *msm_iommu_pagetable_crea
/* Needed later for TLB flush */ pagetable->parent = parent; + pagetable->tlb = ttbr1_cfg->tlb; + pagetable->iommu_dev = ttbr1_cfg->iommu_dev; pagetable->pgsize_bitmap = ttbr0_cfg.pgsize_bitmap; pagetable->ttbr = ttbr0_cfg.arm_lpae_s1_cfg.ttbr;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhikai Zhai zhikai.zhai@amd.com
commit 94b38b895dec8c0ef093140a141e191b60ff614c upstream.
[WHY] We Double-check link status if training successful, but miss the lane align status.
[HOW] Add the lane align status check
Cc: Mario Limonciello mario.limonciello@amd.com Cc: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Reviewed-by: Wenjing Liu wenjing.liu@amd.com Acked-by: Aurabindo Pillai aurabindo.pillai@amd.com Signed-off-by: Zhikai Zhai zhikai.zhai@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_training.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_training.c +++ b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_training.c @@ -517,6 +517,7 @@ enum link_training_result dp_check_link_ { enum link_training_result status = LINK_TRAINING_SUCCESS; union lane_status lane_status; + union lane_align_status_updated dpcd_lane_status_updated; uint8_t dpcd_buf[6] = {0}; uint32_t lane;
@@ -532,10 +533,12 @@ enum link_training_result dp_check_link_ * check lanes status */ lane_status.raw = dp_get_nibble_at_index(&dpcd_buf[2], lane); + dpcd_lane_status_updated.raw = dpcd_buf[4];
if (!lane_status.bits.CHANNEL_EQ_DONE_0 || !lane_status.bits.CR_DONE_0 || - !lane_status.bits.SYMBOL_LOCKED_0) { + !lane_status.bits.SYMBOL_LOCKED_0 || + !dp_is_interlane_aligned(dpcd_lane_status_updated)) { /* if one of the channel equalization, clock * recovery or symbol lock is dropped * consider it as (link has been
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ville Syrjälä ville.syrjala@linux.intel.com
commit ad26d56d080780bbfcc1696ca0c0cce3e2124ef6 upstream.
Limit the link rate to HBR3 or below (<=8.1Gbps) in SST mode. UHBR (10Gbps+) link rates require 128b/132b channel encoding which we have not yet hooked up into the SST/no-sideband codepaths.
Cc: stable@vger.kernel.org Signed-off-by: Ville Syrjälä ville.syrjala@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240208154552.14545-1-ville.s... Reviewed-by: Jani Nikula jani.nikula@intel.com (cherry picked from commit 6061811d72e14f41f71b6a025510920b187bfcca) Signed-off-by: Joonas Lahtinen joonas.lahtinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/display/intel_dp.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/gpu/drm/i915/display/intel_dp.c +++ b/drivers/gpu/drm/i915/display/intel_dp.c @@ -2275,6 +2275,9 @@ intel_dp_compute_config_limits(struct in limits->min_rate = intel_dp_common_rate(intel_dp, 0); limits->max_rate = intel_dp_max_link_rate(intel_dp);
+ /* FIXME 128b/132b SST support missing */ + limits->max_rate = min(limits->max_rate, 810000); + limits->min_lane_count = 1; limits->max_lane_count = intel_dp_max_lane_count(intel_dp);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Philip Yang Philip.Yang@amd.com
commit b671cd3d456315f63171a670769356a196cf7fd0 upstream.
Without unsigned long typecast, the size is passed in as zero if page array size >= 4GB, nr_pages >= 0x100000, then sg list converted will have the first and the last chunk lost.
Signed-off-by: Philip Yang Philip.Yang@amd.com Acked-by: Felix Kuehling Felix.Kuehling@amd.com Reviewed-by: Christian König christian.koenig@amd.com CC: stable@vger.kernel.org Signed-off-by: Christian König christian.koenig@amd.com Link: https://patchwork.freedesktop.org/patch/msgid/20230821200201.24685-1-Philip.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/drm_prime.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/drm_prime.c +++ b/drivers/gpu/drm/drm_prime.c @@ -820,7 +820,7 @@ struct sg_table *drm_prime_pages_to_sg(s if (max_segment == 0) max_segment = UINT_MAX; err = sg_alloc_table_from_pages_segment(sg, pages, nr_pages, 0, - nr_pages << PAGE_SHIFT, + (unsigned long)nr_pages << PAGE_SHIFT, max_segment, GFP_KERNEL); if (err) { kfree(sg);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thong thong.thai@amd.com
commit 2f542421a47e8246e9b7d2c6508fe3a6e6c63078 upstream.
Update the maximum resolution reported for HEVC encoding on VCN 4 devices to reflect its 8K encoding capability.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3159 Signed-off-by: Thong thong.thai@amd.com Reviewed-by: Ruijing Dong ruijing.dong@amd.com Acked-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/soc21.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/soc21.c +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c @@ -50,13 +50,13 @@ static const struct amd_ip_funcs soc21_c /* SOC21 */ static const struct amdgpu_video_codec_info vcn_4_0_0_video_codecs_encode_array_vcn0[] = { {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_MPEG4_AVC, 4096, 2304, 0)}, - {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, 4096, 2304, 0)}, + {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, 8192, 4352, 0)}, {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_AV1, 8192, 4352, 0)}, };
static const struct amdgpu_video_codec_info vcn_4_0_0_video_codecs_encode_array_vcn1[] = { {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_MPEG4_AVC, 4096, 2304, 0)}, - {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, 4096, 2304, 0)}, + {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, 8192, 4352, 0)}, };
static const struct amdgpu_video_codecs vcn_4_0_0_video_codecs_encode_vcn0 = {
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fangzhi Zuo jerry.zuo@amd.com
commit e6a7df96facdcf5b1f71eb3ec26f2f9f6ad61e57 upstream.
The change try to fix below error specific to RV platform:
BUG: kernel NULL pointer dereference, address: 0000000000000008 PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 4 PID: 917 Comm: sway Not tainted 6.3.9-arch1-1 #1 124dc55df4f5272ccb409f39ef4872fc2b3376a2 Hardware name: LENOVO 20NKS01Y00/20NKS01Y00, BIOS R12ET61W(1.31 ) 07/28/2022 RIP: 0010:drm_dp_atomic_find_time_slots+0x5e/0x260 [drm_display_helper] Code: 01 00 00 48 8b 85 60 05 00 00 48 63 80 88 00 00 00 3b 43 28 0f 8d 2e 01 00 00 48 8b 53 30 48 8d 04 80 48 8d 04 c2 48 8b 40 18 <48> 8> RSP: 0018:ffff960cc2df77d8 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff8afb87e81280 RCX: 0000000000000224 RDX: ffff8afb9ee37c00 RSI: ffff8afb8da1a578 RDI: ffff8afb87e81280 RBP: ffff8afb83d67000 R08: 0000000000000001 R09: ffff8afb9652f850 R10: ffff960cc2df7908 R11: 0000000000000002 R12: 0000000000000000 R13: ffff8afb8d7688a0 R14: ffff8afb8da1a578 R15: 0000000000000224 FS: 00007f4dac35ce00(0000) GS:ffff8afe30b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 000000010ddc6000 CR4: 00000000003506e0 Call Trace: <TASK> ? __die+0x23/0x70 ? page_fault_oops+0x171/0x4e0 ? plist_add+0xbe/0x100 ? exc_page_fault+0x7c/0x180 ? asm_exc_page_fault+0x26/0x30 ? drm_dp_atomic_find_time_slots+0x5e/0x260 [drm_display_helper 0e67723696438d8e02b741593dd50d80b44c2026] ? drm_dp_atomic_find_time_slots+0x28/0x260 [drm_display_helper 0e67723696438d8e02b741593dd50d80b44c2026] compute_mst_dsc_configs_for_link+0x2ff/0xa40 [amdgpu 62e600d2a75e9158e1cd0a243bdc8e6da040c054] ? fill_plane_buffer_attributes+0x419/0x510 [amdgpu 62e600d2a75e9158e1cd0a243bdc8e6da040c054] compute_mst_dsc_configs_for_state+0x1e1/0x250 [amdgpu 62e600d2a75e9158e1cd0a243bdc8e6da040c054] amdgpu_dm_atomic_check+0xecd/0x1190 [amdgpu 62e600d2a75e9158e1cd0a243bdc8e6da040c054] drm_atomic_check_only+0x5c5/0xa40 drm_mode_atomic_ioctl+0x76e/0xbc0 ? _copy_to_user+0x25/0x30 ? drm_ioctl+0x296/0x4b0 ? __pfx_drm_mode_atomic_ioctl+0x10/0x10 drm_ioctl_kernel+0xcd/0x170 drm_ioctl+0x26d/0x4b0 ? __pfx_drm_mode_atomic_ioctl+0x10/0x10 amdgpu_drm_ioctl+0x4e/0x90 [amdgpu 62e600d2a75e9158e1cd0a243bdc8e6da040c054] __x64_sys_ioctl+0x94/0xd0 do_syscall_64+0x60/0x90 ? do_syscall_64+0x6c/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7f4dad17f76f Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c> RSP: 002b:00007ffd9ae859f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 000055e255a55900 RCX: 00007f4dad17f76f RDX: 00007ffd9ae85a90 RSI: 00000000c03864bc RDI: 000000000000000b RBP: 00007ffd9ae85a90 R08: 0000000000000003 R09: 0000000000000003 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c03864bc R13: 000000000000000b R14: 000055e255a7fc60 R15: 000055e255a01eb0 </TASK> Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device ccm cmac algif_hash algif_skcipher af_alg joydev mousedev bnep > typec libphy k10temp ipmi_msghandler roles i2c_scmi acpi_cpufreq mac_hid nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_mas> CR2: 0000000000000008 ---[ end trace 0000000000000000 ]--- RIP: 0010:drm_dp_atomic_find_time_slots+0x5e/0x260 [drm_display_helper] Code: 01 00 00 48 8b 85 60 05 00 00 48 63 80 88 00 00 00 3b 43 28 0f 8d 2e 01 00 00 48 8b 53 30 48 8d 04 80 48 8d 04 c2 48 8b 40 18 <48> 8> RSP: 0018:ffff960cc2df77d8 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff8afb87e81280 RCX: 0000000000000224 RDX: ffff8afb9ee37c00 RSI: ffff8afb8da1a578 RDI: ffff8afb87e81280 RBP: ffff8afb83d67000 R08: 0000000000000001 R09: ffff8afb9652f850 R10: ffff960cc2df7908 R11: 0000000000000002 R12: 0000000000000000 R13: ffff8afb8d7688a0 R14: ffff8afb8da1a578 R15: 0000000000000224 FS: 00007f4dac35ce00(0000) GS:ffff8afe30b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 000000010ddc6000 CR4: 00000000003506e0
With a second DP monitor connected, drm_atomic_state in dm atomic check sequence does not include the connector state for the old/existing/first DP monitor. In such case, dsc determination policy would hit a null ptr when it tries to iterate the old/existing stream that does not have a valid connector state attached to it. When that happens, dm atomic check should call drm_atomic_get_connector_state for a new connector state. Existing dm has already done that, except for RV due to it does not have official support of dsc where .num_dsc is not defined in dcn10 resource cap, that prevent from getting drm_atomic_get_connector_state called. So, skip dsc determination policy for ASICs that don't have DSC support.
Cc: stable@vger.kernel.org # 6.1+ Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2314 Reviewed-by: Wayne Lin wayne.lin@amd.com Acked-by: Hamza Mahfooz hamza.mahfooz@amd.com Signed-off-by: Fangzhi Zuo jerry.zuo@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -10440,11 +10440,13 @@ static int amdgpu_dm_atomic_check(struct goto fail; }
- ret = compute_mst_dsc_configs_for_state(state, dm_state->context, vars); - if (ret) { - DRM_DEBUG_DRIVER("compute_mst_dsc_configs_for_state() failed\n"); - ret = -EINVAL; - goto fail; + if (dc_resource_is_dsc_encoding_supported(dc)) { + ret = compute_mst_dsc_configs_for_state(state, dm_state->context, vars); + if (ret) { + DRM_DEBUG_DRIVER("compute_mst_dsc_configs_for_state() failed\n"); + ret = -EINVAL; + goto fail; + } }
ret = dm_update_mst_vcpi_slots_for_dsc(state, dm_state->context, vars);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nathan Chancellor nathan@kernel.org
commit e63e35f0164c43fbc1adb481d6604f253b9f9667 upstream.
After a recent change in LLVM, allmodconfig (which has CONFIG_KCSAN=y and CONFIG_WERROR=y enabled) has a few new instances of -Wframe-larger-than for the mode support and system configuration functions:
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn20/display_mode_vba_20v2.c:3393:6: error: stack frame size (2144) exceeds limit (2048) in 'dml20v2_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than] 3393 | void dml20v2_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib) | ^ 1 error generated.
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn21/display_mode_vba_21.c:3520:6: error: stack frame size (2192) exceeds limit (2048) in 'dml21_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than] 3520 | void dml21_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib) | ^ 1 error generated.
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn20/display_mode_vba_20.c:3286:6: error: stack frame size (2128) exceeds limit (2048) in 'dml20_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than] 3286 | void dml20_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib) | ^ 1 error generated.
Without the sanitizers enabled, there are no warnings.
This was the catalyst for commit 6740ec97bcdb ("drm/amd/display: Increase frame warning limit with KASAN or KCSAN in dml2") and that same change was made to dml in commit 5b750b22530f ("drm/amd/display: Increase frame warning limit with KASAN or KCSAN in dml") but the frame_warn_flag variable was not applied to all files. Do so now to clear up the warnings and make all these files consistent.
Cc: stable@vger.kernel.org Closes: https://github.com/ClangBuiltLinux/linux/issue/1990 Signed-off-by: Nathan Chancellor nathan@kernel.org Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/dml/Makefile | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/gpu/drm/amd/display/dc/dml/Makefile +++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile @@ -72,11 +72,11 @@ CFLAGS_$(AMDDALPATH)/dc/dml/display_mode CFLAGS_$(AMDDALPATH)/dc/dml/display_mode_vba.o := $(dml_ccflags) CFLAGS_$(AMDDALPATH)/dc/dml/dcn10/dcn10_fpu.o := $(dml_ccflags) CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/dcn20_fpu.o := $(dml_ccflags) -CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/display_mode_vba_20.o := $(dml_ccflags) +CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/display_mode_vba_20.o := $(dml_ccflags) $(frame_warn_flag) CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/display_rq_dlg_calc_20.o := $(dml_ccflags) -CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/display_mode_vba_20v2.o := $(dml_ccflags) +CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/display_mode_vba_20v2.o := $(dml_ccflags) $(frame_warn_flag) CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/display_rq_dlg_calc_20v2.o := $(dml_ccflags) -CFLAGS_$(AMDDALPATH)/dc/dml/dcn21/display_mode_vba_21.o := $(dml_ccflags) +CFLAGS_$(AMDDALPATH)/dc/dml/dcn21/display_mode_vba_21.o := $(dml_ccflags) $(frame_warn_flag) CFLAGS_$(AMDDALPATH)/dc/dml/dcn21/display_rq_dlg_calc_21.o := $(dml_ccflags) CFLAGS_$(AMDDALPATH)/dc/dml/dcn30/display_mode_vba_30.o := $(dml_ccflags) $(frame_warn_flag) CFLAGS_$(AMDDALPATH)/dc/dml/dcn30/display_rq_dlg_calc_30.o := $(dml_ccflags)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Roman Li roman.li@amd.com
commit 46806e59a87790760870d216f54951a5b4d545bc upstream.
[Why] There is a potential memory access violation while iterating through array of dcn35 clks.
[How] Limit iteration per array size.
Cc: Mario Limonciello mario.limonciello@amd.com Cc: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Reviewed-by: Nicholas Kazlauskas nicholas.kazlauskas@amd.com Acked-by: Aurabindo Pillai aurabindo.pillai@amd.com Signed-off-by: Roman Li roman.li@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c | 17 +++++++---- 1 file changed, 12 insertions(+), 5 deletions(-)
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c @@ -654,10 +654,13 @@ static void dcn35_clk_mgr_helper_populat struct clk_limit_table_entry def_max = bw_params->clk_table.entries[bw_params->clk_table.num_entries - 1]; uint32_t max_fclk = 0, min_pstate = 0, max_dispclk = 0, max_dppclk = 0; uint32_t max_pstate = 0, max_dram_speed_mts = 0, min_dram_speed_mts = 0; + uint32_t num_memps, num_fclk, num_dcfclk; int i;
/* Determine min/max p-state values. */ - for (i = 0; i < clock_table->NumMemPstatesEnabled; i++) { + num_memps = (clock_table->NumMemPstatesEnabled > NUM_MEM_PSTATE_LEVELS) ? NUM_MEM_PSTATE_LEVELS : + clock_table->NumMemPstatesEnabled; + for (i = 0; i < num_memps; i++) { uint32_t dram_speed_mts = calc_dram_speed_mts(&clock_table->MemPstateTable[i]);
if (is_valid_clock_value(dram_speed_mts) && dram_speed_mts > max_dram_speed_mts) { @@ -669,7 +672,7 @@ static void dcn35_clk_mgr_helper_populat min_dram_speed_mts = max_dram_speed_mts; min_pstate = max_pstate;
- for (i = 0; i < clock_table->NumMemPstatesEnabled; i++) { + for (i = 0; i < num_memps; i++) { uint32_t dram_speed_mts = calc_dram_speed_mts(&clock_table->MemPstateTable[i]);
if (is_valid_clock_value(dram_speed_mts) && dram_speed_mts < min_dram_speed_mts) { @@ -698,9 +701,13 @@ static void dcn35_clk_mgr_helper_populat /* Base the clock table on dcfclk, need at least one entry regardless of pmfw table */ ASSERT(clock_table->NumDcfClkLevelsEnabled > 0);
- max_fclk = find_max_clk_value(clock_table->FclkClocks_Freq, clock_table->NumFclkLevelsEnabled); - - for (i = 0; i < clock_table->NumDcfClkLevelsEnabled; i++) { + num_fclk = (clock_table->NumFclkLevelsEnabled > NUM_FCLK_DPM_LEVELS) ? NUM_FCLK_DPM_LEVELS : + clock_table->NumFclkLevelsEnabled; + max_fclk = find_max_clk_value(clock_table->FclkClocks_Freq, num_fclk); + + num_dcfclk = (clock_table->NumFclkLevelsEnabled > NUM_DCFCLK_DPM_LEVELS) ? NUM_DCFCLK_DPM_LEVELS : + clock_table->NumDcfClkLevelsEnabled; + for (i = 0; i < num_dcfclk; i++) { int j;
/* First search defaults for the clocks we don't read using closest lower or equal default dcfclk */
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tom Chung chiahsuan.chung@amd.com
commit deb110292180cd501f6fde2a0178d65fcbcabb0c upstream.
[Why] The original picture aspect ratio in mode struct may have chance be overwritten with wrong aspect ratio data in create_stream_for_sink(). It will create a different VIC output and cause HDMI compliance test failed.
[How] Preserve the original picture aspect ratio data during create the stream.
Cc: Mario Limonciello mario.limonciello@amd.com Cc: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Reviewed-by: Aurabindo Pillai aurabindo.pillai@amd.com Signed-off-by: Tom Chung chiahsuan.chung@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -6110,7 +6110,9 @@ create_stream_for_sink(struct drm_connec if (recalculate_timing) { freesync_mode = get_highest_refresh_rate_mode(aconnector, false); drm_mode_copy(&saved_mode, &mode); + saved_mode.picture_aspect_ratio = mode.picture_aspect_ratio; drm_mode_copy(&mode, freesync_mode); + mode.picture_aspect_ratio = saved_mode.picture_aspect_ratio; } else { decide_crtc_timing_for_drm_display_mode( &mode, preferred_mode, scale);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lijo Lazar lijo.lazar@amd.com
commit 55173942a63668bdc1d61812c7c9e0406aefb5bf upstream.
The present way to fetch VRAM vendor information turns out to be not reliable on GFX 9.4.3 dGPUs as well. Avoid using the data.
Signed-off-by: Lijo Lazar lijo.lazar@amd.com Reviewed-by: Hawking Zhang Hawking.Zhang@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 8 -------- 1 file changed, 8 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -1947,14 +1947,6 @@ static int gmc_v9_0_init_mem_ranges(stru
static void gmc_v9_4_3_init_vram_info(struct amdgpu_device *adev) { - static const u32 regBIF_BIOS_SCRATCH_4 = 0x50; - u32 vram_info; - - /* Only for dGPU, vendor informaton is reliable */ - if (!amdgpu_sriov_vf(adev) && !(adev->flags & AMD_IS_APU)) { - vram_info = RREG32(regBIF_BIOS_SCRATCH_4); - adev->gmc.vram_vendor = vram_info & 0xF; - } adev->gmc.vram_type = AMDGPU_VRAM_TYPE_HBM; adev->gmc.vram_width = 128 * 64; }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Souradeep Chakrabarti schakrabarti@linux.microsoft.com
commit e0526ec5360a48ad3ab2e26e802b0532302a7e11 upstream.
In commit ac5047671758 ("hv_netvsc: Disable NAPI before closing the VMBus channel"), napi_disable was getting called for all channels, including all subchannels without confirming if they are enabled or not.
This caused hv_netvsc getting hung at napi_disable, when netvsc_probe() has finished running but nvdev->subchan_work has not started yet. netvsc_subchan_work() -> rndis_set_subchannel() has not created the sub-channels and because of that netvsc_sc_open() is not running. netvsc_remove() calls cancel_work_sync(&nvdev->subchan_work), for which netvsc_subchan_work did not run.
netif_napi_add() sets the bit NAPI_STATE_SCHED because it ensures NAPI cannot be scheduled. Then netvsc_sc_open() -> napi_enable will clear the NAPIF_STATE_SCHED bit, so it can be scheduled. napi_disable() does the opposite.
Now during netvsc_device_remove(), when napi_disable is called for those subchannels, napi_disable gets stuck on infinite msleep.
This fix addresses this problem by ensuring that napi_disable() is not getting called for non-enabled NAPI struct. But netif_napi_del() is still necessary for these non-enabled NAPI struct for cleanup purpose.
Call trace: [ 654.559417] task:modprobe state:D stack: 0 pid: 2321 ppid: 1091 flags:0x00004002 [ 654.568030] Call Trace: [ 654.571221] <TASK> [ 654.573790] __schedule+0x2d6/0x960 [ 654.577733] schedule+0x69/0xf0 [ 654.581214] schedule_timeout+0x87/0x140 [ 654.585463] ? __bpf_trace_tick_stop+0x20/0x20 [ 654.590291] msleep+0x2d/0x40 [ 654.593625] napi_disable+0x2b/0x80 [ 654.597437] netvsc_device_remove+0x8a/0x1f0 [hv_netvsc] [ 654.603935] rndis_filter_device_remove+0x194/0x1c0 [hv_netvsc] [ 654.611101] ? do_wait_intr+0xb0/0xb0 [ 654.615753] netvsc_remove+0x7c/0x120 [hv_netvsc] [ 654.621675] vmbus_remove+0x27/0x40 [hv_vmbus]
Cc: stable@vger.kernel.org Fixes: ac5047671758 ("hv_netvsc: Disable NAPI before closing the VMBus channel") Signed-off-by: Souradeep Chakrabarti schakrabarti@linux.microsoft.com Reviewed-by: Dexuan Cui decui@microsoft.com Reviewed-by: Haiyang Zhang haiyangz@microsoft.com Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/1706686551-28510-1-git-send-email-schakrabarti@lin... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/hyperv/netvsc.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/net/hyperv/netvsc.c +++ b/drivers/net/hyperv/netvsc.c @@ -708,7 +708,10 @@ void netvsc_device_remove(struct hv_devi /* Disable NAPI and disassociate its context from the device. */ for (i = 0; i < net_device->num_chn; i++) { /* See also vmbus_reset_channel_cb(). */ - napi_disable(&net_device->chan_table[i].napi); + /* only disable enabled NAPI channel */ + if (i < ndev->real_num_rx_queues) + napi_disable(&net_device->chan_table[i].napi); + netif_napi_del(&net_device->chan_table[i].napi); }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vincent Donnefort vdonnefort@google.com
commit 66bbea9ed6446b8471d365a22734dc00556c4785 upstream.
The return type for ring_buffer_poll_wait() is __poll_t. This is behind the scenes an unsigned where we can set event bits. In case of a non-allocated CPU, we do return instead -EINVAL (0xffffffea). Lucky us, this ends up setting few error bits (EPOLLERR | EPOLLHUP | EPOLLNVAL), so user-space at least is aware something went wrong.
Nonetheless, this is an incorrect code. Replace that -EINVAL with a proper EPOLLERR to clean that output. As this doesn't change the behaviour, there's no need to treat this change as a bug fix.
Link: https://lore.kernel.org/linux-trace-kernel/20240131140955.3322792-1-vdonnefo...
Cc: stable@vger.kernel.org Fixes: 6721cb6002262 ("ring-buffer: Do not poll non allocated cpu buffers") Signed-off-by: Vincent Donnefort vdonnefort@google.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/ring_buffer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -1091,7 +1091,7 @@ __poll_t ring_buffer_poll_wait(struct tr full = 0; } else { if (!cpumask_test_cpu(cpu, buffer->cpumask)) - return -EINVAL; + return EPOLLERR;
cpu_buffer = buffer->buffers[cpu]; work = &cpu_buffer->irq_work;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrew Lunn andrew@lunn.ch
commit 585b40e25dc9ff3d2b03d1495150540849009e5b upstream.
Not all mv88e6xxx device support C45 read/write operations. Those which do not return -EOPNOTSUPP. However, when phylib scans the bus, it considers this fatal, and the probe of the MDIO bus fails, which in term causes the mv88e6xxx probe as a whole to fail.
When there is no device on the bus for a given address, the pull up resistor on the data line results in the read returning 0xffff. The phylib core code understands this when scanning for devices on the bus. C45 allows multiple devices to be supported at one address, so phylib will perform a few reads at each address, so although thought not the most efficient solution, it is a way to avoid fatal errors. Make use of this as a minimal fix for stable to fix the probing problems.
Follow up patches will rework how C45 operates to make it similar to C22 which considers -ENODEV as a none-fatal, and swap mv88e6xxx to using this.
Cc: stable@vger.kernel.org Fixes: 743a19e38d02 ("net: dsa: mv88e6xxx: Separate C22 and C45 transactions") Reported-by: Tim Menninger tmenninger@purestorage.com Signed-off-by: Andrew Lunn andrew@lunn.ch Link: https://lore.kernel.org/r/20240129224948.1531452-1-andrew@lunn.ch Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/dsa/mv88e6xxx/chip.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -3545,7 +3545,7 @@ static int mv88e6xxx_mdio_read_c45(struc int err;
if (!chip->info->ops->phy_read_c45) - return -EOPNOTSUPP; + return 0xffff;
mv88e6xxx_reg_lock(chip); err = chip->info->ops->phy_read_c45(chip, bus, phy, devad, reg, &val);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hui Zhou hui.zhou@corigine.com
commit cefa98e806fd4e2a5e2047457a11ae5f17b8f621 upstream.
The nfp offload flow pay will not allocate a mask id when the out port is openvswitch internal port. This is because these flows are used to configure the pre_tun table and are never actually send to the firmware as an add-flow message. When a tc rule which action contains ct and the post ct entry's out port is openvswitch internal port, the merge offload flow pay with the wrong mask id of 0 will be send to the firmware. Actually, the nfp can not support hardware offload for this situation, so return EOPNOTSUPP.
Fixes: bd0fe7f96a3c ("nfp: flower-ct: add zone table entry when handling pre/post_ct flows") CC: stable@vger.kernel.org # 5.14+ Signed-off-by: Hui Zhou hui.zhou@corigine.com Signed-off-by: Louis Peens louis.peens@corigine.com Link: https://lore.kernel.org/r/20240124151909.31603-2-louis.peens@corigine.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/netronome/nfp/flower/conntrack.c | 22 +++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/netronome/nfp/flower/conntrack.c +++ b/drivers/net/ethernet/netronome/nfp/flower/conntrack.c @@ -1864,10 +1864,30 @@ int nfp_fl_ct_handle_post_ct(struct nfp_ { struct flow_rule *rule = flow_cls_offload_flow_rule(flow); struct nfp_fl_ct_flow_entry *ct_entry; + struct flow_action_entry *ct_goto; struct nfp_fl_ct_zone_entry *zt; + struct flow_action_entry *act; bool wildcarded = false; struct flow_match_ct ct; - struct flow_action_entry *ct_goto; + int i; + + flow_action_for_each(i, act, &rule->action) { + switch (act->id) { + case FLOW_ACTION_REDIRECT: + case FLOW_ACTION_REDIRECT_INGRESS: + case FLOW_ACTION_MIRRED: + case FLOW_ACTION_MIRRED_INGRESS: + if (act->dev->rtnl_link_ops && + !strcmp(act->dev->rtnl_link_ops->kind, "openvswitch")) { + NL_SET_ERR_MSG_MOD(extack, + "unsupported offload: out port is openvswitch internal port"); + return -EOPNOTSUPP; + } + break; + default: + break; + } + }
flow_rule_match_ct(rule, &ct); if (!ct.mask->ct_zone) {
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hui Zhou hui.zhou@corigine.com
commit 3a007b8009b5f8af021021b7a590a6da0dc4c6e0 upstream.
The nfp driver will merge the tp source port and tp destination port into one dword which the offset must be zero to do hardware offload. However, the mangle action for the tp source port and tp destination port is separated for tc ct action. Modify the mangle action for the FLOW_ACT_MANGLE_HDR_TYPE_TCP and FLOW_ACT_MANGLE_HDR_TYPE_UDP to satisfy the nfp driver offload check for the tp port.
The mangle action provides a 4B value for source, and a 4B value for the destination, but only 2B of each contains the useful information. For offload the 2B of each is combined into a single 4B word. Since the incoming mask for the source is '0xFFFF<mask>' the shift-left will throw away the 0xFFFF part. When this gets combined together in the offload it will clear the destination field. Fix this by setting the lower bits back to 0xFFFF, effectively doing a rotate-left operation on the mask.
Fixes: 5cee92c6f57a ("nfp: flower: support hw offload for ct nat action") CC: stable@vger.kernel.org # 6.1+ Signed-off-by: Hui Zhou hui.zhou@corigine.com Signed-off-by: Louis Peens louis.peens@corigine.com Link: https://lore.kernel.org/r/20240124151909.31603-3-louis.peens@corigine.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/netronome/nfp/flower/conntrack.c | 24 ++++++++++++++++-- 1 file changed, 22 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/netronome/nfp/flower/conntrack.c +++ b/drivers/net/ethernet/netronome/nfp/flower/conntrack.c @@ -1424,10 +1424,30 @@ static void nfp_nft_ct_translate_mangle_ mangle_action->mangle.mask = (__force u32)cpu_to_be32(mangle_action->mangle.mask); return;
+ /* Both struct tcphdr and struct udphdr start with + * __be16 source; + * __be16 dest; + * so we can use the same code for both. + */ case FLOW_ACT_MANGLE_HDR_TYPE_TCP: case FLOW_ACT_MANGLE_HDR_TYPE_UDP: - mangle_action->mangle.val = (__force u16)cpu_to_be16(mangle_action->mangle.val); - mangle_action->mangle.mask = (__force u16)cpu_to_be16(mangle_action->mangle.mask); + if (mangle_action->mangle.offset == offsetof(struct tcphdr, source)) { + mangle_action->mangle.val = + (__force u32)cpu_to_be32(mangle_action->mangle.val << 16); + /* The mask of mangle action is inverse mask, + * so clear the dest tp port with 0xFFFF to + * instead of rotate-left operation. + */ + mangle_action->mangle.mask = + (__force u32)cpu_to_be32(mangle_action->mangle.mask << 16 | 0xFFFF); + } + if (mangle_action->mangle.offset == offsetof(struct tcphdr, dest)) { + mangle_action->mangle.offset = 0; + mangle_action->mangle.val = + (__force u32)cpu_to_be32(mangle_action->mangle.val); + mangle_action->mangle.mask = + (__force u32)cpu_to_be32(mangle_action->mangle.mask); + } return;
default:
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gui-Dong Han 2045gemini@gmail.com
commit 30926783a46841c2d1bbf3f74067ba85d304fd0d upstream.
In uart_tiocmget(): result = uport->mctrl; uart_port_lock_irq(uport); result |= uport->ops->get_mctrl(uport); uart_port_unlock_irq(uport); ... return result;
In uart_update_mctrl(): uart_port_lock_irqsave(port, &flags); ... port->mctrl = (old & ~clear) | set; ... port->ops->set_mctrl(port, port->mctrl); ... uart_port_unlock_irqrestore(port, flags);
An atomicity violation is identified due to the concurrent execution of uart_tiocmget() and uart_update_mctrl(). After assigning result = uport->mctrl, the mctrl value may change in uart_update_mctrl(), leading to a mismatch between the value returned by uport->ops->get_mctrl(uport) and the mctrl value previously read. This can result in uart_tiocmget() returning an incorrect value.
This possible bug is found by an experimental static analysis tool developed by our team, BassCheck[1]. This tool analyzes the locking APIs to extract function pairs that can be concurrently executed, and then analyzes the instructions in the paired functions to identify possible concurrency bugs including data races and atomicity violations. The above possible bug is reported when our tool analyzes the source code of Linux 5.17.
To address this issue, it is suggested to move the line result = uport->mctrl inside the uart_port_lock block to ensure atomicity and prevent the mctrl value from being altered during the execution of uart_tiocmget(). With this patch applied, our tool no longer reports the bug, with the kernel configuration allyesconfig for x86_64. Due to the absence of the requisite hardware, we are unable to conduct runtime testing of the patch. Therefore, our verification is solely based on code logic analysis.
[1] https://sites.google.com/view/basscheck/
Fixes: c5f4644e6c8b ("[PATCH] Serial: Adjust serial locking") Cc: stable@vger.kernel.org Signed-off-by: Gui-Dong Han 2045gemini@gmail.com Link: https://lore.kernel.org/r/20240112113624.17048-1-2045gemini@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/serial_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/tty/serial/serial_core.c +++ b/drivers/tty/serial/serial_core.c @@ -1085,8 +1085,8 @@ static int uart_tiocmget(struct tty_stru goto out;
if (!tty_io_error(tty)) { - result = uport->mctrl; uart_port_lock_irq(uport); + result = uport->mctrl; result |= uport->ops->get_mctrl(uport); uart_port_unlock_irq(uport); }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hugo Villeneuve hvilleneuve@dimonoff.com
commit 0419373333c2f2024966d36261fd82a453281e80 upstream.
If regmap_read() returns a non-zero value, the 'val' variable can be left uninitialized.
Clear it before calling regmap_read() to make sure we properly detect the clock ready bit.
Fixes: 4cf9a888fd3c ("serial: max310x: Check the clock readiness") Cc: stable@vger.kernel.org Signed-off-by: Hugo Villeneuve hvilleneuve@dimonoff.com Link: https://lore.kernel.org/r/20240116213001.3691629-2-hugo@hugovil.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/max310x.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/tty/serial/max310x.c +++ b/drivers/tty/serial/max310x.c @@ -641,7 +641,7 @@ static u32 max310x_set_ref_clk(struct de
/* Wait for crystal */ if (xtal) { - unsigned int val; + unsigned int val = 0; msleep(10); regmap_read(s->regmap, MAX310X_STS_IRQSTS_REG, &val); if (!(val & MAX310X_STS_CLKREADY_BIT)) {
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hugo Villeneuve hvilleneuve@dimonoff.com
commit 93cd256ab224c2519e7c4e5f58bb4f1ac2bf0965 upstream.
Some people are seeing a warning similar to this when using a crystal:
max310x 11-006c: clock is not stable yet
The datasheet doesn't mention the maximum time to wait for the clock to be stable when using a crystal, and it seems that the 10ms delay in the driver is not always sufficient.
Jan Kundrát reported that it took three tries (each separated by 10ms) to get a stable clock.
Modify behavior to check stable clock ready bit multiple times (20), and waiting 10ms between each try.
Note: the first draft of the driver originally used a 50ms delay, without checking the clock stable bit. Then a loop with 1000 retries was implemented, each time reading the clock stable bit.
Fixes: 4cf9a888fd3c ("serial: max310x: Check the clock readiness") Cc: stable@vger.kernel.org Suggested-by: Jan Kundrát jan.kundrat@cesnet.cz Link: https://www.spinics.net/lists/linux-serial/msg35773.html Link: https://lore.kernel.org/all/20240110174015.6f20195fde08e5c9e64e5675@hugovil.... Link: https://github.com/boundarydevices/linux/commit/e5dfe3e4a751392515d780519731... Signed-off-by: Hugo Villeneuve hvilleneuve@dimonoff.com Link: https://lore.kernel.org/r/20240116213001.3691629-3-hugo@hugovil.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/max310x.c | 21 ++++++++++++++++----- 1 file changed, 16 insertions(+), 5 deletions(-)
--- a/drivers/tty/serial/max310x.c +++ b/drivers/tty/serial/max310x.c @@ -237,6 +237,10 @@ #define MAX310x_REV_MASK (0xf8) #define MAX310X_WRITE_BIT 0x80
+/* Crystal-related definitions */ +#define MAX310X_XTAL_WAIT_RETRIES 20 /* Number of retries */ +#define MAX310X_XTAL_WAIT_DELAY_MS 10 /* Delay between retries */ + /* MAX3107 specific */ #define MAX3107_REV_ID (0xa0)
@@ -641,12 +645,19 @@ static u32 max310x_set_ref_clk(struct de
/* Wait for crystal */ if (xtal) { - unsigned int val = 0; - msleep(10); - regmap_read(s->regmap, MAX310X_STS_IRQSTS_REG, &val); - if (!(val & MAX310X_STS_CLKREADY_BIT)) { + bool stable = false; + unsigned int try = 0, val = 0; + + do { + msleep(MAX310X_XTAL_WAIT_DELAY_MS); + regmap_read(s->regmap, MAX310X_STS_IRQSTS_REG, &val); + + if (val & MAX310X_STS_CLKREADY_BIT) + stable = true; + } while (!stable && (++try < MAX310X_XTAL_WAIT_RETRIES)); + + if (!stable) dev_warn(dev, "clock is not stable yet\n"); - } }
return bestfreq;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hugo Villeneuve hvilleneuve@dimonoff.com
commit 8afa6c6decea37e7cb473d2c60473f37f46cea35 upstream.
A stable clock is really required in order to use this UART, so log an error message and bail out if the chip reports that the clock is not stable.
Fixes: 4cf9a888fd3c ("serial: max310x: Check the clock readiness") Cc: stable@vger.kernel.org Suggested-by: Jan Kundrát jan.kundrat@cesnet.cz Link: https://www.spinics.net/lists/linux-serial/msg35773.html Signed-off-by: Hugo Villeneuve hvilleneuve@dimonoff.com Link: https://lore.kernel.org/r/20240116213001.3691629-4-hugo@hugovil.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/max310x.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)
--- a/drivers/tty/serial/max310x.c +++ b/drivers/tty/serial/max310x.c @@ -587,7 +587,7 @@ static int max310x_update_best_err(unsig return 1; }
-static u32 max310x_set_ref_clk(struct device *dev, struct max310x_port *s, +static s32 max310x_set_ref_clk(struct device *dev, struct max310x_port *s, unsigned long freq, bool xtal) { unsigned int div, clksrc, pllcfg = 0; @@ -657,7 +657,8 @@ static u32 max310x_set_ref_clk(struct de } while (!stable && (++try < MAX310X_XTAL_WAIT_RETRIES));
if (!stable) - dev_warn(dev, "clock is not stable yet\n"); + return dev_err_probe(dev, -EAGAIN, + "clock is not stable\n"); }
return bestfreq; @@ -1282,7 +1283,7 @@ static int max310x_probe(struct device * { int i, ret, fmin, fmax, freq; struct max310x_port *s; - u32 uartclk = 0; + s32 uartclk = 0; bool xtal;
for (i = 0; i < devtype->nr; i++) @@ -1360,6 +1361,11 @@ static int max310x_probe(struct device * }
uartclk = max310x_set_ref_clk(dev, s, freq, xtal); + if (uartclk < 0) { + ret = uartclk; + goto out_uart; + } + dev_dbg(dev, "Reference clock set to %i Hz\n", uartclk);
for (i = 0; i < devtype->nr; i++) {
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hugo Villeneuve hvilleneuve@dimonoff.com
commit b35f8dbbce818b02c730dc85133dc7754266e084 upstream.
If there is a problem after resetting a port, the do/while() loop that checks the default value of DIVLSB register may run forever and spam the I2C bus.
Add a delay before each read of DIVLSB, and a maximum number of tries to prevent that situation from happening.
Also fail probe if port reset is unsuccessful.
Fixes: 10d8b34a4217 ("serial: max310x: Driver rework") Cc: stable@vger.kernel.org Signed-off-by: Hugo Villeneuve hvilleneuve@dimonoff.com Link: https://lore.kernel.org/r/20240116213001.3691629-5-hugo@hugovil.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/max310x.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-)
--- a/drivers/tty/serial/max310x.c +++ b/drivers/tty/serial/max310x.c @@ -237,6 +237,10 @@ #define MAX310x_REV_MASK (0xf8) #define MAX310X_WRITE_BIT 0x80
+/* Port startup definitions */ +#define MAX310X_PORT_STARTUP_WAIT_RETRIES 20 /* Number of retries */ +#define MAX310X_PORT_STARTUP_WAIT_DELAY_MS 10 /* Delay between retries */ + /* Crystal-related definitions */ #define MAX310X_XTAL_WAIT_RETRIES 20 /* Number of retries */ #define MAX310X_XTAL_WAIT_DELAY_MS 10 /* Delay between retries */ @@ -1346,6 +1350,9 @@ static int max310x_probe(struct device * goto out_clk;
for (i = 0; i < devtype->nr; i++) { + bool started = false; + unsigned int try = 0, val = 0; + /* Reset port */ regmap_write(regmaps[i], MAX310X_MODE2_REG, MAX310X_MODE2_RST_BIT); @@ -1354,8 +1361,17 @@ static int max310x_probe(struct device *
/* Wait for port startup */ do { - regmap_read(regmaps[i], MAX310X_BRGDIVLSB_REG, &ret); - } while (ret != 0x01); + msleep(MAX310X_PORT_STARTUP_WAIT_DELAY_MS); + regmap_read(regmaps[i], MAX310X_BRGDIVLSB_REG, &val); + + if (val == 0x01) + started = true; + } while (!started && (++try < MAX310X_PORT_STARTUP_WAIT_RETRIES)); + + if (!started) { + ret = dev_err_probe(dev, -EAGAIN, "port reset failed\n"); + goto out_uart; + }
regmap_write(regmaps[i], MAX310X_MODE1_REG, devtype->mode1); }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masami Hiramatsu (Google) mhiramat@kernel.org
commit a8b9cf62ade1bf17261a979fc97e40c2d7842353 upstream.
The commit 60c8971899f3 ("ftrace: Make DIRECT_CALLS work WITH_ARGS and !WITH_REGS") changed DIRECT_CALLS to use SAVE_ARGS when there are multiple ftrace_ops at the same function, but since the x86 only support to jump to direct_call from ftrace_regs_caller, when we set the function tracer on the same target function on x86, ftrace-direct does not work as below (this actually works on arm64.)
At first, insmod ftrace-direct.ko to put a direct_call on 'wake_up_process()'.
# insmod kernel/samples/ftrace/ftrace-direct.ko # less trace ... <idle>-0 [006] ..s1. 564.686958: my_direct_func: waking up rcu_preempt-17 <idle>-0 [007] ..s1. 564.687836: my_direct_func: waking up kcompactd0-63 <idle>-0 [006] ..s1. 564.690926: my_direct_func: waking up rcu_preempt-17 <idle>-0 [006] ..s1. 564.696872: my_direct_func: waking up rcu_preempt-17 <idle>-0 [007] ..s1. 565.191982: my_direct_func: waking up kcompactd0-63
Setup a function filter to the 'wake_up_process' too, and enable it.
# cd /sys/kernel/tracing/ # echo wake_up_process > set_ftrace_filter # echo function > current_tracer # less trace ... <idle>-0 [006] ..s3. 686.180972: wake_up_process <-call_timer_fn <idle>-0 [006] ..s3. 686.186919: wake_up_process <-call_timer_fn <idle>-0 [002] ..s3. 686.264049: wake_up_process <-call_timer_fn <idle>-0 [002] d.h6. 686.515216: wake_up_process <-kick_pool <idle>-0 [002] d.h6. 686.691386: wake_up_process <-kick_pool
Then, only function tracer is shown on x86. But if you enable 'kprobe on ftrace' event (which uses SAVE_REGS flag) on the same function, it is shown again.
# echo 'p wake_up_process' >> dynamic_events # echo 1 > events/kprobes/p_wake_up_process_0/enable # echo > trace # less trace ... <idle>-0 [006] ..s2. 2710.345919: p_wake_up_process_0: (wake_up_process+0x4/0x20) <idle>-0 [006] ..s3. 2710.345923: wake_up_process <-call_timer_fn <idle>-0 [006] ..s1. 2710.345928: my_direct_func: waking up rcu_preempt-17 <idle>-0 [006] ..s2. 2710.349931: p_wake_up_process_0: (wake_up_process+0x4/0x20) <idle>-0 [006] ..s3. 2710.349934: wake_up_process <-call_timer_fn <idle>-0 [006] ..s1. 2710.349937: my_direct_func: waking up rcu_preempt-17
To fix this issue, use SAVE_REGS flag for multiple ftrace_ops flag of direct_call by default.
Link: https://lore.kernel.org/linux-trace-kernel/170484558617.178953.1590516949390...
Fixes: 60c8971899f3 ("ftrace: Make DIRECT_CALLS work WITH_ARGS and !WITH_REGS") Cc: stable@vger.kernel.org Cc: Florent Revest revest@chromium.org Signed-off-by: Masami Hiramatsu (Google) mhiramat@kernel.org Reviewed-by: Mark Rutland mark.rutland@arm.com Tested-by: Mark Rutland mark.rutland@arm.com [arm64] Acked-by: Jiri Olsa jolsa@kernel.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/ftrace.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -5325,7 +5325,17 @@ static LIST_HEAD(ftrace_direct_funcs);
static int register_ftrace_function_nolock(struct ftrace_ops *ops);
+/* + * If there are multiple ftrace_ops, use SAVE_REGS by default, so that direct + * call will be jumped from ftrace_regs_caller. Only if the architecture does + * not support ftrace_regs_caller but direct_call, use SAVE_ARGS so that it + * jumps from ftrace_caller for multiple ftrace_ops. + */ +#ifndef HAVE_DYNAMIC_FTRACE_WITH_REGS #define MULTI_FLAGS (FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_ARGS) +#else +#define MULTI_FLAGS (FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_REGS) +#endif
static int check_direct_multi(struct ftrace_ops *ops) {
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Naveen N Rao naveen@kernel.org
commit aad98efd0b121f63a2e1c221dcb4d4850128c697 upstream.
Nysal reported that userspace backtraces are missing in offcputime bcc tool. As an example: $ sudo ./bcc/tools/offcputime.py -uU Tracing off-CPU time (us) of user threads by user stack... Hit Ctrl-C to end.
^C write - python (9107) 8
write - sudo (9105) 9
mmap - python (9107) 16
clock_nanosleep - multipathd (697) 3001604
The offcputime bcc tool attaches a bpf program to a kprobe on finish_task_switch(), which is usually hit on a syscall from userspace. With the switch to system call vectored, we started setting pt_regs->link to zero. This is because system call vectored behaves like a function call with LR pointing to the system call return address, and with no modification to SRR0/SRR1. The LR value does indicate our next instruction, so it is being saved as pt_regs->nip, and pt_regs->link is being set to zero. This is not a problem by itself, but BPF uses perf callchain infrastructure for capturing stack traces, and that stores LR as the second entry in the stack trace. perf has code to cope with the second entry being zero, and skips over it. However, generic userspace unwinders assume that a zero entry indicates end of the stack trace, resulting in a truncated userspace stack trace.
Rather than fixing all userspace unwinders to ignore/skip past the second entry, store the real LR value in pt_regs->link so that there continues to be a valid, though duplicate entry in the stack trace.
With this change: $ sudo ./bcc/tools/offcputime.py -uU Tracing off-CPU time (us) of user threads by user stack... Hit Ctrl-C to end.
^C write write [unknown] [unknown] [unknown] [unknown] [unknown] PyObject_VectorcallMethod [unknown] [unknown] PyObject_CallOneArg PyFile_WriteObject PyFile_WriteString [unknown] [unknown] PyObject_Vectorcall _PyEval_EvalFrameDefault PyEval_EvalCode [unknown] [unknown] [unknown] _PyRun_SimpleFileObject _PyRun_AnyFileObject Py_RunMain [unknown] Py_BytesMain [unknown] __libc_start_main - python (1293) 7
write write [unknown] sudo_ev_loop_v1 sudo_ev_dispatch_v1 [unknown] [unknown] [unknown] [unknown] __libc_start_main - sudo (1291) 7
syscall syscall bpf_open_perf_buffer_opts [unknown] [unknown] [unknown] [unknown] _PyObject_MakeTpCall PyObject_Vectorcall _PyEval_EvalFrameDefault PyEval_EvalCode [unknown] [unknown] [unknown] _PyRun_SimpleFileObject _PyRun_AnyFileObject Py_RunMain [unknown] Py_BytesMain [unknown] __libc_start_main - python (1293) 11
clock_nanosleep clock_nanosleep nanosleep sleep [unknown] [unknown] __clone - multipathd (698) 3001661
Fixes: 7fa95f9adaee ("powerpc/64s: system call support for scv/rfscv instructions") Cc: stable@vger.kernel.org Reported-by: "Nysal Jan K.A" nysal@linux.ibm.com Signed-off-by: Naveen N Rao naveen@kernel.org Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://msgid.link/20240202154316.395276-1-naveen@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/kernel/interrupt_64.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/powerpc/kernel/interrupt_64.S +++ b/arch/powerpc/kernel/interrupt_64.S @@ -52,7 +52,8 @@ _ASM_NOKPROBE_SYMBOL(system_call_vectore mr r10,r1 ld r1,PACAKSAVE(r13) std r10,0(r1) - std r11,_NIP(r1) + std r11,_LINK(r1) + std r11,_NIP(r1) /* Saved LR is also the next instruction */ std r12,_MSR(r1) std r0,GPR0(r1) std r10,GPR1(r1) @@ -70,7 +71,6 @@ _ASM_NOKPROBE_SYMBOL(system_call_vectore std r9,GPR13(r1) SAVE_NVGPRS(r1) std r11,_XER(r1) - std r11,_LINK(r1) std r11,_CTR(r1)
li r11,\trapnr
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Engraf david.engraf@sysgo.com
commit eb6d871f4ba49ac8d0537e051fe983a3a4027f61 upstream.
Commit e320a76db4b0 ("powerpc/cputable: Split cpu_specs[] out of cputable.h") moved the cpu_specs to separate header files. Previously PPC_FEATURE_BOOKE was enabled by CONFIG_PPC_BOOK3E_64. The definition in cpu_specs_e500mc.h for PPC64 no longer enables PPC_FEATURE_BOOKE.
This breaks user space reading the ELF hwcaps and expect PPC_FEATURE_BOOKE. Debugging an application with gdb is no longer working on e5500/e6500 because the 64-bit detection relies on PPC_FEATURE_BOOKE for Book-E.
Fixes: e320a76db4b0 ("powerpc/cputable: Split cpu_specs[] out of cputable.h") Cc: stable@vger.kernel.org # v6.1+ Signed-off-by: David Engraf david.engraf@sysgo.com Reviewed-by: Christophe Leroy christophe.leroy@csgroup.eu Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://msgid.link/20240207092758.1058893-1-david.engraf@sysgo.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/kernel/cpu_specs_e500mc.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/powerpc/kernel/cpu_specs_e500mc.h +++ b/arch/powerpc/kernel/cpu_specs_e500mc.h @@ -8,7 +8,8 @@
#ifdef CONFIG_PPC64 #define COMMON_USER_BOOKE (PPC_FEATURE_32 | PPC_FEATURE_HAS_MMU | \ - PPC_FEATURE_HAS_FPU | PPC_FEATURE_64) + PPC_FEATURE_HAS_FPU | PPC_FEATURE_64 | \ + PPC_FEATURE_BOOKE) #else #define COMMON_USER_BOOKE (PPC_FEATURE_32 | PPC_FEATURE_HAS_MMU | \ PPC_FEATURE_BOOKE)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shrikanth Hegde sshegde@linux.ibm.com
commit cbecc9fcbbec60136b0180ba0609c829afed5c81 upstream.
powerVM hypervisor updates the VPA fields with stolen time data. It currently reports enqueue_dispatch_tb and ready_enqueue_tb for this purpose. In linux these two fields are used to report the stolen time.
The VPA fields are updated at the TB frequency. On powerPC its mostly set at 512Mhz. Hence this needs a conversion to ns when reporting it back as rest of the kernel timings are in ns. This conversion is already handled in tb_to_ns function. So use that function to report accurate stolen time.
Observed this issue and used an Capped Shared Processor LPAR(SPLPAR) to simplify the experiments. In all these cases, 100% VP Load is run using stress-ng workload. Values of stolen time is in percentages as reported by mpstat. With the patch values are close to expected.
6.8.rc1 +Patch 12EC/12VP 0.0 0.0 12EC/24VP 25.7 50.2 12EC/36VP 37.3 69.2 12EC/48VP 38.5 78.3
Fixes: 0e8a63132800 ("powerpc/pseries: Implement CONFIG_PARAVIRT_TIME_ACCOUNTING") Cc: stable@vger.kernel.org # v6.1+ Signed-off-by: Shrikanth Hegde sshegde@linux.ibm.com Reviewed-by: Nicholas Piggin npiggin@gmail.com Reviewed-by: Srikar Dronamraju srikar@linux.vnet.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://msgid.link/20240213052635.231597-1-sshegde@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/platforms/pseries/lpar.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/arch/powerpc/platforms/pseries/lpar.c +++ b/arch/powerpc/platforms/pseries/lpar.c @@ -662,8 +662,12 @@ u64 pseries_paravirt_steal_clock(int cpu { struct lppaca *lppaca = &lppaca_of(cpu);
- return be64_to_cpu(READ_ONCE(lppaca->enqueue_dispatch_tb)) + - be64_to_cpu(READ_ONCE(lppaca->ready_enqueue_tb)); + /* + * VPA steal time counters are reported at TB frequency. Hence do a + * conversion to ns before returning + */ + return tb_to_ns(be64_to_cpu(READ_ONCE(lppaca->enqueue_dispatch_tb)) + + be64_to_cpu(READ_ONCE(lppaca->ready_enqueue_tb))); } #endif
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Slaby (SUSE) jirislaby@kernel.org
commit 3ee07964d407411fd578a3bc998de44fd64d266a upstream.
And an enum with a flag: UART_TX_NOSTOP. To NOT call __port->ops->stop_tx() when the circular buffer is empty. mxs-uart needs this (see the next patch).
Signed-off-by: "Jiri Slaby (SUSE)" jirislaby@kernel.org Cc: stable stable@kernel.org Tested-by: Emil Kronborg emil.kronborg@protonmail.com Link: https://lore.kernel.org/r/20240201105557.28043-1-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/serial_core.h | 32 +++++++++++++++++++++++++++----- 1 file changed, 27 insertions(+), 5 deletions(-)
--- a/include/linux/serial_core.h +++ b/include/linux/serial_core.h @@ -748,8 +748,17 @@ struct uart_driver {
void uart_write_wakeup(struct uart_port *port);
-#define __uart_port_tx(uport, ch, tx_ready, put_char, tx_done, for_test, \ - for_post) \ +/** + * enum UART_TX_FLAGS -- flags for uart_port_tx_flags() + * + * @UART_TX_NOSTOP: don't call port->ops->stop_tx() on empty buffer + */ +enum UART_TX_FLAGS { + UART_TX_NOSTOP = BIT(0), +}; + +#define __uart_port_tx(uport, ch, flags, tx_ready, put_char, tx_done, \ + for_test, for_post) \ ({ \ struct uart_port *__port = (uport); \ struct circ_buf *xmit = &__port->state->xmit; \ @@ -777,7 +786,7 @@ void uart_write_wakeup(struct uart_port if (pending < WAKEUP_CHARS) { \ uart_write_wakeup(__port); \ \ - if (pending == 0) \ + if (!((flags) & UART_TX_NOSTOP) && pending == 0) \ __port->ops->stop_tx(__port); \ } \ \ @@ -812,7 +821,7 @@ void uart_write_wakeup(struct uart_port */ #define uart_port_tx_limited(port, ch, count, tx_ready, put_char, tx_done) ({ \ unsigned int __count = (count); \ - __uart_port_tx(port, ch, tx_ready, put_char, tx_done, __count, \ + __uart_port_tx(port, ch, 0, tx_ready, put_char, tx_done, __count, \ __count--); \ })
@@ -826,8 +835,21 @@ void uart_write_wakeup(struct uart_port * See uart_port_tx_limited() for more details. */ #define uart_port_tx(port, ch, tx_ready, put_char) \ - __uart_port_tx(port, ch, tx_ready, put_char, ({}), true, ({})) + __uart_port_tx(port, ch, 0, tx_ready, put_char, ({}), true, ({}))
+ +/** + * uart_port_tx_flags -- transmit helper for uart_port with flags + * @port: uart port + * @ch: variable to store a character to be written to the HW + * @flags: %UART_TX_NOSTOP or similar + * @tx_ready: can HW accept more data function + * @put_char: function to write a character + * + * See uart_port_tx_limited() for more details. + */ +#define uart_port_tx_flags(port, ch, flags, tx_ready, put_char) \ + __uart_port_tx(port, ch, flags, tx_ready, put_char, ({}), true, ({})) /* * Baud rate helpers. */
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Slaby (SUSE) jirislaby@kernel.org
commit 7be50f2e8f20fc2299069b28dea59a28e3abe20a upstream.
Emil reports: After updating Linux on an i.MX28 board, serial communication over AUART broke. When I TX from the board and measure on the TX pin, it seems like the HW fifo is not emptied before the transmission is stopped.
MXS performs weird things with stop_tx(). The driver makes it conditional on uart_tx_stopped().
So the driver needs special handling. Pass the brand new UART_TX_NOSTOP to uart_port_tx_flags() and handle the stop on its own.
Signed-off-by: "Jiri Slaby (SUSE)" jirislaby@kernel.org Reported-by: Emil Kronborg emil.kronborg@protonmail.com Cc: stable stable@kernel.org Fixes: 2d141e683e9a ("tty: serial: use uart_port_tx() helper") Closes: https://lore.kernel.org/all/miwgbnvy3hjpnricubg76ytpn7xoceehwahupy25bubbduu2... Tested-by: Stefan Wahren wahrenst@gmx.net Tested-by: Emil Kronborg emil.kronborg@protonmail.com Link: https://lore.kernel.org/r/20240201105557.28043-2-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/mxs-auart.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/tty/serial/mxs-auart.c +++ b/drivers/tty/serial/mxs-auart.c @@ -605,13 +605,16 @@ static void mxs_auart_tx_chars(struct mx return; }
- pending = uart_port_tx(&s->port, ch, + pending = uart_port_tx_flags(&s->port, ch, UART_TX_NOSTOP, !(mxs_read(s, REG_STAT) & AUART_STAT_TXFF), mxs_write(ch, s, REG_DATA)); if (pending) mxs_set(AUART_INTR_TXIEN, s, REG_INTR); else mxs_clr(AUART_INTR_TXIEN, s, REG_INTR); + + if (uart_tx_stopped(&s->port)) + mxs_auart_stop_tx(&s->port); }
static void mxs_auart_rx_char(struct mxs_auart_port *s)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aleksander Mazur deweloper@wp.pl
commit f6a1892585cd19e63c4ef2334e26cd536d5b678d upstream.
The kernel built with MCRUSOE is unbootable on Transmeta Crusoe. It shows the following error message:
This kernel requires an i686 CPU, but only detected an i586 CPU. Unable to boot - please use a kernel appropriate for your CPU.
Remove MCRUSOE from the condition introduced in commit in Fixes, effectively changing X86_MINIMUM_CPU_FAMILY back to 5 on that machine, which matches the CPU family given by CPUID.
[ bp: Massage commit message. ]
Fixes: 25d76ac88821 ("x86/Kconfig: Explicitly enumerate i686-class CPUs in Kconfig") Signed-off-by: Aleksander Mazur deweloper@wp.pl Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Acked-by: H. Peter Anvin hpa@zytor.com Cc: stable@kernel.org Link: https://lore.kernel.org/r/20240123134309.1117782-1-deweloper@wp.pl Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/Kconfig.cpu | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/Kconfig.cpu +++ b/arch/x86/Kconfig.cpu @@ -375,7 +375,7 @@ config X86_CMOV config X86_MINIMUM_CPU_FAMILY int default "64" if X86_64 - default "6" if X86_32 && (MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MEFFICEON || MATOM || MCRUSOE || MCORE2 || MK7 || MK8) + default "6" if X86_32 && (MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MEFFICEON || MATOM || MCORE2 || MK7 || MK8) default "5" if X86_32 && X86_CMPXCHG64 default "4"
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrei Vagin avagin@google.com
commit d877550eaf2dc9090d782864c96939397a3c6835 upstream.
Before this change, the expected size of the user space buffer was taken from fx_sw->xstate_size. fx_sw->xstate_size can be changed from user-space, so it is possible construct a sigreturn frame where:
* fx_sw->xstate_size is smaller than the size required by valid bits in fx_sw->xfeatures. * user-space unmaps parts of the sigrame fpu buffer so that not all of the buffer required by xrstor is accessible.
In this case, xrstor tries to restore and accesses the unmapped area which results in a fault. But fault_in_readable succeeds because buf + fx_sw->xstate_size is within the still mapped area, so it goes back and tries xrstor again. It will spin in this loop forever.
Instead, fault in the maximum size which can be touched by XRSTOR (taken from fpstate->user_size).
[ dhansen: tweak subject / changelog ]
Fixes: fcb3635f5018 ("x86/fpu/signal: Handle #PF in the direct restore path") Reported-by: Konstantin Bogomolov bogomolov@google.com Suggested-by: Thomas Gleixner tglx@linutronix.de Signed-off-by: Andrei Vagin avagin@google.com Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20240130063603.3392627-1-avagin%40google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/fpu/signal.c | 13 +++++-------- 1 file changed, 5 insertions(+), 8 deletions(-)
--- a/arch/x86/kernel/fpu/signal.c +++ b/arch/x86/kernel/fpu/signal.c @@ -274,12 +274,13 @@ static int __restore_fpregs_from_user(vo * Attempt to restore the FPU registers directly from user memory. * Pagefaults are handled and any errors returned are fatal. */ -static bool restore_fpregs_from_user(void __user *buf, u64 xrestore, - bool fx_only, unsigned int size) +static bool restore_fpregs_from_user(void __user *buf, u64 xrestore, bool fx_only) { struct fpu *fpu = ¤t->thread.fpu; int ret;
+ /* Restore enabled features only. */ + xrestore &= fpu->fpstate->user_xfeatures; retry: fpregs_lock(); /* Ensure that XFD is up to date */ @@ -309,7 +310,7 @@ retry: if (ret != X86_TRAP_PF) return false;
- if (!fault_in_readable(buf, size)) + if (!fault_in_readable(buf, fpu->fpstate->user_size)) goto retry; return false; } @@ -339,7 +340,6 @@ static bool __fpu_restore_sig(void __use struct user_i387_ia32_struct env; bool success, fx_only = false; union fpregs_state *fpregs; - unsigned int state_size; u64 user_xfeatures = 0;
if (use_xsave()) { @@ -349,17 +349,14 @@ static bool __fpu_restore_sig(void __use return false;
fx_only = !fx_sw_user.magic1; - state_size = fx_sw_user.xstate_size; user_xfeatures = fx_sw_user.xfeatures; } else { user_xfeatures = XFEATURE_MASK_FPSSE; - state_size = fpu->fpstate->user_size; }
if (likely(!ia32_fxstate)) { /* Restore the FPU registers directly from user memory. */ - return restore_fpregs_from_user(buf_fx, user_xfeatures, fx_only, - state_size); + return restore_fpregs_from_user(buf_fx, user_xfeatures, fx_only); }
/*
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Prasad Pandit pjp@fedoraproject.org
commit 6231c9e1a9f35b535c66709aa8a6eda40dbc4132 upstream.
kvm_vcpu_ioctl_x86_set_vcpu_events() routine makes 'KVM_REQ_NMI' request for a vcpu even when its 'events->nmi.pending' is zero. Ex: qemu_thread_start kvm_vcpu_thread_fn qemu_wait_io_event qemu_wait_io_event_common process_queued_cpu_work do_kvm_cpu_synchronize_post_init/_reset kvm_arch_put_registers kvm_put_vcpu_events (cpu, level=[2|3])
This leads vCPU threads in QEMU to constantly acquire & release the global mutex lock, delaying the guest boot due to lock contention. Add check to make KVM_REQ_NMI request only if vcpu has NMI pending.
Fixes: bdedff263132 ("KVM: x86: Route pending NMIs from userspace through process_nmi()") Cc: stable@vger.kernel.org Signed-off-by: Prasad Pandit pjp@fedoraproject.org Link: https://lore.kernel.org/r/20240103075343.549293-1-ppandit@redhat.com Signed-off-by: Sean Christopherson seanjc@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/x86.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5405,7 +5405,8 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_e if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) { vcpu->arch.nmi_pending = 0; atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending); - kvm_make_request(KVM_REQ_NMI, vcpu); + if (events->nmi.pending) + kvm_make_request(KVM_REQ_NMI, vcpu); } static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mingwei Zhang mizhang@google.com
commit 05519c86d6997cfb9bb6c82ce1595d1015b718dc upstream.
Use a u64 instead of a u8 when taking a snapshot of pmu->fixed_ctr_ctrl when reprogramming fixed counters, as truncating the value results in KVM thinking fixed counter 2 is already disabled (the bug also affects fixed counters 3+, but KVM doesn't yet support those). As a result, if the guest disables fixed counter 2, KVM will get a false negative and fail to reprogram/disable emulation of the counter, which can leads to incorrect counts and spurious PMIs in the guest.
Fixes: 76d287b2342e ("KVM: x86/pmu: Drop "u8 ctrl, int idx" for reprogram_fixed_counter()") Cc: stable@vger.kernel.org Signed-off-by: Mingwei Zhang mizhang@google.com Link: https://lore.kernel.org/r/20240123221220.3911317-1-mizhang@google.com [sean: rewrite changelog to call out the effects of the bug] Signed-off-by: Sean Christopherson seanjc@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/vmx/pmu_intel.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kvm/vmx/pmu_intel.c +++ b/arch/x86/kvm/vmx/pmu_intel.c @@ -71,7 +71,7 @@ static int fixed_pmc_events[] = { static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data) { struct kvm_pmc *pmc; - u8 old_fixed_ctr_ctrl = pmu->fixed_ctr_ctrl; + u64 old_fixed_ctr_ctrl = pmu->fixed_ctr_ctrl; int i;
pmu->fixed_ctr_ctrl = data;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steve Wahl steve.wahl@hpe.com
commit d794734c9bbfe22f86686dc2909c25f5ffe1a572 upstream.
When ident_pud_init() uses only gbpages to create identity maps, large ranges of addresses not actually requested can be included in the resulting table; a 4K request will map a full GB. On UV systems, this ends up including regions that will cause hardware to halt the system if accessed (these are marked "reserved" by BIOS). Even processor speculation into these regions is enough to trigger the system halt.
Only use gbpages when map creation requests include the full GB page of space. Fall back to using smaller 2M pages when only portions of a GB page are included in the request.
No attempt is made to coalesce mapping requests. If a request requires a map entry at the 2M (pmd) level, subsequent mapping requests within the same 1G region will also be at the pmd level, even if adjacent or overlapping such requests could have been combined to map a full gbpage. Existing usage starts with larger regions and then adds smaller regions, so this should not have any great consequence.
[ dhansen: fix up comment formatting, simplifty changelog ]
Signed-off-by: Steve Wahl steve.wahl@hpe.com Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20240126164841.170866-1-steve.wahl%40hpe.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/mm/ident_map.c | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-)
--- a/arch/x86/mm/ident_map.c +++ b/arch/x86/mm/ident_map.c @@ -26,18 +26,31 @@ static int ident_pud_init(struct x86_map for (; addr < end; addr = next) { pud_t *pud = pud_page + pud_index(addr); pmd_t *pmd; + bool use_gbpage;
next = (addr & PUD_MASK) + PUD_SIZE; if (next > end) next = end;
- if (info->direct_gbpages) { - pud_t pudval; + /* if this is already a gbpage, this portion is already mapped */ + if (pud_large(*pud)) + continue; + + /* Is using a gbpage allowed? */ + use_gbpage = info->direct_gbpages;
- if (pud_present(*pud)) - continue; + /* Don't use gbpage if it maps more than the requested region. */ + /* at the begining: */ + use_gbpage &= ((addr & ~PUD_MASK) == 0); + /* ... or at the end: */ + use_gbpage &= ((next & ~PUD_MASK) == 0); + + /* Never overwrite existing mappings */ + use_gbpage &= !pud_present(*pud); + + if (use_gbpage) { + pud_t pudval;
- addr &= PUD_MASK; pudval = __pud((addr - info->offset) | info->page_flag); set_pud(pud, pudval); continue;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jens Axboe axboe@kernel.dk
commit a37ee9e117ef73bbc2f5c0b31911afd52d229861 upstream.
If we hit CQ ring overflow when attempting to post a multishot accept completion, we don't properly save the result or return code. This results in losing the accepted fd value.
Instead, we return the result from the poll operation that triggered the accept retry. This is generally POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND which is 0xc3, or 195, which looks like a valid file descriptor, but it really has no connection to that.
Handle this like we do for other multishot completions - assign the result, and return IOU_STOP_MULTISHOT to cancel any further completions from this request when overflow is hit. This preserves the result, as we should, and tells the application that the request needs to be re-armed.
Cc: stable@vger.kernel.org Fixes: 515e26961295 ("io_uring: revert "io_uring fix multishot accept ordering"") Link: https://github.com/axboe/liburing/issues/1062 Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- io_uring/net.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/io_uring/net.c +++ b/io_uring/net.c @@ -1372,7 +1372,7 @@ retry: * has already been done */ if (issue_flags & IO_URING_F_MULTISHOT) - ret = IOU_ISSUE_SKIP_COMPLETE; + return IOU_ISSUE_SKIP_COMPLETE; return ret; } if (ret == -ERESTARTSYS) @@ -1397,7 +1397,8 @@ retry: ret, IORING_CQE_F_MORE)) goto retry;
- return -ECANCELED; + io_req_set_res(req, ret, 0); + return IOU_STOP_MULTISHOT; }
int io_socket_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Stein alexander.stein@ew.tq-group.com
commit cc9432c4fb159a3913e0ce3173b8218cd5bad2e0 upstream.
This change uses the appropriate _cansleep or non-sleeping API for reading GPIO read-only state. This allows users with GPIOs that never sleepbeing called in atomic context.
Implement the same mechanism as in commit 52af318c93e97 ("mmc: Allow non-sleeping GPIO cd").
Signed-off-by: Alexander Stein alexander.stein@ew.tq-group.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240206083912.2543142-1-alexander.stein@ew.tq-gro... Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/core/slot-gpio.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/mmc/core/slot-gpio.c +++ b/drivers/mmc/core/slot-gpio.c @@ -75,11 +75,15 @@ EXPORT_SYMBOL(mmc_gpio_set_cd_irq); int mmc_gpio_get_ro(struct mmc_host *host) { struct mmc_gpio *ctx = host->slot.handler_priv; + int cansleep;
if (!ctx || !ctx->ro_gpio) return -ENOSYS;
- return gpiod_get_value_cansleep(ctx->ro_gpio); + cansleep = gpiod_cansleep(ctx->ro_gpio); + return cansleep ? + gpiod_get_value_cansleep(ctx->ro_gpio) : + gpiod_get_value(ctx->ro_gpio); } EXPORT_SYMBOL(mmc_gpio_get_ro);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
commit ebe0c15b135b1e4092c25b95d89e9a5899467499 upstream.
Add empty stub of gpio_device_get_base() when GPIOLIB is not enabled.
Cc: stable@vger.kernel.org Fixes: 8c85a102fc4e ("gpiolib: provide gpio_device_get_base()") Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/gpio/driver.h | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/include/linux/gpio/driver.h +++ b/include/linux/gpio/driver.h @@ -800,6 +800,12 @@ static inline struct gpio_chip *gpiod_to return ERR_PTR(-ENODEV); }
+static inline int gpio_device_get_base(struct gpio_device *gdev) +{ + WARN_ON(1); + return -ENODEV; +} + static inline int gpiochip_lock_as_irq(struct gpio_chip *gc, unsigned int offset) {
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
commit 6ac86372102b477083db99a9af8246fb916271b5 upstream.
Add empty stub of gpiod_to_gpio_device() when GPIOLIB is not enabled.
Cc: stable@vger.kernel.org Fixes: 370232d096e3 ("gpiolib: provide gpiod_to_gpio_device()") Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/gpio/driver.h | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/include/linux/gpio/driver.h +++ b/include/linux/gpio/driver.h @@ -806,6 +806,12 @@ static inline int gpio_device_get_base(s return -ENODEV; }
+static inline struct gpio_device *gpiod_to_gpio_device(struct gpio_desc *desc) +{ + WARN_ON(1); + return ERR_PTR(-ENODEV); +} + static inline int gpiochip_lock_as_irq(struct gpio_chip *gc, unsigned int offset) {
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eniac Zhang eniac-xw.zhang@hp.com
commit 32f03f4002c5df837fb920eb23fcd2f4af9b0b23 upstream.
The HP mt645 G7 Thin Client uses an ALC236 codec and needs the ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF quirk to make the mute and micmute LEDs work.
There are two variants of the USB-C PD chip on this device. Each uses a different BIOS and board ID, hence the two entries.
Signed-off-by: Eniac Zhang eniac-xw.zhang@hp.com Signed-off-by: Alexandru Gagniuc alexandru.gagniuc@hp.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240215154922.778394-1-alexandru.gagniuc@hp.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -9915,6 +9915,7 @@ static const struct snd_pci_quirk alc269 SND_PCI_QUIRK(0x103c, 0x8abb, "HP ZBook Firefly 14 G9", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x8ad1, "HP EliteBook 840 14 inch G9 Notebook PC", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x8ad2, "HP EliteBook 860 16 inch G9 Notebook PC", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED), + SND_PCI_QUIRK(0x103c, 0x8b0f, "HP Elite mt645 G7 Mobile Thin Client U81", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF), SND_PCI_QUIRK(0x103c, 0x8b2f, "HP 255 15.6 inch G10 Notebook PC", ALC236_FIXUP_HP_MUTE_LED_COEFBIT2), SND_PCI_QUIRK(0x103c, 0x8b42, "HP", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x8b43, "HP", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED), @@ -9922,6 +9923,7 @@ static const struct snd_pci_quirk alc269 SND_PCI_QUIRK(0x103c, 0x8b45, "HP", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x8b46, "HP", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x8b47, "HP", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED), + SND_PCI_QUIRK(0x103c, 0x8b59, "HP Elite mt645 G7 Mobile Thin Client U89", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF), SND_PCI_QUIRK(0x103c, 0x8b5d, "HP", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF), SND_PCI_QUIRK(0x103c, 0x8b5e, "HP", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF), SND_PCI_QUIRK(0x103c, 0x8b63, "HP Elite Dragonfly 13.5 inch G4", ALC245_FIXUP_CS35L41_SPI_4_HP_GPIO_LED),
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: bo liu bo.liu@senarytech.com
commit 4639c5021029d49fd2f97fa8d74731f167f98919 upstream.
The SWS JS201D need a different pinconfig from windows driver. Add a quirk to use a specific pinconfig to SWS JS201D.
Signed-off-by: bo liu bo.liu@senarytech.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240205013802.51907-1-bo.liu@senarytech.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_conexant.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+)
--- a/sound/pci/hda/patch_conexant.c +++ b/sound/pci/hda/patch_conexant.c @@ -344,6 +344,7 @@ enum { CXT_FIXUP_HP_ZBOOK_MUTE_LED, CXT_FIXUP_HEADSET_MIC, CXT_FIXUP_HP_MIC_NO_PRESENCE, + CXT_PINCFG_SWS_JS201D, };
/* for hda_fixup_thinkpad_acpi() */ @@ -841,6 +842,17 @@ static const struct hda_pintbl cxt_pincf {} };
+/* SuoWoSi/South-holding JS201D with sn6140 */ +static const struct hda_pintbl cxt_pincfg_sws_js201d[] = { + { 0x16, 0x03211040 }, /* hp out */ + { 0x17, 0x91170110 }, /* SPK/Class_D */ + { 0x18, 0x95a70130 }, /* Internal mic */ + { 0x19, 0x03a11020 }, /* Headset Mic */ + { 0x1a, 0x40f001f0 }, /* Not used */ + { 0x21, 0x40f001f0 }, /* Not used */ + {} +}; + static const struct hda_fixup cxt_fixups[] = { [CXT_PINCFG_LENOVO_X200] = { .type = HDA_FIXUP_PINS, @@ -996,6 +1008,10 @@ static const struct hda_fixup cxt_fixups .chained = true, .chain_id = CXT_FIXUP_HEADSET_MIC, }, + [CXT_PINCFG_SWS_JS201D] = { + .type = HDA_FIXUP_PINS, + .v.pins = cxt_pincfg_sws_js201d, + }, };
static const struct snd_pci_quirk cxt5045_fixups[] = { @@ -1069,6 +1085,7 @@ static const struct snd_pci_quirk cxt506 SND_PCI_QUIRK(0x103c, 0x8457, "HP Z2 G4 mini", CXT_FIXUP_HP_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x103c, 0x8458, "HP Z2 G4 mini premium", CXT_FIXUP_HP_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1043, 0x138d, "Asus", CXT_FIXUP_HEADPHONE_MIC_PIN), + SND_PCI_QUIRK(0x14f1, 0x0265, "SWS JS201D", CXT_PINCFG_SWS_JS201D), SND_PCI_QUIRK(0x152d, 0x0833, "OLPC XO-1.5", CXT_FIXUP_OLPC_XO), SND_PCI_QUIRK(0x17aa, 0x20f2, "Lenovo T400", CXT_PINCFG_LENOVO_TP410), SND_PCI_QUIRK(0x17aa, 0x215e, "Lenovo T410", CXT_PINCFG_LENOVO_TP410), @@ -1109,6 +1126,7 @@ static const struct hda_model_fixup cxt5 { .id = CXT_FIXUP_HP_ZBOOK_MUTE_LED, .name = "hp-zbook-mute-led" }, { .id = CXT_FIXUP_HP_MIC_NO_PRESENCE, .name = "hp-mic-fix" }, { .id = CXT_PINCFG_LENOVO_NOTEBOOK, .name = "lenovo-20149" }, + { .id = CXT_PINCFG_SWS_JS201D, .name = "sws-js201d" }, {} };
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shuming Fan shumingf@realtek.com
commit fddab35fd064414c677e9488c4fb3a1f67725d37 upstream.
This patch adds another two IDs for the Dell dual speaker platform.
Signed-off-by: Shuming Fan shumingf@realtek.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240205072252.3791500-1-shumingf@realtek.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -9724,7 +9724,9 @@ static const struct snd_pci_quirk alc269 SND_PCI_QUIRK(0x1028, 0x0b71, "Dell Inspiron 16 Plus 7620", ALC295_FIXUP_DELL_INSPIRON_TOP_SPEAKERS), SND_PCI_QUIRK(0x1028, 0x0beb, "Dell XPS 15 9530 (2023)", ALC289_FIXUP_DELL_CS35L41_SPI_2), SND_PCI_QUIRK(0x1028, 0x0c03, "Dell Precision 5340", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE), + SND_PCI_QUIRK(0x1028, 0x0c0b, "Dell Oasis 14 RPL-P", ALC289_FIXUP_RTK_AMP_DUAL_SPK), SND_PCI_QUIRK(0x1028, 0x0c0d, "Dell Oasis", ALC289_FIXUP_RTK_AMP_DUAL_SPK), + SND_PCI_QUIRK(0x1028, 0x0c0e, "Dell Oasis 16", ALC289_FIXUP_RTK_AMP_DUAL_SPK), SND_PCI_QUIRK(0x1028, 0x0c19, "Dell Precision 3340", ALC236_FIXUP_DELL_DUAL_CODECS), SND_PCI_QUIRK(0x1028, 0x0c1a, "Dell Precision 3340", ALC236_FIXUP_DELL_DUAL_CODECS), SND_PCI_QUIRK(0x1028, 0x0c1b, "Dell Precision 3440", ALC236_FIXUP_DELL_DUAL_CODECS),
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ryusuke Konishi konishi.ryusuke@gmail.com
commit 67b8bcbaed4777871bb0dcc888fb02a614a98ab1 upstream.
The helper function nilfs_recovery_copy_block() of nilfs_recovery_dsync_blocks(), which recovers data from logs created by data sync writes during a mount after an unclean shutdown, incorrectly calculates the on-page offset when copying repair data to the file's page cache. In environments where the block size is smaller than the page size, this flaw can cause data corruption and leak uninitialized memory bytes during the recovery process.
Fix these issues by correcting this byte offset calculation on the page.
Link: https://lkml.kernel.org/r/20240124121936.10575-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi konishi.ryusuke@gmail.com Tested-by: Ryusuke Konishi konishi.ryusuke@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nilfs2/recovery.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
--- a/fs/nilfs2/recovery.c +++ b/fs/nilfs2/recovery.c @@ -472,9 +472,10 @@ static int nilfs_prepare_segment_for_rec
static int nilfs_recovery_copy_block(struct the_nilfs *nilfs, struct nilfs_recovery_block *rb, - struct page *page) + loff_t pos, struct page *page) { struct buffer_head *bh_org; + size_t from = pos & ~PAGE_MASK; void *kaddr;
bh_org = __bread(nilfs->ns_bdev, rb->blocknr, nilfs->ns_blocksize); @@ -482,7 +483,7 @@ static int nilfs_recovery_copy_block(str return -EIO;
kaddr = kmap_atomic(page); - memcpy(kaddr + bh_offset(bh_org), bh_org->b_data, bh_org->b_size); + memcpy(kaddr + from, bh_org->b_data, bh_org->b_size); kunmap_atomic(kaddr); brelse(bh_org); return 0; @@ -521,7 +522,7 @@ static int nilfs_recover_dsync_blocks(st goto failed_inode; }
- err = nilfs_recovery_copy_block(nilfs, rb, page); + err = nilfs_recovery_copy_block(nilfs, rb, pos, page); if (unlikely(err)) goto failed_page;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ryusuke Konishi konishi.ryusuke@gmail.com
commit 38296afe3c6ee07319e01bb249aa4bb47c07b534 upstream.
Syzbot reported a hang issue in migrate_pages_batch() called by mbind() and nilfs_lookup_dirty_data_buffers() called in the log writer of nilfs2.
While migrate_pages_batch() locks a folio and waits for the writeback to complete, the log writer thread that should bring the writeback to completion picks up the folio being written back in nilfs_lookup_dirty_data_buffers() that it calls for subsequent log creation and was trying to lock the folio. Thus causing a deadlock.
In the first place, it is unexpected that folios/pages in the middle of writeback will be updated and become dirty. Nilfs2 adds a checksum to verify the validity of the log being written and uses it for recovery at mount, so data changes during writeback are suppressed. Since this is broken, an unclean shutdown could potentially cause recovery to fail.
Investigation revealed that the root cause is that the wait for writeback completion in nilfs_page_mkwrite() is conditional, and if the backing device does not require stable writes, data may be modified without waiting.
Fix these issues by making nilfs_page_mkwrite() wait for writeback to finish regardless of the stable write requirement of the backing device.
Link: https://lkml.kernel.org/r/20240131145657.4209-1-konishi.ryusuke@gmail.com Fixes: 1d1d1a767206 ("mm: only enforce stable page writes if the backing device requires it") Signed-off-by: Ryusuke Konishi konishi.ryusuke@gmail.com Reported-by: syzbot+ee2ae68da3b22d04cd8d@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/00000000000047d819061004ad6c@google.com Tested-by: Ryusuke Konishi konishi.ryusuke@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nilfs2/file.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
--- a/fs/nilfs2/file.c +++ b/fs/nilfs2/file.c @@ -105,7 +105,13 @@ static vm_fault_t nilfs_page_mkwrite(str nilfs_transaction_commit(inode->i_sb);
mapped: - wait_for_stable_page(page); + /* + * Since checksumming including data blocks is performed to determine + * the validity of the log to be written and used for recovery, it is + * necessary to wait for writeback to finish here, regardless of the + * stable write requirement of the backing device. + */ + wait_on_page_writeback(page); out: sb_end_pagefault(inode->i_sb); return vmf_fs_error(ret);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kim Phillips kim.phillips@amd.com
commit ccb88e9549e7cfd8bcd511c538f437e20026e983 upstream.
The SEV platform device can be shutdown with a null psp_master, e.g., using DEBUG_TEST_DRIVER_REMOVE. Found using KASAN:
[ 137.148210] ccp 0000:23:00.1: enabling device (0000 -> 0002) [ 137.162647] ccp 0000:23:00.1: no command queues available [ 137.170598] ccp 0000:23:00.1: sev enabled [ 137.174645] ccp 0000:23:00.1: psp enabled [ 137.178890] general protection fault, probably for non-canonical address 0xdffffc000000001e: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN NOPTI [ 137.182693] KASAN: null-ptr-deref in range [0x00000000000000f0-0x00000000000000f7] [ 137.182693] CPU: 93 PID: 1 Comm: swapper/0 Not tainted 6.8.0-rc1+ #311 [ 137.182693] RIP: 0010:__sev_platform_shutdown_locked+0x51/0x180 [ 137.182693] Code: 08 80 3c 08 00 0f 85 0e 01 00 00 48 8b 1d 67 b6 01 08 48 b8 00 00 00 00 00 fc ff df 48 8d bb f0 00 00 00 48 89 f9 48 c1 e9 03 <80> 3c 01 00 0f 85 fe 00 00 00 48 8b 9b f0 00 00 00 48 85 db 74 2c [ 137.182693] RSP: 0018:ffffc900000cf9b0 EFLAGS: 00010216 [ 137.182693] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 000000000000001e [ 137.182693] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 00000000000000f0 [ 137.182693] RBP: ffffc900000cf9c8 R08: 0000000000000000 R09: fffffbfff58f5a66 [ 137.182693] R10: ffffc900000cf9c8 R11: ffffffffac7ad32f R12: ffff8881e5052c28 [ 137.182693] R13: ffff8881e5052c28 R14: ffff8881758e43e8 R15: ffffffffac64abf8 [ 137.182693] FS: 0000000000000000(0000) GS:ffff889de7000000(0000) knlGS:0000000000000000 [ 137.182693] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 137.182693] CR2: 0000000000000000 CR3: 0000001cf7c7e000 CR4: 0000000000350ef0 [ 137.182693] Call Trace: [ 137.182693] <TASK> [ 137.182693] ? show_regs+0x6c/0x80 [ 137.182693] ? __die_body+0x24/0x70 [ 137.182693] ? die_addr+0x4b/0x80 [ 137.182693] ? exc_general_protection+0x126/0x230 [ 137.182693] ? asm_exc_general_protection+0x2b/0x30 [ 137.182693] ? __sev_platform_shutdown_locked+0x51/0x180 [ 137.182693] sev_firmware_shutdown.isra.0+0x1e/0x80 [ 137.182693] sev_dev_destroy+0x49/0x100 [ 137.182693] psp_dev_destroy+0x47/0xb0 [ 137.182693] sp_destroy+0xbb/0x240 [ 137.182693] sp_pci_remove+0x45/0x60 [ 137.182693] pci_device_remove+0xaa/0x1d0 [ 137.182693] device_remove+0xc7/0x170 [ 137.182693] really_probe+0x374/0xbe0 [ 137.182693] ? srso_return_thunk+0x5/0x5f [ 137.182693] __driver_probe_device+0x199/0x460 [ 137.182693] driver_probe_device+0x4e/0xd0 [ 137.182693] __driver_attach+0x191/0x3d0 [ 137.182693] ? __pfx___driver_attach+0x10/0x10 [ 137.182693] bus_for_each_dev+0x100/0x190 [ 137.182693] ? __pfx_bus_for_each_dev+0x10/0x10 [ 137.182693] ? __kasan_check_read+0x15/0x20 [ 137.182693] ? srso_return_thunk+0x5/0x5f [ 137.182693] ? _raw_spin_unlock+0x27/0x50 [ 137.182693] driver_attach+0x41/0x60 [ 137.182693] bus_add_driver+0x2a8/0x580 [ 137.182693] driver_register+0x141/0x480 [ 137.182693] __pci_register_driver+0x1d6/0x2a0 [ 137.182693] ? srso_return_thunk+0x5/0x5f [ 137.182693] ? esrt_sysfs_init+0x1cd/0x5d0 [ 137.182693] ? __pfx_sp_mod_init+0x10/0x10 [ 137.182693] sp_pci_init+0x22/0x30 [ 137.182693] sp_mod_init+0x14/0x30 [ 137.182693] ? __pfx_sp_mod_init+0x10/0x10 [ 137.182693] do_one_initcall+0xd1/0x470 [ 137.182693] ? __pfx_do_one_initcall+0x10/0x10 [ 137.182693] ? parameq+0x80/0xf0 [ 137.182693] ? srso_return_thunk+0x5/0x5f [ 137.182693] ? __kmalloc+0x3b0/0x4e0 [ 137.182693] ? kernel_init_freeable+0x92d/0x1050 [ 137.182693] ? kasan_populate_vmalloc_pte+0x171/0x190 [ 137.182693] ? srso_return_thunk+0x5/0x5f [ 137.182693] kernel_init_freeable+0xa64/0x1050 [ 137.182693] ? __pfx_kernel_init+0x10/0x10 [ 137.182693] kernel_init+0x24/0x160 [ 137.182693] ? __switch_to_asm+0x3e/0x70 [ 137.182693] ret_from_fork+0x40/0x80 [ 137.182693] ? __pfx_kernel_init+0x10/0x10 [ 137.182693] ret_from_fork_asm+0x1b/0x30 [ 137.182693] </TASK> [ 137.182693] Modules linked in: [ 137.538483] ---[ end trace 0000000000000000 ]---
Fixes: 1b05ece0c931 ("crypto: ccp - During shutdown, check SEV data pointer before using") Cc: stable@vger.kernel.org Reviewed-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Kim Phillips kim.phillips@amd.com Reviewed-by: Liam Merwick liam.merwick@oracle.com Acked-by: John Allen john.allen@amd.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/crypto/ccp/sev-dev.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
--- a/drivers/crypto/ccp/sev-dev.c +++ b/drivers/crypto/ccp/sev-dev.c @@ -534,10 +534,16 @@ EXPORT_SYMBOL_GPL(sev_platform_init);
static int __sev_platform_shutdown_locked(int *error) { - struct sev_device *sev = psp_master->sev_data; + struct psp_device *psp = psp_master; + struct sev_device *sev; int ret;
- if (!sev || sev->state == SEV_STATE_UNINIT) + if (!psp || !psp->sev_data) + return 0; + + sev = psp->sev_data; + + if (sev->state == SEV_STATE_UNINIT) return 0;
ret = __sev_do_cmd_locked(SEV_CMD_SHUTDOWN, NULL, error);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Herbert Xu herbert@gondor.apana.org.au
commit 24c890dd712f6345e382256cae8c97abb0406b70 upstream.
When a zero-length message is hashed by algif_hash, and an error is triggered, it tries to free an SG list that was never allocated in the first place. Fix this by not freeing the SG list on the zero-length error path.
Reported-by: Shigeru Yoshida syoshida@redhat.com Reported-by: xingwei lee xrivendell7@gmail.com Fixes: b6d972f68983 ("crypto: af_alg/hash: Fix recvmsg() after sendmsg(MSG_MORE)") Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Reported-by: syzbot+3266db0c26d1fbbe3abb@syzkaller.appspotmail.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- crypto/algif_hash.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/crypto/algif_hash.c +++ b/crypto/algif_hash.c @@ -91,13 +91,13 @@ static int hash_sendmsg(struct socket *s if (!(msg->msg_flags & MSG_MORE)) { err = hash_alloc_result(sk, ctx); if (err) - goto unlock_free; + goto unlock_free_result; ahash_request_set_crypt(&ctx->req, NULL, ctx->result, 0); err = crypto_wait_req(crypto_ahash_final(&ctx->req), &ctx->wait); if (err) - goto unlock_free; + goto unlock_free_result; } goto done_more; } @@ -170,6 +170,7 @@ unlock:
unlock_free: af_alg_free_sg(&ctx->sgl); +unlock_free_result: hash_free_result(sk, ctx); ctx->more = false; goto unlock;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Basilio daniel.basilio@corigine.com
commit b3d4f7f2288901ed2392695919b3c0e24c1b4084 upstream.
The 1st and 2nd expansion BAR configuration registers are configured, when the driver starts up, in variables 'barcfg_msix_general' and 'barcfg_msix_xpb', respectively. The 'LengthSelect' field is ORed in from bit 0, which is incorrect. The 'LengthSelect' field should start from bit 27.
This has largely gone un-noticed because NFP_PCIE_BAR_PCIE2CPP_LengthSelect_32BIT happens to be 0.
Fixes: 4cb584e0ee7d ("nfp: add CPP access core") Cc: stable@vger.kernel.org # 4.11+ Signed-off-by: Daniel Basilio daniel.basilio@corigine.com Signed-off-by: Louis Peens louis.peens@corigine.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c +++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c @@ -537,11 +537,13 @@ static int enable_bars(struct nfp6000_pc const u32 barcfg_msix_general = NFP_PCIE_BAR_PCIE2CPP_MapType( NFP_PCIE_BAR_PCIE2CPP_MapType_GENERAL) | - NFP_PCIE_BAR_PCIE2CPP_LengthSelect_32BIT; + NFP_PCIE_BAR_PCIE2CPP_LengthSelect( + NFP_PCIE_BAR_PCIE2CPP_LengthSelect_32BIT); const u32 barcfg_msix_xpb = NFP_PCIE_BAR_PCIE2CPP_MapType( NFP_PCIE_BAR_PCIE2CPP_MapType_BULK) | - NFP_PCIE_BAR_PCIE2CPP_LengthSelect_32BIT | + NFP_PCIE_BAR_PCIE2CPP_LengthSelect( + NFP_PCIE_BAR_PCIE2CPP_LengthSelect_32BIT) | NFP_PCIE_BAR_PCIE2CPP_Target_BaseAddress( NFP_CPP_TARGET_ISLAND_XPB); const u32 barcfg_explicit[4] = {
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: James Hershaw james.hershaw@corigine.com
commit 0f4d6f011bca0df2051532b41b596366aa272019 upstream.
Enable previously excluded xdp feature flag for NFD3 devices. This feature flag is required in order to bind nfp interfaces to an xdp socket and the nfp driver does in fact support the feature.
Fixes: 66c0e13ad236 ("drivers: net: turn on XDP features") Cc: stable@vger.kernel.org # 6.3+ Signed-off-by: James Hershaw james.hershaw@corigine.com Signed-off-by: Louis Peens louis.peens@corigine.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c @@ -2588,6 +2588,7 @@ static void nfp_net_netdev_init(struct n case NFP_NFD_VER_NFD3: netdev->netdev_ops = &nfp_nfd3_netdev_ops; netdev->xdp_features |= NETDEV_XDP_ACT_XSK_ZEROCOPY; + netdev->xdp_features |= NETDEV_XDP_ACT_REDIRECT; break; case NFP_NFD_VER_NFDK: netdev->netdev_ops = &nfp_nfdk_netdev_ops;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel de Villiers daniel.devilliers@corigine.com
commit 1a1c13303ff6d64e6f718dc8aa614e580ca8d9b4 upstream.
When physical ports are reset (either through link failure or manually toggled down and up again) that are slaved to a Linux bond with a tunnel endpoint IP address on the bond device, not all tunnel packets arriving on the bond port are decapped as expected.
The bond dev assigns the same MAC address to itself and each of its slaves. When toggling a slave device, the same MAC address is therefore offloaded to the NFP multiple times with different indexes.
The issue only occurs when re-adding the shared mac. The nfp_tunnel_add_shared_mac() function has a conditional check early on that checks if a mac entry already exists and if that mac entry is global: (entry && nfp_tunnel_is_mac_idx_global(entry->index)). In the case of a bonded device (For example br-ex), the mac index is obtained, and no new index is assigned.
We therefore modify the conditional in nfp_tunnel_add_shared_mac() to check if the port belongs to the LAG along with the existing checks to prevent a new global mac index from being re-assigned to the slave port.
Fixes: 20cce8865098 ("nfp: flower: enable MAC address sharing for offloadable devs") CC: stable@vger.kernel.org # 5.1+ Signed-off-by: Daniel de Villiers daniel.devilliers@corigine.com Signed-off-by: Louis Peens louis.peens@corigine.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c +++ b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c @@ -1084,7 +1084,7 @@ nfp_tunnel_add_shared_mac(struct nfp_app u16 nfp_mac_idx = 0;
entry = nfp_tunnel_lookup_offloaded_macs(app, netdev->dev_addr); - if (entry && nfp_tunnel_is_mac_idx_global(entry->index)) { + if (entry && (nfp_tunnel_is_mac_idx_global(entry->index) || netif_is_lag_port(netdev))) { if (entry->bridge_count || !nfp_flower_is_supported_bridge(netdev)) { nfp_tunnel_offloaded_macs_inc_ref_and_link(entry,
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Berg johannes.berg@intel.com
commit 353d321f63f7dbfc9ef58498cc732c9fe886a596 upstream.
The storage for the TLV PC register data wasn't done like all the other storage in the drv->fw area, which is cleared at the end of deallocation. Therefore, the freeing must also be done differently, explicitly NULL'ing it out after the free, since otherwise there's a nasty double-free bug here if a file fails to load after this has been parsed, and we get another free later (e.g. because no other file exists.) Fix that by adding the missing NULL assignment.
Cc: stable@vger.kernel.org Fixes: 5e31b3df86ec ("wifi: iwlwifi: dbg: print pc register data once fw dump occurred") Reported-by: Guy Kaplan guy.kaplan@intel.com Signed-off-by: Johannes Berg johannes.berg@intel.com Reviewed-by: Gregory Greenman gregory.greenman@intel.com Signed-off-by: Miri Korenblit miriam.rachel.korenblit@intel.com Link: https://msgid.link/20240123200528.675f3c24ec0d.I6ab4015cd78d82dd95471f840629... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireless/intel/iwlwifi/iwl-drv.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/net/wireless/intel/iwlwifi/iwl-drv.c +++ b/drivers/net/wireless/intel/iwlwifi/iwl-drv.c @@ -128,6 +128,7 @@ static void iwl_dealloc_ucode(struct iwl kfree(drv->fw.ucode_capa.cmd_versions); kfree(drv->fw.phy_integration_ver); kfree(drv->trans->dbg.pc_data); + drv->trans->dbg.pc_data = NULL;
for (i = 0; i < IWL_UCODE_TYPE_MAX; i++) iwl_free_fw_img(drv, drv->fw.img + i);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Berg johannes.berg@intel.com
commit b743287d7a0007493f5cada34ed2085d475050b4 upstream.
When a wiphy work is queued with timer, and then again without a delay, it's started immediately but *also* started again after the timer expires. This can lead, for example, to warnings in mac80211's offchannel code as reported by Jouni. Running the same work twice isn't expected, of course. Fix this by deleting the timer at this point, when queuing immediately due to delay=0.
Cc: stable@vger.kernel.org Reported-by: Jouni Malinen j@w1.fi Fixes: a3ee4dc84c4e ("wifi: cfg80211: add a work abstraction with special semantics") Link: https://msgid.link/20240125095108.2feb0eaaa446.I4617f3210ed0e7f252290d5970da... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/wireless/core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/wireless/core.c +++ b/net/wireless/core.c @@ -5,7 +5,7 @@ * Copyright 2006-2010 Johannes Berg johannes@sipsolutions.net * Copyright 2013-2014 Intel Mobile Communications GmbH * Copyright 2015-2017 Intel Deutschland GmbH - * Copyright (C) 2018-2023 Intel Corporation + * Copyright (C) 2018-2024 Intel Corporation */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt @@ -1661,6 +1661,7 @@ void wiphy_delayed_work_queue(struct wip unsigned long delay) { if (!delay) { + del_timer(&dwork->timer); wiphy_work_queue(wiphy, &dwork->work); return; }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Berg johannes.berg@intel.com
commit c98d8836b817d11fdff4ca7749cbbe04ff7f0c64 upstream.
This pointer can change here since the SKB can change, so we actually later open-coded IEEE80211_SKB_CB() again. Reload the pointer where needed, so the monitor-mode case using it gets fixed, and then use info-> later as well.
Cc: stable@vger.kernel.org Fixes: 531682159092 ("mac80211: fix VLAN handling with TXQs") Link: https://msgid.link/20240131164910.b54c28d583bc.I29450cec84ea6773cff5d9c16ff9... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mac80211/tx.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/net/mac80211/tx.c +++ b/net/mac80211/tx.c @@ -5,7 +5,7 @@ * Copyright 2006-2007 Jiri Benc jbenc@suse.cz * Copyright 2007 Johannes Berg johannes@sipsolutions.net * Copyright 2013-2014 Intel Mobile Communications GmbH - * Copyright (C) 2018-2022 Intel Corporation + * Copyright (C) 2018-2024 Intel Corporation * * Transmit and frame generation functions. */ @@ -3927,6 +3927,7 @@ begin: goto begin;
skb = __skb_dequeue(&tx.skbs); + info = IEEE80211_SKB_CB(skb);
if (!skb_queue_empty(&tx.skbs)) { spin_lock_bh(&fq->lock); @@ -3971,7 +3972,7 @@ begin: }
encap_out: - IEEE80211_SKB_CB(skb)->control.vif = vif; + info->control.vif = vif;
if (tx.sta && wiphy_ext_feature_isset(local->hw.wiphy, NL80211_EXT_FEATURE_AQL)) {
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Emmanuel Grumbach emmanuel.grumbach@intel.com
commit b7198383ef2debe748118996f627452281cf27d7 upstream.
A DoS tool that injects loads of authentication frames made our AP crash. The iwl_mvm_is_dup() function couldn't find the per-queue dup_data which was not allocated.
The root cause for that is that we ran out of stations in the firmware and we didn't really add the station to the firmware, yet we didn't return an error to mac80211. Mac80211 was thinking that we have the station and because of that, sta_info::uploaded was set to 1. This allowed ieee80211_find_sta_by_ifaddr() to return a valid station object, but that ieee80211_sta didn't have any iwl_mvm_sta object initialized and that caused the crash mentioned earlier when we got Rx on that station.
Cc: stable@vger.kernel.org Fixes: 57974a55d995 ("wifi: iwlwifi: mvm: refactor iwl_mvm_mac_sta_state_common()") Signed-off-by: Emmanuel Grumbach emmanuel.grumbach@intel.com Signed-off-by: Miri Korenblit miriam.rachel.korenblit@intel.com Link: https://msgid.link/20240206175739.1f76c44b2486.I6a00955e2842f15f0a089db2f834... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c | 3 +++ drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c | 4 ++++ 2 files changed, 7 insertions(+)
--- a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c @@ -3673,6 +3673,9 @@ iwl_mvm_sta_state_notexist_to_none(struc NL80211_TDLS_SETUP); }
+ if (ret) + return ret; + for_each_sta_active_link(vif, sta, link_sta, i) link_sta->agg.max_rc_amsdu_len = 1;
--- a/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c @@ -505,6 +505,10 @@ static bool iwl_mvm_is_dup(struct ieee80 return false;
mvm_sta = iwl_mvm_sta_from_mac80211(sta); + + if (WARN_ON_ONCE(!mvm_sta->dup_data)) + return false; + dup_data = &mvm_sta->dup_data[queue];
/*
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
commit b5d1b4b46f856da1473c7ba9a5cdfcb55c9b2478 upstream.
The "msg_addr" variable is u64. However, the "aligned_offset" is an unsigned int. This means that when the code does:
msg_addr &= ~aligned_offset;
it will unintentionally zero out the high 32 bits. Use ALIGN_DOWN() to do the alignment instead.
Fixes: 2217fffcd63f ("PCI: dwc: endpoint: Fix dw_pcie_ep_raise_msix_irq() alignment support") Link: https://lore.kernel.org/r/af59c7ad-ab93-40f7-ad4a-7ac0b14d37f5@moroto.mounta... Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Signed-off-by: Bjorn Helgaas bhelgaas@google.com Reviewed-by: Niklas Cassel cassel@kernel.org Reviewed-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Reviewed-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/controller/dwc/pcie-designware-ep.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/pci/controller/dwc/pcie-designware-ep.c +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c @@ -6,6 +6,7 @@ * Author: Kishon Vijay Abraham I kishon@ti.com */
+#include <linux/align.h> #include <linux/bitfield.h> #include <linux/of.h> #include <linux/platform_device.h> @@ -615,7 +616,7 @@ int dw_pcie_ep_raise_msix_irq(struct dw_ }
aligned_offset = msg_addr & (epc->mem->window.page_size - 1); - msg_addr &= ~aligned_offset; + msg_addr = ALIGN_DOWN(msg_addr, epc->mem->window.page_size); ret = dw_pcie_ep_map_addr(epc, func_no, 0, ep->msi_mem_phys, msg_addr, epc->mem->window.page_size); if (ret)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Doug Berger opendmb@gmail.com
commit b0344d6854d25a8b3b901c778b1728885dd99007 upstream.
It was observed on Broadcom devices that use GIC v3 architecture L1 interrupt controllers as the parent of brcmstb-l2 interrupt controllers that the deactivation of the parent interrupt could happen before the brcmstb-l2 deasserted its output. This would lead the GIC to reactivate the interrupt only to find that no L2 interrupt was pending. The result was a spurious interrupt invoking handle_bad_irq() with its associated messaging. While this did not create a functional problem it is a waste of cycles.
The hazard exists because the memory mapped bus writes to the brcmstb-l2 registers are buffered and the GIC v3 architecture uses a very efficient system register write to deactivate the interrupt.
Add a write memory barrier prior to invoking chained_irq_exit() to introduce a dsb(st) on those systems to ensure the system register write cannot be executed until the memory mapped writes are visible to the system.
[ florian: Added Fixes tag ]
Fixes: 7f646e92766e ("irqchip: brcmstb-l2: Add Broadcom Set Top Box Level-2 interrupt controller") Signed-off-by: Doug Berger opendmb@gmail.com Signed-off-by: Florian Fainelli florian.fainelli@broadcom.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Florian Fainelli florian.fainelli@broadcom.com Acked-by: Marc Zyngier maz@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240210012449.3009125-1-florian.fainelli@broadcom... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/irqchip/irq-brcmstb-l2.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/irqchip/irq-brcmstb-l2.c +++ b/drivers/irqchip/irq-brcmstb-l2.c @@ -2,7 +2,7 @@ /* * Generic Broadcom Set Top Box Level 2 Interrupt controller driver * - * Copyright (C) 2014-2017 Broadcom + * Copyright (C) 2014-2024 Broadcom */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt @@ -112,6 +112,9 @@ static void brcmstb_l2_intc_irq_handle(s generic_handle_domain_irq(b->domain, irq); } while (status); out: + /* Don't ack parent before all device writes are done */ + wmb(); + chained_irq_exit(chip, desc); }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier maz@kernel.org
commit 8b02da04ad978827e5ccd675acf170198f747a7a upstream.
While refactoring the way the ITSs are probed, the handling of quirks applicable to ACPI-based platforms was lost. As a result, systems such as HIP07 lose their GICv4 functionnality, and some other may even fail to boot, unless they are configured to boot with DT.
Move the enabling of quirks into its_probe_one(), making it common to all firmware implementations.
Fixes: 9585a495ac93 ("irqchip/gic-v3-its: Split allocation from initialisation of its_node") Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Lorenzo Pieralisi lpieralisi@kernel.org Reviewed-by: Zenghui Yu yuzenghui@huawei.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240213101206.2137483-3-maz@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/irqchip/irq-gic-v3-its.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -5091,6 +5091,8 @@ static int __init its_probe_one(struct i u32 ctlr; int err;
+ its_enable_quirks(its); + if (is_v4(its)) { if (!(its->typer & GITS_TYPER_VMOVP)) { err = its_compute_its_list_map(its); @@ -5442,7 +5444,6 @@ static int __init its_of_probe(struct de if (!its) return -ENOMEM;
- its_enable_quirks(its); err = its_probe_one(its); if (err) { its_node_destroy(its);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier maz@kernel.org
commit af9acbfc2c4b72c378d0b9a2ee023ed01055d3e2 upstream.
When updating the affinity of a VPE, the VMOVP command is currently skipped if the two CPUs are part of the same VPE affinity.
But this is wrong, as the doorbell corresponding to this VPE is still delivered on the 'old' CPU, which screws up the balancing. Furthermore, offlining that 'old' CPU results in doorbell interrupts generated for this VPE being discarded.
The harsh reality is that VMOVP cannot be elided when a set_affinity() request occurs. It needs to be obeyed, and if an optimisation is to be made, it is at the point where the affinity change request is made (such as in KVM).
Drop the VMOVP elision altogether, and only use the vpe_table_mask to try and stay within the same ITS affinity group if at all possible.
Fixes: dd3f050a216e (irqchip/gic-v4.1: Implement the v4.1 flavour of VMOVP) Reported-by: Kunkun Jiang jiangkunkun@huawei.com Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240213101206.2137483-4-maz@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/irqchip/irq-gic-v3-its.c | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-)
--- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -3826,8 +3826,9 @@ static int its_vpe_set_affinity(struct i bool force) { struct its_vpe *vpe = irq_data_get_irq_chip_data(d); - int from, cpu = cpumask_first(mask_val); + struct cpumask common, *table_mask; unsigned long flags; + int from, cpu;
/* * Changing affinity is mega expensive, so let's be as lazy as @@ -3843,19 +3844,22 @@ static int its_vpe_set_affinity(struct i * taken on any vLPI handling path that evaluates vpe->col_idx. */ from = vpe_to_cpuid_lock(vpe, &flags); - if (from == cpu) - goto out; - - vpe->col_idx = cpu; + table_mask = gic_data_rdist_cpu(from)->vpe_table_mask;
/* - * GICv4.1 allows us to skip VMOVP if moving to a cpu whose RD - * is sharing its VPE table with the current one. + * If we are offered another CPU in the same GICv4.1 ITS + * affinity, pick this one. Otherwise, any CPU will do. */ - if (gic_data_rdist_cpu(cpu)->vpe_table_mask && - cpumask_test_cpu(from, gic_data_rdist_cpu(cpu)->vpe_table_mask)) + if (table_mask && cpumask_and(&common, mask_val, table_mask)) + cpu = cpumask_test_cpu(from, &common) ? from : cpumask_first(&common); + else + cpu = cpumask_first(mask_val); + + if (from == cpu) goto out;
+ vpe->col_idx = cpu; + its_send_vmovp(vpe); its_vpe_db_proxy_move(vpe, from, cpu);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mohammad Rahimi rahimi.mhmmd@gmail.com
commit ec4d82f855ce332de26fe080892483de98cc1a19 upstream.
The bit 23, CM TBT3 Not Supported (CNS), in ROUTER_CS_5 indicates whether a USB4 Connection Manager is TBT3-Compatible and should be: 0b for TBT3-Compatible 1b for Not TBT3-Compatible
Fixes: b04079837b20 ("thunderbolt: Add initial support for USB4") Cc: stable@vger.kernel.org Signed-off-by: Mohammad Rahimi rahimi.mhmmd@gmail.com Signed-off-by: Mika Westerberg mika.westerberg@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/thunderbolt/tb_regs.h | 2 +- drivers/thunderbolt/usb4.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/thunderbolt/tb_regs.h +++ b/drivers/thunderbolt/tb_regs.h @@ -203,7 +203,7 @@ struct tb_regs_switch_header { #define ROUTER_CS_5_WOP BIT(1) #define ROUTER_CS_5_WOU BIT(2) #define ROUTER_CS_5_WOD BIT(3) -#define ROUTER_CS_5_C3S BIT(23) +#define ROUTER_CS_5_CNS BIT(23) #define ROUTER_CS_5_PTO BIT(24) #define ROUTER_CS_5_UTO BIT(25) #define ROUTER_CS_5_HCO BIT(26) --- a/drivers/thunderbolt/usb4.c +++ b/drivers/thunderbolt/usb4.c @@ -290,7 +290,7 @@ int usb4_switch_setup(struct tb_switch * }
/* TBT3 supported by the CM */ - val |= ROUTER_CS_5_C3S; + val &= ~ROUTER_CS_5_CNS;
return tb_sw_write(sw, &val, TB_CFG_SWITCH, ROUTER_CS_5, 1); }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paulo Alcantara pc@manguebit.com
commit 4508ec17357094e2075f334948393ddedbb75157 upstream.
When uid, gid and cruid are not specified, we need to dynamically set them into the filesystem context used for automounting otherwise they'll end up reusing the values from the parent mount.
Fixes: 9fd29a5bae6e ("cifs: use fs_context for automounts") Reported-by: Shane Nehring snehring@iastate.edu Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2259257 Cc: stable@vger.kernel.org # 6.2+ Signed-off-by: Paulo Alcantara (Red Hat) pc@manguebit.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/client/namespace.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
--- a/fs/smb/client/namespace.c +++ b/fs/smb/client/namespace.c @@ -168,6 +168,21 @@ static char *automount_fullpath(struct d return s; }
+static void fs_context_set_ids(struct smb3_fs_context *ctx) +{ + kuid_t uid = current_fsuid(); + kgid_t gid = current_fsgid(); + + if (ctx->multiuser) { + if (!ctx->uid_specified) + ctx->linux_uid = uid; + if (!ctx->gid_specified) + ctx->linux_gid = gid; + } + if (!ctx->cruid_specified) + ctx->cred_uid = uid; +} + /* * Create a vfsmount that we can automount */ @@ -205,6 +220,7 @@ static struct vfsmount *cifs_do_automoun tmp.leaf_fullpath = NULL; tmp.UNC = tmp.prepath = NULL; tmp.dfs_root_ses = NULL; + fs_context_set_ids(&tmp);
rc = smb3_fs_context_dup(ctx, &tmp); if (rc) {
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steve French stfrench@microsoft.com
commit 4860abb91f3d7fbaf8147d54782149bb1fc45892 upstream.
The conversion to netfs in the 6.3 kernel caused a regression when maximum write size is set by the server to an unexpected value which is not a multiple of 4096 (similarly if the user overrides the maximum write size by setting mount parm "wsize", but sets it to a value that is not a multiple of 4096). When negotiated write size is not a multiple of 4096 the netfs code can skip the end of the final page when doing large sequential writes, causing data corruption.
This section of code is being rewritten/removed due to a large netfs change, but until that point (ie for the 6.3 kernel until now) we can not support non-standard maximum write sizes.
Add a warning if a user specifies a wsize on mount that is not a multiple of 4096 (and round down), also add a change where we round down the maximum write size if the server negotiates a value that is not a multiple of 4096 (we also have to check to make sure that we do not round it down to zero).
Reported-by: "R. Diez" rdiez-2006@rd10.de Fixes: d08089f649a0 ("cifs: Change the I/O paths to use an iterator rather than a page list") Suggested-by: Ronnie Sahlberg ronniesahlberg@gmail.com Acked-by: Ronnie Sahlberg ronniesahlberg@gmail.com Tested-by: Matthew Ruffell matthew.ruffell@canonical.com Reviewed-by: Shyam Prasad N sprasad@microsoft.com Cc: stable@vger.kernel.org # v6.3+ Cc: David Howells dhowells@redhat.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/client/connect.c | 14 ++++++++++++-- fs/smb/client/fs_context.c | 11 +++++++++++ 2 files changed, 23 insertions(+), 2 deletions(-)
--- a/fs/smb/client/connect.c +++ b/fs/smb/client/connect.c @@ -3425,8 +3425,18 @@ int cifs_mount_get_tcon(struct cifs_moun * the user on mount */ if ((cifs_sb->ctx->wsize == 0) || - (cifs_sb->ctx->wsize > server->ops->negotiate_wsize(tcon, ctx))) - cifs_sb->ctx->wsize = server->ops->negotiate_wsize(tcon, ctx); + (cifs_sb->ctx->wsize > server->ops->negotiate_wsize(tcon, ctx))) { + cifs_sb->ctx->wsize = + round_down(server->ops->negotiate_wsize(tcon, ctx), PAGE_SIZE); + /* + * in the very unlikely event that the server sent a max write size under PAGE_SIZE, + * (which would get rounded down to 0) then reset wsize to absolute minimum eg 4096 + */ + if (cifs_sb->ctx->wsize == 0) { + cifs_sb->ctx->wsize = PAGE_SIZE; + cifs_dbg(VFS, "wsize too small, reset to minimum ie PAGE_SIZE, usually 4096\n"); + } + } if ((cifs_sb->ctx->rsize == 0) || (cifs_sb->ctx->rsize > server->ops->negotiate_rsize(tcon, ctx))) cifs_sb->ctx->rsize = server->ops->negotiate_rsize(tcon, ctx); --- a/fs/smb/client/fs_context.c +++ b/fs/smb/client/fs_context.c @@ -1107,6 +1107,17 @@ static int smb3_fs_context_parse_param(s case Opt_wsize: ctx->wsize = result.uint_32; ctx->got_wsize = true; + if (ctx->wsize % PAGE_SIZE != 0) { + ctx->wsize = round_down(ctx->wsize, PAGE_SIZE); + if (ctx->wsize == 0) { + ctx->wsize = PAGE_SIZE; + cifs_dbg(VFS, "wsize too small, reset to minimum %ld\n", PAGE_SIZE); + } else { + cifs_dbg(VFS, + "wsize rounded down to %d to multiple of PAGE_SIZE %ld\n", + ctx->wsize, PAGE_SIZE); + } + } break; case Opt_acregmax: ctx->acregmax = HZ * result.uint_32;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian Borntraeger borntraeger@linux.ibm.com
commit fe752331d4b361d43cfd0b89534b4b2176057c32 upstream.
Right now it is possible to see gmap->private being zero in kvm_s390_vsie_gmap_notifier resulting in a crash. This is due to the fact that we add gmap->private == kvm after creation:
static int acquire_gmap_shadow(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page) { [...] gmap = gmap_shadow(vcpu->arch.gmap, asce, edat); if (IS_ERR(gmap)) return PTR_ERR(gmap); gmap->private = vcpu->kvm;
Let children inherit the private field of the parent.
Reported-by: Marc Hartmayer mhartmay@linux.ibm.com Fixes: a3508fbe9dc6 ("KVM: s390: vsie: initial support for nested virtualization") Cc: stable@vger.kernel.org Cc: David Hildenbrand david@redhat.com Reviewed-by: Janosch Frank frankja@linux.ibm.com Reviewed-by: David Hildenbrand david@redhat.com Reviewed-by: Claudio Imbrenda imbrenda@linux.ibm.com Signed-off-by: Christian Borntraeger borntraeger@linux.ibm.com Link: https://lore.kernel.org/r/20231220125317.4258-1-borntraeger@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/kvm/vsie.c | 1 - arch/s390/mm/gmap.c | 1 + 2 files changed, 1 insertion(+), 1 deletion(-)
--- a/arch/s390/kvm/vsie.c +++ b/arch/s390/kvm/vsie.c @@ -1220,7 +1220,6 @@ static int acquire_gmap_shadow(struct kv gmap = gmap_shadow(vcpu->arch.gmap, asce, edat); if (IS_ERR(gmap)) return PTR_ERR(gmap); - gmap->private = vcpu->kvm; vcpu->kvm->stat.gmap_shadow_create++; WRITE_ONCE(vsie_page->gmap, gmap); return 0; --- a/arch/s390/mm/gmap.c +++ b/arch/s390/mm/gmap.c @@ -1691,6 +1691,7 @@ struct gmap *gmap_shadow(struct gmap *pa return ERR_PTR(-ENOMEM); new->mm = parent->mm; new->parent = gmap_get(parent); + new->private = parent->private; new->orig_asce = asce; new->edat_level = edat_level; new->initialized = false;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sebastian Ene sebastianene@google.com
commit 10c02aad111df02088d1a81792a709f6a7eca6cc upstream.
The rule inside kvm enforces that the vcpu->mutex is taken *inside* kvm->lock. The rule is violated by the pkvm_create_hyp_vm() which acquires the kvm->lock while already holding the vcpu->mutex lock from kvm_vcpu_ioctl(). Avoid the circular locking dependency altogether by protecting the hyp vm handle with the config_lock, much like we already do for other forms of VM-scoped data.
Signed-off-by: Sebastian Ene sebastianene@google.com Cc: stable@vger.kernel.org Reviewed-by: Oliver Upton oliver.upton@linux.dev Signed-off-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20240124091027.1477174-2-sebastianene@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kvm/pkvm.c | 27 +++++++++++++++++---------- 1 file changed, 17 insertions(+), 10 deletions(-)
--- a/arch/arm64/kvm/pkvm.c +++ b/arch/arm64/kvm/pkvm.c @@ -101,6 +101,17 @@ void __init kvm_hyp_reserve(void) hyp_mem_base); }
+static void __pkvm_destroy_hyp_vm(struct kvm *host_kvm) +{ + if (host_kvm->arch.pkvm.handle) { + WARN_ON(kvm_call_hyp_nvhe(__pkvm_teardown_vm, + host_kvm->arch.pkvm.handle)); + } + + host_kvm->arch.pkvm.handle = 0; + free_hyp_memcache(&host_kvm->arch.pkvm.teardown_mc); +} + /* * Allocates and donates memory for hypervisor VM structs at EL2. * @@ -181,7 +192,7 @@ static int __pkvm_create_hyp_vm(struct k return 0;
destroy_vm: - pkvm_destroy_hyp_vm(host_kvm); + __pkvm_destroy_hyp_vm(host_kvm); return ret; free_vm: free_pages_exact(hyp_vm, hyp_vm_sz); @@ -194,23 +205,19 @@ int pkvm_create_hyp_vm(struct kvm *host_ { int ret = 0;
- mutex_lock(&host_kvm->lock); + mutex_lock(&host_kvm->arch.config_lock); if (!host_kvm->arch.pkvm.handle) ret = __pkvm_create_hyp_vm(host_kvm); - mutex_unlock(&host_kvm->lock); + mutex_unlock(&host_kvm->arch.config_lock);
return ret; }
void pkvm_destroy_hyp_vm(struct kvm *host_kvm) { - if (host_kvm->arch.pkvm.handle) { - WARN_ON(kvm_call_hyp_nvhe(__pkvm_teardown_vm, - host_kvm->arch.pkvm.handle)); - } - - host_kvm->arch.pkvm.handle = 0; - free_hyp_memcache(&host_kvm->arch.pkvm.teardown_mc); + mutex_lock(&host_kvm->arch.config_lock); + __pkvm_destroy_hyp_vm(host_kvm); + mutex_unlock(&host_kvm->arch.config_lock); }
int pkvm_init_host_vm(struct kvm *host_kvm)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Damien Le Moal dlemoal@kernel.org
commit 14db5f64a971fce3d8ea35de4dfc7f443a3efb92 upstream.
Write error handling is racy and can sometime lead to the error recovery path wrongly changing the inode size of a sequential zone file to an incorrect value which results in garbage data being readable at the end of a file. There are 2 problems:
1) zonefs_file_dio_write() updates a zone file write pointer offset after issuing a direct IO with iomap_dio_rw(). This update is done only if the IO succeed for synchronous direct writes. However, for asynchronous direct writes, the update is done without waiting for the IO completion so that the next asynchronous IO can be immediately issued. However, if an asynchronous IO completes with a failure right before the i_truncate_mutex lock protecting the update, the update may change the value of the inode write pointer offset that was corrected by the error path (zonefs_io_error() function).
2) zonefs_io_error() is called when a read or write error occurs. This function executes a report zone operation using the callback function zonefs_io_error_cb(), which does all the error recovery handling based on the current zone condition, write pointer position and according to the mount options being used. However, depending on the zoned device being used, a report zone callback may be executed in a context that is different from the context of __zonefs_io_error(). As a result, zonefs_io_error_cb() may be executed without the inode truncate mutex lock held, which can lead to invalid error processing.
Fix both problems as follows: - Problem 1: Perform the inode write pointer offset update before a direct write is issued with iomap_dio_rw(). This is safe to do as partial direct writes are not supported (IOMAP_DIO_PARTIAL is not set) and any failed IO will trigger the execution of zonefs_io_error() which will correct the inode write pointer offset to reflect the current state of the one on the device. - Problem 2: Change zonefs_io_error_cb() into zonefs_handle_io_error() and call this function directly from __zonefs_io_error() after obtaining the zone information using blkdev_report_zones() with a simple callback function that copies to a local stack variable the struct blk_zone obtained from the device. This ensures that error handling is performed holding the inode truncate mutex. This change also simplifies error handling for conventional zone files by bypassing the execution of report zones entirely. This is safe to do because the condition of conventional zones cannot be read-only or offline and conventional zone files are always fully mapped with a constant file size.
Reported-by: Shin'ichiro Kawasaki shinichiro.kawasaki@wdc.com Fixes: 8dcc1a9d90c1 ("fs: New zonefs file system") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal dlemoal@kernel.org Tested-by: Shin'ichiro Kawasaki shinichiro.kawasaki@wdc.com Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Reviewed-by: Himanshu Madhani himanshu.madhani@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/zonefs/file.c | 42 +++++++++++++++++++++------------ fs/zonefs/super.c | 68 ++++++++++++++++++++++++++++++------------------------ 2 files changed, 66 insertions(+), 44 deletions(-)
--- a/fs/zonefs/file.c +++ b/fs/zonefs/file.c @@ -348,7 +348,12 @@ static int zonefs_file_write_dio_end_io( struct zonefs_inode_info *zi = ZONEFS_I(inode);
if (error) { - zonefs_io_error(inode, true); + /* + * For Sync IOs, error recovery is called from + * zonefs_file_dio_write(). + */ + if (!is_sync_kiocb(iocb)) + zonefs_io_error(inode, true); return error; }
@@ -491,6 +496,14 @@ static ssize_t zonefs_file_dio_write(str ret = -EINVAL; goto inode_unlock; } + /* + * Advance the zone write pointer offset. This assumes that the + * IO will succeed, which is OK to do because we do not allow + * partial writes (IOMAP_DIO_PARTIAL is not set) and if the IO + * fails, the error path will correct the write pointer offset. + */ + z->z_wpoffset += count; + zonefs_inode_account_active(inode); mutex_unlock(&zi->i_truncate_mutex); }
@@ -504,20 +517,19 @@ static ssize_t zonefs_file_dio_write(str if (ret == -ENOTBLK) ret = -EBUSY;
- if (zonefs_zone_is_seq(z) && - (ret > 0 || ret == -EIOCBQUEUED)) { - if (ret > 0) - count = ret; - - /* - * Update the zone write pointer offset assuming the write - * operation succeeded. If it did not, the error recovery path - * will correct it. Also do active seq file accounting. - */ - mutex_lock(&zi->i_truncate_mutex); - z->z_wpoffset += count; - zonefs_inode_account_active(inode); - mutex_unlock(&zi->i_truncate_mutex); + /* + * For a failed IO or partial completion, trigger error recovery + * to update the zone write pointer offset to a correct value. + * For asynchronous IOs, zonefs_file_write_dio_end_io() may already + * have executed error recovery if the IO already completed when we + * reach here. However, we cannot know that and execute error recovery + * again (that will not change anything). + */ + if (zonefs_zone_is_seq(z)) { + if (ret > 0 && ret != count) + ret = -EIO; + if (ret < 0 && ret != -EIOCBQUEUED) + zonefs_io_error(inode, true); }
inode_unlock: --- a/fs/zonefs/super.c +++ b/fs/zonefs/super.c @@ -246,16 +246,18 @@ static void zonefs_inode_update_mode(str z->z_mode = inode->i_mode; }
-struct zonefs_ioerr_data { - struct inode *inode; - bool write; -}; - static int zonefs_io_error_cb(struct blk_zone *zone, unsigned int idx, void *data) { - struct zonefs_ioerr_data *err = data; - struct inode *inode = err->inode; + struct blk_zone *z = data; + + *z = *zone; + return 0; +} + +static void zonefs_handle_io_error(struct inode *inode, struct blk_zone *zone, + bool write) +{ struct zonefs_zone *z = zonefs_inode_zone(inode); struct super_block *sb = inode->i_sb; struct zonefs_sb_info *sbi = ZONEFS_SB(sb); @@ -270,8 +272,8 @@ static int zonefs_io_error_cb(struct blk data_size = zonefs_check_zone_condition(sb, z, zone); isize = i_size_read(inode); if (!(z->z_flags & (ZONEFS_ZONE_READONLY | ZONEFS_ZONE_OFFLINE)) && - !err->write && isize == data_size) - return 0; + !write && isize == data_size) + return;
/* * At this point, we detected either a bad zone or an inconsistency @@ -292,7 +294,7 @@ static int zonefs_io_error_cb(struct blk * In all cases, warn about inode size inconsistency and handle the * IO error according to the zone condition and to the mount options. */ - if (zonefs_zone_is_seq(z) && isize != data_size) + if (isize != data_size) zonefs_warn(sb, "inode %lu: invalid size %lld (should be %lld)\n", inode->i_ino, isize, data_size); @@ -352,8 +354,6 @@ static int zonefs_io_error_cb(struct blk zonefs_i_size_write(inode, data_size); z->z_wpoffset = data_size; zonefs_inode_account_active(inode); - - return 0; }
/* @@ -367,23 +367,25 @@ void __zonefs_io_error(struct inode *ino { struct zonefs_zone *z = zonefs_inode_zone(inode); struct super_block *sb = inode->i_sb; - struct zonefs_sb_info *sbi = ZONEFS_SB(sb); unsigned int noio_flag; - unsigned int nr_zones = 1; - struct zonefs_ioerr_data err = { - .inode = inode, - .write = write, - }; + struct blk_zone zone; int ret;
/* - * The only files that have more than one zone are conventional zone - * files with aggregated conventional zones, for which the inode zone - * size is always larger than the device zone size. - */ - if (z->z_size > bdev_zone_sectors(sb->s_bdev)) - nr_zones = z->z_size >> - (sbi->s_zone_sectors_shift + SECTOR_SHIFT); + * Conventional zone have no write pointer and cannot become read-only + * or offline. So simply fake a report for a single or aggregated zone + * and let zonefs_handle_io_error() correct the zone inode information + * according to the mount options. + */ + if (!zonefs_zone_is_seq(z)) { + zone.start = z->z_sector; + zone.len = z->z_size >> SECTOR_SHIFT; + zone.wp = zone.start + zone.len; + zone.type = BLK_ZONE_TYPE_CONVENTIONAL; + zone.cond = BLK_ZONE_COND_NOT_WP; + zone.capacity = zone.len; + goto handle_io_error; + }
/* * Memory allocations in blkdev_report_zones() can trigger a memory @@ -394,12 +396,20 @@ void __zonefs_io_error(struct inode *ino * the GFP_NOIO context avoids both problems. */ noio_flag = memalloc_noio_save(); - ret = blkdev_report_zones(sb->s_bdev, z->z_sector, nr_zones, - zonefs_io_error_cb, &err); - if (ret != nr_zones) + ret = blkdev_report_zones(sb->s_bdev, z->z_sector, 1, + zonefs_io_error_cb, &zone); + memalloc_noio_restore(noio_flag); + + if (ret != 1) { zonefs_err(sb, "Get inode %lu zone information failed %d\n", inode->i_ino, ret); - memalloc_noio_restore(noio_flag); + zonefs_warn(sb, "remounting filesystem read-only\n"); + sb->s_flags |= SB_RDONLY; + return; + } + +handle_io_error: + zonefs_handle_io_error(inode, &zone, write); }
static struct kmem_cache *zonefs_inode_cachep;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fred Ai fred.ai@bayhubtech.com
commit 58aeb5623c2ebdadefe6352b14f8076a7073fea0 upstream.
Driver shall switch clock source from DLL clock to OPE clock when power off card to ensure that card can be identified with OPE clock by BIOS.
Signed-off-by: Fred Ai fred.ai@bayhubtech.com Fixes:4be33cf18703 ("mmc: sdhci-pci-o2micro: Improve card input timing at SDR104/HS200 mode") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240203102908.4683-1-fredaibayhubtech@126.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/sdhci-pci-o2micro.c | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+)
--- a/drivers/mmc/host/sdhci-pci-o2micro.c +++ b/drivers/mmc/host/sdhci-pci-o2micro.c @@ -693,6 +693,35 @@ static int sdhci_pci_o2_init_sd_express( return 0; }
+static void sdhci_pci_o2_set_power(struct sdhci_host *host, unsigned char mode, unsigned short vdd) +{ + struct sdhci_pci_chip *chip; + struct sdhci_pci_slot *slot = sdhci_priv(host); + u32 scratch_32 = 0; + u8 scratch_8 = 0; + + chip = slot->chip; + + if (mode == MMC_POWER_OFF) { + /* UnLock WP */ + pci_read_config_byte(chip->pdev, O2_SD_LOCK_WP, &scratch_8); + scratch_8 &= 0x7f; + pci_write_config_byte(chip->pdev, O2_SD_LOCK_WP, scratch_8); + + /* Set PCR 0x354[16] to switch Clock Source back to OPE Clock */ + pci_read_config_dword(chip->pdev, O2_SD_OUTPUT_CLK_SOURCE_SWITCH, &scratch_32); + scratch_32 &= ~(O2_SD_SEL_DLL); + pci_write_config_dword(chip->pdev, O2_SD_OUTPUT_CLK_SOURCE_SWITCH, scratch_32); + + /* Lock WP */ + pci_read_config_byte(chip->pdev, O2_SD_LOCK_WP, &scratch_8); + scratch_8 |= 0x80; + pci_write_config_byte(chip->pdev, O2_SD_LOCK_WP, scratch_8); + } + + sdhci_set_power(host, mode, vdd); +} + static int sdhci_pci_o2_probe_slot(struct sdhci_pci_slot *slot) { struct sdhci_pci_chip *chip; @@ -1051,6 +1080,7 @@ static const struct sdhci_ops sdhci_pci_ .set_bus_width = sdhci_set_bus_width, .reset = sdhci_reset, .set_uhs_signaling = sdhci_set_uhs_signaling, + .set_power = sdhci_pci_o2_set_power, };
const struct sdhci_pci_fixes sdhci_o2 = {
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Brown broonie@kernel.org
commit 61da7c8e2a602f66be578cbbcebe8638c10e0f48 upstream.
When we are in a syscall we will only save the FPSIMD subset even though the task still has access to the full register set, and on context switch we will only remove TIF_SVE when loading the register state. This means that the signal handling code should not assume that TIF_SVE means that the register state is stored in SVE format, it should instead check the format that was recorded during save.
Fixes: 8c845e273104 ("arm64/sve: Leave SVE enabled on syscall if we don't context switch") Signed-off-by: Mark Brown broonie@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240130-arm64-sve-signal-regs-v2-1-9fc6f9502782@k... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/fpsimd.c | 2 +- arch/arm64/kernel/signal.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-)
--- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -1628,7 +1628,7 @@ void fpsimd_preserve_current_state(void) void fpsimd_signal_preserve_current_state(void) { fpsimd_preserve_current_state(); - if (test_thread_flag(TIF_SVE)) + if (current->thread.fp_type == FP_STATE_SVE) sve_to_fpsimd(current); }
--- a/arch/arm64/kernel/signal.c +++ b/arch/arm64/kernel/signal.c @@ -242,7 +242,7 @@ static int preserve_sve_context(struct s vl = task_get_sme_vl(current); vq = sve_vq_from_vl(vl); flags |= SVE_SIG_FLAG_SM; - } else if (test_thread_flag(TIF_SVE)) { + } else if (current->thread.fp_type == FP_STATE_SVE) { vq = sve_vq_from_vl(vl); }
@@ -878,7 +878,7 @@ static int setup_sigframe_layout(struct if (system_supports_sve() || system_supports_sme()) { unsigned int vq = 0;
- if (add_all || test_thread_flag(TIF_SVE) || + if (add_all || current->thread.fp_type == FP_STATE_SVE || thread_sm_enabled(¤t->thread)) { int vl = max(sve_max_vl(), sme_max_vl());
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Easwar Hariharan eahariha@linux.microsoft.com
commit fb091ff394792c018527b3211bbdfae93ea4ac02 upstream.
Add the MIDR value of Microsoft Azure Cobalt 100, which is a Microsoft implemented CPU based on r0p0 of the ARM Neoverse N2 CPU, and therefore suffers from all the same errata.
CC: stable@vger.kernel.org # 5.15+ Signed-off-by: Easwar Hariharan eahariha@linux.microsoft.com Reviewed-by: Anshuman Khandual anshuman.khandual@arm.com Acked-by: Mark Rutland mark.rutland@arm.com Acked-by: Marc Zyngier maz@kernel.org Reviewed-by: Oliver Upton oliver.upton@linux.dev Link: https://lore.kernel.org/r/20240214175522.2457857-1-eahariha@linux.microsoft.... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/arch/arm64/silicon-errata.rst | 7 +++++++ arch/arm64/include/asm/cputype.h | 4 ++++ arch/arm64/kernel/cpu_errata.c | 3 +++ 3 files changed, 14 insertions(+)
--- a/Documentation/arch/arm64/silicon-errata.rst +++ b/Documentation/arch/arm64/silicon-errata.rst @@ -235,3 +235,10 @@ stable kernels. +----------------+-----------------+-----------------+-----------------------------+ | ASR | ASR8601 | #8601001 | N/A | +----------------+-----------------+-----------------+-----------------------------+ ++----------------+-----------------+-----------------+-----------------------------+ +| Microsoft | Azure Cobalt 100| #2139208 | ARM64_ERRATUM_2139208 | ++----------------+-----------------+-----------------+-----------------------------+ +| Microsoft | Azure Cobalt 100| #2067961 | ARM64_ERRATUM_2067961 | ++----------------+-----------------+-----------------+-----------------------------+ +| Microsoft | Azure Cobalt 100| #2253138 | ARM64_ERRATUM_2253138 | ++----------------+-----------------+-----------------+-----------------------------+ --- a/arch/arm64/include/asm/cputype.h +++ b/arch/arm64/include/asm/cputype.h @@ -61,6 +61,7 @@ #define ARM_CPU_IMP_HISI 0x48 #define ARM_CPU_IMP_APPLE 0x61 #define ARM_CPU_IMP_AMPERE 0xC0 +#define ARM_CPU_IMP_MICROSOFT 0x6D
#define ARM_CPU_PART_AEM_V8 0xD0F #define ARM_CPU_PART_FOUNDATION 0xD00 @@ -135,6 +136,8 @@
#define AMPERE_CPU_PART_AMPERE1 0xAC3
+#define MICROSOFT_CPU_PART_AZURE_COBALT_100 0xD49 /* Based on r0p0 of ARM Neoverse N2 */ + #define MIDR_CORTEX_A53 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A53) #define MIDR_CORTEX_A57 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A57) #define MIDR_CORTEX_A72 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A72) @@ -193,6 +196,7 @@ #define MIDR_APPLE_M2_BLIZZARD_MAX MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M2_BLIZZARD_MAX) #define MIDR_APPLE_M2_AVALANCHE_MAX MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M2_AVALANCHE_MAX) #define MIDR_AMPERE1 MIDR_CPU_MODEL(ARM_CPU_IMP_AMPERE, AMPERE_CPU_PART_AMPERE1) +#define MIDR_MICROSOFT_AZURE_COBALT_100 MIDR_CPU_MODEL(ARM_CPU_IMP_MICROSOFT, MICROSOFT_CPU_PART_AZURE_COBALT_100)
/* Fujitsu Erratum 010001 affects A64FX 1.0 and 1.1, (v0r0 and v1r0) */ #define MIDR_FUJITSU_ERRATUM_010001 MIDR_FUJITSU_A64FX --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -374,6 +374,7 @@ static const struct midr_range erratum_1 static const struct midr_range trbe_overwrite_fill_mode_cpus[] = { #ifdef CONFIG_ARM64_ERRATUM_2139208 MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2), + MIDR_ALL_VERSIONS(MIDR_MICROSOFT_AZURE_COBALT_100), #endif #ifdef CONFIG_ARM64_ERRATUM_2119858 MIDR_ALL_VERSIONS(MIDR_CORTEX_A710), @@ -387,6 +388,7 @@ static const struct midr_range trbe_over static const struct midr_range tsb_flush_fail_cpus[] = { #ifdef CONFIG_ARM64_ERRATUM_2067961 MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2), + MIDR_ALL_VERSIONS(MIDR_MICROSOFT_AZURE_COBALT_100), #endif #ifdef CONFIG_ARM64_ERRATUM_2054223 MIDR_ALL_VERSIONS(MIDR_CORTEX_A710), @@ -399,6 +401,7 @@ static const struct midr_range tsb_flush static struct midr_range trbe_write_out_of_range_cpus[] = { #ifdef CONFIG_ARM64_ERRATUM_2253138 MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2), + MIDR_ALL_VERSIONS(MIDR_MICROSOFT_AZURE_COBALT_100), #endif #ifdef CONFIG_ARM64_ERRATUM_2224489 MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Curtis Malainey cujomalainey@chromium.org
commit fcbe4873089c84da641df75cda9cac2e9addbb4b upstream.
commit 74ad8ed65121 ("ASoC: SOF: ipc3: Implement rx_msg IPC ops") introduced a new allocation before the upper bounds check in do_rx_work. As a result A DSP can cause bad allocations if spewing garbage.
Fixes: 74ad8ed65121 ("ASoC: SOF: ipc3: Implement rx_msg IPC ops") Reported-by: Tim Van Patten timvp@google.com Cc: stable@vger.kernel.org Signed-off-by: Curtis Malainey cujomalainey@chromium.org Reviewed-by: Péter Ujfalusi peter.ujfalusi@linux.intel.com Reviewed-by: Daniel Baluta daniel.baluta@nxp.com Reviewed-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Signed-off-by: Peter Ujfalusi peter.ujfalusi@linux.intel.com Link: https://msgid.link/r/20240213123834.4827-1-peter.ujfalusi@linux.intel.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/sof/ipc3.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/soc/sof/ipc3.c +++ b/sound/soc/sof/ipc3.c @@ -1067,7 +1067,7 @@ static void sof_ipc3_rx_msg(struct snd_s return; }
- if (hdr.size < sizeof(hdr)) { + if (hdr.size < sizeof(hdr) || hdr.size > SOF_IPC_MSG_MAX_SIZE) { dev_err(sdev->dev, "The received message size is invalid\n"); return; }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gergo Koteles soyer@irl.hu
commit 34a1066981a967eab619938e7b35a9be6b4c34e1 upstream.
The tascodec_init() of the snd-soc-tas2781-comlib module is called from snd-soc-tas2781-i2c and snd-hda-scodec-tas2781-i2c modules. It calls request_firmware_nowait() with parameter THIS_MODULE and a cont/callback from the latter modules.
The latter modules can be removed while their callbacks are running, resulting in a general protection failure.
Add module parameter to tascodec_init() so request_firmware_nowait() can be called with the module of the callback.
Fixes: ef3bcde75d06 ("ASoC: tas2781: Add tas2781 driver") CC: stable@vger.kernel.org Signed-off-by: Gergo Koteles soyer@irl.hu Link: https://lore.kernel.org/r/118dad922cef50525e5aab09badef2fa0eb796e5.170707660... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/sound/tas2781.h | 1 + sound/pci/hda/tas2781_hda_i2c.c | 2 +- sound/soc/codecs/tas2781-comlib.c | 3 ++- sound/soc/codecs/tas2781-i2c.c | 2 +- 4 files changed, 5 insertions(+), 3 deletions(-)
--- a/include/sound/tas2781.h +++ b/include/sound/tas2781.h @@ -135,6 +135,7 @@ struct tasdevice_priv {
void tas2781_reset(struct tasdevice_priv *tas_dev); int tascodec_init(struct tasdevice_priv *tas_priv, void *codec, + struct module *module, void (*cont)(const struct firmware *fw, void *context)); struct tasdevice_priv *tasdevice_kzalloc(struct i2c_client *i2c); int tasdevice_init(struct tasdevice_priv *tas_priv); --- a/sound/pci/hda/tas2781_hda_i2c.c +++ b/sound/pci/hda/tas2781_hda_i2c.c @@ -627,7 +627,7 @@ static int tas2781_hda_bind(struct devic
strscpy(comps->name, dev_name(dev), sizeof(comps->name));
- ret = tascodec_init(tas_hda->priv, codec, tasdev_fw_ready); + ret = tascodec_init(tas_hda->priv, codec, THIS_MODULE, tasdev_fw_ready); if (!ret) comps->playback_hook = tas2781_hda_playback_hook;
--- a/sound/soc/codecs/tas2781-comlib.c +++ b/sound/soc/codecs/tas2781-comlib.c @@ -267,6 +267,7 @@ void tas2781_reset(struct tasdevice_priv EXPORT_SYMBOL_GPL(tas2781_reset);
int tascodec_init(struct tasdevice_priv *tas_priv, void *codec, + struct module *module, void (*cont)(const struct firmware *fw, void *context)) { int ret = 0; @@ -280,7 +281,7 @@ int tascodec_init(struct tasdevice_priv tas_priv->dev_name, tas_priv->ndev); crc8_populate_msb(tas_priv->crc8_lkp_tbl, TASDEVICE_CRC8_POLYNOMIAL); tas_priv->codec = codec; - ret = request_firmware_nowait(THIS_MODULE, FW_ACTION_UEVENT, + ret = request_firmware_nowait(module, FW_ACTION_UEVENT, tas_priv->rca_binaryname, tas_priv->dev, GFP_KERNEL, tas_priv, cont); if (ret) --- a/sound/soc/codecs/tas2781-i2c.c +++ b/sound/soc/codecs/tas2781-i2c.c @@ -564,7 +564,7 @@ static int tasdevice_codec_probe(struct { struct tasdevice_priv *tas_priv = snd_soc_component_get_drvdata(codec);
- return tascodec_init(tas_priv, codec, tasdevice_fw_ready); + return tascodec_init(tas_priv, codec, THIS_MODULE, tasdevice_fw_ready); }
static void tasdevice_deinit(void *context)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mario Limonciello mario.limonciello@amd.com
commit 610010737f74482a61896596a0116876ecf9e65c upstream.
The laptop requires a quirk ID to enable its internal microphone. Add it to the DMI quirk table.
Reported-by: Stanislav Petrov stanislav.i.petrov@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=216925 Cc: stable@vger.kernel.org Signed-off-by: Mario Limonciello mario.limonciello@amd.com Link: https://lore.kernel.org/r/20240205214853.2689-1-mario.limonciello@amd.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/amd/yc/acp6x-mach.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/sound/soc/amd/yc/acp6x-mach.c +++ b/sound/soc/amd/yc/acp6x-mach.c @@ -251,6 +251,13 @@ static const struct dmi_system_id yc_acp { .driver_data = &acp6x_card, .matches = { + DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_NAME, "83AS"), + } + }, + { + .driver_data = &acp6x_card, + .matches = { DMI_MATCH(DMI_BOARD_VENDOR, "ASUSTeK COMPUTER INC."), DMI_MATCH(DMI_PRODUCT_NAME, "UM5302TA"), }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Bristot de Oliveira bristot@kernel.org
commit 61ec586bc0815959d3314cf7ce242529c977b357 upstream.
clang is reporting:
$ make HOSTCC=clang CC=clang LLVM_IAS=1
clang -O -g -DVERSION="6.8.0-rc3" -flto=auto -fexceptions -fstack-protector-strong -fasynchronous-unwind-tables -fstack-clash-protection -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS $(pkg-config --cflags libtracefs) -I include -c -o src/in_kernel.o src/in_kernel.c [...]
src/in_kernel.c:227:6: warning: variable 'curr_reactor' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized] 227 | if (!end) | ^~~~ src/in_kernel.c:242:9: note: uninitialized use occurs here 242 | return curr_reactor; | ^~~~~~~~~~~~ src/in_kernel.c:227:2: note: remove the 'if' if its condition is always false 227 | if (!end) | ^~~~~~~~~ 228 | goto out_free; | ~~~~~~~~~~~~~ src/in_kernel.c:221:6: warning: variable 'curr_reactor' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized] 221 | if (!start) | ^~~~~~ src/in_kernel.c:242:9: note: uninitialized use occurs here 242 | return curr_reactor; | ^~~~~~~~~~~~ src/in_kernel.c:221:2: note: remove the 'if' if its condition is always false 221 | if (!start) | ^~~~~~~~~~~ 222 | goto out_free; | ~~~~~~~~~~~~~ src/in_kernel.c:215:20: note: initialize the variable 'curr_reactor' to silence this warning 215 | char *curr_reactor; | ^ | = NULL 2 warnings generated.
Which is correct. Setting curr_reactor to NULL avoids the problem.
Link: https://lkml.kernel.org/r/3a35551149e5ee0cb0950035afcb8082c3b5d05b.170721709...
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Nathan Chancellor nathan@kernel.org Cc: Nick Desaulniers ndesaulniers@google.com Cc: Bill Wendling morbo@google.com Cc: Justin Stitt justinstitt@google.com Cc: Donald Zickus dzickus@redhat.com Fixes: 6d60f89691fc ("tools/rv: Add in-kernel monitor interface") Signed-off-by: Daniel Bristot de Oliveira bristot@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/verification/rv/src/in_kernel.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/verification/rv/src/in_kernel.c +++ b/tools/verification/rv/src/in_kernel.c @@ -210,9 +210,9 @@ static char *ikm_read_reactor(char *moni static char *ikm_get_current_reactor(char *monitor_name) { char *reactors = ikm_read_reactor(monitor_name); + char *curr_reactor = NULL; char *start; char *end; - char *curr_reactor;
if (!reactors) return NULL;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Bristot de Oliveira bristot@kernel.org
commit f9b2c87105c989a7b259c6da87673ada96dce2f8 upstream.
The following errors are showing up when compiling rv with clang:
$ make HOSTCC=clang CC=clang LLVM_IAS=1 [...] clang -O -g -DVERSION="6.8.0-rc1" -flto=auto -ffat-lto-objects -fexceptions -fstack-protector-strong -fasynchronous-unwind-tables -fstack-clash-protection -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized $(pkg-config --cflags libtracefs) -I include -c -o src/utils.o src/utils.c clang: warning: optimization flag '-ffat-lto-objects' is not supported [-Wignored-optimization-argument] warning: unknown warning option '-Wno-maybe-uninitialized'; did you mean '-Wno-uninitialized'? [-Wunknown-warning-option] 1 warning generated.
clang -o rv -ggdb src/in_kernel.o src/rv.o src/trace.o src/utils.o $(pkg-config --libs libtracefs) src/in_kernel.o: file not recognized: file format not recognized clang: error: linker command failed with exit code 1 (use -v to see invocation) make: *** [Makefile:110: rv] Error 1
Solve these issues by: - removing -ffat-lto-objects and -Wno-maybe-uninitialized if using clang - informing the linker about -flto=auto
Link: https://lkml.kernel.org/r/ed94a8ddc2ca8c8ef663cfb7ae9dd196c4a66b33.170721709...
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Nathan Chancellor nathan@kernel.org Cc: Nick Desaulniers ndesaulniers@google.com Cc: Bill Wendling morbo@google.com Cc: Justin Stitt justinstitt@google.com Fixes: 4bc4b131d44c ("rv: Add rv tool") Suggested-by: Donald Zickus dzickus@redhat.com Signed-off-by: Daniel Bristot de Oliveira bristot@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/verification/rv/Makefile | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/tools/verification/rv/Makefile +++ b/tools/verification/rv/Makefile @@ -28,10 +28,15 @@ FOPTS := -flto=auto -ffat-lto-objects -f -fasynchronous-unwind-tables -fstack-clash-protection WOPTS := -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized
+ifeq ($(CC),clang) + FOPTS := $(filter-out -ffat-lto-objects, $(FOPTS)) + WOPTS := $(filter-out -Wno-maybe-uninitialized, $(WOPTS)) +endif + TRACEFS_HEADERS := $$($(PKG_CONFIG) --cflags libtracefs)
CFLAGS := -O -g -DVERSION="$(VERSION)" $(FOPTS) $(MOPTS) $(WOPTS) $(TRACEFS_HEADERS) $(EXTRA_CFLAGS) -I include -LDFLAGS := -ggdb $(EXTRA_LDFLAGS) +LDFLAGS := -flto=auto -ggdb $(EXTRA_LDFLAGS) LIBS := $$($(PKG_CONFIG) --libs libtracefs)
SRC := $(wildcard src/*.c)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Bristot de Oliveira bristot@kernel.org
commit 084ce16df0f060efd371092a09a7ae74a536dc11 upstream.
Clang is reporting:
$ make HOSTCC=clang CC=clang LLVM_IAS=1 [...] clang -O -g -DVERSION="6.8.0-rc3" -flto=auto -fexceptions -fstack-protector-strong -fasynchronous-unwind-tables -fstack-clash-protection -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS $(pkg-config --cflags libtracefs) -c -o src/utils.o src/utils.c src/utils.c:241:19: warning: unused function 'sched_getattr' [-Wunused-function] 241 | static inline int sched_getattr(pid_t pid, struct sched_attr *attr, | ^~~~~~~~~~~~~ 1 warning generated.
Which is correct, so remove the unused function.
Link: https://lkml.kernel.org/r/eaed7ba122c4ae88ce71277c824ef41cbf789385.170721709...
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Nathan Chancellor nathan@kernel.org Cc: Nick Desaulniers ndesaulniers@google.com Cc: Bill Wendling morbo@google.com Cc: Justin Stitt justinstitt@google.com Cc: Donald Zickus dzickus@redhat.com Fixes: b1696371d865 ("rtla: Helper functions for rtla") Signed-off-by: Daniel Bristot de Oliveira bristot@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/tracing/rtla/src/utils.c | 6 ------ 1 file changed, 6 deletions(-)
--- a/tools/tracing/rtla/src/utils.c +++ b/tools/tracing/rtla/src/utils.c @@ -238,12 +238,6 @@ static inline int sched_setattr(pid_t pi return syscall(__NR_sched_setattr, pid, attr, flags); }
-static inline int sched_getattr(pid_t pid, struct sched_attr *attr, - unsigned int size, unsigned int flags) -{ - return syscall(__NR_sched_getattr, pid, attr, size, flags); -} - int __set_sched_attr(int pid, struct sched_attr *attr) { int flags = 0;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: limingming3 limingming890315@gmail.com
commit 14f08c976ffe0d2117c6199c32663df1cbc45c65 upstream.
Since the sched_priority for SCHED_OTHER is always 0, it makes no sence to set it. Setting nice for SCHED_OTHER seems more meaningful.
Link: https://lkml.kernel.org/r/20240207065142.1753909-1-limingming3@lixiang.com
Cc: stable@vger.kernel.org Fixes: b1696371d865 ("rtla: Helper functions for rtla") Signed-off-by: limingming3 limingming3@lixiang.com Signed-off-by: Daniel Bristot de Oliveira bristot@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/tracing/rtla/src/utils.c | 6 +++--- tools/tracing/rtla/src/utils.h | 2 ++ 2 files changed, 5 insertions(+), 3 deletions(-)
--- a/tools/tracing/rtla/src/utils.c +++ b/tools/tracing/rtla/src/utils.c @@ -473,13 +473,13 @@ int parse_prio(char *arg, struct sched_a if (prio == INVALID_VAL) return -1;
- if (prio < sched_get_priority_min(SCHED_OTHER)) + if (prio < MIN_NICE) return -1; - if (prio > sched_get_priority_max(SCHED_OTHER)) + if (prio > MAX_NICE) return -1;
sched_param->sched_policy = SCHED_OTHER; - sched_param->sched_priority = prio; + sched_param->sched_nice = prio; break; default: return -1; --- a/tools/tracing/rtla/src/utils.h +++ b/tools/tracing/rtla/src/utils.h @@ -9,6 +9,8 @@ */ #define BUFF_U64_STR_SIZE 24 #define MAX_PATH 1024 +#define MAX_NICE 20 +#define MIN_NICE -19
#define container_of(ptr, type, member)({ \ const typeof(((type *)0)->member) *__mptr = (ptr); \
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Bristot de Oliveira bristot@kernel.org
commit 30369084ac6e27479a347899e74f523e6ca29b89 upstream.
clang is reporting this warning:
$ make HOSTCC=clang CC=clang LLVM_IAS=1 [...] clang -O -g -DVERSION="6.8.0-rc3" -flto=auto -fexceptions -fstack-protector-strong -fasynchronous-unwind-tables -fstack-clash-protection -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS $(pkg-config --cflags libtracefs) -c -o src/utils.o src/utils.c
src/utils.c:548:66: warning: 'fscanf' may overflow; destination buffer in argument 3 has size 1024, but the corresponding specifier may require size 1025 [-Wfortify-source] 548 | while (fscanf(fp, "%*s %" STR(MAX_PATH) "s %99s %*s %*d %*d\n", mount_point, type) == 2) { | ^
Increase mount_point variable size to MAX_PATH+1 to avoid the overflow.
Link: https://lkml.kernel.org/r/1b46712e93a2f4153909514a36016959dcc4021c.170721709...
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Nathan Chancellor nathan@kernel.org Cc: Nick Desaulniers ndesaulniers@google.com Cc: Bill Wendling morbo@google.com Cc: Justin Stitt justinstitt@google.com Cc: Donald Zickus dzickus@redhat.com Fixes: a957cbc02531 ("rtla: Add -C cgroup support") Signed-off-by: Daniel Bristot de Oliveira bristot@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/tracing/rtla/src/utils.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/tracing/rtla/src/utils.c +++ b/tools/tracing/rtla/src/utils.c @@ -530,7 +530,7 @@ int set_cpu_dma_latency(int32_t latency) */ static const int find_mount(const char *fs, char *mp, int sizeof_mp) { - char mount_point[MAX_PATH]; + char mount_point[MAX_PATH+1]; char type[100]; int found = 0; FILE *fp;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: John Kacur jkacur@redhat.com
commit b5f319360371087d52070d8f3fc7789e80ce69a6 upstream.
Fix rtla so that the following commands exit with 0 when help is invoked
rtla osnoise top -h rtla osnoise hist -h rtla timerlat top -h rtla timerlat hist -h
Link: https://lore.kernel.org/linux-trace-devel/20240203001607.69703-1-jkacur@redh...
Cc: stable@vger.kernel.org Fixes: 1eeb6328e8b3 ("rtla/timerlat: Add timerlat hist mode") Signed-off-by: John Kacur jkacur@redhat.com Signed-off-by: Daniel Bristot de Oliveira bristot@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/tracing/rtla/src/osnoise_hist.c | 6 +++++- tools/tracing/rtla/src/osnoise_top.c | 6 +++++- tools/tracing/rtla/src/timerlat_hist.c | 6 +++++- tools/tracing/rtla/src/timerlat_top.c | 6 +++++- 4 files changed, 20 insertions(+), 4 deletions(-)
--- a/tools/tracing/rtla/src/osnoise_hist.c +++ b/tools/tracing/rtla/src/osnoise_hist.c @@ -480,7 +480,11 @@ static void osnoise_hist_usage(char *usa
for (i = 0; msg[i]; i++) fprintf(stderr, "%s\n", msg[i]); - exit(1); + + if (usage) + exit(EXIT_FAILURE); + + exit(EXIT_SUCCESS); }
/* --- a/tools/tracing/rtla/src/osnoise_top.c +++ b/tools/tracing/rtla/src/osnoise_top.c @@ -331,7 +331,11 @@ static void osnoise_top_usage(struct osn
for (i = 0; msg[i]; i++) fprintf(stderr, "%s\n", msg[i]); - exit(1); + + if (usage) + exit(EXIT_FAILURE); + + exit(EXIT_SUCCESS); }
/* --- a/tools/tracing/rtla/src/timerlat_hist.c +++ b/tools/tracing/rtla/src/timerlat_hist.c @@ -546,7 +546,11 @@ static void timerlat_hist_usage(char *us
for (i = 0; msg[i]; i++) fprintf(stderr, "%s\n", msg[i]); - exit(1); + + if (usage) + exit(EXIT_FAILURE); + + exit(EXIT_SUCCESS); }
/* --- a/tools/tracing/rtla/src/timerlat_top.c +++ b/tools/tracing/rtla/src/timerlat_top.c @@ -375,7 +375,11 @@ static void timerlat_top_usage(char *usa
for (i = 0; msg[i]; i++) fprintf(stderr, "%s\n", msg[i]); - exit(1); + + if (usage) + exit(EXIT_FAILURE); + + exit(EXIT_SUCCESS); }
/*
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Bristot de Oliveira bristot@kernel.org
commit 64dc40f7523369912d7adb22c8cb655f71610505 upstream.
When compiling rtla with clang, I am getting the following warnings:
$ make HOSTCC=clang CC=clang LLVM_IAS=1
[..] clang -O -g -DVERSION="6.8.0-rc3" -flto=auto -fexceptions -fstack-protector-strong -fasynchronous-unwind-tables -fstack-clash-protection -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS $(pkg-config --cflags libtracefs) -c -o src/osnoise_hist.o src/osnoise_hist.c src/osnoise_hist.c:138:6: warning: variable 'bucket' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] 138 | if (data->bucket_size) | ^~~~~~~~~~~~~~~~~ src/osnoise_hist.c:149:6: note: uninitialized use occurs here 149 | if (bucket < entries) | ^~~~~~ src/osnoise_hist.c:138:2: note: remove the 'if' if its condition is always true 138 | if (data->bucket_size) | ^~~~~~~~~~~~~~~~~~~~~~ 139 | bucket = duration / data->bucket_size; src/osnoise_hist.c:132:12: note: initialize the variable 'bucket' to silence this warning 132 | int bucket; | ^ | = 0 1 warning generated.
[...]
clang -O -g -DVERSION="6.8.0-rc3" -flto=auto -fexceptions -fstack-protector-strong -fasynchronous-unwind-tables -fstack-clash-protection -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS $(pkg-config --cflags libtracefs) -c -o src/timerlat_hist.o src/timerlat_hist.c src/timerlat_hist.c:181:6: warning: variable 'bucket' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] 181 | if (data->bucket_size) | ^~~~~~~~~~~~~~~~~ src/timerlat_hist.c:204:6: note: uninitialized use occurs here 204 | if (bucket < entries) | ^~~~~~ src/timerlat_hist.c:181:2: note: remove the 'if' if its condition is always true 181 | if (data->bucket_size) | ^~~~~~~~~~~~~~~~~~~~~~ 182 | bucket = latency / data->bucket_size; src/timerlat_hist.c:175:12: note: initialize the variable 'bucket' to silence this warning 175 | int bucket; | ^ | = 0 1 warning generated.
This is a legit warning, but data->bucket_size is always > 0 (see timerlat_hist_parse_args()), so the if is not necessary.
Remove the unneeded if (data->bucket_size) to avoid the warning.
Link: https://lkml.kernel.org/r/6e1b1665cd99042ae705b3e0fc410858c4c42346.170721709...
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Nathan Chancellor nathan@kernel.org Cc: Nick Desaulniers ndesaulniers@google.com Cc: Bill Wendling morbo@google.com Cc: Justin Stitt justinstitt@google.com Cc: Donald Zickus dzickus@redhat.com Fixes: 1eeb6328e8b3 ("rtla/timerlat: Add timerlat hist mode") Fixes: 829a6c0b5698 ("rtla/osnoise: Add the hist mode") Signed-off-by: Daniel Bristot de Oliveira bristot@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/tracing/rtla/src/osnoise_hist.c | 3 +-- tools/tracing/rtla/src/timerlat_hist.c | 3 +-- 2 files changed, 2 insertions(+), 4 deletions(-)
--- a/tools/tracing/rtla/src/osnoise_hist.c +++ b/tools/tracing/rtla/src/osnoise_hist.c @@ -135,8 +135,7 @@ static void osnoise_hist_update_multiple if (params->output_divisor) duration = duration / params->output_divisor;
- if (data->bucket_size) - bucket = duration / data->bucket_size; + bucket = duration / data->bucket_size;
total_duration = duration * count;
--- a/tools/tracing/rtla/src/timerlat_hist.c +++ b/tools/tracing/rtla/src/timerlat_hist.c @@ -178,8 +178,7 @@ timerlat_hist_update(struct osnoise_tool if (params->output_divisor) latency = latency / params->output_divisor;
- if (data->bucket_size) - bucket = latency / data->bucket_size; + bucket = latency / data->bucket_size;
if (!context) { hist = data->hist[cpu].irq;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Bristot de Oliveira bristot@kernel.org
commit bc4cbc9d260ba8358ca63662919f4bb223cb603b upstream.
The following errors are showing up when compiling rtla with clang:
$ make HOSTCC=clang CC=clang LLVM_IAS=1 [...]
clang -O -g -DVERSION="6.8.0-rc1" -flto=auto -ffat-lto-objects -fexceptions -fstack-protector-strong -fasynchronous-unwind-tables -fstack-clash-protection -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized $(pkg-config --cflags libtracefs) -c -o src/utils.o src/utils.c
clang: warning: optimization flag '-ffat-lto-objects' is not supported [-Wignored-optimization-argument] warning: unknown warning option '-Wno-maybe-uninitialized'; did you mean '-Wno-uninitialized'? [-Wunknown-warning-option] 1 warning generated.
clang -o rtla -ggdb src/osnoise.o src/osnoise_hist.o src/osnoise_top.o src/rtla.o src/timerlat_aa.o src/timerlat.o src/timerlat_hist.o src/timerlat_top.o src/timerlat_u.o src/trace.o src/utils.o $(pkg-config --libs libtracefs)
src/osnoise.o: file not recognized: file format not recognized clang: error: linker command failed with exit code 1 (use -v to see invocation) make: *** [Makefile:110: rtla] Error 1
Solve these issues by: - removing -ffat-lto-objects and -Wno-maybe-uninitialized if using clang - informing the linker about -flto=auto
Link: https://lore.kernel.org/linux-trace-kernel/567ac1b94effc228ce9a0225b9df7232a...
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Nathan Chancellor nathan@kernel.org Cc: Nick Desaulniers ndesaulniers@google.com Cc: Bill Wendling morbo@google.com Cc: Justin Stitt justinstitt@google.com Fixes: 1a7b22ab15eb ("tools/rtla: Build with EXTRA_{C,LD}FLAGS") Suggested-by: Donald Zickus dzickus@redhat.com Signed-off-by: Daniel Bristot de Oliveira bristot@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/tracing/rtla/Makefile | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/tools/tracing/rtla/Makefile +++ b/tools/tracing/rtla/Makefile @@ -28,10 +28,15 @@ FOPTS := -flto=auto -ffat-lto-objects -f -fasynchronous-unwind-tables -fstack-clash-protection WOPTS := -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized
+ifeq ($(CC),clang) + FOPTS := $(filter-out -ffat-lto-objects, $(FOPTS)) + WOPTS := $(filter-out -Wno-maybe-uninitialized, $(WOPTS)) +endif + TRACEFS_HEADERS := $$($(PKG_CONFIG) --cflags libtracefs)
CFLAGS := -O -g -DVERSION="$(VERSION)" $(FOPTS) $(MOPTS) $(WOPTS) $(TRACEFS_HEADERS) $(EXTRA_CFLAGS) -LDFLAGS := -ggdb $(EXTRA_LDFLAGS) +LDFLAGS := -flto=auto -ggdb $(EXTRA_LDFLAGS) LIBS := $$($(PKG_CONFIG) --libs libtracefs)
SRC := $(wildcard src/*.c)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian Brauner brauner@kernel.org
commit 46f5ab762d048dad224436978315cbc2fa79c630 upstream.
When we added mount_setattr() I added additional checks compared to the legacy do_reconfigure_mnt() and do_change_type() helpers used by regular mount(2). If that mount had a parent then verify that the caller and the mount namespace the mount is attached to match and if not make sure that it's an anonymous mount.
The real rootfs falls into neither category. It is neither an anoymous mount because it is obviously attached to the initial mount namespace but it also obviously doesn't have a parent mount. So that means legacy mount(2) allows changing mount properties on the real rootfs but mount_setattr(2) blocks this. I never thought much about this but of course someone on this planet of earth changes properties on the real rootfs as can be seen in [1].
Since util-linux finally switched to the new mount api in 2.39 not so long ago it also relies on mount_setattr() and that surfaced this issue when Fedora 39 finally switched to it. Fix this.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2256843 Link: https://lore.kernel.org/r/20240206-vfs-mount-rootfs-v1-1-19b335eee133@kernel... Reviewed-by: Jan Kara jack@suse.cz Reported-by: Karel Zak kzak@redhat.com Cc: stable@vger.kernel.org # v5.12+ Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/namespace.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)
--- a/fs/namespace.c +++ b/fs/namespace.c @@ -4472,10 +4472,15 @@ static int do_mount_setattr(struct path /* * If this is an attached mount make sure it's located in the callers * mount namespace. If it's not don't let the caller interact with it. - * If this is a detached mount make sure it has an anonymous mount - * namespace attached to it, i.e. we've created it via OPEN_TREE_CLONE. + * + * If this mount doesn't have a parent it's most often simply a + * detached mount with an anonymous mount namespace. IOW, something + * that's simply not attached yet. But there are apparently also users + * that do change mount properties on the rootfs itself. That obviously + * neither has a parent nor is it a detached mount so we cannot + * unconditionally check for detached mounts. */ - if (!(mnt_has_parent(mnt) ? check_mnt(mnt) : is_anon_ns(mnt->mnt_ns))) + if ((mnt_has_parent(mnt) || !is_anon_ns(mnt->mnt_ns)) && !check_mnt(mnt)) goto out;
/*
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sinthu Raja sinthu.raja@ti.com
commit bc4ce46b1e3d1da4309405cd4afc7c0fcddd0b90 upstream.
The below commit introduced a WARN when phy state is not in the states: PHY_HALTED, PHY_READY and PHY_UP. commit 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
When cpsw resumes, there have port in PHY_NOLINK state, so the below warning comes out. Set mac_managed_pm be true to tell mdio that the phy resume/suspend is managed by the mac, to fix the following warning:
WARNING: CPU: 0 PID: 965 at drivers/net/phy/phy_device.c:326 mdio_bus_phy_resume+0x140/0x144 CPU: 0 PID: 965 Comm: sh Tainted: G O 6.1.46-g247b2535b2 #1 Hardware name: Generic AM33XX (Flattened Device Tree) unwind_backtrace from show_stack+0x18/0x1c show_stack from dump_stack_lvl+0x24/0x2c dump_stack_lvl from __warn+0x84/0x15c __warn from warn_slowpath_fmt+0x1a8/0x1c8 warn_slowpath_fmt from mdio_bus_phy_resume+0x140/0x144 mdio_bus_phy_resume from dpm_run_callback+0x3c/0x140 dpm_run_callback from device_resume+0xb8/0x2b8 device_resume from dpm_resume+0x144/0x314 dpm_resume from dpm_resume_end+0x14/0x20 dpm_resume_end from suspend_devices_and_enter+0xd0/0x924 suspend_devices_and_enter from pm_suspend+0x2e0/0x33c pm_suspend from state_store+0x74/0xd0 state_store from kernfs_fop_write_iter+0x104/0x1ec kernfs_fop_write_iter from vfs_write+0x1b8/0x358 vfs_write from ksys_write+0x78/0xf8 ksys_write from ret_fast_syscall+0x0/0x54 Exception stack(0xe094dfa8 to 0xe094dff0) dfa0: 00000004 005c3fb8 00000001 005c3fb8 00000004 00000001 dfc0: 00000004 005c3fb8 b6f6bba0 00000004 00000004 0059edb8 00000000 00000000 dfe0: 00000004 bed918f0 b6f09bd3 b6e89a66
Cc: stable@vger.kernel.org # v6.0+ Fixes: 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state") Fixes: fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM") Signed-off-by: Sinthu Raja sinthu.raja@ti.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/ti/cpsw.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/net/ethernet/ti/cpsw.c +++ b/drivers/net/ethernet/ti/cpsw.c @@ -631,6 +631,8 @@ static void cpsw_slave_open(struct cpsw_ } }
+ phy->mac_managed_pm = true; + slave->phy = phy;
phy_attached_info(slave->phy);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexandra Winter wintera@linux.ibm.com
commit 2fe8a236436fe40d8d26a1af8d150fc80f04ee1a upstream.
Symptom: In case of a bad cable connection (e.g. dirty optics) a fast sequence of network DOWN-UP-DOWN-UP could happen. UP triggers recovery of the qeth interface. In case of a second DOWN while recovery is still ongoing, it can happen that the IP@ of a Layer3 qeth interface is lost and will not be recovered by the second UP.
Problem: When registration of IP addresses with Layer 3 qeth devices fails, (e.g. because of bad address format) the respective IP address is deleted from its hash-table in the driver. If registration fails because of a ENETDOWN condition, the address should stay in the hashtable, so a subsequent recovery can restore it.
3caa4af834df ("qeth: keep ip-address after LAN_OFFLINE failure") fixes this for registration failures during normal operation, but not during recovery.
Solution: Keep L3-IP address in case of ENETDOWN in qeth_l3_recover_ip(). For consistency with qeth_l3_add_ip() we also keep it in case of EADDRINUSE, i.e. for some reason the card already/still has this address registered.
Fixes: 4a71df50047f ("qeth: new qeth device driver") Cc: stable@vger.kernel.org Signed-off-by: Alexandra Winter wintera@linux.ibm.com Link: https://lore.kernel.org/r/20240206085849.2902775-1-wintera@linux.ibm.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/s390/net/qeth_l3_main.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/drivers/s390/net/qeth_l3_main.c +++ b/drivers/s390/net/qeth_l3_main.c @@ -255,9 +255,10 @@ static void qeth_l3_clear_ip_htable(stru if (!recover) { hash_del(&addr->hnode); kfree(addr); - continue; + } else { + /* prepare for recovery */ + addr->disp_flag = QETH_DISP_ADDR_ADD; } - addr->disp_flag = QETH_DISP_ADDR_ADD; }
mutex_unlock(&card->ip_lock); @@ -278,9 +279,11 @@ static void qeth_l3_recover_ip(struct qe if (addr->disp_flag == QETH_DISP_ADDR_ADD) { rc = qeth_l3_register_addr_entry(card, addr);
- if (!rc) { + if (!rc || rc == -EADDRINUSE || rc == -ENETDOWN) { + /* keep it in the records */ addr->disp_flag = QETH_DISP_ADDR_DO_NOTHING; } else { + /* bad address */ hash_del(&addr->hnode); kfree(addr); }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: SeongJae Park sj@kernel.org
commit b9e4bc1046d20e0623a80660ef8627448056f817 upstream.
DAMON sysfs interface's update_schemes_tried_regions command has a timeout of two apply intervals of the DAMOS scheme. Having zero value DAMOS scheme apply interval means it will use the aggregation interval as the value. However, the timeout setup logic is mistakenly using the sampling interval insted of the aggregartion interval for the case. This could cause earlier-than-expected timeout of the command. Fix it.
Link: https://lkml.kernel.org/r/20240202191956.88791-1-sj@kernel.org Fixes: 7d6fa31a2fd7 ("mm/damon/sysfs-schemes: add timeout for update_schemes_tried_regions") Signed-off-by: SeongJae Park sj@kernel.org Cc: stable@vger.kernel.org # 6.7.x Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/damon/sysfs-schemes.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/damon/sysfs-schemes.c +++ b/mm/damon/sysfs-schemes.c @@ -1928,7 +1928,7 @@ static void damos_tried_regions_init_upd sysfs_regions->upd_timeout_jiffies = jiffies + 2 * usecs_to_jiffies(scheme->apply_interval_us ? scheme->apply_interval_us : - ctx->attrs.sample_interval); + ctx->attrs.aggr_interval); } }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sinthu Raja sinthu.raja@ti.com
commit 9def04e759caa5a3d741891037ae99f81e2fff01 upstream.
The below commit introduced a WARN when phy state is not in the states: PHY_HALTED, PHY_READY and PHY_UP. commit 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
When cpsw_new resumes, there have port in PHY_NOLINK state, so the below warning comes out. Set mac_managed_pm be true to tell mdio that the phy resume/suspend is managed by the mac, to fix the following warning:
WARNING: CPU: 0 PID: 965 at drivers/net/phy/phy_device.c:326 mdio_bus_phy_resume+0x140/0x144 CPU: 0 PID: 965 Comm: sh Tainted: G O 6.1.46-g247b2535b2 #1 Hardware name: Generic AM33XX (Flattened Device Tree) unwind_backtrace from show_stack+0x18/0x1c show_stack from dump_stack_lvl+0x24/0x2c dump_stack_lvl from __warn+0x84/0x15c __warn from warn_slowpath_fmt+0x1a8/0x1c8 warn_slowpath_fmt from mdio_bus_phy_resume+0x140/0x144 mdio_bus_phy_resume from dpm_run_callback+0x3c/0x140 dpm_run_callback from device_resume+0xb8/0x2b8 device_resume from dpm_resume+0x144/0x314 dpm_resume from dpm_resume_end+0x14/0x20 dpm_resume_end from suspend_devices_and_enter+0xd0/0x924 suspend_devices_and_enter from pm_suspend+0x2e0/0x33c pm_suspend from state_store+0x74/0xd0 state_store from kernfs_fop_write_iter+0x104/0x1ec kernfs_fop_write_iter from vfs_write+0x1b8/0x358 vfs_write from ksys_write+0x78/0xf8 ksys_write from ret_fast_syscall+0x0/0x54 Exception stack(0xe094dfa8 to 0xe094dff0) dfa0: 00000004 005c3fb8 00000001 005c3fb8 00000004 00000001 dfc0: 00000004 005c3fb8 b6f6bba0 00000004 00000004 0059edb8 00000000 00000000 dfe0: 00000004 bed918f0 b6f09bd3 b6e89a66
Cc: stable@vger.kernel.org # v6.0+ Fixes: 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state") Fixes: fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM") Signed-off-by: Sinthu Raja sinthu.raja@ti.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/ti/cpsw_new.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/net/ethernet/ti/cpsw_new.c +++ b/drivers/net/ethernet/ti/cpsw_new.c @@ -773,6 +773,9 @@ static void cpsw_slave_open(struct cpsw_ slave->slave_num); return; } + + phy->mac_managed_pm = true; + slave->phy = phy;
phy_attached_info(slave->phy);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geert Uytterhoeven geert+renesas@glider.be
commit f0e4a1356466ec1858ae8e5c70bea2ce5e55008b upstream.
The power domain containing the Cortex-R7 CPU core on the R-Car V3H SoC must always be in power-on state, unlike on other SoCs in the R-Car Gen3 family. See Table 9.4 "Power domains" in the R-Car Series, 3rd Generation Hardware User’s Manual Rev.1.00 and later.
Fix this by marking the domain as a CPU domain without control registers, so the driver will not touch it.
Fixes: 41d6d8bd8ae9 ("soc: renesas: rcar-sysc: add R8A77980 support") Signed-off-by: Geert Uytterhoeven geert+renesas@glider.be Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/fdad9a86132d53ecddf72b734dac406915c4edc0.170507673... Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pmdomain/renesas/r8a77980-sysc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/pmdomain/renesas/r8a77980-sysc.c +++ b/drivers/pmdomain/renesas/r8a77980-sysc.c @@ -25,7 +25,8 @@ static const struct rcar_sysc_area r8a77 PD_CPU_NOCR }, { "ca53-cpu3", 0x200, 3, R8A77980_PD_CA53_CPU3, R8A77980_PD_CA53_SCU, PD_CPU_NOCR }, - { "cr7", 0x240, 0, R8A77980_PD_CR7, R8A77980_PD_ALWAYS_ON }, + { "cr7", 0x240, 0, R8A77980_PD_CR7, R8A77980_PD_ALWAYS_ON, + PD_CPU_NOCR }, { "a3ir", 0x180, 0, R8A77980_PD_A3IR, R8A77980_PD_ALWAYS_ON }, { "a2ir0", 0x400, 0, R8A77980_PD_A2IR0, R8A77980_PD_A3IR }, { "a2ir1", 0x400, 1, R8A77980_PD_A2IR1, R8A77980_PD_A3IR },
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jan Kiszka jan.kiszka@siemens.com
commit afb2a4fb84555ef9e61061f6ea63ed7087b295d5 upstream.
The cflags for the RISC-V efistub were missing -mno-relax, thus were under the risk that the compiler could use GP-relative addressing. That happened for _edata with binutils-2.41 and kernel 6.1, causing the relocation to fail due to an invalid kernel_size in handle_kernel_image. It was not yet observed with newer versions, but that may just be luck.
Cc: stable@vger.kernel.org Signed-off-by: Jan Kiszka jan.kiszka@siemens.com Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/firmware/efi/libstub/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/firmware/efi/libstub/Makefile +++ b/drivers/firmware/efi/libstub/Makefile @@ -28,7 +28,7 @@ cflags-$(CONFIG_ARM) += -DEFI_HAVE_STRL -DEFI_HAVE_MEMCHR -DEFI_HAVE_STRRCHR \ -DEFI_HAVE_STRCMP -fno-builtin -fpic \ $(call cc-option,-mno-single-pic-base) -cflags-$(CONFIG_RISCV) += -fpic -DNO_ALTERNATIVE +cflags-$(CONFIG_RISCV) += -fpic -DNO_ALTERNATIVE -mno-relax cflags-$(CONFIG_LOONGARCH) += -fpie
cflags-$(CONFIG_EFI_PARAMS_FROM_FDT) += -I$(srctree)/scripts/dtc/libfdt
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Petr Tesarik petr@tesarici.cz
commit 38cc3c6dcc09dc3a1800b5ec22aef643ca11eab8 upstream.
As explained by a comment in <linux/u64_stats_sync.h>, write side of struct u64_stats_sync must ensure mutual exclusion, or one seqcount update could be lost on 32-bit platforms, thus blocking readers forever. Such lockups have been observed in real world after stmmac_xmit() on one CPU raced with stmmac_napi_poll_tx() on another CPU.
To fix the issue without introducing a new lock, split the statics into three parts:
1. fields updated only under the tx queue lock, 2. fields updated only during NAPI poll, 3. fields updated only from interrupt context,
Updates to fields in the first two groups are already serialized through other locks. It is sufficient to split the existing struct u64_stats_sync so that each group has its own.
Note that tx_set_ic_bit is updated from both contexts. Split this counter so that each context gets its own, and calculate their sum to get the total value in stmmac_get_ethtool_stats().
For the third group, multiple interrupts may be processed by different CPUs at the same time, but interrupts on the same CPU will not nest. Move fields from this group to a newly created per-cpu struct stmmac_pcpu_stats.
Fixes: 133466c3bbe1 ("net: stmmac: use per-queue 64 bit statistics where necessary") Link: https://lore.kernel.org/netdev/Za173PhviYg-1qIn@torres.zugschlus.de/t/ Cc: stable@vger.kernel.org Signed-off-by: Petr Tesarik petr@tesarici.cz Reviewed-by: Jisheng Zhang jszhang@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/stmicro/stmmac/common.h | 56 +++++--- drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c | 15 +- drivers/net/ethernet/stmicro/stmmac/dwmac4_lib.c | 15 +- drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c | 15 +- drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c | 15 +- drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c | 129 ++++++++++++------ drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 133 +++++++++---------- 7 files changed, 221 insertions(+), 157 deletions(-)
--- a/drivers/net/ethernet/stmicro/stmmac/common.h +++ b/drivers/net/ethernet/stmicro/stmmac/common.h @@ -59,28 +59,51 @@ #undef FRAME_FILTER_DEBUG /* #define FRAME_FILTER_DEBUG */
+struct stmmac_q_tx_stats { + u64_stats_t tx_bytes; + u64_stats_t tx_set_ic_bit; + u64_stats_t tx_tso_frames; + u64_stats_t tx_tso_nfrags; +}; + +struct stmmac_napi_tx_stats { + u64_stats_t tx_packets; + u64_stats_t tx_pkt_n; + u64_stats_t poll; + u64_stats_t tx_clean; + u64_stats_t tx_set_ic_bit; +}; + struct stmmac_txq_stats { - u64 tx_bytes; - u64 tx_packets; - u64 tx_pkt_n; - u64 tx_normal_irq_n; - u64 napi_poll; - u64 tx_clean; - u64 tx_set_ic_bit; - u64 tx_tso_frames; - u64 tx_tso_nfrags; - struct u64_stats_sync syncp; + /* Updates protected by tx queue lock. */ + struct u64_stats_sync q_syncp; + struct stmmac_q_tx_stats q; + + /* Updates protected by NAPI poll logic. */ + struct u64_stats_sync napi_syncp; + struct stmmac_napi_tx_stats napi; } ____cacheline_aligned_in_smp;
+struct stmmac_napi_rx_stats { + u64_stats_t rx_bytes; + u64_stats_t rx_packets; + u64_stats_t rx_pkt_n; + u64_stats_t poll; +}; + struct stmmac_rxq_stats { - u64 rx_bytes; - u64 rx_packets; - u64 rx_pkt_n; - u64 rx_normal_irq_n; - u64 napi_poll; - struct u64_stats_sync syncp; + /* Updates protected by NAPI poll logic. */ + struct u64_stats_sync napi_syncp; + struct stmmac_napi_rx_stats napi; } ____cacheline_aligned_in_smp;
+/* Updates on each CPU protected by not allowing nested irqs. */ +struct stmmac_pcpu_stats { + struct u64_stats_sync syncp; + u64_stats_t rx_normal_irq_n[MTL_MAX_TX_QUEUES]; + u64_stats_t tx_normal_irq_n[MTL_MAX_RX_QUEUES]; +}; + /* Extra statistic and debug information exposed by ethtool */ struct stmmac_extra_stats { /* Transmit errors */ @@ -205,6 +228,7 @@ struct stmmac_extra_stats { /* per queue statistics */ struct stmmac_txq_stats txq_stats[MTL_MAX_TX_QUEUES]; struct stmmac_rxq_stats rxq_stats[MTL_MAX_RX_QUEUES]; + struct stmmac_pcpu_stats __percpu *pcpu_stats; unsigned long rx_dropped; unsigned long rx_errors; unsigned long tx_dropped; --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c @@ -441,8 +441,7 @@ static int sun8i_dwmac_dma_interrupt(str struct stmmac_extra_stats *x, u32 chan, u32 dir) { - struct stmmac_rxq_stats *rxq_stats = &priv->xstats.rxq_stats[chan]; - struct stmmac_txq_stats *txq_stats = &priv->xstats.txq_stats[chan]; + struct stmmac_pcpu_stats *stats = this_cpu_ptr(priv->xstats.pcpu_stats); int ret = 0; u32 v;
@@ -455,9 +454,9 @@ static int sun8i_dwmac_dma_interrupt(str
if (v & EMAC_TX_INT) { ret |= handle_tx; - u64_stats_update_begin(&txq_stats->syncp); - txq_stats->tx_normal_irq_n++; - u64_stats_update_end(&txq_stats->syncp); + u64_stats_update_begin(&stats->syncp); + u64_stats_inc(&stats->tx_normal_irq_n[chan]); + u64_stats_update_end(&stats->syncp); }
if (v & EMAC_TX_DMA_STOP_INT) @@ -479,9 +478,9 @@ static int sun8i_dwmac_dma_interrupt(str
if (v & EMAC_RX_INT) { ret |= handle_rx; - u64_stats_update_begin(&rxq_stats->syncp); - rxq_stats->rx_normal_irq_n++; - u64_stats_update_end(&rxq_stats->syncp); + u64_stats_update_begin(&stats->syncp); + u64_stats_inc(&stats->rx_normal_irq_n[chan]); + u64_stats_update_end(&stats->syncp); }
if (v & EMAC_RX_BUF_UA_INT) --- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_lib.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_lib.c @@ -171,8 +171,7 @@ int dwmac4_dma_interrupt(struct stmmac_p const struct dwmac4_addrs *dwmac4_addrs = priv->plat->dwmac4_addrs; u32 intr_status = readl(ioaddr + DMA_CHAN_STATUS(dwmac4_addrs, chan)); u32 intr_en = readl(ioaddr + DMA_CHAN_INTR_ENA(dwmac4_addrs, chan)); - struct stmmac_rxq_stats *rxq_stats = &priv->xstats.rxq_stats[chan]; - struct stmmac_txq_stats *txq_stats = &priv->xstats.txq_stats[chan]; + struct stmmac_pcpu_stats *stats = this_cpu_ptr(priv->xstats.pcpu_stats); int ret = 0;
if (dir == DMA_DIR_RX) @@ -201,15 +200,15 @@ int dwmac4_dma_interrupt(struct stmmac_p } /* TX/RX NORMAL interrupts */ if (likely(intr_status & DMA_CHAN_STATUS_RI)) { - u64_stats_update_begin(&rxq_stats->syncp); - rxq_stats->rx_normal_irq_n++; - u64_stats_update_end(&rxq_stats->syncp); + u64_stats_update_begin(&stats->syncp); + u64_stats_inc(&stats->rx_normal_irq_n[chan]); + u64_stats_update_end(&stats->syncp); ret |= handle_rx; } if (likely(intr_status & DMA_CHAN_STATUS_TI)) { - u64_stats_update_begin(&txq_stats->syncp); - txq_stats->tx_normal_irq_n++; - u64_stats_update_end(&txq_stats->syncp); + u64_stats_update_begin(&stats->syncp); + u64_stats_inc(&stats->tx_normal_irq_n[chan]); + u64_stats_update_end(&stats->syncp); ret |= handle_tx; }
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c @@ -162,8 +162,7 @@ static void show_rx_process_state(unsign int dwmac_dma_interrupt(struct stmmac_priv *priv, void __iomem *ioaddr, struct stmmac_extra_stats *x, u32 chan, u32 dir) { - struct stmmac_rxq_stats *rxq_stats = &priv->xstats.rxq_stats[chan]; - struct stmmac_txq_stats *txq_stats = &priv->xstats.txq_stats[chan]; + struct stmmac_pcpu_stats *stats = this_cpu_ptr(priv->xstats.pcpu_stats); int ret = 0; /* read the status register (CSR5) */ u32 intr_status = readl(ioaddr + DMA_STATUS); @@ -215,16 +214,16 @@ int dwmac_dma_interrupt(struct stmmac_pr u32 value = readl(ioaddr + DMA_INTR_ENA); /* to schedule NAPI on real RIE event. */ if (likely(value & DMA_INTR_ENA_RIE)) { - u64_stats_update_begin(&rxq_stats->syncp); - rxq_stats->rx_normal_irq_n++; - u64_stats_update_end(&rxq_stats->syncp); + u64_stats_update_begin(&stats->syncp); + u64_stats_inc(&stats->rx_normal_irq_n[chan]); + u64_stats_update_end(&stats->syncp); ret |= handle_rx; } } if (likely(intr_status & DMA_STATUS_TI)) { - u64_stats_update_begin(&txq_stats->syncp); - txq_stats->tx_normal_irq_n++; - u64_stats_update_end(&txq_stats->syncp); + u64_stats_update_begin(&stats->syncp); + u64_stats_inc(&stats->tx_normal_irq_n[chan]); + u64_stats_update_end(&stats->syncp); ret |= handle_tx; } if (unlikely(intr_status & DMA_STATUS_ERI)) --- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c @@ -337,8 +337,7 @@ static int dwxgmac2_dma_interrupt(struct struct stmmac_extra_stats *x, u32 chan, u32 dir) { - struct stmmac_rxq_stats *rxq_stats = &priv->xstats.rxq_stats[chan]; - struct stmmac_txq_stats *txq_stats = &priv->xstats.txq_stats[chan]; + struct stmmac_pcpu_stats *stats = this_cpu_ptr(priv->xstats.pcpu_stats); u32 intr_status = readl(ioaddr + XGMAC_DMA_CH_STATUS(chan)); u32 intr_en = readl(ioaddr + XGMAC_DMA_CH_INT_EN(chan)); int ret = 0; @@ -367,15 +366,15 @@ static int dwxgmac2_dma_interrupt(struct /* TX/RX NORMAL interrupts */ if (likely(intr_status & XGMAC_NIS)) { if (likely(intr_status & XGMAC_RI)) { - u64_stats_update_begin(&rxq_stats->syncp); - rxq_stats->rx_normal_irq_n++; - u64_stats_update_end(&rxq_stats->syncp); + u64_stats_update_begin(&stats->syncp); + u64_stats_inc(&stats->rx_normal_irq_n[chan]); + u64_stats_update_end(&stats->syncp); ret |= handle_rx; } if (likely(intr_status & (XGMAC_TI | XGMAC_TBU))) { - u64_stats_update_begin(&txq_stats->syncp); - txq_stats->tx_normal_irq_n++; - u64_stats_update_end(&txq_stats->syncp); + u64_stats_update_begin(&stats->syncp); + u64_stats_inc(&stats->tx_normal_irq_n[chan]); + u64_stats_update_end(&stats->syncp); ret |= handle_tx; } } --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c @@ -539,44 +539,79 @@ stmmac_set_pauseparam(struct net_device } }
+static u64 stmmac_get_rx_normal_irq_n(struct stmmac_priv *priv, int q) +{ + u64 total; + int cpu; + + total = 0; + for_each_possible_cpu(cpu) { + struct stmmac_pcpu_stats *pcpu; + unsigned int start; + u64 irq_n; + + pcpu = per_cpu_ptr(priv->xstats.pcpu_stats, cpu); + do { + start = u64_stats_fetch_begin(&pcpu->syncp); + irq_n = u64_stats_read(&pcpu->rx_normal_irq_n[q]); + } while (u64_stats_fetch_retry(&pcpu->syncp, start)); + total += irq_n; + } + return total; +} + +static u64 stmmac_get_tx_normal_irq_n(struct stmmac_priv *priv, int q) +{ + u64 total; + int cpu; + + total = 0; + for_each_possible_cpu(cpu) { + struct stmmac_pcpu_stats *pcpu; + unsigned int start; + u64 irq_n; + + pcpu = per_cpu_ptr(priv->xstats.pcpu_stats, cpu); + do { + start = u64_stats_fetch_begin(&pcpu->syncp); + irq_n = u64_stats_read(&pcpu->tx_normal_irq_n[q]); + } while (u64_stats_fetch_retry(&pcpu->syncp, start)); + total += irq_n; + } + return total; +} + static void stmmac_get_per_qstats(struct stmmac_priv *priv, u64 *data) { u32 tx_cnt = priv->plat->tx_queues_to_use; u32 rx_cnt = priv->plat->rx_queues_to_use; unsigned int start; - int q, stat; - char *p; + int q;
for (q = 0; q < tx_cnt; q++) { struct stmmac_txq_stats *txq_stats = &priv->xstats.txq_stats[q]; - struct stmmac_txq_stats snapshot; + u64 pkt_n;
do { - start = u64_stats_fetch_begin(&txq_stats->syncp); - snapshot = *txq_stats; - } while (u64_stats_fetch_retry(&txq_stats->syncp, start)); + start = u64_stats_fetch_begin(&txq_stats->napi_syncp); + pkt_n = u64_stats_read(&txq_stats->napi.tx_pkt_n); + } while (u64_stats_fetch_retry(&txq_stats->napi_syncp, start));
- p = (char *)&snapshot + offsetof(struct stmmac_txq_stats, tx_pkt_n); - for (stat = 0; stat < STMMAC_TXQ_STATS; stat++) { - *data++ = (*(u64 *)p); - p += sizeof(u64); - } + *data++ = pkt_n; + *data++ = stmmac_get_tx_normal_irq_n(priv, q); }
for (q = 0; q < rx_cnt; q++) { struct stmmac_rxq_stats *rxq_stats = &priv->xstats.rxq_stats[q]; - struct stmmac_rxq_stats snapshot; + u64 pkt_n;
do { - start = u64_stats_fetch_begin(&rxq_stats->syncp); - snapshot = *rxq_stats; - } while (u64_stats_fetch_retry(&rxq_stats->syncp, start)); + start = u64_stats_fetch_begin(&rxq_stats->napi_syncp); + pkt_n = u64_stats_read(&rxq_stats->napi.rx_pkt_n); + } while (u64_stats_fetch_retry(&rxq_stats->napi_syncp, start));
- p = (char *)&snapshot + offsetof(struct stmmac_rxq_stats, rx_pkt_n); - for (stat = 0; stat < STMMAC_RXQ_STATS; stat++) { - *data++ = (*(u64 *)p); - p += sizeof(u64); - } + *data++ = pkt_n; + *data++ = stmmac_get_rx_normal_irq_n(priv, q); } }
@@ -635,39 +670,49 @@ static void stmmac_get_ethtool_stats(str pos = j; for (i = 0; i < rx_queues_count; i++) { struct stmmac_rxq_stats *rxq_stats = &priv->xstats.rxq_stats[i]; - struct stmmac_rxq_stats snapshot; + struct stmmac_napi_rx_stats snapshot; + u64 n_irq;
j = pos; do { - start = u64_stats_fetch_begin(&rxq_stats->syncp); - snapshot = *rxq_stats; - } while (u64_stats_fetch_retry(&rxq_stats->syncp, start)); - - data[j++] += snapshot.rx_pkt_n; - data[j++] += snapshot.rx_normal_irq_n; - normal_irq_n += snapshot.rx_normal_irq_n; - napi_poll += snapshot.napi_poll; + start = u64_stats_fetch_begin(&rxq_stats->napi_syncp); + snapshot = rxq_stats->napi; + } while (u64_stats_fetch_retry(&rxq_stats->napi_syncp, start)); + + data[j++] += u64_stats_read(&snapshot.rx_pkt_n); + n_irq = stmmac_get_rx_normal_irq_n(priv, i); + data[j++] += n_irq; + normal_irq_n += n_irq; + napi_poll += u64_stats_read(&snapshot.poll); }
pos = j; for (i = 0; i < tx_queues_count; i++) { struct stmmac_txq_stats *txq_stats = &priv->xstats.txq_stats[i]; - struct stmmac_txq_stats snapshot; + struct stmmac_napi_tx_stats napi_snapshot; + struct stmmac_q_tx_stats q_snapshot; + u64 n_irq;
j = pos; do { - start = u64_stats_fetch_begin(&txq_stats->syncp); - snapshot = *txq_stats; - } while (u64_stats_fetch_retry(&txq_stats->syncp, start)); - - data[j++] += snapshot.tx_pkt_n; - data[j++] += snapshot.tx_normal_irq_n; - normal_irq_n += snapshot.tx_normal_irq_n; - data[j++] += snapshot.tx_clean; - data[j++] += snapshot.tx_set_ic_bit; - data[j++] += snapshot.tx_tso_frames; - data[j++] += snapshot.tx_tso_nfrags; - napi_poll += snapshot.napi_poll; + start = u64_stats_fetch_begin(&txq_stats->q_syncp); + q_snapshot = txq_stats->q; + } while (u64_stats_fetch_retry(&txq_stats->q_syncp, start)); + do { + start = u64_stats_fetch_begin(&txq_stats->napi_syncp); + napi_snapshot = txq_stats->napi; + } while (u64_stats_fetch_retry(&txq_stats->napi_syncp, start)); + + data[j++] += u64_stats_read(&napi_snapshot.tx_pkt_n); + n_irq = stmmac_get_tx_normal_irq_n(priv, i); + data[j++] += n_irq; + normal_irq_n += n_irq; + data[j++] += u64_stats_read(&napi_snapshot.tx_clean); + data[j++] += u64_stats_read(&q_snapshot.tx_set_ic_bit) + + u64_stats_read(&napi_snapshot.tx_set_ic_bit); + data[j++] += u64_stats_read(&q_snapshot.tx_tso_frames); + data[j++] += u64_stats_read(&q_snapshot.tx_tso_nfrags); + napi_poll += u64_stats_read(&napi_snapshot.poll); } normal_irq_n += priv->xstats.rx_early_irq; data[j++] = normal_irq_n; --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -2442,7 +2442,6 @@ static bool stmmac_xdp_xmit_zc(struct st struct xdp_desc xdp_desc; bool work_done = true; u32 tx_set_ic_bit = 0; - unsigned long flags;
/* Avoids TX time-out as we are sharing with slow path */ txq_trans_cond_update(nq); @@ -2515,9 +2514,9 @@ static bool stmmac_xdp_xmit_zc(struct st tx_q->cur_tx = STMMAC_GET_ENTRY(tx_q->cur_tx, priv->dma_conf.dma_tx_size); entry = tx_q->cur_tx; } - flags = u64_stats_update_begin_irqsave(&txq_stats->syncp); - txq_stats->tx_set_ic_bit += tx_set_ic_bit; - u64_stats_update_end_irqrestore(&txq_stats->syncp, flags); + u64_stats_update_begin(&txq_stats->napi_syncp); + u64_stats_add(&txq_stats->napi.tx_set_ic_bit, tx_set_ic_bit); + u64_stats_update_end(&txq_stats->napi_syncp);
if (tx_desc) { stmmac_flush_tx_descriptors(priv, queue); @@ -2565,7 +2564,6 @@ static int stmmac_tx_clean(struct stmmac unsigned int bytes_compl = 0, pkts_compl = 0; unsigned int entry, xmits = 0, count = 0; u32 tx_packets = 0, tx_errors = 0; - unsigned long flags;
__netif_tx_lock_bh(netdev_get_tx_queue(priv->dev, queue));
@@ -2721,11 +2719,11 @@ static int stmmac_tx_clean(struct stmmac if (tx_q->dirty_tx != tx_q->cur_tx) *pending_packets = true;
- flags = u64_stats_update_begin_irqsave(&txq_stats->syncp); - txq_stats->tx_packets += tx_packets; - txq_stats->tx_pkt_n += tx_packets; - txq_stats->tx_clean++; - u64_stats_update_end_irqrestore(&txq_stats->syncp, flags); + u64_stats_update_begin(&txq_stats->napi_syncp); + u64_stats_add(&txq_stats->napi.tx_packets, tx_packets); + u64_stats_add(&txq_stats->napi.tx_pkt_n, tx_packets); + u64_stats_inc(&txq_stats->napi.tx_clean); + u64_stats_update_end(&txq_stats->napi_syncp);
priv->xstats.tx_errors += tx_errors;
@@ -4150,7 +4148,6 @@ static netdev_tx_t stmmac_tso_xmit(struc struct stmmac_tx_queue *tx_q; bool has_vlan, set_ic; u8 proto_hdr_len, hdr; - unsigned long flags; u32 pay_len, mss; dma_addr_t des; int i; @@ -4315,13 +4312,13 @@ static netdev_tx_t stmmac_tso_xmit(struc netif_tx_stop_queue(netdev_get_tx_queue(priv->dev, queue)); }
- flags = u64_stats_update_begin_irqsave(&txq_stats->syncp); - txq_stats->tx_bytes += skb->len; - txq_stats->tx_tso_frames++; - txq_stats->tx_tso_nfrags += nfrags; + u64_stats_update_begin(&txq_stats->q_syncp); + u64_stats_add(&txq_stats->q.tx_bytes, skb->len); + u64_stats_inc(&txq_stats->q.tx_tso_frames); + u64_stats_add(&txq_stats->q.tx_tso_nfrags, nfrags); if (set_ic) - txq_stats->tx_set_ic_bit++; - u64_stats_update_end_irqrestore(&txq_stats->syncp, flags); + u64_stats_inc(&txq_stats->q.tx_set_ic_bit); + u64_stats_update_end(&txq_stats->q_syncp);
if (priv->sarc_type) stmmac_set_desc_sarc(priv, first, priv->sarc_type); @@ -4420,7 +4417,6 @@ static netdev_tx_t stmmac_xmit(struct sk struct stmmac_tx_queue *tx_q; bool has_vlan, set_ic; int entry, first_tx; - unsigned long flags; dma_addr_t des;
tx_q = &priv->dma_conf.tx_queue[queue]; @@ -4590,11 +4586,11 @@ static netdev_tx_t stmmac_xmit(struct sk netif_tx_stop_queue(netdev_get_tx_queue(priv->dev, queue)); }
- flags = u64_stats_update_begin_irqsave(&txq_stats->syncp); - txq_stats->tx_bytes += skb->len; + u64_stats_update_begin(&txq_stats->q_syncp); + u64_stats_add(&txq_stats->q.tx_bytes, skb->len); if (set_ic) - txq_stats->tx_set_ic_bit++; - u64_stats_update_end_irqrestore(&txq_stats->syncp, flags); + u64_stats_inc(&txq_stats->q.tx_set_ic_bit); + u64_stats_update_end(&txq_stats->q_syncp);
if (priv->sarc_type) stmmac_set_desc_sarc(priv, first, priv->sarc_type); @@ -4858,12 +4854,11 @@ static int stmmac_xdp_xmit_xdpf(struct s set_ic = false;
if (set_ic) { - unsigned long flags; tx_q->tx_count_frames = 0; stmmac_set_tx_ic(priv, tx_desc); - flags = u64_stats_update_begin_irqsave(&txq_stats->syncp); - txq_stats->tx_set_ic_bit++; - u64_stats_update_end_irqrestore(&txq_stats->syncp, flags); + u64_stats_update_begin(&txq_stats->q_syncp); + u64_stats_inc(&txq_stats->q.tx_set_ic_bit); + u64_stats_update_end(&txq_stats->q_syncp); }
stmmac_enable_dma_transmission(priv, priv->ioaddr); @@ -5013,7 +5008,6 @@ static void stmmac_dispatch_skb_zc(struc unsigned int len = xdp->data_end - xdp->data; enum pkt_hash_types hash_type; int coe = priv->hw->rx_csum; - unsigned long flags; struct sk_buff *skb; u32 hash;
@@ -5038,10 +5032,10 @@ static void stmmac_dispatch_skb_zc(struc skb_record_rx_queue(skb, queue); napi_gro_receive(&ch->rxtx_napi, skb);
- flags = u64_stats_update_begin_irqsave(&rxq_stats->syncp); - rxq_stats->rx_pkt_n++; - rxq_stats->rx_bytes += len; - u64_stats_update_end_irqrestore(&rxq_stats->syncp, flags); + u64_stats_update_begin(&rxq_stats->napi_syncp); + u64_stats_inc(&rxq_stats->napi.rx_pkt_n); + u64_stats_add(&rxq_stats->napi.rx_bytes, len); + u64_stats_update_end(&rxq_stats->napi_syncp); }
static bool stmmac_rx_refill_zc(struct stmmac_priv *priv, u32 queue, u32 budget) @@ -5123,7 +5117,6 @@ static int stmmac_rx_zc(struct stmmac_pr unsigned int desc_size; struct bpf_prog *prog; bool failure = false; - unsigned long flags; int xdp_status = 0; int status = 0;
@@ -5278,9 +5271,9 @@ read_again:
stmmac_finalize_xdp_rx(priv, xdp_status);
- flags = u64_stats_update_begin_irqsave(&rxq_stats->syncp); - rxq_stats->rx_pkt_n += count; - u64_stats_update_end_irqrestore(&rxq_stats->syncp, flags); + u64_stats_update_begin(&rxq_stats->napi_syncp); + u64_stats_add(&rxq_stats->napi.rx_pkt_n, count); + u64_stats_update_end(&rxq_stats->napi_syncp);
priv->xstats.rx_dropped += rx_dropped; priv->xstats.rx_errors += rx_errors; @@ -5318,7 +5311,6 @@ static int stmmac_rx(struct stmmac_priv unsigned int desc_size; struct sk_buff *skb = NULL; struct stmmac_xdp_buff ctx; - unsigned long flags; int xdp_status = 0; int buf_sz;
@@ -5571,11 +5563,11 @@ drain_data:
stmmac_rx_refill(priv, queue);
- flags = u64_stats_update_begin_irqsave(&rxq_stats->syncp); - rxq_stats->rx_packets += rx_packets; - rxq_stats->rx_bytes += rx_bytes; - rxq_stats->rx_pkt_n += count; - u64_stats_update_end_irqrestore(&rxq_stats->syncp, flags); + u64_stats_update_begin(&rxq_stats->napi_syncp); + u64_stats_add(&rxq_stats->napi.rx_packets, rx_packets); + u64_stats_add(&rxq_stats->napi.rx_bytes, rx_bytes); + u64_stats_add(&rxq_stats->napi.rx_pkt_n, count); + u64_stats_update_end(&rxq_stats->napi_syncp);
priv->xstats.rx_dropped += rx_dropped; priv->xstats.rx_errors += rx_errors; @@ -5590,13 +5582,12 @@ static int stmmac_napi_poll_rx(struct na struct stmmac_priv *priv = ch->priv_data; struct stmmac_rxq_stats *rxq_stats; u32 chan = ch->index; - unsigned long flags; int work_done;
rxq_stats = &priv->xstats.rxq_stats[chan]; - flags = u64_stats_update_begin_irqsave(&rxq_stats->syncp); - rxq_stats->napi_poll++; - u64_stats_update_end_irqrestore(&rxq_stats->syncp, flags); + u64_stats_update_begin(&rxq_stats->napi_syncp); + u64_stats_inc(&rxq_stats->napi.poll); + u64_stats_update_end(&rxq_stats->napi_syncp);
work_done = stmmac_rx(priv, budget, chan); if (work_done < budget && napi_complete_done(napi, work_done)) { @@ -5618,13 +5609,12 @@ static int stmmac_napi_poll_tx(struct na struct stmmac_txq_stats *txq_stats; bool pending_packets = false; u32 chan = ch->index; - unsigned long flags; int work_done;
txq_stats = &priv->xstats.txq_stats[chan]; - flags = u64_stats_update_begin_irqsave(&txq_stats->syncp); - txq_stats->napi_poll++; - u64_stats_update_end_irqrestore(&txq_stats->syncp, flags); + u64_stats_update_begin(&txq_stats->napi_syncp); + u64_stats_inc(&txq_stats->napi.poll); + u64_stats_update_end(&txq_stats->napi_syncp);
work_done = stmmac_tx_clean(priv, budget, chan, &pending_packets); work_done = min(work_done, budget); @@ -5654,17 +5644,16 @@ static int stmmac_napi_poll_rxtx(struct struct stmmac_rxq_stats *rxq_stats; struct stmmac_txq_stats *txq_stats; u32 chan = ch->index; - unsigned long flags;
rxq_stats = &priv->xstats.rxq_stats[chan]; - flags = u64_stats_update_begin_irqsave(&rxq_stats->syncp); - rxq_stats->napi_poll++; - u64_stats_update_end_irqrestore(&rxq_stats->syncp, flags); + u64_stats_update_begin(&rxq_stats->napi_syncp); + u64_stats_inc(&rxq_stats->napi.poll); + u64_stats_update_end(&rxq_stats->napi_syncp);
txq_stats = &priv->xstats.txq_stats[chan]; - flags = u64_stats_update_begin_irqsave(&txq_stats->syncp); - txq_stats->napi_poll++; - u64_stats_update_end_irqrestore(&txq_stats->syncp, flags); + u64_stats_update_begin(&txq_stats->napi_syncp); + u64_stats_inc(&txq_stats->napi.poll); + u64_stats_update_end(&txq_stats->napi_syncp);
tx_done = stmmac_tx_clean(priv, budget, chan, &tx_pending_packets); tx_done = min(tx_done, budget); @@ -6990,10 +6979,13 @@ static void stmmac_get_stats64(struct ne u64 tx_bytes;
do { - start = u64_stats_fetch_begin(&txq_stats->syncp); - tx_packets = txq_stats->tx_packets; - tx_bytes = txq_stats->tx_bytes; - } while (u64_stats_fetch_retry(&txq_stats->syncp, start)); + start = u64_stats_fetch_begin(&txq_stats->q_syncp); + tx_bytes = u64_stats_read(&txq_stats->q.tx_bytes); + } while (u64_stats_fetch_retry(&txq_stats->q_syncp, start)); + do { + start = u64_stats_fetch_begin(&txq_stats->napi_syncp); + tx_packets = u64_stats_read(&txq_stats->napi.tx_packets); + } while (u64_stats_fetch_retry(&txq_stats->napi_syncp, start));
stats->tx_packets += tx_packets; stats->tx_bytes += tx_bytes; @@ -7005,10 +6997,10 @@ static void stmmac_get_stats64(struct ne u64 rx_bytes;
do { - start = u64_stats_fetch_begin(&rxq_stats->syncp); - rx_packets = rxq_stats->rx_packets; - rx_bytes = rxq_stats->rx_bytes; - } while (u64_stats_fetch_retry(&rxq_stats->syncp, start)); + start = u64_stats_fetch_begin(&rxq_stats->napi_syncp); + rx_packets = u64_stats_read(&rxq_stats->napi.rx_packets); + rx_bytes = u64_stats_read(&rxq_stats->napi.rx_bytes); + } while (u64_stats_fetch_retry(&rxq_stats->napi_syncp, start));
stats->rx_packets += rx_packets; stats->rx_bytes += rx_bytes; @@ -7402,9 +7394,16 @@ int stmmac_dvr_probe(struct device *devi priv->dev = ndev;
for (i = 0; i < MTL_MAX_RX_QUEUES; i++) - u64_stats_init(&priv->xstats.rxq_stats[i].syncp); - for (i = 0; i < MTL_MAX_TX_QUEUES; i++) - u64_stats_init(&priv->xstats.txq_stats[i].syncp); + u64_stats_init(&priv->xstats.rxq_stats[i].napi_syncp); + for (i = 0; i < MTL_MAX_TX_QUEUES; i++) { + u64_stats_init(&priv->xstats.txq_stats[i].q_syncp); + u64_stats_init(&priv->xstats.txq_stats[i].napi_syncp); + } + + priv->xstats.pcpu_stats = + devm_netdev_alloc_pcpu_stats(device, struct stmmac_pcpu_stats); + if (!priv->xstats.pcpu_stats) + return -ENOMEM;
stmmac_set_ethtool_ops(ndev); priv->pause = pause;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shradha Gupta shradhagupta@linux.microsoft.com
commit 9cae43da9867412f8bd09aee5c8a8dc5e8dc3dc2 upstream.
If hv_netvsc driver is unloaded and reloaded, the NET_DEVICE_REGISTER handler cannot perform VF register successfully as the register call is received before netvsc_probe is finished. This is because we register register_netdevice_notifier() very early( even before vmbus_driver_register()). To fix this, we try to register each such matching VF( if it is visible as a netdevice) at the end of netvsc_probe.
Cc: stable@vger.kernel.org Fixes: 85520856466e ("hv_netvsc: Fix race of register_netdevice_notifier and VF register") Suggested-by: Dexuan Cui decui@microsoft.com Signed-off-by: Shradha Gupta shradhagupta@linux.microsoft.com Reviewed-by: Haiyang Zhang haiyangz@microsoft.com Reviewed-by: Dexuan Cui decui@microsoft.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/hyperv/netvsc_drv.c | 82 ++++++++++++++++++++++++++++++---------- 1 file changed, 62 insertions(+), 20 deletions(-)
--- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -42,6 +42,10 @@ #define LINKCHANGE_INT (2 * HZ) #define VF_TAKEOVER_INT (HZ / 10)
+/* Macros to define the context of vf registration */ +#define VF_REG_IN_PROBE 1 +#define VF_REG_IN_NOTIFIER 2 + static unsigned int ring_size __ro_after_init = 128; module_param(ring_size, uint, 0444); MODULE_PARM_DESC(ring_size, "Ring buffer size (# of 4K pages)"); @@ -2183,7 +2187,7 @@ static rx_handler_result_t netvsc_vf_han }
static int netvsc_vf_join(struct net_device *vf_netdev, - struct net_device *ndev) + struct net_device *ndev, int context) { struct net_device_context *ndev_ctx = netdev_priv(ndev); int ret; @@ -2206,7 +2210,11 @@ static int netvsc_vf_join(struct net_dev goto upper_link_failed; }
- schedule_delayed_work(&ndev_ctx->vf_takeover, VF_TAKEOVER_INT); + /* If this registration is called from probe context vf_takeover + * is taken care of later in probe itself. + */ + if (context == VF_REG_IN_NOTIFIER) + schedule_delayed_work(&ndev_ctx->vf_takeover, VF_TAKEOVER_INT);
call_netdevice_notifiers(NETDEV_JOIN, vf_netdev);
@@ -2344,7 +2352,7 @@ static int netvsc_prepare_bonding(struct return NOTIFY_DONE; }
-static int netvsc_register_vf(struct net_device *vf_netdev) +static int netvsc_register_vf(struct net_device *vf_netdev, int context) { struct net_device_context *net_device_ctx; struct netvsc_device *netvsc_dev; @@ -2384,7 +2392,7 @@ static int netvsc_register_vf(struct net
netdev_info(ndev, "VF registering: %s\n", vf_netdev->name);
- if (netvsc_vf_join(vf_netdev, ndev) != 0) + if (netvsc_vf_join(vf_netdev, ndev, context) != 0) return NOTIFY_DONE;
dev_hold(vf_netdev); @@ -2482,10 +2490,31 @@ static int netvsc_unregister_vf(struct n return NOTIFY_OK; }
+static int check_dev_is_matching_vf(struct net_device *event_ndev) +{ + /* Skip NetVSC interfaces */ + if (event_ndev->netdev_ops == &device_ops) + return -ENODEV; + + /* Avoid non-Ethernet type devices */ + if (event_ndev->type != ARPHRD_ETHER) + return -ENODEV; + + /* Avoid Vlan dev with same MAC registering as VF */ + if (is_vlan_dev(event_ndev)) + return -ENODEV; + + /* Avoid Bonding master dev with same MAC registering as VF */ + if (netif_is_bond_master(event_ndev)) + return -ENODEV; + + return 0; +} + static int netvsc_probe(struct hv_device *dev, const struct hv_vmbus_device_id *dev_id) { - struct net_device *net = NULL; + struct net_device *net = NULL, *vf_netdev; struct net_device_context *net_device_ctx; struct netvsc_device_info *device_info = NULL; struct netvsc_device *nvdev; @@ -2597,6 +2626,30 @@ static int netvsc_probe(struct hv_device }
list_add(&net_device_ctx->list, &netvsc_dev_list); + + /* When the hv_netvsc driver is unloaded and reloaded, the + * NET_DEVICE_REGISTER for the vf device is replayed before probe + * is complete. This is because register_netdevice_notifier() gets + * registered before vmbus_driver_register() so that callback func + * is set before probe and we don't miss events like NETDEV_POST_INIT + * So, in this section we try to register the matching vf device that + * is present as a netdevice, knowing that its register call is not + * processed in the netvsc_netdev_notifier(as probing is progress and + * get_netvsc_byslot fails). + */ + for_each_netdev(dev_net(net), vf_netdev) { + ret = check_dev_is_matching_vf(vf_netdev); + if (ret != 0) + continue; + + if (net != get_netvsc_byslot(vf_netdev)) + continue; + + netvsc_prepare_bonding(vf_netdev); + netvsc_register_vf(vf_netdev, VF_REG_IN_PROBE); + __netvsc_vf_setup(net, vf_netdev); + break; + } rtnl_unlock();
netvsc_devinfo_put(device_info); @@ -2752,28 +2805,17 @@ static int netvsc_netdev_event(struct no unsigned long event, void *ptr) { struct net_device *event_dev = netdev_notifier_info_to_dev(ptr); + int ret = 0;
- /* Skip our own events */ - if (event_dev->netdev_ops == &device_ops) - return NOTIFY_DONE; - - /* Avoid non-Ethernet type devices */ - if (event_dev->type != ARPHRD_ETHER) - return NOTIFY_DONE; - - /* Avoid Vlan dev with same MAC registering as VF */ - if (is_vlan_dev(event_dev)) - return NOTIFY_DONE; - - /* Avoid Bonding master dev with same MAC registering as VF */ - if (netif_is_bond_master(event_dev)) + ret = check_dev_is_matching_vf(event_dev); + if (ret != 0) return NOTIFY_DONE;
switch (event) { case NETDEV_POST_INIT: return netvsc_prepare_bonding(event_dev); case NETDEV_REGISTER: - return netvsc_register_vf(event_dev); + return netvsc_register_vf(event_dev, VF_REG_IN_NOTIFIER); case NETDEV_UNREGISTER: return netvsc_unregister_vf(event_dev); case NETDEV_UP:
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rishabh Dave ridave@redhat.com
commit cda4672da1c26835dcbd7aec2bfed954eda9b5ef upstream.
In fs/ceph/caps.c, in encode_cap_msg(), "use after free" error was caught by KASAN at this line - 'ceph_buffer_get(arg->xattr_buf);'. This implies before the refcount could be increment here, it was freed.
In same file, in "handle_cap_grant()" refcount is decremented by this line - 'ceph_buffer_put(ci->i_xattrs.blob);'. It appears that a race occurred and resource was freed by the latter line before the former line could increment it.
encode_cap_msg() is called by __send_cap() and __send_cap() is called by ceph_check_caps() after calling __prep_cap(). __prep_cap() is where arg->xattr_buf is assigned to ci->i_xattrs.blob. This is the spot where the refcount must be increased to prevent "use after free" error.
Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/59259 Signed-off-by: Rishabh Dave ridave@redhat.com Reviewed-by: Jeff Layton jlayton@kernel.org Reviewed-by: Xiubo Li xiubli@redhat.com Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ceph/caps.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/fs/ceph/caps.c +++ b/fs/ceph/caps.c @@ -1452,7 +1452,7 @@ static void __prep_cap(struct cap_msg_ar if (flushing & CEPH_CAP_XATTR_EXCL) { arg->old_xattr_buf = __ceph_build_xattrs_blob(ci); arg->xattr_version = ci->i_xattrs.version; - arg->xattr_buf = ci->i_xattrs.blob; + arg->xattr_buf = ceph_buffer_get(ci->i_xattrs.blob); } else { arg->xattr_buf = NULL; arg->old_xattr_buf = NULL; @@ -1553,6 +1553,7 @@ static void __send_cap(struct cap_msg_ar encode_cap_msg(msg, arg); ceph_con_send(&arg->session->s_con, msg); ceph_buffer_put(arg->old_xattr_buf); + ceph_buffer_put(arg->xattr_buf); if (arg->wake) wake_up_all(&ci->i_cap_wq); }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Airlie airlied@redhat.com
commit 61712c94782ce105253ee1939cda0c5c025b2c0c upstream.
Timur pointed this out before, and it just slipped my mind, but this might help some things work better, around pcie power management.
Cc: stable@vger.kernel.org # v6.7 Fixes: 8d55b0a940bb ("nouveau/gsp: add some basic registry entries.") Signed-off-by: Dave Airlie airlied@redhat.com Signed-off-by: Danilo Krummrich dakr@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20240130032643.2498315-1-airli... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c @@ -1111,7 +1111,6 @@ r535_gsp_rpc_set_registry(struct nvkm_gs if (IS_ERR(rpc)) return PTR_ERR(rpc);
- rpc->size = sizeof(*rpc); rpc->numEntries = NV_GSP_REG_NUM_ENTRIES;
str_offset = offsetof(typeof(*rpc), entries[NV_GSP_REG_NUM_ENTRIES]); @@ -1127,6 +1126,7 @@ r535_gsp_rpc_set_registry(struct nvkm_gs strings += name_len; str_offset += name_len; } + rpc->size = str_offset;
return nvkm_gsp_rpc_wr(gsp, rpc, false); }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oscar Salvador osalvador@suse.de
commit 79d72c68c58784a3e1cd2378669d51bfd0cb7498 upstream.
When configuring a hugetlb filesystem via the fsconfig() syscall, there is a possible NULL dereference in hugetlbfs_fill_super() caused by assigning NULL to ctx->hstate in hugetlbfs_parse_param() when the requested pagesize is non valid.
E.g: Taking the following steps:
fd = fsopen("hugetlbfs", FSOPEN_CLOEXEC); fsconfig(fd, FSCONFIG_SET_STRING, "pagesize", "1024", 0); fsconfig(fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
Given that the requested "pagesize" is invalid, ctxt->hstate will be replaced with NULL, losing its previous value, and we will print an error:
... ... case Opt_pagesize: ps = memparse(param->string, &rest); ctx->hstate = h; if (!ctx->hstate) { pr_err("Unsupported page size %lu MB\n", ps / SZ_1M); return -EINVAL; } return 0; ... ...
This is a problem because later on, we will dereference ctxt->hstate in hugetlbfs_fill_super()
... ... sb->s_blocksize = huge_page_size(ctx->hstate); ... ...
Causing below Oops.
Fix this by replacing cxt->hstate value only when then pagesize is known to be valid.
kernel: hugetlbfs: Unsupported page size 0 MB kernel: BUG: kernel NULL pointer dereference, address: 0000000000000028 kernel: #PF: supervisor read access in kernel mode kernel: #PF: error_code(0x0000) - not-present page kernel: PGD 800000010f66c067 P4D 800000010f66c067 PUD 1b22f8067 PMD 0 kernel: Oops: 0000 [#1] PREEMPT SMP PTI kernel: CPU: 4 PID: 5659 Comm: syscall Tainted: G E 6.8.0-rc2-default+ #22 5a47c3fef76212addcc6eb71344aabc35190ae8f kernel: Hardware name: Intel Corp. GROVEPORT/GROVEPORT, BIOS GVPRCRB1.86B.0016.D04.1705030402 05/03/2017 kernel: RIP: 0010:hugetlbfs_fill_super+0xb4/0x1a0 kernel: Code: 48 8b 3b e8 3e c6 ed ff 48 85 c0 48 89 45 20 0f 84 d6 00 00 00 48 b8 ff ff ff ff ff ff ff 7f 4c 89 e7 49 89 44 24 20 48 8b 03 <8b> 48 28 b8 00 10 00 00 48 d3 e0 49 89 44 24 18 48 8b 03 8b 40 28 kernel: RSP: 0018:ffffbe9960fcbd48 EFLAGS: 00010246 kernel: RAX: 0000000000000000 RBX: ffff9af5272ae780 RCX: 0000000000372004 kernel: RDX: ffffffffffffffff RSI: ffffffffffffffff RDI: ffff9af555e9b000 kernel: RBP: ffff9af52ee66b00 R08: 0000000000000040 R09: 0000000000370004 kernel: R10: ffffbe9960fcbd48 R11: 0000000000000040 R12: ffff9af555e9b000 kernel: R13: ffffffffa66b86c0 R14: ffff9af507d2f400 R15: ffff9af507d2f400 kernel: FS: 00007ffbc0ba4740(0000) GS:ffff9b0bd7000000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 0000000000000028 CR3: 00000001b1ee0000 CR4: 00000000001506f0 kernel: Call Trace: kernel: <TASK> kernel: ? __die_body+0x1a/0x60 kernel: ? page_fault_oops+0x16f/0x4a0 kernel: ? search_bpf_extables+0x65/0x70 kernel: ? fixup_exception+0x22/0x310 kernel: ? exc_page_fault+0x69/0x150 kernel: ? asm_exc_page_fault+0x22/0x30 kernel: ? __pfx_hugetlbfs_fill_super+0x10/0x10 kernel: ? hugetlbfs_fill_super+0xb4/0x1a0 kernel: ? hugetlbfs_fill_super+0x28/0x1a0 kernel: ? __pfx_hugetlbfs_fill_super+0x10/0x10 kernel: vfs_get_super+0x40/0xa0 kernel: ? __pfx_bpf_lsm_capable+0x10/0x10 kernel: vfs_get_tree+0x25/0xd0 kernel: vfs_cmd_create+0x64/0xe0 kernel: __x64_sys_fsconfig+0x395/0x410 kernel: do_syscall_64+0x80/0x160 kernel: ? syscall_exit_to_user_mode+0x82/0x240 kernel: ? do_syscall_64+0x8d/0x160 kernel: ? syscall_exit_to_user_mode+0x82/0x240 kernel: ? do_syscall_64+0x8d/0x160 kernel: ? exc_page_fault+0x69/0x150 kernel: entry_SYSCALL_64_after_hwframe+0x6e/0x76 kernel: RIP: 0033:0x7ffbc0cb87c9 kernel: Code: 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 97 96 0d 00 f7 d8 64 89 01 48 kernel: RSP: 002b:00007ffc29d2f388 EFLAGS: 00000206 ORIG_RAX: 00000000000001af kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffbc0cb87c9 kernel: RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003 kernel: RBP: 00007ffc29d2f3b0 R08: 0000000000000000 R09: 0000000000000000 kernel: R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000 kernel: R13: 00007ffc29d2f4c0 R14: 0000000000000000 R15: 0000000000000000 kernel: </TASK> kernel: Modules linked in: rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) dns_resolver(E) nfs(E) lockd(E) grace(E) sunrpc(E) netfs(E) af_packet(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) intel_rapl_msr(E) intel_rapl_common(E) iTCO_wdt(E) intel_pmc_bxt(E) sb_edac(E) iTCO_vendor_support(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) rfkill(E) ipmi_ssif(E) kvm(E) acpi_ipmi(E) irqbypass(E) pcspkr(E) igb(E) ipmi_si(E) mei_me(E) i2c_i801(E) joydev(E) intel_pch_thermal(E) i2c_smbus(E) dca(E) lpc_ich(E) mei(E) ipmi_devintf(E) ipmi_msghandler(E) acpi_pad(E) tiny_power_button(E) button(E) fuse(E) efi_pstore(E) configfs(E) ip_tables(E) x_tables(E) ext4(E) mbcache(E) jbd2(E) hid_generic(E) usbhid(E) sd_mod(E) t10_pi(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) polyval_clmulni(E) ahci(E) xhci_pci(E) polyval_generic(E) gf128mul(E) ghash_clmulni_intel(E) sha512_ssse3(E) sha256_ssse3(E) xhci_pci_renesas(E) libahci(E) ehci_pci(E) sha1_ssse3(E) xhci_hcd(E) ehci_hcd(E) libata(E) kernel: mgag200(E) i2c_algo_bit(E) usbcore(E) wmi(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) scsi_common(E) aesni_intel(E) crypto_simd(E) cryptd(E) kernel: Unloaded tainted modules: acpi_cpufreq(E):1 fjes(E):1 kernel: CR2: 0000000000000028 kernel: ---[ end trace 0000000000000000 ]--- kernel: RIP: 0010:hugetlbfs_fill_super+0xb4/0x1a0 kernel: Code: 48 8b 3b e8 3e c6 ed ff 48 85 c0 48 89 45 20 0f 84 d6 00 00 00 48 b8 ff ff ff ff ff ff ff 7f 4c 89 e7 49 89 44 24 20 48 8b 03 <8b> 48 28 b8 00 10 00 00 48 d3 e0 49 89 44 24 18 48 8b 03 8b 40 28 kernel: RSP: 0018:ffffbe9960fcbd48 EFLAGS: 00010246 kernel: RAX: 0000000000000000 RBX: ffff9af5272ae780 RCX: 0000000000372004 kernel: RDX: ffffffffffffffff RSI: ffffffffffffffff RDI: ffff9af555e9b000 kernel: RBP: ffff9af52ee66b00 R08: 0000000000000040 R09: 0000000000370004 kernel: R10: ffffbe9960fcbd48 R11: 0000000000000040 R12: ffff9af555e9b000 kernel: R13: ffffffffa66b86c0 R14: ffff9af507d2f400 R15: ffff9af507d2f400 kernel: FS: 00007ffbc0ba4740(0000) GS:ffff9b0bd7000000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 0000000000000028 CR3: 00000001b1ee0000 CR4: 00000000001506f0
Link: https://lkml.kernel.org/r/20240130210418.3771-1-osalvador@suse.de Fixes: 32021982a324 ("hugetlbfs: Convert to fs_context") Signed-off-by: Michal Hocko mhocko@suse.com Signed-off-by: Oscar Salvador osalvador@suse.de Acked-by: Muchun Song muchun.song@linux.dev Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/hugetlbfs/inode.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -1354,6 +1354,7 @@ static int hugetlbfs_parse_param(struct { struct hugetlbfs_fs_context *ctx = fc->fs_private; struct fs_parse_result result; + struct hstate *h; char *rest; unsigned long ps; int opt; @@ -1398,11 +1399,12 @@ static int hugetlbfs_parse_param(struct
case Opt_pagesize: ps = memparse(param->string, &rest); - ctx->hstate = size_to_hstate(ps); - if (!ctx->hstate) { + h = size_to_hstate(ps); + if (!h) { pr_err("Unsupported page size %lu MB\n", ps / SZ_1M); return -EINVAL; } + ctx->hstate = h; return 0;
case Opt_min_size:
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Prakash Sangappa prakash.sangappa@oracle.com
commit e656c7a9e59607d1672d85ffa9a89031876ffe67 upstream.
For shared memory of type SHM_HUGETLB, hugetlb pages are reserved in shmget() call. If SHM_NORESERVE flags is specified then the hugetlb pages are not reserved. However when the shared memory is attached with the shmat() call the hugetlb pages are getting reserved incorrectly for SHM_HUGETLB shared memory created with SHM_NORESERVE which is a bug.
------------------------------- Following test shows the issue.
$cat shmhtb.c
int main() { int shmflags = 0660 | IPC_CREAT | SHM_HUGETLB | SHM_NORESERVE; int shmid;
shmid = shmget(SKEY, SHMSZ, shmflags); if (shmid < 0) { printf("shmat: shmget() failed, %d\n", errno); return 1; } printf("After shmget()\n"); system("cat /proc/meminfo | grep -i hugepages_");
shmat(shmid, NULL, 0); printf("\nAfter shmat()\n"); system("cat /proc/meminfo | grep -i hugepages_");
shmctl(shmid, IPC_RMID, NULL); return 0; }
#sysctl -w vm.nr_hugepages=20 #./shmhtb
After shmget() HugePages_Total: 20 HugePages_Free: 20 HugePages_Rsvd: 0 HugePages_Surp: 0
After shmat() HugePages_Total: 20 HugePages_Free: 20 HugePages_Rsvd: 5 <-- HugePages_Surp: 0 --------------------------------
Fix is to ensure that hugetlb pages are not reserved for SHM_HUGETLB shared memory in the shmat() call.
Link: https://lkml.kernel.org/r/1706040282-12388-1-git-send-email-prakash.sangappa... Signed-off-by: Prakash Sangappa prakash.sangappa@oracle.com Acked-by: Muchun Song muchun.song@linux.dev Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/hugetlbfs/inode.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)
--- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -100,6 +100,7 @@ static int hugetlbfs_file_mmap(struct fi loff_t len, vma_len; int ret; struct hstate *h = hstate_file(file); + vm_flags_t vm_flags;
/* * vma address alignment (but not the pgoff alignment) has @@ -141,10 +142,20 @@ static int hugetlbfs_file_mmap(struct fi file_accessed(file);
ret = -ENOMEM; + + vm_flags = vma->vm_flags; + /* + * for SHM_HUGETLB, the pages are reserved in the shmget() call so skip + * reserving here. Note: only for SHM hugetlbfs file, the inode + * flag S_PRIVATE is set. + */ + if (inode->i_flags & S_PRIVATE) + vm_flags |= VM_NORESERVE; + if (!hugetlb_reserve_pages(inode, vma->vm_pgoff >> huge_page_order(h), len >> huge_page_shift(h), vma, - vma->vm_flags)) + vm_flags)) goto out;
ret = 0;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Huacai Chen chenhuacai@loongson.cn
commit 639420e9f6cd9ca074732b17ac450d2518d5937f upstream.
The earlycon parameter is based on fixmap, and fixmap addresses are not supposed to be shadowed by KASAN. So return the kasan_early_shadow_page in kasan_mem_to_shadow() if the input address is above FIXADDR_START. Otherwise earlycon cannot work after kasan_init().
Cc: stable@vger.kernel.org Fixes: 5aa4ac64e6add3e ("LoongArch: Add KASAN (Kernel Address Sanitizer) support") Signed-off-by: Huacai Chen chenhuacai@loongson.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/loongarch/mm/kasan_init.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/arch/loongarch/mm/kasan_init.c +++ b/arch/loongarch/mm/kasan_init.c @@ -44,6 +44,9 @@ void *kasan_mem_to_shadow(const void *ad unsigned long xrange = (maddr >> XRANGE_SHIFT) & 0xffff; unsigned long offset = 0;
+ if (maddr >= FIXADDR_START) + return (void *)(kasan_early_shadow_page); + maddr &= XRANGE_SHADOW_MASK; switch (xrange) { case XKPRANGE_CC_SEG:
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jan Kara jack@suse.cz
commit f814bdda774c183b0cc15ec8f3b6e7c6f4527ba5 upstream.
The detection of dirty-throttled tasks in blk-wbt has been subtly broken since its beginning in 2016. Namely if we are doing cgroup writeback and the throttled task is not in the root cgroup, balance_dirty_pages() will set dirty_sleep for the non-root bdi_writeback structure. However blk-wbt checks dirty_sleep only in the root cgroup bdi_writeback structure. Thus detection of recently throttled tasks is not working in this case (we noticed this when we switched to cgroup v2 and suddently writeback was slow).
Since blk-wbt has no easy way to get to proper bdi_writeback and furthermore its intention has always been to work on the whole device rather than on individual cgroups, just move the dirty_sleep timestamp from bdi_writeback to backing_dev_info. That fixes the checking for recently throttled task and saves memory for everybody as a bonus.
CC: stable@vger.kernel.org Fixes: b57d74aff9ab ("writeback: track if we're sleeping on progress in balance_dirty_pages()") Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20240123175826.21452-1-jack@suse.cz [axboe: fixup indentation errors] Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- block/blk-wbt.c | 4 ++-- include/linux/backing-dev-defs.h | 7 +++++-- mm/backing-dev.c | 2 +- mm/page-writeback.c | 2 +- 4 files changed, 9 insertions(+), 6 deletions(-)
--- a/block/blk-wbt.c +++ b/block/blk-wbt.c @@ -165,9 +165,9 @@ static void wb_timestamp(struct rq_wb *r */ static bool wb_recent_wait(struct rq_wb *rwb) { - struct bdi_writeback *wb = &rwb->rqos.disk->bdi->wb; + struct backing_dev_info *bdi = rwb->rqos.disk->bdi;
- return time_before(jiffies, wb->dirty_sleep + HZ); + return time_before(jiffies, bdi->last_bdp_sleep + HZ); }
static inline struct rq_wait *get_rq_wait(struct rq_wb *rwb, --- a/include/linux/backing-dev-defs.h +++ b/include/linux/backing-dev-defs.h @@ -141,8 +141,6 @@ struct bdi_writeback { struct delayed_work dwork; /* work item used for writeback */ struct delayed_work bw_dwork; /* work item used for bandwidth estimate */
- unsigned long dirty_sleep; /* last wait */ - struct list_head bdi_node; /* anchored at bdi->wb_list */
#ifdef CONFIG_CGROUP_WRITEBACK @@ -179,6 +177,11 @@ struct backing_dev_info { * any dirty wbs, which is depended upon by bdi_has_dirty(). */ atomic_long_t tot_write_bandwidth; + /* + * Jiffies when last process was dirty throttled on this bdi. Used by + * blk-wbt. + */ + unsigned long last_bdp_sleep;
struct bdi_writeback wb; /* the root writeback info for this bdi */ struct list_head wb_list; /* list of all wbs */ --- a/mm/backing-dev.c +++ b/mm/backing-dev.c @@ -436,7 +436,6 @@ static int wb_init(struct bdi_writeback INIT_LIST_HEAD(&wb->work_list); INIT_DELAYED_WORK(&wb->dwork, wb_workfn); INIT_DELAYED_WORK(&wb->bw_dwork, wb_update_bandwidth_workfn); - wb->dirty_sleep = jiffies;
err = fprop_local_init_percpu(&wb->completions, gfp); if (err) @@ -921,6 +920,7 @@ int bdi_init(struct backing_dev_info *bd INIT_LIST_HEAD(&bdi->bdi_list); INIT_LIST_HEAD(&bdi->wb_list); init_waitqueue_head(&bdi->wb_waitq); + bdi->last_bdp_sleep = jiffies;
return cgwb_bdi_init(bdi); } --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -1921,7 +1921,7 @@ pause: break; } __set_current_state(TASK_KILLABLE); - wb->dirty_sleep = now; + bdi->last_bdp_sleep = jiffies; io_schedule_timeout(pause);
current->dirty_paused_when = now + pause;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vegard Nossum vegard.nossum@oracle.com
commit c23de7ceae59e4ca5894c3ecf4f785c50c0fa428 upstream.
If the directory passed to the '.. kernel-feat::' directive does not exist or the get_feat.pl script does not find any files to extract features from, Sphinx will report the following error:
Sphinx parallel build error: UnboundLocalError: local variable 'fname' referenced before assignment make[2]: *** [Documentation/Makefile:102: htmldocs] Error 2
This is due to how I changed the script in c48a7c44a1d0 ("docs: kernel_feat.py: fix potential command injection"). Before that, the filename passed along to self.nestedParse() in this case was weirdly just the whole get_feat.pl invocation.
We can fix it by doing what kernel_abi.py does -- just pass self.arguments[0] as 'fname'.
Fixes: c48a7c44a1d0 ("docs: kernel_feat.py: fix potential command injection") Cc: Justin Forbes jforbes@fedoraproject.org Cc: Salvatore Bonaccorso carnil@debian.org Cc: Jani Nikula jani.nikula@intel.com Cc: Mauro Carvalho Chehab mchehab@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Vegard Nossum vegard.nossum@oracle.com Link: https://lore.kernel.org/r/20240205175133.774271-2-vegard.nossum@oracle.com Signed-off-by: Jonathan Corbet corbet@lwn.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/sphinx/kernel_feat.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/Documentation/sphinx/kernel_feat.py +++ b/Documentation/sphinx/kernel_feat.py @@ -109,7 +109,7 @@ class KernelFeat(Directive): else: out_lines += line + "\n"
- nodeList = self.nestedParse(out_lines, fname) + nodeList = self.nestedParse(out_lines, self.arguments[0]) return nodeList
def nestedParse(self, lines, fname):
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nuno Sa nuno.sa@analog.com
commit 8f7e917907385e112a845d668ae2832f41e64bf5 upstream.
The property is io-channels and not io-channel. This was effectively preventing the devlink creation.
Fixes: 8e12257dead7 ("of: property: Add device link support for iommus, mboxes and io-channels") Cc: stable@vger.kernel.org Signed-off-by: Nuno Sa nuno.sa@analog.com Reviewed-by: Saravana Kannan saravanak@google.com Acked-by: Jonathan Cameron Jonathan.Cameron@huawei.com Link: https://lore.kernel.org/r/20240123-iio-backend-v7-1-1bff236b8693@analog.com Signed-off-by: Rob Herring robh@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/of/property.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/of/property.c +++ b/drivers/of/property.c @@ -1213,7 +1213,7 @@ DEFINE_SIMPLE_PROP(clocks, "clocks", "#c DEFINE_SIMPLE_PROP(interconnects, "interconnects", "#interconnect-cells") DEFINE_SIMPLE_PROP(iommus, "iommus", "#iommu-cells") DEFINE_SIMPLE_PROP(mboxes, "mboxes", "#mbox-cells") -DEFINE_SIMPLE_PROP(io_channels, "io-channel", "#io-channel-cells") +DEFINE_SIMPLE_PROP(io_channels, "io-channels", "#io-channel-cells") DEFINE_SIMPLE_PROP(interrupt_parent, "interrupt-parent", NULL) DEFINE_SIMPLE_PROP(dmas, "dmas", "#dma-cells") DEFINE_SIMPLE_PROP(power_domains, "power-domains", "#power-domain-cells")
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maximilian Heyne mheyne@amazon.de
commit fa765c4b4aed2d64266b694520ecb025c862c5a9 upstream.
shutdown_pirq and startup_pirq are not taking the irq_mapping_update_lock because they can't due to lock inversion. Both are called with the irq_desc->lock being taking. The lock order, however, is first irq_mapping_update_lock and then irq_desc->lock.
This opens multiple races: - shutdown_pirq can be interrupted by a function that allocates an event channel:
CPU0 CPU1 shutdown_pirq { xen_evtchn_close(e) __startup_pirq { EVTCHNOP_bind_pirq -> returns just freed evtchn e set_evtchn_to_irq(e, irq) } xen_irq_info_cleanup() { set_evtchn_to_irq(e, -1) } }
Assume here event channel e refers here to the same event channel number. After this race the evtchn_to_irq mapping for e is invalid (-1).
- __startup_pirq races with __unbind_from_irq in a similar way. Because __startup_pirq doesn't take irq_mapping_update_lock it can grab the evtchn that __unbind_from_irq is currently freeing and cleaning up. In this case even though the event channel is allocated, its mapping can be unset in evtchn_to_irq.
The fix is to first cleanup the mappings and then close the event channel. In this way, when an event channel gets allocated it's potential previous evtchn_to_irq mappings are guaranteed to be unset already. This is also the reverse order of the allocation where first the event channel is allocated and then the mappings are setup.
On a 5.10 kernel prior to commit 3fcdaf3d7634 ("xen/events: modify internal [un]bind interfaces"), we hit a BUG like the following during probing of NVMe devices. The issue is that during nvme_setup_io_queues, pci_free_irq is called for every device which results in a call to shutdown_pirq. With many nvme devices it's therefore likely to hit this race during boot because there will be multiple calls to shutdown_pirq and startup_pirq are running potentially in parallel.
------------[ cut here ]------------ blkfront: xvda: barrier or flush: disabled; persistent grants: enabled; indirect descriptors: enabled; bounce buffer: enabled kernel BUG at drivers/xen/events/events_base.c:499! invalid opcode: 0000 [#1] SMP PTI CPU: 44 PID: 375 Comm: kworker/u257:23 Not tainted 5.10.201-191.748.amzn2.x86_64 #1 Hardware name: Xen HVM domU, BIOS 4.11.amazon 08/24/2006 Workqueue: nvme-reset-wq nvme_reset_work RIP: 0010:bind_evtchn_to_cpu+0xdf/0xf0 Code: 5d 41 5e c3 cc cc cc cc 44 89 f7 e8 2b 55 ad ff 49 89 c5 48 85 c0 0f 84 64 ff ff ff 4c 8b 68 30 41 83 fe ff 0f 85 60 ff ff ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 RSP: 0000:ffffc9000d533b08 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006 RDX: 0000000000000028 RSI: 00000000ffffffff RDI: 00000000ffffffff RBP: ffff888107419680 R08: 0000000000000000 R09: ffffffff82d72b00 R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000001ed R13: 0000000000000000 R14: 00000000ffffffff R15: 0000000000000002 FS: 0000000000000000(0000) GS:ffff88bc8b500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000002610001 CR4: 00000000001706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? show_trace_log_lvl+0x1c1/0x2d9 ? show_trace_log_lvl+0x1c1/0x2d9 ? set_affinity_irq+0xdc/0x1c0 ? __die_body.cold+0x8/0xd ? die+0x2b/0x50 ? do_trap+0x90/0x110 ? bind_evtchn_to_cpu+0xdf/0xf0 ? do_error_trap+0x65/0x80 ? bind_evtchn_to_cpu+0xdf/0xf0 ? exc_invalid_op+0x4e/0x70 ? bind_evtchn_to_cpu+0xdf/0xf0 ? asm_exc_invalid_op+0x12/0x20 ? bind_evtchn_to_cpu+0xdf/0xf0 ? bind_evtchn_to_cpu+0xc5/0xf0 set_affinity_irq+0xdc/0x1c0 irq_do_set_affinity+0x1d7/0x1f0 irq_setup_affinity+0xd6/0x1a0 irq_startup+0x8a/0xf0 __setup_irq+0x639/0x6d0 ? nvme_suspend+0x150/0x150 request_threaded_irq+0x10c/0x180 ? nvme_suspend+0x150/0x150 pci_request_irq+0xa8/0xf0 ? __blk_mq_free_request+0x74/0xa0 queue_request_irq+0x6f/0x80 nvme_create_queue+0x1af/0x200 nvme_create_io_queues+0xbd/0xf0 nvme_setup_io_queues+0x246/0x320 ? nvme_irq_check+0x30/0x30 nvme_reset_work+0x1c8/0x400 process_one_work+0x1b0/0x350 worker_thread+0x49/0x310 ? process_one_work+0x350/0x350 kthread+0x11b/0x140 ? __kthread_bind_mask+0x60/0x60 ret_from_fork+0x22/0x30 Modules linked in: ---[ end trace a11715de1eee1873 ]---
Fixes: d46a78b05c0e ("xen: implement pirq type event channels") Cc: stable@vger.kernel.org Co-debugged-by: Andrew Panyakin apanyaki@amazon.com Signed-off-by: Maximilian Heyne mheyne@amazon.de Reviewed-by: Juergen Gross jgross@suse.com Link: https://lore.kernel.org/r/20240124163130.31324-1-mheyne@amazon.de Signed-off-by: Juergen Gross jgross@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/xen/events/events_base.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/drivers/xen/events/events_base.c +++ b/drivers/xen/events/events_base.c @@ -923,8 +923,8 @@ static void shutdown_pirq(struct irq_dat return;
do_mask(info, EVT_MASK_REASON_EXPLICIT); - xen_evtchn_close(evtchn); xen_irq_info_cleanup(info); + xen_evtchn_close(evtchn); }
static void enable_pirq(struct irq_data *data) @@ -956,6 +956,7 @@ EXPORT_SYMBOL_GPL(xen_irq_from_gsi); static void __unbind_from_irq(struct irq_info *info, unsigned int irq) { evtchn_port_t evtchn; + bool close_evtchn = false;
if (!info) { xen_irq_free_desc(irq); @@ -975,7 +976,7 @@ static void __unbind_from_irq(struct irq struct xenbus_device *dev;
if (!info->is_static) - xen_evtchn_close(evtchn); + close_evtchn = true;
switch (info->type) { case IRQT_VIRQ: @@ -995,6 +996,9 @@ static void __unbind_from_irq(struct irq }
xen_irq_info_cleanup(info); + + if (close_evtchn) + xen_evtchn_close(evtchn); }
xen_free_irq(info);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maxime Jayat maxime.jayat@mobile-devices.fr
commit 2aa0a5e65eae27dbd96faca92c84ecbf6f492d42 upstream.
The TDCO calculation was done using the currently applied data bittiming, instead of the newly computed data bittiming, which means that the TDCO had an invalid value unless setting the same data bittiming twice.
Fixes: d99755f71a80 ("can: netlink: add interface for CAN-FD Transmitter Delay Compensation (TDC)") Signed-off-by: Maxime Jayat maxime.jayat@mobile-devices.fr Reviewed-by: Vincent Mailhol mailhol.vincent@wanadoo.fr Link: https://lore.kernel.org/all/40579c18-63c0-43a4-8d4c-f3a6c1c0b417@munic.io Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/can/dev/netlink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/can/dev/netlink.c +++ b/drivers/net/can/dev/netlink.c @@ -346,7 +346,7 @@ static int can_changelink(struct net_dev /* Neither of TDC parameters nor TDC flags are * provided: do calculation */ - can_calc_tdco(&priv->tdc, priv->tdc_const, &priv->data_bittiming, + can_calc_tdco(&priv->tdc, priv->tdc_const, &dbt, &priv->ctrlmode, priv->ctrlmode_supported); } /* else: both CAN_CTRLMODE_TDC_{AUTO,MANUAL} are explicitly * turned off. TDC is disabled: do nothing
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ziqi Zhao astrajoan@yahoo.com
commit 6cdedc18ba7b9dacc36466e27e3267d201948c8d upstream.
The following 3 locks would race against each other, causing the deadlock situation in the Syzbot bug report:
- j1939_socks_lock - active_session_list_lock - sk_session_queue_lock
A reasonable fix is to change j1939_socks_lock to an rwlock, since in the rare situations where a write lock is required for the linked list that j1939_socks_lock is protecting, the code does not attempt to acquire any more locks. This would break the circular lock dependency, where, for example, the current thread already locks j1939_socks_lock and attempts to acquire sk_session_queue_lock, and at the same time, another thread attempts to acquire j1939_socks_lock while holding sk_session_queue_lock.
NOTE: This patch along does not fix the unregister_netdevice bug reported by Syzbot; instead, it solves a deadlock situation to prepare for one or more further patches to actually fix the Syzbot bug, which appears to be a reference counting problem within the j1939 codebase.
Reported-by: syzbot+1591462f226d9cbf0564@syzkaller.appspotmail.com Signed-off-by: Ziqi Zhao astrajoan@yahoo.com Reviewed-by: Oleksij Rempel o.rempel@pengutronix.de Acked-by: Oleksij Rempel o.rempel@pengutronix.de Link: https://lore.kernel.org/all/20230721162226.8639-1-astrajoan@yahoo.com [mkl: remove unrelated newline change] Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/can/j1939/j1939-priv.h | 2 +- net/can/j1939/main.c | 2 +- net/can/j1939/socket.c | 24 ++++++++++++------------ 3 files changed, 14 insertions(+), 14 deletions(-)
--- a/net/can/j1939/j1939-priv.h +++ b/net/can/j1939/j1939-priv.h @@ -86,7 +86,7 @@ struct j1939_priv { unsigned int tp_max_packet_size;
/* lock for j1939_socks list */ - spinlock_t j1939_socks_lock; + rwlock_t j1939_socks_lock; struct list_head j1939_socks;
struct kref rx_kref; --- a/net/can/j1939/main.c +++ b/net/can/j1939/main.c @@ -274,7 +274,7 @@ struct j1939_priv *j1939_netdev_start(st return ERR_PTR(-ENOMEM);
j1939_tp_init(priv); - spin_lock_init(&priv->j1939_socks_lock); + rwlock_init(&priv->j1939_socks_lock); INIT_LIST_HEAD(&priv->j1939_socks);
mutex_lock(&j1939_netdev_lock); --- a/net/can/j1939/socket.c +++ b/net/can/j1939/socket.c @@ -80,16 +80,16 @@ static void j1939_jsk_add(struct j1939_p jsk->state |= J1939_SOCK_BOUND; j1939_priv_get(priv);
- spin_lock_bh(&priv->j1939_socks_lock); + write_lock_bh(&priv->j1939_socks_lock); list_add_tail(&jsk->list, &priv->j1939_socks); - spin_unlock_bh(&priv->j1939_socks_lock); + write_unlock_bh(&priv->j1939_socks_lock); }
static void j1939_jsk_del(struct j1939_priv *priv, struct j1939_sock *jsk) { - spin_lock_bh(&priv->j1939_socks_lock); + write_lock_bh(&priv->j1939_socks_lock); list_del_init(&jsk->list); - spin_unlock_bh(&priv->j1939_socks_lock); + write_unlock_bh(&priv->j1939_socks_lock);
j1939_priv_put(priv); jsk->state &= ~J1939_SOCK_BOUND; @@ -329,13 +329,13 @@ bool j1939_sk_recv_match(struct j1939_pr struct j1939_sock *jsk; bool match = false;
- spin_lock_bh(&priv->j1939_socks_lock); + read_lock_bh(&priv->j1939_socks_lock); list_for_each_entry(jsk, &priv->j1939_socks, list) { match = j1939_sk_recv_match_one(jsk, skcb); if (match) break; } - spin_unlock_bh(&priv->j1939_socks_lock); + read_unlock_bh(&priv->j1939_socks_lock);
return match; } @@ -344,11 +344,11 @@ void j1939_sk_recv(struct j1939_priv *pr { struct j1939_sock *jsk;
- spin_lock_bh(&priv->j1939_socks_lock); + read_lock_bh(&priv->j1939_socks_lock); list_for_each_entry(jsk, &priv->j1939_socks, list) { j1939_sk_recv_one(jsk, skb); } - spin_unlock_bh(&priv->j1939_socks_lock); + read_unlock_bh(&priv->j1939_socks_lock); }
static void j1939_sk_sock_destruct(struct sock *sk) @@ -1080,12 +1080,12 @@ void j1939_sk_errqueue(struct j1939_sess }
/* spread RX notifications to all sockets subscribed to this session */ - spin_lock_bh(&priv->j1939_socks_lock); + read_lock_bh(&priv->j1939_socks_lock); list_for_each_entry(jsk, &priv->j1939_socks, list) { if (j1939_sk_recv_match_one(jsk, &session->skcb)) __j1939_sk_errqueue(session, &jsk->sk, type); } - spin_unlock_bh(&priv->j1939_socks_lock); + read_unlock_bh(&priv->j1939_socks_lock); };
void j1939_sk_send_loop_abort(struct sock *sk, int err) @@ -1273,7 +1273,7 @@ void j1939_sk_netdev_event_netdown(struc struct j1939_sock *jsk; int error_code = ENETDOWN;
- spin_lock_bh(&priv->j1939_socks_lock); + read_lock_bh(&priv->j1939_socks_lock); list_for_each_entry(jsk, &priv->j1939_socks, list) { jsk->sk.sk_err = error_code; if (!sock_flag(&jsk->sk, SOCK_DEAD)) @@ -1281,7 +1281,7 @@ void j1939_sk_netdev_event_netdown(struc
j1939_sk_queue_drop_all(priv, jsk, error_code); } - spin_unlock_bh(&priv->j1939_socks_lock); + read_unlock_bh(&priv->j1939_socks_lock); }
static int j1939_sk_no_ioctlcmd(struct socket *sock, unsigned int cmd,
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oleksij Rempel o.rempel@pengutronix.de
commit efe7cf828039aedb297c1f9920b638fffee6aabc upstream.
Lock jsk->sk to prevent UAF when setsockopt(..., SO_J1939_FILTER, ...) modifies jsk->filters while receiving packets.
Following trace was seen on affected system: ================================================================== BUG: KASAN: slab-use-after-free in j1939_sk_recv_match_one+0x1af/0x2d0 [can_j1939] Read of size 4 at addr ffff888012144014 by task j1939/350
CPU: 0 PID: 350 Comm: j1939 Tainted: G W OE 6.5.0-rc5 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Call Trace: print_report+0xd3/0x620 ? kasan_complete_mode_report_info+0x7d/0x200 ? j1939_sk_recv_match_one+0x1af/0x2d0 [can_j1939] kasan_report+0xc2/0x100 ? j1939_sk_recv_match_one+0x1af/0x2d0 [can_j1939] __asan_load4+0x84/0xb0 j1939_sk_recv_match_one+0x1af/0x2d0 [can_j1939] j1939_sk_recv+0x20b/0x320 [can_j1939] ? __kasan_check_write+0x18/0x20 ? __pfx_j1939_sk_recv+0x10/0x10 [can_j1939] ? j1939_simple_recv+0x69/0x280 [can_j1939] ? j1939_ac_recv+0x5e/0x310 [can_j1939] j1939_can_recv+0x43f/0x580 [can_j1939] ? __pfx_j1939_can_recv+0x10/0x10 [can_j1939] ? raw_rcv+0x42/0x3c0 [can_raw] ? __pfx_j1939_can_recv+0x10/0x10 [can_j1939] can_rcv_filter+0x11f/0x350 [can] can_receive+0x12f/0x190 [can] ? __pfx_can_rcv+0x10/0x10 [can] can_rcv+0xdd/0x130 [can] ? __pfx_can_rcv+0x10/0x10 [can] __netif_receive_skb_one_core+0x13d/0x150 ? __pfx___netif_receive_skb_one_core+0x10/0x10 ? __kasan_check_write+0x18/0x20 ? _raw_spin_lock_irq+0x8c/0xe0 __netif_receive_skb+0x23/0xb0 process_backlog+0x107/0x260 __napi_poll+0x69/0x310 net_rx_action+0x2a1/0x580 ? __pfx_net_rx_action+0x10/0x10 ? __pfx__raw_spin_lock+0x10/0x10 ? handle_irq_event+0x7d/0xa0 __do_softirq+0xf3/0x3f8 do_softirq+0x53/0x80 </IRQ> <TASK> __local_bh_enable_ip+0x6e/0x70 netif_rx+0x16b/0x180 can_send+0x32b/0x520 [can] ? __pfx_can_send+0x10/0x10 [can] ? __check_object_size+0x299/0x410 raw_sendmsg+0x572/0x6d0 [can_raw] ? __pfx_raw_sendmsg+0x10/0x10 [can_raw] ? apparmor_socket_sendmsg+0x2f/0x40 ? __pfx_raw_sendmsg+0x10/0x10 [can_raw] sock_sendmsg+0xef/0x100 sock_write_iter+0x162/0x220 ? __pfx_sock_write_iter+0x10/0x10 ? __rtnl_unlock+0x47/0x80 ? security_file_permission+0x54/0x320 vfs_write+0x6ba/0x750 ? __pfx_vfs_write+0x10/0x10 ? __fget_light+0x1ca/0x1f0 ? __rcu_read_unlock+0x5b/0x280 ksys_write+0x143/0x170 ? __pfx_ksys_write+0x10/0x10 ? __kasan_check_read+0x15/0x20 ? fpregs_assert_state_consistent+0x62/0x70 __x64_sys_write+0x47/0x60 do_syscall_64+0x60/0x90 ? do_syscall_64+0x6d/0x90 ? irqentry_exit+0x3f/0x50 ? exc_page_fault+0x79/0xf0 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Allocated by task 348: kasan_save_stack+0x2a/0x50 kasan_set_track+0x29/0x40 kasan_save_alloc_info+0x1f/0x30 __kasan_kmalloc+0xb5/0xc0 __kmalloc_node_track_caller+0x67/0x160 j1939_sk_setsockopt+0x284/0x450 [can_j1939] __sys_setsockopt+0x15c/0x2f0 __x64_sys_setsockopt+0x6b/0x80 do_syscall_64+0x60/0x90 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Freed by task 349: kasan_save_stack+0x2a/0x50 kasan_set_track+0x29/0x40 kasan_save_free_info+0x2f/0x50 __kasan_slab_free+0x12e/0x1c0 __kmem_cache_free+0x1b9/0x380 kfree+0x7a/0x120 j1939_sk_setsockopt+0x3b2/0x450 [can_j1939] __sys_setsockopt+0x15c/0x2f0 __x64_sys_setsockopt+0x6b/0x80 do_syscall_64+0x60/0x90 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Fixes: 9d71dd0c70099 ("can: add support of SAE J1939 protocol") Reported-by: Sili Luo rootlab@huawei.com Suggested-by: Sili Luo rootlab@huawei.com Acked-by: Oleksij Rempel o.rempel@pengutronix.de Cc: stable@vger.kernel.org Signed-off-by: Oleksij Rempel o.rempel@pengutronix.de Link: https://lore.kernel.org/all/20231020133814.383996-1-o.rempel@pengutronix.de Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/can/j1939/j1939-priv.h | 1 + net/can/j1939/socket.c | 22 ++++++++++++++++++---- 2 files changed, 19 insertions(+), 4 deletions(-)
--- a/net/can/j1939/j1939-priv.h +++ b/net/can/j1939/j1939-priv.h @@ -301,6 +301,7 @@ struct j1939_sock {
int ifindex; struct j1939_addr addr; + spinlock_t filters_lock; struct j1939_filter *filters; int nfilters; pgn_t pgn_rx_filter; --- a/net/can/j1939/socket.c +++ b/net/can/j1939/socket.c @@ -262,12 +262,17 @@ static bool j1939_sk_match_dst(struct j1 static bool j1939_sk_match_filter(struct j1939_sock *jsk, const struct j1939_sk_buff_cb *skcb) { - const struct j1939_filter *f = jsk->filters; - int nfilter = jsk->nfilters; + const struct j1939_filter *f; + int nfilter; + + spin_lock_bh(&jsk->filters_lock); + + f = jsk->filters; + nfilter = jsk->nfilters;
if (!nfilter) /* receive all when no filters are assigned */ - return true; + goto filter_match_found;
for (; nfilter; ++f, --nfilter) { if ((skcb->addr.pgn & f->pgn_mask) != f->pgn) @@ -276,9 +281,15 @@ static bool j1939_sk_match_filter(struct continue; if ((skcb->addr.src_name & f->name_mask) != f->name) continue; - return true; + goto filter_match_found; } + + spin_unlock_bh(&jsk->filters_lock); return false; + +filter_match_found: + spin_unlock_bh(&jsk->filters_lock); + return true; }
static bool j1939_sk_recv_match_one(struct j1939_sock *jsk, @@ -401,6 +412,7 @@ static int j1939_sk_init(struct sock *sk atomic_set(&jsk->skb_pending, 0); spin_lock_init(&jsk->sk_session_queue_lock); INIT_LIST_HEAD(&jsk->sk_session_queue); + spin_lock_init(&jsk->filters_lock);
/* j1939_sk_sock_destruct() depends on SOCK_RCU_FREE flag */ sock_set_flag(sk, SOCK_RCU_FREE); @@ -703,9 +715,11 @@ static int j1939_sk_setsockopt(struct so }
lock_sock(&jsk->sk); + spin_lock_bh(&jsk->filters_lock); ofilters = jsk->filters; jsk->filters = filters; jsk->nfilters = count; + spin_unlock_bh(&jsk->filters_lock); release_sock(&jsk->sk); kfree(ofilters); return 0;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Konrad Dybcio konrad.dybcio@linaro.org
commit 741ba0134fa7822fcf4e4a0a537a5c4cfd706b20 upstream.
The unused clock cleanup uses the _sync initcall to give all users at earlier initcalls time to probe. Do the same to avoid leaving some PDs dangling at "on" (which actually happened on qcom!).
Fixes: 2fe71dcdfd10 ("PM / domains: Add late_initcall to disable unused PM domains") Signed-off-by: Konrad Dybcio konrad.dybcio@linaro.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20231227-topic-pmdomain_sync_cleanup-v1-1-5f36769d... Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/base/power/domain.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -1111,7 +1111,7 @@ static int __init genpd_power_off_unused
return 0; } -late_initcall(genpd_power_off_unused); +late_initcall_sync(genpd_power_off_unused);
#ifdef CONFIG_PM_SLEEP
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oleg Nesterov oleg@redhat.com
commit 60f92acb60a989b14e4b744501a0df0f82ef30a3 upstream.
Patch series "fs/proc: do_task_stat: use sig->stats_".
do_task_stat() has the same problem as getrusage() had before "getrusage: use sig->stats_lock rather than lock_task_sighand()": a hard lockup. If NR_CPUS threads call lock_task_sighand() at the same time and the process has NR_THREADS, spin_lock_irq will spin with irqs disabled O(NR_CPUS * NR_THREADS) time.
This patch (of 3):
thread_group_cputime() does its own locking, we can safely shift thread_group_cputime_adjusted() which does another for_each_thread loop outside of ->siglock protected section.
Not only this removes for_each_thread() from the critical section with irqs disabled, this removes another case when stats_lock is taken with siglock held. We want to remove this dependency, then we can change the users of stats_lock to not disable irqs.
Link: https://lkml.kernel.org/r/20240123153313.GA21832@redhat.com Link: https://lkml.kernel.org/r/20240123153355.GA21854@redhat.com Signed-off-by: Oleg Nesterov oleg@redhat.com Signed-off-by: Dylan Hatch dylanbhatch@google.com Cc: Eric W. Biederman ebiederm@xmission.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/proc/array.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
--- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -511,7 +511,7 @@ static int do_task_stat(struct seq_file
sigemptyset(&sigign); sigemptyset(&sigcatch); - cutime = cstime = utime = stime = 0; + cutime = cstime = 0; cgtime = gtime = 0;
if (lock_task_sighand(task, &flags)) { @@ -546,7 +546,6 @@ static int do_task_stat(struct seq_file
min_flt += sig->min_flt; maj_flt += sig->maj_flt; - thread_group_cputime_adjusted(task, &utime, &stime); gtime += sig->gtime;
if (sig->flags & (SIGNAL_GROUP_EXIT | SIGNAL_STOP_STOPPED)) @@ -562,10 +561,13 @@ static int do_task_stat(struct seq_file
if (permitted && (!whole || num_threads < 2)) wchan = !task_is_running(task); - if (!whole) { + + if (whole) { + thread_group_cputime_adjusted(task, &utime, &stime); + } else { + task_cputime_adjusted(task, &utime, &stime); min_flt = task->min_flt; maj_flt = task->maj_flt; - task_cputime_adjusted(task, &utime, &stime); gtime = task_gtime(task); }
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oleg Nesterov oleg@redhat.com
commit 7601df8031fd67310af891897ef6cc0df4209305 upstream.
lock_task_sighand() can trigger a hard lockup. If NR_CPUS threads call do_task_stat() at the same time and the process has NR_THREADS, it will spin with irqs disabled O(NR_CPUS * NR_THREADS) time.
Change do_task_stat() to use sig->stats_lock to gather the statistics outside of ->siglock protected section, in the likely case this code will run lockless.
Link: https://lkml.kernel.org/r/20240123153357.GA21857@redhat.com Signed-off-by: Oleg Nesterov oleg@redhat.com Signed-off-by: Dylan Hatch dylanbhatch@google.com Cc: Eric W. Biederman ebiederm@xmission.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/proc/array.c | 58 ++++++++++++++++++++++++++++++-------------------------- 1 file changed, 32 insertions(+), 26 deletions(-)
--- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -477,13 +477,13 @@ static int do_task_stat(struct seq_file int permitted; struct mm_struct *mm; unsigned long long start_time; - unsigned long cmin_flt = 0, cmaj_flt = 0; - unsigned long min_flt = 0, maj_flt = 0; - u64 cutime, cstime, utime, stime; - u64 cgtime, gtime; + unsigned long cmin_flt, cmaj_flt, min_flt, maj_flt; + u64 cutime, cstime, cgtime, utime, stime, gtime; unsigned long rsslim = 0; unsigned long flags; int exit_code = task->exit_code; + struct signal_struct *sig = task->signal; + unsigned int seq = 1;
state = *get_task_state(task); vsize = eip = esp = 0; @@ -511,12 +511,8 @@ static int do_task_stat(struct seq_file
sigemptyset(&sigign); sigemptyset(&sigcatch); - cutime = cstime = 0; - cgtime = gtime = 0;
if (lock_task_sighand(task, &flags)) { - struct signal_struct *sig = task->signal; - if (sig->tty) { struct pid *pgrp = tty_get_pgrp(sig->tty); tty_pgrp = pid_nr_ns(pgrp, ns); @@ -527,27 +523,9 @@ static int do_task_stat(struct seq_file num_threads = get_nr_threads(task); collect_sigign_sigcatch(task, &sigign, &sigcatch);
- cmin_flt = sig->cmin_flt; - cmaj_flt = sig->cmaj_flt; - cutime = sig->cutime; - cstime = sig->cstime; - cgtime = sig->cgtime; rsslim = READ_ONCE(sig->rlim[RLIMIT_RSS].rlim_cur);
- /* add up live thread stats at the group level */ if (whole) { - struct task_struct *t; - - __for_each_thread(sig, t) { - min_flt += t->min_flt; - maj_flt += t->maj_flt; - gtime += task_gtime(t); - } - - min_flt += sig->min_flt; - maj_flt += sig->maj_flt; - gtime += sig->gtime; - if (sig->flags & (SIGNAL_GROUP_EXIT | SIGNAL_STOP_STOPPED)) exit_code = sig->group_exit_code; } @@ -562,6 +540,34 @@ static int do_task_stat(struct seq_file if (permitted && (!whole || num_threads < 2)) wchan = !task_is_running(task);
+ do { + seq++; /* 2 on the 1st/lockless path, otherwise odd */ + flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq); + + cmin_flt = sig->cmin_flt; + cmaj_flt = sig->cmaj_flt; + cutime = sig->cutime; + cstime = sig->cstime; + cgtime = sig->cgtime; + + if (whole) { + struct task_struct *t; + + min_flt = sig->min_flt; + maj_flt = sig->maj_flt; + gtime = sig->gtime; + + rcu_read_lock(); + __for_each_thread(sig, t) { + min_flt += t->min_flt; + maj_flt += t->maj_flt; + gtime += task_gtime(t); + } + rcu_read_unlock(); + } + } while (need_seqretry(&sig->stats_lock, seq)); + done_seqretry_irqrestore(&sig->stats_lock, seq, flags); + if (whole) { thread_group_cputime_adjusted(task, &utime, &stime); } else {
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Petr Pavlu petr.pavlu@suse.com
commit bdbddb109c75365d22ec4826f480c5e75869e1cb upstream.
Commit a8b9cf62ade1 ("ftrace: Fix DIRECT_CALLS to use SAVE_REGS by default") attempted to fix an issue with direct trampolines on x86, see its description for details. However, it wrongly referenced the HAVE_DYNAMIC_FTRACE_WITH_REGS config option and the problem is still present.
Add the missing "CONFIG_" prefix for the logic to work as intended.
Link: https://lore.kernel.org/linux-trace-kernel/20240213132434.22537-1-petr.pavlu...
Fixes: a8b9cf62ade1 ("ftrace: Fix DIRECT_CALLS to use SAVE_REGS by default") Signed-off-by: Petr Pavlu petr.pavlu@suse.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/ftrace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -5331,7 +5331,7 @@ static int register_ftrace_function_nolo * not support ftrace_regs_caller but direct_call, use SAVE_ARGS so that it * jumps from ftrace_caller for multiple ftrace_ops. */ -#ifndef HAVE_DYNAMIC_FTRACE_WITH_REGS +#ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS #define MULTI_FLAGS (FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_ARGS) #else #define MULTI_FLAGS (FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_REGS)
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steven Rostedt (Google) rostedt@goodmis.org
commit 2394ac4145ea91b92271e675a09af2a9ea6840b7 upstream.
The allocation of the struct saved_cmdlines_buffer structure changed from:
s = kmalloc(sizeof(*s), GFP_KERNEL); s->saved_cmdlines = kmalloc_array(TASK_COMM_LEN, val, GFP_KERNEL);
to:
orig_size = sizeof(*s) + val * TASK_COMM_LEN; order = get_order(orig_size); size = 1 << (order + PAGE_SHIFT); page = alloc_pages(GFP_KERNEL, order); if (!page) return NULL;
s = page_address(page); memset(s, 0, sizeof(*s));
s->saved_cmdlines = kmalloc_array(TASK_COMM_LEN, val, GFP_KERNEL);
Where that s->saved_cmdlines allocation looks to be a dangling allocation to kmemleak. That's because kmemleak only keeps track of kmalloc() allocations. For allocations that use page_alloc() directly, the kmemleak needs to be explicitly informed about it.
Add kmemleak_alloc() and kmemleak_free() around the page allocation so that it doesn't give the following false positive:
unreferenced object 0xffff8881010c8000 (size 32760): comm "swapper", pid 0, jiffies 4294667296 hex dump (first 32 bytes): ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................ ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................ backtrace (crc ae6ec1b9): [<ffffffff86722405>] kmemleak_alloc+0x45/0x80 [<ffffffff8414028d>] __kmalloc_large_node+0x10d/0x190 [<ffffffff84146ab1>] __kmalloc+0x3b1/0x4c0 [<ffffffff83ed7103>] allocate_cmdlines_buffer+0x113/0x230 [<ffffffff88649c34>] tracer_alloc_buffers.isra.0+0x124/0x460 [<ffffffff8864a174>] early_trace_init+0x14/0xa0 [<ffffffff885dd5ae>] start_kernel+0x12e/0x3c0 [<ffffffff885f5758>] x86_64_start_reservations+0x18/0x30 [<ffffffff885f582b>] x86_64_start_kernel+0x7b/0x80 [<ffffffff83a001c3>] secondary_startup_64_no_verify+0x15e/0x16b
Link: https://lore.kernel.org/linux-trace-kernel/87r0hfnr9r.fsf@kernel.org/ Link: https://lore.kernel.org/linux-trace-kernel/20240214112046.09a322d6@gandalf.l...
Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Catalin Marinas catalin.marinas@arm.com Fixes: 44dc5c41b5b1 ("tracing: Fix wasted memory in saved_cmdlines logic") Reported-by: Kalle Valo kvalo@kernel.org Tested-by: Kalle Valo kvalo@kernel.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -39,6 +39,7 @@ #include <linux/ctype.h> #include <linux/init.h> #include <linux/panic_notifier.h> +#include <linux/kmemleak.h> #include <linux/poll.h> #include <linux/nmi.h> #include <linux/fs.h> @@ -2331,6 +2332,7 @@ static void free_saved_cmdlines_buffer(s int order = get_order(sizeof(*s) + s->cmdline_num * TASK_COMM_LEN);
kfree(s->map_cmdline_to_pid); + kmemleak_free(s); free_pages((unsigned long)s, order); }
@@ -2350,6 +2352,7 @@ static struct saved_cmdlines_buffer *all return NULL;
s = page_address(page); + kmemleak_alloc(s, size, 1, GFP_KERNEL); memset(s, 0, sizeof(*s));
/* Round up to actual allocation */
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Junxiao Bi junxiao.bi@oracle.com
[ Upstream commit d6e035aad6c09991da1c667fb83419329a3baed8 ]
commit 5e2cf333b7bd ("md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d") introduced a hung bug and will be reverted in next patch, since the issue that commit is fixing is due to md superblock write is throttled by wbt, to fix it, we can have superblock write bypass block layer throttle.
Fixes: 5e2cf333b7bd ("md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d") Cc: stable@vger.kernel.org # v5.19+ Suggested-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Junxiao Bi junxiao.bi@oracle.com Reviewed-by: Logan Gunthorpe logang@deltatee.com Reviewed-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Song Liu song@kernel.org Link: https://lore.kernel.org/r/20231108182216.73611-1-junxiao.bi@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/md/md.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c index 21b04607d53a..d243c196c598 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -1037,9 +1037,10 @@ void md_super_write(struct mddev *mddev, struct md_rdev *rdev, return;
bio = bio_alloc_bioset(rdev->meta_bdev ? rdev->meta_bdev : rdev->bdev, - 1, - REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH | REQ_FUA, - GFP_NOIO, &mddev->sync_set); + 1, + REQ_OP_WRITE | REQ_SYNC | REQ_IDLE | REQ_META + | REQ_PREFLUSH | REQ_FUA, + GFP_NOIO, &mddev->sync_set);
atomic_inc(&rdev->nr_pending);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Damien Le Moal dlemoal@kernel.org
[ Upstream commit 748dc0b65ec2b4b7b3dbd7befcc4a54fdcac7988 ]
Partial completions of zone append request is not allowed but if a zone append completion indicates a number of completed bytes different from the original BIO size, only the BIO status is set to error. This leads to bio_advance() not setting the BIO size to 0 and thus to not call bio_endio() at the end of req_bio_endio().
Make sure a partially completed zone append is failed and completed immediately by forcing the completed number of bytes (nbytes) to be equal to the BIO size, thus ensuring that bio_endio() is called.
Fixes: 297db731847e ("block: fix req_bio_endio append error handling") Cc: stable@kernel.vger.org Signed-off-by: Damien Le Moal dlemoal@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Reviewed-by: Hannes Reinecke hare@suse.de Link: https://lore.kernel.org/r/20240110092942.442334-1-dlemoal@kernel.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- block/blk-mq.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c index a71974a5e57c..a02d3d922c58 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -772,11 +772,16 @@ static void req_bio_endio(struct request *rq, struct bio *bio, /* * Partial zone append completions cannot be supported as the * BIO fragments may end up not being written sequentially. + * For such case, force the completed nbytes to be equal to + * the BIO size so that bio_advance() sets the BIO remaining + * size to 0 and we end up calling bio_endio() before returning. */ - if (bio->bi_iter.bi_size != nbytes) + if (bio->bi_iter.bi_size != nbytes) { bio->bi_status = BLK_STS_IOERR; - else + nbytes = bio->bi_iter.bi_size; + } else { bio->bi_iter.bi_sector = rq->__sector; + } }
bio_advance(bio, nbytes);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Brown broonie@kernel.org
commit 69f89168b310878be82d7d97bc0d22068ad858c0 upstream.
Since the merge of b717dfbf73e8 ("Revert "usb: typec: tcpm: fix cc role at port reset"") into mainline the LibreTech Renegade Elite/Firefly has died during boot, the main symptom observed in testing is a sudden stop in console output. Gábor Stefanik identified in review that the patch would cause power to be removed from devices without batteries (like this board), observing that while the patch is correct according to the spec this appears to be an oversight in the spec.
Given that the change makes previously working systems unusable let's revert it, there was some discussion of identifying systems that have alternative power and implementing the standards conforming behaviour in only that case.
Fixes: b717dfbf73e8 ("Revert "usb: typec: tcpm: fix cc role at port reset"") Cc: stable stable@kernel.org Cc: Badhri Jagan Sridharan badhri@google.com Signed-off-by: Mark Brown broonie@kernel.org Acked-by: Heikki Krogerus heikki.krogerus@linux.intel.com Reviewed-by: Guenter Roeck linux@roeck-us.net Link: https://lore.kernel.org/r/20240212-usb-fix-renegade-v1-1-22c43c88d635@kernel... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/typec/tcpm/tcpm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/usb/typec/tcpm/tcpm.c +++ b/drivers/usb/typec/tcpm/tcpm.c @@ -4862,7 +4862,8 @@ static void run_state_machine(struct tcp break; case PORT_RESET: tcpm_reset_port(port); - tcpm_set_cc(port, TYPEC_CC_OPEN); + tcpm_set_cc(port, tcpm_default_state(port) == SNK_UNATTACHED ? + TYPEC_CC_RD : tcpm_rp_cc(port)); tcpm_set_state(port, PORT_RESET_WAIT_OFF, PD_T_ERROR_RECOVERY); break;
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jozsef Kadlecsik kadlec@netfilter.org
commit 97f7cf1cd80eeed3b7c808b7c12463295c751001 upstream.
The patch "netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test", commit 28628fa9 fixes a race condition. But the synchronize_rcu() added to the swap function unnecessarily slows it down: it can safely be moved to destroy and use call_rcu() instead.
Eric Dumazet pointed out that simply calling the destroy functions as rcu callback does not work: sets with timeout use garbage collectors which need cancelling at destroy which can wait. Therefore the destroy functions are split into two: cancelling garbage collectors safely at executing the command received by netlink and moving the remaining part only into the rcu callback.
Link: https://lore.kernel.org/lkml/C0829B10-EAA6-4809-874E-E1E9C05A8D84@automattic... Fixes: 28628fa952fe ("netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test") Reported-by: Ale Crismani ale.crismani@automattic.com Reported-by: David Wang 00107082@163.com Tested-by: David Wang 00107082@163.com Signed-off-by: Jozsef Kadlecsik kadlec@netfilter.org Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/netfilter/ipset/ip_set.h | 4 +++ net/netfilter/ipset/ip_set_bitmap_gen.h | 14 +++++++++--- net/netfilter/ipset/ip_set_core.c | 37 ++++++++++++++++++++++++-------- net/netfilter/ipset/ip_set_hash_gen.h | 15 ++++++++++-- net/netfilter/ipset/ip_set_list_set.c | 13 ++++++++--- 5 files changed, 65 insertions(+), 18 deletions(-)
--- a/include/linux/netfilter/ipset/ip_set.h +++ b/include/linux/netfilter/ipset/ip_set.h @@ -186,6 +186,8 @@ struct ip_set_type_variant { /* Return true if "b" set is the same as "a" * according to the create set parameters */ bool (*same_set)(const struct ip_set *a, const struct ip_set *b); + /* Cancel ongoing garbage collectors before destroying the set*/ + void (*cancel_gc)(struct ip_set *set); /* Region-locking is used */ bool region_lock; }; @@ -242,6 +244,8 @@ extern void ip_set_type_unregister(struc
/* A generic IP set */ struct ip_set { + /* For call_cru in destroy */ + struct rcu_head rcu; /* The name of the set */ char name[IPSET_MAXNAMELEN]; /* Lock protecting the set data */ --- a/net/netfilter/ipset/ip_set_bitmap_gen.h +++ b/net/netfilter/ipset/ip_set_bitmap_gen.h @@ -28,6 +28,7 @@ #define mtype_del IPSET_TOKEN(MTYPE, _del) #define mtype_list IPSET_TOKEN(MTYPE, _list) #define mtype_gc IPSET_TOKEN(MTYPE, _gc) +#define mtype_cancel_gc IPSET_TOKEN(MTYPE, _cancel_gc) #define mtype MTYPE
#define get_ext(set, map, id) ((map)->extensions + ((set)->dsize * (id))) @@ -57,9 +58,6 @@ mtype_destroy(struct ip_set *set) { struct mtype *map = set->data;
- if (SET_WITH_TIMEOUT(set)) - del_timer_sync(&map->gc); - if (set->dsize && set->extensions & IPSET_EXT_DESTROY) mtype_ext_cleanup(set); ip_set_free(map->members); @@ -288,6 +286,15 @@ mtype_gc(struct timer_list *t) add_timer(&map->gc); }
+static void +mtype_cancel_gc(struct ip_set *set) +{ + struct mtype *map = set->data; + + if (SET_WITH_TIMEOUT(set)) + del_timer_sync(&map->gc); +} + static const struct ip_set_type_variant mtype = { .kadt = mtype_kadt, .uadt = mtype_uadt, @@ -301,6 +308,7 @@ static const struct ip_set_type_variant .head = mtype_head, .list = mtype_list, .same_set = mtype_same_set, + .cancel_gc = mtype_cancel_gc, };
#endif /* __IP_SET_BITMAP_IP_GEN_H */ --- a/net/netfilter/ipset/ip_set_core.c +++ b/net/netfilter/ipset/ip_set_core.c @@ -1182,6 +1182,14 @@ ip_set_destroy_set(struct ip_set *set) kfree(set); }
+static void +ip_set_destroy_set_rcu(struct rcu_head *head) +{ + struct ip_set *set = container_of(head, struct ip_set, rcu); + + ip_set_destroy_set(set); +} + static int ip_set_destroy(struct sk_buff *skb, const struct nfnl_info *info, const struct nlattr * const attr[]) { @@ -1193,8 +1201,6 @@ static int ip_set_destroy(struct sk_buff if (unlikely(protocol_min_failed(attr))) return -IPSET_ERR_PROTOCOL;
- /* Must wait for flush to be really finished in list:set */ - rcu_barrier();
/* Commands are serialized and references are * protected by the ip_set_ref_lock. @@ -1206,8 +1212,10 @@ static int ip_set_destroy(struct sk_buff * counter, so if it's already zero, we can proceed * without holding the lock. */ - read_lock_bh(&ip_set_ref_lock); if (!attr[IPSET_ATTR_SETNAME]) { + /* Must wait for flush to be really finished in list:set */ + rcu_barrier(); + read_lock_bh(&ip_set_ref_lock); for (i = 0; i < inst->ip_set_max; i++) { s = ip_set(inst, i); if (s && (s->ref || s->ref_netlink)) { @@ -1221,6 +1229,8 @@ static int ip_set_destroy(struct sk_buff s = ip_set(inst, i); if (s) { ip_set(inst, i) = NULL; + /* Must cancel garbage collectors */ + s->variant->cancel_gc(s); ip_set_destroy_set(s); } } @@ -1228,6 +1238,9 @@ static int ip_set_destroy(struct sk_buff inst->is_destroyed = false; } else { u32 flags = flag_exist(info->nlh); + u16 features = 0; + + read_lock_bh(&ip_set_ref_lock); s = find_set_and_id(inst, nla_data(attr[IPSET_ATTR_SETNAME]), &i); if (!s) { @@ -1238,10 +1251,16 @@ static int ip_set_destroy(struct sk_buff ret = -IPSET_ERR_BUSY; goto out; } + features = s->type->features; ip_set(inst, i) = NULL; read_unlock_bh(&ip_set_ref_lock); - - ip_set_destroy_set(s); + if (features & IPSET_TYPE_NAME) { + /* Must wait for flush to be really finished */ + rcu_barrier(); + } + /* Must cancel garbage collectors */ + s->variant->cancel_gc(s); + call_rcu(&s->rcu, ip_set_destroy_set_rcu); } return 0; out: @@ -1394,9 +1413,6 @@ static int ip_set_swap(struct sk_buff *s ip_set(inst, to_id) = from; write_unlock_bh(&ip_set_ref_lock);
- /* Make sure all readers of the old set pointers are completed. */ - synchronize_rcu(); - return 0; }
@@ -2409,8 +2425,11 @@ ip_set_fini(void) { nf_unregister_sockopt(&so_set); nfnetlink_subsys_unregister(&ip_set_netlink_subsys); - unregister_pernet_subsys(&ip_set_net_ops); + + /* Wait for call_rcu() in destroy */ + rcu_barrier(); + pr_debug("these are the famous last words\n"); }
--- a/net/netfilter/ipset/ip_set_hash_gen.h +++ b/net/netfilter/ipset/ip_set_hash_gen.h @@ -221,6 +221,7 @@ static const union nf_inet_addr zeromask #undef mtype_gc_do #undef mtype_gc #undef mtype_gc_init +#undef mtype_cancel_gc #undef mtype_variant #undef mtype_data_match
@@ -265,6 +266,7 @@ static const union nf_inet_addr zeromask #define mtype_gc_do IPSET_TOKEN(MTYPE, _gc_do) #define mtype_gc IPSET_TOKEN(MTYPE, _gc) #define mtype_gc_init IPSET_TOKEN(MTYPE, _gc_init) +#define mtype_cancel_gc IPSET_TOKEN(MTYPE, _cancel_gc) #define mtype_variant IPSET_TOKEN(MTYPE, _variant) #define mtype_data_match IPSET_TOKEN(MTYPE, _data_match)
@@ -449,9 +451,6 @@ mtype_destroy(struct ip_set *set) struct htype *h = set->data; struct list_head *l, *lt;
- if (SET_WITH_TIMEOUT(set)) - cancel_delayed_work_sync(&h->gc.dwork); - mtype_ahash_destroy(set, ipset_dereference_nfnl(h->table), true); list_for_each_safe(l, lt, &h->ad) { list_del(l); @@ -598,6 +597,15 @@ mtype_gc_init(struct htable_gc *gc) queue_delayed_work(system_power_efficient_wq, &gc->dwork, HZ); }
+static void +mtype_cancel_gc(struct ip_set *set) +{ + struct htype *h = set->data; + + if (SET_WITH_TIMEOUT(set)) + cancel_delayed_work_sync(&h->gc.dwork); +} + static int mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext, struct ip_set_ext *mext, u32 flags); @@ -1440,6 +1448,7 @@ static const struct ip_set_type_variant .uref = mtype_uref, .resize = mtype_resize, .same_set = mtype_same_set, + .cancel_gc = mtype_cancel_gc, .region_lock = true, };
--- a/net/netfilter/ipset/ip_set_list_set.c +++ b/net/netfilter/ipset/ip_set_list_set.c @@ -426,9 +426,6 @@ list_set_destroy(struct ip_set *set) struct list_set *map = set->data; struct set_elem *e, *n;
- if (SET_WITH_TIMEOUT(set)) - timer_shutdown_sync(&map->gc); - list_for_each_entry_safe(e, n, &map->members, list) { list_del(&e->list); ip_set_put_byindex(map->net, e->id); @@ -545,6 +542,15 @@ list_set_same_set(const struct ip_set *a a->extensions == b->extensions; }
+static void +list_set_cancel_gc(struct ip_set *set) +{ + struct list_set *map = set->data; + + if (SET_WITH_TIMEOUT(set)) + timer_shutdown_sync(&map->gc); +} + static const struct ip_set_type_variant set_variant = { .kadt = list_set_kadt, .uadt = list_set_uadt, @@ -558,6 +564,7 @@ static const struct ip_set_type_variant .head = list_set_head, .list = list_set_list, .same_set = list_set_same_set, + .cancel_gc = list_set_cancel_gc, };
static void
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jozsef Kadlecsik kadlec@netfilter.org
commit 27c5a095e2518975e20a10102908ae8231699879 upstream.
The patch fdb8e12cc2cc ("netfilter: ipset: fix performance regression in swap operation") missed to add the calls to gc cancellations at the error path of create operations and at module unload. Also, because the half of the destroy operations now executed by a function registered by call_rcu(), neither NFNL_SUBSYS_IPSET mutex or rcu read lock is held and therefore the checking of them results false warnings.
Fixes: 97f7cf1cd80e ("netfilter: ipset: fix performance regression in swap operation") Reported-by: syzbot+52bbc0ad036f6f0d4a25@syzkaller.appspotmail.com Reported-by: Brad Spengler spender@grsecurity.net Reported-by: Стас Ничипорович stasn77@gmail.com Tested-by: Brad Spengler spender@grsecurity.net Tested-by: Стас Ничипорович stasn77@gmail.com Signed-off-by: Jozsef Kadlecsik kadlec@netfilter.org Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/ipset/ip_set_core.c | 2 ++ net/netfilter/ipset/ip_set_hash_gen.h | 4 ++-- 2 files changed, 4 insertions(+), 2 deletions(-)
--- a/net/netfilter/ipset/ip_set_core.c +++ b/net/netfilter/ipset/ip_set_core.c @@ -1154,6 +1154,7 @@ static int ip_set_create(struct sk_buff return ret;
cleanup: + set->variant->cancel_gc(set); set->variant->destroy(set); put_out: module_put(set->type->me); @@ -2378,6 +2379,7 @@ ip_set_net_exit(struct net *net) set = ip_set(inst, i); if (set) { ip_set(inst, i) = NULL; + set->variant->cancel_gc(set); ip_set_destroy_set(set); } } --- a/net/netfilter/ipset/ip_set_hash_gen.h +++ b/net/netfilter/ipset/ip_set_hash_gen.h @@ -431,7 +431,7 @@ mtype_ahash_destroy(struct ip_set *set, u32 i;
for (i = 0; i < jhash_size(t->htable_bits); i++) { - n = __ipset_dereference(hbucket(t, i)); + n = (__force struct hbucket *)hbucket(t, i); if (!n) continue; if (set->extensions & IPSET_EXT_DESTROY && ext_destroy) @@ -451,7 +451,7 @@ mtype_destroy(struct ip_set *set) struct htype *h = set->data; struct list_head *l, *lt;
- mtype_ahash_destroy(set, ipset_dereference_nfnl(h->table), true); + mtype_ahash_destroy(set, (__force struct htable *)h->table, true); list_for_each_safe(l, lt, &h->ad) { list_del(l); kfree(l);
6.7-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
commit 5ea9a7c5fe4149f165f0e3b624fe08df02b6c301 upstream.
A recent change to check_for_locks() changed it to take ->flc_lock while holding ->fi_lock. This creates a lock inversion (reported by lockdep) because there is a case where ->fi_lock is taken while holding ->flc_lock.
->flc_lock is held across ->fl_lmops callbacks, and nfsd_break_deleg_cb() is one of those and does take ->fi_lock. However it doesn't need to.
Prior to v4.17-rc1~110^2~22 ("nfsd: create a separate lease for each delegation") nfsd_break_deleg_cb() would walk the ->fi_delegations list and so needed the lock. Since then it doesn't walk the list and doesn't need the lock.
Two actions are performed under the lock. One is to call nfsd_break_one_deleg which calls nfsd4_run_cb(). These doesn't act on the nfs4_file at all, so don't need the lock.
The other is to set ->fi_had_conflict which is in the nfs4_file. This field is only ever set here (except when initialised to false) so there is no possible problem will multiple threads racing when setting it.
The field is tested twice in nfs4_set_delegation(). The first test does not hold a lock and is documented as an opportunistic optimisation, so it doesn't impose any need to hold ->fi_lock while setting ->fi_had_conflict.
The second test in nfs4_set_delegation() *is* make under ->fi_lock, so removing the locking when ->fi_had_conflict is set could make a change. The change could only be interesting if ->fi_had_conflict tested as false even though nfsd_break_one_deleg() ran before ->fi_lock was unlocked. i.e. while hash_delegation_locked() was running. As hash_delegation_lock() doesn't interact in any way with nfs4_run_cb() there can be no importance to this interaction.
So this patch removes the locking from nfsd_break_one_deleg() and moves the final test on ->fi_had_conflict out of the locked region to make it clear that locking isn't important to the test. It is still tested *after* vfs_setlease() has succeeded. This might be significant and as vfs_setlease() takes ->flc_lock, and nfsd_break_one_deleg() is called under ->flc_lock this "after" is a true ordering provided by a spinlock.
Fixes: edcf9725150e ("nfsd: fix RELEASE_LOCKOWNER") Signed-off-by: NeilBrown neilb@suse.de Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nfsd/nfs4state.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
--- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4945,10 +4945,8 @@ nfsd_break_deleg_cb(struct file_lock *fl */ fl->fl_break_time = 0;
- spin_lock(&fp->fi_lock); fp->fi_had_conflict = true; nfsd_break_one_deleg(dp); - spin_unlock(&fp->fi_lock); return false; }
@@ -5557,12 +5555,13 @@ nfs4_set_delegation(struct nfsd4_open *o if (status) goto out_unlock;
+ status = -EAGAIN; + if (fp->fi_had_conflict) + goto out_unlock; + spin_lock(&state_lock); spin_lock(&fp->fi_lock); - if (fp->fi_had_conflict) - status = -EAGAIN; - else - status = hash_delegation_locked(dp, fp); + status = hash_delegation_locked(dp, fp); spin_unlock(&fp->fi_lock); spin_unlock(&state_lock);
Works fine on my desktop with model name : AMD Ryzen 5 5600 6-Core Processor and Arch Linux
Tested-by: Luna Jernberg droidbittin@gmail.com
Den tis 20 feb. 2024 kl 22:27 skrev Greg Kroah-Hartman gregkh@linuxfoundation.org:
This is the start of the stable review cycle for the 6.7.6 release. There are 309 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 22 Feb 2024 20:55:42 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.7.6-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.7.y and the diffstat can be found below.
thanks,
greg k-h
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.7.6-rc1
NeilBrown neilb@suse.de nfsd: don't take fi_lock in nfsd_break_deleg_cb()
Jozsef Kadlecsik kadlec@netfilter.org netfilter: ipset: Missing gc cancellations fixed
Jozsef Kadlecsik kadlec@netfilter.org netfilter: ipset: fix performance regression in swap operation
Mark Brown broonie@kernel.org usb: typec: tpcm: Fix issues with power being removed during reset
Damien Le Moal dlemoal@kernel.org block: fix partial zone append completion handling in req_bio_endio()
Junxiao Bi junxiao.bi@oracle.com md: bypass block throttle for superblock update
Steven Rostedt (Google) rostedt@goodmis.org tracing: Inform kmemleak of saved_cmdlines allocation
Petr Pavlu petr.pavlu@suse.com tracing: Fix HAVE_DYNAMIC_FTRACE_WITH_REGS ifdef
Oleg Nesterov oleg@redhat.com fs/proc: do_task_stat: use sig->stats_lock to gather the threads/children stats
Oleg Nesterov oleg@redhat.com fs/proc: do_task_stat: move thread_group_cputime_adjusted() outside of lock_task_sighand()
Konrad Dybcio konrad.dybcio@linaro.org pmdomain: core: Move the unused cleanup to a _sync initcall
Oleksij Rempel o.rempel@pengutronix.de can: j1939: Fix UAF in j1939_sk_match_filter during setsockopt(SO_J1939_FILTER)
Ziqi Zhao astrajoan@yahoo.com can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock
Maxime Jayat maxime.jayat@mobile-devices.fr can: netlink: Fix TDCO calculation using the old data bittiming
Maximilian Heyne mheyne@amazon.de xen/events: close evtchn after mapping cleanup
Nuno Sa nuno.sa@analog.com of: property: fix typo in io-channels
Vegard Nossum vegard.nossum@oracle.com docs: kernel_feat.py: fix build error for missing files
Jan Kara jack@suse.cz blk-wbt: Fix detection of dirty-throttled tasks
Huacai Chen chenhuacai@kernel.org LoongArch: Fix earlycon parameter if KASAN enabled
Prakash Sangappa prakash.sangappa@oracle.com mm: hugetlb pages should not be reserved by shmat() if SHM_NORESERVE
Oscar Salvador osalvador@suse.de fs,hugetlb: fix NULL pointer dereference in hugetlbs_fill_super
Dave Airlie airlied@redhat.com nouveau/gsp: use correct size for registry rpc.
Rishabh Dave ridave@redhat.com ceph: prevent use-after-free in encode_cap_msg()
Shradha Gupta shradhagupta@linux.microsoft.com hv_netvsc: Register VF in netvsc_probe if NET_DEVICE_REGISTER missed
Petr Tesarik petr@tesarici.cz net: stmmac: protect updates of 64-bit statistics counters
Jan Kiszka jan.kiszka@siemens.com riscv/efistub: Ensure GP-relative addressing is not used
Geert Uytterhoeven geert+renesas@glider.be pmdomain: renesas: r8a77980-sysc: CR7 must be always on
Sinthu Raja sinthu.raja@ti.com net: ethernet: ti: cpsw_new: enable mac_managed_pm to fix mdio
SeongJae Park sj@kernel.org mm/damon/sysfs-schemes: fix wrong DAMOS tried regions update timeout setup
Alexandra Winter wintera@linux.ibm.com s390/qeth: Fix potential loss of L3-IP@ in case of network issues
Sinthu Raja sinthu.raja@ti.com net: ethernet: ti: cpsw: enable mac_managed_pm to fix mdio
Christian Brauner brauner@kernel.org fs: relax mount_setattr() permission checks
Daniel Bristot de Oliveira bristot@kernel.org tools/rtla: Fix Makefile compiler options for clang
Daniel Bristot de Oliveira bristot@kernel.org tools/rtla: Fix uninitialized bucket/data->bucket_size warning
John Kacur jkacur@redhat.com tools/rtla: Exit with EXIT_SUCCESS when help is invoked
Daniel Bristot de Oliveira bristot@kernel.org tools/rtla: Fix clang warning about mount_point var size
limingming3 limingming890315@gmail.com tools/rtla: Replace setting prio with nice for SCHED_OTHER
Daniel Bristot de Oliveira bristot@kernel.org tools/rtla: Remove unused sched_getattr() function
Daniel Bristot de Oliveira bristot@kernel.org tools/rv: Fix Makefile compiler options for clang
Daniel Bristot de Oliveira bristot@kernel.org tools/rv: Fix curr_reactor uninitialized variable
Mario Limonciello mario.limonciello@amd.com ASoC: amd: yc: Add DMI quirk for Lenovo Ideapad Pro 5 16ARP8
Gergo Koteles soyer@irl.hu ASoC: tas2781: add module parameter to tascodec_init()
Curtis Malainey cujomalainey@chromium.org ASoC: SOF: IPC3: fix message bounds on ipc ops
Easwar Hariharan eahariha@linux.microsoft.com arm64: Subscribe Microsoft Azure Cobalt 100 to ARM Neoverse N2 errata
Mark Brown broonie@kernel.org arm64/signal: Don't assume that TIF_SVE means we saved SVE state
Fred Ai fred.ai@bayhubtech.com mmc: sdhci-pci-o2micro: Fix a warm reboot issue that disk can't be detected by BIOS
Damien Le Moal dlemoal@kernel.org zonefs: Improve error handling
Sebastian Ene sebastianene@google.com KVM: arm64: Fix circular locking dependency
Christian Borntraeger borntraeger@linux.ibm.com KVM: s390: vsie: fix race during shadow creation
Steve French stfrench@microsoft.com smb: Fix regression in writes when non-standard maximum write size negotiated
Paulo Alcantara pc@manguebit.com smb: client: set correct id, uid and cruid for multiuser automounts
Mohammad Rahimi rahimi.mhmmd@gmail.com thunderbolt: Fix setting the CNS bit in ROUTER_CS_5
Marc Zyngier maz@kernel.org irqchip/gic-v3-its: Fix GICv4.1 VPE affinity update
Marc Zyngier maz@kernel.org irqchip/gic-v3-its: Restore quirk probing for ACPI-based systems
Doug Berger opendmb@gmail.com irqchip/irq-brcmstb-l2: Add write memory barrier before exit
Dan Carpenter dan.carpenter@linaro.org PCI: dwc: Fix a 64bit bug in dw_pcie_ep_raise_msix_irq()
Emmanuel Grumbach emmanuel.grumbach@intel.com wifi: iwlwifi: mvm: fix a crash when we run out of stations
Johannes Berg johannes.berg@intel.com wifi: mac80211: reload info pointer in ieee80211_tx_dequeue()
Johannes Berg johannes.berg@intel.com wifi: cfg80211: fix wiphy delayed work queueing
Johannes Berg johannes.berg@intel.com wifi: iwlwifi: fix double-free bug
Daniel de Villiers daniel.devilliers@corigine.com nfp: flower: prevent re-adding mac index for bonded port
James Hershaw james.hershaw@corigine.com nfp: enable NETDEV_XDP_ACT_REDIRECT feature flag
Daniel Basilio daniel.basilio@corigine.com nfp: use correct macro for LengthSelect in BAR config
Herbert Xu herbert@gondor.apana.org.au crypto: algif_hash - Remove bogus SGL free on zero-length error path
Kim Phillips kim.phillips@amd.com crypto: ccp - Fix null pointer dereference in __sev_platform_shutdown_locked
Ryusuke Konishi konishi.ryusuke@gmail.com nilfs2: fix hang in nilfs_lookup_dirty_data_buffers()
Ryusuke Konishi konishi.ryusuke@gmail.com nilfs2: fix data corruption in dsync block recovery for small block sizes
Shuming Fan shumingf@realtek.com ALSA: hda/realtek: add IDs for Dell dual spk platform
bo liu bo.liu@senarytech.com ALSA: hda/conexant: Add quirk for SWS JS201D
Eniac Zhang eniac-xw.zhang@hp.com ALSA: hda/realtek: fix mute/micmute LED For HP mt645
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org gpiolib: add gpiod_to_gpio_device() stub for !GPIOLIB
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org gpiolib: add gpio_device_get_base() stub for !GPIOLIB
Alexander Stein alexander.stein@ew.tq-group.com mmc: slot-gpio: Allow non-sleeping GPIO ro
Jens Axboe axboe@kernel.dk io_uring/net: fix multishot accept overflow handling
Steve Wahl steve.wahl@hpe.com x86/mm/ident_map: Use gbpages only where full GB page should be mapped.
Mingwei Zhang mizhang@google.com KVM: x86/pmu: Fix type length error when reading pmu->fixed_ctr_ctrl
Prasad Pandit pjp@fedoraproject.org KVM: x86: make KVM_REQ_NMI request iff NMI pending for vcpu
Andrei Vagin avagin@google.com x86/fpu: Stop relying on userspace for info to fault in xsave buffer
Aleksander Mazur deweloper@wp.pl x86/Kconfig: Transmeta Crusoe is CPU family 5, not 6
Jiri Slaby (SUSE) jirislaby@kernel.org serial: mxs-auart: fix tx
Jiri Slaby (SUSE) jirislaby@kernel.org serial: core: introduce uart_port_tx_flags()
Shrikanth Hegde sshegde@linux.ibm.com powerpc/pseries: fix accuracy of stolen time
David Engraf david.engraf@sysgo.com powerpc/cputable: Add missing PPC_FEATURE_BOOKE on PPC64 Book-E
Naveen N Rao naveen@kernel.org powerpc/64: Set task pt_regs->link to the LR value on scv entry
Masami Hiramatsu (Google) mhiramat@kernel.org ftrace: Fix DIRECT_CALLS to use SAVE_REGS by default
Hugo Villeneuve hvilleneuve@dimonoff.com serial: max310x: prevent infinite while() loop in port startup
Hugo Villeneuve hvilleneuve@dimonoff.com serial: max310x: fail probe if clock crystal is unstable
Hugo Villeneuve hvilleneuve@dimonoff.com serial: max310x: improve crystal stable clock detection
Hugo Villeneuve hvilleneuve@dimonoff.com serial: max310x: set default value when reading clock ready bit
Gui-Dong Han 2045gemini@gmail.com serial: core: Fix atomicity violation in uart_tiocmget
Hui Zhou hui.zhou@corigine.com nfp: flower: fix hardware offload for the transfer layer port
Hui Zhou hui.zhou@corigine.com nfp: flower: add hardware offload check for post ct entry
Andrew Lunn andrew@lunn.ch net: dsa: mv88e6xxx: Fix failed probe due to unsupported C45 reads
Vincent Donnefort vdonnefort@google.com ring-buffer: Clean ring_buffer_poll_wait() error return
Souradeep Chakrabarti schakrabarti@linux.microsoft.com hv_netvsc: Fix race condition between netvsc_probe and netvsc_remove
Lijo Lazar lijo.lazar@amd.com drm/amdgpu: Avoid fetching VRAM vendor info
Tom Chung chiahsuan.chung@amd.com drm/amd/display: Preserve original aspect ratio in create stream
Roman Li roman.li@amd.com drm/amd/display: Fix array-index-out-of-bounds in dcn35_clkmgr
Nathan Chancellor nathan@kernel.org drm/amd/display: Increase frame-larger-than for all display_mode_vba files
Fangzhi Zuo jerry.zuo@amd.com drm/amd/display: Fix MST Null Ptr for RV
Thong thong.thai@amd.com drm/amdgpu/soc21: update VCN 4 max HEVC encoding resolution
Philip Yang Philip.Yang@amd.com drm/prime: Support page array >= 4GB
Ville Syrjälä ville.syrjala@linux.intel.com drm/i915/dp: Limit SST link rate to <=8.1Gbps
Zhikai Zhai zhikai.zhai@amd.com drm/amd/display: Add align done check
Rob Clark robdclark@chromium.org drm/msm: Wire up tlb ops
Arunpravin Paneer Selvam Arunpravin.PaneerSelvam@amd.com drm/buddy: Fix alloc_range() error handling code
Timur Tabi ttabi@nvidia.com drm/nouveau: fix several DMA buffer leaks
Fedor Pchelkin pchelkin@ispras.ru ksmbd: free aux buffer if ksmbd_iov_pin_rsp_read fails
Oleg Nesterov oleg@redhat.com getrusage: use sig->stats_lock rather than lock_task_sighand()
Oleg Nesterov oleg@redhat.com getrusage: move thread_group_cputime_adjusted() outside of lock_task_sighand()
Steven Rostedt (Google) rostedt@goodmis.org eventfs: Keep all directory links at 1
Steven Rostedt (Google) rostedt@goodmis.org eventfs: Remove fsnotify*() functions from lookup()
Steven Rostedt (Google) rostedt@goodmis.org eventfs: Restructure eventfs_inode structure to be more condensed
Steven Rostedt (Google) rostedt@goodmis.org eventfs: Warn if an eventfs_inode is freed without is_freed being set
Linus Torvalds torvalds@linux-foundation.org eventfs: Get rid of dentry pointers without refcounts
Linus Torvalds torvalds@linux-foundation.org eventfs: Clean up dentry ops and add revalidate function
Linus Torvalds torvalds@linux-foundation.org eventfs: Remove unused d_parent pointer field
Linus Torvalds torvalds@linux-foundation.org tracefs: dentry lookup crapectomy
Linus Torvalds torvalds@linux-foundation.org tracefs: Avoid using the ei->dentry pointer unnecessarily
Linus Torvalds torvalds@linux-foundation.org eventfs: Initialize the tracefs inode properly
Steven Rostedt (Google) rostedt@goodmis.org tracefs: Zero out the tracefs_inode when allocating it
Steven Rostedt (Google) rostedt@goodmis.org eventfs: Save directory inodes in the eventfs_inode structure
Erick Archer erick.archer@gmx.com eventfs: Use kcalloc() instead of kzalloc()
Steven Rostedt (Google) rostedt@goodmis.org eventfs: Do not create dentries nor inodes in iterate_shared
Steven Rostedt (Google) rostedt@goodmis.org eventfs: Have the inodes all for files and directories all be the same
Steven Rostedt (Google) rostedt@goodmis.org eventfs: Shortcut eventfs_iterate() by skipping entries already read
Steven Rostedt (Google) rostedt@goodmis.org eventfs: Read ei->entries before ei->children in eventfs_iterate()
Steven Rostedt (Google) rostedt@goodmis.org eventfs: Do ctx->pos update for all iterations in eventfs_iterate()
Steven Rostedt (Google) rostedt@goodmis.org eventfs: Have eventfs_iterate() stop immediately if ei->is_freed is set
Steven Rostedt (Google) rostedt@goodmis.org eventfs: Stop using dcache_readdir() for getdents()
Steven Rostedt (Google) rostedt@goodmis.org eventfs: Remove "lookup" parameter from create_dir/file_dentry()
Sean Young sean@mess.org media: rc: bpf attach/detach requires write permission
Eugen Hristev eugen.hristev@collabora.com pmdomain: mediatek: fix race conditions with genpd
Sam Protsenko semen.protsenko@linaro.org iio: pressure: bmp280: Add missing bmp085 to SPI id table
Randy Dunlap rdunlap@infradead.org iio: imu: bno055: serdev requires REGMAP
Nuno Sa nuno.sa@analog.com iio: imu: adis: ensure proper DMA alignment
Nuno Sa nuno.sa@analog.com iio: adc: ad_sigma_delta: ensure proper DMA alignment
Mario Limonciello mario.limonciello@amd.com iio: accel: bma400: Fix a compilation problem
Nuno Sa nuno.sa@analog.com iio: commom: st_sensors: ensure proper DMA alignment
Dinghao Liu dinghao.liu@zju.edu.cn iio: core: fix memleak in iio_device_register_sysfs
zhili.liu zhili.liu@ucas.com.cn iio: magnetometer: rm3100: add boundary check for the value read from RM3100_REG_TMRC
David Schiller david.schiller@jku.at staging: iio: ad5933: fix type mismatch regression
Tejun Heo tj@kernel.org Revert "workqueue: Override implicit ordered attribute in workqueue_apply_unbound_cpumask()"
Masami Hiramatsu (Google) mhiramat@kernel.org tracing/probes: Fix to search structure fields correctly
Masami Hiramatsu (Google) mhiramat@kernel.org tracing/probes: Fix to set arg size and fmt after setting type from BTF
Masami Hiramatsu (Google) mhiramat@kernel.org tracing/probes: Fix to show a parse error for bad type for $comm
Thorsten Blum thorsten.blum@toblux.com tracing/synthetic: Fix trace_string() return value
Steven Rostedt (Google) rostedt@goodmis.org tracing: Fix wasted memory in saved_cmdlines logic
Daniel Bristot de Oliveira bristot@kernel.org tracing/timerlat: Move hrtimer_init to timerlat_fd open()
Baokun Li libaokun1@huawei.com ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks()
Baokun Li libaokun1@huawei.com ext4: fix double-free of blocks due to wrong extents moved_len
Ekansh Gupta quic_ekangupt@quicinc.com misc: fastrpc: Mark all sessions as invalid in cb_remove
Carlos Llamas cmllamas@google.com binder: signal epoll threads of self-work
Andy Chi andy.chi@canonical.com ALSA: hda/realtek: fix mute/micmute LEDs for HP ZBook Power
Vitaly Rodionov vitalyr@opensource.cirrus.com ALSA: hda/cs8409: Suppress vmaster control for Dolphin models
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org ASoC: codecs: wcd938x: handle deferred probe
Kailang Yang kailang@realtek.com ALSA: hda/realtek - Add speaker pin verbtable for Dell dual speaker platform
Edson Juliano Drosdeck edson.drosdeck@gmail.com ALSA: hda/realtek: Enable headset mic on Vaio VJFE-ADL
Nathan Chancellor nathan@kernel.org modpost: Add '.ltext' and '.ltext.*' to TEXT_SECTIONS
Nathan Chancellor nathan@kernel.org um: Fix adding '-no-pie' for clang
Jan Beulich jbeulich@suse.com xen-netback: properly sync TX responses
Helge Deller deller@gmx.de parisc: BTLB: Fix crash when setting up BTLB at CPU bringup
Helge Deller deller@gmx.de parisc: Fix random data corruption from exception handler
Esben Haabendal esben@geanix.com net: stmmac: do not clear TBS enable bit on link up/down
Nikita Zhandarovich n.zhandarovich@fintech.ru net: hsr: remove WARN_ONCE() in send_hsr_supervision_frame()
Fedor Pchelkin pchelkin@ispras.ru nfc: nci: free rx_data_reassembly skb on NCI device cleanup
Nathan Chancellor nathan@kernel.org kbuild: Fix changing ELF file type for output of gen_btf for big endian
José Relvas josemonsantorelvas@gmail.com ALSA: hda/realtek: Apply headset jack quirk for non-bass alc287 thinkpads
Takashi Sakamoto o-takashi@sakamocchi.jp firewire: core: correct documentation of fw_csr_string() kernel API
Ondrej Mosnacek omosnace@redhat.com lsm: fix the logic in security_inode_getsecctx()
Ondrej Mosnacek omosnace@redhat.com lsm: fix default return value of the socket_getpeersec_*() hooks
Fangzhi Zuo jerry.zuo@amd.com drm/amd/display: Fix dcn35 8k30 Underflow/Corruption Issue
Wenjing Liu wenjing.liu@amd.com drm/amd/display: fix incorrect mpc_combine array size
David McFarland corngood@gmail.com drm/amd: Don't init MEC2 firmware when it fails to load
Friedrich Vock friedrich.vock@gmx.de drm/amdgpu: Reset IH OVERFLOW_CLEAR bit
Sebastian Ott sebott@redhat.com drm/virtio: Set segment size for virtio_gpu device
Vaishnav Achath vaishnav.a@ti.com spi: omap2-mcspi: Revert FIFO support without DMA
Keqi Wang wangkeqi_chris@163.com connector/cn_proc: revert "connector: Fix proc_event_num_listeners count not cleared"
Rob Clark robdclark@chromium.org Revert "drm/msm/gpu: Push gpu lock down past runpm"
Mario Limonciello mario.limonciello@amd.com Revert "drm/amd: flush any delayed gfxoff on suspend entry"
Lee Duncan lduncan@suse.com scsi: Revert "scsi: fcoe: Fix potential deadlock on &fip->ctlr_lock"
Tomi Valkeinen tomi.valkeinen@ideasonboard.com media: Revert "media: rkisp1: Drop IRQF_SHARED"
Michael Ellerman mpe@ellerman.id.au Revert "powerpc/pseries/iommu: Fix iommu initialisation during DLPAR add"
Paolo Abeni pabeni@redhat.com mptcp: really cope with fastopen race
Geliang Tang geliang@kernel.org mptcp: check addrs list in userspace_pm_get_local_id
Paolo Abeni pabeni@redhat.com mptcp: fix rcv space initialization
Paolo Abeni pabeni@redhat.com mptcp: drop the push_pending field
Geliang Tang geliang.tang@linux.dev selftests: mptcp: add mptcp_lib_kill_wait
Matthieu Baerts (NGI0) matttbe@kernel.org selftests: mptcp: allow changing subtests prefix
Matthieu Baerts (NGI0) matttbe@kernel.org selftests: mptcp: increase timeout to 30 min
Matthieu Baerts (NGI0) matttbe@kernel.org selftests: mptcp: add missing kconfig for NF Mangle
Matthieu Baerts (NGI0) matttbe@kernel.org selftests: mptcp: add missing kconfig for NF Filter in v6
Matthieu Baerts (NGI0) matttbe@kernel.org selftests: mptcp: add missing kconfig for NF Filter
Paolo Abeni pabeni@redhat.com mptcp: fix data re-injection from stale subflow
Arnd Bergmann arnd@arndb.de kallsyms: ignore ARMv4 thunks along with others
Radek Krejci radek.krejci@oracle.com modpost: trim leading spaces when processing source files list
Jean Delvare jdelvare@suse.de i2c: i801: Fix block process call transactions
Arnd Bergmann arnd@arndb.de i2c: pasemi: split driver into two separate modules
Shivaprasad G Bhat sbhat@linux.ibm.com powerpc/iommu: Fix the missing iommu_group_put() during platform domain attach
Michael Ellerman mpe@ellerman.id.au powerpc/kasan: Limit KASAN thread size increase to 32KB
Marc Zyngier maz@kernel.org irqchip/gic-v3-its: Handle non-coherent GICv4 redistributors
Bibo Mao maobibo@loongson.cn irqchip/loongson-eiointc: Use correct struct type in eiointc_domain_alloc()
Viken Dadhaniya quic_vdadhani@quicinc.com i2c: qcom-geni: Correct I2C TRE sequence
Dan Carpenter dan.carpenter@linaro.org cifs: fix underflow in parse_server_interfaces()
Cosmin Tanislav demonsingur@gmail.com iio: adc: ad4130: only set GPIO_CTRL if pin is unused
Cosmin Tanislav demonsingur@gmail.com iio: adc: ad4130: zero-initialize clock init data
Alex Williamson alex.williamson@redhat.com PCI: Fix active state requirement in PME polling
Greg Kroah-Hartman gregkh@linuxfoundation.org Revert "kobject: Remove redundant checks for whether ktype is NULL"
Jiangfeng Xiao xiaojiangfeng@huawei.com powerpc/kasan: Fix addr error caused by page alignment
Matthias Schiffer matthias.schiffer@ew.tq-group.com powerpc/6xx: set High BAT Enable flag on G2_LE cores
Gaurav Batra gbatra@linux.ibm.com powerpc/pseries/iommu: Fix iommu initialisation during DLPAR add
Saravana Kannan saravanak@google.com driver core: fw_devlink: Improve detection of overlapping cycles
Zhipeng Lu alexious@zju.edu.cn media: ir_toy: fix a memleak in irtoy_tx
Konrad Dybcio konrad.dybcio@linaro.org interconnect: qcom: sm8550: Enable sync_state
Konrad Dybcio konrad.dybcio@linaro.org interconnect: qcom: sc8180x: Mark CO0 BCM keepalive
Uttkarsh Aggarwal quic_uaggarwa@quicinc.com usb: dwc3: gadget: Fix NULL pointer dereference in dwc3_gadget_suspend
Udipto Goswami quic_ugoswami@quicinc.com usb: core: Prevent null pointer dereference in update_port_device_state
Xu Yang xu.yang_2@nxp.com usb: chipidea: core: handle power lost in workqueue
yuan linyu yuanlinyu@hihonor.com usb: f_mass_storage: forbid async queue when shutdown happen
Oliver Neukum oneukum@suse.com USB: hub: check for alternate port before enabling A_ALT_HNP_SUPPORT
Christian A. Ehrhardt lk@c--e.de usb: ucsi_acpi: Fix command completion handling
Sean Anderson sean.anderson@seco.com usb: ulpi: Fix debugfs directory leak
Christian A. Ehrhardt lk@c--e.de usb: ucsi: Add missing ppm_lock
Srinivas Pandruvada srinivas.pandruvada@linux.intel.com iio: hid-sensor-als: Return 0 for HID_USAGE_SENSOR_TIME_TIMESTAMP
Jason Gerecke killertofu@gmail.com HID: wacom: Do not register input devices until after hid_hw_start
Tatsunosuke Tobita tatsunosuke.tobita@wacom.com HID: wacom: generic: Avoid reporting a serial of '0' to userspace
Johan Hovold johan+linaro@kernel.org HID: i2c-hid-of: fix NULL-deref on failed power up
Benjamin Tissoires bentiss@kernel.org HID: bpf: actually free hdev memory after attaching a HID-BPF program
Benjamin Tissoires bentiss@kernel.org HID: bpf: remove double fdget()
Luka Guzenko l.guzenko@web.de ALSA: hda/realtek: Enable Mute LED on HP Laptop 14-fq0xxx
David Senoner seda18@rolmail.net ALSA: hda/realtek: Fix the external mic not being recognised for Acer Swift 1 SF114-32
Helge Deller deller@gmx.de parisc: Prevent hung tasks when printing inventory on serial console
Techno Mooney techno.mooney@gmail.com ASoC: amd: yc: Add DMI quirk for MSI Bravo 15 C7VF
Mikulas Patocka mpatocka@redhat.com dm-crypt, dm-verity: disable tasklets
Dave Airlie airlied@redhat.com nouveau: offload fence uevents work to workqueue
Michael Kelley mhklinux@outlook.com scsi: storvsc: Fix ring buffer size calculation
Nico Pache npache@redhat.com selftests: mm: fix map_hugetlb failure on 64K page size systems
Audra Mitchell audra@redhat.com selftests/mm: Update va_high_addr_switch.sh to check CPU for la57 flag
Zach O'Keefe zokeefe@google.com mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again
Jan Kara jack@suse.cz readahead: avoid multiple marked readahead pages
Muhammad Usama Anjum usama.anjum@collabora.com selftests/mm: switch to bash from sh
Sidhartha Kumar sidhartha.kumar@oracle.com fs/hugetlbfs/inode.c: mm/memory-failure.c: fix hugetlbfs hwpoison handling
Masami Hiramatsu (Google) mhiramat@kernel.org tracing/trigger: Fix to return error if failed to alloc snapshot
Samuel Holland samuel.holland@sifive.com scs: add CONFIG_MMU dependency for vfree_atomic()
Ryan Roberts ryan.roberts@arm.com selftests/mm: ksm_tests should only MADV_HUGEPAGE valid memory
Lokesh Gidra lokeshgidra@google.com userfaultfd: fix mmap_changing checking in mfill_atomic_hugetlb
Ryan Roberts ryan.roberts@arm.com mm: thp_get_unmapped_area must honour topdown preference
Ivan Vecera ivecera@redhat.com i40e: Fix waiting for queues of all VSIs to be disabled
Ivan Vecera ivecera@redhat.com i40e: Do not allow untrusted VF to remove administratively set MAC
Jiaxun Yang jiaxun.yang@flygoat.com mm/memory: Use exception ip to search exception tables
Jiaxun Yang jiaxun.yang@flygoat.com ptrace: Introduce exception_ip arch hook
Guenter Roeck linux@roeck-us.net MIPS: Add 'memory' clobber to csum_ipv6_magic() inline assembler
Arnd Bergmann arnd@arndb.de nouveau/svm: fix kvcalloc() argument order
Breno Leitao leitao@debian.org net: sysfs: Fix /sys/class/net/<iface> path for statistics
Manasi Navare navaremanasi@chromium.org drm/i915/dsc: Fix the macro that calculates DSCC_/DSCA_ PPS reg address
Alexey Khoroshilov khoroshilov@ispras.ru ASoC: rt5645: Fix deadlock in rt5645_jack_detect_work()
Uwe Kleine-König u.kleine-koenig@pengutronix.de spi: ppc4xx: Drop write-only variable
Jakub Kicinski kuba@kernel.org net: tls: fix returned read length with async decrypt
Sabrina Dubroca sd@queasysnail.net net: tls: fix use-after-free with partial reads and async decrypt
Jakub Kicinski kuba@kernel.org net: tls: handle backlogging of crypto requests
Jakub Kicinski kuba@kernel.org tls: fix race between tx work scheduling and socket close
Jakub Kicinski kuba@kernel.org tls: fix race between async notify and socket close
Jakub Kicinski kuba@kernel.org net: tls: factor out tls_*crypt_async_wait()
Horatiu Vultur horatiu.vultur@microchip.com lan966x: Fix crash when adding interface under a lag
Aaron Conole aconole@redhat.com net: openvswitch: limit the number of recursions from action sets
Ido Schimmel idosch@nvidia.com selftests: forwarding: Fix bridge locked port test flakiness
Ido Schimmel idosch@nvidia.com selftests: forwarding: Suppress grep warnings
Ido Schimmel idosch@nvidia.com selftests: forwarding: Fix bridge MDB test flakiness
Ido Schimmel idosch@nvidia.com selftests: forwarding: Fix layer 2 miss test flakiness
Ido Schimmel idosch@nvidia.com selftests: net: Fix bridge backup port test flakiness
Hangbin Liu liuhangbin@gmail.com selftests/net: convert test_bridge_backup_port.sh to run it in unique namespace
Hojin Nam hj96.nam@samsung.com perf: CXL: fix mismatched cpmu event opcode
Lukas Bulwahn lukas.bulwahn@gmail.com ALSA: hda/cs35l56: select intended config FW_CS_DSP
Saravana Kannan saravanak@google.com of: property: Improve finding the supplier of a remote-endpoint property
Saravana Kannan saravanak@google.com of: property: Improve finding the consumer of a remote-endpoint property
Parav Pandit parav@nvidia.com devlink: Fix command annotation documentation
Magnus Karlsson magnus.karlsson@intel.com bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY
Chuck Lever chuck.lever@oracle.com net/handshake: Fix handshake_req_destroy_test1
Jiri Pirko jiri@resnulli.us net/mlx5: DPLL, Fix possible use after free after delayed work timer triggers
Jiri Pirko jiri@resnulli.us dpll: fix possible deadlock during netlink dump operation
Ranjani Sridharan ranjani.sridharan@linux.intel.com ASoC: SOF: ipc3-topology: Fix pipeline tear down logic
Dan Carpenter dan.carpenter@linaro.org wifi: iwlwifi: uninitialized variable in iwl_acpi_get_ppag_table()
Dan Carpenter dan.carpenter@linaro.org wifi: iwlwifi: Fix some error codes
Miri Korenblit miriam.rachel.korenblit@intel.com wifi: iwlwifi: clear link_id in time_event
Amadeusz Sławiński amadeuszx.slawinski@linux.intel.com ASoC: Intel: avs: Fix dynamic port assignment when TDM is set
Carlos Song carlos.song@nxp.com spi: imx: fix the burst length at DMA mode and CPU mode
Cezary Rojewski cezary.rojewski@intel.com ASoC: Intel: avs: Fix pci_probe() error path
Mickaël Salaün mic@digikod.net selftests/landlock: Fix capability for net_test
Rob Clark robdclark@chromium.org drm/msm/gem: Fix double resv lock aquire
Christian A. Ehrhardt lk@c--e.de of: unittest: Fix compile in the non-dynamic case
Hu Yadi hu.yadi@h3c.com selftests/landlock: Fix fs_test build with old libc
Hu Yadi hu.yadi@h3c.com selftests/landlock: Fix net_test build with old libc
Nícolas F. R. A. Prado nfraprado@collabora.com kselftest: dt: Stop relying on dirname to improve performance
Saravana Kannan saravanak@google.com driver core: Fix device_link_flag_is_sync_state_only()
Josef Bacik josef@toxicpanda.com btrfs: don't drop extent_map for free space inode on write error
Filipe Manana fdmanana@suse.com btrfs: reject encoded write if inode has nodatasum flag set
Filipe Manana fdmanana@suse.com btrfs: don't reserve space for checksums when writing to nocow files
David Sterba dsterba@suse.com btrfs: send: return EOPNOTSUPP on unknown flags
Boris Burkov boris@bur.io btrfs: forbid deleting live subvol qgroup
Qu Wenruo wqu@suse.com btrfs: do not ASSERT() if the newly created subvolume already got read
Boris Burkov boris@bur.io btrfs: forbid creating subvol qgroups
Filipe Manana fdmanana@suse.com btrfs: don't refill whole delayed refs block reserve when starting transaction
Filipe Manana fdmanana@suse.com btrfs: add new unused block groups to the list of unused block groups
Filipe Manana fdmanana@suse.com btrfs: do not delete unused block group if it may be used soon
Filipe Manana fdmanana@suse.com btrfs: add and use helper to check if block group is used
Yang Shi yang@os.amperecomputing.com mm: mmap: map MAP_STACK to VM_NOHUGEPAGE
Yang Shi yang@os.amperecomputing.com mm: huge_memory: don't force huge page alignment on 32 bit
Linus Torvalds torvalds@linux-foundation.org update workarounds for gcc "asm goto" issue
Linus Torvalds torvalds@linux-foundation.org work around gcc bugs with 'asm goto' with outputs
Diffstat:
.../ABI/testing/sysfs-class-net-statistics | 48 +- Documentation/arch/arm64/silicon-errata.rst | 7 + Documentation/netlink/specs/dpll.yaml | 4 - Documentation/networking/devlink/devlink-port.rst | 2 +- Documentation/sphinx/kernel_feat.py | 2 +- Makefile | 4 +- arch/Kconfig | 1 + arch/arc/include/asm/jump_label.h | 4 +- arch/arm/include/asm/jump_label.h | 4 +- arch/arm64/include/asm/alternative-macros.h | 4 +- arch/arm64/include/asm/cputype.h | 4 + arch/arm64/include/asm/jump_label.h | 4 +- arch/arm64/kernel/cpu_errata.c | 3 + arch/arm64/kernel/fpsimd.c | 2 +- arch/arm64/kernel/signal.c | 4 +- arch/arm64/kvm/pkvm.c | 27 +- arch/csky/include/asm/jump_label.h | 4 +- arch/loongarch/include/asm/jump_label.h | 4 +- arch/loongarch/mm/kasan_init.c | 3 + arch/mips/include/asm/checksum.h | 3 +- arch/mips/include/asm/jump_label.h | 4 +- arch/mips/include/asm/ptrace.h | 2 + arch/mips/kernel/ptrace.c | 7 + arch/parisc/Kconfig | 1 - arch/parisc/include/asm/assembly.h | 1 + arch/parisc/include/asm/extable.h | 64 ++ arch/parisc/include/asm/jump_label.h | 4 +- arch/parisc/include/asm/special_insns.h | 6 +- arch/parisc/include/asm/uaccess.h | 48 +- arch/parisc/kernel/cache.c | 6 +- arch/parisc/kernel/drivers.c | 3 + arch/parisc/kernel/unaligned.c | 44 +- arch/parisc/mm/fault.c | 11 +- arch/powerpc/include/asm/jump_label.h | 4 +- arch/powerpc/include/asm/reg.h | 2 + arch/powerpc/include/asm/thread_info.h | 2 +- arch/powerpc/include/asm/uaccess.h | 12 +- arch/powerpc/kernel/cpu_setup_6xx.S | 20 +- arch/powerpc/kernel/cpu_specs_e500mc.h | 3 +- arch/powerpc/kernel/interrupt_64.S | 4 +- arch/powerpc/kernel/iommu.c | 4 +- arch/powerpc/kernel/irq_64.c | 2 +- arch/powerpc/mm/kasan/init_32.c | 1 + arch/powerpc/platforms/pseries/lpar.c | 8 +- arch/riscv/include/asm/bitops.h | 8 +- arch/riscv/include/asm/cpufeature.h | 4 +- arch/riscv/include/asm/jump_label.h | 4 +- arch/s390/include/asm/jump_label.h | 4 +- arch/s390/kvm/vsie.c | 1 - arch/s390/mm/gmap.c | 1 + arch/sparc/include/asm/jump_label.h | 4 +- arch/um/Makefile | 4 +- arch/um/include/asm/cpufeature.h | 2 +- arch/x86/Kconfig.cpu | 2 +- arch/x86/include/asm/cpufeature.h | 2 +- arch/x86/include/asm/jump_label.h | 6 +- arch/x86/include/asm/rmwcc.h | 2 +- arch/x86/include/asm/special_insns.h | 2 +- arch/x86/include/asm/uaccess.h | 10 +- arch/x86/kernel/fpu/signal.c | 13 +- arch/x86/kvm/svm/svm_ops.h | 6 +- arch/x86/kvm/vmx/pmu_intel.c | 2 +- arch/x86/kvm/vmx/vmx.c | 4 +- arch/x86/kvm/vmx/vmx_ops.h | 6 +- arch/x86/kvm/x86.c | 3 +- arch/x86/mm/ident_map.c | 23 +- arch/xtensa/include/asm/jump_label.h | 4 +- block/blk-mq.c | 9 +- block/blk-wbt.c | 4 +- crypto/algif_hash.c | 5 +- drivers/android/binder.c | 10 + drivers/base/core.c | 15 +- drivers/base/power/domain.c | 2 +- drivers/connector/cn_proc.c | 5 +- drivers/crypto/ccp/sev-dev.c | 10 +- drivers/dpll/dpll_netlink.c | 20 +- drivers/dpll/dpll_nl.c | 4 - drivers/dpll/dpll_nl.h | 2 - drivers/firewire/core-device.c | 7 +- drivers/firmware/efi/libstub/Makefile | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 - drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 9 +- drivers/gpu/drm/amd/amdgpu/cik_ih.c | 6 + drivers/gpu/drm/amd/amdgpu/cz_ih.c | 5 + drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 2 - drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 8 - drivers/gpu/drm/amd/amdgpu/iceland_ih.c | 5 + drivers/gpu/drm/amd/amdgpu/ih_v6_0.c | 6 + drivers/gpu/drm/amd/amdgpu/ih_v6_1.c | 7 + drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 6 + drivers/gpu/drm/amd/amdgpu/si_ih.c | 6 + drivers/gpu/drm/amd/amdgpu/soc21.c | 4 +- drivers/gpu/drm/amd/amdgpu/tonga_ih.c | 6 + drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 6 + drivers/gpu/drm/amd/amdgpu/vega20_ih.c | 6 + drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 14 +- .../amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c | 15 +- drivers/gpu/drm/amd/display/dc/dml/Makefile | 6 +- .../gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c | 2 +- .../amd/display/dc/dml2/dml2_translation_helper.c | 27 +- drivers/gpu/drm/amd/display/dc/inc/core_types.h | 2 + .../display/dc/link/protocols/link_dp_training.c | 5 +- drivers/gpu/drm/drm_buddy.c | 6 + drivers/gpu/drm/drm_prime.c | 2 +- drivers/gpu/drm/i915/display/intel_dp.c | 3 + drivers/gpu/drm/i915/display/intel_vdsc_regs.h | 4 +- drivers/gpu/drm/msm/msm_gem_prime.c | 4 +- drivers/gpu/drm/msm/msm_gpu.c | 11 +- drivers/gpu/drm/msm/msm_iommu.c | 32 +- drivers/gpu/drm/msm/msm_ringbuffer.c | 7 +- drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h | 2 +- drivers/gpu/drm/nouveau/nouveau_fence.c | 26 +- drivers/gpu/drm/nouveau/nouveau_fence.h | 1 + drivers/gpu/drm/nouveau/nouveau_svm.c | 2 +- drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 61 +- drivers/gpu/drm/virtio/virtgpu_drv.c | 1 + drivers/hid/bpf/hid_bpf_dispatch.c | 83 ++- drivers/hid/bpf/hid_bpf_dispatch.h | 4 +- drivers/hid/bpf/hid_bpf_jmp_table.c | 40 +- drivers/hid/i2c-hid/i2c-hid-of.c | 1 + drivers/hid/wacom_sys.c | 63 +- drivers/hid/wacom_wac.c | 9 +- drivers/i2c/busses/Makefile | 6 +- drivers/i2c/busses/i2c-i801.c | 4 +- drivers/i2c/busses/i2c-pasemi-core.c | 6 + drivers/i2c/busses/i2c-qcom-geni.c | 16 +- drivers/iio/accel/Kconfig | 2 + drivers/iio/adc/ad4130.c | 12 +- drivers/iio/imu/bno055/Kconfig | 1 + drivers/iio/industrialio-core.c | 5 +- drivers/iio/light/hid-sensor-als.c | 1 + drivers/iio/magnetometer/rm3100-core.c | 10 +- drivers/iio/pressure/bmp280-spi.c | 1 + drivers/interconnect/qcom/sc8180x.c | 1 + drivers/interconnect/qcom/sm8550.c | 1 + drivers/irqchip/irq-brcmstb-l2.c | 5 +- drivers/irqchip/irq-gic-v3-its.c | 62 +- drivers/irqchip/irq-loongson-eiointc.c | 2 +- drivers/md/dm-crypt.c | 38 +- drivers/md/dm-verity-target.c | 26 +- drivers/md/dm-verity.h | 1 - drivers/md/md.c | 7 +- .../media/platform/rockchip/rkisp1/rkisp1-dev.c | 2 +- drivers/media/rc/bpf-lirc.c | 6 +- drivers/media/rc/ir_toy.c | 2 + drivers/media/rc/lirc_dev.c | 5 +- drivers/media/rc/rc-core-priv.h | 2 +- drivers/misc/fastrpc.c | 2 +- drivers/mmc/core/slot-gpio.c | 6 +- drivers/mmc/host/sdhci-pci-o2micro.c | 30 + drivers/net/bonding/bond_main.c | 5 +- drivers/net/can/dev/netlink.c | 2 +- drivers/net/dsa/mv88e6xxx/chip.c | 2 +- drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +- drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 38 +- drivers/net/ethernet/mellanox/mlx5/core/dpll.c | 2 +- .../net/ethernet/microchip/lan966x/lan966x_lag.c | 9 +- .../net/ethernet/netronome/nfp/flower/conntrack.c | 46 +- .../ethernet/netronome/nfp/flower/tunnel_conf.c | 2 +- .../net/ethernet/netronome/nfp/nfp_net_common.c | 1 + .../ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c | 6 +- drivers/net/ethernet/stmicro/stmmac/common.h | 56 +- drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c | 15 +- drivers/net/ethernet/stmicro/stmmac/dwmac4_lib.c | 15 +- drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c | 15 +- drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c | 15 +- .../net/ethernet/stmicro/stmmac/stmmac_ethtool.c | 125 +++- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 136 ++-- drivers/net/ethernet/ti/cpsw.c | 2 + drivers/net/ethernet/ti/cpsw_new.c | 3 + drivers/net/hyperv/netvsc.c | 5 +- drivers/net/hyperv/netvsc_drv.c | 82 ++- drivers/net/wireless/intel/iwlwifi/fw/acpi.c | 15 +- drivers/net/wireless/intel/iwlwifi/iwl-drv.c | 1 + drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c | 3 + drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c | 4 + .../net/wireless/intel/iwlwifi/mvm/time-event.c | 3 +- drivers/net/xen-netback/netback.c | 100 ++- drivers/of/property.c | 61 +- drivers/of/unittest.c | 12 +- drivers/pci/controller/dwc/pcie-designware-ep.c | 3 +- drivers/pci/pci.c | 37 +- drivers/perf/cxl_pmu.c | 2 +- drivers/pmdomain/mediatek/mtk-pm-domains.c | 15 +- drivers/pmdomain/renesas/r8a77980-sysc.c | 3 +- drivers/s390/net/qeth_l3_main.c | 9 +- drivers/scsi/fcoe/fcoe_ctlr.c | 20 +- drivers/scsi/storvsc_drv.c | 12 +- drivers/spi/spi-imx.c | 9 +- drivers/spi/spi-omap2-mcspi.c | 137 +--- drivers/spi/spi-ppc4xx.c | 5 - drivers/staging/iio/impedance-analyzer/ad5933.c | 2 +- drivers/thunderbolt/tb_regs.h | 2 +- drivers/thunderbolt/usb4.c | 2 +- drivers/tty/serial/max310x.c | 53 +- drivers/tty/serial/mxs-auart.c | 5 +- drivers/tty/serial/serial_core.c | 2 +- drivers/usb/chipidea/ci.h | 2 + drivers/usb/chipidea/core.c | 44 +- drivers/usb/common/ulpi.c | 2 +- drivers/usb/core/hub.c | 46 +- drivers/usb/dwc3/gadget.c | 6 +- drivers/usb/gadget/function/f_mass_storage.c | 20 +- drivers/usb/typec/tcpm/tcpm.c | 3 +- drivers/usb/typec/ucsi/ucsi.c | 2 + drivers/usb/typec/ucsi/ucsi_acpi.c | 17 +- drivers/xen/events/events_base.c | 8 +- fs/btrfs/block-group.c | 80 +- fs/btrfs/block-group.h | 7 + fs/btrfs/delalloc-space.c | 29 +- fs/btrfs/disk-io.c | 13 +- fs/btrfs/inode.c | 26 +- fs/btrfs/ioctl.c | 5 + fs/btrfs/qgroup.c | 14 + fs/btrfs/send.c | 2 +- fs/btrfs/transaction.c | 38 +- fs/ceph/caps.c | 3 +- fs/ext4/mballoc.c | 39 +- fs/ext4/move_extent.c | 6 +- fs/hugetlbfs/inode.c | 21 +- fs/namespace.c | 11 +- fs/nfsd/nfs4state.c | 11 +- fs/nilfs2/file.c | 8 +- fs/nilfs2/recovery.c | 7 +- fs/proc/array.c | 66 +- fs/smb/client/connect.c | 14 +- fs/smb/client/fs_context.c | 11 + fs/smb/client/namespace.c | 16 + fs/smb/client/smb2ops.c | 2 +- fs/smb/server/smb2pdu.c | 8 +- fs/tracefs/event_inode.c | 814 ++++++--------------- fs/tracefs/inode.c | 102 +-- fs/tracefs/internal.h | 46 +- fs/zonefs/file.c | 42 +- fs/zonefs/super.c | 66 +- include/linux/backing-dev-defs.h | 7 +- include/linux/compiler-gcc.h | 20 + include/linux/compiler_types.h | 11 +- include/linux/gpio/driver.h | 12 + include/linux/iio/adc/ad_sigma_delta.h | 4 +- include/linux/iio/common/st_sensors.h | 4 +- include/linux/iio/imu/adis.h | 3 +- include/linux/lsm_hook_defs.h | 4 +- include/linux/mman.h | 1 + include/linux/netfilter/ipset/ip_set.h | 4 + include/linux/ptrace.h | 4 + include/linux/serial_core.h | 32 +- include/net/tls.h | 5 - include/sound/tas2781.h | 1 + init/Kconfig | 9 + io_uring/net.c | 5 +- kernel/sys.c | 54 +- kernel/trace/ftrace.c | 10 + kernel/trace/ring_buffer.c | 2 +- kernel/trace/trace.c | 78 +- kernel/trace/trace_btf.c | 4 +- kernel/trace/trace_events_synth.c | 3 +- kernel/trace/trace_events_trigger.c | 6 +- kernel/trace/trace_osnoise.c | 6 +- kernel/trace/trace_probe.c | 32 +- kernel/trace/trace_probe.h | 3 +- kernel/workqueue.c | 8 +- lib/kobject.c | 24 +- mm/backing-dev.c | 2 +- mm/damon/sysfs-schemes.c | 2 +- mm/huge_memory.c | 14 +- mm/memory-failure.c | 2 +- mm/memory.c | 4 +- mm/mmap.c | 6 +- mm/page-writeback.c | 4 +- mm/readahead.c | 4 +- mm/userfaultfd.c | 15 +- net/can/j1939/j1939-priv.h | 3 +- net/can/j1939/main.c | 2 +- net/can/j1939/socket.c | 46 +- net/handshake/handshake-test.c | 5 +- net/hsr/hsr_device.c | 4 +- net/mac80211/tx.c | 5 +- net/mptcp/pm_userspace.c | 13 +- net/mptcp/protocol.c | 25 +- net/mptcp/protocol.h | 7 +- net/mptcp/subflow.c | 4 +- net/netfilter/ipset/ip_set_bitmap_gen.h | 14 +- net/netfilter/ipset/ip_set_core.c | 39 +- net/netfilter/ipset/ip_set_hash_gen.h | 19 +- net/netfilter/ipset/ip_set_list_set.c | 13 +- net/netfilter/nft_set_pipapo_avx2.c | 2 +- net/nfc/nci/core.c | 4 + net/openvswitch/flow_netlink.c | 49 +- net/tls/tls_sw.c | 135 ++-- net/wireless/core.c | 3 +- samples/bpf/asm_goto_workaround.h | 8 +- scripts/link-vmlinux.sh | 9 +- scripts/mksysmap | 13 +- scripts/mod/modpost.c | 3 +- scripts/mod/sumversion.c | 7 +- security/security.c | 45 +- sound/pci/hda/Kconfig | 4 +- sound/pci/hda/patch_conexant.c | 18 + sound/pci/hda/patch_cs8409.c | 1 + sound/pci/hda/patch_realtek.c | 20 +- sound/pci/hda/tas2781_hda_i2c.c | 2 +- sound/soc/amd/yc/acp6x-mach.c | 14 + sound/soc/codecs/rt5645.c | 1 + sound/soc/codecs/tas2781-comlib.c | 3 +- sound/soc/codecs/tas2781-i2c.c | 2 +- sound/soc/codecs/wcd938x.c | 2 +- sound/soc/intel/avs/core.c | 3 + sound/soc/intel/avs/topology.c | 2 +- sound/soc/sof/ipc3-topology.c | 69 +- sound/soc/sof/ipc3.c | 2 +- tools/arch/x86/include/asm/rmwcc.h | 2 +- tools/include/linux/compiler_types.h | 4 +- .../testing/selftests/dt/test_unprobed_devices.sh | 13 +- tools/testing/selftests/landlock/common.h | 48 +- tools/testing/selftests/landlock/fs_test.c | 11 +- tools/testing/selftests/landlock/net_test.c | 13 +- .../selftests/mm/charge_reserved_hugetlb.sh | 2 +- tools/testing/selftests/mm/ksm_tests.c | 2 +- tools/testing/selftests/mm/map_hugetlb.c | 7 + tools/testing/selftests/mm/va_high_addr_switch.sh | 6 + tools/testing/selftests/mm/write_hugetlb_memory.sh | 2 +- .../selftests/net/forwarding/bridge_locked_port.sh | 4 +- .../testing/selftests/net/forwarding/bridge_mdb.sh | 14 +- .../selftests/net/forwarding/tc_flower_l2_miss.sh | 8 +- tools/testing/selftests/net/mptcp/config | 3 + tools/testing/selftests/net/mptcp/mptcp_join.sh | 10 +- tools/testing/selftests/net/mptcp/mptcp_lib.sh | 11 +- tools/testing/selftests/net/mptcp/settings | 2 +- tools/testing/selftests/net/mptcp/userspace_pm.sh | 31 +- .../selftests/net/test_bridge_backup_port.sh | 394 +++++----- tools/tracing/rtla/Makefile | 7 +- tools/tracing/rtla/src/osnoise_hist.c | 9 +- tools/tracing/rtla/src/osnoise_top.c | 6 +- tools/tracing/rtla/src/timerlat_hist.c | 9 +- tools/tracing/rtla/src/timerlat_top.c | 6 +- tools/tracing/rtla/src/utils.c | 14 +- tools/tracing/rtla/src/utils.h | 2 + tools/verification/rv/Makefile | 7 +- tools/verification/rv/src/in_kernel.c | 2 +- 340 files changed, 3297 insertions(+), 2530 deletions(-)
Hello,
On Tue, 20 Feb 2024 21:52:39 +0100 Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.7.6 release. There are 309 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 22 Feb 2024 20:55:42 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.7.6-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.7.y and the diffstat can be found below.
This rc kernel passes DAMON functionality test[1] on my test machine. Attaching the test results summary below. Please note that I retrieved the kernel from linux-stable-rc tree[2].
Tested-by: SeongJae Park sj@kernel.org
[1] https://github.com/awslabs/damon-tests/tree/next/corr [2] c40c992a3e2e ("Linux 6.7.6-rc1")
Thanks, SJ
[...]
---
ok 1 selftests: damon: debugfs_attrs.sh ok 2 selftests: damon: debugfs_schemes.sh ok 3 selftests: damon: debugfs_target_ids.sh ok 4 selftests: damon: debugfs_empty_targets.sh ok 5 selftests: damon: debugfs_huge_count_read_write.sh ok 6 selftests: damon: debugfs_duplicate_context_creation.sh ok 7 selftests: damon: debugfs_rm_non_contexts.sh ok 8 selftests: damon: sysfs.sh ok 9 selftests: damon: sysfs_update_removed_scheme_dir.sh ok 10 selftests: damon: reclaim.sh ok 11 selftests: damon: lru_sort.sh ok 1 selftests: damon-tests: kunit.sh ok 2 selftests: damon-tests: huge_count_read_write.sh ok 3 selftests: damon-tests: buffer_overflow.sh ok 4 selftests: damon-tests: rm_contexts.sh ok 5 selftests: damon-tests: record_null_deref.sh ok 6 selftests: damon-tests: dbgfs_target_ids_read_before_terminate_race.sh ok 7 selftests: damon-tests: dbgfs_target_ids_pid_leak.sh ok 8 selftests: damon-tests: damo_tests.sh ok 9 selftests: damon-tests: masim-record.sh ok 10 selftests: damon-tests: build_i386.sh ok 11 selftests: damon-tests: build_arm64.sh ok 12 selftests: damon-tests: build_m68k.sh ok 13 selftests: damon-tests: build_i386_idle_flag.sh ok 14 selftests: damon-tests: build_i386_highpte.sh ok 15 selftests: damon-tests: build_nomemcg.sh [33m [92mPASS [39m
This is the start of the stable review cycle for the 6.7.6 release. There are 309 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 22 Feb 2024 20:55:42 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.7.6-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.7.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my x86_64 and ARM64 test systems. No errors or regressions.
Tested-by: Allen Pais apais@linux.microsoft.com
Thanks.
On Tue, Feb 20, 2024 at 09:52:39PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.7.6 release. There are 309 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Successfully compiled and installed the kernel on my computer (Acer Aspire E15, Intel Core i3 Haswell). No noticeable regressions.
Tested-by: Bagas Sanjaya bagasdotme@gmail.com
On 20 Feb 21:52, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.7.6 release. There are 309 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 22 Feb 2024 20:55:42 +0000. Anything received after that time might be too late.
Hi Greg,
no noticeable regressions in my system.
Tested-by: Ricardo B. Marliere ricardo@marliere.net
Thanks, - Ricardo.
On Wed, 21 Feb 2024 at 02:57, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.7.6 release. There are 309 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 22 Feb 2024 20:55:42 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.7.6-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.7.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build * kernel: 6.7.6-rc1 * git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc * git branch: linux-6.7.y * git commit: c40c992a3e2ef2cd425dd1628140a318e9ec9b23 * git describe: v6.7.4-439-gc40c992a3e2e * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.7.y/build/v6.7.4-...
## Test Regressions (compared to v6.7.4)
## Metric Regressions (compared to v6.7.4)
## Test Fixes (compared to v6.7.4)
## Metric Fixes (compared to v6.7.4)
## Test result summary total: 252994, pass: 220458, fail: 2893, skip: 29351, xfail: 292
## Build Summary * arc: 5 total, 5 passed, 0 failed * arm: 145 total, 143 passed, 2 failed * arm64: 51 total, 50 passed, 1 failed * i386: 41 total, 40 passed, 1 failed * mips: 26 total, 26 passed, 0 failed * parisc: 4 total, 4 passed, 0 failed * powerpc: 36 total, 34 passed, 2 failed * riscv: 18 total, 18 passed, 0 failed * s390: 13 total, 13 passed, 0 failed * sh: 10 total, 10 passed, 0 failed * sparc: 8 total, 8 passed, 0 failed * x86_64: 46 total, 45 passed, 1 failed
## Test suites summary * boot * kselftest-android * kselftest-arm64 * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-drivers-dma-buf * kselftest-efivarfs * kselftest-exec * kselftest-filesystems * kselftest-filesystems-binderfs * kselftest-filesystems-epoll * kselftest-firmware * kselftest-fpu * kselftest-ftrace * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-kvm * kselftest-lib * kselftest-membarrier * kselftest-memory-hotplug * kselftest-mincore * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-net-forwarding * kselftest-net-mptcp * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-user_events * kselftest-vDSO * kselftest-vm * kselftest-watchdog * kselftest-x86 * kselftest-zram * kunit * kvm-unit-tests * libgpiod * libhugetlbfs * log-parser-boot * log-parser-test * ltp-cap_bounds * ltp-commands * ltp-containers * ltp-controllers * ltp-cpuhotplug * ltp-crypto * ltp-cve * ltp-dio * ltp-fcntl-locktests * ltp-filecaps * ltp-fs * ltp-fs_bind * ltp-fs_perms_simple * ltp-fsx * ltp-hugetlb * ltp-io * ltp-ipc * ltp-math * ltp-mm * ltp-nptl * ltp-pty * ltp-sched * ltp-securebits * ltp-smoke * ltp-smoketest * ltp-syscalls * ltp-tracing * perf * rcutorture
-- Linaro LKFT https://lkft.linaro.org
On 20/02/2024 20:52, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.7.6 release. There are 309 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 22 Feb 2024 20:55:42 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.7.6-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.7.y and the diffstat can be found below.
thanks,
greg k-h
No new regressions for Tegra ...
Test results for stable-v6.7: 10 builds: 10 pass, 0 fail 26 boots: 26 pass, 0 fail 116 tests: 115 pass, 1 fail
Linux version: 6.7.6-rc1-gc40c992a3e2e Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra194-p3509-0000+p3668-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Test failures: tegra186-p2771-0000: pm-system-suspend.sh
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
On 2/20/24 12:52 PM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.7.6 release. There are 309 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 22 Feb 2024 20:55:42 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.7.6-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.7.y and the diffstat can be found below.
thanks,
greg k-h
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos re@w6rz.net
On 2/20/24 13:52, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.7.6 release. There are 309 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 22 Feb 2024 20:55:42 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.7.6-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.7.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
linux-stable-mirror@lists.linaro.org